Showing 1–24 of 24 results for author: Bai, Z

Searching in archive stat.

  1. arXiv:2406.18035  [pdf, other]

    cs.LG stat.ML

    Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

    Authors: Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai

    Abstract: Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning. To advance understanding in this area, we introduce a concept we term "local linear recovery" (LLR), a weaker form of target function recovery that renders the problem more amenable to theoretical analysis. In the sense o…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2211.11623

  2. arXiv:2403.07318  [pdf, ps, other]

    stat.AP

    Test for high-dimensional linear hypothesis of mean vectors via random integration

    Authors: Jianghao Li, Shizhe Hong, Zhenzhen Niu, Zhidong Bai

    Abstract: In this paper, we investigate hypothesis testing for the linear combination of mean vectors across multiple populations through the method of random integration. We have established the asymptotic distributions of the test statistics under both null and alternative hypotheses. Additionally, we provide a theoretical explanation for the special use of our test statistics in situations when the nonze…

    Submitted 12 March, 2024; originally announced March 2024.

  3. arXiv:2403.05760  [pdf, other]

    stat.AP

    Simultaneous test of the mean vectors and covariance matrices for high-dimensional data using RMT

    Authors: Zhenzhen Niu, Jianghao Li, Wenya Luo, Zhidong Bai

    Abstract: In this paper, we propose a new modified likelihood ratio test (LRT) for simultaneously testing mean vectors and covariance matrices of two-sample populations in high-dimensional settings. By employing tools from Random Matrix Theory (RMT), we derive the limiting null distribution of the modified LRT for generally distributed populations. Furthermore, we compare the proposed test with existing tes…

    Submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2402.03933  [pdf]

    cs.SE stat.AP

    Development of a Evaluation Tool for Age-Appropriate Software in Aging Environments: A Delphi Study

    Authors: Zhenggang Bai, Yougxiang Fang, Hongtu Chen, Xinru Chen, Ning An, Min Zhang, Guoxin Rui, Jing Jin

    Abstract: Objective: We aimed to develop a dependable, reliable tool for assessing software age-appropriateness. Methods: We conducted a systematic review to get the indicators of technology age-appropriateness from studies from January 2000 to April 2023. This study engaged 25 experts from the fields of anthropology, sociology, and social technology research; three rounds of Delphi consultations were con…

    Submitted 4 February, 2024; originally announced February 2024.

  5. arXiv:2401.17143  [pdf, other]

    math.ST stat.ME

    Test for high-dimensional mean vectors via the weighted $L_2$-norm

    Authors: Jianghao Li, Zhenzhen Niu, Shizhe Hong, Zhidong Bai

    Abstract: In this paper, we propose a novel approach to test the equality of high-dimensional mean vectors of several populations via the weighted $L_2$-norm. We establish the asymptotic normality of the test statistics under the null hypothesis. We also explain theoretically why our test statistics can be highly useful in weakly dense cases when the nonzero signal in mean vectors is present. Furthermore, w…

    Submitted 31 January, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  6. arXiv:2307.08921  [pdf, other]

    cs.LG stat.ML

    Optimistic Estimate Uncovers the Potential of Nonlinear Models

    Authors: Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu

    Abstract: We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convoluti…

    Submitted 17 July, 2023; originally announced July 2023.

  7. arXiv:2303.17230  [pdf, other]

    math.ST stat.ME

    KOO approach for scalable variable selection problem in large-dimensional regression

    Authors: Zhidong Bai, Kwok Pui Choi, Yasunori Fujikoshi, Jiang Hu

    Abstract: An important issue in many multivariate regression problems is to eliminate candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are large, model selection encounters the scalability challenge. Knock-one-out (KOO) statistics hold promise to meet this challenge. In this paper, the almost sure limits and the central limit t…

    Submitted 25 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

  8. arXiv:2211.15982  [pdf, ps, other]

    math.PR math.ST stat.AP

    Revisit of a Diaconis urn model

    Authors: Li Yang, Jiang Hu, Zhidong Bai

    Abstract: Let $G$ be a finite Abelian group of order $d$. We consider an urn in which, initially, there are labeled balls that generate the group $G$. Choosing two balls from the urn with replacement, observe their labels, and perform a group multiplication on the respective group elements to obtain a group element. Then, we put a ball labeled with that resulting element into the urn. This model was formula…

    Submitted 29 November, 2022; originally announced November 2022.

  9. arXiv:2211.11891  [pdf, other]

    stat.ML cs.LG

    A Bi-level Nonlinear Eigenvector Algorithm for Wasserstein Discriminant Analysis

    Authors: Dong Min Roh, Zhaojun Bai, Ren-Cang Li

    Abstract: Much like the classical Fisher linear discriminant analysis (LDA), the recently proposed Wasserstein discriminant analysis (WDA) is a linear dimensionality reduction method that seeks a projection matrix to maximize the dispersion of different data classes and minimize the dispersion of same data classes via a bi-level optimization. In contrast to LDA, WDA can account for both global and local int…

    Submitted 27 July, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

  10. arXiv:2211.11623  [pdf, other]

    cs.LG stat.ML

    Linear Stability Hypothesis and Rank Stratification for Nonlinear Models

    Authors: Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu

    Abstract: Models with nonlinear architectures/parameterizations such as deep neural networks (DNNs) are well known for their mysteriously good generalization performance at overparameterization. In this work, we tackle this mystery from a novel perspective focusing on the transition of the target recovery/fitting accuracy as a function of the training data size. We propose a rank stratification for general…

    Submitted 21 November, 2022; originally announced November 2022.

  11. arXiv:2210.16435  [pdf, other]

    cs.LG stat.ML

    Scalable Spectral Clustering with Group Fairness Constraints

    Authors: Ji Wang, Ding Lu, Ian Davidson, Zhaojun Bai

    Abstract: There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity where in each cluster, each protected group is represented with the same proportion as in the entiret…

    Submitted 14 April, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6613-6629, 2023

  12. arXiv:2210.03859  [pdf, other]

    stat.ML cs.LG

    Spectrally-Corrected and Regularized Linear Discriminant Analysis for Spiked Covariance Model

    Authors: Hua Li, Wenya Luo, Zhidong Bai, Huanchao Zhou, Zhangni Pu

    Abstract: This paper proposes an improved linear discriminant analysis called spectrally-corrected and regularized LDA (SRLDA). This method integrates the design ideas of the sample spectrally-corrected covariance matrix and the regularized discriminant analysis. With the support of a large-dimensional random matrix analysis framework, it is proved that SRLDA has a linear classification global optimal solut…

    Submitted 8 March, 2024; v1 submitted 7 October, 2022; originally announced October 2022.

  13. Block Model Guided Unsupervised Feature Selection

    Authors: Zilong Bai, Hoa Nguyen, Ian Davidson

    Abstract: Feature selection is a core area of data mining with a recent innovation of graph-driven unsupervised feature selection for linked data. In this setting we have a dataset $\mathbf{Y}$ consisting of $n$ instances each with $m$ features and a corresponding $n$ node graph (whose adjacency matrix is $\mathbf{A}$) with an edge indicating that the two instances are similar. Existing efforts for unsuperv…

    Submitted 5 July, 2020; originally announced July 2020.

    Comments: Published at KDD2020

    Journal ref: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2020)

  14. arXiv:2005.04557  [pdf, other]

    stat.AP cs.LG stat.ML

    A Multi-Variate Triple-Regression Forecasting Algorithm for Long-Term Customized Allergy Season Prediction

    Authors: Xiaoyu Wu, Zeyu Bai, Jianguo Jia, Youzhi Liang

    Abstract: In this paper, we propose a novel multi-variate algorithm using a triple-regression methodology to predict the airborne-pollen allergy season that can be customized for each patient in the long term. To improve the prediction accuracy, we first perform a pre-processing to integrate the historical data of pollen concentration and various inferential signals from other covariates such as the meteoro…

    Submitted 10 December, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: 4 pages, 4 figures

  15. arXiv:1909.11527  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Self-consistent-field Iteration for Orthogonal Canonical Correlation Analysis

    Authors: Leihong Zhang, Li Wang, Zhaojun Bai, Ren-cang Li

    Abstract: We propose an efficient algorithm for solving orthogonal canonical correlation analysis (OCCA) in the form of trace-fractional structure and orthogonal linear projections. Even though orthogonality has been widely used and proved to be a useful criterion for pattern recognition and feature extraction, existing methods for solving the OCCA problem are either numerically unstable by relying on a deflation…

    Submitted 25 September, 2019; originally announced September 2019.

  16. arXiv:1906.06713  [pdf, ps, other]

    math.ST stat.AP stat.ME

    Community Detection Based on the $L_\infty$ convergence of eigenvectors in DCBM

    Authors: Yan Liu, Zhiqiang Hou, Zhigang Yao, Zhidong Bai, Jiang Hu, Shurong Zheng

    Abstract: Spectral clustering is one of the most popular algorithms for community detection in network analysis. Based on this rationale, in this paper we give the convergence rate of eigenvectors for the adjacency matrix in the $l_\infty$ norm, under the stochastic block model (BM) and degree corrected stochastic block model (DCBM), adding some mild and rational conditions. We also extend this result to a…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: 28 pages, 2 figures

  17. arXiv:1903.01734  [pdf]

    cs.LG stat.ML

    A Novel Efficient Approach with Data-Adaptive Capability for OMP-based Sparse Subspace Clustering

    Authors: Jiaqiyu Zhan, Zhiqiang Bai, Yuesheng Zhu

    Abstract: Orthogonal Matching Pursuit (OMP) plays an important role in data science and its applications such as sparse subspace clustering and image processing. However, the existing OMP-based approaches lack data adaptiveness, so the data cannot be represented well enough and accuracy may be lost. This paper proposes a novel approach to enhance the data-adaptive capability for OMP-based sparse sub…

    Submitted 30 August, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

  18. arXiv:1808.05362  [pdf, other]

    stat.ME math.ST

    Generalized Four Moment Theorem and an Application to CLT for Spiked Eigenvalues of Large-dimensional Covariance Matrices

    Authors: Dandan Jiang, Zhidong Bai

    Abstract: We consider a more generalized spiked covariance matrix $\Sigma$, which is a general non-definite matrix with the spiked eigenvalues scattered into a few bulks and the largest ones allowed to tend to infinity. By relaxing the matching of the 4th moment to a tail probability decay, a Generalized Four Moment Theorem (G4MT) is proposed to show the universality of the asymptotic law for the local spe…

    Submitted 24 April, 2019; v1 submitted 16 August, 2018; originally announced August 2018.

    Comments: 48 pages, 4 figures, 5 tables

    MSC Class: 60B20; 62H25; 60F05; 62H10

  19. arXiv:1707.01225  [pdf, other]

    stat.ME

    Estimating the Number of Sources in Magnetoencephalography Using Spiked Population Eigenvalues

    Authors: Zhigang Yao, Ye Zhang, Zhidong Bai, William F. Eddy

    Abstract: Magnetoencephalography (MEG) is an advanced imaging technique used to measure the magnetic fields outside the human head produced by the electrical activity inside the brain. Various source localization methods in MEG require the knowledge of the underlying active sources, which are identified a priori. Common methods used to estimate the number of sources include principal component analysis o…

    Submitted 5 July, 2017; originally announced July 2017.

    Comments: 38 pages, 8 figures, 4 tables

  20. A New Test of Multivariate Nonlinear Causality

    Authors: Zhidong Bai, Yongchang Hui, Zhihui Lv, Wing-Keung Wong, Shurong Zheng, Zhenzhen Zhu

    Abstract: The multivariate nonlinear Granger causality developed by Bai et al. (2010) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994), they attempt to establish a central limit theorem (CLT) of their test statistic by applying the asymptotical property of multivariate…

    Submitted 3 March, 2017; originally announced March 2017.

    Comments: 20 pages. arXiv admin note: substantial text overlap with arXiv:1701.03992

  21. arXiv:1701.03992  [pdf, ps, other]

    stat.ME

    The Hiemstra-Jones Test Revisited

    Authors: Zhidong Bai, Yongchang Hui, Zhihui Lv, Wing-Keung Wong, Zhen-Zhen Zhu

    Abstract: The famous Hiemstra-Jones (HJ) test developed by Hiemstra and Jones (1994) plays a significant role in studying nonlinear causality. Over the last two decades, there have been numerous applications and theoretical extensions based on this pioneering work. However, several works note that counterintuitive results are obtained from the HJ test, and some researchers find that the HJ test is seriously…

    Submitted 14 January, 2017; originally announced January 2017.

  22. arXiv:1404.6633  [pdf, ps, other]

    stat.ME

    Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing

    Authors: Shurong Zheng, Z. D. Bai, Jiangfeng Yao

    Abstract: Sample covariance matrices are widely used in multivariate statistical analysis. The central limit theorems (CLT's) for linear spectral statistics of high-dimensional non-centered sample covariance matrices have received considerable attention in random matrix theory and have been applied to many high-dimensional statistical problems. However, known population mean vectors are assumed for non-cent…

    Submitted 26 April, 2014; originally announced April 2014.

    Comments: 36 pages, 23 references

    MSC Class: 62H15; 62H10

  23. arXiv:1302.0355  [pdf, other]

    stat.ME

    Estimation of the population spectral distribution from a large dimensional sample covariance matrix

    Authors: Weiming Li, Jiaqi Chen, Yingli Qin, Jianfeng Yao, Zhidong Bai

    Abstract: This paper introduces a new method to estimate the spectral distribution of a population covariance matrix from high-dimensional data. The method is founded on a meaningful generalization of the seminal Marčenko-Pastur equation, originally defined in the complex plane, to the real line. Beyond its easy implementation and the established asymptotic consistency, the new estimator outperforms two exis…

    Submitted 2 February, 2013; originally announced February 2013.

    Comments: 16 pages, 4 figures

  24. Testing linear hypotheses in high-dimensional regressions

    Authors: Z. Bai, D. Jiang, J. Yao, S. Zheng

    Abstract: For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations and, more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say $p\le 20$. On the other hand, assuming that…

    Submitted 5 June, 2012; originally announced June 2012.

    Comments: Accepted 02/2012 for publication in "Statistics". 20 pages, 2 pages and 2 tables

    MSC Class: 62H15; 62H10

    Journal ref: Statistics: A Journal of Theoretical and Applied Statistics 47(6):1207-1223, June 2013