Search | arXiv e-print repository

Unified Principal Components Analysis of Irregularly Observed Functional Time Series

Authors: Zerui Guo, Jianbin Tan, Hui Huang

Abstract: Irregularly observed functional time series (FTS) are increasingly available in many real-world applications. To analyze FTS, it is crucial to account for both serial dependencies and the irregularly observed nature of functional data. However, existing methods for FTS often rely on specific model assumptions in capturing serial dependencies, or cannot handle the irregular observational scheme of… ▽ More Irregularly observed functional time series (FTS) are increasingly available in many real-world applications. To analyze FTS, it is crucial to account for both serial dependencies and the irregularly observed nature of functional data. However, existing methods for FTS often rely on specific model assumptions in capturing serial dependencies, or cannot handle the irregular observational scheme of functional data. To solve these issues, one can perform dimension reduction on FTS via functional principal component analysis (FPCA) or dynamic FPCA. Nonetheless, these methods may either be not theoretically optimal or too redundant to represent serially dependent functional data. In this article, we introduce a novel dimension reduction method for FTS based on dynamic FPCA. Through a new concept called optimal functional filters, we unify the theories of FPCA and dynamic FPCA, providing a parsimonious and optimal representation for FTS adapting to its serial dependence structure. This framework is referred to as principal analysis via dependency-adaptivity (PADA). Under a hierarchical Bayesian model, we establish an estimation procedure for dimension reduction via PADA. Our method can be used for both sparsely and densely observed FTS, and is capable of predicting future functional data. We investigate the theoretical properties of PADA and demonstrate its effectiveness through extensive simulation studies. Finally, we illustrate our method via dimension reduction and prediction of daily PM2.5 data. △ Less

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2407.17072 [pdf, other]

An Efficient Procedure for Computing Bayesian Network Structure Learning

Authors: Hongming Huang, Joe Suzuki

Abstract: We propose a globally optimal Bayesian network structure discovery algorithm based on a progressively leveled scoring approach. Bayesian network structure discovery is a fundamental yet NP-hard problem in the field of probabilistic graphical models, and as the number of variables increases, memory usage grows exponentially. The simple and effective method proposed by Silander and Myllymäki has bee… ▽ More We propose a globally optimal Bayesian network structure discovery algorithm based on a progressively leveled scoring approach. Bayesian network structure discovery is a fundamental yet NP-hard problem in the field of probabilistic graphical models, and as the number of variables increases, memory usage grows exponentially. The simple and effective method proposed by Silander and Myllymäki has been widely applied in this field, as it incrementally calculates local scores to achieve global optimality. However, existing methods that utilize disk storage, while capable of handling networks with a larger number of variables, introduce issues such as latency, fragmentation, and additional overhead associated with disk I/O operations. To avoid these problems, we explore how to further enhance computational efficiency and reduce peak memory usage using only memory. We introduce an efficient hierarchical computation method that requires only a single traversal of all local structures, retaining only the data and information necessary for the current computation, thereby improving efficiency and significantly reducing memory requirements. Experimental results indicate that our method, when using only memory, not only reduces peak memory usage but also improves computational efficiency compared to existing methods, demonstrating good scalability for handling larger networks and exhibiting stable experimental results. Ultimately, we successfully achieved the processing of a Bayesian network with 28 variables using only memory. △ Less

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2406.10499 [pdf, other]

Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US

Authors: Fangzhi Luo, Jianbin Tan, Donglan Zhang, Hui Huang, Ye Shen

Abstract: Understanding the longitudinally changing associations between Social Determinants of Health (SDOH) and stroke mortality is essential for effective stroke management. Previous studies have uncovered significant regional disparities in the relationships between SDOH and stroke mortality. However, existing studies have not utilized longitudinal associations to develop data-driven methods for regiona… ▽ More Understanding the longitudinally changing associations between Social Determinants of Health (SDOH) and stroke mortality is essential for effective stroke management. Previous studies have uncovered significant regional disparities in the relationships between SDOH and stroke mortality. However, existing studies have not utilized longitudinal associations to develop data-driven methods for regional division in stroke control. To fill this gap, we propose a novel clustering method to analyze SDOH -- stroke mortality associations in US counties. To enhance the interpretability of the clustering outcomes, we introduce a novel regularized expectation-maximization algorithm equipped with various sparsity-and-smoothness-pursued penalties, aiming at simultaneous clustering and variable selection in longitudinal associations. As a result, we can identify crucial SDOH that contribute to longitudinal changes in stroke mortality. This facilitates the clustering of US counties into different regions based on the relationships between these SDOH and stroke mortality. The effectiveness of our proposed method is demonstrated through extensive numerical studies. By applying our method to longitudinal data on SDOH and stroke mortality at the county level, we identify 18 important SDOH for stroke mortality and divide the US counties into two clusters based on these selected SDOH. Our findings unveil complex regional heterogeneity in the longitudinal associations between SDOH and stroke mortality, providing valuable insights into region-specific SDOH adjustments for mitigating stroke mortality. △ Less

Submitted 12 September, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.04690 [pdf, other]

Higher-order Structure Based Anomaly Detection on Attributed Networks

Authors: Xu Yuan, Na Zhou, Shuo Yu, Huafei Huang, Zhikui Chen, Feng Xia

Abstract: Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to… ▽ More Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to the lack of an effective mechanism in most existing graph learning methods, these complex interaction patterns fail to be applied in detecting anomalies, hindering the progress of anomaly detection to some extent. In order to address the aforementioned issue, we present a higher-order structure based anomaly detection (GUIDE) method. We exploit attribute autoencoder and structure autoencoder to reconstruct node attributes and higher-order structures, respectively. Moreover, we design a graph attention layer to evaluate the significance of neighbors to nodes through their higher-order structure differences. Finally, we leverage node attribute and higher-order structure reconstruction errors to find anomalies. Extensive experiments on five real-world datasets (i.e., ACM, Citation, Cora, DBLP, and Pubmed) are implemented to verify the effectiveness of GUIDE. Experimental results in terms of ROC-AUC, PR-AUC, and Recall@K show that GUIDE significantly outperforms the state-of-art methods. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.01561 [pdf, other]

Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

Authors: Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

Abstract: Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing… ▽ More Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and an FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. Our code and distilled one-step text-to-image generators are available at https://github.com/mingyuanzhou/SiD-LSG. △ Less

Submitted 8 August, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Code and model checkpoints available at https://github.com/mingyuanzhou/SiD-LSG

arXiv:2405.18108 [pdf]

Simulation of Single-Phase Natural Circulation within the BEPU Framework: Sketching Scaling Uncertainty Principle by Multi-Scale CFD Approaches

Authors: Haifu Huang, Jorge Perez, Nicolas Alpy, Marc Medale

Abstract: In order to enhance safety, nuclear reactors in the design phase consider natural circulation as a mean to remove residual power. The simulation of this passive mechanism must be qualified between the validation range and the scope of utilization (reactor case), introducing potential physical and numerical distortion effects. In this study, we simulate the flow of liquid sodium using the TrioCFD c… ▽ More In order to enhance safety, nuclear reactors in the design phase consider natural circulation as a mean to remove residual power. The simulation of this passive mechanism must be qualified between the validation range and the scope of utilization (reactor case), introducing potential physical and numerical distortion effects. In this study, we simulate the flow of liquid sodium using the TrioCFD code, employing both higher-fidelity (HF) LES and lower-fidelity (LF) URANS models. We tackle respectively numerical uncertainties through the Grid Convergence Index method, and physical modelling uncertainties through the Polynomial Chaos Expansion method available on the URANIE platform. HF simulations are shown to exhibit a strong resilience to physical distortion effects, with numerical uncertainties being intricately correlated. Conversely, the LF approach, the only one applicable at the reactor scale, is likely to present a reduced predictability. If so, the HF approach should be effective in pinpointing the LF weaknesses: the concept of scaling uncertainty is inline introduced as the growth of the LF simulation uncertainty associated with distortion effects. Thus, the paper outlines that a specific methodology within the BEPU framework - leveraging both HF and LF approaches - could pragmatically enable correlating distortion effects with scaling uncertainty, thereby providing a metric principle. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Journal ref: Best Estimate Plus Uncertainty International Conference (BEPU 2024), May 2024, Lucca, Italy

arXiv:2405.12453 [pdf, other]

One-step data-driven generative model via Schrödinger Bridge

Authors: Hanwen Huang

Abstract: Generating samples from a probability distribution is a fundamental task in machine learning and statistics. This article proposes a novel scheme for sampling from a distribution for which the probability density $μ({\bf x})$ for ${\bf x}\in{\mathbb{R}}^d$ is unknown, but finite independent samples are given. We focus on constructing a Schrödinger Bridge (SB) diffusion process on finite horizon… ▽ More Generating samples from a probability distribution is a fundamental task in machine learning and statistics. This article proposes a novel scheme for sampling from a distribution for which the probability density $μ({\bf x})$ for ${\bf x}\in{\mathbb{R}}^d$ is unknown, but finite independent samples are given. We focus on constructing a Schrödinger Bridge (SB) diffusion process on finite horizon $t\in[0,1]$ which induces a probability evolution starting from a fixed point at $t=0$ and ending with the desired target distribution $μ({\bf x})$ at $t=1$. The diffusion process is characterized by a stochastic differential equation whose drift function can be solely estimated from data samples through a simple one-step procedure. Compared to the classical iterative schemes developed for the SB problem, the methodology of this article is quite simple, efficient, and computationally inexpensive as it does not require the training of neural network and thus circumvents many of the challenges in building the network architecture. The performance of our new generative model is evaluated through a series of numerical experiments on multi-modal low-dimensional simulated data and high-dimensional benchmark image data. Experimental results indicate that the synthetic samples generated from our SB Bridge based algorithm are comparable with the samples generated from the state-of-the-art methods in the field. Our formulation opens up new opportunities for developing efficient diffusion models that can be directly applied to large scale real-world data. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: 22 pages, 5 figures

arXiv:2404.13707 [pdf, other]

Robust inference for the unification of confidence intervals in meta-analysis

Authors: Wei Liang, Haicheng Huang, Hongsheng Dai, Yinghui Wei

Abstract: Traditional meta-analysis assumes that the effect sizes estimated in individual studies follow a Gaussian distribution. However, this distributional assumption is not always satisfied in practice, leading to potentially biased results. In the situation when the number of studies, denoted as K, is large, the cumulative Gaussian approximation errors from each study could make the final estimation un… ▽ More Traditional meta-analysis assumes that the effect sizes estimated in individual studies follow a Gaussian distribution. However, this distributional assumption is not always satisfied in practice, leading to potentially biased results. In the situation when the number of studies, denoted as K, is large, the cumulative Gaussian approximation errors from each study could make the final estimation unreliable. In the situation when K is small, it is not realistic to assume the random-effect follows Gaussian distribution. In this paper, we present a novel empirical likelihood method for combining confidence intervals under the meta-analysis framework. This method is free of the Gaussian assumption in effect size estimates from individual studies and from the random-effects. We establish the large-sample properties of the non-parametric estimator, and introduce a criterion governing the relationship between the number of studies, K, and the sample size of each study, n_i. Our methodology supersedes conventional meta-analysis techniques in both theoretical robustness and computational efficiency. We assess the performance of our proposed methods using simulation studies, and apply our proposed methods to two examples. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.06681 [pdf, other]

Causal Unit Selection using Tractable Arithmetic Circuits

Authors: Haiying Huang, Adnan Darwiche

Abstract: The unit selection problem aims to find objects, called units, that optimize a causal objective function which describes the objects' behavior in a causal context (e.g., selecting customers who are about to churn but would most likely change their mind if encouraged). While early studies focused mainly on bounding a specific class of counterfactual objective functions using data, more recent work… ▽ More The unit selection problem aims to find objects, called units, that optimize a causal objective function which describes the objects' behavior in a causal context (e.g., selecting customers who are about to churn but would most likely change their mind if encouraged). While early studies focused mainly on bounding a specific class of counterfactual objective functions using data, more recent work allows one to find optimal units exactly by reducing the causal objective to a classical objective on a meta-model, and then applying a variant of the classical Variable Elimination (VE) algorithm to the meta-model -- assuming a fully specified causal model is available. In practice, however, finding optimal units using this approach can be very expensive because the used VE algorithm must be exponential in the constrained treewidth of the meta-model, which is larger and denser than the original model. We address this computational challenge by introducing a new approach for unit selection that is not necessarily limited by the constrained treewidth. This is done through compiling the meta-model into a special class of tractable arithmetic circuits that allows the computation of optimal units in time linear in the circuit size. We finally present empirical results on random causal models that show order-of-magnitude speedups based on the proposed method for solving unit selection. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.04057 [pdf, other]

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo… ▽ More We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD △ Less

Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

arXiv:2403.14531 [pdf, other]

doi 10.1093/jrsssb/qkae031

Green's matching: an efficient approach to parameter estimation in complex dynamic systems

Authors: Jianbin Tan, Guoyu Zhang, Xueqin Wang, Hui Huang, Fang Yao

Abstract: Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statist… ▽ More Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statistically efficient two-step method, which only needs to approximate trajectories in dynamic systems but not their derivatives due to the inverse of differential operators by Green's function. This yields a statistically optimal guarantee for parameter estimation in general-order equations, a feature not shared by existing methods, and provides an efficient framework for broad statistical inferences in complex dynamic systems. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 40 pages, 4 figures

Journal ref: Journal of the Royal Statistical Society: Series B, 2024

arXiv:2402.05342 [pdf, other]

doi 10.1016/B978-0-12-818630-5.10068-5

Nonlinear Regression Analysis

Authors: Hsin-Hsiung Huang, Qing He

Abstract: Nonlinear regression analysis is a popular and important tool for scientists and engineers. In this article, we introduce theories and methods of nonlinear regression and its statistical inferences using the frequentist and Bayesian statistical modeling and computation. Least squares with the Gauss-Newton method is the most widely used approach to parameters estimation. Under the assumption of nor… ▽ More Nonlinear regression analysis is a popular and important tool for scientists and engineers. In this article, we introduce theories and methods of nonlinear regression and its statistical inferences using the frequentist and Bayesian statistical modeling and computation. Least squares with the Gauss-Newton method is the most widely used approach to parameters estimation. Under the assumption of normally distributed errors, maximum likelihood estimation is equivalent to least squares estimation. The Wald confidence regions for parameters in a nonlinear regression model are affected by the curvatures in the mean function. Furthermore, we introduce the Newton-Raphson method and the generalized least squares method to deal with variance heterogeneity. Examples of simulation data analysis are provided to illustrate important properties of confidence regions and the statistical inferences using the nonlinear least squares estimation and Bayesian inference. △ Less

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04345 [pdf, other]

doi 10.1016/j.jspi.2023.106098

A Framework of Zero-Inflated Bayesian Negative Binomial Regression Models For Spatiotemporal Data

Authors: Qing He, Hsin-Hsiung Huang

Abstract: Spatiotemporal data analysis with massive zeros is widely used in many areas such as epidemiology and public health. We use a Bayesian framework to fit zero-inflated negative binomial models and employ a set of latent variables from Pólya-Gamma distributions to derive an efficient Gibbs sampler. The proposed model accommodates varying spatial and temporal random effects through Gaussian process pr… ▽ More Spatiotemporal data analysis with massive zeros is widely used in many areas such as epidemiology and public health. We use a Bayesian framework to fit zero-inflated negative binomial models and employ a set of latent variables from Pólya-Gamma distributions to derive an efficient Gibbs sampler. The proposed model accommodates varying spatial and temporal random effects through Gaussian process priors, which have both the simplicity and flexibility in modeling nonlinear relationships through a covariance function. To conquer the computation bottleneck that GPs may suffer when the sample size is large, we adopt the nearest-neighbor GP approach that approximates the covariance matrix using local experts. For the simulation study, we adopt multiple settings with varying sizes of spatial locations to evaluate the performance of the proposed model such as spatial and temporal random effects estimation and compare the result to other methods. We also apply the proposed model to the COVID-19 death counts in the state of Florida, USA from 3/25/2020 through 7/29/2020 to examine relationships between social vulnerability and COVID-19 deaths. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Journal ref: Journal of Statistical Planning and Inference (2024). 229, 106098

arXiv:2402.00778 [pdf, other]

doi 10.1080/10485252.2024.2313137

Robust Sufficient Dimension Reduction via $α$-Distance Covariance

Authors: Hsin-Hsiung Huang, Feng Yu, Teng Zhang

Abstract: We introduce a novel sufficient dimension-reduction (SDR) method which is robust against outliers using $α$-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is effectively estimated and model-free advantage without estimating link function based on the projection on the Stiefel manifold. We establish the convergence prop… ▽ More We introduce a novel sufficient dimension-reduction (SDR) method which is robust against outliers using $α$-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is effectively estimated and model-free advantage without estimating link function based on the projection on the Stiefel manifold. We establish the convergence property of the proposed estimation under some regularity conditions. We compare the performance of our method with existing SDR methods by simulation and real data analysis and show that our algorithm improves the computational efficiency and effectiveness. △ Less

Submitted 4 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.10474 [pdf, other]

LDReg: Local Dimensionality Regularized Self-Supervised Learning

Authors: Hanxun Huang, Ricardo J. G. B. Campello, Sarah Monazam Erfani, Xingjun Ma, Michael E. Houle, James Bailey

Abstract: Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse also known as the "underfilling" phenomenon is one of the major causes of degraded performance on downstream tasks. Previous wor… ▽ More Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse also known as the "underfilling" phenomenon is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span over high dimensional space globally, but collapse locally. To address this, we propose a method called $\textit{local dimensionality regularization (LDReg)}$. Our formulation is based on the derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius for each data point. By increasing the local intrinsic dimensionality, we demonstrate through a range of experiments that LDReg improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels. △ Less

Submitted 14 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: ICLR 2024

arXiv:2401.06990 [pdf, other]

doi 10.1080/01621459.2024.2302198

Graphical Principal Component Analysis of Multivariate Functional Time Series

Authors: Jianbin Tan, Decai Liang, Yongtao Guan, Hui Huang

Abstract: In this paper, we consider multivariate functional time series with a two-way dependence structure: a serial dependence across time points and a graphical interaction among the multiple functions within each time point. We develop the notion of dynamic weak separability, a more general condition than those assumed in literature, and use it to characterize the two-way structure in multivariate func… ▽ More In this paper, we consider multivariate functional time series with a two-way dependence structure: a serial dependence across time points and a graphical interaction among the multiple functions within each time point. We develop the notion of dynamic weak separability, a more general condition than those assumed in literature, and use it to characterize the two-way structure in multivariate functional time series. Based on the proposed weak separability, we develop a unified framework for functional graphical models and dynamic principal component analysis, and further extend it to optimally reconstruct signals from contaminated functional data using graphical-level information. We investigate asymptotic properties of the resulting estimators and illustrate the effectiveness of our proposed approach through extensive simulations. We apply our method to hourly air pollution data that were collected from a monitoring network in China. △ Less

Submitted 13 January, 2024; originally announced January 2024.

Comments: Journal of the American Statistical Association (2024)

Journal ref: Journal of the American Statistical Association (2024): 1-24

arXiv:2311.17143 [pdf, other]

Predicting the Age of Astronomical Transients from Real-Time Multivariate Time Series

Authors: Hali Huang, Daniel Muthukrishna, Prajna Nair, Zimi Zhang, Michael Fausnaugh, Torsha Majumder, Ryan J. Foley, George R. Ricker

Abstract: Astronomical transients, such as supernovae and other rare stellar explosions, have been instrumental in some of the most significant discoveries in astronomy. New astronomical sky surveys will soon record unprecedented numbers of transients as sparsely and irregularly sampled multivariate time series. To improve our understanding of the physical mechanisms of transients and their progenitor syste… ▽ More Astronomical transients, such as supernovae and other rare stellar explosions, have been instrumental in some of the most significant discoveries in astronomy. New astronomical sky surveys will soon record unprecedented numbers of transients as sparsely and irregularly sampled multivariate time series. To improve our understanding of the physical mechanisms of transients and their progenitor systems, early-time measurements are necessary. Prioritizing the follow-up of transients based on their age along with their class is crucial for new surveys. To meet this demand, we present the first method of predicting the age of transients in real-time from multi-wavelength time-series observations. We build a Bayesian probabilistic recurrent neural network. Our method can accurately predict the age of a transient with robust uncertainties as soon as it is initially triggered by a survey telescope. This work will be essential for the advancement of our understanding of the numerous young transients being detected by ongoing and upcoming astronomical surveys. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: 6 pages, 4 figures. Accepted at the NeurIPS 2023 Machine Learning and the Physical Sciences workshop

arXiv:2310.18527 [pdf, other]

Multiple Imputation Method for High-Dimensional Neuroimaging Data

Authors: Tong Lu, Chixiang Chen, Hsin-Hsiung Huang, Peter Kochunov, Elliot Hong, Shuo Chen

Abstract: Missingness is a common issue for neuroimaging data, and neglecting it in downstream statistical analysis can introduce bias and lead to misguided inferential conclusions. It is therefore crucial to conduct appropriate statistical methods to address this issue. While multiple imputation is a popular technique for handling missing data, its application to neuroimaging data is hindered by high dimen… ▽ More Missingness is a common issue for neuroimaging data, and neglecting it in downstream statistical analysis can introduce bias and lead to misguided inferential conclusions. It is therefore crucial to conduct appropriate statistical methods to address this issue. While multiple imputation is a popular technique for handling missing data, its application to neuroimaging data is hindered by high dimensionality and complex dependence structures of multivariate neuroimaging variables. To tackle this challenge, we propose a novel approach, named High Dimensional Multiple Imputation (HIMA), based on Bayesian models. HIMA develops a new computational strategy for sampling large covariance matrices based on a robustly estimated posterior mode, which drastically enhances computational efficiency and numerical stability. To assess the effectiveness of HIMA, we conducted extensive simulation studies and real-data analysis using neuroimaging data from a Schizophrenia study. HIMA showcases a computational efficiency improvement of over 2000 times when compared to traditional approaches, while also producing imputed datasets with improved precision and stability. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: 13 pages, 5 figures

arXiv:2308.12680 [pdf, other]

Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

Authors: Hanchi Huang, Li Shen, Deheng Ye, Wei Liu

Abstract: We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduc… ▽ More We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between exploration and exploitation. Thanks to the elaborate design of slave models, the co-training mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}. △ Less

Submitted 24 August, 2023; originally announced August 2023.

Comments: IEEE Transactions on Neural Networks and Learning Systems

arXiv:2307.01436 [pdf]

doi 10.5121/ijcsit.2023.15305

A PC-Kriging-HDMR integrated with an adaptive sequential sampling strategy for high-dimensional approximate modeling

Authors: Yili Zhang, Hanyan Huang, Mei Xiong, Zengquan Yao

Abstract: High-dimensional complex multi-parameter problems are prevalent in engineering, exceeding the capabilities of traditional surrogate models designed for low/medium-dimensional problems. These models face the curse of dimensionality, resulting in decreased modeling accuracy as the design parameter space expands. Furthermore, the lack of a parameter decoupling mechanism hinders the identification of… ▽ More High-dimensional complex multi-parameter problems are prevalent in engineering, exceeding the capabilities of traditional surrogate models designed for low/medium-dimensional problems. These models face the curse of dimensionality, resulting in decreased modeling accuracy as the design parameter space expands. Furthermore, the lack of a parameter decoupling mechanism hinders the identification of couplings between design variables, particularly in highly nonlinear cases. To address these challenges and enhance prediction accuracy while reducing sample demand, this paper proposes a PC-Kriging-HDMR approximate modeling method within the framework of Cut-HDMR. The method leverages the precision of PC-Kriging and optimizes test point placement through a multi-stage adaptive sequential sampling strategy. This strategy encompasses a first-stage adaptive proportional sampling criterion and a second-stage central-based maximum entropy criterion. Numerical tests and a practical application involving a cantilever beam demonstrate the advantages of the proposed method. Key findings include: (1) The performance of traditional single-surrogate models, such as Kriging, significantly deteriorates in high-dimensional nonlinear problems compared to combined surrogate models under the Cut-HDMR framework (e.g., Kriging-HDMR, PCE-HDMR, SVR-HDMR, MLS-HDMR, and PC-Kriging-HDMR); (2) The number of samples required for PC-Kriging-HDMR modeling increases polynomially rather than exponentially as the parameter space expands, resulting in substantial computational cost reduction; (3) Among existing Cut-HDMR methods, no single approach outperforms the others in all aspects. However, PC-Kriging-HDMR exhibits improved modeling accuracy and efficiency within the desired improvement range compared to PCE-HDMR and Kriging-HDMR, demonstrating robustness. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: 17 pages with 7 figures and 9 tables

Journal ref: International Journal of Computer Science & Information Technology (IJCSIT) Vol 15, No 3, June 2023

arXiv:2306.14403 [pdf, other]

doi 10.1145/3580305.3599258

Anomaly Detection with Score Distribution Discrimination

Authors: Minqi Jiang, Songqiao Han, Hailiang Huang

Abstract: Recent studies give more attention to the anomaly detection (AD) methods that can leverage a handful of labeled anomalies along with abundant unlabeled data. These existing anomaly-informed AD methods rely on manually predefined score target(s), e.g., prior constant or margin hyperparameter(s), to realize discrimination in anomaly scores between normal and abnormal data. However, such methods woul… ▽ More Recent studies give more attention to the anomaly detection (AD) methods that can leverage a handful of labeled anomalies along with abundant unlabeled data. These existing anomaly-informed AD methods rely on manually predefined score target(s), e.g., prior constant or margin hyperparameter(s), to realize discrimination in anomaly scores between normal and abnormal data. However, such methods would be vulnerable to the existence of anomaly contamination in the unlabeled data, and also lack adaptation to different data scenarios. In this paper, we propose to optimize the anomaly scoring function from the view of score distribution, thus better retaining the diversity and more fine-grained information of input data, especially when the unlabeled data contains anomaly noises in more practical AD scenarios. We design a novel loss function called Overlap loss that minimizes the overlap area between the score distributions of normal and abnormal samples, which no longer depends on prior anomaly score targets and thus acquires adaptability to various datasets. Overlap loss consists of Score Distribution Estimator and Overlap Area Calculation, which are introduced to overcome challenges when estimating arbitrary score distributions, and to ensure the boundness of training loss. As a general loss component, Overlap loss can be effectively integrated into multiple network architectures for constructing AD models. Extensive experimental results indicate that Overlap loss based AD models significantly outperform their state-of-the-art counterparts, and achieve better performance on different types of anomalies. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted by KDD 2023. Detailed discussions can be found in https://openreview.net/forum?id=P1Worw-M1Tf&referrer=[the%20profile%20of%20Minqi%20Jiang](/profile?id=~Minqi_Jiang2)

arXiv:2306.07456 [pdf]

On the Temporal-spatial Analysis of Estimating Urban Traffic Patterns Via GPS Trace Data of Car-hailing Vehicles

Authors: Jiannan Mao, Lan Liu, Hao Huang, Weike Lu, Kaiyu Yang, Tianli Tang, Haotian Shi

Abstract: Car-hailing services have become a prominent data source for urban traffic studies. Extracting useful information from car-hailing trace data is essential for effective traffic management, while discrepancies between car-hailing vehicles and urban traffic should be considered. This paper proposes a generic framework for estimating and analyzing urban traffic patterns using car-hailing trace data.… ▽ More Car-hailing services have become a prominent data source for urban traffic studies. Extracting useful information from car-hailing trace data is essential for effective traffic management, while discrepancies between car-hailing vehicles and urban traffic should be considered. This paper proposes a generic framework for estimating and analyzing urban traffic patterns using car-hailing trace data. The framework consists of three layers: the data layer, the interactive software layer, and the processing method layer. By pre-processing car-hailing GPS trace data with operations such as data cutting, map matching, and trace correction, the framework generates tensor matrices that estimate traffic patterns for car-hailing vehicle flow and average road speed. An analysis block based on these matrices examines the relationships and differences between car-hailing vehicles and urban traffic patterns, which have been overlooked in previous research. Experimental results demonstrate the effectiveness of the proposed framework in examining temporal-spatial patterns of car-hailing vehicles and urban traffic. For temporal analysis, urban road traffic displays a bimodal characteristic while car-hailing flow exhibits a 'multi-peak' pattern, fluctuating significantly during holidays and thus generating a hierarchical structure. For spatial analysis, the heat maps generated from the matrices exhibit certain discrepancies, but the spatial distribution of hotspots and vehicle aggregation areas remains similar. △ Less

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2303.12834 [pdf, other]

The power and limitations of learning quantum dynamics incoherently

Authors: Sofiene Jerbi, Joe Gibbs, Manuel S. Rudolph, Matthias C. Caro, Patrick J. Coles, Hsin-Yuan Huang, Zoë Holmes

Abstract: Quantum process learning is emerging as an important tool to study quantum systems. While studied extensively in coherent frameworks, where the target and model system can share quantum information, less attention has been paid to whether the dynamics of quantum systems can be learned without the system and target directly interacting. Such incoherent frameworks are practically appealing since the… ▽ More Quantum process learning is emerging as an important tool to study quantum systems. While studied extensively in coherent frameworks, where the target and model system can share quantum information, less attention has been paid to whether the dynamics of quantum systems can be learned without the system and target directly interacting. Such incoherent frameworks are practically appealing since they open up methods of transpiling quantum processes between the different physical platforms without the need for technically challenging hybrid entanglement schemes. Here we provide bounds on the sample complexity of learning unitary processes incoherently by analyzing the number of measurements that are required to emulate well-established coherent learning strategies. We prove that if arbitrary measurements are allowed, then any efficiently representable unitary can be efficiently learned within the incoherent framework; however, when restricted to shallow-depth measurements only low-entangling unitaries can be learned. We demonstrate our incoherent learning algorithm for low entangling unitaries by successfully learning a 16-qubit unitary on \texttt{ibmq\_kolkata}, and further demonstrate the scalabilty of our proposed algorithm through extensive numerical experiments. △ Less

Submitted 22 March, 2023; originally announced March 2023.

Comments: 6+9 pages, 7 figures

Report number: LA-UR-23-22871

arXiv:2303.09491 [pdf, other]

doi 10.1038/s43588-022-00311-3

Challenges and Opportunities in Quantum Machine Learning

Authors: M. Cerezo, Guillaume Verdon, Hsin-Yuan Huang, Lukasz Cincio, Patrick J. Coles

Abstract: At the intersection of machine learning and quantum computing, Quantum Machine Learning (QML) has the potential of accelerating data analysis, especially for quantum data, with applications for quantum materials, biochemistry, and high-energy physics. Nevertheless, challenges remain regarding the trainability of QML models. Here we review current methods and applications for QML. We highlight diff… ▽ More At the intersection of machine learning and quantum computing, Quantum Machine Learning (QML) has the potential of accelerating data analysis, especially for quantum data, with applications for quantum materials, biochemistry, and high-energy physics. Nevertheless, challenges remain regarding the trainability of QML models. Here we review current methods and applications for QML. We highlight differences between quantum and classical machine learning, with a focus on quantum neural networks and quantum deep learning. Finally, we discuss opportunities for quantum advantage with QML. △ Less

Submitted 16 March, 2023; originally announced March 2023.

Comments: 14 pages, 5 figures

Report number: LA-UR-21-31504

Journal ref: Nature Computational Science 2, 567-576 (2022)

arXiv:2302.14618 [pdf, other]

Barycenter Estimation of Positive Semi-Definite Matrices with Bures-Wasserstein Distance

Authors: Jingyi Zheng, Huajun Huang, Yuyan Yi, Yuexin Li, Shu-Chin Lin

Abstract: Brain-computer interface (BCI) builds a bridge between human brain and external devices by recording brain signals and translating them into commands for devices to perform the user's imagined action. The core of the BCI system is the classifier that labels the input signals as the user's imagined action. The classifiers that directly classify covariance matrices using Riemannian geometry are wide… ▽ More Brain-computer interface (BCI) builds a bridge between human brain and external devices by recording brain signals and translating them into commands for devices to perform the user's imagined action. The core of the BCI system is the classifier that labels the input signals as the user's imagined action. The classifiers that directly classify covariance matrices using Riemannian geometry are widely used not only in BCI domain but also in a variety of fields including neuroscience, remote sensing, biomedical imaging, etc. However, the existing Affine-Invariant Riemannian-based methods treat covariance matrices as positive definite while they are indeed positive semi-definite especially for high dimensional data. Besides, the Affine-Invariant Riemannian-based barycenter estimation algorithms become time consuming, not robust, and have convergence issues when the dimension and number of covariance matrices become large. To address these challenges, in this paper, we establish the mathematical foundation for Bures-Wasserstein distance and propose new algorithms to estimate the barycenter of positive semi-definite matrices efficiently and robustly. Both theoretical and computational aspects of Bures-Wasserstein distance and barycenter estimation algorithms are discussed. With extensive simulations, we comprehensively investigate the accuracy, efficiency, and robustness of the barycenter estimation algorithms coupled with Bures-Wasserstein distance. The results show that Bures-Wasserstein based barycenter estimation algorithms are more efficient and robust. △ Less

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2302.05686 [pdf, other]

A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Authors: Kevin H. Huang, Xing Liu, Andrew B. Duncan, Axel Gandy

Abstract: We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate… ▽ More We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth. △ Less

Submitted 2 July, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: COLT camera-ready version

arXiv:2301.04204 [pdf, other]

A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization

Authors: Chuan He, Heng Huang, Zhaosong Lu

Abstract: In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization that minimizes a twice differentiable function subject to nonlinear equality constraints and also a convex conic constraint. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP of this p… ▽ More In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization that minimizes a twice differentiable function subject to nonlinear equality constraints and also a convex conic constraint. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP of this problem. Under some mild assumptions, we show that our method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ for finding an $(ε,\sqrtε)$-SOSP of general nonconvex conic optimization with high probability. Moreover, under a constraint qualification, these complexity bounds are improved to $\widetilde{\cal O}(ε^{-7/2})$ and $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$, respectively. To the best of our knowledge, this is the first study on the complexity of finding an approximate SOSP of general nonconvex conic optimization. Preliminary numerical results are presented to demonstrate superiority of the proposed method over first-order methods in terms of solution quality. △ Less

Submitted 30 August, 2024; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: To appear in Computational Optimization and Applications. arXiv admin note: text overlap with arXiv:2301.03139

MSC Class: 49M05; 49M15; 68Q25; 90C26; 90C30; 90C60

arXiv:2211.10541 [pdf, ps, other]

Phase transition and higher order analysis of $L_q$ regularization under dependence

Authors: Hanwen Huang, Peng Zeng, Qinglong Yang

Abstract: We study the problem of estimating a $k$-sparse signal ${\mbox{$β$}}_0\in{\bf R}^p$ from a set of noisy observations ${\bf y}\in{\bf R}^n$ under the model ${\bf y}={\bf X}{\mbox{$β$}}+{\bf w}$, where ${\bf X}\in{\bf R}^{n\times p}$ is the measurement matrix the row of which is drawn from distribution $N(0,{\mbox{$Σ$}})$. We consider the class of $L_q$-regularized least squares (LQLS) given by the… ▽ More We study the problem of estimating a $k$-sparse signal ${\mbox{$β$}}_0\in{\bf R}^p$ from a set of noisy observations ${\bf y}\in{\bf R}^n$ under the model ${\bf y}={\bf X}{\mbox{$β$}}+{\bf w}$, where ${\bf X}\in{\bf R}^{n\times p}$ is the measurement matrix the row of which is drawn from distribution $N(0,{\mbox{$Σ$}})$. We consider the class of $L_q$-regularized least squares (LQLS) given by the formulation $\hat{\mbox{$β$}}(λ,q)=\text{argmin}_{\mbox{$β$}\in{\bf R}^p}\frac{1}{2}\|{\bf y}-{\bf X}{\mbox{$β$}}\|^2_2+λ\|{\mbox{$β$}}\|_q^q$, where $\|\cdot\|_q$ $(0\le q\le 2)$ denotes the $L_q$-norm. In the setting $p,n,k\rightarrow\infty$ with fixed $k/p=ε$ and $n/p=δ$, we derive the asymptotic risk of $\hat{\mbox{$β$}}(λ,q)$ for arbitrary covariance matrix ${\mbox{$Σ$}}$ which generalizes the existing results for standard Gaussian design, i.e. $X_{ij}\overset{i.i.d}{\sim}N(0,1)$. We perform a higher-order analysis for LQLS in the small-error regime in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of ${\mbox{$Σ$}}$ in the cases $0\le q< 1$ and $1< q\le 2$ which indicates that the correlations among predictors only affect the phase transition curve in the case $q=1$ a.k.a. LASSO. To study the influence of the covariance structure of ${\mbox{$Σ$}}$ on the performance of LQLS in the cases $0\le q< 1$ and $1<q\le 2$, we derive the explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results. △ Less

Submitted 1 December, 2022; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: 35 pages, 11 figures

arXiv:2210.08231 [pdf, other]

Assessing Spatial Stationarity and Segmenting Spatial Processes into Stationary Components

Authors: ShengLi Tzeng, Bo-Yu Chen, Hsin-Cheng Huang

Abstract: In this research, we propose a novel technique for visualizing nonstationarity in geostatistics, particularly when confronted with a single realization of data at irregularly spaced locations. Our method hinges on formulating a statistic that tracks a stable microergodic parameter of the exponential covariance function, allowing us to address the intricate challenges of nonstationary processes tha… ▽ More In this research, we propose a novel technique for visualizing nonstationarity in geostatistics, particularly when confronted with a single realization of data at irregularly spaced locations. Our method hinges on formulating a statistic that tracks a stable microergodic parameter of the exponential covariance function, allowing us to address the intricate challenges of nonstationary processes that lack repeated measurements. We implement the fused lasso technique to elucidate nonstationary patterns at various resolutions. For prediction purposes, we segment the spatial domain into stationary sub-regions via Voronoi tessellations. Additionally, we devise a robust test for stationarity based on contrasting the sample means of our proposed statistics between two selected Voronoi subregions. The effectiveness of our method is demonstrated through simulation studies and its application to a precipitation dataset in Colorado. △ Less

Submitted 28 August, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: 25 pages, 5 tables, 11 figures

MSC Class: 62M30

arXiv:2208.11411 [pdf, other]

doi 10.1103/PhysRevResearch.5.013090

Spectrum of non-Hermitian deep-Hebbian neural networks

Authors: Zijian Jiang, Ziming Chen, Tianqi Hou, Haiping Huang

Abstract: Neural networks with recurrent asymmetric couplings are important to understand how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration window into our model of sequence retrieval in the continuous time dynamics. The model with non-normal neuron-interactions is theoretically studied by deriving a random matrix theory of the Jacob… ▽ More Neural networks with recurrent asymmetric couplings are important to understand how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration window into our model of sequence retrieval in the continuous time dynamics. The model with non-normal neuron-interactions is theoretically studied by deriving a random matrix theory of the Jacobian matrix in neural dynamics. The spectra bears several distinct features, such as breaking rotational symmetry about the origin, and the emergence of nested voids within the spectrum boundary. The spectral density is thus highly non-uniformly distributed in the complex plane. The random matrix theory also predicts a transition to chaos. In particular, the edge of chaos provides computational benefits for the sequential retrieval of memories. Our work provides a systematic study of time-lagged correlations with arbitrary time delays, and thus can inspire future studies of a broad class of memory models, and even big data analysis of biological time series. △ Less

Submitted 16 January, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: 65 pages, 12 figures, revised version for publication

Journal ref: Phys. Rev. Research 5, 013090 (2023)

arXiv:2208.06058 [pdf, other]

An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification

Authors: Runxue Bao, Bin Gu, Heng Huang

Abstract: Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. Proximal gradient descent method and coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite nu… ▽ More Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. Proximal gradient descent method and coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite number of iterations, these methods still suffer from huge computational costs and memory burdens in high-dimensional scenarios. The reason is that the support set identification in these methods is implicit and thus cannot explicitly identify the low-complexity structure in practice, namely, they cannot discard useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems, which can reduce the number of block iterations by eliminating inactive coefficients during the optimization process and eventually achieve faster explicit model identification and improve the algorithm efficiency. Theoretically, we first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity. More importantly, we prove that ADSGD can achieve a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of our proposed method. △ Less

Submitted 11 August, 2022; originally announced August 2022.

arXiv:2207.01813 [pdf, other]

Stochastic Variational Methods in Generalized Hidden Semi-Markov Models to Characterize Functionality in Random Heteropolymers

Authors: Yun Zhou, Boying Gong, Tao Jiang, Ting Xu, Haiyan Huang

Abstract: Recent years have seen substantial advances in the development of biofunctional materials using synthetic polymers. The growing problem of elusive sequence-functionality relations for most biomaterials has driven researchers to seek more effective tools and analysis methods. In this study, statistical models are used to study sequence features of the recently reported random heteropolymers (RHP),… ▽ More Recent years have seen substantial advances in the development of biofunctional materials using synthetic polymers. The growing problem of elusive sequence-functionality relations for most biomaterials has driven researchers to seek more effective tools and analysis methods. In this study, statistical models are used to study sequence features of the recently reported random heteropolymers (RHP), which transport protons across lipid bilayers selectively and rapidly like natural proton channels. We utilized the probabilistic graphical model framework and developed a generalized hidden semi-Markov model (GHSMM-RHP) to extract the function-determining sequence features, including the transmembrane segments within a chain and the sequence heterogeneity among different chains. We developed stochastic variational methods for efficient inference on parameter estimation and predictions, and empirically studied their computational performance from a comparative perspective on Bayesian (i.e., stochastic variational Bayes) versus frequentist (i.e., stochastic variational expectation-maximization) frameworks that have been studied separately before. The real data results agree well with the laboratory experiments, and suggest GHSMM-RHP's potential in predicting protein-like behavior at the polymer-chain level. △ Less

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.08353 [pdf, other]

Towards Understanding How Machines Can Learn Causal Overhypotheses

Authors: Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik

Abstract: Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence. The extensive literature in cognitive science using the ``blicket detector'' environment shows that children are adept at many kinds of causal inference and learning. We propose to adapt that environment for machine learning agents. One of the k… ▽ More Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence. The extensive literature in cognitive science using the ``blicket detector'' environment shows that children are adept at many kinds of causal inference and learning. We propose to adapt that environment for machine learning agents. One of the key challenges for current machine learning algorithms is modeling and understanding causal overhypotheses: transferable abstract hypotheses about sets of causal relationships. In contrast, even young children spontaneously learn and use causal overhypotheses. In this work, we present a new benchmark -- a flexible environment which allows for the evaluation of existing techniques under variable causal overhypotheses -- and demonstrate that many existing state-of-the-art methods have trouble generalizing in this environment. The code and resources for this benchmark are available at https://github.com/CannyLab/casual_overhypotheses. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.06463 [pdf, other]

A statistical reconstruction algorithm for positronium lifetime imaging using time-of-flight positron emission tomography

Authors: Hsin-Hsiung Huang, Zheyuan Zhu, Slun Booppasiri, Zhuo Chen, Shuo Pang, Chien-Min Kao

Abstract: Positron emission tomography (PET) is an important modality for diagnosing diseases such as cancer and Alzheimer's disease, capable of revealing the uptake of radiolabeled molecules that target specific pathological markers of the diseases. Recently, positronium lifetime imaging (PLI) that adds to traditional PET the ability to explore properties of the tissue microenvironment beyond tracer uptake… ▽ More Positron emission tomography (PET) is an important modality for diagnosing diseases such as cancer and Alzheimer's disease, capable of revealing the uptake of radiolabeled molecules that target specific pathological markers of the diseases. Recently, positronium lifetime imaging (PLI) that adds to traditional PET the ability to explore properties of the tissue microenvironment beyond tracer uptake has been demonstrated with time-of-flight (TOF) PET and the use of non-pure positron emitters. However, achieving accurate reconstruction of lifetime images from data acquired by systems having a finite TOF resolution still presents a challenge. This paper focuses on the two-dimensional PLI, introducing a maximum likelihood estimation (MLE) method that employs an exponentially modified Gaussian (EMG) probability distribution that describes the positronium lifetime data produced by TOF PET. We evaluate the performance of our EMG-based MLE method against \st{traditional} approaches using exponential likelihood functions and penalized surrogate methods. Results from computer-simulated data reveal that the proposed EMG-MLE method can yield quantitatively accurate lifetime images. We also demonstrate that the proposed MLE formulation can be extended to handle PLI data containing multiple positron populations. △ Less

Submitted 9 May, 2024; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: Submitted to IEEE-TPRMS

arXiv:2205.07833 [pdf, other]

Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Authors: Yuting Ye, Christine Ho, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang

Abstract: Hierarchical multi-label classification (HMC) has drawn increasing attention in the past few decades. It is applicable when hierarchical relationships among classes are available and need to be incorporated along with the multi-label classification whereby each object is assigned to one or more classes. There are two key challenges in HMC: i) optimizing the classification accuracy, and meanwhile i… ▽ More Hierarchical multi-label classification (HMC) has drawn increasing attention in the past few decades. It is applicable when hierarchical relationships among classes are available and need to be incorporated along with the multi-label classification whereby each object is assigned to one or more classes. There are two key challenges in HMC: i) optimizing the classification accuracy, and meanwhile ii) ensuring the given class hierarchy. To address these challenges, in this article, we introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class. We show that classification decisions made by simply sorting objects across classes in descending order of their true mLPRs can, in theory, ensure the class hierarchy and lead to the maximization of CATCH, an objective function we introduce that is related to the area under a hit curve. This approach is the first of its kind that handles both challenges in one objective function without additional constraints, thanks to the desirable statistical properties of CATCH and mLPR. In practice, however, true mLPRs are not available. In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy. The performance of this approach was evaluated on a synthetic data set and two real data sets; ours was found to be superior to several comparison methods on evaluation criteria based on metrics such as precision, recall, and $F_1$ score. △ Less

Submitted 16 May, 2022; originally announced May 2022.

Comments: 34 pages, 11 figures, 9 tables

arXiv:2205.07106 [pdf, ps, other]

Robust Regularized Low-Rank Matrix Models for Regression and Classification

Authors: Hsin-Hsiung Huang, Feng Yu, Xing Fan, Teng Zhang

Abstract: While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regulariz… ▽ More While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regularization (e.g., sparsity), and a general loss function with three special cases considered: ordinary matrix regression, robust matrix regression, and matrix logistic regression. We also propose an alternating projected gradient descent algorithm. Based on analyzing our objective functions on manifolds with bounded curvature, we show that the algorithm is guaranteed to converge, all accumulation points of the iterates have estimation errors in the order of $O(1/\sqrt{n})$ asymptotically and substantially attaining the minimax rate. Our theoretical analysis can be applied to general optimization problems on manifolds with bounded curvature and can be considered an important technical contribution to this work. We validate the proposed method through simulation studies and real image data examples. △ Less

Submitted 14 May, 2022; originally announced May 2022.

Comments: 26 pages, 7 figures

MSC Class: 62J12

arXiv:2204.10981 [pdf, other]

Distributed Dynamic Safe Screening Algorithms for Sparse Regularization

Authors: Runxue Bao, Xidong Wu, Wenhan Xian, Heng Huang

Abstract: Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding the inactive features with zero coefficients. Neverth… ▽ More Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding the inactive features with zero coefficients. Nevertheless, existing safe screening methods are limited to the sequential setting. In this paper, we propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it on shared-memory and distributed-memory architecture respectively, which can achieve significant speedup without any loss of accuracy by simultaneously enjoying the sparsity of the model and dataset. To the best of our knowledge, this is the first work of distributed safe dynamic screening method. Theoretically, we prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely. Finally, extensive experimental results on benchmark datasets confirm the superiority of our proposed method. △ Less

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2204.10268 [pdf, other]

doi 10.1038/s41467-023-39381-w

Out-of-distribution generalization for learning quantum dynamics

Authors: Matthias C. Caro, Hsin-Yuan Huang, Nicholas Ezzell, Joe Gibbs, Andrew T. Sornborger, Lukasz Cincio, Patrick J. Coles, Zoë Holmes

Abstract: Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we req… ▽ More Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a different distribution to the training distribution. Here, we prove out-of-distribution generalization for the task of learning an unknown unitary. In particular, we show that one can learn the action of a unitary on entangled states having trained only product states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics on near term quantum hardware, and further opens up new methods for both the classical and quantum compilation of quantum circuits. △ Less

Submitted 9 July, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: 8 pages (main body) + 18 pages (references and appendix); 4+2 figures; V3 includes additional explanations and numerical experiments in the appendix

Report number: LA-UR-22-23623

Journal ref: Nat Commun 14, 3751 (2023)

arXiv:2203.04272 [pdf, other]

Policy-Based Bayesian Experimental Design for Non-Differentiable Implicit Models

Authors: Vincent Lim, Ellen Novoseller, Jeffrey Ichnowski, Huang Huang, Ken Goldberg

Abstract: For applications in healthcare, physics, energy, robotics, and many other fields, designing maximally informative experiments is valuable, particularly when experiments are expensive, time-consuming, or pose safety hazards. While existing approaches can sequentially design experiments based on prior observation history, many of these methods do not extend to implicit models, where simulation is po… ▽ More For applications in healthcare, physics, energy, robotics, and many other fields, designing maximally informative experiments is valuable, particularly when experiments are expensive, time-consuming, or pose safety hazards. While existing approaches can sequentially design experiments based on prior observation history, many of these methods do not extend to implicit models, where simulation is possible but computing the likelihood is intractable. Furthermore, they often require either significant online computation during deployment or a differentiable simulation system. We introduce Reinforcement Learning for Deep Adaptive Design (RL-DAD), a method for simulation-based optimal experimental design for non-differentiable implicit models. RL-DAD extends prior work in policy-based Bayesian Optimal Experimental Design (BOED) by reformulating it as a Markov Decision Process with a reward function based on likelihood-free information lower bounds, which is used to learn a policy via deep reinforcement learning. The learned design policy maps prior histories to experiment designs offline and can be quickly deployed during online execution. We evaluate RL-DAD and find that it performs competitively with baselines on three benchmarks. △ Less

Submitted 8 March, 2022; originally announced March 2022.

Comments: 15 pages, 3 figures

arXiv:2203.02433 [pdf, ps, other]

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Authors: Maxime Gasse, Quentin Cappart, Jonas Charfreitag, Laurent Charlin, Didier Chételat, Antonia Chmiela, Justin Dumouchelle, Ambros Gleixner, Aleksandr M. Kazachkov, Elias Khalil, Pawel Lichocki, Andrea Lodi, Miles Lubin, Chris J. Maddison, Christopher Morris, Dimitri J. Papageorgiou, Augustin Parjadis, Sebastian Pokutta, Antoine Prouvost, Lara Scavuzzo, Giulia Zarpellon, Linxin Yang, Sha Lai, Akang Wang, Xiaodong Luo , et al. (16 additional authors not shown)

Abstract: Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either dir… ▽ More Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants. △ Less

Submitted 17 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Neurips 2021 competition. arXiv admin note: text overlap with arXiv:2112.12251 by other authors

arXiv:2202.11592 [pdf, ps, other]

A Law of Robustness beyond Isoperimetry

Authors: Yihan Wu, Heng Huang, Hongyang Zhang

Abstract: We study the robust interpolation problem of arbitrary data distributions supported on a bounded space and propose a two-fold law of robustness. Robust interpolation refers to the problem of interpolating $n$ noisy training data points in $\mathbb{R}^d$ by a Lipschitz function. Although this problem has been well understood when the samples are drawn from an isoperimetry distribution, much remains… ▽ More We study the robust interpolation problem of arbitrary data distributions supported on a bounded space and propose a two-fold law of robustness. Robust interpolation refers to the problem of interpolating $n$ noisy training data points in $\mathbb{R}^d$ by a Lipschitz function. Although this problem has been well understood when the samples are drawn from an isoperimetry distribution, much remains unknown concerning its performance under generic or even the worst-case distributions. We prove a Lipschitzness lower bound $Ω(\sqrt{n/p})$ of the interpolating neural network with $p$ parameters on arbitrary data distributions. With this result, we validate the law of robustness conjecture in prior work by Bubeck, Li, and Nagaraj on two-layer neural networks with polynomial weights. We then extend our result to arbitrary interpolating approximators and prove a Lipschitzness lower bound $Ω(n^{1/d})$ for robust interpolation. Our results demonstrate a two-fold law of robustness: i) we show the potential benefit of overparametrization for smooth data interpolation when $n=\mathrm{poly}(d)$, and ii) we disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(ω(d))$. △ Less

Submitted 31 May, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: To appear in ICML 2023

arXiv:2202.09134 [pdf, other]

Data Augmentation in the Underparameterized and Overparameterized Regimes

Authors: Kevin Han Huang, Peter Orbanz, Morgane Austern

Abstract: We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction ris… ▽ More We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables. △ Less

Submitted 28 September, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: Changed title and added an analysis on the effect of augmentations on the double-descent risk curve of a high-dimensional ridgeless estimator

arXiv:2111.13428 [pdf, other]

Nonstationary Spatial Modeling of Massive Global Satellite Data

Authors: Huang Huang, Lewis R. Blake, Matthias Katzfuss, Dorit M. Hammerling

Abstract: Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete,… ▽ More Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging due to the high computational cost, the nonstationary behavior of environmental processes on a global scale, and land barriers affecting the dependence of SST. In this work, we develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. The M-RA requires domain partitioning, which can be set up application-specifically. In the SST case, we partition the domain purposefully to account for and weaken dependence across land barriers. Our M-RA implementation is tailored to distributed-memory computation in high-performance-computing environments. We analyze a MODIS SST dataset consisting of more than 43 million observations, to our knowledge the largest dataset ever analyzed using a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially improved predictive performance relative to a stationary approach. △ Less

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2111.13302 [pdf, ps, other]

doi 10.1103/PhysRevResearch.4.023023

Equivalence between algorithmic instability and transition to replica symmetry breaking in perceptron learning systems

Authors: Yang Zhao, Junbin Qiu, Mingshan Xie, Haiping Huang

Abstract: Binary perceptron is a fundamental model of supervised learning for the non-convex optimization, which is a root of the popular deep learning. Binary perceptron is able to achieve a classification of random high-dimensional data by computing the marginal probabilities of binary synapses. The relationship between the algorithmic instability and the equilibrium analysis of the model remains elusive.… ▽ More Binary perceptron is a fundamental model of supervised learning for the non-convex optimization, which is a root of the popular deep learning. Binary perceptron is able to achieve a classification of random high-dimensional data by computing the marginal probabilities of binary synapses. The relationship between the algorithmic instability and the equilibrium analysis of the model remains elusive. Here, we establish the relationship by showing that the instability condition around the algorithmic fixed point is identical to the instability for breaking the replica symmetric saddle point solution of the free energy function. Therefore, our analysis would hopefully provide insights towards other learning systems in bridging the gap between non-convex learning dynamics and statistical mechanics properties of more complex neural networks. △ Less

Submitted 7 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: 24 pages, 2 figures, revision to journal

Journal ref: Phys. Rev. Research 4, 023023 (2022)

arXiv:2111.10734 [pdf, other]

Deep Probability Estimation

Authors: Sheng Liu, Aakash Kaku, Weicheng Zhu, Matan Leibovich, Sreyas Mohan, Boyang Yu, Haoxiang Huang, Laure Zanna, Narges Razavian, Jonathan Niles-Weed, Carlos Fernandez-Granda

Abstract: Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous t… ▽ More Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous to binary classification, with the difference that the objective is to estimate probabilities rather than predicting the specific outcome. This work investigates probability estimation from high-dimensional data using deep neural networks. There exist several methods to improve the probabilities generated by these models but they mostly focus on model (epistemic) uncertainty. For problems with inherent uncertainty, it is challenging to evaluate performance without access to ground-truth probabilities. To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. We also give a theoretical analysis of a model for high-dimensional probability estimation which reproduces several of the phenomena evinced in our experiments. Finally, we propose a new method for probability estimation using neural networks, which modifies the training process to promote output probabilities that are consistent with empirical probabilities computed from the data. The method outperforms existing approaches on most metrics on the simulated as well as real-world data. △ Less

Submitted 11 October, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: SL, AK, WZ, ML, SM contributed equally to this work; 36 pages, 17 figures, 12 tables

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:13746-13781, 2022

arXiv:2111.05292 [pdf, other]

doi 10.1038/s41467-022-32550-3

Generalization in quantum machine learning from few training data

Authors: Matthias C. Caro, Hsin-Yuan Huang, M. Cerezo, Kunal Sharma, Andrew Sornborger, Lukasz Cincio, Patrick J. Coles

Abstract: Modern quantum machine learning (QML) methods involve variationally optimizing a parameterized quantum circuit on a training data set, and subsequently making predictions on a testing data set (i.e., generalizing). In this work, we provide a comprehensive study of generalization performance in QML after training on a limited number $N$ of training data points. We show that the generalization error… ▽ More Modern quantum machine learning (QML) methods involve variationally optimizing a parameterized quantum circuit on a training data set, and subsequently making predictions on a testing data set (i.e., generalizing). In this work, we provide a comprehensive study of generalization performance in QML after training on a limited number $N$ of training data points. We show that the generalization error of a quantum machine learning model with $T$ trainable gates scales at worst as $\sqrt{T/N}$. When only $K \ll T$ gates have undergone substantial change in the optimization process, we prove that the generalization error improves to $\sqrt{K / N}$. Our results imply that the compiling of unitaries into a polynomial number of native gates, a crucial application for the quantum computing industry that typically uses exponential-size training data, can be sped up significantly. We also show that classification of quantum states across a phase transition with a quantum convolutional neural network requires only a very small training data set. Other potential applications include learning quantum error correcting codes or quantum dynamical simulation. Our work injects new hope into the field of QML, as good generalization is guaranteed from few training data. △ Less

Submitted 5 September, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: 14+26 pages, 4+1 figures

Report number: LA-UR-21-31086

Journal ref: Nat Commun 13, 4919 (2022)

arXiv:2111.04951 [pdf, other]

American Hate Crime Trends Prediction with Event Extraction

Authors: Songqiao Han, Hailiang Huang, Jiangwei Liu, Shengsheng Xiao

Abstract: Social media platforms may provide potential space for discourses that contain hate speech, and even worse, can act as a propagation mechanism for hate crimes. The FBI's Uniform Crime Reporting (UCR) Program collects hate crime data and releases statistic report yearly. These statistics provide information in determining national hate crime trends. The statistics can also provide valuable holistic… ▽ More Social media platforms may provide potential space for discourses that contain hate speech, and even worse, can act as a propagation mechanism for hate crimes. The FBI's Uniform Crime Reporting (UCR) Program collects hate crime data and releases statistic report yearly. These statistics provide information in determining national hate crime trends. The statistics can also provide valuable holistic and strategic insight for law enforcement agencies or justify lawmakers for specific legislation. However, the reports are mostly released next year and lag behind many immediate needs. Recent research mainly focuses on hate speech detection in social media text or empirical studies on the impact of a confirmed crime. This paper proposes a framework that first utilizes text mining techniques to extract hate crime events from New York Times news, then uses the results to facilitate predicting American national-level and state-level hate crime trends. Experimental results show that our method can significantly enhance the prediction performance compared with time series or regression methods without event-related factors. Our framework broadens the methods of national-level and state-level hate crime trends prediction. △ Less

Submitted 8 November, 2021; originally announced November 2021.

Comments: 12 pages, 5 figures, 4 tables

arXiv:2111.03252 [pdf, other]

Test of Weak Separability for Spatially Stationary Functional Field

Authors: Decai Liang, Hui Huang, Yongtao Guan, Fang Yao

Abstract: For spatially dependent functional data, a generalized Karhunen-Loève expansion is commonly used to decompose data into an additive form of temporal components and spatially correlated coefficients. This structure provides a convenient model to investigate the space-time interactions, but may not hold for complex spatio-temporal processes. In this work, we introduce the concept of weak separabilit… ▽ More For spatially dependent functional data, a generalized Karhunen-Loève expansion is commonly used to decompose data into an additive form of temporal components and spatially correlated coefficients. This structure provides a convenient model to investigate the space-time interactions, but may not hold for complex spatio-temporal processes. In this work, we introduce the concept of weak separability, and propose a formal test to examine its validity for non-replicated spatially stationary functional field. The asymptotic distribution of the test statistic that adapts to potentially diverging ranks is derived by constructing lag covariance estimation, which is easy to compute for practical implementation. We demonstrate the efficacy of the proposed test via simulations and illustrate its usefulness in two real examples: China PM$_{2.5}$ data and Harvard Forest data. △ Less

Submitted 5 November, 2021; originally announced November 2021.

arXiv:2110.08005 [pdf, other]

Spatially Adaptive Calibrations of AirBox PM$_{2.5}$ Data

Authors: ShengLi Tzeng, Chi-Wei Lai, Hsin-Cheng Huang

Abstract: Two networks are available to monitor PM$_{2.5}$ in Taiwan, including the Taiwan Air Quality Monitoring Network (TAQMN) and the AirBox network. The TAQMN, managed by Taiwan's Environmental Protection Administration (EPA), provides high-quality PM$_{2.5}$ measurements at $77$ monitoring stations. More recently, the AirBox network was launched, consisting of low-cost, small internet-of-things (IoT)… ▽ More Two networks are available to monitor PM$_{2.5}$ in Taiwan, including the Taiwan Air Quality Monitoring Network (TAQMN) and the AirBox network. The TAQMN, managed by Taiwan's Environmental Protection Administration (EPA), provides high-quality PM$_{2.5}$ measurements at $77$ monitoring stations. More recently, the AirBox network was launched, consisting of low-cost, small internet-of-things (IoT) microsensors (i.e., AirBoxes) at thousands of locations. While the AirBox network provides broad spatial coverage, its measurements are not reliable and require calibrations. However, applying a universal calibration procedure to all AirBoxes does not work well because the calibration curves vary with several factors, including the chemical compositions of PM$_{2.5}$, which are not homogeneous in space. Therefore, different calibrations are needed at different locations with different local environments. Unfortunately, most AirBoxes are not close to EPA stations, making the calibration task challenging. In this article, we propose a spatial model with spatially varying coefficients to account for heteroscedasticity in the data. Our method gives adaptive calibrations of AirBoxes according to their local conditions and provides accurate PM$_{2.5}$ concentrations at any location in Taiwan, incorporating two types of measurements. In addition, the proposed method automatically calibrates measurements from a new AirBox once it is added to the network. We illustrate our approach using hourly PM$_{2.5}$ data in the year 2020. After the calibration, the results show that the PM$_{2.5}$ prediction improves about 37% to 67% in root mean-squared prediction error for matching EPA data. In particular, once the calibration curves are established, we can obtain reliable PM$_{2.5}$ values at any location in Taiwan, even if we ignore EPA data. △ Less

Submitted 15 October, 2021; originally announced October 2021.

Comments: 21 pages, 9 figures

MSC Class: 62P12

arXiv:2110.03825 [pdf, other]

Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

Authors: Hanxun Huang, Yisen Wang, Sarah Monazam Erfani, Quanquan Gu, James Bailey, Xingjun Ma

Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promising results. However, despite preliminary understandings developed for adversarial training, it is still not clear, from the architectural perspective, what configurations can lead to… ▽ More Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promising results. However, despite preliminary understandings developed for adversarial training, it is still not clear, from the architectural perspective, what configurations can lead to more robust DNNs. In this paper, we address this gap via a comprehensive investigation on the impact of network width and depth on the robustness of adversarially trained DNNs. Specifically, we make the following key observations: 1) more parameters (higher model capacity) does not necessarily help adversarial robustness; 2) reducing capacity at the last stage (the last group of blocks) of the network can actually improve adversarial robustness; and 3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. We also provide a theoretical analysis explaning why such network configuration can help robustness. These architectural insights can help design adversarially robust DNNs. Code is available at \url{https://github.com/HanxunH/RobustWRN}. △ Less

Submitted 22 January, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021

Showing 1–50 of 158 results for author: Huang, H