Search | arXiv e-print repository

Determining number of factors under stability considerations

Abstract: This paper proposes a novel method for determining the number of factors in linear factor models under stability considerations. An instability measure is proposed based on the principal angle between the estimated loading spaces obtained by data splitting. Based on this measure, criteria for determining the number of factors are proposed and shown to be consistent. This consistency is obtained using results from random matrix theory, especially the complete delocalization of non-outlier eigenvectors. The advantage of the proposed methods over the existing ones is shown via weaker asymptotic requirements for consistency, simulation studies and a real data example.

Submitted 11 September, 2024; originally announced September 2024.

Comments: 23 pages, 3 figures

arXiv:2409.07392 [pdf, other]

A Scalable Algorithm for Active Learning

Authors: Youguang Chen, Zheyu Wen, George Biros

Abstract: FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state-of-the-art in terms of accuracy and robustness and comes with theoretical performance guarantees. However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to its $\mathcal{O}(c^2d^2+nc^2d)$ storage and $\mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity, where $b$ is the number of points to select in active learning. To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + cd^2)$ and a computational complexity of $\mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on GPUs. We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet. The accuracy tests reveal no deterioration in accuracy compared to FIRAL. We report strong and weak scaling tests on up to 12 GPUs, for a three-million-point synthetic dataset.

Submitted 11 September, 2024; originally announced September 2024.

Comments: To appear at SC'24. Link: https://sc24.conference-program.com/presentation/?id=pap624&sess=sess397
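
To make the storage bounds quoted in the abstract above concrete, the following sketch evaluates the two leading-order expressions, $c^2d^2+nc^2d$ and $n(d+c)+cd^2$, for one set of problem sizes. The function names and the sizes ($n = 10^6$, $d = 100$, $c = 10$) are hypothetical and chosen only for illustration. Big-O constants are ignored, so the numbers indicate scaling rather than actual memory use.

```python
def firal_storage(n, d, c):
    """Leading-order storage term quoted for FIRAL: c^2 d^2 + n c^2 d."""
    return c**2 * d**2 + n * c**2 * d


def approx_storage(n, d, c):
    """Leading-order storage term quoted for the approximation: n (d + c) + c d^2."""
    return n * (d + c) + c * d**2


if __name__ == "__main__":
    # Hypothetical problem sizes, not taken from the paper.
    n, d, c = 10**6, 100, 10
    exact = firal_storage(n, d, c)
    approx = approx_storage(n, d, c)
    print("FIRAL term:       ", exact)
    print("approximate term: ", approx)
    print("ratio:            ", exact / approx)
```

For these sizes the $nc^2d$ term dominates the first expression and $nd$ dominates the second, so the ratio is roughly $c^2 d / (d + c)$.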

arXiv:2409.06490 [pdf, other]

UAVDB: Trajectory-Guided Adaptable Bounding Boxes for UAV Detection

Authors: Yu-Hsi Chen

Abstract: With the rapid development of drone technology, accurate detection of Unmanned Aerial Vehicles (UAVs) has become essential for applications such as surveillance, security, and airspace management. In this paper, we propose a novel trajectory-guided method, the Patch Intensity Convergence (PIC) technique, which generates high-fidelity bounding boxes for UAV detection tasks without the effort required for labeling. The PIC technique forms the foundation for developing UAVDB, a database explicitly created for UAV detection. Unlike existing datasets, which often use low-resolution footage or focus on UAVs in simple backgrounds, UAVDB employs high-resolution video to capture UAVs at various scales, ranging from hundreds of pixels to nearly single-digit sizes. This broad-scale variation enables comprehensive evaluation of detection algorithms across different UAV sizes and distances. Applying the PIC technique, we can also efficiently generate detection datasets from trajectory or positional data, even without size information. We extensively benchmark UAVDB using YOLOv8 series detectors, offering a detailed performance analysis. Our findings highlight UAVDB's potential as a vital database for advancing UAV detection, particularly in high-resolution and long-distance tracking scenarios.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 7 pages, 5 figures, 3 tables

arXiv:2409.03980 [pdf, other]

Entry-Specific Matrix Estimation under Arbitrary Sampling Patterns through the Lens of Network Flows

Authors: Yudong Chen, Xumei Xi, Christina Lee Yu

Abstract: Matrix completion tackles the task of predicting missing values in a low-rank matrix based on a sparse set of observed entries. It is often assumed that the observation pattern is generated uniformly at random or has a very specific structure tuned to a given algorithm. There is still a gap in our understanding when it comes to arbitrary sampling patterns. Given an arbitrary sampling pattern, we introduce a matrix completion algorithm based on network flows in the bipartite graph induced by the observation pattern. For additive matrices, the particular flow we used is the electrical flow and we establish error upper bounds customized to each entry as a function of the observation set, along with matching minimax lower bounds. Our results show that the minimax squared error for recovery of a particular entry in the matrix is proportional to the effective resistance of the corresponding edge in the graph. Furthermore, we show that our estimator is equivalent to the least squares estimator. We apply our estimator to the two-way fixed effects model and show that it enables us to accurately infer individual causal effects and the unit-specific and time-specific confounders. For rank-$1$ matrices, we use edge-disjoint paths to form an estimator that achieves minimax optimal estimation when the sampling is sufficiently dense.
Our discovery introduces a new family of estimators parametrized by network flows, which provide a fine-grained and intuitive understanding of the impact of the given sampling pattern on the relative difficulty of estimation at an entry-specific level. This graph-based approach allows us to quantify the inherent complexity of matrix completion for individual entries, rather than relying solely on global measures of performance.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.01410 [pdf, other]

Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

Authors: Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen

Abstract: Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather than oriented towards any particular learning task. In this work, we present a formal model of DD, arguing that a precise characterization of the underlying optimization problem must specify the inference task associated with the application of interest. Without this task-specific focus, the DD problem is under-specified, and the selection of a DD algorithm for a particular task is merely heuristic. Our formalization reveals novel applications of DD across different modeling environments. We analyze existing DD methods through this broader lens, highlighting their strengths and limitations in terms of accuracy and faithfulness to optimal DD operation. Finally, we present numerical results for two case studies important in contemporary settings. Firstly, we address a critical challenge in medical data analysis: merging the knowledge from different datasets composed of intersecting, but not identical, sets of features, in order to construct a larger dataset in what is usually a small sample setting.
Secondly, we consider out-of-distribution error across boundary conditions for physics-informed neural networks (PINNs), showing the potential for DD to provide more physically faithful data. By establishing this general formulation of DD, we aim to establish a new research paradigm by which DD can be understood and from which new DD techniques can arise.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.01194 [pdf, other]

Tonal coarticulation revisited: functional covariance analysis to investigate the planning of co-articulated tones by Standard Chinese speakers

Authors: Valentina Masarotto, Yiya Chen

Abstract: We aim to explain whether a stress memory task has a significant impact on tonal coarticulation. We contribute a novel approach to analyse tonal coarticulation in phonetics, where several f0 contours are compared with respect to their vibrations at higher resolution, something that in statistical terms is called variation of the second order. We identify speech recording frequency curves as functional observations and harness inspiration from the mathematical fields of functional data analysis and optimal transport. By leveraging results from these two disciplines, we make one key observation: we identify the time and frequency covariance functions as crucial features for capturing the finer effects of tonal coarticulation. This observation leads us to propose a two-step approach where the mean functions are modelled via Generalized Additive Models, and the residuals of such models are investigated for any structure nested at covariance level. If such structure exists, we describe the variation manifested by the covariances through covariance principal component analysis. The two-step approach allows us to uncover any variation not explained by generalized additive modelling, as well as to address a known shortcoming of these models in incorporating complex correlation structures in the data. The proposed method is illustrated on an articulatory dataset contrasting the pronunciation of nonsensical bi-syllabic combinations in the presence of a short-memory challenge.

Submitted 9 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.00843 [pdf, other]

Global Public Sentiment on Decentralized Finance: A Spatiotemporal Analysis of Geo-tagged Tweets from 150 Countries

Authors: Yuqi Chen, Yifan Li, Kyrie Zhixuan Zhou, Xiaokang Fu, Lingbo Liu, Shuming Bao, Daniel Sui, Luyao Zhang

Abstract: In the digital era, blockchain technology, cryptocurrencies, and non-fungible tokens (NFTs) have transformed financial and decentralized systems. However, existing research often neglects the spatiotemporal variations in public sentiment toward these technologies, limiting macro-level insights into their global impact. This study leverages Twitter data to explore public attention and sentiment across 150 countries, analyzing over 150 million geotagged tweets from 2012 to 2022. Sentiment scores were derived using a BERT-based multilingual sentiment model trained on 7.4 billion tweets. The analysis integrates global cryptocurrency regulations and economic indicators from the World Development Indicators database. Results reveal significant global sentiment variations influenced by economic factors, with more developed nations engaging more in discussions, while less developed countries show higher sentiment levels. Geographically weighted regression indicates that GDP-tweet engagement correlation intensifies following Bitcoin price surges. Topic modeling shows that countries within similar economic clusters share discussion trends, while different clusters focus on distinct topics. This study highlights global disparities in sentiment toward decentralized finance, shaped by economic and regional factors, with implications for poverty alleviation, cryptocurrency crime, and sustainable development. The dataset and code are publicly available on GitHub.

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2409.00679 [pdf, other]

Exact Exploratory Bi-factor Analysis: A Constraint-based Optimisation Approach

Authors: Jiawei Qiao, Yunxiao Chen, Zhiliang Ying

Abstract: Bi-factor analysis is a form of confirmatory factor analysis widely used in psychological and educational measurement. The use of a bi-factor model requires the specification of an explicit bi-factor structure on the relationship between the observed variables and the group factors. In practice, the bi-factor structure is sometimes unknown, in which case an exploratory form of bi-factor analysis is needed to find the bi-factor structure. Unfortunately, there are few methods for exploratory bi-factor analysis, with the exception of a rotation-based method proposed in Jennrich and Bentler (2011, 2012). However, this method only finds approximate bi-factor structures, as it does not yield an exact bi-factor loading structure, even after applying hard thresholding. In this paper, we propose a constraint-based optimisation method that learns an exact bi-factor loading structure from data, overcoming the issue with the rotation-based method. The key to the proposed method is a mathematical characterisation of the bi-factor loading structure as a set of equality constraints, which allows us to formulate the exploratory bi-factor analysis problem as a constrained optimisation problem in a continuous domain and solve the optimisation problem with an augmented Lagrangian method. The power of the proposed method is shown via simulation studies and a real data example. Extending the proposed method to exploratory hierarchical factor analysis is also discussed. The code is available at https://anonymous.4open.science/r/Bifactor-ALM-C1E6.

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2408.16862 [pdf, other]

Probabilistic Decomposed Linear Dynamical Systems for Robust Discovery of Latent Neural Dynamics

Authors: Yenho Chen, Noga Mudrik, Kyle A. Johnsen, Sankaraleengam Alagapan, Adam S. Charles, Christopher J. Rozell

Abstract: Time-varying linear state-space models are powerful tools for obtaining mathematically interpretable representations of neural signals. For example, switching and decomposed models describe complex systems using latent variables that evolve according to simple locally linear dynamics. However, existing methods for latent variable estimation are not robust to dynamical noise and system nonlinearity due to noise-sensitive inference procedures and limited model formulations. This can lead to inconsistent results on signals with similar dynamics, limiting the model's ability to provide scientific insight. In this work, we address these limitations and propose a probabilistic approach to latent variable estimation in decomposed models that improves robustness against dynamical noise. Additionally, we introduce an extended latent dynamics model to improve robustness against system nonlinearities. We evaluate our approach on several synthetic dynamical systems, including an empirically-derived brain-computer interface experiment, and demonstrate more accurate latent variable inference in nonlinear systems with diverse noise conditions. Furthermore, we apply our method to a real-world clinical neurophysiology dataset, illustrating the ability to identify interpretable and coherent structure where previous models cannot.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.14821 [pdf, other]

Data-driven Effective Modeling of Multiscale Stochastic Dynamical Systems

Authors: Yuan Chen, Dongbin Xiu

Abstract: We present a numerical method for learning the dynamics of slow components of unknown multiscale stochastic dynamical systems. While the governing equations of the systems are unknown, bursts of observation data of the slow variables are available. By utilizing the observation data, our proposed method is capable of constructing a generative stochastic model that can accurately capture the effective dynamics of the slow variables in distribution. We present a comprehensive set of numerical examples to demonstrate the performance of the proposed method.

Submitted 27 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2406.15747

MSC Class: 60H10; 60H35; 62M45; 65C30

arXiv:2408.13115 [pdf, ps, other]

Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias

Authors: Yifan Chen, Xiaoou Cheng, Jonathan Niles-Weed, Jonathan Weare

Abstract: The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or $\sqrt{d}$. In this paper, we argue that, despite this poor scaling of the $W_2$ error for the full set of variables, the behavior for a small number of variables can be significantly better: a number of iterations proportional to $K$, up to logarithmic terms in $d$, often suffices for the algorithm to converge to within a desired $W_2$ error for all $K$-marginals. We refer to this effect as delocalization of bias. We show that the delocalization effect does not hold universally and prove its validity for Gaussian distributions and strongly log-concave distributions with certain sparse interactions. Our analysis relies on a novel $W_{2,\ell^\infty}$ metric to measure convergence. A key technical challenge we address is the lack of a one-step contraction property in this metric. Finally, we use asymptotic arguments to explore potential generalizations of the delocalization effect beyond the Gaussian and sparse interactions setting.

Submitted 19 August, 2024; originally announced August 2024.
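
As a toy illustration of the unadjusted Langevin update and its step-size bias mentioned in the abstract above, the sketch below runs the iteration $x \leftarrow x - h\,\nabla U(x) + \sqrt{2h}\,\xi$ for the one-dimensional target $N(0,1)$, where the stationary variance works out to $1/(1 - h/2)$. It then evaluates the resulting $W_2$ error for a product of independent standard Gaussians, which grows like $\sqrt{k}$ in the number of coordinates $k$ considered. This factorized case is an assumption made here for simplicity; it is not the paper's analysis and has no interactions between coordinates. All function names are hypothetical.

```python
import math
import random


def ula_chain(grad_u, x0, step, n_steps, rng):
    """Run the unadjusted Langevin algorithm in one dimension.

    Update: x <- x - step * grad_u(x) + sqrt(2 * step) * N(0, 1),
    with no Metropolis accept/reject correction.
    """
    x = x0
    samples = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * grad_u(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def ula_stationary_std(step):
    """Stationary std of ULA for the target N(0, 1), i.e. U(x) = x^2 / 2.

    The chain x <- (1 - step) * x + sqrt(2 * step) * xi has stationary
    variance v solving v = (1 - step)^2 * v + 2 * step, which gives
    v = 1 / (1 - step / 2). Valid for 0 < step < 2.
    """
    return math.sqrt(1.0 / (1.0 - step / 2.0))


def w2_error_product_gaussian(step, k):
    """W2 distance between N(0, s^2 I_k) and N(0, I_k), s = ula_stationary_std.

    For these two centered Gaussians, W2 = sqrt(k) * |s - 1|.
    """
    return math.sqrt(k) * abs(ula_stationary_std(step) - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    h = 0.1
    xs = ula_chain(lambda x: x, x0=0.0, step=h, n_steps=200_000, rng=rng)
    empirical_var = sum(x * x for x in xs) / len(xs)
    print("empirical variance:", empirical_var)
    print("predicted variance:", ula_stationary_std(h) ** 2)
    print("W2 error, all 10000 coordinates:", w2_error_product_gaussian(h, 10_000))
    print("W2 error, a 4-coordinate marginal:", w2_error_product_gaussian(h, 4))
```

In this toy setting the per-coordinate bias is fixed by the step size, so the error of a $K$-marginal depends on $K$ and not on the ambient dimension.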

arXiv:2408.12063 [pdf, other]

A Deconfounding Approach to Climate Model Bias Correction

Authors: Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

Abstract: Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach to utilize both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11003 [pdf, other]

DEEPEAST technique to enhance power in two-sample tests via the same-attraction function

Authors: Yiting Chen, Min Gao, Wei Lin, Andrew Jirasek, Kirsty Milligan, Xiaoping Shi

Abstract: Data depth has emerged as an invaluable nonparametric measure for the ranking of multivariate samples. The main contribution of depth-based two-sample comparisons is the introduction of the Q statistic (Liu and Singh, 1993), a quality index. Unlike traditional methods, data depth does not require the assumption of normal distributions and adheres to four fundamental properties. Many existing two-sample homogeneity tests, which assess mean and/or scale changes in distributions, often suffer from low statistical power or indeterminate asymptotic distributions. To overcome these challenges, we introduced a DEEPEAST (depth-explored same-attraction sample-to-sample central-outward ranking) technique for improving statistical power in two-sample tests via the same-attraction function. We proposed two novel and powerful depth-based test statistics: the sum test statistic and the product test statistic, which are rooted in Q statistics, share a "common attractor" and are applicable across all depth functions. We further proved the asymptotic distribution of these statistics for various depth functions. To assess the performance of power gain, we apply three depth functions: Mahalanobis depth (Liu and Singh, 1993), Spatial depth (Brown, 1958; Gower, 1974), and Projection depth (Liu, 1992). Through two-sample simulations, we have demonstrated that our sum and product statistics exhibit superior power performance, utilizing a strategic block permutation algorithm, and compare favourably with popular methods in the literature.
Our tests are further validated through analysis of Raman spectral data, acquired from cellular and tissue samples, highlighting the effective discrimination between healthy and cancerous samples.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.09377 [pdf, other]

Mutual Information Multinomial Estimation

Authors: Yanzhi Chen, Zijing Ou, Adrian Weller, Yingzhen Li

Abstract: Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning. This work proposes a new estimator for mutual information. Our main discovery is that a preliminary estimate of the data distribution can dramatically help the estimation. This preliminary estimate serves as a bridge between the joint and the marginal distribution, and by comparing with this bridge distribution we can easily obtain the true difference between the joint distributions and the marginal distributions. Experiments on diverse tasks including non-Gaussian synthetic problems with known ground-truth and real-world applications demonstrate the advantages of our method.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.07796 [pdf]

Ranking and Combining Latent Structured Predictive Scores without Labeled Data

Authors: Shiva Afshar, Yinghan Chen, Shizhong Han, Ying Lin

Abstract: Combining multiple predictors obtained from distributed data sources to an accurate meta-learner is promising to achieve enhanced performance in lots of prediction problems. As the accuracy of each predictor is usually unknown, integrating the predictors to achieve better performance is challenging. Conventional ensemble learning methods assess the accuracy of predictors based on extensive labeled data. In practical applications, however, the acquisition of such labeled data can prove to be an arduous task. Furthermore, the predictors under consideration may exhibit high degrees of correlation, particularly when similar data sources or machine learning algorithms were employed during their model training. In response to these challenges, this paper introduces a novel structured unsupervised ensemble learning model (SUEL) to exploit the dependency between a set of predictors with continuous predictive scores, rank the predictors without labeled data and combine them to an ensembled score with weights. Two novel correlation-based decomposition algorithms are further proposed to estimate the SUEL model, constrained quadratic optimization (SUEL.CQO) and matrix-factorization-based (SUEL.MF) approaches. The efficacy of the proposed methods is rigorously assessed through both simulation studies and real-world application of risk genes discovery. The results compellingly demonstrate that the proposed methods can efficiently integrate the dependent predictors to an ensemble model without the need of ground truth data.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.07193 [pdf, other]

A comparison of methods for estimating the average treatment effect on the treated for externally controlled trials

Authors: Huan Wang, Fei Wu, Yeh-Fong Chen

Abstract: While randomized trials may be the gold standard for evaluating the effectiveness of the treatment intervention, in some special circumstances, single-arm clinical trials utilizing external control may be considered. The causal treatment effect of interest for single-arm studies is usually the average treatment effect on the treated (ATT) rather than the average treatment effect (ATE). Although methods have been developed to estimate the ATT, the selection and use of these methods require a thorough comparison and in-depth understanding of the advantages and disadvantages of these methods. In this study, we conducted simulations under different identifiability assumptions to compare the performance metrics (e.g., bias, standard deviation (SD), mean squared error (MSE), type I error rate) for a variety of methods, including the regression model, propensity score matching, Mahalanobis distance matching, coarsened exact matching, inverse probability weighting, augmented inverse probability weighting (AIPW), AIPW with SuperLearner, and targeted maximum likelihood estimator (TMLE) with SuperLearner. Our simulation results demonstrate that the doubly robust methods in general have smaller biases than other methods. In terms of SD, nonmatching methods in general have smaller SDs than matching-based methods. The performance of MSE is a trade-off between the bias and SD, and no method consistently performs better in terms of MSE.
The identifiability assumptions are critical to the models' performance: violation of the positivity assumption can lead to a significant inflation of type I errors in some methods; violation of the unconfoundedness assumption can lead to a large bias for all methods... (Further details are available in the main body of the paper).

Submitted 13 August, 2024; originally announced August 2024.

Comments: 24 pages, 13 figures

arXiv:2408.04739 [pdf, other]

Accurate deep learning-based filtering for chaotic dynamics by identifying instabilities without an ensemble

Authors: Marc Bocquet, Alban Farchi, Tobias S. Finn, Charlotte Durand, Sibo Cheng, Yumeng Chen, Ivo Pasmans, Alberto Carrassi

Abstract: We investigate the ability to discover data assimilation (DA) schemes meant for chaotic dynamics with deep learning. The focus is on learning the analysis step of sequential DA, from state trajectories and their observations, using a simple residual convolutional neural network, while assuming the dynamics to be known. Experiments are performed with the Lorenz 96 dynamics, which display spatiotemporal chaos and for which solid benchmarks for DA performance exist. The accuracy of the states obtained from the learned analysis approaches that of the best possibly tuned ensemble Kalman filter, and is far better than that of variational DA alternatives. Critically, this can be achieved while propagating even just a single state in the forecast step. We investigate the reason for achieving ensemble filtering accuracy without an ensemble. We diagnose that the analysis scheme actually identifies key dynamical perturbations, mildly aligned with the unstable subspace, from the forecast state alone, without any ensemble-based covariances representation. This reveals that the analysis scheme has learned some multiplicative ergodic theorem associated to the DA process seen as a non-autonomous random dynamical system.

Submitted 9 September, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.04154 [pdf, other]

The Data Addition Dilemma

Authors: Judy Hanwen Shen, Inioluwa Deborah Raji, Irene Y. Chen

Abstract: In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources. But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings? We identify this situation as the \textit{Data Addition Dilemma}, demonstrating that adding training data in this multi-source scaling context can at times result in reduced overall accuracy, uncertain fairness outcomes, and reduced worst-subgroup performance. We find that this possibly arises from an empirically observed trade-off between model performance improvements due to data scaling and model deterioration from distribution shift. We thus establish baseline strategies for navigating this dilemma, introducing distribution shift heuristics to guide decision-making on which data sources to add in data scaling, in order to yield the expected model performance improvements. We conclude with a discussion of the required considerations for data collection and suggestions for studying data composition and scale in the age of increasingly larger models.

Submitted 7 August, 2024; originally announced August 2024.

Comments: Machine Learning For Health Care 2024 (MLHC)

arXiv:2408.02320 [pdf, ps, other]

A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models

Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen

Abstract: Diffusion models, which convert noise into new data instances by learning to reverse a diffusion process, have become a cornerstone in contemporary generative modeling. In this work, we develop non-asymptotic convergence theory for a popular diffusion-based sampler (i.e., the probability flow ODE sampler) in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functions. For distributions in $\mathbb{R}^d$, we prove that $d/\varepsilon$ iterations -- modulo some logarithmic and lower-order terms -- are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance. This is the first result establishing nearly linear dimension-dependency (in $d$) for the probability flow ODE sampler. Imposing only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes. In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach without the need of resorting to SDE and ODE toolboxes.

Submitted 5 August, 2024; originally announced August 2024.

Comments: This manuscript presents improved theory for probability flow ODEs compared to its earlier version arXiv:2306.09251

arXiv:2408.02279 [pdf, other]

doi 10.1145/3627673.3679724

DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

Authors: Ruixin Ding, Yuqi Chen, Yu-Ting Lan, Wei Zhang

Abstract: Long-term time series forecasting (LTSF) has been widely applied in finance, traffic prediction, and other domains. Recently, patch-based transformers have emerged as a promising approach, segmenting data into sub-level patches that serve as input tokens. However, existing methods mostly rely on predetermined patch lengths, necessitating expert knowledge and posing challenges in capturing diverse characteristics across various scales. Moreover, time series data exhibit diverse variations and fluctuations across different temporal scales, which traditional approaches struggle to model effectively. In this paper, we propose a dynamic tokenizer with a dynamic sparse learning algorithm to capture diverse receptive fields and sparse patterns of time series data. In order to build hierarchical receptive fields, we develop a multi-scale Transformer model, coupled with multi-scale sequence extraction, capable of capturing multi-resolution features. Additionally, we introduce a group-aware rotary position encoding technique to enhance intra- and inter-group position awareness among representations across different temporal scales. Our proposed model, named DRFormer, is evaluated on various real-world datasets, and experimental results demonstrate its superiority compared to existing methods. Our code is available at: https://github.com/ruixindingECNU/DRFormer.

Submitted 5 August, 2024; originally announced August 2024.

ACM Class: I.2.6

arXiv:2408.00139 [pdf, other]

Multiway Alignment of Political Attitudes

Authors: Letizia Iannucci, Ali Faqeeh, Ali Salloum, Ted Hsuan Yun Chen, Mikko Kivelä

Abstract: The related concepts of partisan belief systems, issue alignment, and partisan sorting are central to our understanding of politics. These phenomena have been studied using measures of alignment between pairs of topics, or how much individuals' attitudes toward a topic reveal about their attitudes toward another topic. We introduce a higher-order measure that extends the assessment of alignment beyond pairs of topics by quantifying the amount of information individuals' opinions on one topic reveal about a set of topics simultaneously. Our multiway alignment measure indicates how much individuals' opinions on multiple topics align into a single ideological divide. Applying this approach to legislative voting behavior reveals that parliamentary systems typically exhibit similar multiway alignment characteristics, but can change in response to shifting intergroup dynamics. In American National Election Studies surveys, our approach reveals a growing significance of party identification together with a consistent rise in multiway alignment over time. Similarly, the growing multiway alignment among topical issues in Finnish online discussions suggests a trend towards a more ideologically driven political landscape. Our case studies demonstrate that the multiway alignment measure is a versatile tool for understanding societal polarization and partisan belief systems across diverse domains.

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2407.16936 [pdf, ps, other]

Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Authors: Wei Guo, Molei Tao, Yongxin Chen

Abstract: We address the outstanding problem of sampling from an unnormalized density that may be non-log-concave and multimodal. To enhance the performance of simple Markov chain Monte Carlo (MCMC) methods, techniques of annealing type have been widely used. However, quantitative theoretical guarantees of these techniques are under-explored. This study takes a first step toward providing a non-asymptotic analysis of annealed MCMC. Specifically, we establish, for the first time, an oracle complexity of $\widetilde{O}\left(\frac{dβ^2{\cal A}^2}{\varepsilon^6}\right)$ for simple annealed Langevin Monte Carlo algorithm to achieve $\varepsilon^2$ accuracy in Kullback-Leibler divergence to the target distribution $π\propto{\rm e}^{-V}$ on $\mathbb{R}^d$ with $β$-smooth potential $V$. Here, ${\cal A}$ represents the action of a curve of probability measures interpolating the target distribution $π$ and a readily sampleable distribution.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.12132 [pdf, other]

Maximum-likelihood regression with systematic errors for astronomy and the physical sciences: I. Methodology and goodness-of-fit statistic of Poisson data

Authors: Max Bonamente, Yang Chen, Dale Zimmerman

Abstract: The paper presents a new statistical method that enables the use of systematic errors in the maximum-likelihood regression of integer-count Poisson data to a parametric model. The method is primarily aimed at the characterization of the goodness-of-fit statistic in the presence of the over-dispersion that is induced by sources of systematic error, and is based on a quasi-maximum-likelihood method that retains the Poisson distribution of the data. We show that the Poisson deviance, which is the usual goodness-of-fit statistic and is commonly referred to in astronomy as the Cash statistic, can be easily generalized in the presence of systematic errors, under rather general conditions. The method and the associated statistics are first developed theoretically, and then they are tested with the aid of numerical simulations and further illustrated with real-life data from astronomical observations. The statistical methods presented in this paper are intended as a simple general-purpose framework to include additional sources of uncertainty for the analysis of integer-count data in a variety of practical data analysis situations.

Submitted 16 July, 2024; originally announced July 2024.

Comments: ApJ accepted

arXiv:2407.11887 [pdf, other]

On the optimal prediction of extreme events in heavy-tailed time series with applications to solar flare forecasting

Authors: Victor Verma, Stilian Stoev, Yang Chen

Abstract: The prediction of extreme events in time series is a fundamental problem arising in many financial, scientific, engineering, and other applications. We begin by establishing a general Neyman-Pearson-type characterization of optimal extreme event predictors in terms of density ratios. This yields new insights and several closed-form optimal extreme event predictors for additive models. These results naturally extend to time series, where we study optimal extreme event prediction for heavy-tailed autoregressive and moving average models. Using a uniform law of large numbers for ergodic time series, we establish the asymptotic optimality of an empirical version of the optimal predictor for autoregressive models. Using multivariate regular variation, we also obtain expressions for the optimal extremal precision in heavy-tailed infinite moving averages, which provide theoretical bounds on the ability to predict extremes in this general class of models. The developed theory and methodology is applied to the important problem of solar flare prediction based on the state-of-the-art GOES satellite flux measurements of the Sun. Our results demonstrate the success and limitations of long-memory autoregressive as well as long-range dependent heavy-tailed FARIMA models for the prediction of extreme solar flares.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 57 pages, 5 figures

MSC Class: 62G32 (Primary) 62G20; 62M10; 62M20 (Secondary)

arXiv:2407.04970 [pdf, other]

Idiographic Personality Gaussian Process for Psychological Assessment

Authors: Yehu Chen, Muchen Xi, Jacob Montgomery, Joshua Jackson, Roman Garnett

Abstract: We develop a novel measurement framework based on a Gaussian process coregionalization model to address a long-lasting debate in psychometrics: whether psychological features like personality share a common structure across the population, vary uniquely for individuals, or some combination. We propose the idiographic personality Gaussian process (IPGP) framework, an intermediate model that accommodates both shared trait structure across a population and "idiographic" deviations for individuals. IPGP leverages the Gaussian process coregionalization model to handle the grouped nature of battery responses, but adjusted to non-Gaussian ordinal data. We further exploit stochastic variational inference for efficient latent factor estimation required for idiographic modeling at scale. Using synthetic and real data, we show that IPGP improves both prediction of actual responses and estimation of individualized factor structures relative to existing benchmarks. In a third study, we show that IPGP also identifies unique clusters of personality taxonomies in real-world data, displaying great potential in advancing individualized approaches to psychological diagnosis and treatment.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2406.17827 [pdf, other]

Practical identifiability and parameter estimation of compartmental epidemiological models

Authors: Q. Y. Chen, Z. Rapti, Y. Drossinos, J. Cuevas-Maraver, G. A. Kevrekidis, P. G. Kevrekidis

Abstract: Practical parameter identifiability in ODE-based epidemiological models is a known issue, yet one that merits further study. It is essentially ubiquitous due to noise and errors in real data. In this study, to avoid uncertainty stemming from data of unknown quality, simulated data with added noise are used to investigate practical identifiability in two distinct epidemiological models. Particular emphasis is placed on the role of initial conditions, which are assumed unknown, except those that are directly measured. Instead of just focusing on one method of estimation, we use and compare results from various broadly used methods, including maximum likelihood and Markov Chain Monte Carlo (MCMC) estimation. Among other findings, our analysis revealed that the MCMC estimator is overall more robust than the point estimators considered. Its estimates and predictions are improved when the initial conditions of certain compartments are fixed so that the model becomes globally identifiable. For the point estimators, whether fixing or fitting the initial conditions that are not directly measured improves parameter estimates is model-dependent. Specifically, in the standard SEIR model, fixing the initial condition for the susceptible population S(0) improved parameter estimates, while this was not true when fixing the initial condition of the asymptomatic population in a more involved model. Our study corroborates the change in quality of parameter estimates upon usage of pre-peak or post-peak time-series under consideration. Finally, our examples suggest that in the presence of significantly noisy data, the value of structural identifiability is moot.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.12525 [pdf, other]

Anatomy of Elite and Mass Polarization in Social Networks

Authors: Ali Salloum, Ted Hsuan Yun Chen, Mikko Kivelä

Abstract: Existing methods for quantifying polarization in social networks typically report a single value describing the amount of polarization in a social system. While this approach can be used to confirm the observation that many societies have witnessed an increase in political polarization in recent years, it misses the complexities that could be used to understand the reasons behind this phenomenon. Notably, opposing groups can have unequal impact on polarization, and the elites are often understood to be more divided than the masses, making it critical to differentiate their roles in polarized systems. We propose a method to characterize these distinct hierarchies in polarized networks, enabling separate polarization measurements for these groups within a single social system. Applied to polarized topics in the Finnish Twittersphere surrounding the 2019 and 2023 parliamentary elections, our analysis reveals valuable insights: 1) The impact of opposing groups on observed polarization is rarely balanced, and 2) while the elite strongly contributes to structural polarization and consistently displays greater alignment across various topics, the masses have also recently experienced a surge in issue alignment, a special form of polarization. Our findings suggest that the masses may not be as immune to an increasingly polarized environment as previously thought.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.09311 [pdf, other]

Learning High-dimensional Latent Variable Models via Doubly Stochastic Optimisation by Unadjusted Langevin

Authors: Motonori Oka, Yunxiao Chen, Irini Moustaki

Abstract: Latent variable models are widely used in social and behavioural sciences, such as education, psychology, and political science. In recent years, high-dimensional latent variable models have become increasingly common for analysing large and complex data. Estimating high-dimensional latent variable models using marginal maximum likelihood is computationally demanding due to the complexity of integrals involved. To address this challenge, stochastic optimisation, which combines stochastic approximation and sampling techniques, has been shown to be effective. This method iterates between two steps -- (1) sampling the latent variables from their posterior distribution based on the current parameter estimate, and (2) updating the fixed parameters using an approximate stochastic gradient constructed from the latent variable samples. In this paper, we propose a computationally more efficient stochastic optimisation algorithm. This improvement is achieved through the use of a minibatch of observations when sampling latent variables and constructing stochastic gradients, and an unadjusted Langevin sampler that utilises the gradient of the negative complete-data log-likelihood to sample latent variables. Theoretical results are established for the proposed algorithm, showing that the iterative parameter update converges to the marginal maximum likelihood estimate as the number of iterations goes to infinity. Furthermore, the proposed algorithm is shown to scale well to high-dimensional settings through simulation studies and a personality test application with 30,000 respondents, 300 items, and 30 latent dimensions.

Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08748 [pdf, other]

Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method

Authors: Qinghua Tao, Francesco Tonin, Alex Lambert, Yingyi Chen, Panagiotis Patrinos, Johan A. K. Suykens

Abstract: In contrast with Mercer kernel-based approaches as used e.g., in Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation to KSVD cannot work with infinite-dimensional feature mappings, the variational objective can be unbounded, and needs further numerical evaluation and exploration towards machine learning. In this work, i) we introduce a new asymmetric learning paradigm based on coupled covariance eigenproblem (CCE) through covariance operators, allowing infinite-dimensional feature maps. The solution to CCE is ultimately obtained from the SVD of the induced asymmetric kernel matrix, providing links to KSVD. ii) Starting from the integral equations corresponding to a pair of coupled adjoint eigenfunctions, we formalize the asymmetric Nyström method through a finite sample approximation to speed up training. iii) We provide the first empirical evaluations verifying the practical utility and benefits of KSVD and compare with methods resorting to symmetrization or linear SVD across multiple tasks.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 19 pages, 9 tables, 6 figures

Journal ref: the 41st International Conference on Machine Learning (ICML), 2024

arXiv:2406.07955 [pdf, other]

How Interpretable Are Interpretable Graph Neural Networks?

Authors: Yongqiang Chen, Yatao Bian, Bo Han, James Cheng

Abstract: Interpretable graph neural networks (XGNNs) are widely adopted in various scientific applications involving graph-structured data. Existing XGNNs predominantly adopt the attention-based mechanism to learn edge or node importance for extracting and making predictions with the interpretable subgraph. However, the representational properties and limitations of these methods remain inadequately explored. In this work, we present a theoretical framework that formulates interpretable subgraph learning with the multilinear extension of the subgraph distribution, coined as subgraph multilinear extension (SubMT). Extracting the desired interpretable subgraph requires an accurate approximation of SubMT, yet we find that the existing XGNNs can have a huge gap in fitting SubMT. Consequently, the SubMT approximation failure will lead to the degenerated interpretability of the extracted subgraphs. To mitigate the issue, we design a new XGNN architecture called Graph Multilinear neT (GMT), which is provably more powerful in approximating SubMT. We empirically validate our theoretical findings on a number of graph classification benchmarks. The results demonstrate that GMT outperforms the state-of-the-art by up to 10% in terms of both interpretability and generalizability across 12 regular and geometric graph benchmarks.

Submitted 12 June, 2024; originally announced June 2024.

Comments: ICML2024, 44 pages, 21 figures, 12 tables

arXiv:2406.07651 [pdf, ps, other]

surveygenmod2: A SAS macro for estimating complex survey adjusted generalized linear models and Wald-type tests

Authors: R. Noah Padgett, Ying Chen

Abstract: surveygenmod2 builds on the macro written by da Silva (2017) for generalized linear models under complex survey designs. The updated macro fixed several minor bugs we encountered while updating the macro for use in SAS®. We added additional features for conducting basic Wald-type tests on groups of parameters based on the estimated regression coefficients and parameter variance-covariance matrix.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.04743 [pdf, other]

When Swarm Learning meets energy series data: A decentralized collaborative learning design based on blockchain

Authors: Lei Xu, Yulong Chen, Yuntian Chen, Longfeng Nie, Xuetao Wei, Liang Xue, Dongxiao Zhang

Abstract: Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the centralized server with a blockchain-based distributed network to address the security and privacy issues inherent in Federated Learning (FL)'s centralized architecture. Within this distributed Collaborative Learning framework, each participating organization governs nodes for inter-organizational communication. Devices from various organizations utilize smart contracts for parameter uploading and retrieval. A consensus mechanism ensures distributed consistency throughout the learning process and guarantees the transparent trustworthiness and immutability of parameters on-chain. The efficacy of the proposed framework is substantiated across three real-world energy series modeling scenarios with superior performance compared to Local Learning approaches, simultaneously emphasizing enhanced data security and privacy over Centralized Learning and FL methods. Notably, as the data volume and the number of local epochs increase within a threshold, there is an improvement in model performance accompanied by a reduction in the variance of performance errors. Consequently, this leads to increased stability and reliability in the outcomes produced by the model.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04575 [pdf, other]

Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning

Authors: Zhongzheng Wang, Yuntian Chen, Guodong Chen, Dongxiao Zhang

Abstract: Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation module for compressed latent representations, a transition module for system state evolution, and a prediction module for flow responses. A novel training strategy combining regression loss and joint-embedding consistency loss enhances temporal consistency and multi-step prediction accuracy. Unlike existing models, the MLD supports diverse input modalities, allowing comprehensive data interactions. The MLD model, resembling a Markov decision process (MDP), can train deep reinforcement learning agents, specifically using the soft actor-critic (SAC) algorithm, to maximize net present value (NPV) through continuous interactions. The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%. It also demonstrates strong generalization performance, providing improved decisions for new scenarios based on knowledge from previous ones.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03849 [pdf]

A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM

Authors: Yongan Zhang, Junfeng Zhao, Jian Li, Xuanran Wang, Youzhuang Sun, Yuntian Chen, Dongxiao Zhang

Abstract: The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole resistivity logging encounters challenges of high-frequency disaster (the problem of inadequate learning by neural networks in high-frequency features) and noise interference, badly affecting accuracy. To address these challenges, a frequency-aware framework and a temporal anti-noise block are proposed to build the frequency-aware LSTM (FAL). The frequency-aware framework implements a dual-stream structure through wavelet transformation, allowing the neural network to simultaneously handle high-frequency and low-frequency flows of time-series data, thus avoiding high-frequency disaster. The temporal anti-noise block integrates multiple attention mechanisms and soft-threshold attention mechanisms, enabling the model to better distinguish noise from redundant features. Ablation experiments demonstrate that the frequency-aware framework and temporal anti-noise block contribute significantly to performance improvement. FAL achieves a 24.3% improvement in $R^2$ over LSTM, reaching the highest value of 0.91 among all models. In robustness experiments, the impact of noise on FAL is approximately 1/8 of the baseline, confirming the noise resistance of FAL. The proposed FAL effectively reduces noise interference in predicting formation resistivity from cased transient electromagnetic well logging curves, better learns high-frequency features, and thereby enhances the prediction accuracy and noise resistance of the neural network model.
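
The soft-threshold mechanism mentioned in the abstract above presumably builds on the standard soft-thresholding operator, which shrinks values toward zero and zeroes out anything whose magnitude is below a threshold. The following is a minimal sketch of that operator only. The function name `soft_threshold` and the fixed threshold `tau` are assumptions for illustration; the abstract does not say how FAL chooses or learns its thresholds.

```python
def soft_threshold(x, tau):
    """Standard soft-thresholding: sign(x) * max(|x| - tau, 0).

    tau is a fixed, hand-picked threshold here (an assumption);
    the abstract does not state how the model obtains it.
    """
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


# Small-magnitude entries (treated as noise) are set to zero,
# larger ones are shrunk toward zero by tau.
signal = [3.0, -0.5, 0.2, -3.0]
denoised = [soft_threshold(v, 1.0) for v in signal]
```

Values inside the band [-tau, tau] are removed entirely, which is why the operator is a common building block for denoising.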

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03808 [pdf]

Cross-variable Linear Integrated ENhanced Transformer for Photovoltaic power forecasting

Authors: Jiaxin Gao, Qinglong Cao, Yuntian Chen, Dongxiao Zhang

Abstract: Photovoltaic (PV) power forecasting plays a crucial role in optimizing the operation and planning of PV systems, thereby enabling efficient energy management and grid integration. However, uncertainties caused by fluctuating weather conditions and complex interactions between different variables pose significant challenges to accurate PV power forecasting. In this study, we propose PV-Client (Cross-variable Linear Integrated ENhanced Transformer for Photovoltaic power forecasting) to address these challenges and enhance PV power forecasting accuracy. PV-Client employs an ENhanced Transformer module to capture complex interactions of various features in PV systems, and utilizes a linear module to learn trend information in PV power. Diverging from conventional time series-based Transformer models that use cross-time Attention to learn dependencies between different time steps, the Enhanced Transformer module integrates cross-variable Attention to capture dependencies between PV power and weather factors. Furthermore, PV-Client streamlines the embedding and position encoding layers by replacing the Decoder module with a projection layer. Experimental results on three real-world PV power datasets affirm PV-Client's state-of-the-art (SOTA) performance in PV power forecasting. Specifically, PV-Client surpasses the second-best model GRU by 5.3% in MSE metrics and 0.9% in accuracy metrics at the Jingang Station. Similarly, PV-Client outperforms the second-best model SVR by 10.1% in MSE metrics and 0.2% in accuracy metrics at the Xinqingnian Station, and PV-Client exhibits superior performance compared to the second-best model SVR with enhancements of 3.4% in MSE metrics and 0.9% in accuracy metrics at the Hongxing Station.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03171 [pdf, other]

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

Authors: Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher

Abstract: This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For bias, we analyze the regularization of the arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales. In our analysis, the bias and variance can be characterized by the spectral decay of a data-dependent regularized kernel: the original kernel matrix associated with an additional re-weighting matrix, and thus the re-weighting strategy can be regarded as a data-dependent regularization for better understanding. Besides, our analysis provides asymptotic expansion of kernel functions/vectors under covariate shift, which is of independent interest.

Submitted 5 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.00695 [pdf, other]

Discovering an interpretable mathematical expression for a full wind-turbine wake with artificial intelligence enhanced symbolic regression

Authors: Ding Wang, Yuntian Chen, Shiyi Chen

Abstract: The rapid expansion of wind power worldwide underscores the critical significance of engineering-focused analytical wake models in both the design and operation of wind farms. These theoretically-derived analytical wake models have limited predictive capabilities, particularly in the near-wake region close to the turbine rotor, due to assumptions that do not hold. Knowledge discovery methods can bridge these gaps by extracting insights, adjusting for theoretical assumptions, and developing accurate models for physical processes. In this study, we introduce a genetic symbolic regression (SR) algorithm to discover an interpretable mathematical expression for the mean velocity deficit throughout the wake, a previously unavailable insight. By incorporating a double Gaussian distribution into the SR algorithm as domain knowledge and designing a hierarchical equation structure, the search space is reduced, thus efficiently finding a concise, physically informed, and robust wake model. The proposed mathematical expression (equation) can predict the wake velocity deficit at any location in the full-wake region with high precision and stability. The model's effectiveness and practicality are validated through experimental data and high-fidelity numerical simulations.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00322 [pdf, other]

Adaptive Penalized Likelihood method for Markov Chains

Authors: Yining Zhou, Ming Gao, Yiting Chen, Xiaoping Shi

Abstract: Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov chains. In this study, we extended the adaptive Lasso technique from linear models to Markov chains and proposed a novel model by applying penalized maximum likelihood estimation to optimize the estimation of the transition probability matrix. Meanwhile, we demonstrated that the new model enjoys oracle properties, which means the estimated transition probability matrix has the same performance as the real one when given. Simulations show that our new method behaves very well overall in comparison with various competitors. Real data analysis further confirms the value of our proposed method.
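
For orientation, the unpenalized MLE that the abstract above takes as its baseline is the row-normalized matrix of observed transition counts. The snippet below is a minimal sketch of that baseline only, not the authors' adaptive penalized method. The function name `transition_mle` and the toy chain are assumptions made for illustration.

```python
from collections import Counter


def transition_mle(chain, states):
    """Unpenalized MLE of a transition matrix: count transitions i -> j
    and divide each row by its total. Rows with no observed visits are
    left as zeros (an arbitrary choice made for this sketch)."""
    counts = Counter(zip(chain, chain[1:]))
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0)
            for j in states
        }
    return matrix


# Toy two-state chain (hypothetical data).
chain = ["a", "b", "a", "a", "b", "b", "a"]
P = transition_mle(chain, ["a", "b"])
```

With few observations per row these estimates are noisy, which is the kind of inaccuracy the abstract says motivates adding a penalty.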

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.19803 [pdf, other]

Dynamic Factor Analysis of High-dimensional Recurrent Events

Authors: Fangyi Chen, Yunxiao Chen, Zhiliang Ying, Kangjie Zhou

Abstract: Recurrent event time data arise in many studies, including biomedicine, public health, marketing, and social media analysis. High-dimensional recurrent event data involving large numbers of event types and observations become prevalent with the advances in information technology. This paper proposes a semiparametric dynamic factor model for the dimension reduction and prediction of high-dimensional recurrent event data. The proposed model imposes a low-dimensional structure on the mean intensity functions of the event types while allowing for dependencies. A nearly rate-optimal smoothing-based estimator is proposed. An information criterion that consistently selects the number of factors is also developed. Simulation studies demonstrate the effectiveness of these inference tools. The proposed method is applied to grocery shopping data, for which an interpretable factor structure is obtained.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19637 [pdf, other]

Inference in semiparametric formation models for directed networks

Authors: Lianqiang Qu, Lu Chen, Ting Yan, Yuguo Chen

Abstract: We propose a semiparametric model for dyadic link formations in directed networks. The model contains a set of degree parameters that measure different effects of popularity or outgoingness across nodes, a regression parameter vector that reflects the homophily effect resulting from the nodal attributes or pairwise covariates associated with edges, and a set of latent random noises with unknown distributions. Our interest lies in inferring the unknown degree parameters and homophily parameters. The dimension of the degree parameters increases with the number of nodes. Under the high-dimensional regime, we develop a kernel-based least squares approach to estimate the unknown parameters. The major advantage of our estimator is that it does not encounter the incidental parameter problem for the homophily parameters. We prove consistency of all the resulting estimators of the degree parameters and homophily parameters. We establish high-dimensional central limit theorems for the proposed estimators and provide several applications of our general theory, including testing the existence of degree heterogeneity, testing sparse signals and recovering the support. Simulation studies and a real data application are conducted to illustrate the finite sample performance of the proposed methods.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 28 pages, 3 figures

arXiv:2405.19559 [pdf, ps, other]

Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm

Authors: Mohamed Seif, Yanxi Chen

Abstract: In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined conditions. Compared to those derived in \cite{mitra2008clustering}, our improved separation conditions are obtained.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18782 [pdf, other]

Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

Authors: Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, Katherine L. Bouman

Abstract: Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17862 [pdf, other]

Towards robust prediction of material properties for nuclear reactor design under scarce data -- a study in creep rupture property

Authors: Yu Chen, Edoardo Patelli, Zhen Yang, Adolphus Lye

Abstract: Advances in Deep Learning bring further investigation into credibility and robustness, especially for safety-critical engineering applications such as the nuclear industry. The key challenges include the availability of data sets (often scarce and sparse) and insufficient consideration of the uncertainty in the data, model, and prediction. This paper therefore presents a meta-learning based approach that is both uncertainty- and prior knowledge-informed, aiming at trustful predictions of material properties for the nuclear reactor design. It is suited for robust learning under limited data. Uncertainty has been accounted for where a distribution of predictor functions is produced for extrapolation. Results suggest it achieves better performance than existing empirical methods in rupture life prediction, a case which is typically under a small data regime. While demonstrated herein with rupture properties, this learning approach is transferable to solve similar problems of data scarcity across the nuclear industry. It is of great importance to boosting the AI analytics in the nuclear industry by proving the applicability and robustness while providing tools that can be trusted.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 8 pages, submitted to REC 2024 (International Workshop on Reliable Engineering Computing)

arXiv:2405.17401 [pdf, other]

RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Preprint. Under review

arXiv:2405.16732 [pdf, ps, other]

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

Authors: Dongyan Huo, Yixuan Zhang, Yudong Chen, Qiaomin Xie

Abstract: In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $α>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two structures leads to complications that are not captured by prior techniques. By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates $θ_k$ and Markovian data $x_k$. This enables us to overcome the obstacles in existing analysis and establish for the first time the weak convergence of the joint process $(x_k, θ_k)_{k\geq0}$. Furthermore, we present a precise characterization of the asymptotic bias of the SA iterates, given by $\mathbb{E}[θ_\infty]-θ^\ast=α(b_\text{m}+b_\text{n}+b_\text{c})+O(α^{3/2})$. Here, $b_\text{m}$ is associated with the Markovian noise, $b_\text{n}$ is tied to the nonlinearity, and notably, $b_\text{c}$ represents a multiplicative interaction between the Markovian noise and nonlinearity, which is absent in previous works. As a by-product of our analysis, we derive finite-time bounds on higher moment $\mathbb{E}[\|θ_k-θ^\ast\|^{2p}]$ and present non-asymptotic geometric convergence rates for the iterates, along with a Central Limit Theorem.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.15053 [pdf, other]

A Latent Variable Approach to Learning High-dimensional Multivariate longitudinal Data

Authors: Sze Ming Lee, Yunxiao Chen, Tony Sit

Abstract: High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in social, behavioral and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on high-dimensional multivariate longitudinal data. This model introduces unobserved factors to account for the between-variable and across-time dependence and assist the prediction. Statistical inference and prediction tools are developed under a general setting that allows outcome variables to be of mixed types and possibly unobserved for certain time points, for example, due to right censoring. A central limit theorem is established for drawing statistical inferences on regression coefficients. Additionally, an information criterion is introduced to choose the number of factors. The proposed model is applied to customer grocery shopping records to predict and understand shopping behavior.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13535 [pdf, other]

Generalized Laplace Approximation

Authors: Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim

Abstract: In recent years, the inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective solution to this issue. However, understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors. We interpret the generalization of the posterior with a temperature factor as a correction for misspecified models through adjustments to the joint probability model, and the recalibration of priors by redistributing probability mass on models within the hypothesis space using data samples. Additionally, we highlight a distinctive feature of Laplace approximation, which ensures that the generalized normalizing constant can be treated as invariant, unlike the typical scenario in general Bayesian learning where this constant varies with model parameters post-generalization. Building on this insight, we propose the generalized Laplace approximation, which involves a simple adjustment to the computation of the Hessian matrix of the regularized loss function. This method offers a flexible and scalable framework for obtaining high-quality posterior distributions. We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.

Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13149 [pdf, other]

Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation

Authors: Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: The article presents a systematic study of the problem of conditioning a Gaussian random variable $ξ$ on nonlinear observations of the form $F \circ φ(ξ)$ where $φ: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $ξ\mid F\circ φ(ξ)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12343 [pdf, other]

Determine the Number of States in Hidden Markov Models via Marginal Likelihood

Authors: Yang Chen, Cheng-Der Fuh, Chu-Lan Michael Kao

Abstract: Hidden Markov models (HMM) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain and the observations are noisy realizations of the underlying process. Determining the number of hidden states for an HMM is a model selection problem, which is yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covariance. In this paper, we propose a consistent method for determining the number of hidden states of HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and hidden states. Moreover, we show that the model selection problem of HMM includes the order selection problem of finite mixture models as a special case. We give rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computation method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method.

Submitted 17 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.12331 [pdf, other]

Solar Imaging Data Analytics: A Selective Overview of Challenges and Opportunities

Authors: Yang Chen, Ward Manchester, Meng Jin, Alexei Pevtsov

Abstract: We give a gentle introduction to solar imaging data, focusing on the challenges and opportunities of data-driven approaches for solar eruptions. The various solar phenomenon prediction problems that might benefit from statistical methods are presented. Available data products and software are described. State-of-the-art solar eruption forecasting models with data-driven approaches are summarized and discussed. Based on the characteristics of the datasets and state-of-the-art approaches, we point out several promising directions to explore from statistical modeling and computational perspectives.

Submitted 2 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Showing 1–50 of 910 results for author: Chen, Y