Showing 1–50 of 910 results for author: Chen, Y

Searching in archive stat.
  1. arXiv:2409.07617  [pdf, other]

    stat.ME

    Determining number of factors under stability considerations

    Authors: Sze Ming Lee, Yunxiao Chen

    Abstract: This paper proposes a novel method for determining the number of factors in linear factor models under stability considerations. An instability measure is proposed based on the principal angle between the estimated loading spaces obtained by data splitting. Based on this measure, criteria for determining the number of factors are proposed and shown to be consistent. This consistency is obtained us…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 23 pages, 3 figures

  2. arXiv:2409.07392  [pdf, other]

    cs.LG stat.ML

    A Scalable Algorithm for Active Learning

    Authors: Youguang Chen, Zheyu Wen, George Biros

    Abstract: FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state-of-the-art in terms of accuracy and robustness and comes with theoretical performance guarantees. However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: To appear at SC'24. Link: https://sc24.conference-program.com/presentation/?id=pap624&sess=sess397

  3. arXiv:2409.06490  [pdf, other]

    cs.CV stat.AP

    UAVDB: Trajectory-Guided Adaptable Bounding Boxes for UAV Detection

    Authors: Yu-Hsi Chen

    Abstract: With the rapid development of drone technology, accurate detection of Unmanned Aerial Vehicles (UAVs) has become essential for applications such as surveillance, security, and airspace management. In this paper, we propose a novel trajectory-guided method, the Patch Intensity Convergence (PIC) technique, which generates high-fidelity bounding boxes for UAV detection tasks and no need for the effor…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 7 pages, 5 figures, 3 tables

  4. arXiv:2409.03980  [pdf, other]

    stat.ML cs.LG

    Entry-Specific Matrix Estimation under Arbitrary Sampling Patterns through the Lens of Network Flows

    Authors: Yudong Chen, Xumei Xi, Christina Lee Yu

    Abstract: Matrix completion tackles the task of predicting missing values in a low-rank matrix based on a sparse set of observed entries. It is often assumed that the observation pattern is generated uniformly at random or has a very specific structure tuned to a given algorithm. There is still a gap in our understanding when it comes to arbitrary sampling patterns. Given an arbitrary sampling pattern, we i…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.01410  [pdf, other]

    cs.LG stat.CO

    Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

    Authors: Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen

    Abstract: Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather t…

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.01194  [pdf, other]

    stat.AP

    Tonal coarticulation revisited: functional covariance analysis to investigate the planning of co-articulated tones by Standard Chinese speakers

    Authors: Valentina Masarotto, Yiya Chen

    Abstract: We aim to explain whether a stress memory task has a significant impact on tonal coarticulation. We contribute a novel approach to analyse tonal coarticulation in phonetics, where several f0 contours are compared with respect to their vibrations at higher resolution, something that in statistical terms is called variation of the second order. We identify speech recording frequency curves as functi…

    Submitted 9 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

  7. arXiv:2409.00843  [pdf, other]

    econ.GN cs.CE cs.CY q-fin.CP stat.ML

    Global Public Sentiment on Decentralized Finance: A Spatiotemporal Analysis of Geo-tagged Tweets from 150 Countries

    Authors: Yuqi Chen, Yifan Li, Kyrie Zhixuan Zhou, Xiaokang Fu, Lingbo Liu, Shuming Bao, Daniel Sui, Luyao Zhang

    Abstract: In the digital era, blockchain technology, cryptocurrencies, and non-fungible tokens (NFTs) have transformed financial and decentralized systems. However, existing research often neglects the spatiotemporal variations in public sentiment toward these technologies, limiting macro-level insights into their global impact. This study leverages Twitter data to explore public attention and sentiment acr…

    Submitted 1 September, 2024; originally announced September 2024.

  8. arXiv:2409.00679  [pdf, other]

    stat.ME math.ST

    Exact Exploratory Bi-factor Analysis: A Constraint-based Optimisation Approach

    Authors: Jiawei Qiao, Yunxiao Chen, Zhiliang Ying

    Abstract: Bi-factor analysis is a form of confirmatory factor analysis widely used in psychological and educational measurement. The use of a bi-factor model requires the specification of an explicit bi-factor structure on the relationship between the observed variables and the group factors. In practice, the bi-factor structure is sometimes unknown, in which case an exploratory form of bi-factor analysis i…

    Submitted 1 September, 2024; originally announced September 2024.

  9. arXiv:2408.16862  [pdf, other]

    stat.ML cs.LG

    Probabilistic Decomposed Linear Dynamical Systems for Robust Discovery of Latent Neural Dynamics

    Authors: Yenho Chen, Noga Mudrik, Kyle A. Johnsen, Sankaraleengam Alagapan, Adam S. Charles, Christopher J. Rozell

    Abstract: Time-varying linear state-space models are powerful tools for obtaining mathematically interpretable representations of neural signals. For example, switching and decomposed models describe complex systems using latent variables that evolve according to simple locally linear dynamics. However, existing methods for latent variable estimation are not robust to dynamical noise and system nonlinearity…

    Submitted 29 August, 2024; originally announced August 2024.

  10. arXiv:2408.14821  [pdf, other]

    cs.LG math.NA stat.ML

    Data-driven Effective Modeling of Multiscale Stochastic Dynamical Systems

    Authors: Yuan Chen, Dongbin Xiu

    Abstract: We present a numerical method for learning the dynamics of slow components of unknown multiscale stochastic dynamical systems. While the governing equations of the systems are unknown, bursts of observation data of the slow variables are available. By utilizing the observation data, our proposed method is capable of constructing a generative stochastic model that can accurately capture the effecti…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2406.15747

    MSC Class: 60H10; 60H35; 62M45; 65C30

  11. arXiv:2408.13115  [pdf, ps, other]

    stat.ML cs.LG math.PR stat.CO

    Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias

    Authors: Yifan Chen, Xiaoou Cheng, Jonathan Niles-Weed, Jonathan Weare

    Abstract: The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or…

    Submitted 19 August, 2024; originally announced August 2024.

  12. arXiv:2408.12063  [pdf, other]

    stat.ML cs.AI cs.LG physics.ao-ph

    A Deconfounding Approach to Climate Model Bias Correction

    Authors: Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

    Abstract: Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglec…

    Submitted 21 August, 2024; originally announced August 2024.

  13. arXiv:2408.11003  [pdf, other]

    stat.ME

    DEEPEAST technique to enhance power in two-sample tests via the same-attraction function

    Authors: Yiting Chen, Min Gao, Wei Lin, Andrew Jirasek, Kirsty Milligan, Xiaoping Shi

    Abstract: Data depth has emerged as an invaluable nonparametric measure for the ranking of multivariate samples. The main contribution of depth-based two-sample comparisons is the introduction of the Q statistic (Liu and Singh, 1993), a quality index. Unlike traditional methods, data depth does not require the assumption of normal distributions and adheres to four fundamental properties. Many existing two-s…

    Submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.09377  [pdf, other]

    cs.LG cs.IT stat.ML

    Mutual Information Multinomial Estimation

    Authors: Yanzhi Chen, Zijing Ou, Adrian Weller, Yingzhen Li

    Abstract: Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning. This work proposes a new estimator for mutual information. Our main discovery is that a preliminary estimate of the data distribution can dramatically help estimate. This preliminary estimate serves as a bridge between the joint and the marginal distribution, and by comparing with this br…

    Submitted 18 August, 2024; originally announced August 2024.

  15. arXiv:2408.07796  [pdf]

    stat.ML cs.LG stat.AP

    Ranking and Combining Latent Structured Predictive Scores without Labeled Data

    Authors: Shiva Afshar, Yinghan Chen, Shizhong Han, Ying Lin

    Abstract: Combining multiple predictors obtained from distributed data sources to an accurate meta-learner is promising to achieve enhanced performance in lots of prediction problems. As the accuracy of each predictor is usually unknown, integrating the predictors to achieve better performance is challenging. Conventional ensemble learning methods assess the accuracy of predictors based on extensive labeled…

    Submitted 14 August, 2024; originally announced August 2024.

  16. arXiv:2408.07193  [pdf, other]

    stat.ME

    A comparison of methods for estimating the average treatment effect on the treated for externally controlled trials

    Authors: Huan Wang, Fei Wu, Yeh-Fong Chen

    Abstract: While randomized trials may be the gold standard for evaluating the effectiveness of the treatment intervention, in some special circumstances, single-arm clinical trials utilizing external control may be considered. The causal treatment effect of interest for single-arm studies is usually the average treatment effect on the treated (ATT) rather than the average treatment effect (ATE). Although me…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 24 pages, 13 figures

  17. arXiv:2408.04739  [pdf, other]

    nlin.CD physics.ao-ph stat.ML

    Accurate deep learning-based filtering for chaotic dynamics by identifying instabilities without an ensemble

    Authors: Marc Bocquet, Alban Farchi, Tobias S. Finn, Charlotte Durand, Sibo Cheng, Yumeng Chen, Ivo Pasmans, Alberto Carrassi

    Abstract: We investigate the ability to discover data assimilation (DA) schemes meant for chaotic dynamics with deep learning. The focus is on learning the analysis step of sequential DA, from state trajectories and their observations, using a simple residual convolutional neural network, while assuming the dynamics to be known. Experiments are performed with the Lorenz 96 dynamics, which display spatiotemp…

    Submitted 9 September, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  18. arXiv:2408.04154  [pdf, other]

    cs.LG cs.AI stat.ML

    The Data Addition Dilemma

    Authors: Judy Hanwen Shen, Inioluwa Deborah Raji, Irene Y. Chen

    Abstract: In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources. But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings? We identify this situation as the Data Addition Dilemma, demonstrating that adding training data in this multi-source s…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Machine Learning For Health Care 2024 (MLHC)

  19. arXiv:2408.02320  [pdf, ps, other]

    cs.LG eess.SP math.NA math.ST stat.ML

    A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models

    Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen

    Abstract: Diffusion models, which convert noise into new data instances by learning to reverse a diffusion process, have become a cornerstone in contemporary generative modeling. In this work, we develop non-asymptotic convergence theory for a popular diffusion-based sampler (i.e., the probability flow ODE sampler) in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functio…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: This manuscript presents improved theory for probability flow ODEs compared to its earlier version arXiv:2306.09251

  20. arXiv:2408.02279  [pdf, other]

    cs.LG cs.AI stat.ML

    DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

    Authors: Ruixin Ding, Yuqi Chen, Yu-Ting Lan, Wei Zhang

    Abstract: Long-term time series forecasting (LTSF) has been widely applied in finance, traffic prediction, and other domains. Recently, patch-based transformers have emerged as a promising approach, segmenting data into sub-level patches that serve as input tokens. However, existing methods mostly rely on predetermined patch lengths, necessitating expert knowledge and posing challenges in capturing diverse…

    Submitted 5 August, 2024; originally announced August 2024.

    ACM Class: I.2.6

  21. arXiv:2408.00139  [pdf, other]

    cs.SI physics.soc-ph stat.AP

    Multiway Alignment of Political Attitudes

    Authors: Letizia Iannucci, Ali Faqeeh, Ali Salloum, Ted Hsuan Yun Chen, Mikko Kivelä

    Abstract: The related concepts of partisan belief systems, issue alignment, and partisan sorting are central to our understanding of politics. These phenomena have been studied using measures of alignment between pairs of topics, or how much individuals' attitudes toward a topic reveal about their attitudes toward another topic. We introduce a higher-order measure that extends the assessment of alignment be…

    Submitted 31 July, 2024; originally announced August 2024.

  22. arXiv:2407.16936  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.CO

    Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

    Authors: Wei Guo, Molei Tao, Yongxin Chen

    Abstract: We address the outstanding problem of sampling from an unnormalized density that may be non-log-concave and multimodal. To enhance the performance of simple Markov chain Monte Carlo (MCMC) methods, techniques of annealing type have been widely used. However, quantitative theoretical guarantees of these techniques are under-explored. This study takes a first step toward providing a non-asymptotic a…

    Submitted 23 July, 2024; originally announced July 2024.

  23. arXiv:2407.12132  [pdf, other]

    astro-ph.IM stat.ME

    Maximum-likelihood regression with systematic errors for astronomy and the physical sciences: I. Methodology and goodness-of-fit statistic of Poisson data

    Authors: Max Bonamente, Yang Chen, Dale Zimmerman

    Abstract: The paper presents a new statistical method that enables the use of systematic errors in the maximum-likelihood regression of integer-count Poisson data to a parametric model. The method is primarily aimed at the characterization of the goodness-of-fit statistic in the presence of the over-dispersion that is induced by sources of systematic error, and is based on a quasi-maximum-likelihood method…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ApJ accepted

  24. arXiv:2407.11887  [pdf, other]

    math.ST stat.AP stat.ME

    On the optimal prediction of extreme events in heavy-tailed time series with applications to solar flare forecasting

    Authors: Victor Verma, Stilian Stoev, Yang Chen

    Abstract: The prediction of extreme events in time series is a fundamental problem arising in many financial, scientific, engineering, and other applications. We begin by establishing a general Neyman-Pearson-type characterization of optimal extreme event predictors in terms of density ratios. This yields new insights and several closed-form optimal extreme event predictors for additive models. These result…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 57 pages, 5 figures

    MSC Class: 62G32 (Primary) 62G20; 62M10; 62M20 (Secondary)

  25. arXiv:2407.04970  [pdf, other]

    cs.LG stat.ML

    Idiographic Personality Gaussian Process for Psychological Assessment

    Authors: Yehu Chen, Muchen Xi, Jacob Montgomery, Joshua Jackson, Roman Garnett

    Abstract: We develop a novel measurement framework based on a Gaussian process coregionalization model to address a long-lasting debate in psychometrics: whether psychological features like personality share a common structure across the population, vary uniquely for individuals, or some combination. We propose the idiographic personality Gaussian process (IPGP) framework, an intermediate model that accommo…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  26. arXiv:2406.17827  [pdf, other]

    stat.ME

    Practical identifiability and parameter estimation of compartmental epidemiological models

    Authors: Q. Y. Chen, Z. Rapti, Y. Drossinos, J. Cuevas-Maraver, G. A. Kevrekidis, P. G. Kevrekidis

    Abstract: Practical parameter identifiability in ODE-based epidemiological models is a known issue, yet one that merits further study. It is essentially ubiquitous due to noise and errors in real data. In this study, to avoid uncertainty stemming from data of unknown quality, simulated data with added noise are used to investigate practical identifiability in two distinct epidemiological models. Particular…

    Submitted 25 June, 2024; originally announced June 2024.

  27. arXiv:2406.12525  [pdf, other]

    cs.SI physics.soc-ph stat.AP

    Anatomy of Elite and Mass Polarization in Social Networks

    Authors: Ali Salloum, Ted Hsuan Yun Chen, Mikko Kivelä

    Abstract: Existing methods for quantifying polarization in social networks typically report a single value describing the amount of polarization in a social system. While this approach can be used to confirm the observation that many societies have witnessed an increase in political polarization in recent years, it misses the complexities that could be used to understand the reasons behind this phenomenon.…

    Submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.09311  [pdf, other]

    stat.CO

    Learning High-dimensional Latent Variable Models via Doubly Stochastic Optimisation by Unadjusted Langevin

    Authors: Motonori Oka, Yunxiao Chen, Irini Moustaki

    Abstract: Latent variable models are widely used in social and behavioural sciences, such as education, psychology, and political science. In recent years, high-dimensional latent variable models have become increasingly common for analysing large and complex data. Estimating high-dimensional latent variable models using marginal maximum likelihood is computationally demanding due to the complexity of integ…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  29. arXiv:2406.08748  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method

    Authors: Qinghua Tao, Francesco Tonin, Alex Lambert, Yingyi Chen, Panagiotis Patrinos, Johan A. K. Suykens

    Abstract: In contrast with Mercer kernel-based approaches as used e.g., in Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation to KSVD cannot work with infinite-dimensional feature mappings, the variati…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 19 pages, 9 tables, 6 figures

    Journal ref: the 41st International Conference on Machine Learning (ICML), 2024

  30. arXiv:2406.07955  [pdf, other]

    cs.LG stat.ML

    How Interpretable Are Interpretable Graph Neural Networks?

    Authors: Yongqiang Chen, Yatao Bian, Bo Han, James Cheng

    Abstract: Interpretable graph neural networks (XGNNs) are widely adopted in various scientific applications involving graph-structured data. Existing XGNNs predominantly adopt the attention-based mechanism to learn edge or node importance for extracting and making predictions with the interpretable subgraph. However, the representational properties and limitations of these methods remain inadequately explo…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: ICML2024, 44 pages, 21 figures, 12 tables

  31. arXiv:2406.07651  [pdf, ps, other]

    stat.ME stat.CO

    surveygenmod2: A SAS macro for estimating complex survey adjusted generalized linear models and Wald-type tests

    Authors: R. Noah Padgett, Ying Chen

    Abstract: surveygenmod2 builds on the macro written by da Silva (2017) for generalized linear models under complex survey designs. The updated macro fixed several minor bugs we encountered while updating the macro for use in SAS®. We added additional features for conducting basic Wald-type tests on groups of parameters based on the estimated regression coefficients and parameter variance-covar…

    Submitted 11 June, 2024; originally announced June 2024.

  32. arXiv:2406.04743  [pdf, other]

    cs.LG cs.CR cs.DC stat.AP

    When Swarm Learning meets energy series data: A decentralized collaborative learning design based on blockchain

    Authors: Lei Xu, Yulong Chen, Yuntian Chen, Longfeng Nie, Xuetao Wei, Liang Xue, Dongxiao Zhang

    Abstract: Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.04575  [pdf, other]

    cs.LG cs.AI stat.AP stat.ML

    Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning

    Authors: Zhongzheng Wang, Yuntian Chen, Guodong Chen, Dongxiao Zhang

    Abstract: Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation…

    Submitted 6 June, 2024; originally announced June 2024.

  34. arXiv:2406.03849  [pdf]

    cs.LG stat.AP stat.ML

    A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM

    Authors: Yongan Zhang, Junfeng Zhao, Jian Li, Xuanran Wang, Youzhuang Sun, Yuntian Chen, Dongxiao Zhang

    Abstract: The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole…

    Submitted 6 June, 2024; originally announced June 2024.

  35. arXiv:2406.03808  [pdf]

    cs.LG cs.AI stat.AP

    Cross-variable Linear Integrated ENhanced Transformer for Photovoltaic power forecasting

    Authors: Jiaxin Gao, Qinglong Cao, Yuntian Chen, Dongxiao Zhang

    Abstract: Photovoltaic (PV) power forecasting plays a crucial role in optimizing the operation and planning of PV systems, thereby enabling efficient energy management and grid integration. However, uncertainties caused by fluctuating weather conditions and complex interactions between different variables pose significant challenges to accurate PV power forecasting. In this study, we propose PV-Client (Cro…

    Submitted 6 June, 2024; originally announced June 2024.

  36. arXiv:2406.03171  [pdf, other]

    stat.ML cs.LG

    High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

    Authors: Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher

    Abstract: This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For bias, we analyze the regularization of…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  37. arXiv:2406.00695  [pdf, other]

    physics.flu-dyn cs.LG cs.SC stat.AP

    Discovering an interpretable mathematical expression for a full wind-turbine wake with artificial intelligence enhanced symbolic regression

    Authors: Ding Wang, Yuntian Chen, Shiyi Chen

    Abstract: The rapid expansion of wind power worldwide underscores the critical significance of engineering-focused analytical wake models in both the design and operation of wind farms. These theoretically-derived analytical wake models have limited predictive capabilities, particularly in the near-wake region close to the turbine rotor, due to assumptions that do not hold. Knowledge discovery methods can…

    Submitted 2 June, 2024; originally announced June 2024.

  38. arXiv:2406.00322  [pdf, other]

    stat.ME stat.AP

    Adaptive Penalized Likelihood method for Markov Chains

    Authors: Yining Zhou, Ming Gao, Yiting Chen, Xiaoping Shi

    Abstract: Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov…

    Submitted 1 June, 2024; originally announced June 2024.

  39. arXiv:2405.19803  [pdf, other]

    stat.ME math.ST

    Dynamic Factor Analysis of High-dimensional Recurrent Events

    Authors: Fangyi Chen, Yunxiao Chen, Zhiliang Ying, Kangjie Zhou

    Abstract: Recurrent event time data arise in many studies, including biomedicine, public health, marketing, and social media analysis. High-dimensional recurrent event data involving large numbers of event types and observations become prevalent with the advances in information technology. This paper proposes a semiparametric dynamic factor model for the dimension reduction and prediction of high-dimensiona…

    Submitted 30 May, 2024; originally announced May 2024.

  40. arXiv:2405.19637  [pdf, other]

    stat.ME math.ST

    Inference in semiparametric formation models for directed networks

    Authors: Lianqiang Qu, Lu Chen, Ting Yan, Yuguo Chen

    Abstract: We propose a semiparametric model for dyadic link formations in directed networks. The model contains a set of degree parameters that measure different effects of popularity or outgoingness across nodes, a regression parameter vector that reflects the homophily effect resulting from the nodal attributes or pairwise covariates associated with edges, and a set of latent random noises with unknown di…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 28 pages, 3 figures

  41. arXiv:2405.19559  [pdf, ps, other]

    cs.LG stat.ML

    Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm

    Authors: Mohamed Seif, Yanxi Chen

    Abstract: In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined…

    Submitted 29 May, 2024; originally announced May 2024.

  42. arXiv:2405.18782  [pdf, other]

    eess.IV cs.CV stat.ML

    Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

    Authors: Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, Katherine L. Bouman

    Abstract: Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior de…

    Submitted 29 May, 2024; originally announced May 2024.

  43. arXiv:2405.17862  [pdf, other]

    cs.LG stat.ML

    Towards robust prediction of material properties for nuclear reactor design under scarce data -- a study in creep rupture property

    Authors: Yu Chen, Edoardo Patelli, Zhen Yang, Adolphus Lye

    Abstract: Advances in Deep Learning bring further investigation into credibility and robustness, especially for safety-critical engineering applications such as the nuclear industry. The key challenges include the availability of data set (often scarce and sparse) and insufficient consideration of the uncertainty in the data, model, and prediction. This paper therefore presents a meta-learning based approac…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 8 pages, submitted to REC 2024 (International Workshop on Reliable Engineering Computing)

  44. arXiv:2405.17401  [pdf, other]

    cs.LG cs.CV stat.ML

    RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

    Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

    Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of styl…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  45. arXiv:2405.16732  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

    Authors: Dongyan Huo, Yixuan Zhang, Yudong Chen, Qiaomin Xie

    Abstract: In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $α>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two stru…

    Submitted 26 May, 2024; originally announced May 2024.

  46. arXiv:2405.15053  [pdf, other]

    stat.ME

    A Latent Variable Approach to Learning High-dimensional Multivariate longitudinal Data

    Authors: Sze Ming Lee, Yunxiao Chen, Tony Sit

    Abstract: High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in social, behavioral and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on high-dimensional multivariate longitudinal data. This model introduce…

    Submitted 23 May, 2024; originally announced May 2024.

  47. arXiv:2405.13535  [pdf, other]

    cs.LG stat.ML

    Generalized Laplace Approximation

    Authors: Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim

    Abstract: In recent years, the inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective solution to this issue. However, understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  48. arXiv:2405.13149  [pdf, other]

    stat.ML cs.LG math.NA math.PR stat.CO

    Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation

    Authors: Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

    Abstract: The article presents a systematic study of the problem of conditioning a Gaussian random variable $ξ$ on nonlinear observations of the form $F \circ φ(ξ)$ where $φ: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the condition…

    Submitted 21 May, 2024; originally announced May 2024.

  49. arXiv:2405.12343  [pdf, other]

    math.ST stat.ME

    Determine the Number of States in Hidden Markov Models via Marginal Likelihood

    Authors: Yang Chen, Cheng-Der Fuh, Chu-Lan Michael Kao

    Abstract: Hidden Markov models (HMM) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain and the observations are noisy realizations of the underlying process. Determining the number of hidden states for an HMM is a model selection problem, which is yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covar…

    Submitted 17 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  50. arXiv:2405.12331  [pdf, other]

    stat.AP astro-ph.IM astro-ph.SR

    Solar Imaging Data Analytics: A Selective Overview of Challenges and Opportunities

    Authors: Yang Chen, Ward Manchester, Meng Jin, Alexei Pevtsov

    Abstract: We give a gentle introduction to solar imaging data, focusing on the challenges and opportunities of data-driven approaches for solar eruptions. The various solar phenomenon prediction problems that might benefit from statistical methods are presented. Available data products and software are described. State-of-the-art solar eruption forecasting models with data-driven approaches are summarized a…

    Submitted 2 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.