Showing 1–50 of 67 results for author: Chang, H

Searching in archive stat.
  1. arXiv:2407.18389  [pdf, other]

    stat.ME stat.AP

    Doubly Robust Targeted Estimation of Conditional Average Treatment Effects for Time-to-event Outcomes with Competing Risks

    Authors: Runjia Li, Victor B. Talisa, Chung-Chou H. Chang

    Abstract: In recent years, precision treatment strategies have gained significant attention in medical research, particularly for patient care. We propose a novel framework for estimating conditional average treatment effects (CATE) in time-to-event data with competing risks, using ICU patients with sepsis as an illustrative example. Our approach, based on cumulative incidence functions and targeted maximum l…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 42 pages, 8 figures

  2. arXiv:2406.05944  [pdf, other]

    stat.ME math.ST

    Embedding Network Autoregression for time series analysis and causal peer effect inference

    Authors: Jae Ho Chang, Subhadeep Paul

    Abstract: We propose an Embedding Network Autoregressive Model (ENAR) for multivariate networked longitudinal data. We assume the network is generated from a latent variable model, and these unobserved variables are included in a structural peer effect model or a time series network autoregressive model as additive effects. This approach takes a unified view of two related problems, (1) modeling and predict…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2404.18786  [pdf, ps, other]

    math.ST stat.ME

    Randomization-based confidence intervals for the local average treatment effect

    Authors: P. M. Aronow, Haoge Chang, Patrick Lopatto

    Abstract: We consider the problem of generating confidence intervals in randomized experiments with noncompliance. We show that a refinement of a randomization-based procedure proposed by Imbens and Rosenbaum (2005) has desirable properties. Namely, we show that using a studentized Anderson-Rubin-type statistic as a test statistic yields confidence intervals that are finite-sample exact under treatment effe…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 40 pages

  4. arXiv:2404.03867  [pdf, other]

    stat.CO math.PR stat.ML

    Dimension-free Relaxation Times of Informed MCMC Samplers on Discrete Spaces

    Authors: Hyunwoong Chang, Quan Zhou

    Abstract: Convergence analysis of Markov chain Monte Carlo methods in high-dimensional statistical applications is increasingly recognized. In this paper, we develop general mixing time bounds for Metropolis-Hastings algorithms on discrete spaces by building upon and refining some recent theoretical advancements in Bayesian model selection problems. We establish sufficient conditions for a class of informed…

    Submitted 4 April, 2024; originally announced April 2024.

    MSC Class: 60J10; 60J20; 82M31; 62F15

  5. arXiv:2403.11163  [pdf, ps, other]

    stat.ME cs.LG math.ST stat.CO

    A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

    Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

    Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first clas…

    Submitted 17 March, 2024; originally announced March 2024.

  6. arXiv:2311.12016  [pdf, other]

    stat.ME stat.AP

    Estimating Heterogeneous Exposure Effects in the Case-Crossover Design using BART

    Authors: Jacob Englert, Stefanie Ebelt, Howard Chang

    Abstract: Epidemiological approaches for examining human health responses to environmental exposures in observational studies often control for confounding by implementing clever matching schemes and using statistical methods based on conditional likelihood. Nonparametric regression models have surged in popularity in recent years as a tool for estimating individual-level heterogeneous effects, which provid…

    Submitted 9 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 29 pages, 4 figures

  7. arXiv:2310.14893  [pdf, other]

    cs.LG eess.SY stat.AP

    Data Drift Monitoring for Log Anomaly Detection Pipelines

    Authors: Dipak Wani, Samuel Ackerman, Eitan Farchi, Xiaotong Liu, Hau-wen Chang, Sarasi Lalithsena

    Abstract: Logs enable the monitoring of infrastructure status and the performance of associated applications. Logs are also invaluable for diagnosing the root causes of any problems that may arise. Log Anomaly Detection (LAD) pipelines automate the detection of anomalies in logs, providing assistance to site reliability engineers (SREs) in system diagnosis. Log patterns change over time, necessitating updat…

    Submitted 17 October, 2023; originally announced October 2023.

  8. arXiv:2309.02160  [pdf, other]

    cs.LG cs.CY stat.ML

    Bias Propagation in Federated Learning

    Authors: Hongyan Chang, Reza Shokri

    Abstract: We show that participating in federated learning can be detrimental to group fairness. In fact, the bias of a few parties against under-represented groups (identified by sensitive attributes such as gender or race) can propagate through the network to all the parties in the network. We analyze and explain bias propagation in federated learning on naturally partitioned real-world datasets. Our anal…

    Submitted 5 September, 2023; originally announced September 2023.

    Journal ref: The Eleventh International Conference on Learning Representations, 2023

  9. arXiv:2305.09906  [pdf, ps, other]

    stat.ME

    Fast computation of exact confidence intervals for randomized experiments with binary outcomes

    Authors: P. M. Aronow, Haoge Chang, Patrick Lopatto

    Abstract: Given a randomized experiment with binary outcomes, exact confidence intervals for the average causal effect of the treatment can be computed through a series of permutation tests. This approach requires minimal assumptions and is valid for all sample sizes, as it does not rely on large-sample approximations such as the central limit theorem. We show that these confidence intervals can be found in…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 37 pages

  10. arXiv:2302.02110  [pdf, other]

    stat.AP

    A Scalar-on-Quantile-Function Approach for Estimating Short-term Health Effects of Environmental Exposures

    Authors: Yuzi Zhang, Howard H. Chang, Joshua L. Warren, Stefanie T. Ebelt

    Abstract: Environmental epidemiologic studies routinely utilize aggregate health outcomes to estimate effects of short-term (e.g., daily) exposures that are available at increasingly fine spatial resolutions. However, areal averages are typically used to derive population-level exposure, which cannot capture the spatial variation and individual heterogeneity in exposures that may occur within the spatial an…

    Submitted 4 February, 2023; originally announced February 2023.

  11. arXiv:2301.12396  [pdf, other]

    stat.ME

    Sensitivity Analysis of Causal Treatment Effect Estimation for Clustered Observational Data with Unmeasured Confounding

    Authors: Yang Ou, Lu Tang, Chung-Chou H. Chang

    Abstract: Identifying causal treatment (or exposure) effects in observational studies requires the data to satisfy the unconfoundedness assumption, which is not testable using the observed data. With sensitivity analysis, one can determine how the conclusions might change if assumptions are violated to a certain degree. In this paper, we propose a new technique for sensitivity analysis applicable to clusters…

    Submitted 29 January, 2023; originally announced January 2023.

  12. arXiv:2208.08472  [pdf, other]

    stat.ME

    Bayesian response adaptive randomization design with a composite endpoint of mortality and morbidity

    Authors: Zhongying Xu, Andriy I. Bandos, Tianzhou Ma, Lu Tang, Victor B. Talisa, Chung-Chou H. Chang

    Abstract: Allocating patients to treatment arms during a trial based on the observed responses accumulated prior to the decision point, and sequential adaptation of this allocation, could minimize the expected number of failures or maximize total benefit to patients. In this study, we developed a Bayesian response adaptive randomization (RAR) design targeting the endpoint of organ support-free days (OSFD)…

    Submitted 31 August, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  13. arXiv:2207.00689  [pdf, other]

    stat.ME stat.CO

    Rapidly Mixing Multiple-try Metropolis Algorithms for Model Selection Problems

    Authors: Hyunwoong Chang, Changwoo J. Lee, Zhao Tang Luo, Huiyan Sang, Quan Zhou

    Abstract: The multiple-try Metropolis (MTM) algorithm is an extension of the Metropolis-Hastings (MH) algorithm by selecting the proposed state among multiple trials according to some weight function. Although MTM has gained great popularity owing to its faster empirical convergence and mixing than the standard MH algorithm, its theoretical mixing property is rarely studied in the literature due to its comp…

    Submitted 14 October, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted to Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  14. arXiv:2203.16627  [pdf, other]

    stat.ME

    A Bayesian framework for incorporating exposure uncertainty into health analyses with application to air pollution and stillbirth

    Authors: Saskia Comess, Howard H. Chang, Joshua L. Warren

    Abstract: Studies of the relationships between environmental exposures and adverse health outcomes often rely on a two-stage statistical modeling approach, where exposure is modeled/predicted in the first stage and used as input to a separately fit health outcome analysis in the second stage. Uncertainty in these predictions is frequently ignored, or accounted for in an overly simplistic manner, when estima…

    Submitted 30 March, 2022; originally announced March 2022.

  15. arXiv:2202.05150  [pdf, other]

    stat.CO stat.ME

    Order-based Structure Learning without Score Equivalence

    Authors: Hyunwoong Chang, James Cai, Quan Zhou

    Abstract: We propose an empirical Bayes formulation of the structure learning problem, where the prior specification assumes that all node variables have the same error variance, an assumption known to ensure the identifiability of the underlying causal directed acyclic graph (DAG). To facilitate efficient posterior computation, we approximate the posterior probability of each ordering by that of a best DAG…

    Submitted 16 August, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

  16. arXiv:2201.00698  [pdf]

    cs.LG cs.AI stat.ML

    Deep-learning-based upscaling method for geologic models via theory-guided convolutional neural network

    Authors: Nanzhe Wang, Qinzhuo Liao, Haibin Chang, Dongxiao Zhang

    Abstract: Large-scale or high-resolution geologic models usually comprise a huge number of grid blocks, which can be computationally demanding and time-consuming to solve with numerical simulators. Therefore, it is advantageous to upscale geologic models (e.g., hydraulic conductivity) from fine-scale (high-resolution grids) to coarse-scale systems. Numerical upscaling methods have been proven to be effectiv…

    Submitted 31 December, 2021; originally announced January 2022.

    Comments: 37 pages, 21 pages

  17. arXiv:2110.08425  [pdf, other]

    stat.ME econ.EM

    Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials

    Authors: Haoge Chang, Joel Middleton, P. M. Aronow

    Abstract: In an influential critique of empirical practice, Freedman (2008) showed that the linear regression estimator was biased for the analysis of randomized controlled trials under the randomization model. Under Freedman's assumptions, we derive exact closed-form bias corrections for the linear regression estimator with and without treatment-by-covariate interactions. We show that the limiting distribu…

    Submitted 25 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

  18. arXiv:2106.06526  [pdf, other]

    cs.LG cs.AI stat.ML

    Online Continual Adaptation with Active Self-Training

    Authors: Shiji Zhou, Han Zhao, Shanghang Zhang, Lianzhe Wang, Heng Chang, Zhi Wang, Wenwu Zhu

    Abstract: Models trained with offline data often suffer from continual distribution shifts and expensive labeling in changing environments. This calls for a new online learning paradigm where the learner can continually adapt to changing environments with limited labels. In this paper, we propose a new online setting -- Online Active Continual Adaptation, where the learner aims to continually adapt to chang…

    Submitted 20 March, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

  19. arXiv:2104.09730  [pdf, other]

    stat.ME stat.AP

    Critical Window Variable Selection for Mixtures: Estimating the Impact of Multiple Air Pollutants on Stillbirth

    Authors: Joshua L. Warren, Howard H. Chang, Lauren K. Warren, Matthew J. Strickland, Lyndsey A. Darrow, James A. Mulholland

    Abstract: Understanding the role of time-varying pollution mixtures on human health is critical as people are simultaneously exposed to multiple pollutants during their lives. For vulnerable sub-populations who have well-defined exposure periods (e.g., pregnant women), questions regarding critical windows of exposure to these mixtures are important for mitigating harm. We extend Critical Window Variable Sel…

    Submitted 19 April, 2021; originally announced April 2021.

  20. arXiv:2103.06261  [pdf, other]

    stat.ML cs.LG stat.ME

    A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

    Authors: Xiaoqing Tan, Chung-Chou H. Chang, Ling Zhou, Lu Tang

    Abstract: Accurately estimating personalized treatment effects within a study site (e.g., a hospital) has been challenging due to limited sample size. Furthermore, privacy considerations and lack of resources prevent a site from leveraging subject-level data from other sites. We propose a tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) a…

    Submitted 15 June, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted at ICML 2022. Previously titled "A Tree-based Federated Learning Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources"

  21. arXiv:2101.04084  [pdf, other]

    math.ST stat.CO

    Complexity analysis of Bayesian learning of high-dimensional DAG models and their equivalence classes

    Authors: Quan Zhou, Hyunwoong Chang

    Abstract: Structure learning via MCMC sampling is known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. Theoretical results on the mixing behavior are lacking. In this work, we prove the rapid mixing of a random walk Metropolis-Hastings algorithm, which reveals that the complexity of Bayesian learning of sparse equivalence classes grows only polynomia…

    Submitted 5 April, 2023; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: only minor changes

    MSC Class: 62F15; 62J05

  22. Theory-guided Auto-Encoder for Surrogate Construction and Inverse Modeling

    Authors: Nanzhe Wang, Haibin Chang, Dongxiao Zhang

    Abstract: A Theory-guided Auto-Encoder (TgAE) framework is proposed for surrogate construction and is further used for uncertainty quantification and inverse modeling tasks. The framework is built based on the Auto-Encoder (or Encoder-Decoder) architecture of convolutional neural network (CNN) via a theory-guided training process. In order to achieve the theory-guided training, the governing equations of th…

    Submitted 17 November, 2020; originally announced November 2020.

    Journal ref: Comput. Methods Appl. Mech. Engrg., 385 (2021), 114037

  23. arXiv:2011.03731  [pdf, other]

    stat.ML cs.CR cs.CY cs.LG

    On the Privacy Risks of Algorithmic Fairness

    Authors: Hongyan Chang, Reza Shokri

    Abstract: Algorithmic fairness and privacy are essential pillars of trustworthy machine learning. Fair machine learning aims at minimizing discrimination against protected groups by, for example, imposing a constraint on models to equalize their behavior across different groups. This can subsequently change the influence of training data points on the fair model, in a disproportionate way. We study how this…

    Submitted 7 April, 2021; v1 submitted 7 November, 2020; originally announced November 2020.

  24. arXiv:2010.13599  [pdf, other]

    stat.ME math.ST stat.AP

    Design-Based Inference for Spatial Experiments under Unknown Interference

    Authors: Ye Wang, Cyrus Samii, Haoge Chang, P. M. Aronow

    Abstract: We consider design-based causal inference for spatial experiments in which treatments may have effects that bleed out and feed back in complex ways. Such spatial spillover effects violate the standard "no interference" assumption for standard causal inference methods. The complexity of spatial spillover effects also raises the risk of misspecification and bias in model-based analyses. We offer a…

    Submitted 2 August, 2024; v1 submitted 26 October, 2020; originally announced October 2020.

  25. arXiv:2009.06211  [pdf, other]

    cs.LG stat.ML

    Implicit Graph Neural Networks

    Authors: Fangda Gu, Heng Chang, Wenwu Zhu, Somayeh Sojoudi, Laurent El Ghaoui

    Abstract: Graph Neural Networks (GNNs) are widely used deep learning models that learn meaningful representations from graph-structured data. Due to the finite nature of the underlying recurrent structure, current GNN methods may struggle to capture long-range dependencies in underlying graphs. To overcome this difficulty, we propose a graph learning framework, called Implicit Graph Neural Networks (IGNN),…

    Submitted 1 June, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Accepted by NeurIPS 2020 at: https://papers.nips.cc/paper/2020/hash/8b5c8441a8ff8e151b191c53c1842a38-Abstract.html

    Journal ref: Advances in Neural Information Processing Systems 33 (2020) 11984-11995

  26. arXiv:2007.15580  [pdf]

    eess.SP cs.LG math.OC physics.comp-ph stat.ML

    Deep-Learning based Inverse Modeling Approaches: A Subsurface Flow Example

    Authors: Nanzhe Wang, Haibin Chang, Dongxiao Zhang

    Abstract: Deep-learning has achieved good performance and shown great potential for solving forward and inverse problems. In this work, two categories of innovative deep-learning based inverse modeling methods are proposed and compared. The first category is deep-learning surrogate-based inversion methods, in which the Theory-guided Neural Network (TgNN) is constructed as a deep-learning surrogate for probl…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: 53 pages, 22 figures, 7 tables

    Journal ref: Journal of Geophysical Research: Solid Earth, e2020JB020549, 2020

  27. arXiv:2007.14550  [pdf, ps, other]

    math.OC cs.LG stat.ML

    An Index-based Deterministic Asymptotically Optimal Algorithm for Constrained Multi-armed Bandit Problems

    Authors: Hyeong Soo Chang

    Abstract: For the model of constrained multi-armed bandit, we show that by construction there exists an index-based deterministic asymptotically optimal algorithm. The optimality is achieved by the convergence of the probability of choosing an optimal feasible arm to one over infinite horizon. The algorithm is built upon Locatelli et al.'s "anytime parameter-free thresholding" algorithm under the assumption…

    Submitted 28 July, 2020; originally announced July 2020.

  28. arXiv:2006.10222  [pdf, other]

    cs.LG stat.ML

    Class-Attentive Diffusion Network for Semi-Supervised Classification

    Authors: Jongin Lim, Daeho Um, Hyung Jin Chang, Dae Ung Jo, Jin Young Choi

    Abstract: Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes probably of the same class am…

    Submitted 29 December, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted to AAAI 2021

  29. arXiv:2006.08669  [pdf, other]

    stat.ML cs.CR cs.CY cs.LG

    On Adversarial Bias and the Robustness of Fair Machine Learning

    Authors: Hongyan Chang, Ta Duy Nguyen, Sasi Kumar Murakonda, Ehsan Kazemi, Reza Shokri

    Abstract: Optimizing prediction accuracy can come at the expense of fairness. Towards minimizing discrimination against a group, fair machine learning algorithms strive to equalize the behavior of a model across different groups, by imposing a fairness constraint on models. However, we show that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in tr…

    Submitted 15 June, 2020; originally announced June 2020.

  30. arXiv:2006.04072  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Implications of Human Irrationality for Reinforcement Learning

    Authors: Haiyang Chen, Hyung Jin Chang, Andrew Howes

    Abstract: Recent work in the behavioural sciences has begun to overturn the long-held belief that human decision making is irrational, suboptimal and subject to biases. This turn to the rational suggests that human decision making may be a better source of ideas for constraining how machine learning problems are defined than would otherwise be the case. One promising idea concerns human decision making that…

    Submitted 7 June, 2020; originally announced June 2020.

    Comments: 12 pages, 5 figures

  31. arXiv:2004.13560  [pdf]

    eess.SP cs.LG physics.comp-ph stat.ML

    Efficient Uncertainty Quantification for Dynamic Subsurface Flow with Surrogate by Theory-guided Neural Network

    Authors: Nanzhe Wang, Haibin Chang, Dongxiao Zhang

    Abstract: Subsurface flow problems usually involve some degree of uncertainty. Consequently, uncertainty quantification is commonly necessary for subsurface flow prediction. In this work, we propose a methodology for efficient uncertainty quantification for dynamic subsurface flow with a surrogate constructed by the Theory-guided Neural Network (TgNN). The TgNN here is specially designed for problems with s…

    Submitted 25 April, 2020; originally announced April 2020.

  32. arXiv:2004.08410  [pdf, ps, other]

    cs.LG stat.ML

    Deep Reinforcement Learning for Adaptive Learning Systems

    Authors: Xiao Li, Hanchen Xu, Jinming Zhang, Hua-hua Chang

    Abstract: In this paper, we formulate the adaptive learning problem---the problem of how to find an individualized learning plan (called policy) that chooses the most appropriate learning materials based on learner's latent traits---faced in adaptive learning systems as a Markov decision process (MDP). We assume latent traits to be continuous with an unknown transition model. We apply a model-free deep rein…

    Submitted 17 April, 2020; originally announced April 2020.

  33. arXiv:2004.08032  [pdf, other]

    stat.ME stat.AP

    A non-convex regularization approach for stable estimation of loss development factors

    Authors: Himchan Jeong, Hyunwoong Chang, Emiliano A. Valdez

    Abstract: In this article, we apply non-convex regularization methods in order to obtain stable estimation of loss development factors in insurance claims reserving. Among the non-convex regularization methods, we focus on the use of the log-adjusted absolute deviation (LAAD) penalty and provide discussion on optimization of LAAD penalized regression model, which we prove to converge with a coordinate desce…

    Submitted 6 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: 23 pages, 11 Tables, 6 Figures

    MSC Class: 62P05

  34. arXiv:2003.11249  [pdf, other]

    cs.LG cs.CV stat.ML

    VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning

    Authors: Jongwon Choi, Kwang Moo Yi, Jihoon Kim, Jinho Choo, Byoungjip Kim, Jin-Yeop Chang, Youngjune Gwon, Hyung Jin Chang

    Abstract: Active Learning for discriminative models has largely been studied with the focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that this is harmful. We propose a method based on the Bayes' rule, that can naturally incorporate class imbalance into the Active Learning framework. We derive that three terms shoul…

    Submitted 3 December, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

  35. arXiv:2003.07450  [pdf, other]

    cs.LG cs.SI stat.ML

    Spectral Graph Attention Network with Fast Eigen-approximation

    Authors: Heng Chang, Yu Rong, Tingyang Xu, Wenbing Huang, Somayeh Sojoudi, Junzhou Huang, Wenwu Zhu

    Abstract: Variants of Graph Neural Networks (GNNs) for representation learning have been proposed recently and achieved fruitful results in various fields. Among them, Graph Attention Network (GAT) first employs a self-attention strategy to learn attention weights for each edge in the spatial domain. However, learning the attentions over edges can only focus on the local information of graphs and greatly in…

    Submitted 27 July, 2021; v1 submitted 16 March, 2020; originally announced March 2020.

    Comments: Accepted by Deep Learning on Graphs: Method and Applications (DLG-KDD21)

  36. arXiv:1912.11279  [pdf, ps, other]

    stat.ML cs.CR cs.LG

    Cronus: Robust and Heterogeneous Collaborative Learning with Black-Box Knowledge Transfer

    Authors: Hongyan Chang, Virat Shejwalkar, Reza Shokri, Amir Houmansadr

    Abstract: Collaborative (federated) learning enables multiple parties to train a model without sharing their private data, but through repeated sharing of the parameters of their local models. Despite its advantages, this approach has many known privacy and security weaknesses and performance overhead, in addition to being limited only to models with homogeneous architectures. Shared parameters leak a signi…

    Submitted 24 December, 2019; originally announced December 2019.

  37. arXiv:1911.07335  [pdf, other]

    cs.CL cs.LG stat.ML

    Using Error Decay Prediction to Overcome Practical Issues of Deep Active Learning for Named Entity Recognition

    Authors: Haw-Shiuan Chang, Shankar Vembu, Sunil Mohan, Rheeya Uppaal, Andrew McCallum

    Abstract: Existing deep active learning algorithms achieve impressive sampling efficiency on natural language processing tasks. However, they exhibit several weaknesses in practice, including (a) inability to use uncertainty sampling with black-box models, (b) lack of robustness to labeling noise, and (c) lack of transparency. In response, we propose a transparent batch active sampling framework by estimati…

    Submitted 20 July, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

    Comments: This is a pre-print of an article published in Springer Machine Learning journal. The final authenticated version is available online at: https://doi.org/10.1007/s10994-020-05897-1

  38. Deep Learning of Subsurface Flow via Theory-guided Neural Network

    Authors: Nanzhe Wang, Dongxiao Zhang, Haibin Chang, Heng Li

    Abstract: Active research is currently being performed to incorporate the wealth of scientific knowledge into data-driven approaches (e.g., neural networks) in order to improve the latter's effectiveness. In this study, the Theory-guided Neural Network (TgNN) is proposed for deep learning of subsurface flow. In the TgNN, as supervised learning, the neural network is trained with available observations or…

    Submitted 24 October, 2019; originally announced November 2019.

    Journal ref: Journal of Hydrology, 2020, 584, 124700

  39. arXiv:1909.03816  [pdf, other]

    stat.AP stat.ME

    Multivariate spectral downscaling for PM2.5 species

    Authors: Yawen Guan, Brian J Reich, James A Mulholland, Howard H Chang

    Abstract: Fine particulate matter (PM2.5) is a mixture of air pollutants that has adverse effects on human health. Understanding the health effects of PM2.5 mixture and its individual species has been a research priority over the past two decades. However, the limited availability of speciated PM2.5 measurements continues to be a major challenge in exposure assessment for conducting large-scale population-b…

    Submitted 5 September, 2019; originally announced September 2019.

  40. arXiv:1908.04463  [pdf]

    stat.ML cs.LG physics.comp-ph

    DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data

    Authors: Hao Xu, Haibin Chang, Dongxiao Zhang

    Abstract: In recent years, data-driven methods have been developed to learn dynamical systems and partial differential equations (PDE). The goal of such work is discovering unknown physics and the corresponding equations. However, prior to achieving this goal, major challenges remain to be resolved, including learning PDE under noisy data and limited discrete data. To overcome these challenges, in this work…

    Submitted 6 April, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

    Journal ref: Communications in Computational Physics. 2021, 29, 698-728

  41. arXiv:1908.02441  [pdf, other]

    cs.LG cs.CV stat.ML

    Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning

    Authors: Jiwoong Park, Minsik Lee, Hyung Jin Chang, Kyuewang Lee, Jin Young Choi

    Abstract: We propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a graph. In contrast to the existing graph autoencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. For the reconstruction of node features, the decoder is designed based on Laplacian sharpe…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: 10 pages, 3 figures, ICCV 2019 accepted

  42. arXiv:1908.01297  [pdf, other]

    cs.SI cs.CR cs.LG stat.ML

    A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models

    Authors: Heng Chang, Yu Rong, Tingyang Xu, Wenbing Huang, Honglei Zhang, Peng Cui, Wenwu Zhu, Junzhou Huang

    Abstract: With the great success of graph embedding models in both academic and industrial areas, the robustness of graph embedding against adversarial attack inevitably becomes a central problem in graph learning domain. Despite the fruitful progress, most of the current works perform the attack in a white-box fashion: they need to access the model predictions and labels to construct their adversarial lo…

    Submitted 17 December, 2019; v1 submitted 4 August, 2019; originally announced August 2019.

    Comments: Accepted at AAAI 2020

  43. arXiv:1908.01113  [pdf]

    stat.ML cs.LG stat.ME

    Ensemble Neural Networks (ENN): A gradient-free stochastic method

    Authors: Yuntian Chen, Haibin Chang, Meng Jin, Dongxiao Zhang

    Abstract: In this study, an efficient stochastic gradient-free method, the ensemble neural networks (ENN), is developed. In the ENN, the optimization process relies on covariance matrices rather than derivatives. The covariance matrices are calculated by the ensemble randomized maximum likelihood algorithm (EnRML), which is an inverse modeling method. The ENN is able to simultaneously provide estimations an…

    Submitted 2 August, 2019; originally announced August 2019.

    Journal ref: Neural Networks, 110, 170-185 (2019)

  44. arXiv:1907.08040  [pdf, other]

    cs.LG cs.NE stat.ML

    Convolutional Reservoir Computing for World Models

    Authors: Hanten Chang, Katsuya Futagami

    Abstract: Recently, reinforcement learning models have achieved great success, completing complex tasks such as mastering Go and other games with higher scores than human players. Many of these models collect considerable data on the tasks and improve accuracy by extracting visual and time-series features using convolutional neural networks (CNNs) and recurrent neural networks, respectively. However, these…

    Submitted 18 July, 2019; originally announced July 2019.

  45. arXiv:1905.10029  [pdf, other]

    cs.LG cs.CR stat.ML

    Power up! Robust Graph Convolutional Network via Graph Powering

    Authors: Ming Jin, Heng Chang, Wenwu Zhu, Somayeh Sojoudi

    Abstract: Graph convolutional networks (GCNs) are powerful tools for graph-structured data. However, they have been recently shown to be vulnerable to topological attacks. To enhance adversarial robustness, we go beyond spectral graph theory to robust graph theory. By challenging the classical graph Laplacian, we propose a new convolution operator that is provably robust in the spectral domain and is incorp…

    Submitted 21 September, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Accepted at AAAI 2021: https://ojs.aaai.org/index.php/AAAI/article/view/16976

    Journal ref: In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 9, pp. 8004-8012, 2021)

  46. arXiv:1905.03680  [pdf, other]

    stat.ML cs.LG stat.AP

    A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables

    Authors: Shu Wang, Jonathan G. Yabes, Chung-Chou H. Chang

    Abstract: The finite mixture model is an important branch of clustering methods and can be applied to data sets with mixed types of variables. However, challenges exist in its application. First, it typically relies on the EM algorithm, which can be sensitive to the choice of initial values. Second, biomarkers subject to limits of detection (LOD) are commonly encountered in clinical data, which brings censored…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: 34 pages, 12 tables and figures

  47. arXiv:1905.02257  [pdf, other]

    stat.ML cs.LG stat.AP

    Hybrid Density- and Partition-based Clustering Algorithm for Data with Mixed-type Variables

    Authors: Shu Wang, Jonathan G. Yabes, Chung-Chou H. Chang

    Abstract: Clustering is an essential technique for discovering patterns in data. The steady increase in the amount and complexity of data over the years has led to improvements and the development of new clustering algorithms. However, algorithms that can cluster data with mixed variable types (continuous and categorical) remain limited, despite the abundance of data with mixed types, particularly in the medical field.…

    Submitted 6 May, 2019; originally announced May 2019.

    Journal ref: Journal of Data Science 19(2021)15-36

  48. arXiv:1904.09002  [pdf, ps, other]

    stat.ME

    Landmark Proportional Subdistribution Hazards Models for Dynamic Prediction of Cumulative Incidence Functions

    Authors: Qing Liu, Gong Tang, Joseph P. Costantino, Chung-Chou H. Chang

    Abstract: An individualized risk prediction model that dynamically updates the probability of a clinical event from a specific cause is valuable for enabling physicians to optimize personalized treatment strategies in real time by incorporating all available information collected over the follow-up. However, this is more complex and challenging when competing risks are present, because it requires simulta…

    Submitted 18 April, 2019; originally announced April 2019.

  49. A comparison of statistical and machine learning methods for creating national daily maps of ambient PM$_{2.5}$ concentration

    Authors: Veronica J. Berrocal, Yawen Guan, Amanda Muyskens, Haoyu Wang, Brian J Reich, James A. Mulholland, Howard H. Chang

    Abstract: A typical problem in air pollution epidemiology is exposure assessment for individuals for whom health data are available. Due to the sparsity of monitoring sites and the limited temporal frequency with which measurements of air pollutant concentrations are collected (for most pollutants, once every 3 or 6 days), epidemiologists have been moving away from characterizing ambient air pollution exp…

    Submitted 17 April, 2019; originally announced April 2019.

  50. arXiv:1901.00032  [pdf, other]

    cond-mat.mtrl-sci cs.AI stat.ML

    Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks

    Authors: Edward Kim, Zach Jensen, Alexander van Grootel, Kevin Huang, Matthew Staib, Sheshera Mysore, Haw-Shiuan Chang, Emma Strubell, Andrew McCallum, Stefanie Jegelka, Elsa Olivetti

    Abstract: Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a…

    Submitted 17 February, 2019; v1 submitted 31 December, 2018; originally announced January 2019.

    Comments: Added new funding support to the acknowledgments section in this version