Bayesian Joint Additive Factor Models for Multiview Learning

Authors: Niccolo Anceschi, Federico Ferrari, David B. Dunson, Himel Mallick

Abstract: It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components. We ensure identifiability via a novel dependent cumulative shrinkage process (D-CUSP) prior. We provide an efficient implementation via a partially collapsed Gibbs sampler and extend our approach to allow flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.

Submitted 2 June, 2024; originally announced June 2024.

MSC Class: 62F15

arXiv:2312.13484

Bayesian Transfer Learning

Authors: Piotr M. Suder, Jason Xu, David B. Dunson

Abstract: Transfer learning is a burgeoning concept in statistical machine learning that seeks to improve inference and/or predictive accuracy on a domain of interest by leveraging data from related domains. While the term "transfer learning" has garnered much recent interest, its foundational principles have existed for years under various guises. Prior literature reviews in computer science and electrical engineering have sought to bring these ideas into focus, primarily surveying general methodologies and works from these disciplines. This article highlights Bayesian approaches to transfer learning, which have received relatively limited attention despite their innate compatibility with the notion of drawing upon prior knowledge to guide new learning tasks. Our survey encompasses a wide range of Bayesian transfer learning frameworks applicable to a variety of practical settings. We discuss how these methods address the problem of finding the optimal information to transfer between domains, which is a central question in transfer learning. We illustrate the utility of Bayesian transfer learning methods via a simulation study where we compare performance against frequentist competitors.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2311.14829

Proximal Algorithms for Accelerated Langevin Dynamics

Authors: Duy H. Thai, Alexander L. Young, David B. Dunson

Abstract: We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme. With an appropriate addition of noise, the result is a time-inhomogeneous underdamped Langevin equation, which we prove admits a specified target distribution as its invariant measure. Convergence rates to stationarity under Wasserstein-2 distance are established as well. Metropolis-adjusted and stochastic gradient versions of the proposed Langevin dynamics are also provided. Experimental illustrations show superior performance of the proposed method over typical Langevin samplers for different models in statistics and image processing, including better mixing of the resulting Markov chains.

Submitted 28 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: The technical proofs for the paper will be revised

arXiv:2304.11251

Machine Learning and the Future of Bayesian Computation

Authors: Steven Winter, Trevor Campbell, Lizhen Lin, Sanvesh Srivastava, David B. Dunson

Abstract: Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional models with many observations. In this article we discuss the potential to improve posterior computation using ideas from machine learning. Concrete future directions are explored in vignettes on normalizing flows, Bayesian coresets, distributed Bayesian inference, and variational inference.

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.10630

Ellipsoid fitting with the Cayley transform

Authors: Omar Melikechi, David B. Dunson

Abstract: We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.

Submitted 27 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2110.07478

Inferring Manifolds From Noisy Data Using Gaussian Processes

Authors: David B Dunson, Nan Wu

Abstract: In analyzing complex datasets, it is often of interest to infer lower dimensional structure underlying the higher dimensional observations. As a flexible class of nonlinear structures, it is common to focus on Riemannian manifolds. Most existing manifold learning algorithms replace the original data with lower dimensional coordinates without providing an estimate of the manifold in the observation space or using the manifold to denoise the original data. This article proposes a new methodology for addressing these problems, allowing interpolation of the estimated manifold between fitted data points. The proposed approach is motivated by novel theoretical properties of local covariance matrices constructed from noisy samples on a manifold. Our results enable us to turn a global manifold reconstruction problem into a local regression problem, allowing application of Gaussian processes for probabilistic manifold reconstruction. In addition to theory justifying the algorithm, we provide simulated and real data examples to illustrate the performance.

Submitted 24 May, 2024; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: 51 pages, 20 figures

arXiv:2010.08908

Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds

Authors: Lizhen Lin, Bayan Saparbayeva, Michael Minyi Zhang, David B. Dunson

Abstract: We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we "convexify" the objective function and solve a series of convex sub-problems in the optimization procedure. One of the key challenges for optimization on manifolds is the difficulty of verifying the complexity of the objective function, e.g., whether the objective function is convex or non-convex, and the degree of non-convexity. Our proposed algorithm adapts to the level of complexity in the objective function. We show that when the objective function is convex, the algorithm provably converges to the optimum and leads to accelerated convergence. When the objective function is non-convex, the algorithm will converge to a stationary point. Our proposed method unifies insights from Nesterov's original idea for accelerating gradient descent algorithms with recent developments in optimization algorithms in Euclidean space. We demonstrate the utility of our algorithms on several manifold optimization tasks such as estimating intrinsic and extrinsic Fréchet means on spheres and low-rank matrix factorization with Grassmann manifolds applied to the Netflix rating data set.

Submitted 17 October, 2020; originally announced October 2020.

arXiv:2001.03988

Domain Adaptive Bootstrap Aggregating

Authors: Meimei Liu, David B. Dunson

Abstract: When there is a distributional shift between data used to train a predictive algorithm and current data, performance can suffer. This is known as the domain adaptation problem. Bootstrap aggregating, or bagging, is a popular method for improving stability of predictive algorithms, while reducing variance and protecting against over-fitting. This article proposes a domain adaptive bagging method coupled with a new iterative nearest neighbor sampler. The key idea is to draw bootstrap samples from the training data in such a manner that their distribution equals that of new testing data. The proposed approach provides a general ensemble framework that can be applied to arbitrary classifiers. We further modify the method to allow anomalous samples in the test data corresponding to outliers in the training data. Theoretical support is provided, and the approach is compared to alternatives in simulations and real data applications.

Submitted 16 June, 2020; v1 submitted 12 January, 2020; originally announced January 2020.

arXiv:1911.02728

Auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets

Authors: Meimei Liu, Zhengwu Zhang, David B. Dunson

Abstract: There has been huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationship with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer the relationships between brain structural connectomes and human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to understand the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference and computing efficiency. We found that structural connectomes have a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.

Submitted 13 September, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

Comments: 31 pages, 12 figures, 5 tables

arXiv:1904.05850

Consistent Entropy Estimation for Stationary Time Series

Authors: Alexander L Young, David B Dunson

Abstract: Entropy estimation, due in part to its connection with mutual information, has seen considerable use in the study of time series data, including causality detection and information flow. In many cases, the entropy is estimated using $k$-nearest neighbor (Kozachenko-Leonenko) based methods. However, analytic results on this estimator are limited to independent data. In this article, we show rigorous bounds on the rate of decay of the bias in the number of samples, $N$, assuming they are drawn from a stationary process which satisfies a suitable mixing condition. Numerical examples are presented which demonstrate the efficiency of the estimator when applied to a Markov process with stationary Gaussian density. These results support the asymptotic rates derived in the theoretical work.
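
For readers unfamiliar with the estimator named in this abstract, the following is a minimal sketch of the standard Kozachenko-Leonenko $k$-nearest-neighbor entropy estimate in one dimension, applied to an AR(1) series whose stationary marginal is standard Gaussian. It is not taken from the paper: the function names `kl_entropy_1d` and `ar1_series`, the choice `k=3`, the autoregression coefficient, and the sample size are all illustrative assumptions.

```python
import math
import random


def kl_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats) for 1-D data.

    Uses H = psi(N) - psi(k) + log(V_1) + (1/N) * sum(log r_i), where r_i is
    the distance from sample i to its k-th nearest neighbor and V_1 = 2 is
    the length of the unit ball in one dimension.
    """
    xs = sorted(samples)
    n = len(xs)
    if n <= k:
        raise ValueError("need more samples than neighbors")
    # For integers, psi(n) - psi(k) equals sum_{j=k}^{n-1} 1/j.
    digamma_diff = sum(1.0 / j for j in range(k, n))
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbors lie within k positions.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    return digamma_diff + math.log(2.0) + log_dist_sum / n


def ar1_series(n, rho=0.5, seed=0):
    """Stationary AR(1) series with N(0, 1) marginal (illustrative process)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
    return out


series = ar1_series(2000)
estimate = kl_entropy_1d(series, k=3)
truth = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)
print(f"estimate={estimate:.3f} truth={truth:.3f}")
```

The sketch only illustrates that the estimate lands near the Gaussian entropy on dependent data; it says nothing about the bias rates the paper establishes.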

Submitted 3 August, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: 16 pages, 2 figures

MSC Class: 62G05; 62G20

arXiv:1903.00985

doi 10.1093/biomet/asaa033

Classification via local manifold approximation

Authors: Didong Li, David B Dunson

Abstract: Classifiers label data as belonging to one of a set of groups based on input features. It is challenging to obtain accurate classification performance when the feature distributions in the different classes are complex, with nonlinear, overlapping and intersecting supports. This is particularly true when training data are limited. To address this problem, this article proposes a new type of classifier based on obtaining a local approximation to the support of the data within each class in a neighborhood of the feature to be classified, and assigning the feature to the class having the closest support. This general algorithm is referred to as LOcal Manifold Approximation (LOMA) classification. As a simple and theoretically supported special case having excellent performance in a broad variety of examples, we use spheres for local approximation, obtaining a SPherical Approximation (SPA) classifier. We illustrate substantial gains for SPA over competitors on a variety of challenging simulated and real data examples.

Submitted 3 March, 2019; originally announced March 2019.

arXiv:1901.00172

Supervised Multiscale Dimension Reduction for Spatial Interaction Networks

Authors: Shaobo Han, David B. Dunson

Abstract: We introduce a multiscale supervised dimension reduction method for SPatial Interaction Network (SPIN) data, which consist of a collection of spatially coordinated interactions. This type of predictor arises when the sampling unit of data is composed of a collection of primitive variables, each of them being essentially unique, so that it becomes necessary to group the variables in order to simplify the representation and enhance interpretability. In this paper, we introduce an empirical Bayes approach called spinlets, which first constructs a partitioning tree to guide the reduction over multiple spatial granularities, and then refines the representation of predictors according to the relevance to the response. We consider an inverse Poisson regression model and propose a new multiscale generalized double Pareto prior, which is induced via a tree-structured parameter expansion scheme. Our approach is motivated by an application in soccer analytics, in which we obtain compact vectorial representations and readily interpretable visualizations of the complex network objects, supervised by the response of interest.

Submitted 8 June, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

Comments: 30 pages, 12 figures, revised for clarity and conciseness

arXiv:1810.13431

Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states

Authors: Rihui Ou, Deborshee Sen, Alexander L Young, David B Dunson

Abstract: Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler. This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches. These approximate the full posterior by using small random subsequences of the data at each MCMC iteration within stochastic gradient MCMC. In the presence of imbalanced data resulting from rare latent states, subsequences often exclude rare latent state data, leading to inaccurate inference and prediction/detection of rare events. We propose a targeted sub-sampling (TASS) approach that over-samples observations corresponding to rare latent states when calculating the stochastic gradient of parameters associated with them. TASS uses an initial clustering of the data to construct subsequence weights that reduce the variance in gradient estimation. This leads to improved sampling efficiency, in particular in settings where the rare latent states correspond to extreme observations. We demonstrate substantial gains in predictive and inferential accuracy on real and synthetic examples.

Submitted 25 July, 2024; v1 submitted 31 October, 2018; originally announced October 2018.

arXiv:1810.08537

Bayesian Distance Clustering

Authors: Leo L Duan, David B Dunson

Abstract: Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data. Keywords: Distance-based clustering; Mixture model; Model-based clustering; Model misspecification; Pairwise distance matrix; Partial likelihood; Robustness.

Submitted 25 June, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

arXiv:1803.01203

Multiresolution Tensor Decomposition for Multiple Spatial Passing Networks

Authors: Shaobo Han, David B. Dunson

Abstract: This article is motivated by soccer positional passing networks collected across multiple games. We refer to these data as replicated spatial passing networks. To accurately model such data, it is necessary to take into account the spatial positions of the passer and receiver for each passing event. This spatial registration and the replicates that occur across games represent key differences from usual social network data. As a key step before investigating how the passing dynamics influence team performance, we focus on developing methods for summarizing different teams' passing strategies. Our proposed approach relies on a novel multiresolution data representation framework and a Poisson nonnegative block term decomposition model, which automatically produces coarse-to-fine low-rank network motifs. The proposed methods are applied to detailed passing record data collected from the 2014 FIFA World Cup.

Submitted 3 March, 2018; originally announced March 2018.

Comments: 34 pages, 15 figures

arXiv:1611.05559

Boosting Variational Inference

Authors: Fangjian Guo, Xiangyu Wang, Kai Fan, Tamara Broderick, David B. Dunson

Abstract: Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit point. Thus, no matter how long VI is run, the resulting approximation will not approach the exact posterior. We propose to instead consider a more flexible approximating family consisting of all possible finite mixtures of a parametric base distribution (e.g., Gaussian). For efficient inference, we borrow ideas from gradient boosting to develop an algorithm we call boosting variational inference (BVI). BVI iteratively improves the current approximation by mixing it with a new component from the base distribution family and thereby yields progressively more accurate posterior approximations as more computing time is spent. Unlike a number of common VI variants including mean-field VI, BVI is able to capture multimodality, general posterior covariance, and nonstandard posterior shapes.
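
The mixing step described in this abstract can be written as a single update. The notation below is an assumption made for illustration (the abstract itself introduces no symbols): $q_{t-1}$ is the current approximation, $h_t$ is the new component drawn from the base family $\mathcal{H}$, and $\alpha_t$ is a mixing weight.

```latex
q_t(\theta) \;=\; (1 - \alpha_t)\, q_{t-1}(\theta) \;+\; \alpha_t\, h_t(\theta),
\qquad h_t \in \mathcal{H}, \quad \alpha_t \in [0, 1].
```

Under this reading, $q_t$ after $t$ steps is a finite mixture of at most $t$ base components plus the initial approximation, which is what lets the family represent multimodal posteriors. How $h_t$ and $\alpha_t$ are chosen is not specified in the abstract.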

Submitted 1 March, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

Comments: 17 pages, 7 figures

arXiv:1605.05798

MCMC for Imbalanced Categorical Data

Authors: James E. Johndrow, Aaron Smith, Natesh Pillai, David B. Dunson

Abstract: Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings and practitioners waste time trying different types of MCMC approaches. This article was motivated by an application to quantitative advertising in which we encountered extremely poor computational performance for common data augmentation MCMC algorithms but obtained excellent performance for adaptive Metropolis. To obtain a deeper understanding of this behavior, we give strong theory results on computational complexity in an infinitely imbalanced asymptotic regime. Our results show computational complexity of Metropolis is logarithmic in sample size, while data augmentation is polynomial in sample size. The root cause of poor performance of data augmentation is a discrepancy between the rates at which the target density and MCMC step sizes concentrate. In general, MCMC algorithms that have a similar discrepancy will fail in large samples - a result with substantial practical impact.

Submitted 26 June, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

MSC Class: 62

arXiv:1603.05324

Fast moment estimation for generalized latent Dirichlet models

Authors: Shiwen Zhao, Barbara E. Engelhardt, Sayan Mukherjee, David B. Dunson

Abstract: We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has been demonstrated to have computational and statistical advantages over alternative methods, such as expectation maximization, variational inference, and Markov chain Monte Carlo. The key computational advantage of our method (MELD) is that parameter estimation does not require instantiation of the latent variables. Moreover, a representational advantage of the GMM approach is that the behavior of the model is agnostic to distributional assumptions of the observations. We derive population moment conditions after marginalizing out the sample-specific Dirichlet latent variables. The moment conditions only depend on component mean parameters. We illustrate the utility of our approach on simulated data, comparing results from MELD to alternative methods, and we show the promise of our approach through the application of MELD to several data sets.

Submitted 23 March, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

Comments: corrected a typo in figure

arXiv:1506.05860

Variational Gaussian Copula Inference

Authors: Shaobo Han, Xuejun Liao, David B. Dunson, Lawrence Carin

Abstract: We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonparametric transformations based on Bernstein polynomials provide ample flexibility in characterizing the univariate marginal posteriors.

Submitted 18 May, 2016; v1 submitted 18 June, 2015; originally announced June 2015.

Comments: Appearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 51

arXiv:1502.06895 [pdf, ps, other]

On the consistency theory of high dimensional variable screening

Authors: Xiangyu Wang, Chenlei Leng, David B. Dunson

Abstract: Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to produce the final model. The performance of variable screening depends on both computational efficiency and the ability to dramatically reduce the number of variables without discarding the important ones. When the data dimension $p$ is substantially larger than the sample size $n$, variable screening becomes crucial as 1) Faster feature selection algorithms are needed; 2) Conditions guaranteeing selection consistency might fail to hold. This article studies a class of linear screening methods and establishes consistency theory for this special class. In particular, we prove the restricted diagonally dominant (RDD) condition is a necessary and sufficient condition for strong screening consistency. As concrete examples, we show two screening methods $SIS$ and $HOLP$ are both strong screening consistent (subject to additional constraints) with large probability if $n > O((ρs + σ/τ)^2\log p)$ under random designs. In addition, we relate the RDD condition to the irrepresentable condition, and highlight limitations of $SIS$.

Submitted 6 June, 2015; v1 submitted 24 February, 2015; originally announced February 2015.

Comments: adding comments on REC

arXiv:1403.2660 [pdf, other]

Robust and Scalable Bayes via a Median of Subset Posterior Measures

Authors: Stanislav Minsker, Sanvesh Srivastava, Lizhen Lin, David B. Dunson

Abstract: We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of probability measures equipped with a suitable collection of distances that can be quickly and efficiently evaluated in practice. We present both theoretical and numerical evidence illustrating the improvements achieved by our method.

Submitted 1 June, 2016; v1 submitted 11 March, 2014; originally announced March 2014.

MSC Class: Primary 62F15; secondary 68W15; 62G35

arXiv:1401.3632 [pdf, other]

Bayesian Conditional Density Filtering

Authors: Shaan Qamar, Rajarshi Guhaniyogi, David B. Dunson

Abstract: We propose a Conditional Density Filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to conditional posterior distributions obtained by propagating surrogate conditional sufficient statistics (a function of data and parameter estimates) as new data arrive. These quantities eliminate the need to store or process the entire dataset simultaneously and offer a number of desirable features. Often, these include a reduction in memory requirements and runtime and improved mixing, along with state-of-the-art parameter inference and prediction. These improvements are demonstrated through several illustrative examples including an application to high dimensional compressed regression. Finally, we show that C-DF samples converge to the target posterior distribution asymptotically as sampling proceeds and more data arrives.

Submitted 22 September, 2015; v1 submitted 15 January, 2014; originally announced January 2014.

Comments: 41 pages, 7 figures, 12 tables

arXiv:1312.4605 [pdf, ps, other]

Parallelizing MCMC via Weierstrass Sampler

Authors: Xiangyu Wang, David B. Dunson

Abstract: With the rapidly growing scales of statistical problems, subset based communication-free parallel MCMC methods are a promising future for large scale Bayesian analysis. In this article, we propose a new Weierstrass sampler for parallel MCMC based on independent subsets. The new sampler approximates the full data posterior samples via combining the posterior draws from independent subset MCMC chains, and thus enjoys a higher computational efficiency. We show that the approximation error for the Weierstrass sampler is bounded by some tuning parameters and provide suggestions for choice of the values. Simulation study shows the Weierstrass sampler is very competitive compared to other methods for combining MCMC chains generated for subsets, including averaging and kernel smoothing.

Submitted 25 May, 2014; v1 submitted 16 December, 2013; originally announced December 2013.

Comments: The original Algorithm 1 removed. Provided some theoretical justification for refinement sampling (Theorem 2). Added a new algorithm in addition to the rejection sampling for handling dimensionality curse. New simulations and graphs (with new colors and designs). A real data analysis is also provided

arXiv:1312.1099 [pdf, other]

Multiscale Dictionary Learning for Estimating Conditional Distributions

Authors: Francesca Petralia, Joshua Vogelstein, David B. Dunson

Abstract: Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. A fast graph partitioning algorithm is applied to obtain the tree decomposition, with Bayesian methods then used to adaptively prune and average over different sub-trees in a soft probabilistic manner. The algorithm scales efficiently to approximately one million features. State of the art predictive performance is demonstrated for toy examples and two neuroscience applications including up to a million features.

Submitted 4 December, 2013; originally announced December 2013.

Journal ref: Proceeding of Neural Information Processing Systems, Lake Tahoe, Nevada December 2013

arXiv:1304.7230 [pdf, other]

Learning Densities Conditional on Many Interacting Features

Authors: David C. Kessler, Jack Taylor, David B. Dunson

Abstract: Learning a distribution conditional on a set of discrete-valued features is a commonly encountered task. This becomes more challenging with a high-dimensional feature set when there is the possibility of interaction between the features. In addition, many frequently applied techniques consider only prediction of the mean, but the complete conditional density is needed to answer more complex questions. We demonstrate a novel nonparametric Bayes method based upon a tensor factorization of feature-dependent weights for Gaussian kernels. The method makes use of multistage feature selection for dimension reduction. The resulting conditional density morphs flexibly with the selected features.

Submitted 29 April, 2013; v1 submitted 26 April, 2013; originally announced April 2013.

arXiv:1303.0642 [pdf, other]

Bayesian Compressed Regression

Authors: Rajarshi Guhaniyogi, David B. Dunson

Abstract: As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the predictors can be projected to a low dimensional linear subspace with minimal loss of information about the response. As opposed to existing Bayesian dimensionality reduction approaches, the exact posterior distribution conditional on the compressed data is available analytically, speeding up computation by many orders of magnitude while also bypassing robustness issues due to convergence and mixing problems with MCMC. Model averaging is used to reduce sensitivity to the random projection matrix, while accommodating uncertainty in the subspace dimension. Strong theoretical support is provided for the approach by showing near parametric convergence rates for the predictive density in the large p small n asymptotic paradigm. Practical performance relative to competitors is illustrated in simulations and real data applications.

Submitted 22 March, 2013; v1 submitted 4 March, 2013; originally announced March 2013.

Comments: 29 pages, 4 figures

arXiv:1302.7280 [pdf, other]

doi 10.1093/bioinformatics/btt425

Bayesian Consensus Clustering

Authors: Eric F. Lock, David B. Dunson

Abstract: The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source separately. This work is motivated by the integrated analysis of heterogeneous biomedical data, and we present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. Software is available at http://people.duke.edu/~el113/software.html.

Submitted 28 February, 2013; originally announced February 2013.

Comments: 32 pages, 13 figures

Journal ref: Bioinformatics 29 (2013) 2610-2616

Showing 1–27 of 27 results for author: Dunson, D B