
Showing 1–27 of 27 results for author: Dunson, D B

Searching in archive cs.
  1. arXiv:2406.00778

    stat.ML cs.AI cs.LG stat.CO stat.ME

    Bayesian Joint Additive Factor Models for Multiview Learning

    Authors: Niccolo Anceschi, Federico Ferrari, David B. Dunson, Himel Mallick

    Abstract: It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of…

    Submitted 2 June, 2024; originally announced June 2024.

    MSC Class: 62F15

  2. arXiv:2312.13484

    stat.ML cs.LG

    Bayesian Transfer Learning

    Authors: Piotr M. Suder, Jason Xu, David B. Dunson

    Abstract: Transfer learning is a burgeoning concept in statistical machine learning that seeks to improve inference and/or predictive accuracy on a domain of interest by leveraging data from related domains. While the term "transfer learning" has garnered much recent interest, its foundational principles have existed for years under various guises. Prior literature reviews in computer science and electrical…

    Submitted 20 December, 2023; originally announced December 2023.

  3. arXiv:2311.14829   

    cs.CE cs.CV

    Proximal Algorithms for Accelerated Langevin Dynamics

    Authors: Duy H. Thai, Alexander L. Young, David B. Dunson

    Abstract: We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme. With an appropriate addition of noise, the result is a time-inhomogeneous underdamped Langevin equation, which we prove emits a specified target distribution as its invariant measure. Convergence rates to stationarity under Wasserstein-2 distance are established as well. Metropolis-adjusted and stochastic gradient…

    Submitted 28 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: The technical proofs for the paper will be revised
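The abstract builds on the underdamped (kinetic) Langevin equation. As a rough illustration only, the sketch below simulates a plain, time-homogeneous underdamped Langevin diffusion with a semi-implicit Euler-Maruyama step. It is not the stochastized Nesterov scheme and has no Metropolis adjustment; the standard Gaussian target, the step size `h`, and the friction `gamma` are hypothetical choices.

```python
import math
import random


def grad_U(x):
    # Hypothetical target: standard Gaussian, U(x) = x**2 / 2, so grad U(x) = x.
    return x


def underdamped_langevin(n_steps, h=0.05, gamma=1.0, seed=0):
    """Simulate dx = v dt, dv = -(gamma * v + grad U(x)) dt + sqrt(2 * gamma) dW
    with a semi-implicit Euler-Maruyama step; return the position trajectory."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    noise_scale = math.sqrt(2.0 * gamma * h)
    xs = []
    for _ in range(n_steps):
        # Velocity: friction, force from the potential, and injected Gaussian noise.
        v = v - h * (gamma * v + grad_U(x)) + noise_scale * rng.gauss(0.0, 1.0)
        # Position: moved with the freshly updated velocity.
        x = x + h * v
        xs.append(x)
    return xs


samples = underdamped_langevin(200_000)[10_000:]  # discard burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean {mean:.3f}, variance {var:.3f}")  # expected: close to 0 and 1
```

For this Gaussian target the long-run mean and variance of the position should be close to 0 and 1; a larger `h` trades discretization accuracy for fewer steps.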

  4. arXiv:2304.11251

    stat.ML cs.LG

    Machine Learning and the Future of Bayesian Computation

    Authors: Steven Winter, Trevor Campbell, Lizhen Lin, Sanvesh Srivastava, David B. Dunson

    Abstract: Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional model…

    Submitted 21 April, 2023; originally announced April 2023.

  5. arXiv:2304.10630

    stat.ML cs.LG stat.AP

    Ellipsoid fitting with the Cayley transform

    Authors: Omar Melikechi, David B. Dunson

    Abstract: We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over th…

    Submitted 27 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.
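The Cayley transform maps a skew-symmetric matrix $S$ to the orthogonal matrix $Q = (I - S)^{-1}(I + S)$, which gives an unconstrained way to parametrize an ellipsoid's orientation. The sketch below shows only that parametrization step with NumPy; the helper names, the parameter packing, and the radii are hypothetical, and the fitting objective used by CTEF is not reproduced.

```python
import numpy as np


def cayley(S):
    """Cayley transform of a skew-symmetric S: Q = (I - S)^{-1} (I + S).
    The result is orthogonal with determinant +1."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(I - S, I + S)


def skew_from_params(params, d):
    # Hypothetical packing: the d(d-1)/2 free parameters fill the strict upper triangle.
    S = np.zeros((d, d))
    S[np.triu_indices(d, k=1)] = params
    return S - S.T


def ellipsoid_matrix(params, radii):
    """Shape matrix A = Q diag(1/r^2) Q^T, so the ellipsoid is {x : (x - c)^T A (x - c) = 1}."""
    d = len(radii)
    Q = cayley(skew_from_params(params, d))
    return Q @ np.diag(1.0 / np.asarray(radii) ** 2) @ Q.T


Q = cayley(skew_from_params([0.3, -0.7, 1.2], 3))
A = ellipsoid_matrix([0.3, -0.7, 1.2], [3.0, 2.0, 1.0])
print(np.allclose(Q.T @ Q, np.eye(3)), np.linalg.det(Q))  # orthogonal, determinant 1
print(np.linalg.eigvalsh(A))  # the values 1/r^2 = 1/9, 1/4, 1 in ascending order
```

Because every skew-symmetric $S$ yields a valid rotation, an optimizer can search over the free parameters without enforcing orthogonality constraints.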

  6. arXiv:2110.07478

    stat.ML cs.LG

    Inferring Manifolds From Noisy Data Using Gaussian Processes

    Authors: David B Dunson, Nan Wu

    Abstract: In analyzing complex datasets, it is often of interest to infer lower dimensional structure underlying the higher dimensional observations. As a flexible class of nonlinear structures, it is common to focus on Riemannian manifolds. Most existing manifold learning algorithms replace the original data with lower dimensional coordinates without providing an estimate of the manifold in the observation…

    Submitted 24 May, 2024; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 51 pages, 20 figures

  7. arXiv:2010.08908

    stat.CO cs.LG math.OC

    Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds

    Authors: Lizhen Lin, Bayan Saparbayeva, Michael Minyi Zhang, David B. Dunson

    Abstract: We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we "convexify" the objective function and solve a series of convex sub-problems in the optimization procedure. One of the key challenges for optimization on manifolds is the difficu…

    Submitted 17 October, 2020; originally announced October 2020.
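The convexification idea can be illustrated in flat Euclidean space, where the squared retraction distance becomes $\lVert x - x_k \rVert^2$ and the scheme reduces to a proximal-point iteration. This is a stand-in, not the paper's manifold algorithm: the objective `f`, the weight `rho`, and the inner gradient-descent solver are hypothetical. Since $f''(x) = 3x^2 - 2 \ge -2$, any `rho > 2` makes each subproblem strongly convex.

```python
def f_grad(x):
    # Hypothetical non-convex objective f(x) = x**4 / 4 - x**2, with f''(x) >= -2.
    return x**3 - 2.0 * x


def solve_subproblem(center, rho, x_init, lr=0.05, iters=200):
    """Approximately minimize f(x) + (rho / 2) * (x - center)**2 by gradient descent.
    With rho > 2 this subproblem is strongly convex for the f above."""
    x = x_init
    for _ in range(iters):
        x -= lr * (f_grad(x) + rho * (x - center))
    return x


def proximal_point(x0, rho=4.0, outer_iters=50):
    """Solve a sequence of convexified subproblems, re-centering at each solution."""
    x = x0
    for _ in range(outer_iters):
        x = solve_subproblem(center=x, rho=rho, x_init=x)
    return x


x_star = proximal_point(0.5)
print(round(x_star, 4))  # expected: 1.4142, the stationary point sqrt(2)
```

A fixed point of the outer loop satisfies $f'(x) = 0$, so the iteration converges to a stationary point of the original non-convex objective.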

  8. arXiv:2001.03988

    stat.ME cs.LG stat.ML

    Domain Adaptive Bootstrap Aggregating

    Authors: Meimei Liu, David B. Dunson

    Abstract: When there is a distributional shift between data used to train a predictive algorithm and current data, performance can suffer. This is known as the domain adaptation problem. Bootstrap aggregating, or bagging, is a popular method for improving stability of predictive algorithms, while reducing variance and protecting against over-fitting. This article proposes a domain adaptive bagging method co…

    Submitted 16 June, 2020; v1 submitted 12 January, 2020; originally announced January 2020.
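For reference, plain bootstrap aggregating resamples the training data with replacement, fits one base learner per resample, and averages the predictions. The sketch below shows only that baseline, with a hypothetical least-squares line as the base learner; the domain-adaptive modification proposed in the article is not reproduced.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (hypothetical base learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def bagged_predict(xs, ys, x_new, n_bags=200, seed=0):
    """Plain bagging: fit the base learner on bootstrap resamples, average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n indices with replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    return sum(preds) / len(preds)


rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
print(round(bagged_predict(xs, ys, 2.5), 2))  # expected near 2.0 + 0.5 * 2.5 = 3.25
```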

  9. arXiv:1911.02728

    stat.ML cs.LG q-bio.NC

    Auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets

    Authors: Meimei Liu, Zhengwu Zhang, David B. Dunson

    Abstract: There has been huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationship with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean natu…

    Submitted 13 September, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: 31 pages, 12 figures, 5 tables

  10. arXiv:1904.05850

    math.ST cs.IT

    Consistent Entropy Estimation for Stationary Time Series

    Authors: Alexander L Young, David B Dunson

    Abstract: Entropy estimation, due in part to its connection with mutual information, has seen considerable use in the study of time series data including causality detection and information flow. In many cases, the entropy is estimated using $k$-nearest neighbor (Kozachenko-Leonenko) based methods. However, analytic results on this estimator are limited to independent data. In the article, we show rigorous…

    Submitted 3 August, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: 16 pages, 2 figures

    MSC Class: 62G05; 62G20
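The Kozachenko-Leonenko estimator mentioned in the abstract can be written as $\hat H = \psi(N) - \psi(k) + \log V_d + \frac{d}{N}\sum_{i=1}^{N} \log \rho_{k,i}$, where $\psi$ is the digamma function, $V_d = \pi^{d/2}/\Gamma(d/2 + 1)$ is the volume of the unit ball in $\mathbb{R}^d$, and $\rho_{k,i}$ is the distance from point $i$ to its $k$-th nearest neighbor. A minimal brute-force sketch follows; it is applied to i.i.d. Gaussian draws only, so the stationary time-series setting studied in the article is not simulated, and the sample size and `k` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy(points, k=1):
    """Kozachenko-Leonenko k-nearest-neighbor entropy estimate in nats.
    points: list of equal-length tuples; O(N^2) brute-force neighbor search."""
    n, d = len(points), len(points[0])
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += math.log(dists[k - 1])  # distance to the k-th nearest neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * total / n


rng = random.Random(0)
sample = [(rng.gauss(0, 1),) for _ in range(1000)]
h_hat = kl_entropy(sample, k=3)
print(h_hat, 0.5 * math.log(2 * math.pi * math.e))  # estimate vs. true N(0,1) entropy
```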

  11. Classification via local manifold approximation

    Authors: Didong Li, David B Dunson

    Abstract: Classifiers label data as belonging to one of a set of groups based on input features. It is challenging to obtain accurate classification performance when the feature distributions in the different classes are complex, with nonlinear, overlapping and intersecting supports. This is particularly true when training data are limited. To address this problem, this article proposes a new type of classi…

    Submitted 3 March, 2019; originally announced March 2019.

  12. arXiv:1901.00172

    cs.LG cs.SI stat.ML

    Supervised Multiscale Dimension Reduction for Spatial Interaction Networks

    Authors: Shaobo Han, David B. Dunson

    Abstract: We introduce a multiscale supervised dimension reduction method for SPatial Interaction Network (SPIN) data, which consist of a collection of spatially coordinated interactions. This type of predictor arises when the sampling unit of data is composed of a collection of primitive variables, each of them being essentially unique, so that it becomes necessary to group the variables in order to simpli…

    Submitted 8 June, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

    Comments: 30 pages, 12 figures, revised for clarity and conciseness

  13. arXiv:1810.13431

    stat.ML cs.LG

    Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states

    Authors: Rihui Ou, Deborshee Sen, Alexander L Young, David B Dunson

    Abstract: Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler. This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches. These approximate the full posterior by using small random subsequences of the data at each MCMC iteration within stochastic gradient MCMC. In the…

    Submitted 25 July, 2024; v1 submitted 31 October, 2018; originally announced October 2018.

  14. arXiv:1810.08537

    stat.ML cs.LG

    Bayesian Distance Clustering

    Authors: Leo L Duan, David B Dunson

    Abstract: Model-based clustering is widely-used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likel…

    Submitted 25 June, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

  15. arXiv:1803.01203

    stat.AP cs.LG cs.SI stat.ML

    Multiresolution Tensor Decomposition for Multiple Spatial Passing Networks

    Authors: Shaobo Han, David B. Dunson

    Abstract: This article is motivated by soccer positional passing networks collected across multiple games. We refer to these data as replicated spatial passing networks---to accurately model such data it is necessary to take into account the spatial positions of the passer and receiver for each passing event. This spatial registration and replicates that occur across games represent key differences with usu…

    Submitted 3 March, 2018; originally announced March 2018.

    Comments: 34 pages, 15 figures

  16. arXiv:1611.05559

    stat.ML cs.LG

    Boosting Variational Inference

    Authors: Fangjian Guo, Xiangyu Wang, Kai Fan, Tamara Broderick, David B. Dunson

    Abstract: Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit po…

    Submitted 1 March, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

    Comments: 17 pages, 7 figures

  17. arXiv:1605.05798

    math.ST cs.CC stat.CO

    MCMC for Imbalanced Categorical Data

    Authors: James E. Johndrow, Aaron Smith, Natesh Pillai, David B. Dunson

    Abstract: Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings and practitioners waste time trying dif…

    Submitted 26 June, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

    MSC Class: 62

  18. arXiv:1603.05324

    math.ST cs.LG stat.AP stat.ME

    Fast moment estimation for generalized latent Dirichlet models

    Authors: Shiwen Zhao, Barbara E. Engelhardt, Sayan Mukherjee, David B. Dunson

    Abstract: We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has been demonstrated to have computational and statistical advantages over alternative methods, such as expectation maximization, variational inference, and Markov chain Monte Carlo. The key computational adv…

    Submitted 23 March, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

    Comments: corrected a typo in figure

  19. arXiv:1506.05860

    stat.ML cs.LG stat.CO

    Variational Gaussian Copula Inference

    Authors: Shaobo Han, Xuejun Liao, David B. Dunson, Lawrence Carin

    Abstract: We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonparam…

    Submitted 18 May, 2016; v1 submitted 18 June, 2015; originally announced June 2015.

    Comments: Appearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 51
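A Gaussian copula separates dependence from the marginals: draw correlated standard normals, map them to uniforms with the normal CDF, then apply any inverse marginal CDFs. The sketch below shows only this construction for two variables; the correlation, the Exponential(1) and logistic marginals, and the helper names are hypothetical, and the variational optimization described in the paper is not reproduced.

```python
import math
import random


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_gaussian_copula(n, rho, inv_cdfs, seed=0):
    """Draw n pairs whose dependence is a bivariate Gaussian copula with correlation rho
    and whose marginals are given by the two inverse CDFs in inv_cdfs."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
        u1, u2 = std_normal_cdf(z1), std_normal_cdf(z2)  # uniform marginals on (0, 1)
        out.append((inv_cdfs[0](u1), inv_cdfs[1](u2)))
    return out


def exp_inv_cdf(u):
    # Hypothetical marginal: Exponential(1).
    return -math.log1p(-u)


def logistic_inv_cdf(u):
    # Hypothetical marginal: standard logistic.
    return math.log(u / (1.0 - u))


draws = sample_gaussian_copula(20_000, 0.8, (exp_inv_cdf, logistic_inv_cdf))
print(draws[:3])
```

Changing the marginals leaves the dependence structure (the copula) untouched, which is what makes the family convenient for building flexible approximating distributions.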

  20. arXiv:1502.06895

    math.ST cs.LG stat.ML

    On the consistency theory of high dimensional variable screening

    Authors: Xiangyu Wang, Chenlei Leng, David B. Dunson

    Abstract: Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to produce the final model. The performance of variable screening depends on both computational efficiency and the ability to dramatically reduce the number of varia…

    Submitted 6 June, 2015; v1 submitted 24 February, 2015; originally announced February 2015.

    Comments: adding comments on REC
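Variable screening as described above can be illustrated with marginal correlation screening: rank each predictor by its absolute correlation with the response and keep the top few. This is one common screening rule, used here only as an assumption; it is not necessarily the procedure whose consistency the article analyzes, and the simulated data and the `keep` size are hypothetical.

```python
import numpy as np


def marginal_screen(X, y, keep):
    """Rank predictors by absolute marginal correlation with y; return the top `keep` indices."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    scores = np.abs(Xc.T @ yc) / len(y)  # absolute sample correlations
    return np.argsort(scores)[::-1][:keep]


rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 10] - 2.0 * X[:, 500] + rng.standard_normal(n)  # two active predictors
selected = marginal_screen(X, y, keep=20)
print(10 in selected, 500 in selected)  # both active predictors should survive screening
```

The surviving 20 columns would then be passed to a slower feature-selection method to produce the final model.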

  21. arXiv:1403.2660

    math.ST cs.DC cs.LG

    Robust and Scalable Bayes via a Median of Subset Posterior Measures

    Authors: Stanislav Minsker, Sanvesh Srivastava, Lizhen Lin, David B. Dunson

    Abstract: We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed a…

    Submitted 1 June, 2016; v1 submitted 11 March, 2014; originally announced March 2014.

    MSC Class: Primary 62F15; secondary 68W15; 62G35
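The split, compute, and combine pattern can be sketched with a Euclidean geometric median (Weiszfeld's iteration) of subset posterior centers. This is a simplified stand-in: the article combines whole subset posterior measures, whereas here each subset posterior is summarized by its mean under a hypothetical flat prior with known unit variance, and one subset is deliberately corrupted to show the robustness.

```python
import math
import random


def weiszfeld(points, iters=200, eps=1e-12):
    """Geometric median of points in R^d via Weiszfeld's fixed-point iteration."""
    d = len(points[0])
    z = [sum(p[j] for p in points) / len(points) for j in range(d)]  # start at the mean
    for _ in range(iters):
        num, den = [0.0] * d, 0.0
        for p in points:
            w = 1.0 / max(math.dist(p, z), eps)  # inverse-distance weight, guarded
            den += w
            for j in range(d):
                num[j] += w * p[j]
        z = [v / den for v in num]
    return z


rng = random.Random(0)
true_mean = (1.0, -2.0)
subsets = [[(rng.gauss(true_mean[0], 1), rng.gauss(true_mean[1], 1)) for _ in range(100)]
           for _ in range(10)]
subsets[0] = [(50.0, 50.0)] * 100  # one fully corrupted subset
# Hypothetical simplification: each subset posterior is centered at its sample mean.
centers = [tuple(sum(c) / len(s) for c in zip(*s)) for s in subsets]
naive = [sum(c) / len(centers) for c in zip(*centers)]
robust = weiszfeld(centers)
print(naive)   # dragged toward (50, 50) by the corrupted subset
print(robust)  # stays near (1, -2)
```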

  22. arXiv:1401.3632

    stat.ML cs.LG stat.CO

    Bayesian Conditional Density Filtering

    Authors: Shaan Qamar, Rajarshi Guhaniyogi, David B. Dunson

    Abstract: We propose a Conditional Density Filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to conditional posterior distributions obtained by propagating surrogate conditional sufficient statistics (a function of data and parameter estimates) as new data arrive. These quantities eliminate the need to store or p…

    Submitted 22 September, 2015; v1 submitted 15 January, 2014; originally announced January 2014.

    Comments: 41 pages, 7 figures, 12 tables

  23. arXiv:1312.4605

    stat.CO cs.DC stat.ML

    Parallelizing MCMC via Weierstrass Sampler

    Authors: Xiangyu Wang, David B. Dunson

    Abstract: With the rapidly growing scales of statistical problems, subset based communication-free parallel MCMC methods are a promising future for large scale Bayesian analysis. In this article, we propose a new Weierstrass sampler for parallel MCMC based on independent subsets. The new sampler approximates the full data posterior samples via combining the posterior draws from independent subset MCMC chain…

    Submitted 25 May, 2014; v1 submitted 16 December, 2013; originally announced December 2013.

    Comments: The original Algorithm 1 was removed. Provided some theoretical justification for refinement sampling (Theorem 2). Added a new algorithm, in addition to rejection sampling, for handling the curse of dimensionality. New simulations and graphs (with new colors and designs). A real data analysis is also provided.

  24. arXiv:1312.1099

    stat.ML cs.LG

    Multiscale Dictionary Learning for Estimating Conditional Distributions

    Authors: Francesca Petralia, Joshua Vogelstein, David B. Dunson

    Abstract: Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a co…

    Submitted 4 December, 2013; originally announced December 2013.

    Journal ref: Proceedings of Neural Information Processing Systems, Lake Tahoe, Nevada, December 2013

  25. arXiv:1304.7230

    stat.ML cs.LG

    Learning Densities Conditional on Many Interacting Features

    Authors: David C. Kessler, Jack Taylor, David B. Dunson

    Abstract: Learning a distribution conditional on a set of discrete-valued features is a commonly encountered task. This becomes more challenging with a high-dimensional feature set when there is the possibility of interaction between the features. In addition, many frequently applied techniques consider only prediction of the mean, but the complete conditional density is needed to answer more complex questi…

    Submitted 29 April, 2013; v1 submitted 26 April, 2013; originally announced April 2013.

  26. arXiv:1303.0642

    stat.ML cs.LG

    Bayesian Compressed Regression

    Authors: Rajarshi Guhaniyogi, David B. Dunson

    Abstract: As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the predictors can be projected to a low dimensional linear subspace with minimal loss of information about the response. As opposed to existing Bayesian dimension…

    Submitted 22 March, 2013; v1 submitted 4 March, 2013; originally announced March 2013.

    Comments: 29 pages, 4 figures
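The compression step can be sketched as multiplying the $n \times p$ predictor matrix by a random $m \times p$ Gaussian matrix and regressing on the $m$ compressed features. In this minimal sketch a single projection is used and the coefficients are the posterior mean under a hypothetical Gaussian prior (equivalently, ridge regression); the projection scaling, `lam`, `m`, and the simulated low-rank data are all assumptions rather than the article's specification.

```python
import numpy as np


def compressed_regression(X, y, m, lam=1.0, seed=0):
    """Randomly compress p predictors to m features, then compute the posterior mean of
    the compressed coefficients under a Gaussian prior (ridge with penalty lam)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    Phi = rng.standard_normal((m, p)) / np.sqrt(m)  # random compression matrix
    Z = X @ Phi.T                                   # n x m compressed predictors
    gamma = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    return Phi, gamma


def predict(X_new, Phi, gamma):
    return (X_new @ Phi.T) @ gamma


rng = np.random.default_rng(1)
n, p, k = 300, 2000, 3
F = rng.standard_normal((n, k))  # hypothetical latent factors: X lies near a 3-dim subspace
L = rng.standard_normal((k, p))
X = F @ L + 0.1 * rng.standard_normal((n, p))
y = F @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(n)
Phi, gamma = compressed_regression(X[:200], y[:200], m=20)
y_hat = predict(X[200:], Phi, gamma)
print(np.corrcoef(y_hat, y[200:])[0, 1])  # held-out correlation; near 1 for low-rank X
```

Only the $m \times m$ system is solved, so the cost after compression no longer depends on $p$.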

  27. Bayesian Consensus Clustering

    Authors: Eric F. Lock, David B. Dunson

    Abstract: The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework…

    Submitted 28 February, 2013; originally announced February 2013.

    Comments: 32 pages, 13 figures

    Journal ref: Bioinformatics 29 (2013) 2610-2616