
Showing 1–40 of 40 results for author: Engelhardt, B E

  1. arXiv:2401.03106

    stat.ME

    Contrastive linear regression

    Authors: Boyang Zhang, Sarah Nyquist, Andrew Jones, Barbara E. Engelhardt, Didong Li

    Abstract: Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here, we develop contrastive regression for the setting when there is a response variable r associated with each foreground observation. This situation occurs frequently when, for example, the una…

    Submitted 5 January, 2024; originally announced January 2024.

  2. arXiv:2311.09483

    cs.LG cs.AI

    Adaptive Interventions with User-Defined Goals for Health Behavior Change

    Authors: Aishwarya Mandyam, Matthew Jörke, William Denton, Barbara E. Engelhardt, Emma Brunskill

    Abstract: Promoting healthy lifestyle behaviors remains a major public health concern, particularly due to their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable health behavior change promotion. Researchers are increasingly exploring adaptive algorithms that personalize intervention…

    Submitted 23 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 5 pages. Full paper to be presented at Conference on Health Inference and Learning (CHIL) 2024, June 27th, 2024, New York City, United States, 11 pages.

  3. arXiv:2310.09482

    stat.AP q-bio.CB q-bio.TO

    Answering open questions in biology using spatial genomics and structured methods

    Authors: Siddhartha G Jena, Archit Verma, Barbara E Engelhardt

    Abstract: Genomics methods have uncovered patterns in a range of biological systems, but obscure important aspects of cell behavior: the shape, relative locations of, movement of, and interactions between cells in space. Spatial technologies that collect genomic or epigenomic data while preserving spatial information have begun to overcome these limitations. These new data promise a deeper understanding of…

    Submitted 14 October, 2023; originally announced October 2023.

  4. arXiv:2306.08352

    stat.ML cs.AI cs.LG

    Bayesian Non-linear Latent Variable Modeling via Random Fourier Features

    Authors: Michael Minyi Zhang, Gregory W. Gundersen, Barbara E. Engelhardt

    Abstract: The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to over…

    Submitted 14 June, 2023; originally announced June 2023.

  5. arXiv:2303.06827

    cs.LG cs.AI

    Kernel Density Bayesian Inverse Reinforcement Learning

    Authors: Aishwarya Mandyam, Didong Li, Diana Cai, Andrew Jones, Barbara E. Engelhardt

    Abstract: Inverse reinforcement learning (IRL) is a powerful framework to infer an agent's reward function by observing its behavior, but IRL algorithms that learn point estimates of the reward function can be misleading because there may be several functions that describe an agent's behavior equally well. A Bayesian approach to IRL models a distribution over candidate reward functions, alleviating the shor…

    Submitted 12 October, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

  6. arXiv:2110.08411

    stat.ME stat.AP

    Multi-group Gaussian Processes

    Authors: Didong Li, Andrew Jones, Sudipto Banerjee, Barbara E. Engelhardt

    Abstract: Gaussian processes (GPs) are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Modern scientific data sets are typically heterogeneous and often contain multiple known discrete subgroups of samples. For example, in genomics applications samples may be grouped according to tissue type or drug exposure. In the modeling process it is de…

    Submitted 15 October, 2021; originally announced October 2021.

  7. arXiv:2110.07064

    cs.LG stat.ME

    Variance Minimization in the Wasserstein Space for Invariant Causal Prediction

    Authors: Guillaume Martinet, Alexander Strzalkowski, Barbara E. Engelhardt

    Abstract: Selecting powerful predictors for an outcome is a cornerstone task for machine learning. However, some types of questions can only be answered by identifying the predictors that causally affect the outcome. A recent approach to this causal inference problem leverages the invariance property of a causal mechanism across differing experimental environments (Peters et al., 2016; Heinze-Deml et al., 2…

    Submitted 27 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  8. arXiv:2110.06122

    stat.ME cs.LG q-bio.GN q-bio.QM

    Nonnegative spatial factorization

    Authors: F. William Townes, Barbara E. Engelhardt

    Abstract: Gaussian processes are widely used for the analysis of spatial data due to their nonparametric flexibility and ability to quantify uncertainty, and recently developed scalable approximations have facilitated application to massive datasets. For multivariate outcomes, linear models of coregionalization combine dimension reduction with spatial correlation. However, their real-valued latent factors a…

    Submitted 12 October, 2021; originally announced October 2021.

  9. arXiv:2103.14224

    stat.ML cs.LG

    Active multi-fidelity Bayesian online changepoint detection

    Authors: Gregory W. Gundersen, Diana Cai, Chuteng Zhou, Barbara E. Engelhardt, Ryan P. Adams

    Abstract: Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement aff…

    Submitted 25 July, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: 37th Conference on Uncertainty in Artificial Intelligence

  10. arXiv:2102.06731

    stat.ME q-bio.GN q-bio.QM

    Contrastive latent variable modeling with application to case-control sequencing experiments

    Authors: Andrew Jones, F. William Townes, Didong Li, Barbara E. Engelhardt

    Abstract: High-throughput RNA-sequencing (RNA-seq) technologies are powerful tools for understanding cellular state. Often it is of interest to quantify and summarize changes in cell state that occur between experimental or biological conditions. Differential expression is typically assessed using univariate tests to measure gene-wise shifts in expression. However, these methods largely ignore changes in tr…

    Submitted 12 February, 2021; originally announced February 2021.

  11. arXiv:2006.11145

    stat.ML cs.LG stat.ME

    Latent variable modeling with random features

    Authors: Gregory W. Gundersen, Michael Minyi Zhang, Barbara E. Engelhardt

    Abstract: Gaussian process-based latent variable models are flexible and theoretically grounded tools for nonlinear dimension reduction, but generalizing to non-Gaussian data likelihoods within this nonlinear framework is statistically challenging. Here, we use random features to develop a family of nonlinear dimension reduction models that are easily extensible to non-Gaussian data likelihoods; we call the…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 21 pages, 7 figures

  12. arXiv:2003.07718

    cs.LG stat.ML

    Nonparametric Deconvolution Models

    Authors: Allison J. B. Chaney, Archit Verma, Young-suk Lee, Barbara E. Engelhardt

    Abstract: We describe nonparametric deconvolution models (NDMs), a family of Bayesian nonparametric models for collections of data in which each observation is the average over the features from heterogeneous particles. For example, these types of data are found in elections, where we observe precinct-level vote tallies (observations) of individual citizens' votes (particles) across each of the candidates o…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: 33 pages, 11 figures

    ACM Class: I.5.1

  13. arXiv:1911.07099

    stat.ME q-bio.QM

    Bayesian Ordinal Quantile Regression with a Partially Collapsed Gibbs Sampler

    Authors: Isabella N Grabski, Roberta De Vito, Barbara E Engelhardt

    Abstract: Unlike standard linear regression, quantile regression captures the relationship between covariates and the conditional response distribution as a whole, rather than only the relationship between covariates and the expected value of the conditional response. However, while there are well-established quantile regression methods for continuous variables and some forms of discrete data, there is no w…

    Submitted 16 November, 2019; originally announced November 2019.

  14. arXiv:1910.05355

    stat.AP

    Nonparametric Bayesian multi-armed bandits for single cell experiment design

    Authors: Federico Camerlenghi, Bianca Dumitrascu, Federico Ferrari, Barbara E. Engelhardt, Stefano Favaro

    Abstract: The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large scale experiment for the collection of scRNA…

    Submitted 20 September, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

  15. arXiv:1906.00226

    stat.ML cs.LG

    Patient-Specific Effects of Medication Using Latent Force Models with Gaussian Processes

    Authors: Li-Fang Cheng, Bianca Dumitrascu, Michael Zhang, Corey Chivers, Michael Draugelis, Kai Li, Barbara E. Engelhardt

    Abstract: Multi-output Gaussian processes (GPs) are a flexible Bayesian nonparametric framework that has proven useful in jointly modeling the physiological states of patients in medical time series data. However, capturing the short-term effects of drugs and therapeutic interventions on patient physiological state remains challenging. We propose a novel approach that models the effect of interventions as a…

    Submitted 1 June, 2019; originally announced June 2019.

  16. arXiv:1905.13167

    cs.LG stat.ML

    Defining Admissible Rewards for High Confidence Policy Evaluation

    Authors: Niranjani Prasad, Barbara E Engelhardt, Finale Doshi-Velez

    Abstract: A key impediment to reinforcement learning (RL) in real applications with limited, batch data is defining a reward function that reflects what we implicitly know about reasonable behaviour for a task and allows for robust off-policy evaluation. In this work, we develop a method to identify an admissible set of reward functions for policies that (a) do not diverge too far from past behaviour, and (…

    Submitted 30 May, 2019; originally announced May 2019.

  17. Sequential Gaussian Processes for Online Learning of Nonstationary Functions

    Authors: Michael Minyi Zhang, Bianca Dumitrascu, Sinead A. Williamson, Barbara E. Engelhardt

    Abstract: Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real-time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several…

    Submitted 6 May, 2023; v1 submitted 23 May, 2019; originally announced May 2019.

    Journal ref: IEEE Transactions on Signal Processing, vol. 71, pp. 1539-1550, 2023

  18. arXiv:1808.04679

    cs.AI stat.AP

    An Optimal Policy for Patient Laboratory Tests in Intensive Care Units

    Authors: Li-Fang Cheng, Niranjani Prasad, Barbara E Engelhardt

    Abstract: Laboratory testing is an integral tool in the management of patient care in hospitals, particularly in intensive care units (ICUs). There exists an inherent trade-off in the selection and timing of lab tests between considerations of the expected utility in clinical decision-making of a given test at a specific time, and the associated cost or risk it poses to the patient. In this work, we introdu…

    Submitted 14 August, 2018; originally announced August 2018.

    Comments: The first two authors contributed equally to this work. Preprint of an article submitted for consideration in Pacific Symposium on Biocomputing copyright 2018 [copyright World Scientific Publishing Company] [https://psb.stanford.edu/]

  19. arXiv:1805.07458

    stat.ML cs.LG

    PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

    Authors: Bianca Dumitrascu, Karen Feng, Barbara E Engelhardt

    Abstract: We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal per…

    Submitted 18 May, 2018; originally announced May 2018.

  20. arXiv:1710.11214

    cs.CY cs.LG stat.ML

    How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility

    Authors: Allison J. B. Chaney, Brandon M. Stewart, Barbara E. Engelhardt

    Abstract: Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confoun…

    Submitted 26 November, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

  21. arXiv:1705.10813

    stat.ML

    Large Linear Multi-output Gaussian Process Learning

    Authors: Vladimir Feinberg, Li-Fang Cheng, Kai Li, Barbara E Engelhardt

    Abstract: Gaussian processes (GPs), or distributions over arbitrary functions in a continuous domain, can be generalized to the multi-output case: a linear model of coregionalization (LMC) is one approach. LMCs estimate and exploit correlations across the multiple outputs. While model estimation can be performed efficiently for single-output GPs, these assume stationarity, but in the multi-output case the c…

    Submitted 23 October, 2017; v1 submitted 30 May, 2017; originally announced May 2017.

    Comments: 9 pages, 4 figures, 4 tables

  22. arXiv:1704.06300

    cs.AI

    A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units

    Authors: Niranjani Prasad, Li-Fang Cheng, Corey Chivers, Michael Draugelis, Barbara E Engelhardt

    Abstract: The management of invasive mechanical ventilation, and the regulation of sedation and analgesia during ventilation, constitutes a major part of the care of patients admitted to intensive care units. Both prolonged dependence on mechanical ventilation and premature extubation are associated with increased risk of complications and higher hospital costs, but clinical opinion on the best protocol for…

    Submitted 20 April, 2017; originally announced April 2017.

  23. arXiv:1703.09112

    stat.ML

    Sparse Multi-Output Gaussian Processes for Medical Time Series Prediction

    Authors: Li-Fang Cheng, Gregory Darnell, Bianca Dumitrascu, Corey Chivers, Michael E Draugelis, Kai Li, Barbara E Engelhardt

    Abstract: In the scenario of real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab tests is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accura…

    Submitted 21 June, 2018; v1 submitted 27 March, 2017; originally announced March 2017.

    Comments: Add new results and appendices

  24. BIISQ: Bayesian nonparametric discovery of Isoforms and Individual Specific Quantification

    Authors: Derek Aguiar, Li-Fang Cheng, Bianca Dumitrascu, Fantine Mordelet, Athma A Pai, Barbara E Engelhardt

    Abstract: Most human protein-coding genes can be transcribed into multiple possible distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and exist in tissue- and sample-…

    Submitted 23 March, 2017; originally announced March 2017.

  25. arXiv:1701.02058

    cs.LG cs.AI stat.ML

    Coupled Compound Poisson Factorization

    Authors: Mehmet E. Basbug, Barbara E. Engelhardt

    Abstract: We present a general framework, the coupled compound Poisson factorization (CCPF), to capture the missing-data mechanism in extremely sparse data sets by coupling a hierarchical Poisson factorization with an arbitrary data-generating model. We derive a stochastic variational inference algorithm for the resulting model and, as examples of our framework, implement three different data-generating mod…

    Submitted 8 January, 2017; originally announced January 2017.

    Comments: Under review at AISTATS 2017

  26. arXiv:1608.04839

    cs.LG cs.AI stat.ML

    Dynamic Collaborative Filtering with Compound Poisson Factorization

    Authors: Ghassen Jerfel, Mehmet E. Basbug, Barbara E. Engelhardt

    Abstract: Model-based collaborative filtering analyzes user-item interactions to infer latent factors that represent user preferences and item characteristics in order to predict future interactions. Most collaborative filtering algorithms assume that these latent factors are static, although it has been shown that user preferences and item perceptions drift over time. In this paper, we propose a conjugate…

    Submitted 1 November, 2016; v1 submitted 16 August, 2016; originally announced August 2016.

  27. arXiv:1604.03853

    cs.LG cs.AI stat.ML

    Hierarchical Compound Poisson Factorization

    Authors: Mehmet E. Basbug, Barbara E. Engelhardt

    Abstract: Non-negative matrix factorization models based on a hierarchical Gamma-Poisson structure capture user and item behavior effectively in extremely sparse data sets, making them the ideal choice for collaborative filtering applications. Hierarchical Poisson factorization (HPF) in particular has proved successful for scalable recommendation systems with extreme sparsity. HPF, however, suffers from a t…

    Submitted 26 May, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

    Comments: To appear in Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48

  28. arXiv:1603.05324

    math.ST cs.LG stat.AP stat.ME

    Fast moment estimation for generalized latent Dirichlet models

    Authors: Shiwen Zhao, Barbara E. Engelhardt, Sayan Mukherjee, David B. Dunson

    Abstract: We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has been demonstrated to have computational and statistical advantages over alternative methods, such as expectation maximization, variational inference, and Markov chain Monte Carlo. The key computational adv…

    Submitted 23 March, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

    Comments: corrected a typo in figure

  29. arXiv:1602.04889

    cs.LG cs.AI

    Unsupervised Domain Adaptation Using Approximate Label Matching

    Authors: Jordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt

    Abstract: Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution. In this work, we present approximate label matching (ALM), a new unsupervised domain adaptation technique that creates and leverages a rough labeling on the test samples, then uses these noisy labels to lear…

    Submitted 1 March, 2017; v1 submitted 15 February, 2016; originally announced February 2016.

  30. arXiv:1512.02306

    stat.AP q-bio.GN stat.ML

    Nonparametric Reduced-Rank Regression for Multi-SNP, Multi-Trait Association Mapping

    Authors: Ashlee Valente, Geoffrey Ginsburg, Barbara E Engelhardt

    Abstract: Genome-wide association studies have proven to be essential for understanding the genetic basis of disease. However, many complex traits (personality traits, facial features, disease subtyping) are inherently high-dimensional, impeding simple approaches to association mapping. We developed a nonparametric Bayesian reduced rank regression model for multi-SNP, multi-trait association mapping that…

    Submitted 7 December, 2015; originally announced December 2015.

  31. arXiv:1512.01616

    q-bio.QM q-bio.GN stat.ME

    A Bayesian test to identify variance effects

    Authors: Bianca Dumitrascu, Gregory Darnell, Julien Ayroles, Barbara E Engelhardt

    Abstract: Identifying genetic variants that regulate quantitative traits, or QTLs, is the primary focus of the field of statistical genetics. Most current methods are limited to identifying mean effects, or associations between genotype and the mean value of a quantitative trait. It is possible, however, that a genetic variant may affect the variance of the quantitative trait in lieu of, or in addition to,…

    Submitted 4 December, 2015; originally announced December 2015.

  32. arXiv:1504.03183

    stat.ML q-bio.QM

    Adaptive Randomized Dimension Reduction on Massive Data

    Authors: Gregory Darnell, Stoyan Georgiev, Sayan Mukherjee, Barbara E Engelhardt

    Abstract: The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low dimensional latent space using dimension reduction methods. In this paper we develop an approach for dimension reduction that exploits the assumption of low rank structure in high dimensional data to gain both computational an…

    Submitted 13 April, 2015; originally announced April 2015.

    Comments: arXiv admin note: substantial text overlap with arXiv:1211.1642

  33. arXiv:1411.2698

    stat.ME q-bio.QM stat.ML

    Bayesian group latent factor analysis with structured sparsity

    Authors: Shiwen Zhao, Chuan Gao, Sayan Mukherjee, Barbara E Engelhardt

    Abstract: Latent factor models are the canonical statistical tool for exploratory analyses of low-dimensional linear structure for an observation matrix with p features across n samples. We develop a structured Bayesian group factor analysis model that extends the factor model to multiple coupled observation matrices; in the case of two observations, this reduces to a Bayesian model of canonical correlation…

    Submitted 11 November, 2015; v1 submitted 10 November, 2014; originally announced November 2014.

  34. arXiv:1411.1997

    stat.ME q-bio.GN q-bio.MN stat.ML

    Differential gene co-expression networks via Bayesian biclustering models

    Authors: Chuan Gao, Shiwen Zhao, Ian C. McDowell, Christopher D. Brown, Barbara E. Engelhardt

    Abstract: Identifying latent structure in large data matrices is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are locally co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regul…

    Submitted 7 November, 2014; originally announced November 2014.

  35. arXiv:1407.2235

    stat.ME q-bio.GN

    Bayesian Structured Sparsity from Gaussian Fields

    Authors: Barbara E. Engelhardt, Ryan P. Adams

    Abstract: Substantial research on structured sparsity has contributed to analysis of many different applications. However, there have been few Bayesian procedures among this work. Here, we develop a Bayesian model for structured sparsity that uses a Gaussian process (GP) to share parameters of the sparsity-inducing prior in proportion to feature similarity as defined by an arbitrary positive definite kernel…

    Submitted 8 July, 2014; originally announced July 2014.

    Comments: 23 pages, 7 figures

  36. Expandable Factor Analysis

    Authors: Sanvesh Srivastava, Barbara E. Engelhardt, David B. Dunson

    Abstract: Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor models when the number of factors is unknown. The method relies on a continuous shrinkage prior for efficient maximum a posteriori estimation of…

    Submitted 19 June, 2018; v1 submitted 4 July, 2014; originally announced July 2014.

    Comments: 28 pages, 4 figures

    Journal ref: Biometrika. vol. 104. number 3. pp. 649-663. 2017

  37. arXiv:1407.0050

    stat.ME q-bio.GN q-bio.PE stat.AP

    Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure

    Authors: David Mimno, David M Blei, Barbara E Engelhardt

    Abstract: Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (P…

    Submitted 30 June, 2014; originally announced July 2014.

  38. arXiv:1310.4792

    stat.AP q-bio.GN

    A latent factor model with a mixture of sparse and dense factors to model gene expression data with confounding effects

    Authors: Chuan Gao, Christopher D Brown, Barbara E Engelhardt

    Abstract: One important problem in genome science is to determine sets of co-regulated genes based on measurements of gene expression levels across samples, where the quantification of expression levels includes substantial technical and biological noise. To address this problem, we developed a Bayesian sparse latent factor model that uses a three parameter beta prior to flexibly model shrinkage in the load…

    Submitted 17 October, 2013; originally announced October 2013.

  39. Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements

    Authors: Weiwei Zhang, Tim D Spector, Panos Deloukas, Jordana T Bell, Barbara E Engelhardt

    Abstract: Background: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is important, but current approaches tackle average methylation within a genomic locus and are often limited to specific genomic regions. Re…

    Submitted 9 August, 2013; originally announced August 2013.

  40. Integrative modeling of eQTLs and cis-regulatory elements suggest mechanisms underlying cell type specificity of eQTLs

    Authors: Christopher D Brown, Lara M Mangravite, Barbara E Engelhardt

    Abstract: Genetic variants in cis-regulatory elements or trans-acting regulators commonly influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associat…

    Submitted 11 October, 2012; originally announced October 2012.

    Comments: 25 pages, 7 figures, 3 tables