Evaluating statistical approaches to leverage large clinical datasets for uncovering therapeutic and adverse medication effects

Leena Choi; Robert J Carroll; Cole Beck; Jonathan D Mosley; Dan M Roden; Joshua C Denny; Sara L Van Driest

doi:10.1093/bioinformatics/bty306

Bioinformatics. 2018 Sep 1;34(17):2988-2996. doi: 10.1093/bioinformatics/bty306.

Authors

Leena Choi¹, Robert J Carroll², Cole Beck¹, Jonathan D Mosley³, Dan M Roden²,³,⁴, Joshua C Denny²,³, Sara L Van Driest³,⁵

Affiliations

¹ Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.
² Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
³ Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁴ Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA.
⁵ Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA.

Abstract

Motivation: Phenome-wide association studies (PheWAS) have been used to discover many genotype-phenotype relationships and have the potential to identify therapeutic and adverse drug outcomes using longitudinal data within electronic health records (EHRs). However, the statistical methods for PheWAS applied to longitudinal EHR medication data have not been established.

Results: In this study, we developed methods to address two challenges that arise when EHR data are reused for this purpose: confounding by indication, and low exposure and event rates. We used Monte Carlo simulation to assess propensity score (PS) methods, focusing on two of the most commonly used methods, PS matching and PS adjustment, to address confounding by indication. We also compared two logistic regression approaches (the default Wald approach versus Firth's penalized maximum likelihood, PML) to address complete separation due to sparse data with low exposure and event rates. PS adjustment resulted in greater power than PS matching, while controlling the Type I error rate at 0.05. The PML method provided reasonable P-values, even in cases with complete separation, with well-controlled Type I error rates. Using PS adjustment and the PML method, we identified novel latent drug effects in pediatric patients exposed to two common antibiotic drugs, ampicillin and gentamicin.
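
The two techniques named above can be illustrated with a minimal sketch. It is not the authors' implementation (their analysis uses the R packages listed below); it is a NumPy-only illustration under stated assumptions. The function name `firth_logistic`, the simulated variables (`severity`, `drug`, `outcome`) and all numeric settings are hypothetical. PS adjustment is taken here to mean including the estimated propensity score as a covariate in the outcome model, and for simplicity the same Firth fitter is also used to estimate the propensity score.

```python
import numpy as np


def firth_logistic(X, y, max_iter=200, tol=1e-8):
    """Firth penalized-likelihood logistic regression (illustrative sketch).

    Solves the Firth-modified score equations with Fisher scoring steps.
    Returns the coefficient vector only (no P-values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))    # Firth-modified score
        step = info_inv @ score
        biggest = np.max(np.abs(step))
        if biggest > 5.0:                        # crude safeguard against huge steps
            step *= 5.0 / biggest
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Sparse 2x2 example with separation: no events among 20 unexposed patients,
# 3 events among 10 exposed patients. Ordinary maximum likelihood diverges
# here, while the penalized estimate stays finite.
exposed = np.array([0] * 20 + [1] * 10)
event = np.array([0] * 20 + [1] * 3 + [0] * 7)
X_sparse = np.column_stack([np.ones(30), exposed])
beta_sparse = firth_logistic(X_sparse, event)

# PS adjustment on simulated data (every value below is made up).
rng = np.random.default_rng(0)
n = 2000
severity = rng.normal(size=n)                    # hypothetical confounder (indication)
drug = rng.binomial(1, 1.0 / (1.0 + np.exp(-(severity - 1.0))))
# Outcome depends on severity only, so the simulated drug has no true effect.
outcome = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * severity))))

# Step 1: propensity score = estimated P(drug | confounder)
X_ps = np.column_stack([np.ones(n), severity])
ps = 1.0 / (1.0 + np.exp(-X_ps @ firth_logistic(X_ps, drug)))

# Step 2: outcome model with exposure plus the PS as a covariate
X_adj = np.column_stack([np.ones(n), drug, ps])
beta_adj = firth_logistic(X_adj, outcome)

# Unadjusted model, for comparison
X_crude = np.column_stack([np.ones(n), drug])
beta_crude = firth_logistic(X_crude, outcome)

print("sparse-table coefficients:", beta_sparse)
print("crude drug log-odds ratio:", beta_crude[1])
print("PS-adjusted drug log-odds ratio:", beta_adj[1])
```

For a single binary exposure, the penalized estimate coincides with the log odds ratio obtained after adding 0.5 to each cell of the 2x2 table, which is why it remains finite when one cell is empty. A full analysis would also need P-values (for example from a penalized likelihood ratio test), which this sketch omits.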

Availability and implementation: R packages PheWAS and EHR are available at https://github.com/PheWAS/PheWAS and at CRAN (https://www.r-project.org/), respectively. The R script for data processing and the main analysis is available at https://github.com/choileena/EHR.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Datasets as Topic*
Drug Discovery
Drug-Related Side Effects and Adverse Reactions
Electronic Health Records
Humans
Logistic Models
Probability
