RNAseqCovarImpute: a multiple imputation procedure that outperforms complete case and single imputation differential expression analysis

Brennan H Baker; Sheela Sathyanarayana; Adam A Szpiro; James W MacDonald; Alison G Paquette

doi:10.1186/s13059-024-03376-7

Genome Biol. 2024 Sep 3;25(1):236. doi: 10.1186/s13059-024-03376-7.

Authors

Brennan H Baker^{1,2}, Sheela Sathyanarayana^{3,4,5,6}, Adam A Szpiro^#⁷, James W MacDonald^#³, Alison G Paquette^#^{3,5,8}

Affiliations

¹ Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA.
² Center for Child Health, Behavior, and Development, Seattle Children's Research Institute, Seattle, WA, USA.
³ Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA.
⁴ Center for Child Health, Behavior, and Development, Seattle Children's Research Institute, Seattle, WA, USA.
⁵ Department of Pediatrics, University of Washington, Seattle, WA, USA.
⁶ Department of Epidemiology, University of Washington, Seattle, WA, USA.
⁷ Department of Biostatistics, University of Washington, Seattle, WA, USA.
⁸ Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA, USA.

^# Contributed equally.

Abstract

Missing covariate data is a common problem that has not been addressed in observational studies of gene expression. Here, we present a multiple imputation method that accommodates high-dimensional gene expression data by incorporating principal component analysis of the transcriptome into the multiple imputation prediction models to avoid bias. Simulation studies using three datasets show that this method outperforms complete case and single imputation analyses at uncovering true positive differentially expressed genes, limiting false discovery rates, and minimizing bias. This method is easily implemented via the R Bioconductor package RNAseqCovarImpute, which integrates with the limma-voom pipeline for differential expression analysis.
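The idea in the abstract can be illustrated with a small numerical sketch. The Python code below is not the RNAseqCovarImpute API and does not use limma-voom. It is a simplified illustration on simulated data, and every function name in it (`top_pcs`, `impute_once`, `ols_coef_var`, `pool_rubin`) is hypothetical. It assumes a single continuous covariate with missing values, imputes it by stochastic regression on the exposure plus the top principal components of the expression matrix, and pools per-gene estimates with Rubin's rules. Rubin's rules are the standard pooling step for multiple imputation but are an assumption here, since the abstract does not name the pooling method. A full multiple imputation procedure would also draw the regression parameters from their posterior, which this sketch omits.

```python
import numpy as np

def top_pcs(expr, k):
    """Return the first k principal component scores of a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def impute_once(x, predictors, rng):
    """Fill NaNs in x with regression predictions plus random residual noise."""
    obs = ~np.isnan(x)
    design = np.column_stack([np.ones(len(x)), predictors])
    beta, *_ = np.linalg.lstsq(design[obs], x[obs], rcond=None)
    resid = x[obs] - design[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - design.shape[1]))
    filled = x.copy()
    n_missing = int((~obs).sum())
    filled[~obs] = design[~obs] @ beta + rng.normal(0.0, sigma, size=n_missing)
    return filled

def ols_coef_var(y, design, j):
    """Ordinary least squares: estimate and squared standard error of coefficient j."""
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    s2 = resid @ resid / (len(y) - design.shape[1])
    return beta[j], s2 * xtx_inv[j, j]

def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total (within + between) variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    within = variances.mean()
    between = estimates.var(ddof=1)
    return estimates.mean(), within + (1.0 + 1.0 / m) * between

# Simulated data (illustrative only): 60 samples, 200 genes, 5 imputations.
rng = np.random.default_rng(0)
n, g, m = 60, 200, 5
exposure = rng.normal(size=n)
covar = rng.normal(size=n)
expr = rng.normal(size=(n, g)) + 0.5 * covar[:, None]  # covariate shifts every gene

covar_missing = covar.copy()
covar_missing[rng.choice(n, size=15, replace=False)] = np.nan

# Imputation model: exposure plus transcriptome PCs, in place of all genes.
pcs = top_pcs(expr, k=5)
predictors = np.column_stack([exposure, pcs])
imputed = [impute_once(covar_missing, predictors, rng) for _ in range(m)]

# Fit one gene on each imputed dataset, then pool the exposure coefficient.
gene = expr[:, 0]
fits = [ols_coef_var(gene, np.column_stack([np.ones(n), exposure, c]), 1)
        for c in imputed]
est, total_var = pool_rubin([f[0] for f in fits], [f[1] for f in fits])
```

Summarizing the transcriptome with a few principal components keeps the imputation model small while still letting expression inform the imputed covariate values, which is the design choice the abstract describes.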

Keywords: Differential expression analysis; Gene expression; Missing data; Multiple imputation; RNA-sequencing.

MeSH terms

Gene Expression Profiling / methods
Humans
Principal Component Analysis
Sequence Analysis, RNA / methods
Software*
Transcriptome