Annotation of Human Exome Gene Variants with Consensus Pathogenicity

Victor Jaravine; James Balmford; Patrick Metzger; Melanie Boerries; Harald Binder; Martin Boeker

doi:10.3390/genes11091076

Genes (Basel). 2020 Sep 14;11(9):1076. doi: 10.3390/genes11091076.

Authors

Victor Jaravine¹, James Balmford¹, Patrick Metzger², Melanie Boerries², Harald Binder¹, Martin Boeker¹

Affiliations

¹ Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79104 Freiburg im Breisgau, Germany.
² Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg im Breisgau, Germany.

Abstract

A novel approach was developed to address the challenge of annotating, with phenotypic effects, those exome variants for which relevant empirical data are lacking or minimal. The predictive annotation method is implemented as a stacked ensemble of supervised base-learners, including distributed random forest and gradient boosting machines. Ensemble models were trained and cross-validated on evidence-based categorical variant effect annotations from the ClinVar database and were then applied to 84 million non-synonymous single-nucleotide variants (SNVs). The consensus model combined 39 functional mutation impacts, a cross-species conservation score, and a gene indispensability score. The indispensability score, which accounts for differences in variant pathogenicity across genes, including essential and mutation-tolerant genes, considerably improved the predictions. The consensus combination is consistent with as many input scores as possible while minimizing false predictions. The input scores are ranked by their ability to predict effects. The score rankings and categorical phenotypic variant effect predictions are intended for direct use in clinical and biological applications to prioritize human exome variants and mutations.
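To make the idea of ranking input scores by predictive ability concrete, the following is a minimal sketch, not the authors' procedure. It assumes that each input score is compared against binary labels (1 = pathogenic, 0 = benign) and uses the area under the ROC curve as the ranking criterion. That criterion is an assumption chosen for illustration: the abstract does not state which metric was used. The score names and values are made-up toy data.

```python
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen pathogenic variant
    (label 1) scores higher than a randomly chosen benign one (label 0);
    tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_scores(
    table: Dict[str, List[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Return (score name, AUC) pairs, best-performing score first."""
    ranked = [(name, auc(values, labels)) for name, values in table.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six variants, three illustrative input scores.
labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "score_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],  # separates the classes fully
    "score_b": [0.9, 0.2, 0.7, 0.8, 0.3, 0.1],  # partially informative
    "score_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uninformative (all ties)
}

ranking = rank_scores(toy_scores, labels)
for name, value in ranking:
    print(f"{name}: AUC = {value:.3f}")
```

The pairwise computation is quadratic in the number of variants, which is acceptable for a toy table but would need a rank-based implementation at the scale of millions of variants.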

Keywords: alternative allele frequency (AAF); hit ratio (HR); next generation sequencing (NGS); single-nucleotide variant (SNV); stacked ensemble of supervised learners (SESL); variant effect prediction (VEP); variant of unknown significance (VUS).

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Disease / genetics*
Exome Sequencing
Exome*
Genome, Human
High-Throughput Nucleotide Sequencing
Humans
Molecular Sequence Annotation*
Mutation*
Polymorphism, Single Nucleotide*
Software