Computational assessment of feature combinations for pathogenic variant prediction

Mol Genet Genomic Med. 2016 Mar 14;4(4):431-46. doi: 10.1002/mgg3.214. eCollection 2016 Jul.

Abstract

Background: Although several methods have been proposed for predicting the effects of genetic variants and their role in disease, it is still a challenge to identify and prioritize pathogenic variants within sequencing studies.

Methods: Here, we compare variant-specific and gene-specific features as well as existing methods, and investigate which combination of them yields the largest performance gain.

Results: We found that combining the number of "biological process" Gene Ontology annotations of a gene with the methods PON-P2 and PROVEAN significantly improves prediction of pathogenic variants, outperforming all individual methods. A comprehensive analysis of the Gene Ontology feature suggests that it is not a variant-dependent annotation bias but reflects the multifunctional nature of disease genes. Furthermore, we identified a set of difficult variants where different prediction methods fail.
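
As a rough illustration of what combining these three features could look like, the sketch below merges a gene's "biological process" GO annotation count with a PON-P2 output and a PROVEAN score into a single value. The abstract does not state which model performs the combination, so the logistic form, the log transform of the count, the class and function names, and the weights are all assumptions made for this example. The weights are picked by hand and are not fitted to any data.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantFeatures:
    go_bp_count: int  # number of "biological process" GO annotations of the gene
    pon_p2: float     # PON-P2 output, treated here as a probability in [0, 1]
    provean: float    # PROVEAN score; more negative means more likely deleterious


def combined_score(f, weights, bias=0.0):
    """Logistic combination of the three features (illustrative assumption)."""
    x = (
        math.log1p(f.go_bp_count),  # damp the heavy-tailed annotation count
        f.pon_p2,
        -f.provean,                 # flip sign so larger means more deleterious
    )
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, chosen by hand for illustration only.
EXAMPLE_WEIGHTS = (0.5, 2.0, 0.3)

# Two made-up variants that differ only in the gene's GO annotation count.
a = VariantFeatures(go_bp_count=2, pon_p2=0.4, provean=-1.0)
b = VariantFeatures(go_bp_count=20, pon_p2=0.4, provean=-1.0)
score_a = combined_score(a, EXAMPLE_WEIGHTS, bias=-2.0)
score_b = combined_score(b, EXAMPLE_WEIGHTS, bias=-2.0)
```

With a positive weight on the annotation count, the variant in the more heavily annotated gene receives the higher score. In practice the weights would be learned from labelled pathogenic and neutral variants rather than set by hand.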

Conclusion: Existing pathogenicity prediction methods can be further improved.

Keywords: Feature analysis; GO annotation bias; feature combination; gene features; pathogenic variant prediction; variant features.