Background: Although several methods have been proposed for predicting the effects of genetic variants and their role in disease, identifying and prioritizing pathogenic variants in sequencing studies remains a challenge.
Methods: Here, we compare variant-specific and gene-specific features as well as existing prediction methods, and evaluate combinations of them to identify potential performance gains.
Results: We found that combining the number of "biological process" Gene Ontology annotations of a gene with the methods PON-P2 and PROVEAN significantly improves prediction of pathogenic variants, outperforming all individual methods. A comprehensive analysis of the Gene Ontology feature suggests that its contribution does not stem from a variant-dependent annotation bias but reflects the multifunctional nature of disease genes. Furthermore, we identified a set of difficult variants on which different prediction methods fail.
Conclusion: Existing pathogenicity prediction methods can be further improved by combining them with gene-level features such as the number of Gene Ontology annotations.
Keywords: Feature analysis; GO annotation bias; feature combination; gene features; pathogenic variant prediction; variant features.