Predicting Mendelian disease-causing non-synonymous single nucleotide variants in exome sequencing studies

PLoS Genet. 2013;9(1):e1003143. doi: 10.1371/journal.pgen.1003143. Epub 2013 Jan 17.

Abstract

Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase prediction accuracy. Here, we propose using a logit model to combine multiple prediction methods and compute an unbiased probability that a rare variant is pathogenic. We also assess, for the first time, the power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and the logit model) to distinguish pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering in exome-sequencing studies. We found that a logit model combining all or some of the original prediction methods outperforms the other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that around 5% of an individual's rare nsSNVs are pathogenic and that an individual carries at least ~22 pathogenic derived alleles, which, if made homozygous through consanguineous marriage, may lead to recessive diseases.
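As a rough illustration of the idea of combining several prediction scores through a logit model, the sketch below fits a small logistic regression by gradient descent and returns a combined probability of pathogenicity. It is not the authors' implementation: the function names, the two-score layout, the learning settings, and the toy scores are all assumptions made for illustration, and the toy values are invented rather than taken from any real tool or dataset. The sketch also does not attempt any correction for the proportion of pathogenic variants in the training set.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def combined_probability(scores, weights, intercept):
    """Logit combination: P = sigmoid(intercept + sum_i w_i * s_i)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    z = intercept + sum(w * s for w, s in zip(weights, scores))
    return sigmoid(z)


def fit_logit(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on mean log-loss.

    lr and epochs are arbitrary illustrative settings, not tuned values.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = combined_probability(xi, weights, intercept) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        intercept -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, intercept


# Invented toy data (NOT real scores): each row holds two hypothetical
# per-variant deleteriousness scores in [0, 1]; label 1 = pathogenic.
X = [[0.9, 0.8], [0.7, 0.95], [0.85, 0.6],
     [0.2, 0.1], [0.3, 0.4], [0.1, 0.25]]
y = [1, 1, 1, 0, 0, 0]

weights, intercept = fit_logit(X, y)
p_high = combined_probability([0.9, 0.9], weights, intercept)
p_low = combined_probability([0.1, 0.1], weights, intercept)
print(f"P(pathogenic | high scores) = {p_high:.3f}")
print(f"P(pathogenic | low scores)  = {p_low:.3f}")
```

In practice one would more likely fit such a model with an established statistics library and evaluate it on held-out variants; the hand-written loop here only keeps the example dependency-free.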

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Substitution / genetics*
  • Consanguinity
  • Exome
  • Gene Frequency / genetics*
  • Genes, Recessive
  • Genetic Predisposition to Disease*
  • Humans
  • Models, Genetic
  • Mutation
  • Polymorphism, Single Nucleotide / genetics*
  • Sequence Analysis, DNA
  • Software

Grants and funding

This work was funded by Hong Kong Research Grants Council GRF HKU 768610M, HKU 776412M, and HKU 777511M; Hong Kong Research Grants Council Theme-Based Research Scheme T12-705/11; European Community Seventh Framework Programme Grant on European Network of National Schizophrenia Networks Studying Gene-Environment Interactions (EU-GEI); the Small Project Funding HKU 201007176166 and 201109176063; and The University of Hong Kong Strategic Research Theme on Genomics. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.