Accurate assignment of disease liability to genetic variants using only population data

Genet Med. 2022 Jan;24(1):87-99. doi: 10.1016/j.gim.2021.08.012. Epub 2021 Nov 30.

Abstract

Purpose: The growing size of public variant repositories prompted us to test the accuracy of pathogenicity prediction of DNA variants using population data alone.

Methods: Under the a priori assumption that the ratios of variant prevalence in the healthy population to that in affected populations form 2 distinct distributions (pathogenic and benign), we used a Bayesian method to assign each variant a probability of belonging to either distribution.
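
The 2-distribution idea in the Methods can be illustrated with a minimal sketch. This is not the published BayPR model. It assumes, purely for illustration, that the log of the healthy-to-affected prevalence ratio is Gaussian within each class; the function names, component means and standard deviations, prior, and pseudocount are all hypothetical placeholders rather than values from the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_prevalence_ratio(count_healthy, n_healthy, count_affected, n_affected,
                         pseudocount=0.5):
    """Log of (prevalence in healthy sample / prevalence in affected sample).

    The pseudocount is an assumption added here to avoid log(0) when a
    variant is absent from one sample.
    """
    prev_healthy = (count_healthy + pseudocount) / (n_healthy + 2 * pseudocount)
    prev_affected = (count_affected + pseudocount) / (n_affected + 2 * pseudocount)
    return math.log(prev_healthy / prev_affected)


def posterior_pathogenic(log_ratio, prior_pathogenic=0.5,
                         path_mean=-4.0, path_sd=1.5,
                         benign_mean=0.0, benign_sd=1.0):
    """Bayes' rule over two assumed Gaussian components.

    All distribution parameters are hypothetical; a real analysis would
    estimate them from curated data.
    """
    weight_path = normal_pdf(log_ratio, path_mean, path_sd) * prior_pathogenic
    weight_benign = normal_pdf(log_ratio, benign_mean, benign_sd) * (1.0 - prior_pathogenic)
    return weight_path / (weight_path + weight_benign)


# Hypothetical counts: a variant seen once in 100000 healthy alleles
# and 50 times in 10000 affected alleles.
example_ratio = log_prevalence_ratio(1, 100000, 50, 10000)
example_posterior = posterior_pathogenic(example_ratio)
```

Under these assumed parameters, a variant strongly enriched in the affected sample receives a posterior close to 1, and a variant with similar prevalence in both samples receives a posterior close to 0.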

Results: The approach, termed Bayesian prevalence ratio (BayPR), accurately parsed 300 of 313 expertly curated CFTR variants: 284 of 296 pathogenic/likely pathogenic variants in 1 distribution and 16 of 17 benign/likely benign variants in another. BayPR produced an area under the receiver operating characteristic curve of 0.99 for 103 functionally confirmed missense CFTR variants, which equals or exceeds that of 10 commonly used algorithms (area under the receiver operating characteristic curve range = 0.54-0.99). Application of BayPR to expertly curated variants in 8 genes associated with 7 Mendelian conditions led to the assignment of a disease-causing probability of ≥80% to 1350 of 1374 (98.3%) pathogenic/likely pathogenic variants and of ≤20% to 22 of 23 (95.7%) benign/likely benign variants.
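
For readers less familiar with the metric reported in the Results, the following sketch shows one standard way to compute an area under the receiver operating characteristic curve from 2 groups of scores, using the pairwise (Mann-Whitney) formulation. The function name and the toy scores are hypothetical and are not data from the study; the last 2 lines only recompute the percentages quoted in the Results from the reported counts.

```python
def roc_auc(scores_pathogenic, scores_benign):
    """Probability that a random pathogenic score exceeds a random benign score.

    Ties count as half a win, which makes the result equal to the area
    under the ROC curve.
    """
    wins = 0.0
    for p in scores_pathogenic:
        for b in scores_benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(scores_pathogenic) * len(scores_benign))


# Toy, made-up scores: perfectly separated groups give an AUC of 1.0.
toy_auc = roc_auc([0.9, 0.8, 0.95], [0.1, 0.2])

# Percentages recomputed from the counts reported in the Results.
pct_pathogenic = round(100 * 1350 / 1374, 1)
pct_benign = round(100 * 22 / 23, 1)
```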

Conclusion: Irrespective of the variant type or functional effect, the BayPR approach provides probabilities of pathogenicity for DNA variants responsible for Mendelian disorders using only the variant counts in affected and unaffected population samples.

Keywords: Bayesian analysis; Population frequency; Prevalence ratio; Variant classification; Variant interpretation.

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Humans
  • Mutation, Missense*
  • ROC Curve