Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

Mol Syst Biol. 2020 Jul;16(7):e9380. doi: 10.15252/msb.20199380.

Abstract

To deal with the huge number of novel protein-coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top-ranking predictors at this task, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stands out, showing both the strongest correlations with DMS data and the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based on their performance in these analyses.
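The abstract describes two evaluations: how well each VEP's scores agree with each DMS data set, and how well VEP scores or DMS measurements separate pathogenic from benign variants. Below is a minimal, stdlib-only sketch of two common metrics for these tasks. It assumes Spearman's rank correlation for the first and ROC AUC for the second. The function names (`average_ranks`, `spearman`, `roc_auc`) and the toy scores are hypothetical illustrations and are not taken from the authors' code or data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / (sxx * syy) ** 0.5


def roc_auc(scores, labels):
    """AUC: the probability that a random positive (pathogenic) variant
    scores higher than a random negative (benign) one; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: DMS fitness (1 = wild-type-like) and VEP
    # scores (higher = predicted more damaging) for five substitutions.
    dms = [0.95, 0.10, 0.80, 0.05, 0.60]
    vep = [0.10, 0.85, 0.30, 0.90, 0.40]
    # Sign conventions differ between VEPs, so compare magnitudes here.
    print("|Spearman|:", abs(spearman(dms, vep)))
    # Hypothetical labels: 1 = pathogenic, 0 = benign.
    print("AUC:", roc_auc([0.9, 0.8, 0.3, 0.2, 0.8], [1, 1, 0, 0, 0]))
```

Taking the absolute correlation is an assumed convention that lets predictors with opposite sign conventions be compared on one scale. To score DMS measurements as a classifier with the same `roc_auc` function, you would negate the fitness values so that lower fitness means "more likely pathogenic".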

Keywords: missense mutations; phenotype prediction; protein structure; saturation mutagenesis; variant effect.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Substitution / genetics*
  • Benchmarking
  • Computational Biology / methods*
  • Correlation of Data
  • Crystallography, X-Ray
  • Databases, Genetic
  • Databases, Protein
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Genetic Predisposition to Disease*
  • Genotype
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mutation, Missense
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Protein Domains / genetics
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism*
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism
  • Software

Substances

  • Proteins

Associated data

  • figshare/10.6084/m9.figshare.12369359.v1
  • figshare/10.6084/m9.figshare.12369452.v1