Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning

Protein Sci. 2020 Jan;29(1):247-257. doi: 10.1002/pro.3774. Epub 2019 Nov 25.

Abstract

Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of resistance to drugs in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of the impacts of missense mutations on gene expression and regulation, and of how they disrupt protein function by modulating protein stability or disturbing interactions with proteins, nucleic acids, small-molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are the most accurate but are also time-consuming and costly. Computational tools used to predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. We also describe DUET, which uses machine learning to combine the two approaches. We briefly discuss the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understand human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections.
STATEMENT FOR A BROADER AUDIENCE: Genetic or somatic changes in human genes can lead to mutations in proteins that give rise to genetic disorders or cancer, while changes in the genes of pathogens can lead to drug resistance. The computer software described here applies statistical approaches or machine learning to information from genome sequencing of humans and pathogens, together with experimental or modeled 3D structures of the gene products (the proteins), to predict the impacts of mutations in genetic disease, cancer, and drug resistance.

Keywords: amino acid substitution probabilities; drug resistance; genetic disorders; machine learning; mutations; protein stability and interactions; protein structure.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Drug Resistance
  • Genetic Predisposition to Disease
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Machine Learning
  • Models, Molecular
  • Mutation*
  • Protein Binding
  • Protein Conformation
  • Protein Stability
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism*
  • Sequence Analysis, DNA
  • Software

Substances

  • Proteins