In silico saturation mutagenesis of cancer genes

Nature. 2021 Aug;596(7872):428-432. doi: 10.1038/s41586-021-03771-1. Epub 2021 Jul 28.

Abstract

Despite the existence of good catalogues of cancer genes [1,2], identifying the specific mutations of those genes that drive tumorigenesis across tumour types is still a largely unsolved problem. As a result, most mutations identified in cancer genes across tumours are of unknown significance to tumorigenesis [3]. We propose that the mutations observed in thousands of tumours (natural experiments testing their oncogenic potential, replicated across individuals and tissues) can be exploited to solve this problem. From these mutations, features that describe the mechanism of tumorigenesis of each cancer gene and tissue may be computed and used to build machine learning models that encapsulate these mechanisms. Here we demonstrate the feasibility of this solution by building and validating 185 gene-tissue-specific machine learning models that outperform experimental saturation mutagenesis in the identification of driver and passenger mutations. The models and their assessment of each mutation are designed to be interpretable, thus avoiding a black-box prediction device. Using these models, we outline the blueprints of potential driver mutations in cancer genes, and demonstrate the role of mutation probability in shaping the landscape of observed driver mutations. These blueprints will support the interpretation of newly sequenced tumours in patients and the study of the mechanisms of tumorigenesis of cancer genes across tissues.
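The abstract describes the models only at a high level. As a rough illustration of what "gene-tissue-specific" and "interpretable" can mean in practice, the sketch below assumes one additive (logistic) score per gene-tissue pair, whose log-odds decomposes into per-feature contributions that explain each call. This is not the authors' model: the gene and tissue keys, the feature names (`in_cluster`, `in_domain`, `conservation`), the weights, the bias and the 0.5 threshold are all hypothetical toy values.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearDriverModel:
    """Hypothetical interpretable model for a single (gene, tissue) pair."""
    weights: dict[str, float]
    bias: float

    def contributions(self, features: dict[str, float]) -> dict[str, float]:
        # Additive contribution of each feature to the log-odds score;
        # features absent from the input are treated as 0.
        return {name: w * features.get(name, 0.0) for name, w in self.weights.items()}

    def driver_probability(self, features: dict[str, float]) -> float:
        # Logistic transform of bias + summed contributions.
        log_odds = self.bias + sum(self.contributions(features).values())
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical registry: one model per (gene, tissue) key, toy weights only.
MODELS: dict[tuple[str, str], LinearDriverModel] = {
    ("GENE_A", "TISSUE_X"): LinearDriverModel(
        weights={"in_cluster": 2.0, "in_domain": 1.0, "conservation": 0.5},
        bias=-2.0,
    ),
}


def classify(gene: str, tissue: str, features: dict[str, float], threshold: float = 0.5):
    """Return (label, probability, per-feature contributions) for one mutation."""
    model = MODELS.get((gene, tissue))
    if model is None:
        raise KeyError(f"no model for {gene} in {tissue}")
    p = model.driver_probability(features)
    label = "driver" if p >= threshold else "passenger"
    return label, p, model.contributions(features)


if __name__ == "__main__":
    label, p, contrib = classify(
        "GENE_A", "TISSUE_X", {"in_cluster": 1.0, "in_domain": 1.0, "conservation": 2.0}
    )
    print(label, round(p, 3), contrib)
```

With these toy weights the example mutation gets log-odds -2 + 2 + 1 + 1 = 2, so it is labelled a driver, and the returned contributions show which features pushed it there. Keeping the score additive is one simple way to avoid a black-box device; a real model would learn its weights (or a non-linear function) from observed mutations rather than use hand-set values.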

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Transformation, Neoplastic / genetics
  • Computer Simulation*
  • Humans
  • Machine Learning*
  • Models, Genetic
  • Mutagenesis*
  • Mutation*
  • Neoplasms / genetics*
  • Oncogenes / genetics*
  • Organ Specificity / genetics
  • Precision Medicine
  • Probability
  • Reproducibility of Results