Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome

PLoS Comput Biol. 2019 Jun 14;15(6):e1007112. doi: 10.1371/journal.pcbi.1007112. eCollection 2019 Jun.

Abstract

Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
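To make the term "non-frameshifting" concrete: an insertion or deletion within coding sequence preserves the reading frame when its net length change is a nonzero multiple of three nucleotides. The following is a minimal sketch of that check only. It is not part of MutPred-Indel, and the function name and the VCF-style reference/alternate allele strings are assumptions made for illustration. It also assumes the variant lies entirely within coding sequence and does not touch a splice site.

```python
def is_non_frameshifting_indel(ref: str, alt: str) -> bool:
    """Return True if ref -> alt is an indel that preserves the reading frame.

    Hypothetical helper: ref and alt are nucleotide strings in VCF style,
    e.g. ref="ACTG", alt="A" for a 3-nucleotide deletion.
    """
    delta = len(alt) - len(ref)  # net change in sequence length
    # A length change of zero is a substitution, not an indel.
    # A nonzero change divisible by 3 adds or removes whole codons.
    return delta != 0 and delta % 3 == 0


# Illustrative alleles (made up for this sketch, not taken from the paper)
examples = [
    ("ACTG", "A"),     # 3-nt deletion: in frame
    ("A", "ACTGCTG"),  # 6-nt insertion: in frame
    ("A", "AT"),       # 1-nt insertion: frameshift
    ("ACT", "A"),      # 2-nt deletion: frameshift
    ("A", "G"),        # substitution: not an indel
]
results = [is_non_frameshifting_indel(ref, alt) for ref, alt in examples]
```

A check like this only separates in-frame from frameshifting indels; predicting whether an in-frame indel is pathogenic is the harder problem the abstract addresses.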

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Autism Spectrum Disorder / genetics
  • Autism Spectrum Disorder / physiopathology
  • Computational Biology
  • Databases, Genetic
  • Genetic Predisposition to Disease / genetics*
  • Genome, Human* / genetics
  • Genome, Human* / physiology
  • Humans
  • INDEL Mutation* / genetics
  • INDEL Mutation* / physiology
  • Machine Learning
  • ROC Curve