Using Machine Learning for Predicting the Effect of Mutations in the Initiation Codon

IEEE J Biomed Health Inform. 2022 Nov;26(11):5750-5756. doi: 10.1109/JBHI.2022.3200966. Epub 2022 Nov 10.

Abstract

The effect of mutations has traditionally been predicted by studying what may happen when one amino acid is substituted for another. This approach may be effective for mutations that affect the function of the protein, but it is ineffective for mutations in the translation initiation codon, because such mutations might prevent the protein from being produced at all. Consequently, specific methods for predicting the effect of mutations in the translation initiation codon are needed. We propose a method for predicting the effect of mutations in the canonical translation initiation codon based on a biological model that considers features specific to such mutations, such as the distance to a potential alternative initiation codon. Our predictor was developed using tree-based machine learning algorithms and data extracted from Ensembl. Our final model detects whether a mutation in the canonical initiation codon is deleterious or benign with a precision of 44.28% and an accuracy of 98.32%, improving on state-of-the-art tools such as PolyPhen, SIFT, and CADD for this type of mutation.
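The abstract does not define exactly how the distance to a potential alternative initiation codon is computed. The sketch below makes several assumptions. It treats the alternative start as the first ATG found downstream of the canonical ATG within the annotated coding sequence, and it ignores near-cognate starts such as CTG or GTG. Coordinates are 0-based on a transcript sequence. The function name `alternative_start_features` and the `in_frame` and `lost_fraction` fields are hypothetical illustrations, not the authors' published feature set.

```python
from typing import Optional, TypedDict


class AltStartFeatures(TypedDict):
    """Hypothetical feature record for one start-codon mutation."""
    distance_nt: Optional[int]      # nucleotides from canonical ATG to next ATG
    in_frame: Optional[bool]        # True if that ATG keeps the original reading frame
    lost_fraction: Optional[float]  # fraction of the CDS skipped if translation starts there


def alternative_start_features(transcript: str, cds_start: int, cds_end: int) -> AltStartFeatures:
    """Locate the first ATG downstream of the canonical start codon.

    Assumptions (illustrative only):
      transcript: transcript sequence in the DNA alphabet, 0-based indexing.
      cds_start:  index of the 'A' of the canonical ATG.
      cds_end:    index one past the last CDS nucleotide (stop codon included).
    Only ATG is considered as an alternative start; near-cognate codons are ignored.
    """
    seq = transcript.upper()
    if seq[cds_start:cds_start + 3] != "ATG":
        raise ValueError("cds_start does not point at an ATG")
    # str.find only reports matches lying entirely within [start, end),
    # so the alternative ATG must fit inside the annotated CDS.
    pos = seq.find("ATG", cds_start + 1, cds_end)
    if pos == -1:
        # No downstream ATG: translation could not simply restart further along the CDS.
        return {"distance_nt": None, "in_frame": None, "lost_fraction": None}
    distance = pos - cds_start
    return {
        "distance_nt": distance,
        "in_frame": distance % 3 == 0,
        "lost_fraction": distance / (cds_end - cds_start),
    }


if __name__ == "__main__":
    # Toy transcript: 5' UTR "GCC", CDS ATG GCT AAA ATG CCC TAA, 3' UTR "GG".
    toy = "GCC" + "ATGGCTAAAATGCCCTAA" + "GG"
    print(alternative_start_features(toy, cds_start=3, cds_end=21))
```

On the toy transcript, the next ATG lies 9 nt downstream and in frame, so roughly half of the coding sequence would be skipped. Features of this kind would then be combined with others and passed to a tree-based classifier. The abstract does not state which tree-based algorithm or hyperparameters the authors selected.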

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon / genetics
  • Codon, Initiator
  • Humans
  • Machine Learning*
  • Mutation / genetics

Substances

  • Codon, Initiator
  • Codon