PredCID: prediction of driver frameshift indels in human cancer

Brief Bioinform. 2021 May 20;22(3):bbaa119. doi: 10.1093/bib/bbaa119.

Abstract

Discriminating driver from passenger mutations is a long-standing problem in cancer biology. Although recent advances have improved the identification of driver mutations in cancer genomic research, no computational method yet specifically targets cancer frameshift indels (insertions and/or deletions). In addition, existing pathogenic frameshift indel predictors may produce many missing values, because different transcripts can be chosen during variant annotation. In this study, we propose a computational model, called PredCID (Predictor for Cancer driver frameshift InDels), for accurately predicting cancer driver frameshift indels. Gene-, DNA-, transcript- and protein-level features are combined and selected for classification with an eXtreme Gradient Boosting classifier. Benchmarking on the cross-validation dataset and an independent dataset showed that PredCID achieves better and more robust performance than existing non-cancer-specific methods in distinguishing cancer driver frameshift indels from passengers. It is therefore a valuable method for a deeper understanding of frameshift indels in human cancer. PredCID is freely available for academic research at http://bioinfo.ahu.edu.cn:8080/PredCID.
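
For readers unfamiliar with the term, the sketch below illustrates what makes an indel a frameshift: the net number of inserted or deleted bases in a coding sequence is not a multiple of three, so the downstream reading frame changes. This is a minimal illustration and not part of PredCID; the function names `indel_length` and `is_frameshift` are hypothetical, and the check assumes VCF-style reference and alternate alleles that lie entirely within a coding region.

```python
def indel_length(ref: str, alt: str) -> int:
    """Net number of bases gained (positive) or lost (negative)."""
    return len(alt) - len(ref)


def is_frameshift(ref: str, alt: str) -> bool:
    """True when the net length change is nonzero and not a multiple of 3.

    Assumption: both alleles fall inside a coding region, so only the
    length change decides whether the reading frame is preserved.
    """
    delta = indel_length(ref, alt)
    return delta != 0 and delta % 3 != 0


# VCF-style alleles: the first base is the shared anchor base.
print(is_frameshift("A", "AG"))    # 1-base insertion
print(is_frameshift("ATCG", "A"))  # 3-base deletion (in-frame)
print(is_frameshift("ATC", "A"))   # 2-base deletion
```

A check like this only separates frameshift from in-frame indels. Deciding whether a given frameshift indel is a driver or a passenger is the classification task the abstract describes.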

Keywords: cancer; driver mutation; frameshift indel; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Frameshift Mutation*
  • Genes, Neoplasm*
  • Humans
  • INDEL Mutation*
  • Neoplasm Proteins / genetics*
  • Neoplasms / genetics*
  • Software*

Substances

  • Neoplasm Proteins