[Identification of characteristic methylation sites in gastric cancer using genomics-based machine learning]

Zhonghua Bing Li Xue Za Zhi. 2021 Apr 8;50(4):363-368. doi: 10.3760/cma.j.cn112151-20201124-00863.
[Article in Chinese]

Abstract

Objective: To construct a prediction model for gastric cancer-related methylation using machine learning algorithms applied to genomic data. Methods: Gene mutation, gene expression, and methylation chip data for gastric cancer were downloaded from The Cancer Genome Atlas (TCGA) database. After feature selection, three models were constructed: a support vector machine with a radial basis function kernel, a random forest, and an error back propagation (BP) neural network. The models were then validated on a new data set. Results: Among the three machine learning models, the BP neural network had the highest test performance (F1 score = 0.89, Kappa = 0.66, area under the receiver operating characteristic curve = 0.93). Conclusion: Machine learning algorithms, particularly the BP neural network, can take advantage of genomic data to discover molecular markers and help identify characteristic methylation sites in gastric cancer.
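
The three metrics reported in the Results (F1 score, Kappa, and area under the curve) can be illustrated with a minimal sketch for a binary classifier. The abstract does not describe the authors' implementation, so the function names and the toy labels and scores below are assumptions made purely for illustration; they are not data from the study.

```python
# Illustrative only: toy labels/scores are invented, not taken from the study.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance agreement."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative
    (ties count as one half); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 4 positive and 4 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(f1_score(y_true, y_pred))     # 0.75
print(cohen_kappa(y_true, y_pred))  # 0.5
print(roc_auc(y_true, scores))      # 0.9375
```

F1 and Kappa depend on the chosen decision threshold, whereas the area under the curve is computed from the ranking of the scores alone, which is one reason the three values are usually reported together.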

MeSH terms

  • Algorithms
  • Genomics
  • Humans
  • Machine Learning
  • Methylation
  • Neural Networks, Computer
  • Stomach Neoplasms* / genetics