Identification of 12 cancer types through genome deep learning

Sci Rep. 2019 Nov 21;9(1):17256. doi: 10.1038/s41598-019-53989-3.

Abstract

Cancer is a major cause of death worldwide, and early diagnosis is required for a favorable prognosis. Histological examination is the gold standard for cancer identification; however, there is a large amount of inter-observer variability in histological diagnosis. Numerous studies have shown that cancer genesis is accompanied by an accumulation of harmful mutations, which makes it possible to identify cancer from genomic information. We propose a method, GDL (genome deep learning), that uses deep neural networks to study the relationship between genomic variations and traits. We analyzed WES (whole-exome sequencing) mutation files from 6,083 samples across 12 cancer types obtained from TCGA (The Cancer Genome Atlas), together with WES data from 1,991 healthy samples from the 1000 Genomes Project. Based on GDL, we constructed 12 specific models, each distinguishing one cancer type from healthy tissue; a total-specific model that distinguishes cancer tissue from healthy tissue; and a mixture model that distinguishes among all 12 cancer types. We demonstrate that the accuracies of the specific, mixture, and total-specific models for cancer identification are 97.47%, 70.08%, and 94.70%, respectively. We have developed an efficient method for identifying cancer from genomic information, which offers a new direction for disease diagnosis.
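
The three model families described above differ mainly in how samples are labeled for training. The following is a minimal sketch of that labeling step only, not the authors' implementation. The function names, the `(sample_id, source_label)` record layout, the `"healthy"` marker, and the placeholder type names are all assumptions made for illustration. Excluding healthy samples from the mixture task is likewise an assumption, based on the abstract's wording.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (sample_id, source_label), where source_label is
# either a cancer type name or the string "healthy". Not taken from the paper.
Sample = Tuple[str, str]
Labeled = Tuple[str, int]

HEALTHY = "healthy"  # assumed marker for 1000 Genomes samples


def specific_labels(samples: Sequence[Sample], cancer_type: str) -> List[Labeled]:
    """Binary task for one cancer type: 1 = that type, 0 = healthy.

    Samples of any other cancer type are dropped.
    """
    labeled = []
    for sample_id, label in samples:
        if label == cancer_type:
            labeled.append((sample_id, 1))
        elif label == HEALTHY:
            labeled.append((sample_id, 0))
    return labeled


def total_specific_labels(samples: Sequence[Sample]) -> List[Labeled]:
    """Binary task over all samples: 1 = any cancer type, 0 = healthy."""
    return [(sid, 0 if label == HEALTHY else 1) for sid, label in samples]


def mixture_labels(samples: Sequence[Sample],
                   cancer_types: Sequence[str]) -> List[Labeled]:
    """Multi-class task: one class index per cancer type.

    Healthy samples are dropped here, which is an assumption of this sketch.
    """
    class_index = {name: i for i, name in enumerate(cancer_types)}
    return [(sid, class_index[label])
            for sid, label in samples if label in class_index]
```

Under these assumptions, the 12 specific models would each be trained on the output of `specific_labels` for one cancer type, while the total-specific and mixture models would each use a single labeled set.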

MeSH terms

  • Databases, Genetic
  • Deep Learning
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Mutation / genetics
  • Neoplasms / classification*
  • Neoplasms / genetics*
  • Neural Networks, Computer