Analysis of genomic and transcriptomic variations as prognostic signature for lung adenocarcinoma

BMC Bioinformatics. 2020 Sep 30;21(Suppl 14):368. doi: 10.1186/s12859-020-03691-3.

Abstract

Background: Lung cancer causes the largest number of cancer deaths worldwide, and lung adenocarcinoma is the most common form of lung cancer. To understand the molecular basis of lung adenocarcinoma, integrative analyses have been performed using genomic, transcriptomic, epigenomic, proteomic and clinical data. In addition, molecular prognostic signatures have been generated for lung adenocarcinoma from gene expression levels in tumor samples. However, signatures are still needed that combine different types of molecular data, including cohort- or patient-based biomarkers that are candidates for molecular targeting.

Results: We built an R pipeline to carry out an integrated meta-analysis of genomic alterations (single-nucleotide variations and copy number variations), transcriptomic variations measured by RNA-seq, and clinical data from patients with lung adenocarcinoma in The Cancer Genome Atlas (TCGA) project. To construct a prognostic signature, we integrated significant genes carrying single-nucleotide variations or copy number variations, differentially expressed genes, and genes in active subnetworks. A Cox proportional hazards model with a Lasso penalty and leave-one-out cross-validation (LOOCV) was used to identify the best gene signature among the different gene categories. We determined a 12-gene signature (BCHE, CCNA1, CYP24A1, DEPTOR, MASP2, MGLL, MYO1A, PODXL2, RAPGEF3, SGK2, TNNI2, ZBTB16) for prognostic risk prediction based on the overall survival time of patients with lung adenocarcinoma. Patients in both the training and test data were clustered into high-risk and low-risk groups using risk scores calculated from the selected gene signature. The overall survival probability of these risk groups differed significantly in both the training and test datasets.
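The risk-scoring step described above can be illustrated with a minimal sketch. It assumes the common convention that a Cox-model risk score is the linear predictor (the sum of each gene's coefficient times its expression value) and that patients are split at the median training score; the abstract states neither the formula nor the cutoff, so both are assumptions. The coefficients, expression values, patient IDs and function names below are hypothetical and are not taken from the study.

```python
from statistics import median

# Hypothetical coefficients for three of the signature genes.
# These are illustrative numbers, NOT values reported by the study.
COEFFICIENTS = {"BCHE": -0.12, "CCNA1": 0.20, "MGLL": -0.05}


def risk_score(expression, coefficients):
    """Cox linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(scores, cutoff=None):
    """Label each patient 'high' if the score exceeds the cutoff, otherwise 'low'.

    If no cutoff is given, the median of the supplied scores is used
    (an assumed convention, not one stated in the abstract).
    """
    if cutoff is None:
        cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return groups, cutoff


# Made-up expression values for four hypothetical training patients.
train_expression = {
    "P1": {"BCHE": 2.0, "CCNA1": 5.0, "MGLL": 1.0},
    "P2": {"BCHE": 6.0, "CCNA1": 1.0, "MGLL": 4.0},
    "P3": {"BCHE": 3.0, "CCNA1": 3.0, "MGLL": 2.0},
    "P4": {"BCHE": 1.0, "CCNA1": 4.0, "MGLL": 0.0},
}

train_scores = {pid: risk_score(expr, COEFFICIENTS) for pid, expr in train_expression.items()}
train_groups, train_cutoff = assign_risk_groups(train_scores)

# A test patient is scored with the same coefficients and the training cutoff.
test_scores = {"T1": risk_score({"BCHE": 5.0, "CCNA1": 1.0, "MGLL": 3.0}, COEFFICIENTS)}
test_groups, _ = assign_risk_groups(test_scores, cutoff=train_cutoff)
```

Reusing the training cutoff for the test patients keeps the group definition fixed across datasets, which is one plausible way to compare survival between risk groups in held-out data.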

Conclusions: This 12-gene signature could predict the prognostic risk of patients with lung adenocarcinoma in TCGA, and its genes are potential predictors for survival-based risk clustering of these patients. The genes can be used to cluster patients by molecular profile, so that the best drug candidates can be proposed for each patient cluster. They also have high potential for targeted cancer therapy of patients with lung adenocarcinoma.

Keywords: Active subnetwork; CNV; Cox proportional hazards regression; Differential expression; Lung adenocarcinoma; Lung cancer; SNV; Signature; Survival; TCGA.

MeSH terms

  • Adenocarcinoma of Lung / genetics
  • Adenocarcinoma of Lung / mortality
  • Adenocarcinoma of Lung / pathology*
  • Area Under Curve
  • Cluster Analysis
  • DNA Copy Number Variations
  • Databases, Genetic
  • Gene Expression Regulation, Neoplastic
  • Genomics / methods*
  • Humans
  • Lung Neoplasms / genetics
  • Lung Neoplasms / mortality
  • Lung Neoplasms / pathology*
  • Neoplasm Staging
  • Prognosis
  • Proportional Hazards Models
  • Protein Interaction Maps / genetics
  • ROC Curve
  • Risk Factors
  • Survival Rate
  • Transcriptome*