Collagen family genes and related genes might be associated with prognosis of patients with gastric cancer: an integrated bioinformatics analysis and experimental validation

Transl Cancer Res. 2020 Oct;9(10):6246-6262. doi: 10.21037/tcr-20-1726.

Abstract

Background: Gastric cancer (GC) is a disease with high morbidity. The purpose of this study was to identify genes that are essential to GC development in patients and to reveal the mechanisms underlying its progression.

Methods: Bioinformatics analysis is an effective tool for discovering genes that are essential in different disease states. We used the Gene Expression Omnibus (GEO) database to identify differentially expressed genes (DEGs); the DAVID online tool to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs; the STRING database to construct a protein-protein interaction (PPI) network of the DEGs; the Oncomine and The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) databases to analyze differences in gene expression; the Human pan-Cancer Methylation database (MethHC) to compare the DNA methylation of the genes; and the Kaplan-Meier plotter to perform survival analysis of the DEGs. We performed real-time quantitative PCR (RT-qPCR) experiments to validate the results of our analysis.
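GO and KEGG enrichment asks whether a term is annotated to more DEGs than chance alone would predict. The sketch below computes the standard one-sided hypergeometric p-value for that question. It is an illustration of over-representation analysis in general, not the exact DAVID computation: DAVID reports a modified Fisher exact p-value (the EASE score), and the function name and toy counts here are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided (upper-tail) hypergeometric p-value, P(X >= k).

    Hypothetical helper, not DAVID's EASE score.
    k: DEGs annotated with the term
    n: total DEGs in the input list
    K: background genes annotated with the term
    N: total background genes
    """
    if not (0 <= k <= n <= N and 0 <= k <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # ways to draw n genes from the background
    # Sum the probabilities of seeing k or more annotated genes among the n drawn.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

For example, with the toy counts `hypergeom_enrichment_p(3, 5, 4, 10)` the result is 66/252, about 0.262. In a real analysis the p-values for all tested terms would usually also be adjusted for multiple testing.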

Results: After integrating four GEO Series (GSE) datasets, we identified 407 DEGs. GO and KEGG pathway analyses indicated that the upregulated DEGs were significantly enriched in extracellular matrix (ECM)-related functions and pathways, and that the main DEGs among them were collagens (COLs). The downregulated DEGs were enriched in ethanol oxidation. Several groups of DEGs, such as the insulin-like growth factor binding protein (IGFBP), COL and serpin peptidase inhibitor (SERPIN) gene families, formed several PPI networks. In the Oncomine database, all of the COL genes were highly expressed in breast cancer, esophageal cancer, GC, head and neck cancer and pancreatic cancer compared with normal tissues. Consistently, in the TCGA-STAD database, most COLs were highly expressed and showed altered DNA methylation in GC patients. Kaplan-Meier plotter analysis showed that some of these COL genes were associated with worse prognosis in GC patients. Our RT-qPCR results showed that collagen type III α1 chain (COL3A1) was highly expressed in GC cells, and that collagen type V α1 chain (COL5A1) was highly expressed in GC cells other than AGS cells, which was consistent with our analysis.
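The prognostic association above rests on Kaplan-Meier survival curves, where patients are typically split into high- and low-expression groups for a gene and one curve is drawn per group. The sketch below is a minimal, assumption-labeled implementation of the Kaplan-Meier estimator for a single group; the function name and toy data are hypothetical, and it does not reproduce how the Kaplan-Meier plotter chooses its expression cutoff.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Hypothetical helper: (time, survival) steps of the Kaplan-Meier estimator.

    events[i] == 1 means the event (e.g. death) was observed at times[i];
    events[i] == 0 means the patient was censored at times[i].
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    at_risk = len(data)  # patients still under observation
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient sharing this time; censored patients at t
        # still count as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

For the toy input `times=[1, 2, 2, 3, 4]`, `events=[1, 1, 0, 1, 0]`, the curve steps down to about 0.8, 0.6 and 0.3 at times 1, 2 and 3. Formally comparing the high- and low-expression curves would additionally require a log-rank test and a hazard-ratio estimate, which this sketch omits.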

Conclusions: Collagen (COL) family genes might serve as markers of GC progression and prognosis.

Keywords: Gastric cancer (GC); bioinformatics analysis; collagens; experimental validation; prognosis.