Bioinformatics tools were used to identify prognosis-related molecular subtypes and biomarkers of hepatocellular carcinoma (HCC). Differential expression analysis of four datasets identified 3330 differentially expressed genes (DEGs) that were shared by all four datasets and changed in the same direction in each. These genes were involved in the cell cycle, the FOXO signaling pathway, and the complement and coagulation cascades. Non-negative matrix factorization identified two molecular subtypes of HCC with different prognoses, with subtype C2 showing better overall survival than subtype C1. Cox regression and Kaplan-Meier analysis showed that 217 of the overlapping DEGs were closely associated with HCC prognosis. The subset of those genes with an area under the curve >0.80 was used to construct random survival forest and least absolute shrinkage and selection operator (LASSO) models, which identified seven feature genes (SORBS2, DHRS1, SLC16A2, RCL1, IGFALS, GNA14, and FANCI) that may be involved in HCC occurrence and prognosis. Risk score and recurrence models were constructed from the feature genes, and a univariate Cox model identified FANCI as a key gene involved mainly in the cell cycle, DNA replication, and mismatch repair. Further analysis showed that FANCI had two mutation sites and may undergo methylation. Single-sample gene set enrichment analysis showed that enrichment scores for Th2 and T helper cells were significantly higher in HCC patients than in controls. Our results identify FANCI as a potential prognostic biomarker for HCC.
Keywords: FANCI; bioinformatics; hepatocellular carcinoma; molecular subtypes; prognostic biomarker.
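
The first step summarized in the abstract, keeping only DEGs that appear in every dataset and change in the same direction in each, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `consistent_degs`, the placeholder gene labels, and the log2 fold-change values are all hypothetical, and each dataset is assumed to be already filtered to its significant DEGs.

```python
from typing import Dict, List


def consistent_degs(datasets: List[Dict[str, float]]) -> Dict[str, str]:
    """Return genes that are DEGs in every dataset with the same direction.

    Each dataset maps a gene symbol to its log2 fold change and is assumed
    (hypothetically) to contain only genes that passed significance filtering.
    """
    if not datasets:
        return {}

    # Genes reported as DEGs in every dataset.
    shared = set(datasets[0])
    for dataset in datasets[1:]:
        shared &= set(dataset)

    result: Dict[str, str] = {}
    for gene in sorted(shared):
        changes = [dataset[gene] for dataset in datasets]
        if all(change > 0 for change in changes):
            result[gene] = "up"
        elif all(change < 0 for change in changes):
            result[gene] = "down"
        # Genes with mixed or zero changes are discarded.
    return result


# Toy input with invented values; gene labels are placeholders.
toy_datasets = [
    {"GENE_A": 2.1, "GENE_B": -1.5, "GENE_C": 1.2},
    {"GENE_A": 1.8, "GENE_B": -2.0, "GENE_C": -1.1},
    {"GENE_A": 2.5, "GENE_B": -1.2, "GENE_D": 3.0},
    {"GENE_A": 1.1, "GENE_B": -1.9, "GENE_C": 1.4},
]

overlap = consistent_degs(toy_datasets)
print(overlap)
```

In this toy input, GENE_C is dropped because it is absent from one dataset and changes sign in another, and GENE_D is dropped because it appears in only one dataset.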