Objective: To screen for differentially expressed genes (DEGs) associated with myelodysplastic syndrome (MDS) using the Gene Expression Omnibus (GEO) database, and to explore the core genes and pathogenesis of MDS by analyzing the biological functions and related signaling pathways of the DEGs.
Methods: The expression profiles GSE4619, GSE19429, and GSE58831, each including MDS patients and normal controls, were downloaded from the GEO database. GEO2R, the gene expression analysis tool of the GEO database, was used to screen DEGs using the criteria |log FC (fold change)| ≥ 1 and P < 0.01. The DAVID online database was used for Gene Ontology (GO) functional annotation, and the Metascape online database was used for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs. A protein-protein interaction (PPI) network was constructed using the STRING database, and the CytoHubba and MCODE plug-ins of Cytoscape were used to identify key gene clusters and hub genes. R was used to assess the diagnostic performance of the hub genes and to draw receiver operating characteristic (ROC) curves. Gene set enrichment analysis (GSEA) was performed on GSE19429 according to the expression level of LEF1.
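The screening step can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used GEO2R), and it rests on assumptions the abstract does not state: that log FC is a log2 fold change, that the P value is used as reported, and that a common DEG must pass the thresholds with the same direction in every dataset. The gene names and values are toy placeholders, and `screen_deg` and `co_deg` are hypothetical helper names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-dataset table: gene symbol -> (log2 fold change, P value).
Table = Dict[str, Tuple[float, float]]


def screen_deg(table: Table, lfc_cutoff: float = 1.0,
               p_cutoff: float = 0.01) -> Tuple[Set[str], Set[str]]:
    """Split one dataset into up- and down-regulated DEGs
    using |log FC| >= lfc_cutoff and P < p_cutoff."""
    up = {g for g, (lfc, p) in table.items()
          if p < p_cutoff and lfc >= lfc_cutoff}
    down = {g for g, (lfc, p) in table.items()
            if p < p_cutoff and lfc <= -lfc_cutoff}
    return up, down


def co_deg(tables: List[Table]) -> Tuple[Set[str], Set[str]]:
    """Keep genes that are DEGs with the same direction in all datasets
    (assumed definition of a common DEG)."""
    ups, downs = zip(*(screen_deg(t) for t in tables))
    return set.intersection(*ups), set.intersection(*downs)


# Toy values for illustration only; not taken from GSE4619/GSE19429/GSE58831.
toy = [
    {"GENE_A": (-1.8, 0.0004), "GENE_B": (2.1, 0.001), "GENE_C": (0.4, 0.2)},
    {"GENE_A": (-1.2, 0.003), "GENE_B": (1.5, 0.002), "GENE_C": (-1.3, 0.005)},
    {"GENE_A": (-2.0, 0.0001), "GENE_B": (1.1, 0.008), "GENE_C": (-1.1, 0.03)},
]
co_up, co_down = co_deg(toy)
print("co-up:", sorted(co_up), "co-down:", sorted(co_down))
```

In this toy input, GENE_C is excluded because it fails the thresholds in the first and third tables, which shows why the intersection is stricter than any single-dataset screen.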
Results: A total of 74 common DEGs (co-DEGs) were identified, including 14 up-regulated and 60 down-regulated genes. GO enrichment analysis indicated that, for the down-regulated genes, biological process (BP) terms were mainly enriched in transcription from the RNA polymerase II promoter and its regulation, negative regulation of cell proliferation, and immune response; cellular component (CC) terms were mainly enriched in the nucleus, transcription factor complexes, and focal adhesions; and molecular function (MF) terms were mainly enriched in protein binding, DNA binding, and β-catenin binding. KEGG pathways were mainly enriched in primary immunodeficiency, the Hippo signaling pathway, the cAMP signaling pathway, transcriptional misregulation in cancer, and hematopoietic cell lineage. For the up-regulated genes, BP terms were mainly enriched in the type I interferon signaling pathway and response to virus, CC terms in the cytoplasm, and MF terms in RNA binding. Ten hub genes and three important gene clusters were identified using the STRING database and Cytoscape software. Functional enrichment showed that all three key gene clusters were closely related to immune regulation. ROC analysis showed that the hub genes had good diagnostic value for MDS. GSEA indicated that LEF1 may affect the normal function of hematopoietic stem cells by regulating processes such as the inflammatory response, which sheds further light on the pathogenesis of MDS.
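The ROC analysis of a single hub gene can be sketched as follows. The study used R; this Python version is only an illustration of the underlying calculation, with toy expression values and hypothetical helper names (`roc_points`, `auc`). It assumes a down-regulated gene, so the expression value is negated to make a higher score indicate MDS.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points,
    sweeping the threshold from the highest score downwards.
    labels: 1 = MDS, 0 = control. Tied scores move together."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda x: -x[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy expression values for one down-regulated gene (illustration only).
expression = [2.1, 2.5, 3.0, 4.2, 4.8, 5.1]
labels = [1, 1, 1, 0, 1, 0]          # 1 = MDS, 0 = control
scores = [-x for x in expression]    # lower expression -> higher score
print("AUC:", auc(roc_points(scores, labels)))
```

An AUC near 0.5 would indicate no discrimination between patients and controls, while values approaching 1 indicate better separation.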
Conclusion: Bioinformatics methods can effectively screen for the core genes and key signaling pathways of MDS, providing a new strategy for the diagnosis and treatment of MDS.
Title: Bioinformatics analysis of core genes and key pathways in myelodysplastic syndrome.
Keywords: LEF1 gene; bioinformatics; differentially expressed gene; myelodysplastic syndrome; pathogenesis.