Background: Breast cancer is a heterogeneous disease. Small tumors with extensive lymph node involvement (STEL) often reflect a biologically aggressive phenotype and a poor prognosis. The aim of this study was to identify key genes associated with STEL and to investigate their prognostic value in breast cancer. Methods: RNA sequencing data from breast cancer specimens were acquired from The Cancer Genome Atlas (TCGA) database for differential expression analysis. Weighted gene correlation network analysis (WGCNA) was performed to identify coexpressed gene modules associated with tumor size and lymph node metastasis. Gene set enrichment analysis (GSEA) was employed to investigate the biological functions of the identified genes. LASSO and Cox regression analyses were combined to establish a risk-predictive signature, and time-dependent receiver operating characteristic (tdROC) and Kaplan-Meier analyses were used to evaluate its predictive accuracy. Quantitative RT-PCR was employed to validate the expression levels of the key genes in the signature. Results: A total of 2777 genes from three coexpressed gene modules were identified by WGCNA, and 880 differentially expressed genes were identified by differential expression analysis. The 63 genes identified by both methods were considered STEL-associated genes, and a 9-gene risk-predictive signature was established from them, with AUCs at 3, 5, and 7 years of 0.810, 0.811, and 0.753, respectively. Conclusion: This study characterized the transcriptomic profile of STEL breast cancer and established a risk-predictive signature with satisfactory accuracy. These findings may provide insights into the genetic etiology of breast cancer.
Keywords: LASSO; WGCNA; breast cancer; lymph node metastasis; risk score.
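As an illustration of two steps summarized above (taking the genes shared by the WGCNA modules and the differentially expressed set, then scoring patients with a signature), the following is a minimal Python sketch. It assumes the risk score is a linear combination of expression values weighted by regression coefficients, which is the usual form for LASSO-Cox signatures but is not stated in the abstract. The gene names, coefficients, expression values, and the median cutoff used to split patients into groups are all hypothetical placeholders, not values from the study.

```python
from statistics import median


def overlap_genes(module_genes, deg_genes):
    """Return the sorted genes present in both input collections."""
    return sorted(set(module_genes) & set(deg_genes))


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical two-gene signature and toy expression values (not study data).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 2.0},
    "P4": {"GENE_A": 6.0, "GENE_B": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)

if __name__ == "__main__":
    print(overlap_genes(["GENE_A", "GENE_B", "GENE_C"], ["GENE_B", "GENE_C", "GENE_D"]))
    print(scores)
    print(groups)
```

In practice the coefficients would come from the fitted Cox model and the resulting groups would be compared with Kaplan-Meier curves; neither step is reproduced here.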