Comprehensive genomic testing will be required to identify appropriate targets for precision therapy of breast cancer. Although RNA sequencing (RNA-seq) is an unparalleled platform for this purpose, existing molecular prognostic signatures are not optimal for RNA-seq data. In this study, we analyzed RNA-seq datasets to develop a novel prognostic gene signature for breast cancer patients. RNA-seq and clinical data from breast cancer patients were obtained from The Cancer Genome Atlas (TCGA) and randomly assigned to training (n = 379) and test (n = 378) cohorts. In the training cohort, sequential univariate Cox analysis, robust likelihood-based survival analysis, and stepwise multivariable Cox analysis identified a five-gene signature composed of one long noncoding RNA (lncRNA) gene and four protein-coding genes. The signature was then used to dichotomize patients into high- and low-risk groups, and it was validated using Kaplan-Meier and multivariable Cox analyses. In the full test cohort, the high-risk group had worse overall survival (hazard ratio [HR] = 4.74, 95% confidence interval [CI] = 2.33-9.64, p < 0.0001) and worse relapse-free survival (HR = 2.26, 95% CI = 1.11-4.61, p = 0.024) than the low-risk group. Overall survival was likewise worse in the high-risk group within nearly all clinically important subsets, including early-stage (I/II) disease (HR = 7.87, 95% CI = 3.69-16.77, p < 0.0001) and the luminal A (HR = 4.23, 95% CI = 1.11-16.12, p = 0.034), luminal B (HR = 12.79, 95% CI = 2.74-59.69, p = 0.001), and basal (HR = 18.11, 95% CI = 3.21-102.05, p = 0.001) subtypes. Notably, the five-gene signature showed better prognostic performance than the Oncotype DX 21-gene signature. This novel five-gene signature may therefore be a powerful prognostic tool for personalized treatment of breast cancer patients as part of an integrated RNA-seq clinical sequencing program.
Keywords: RNA-seq; breast cancer; lncRNA; mRNA; prognostic signature.
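The abstract does not state how the five-gene signature is converted into risk groups. A common approach, assumed here and not taken from the study, is to compute a linear risk score (the sum of each gene's multivariable Cox coefficient times its expression), split patients at the training-cohort median, and compare the groups with Kaplan-Meier curves. The gene names (`geneA` to `geneE`), the coefficients in `COEFFS`, and the median cutoff are all hypothetical placeholders; the sketch below only illustrates the mechanics.

```python
from statistics import median

# HYPOTHETICAL placeholders: the abstract reports neither the five genes nor
# their Cox coefficients, so these names and values are illustrative only.
COEFFS = {"geneA": 0.8, "geneB": -0.5, "geneC": 0.3, "geneD": 0.6, "geneE": -0.4}


def risk_score(expr, coeffs=COEFFS):
    """Linear Cox predictor: sum of coefficient * expression over the signature genes."""
    return sum(coeffs[g] * expr[g] for g in coeffs)


def dichotomize(train_scores, scores):
    """Label each score 'high' or 'low' risk.

    ASSUMPTION: the cutoff is the median training-cohort score; a score equal
    to the cutoff is labeled 'low'.
    """
    cutoff = median(train_scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy, made-up data for illustration; not from TCGA.
    train = [risk_score({g: x for g in COEFFS}) for x in (0.5, 1.0, 1.5, 2.0)]
    test = [risk_score({g: x for g in COEFFS}) for x in (0.2, 1.8)]
    print(dichotomize(train, test))
    print(kaplan_meier([12, 30, 30, 45, 60], [1, 0, 1, 1, 0]))
```

In a real analysis, the per-group Kaplan-Meier curves would be compared with a log-rank test and the risk group entered into a multivariable Cox model alongside clinical covariates; established survival libraries handle those steps more robustly than this hand-rolled estimator.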