Background: Extensive efforts have been undertaken to discover genes relevant for breast cancer prognosis, yet the prevailing view is that the findings show little overlap. We aimed to reanalyze molecular prediction of breast cancer recurrence.
Methods: From 44 published gene lists relevant for breast cancer prognosis, we extracted 374 genes that, in addition to meeting other quality criteria, were recorded at least twice. From eight published microarray datasets, a single dataset of 1,067 breast cancer patients was created by transforming the data to a 'probability of expression' scale. For recurrence analysis, the Cox proportional hazards model was applied.
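The selection step can be illustrated with a minimal sketch. It assumes that "recorded at least twice" means a gene appears in at least two distinct published lists, and it does not model the other quality criteria, which are not specified here. The function name `recurrent_genes` and the toy gene lists are hypothetical and serve only as an illustration.

```python
from collections import Counter


def recurrent_genes(gene_lists, min_lists=2):
    """Return, sorted, the genes that appear in at least `min_lists` distinct lists."""
    counts = Counter()
    for genes in gene_lists:
        # Count each gene once per list, so repeats within one list do not inflate the tally.
        counts.update(set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_lists)


# Toy input: three hypothetical published lists (symbols chosen for illustration only).
toy_lists = [
    ["CCNB1", "MKI67", "ESR1"],
    ["MKI67", "TOP2A", "CCNB1", "CCNB1"],
    ["TOP2A", "ERBB2"],
]
selected = recurrent_genes(toy_lists)
print(selected)
```

Counting per list rather than per mention is a design choice made for this sketch; under a different reading of the criterion, the tally would change accordingly.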
Results: The 374 genes, termed '374 Gene Set', are highly enriched in cell cycle genes. The '374 Gene Set' is significantly associated with breast cancer recurrence (p = 2 × 10^-12, log-rank test) in the meta set of 1,067 patients, showing an estimated hazard ratio of recurrence for the 'poor' prognosis group compared to the 'good' prognosis group of 2.03 (95% confidence interval, 1.66-2.48). Notably, the '374 Gene Set' is significantly associated with recurrence in untreated patients. In multivariate analysis, including the standard histopathological parameters, only tumor size and the '374 Gene Set' remain independent predictors of recurrence. External validation further confirmed the prognostic relevance of the gene set (253 patients, p = 0.001, log-rank test).
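As a reading aid, the reported hazard ratio and confidence interval can be related to the underlying Cox coefficient. The sketch below assumes the interval is a Wald interval, that is, symmetric on the log scale as exp(beta ± z · SE); the abstract does not state how the interval was computed, so this is an assumption. The helper name `wald_summary` is hypothetical. The resulting Wald p-value comes from a different test than the reported log-rank p-value and need not match it.

```python
import math


def wald_summary(hr, lower, upper, z_crit=1.959964):
    """Recover log-scale quantities from a hazard ratio and its 95% interval.

    Assumes a Wald interval: exp(beta - z_crit * se) to exp(beta + z_crit * se).
    """
    beta = math.log(hr)                                     # Cox coefficient (log hazard ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit) # standard error implied by the interval
    z = beta / se                                           # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))                    # two-sided normal p-value
    midpoint = math.sqrt(lower * upper)                     # geometric midpoint of the interval
    return {"beta": beta, "se": se, "z": z, "p": p, "midpoint": midpoint}


# Values reported for the 'poor' versus 'good' prognosis group.
summary = wald_summary(2.03, 1.66, 2.48)
print(summary)
```

If the assumption holds, the geometric midpoint of the interval should lie close to the reported hazard ratio, which gives a quick consistency check on the figures.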
Conclusions: The '374 Gene Set' represents a molecular basis of metastatic breast cancer progression. Starting from this gene set, it might be possible to construct a clinically relevant classifier, which would in turn need to be validated.