Background: The development of clinical -omic biomarkers for predicting patient prognosis has mostly focused on multi-gene models. However, several studies have described significant weaknesses of multi-gene biomarkers. Indeed, some high-profile reports have indicated that multi-gene biomarkers fail to consistently outperform simple single-gene ones. Given the continual improvements in -omics technologies and the availability of larger, better-powered datasets, we revisited this "single-gene hypothesis" using new techniques and datasets.
Results: By deeply sampling the population of available gene sets, we compared the intrinsic properties of single-gene and multi-gene biomarkers in twelve different partitions of a large breast cancer meta-dataset. Simple multi-gene models consistently outperformed single-gene biomarkers in all twelve partitions. We found 270 multi-gene biomarkers (one per ~11,111 sampled) that always made better predictions than the best single-gene model.
Conclusions: The single-gene hypothesis for breast cancer does not appear to hold in the face of improved statistical models, lower-noise genomic technology and better-powered patient cohorts. These results highlight that it is critical to revisit older hypotheses in light of newer techniques and datasets.
Keywords: Multi-gene models; Single-gene models; Survival models.