Background: Compared with the many uses of DNA-level testing in clinical oncology, the development of RNA-based diagnostics has been limited. An exception is the growing use of mRNA-based methods in early-stage breast cancer. Although DNA and mRNA are used together in breast cancer research, the distinct contribution of mRNA beyond that of DNA in addressing clinical challenges has not yet been directly assessed. We hypothesize that mRNA harbors prognostically useful information that is independent of genomic variation. To test this hypothesis, we use both genomic mutations and gene expression to predict five-year breast cancer recurrence in an integrated model. We do this in two ways: first, by comparing the importance assigned to DNA and mRNA features in a model trained on both data types, and second, by comparing the performance of models trained on DNA data alone with that of models trained on mRNA data alone.
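As an illustration of the first comparison, the share of total feature importance attributable to each data type can be computed by summing per-feature importances within each group. The sketch below is a minimal example under stated assumptions, not the procedure used in the study: the function name `importance_share_by_modality`, the feature names, and the importance values are all hypothetical, and the importances could come from any attribution method.

```python
from collections import defaultdict


def importance_share_by_modality(importances, modality_of):
    """Sum absolute per-feature importances within each modality and
    normalise so that the shares add up to 1."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        totals[modality_of[feature]] += abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all feature importances are zero")
    return {modality: total / grand_total for modality, total in totals.items()}


# Hypothetical importances from a model trained on both data types
# (made-up numbers for illustration, not results from the study).
importances = {
    "dna_mut_1": 0.125,
    "dna_mut_2": 0.125,
    "mrna_expr_1": 0.5,
    "mrna_expr_2": 0.25,
}
# Hypothetical mapping from each feature to its data type.
modality_of = {
    "dna_mut_1": "DNA",
    "dna_mut_2": "DNA",
    "mrna_expr_1": "mRNA",
    "mrna_expr_2": "mRNA",
}

shares = importance_share_by_modality(importances, modality_of)
print(shares)
```

A larger share for one group indicates that the joint model relies more heavily on that data type, although the shares depend on the attribution method and on how many features each group contains.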
Results: We find that models trained on both DNA and mRNA data give more weight to mRNA features than to DNA features, and that models trained only on mRNA data outperform models trained only on DNA data.
Conclusions: The evaluation process presented here may serve as a framework for interpreting the relative contribution of individual molecular markers. Our results also suggest that mRNA makes a distinct contribution in a diagnostic setting, beyond and independent of DNA mutation data.
Keywords: Breast cancer recurrence; Data science; Gene expression; Genomics; Machine learning; Machine learning explainability; Oncology.