The prevalence of malignant cells in clinical specimens, or tumour purity, is affected by both intrinsic biological factors and extrinsic sampling bias. Molecular characterization of large clinical cohorts is typically performed on bulk samples; data analysis and interpretation can be biased by tumour purity variability. Transcription-based strategies to estimate tumour purity have been proposed, but no breast cancer specific method is available yet. We interrogated over 6000 expression profiles from 10 breast cancer datasets to develop and validate a 9-gene Breast Cancer Purity Score (BCPS). BCPS outperformed existing methods for estimating tumour content. Adjusting transcriptomic profiles using the BCPS reduces sampling bias and aids data interpretation. BCPS-estimated tumour purity improved prognostication in luminal breast cancer, correlated with pathologic complete response in on-treatment biopsies from triple-negative breast cancer patients undergoing neoadjuvant treatment and effectively stratified the risk of relapse in HER2+ residual disease post-neoadjuvant treatment.
© 2024. The Author(s).