Background: Women with a family history of breast cancer face considerable uncertainty about whether to pursue standard screening, intensive screening, or prophylactic surgery. Accurate and individualized risk-estimation approaches may help these women make more informed decisions. Although highly penetrant genetic variants have been associated with familial breast cancer (FBC) risk, many individuals do not carry these variants, and many carriers never develop breast cancer. Common risk variants have a relatively modest effect on risk and show limited potential for predicting FBC development. As an alternative, we hypothesized that additional genomic data types, such as gene-expression levels, which can reflect genetic and epigenetic variation, could contribute to classifying a person's risk status. Specifically, we aimed to identify common patterns in gene-expression levels across individuals who develop FBC.
Methods: We profiled peripheral blood mononuclear cells from women with a family history of breast cancer (with or without a germline BRCA1/2 variant) and from controls. We used the support vector machines algorithm to differentiate between patients who developed FBC and those who did not. Our study used two independent datasets, a training set of 124 women from Utah (USA) and an external validation (test) set from Ontario (Canada) of 73 women (197 total). We controlled for expression variation associated with clinical, demographic, and treatment variables as well as lymphocyte markers.
Results: Our multigene biomarker provided accurate, individual-level estimates of FBC occurrence for the Utah cohort (AUC = 0.76 [0.67-84]) . Even at their lower confidence bounds, these accuracy estimates meet or exceed estimates from alternative approaches. Our Ontario cohort resulted in similarly high levels of accuracy (AUC = 0.73 [0.59-0.86]), thus providing external validation of our findings. Individuals deemed to have "high" risk by our model would have an estimated 2.4 times greater odds of developing familial breast cancer than individuals deemed to have "low" risk.
Conclusions: Together, these findings suggest that gene-expression levels in peripheral blood cells reflect genomic variation associated with breast cancer risk and that such data have potential to be used as a non-invasive biomarker for familial breast cancer risk.