Application of three statistical approaches to explore effects of dietary intake of multiple persistent organic pollutants on ER-positive breast cancer risk in the French E3N cohort

Sci Rep. 2025 Jan 15;15(1):2058. doi: 10.1038/s41598-025-85438-9.

Abstract

Persistent organic pollutants (POPs) are a group of organic chemical compounds. Epidemiological studies attempting to elucidate their relationship with breast cancer risk have produced contradictory results. This study explored the relationship between dietary exposure to multiple POPs and ER-positive breast cancer risk in the French E3N cohort study, using three different approaches to handle multicollinearity among exposures. Intakes of 81 POPs were estimated using food consumption data from a validated semi-quantitative food frequency questionnaire and food contamination data. In the first approach, hierarchical clustering was performed to identify clusters of correlated POPs. Within each cluster, the levels of the POPs belonging to it were averaged. These average levels were then included in a Cox model to estimate their associations with ER-positive breast cancer occurrence. The second and third approaches were principal component Cox regression (PCR-Cox) and partial least squares Cox regression (PLS-Cox), respectively. Both are dimension-reduction methods (unsupervised and supervised, respectively) coupled with a Cox model, used here to identify components of POPs and to estimate their associations with ER-positive breast cancer occurrence. All models were adjusted for potential confounders previously identified using a directed acyclic graph. The study included 66,722 women with a median follow-up of 20.3 years, during which 3,739 developed incident ER-positive breast cancer. The variable clustering method did not identify any association between the averaged variables and ER-positive breast cancer risk. Five components were retained with both the PCR-Cox and PLS-Cox methods, explaining 82% and 77% of the variance in the initial exposure matrix, respectively. None of these components was significantly associated with the occurrence of ER-positive breast cancer.
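The first approach (cluster correlated exposures, then average within each cluster) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `cluster_average`, the average-linkage rule, the `1 - r` correlation distance, the 0.5 cut-off, and the standardization of each exposure before averaging are all assumptions made for the example. The abstract does not specify these choices.

```python
import numpy as np

def cluster_average(X, threshold=0.5):
    """Group correlated columns of X and average them within each group.

    Hypothetical helper: average-linkage agglomerative clustering on the
    distance 1 - r, stopping when the closest pair of clusters is farther
    apart than `threshold`.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each exposure so that averages are not driven by scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    dist = 1.0 - np.corrcoef(Z, rowvar=False)
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # One averaged variable per cluster.
    averaged = np.column_stack([Z[:, c].mean(axis=1) for c in clusters])
    return clusters, averaged

# Synthetic exposures: columns 0-2 share one latent source, columns 3-4 another.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([
    f1 + 0.3 * rng.normal(size=n),
    f1 + 0.3 * rng.normal(size=n),
    f1 + 0.3 * rng.normal(size=n),
    f2 + 0.3 * rng.normal(size=n),
    f2 + 0.3 * rng.normal(size=n),
])
clusters, averaged = cluster_average(X, threshold=0.5)
```

In an analysis of this kind, the columns of `averaged` would replace the individual exposures as covariates in the survival model, alongside the confounders.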
This study provides an illustrative example of the application of three distinct statistical methods in the context of highly correlated environmental exposures, discussing their potential relevance and limitations within this specific framework.
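The unsupervised dimension-reduction step behind PCR-Cox can be illustrated with a short sketch. It covers only the principal component step; fitting the Cox model on the resulting scores is not shown. The function name `pca_scores` and the rule of retaining components until a cumulative share of variance is reached (80% here) are assumptions for illustration. The abstract reports the variance explained by the retained components but not how their number was chosen.

```python
import numpy as np

def pca_scores(X, min_explained=0.80):
    """Return principal component scores and the variance share of each.

    Hypothetical helper: keeps the smallest number of leading components
    whose cumulative explained variance reaches `min_explained`.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that each exposure contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of Vt are the component loadings.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, min_explained)) + 1
    k = min(k, len(s))
    scores = Z @ Vt[:k].T
    return scores, explained[:k]

# Synthetic exposure matrix driven by two latent sources (six columns).
rng = np.random.default_rng(1)
n = 500
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)
scores, explained = pca_scores(X, min_explained=0.80)
```

The score columns are mutually uncorrelated by construction, which is what removes the multicollinearity problem when they are used as regressors. PLS differs in that its components are built using the outcome as well as the exposures.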

Keywords: Breast cancer; Multicollinearity; Persistent organic pollutants.

MeSH terms

  • Adult
  • Aged
  • Breast Neoplasms* / chemically induced
  • Breast Neoplasms* / epidemiology
  • Breast Neoplasms* / etiology
  • Breast Neoplasms* / metabolism
  • Cohort Studies
  • Diet
  • Dietary Exposure / adverse effects
  • Female
  • Food Contamination / analysis
  • France / epidemiology
  • Humans
  • Middle Aged
  • Persistent Organic Pollutants*
  • Proportional Hazards Models
  • Receptors, Estrogen / metabolism
  • Risk Factors

Substances

  • Persistent Organic Pollutants
  • Receptors, Estrogen