Objective: CYP2D6 metabolizes tamoxifen into its active metabolite, endoxifen, which is essential for the drug's therapeutic effect in estrogen receptor-positive breast cancer. Single nucleotide polymorphisms (SNPs) in the CYP2D6 gene can alter enzyme activity and thereby affect tamoxifen efficacy. This study aimed to use machine learning algorithms (MLAs) to identify significant predictors of breast cancer-free interval (BCFI) and to apply bioinformatics tools to investigate the structural and functional implications of CYP2D6 SNPs.
Patients and methods: The study used data from 4,974 breast cancer patients recruited by the International Tamoxifen Pharmacogenomics Consortium (ITPC), focusing on the 898 patients with available BCFI data. Candidate predictors of BCFI included age, ethnicity, menopausal status, breast cancer grade, and CYP2D6 genotype. An ensemble MLA model was developed that combined regression, chi-squared automatic interaction detection (CHAID), artificial neural networks (ANN), and classification and regression trees (CART). Bioinformatics tools, including STRING-DB and GEPIA2, were used to analyze protein-protein interactions and survival data related to CYP2D6.
Results: The ensemble model identified age and CYP2D6 genotype as significant predictors of BCFI. The mean prediction errors for the training and testing cohorts were 13.8 and 40.2 days, respectively. Bioinformatics analysis indicated that reduced CYP2D6 functional activity was associated with decreased survival, and Kaplan-Meier analysis showed that lower CYP2D6 expression was associated with significantly lower survival rates.
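As a rough illustration of how an ensemble prediction and a mean prediction error for training and testing cohorts might be computed, the following minimal Python sketch averages the outputs of several base models and reports the mean absolute error in days. It is not the study's implementation: the combination rule (simple averaging), the error definition (mean absolute error), the two stand-in models, and all numeric values are assumptions made purely for illustration and are not ITPC data.

```python
from statistics import mean


def ensemble_predict(models, features):
    """Average the BCFI predictions (in days) of several base models.

    Simple averaging is an assumed combination rule, not the study's.
    """
    return mean(model(features) for model in models)


def mean_absolute_error(models, rows):
    """Mean absolute gap between predicted and observed BCFI, in days."""
    return mean(abs(ensemble_predict(models, x) - y) for x, y in rows)


# Hypothetical stand-ins for fitted base learners (e.g. a regression
# model and a decision tree); the coefficients are invented.
def linear_model(x):
    return 2000.0 - 10.0 * x["age"]


def tree_model(x):
    return 1500.0 if x["age"] < 60 else 1300.0


models = [linear_model, tree_model]

# Toy (features, observed BCFI in days) pairs, not study data.
train = [({"age": 50}, 1500.0), ({"age": 70}, 1300.0)]
test = [({"age": 55}, 1400.0), ({"age": 65}, 1350.0)]

train_error = mean_absolute_error(models, train)
test_error = mean_absolute_error(models, test)
print(f"train error: {train_error:.1f} days, test error: {test_error:.1f} days")
```

In this toy setup the stand-in models reproduce the training rows exactly and miss the held-out rows, mirroring the pattern in which error on the testing cohort exceeds error on the training cohort.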
Conclusions: This study highlights the utility of MLAs in identifying key predictors of tamoxifen response and the value of bioinformatics in understanding the role of CYP2D6 in breast cancer outcomes. Personalized treatment approaches based on CYP2D6 metabolizer status could improve the effectiveness of tamoxifen therapy.