Knowledge discovery of patients reviews on breast cancer drugs: Segmentation of side effects using machine learning techniques

Heliyon. 2024 Sep 26;10(19):e38563. doi: 10.1016/j.heliyon.2024.e38563. eCollection 2024 Oct 15.

Abstract

Breast cancer is the most frequently diagnosed life-threatening cancer among women worldwide. Understanding patients' experiences with drugs is essential to improving treatment strategies and outcomes. In this research, we conduct knowledge discovery on breast cancer drugs using patients' reviews. We develop a new machine learning approach that combines clustering, text mining and regression techniques. We first use the Latent Dirichlet Allocation (LDA) technique to discover the main aspects of patients' experiences from their reviews of breast cancer drugs. We also use the Expectation-Maximization (EM) algorithm to segment the data based on patients' overall satisfaction. We then use the Forward Entry Regression technique to find the relationship between aspects of patients' experiences and drug effectiveness in each segment. The analysis of the textual reviews identified 8 main side effects: Musculoskeletal Effects, Menopausal Effects, Dermatological Effects, Metabolic Effects, Gastrointestinal Effects, Neurological and Cognitive Effects, Respiratory Effects and Cardiovascular Effects. The results are presented and discussed. The findings of this study are expected to offer valuable insights and practical guidance for prospective patients, helping them make informed decisions about the use of breast cancer drugs.
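To make the EM segmentation step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that segmentation is done on a single numeric overall-satisfaction rating per review and that each segment is modelled as a univariate Gaussian. The number of segments, the initialisation, the variance floor and the function names (`em_segment`, `e_step`, `gaussian_pdf`) are all hypothetical choices made for illustration. The example ratings are invented and do not come from the study's data.

```python
import math
from typing import List, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(scores: List[float], weights: List[float], means: List[float],
           variances: List[float]) -> List[List[float]]:
    """E-step: posterior probability (responsibility) of each segment for each score."""
    resp = []
    for x in scores:
        dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against all densities underflowing to zero
        resp.append([d / total for d in dens])
    return resp


def em_segment(scores: List[float], k: int = 2, iterations: int = 100,
               min_var: float = 1e-3) -> Tuple[List[float], List[float], List[float], List[int]]:
    """Hypothetical helper: fit a k-component 1-D Gaussian mixture with EM and
    return (weights, means, variances, segment label for each score)."""
    n = len(scores)
    lo, hi = min(scores), max(scores)
    # Deterministic initialisation: spread the segment means evenly over the observed range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    overall_mean = sum(scores) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in scores) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        resp = e_step(scores, weights, means, variances)
        # M-step: re-estimate mixing weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, scores)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, scores)) / nj
            variances[j] = max(var, min_var)  # floor keeps a segment from collapsing

    # Final E-step with the fitted parameters, then assign each review to its most likely segment.
    resp = e_step(scores, weights, means, variances)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


if __name__ == "__main__":
    # Invented satisfaction ratings on an assumed 1-10 scale.
    ratings = [1, 1, 2, 2, 1, 9, 10, 9, 10, 10]
    weights, means, variances, labels = em_segment(ratings, k=2)
    print("segment means:", [round(m, 2) for m in means])
    print("labels:", labels)
```

In a per-segment analysis of the kind described above, the reviews sharing a label would then be analysed separately, for example by fitting a regression of effectiveness on the LDA aspect scores within each segment. A hard assignment via the highest responsibility is used here only for simplicity. The soft responsibilities could equally be kept as weights.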

Keywords: Breast cancer; Drugs; Knowledge discovery; Machine learning; Online reviews; Public health; Text mining.