Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence

Diagnostics (Basel). 2023 Apr 21;13(8):1506. doi: 10.3390/diagnostics13081506.

Abstract

Polycystic ovary syndrome (PCOS) has been classified as a severe health problem common among women globally. Early detection and treatment of PCOS reduce the possibility of long-term complications, such as increasing the chances of developing type 2 diabetes and gestational diabetes. Therefore, effective and early PCOS diagnosis will help the healthcare systems to reduce the disease's problems and complications. Machine learning (ML) and ensemble learning have recently shown promising results in medical diagnostics. The main goal of our research is to provide model explanations to ensure efficiency, effectiveness, and trust in the developed model through local and global explanations. Feature selection methods with different types of ML models (logistic regression (LR), random forest (RF), decision tree (DT), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), xgboost, and Adaboost algorithm to get optimal feature selection and best model. Stacking ML models that combine the best base ML models with meta-learner are proposed to improve performance. Bayesian optimization is used to optimize ML models. Combining SMOTE (Synthetic Minority Oversampling Techniques) and ENN (Edited Nearest Neighbour) solves the class imbalance. The experimental results were made using a benchmark PCOS dataset with two ratios splitting 70:30 and 80:20. The result showed that the Stacking ML with REF feature selection recorded the highest accuracy at 100 compared to other models.

Keywords: ensemble learning; explainable machine learning; machine learning; polycystic ovary syndrome.

Grants and funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R330), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.