Screening of serum biomarkers in patients with PCOS through lipid omics and ensemble machine learning

PLoS One. 2025 Jan 7;20(1):e0313494. doi: 10.1371/journal.pone.0313494. eCollection 2025.

Abstract

Polycystic ovary syndrome (PCOS) is a major endocrine disorder that affects premenopausal women and involves metabolic dysregulation. We aimed to screen for serum biomarkers in PCOS patients using untargeted lipidomics and ensemble machine learning. Serum samples were collected from PCOS patients and non-PCOS subjects for untargeted lipidomics analysis. Differential lipid metabolites were classified, and their associations with clinical indexes were analyzed. The untargeted lipidomics data were then processed through an ensemble machine learning workflow comprising data preprocessing, pre-screening by statistical tests, secondary screening by ensemble learning, biomarker verification and evaluation, and construction and validation of a diagnostic panel model. The results indicated that the differential lipid metabolites not only differed between groups but were also closely associated with the corresponding clinical indexes. PI (18:0/20:3)-H and PE (18:1p/22:6)-H were identified as candidate biomarkers. Three machine learning models (logistic regression, random forest, and support vector machine) showed that the screened biomarkers had good classification ability and performance. In addition, the correlation between the candidate biomarkers was low, indicating little overlap between them and supporting their combination into a diagnostic panel. On the test set, the diagnostic panel model achieved an AUC of 0.815, an accuracy of 0.74, a specificity of 0.88, and a sensitivity of 0.70. This study demonstrated the applicability and robustness of machine learning algorithms for analyzing lipid metabolism data to screen biomarkers efficiently and reliably. PI (18:0/20:3)-H and PE (18:1p/22:6)-H showed great potential for diagnosing PCOS.
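The evaluation step in the abstract reports AUC, accuracy, specificity, and sensitivity for the diagnostic panel, and uses low correlation between candidate biomarkers to argue that they are not redundant. The following is a minimal standard-library Python sketch of how these quantities are conventionally computed. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy inputs are hypothetical, and no study data are used.

```python
from math import sqrt
from typing import Dict, Sequence


def auc_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity
    (true-negative rate) after calling score >= threshold positive.
    The 0.5 cut-off is a hypothetical default, not taken from the study."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif not predicted_positive:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both classes to compute sensitivity and specificity")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two biomarker intensity vectors; a value
    near 0 suggests the two markers carry little redundant information."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy example (hypothetical values): 3 cases (label 1), 3 controls (label 0).
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(auc_rank(labels, scores))           # 8 of 9 positive/negative pairs ranked correctly
    print(threshold_metrics(labels, scores))  # 2 TP, 1 FN, 2 TN, 1 FP at threshold 0.5
```

The rank-based AUC is threshold-free. Accuracy, sensitivity, and specificity depend on the chosen cut-off, which is why a panel can show a high specificity alongside a lower sensitivity.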

MeSH terms

  • Adult
  • Biomarkers* / blood
  • Female
  • Humans
  • Lipidomics / methods
  • Lipids / blood
  • Machine Learning*
  • Polycystic Ovary Syndrome* / blood
  • Polycystic Ovary Syndrome* / diagnosis
  • Young Adult

Substances

  • Biomarkers
  • Lipids

Grants and funding

This research was financially supported in part by a grant from the Key Laboratory of Science and Technology Innovation (Longhua District, Shenzhen, China; grant number 20170913A0410028) and by the Scientific Research Projects of Medical and Health Institutions of Longhua District (Shenzhen, China; grant number 2022033).