Salivary metabolomics with alternative decision tree-based machine learning methods for breast cancer discrimination

Breast Cancer Res Treat. 2019 Oct;177(3):591-601. doi: 10.1007/s10549-019-05330-9. Epub 2019 Jul 8.

Abstract

Purpose: The aim of this study is to explore new salivary biomarkers to discriminate breast cancer patients from healthy controls.

Methods: Saliva samples were collected after 9 h of fasting and immediately stored at −80 °C. Capillary electrophoresis and liquid chromatography with mass spectrometry were used to quantify hundreds of hydrophilic metabolites. Conventional statistical analyses and artificial intelligence-based methods were used to assess the discrimination abilities of the quantified metabolites. A multiple logistic regression (MLR) model and an alternative decision tree (ADTree)-based machine learning method were used. The generalization abilities of these mathematical models were validated using computational tests, including cross-validation and resampling methods.
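The repeated cross-validation mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names (`kfold_indices`, `repeated_cv_accuracy`), the threshold classifier standing in for MLR or ADTree, the use of integer seeds as the "different random values", and the synthetic one-feature data are all assumptions made for this example.

```python
import random

def kfold_indices(n_samples, k, seed):
    """Shuffle sample indices with `seed` and split them into k near-equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cv_accuracy(features, labels, fit, predict, k, repeats):
    """Mean held-out accuracy over `repeats` k-fold splits, one seed per repeat."""
    accuracies = []
    for seed in range(repeats):
        for test_idx in kfold_indices(len(labels), k, seed):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            model = fit([features[i] for i in train_idx],
                        [labels[i] for i in train_idx])
            correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
            accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Hypothetical stand-in classifier: a threshold halfway between the class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Synthetic one-feature data, NOT values from the study.
data_rng = random.Random(0)
labels = [1] * 60 + [0] * 40
features = [data_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

for k in (2, 5, 10):
    acc = repeated_cv_accuracy(features, labels, fit_threshold,
                               predict_threshold, k, repeats=200)
    print(f"{k}-fold, 200 repeats: mean accuracy = {acc:.3f}")
```

Averaging over many reshuffled splits reduces the dependence of the accuracy estimate on any single partition, which matters for cohorts of this size.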

Results: One hundred sixty-six unstimulated saliva samples were collected from 101 patients with invasive carcinoma of the breast (IC), 23 patients with ductal carcinoma in situ (DCIS), and 42 healthy controls (C). Of the 260 quantified metabolites, polyamines were significantly elevated in the saliva of patients with breast cancer. Spermine showed the highest area under the receiver operating characteristic curve [0.766; 95% confidence interval (CI) 0.671-0.840, P < 0.0001] to discriminate IC from C. In addition to spermine, polyamines and their acetylated forms were elevated in IC only. Two-fold, five-fold, and ten-fold cross-validation were each conducted 200 times using different random values, and the MLR model had slightly better accuracy. The ADTree with an ensemble approach showed higher accuracy (0.912; 95% CI 0.838-0.961, P < 0.0001). These prediction models also included spermine as a predictive factor.
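For readers unfamiliar with the area under the receiver operating characteristic curve and its confidence interval, a minimal sketch follows. The abstract does not state how the interval was computed, so the percentile bootstrap used here is an assumption, as are the function names `roc_auc` and `bootstrap_ci`. The scores are synthetic; only the group sizes (101 IC, 42 C) are taken from the text.

```python
import random

def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > h) + 0.5 * (c == h) for c in cases for h in controls)
    return wins / (len(cases) * len(controls))

def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    aucs = []
    for _ in range(n_boot):
        c = rng.choices(cases, k=len(cases))
        h = rng.choices(controls, k=len(controls))
        aucs.append(roc_auc(c + h, [1] * len(c) + [0] * len(h)))
    aucs.sort()
    lower = aucs[int(n_boot * alpha / 2)]
    upper = aucs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Synthetic marker levels, NOT study data; group sizes mirror 101 IC and 42 C.
data_rng = random.Random(42)
labels = [1] * 101 + [0] * 42
scores = [data_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

auc = roc_auc(scores, labels)
lower, upper = bootstrap_ci(scores, labels)
print(f"AUC = {auc:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so the reported 0.766 for spermine alone indicates moderate discrimination by a single metabolite.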

Conclusions: These data indicate that combining salivary metabolomics with ADTree-based machine learning methods shows potential for non-invasive screening of breast cancer.

Keywords: Alternative decision tree; Biomarker; Breast cancer; Metabolomics; Polyamines; Saliva.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / metabolism*
  • Clinical Decision-Making* / methods
  • Cross-Sectional Studies
  • Female
  • Humans
  • Machine Learning*
  • Metabolomics* / methods
  • Middle Aged
  • ROC Curve
  • Saliva / metabolism*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization