Prediction of molecular subtypes of breast cancer using BI-RADS features based on a "white box" machine learning approach in a multi-modal imaging setting

Eur J Radiol. 2019 May:114:175-184. doi: 10.1016/j.ejrad.2019.03.015. Epub 2019 Mar 21.

Abstract

Purpose: To develop and validate an interpretable and repeatable machine learning approach for predicting molecular subtypes of breast cancer from clinical meta-information together with mammography and MRI images.

Methods: We retrospectively assessed 363 breast cancer cases (Luminal A, 151; Luminal B, 96; HER2, 76; and basal-like breast cancer [BLBC], 40). Eighty-two features defined in the BI-RADS lexicon were visually described. A decision tree model with the Chi-squared automatic interaction detector (CHAID) algorithm was applied for feature selection and classification. A 10-fold cross-validation was performed to evaluate the performance (i.e., accuracy, positive predictive value, sensitivity, and F1-score) of the decision tree model.
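
As a rough illustration of the splitting criterion behind CHAID, the sketch below ranks categorical features by their Pearson chi-squared statistic against the subtype label. It is a simplified stand-in, not the study's implementation: full CHAID also merges similar categories and compares Bonferroni-adjusted p-values, which is what makes features with different numbers of categories comparable. The feature names, category values, and the eight toy cases are hypothetical.

```python
from collections import Counter


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic of a feature-by-label contingency table."""
    n = len(labels)
    row_totals = Counter(feature_values)
    col_totals = Counter(labels)
    observed = Counter(zip(feature_values, labels))
    stat = 0.0
    for value, row_n in row_totals.items():
        for label, col_n in col_totals.items():
            expected = row_n * col_n / n  # count expected under independence
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def best_split_feature(rows, labels):
    """Return the feature with the largest statistic, plus all scores."""
    scores = {
        name: chi_squared([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two binary features, two subtypes, eight cases.
labels = ["Luminal A"] * 4 + ["HER2"] * 4
margins = ["spiculated"] * 4 + ["circumscribed"] * 4
calcs = ["present", "absent"] * 4
rows = [
    {"mass_margin": m, "calcification": c} for m, c in zip(margins, calcs)
]

best, scores = best_split_feature(rows, labels)
print(best, scores)
```

On this toy data, `mass_margin` separates the two labels perfectly (statistic 8.0) while `calcification` is independent of them (statistic 0.0), so the first split would be on `mass_margin`.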

Results: Seven of the 82 variables were selected by the decision tree-based feature selection and used as features for the classification of molecular subtypes: mass margin and calcification on mammography; mass margin, types of kinetic curves in the delayed phase, mass internal enhancement characteristics, and non-mass enhancement distribution on MRI; and breastfeeding history. The overall accuracy of the decision tree model was 74.1%. By molecular subtype, the sensitivity, positive predictive value, and F1-score were 79.47%, 75.47%, and 77.42% for Luminal A; 64.58%, 55.86%, and 59.90% for Luminal B; 81.58%, 95.38%, and 87.94% for HER2; and 62.50%, 89.29%, and 73.53% for BLBC, respectively.
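
The per-subtype figures above can be cross-checked with a few lines of arithmetic. The sketch below recomputes each F1-score as the harmonic mean of sensitivity and positive predictive value, and recovers the overall accuracy from the case counts given in Methods. It assumes the reported sensitivities are pooled over all 363 cases, so that sensitivity times the subtype count gives the number of correctly classified cases; the variable names are illustrative.

```python
# Case counts (Methods) and reported sensitivity / PPV in percent (Results).
reported = {
    "Luminal A": {"n": 151, "sensitivity": 79.47, "ppv": 75.47},
    "Luminal B": {"n": 96, "sensitivity": 64.58, "ppv": 55.86},
    "HER2": {"n": 76, "sensitivity": 81.58, "ppv": 95.38},
    "BLBC": {"n": 40, "sensitivity": 62.50, "ppv": 89.29},
}


def f1_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


f1 = {
    name: round(f1_score(m["sensitivity"], m["ppv"]), 2)
    for name, m in reported.items()
}

# Assumption: sensitivity * n, rounded, is the number of correct predictions.
correct = {
    name: round(m["n"] * m["sensitivity"] / 100)
    for name, m in reported.items()
}
total_cases = sum(m["n"] for m in reported.values())
accuracy = 100 * sum(correct.values()) / total_cases

print(f1)
print(sum(correct.values()), total_cases, round(accuracy, 1))
```

Under that assumption the recomputed F1-scores match the reported values, and 269 of 363 correctly classified cases gives an accuracy of 74.1%.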

Conclusions: We applied a fully "white box" machine learning method to predict the molecular subtype of breast cancer from BI-RADS feature descriptions in a multi-modal setting. Combining BI-RADS features from both mammography and MRI improves prediction accuracy and makes it more robust. Because BI-RADS is broadly applicable and widely accepted, the proposed method can be applied regardless of variation in imaging vendors and settings.

Keywords: Breast cancer; Decision tree; MRI; Machine learning; Mammography; Molecular subtype.

Publication types

  • Validation Study

MeSH terms

  • Adult
  • Breast Neoplasms / diagnostic imaging*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / metabolism*
  • Breast Neoplasms / pathology
  • Female
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Machine Learning*
  • Middle Aged
  • Multimodal Imaging*
  • Neoplasm Staging
  • Predictive Value of Tests
  • Retrospective Studies