Deep learning predicts gene expression as an intermediate data modality to identify susceptibility patterns in Mycobacterium tuberculosis infected Diversity Outbred mice

EBioMedicine. 2021 May:67:103388. doi: 10.1016/j.ebiom.2021.103388. Epub 2021 May 14.

Abstract

Background: Machine learning sustains successful application to many diagnostic and prognostic problems in computational histopathology. Yet, few efforts have been made to model gene expression from histopathology. This study proposes a methodology which predicts selected gene expression values (microarray) from haematoxylin and eosin whole-slide images as an intermediate data modality to identify fulminant-like pulmonary tuberculosis ('supersusceptible') in an experimentally infected cohort of Diversity Outbred mice (n=77).

Methods: Gradient-boosted trees were utilized as a novel feature selector to identify gene transcripts predictive of fulminant-like pulmonary tuberculosis. A novel attention-based multiple instance learning model for regression was used to predict selected genes' expression from whole-slide images. Gene expression predictions were shown to be sufficiently replicated to identify supersusceptible mice using gradient-boosted trees trained on ground truth gene expression data.

Findings: The model was accurate, showing high positive correlations with ground truth gene expression on both cross-validation (n = 77, 0.63 ≤ ρ ≤ 0.84) and external testing sets (n = 33, 0.65 ≤ ρ ≤ 0.84). The sensitivity and specificity for gene expression predictions to identify supersusceptible mice (n=77) were 0.88 and 0.95, respectively, and for an external set of mice (n=33) 0.88 and 0.93, respectively.

Implications: Our methodology maps histopathology to gene expression with sufficient accuracy to predict a clinical outcome. The proposed methodology exemplifies a computational template for gene expression panels, in which relatively inexpensive and widely available tissue histopathology may be mapped to specific genes' expression to serve as a diagnostic or prognostic tool.

Funding: National Institutes of Health and American Lung Association.

Keywords: Deep learning; Diversity Outbred mice; Gene expression; Histopathology; Tuberculosis.

MeSH terms

  • Animals
  • Female
  • Genetic Predisposition to Disease*
  • Hybridization, Genetic
  • Machine Learning*
  • Mice
  • Transcriptome*
  • Tuberculosis / genetics*
  • Tuberculosis / metabolism
  • Tuberculosis / pathology