Phenotype Classification using Proteome Data in a Data-Independent Acquisition Tensor Format

J Am Soc Mass Spectrom. 2020 Nov 4;31(11):2296-2304. doi: 10.1021/jasms.0c00254. Epub 2020 Oct 26.

Abstract

A novel approach for phenotype prediction is developed for data-independent acquisition (DIA) mass spectrometric (MS) data that does not require peptide precursor identification by existing DIA software tools. The first step converts the DIA-MS data file into a new file format called DIA tensor (DIAT), which allows convenient visualization of all the ions from peptide precursors and fragments. DIAT files can be fed directly into a deep neural network to predict phenotypes, much as such networks are used to classify images of cats, dogs, or microscopy samples. As a proof of principle, we applied this approach to 102 hepatocellular carcinoma samples and achieved an accuracy of 96.8% in distinguishing malignant from benign samples. We further applied a refined model to classify thyroid nodules: a deep-learning model trained on 492 samples achieved an accuracy of 91.7% in an independent cohort of 216 test samples. This approach outperformed the deep-learning model based on peptide and protein matrices generated by OpenSWATH. In summary, we present a new strategy for DIA data analysis based on a novel data format, DIAT, which enables facile two-dimensional visualization of DIA proteomics data. DIAT files can be used directly for deep learning-based classification of biological and clinical phenotypes. Future research will interpret the deep-learning models that emerge from DIAT analysis.
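The abstract does not specify how a DIA run is turned into a DIAT, so the following is only a minimal sketch under stated assumptions, not the published format. It assumes that each MS2 scan has already been read from the raw file and is tagged with its isolation-window index and retention time, and that the tensor is formed by summing ion intensities into fixed, half-open retention-time × m/z bins, one 2D slice per isolation window. The function names `build_diat` and `to_network_input`, the bin edges, and the log-plus-max scaling are hypothetical illustration choices. Each 2D slice can then be viewed as an image or passed to an image-style neural network.

```python
import numpy as np


def build_diat(scans, rt_bins, mz_bins, n_windows):
    """Hypothetical DIAT-like tensor: (window, retention-time bin, m/z bin).

    scans: iterable of (window_index, retention_time, mz_array, intensity_array).
    Bins are half-open [edge_i, edge_{i+1}); out-of-range ions are dropped.
    """
    n_rt, n_mz = len(rt_bins) - 1, len(mz_bins) - 1
    tensor = np.zeros((n_windows, n_rt, n_mz), dtype=np.float64)
    for window, rt, mz, intensity in scans:
        rt_idx = int(np.searchsorted(rt_bins, rt, side="right")) - 1
        if not 0 <= rt_idx < n_rt:
            continue  # scan falls outside the retention-time grid
        mz = np.asarray(mz, dtype=np.float64)
        intensity = np.asarray(intensity, dtype=np.float64)
        mz_idx = np.searchsorted(mz_bins, mz, side="right") - 1
        keep = (mz_idx >= 0) & (mz_idx < n_mz)
        # Accumulate intensities; np.add.at handles repeated bin indices correctly.
        np.add.at(tensor[window, rt_idx], mz_idx[keep], intensity[keep])
    return tensor


def to_network_input(tensor):
    """Assumed preprocessing: log-compress intensities, then scale to [0, 1]."""
    scaled = np.log1p(tensor)
    peak = scaled.max()
    return scaled / peak if peak > 0 else scaled


# Toy example: 2 isolation windows, 3 one-minute RT bins, 2 m/z bins of 100 Th.
rt_bins = np.array([0.0, 1.0, 2.0, 3.0])
mz_bins = np.array([100.0, 200.0, 300.0])
scans = [
    (0, 0.5, [150.0, 250.0], [10.0, 5.0]),
    (0, 0.7, [160.0], [2.0]),
    (1, 2.5, [299.0, 350.0], [4.0, 100.0]),  # 350.0 lies outside the m/z grid
]
diat = build_diat(scans, rt_bins, mz_bins, n_windows=2)
net_input = to_network_input(diat)
print(diat.shape)  # (2, 3, 2)
```

Summing rather than averaging intensities within a bin keeps total ion signal comparable across bins. The log scaling compresses the wide dynamic range of MS intensities before the data reach a network. Both are design assumptions that a real DIAT implementation may handle differently.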

MeSH terms

  • Carcinoma, Hepatocellular / chemistry
  • Carcinoma, Hepatocellular / diagnosis
  • Deep Learning
  • Humans
  • Liver Neoplasms / chemistry
  • Liver Neoplasms / diagnosis
  • Mass Spectrometry / methods*
  • Peptides / analysis
  • Proteome / analysis*
  • Proteomics / methods*
  • Software
  • Thyroid Gland / chemistry

Substances

  • Peptides
  • Proteome