Artificial intelligence for proteomics and biomarker discovery

Matthias Mann; Chanchal Kumar; Wen-Feng Zeng; Maximilian T Strauss

doi:10.1016/j.cels.2021.06.006

Cell Syst. 2021 Aug 18;12(8):759-770. doi: 10.1016/j.cels.2021.06.006.

Authors

Matthias Mann¹, Chanchal Kumar², Wen-Feng Zeng³, Maximilian T Strauss⁴

Affiliations

¹ Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
² Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
³ Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
⁴ OmicEra Diagnostics, Planegg, Germany.

PMID: 34411543

Abstract

There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning, and deep learning in particular, can now predict experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows, because experimental results should agree with predictions across a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which is now starting to outperform existing best-in-class assays. Finally, we discuss the model transparency, explainability, and data privacy that are required to deploy MS-based biomarkers in clinical settings.
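The idea that experimental results should agree with sequence-based predictions can be made concrete with a similarity score between a predicted and an observed fragment-ion spectrum. The following is a minimal sketch, not the authors' method: it uses the normalized spectral contrast angle, one common choice for comparing fragment intensity patterns. The function name `spectral_angle`, the assumption that fragment ions are already matched and aligned in the same order, and the example intensities are all illustrative.

```python
import math


def spectral_angle(predicted, observed):
    """Hypothetical helper: normalized spectral contrast angle in [0, 1].

    Assumes both inputs are non-negative fragment-ion intensities that have
    already been matched and aligned upstream (e.g. b1, b2, ..., y1, y2, ...).
    A score of 1.0 means identical relative intensity patterns; 0.0 means
    orthogonal patterns (no shared signal).
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must be aligned and equal length")

    norm_p = math.sqrt(sum(x * x for x in predicted))
    norm_o = math.sqrt(sum(x * x for x in observed))
    if norm_p == 0 or norm_o == 0:
        # An empty spectrum cannot agree with anything.
        return 0.0

    # Cosine similarity of the two intensity vectors.
    cos_sim = sum(p * o for p, o in zip(predicted, observed)) / (norm_p * norm_o)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, cos_sim))

    # Map the angle between the vectors onto [0, 1] (for non-negative inputs).
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    predicted = [0.10, 0.80, 0.30, 1.00]
    observed = [0.12, 0.75, 0.28, 0.95]
    print(round(spectral_angle(predicted, observed), 3))
```

Under this kind of check, a peptide-spectrum match whose observed intensities diverge strongly from the predicted pattern would receive a low score, giving one additional, independent dimension on which to accept or reject an identification.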

Keywords: FAIR principles; bioinformatics; data integration; data privacy; mass spectrometry; open source; plasma proteomics; transparent science.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Artificial Intelligence*
Biomarkers / analysis
Mass Spectrometry / methods
Proteomics* / methods
Reproducibility of Results

Substances

Biomarkers