Clinical proteomics is important for elucidating pathological mechanisms and discovering disease-related biomarkers. Computational methods that accurately predict disease type can improve diagnosis and prognosis. However, how to eliminate the errors that peptide precursor identification and protein identification introduce into pathological diagnosis remains a major unresolved issue. Here, we develop an end-to-end deep learning model, termed "MS1Former", that classifies hepatocellular carcinoma tumors and adjacent non-tumor (normal) tissues directly from raw MS1 spectra, without peptide precursor identification. The model accurately discriminates subtle m/z differences in MS1 spectra between tumor and adjacent non-tumor tissue, and it generalizes across data-dependent acquisition, data-independent acquisition, and full-scan data. It achieves the best performance on multiple external validation datasets. Additionally, we explore the model's interpretability in detail. We expect that this end-to-end framework will also be applicable to the classification of other tumors.
© 2024. The Author(s).