Deep learning-based metabolomics data study of prostate cancer

BMC Bioinformatics. 2024 Dec 26;25(1):391. doi: 10.1186/s12859-024-06016-w.

Abstract

As a heterogeneous disease, prostate cancer (PCa) exhibits diverse clinical and biological features, which pose significant challenges for early diagnosis and treatment. Metabolomics offers promising new approaches for early diagnosis, treatment, and prognosis of PCa. However, metabolomics data are characterized by high dimensionality, noise, variability, and small sample sizes, presenting substantial challenges for classification. Although deep learning methods are widely applied elsewhere, their use in metabolomics research has not been extensively explored. In this study, we propose a hybrid model, TransConvNet, which combines transformer and convolutional neural networks for the classification of prostate cancer metabolomics data. We apply a 1D convolution layer to the inputs of the dot-product attention mechanism, enabling the interaction of both local and global information. Additionally, a gating mechanism is incorporated to dynamically adjust the attention weights. The features extracted by multi-head attention are further refined through 1D convolution, and a residual network is introduced to alleviate the vanishing gradient problem in the convolutional layers. We conducted comparative experiments with seven other machine learning algorithms. Under five-fold cross-validation, TransConvNet achieved an accuracy of 81.03% and an AUC of 0.89, significantly outperforming the other algorithms. Additionally, we validated TransConvNet's generalization ability through experiments on a lung cancer dataset, with the results demonstrating its robustness and adaptability to different metabolomics datasets. We also propose the MI-RF (Mutual Information-based random forest) model, which effectively identified key biomarkers associated with prostate cancer by leveraging comprehensive feature weight coefficients. In contrast, traditional methods identified only a limited number of biomarkers.
In summary, these results highlight the potential of TransConvNet for classification and of MI-RF for biomarker discovery, providing valuable insights for the clinical diagnosis of prostate cancer.
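
The attention block described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single attention head, a depthwise convolution kernel shared across channels, a sigmoid gate applied to the attended features, and a metabolite profile arranged as a short sequence of feature tokens. All names (conv1d_same, gated_conv_attention, Wq, Wk, Wv, Wg, kernel_in, kernel_out) and all sizes are hypothetical; the abstract does not specify how the gate enters the attention computation.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Depthwise 1D cross-correlation along the sequence axis, zero padded.

    x: (seq_len, d_model); kernel: (k,) with k odd. Output has the shape of x.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack(
        [(xp[i:i + k] * kernel[:, None]).sum(axis=0) for i in range(x.shape[0])]
    )

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_attention(x, p):
    """One hypothetical block: conv -> attention -> gate -> conv -> residual."""
    # Local context: convolve the inputs before forming queries, keys, values.
    local = conv1d_same(x, p["kernel_in"])
    q, k, v = local @ p["Wq"], local @ p["Wk"], local @ p["Wv"]

    # Global context: scaled dot-product attention over all positions.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)

    # Assumed gating: an input-dependent sigmoid gate scales the attended features.
    gate = sigmoid(x @ p["Wg"])
    attended = gate * (attn @ v)

    # Refine with a second 1D convolution and add a residual connection.
    refined = conv1d_same(attended, p["kernel_out"])
    return x + refined, attn

# Toy input: 6 feature tokens with 4 channels each (sizes are arbitrary).
rng = np.random.default_rng(0)
seq_len, d_model = 6, 4
x = rng.normal(size=(seq_len, d_model))
params = {name: rng.normal(scale=0.5, size=(d_model, d_model))
          for name in ("Wq", "Wk", "Wv", "Wg")}
params["kernel_in"] = np.array([0.25, 0.5, 0.25])
params["kernel_out"] = np.array([0.25, 0.5, 0.25])

y, attn = gated_conv_attention(x, params)
print(y.shape, attn.shape)  # (6, 4) (6, 6)
```

Each row of attn sums to 1, so every output position is a weighted mix of all positions (global information), while the two convolutions mix only neighbouring positions (local information). The residual term x + refined gives gradients a direct path around the convolutional layers.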

Keywords: Biomarker discovery; CNN; Hybrid deep learning; Metabolomics; Prostate cancer; Transformer.

MeSH terms

  • Algorithms
  • Biomarkers, Tumor / metabolism
  • Deep Learning*
  • Humans
  • Male
  • Metabolomics* / methods
  • Neural Networks, Computer
  • Prostatic Neoplasms* / metabolism

Substances

  • Biomarkers, Tumor