Investigation of machine learning algorithms for taxonomic classification of marine metagenomes

Microbiol Spectr. 2023 Oct 17;11(5):e0523722. doi: 10.1128/spectrum.05237-22. Epub 2023 Sep 11.

Abstract

Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops approaches for constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights into the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future ML approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement.
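
The abstract names class imbalance in the training data as one area for improvement. One common, general way to compensate is to weight each class inversely to its frequency during training. The sketch below illustrates that idea only; it is not confirmed to be the authors' method, and the function name and taxon labels are hypothetical placeholders.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the "balanced" heuristic: n_samples / (n_classes * class_count),
    so rare classes get larger weights and common classes smaller ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        taxon: n_samples / (n_classes * count)
        for taxon, count in counts.items()
    }


# Hypothetical per-read labels for a small, imbalanced training set.
read_labels = (
    ["Proteobacteria"] * 6
    + ["Cyanobacteria"] * 3
    + ["Thaumarchaeota"] * 1
)

weights = inverse_frequency_weights(read_labels)
for taxon, weight in sorted(weights.items()):
    print(f"{taxon}: {weight:.3f}")
```

Weights of this kind could be passed to a weighted loss function so that errors on under-represented taxa count for more. Whether this helps on real marine metagenomes would need to be checked empirically.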

Keywords: machine learning; marine microbiology; metagenomics.

MeSH terms

  • Aquatic Organisms*
  • Classification* / methods
  • Machine Learning*
  • Metagenome* / genetics
  • Microbiota* / genetics
  • Models, Biological
  • Neural Networks, Computer