Machine Learning Algorithms for Classification of MALDI-TOF MS Spectra from Phylogenetically Closely Related Species Brucella melitensis, Brucella abortus and Brucella suis

Microorganisms. 2022 Aug 17;10(8):1658. doi: 10.3390/microorganisms10081658.

Abstract

(1) Background: MALDI-TOF mass spectrometry (MS) is the gold standard for microbial fingerprinting; however, for phylogenetically closely related species, its resolving power drops to the genus level. In this study, we analyzed MALDI-TOF spectra from 44 strains of B. melitensis, B. suis and B. abortus to identify the optimal classification method among popular supervised and unsupervised machine learning (ML) algorithms. (2) Methods: A consensus feature selection strategy was applied to pinpoint, among the 500 MS features, those that yielded the best ML model and that may play a role in species differentiation. Unsupervised k-means and hierarchical agglomerative clustering were evaluated using the silhouette coefficient, while the supervised classifiers Random Forest, Support Vector Machine, Neural Network, and Multinomial Logistic Regression were tuned using nested k-fold cross-validation (CV) with a feature reduction step between the two CV loops. (3) Results: Sixteen differentially expressed peaks were identified and used as input to the ML classifiers. Both the unsupervised models and the optimized supervised models reached 100% accuracy. The consensus feature selection strategy was shown to be suitable for the accuracy of the learning system. (4) Conclusion: We introduce an ML approach that enhances Brucella spp. classification using MALDI-TOF MS data.
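
The silhouette coefficient used to evaluate the clustering models can be illustrated with a minimal sketch. This is not the authors' code: the keywords indicate the study was carried out in R, whereas the example below is plain Python, assumes Euclidean distance, and uses made-up two-dimensional toy points rather than MS spectra. The function name `silhouette_coefficient` and the toy values are hypothetical.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette over all samples, assuming Euclidean distance."""
    # Group sample indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from sample i to the listed samples.
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a sample alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (illustrative only): two tight, well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(toy_points, [0, 0, 1, 1])  # matches the structure
bad = silhouette_coefficient(toy_points, [0, 1, 0, 1])   # splits each pair
print(f"good partition: {good:.3f}, bad partition: {bad:.3f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate that samples sit closer to another cluster than to their own; in the toy data the partition that follows the two pairs scores high and the partition that splits them scores negative.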

Keywords: B. abortus; B. suis; Brucella melitensis; MALDI-TOF MS; R; feature selection; machine learning; nested k-fold cross validation.

Grants and funding

This research received no external funding.