Background: Machine learning algorithms for bacterial strain identification using Raman spectroscopy have been widely used in microbiology. During the training phase, existing datasets are augmented and used to optimize model architecture and hyperparameters. After training, it is presumed that the models have reached their peak performance and are used for inference without being further enhanced. Our methodology combines Monte Carlo Dropout (MCD) with convolutional neural networks (CNNs) by utilizing dropout during the inference phase, which enables to measure the model uncertainty, a critical but often ignored aspect in deep learning models.
Results: We categorize unseen input data into two subsets based on the uncertainty of their prediction by employing MCD and defining the threshold using the Gaussian Mixture Model (GMM). The final prediction is obtained on the subset of testing data that exhibits lower model uncertainty, thereby enhancing the reliability of the results. To validate our method, we applied it to two Raman spectra datasets. As a result, we have observed an increase in accuracy of 9 % for Dataset 1 (from 83.10 % to 92.10 %) and 12.82 % for Dataset 2 (from 83.86 % to 96.68 %). These improvements were observed within specific subsets of the data: 826 out of 1206 spectra in Dataset 1 and 1700 out of 3000 spectra in Dataset 2. This demonstrates the effectiveness of our approach in improving prediction accuracy by focusing on data with lower uncertainty.
Significance: Different from routine prediction based on mere probabilities, we believe this uncertainty-guided prediction is more effective to ensure a high prediction rate rather than the prediction on the entire dataset. By guiding the decision-making of a model on higher-confidence subsets, our methodology can enhance the accuracy of classification in critical areas like disease diagnosis and safety monitoring. This targeted approach is to advance microbial identification and produces more trustworthy predictions.
Keywords: Bacteria classification; Machine learning; Raman spectroscopy; Uncertainty.
Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.