Hybrid learning method based on feature clustering and scoring for enhanced COVID-19 breath analysis by an electronic nose

Artif Intell Med. 2022 Jul:129:102323. doi: 10.1016/j.artmed.2022.102323. Epub 2022 May 17.

Abstract

Breath pattern analysis based on an electronic nose (e-nose), a noninvasive, fast, and low-cost method, has been widely used to detect human diseases, including coronavirus disease 2019 (COVID-19). However, a large dataset with many available features is not always beneficial, because only a few of those features are relevant and useful for distinguishing different breath samples (here, COVID-19-positive and COVID-19-negative samples). In this study, we develop a hybrid machine learning algorithm that combines hierarchical agglomerative clustering and the permutation feature importance method to improve the data analysis of a portable e-nose for COVID-19 detection (GeNose C19). With this approach, we obtain an effective, optimal feature combination that halves the number of sensors used without degrading classification performance. In cross-validation on the training data, the hybrid algorithm achieves accuracy, sensitivity, and specificity of (86 ± 3)%, (88 ± 6)%, and (84 ± 6)%, respectively; on the testing data, all three metrics reach 87%. These results demonstrate the feasibility of this hybrid filter-wrapper feature-selection method and pave the way for optimizing GeNose C19 performance.
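The abstract names the two ingredients (hierarchical agglomerative clustering as the filter step, permutation feature importance as the wrapper step) but not the distance measure, linkage, classifier, or selection rule. The following standard-library Python sketch fills those gaps with assumptions: features are clustered by average linkage on the distance 1 − |r| (Pearson correlation), each feature is scored by the mean drop in accuracy when its column is shuffled, and the highest-scoring feature in each cluster is kept. The nearest-centroid classifier, the 0.2 merge threshold, the toy data, and all function names (`cluster_features`, `permutation_importance`, `hybrid_select`) are hypothetical stand-ins, not the GeNose C19 pipeline.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def cluster_features(X, threshold):
    """Filter step (assumed form): average-linkage agglomerative clustering of
    feature columns with distance 1 - |r|, so redundant features share a cluster.
    Merging stops once the closest pair of clusters is farther than `threshold`."""
    cols = list(zip(*X))
    n = len(cols)
    dist = [[1 - abs(pearson(cols[i], cols[j])) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean feature vector per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(c) for c in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    return min(cents, key=lambda c: sum((u - v) ** 2 for u, v in zip(x, cents[c])))


def accuracy(cents, X, y):
    return mean(1.0 if predict(cents, x) == t else 0.0 for x, t in zip(X, y))


def permutation_importance(cents, X, y, n_repeats=10, seed=0):
    """Wrapper step: mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(cents, X, y)
    scores = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[f] for row in X]
            rng.shuffle(col)
            shuffled = []
            for row, v in zip(X, col):
                new_row = list(row)
                new_row[f] = v
                shuffled.append(new_row)
            drops.append(base - accuracy(cents, shuffled, y))
        scores.append(mean(drops))
    return scores


def hybrid_select(X_train, y_train, X_val, y_val, threshold=0.2):
    """Keep the most important feature from each correlation cluster."""
    clusters = cluster_features(X_train, threshold)
    cents = fit_centroids(X_train, y_train)
    importance = permutation_importance(cents, X_val, y_val)
    keep = [max(c, key=lambda f: importance[f]) for c in clusters]
    return sorted(keep), clusters


# Hypothetical toy data: feature 1 is a rescaled copy of feature 0,
# feature 2 is noise that does not depend on the class label.
y = [i % 2 for i in range(20)]
X = [[10 * t + 0.1 * (i % 3), 2 * (10 * t + 0.1 * (i % 3)) + 1, (i // 2) % 5]
     for i, t in enumerate(y)]
selected, clusters = hybrid_select(X, y, X, y)  # toy demo only; use a held-out split in practice
print(clusters)  # → [[0, 1], [2]]
print(selected)
```

Clustering before scoring means importance is compared only among mutually redundant features, so one of two sensors carrying the same signal can be dropped without losing information. This mirrors the abstract's goal of using fewer sensors without degrading performance, although the exact clustering criterion and selection rule used in the study may differ.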

Trial registration: ClinicalTrials.gov NCT04558372.

Keywords: Breath analysis; Electronic nose; Feature permutation importance; GeNose C19; Hierarchical agglomerative clustering; Machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breath Tests / methods
  • COVID-19*
  • Cluster Analysis
  • Electronic Nose*
  • Humans
  • Machine Learning

Associated data

  • ClinicalTrials.gov/NCT04558372