The unreliability of crackles: insights from a breath sound study using physicians and artificial intelligence

NPJ Prim Care Respir Med. 2024 Oct 15;34(1):28. doi: 10.1038/s41533-024-00392-9.

Abstract

Background and introduction: Compared with other physical assessment methods, inconsistency in respiratory evaluation continues to pose a major challenge.

Objectives: This study aims to evaluate differences in how well different breath sounds can be identified.

Methods/description: In this prospective study, breath sounds from the Formosa Archive of Breath Sound were labeled by five physicians. Six artificial intelligence (AI) breath sound interpretation models were developed: one trained on all labeled data and one trained on the labels of each of the five physicians. After labeling by the AI models and the physicians, labels on which they disagreed were considered doubtful and were relabeled by two additional physicians. The final labels were determined by a majority vote among the physicians. The ability of physicians and AI to identify breath sounds was evaluated using sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC).
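The relabeling workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: it assumes one binary label per recording and sound type (for example, crackles present or absent), treats a recording as doubtful whenever any physician or AI label disagrees with the others, and takes the final majority vote over all seven physicians (the original five plus the two relabelers). The function names `is_doubtful` and `majority_vote` are hypothetical, and the abstract does not say how ties or exclusions were handled.

```python
from collections import Counter
from typing import Sequence


def is_doubtful(physician_labels: Sequence[bool], ai_labels: Sequence[bool]) -> bool:
    """Hypothetical rule: flag a recording when human and AI labels are not unanimous."""
    return len(set(physician_labels) | set(ai_labels)) > 1


def majority_vote(labels: Sequence[bool]) -> bool:
    """Return the most common label; a tie is treated as undefined (an assumption)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        raise ValueError(f"tie among labels: {counts}")
    return counts[0][0]


# Illustrative (made-up) labels for one recording: True = crackles present.
initial_physicians = [True, True, False, True, True]  # five original labelers
ai_models = [True, False, True, True, True, False]    # six AI models

if is_doubtful(initial_physicians, ai_models):
    extra_physicians = [False, True]                   # two additional physicians relabel
    final_label = majority_vote(initial_physicians + extra_physicians)
    print(final_label)  # True: 5 of 7 physicians marked crackles as present
```

Because seven physicians vote on each doubtful binary label, a tie cannot occur in this setting. The explicit tie check only matters if the sketch is reused with an even number of raters.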

Results/outcome: A total of 11,532 breath sound files were labeled, and 579 doubtful labels were identified. After relabeling and exclusion, 305 labels with a gold standard remained. For wheezing, both the physicians and the AI model showed good sensitivity (89.5% vs. 86.0%) and good specificity (96.4% vs. 95.2%). For crackles, both the physicians and the AI model showed good sensitivity (93.9% vs. 80.3%) but poor specificity (56.6% vs. 65.9%). For both the physicians and the AI model, AUROC values were lower for crackle identification than for wheezing.
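The metrics in the results can be computed as shown below. This is a generic sketch, not the study's evaluation code. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). AUROC is computed here in its rank form: the probability that a randomly chosen positive recording scores higher than a randomly chosen negative one, with ties counting one half. The function names are hypothetical, and the toy labels are invented for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t and p for t, p in zip(truth, pred))
    fn = sum(t and not p for t, p in zip(truth, pred))
    tn = sum((not t) and (not p) for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    return tp / (tp + fn), tn / (tn + fp)


def auroc(truth: Sequence[bool], scores: Sequence[float]) -> float:
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented data): True = crackles present in the gold standard.
truth = [True, True, True, False, False, False, False]
pred = [True, True, False, True, False, False, False]  # a reader's yes/no calls

sens, spec = sensitivity_specificity(truth, pred)       # 2/3 and 3/4
binary_auc = auroc(truth, [float(p) for p in pred])     # single operating point
print(round(sens, 3), round(spec, 3), round(binary_auc, 3))
```

When the scores are binary yes/no calls, the rank-based AUROC reduces to (sensitivity + specificity) / 2. The abstract does not state how the physicians' AUROC was derived, but under this single-operating-point assumption, the physicians' reported figures would give roughly (89.5% + 96.4%) / 2 ≈ 0.93 for wheezing versus (93.9% + 56.6%) / 2 ≈ 0.75 for crackles. These are illustrative calculations, not AUROC values reported by the study. They show how poor specificity alone can pull AUROC down even when sensitivity is high.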

Conclusion: Even with the assistance of artificial intelligence tools, crackles remain harder to identify accurately than wheezing. Consequently, crackles are unreliable for medical decision-making, and further examination is warranted.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Artificial Intelligence*
  • Child
  • Female
  • Humans
  • Male
  • Middle Aged
  • Physicians
  • Prospective Studies
  • ROC Curve
  • Reproducibility of Results
  • Respiratory Sounds* / diagnosis
  • Sensitivity and Specificity