Laryngeal disease classification using voice data: Octave-band vs. mel-frequency filters

Jaemin Song; Hyunbum Kim; Yong Oh Lee

doi:10.1016/j.heliyon.2024.e40748

Heliyon. 2024 Nov 30;10(24):e40748. doi: 10.1016/j.heliyon.2024.e40748. eCollection 2024 Dec 30.

Authors

Jaemin Song¹, Hyunbum Kim², Yong Oh Lee¹

Affiliations

¹ Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea.
² Department of Otolaryngology-Head and Neck Surgery, The Catholic University of Korea, Seoul, South Korea.

Abstract

Introduction: Laryngeal cancer diagnosis relies on specialist examinations, but non-invasive methods using voice data are emerging with advances in artificial intelligence (AI). Mel-frequency cepstral coefficients (MFCCs) are widely used for voice analysis, but octave frequency spectrum energy (OFSE) may detect subtle voice changes more accurately.

Problem statement: Accurate early diagnosis of laryngeal cancer from voice data remains challenging with current features such as MFCCs.

Objectives: This study compares the effectiveness of MFCC and OFSE in classifying voice data into healthy, laryngeal cancer, benign mucosal disease, and vocal fold paralysis categories.

Methods: Voice samples from 363 patients were analyzed using convolutional neural network (CNN) models with MFCC features and with OFSE features computed using 1/3 octave band filters. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize key voice features.
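
To make the contrast between the two filter layouts concrete, the following is a minimal sketch of how 1/3 octave band energies could be computed from a voice signal, next to mel-spaced band centres for comparison. It is not the authors' pipeline: the function names, the 1 kHz reference frequency, the base-2 band definition, the 50 to 6000 Hz range, and the use of a single FFT over the whole signal are all assumptions made for illustration.

```python
import numpy as np

def third_octave_centers(f_min=50.0, f_max=6000.0):
    """Base-2 one-third-octave centre frequencies (1 kHz reference) in [f_min, f_max]."""
    n_lo = int(np.ceil(3 * np.log2(f_min / 1000.0)))
    n_hi = int(np.floor(3 * np.log2(f_max / 1000.0)))
    return 1000.0 * 2.0 ** (np.arange(n_lo, n_hi + 1) / 3.0)

def third_octave_energy(signal, sr, f_min=50.0, f_max=6000.0):
    """Sum the power spectrum inside each one-third-octave band (illustrative only)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centers = third_octave_centers(f_min, f_max)
    lower = centers * 2.0 ** (-1.0 / 6.0)   # lower band edge
    upper = centers * 2.0 ** (1.0 / 6.0)    # upper band edge
    energy = np.array([
        spectrum[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(lower, upper)
    ])
    return centers, energy

def hz_to_mel(f):
    """Common (HTK-style) Hz-to-mel conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_centers(n_bands, f_min=50.0, f_max=6000.0):
    """Centre frequencies equally spaced on the mel scale, for comparison."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands + 2)[1:-1]
    return mel_to_hz(mels)
```

In this sketch, adjacent 1/3 octave centres always differ by a factor of 2^(1/3), so the bands are logarithmically spaced across the whole range, whereas mel-spaced centres are close to linear at low frequencies and only become roughly logarithmic at higher ones.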

Results: OFSE with 1/3 octave band filters outperformed MFCC in classification accuracy, especially in multi-class classification including the laryngeal cancer, benign mucosal disease, and vocal fold paralysis groups (accuracy 0.9398 ± 0.0232 vs. 0.7061 ± 0.0561). Grad-CAM analysis revealed that OFSE with 1/3 octave band filters effectively distinguished laryngeal cancer from healthy voices by focusing on increased noise in the over-formant area and changes in the fundamental frequency. The analysis also highlighted that specific narrow frequency areas were critical for classification, particularly in vocal fold paralysis. Benign mucosal diseases occasionally resembled healthy voices, making AI differentiation between benign conditions and laryngeal cancer a significant challenge.
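
As a rough illustration of how a Grad-CAM map can point at specific frequency areas, the sketch below applies the standard Grad-CAM weighting to one convolutional layer's outputs. It assumes the activations and the gradients of the class score with respect to them have already been extracted from a trained CNN, and that the height axis of the feature map corresponds to frequency bands. The function name and array layout are hypothetical, and the abstract does not specify which layer or framework the study used.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from one conv layer.

    activations: array (channels, height, width), the layer's feature maps.
    gradients:   array of the same shape, d(class score) / d(activations).
    Returns a (height, width) map scaled to [0, 1].
    """
    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps over channels, then ReLU.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Under the stated layout assumption, rows of the returned map with high values would correspond to frequency bands that contributed most to the predicted class.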

Conclusion: OFSE with 1/3 octave band filters provides superior accuracy in diagnosing laryngeal diseases including laryngeal cancer, showing potential for non-invasive, AI-driven early detection.

Keywords: 1/3 octave band filter; Laryngeal disease; MFCC (mel-frequency cepstral coefficients); Octave frequency spectrum energy; Voice.