Towards interpretable speech biomarkers: exploring MFCCs

Brian Tracey; Dmitri Volfson; James Glass; R'mani Haulcy; Melissa Kostrzebski; Jamie Adams; Tairmae Kangarloo; Amy Brodtmann; E Ray Dorsey; Adam Vogel

doi:10.1038/s41598-023-49352-2

Sci Rep. 2023 Dec 21;13(1):22787. doi: 10.1038/s41598-023-49352-2.

Authors

Brian Tracey¹, Dmitri Volfson², James Glass³, R'mani Haulcy³, Melissa Kostrzebski⁴, Jamie Adams⁴, Tairmae Kangarloo², Amy Brodtmann⁵ ⁶, E Ray Dorsey⁴, Adam Vogel⁶ ⁷

Affiliations

¹ Takeda Pharmaceuticals, Data Science Institute, Cambridge, MA, 02142, USA.
² Takeda Pharmaceuticals, Data Science Institute, Cambridge, MA, 02142, USA.
³ Massachusetts Institute of Technology, CSAIL, Cambridge, MA, 02139, USA.
⁴ Center for Health + Technology (CHeT), University of Rochester Medical Center, Rochester, NY, USA.
⁵ Monash University, Melbourne, VIC, Australia.
⁶ University of Melbourne, Parkville, VIC, 3010, Australia.
⁷ Redenlab Inc, Melbourne, VIC, 3010, Australia.

Abstract

While speech biomarkers of disease have attracted increased interest in recent years, features derived from signal processing or machine learning approaches may lack clinical interpretability. For example, Mel frequency cepstral coefficients (MFCCs) have been identified in several studies as useful markers of disease, but are regarded as uninterpretable. Here we explore correlations between MFCCs and more interpretable speech biomarkers. In particular, we quantify the MFCC2 endpoint, which can be interpreted as a weighted ratio of low- to high-frequency energy, a concept that has previously been linked to disease-induced voice changes. By exploring MFCC2 in several datasets, we show how its sensitivity to disease can be increased by adjusting computation parameters.
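
The weighted-ratio reading of MFCC2 can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that MFCCs are obtained as a DCT-II of log mel-band energies and that MFCC2 corresponds to DCT index k = 1 (counting the constant term as the first coefficient); the abstract does not state either convention. The function names and the eight band energies are hypothetical and chosen only for illustration. Under these assumptions the cosine weights are positive for low bands and negative for high bands, so the coefficient equals the log of a weighted low- to high-frequency energy ratio.

```python
import numpy as np

def dct2_weights(n_bands: int, k: int) -> np.ndarray:
    """Unnormalized DCT-II basis vector of index k over n_bands mel bands."""
    m = np.arange(n_bands)
    return np.cos(np.pi * k * (m + 0.5) / n_bands)

def cepstral_coefficient(log_mel_energies, k: int) -> float:
    """Cosine-weighted sum of log band energies (one cepstral coefficient)."""
    log_e = np.asarray(log_mel_energies, dtype=float)
    return float(np.dot(dct2_weights(log_e.size, k), log_e))

def weighted_energy_ratio(mel_energies, k: int = 1) -> float:
    """Same quantity written as a ratio: bands with positive weight go in
    the numerator, bands with negative weight in the denominator."""
    e = np.asarray(mel_energies, dtype=float)
    w = dct2_weights(e.size, k)
    numerator = np.prod(e[w > 0] ** w[w > 0])
    denominator = np.prod(e[w < 0] ** (-w[w < 0]))
    return float(numerator / denominator)

# Hypothetical band energies, ordered from low to high frequency.
n_bands = 8
low_heavy_energy = np.array([8, 8, 8, 8, 1, 1, 1, 1], dtype=float)
high_heavy_energy = low_heavy_energy[::-1]

# Assumption: MFCC2 is DCT index k = 1 (half-cosine weighting).
w = dct2_weights(n_bands, 1)
c_low = cepstral_coefficient(np.log(low_heavy_energy), 1)
c_high = cepstral_coefficient(np.log(high_heavy_energy), 1)
ratio_low = weighted_energy_ratio(low_heavy_energy, 1)

print("weights:", np.round(w, 3))
print("coefficient, low-frequency-heavy spectrum:", c_low)
print("coefficient, high-frequency-heavy spectrum:", c_high)
print("log of weighted ratio:", np.log(ratio_low))
```

In this toy case the coefficient is positive when energy is concentrated in the low bands and negative when it is concentrated in the high bands. Real MFCC pipelines add framing, windowing, a mel filterbank, and DCT normalization, all of which are computation parameters that this sketch leaves out.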

MeSH terms

Signal Processing, Computer-Assisted
Speech Acoustics*
Speech*