Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels

Youri Maryn; Paul Corthals; Paul Van Cauwenberge; Nelson Roy; Marc De Bodt

doi:10.1016/j.jvoice.2008.12.014

J Voice. 2010 Sep;24(5):540-55. doi: 10.1016/j.jvoice.2008.12.014. Epub 2009 Nov 2.

Authors

Youri Maryn¹, Paul Corthals, Paul Van Cauwenberge, Nelson Roy, Marc De Bodt

Affiliation

¹ Department of Otorhinolaryngology, Head and Neck Surgery, Speech-Language Pathology and Audiology, Sint-Jan General Hospital, Bruges, Belgium.

PMID: 19883993

Abstract

To improve ecological validity, perceptual and instrumental assessment of disordered voice, including overall voice quality, should ideally sample both sustained vowels and continuous speech. This investigation assessed the utility of combining both voice contexts for the purpose of auditory-perceptual ratings as well as acoustic measurement of overall voice quality. Sustained vowel and continuous speech samples from 251 subjects with (n=229) or without (n=22) various voice disorders were concatenated and perceptually rated on overall voice quality by five experienced voice clinicians. After removing the nonvoiced segments within the continuous speech samples, the concatenated samples were analyzed using 13 acoustic measures based on fundamental frequency perturbation, amplitude perturbation, spectral and cepstral analyses. Stepwise multiple regression analysis yielded a six-variable acoustic model for the multiparametric measurement of overall voice quality of the concatenated samples (with a cepstral measure as the main contributor to the prediction of overall voice quality). The correlation of this model with mean ratings of overall voice quality resulted in r(s)=0.78. A cross-validation approach involving the iterated internal cross-correlations with 30 subgroups of 100, 50, and 10 samples confirmed a comparable degree of association. Furthermore, the ability of the model to distinguish voice-disordered from vocally normal participants was assessed using estimates of diagnostic precision including receiver operating characteristic (ROC) curve analysis, sensitivity, and specificity, as well as likelihood ratios (LRs), which adjust for base-rate differences between the groups. Depending on the cutoff criteria employed, the analyses revealed an impressive area under ROC=0.895 as well as respectable sensitivity, specificity, and LR. 
The results support the diagnostic utility of combining voice samples from both continuous speech and sustained vowels in acoustic and perceptual analysis of disordered voice. The findings are discussed in relation to the extant literature and the need for further refinement of the acoustic algorithm.
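
As an illustration of the diagnostic-precision estimates named in the abstract, the following is a minimal sketch of how sensitivity, specificity, likelihood ratios, and the area under the ROC curve can be computed from a single acoustic score and a cutoff. It is not the authors' procedure or acoustic model: the function names, the toy scores, the labels, and the cutoff are all hypothetical, and the sketch assumes that higher scores indicate poorer voice quality. Likelihood ratios are built only from sensitivity and specificity, which is why they do not depend on how many disordered versus normal participants are in the sample.

```python
def diagnostic_stats(scores, labels, cutoff):
    """Sensitivity, specificity, and likelihood ratios at one cutoff.

    labels: True = voice-disordered (hypothetical ground truth).
    Assumption: a score >= cutoff counts as a positive test.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_pos_rate = fp / (fp + tn)
    false_neg_rate = fn / (fn + tp)

    # LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity
    lr_pos = sensitivity / false_pos_rate if false_pos_rate > 0 else float("inf")
    lr_neg = false_neg_rate / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


def roc_auc(scores, labels):
    """Area under the ROC curve as the probability that a randomly chosen
    disordered sample scores higher than a randomly chosen normal sample
    (ties count as one half)."""
    pos = [s for s, d in zip(scores, labels) if d]
    neg = [s for s, d in zip(scores, labels) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
scores = [1.0, 2.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [False, False, True, False, True, True, True, True]

stats = diagnostic_stats(scores, labels, cutoff=3.0)
auc = roc_auc(scores, labels)
print(stats, auc)
```

Moving the hypothetical cutoff trades sensitivity against specificity, which is the dependence on cutoff criteria that the abstract mentions; the area under the ROC curve summarizes performance across all cutoffs.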

MeSH terms

Acoustic Stimulation
Adolescent
Adult
Aged
Aged, 80 and over
Belgium
Case-Control Studies
Child
Dysphonia / diagnosis*
Dysphonia / physiopathology
Dysphonia / psychology
Female
Humans
Logistic Models
Male
Middle Aged
Observer Variation
Phonetics*
Predictive Value of Tests
ROC Curve
Reproducibility of Results
Sensitivity and Specificity
Severity of Illness Index
Signal Processing, Computer-Assisted
Sound Spectrography
Speech Acoustics*
Speech Perception*
Speech Production Measurement
Time Factors
Voice Quality*
Young Adult