Objective Indices of Perceived Vocal Strain

J Voice. 2019 Nov;33(6):838-845. doi: 10.1016/j.jvoice.2018.06.005. Epub 2018 Jul 29.

Abstract

Background: Strain occurs widely in patients with hyperfunctional voice disorders, adductor spasmodic dysphonia, and vocal fold paralysis, among other conditions, yet fewer experiments have investigated its perception than that of the voice qualities of breathiness and roughness.

Objective: This study aimed to determine the perceptual basis of strain by identifying and exploring acoustic and psychoacoustic measures.

Methods: Twelve listeners rated the degree of strain in 28 dysphonic phonation samples on a five-point scale. Computational estimates based on cepstrum, sharpness, and spectral moments (linear and transformed with an auditory processing front-end) were correlated with the perceptual ratings.
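As a rough illustration of the spectral-moment and correlation steps named in the Methods, the sketch below computes the first four moments of a power spectrum and a Pearson correlation against listener ratings. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names, the toy spectra, and the toy ratings are hypothetical, and the auditory processing front-end (excitation pattern, specific loudness) is not reproduced here.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (centroid, spread, skewness, kurtosis) of a power spectrum.

    The spectrum is normalized to sum to 1 and treated as a distribution
    over frequency, so the moments describe how energy is distributed.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain positive energy")
    weights = [p / total for p in power]
    centroid = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs_hz, weights))
    if variance == 0:
        raise ValueError("all energy in one bin: higher moments are undefined")
    spread = math.sqrt(variance)
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs_hz, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs_hz, weights)) / variance ** 2
    return centroid, spread, skewness, kurtosis


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: three coarse spectra and made-up mean strain ratings.
freqs = [500.0, 1500.0, 2500.0, 3500.0]
toy_spectra = [
    [8.0, 4.0, 2.0, 1.0],   # energy concentrated at low frequencies
    [4.0, 4.0, 3.0, 2.0],
    [2.0, 3.0, 4.0, 5.0],   # energy shifted toward high frequencies
]
toy_ratings = [1.5, 2.8, 4.2]  # illustrative values on a five-point scale

centroids = [spectral_moments(freqs, s)[0] for s in toy_spectra]
r = pearson_r(centroids, toy_ratings)
print(f"centroid vs. rating: r = {r:.3f}")
```

In this sketch the moments are taken on a linear frequency axis. The transformed variants described in the Methods would apply the same moment calculation to the output of an auditory model instead of the raw power spectrum.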

Results: Perceived strain was strongly correlated with cepstral peak prominence, sharpness, and a subset of the spectral metrics. Spectral energy distribution measures from the output of an auditory processing front-end (ie, excitation pattern and specific loudness pattern) accounted for 77-79% of the model variance for strained voices in combination with the cepstral measure.
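To make "variance accounted for" by two predictors in combination concrete, the sketch below computes the coefficient of determination (R^2) for a two-predictor linear regression from the three pairwise Pearson correlations. This is a generic textbook identity offered as an illustration, not the authors' modeling procedure; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_two_predictors(x1, x2, y):
    """R^2 of the least-squares fit y ~ b0 + b1*x1 + b2*x2.

    Uses the identity
        R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2),
    which holds when the two predictors are not perfectly collinear.
    """
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    if abs(r_12) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical toy values: a cepstral measure, a spectral-distribution
# measure, and made-up mean strain ratings for five samples.
toy_cepstral = [12.0, 10.5, 9.0, 7.5, 6.0]
toy_spectral = [0.9, 1.4, 1.2, 1.9, 2.1]
toy_ratings = [1.2, 2.0, 2.6, 3.7, 4.5]

r2 = r_squared_two_predictors(toy_cepstral, toy_spectral, toy_ratings)
print(f"proportion of rating variance explained: {r2:.2f}")
```

An R^2 of 0.77 to 0.79 from such a fit would correspond to the 77-79% figure reported above, with the remaining variance left unexplained by the two measures.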

Conclusions: Modeling the perception of strain using an auditory front-end prior to acoustic analysis provides a better characterization of the perceptual ratings of strain, similar to our prior work on breathiness and roughness. The results also provide evidence that the sharpness model of Fastl and Zwicker (2007) is a strong predictor of strain perception.

Keywords: Cepstral peak prominence (CPP); Listener perception; Spectral moments; Spectral sharpness; Strained voice.

MeSH terms

  • Acoustics
  • Auditory Perception*
  • Dysphonia / diagnosis*
  • Dysphonia / physiopathology
  • Humans
  • Judgment
  • Models, Theoretical
  • Observer Variation
  • Psychoacoustics
  • Severity of Illness Index
  • Sound Spectrography
  • Stress, Physiological*
  • Voice Quality*