Biological, linguistic, and individual factors govern voice quality

J Acoust Soc Am. 2025 Jan 1;157(1):482-492. doi: 10.1121/10.0034848.

Abstract

Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimensions along which voices vary, an area that remains largely unexplored. Recent studies of English speakers have shown that two factors related to speaker size and arousal consistently emerge as the most important determinants of quality, regardless of who is speaking. The present findings extend this picture by demonstrating that in four languages that vary fundamental frequency (fo) and/or phonation type contrastively (Korean, Thai, Gujarati, and White Hmong), additional acoustic variability is systematically related to the phonology of the language spoken, and the amount of variability along each dimension is consistent across speaker groups. This study concludes that acoustic voice spaces are structured in a remarkably consistent way: first by biologically driven, evolutionarily grounded factors, second by learned linguistic factors, and finally by variations within a talker over utterances, possibly due to personal style, emotional state, social setting, or other dynamic factors. Implications for models of speaker recognition are also discussed.

MeSH terms

  • Adult
  • Arousal
  • Emotions
  • Female
  • Humans
  • Male
  • Phonation
  • Phonetics
  • Speech Acoustics*
  • Speech Perception / physiology
  • Voice Quality*
  • Young Adult