Automatic detection of the second subglottal resonance and its application to speaker normalization

J Acoust Soc Am. 2009 Dec;126(6):3268-77. doi: 10.1121/1.3257185.

Abstract

Speaker normalization typically focuses on inter-speaker variability in the supraglottal (vocal tract) resonances, which is a major cause of spectral mismatch. Recent studies have shown that the subglottal airways also affect the spectral properties of speech sounds, and promising results have been reported using the subglottal resonances for speaker normalization. This paper proposes a reliable algorithm to automatically estimate the second subglottal resonance (Sg2) from speech signals. The algorithm is calibrated on children's speech data with simultaneous accelerometer recordings, from which Sg2 frequencies can be measured directly. A cross-language study with bilingual Spanish-English children is performed to investigate whether Sg2 frequencies are independent of speech content and language. The study verifies that Sg2 is approximately constant for a given speaker and is therefore a good candidate for limited-data speaker normalization and cross-language adaptation. A speaker normalization method using Sg2 is then presented. This method is computationally more efficient than maximum-likelihood-based vocal tract length normalization (VTLN) and outperforms VTLN for limited adaptation data and cross-language adaptation. Experimental results confirm that the method performs well in a variety of testing conditions and tasks.
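
As an illustration of how a single per-speaker Sg2 estimate could drive normalization, the sketch below applies a linear frequency warp whose factor is the ratio of a reference Sg2 to the speaker's Sg2. The abstract does not specify the warping function, so the ratio-based linear warp, the function names, and the example frequencies are all assumptions chosen for illustration, not the authors' implementation.

```python
def sg2_warp_factor(sg2_speaker_hz, sg2_reference_hz):
    """Hypothetical warp factor: reference Sg2 divided by the speaker's Sg2."""
    if sg2_speaker_hz <= 0 or sg2_reference_hz <= 0:
        raise ValueError("Sg2 frequencies must be positive")
    return sg2_reference_hz / sg2_speaker_hz


def warp_frequencies(freqs_hz, alpha):
    """Scale each frequency by alpha (a simple linear warp, assumed here)."""
    return [alpha * f for f in freqs_hz]


# Illustrative values only, not measurements from the paper.
speaker_sg2 = 1800.0      # assumed Sg2 estimate for one child speaker, in Hz
reference_sg2 = 1400.0    # assumed Sg2 of the reference (training) population, in Hz

alpha = sg2_warp_factor(speaker_sg2, reference_sg2)
warped = warp_frequencies([900.0, 2700.0], alpha)
print(alpha, warped)
```

Because the factor comes from one Sg2 estimate rather than a search over candidate warps, a scheme of this kind needs no likelihood evaluation per candidate, which is consistent with the efficiency advantage over maximum-likelihood VTLN that the abstract describes.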

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Algorithms*
  • Automation*
  • Calibration
  • Child
  • Child, Preschool
  • Female
  • Humans
  • Language
  • Larynx / physiology*
  • Male
  • Models, Biological
  • Multilingualism
  • Phonetics
  • Sound Spectrography
  • Speech / physiology*
  • Speech Acoustics*