Neural estimation of mutual information in speech signals processed by an auditory model

J Acoust Soc Am. 2025 Jan 1;157(1):355-368. doi: 10.1121/10.0034854.

Abstract

The amount of information contained in speech signals is a fundamental concern of speech-based technologies and is particularly relevant to speech perception. Measuring the mutual information of actual speech signals is non-trivial, and quantitative measurements have not been conducted extensively to date. Recent advances in machine learning have made it possible to measure mutual information directly from data. This study used neural estimators of mutual information to estimate the information content of speech signals. The high-dimensional speech signal was divided into segments and then compressed using a Mel-scale filter bank, which approximates the non-linear frequency perception of the human ear. The filter-bank outputs were then truncated according to the dynamic range of the auditory system. This compression preserved a substantial portion of the information in the original high-dimensional speech signal. The amount of information varied with the category of speech sound, with relatively more mutual information in vowels than in consonants. Furthermore, the information available in the speech signals, as processed by the auditory model, decreased as the dynamic range was reduced.
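As a concrete illustration of the processing chain described above, the sketch below computes Mel filter-bank features with a fixed dynamic-range floor and evaluates a MINE-style (Donsker-Varadhan) lower bound on mutual information with a small neural critic. This is a minimal sketch, not the paper's implementation: the libraries (librosa, PyTorch), the function and class names, and all parameter values (number of Mel bands, 60 dB floor, network size) are illustrative assumptions.

```python
# Hypothetical sketch of the abstract's pipeline: Mel filter-bank compression
# with dynamic-range truncation, then a MINE-style (Donsker-Varadhan) neural
# lower bound on mutual information. Parameter values are assumptions, not
# the paper's settings.
import numpy as np
import librosa
import torch
import torch.nn as nn


def mel_features(y, sr, n_mels=32, dyn_range_db=60.0):
    """Mel filter-bank energies in dB, truncated to a fixed dynamic range."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S_db = librosa.power_to_db(S, ref=np.max)      # 0 dB at the spectral peak
    return np.maximum(S_db, -dyn_range_db)         # clip everything below the floor


class MINECritic(nn.Module):
    """Small MLP critic T(x, y) used in the Donsker-Varadhan bound."""
    def __init__(self, dim_x, dim_y, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))


def mine_lower_bound(critic, x, y):
    """I(X;Y) >= E[T(x,y)] - log E[exp(T(x, y_shuffled))]."""
    joint = critic(x, y).mean()
    y_perm = y[torch.randperm(y.size(0))]          # break pairing to sample the marginals
    marginal = torch.logsumexp(critic(x, y_perm), dim=0) - np.log(y.size(0))
    return (joint - marginal).squeeze()
```

In practice the critic would be trained by gradient ascent on this bound over many mini-batches of paired feature segments, and the converged value of the bound would serve as the mutual-information estimate.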

MeSH terms

  • Acoustic Stimulation / methods
  • Adult
  • Female
  • Humans
  • Male
  • Phonetics
  • Signal Processing, Computer-Assisted
  • Speech Acoustics
  • Speech Perception* / physiology