The amount of information contained in speech signals is a fundamental concern for speech-based technologies and is particularly relevant to speech perception. Measuring the mutual information of actual speech signals is non-trivial, and quantitative measurements have not been conducted extensively to date. Recent advances in machine learning have made it possible to measure mutual information directly from data. This study used neural estimators of mutual information to estimate the information content of speech signals. The high-dimensional speech signal was divided into segments and then compressed with a Mel-scale filter bank, which approximates the non-linear frequency perception of the human ear. The filter bank outputs were then truncated according to the dynamic range of the auditory system. This data compression preserved a substantial amount of the information in the original high-dimensional speech signal. The amount of information varied with the category of speech sound, with relatively higher mutual information for vowels than for consonants. Furthermore, the information available in the speech signals, as processed by the auditory model, decreased as the dynamic range was reduced.
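
The following is a minimal sketch, not the authors' code, of the kind of pipeline described above: speech segments are compressed with a Mel-scale filter bank, the log-energies are truncated to a fixed dynamic range, and mutual information between two paired feature streams is estimated with a MINE-style neural estimator (Donsker-Varadhan bound). The constants DYNAMIC_RANGE_DB and N_MELS, the critic architecture, and the choice of which variable pair to compare (e.g., features of one segment versus a paired segment) are illustrative assumptions rather than details taken from the paper.

    import numpy as np
    import librosa
    import torch
    import torch.nn as nn

    DYNAMIC_RANGE_DB = 60.0   # assumed auditory dynamic range used for truncation
    N_MELS = 32               # assumed number of Mel filters

    def mel_features(wav: np.ndarray, sr: int) -> np.ndarray:
        """Mel filter-bank log-energies, truncated to a fixed dynamic range."""
        mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=512,
                                             hop_length=160, n_mels=N_MELS)
        log_mel = 10.0 * np.log10(mel + 1e-12)
        floor = log_mel.max() - DYNAMIC_RANGE_DB     # clip values below the range floor
        return np.maximum(log_mel, floor).T          # shape: (frames, n_mels)

    class MineCritic(nn.Module):
        """Small critic network T(x, y) for the Donsker-Varadhan bound."""
        def __init__(self, dim_x: int, dim_y: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))
        def forward(self, x, y):
            return self.net(torch.cat([x, y], dim=-1))

    def mine_lower_bound(critic, x, y):
        """I(X;Y) >= E_p(x,y)[T(x,y)] - log E_p(x)p(y)[exp(T(x,y))]."""
        joint = critic(x, y).mean()
        y_shuffled = y[torch.randperm(y.size(0))]    # break pairing -> product of marginals
        t_marg = critic(x, y_shuffled).squeeze(-1)
        marginal = torch.logsumexp(t_marg, dim=0) - np.log(y.size(0))
        return joint - marginal

    def estimate_mi(x_feats: np.ndarray, y_feats: np.ndarray, steps: int = 2000):
        """Train the critic to maximize the bound; return the MI estimate in nats."""
        x = torch.tensor(x_feats, dtype=torch.float32)
        y = torch.tensor(y_feats, dtype=torch.float32)
        critic = MineCritic(x.size(1), y.size(1))
        opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
        for _ in range(steps):
            opt.zero_grad()
            loss = -mine_lower_bound(critic, x, y)   # gradient ascent on the bound
            loss.backward()
            opt.step()
        return mine_lower_bound(critic, x, y).item()

Under these assumptions, lowering DYNAMIC_RANGE_DB clips more of the filter-bank output before estimation, which is one way to probe how a reduced auditory dynamic range limits the measurable information, in line with the trend reported above.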