Neural estimation of mutual information in speech signals processed by an auditory model

J Acoust Soc Am. 2025 Jan 1;157(1):355-368. doi: 10.1121/10.0034854.

Abstract

The amount of information contained in speech signals is a fundamental concern of speech-based technologies and is particularly relevant to speech perception. Measuring the mutual information of actual speech signals is non-trivial, and quantitative measurements have not been conducted extensively to date. Recent advances in machine learning have made it possible to measure mutual information directly from data. This study used neural estimators of mutual information to estimate the information content of speech signals. The high-dimensional speech signal was divided into segments and then compressed using a Mel-scale filter bank, which approximates the non-linear frequency perception of the human ear. The filter-bank outputs were then truncated according to the dynamic range of the auditory system. This compression preserved a substantial portion of the information in the original high-dimensional speech signal. The amount of information varied with the category of speech sound, with relatively more mutual information in vowels than in consonants. Furthermore, the information available in the speech signals, as processed by the auditory model, decreased as the dynamic range was reduced.
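As a concrete illustration of the processing chain described above, the sketch below computes Mel filter-bank features with a fixed dynamic-range floor and evaluates a MINE-style (Donsker-Varadhan) lower bound on mutual information with a small neural critic. This is a minimal sketch, not the paper's implementation: the libraries (librosa, PyTorch), the function and class names, and all parameter values (number of Mel bands, 60 dB floor, network size) are illustrative assumptions.

```python
# Hypothetical sketch of the abstract's pipeline: Mel filter-bank compression
# with dynamic-range truncation, then a MINE-style (Donsker-Varadhan) neural
# lower bound on mutual information. Parameter values are assumptions, not
# the paper's settings.
import numpy as np
import librosa
import torch
import torch.nn as nn


def mel_features(y, sr, n_mels=32, dyn_range_db=60.0):
    """Mel filter-bank energies in dB, truncated to a fixed dynamic range."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S_db = librosa.power_to_db(S, ref=np.max)      # 0 dB at the spectral peak
    return np.maximum(S_db, -dyn_range_db)         # clip everything below the floor


class MINECritic(nn.Module):
    """Small MLP critic T(x, y) used in the Donsker-Varadhan bound."""
    def __init__(self, dim_x, dim_y, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))


def mine_lower_bound(critic, x, y):
    """I(X;Y) >= E[T(x,y)] - log E[exp(T(x, y_shuffled))]."""
    joint = critic(x, y).mean()
    y_perm = y[torch.randperm(y.size(0))]          # break pairing to sample the marginals
    marginal = torch.logsumexp(critic(x, y_perm), dim=0) - np.log(y.size(0))
    return (joint - marginal).squeeze()
```

In practice the critic would be trained by gradient ascent on this bound over many mini-batches of paired feature segments, and the converged value of the bound would serve as the mutual-information estimate.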

MeSH terms

  • Acoustic Stimulation / methods
  • Adult
  • Female
  • Humans
  • Male
  • Phonetics
  • Signal Processing, Computer-Assisted
  • Speech Acoustics
  • Speech Perception* / physiology