An ideal compressed mask for increasing speech intelligibility without sacrificing environmental sound recognition

J Acoust Soc Am. 2024 Dec 1;156(6):3958-3969. doi: 10.1121/10.0034599.

Abstract

Hearing impairment is often characterized by poor speech-in-noise recognition. State-of-the-art laboratory-based noise-reduction technology can eliminate background sounds from a corrupted speech signal and improve intelligibility, but it can also hinder environmental sound recognition (ESR), which is essential for personal independence and safety. This paper presents a time-frequency mask, the ideal compressed mask (ICM), that aims to provide listeners with improved speech intelligibility without substantially reducing ESR. This is accomplished by limiting the maximum attenuation that the mask performs. Speech intelligibility and ESR for hearing-impaired and normal-hearing listeners were measured using stimuli that had been processed by ICMs with various levels of maximum attenuation. This processing resulted in significantly improved intelligibility while retaining high ESR performance for both types of listeners. It was also found that the same level of maximum attenuation provided the optimal balance of intelligibility and ESR for both listener types. It is argued that future deep-learning-based noise-reduction algorithms may provide better outcomes by balancing the levels of the target speech and the background environmental sounds, rather than eliminating all signals except for the target speech. The ICM provides one such simple solution for frequency-domain models.
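
The idea of limiting a mask's maximum attenuation can be illustrated with a minimal sketch. The abstract does not give the ICM's exact formula, so the code below is an assumption rather than the authors' method: it starts from a ratio-style amplitude gain computed from known speech and noise magnitudes, then clamps that gain to a floor set by a maximum attenuation in dB. The function name `compressed_mask` and its parameters are hypothetical.

```python
import numpy as np


def compressed_mask(speech_mag, noise_mag, max_atten_db):
    """Ratio-style gain mask whose attenuation never exceeds max_atten_db.

    Illustrative only: the ratio-mask starting point is an assumption,
    not the definition used in the paper.
    """
    speech_pow = np.asarray(speech_mag, dtype=float) ** 2
    noise_pow = np.asarray(noise_mag, dtype=float) ** 2
    eps = np.finfo(float).eps  # avoids 0/0 in silent time-frequency units

    # Amplitude gain in [0, 1]: near 1 where speech dominates, near 0 where noise does.
    ratio_gain = np.sqrt(speech_pow / (speech_pow + noise_pow + eps))

    # Convert the attenuation limit from dB to a linear amplitude gain floor.
    floor_gain = 10.0 ** (-max_atten_db / 20.0)

    # Clamp: noise-dominated units are attenuated, but never below the floor.
    return np.maximum(ratio_gain, floor_gain)


# One speech-only unit and one noise-only unit, with a 20 dB attenuation limit.
speech = np.array([[1.0, 0.0]])
noise = np.array([[0.0, 1.0]])
mask = compressed_mask(speech, noise, max_atten_db=20.0)
```

In this sketch the noise-only unit is turned down rather than removed, so background sounds remain audible at a reduced level. A larger `max_atten_db` approaches full noise removal, and a value of 0 leaves the mixture unchanged.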

MeSH terms

  • Acoustic Stimulation / methods
  • Adult
  • Aged
  • Female
  • Hearing Loss / physiopathology
  • Hearing Loss / rehabilitation
  • Humans
  • Male
  • Middle Aged
  • Noise* / adverse effects
  • Perceptual Masking*
  • Sound
  • Speech Intelligibility*
  • Speech Perception*
  • Young Adult