The Optimal Speech-to-Background Ratio for Balancing Speech Recognition With Environmental Sound Recognition

Ear Hear. 2024 Nov-Dec;45(6):1444-1460. doi: 10.1097/AUD.0000000000001532. Epub 2024 May 31.

Abstract

Objectives: This study aimed to determine the speech-to-background ratios (SBRs) at which normal-hearing (NH) and hearing-impaired (HI) listeners can recognize both speech and environmental sounds when the two types of signals are mixed. Also examined were the effect of individual sounds on speech recognition and environmental sound recognition (ESR), and the impact of divided versus selective attention on these tasks.

Design: In Experiment 1 (divided attention), 11 NH and 10 HI listeners heard sentences mixed with environmental sounds at various SBRs and performed speech recognition and ESR tasks concurrently in each trial. In Experiment 2 (selective attention), 20 NH listeners performed these tasks in separate trials. Psychometric functions were generated for each task, listener group, and environmental sound. The range over which speech recognition and ESR were both high was determined, as was the optimal SBR for balancing speech recognition with ESR, defined as the point of intersection between each pair of normalized psychometric functions.
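
The two quantities defined in the Design paragraph can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that each normalized psychometric function is a logistic curve with asymptotes at 0 and 1, that speech recognition rises and ESR falls as SBR increases, and that the midpoints and slopes below are hypothetical values chosen only for illustration (they are not results from the study). The function names are likewise hypothetical.

```python
import math


def logistic(x, midpoint, slope):
    """Logistic psychometric function; returns proportion correct in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


# Hypothetical parameters for illustration only, not values from the study.
def speech_score(sbr_db):
    # Assumed: speech recognition improves as SBR rises.
    return logistic(sbr_db, midpoint=-5.0, slope=0.5)


def esr_score(sbr_db):
    # Assumed: ESR declines as SBR rises (negative slope).
    return logistic(sbr_db, midpoint=25.0, slope=-0.4)


def optimal_sbr(lo=-40.0, hi=60.0, tol=1e-6):
    """Find the SBR where the two curves intersect, by bisection.

    The difference speech_score - esr_score increases monotonically
    with SBR under the assumptions above, so it has a single root.
    """
    def diff(x):
        return speech_score(x) - esr_score(x)

    if diff(lo) > 0 or diff(hi) < 0:
        raise ValueError("search interval does not bracket the intersection")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def high_accuracy_range(criterion=0.95, lo=-40.0, hi=60.0, step=0.1):
    """Return (min, max) SBR on a grid where both scores meet the criterion.

    Returns None if no grid point satisfies the criterion for both tasks.
    """
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    ok = [x for x in grid
          if speech_score(x) >= criterion and esr_score(x) >= criterion]
    return (ok[0], ok[-1]) if ok else None


if __name__ == "__main__":
    print("Intersection (dB SBR):", round(optimal_sbr(), 2))
    print("Range with both scores >= 95%:", high_accuracy_range())
```

With fitted rather than hypothetical parameters, the same two functions would be applied separately to each listener group and environmental sound. Bisection is used here only because it needs no external libraries; any root finder would serve.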

Results: The NH listeners achieved greater than 95% accuracy on concurrent speech recognition and ESR over an SBR range of approximately 20 dB or greater. The optimal SBR for maximizing both speech recognition and ESR for NH listeners was approximately +12 dB. For the HI listeners, the range over which 95% performance was observed on both tasks was far smaller (span of 1 dB), with an optimal value of +5 dB. Acoustic analyses indicated that the speech and environmental sound stimuli were similarly audible, regardless of the hearing status of the listener, but that the speech fluctuated more than the environmental sounds. Divided versus selective attention conditions produced differences in performance that were statistically significant yet only modest in magnitude. In all conditions and for both listener groups, recognition was higher for environmental sounds than for speech when presented at equal intensities (i.e., 0 dB SBR), indicating that the environmental sounds were more effective maskers of speech than the converse. Each of the 25 environmental sounds used in this study (with one exception) had a span of SBRs over which speech recognition and ESR were both higher than 95%. These ranges tended to overlap substantially.

Conclusions: A range of SBRs exists over which speech and environmental sounds can be simultaneously recognized with high accuracy by NH and HI listeners, but this range is larger for NH listeners. The single optimal SBR for jointly maximizing speech recognition and ESR also differs between NH and HI listeners. The greater masking effectiveness of the environmental sounds relative to the speech may be related to the lower degree of fluctuation present in the environmental sounds and possibly to task differences between speech recognition and ESR (open versus closed set). The observed differences between the NH and HI results may be related to the HI listeners' smaller fluctuating masker benefit. As noise-reduction systems become increasingly effective, the current results could guide the design of future systems that provide listeners with highly intelligible speech without depriving them of access to important environmental sounds.

MeSH terms

  • Adult
  • Aged
  • Attention* / physiology
  • Case-Control Studies
  • Female
  • Hearing Loss, Sensorineural / physiopathology
  • Hearing Loss, Sensorineural / rehabilitation
  • Humans
  • Male
  • Middle Aged
  • Noise
  • Psychometrics
  • Speech Perception* / physiology
  • Young Adult