Central processing of speech sounds and non-speech sounds with similar spectral distribution: An auditory evoked potential study

Shinsuke Kaneshiro; Harukazu Hiraumi; Hiroaki Sato

doi:10.1016/j.anl.2020.02.008

Auris Nasus Larynx. 2020 Oct;47(5):727-733. doi: 10.1016/j.anl.2020.02.008. Epub 2020 Feb 24.

Authors

Shinsuke Kaneshiro¹, Harukazu Hiraumi², Hiroaki Sato¹

Affiliations

¹ Department of Otolaryngology - Head and Neck Surgery, Iwate Medical University, 19-1, Uchimaru, Morioka, Iwate, Japan.
² Department of Otolaryngology - Head and Neck Surgery, Iwate Medical University, 19-1, Uchimaru, Morioka, Iwate, Japan.

PMID: 32102744
DOI: 10.1016/j.anl.2020.02.008

Abstract

Objective: The purpose of this study was to measure the auditory evoked potentials for speech and non-speech sounds with similar spectral distributions.

Methods: We developed two types of sounds: naturally spoken vowels (natural speech sounds) and complex synthesized sounds (synthesized sounds). The natural speech sounds consisted of 5 Japanese vowels. Each synthesized sound consisted of a fundamental frequency and its second to fifteenth harmonics, equivalent to those of the corresponding natural speech sound, and was filtered to have a spectral distribution similar to that of the natural speech sound. All sounds were low-pass filtered at 2000 Hz. The auditory evoked potentials elicited by the natural speech sound /o/ and by its synthesized counterpart were measured in 10 right-handed healthy adults with normal hearing.
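The construction of a synthesized sound described above can be illustrated with a minimal sketch. It is not the authors' stimulus-generation procedure: the fundamental frequency (125 Hz), the harmonic weights (1/k), the sampling rate, and the duration are all hypothetical values chosen for illustration, and the 2000 Hz low-pass filter is approximated by simply dropping harmonics above the cutoff.

```python
import math

def harmonic_complex(f0, weights, cutoff_hz=2000.0, sample_rate=16000, duration_s=0.2):
    """Sum the fundamental and its harmonics, keeping components at or below cutoff_hz.

    weights[k - 1] is the relative amplitude of harmonic k (k = 1 is the fundamental).
    Returns the peak-normalized samples and the list of harmonic numbers kept.
    """
    # Idealized low-pass step (an assumption): discard harmonics above the cutoff.
    kept = [(k, w) for k, w in enumerate(weights, start=1) if k * f0 <= cutoff_hz]
    n = round(sample_rate * duration_s)
    samples = []
    for i in range(n):
        t = i / sample_rate
        samples.append(sum(w * math.sin(2 * math.pi * k * f0 * t) for k, w in kept))
    # Normalize so the largest absolute sample is 1.0.
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples], [k for k, _ in kept]

F0_HZ = 125.0                               # hypothetical fundamental frequency
WEIGHTS = [1.0 / k for k in range(1, 16)]   # hypothetical envelope for harmonics 1..15

signal, kept_harmonics = harmonic_complex(F0_HZ, WEIGHTS)
```

In practice the weights would be set so that the spectrum matches that of the recorded vowel; the 1/k envelope here is only a placeholder.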

Results: The natural speech sounds were recognized as speech significantly more often than the synthesized sounds (74.4% vs. 13.8%, p < 0.01). The natural speech and synthesized sounds for the vowel /o/ contrasted strongly for speech perception (96.9% vs. 9.4%, p < 0.01). However, the vowel /i/ and its counterpart were barely recognized as speech (4.7% vs. 3.1%, p = 1.00). The N1 peak amplitudes and latencies evoked by the natural speech sound /o/ were not different from those evoked by the synthesized sound (p = 0.58 and p = 0.28, respectively). The P2 amplitudes evoked by the natural speech sound /o/ were not different from those evoked by the synthesized sound (p = 0.51). The P2 latencies evoked by the natural speech sound /o/ were significantly shorter than those evoked by the synthesized sound (p < 0.01). This modulation was not observed in a control study using the vowel /i/ and its counterpart (p = 0.29).

Conclusion: The earlier P2 observed for the natural speech sound /o/ may reflect central auditory processing of the 'speechness' of complex sounds.

Keywords: Auditory evoked potential; P2; Spectrum; Speech; Vowel.

MeSH terms

Adult
Auditory Perception / physiology*
Electroencephalography
Evoked Potentials, Auditory / physiology*
Female
Healthy Volunteers
Humans
Male
Reaction Time / physiology
Speech / physiology
Speech Acoustics
Speech Perception / physiology*