The human auditory system can localize multiple sound sources using time, intensity, and frequency cues in the sound received by the two ears. Spatially segregating the sources aids perception in challenging conditions where multiple sounds coexist. This study used model simulations to explore an algorithm for localizing multiple sources in azimuth with binaural (i.e., two) microphones. The algorithm relies on the "sparseness" property of everyday signals in the time-frequency domain: sounds coming from different locations carry unique spatial features and therefore form clusters. Based on an interaural normalization procedure, the model generated spiral patterns for sound sources in the frontal hemifield. The model itself was created using broadband noise for better accuracy, because speech typically has sporadic energy at high frequencies. The model at an arbitrary frequency can be used to predict the locations of speech and music occurring alone or concurrently, and a classification algorithm was applied to measure the localization error. Under anechoic conditions, average errors in azimuth increased from 4.5° to 19°, with RMS errors ranging from 6.4° to 26.7°, as model frequency increased from 300 to 3000 Hz. With short speech sounds, the low-frequency model performed notably better than the generalized cross-correlation model. Two types of room reverberation were then introduced to simulate difficult listening conditions. Model performance under reverberation was more resilient at low frequencies than at high frequencies. Overall, our study presents a spiral model for rapidly predicting the horizontal locations of concurrent sounds that is suitable for real-world scenarios.
Keywords: HRTF; ILD; ITD; Reverberations; Robotics; Sound localization; Sparseness.
Copyright © 2023. Published by Elsevier B.V.
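
For readers unfamiliar with the generalized cross-correlation baseline that the abstract compares against, the following is a minimal illustrative sketch, not the implementation used in the study. It assumes the common PHAT weighting (the abstract does not say which weighting was used) and a simple free-field two-microphone geometry for converting the interaural time difference to azimuth. The function names `gcc_phat_delay` and `itd_to_azimuth`, the microphone spacing of 0.18 m, and the sound speed of 343 m/s are all hypothetical choices made for illustration.

```python
import numpy as np


def gcc_phat_delay(left, right, fs, max_delay=None):
    """Estimate how much `right` lags `left`, in seconds (GCC with PHAT weighting).

    A positive result means the sound reached the left microphone first.
    """
    n = len(left) + len(right)           # zero-pad to avoid circular wrap-around
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cross = R * np.conj(L)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation

    max_shift = n // 2
    if max_delay is not None:
        max_shift = min(int(round(max_delay * fs)), max_shift)

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def itd_to_azimuth(tau, mic_distance=0.18, c=343.0):
    """Convert a time difference to azimuth in degrees.

    Assumes a free-field far-field source and two microphones spaced
    `mic_distance` metres apart (hypothetical values, no head model).
    """
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


# Usage: broadband noise that reaches the right microphone 5 samples late.
fs = 16000
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
delay_samples = 5
left_sig = noise
right_sig = np.concatenate((np.zeros(delay_samples), noise[:-delay_samples]))

tau = gcc_phat_delay(left_sig, right_sig, fs, max_delay=1e-3)
azimuth = itd_to_azimuth(tau)
```

Because this kind of baseline yields a single dominant delay per analysis frame, it does not by itself separate concurrent sources, which is the case the time-frequency clustering approach described in the abstract is meant to address.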