Face and voice identity matching accuracy is not improved by multimodal identity information

Br J Psychol. 2024 Dec 17. doi: 10.1111/bjop.12757. Online ahead of print.

Abstract

Identity verification from both faces and voices can be error-prone. Previous research has shown that faces and voices signal concordant information and that cross-modal matching of unfamiliar faces to voices is possible, albeit often with low accuracy. In the current study, we ask whether performance on a face or voice identity matching task can be improved by using multimodal stimuli that add a second modality (voice or face). We find that overall accuracy is higher for face matching than for voice matching. However, contrary to predictions, presenting one unimodal and one multimodal stimulus within a matching task did not improve face or voice matching compared to presenting two unimodal stimuli. Likewise, presenting two multimodal stimuli did not improve accuracy compared to presenting two unimodal face stimuli. Thus, multimodal information does not improve matching accuracy. Intriguingly, however, we find that cross-modal face-voice matching accuracy predicts voice matching accuracy but not face matching accuracy. This suggests that cross-modal information can nonetheless play a role in identity matching, and that face and voice information combine to inform matching decisions. We discuss our findings in light of current models of person perception and consider the implications for identity verification in security and forensic settings.

Keywords: cross-modal; face; matching; multimodal; unimodal; voice.