Decoding disparities: evaluating automatic speech recognition system performance in transcribing Black and White patient verbal communication with nurses in home healthcare

Maryam Zolnoori; Sasha Vergez; Zidu Xu; Elyas Esmaeili; Ali Zolnour; Krystal Anne Briggs; Jihye Kim Scroggins; Seyed Farid Hosseini Ebrahimabad; James M Noble; Maxim Topaz; Suzanne Bakken; Kathryn H Bowles; Ian Spens; Nicole Onorato; Sridevi Sridharan; Margaret V McDonald

doi:10.1093/jamiaopen/ooae130

JAMIA Open. 2024 Dec 10;7(4):ooae130. doi: 10.1093/jamiaopen/ooae130. eCollection 2024 Dec.

Authors

Maryam Zolnoori¹ ² ³, Sasha Vergez³, Zidu Xu², Elyas Esmaeili¹, Ali Zolnour¹, Krystal Anne Briggs⁴, Jihye Kim Scroggins², Seyed Farid Hosseini Ebrahimabad⁵, James M Noble¹ ⁶, Maxim Topaz¹ ² ³ ⁷, Suzanne Bakken² ⁷ ⁸, Kathryn H Bowles³ ⁹, Ian Spens³, Nicole Onorato³, Sridevi Sridharan³, Margaret V McDonald³

Affiliations

¹ Columbia University Irving Medical Center, New York, NY 10032, United States.
² School of Nursing, Columbia University, New York, NY 10032, United States.
³ Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States.
⁴ Department of Computer Science, Columbia University, New York, NY 10027, United States.
⁵ Department of Automatic Control and Computer Science, Politehnica University of Bucharest, Bucharest RO-060042, Romania.
⁶ Department of Neurology, Taub Institute for Research on Alzheimer's Disease and the Aging Brain, GH Sergievsky Center, Columbia University, New York, NY 10032, United States.
⁷ Data Science Institute, Columbia University, New York, NY 10027, United States.
⁸ Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States.
⁹ University of Pennsylvania School of Nursing, Philadelphia, PA 19104, United States.

Abstract

Objectives: As artificial intelligence evolves, integrating speech processing into home healthcare (HHC) workflows is increasingly feasible. Audio-recorded communications can enhance risk identification models, and automatic speech recognition (ASR) systems are a key component of that pipeline. This study evaluates the transcription accuracy and equity of 4 ASR systems, namely Amazon Web Services (AWS) General, AWS Medical, Whisper, and Wave2Vec, in transcribing patient-nurse communication in US HHC, focusing on how accurately they transcribe speech from Black and White English-speaking patients.

Materials and methods: We analyzed audio recordings of patient-nurse encounters from 35 patients (16 Black and 19 White) in a New York City-based HHC service. Overall, 860 utterances were available for study, including 475 drawn from Black patients and 385 from White patients. Automatic speech recognition performance was measured using word error rate (WER), benchmarked against a manual gold standard. Disparities were assessed by comparing ASR performance across racial groups using the linguistic inquiry and word count (LIWC) tool, focusing on 10 linguistic dimensions, as well as specific speech elements including repetition, filler words, and proper nouns (medical and nonmedical terms).
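For readers unfamiliar with WER, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between the ASR output and the manual reference transcript, divided by the number of words in the reference. It is not the study's own scoring code. The function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: text is lowercased and split on whitespace,
    which is an assumed normalization, not the one used in the study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from ASR output
            insertion = curr[j - 1] + 1  # extra word in ASR output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example utterance (not taken from the study data):
# one deletion ("my") and one substitution ("morning" -> "mourning")
# against a 6-word reference.
wer = word_error_rate("i took my pills this morning",
                      "i took pills this mourning")
print(f"WER = {wer:.2f}")
```

Because insertions count as errors, WER computed this way can exceed 100% when the ASR output is much longer than the reference.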

Results: The average age of participants was 67.8 years (SD = 14.4). Communication lasted an average of 15 minutes (range: 11-21 minutes) with a median of 1186 words per patient. Of 860 total utterances, 475 were from Black patients and 385 from White patients. Amazon Web Services General had the highest accuracy, with a median WER of 39%. However, all systems showed reduced accuracy for Black patients, with significant discrepancies in LIWC dimensions such as "Affect," "Social," and "Drives." Amazon Web Services Medical performed best for medical terms, though all systems had difficulty with filler words, repetition, and nonmedical terms, with AWS General showing the lowest error rates at 65%, 64%, and 53%, respectively.

Discussion: While AWS systems demonstrated superior accuracy, significant disparities by race highlight the need for more diverse training datasets and improved dialect sensitivity. Addressing these disparities is critical for ensuring equitable ASR performance in HHC settings and enhancing risk prediction models through audio-recorded communication.

Keywords: automatic speech recognition (ASR); health disparities; home healthcare; linguistic inquiry and word count (LIWC); speech to text; word error rate (WER).