Blending speech output and visual text in the multimodal interface

Hum Factors. 2008 Oct;50(5):782-8. doi: 10.1518/001872008X354165.

Abstract

Objective: Simultaneous reading and listening, using a redundant display of visual text and speech output, was investigated to determine how variations in verbal working memory capacity and content complexity affected comprehension.

Background: Previous work has found some evidence of a benefit for displays that blend speech and visual text; content complexity and verbal working memory capacity are likely to be significant determinants of this benefit.

Method: In the experiment reported here, a multimodal display of e-mail messages was compared with speech output alone and with a purely visual display. Comprehension of the messages was examined in relation to verbal working memory capacity and the complexity of the messages. Thirty-two users participated in the study, which used a repeated measures design.

Results: The data show that the multimodal interface did not affect comprehension relative to a purely visual interface, even when the content was more complex, although it did improve the comprehension of complex information relative to a purely auditory interface. Lower-capacity participants were neither especially advantaged nor disadvantaged by the multimodal interface. Participants expressed a marked preference for the multimodal display of the more complex sentences.

Conclusion: The experiment suggests that a redundant multimodal display will neither assist nor disrupt understanding when compared with a purely visual display, but it will assist understanding of complex content when compared with speech output alone.

Application: Redundant displays of visual text and speech have potential application in multitask situations, in multimedia presentations, and for devices with small screens.

MeSH terms

  • Comprehension*
  • Electronic Mail
  • Humans
  • Memory
  • Multimedia*
  • Speech Perception
  • User-Computer Interface*
  • Visual Perception