The JIBO Kids Corpus: A speech dataset of child-robot interactions in a classroom environment

Natarajan Balaji Shankar; Amber Afshan; Alexander Johnson; Aurosweta Mahapatra; Alejandra Martin; Haolun Ni; Hae Won Park; Marlen Quintero Perez; Gary Yeung; Alison Bailey; Cynthia Breazeal; Abeer Alwan

doi:10.1121/10.0034195

JASA Express Lett. 2024 Nov 1;4(11):115201. doi: 10.1121/10.0034195.

Affiliations

¹ Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, California 90095, USA.
² Department of Education, University of California Los Angeles, Los Angeles, California 90095, USA.
³ MIT Media Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

PMID: 39485321

Abstract

This paper describes an original dataset of children's speech, collected through the use of JIBO, a social robot. The dataset encompasses recordings from 110 children, aged 4-7 years old, who participated in a letter and digit identification task and extended oral discourse tasks requiring explanation skills, totaling 21 h of session data. Spanning a 2-year collection period, this dataset contains a longitudinal component with a subset of participants returning for repeat recordings. The dataset, with session recordings and transcriptions, is publicly available, providing researchers with a valuable resource to advance investigations into child language development.

MeSH terms

Child
Child, Preschool
Female
Humans
Language Development
Male
Robotics*
Schools
Speech*