Computer Audition for Fighting the SARS-CoV-2 Corona Crisis-Introducing the Multitask Speech Corpus for COVID-19

IEEE Internet Things J. 2021 Mar 22;8(21):16035-16046. doi: 10.1109/JIOT.2021.3067605. eCollection 2021 Nov 1.

Abstract

Computer audition (CA) has developed rapidly in the past decades by leveraging advanced signal processing and machine learning techniques. In particular, owing to its noninvasive and ubiquitous nature, CA-based healthcare applications have attracted increasing attention in recent years. During the tough time of the global crisis caused by the coronavirus disease 2019 (COVID-19), scientists and engineers in data science have collaborated to devise novel approaches to the prevention, diagnosis, treatment, tracking, and management of this global pandemic. On the one hand, we have witnessed the power of 5G, the Internet of Things, big data, computer vision, and artificial intelligence in applications such as epidemiology modeling, drug and vaccine discovery and design, fast CT screening, and quarantine management. On the other hand, studies exploring the capacity of CA remain scarce and underestimated. To this end, we propose a novel multitask speech corpus for COVID-19 research. We collected in-the-wild speech data from 51 confirmed COVID-19 patients in Wuhan city, China. We define three main tasks in this corpus: three-category classification tasks for evaluating patients' physical and/or mental status, namely sleep quality, fatigue, and anxiety. Benchmarks are provided using both classic machine learning methods and state-of-the-art deep learning techniques. We believe this study and corpus can not only facilitate ongoing research on using data science to fight COVID-19, but also support the monitoring of contagious diseases more generally.
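To make the benchmark setup concrete, the sketch below mimics one of the three-category classification tasks (e.g., sleep quality rated low/mid/high) with synthetic "acoustic feature" vectors and a minimal nearest-centroid classifier. This is purely illustrative: the corpus is not reproduced here, the feature dimensionality and class structure are invented, and nearest-centroid is a stand-in baseline, not the classic machine learning or deep learning methods benchmarked in the paper.

```python
# Illustrative sketch only: synthetic stand-in for a 3-class speech task
# (e.g., sleep quality: low / mid / high). Not the authors' data or method.
import random

random.seed(0)

def make_synthetic_data(n_per_class=20, dim=4):
    """Generate toy feature vectors for 3 classes with class-dependent means."""
    data = []
    for label in range(3):
        for _ in range(n_per_class):
            # shift the mean per class so the toy problem is separable
            data.append(([random.gauss(label * 2.0, 0.5) for _ in range(dim)], label))
    return data

def fit_centroids(data):
    """Compute the per-class mean feature vector."""
    sums, counts = {}, {}
    for x, y in data:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

train = make_synthetic_data()
centroids = fit_centroids(train)
test = make_synthetic_data(n_per_class=5)
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"toy 3-class accuracy: {accuracy:.2f}")
```

In the actual study, the feature extraction and classifiers are far richer; this toy version only shows the shape of a three-category evaluation pipeline (fit on training speech features, predict a categorical status label, report accuracy).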

Keywords: Computer audition; coronavirus disease 2019 (COVID-19); deep learning; Internet of Medical Things (IoMT); machine learning.

Grants and funding

This work was supported in part by the Zhejiang Lab’s International Talent Fund for Young Professionals (Project HANAMI), China; in part by the JSPS Postdoctoral Fellowship for Research in Japan under Grant P19081 from the Japan Society for the Promotion of Science (JSPS), Japan; in part by the Grants-in-Aid for Scientific Research under Grant 19F19081 and Grant 20H00569 from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan; and in part by the European Union’s Horizon 2020 Programme through the Smart Environments for Person-Centered Sustainable Work and Well-Being (SustAGE) project under Grant 826506.