-
UserLibri: A Dataset for ASR Personalization Using Only Text
Authors:
Theresa Breiner,
Swaroop Ramaswamy,
Ehsan Variani,
Shefali Garg,
Rajiv Mathews,
Khe Chai Sim,
Kilol Gupta,
Mingqing Chen,
Lara McConnaughey
Abstract:
Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg. We release this User-Specific LibriSpeech (UserLibri) dataset to aid future personalization research. LibriSpeech audio-transcript pairs are grouped into 55 users from the test-clean dataset and 52 users from test-other. We are able to lower the average word error rate per user across both sets in streaming and nonstreaming models, including an improvement of 2.5 for the harder set of test-other users when streaming.
Submitted 1 July, 2022;
originally announced July 2022.
-
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Authors:
Joseph Roth,
Sourish Chaudhuri,
Ondrej Klejch,
Radhika Marvin,
Andrew Gallagher,
Liat Kaver,
Sharadh Ramaswamy,
Arkadiusz Stopczynski,
Cordelia Schmid,
Zhonghua Xi,
Caroline Pantofaru
Abstract:
Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual dataset for this task has constrained algorithm evaluations with respect to data diversity, environments, and accuracy. This has made comparisons and improvements difficult. In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), which will be released publicly to facilitate algorithm development and enable comparisons. The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible. This dataset contains about 3.65 million human-labeled frames, or about 38.5 hours of face tracks, and the corresponding audio. We also present a new audio-visual approach for active speaker detection and analyze its performance, demonstrating both its strength and the contributions of the dataset.
Submitted 24 May, 2019; v1 submitted 4 January, 2019;
originally announced January 2019.
-
Design Contracts For Networked Automation Systems Co-design
Authors:
B. Sreram,
Seshadhri Srinivasan,
B. Subathra,
Srini Ramaswamy
Abstract:
Networked automation systems (NAS) are characterized by the confluence of control, computation, communication, and information (C3I) technologies. Design decisions in one domain are affected by the constraints posed by the others. Reliable NAS design should address the requirements of the system while simultaneously meeting the constraints posed by other domains; this is called co-design in the literature. Co-design requires a clear definition of the interfaces among these domains. Control design in NAS is affected by the timing imperfections posed by other domains. In this investigation, we first study the different sources of timing imperfections in NAS and classify them based on their occurrence. The concept of jitter is used to define the timing imperfections induced by various system components. Using this analysis, we classify jitter based on its behavior and domain of occurrence. Our analysis shows that the jitter induced in NAS can be classified by domain as hardware, software, and communication. Next, we use this analysis to model the jitter from the components of NAS. Modeling timing imperfections helps in capturing the interfaces among the domains, and we use the concept of design contracts to capture these interfaces. Design contracts describe the semantic mapping among the domains and are specified using the jitter margins. Implementing design contracts requires knowledge of the jitter margin, and results from control theory are used to this end.
Submitted 10 July, 2015;
originally announced July 2015.
-
Verifying Response Times in Networked Automation Systems Using Jitter Bounds
Authors:
Seshadhri Srinivasan,
Furio Buonopane,
Srini Ramaswamy,
Juri Vain
Abstract:
Networked Automation Systems (NAS) have to meet stringent response-time requirements during operation. Verifying the response time of automation is an important step during the design phase, before deployment. Timing discrepancies due to the hardware, software, and communication components of NAS affect the response time. This investigation uses model templates for verifying the response time in NAS. First, jitter bounds model the timing fluctuations of NAS components. These jitter bounds are the inputs to model templates, which are formal models of timing fluctuations. The model templates are atomic action patterns composed using three composition operators (sequential, alternative, and parallel) and embedded in a time wrapper that specifies clock-driven activation conditions. Model templates, in conjunction with a formal model of the technical process, offer an easier way to verify the response time. The investigation demonstrates the proposed verification method using an industrial steam boiler with typical NAS components on the plant floor.
Submitted 15 July, 2015;
originally announced July 2015.