Showing 1–2 of 2 results for author: Piazza, A

Search v0.5.6 released 2020-02-24

arXiv:2407.04293 [pdf, other]

cs.CL cs.SD eess.AS

Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency

Authors: Roman Aperdannier, Sigurd Schacht, Alexander Piazza

Abstract: In this paper, different online speaker diarization systems are evaluated on the same hardware with the same test data with regard to their latency. The latency is the time span from audio input to the output of the corresponding speaker label. As part of the evaluation, various model combinations within the DIART framework, a diarization system based on the online clustering algorithm UIS-RNN-SML, and the end-to-end online diarization system FS-EEND are compared. The lowest latency is achieved for the DIART-pipeline with the embedding model pyannote/embedding and the segmentation model pyannote/segmentation. The FS-EEND system shows a similarly good latency. In general there is currently no published research that compares several online diarization systems in terms of their latency. This makes this work even more relevant.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 6 pages

arXiv:2406.14464 [pdf, other]

cs.SD cs.CL eess.AS

A Review of Common Online Speaker Diarization Methods

Authors: Roman Aperdannier, Sigurd Schacht, Alexander Piazza

Abstract: Speaker diarization provides the answer to the question "who spoke when?" for an audio file. This information can be used to complete audio transcripts for further processing steps. Most speaker diarization systems assume that the audio file is available as a whole. However, there are scenarios in which the speaker labels are needed immediately after the arrival of an audio segment. Speaker diarization with a correspondingly low latency is referred to as online speaker diarization. This paper provides an overview. First the history of online speaker diarization is briefly presented. Next a taxonomy and datasets for training and evaluation are given. In the sections that follow, online diarization methods and systems are discussed in detail. This paper concludes with the presentation of challenges that still need to be solved by future research in the field of online speaker diarization.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 6 pages
