
Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency

Authors: Roman Aperdannier, Sigurd Schacht, Alexander Piazza

Abstract: In this paper, different online speaker diarization systems are evaluated with regard to their latency on the same hardware and with the same test data. The latency is the time span from the audio input to the output of the corresponding speaker label. The evaluation compares various model combinations within the DIART framework, a diarization system based on the online clustering algorithm UIS-RNN-SML, and the end-to-end online diarization system FS-EEND. The lowest latency is achieved by the DIART pipeline with the embedding model pyannote/embedding and the segmentation model pyannote/segmentation. The FS-EEND system achieves a similarly low latency. There is currently no published research that compares several online diarization systems in terms of their latency, which makes this work particularly relevant.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 6 pages

arXiv:2406.14464

A Review of Common Online Speaker Diarization Methods

Authors: Roman Aperdannier, Sigurd Schacht, Alexander Piazza

Abstract: Speaker diarization answers the question "who spoke when?" for an audio file. This information can be used to complete audio transcripts for further processing steps. Most speaker diarization systems assume that the audio file is available as a whole. However, there are scenarios in which the speaker labels are needed immediately after an audio segment arrives. Speaker diarization with correspondingly low latency is referred to as online speaker diarization. This paper provides an overview of the field. First, the history of online speaker diarization is briefly presented. Next, a taxonomy and datasets for training and evaluation are given. The following sections discuss online diarization methods and systems in detail. The paper concludes with challenges that remain to be solved by future research on online speaker diarization.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 6 pages

arXiv:2401.10580

PHOENIX: Open-Source Language Adaption for Direct Preference Optimization

Authors: Matthias Uhlig, Sigurd Schacht, Sudarshan Kamath Barkur

Abstract: Large language models have gained immense importance in recent years and have demonstrated outstanding results in solving various tasks. However, despite these achievements, many questions about large language models remain unanswered. Besides the optimal use of the models for inference and the alignment of their outputs with the desired specifications, the transfer of models to other languages is still an underdeveloped area of research. The recent publication of models such as Llama-2 and Zephyr has provided new insights into architectural improvements and the use of human feedback. However, insights into adapting these techniques to other languages remain scarce. In this paper, we build on the latest improvements and apply the Direct Preference Optimization (DPO) approach to the German language. The model is available at https://huggingface.co/DRXD1000/Phoenix.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2303.01980

Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle

Authors: Vanessa Mehlin, Sigurd Schacht, Carsten Lanquillon

Abstract: Deep Learning has enabled many advances in machine learning applications in the last few years. However, since current Deep Learning algorithms require much energy for computations, there are growing concerns about the associated environmental costs. Energy-efficient Deep Learning has received much attention from researchers and has already made much progress in the last couple of years. This paper aims to gather information about these advances from the literature and show how and at which points along the lifecycle of Deep Learning (IT-Infrastructure, Data, Modeling, Training, Deployment, Evaluation) it is possible to reduce energy consumption.

Submitted 5 February, 2023; originally announced March 2023.

arXiv:cmp-lg/9410019

Concurrent Lexicalized Dependency Parsing: A Behavioral View on ParseTalk Events

Authors: Susanne Schacht, Udo Hahn, Norbert Broeker

Abstract: The behavioral specification of an object-oriented grammar model is considered. The model is based on full lexicalization, head-orientation via valency constraints and dependency relations, inheritance as a means for non-redundant lexicon specification, and concurrency of computation. The computation model relies upon the actor paradigm, with concurrency entering through asynchronous message passing between actors. In particular, we elaborate on how the global behavior of a lexically distributed grammar and its corresponding parser can be specified in terms of event type networks and event networks, respectively.

Submitted 24 October, 1994; originally announced October 1994.

Comments: 68 kB, 5 pages, PostScript

Report number: CLIF Report 9/94

Journal ref: Proc. 15th Intl. Conference on Computational Linguistics, Kyoto, Japan, August 1994, pp. 489-493

arXiv:cmp-lg/9410017

Concurrent Lexicalized Dependency Parsing: The ParseTalk Model

Authors: Norbert Broeker, Udo Hahn, Susanne Schacht

Abstract: A grammar model for concurrent, object-oriented natural language parsing is introduced. Complete lexical distribution of grammatical knowledge is achieved by building upon the head-oriented notions of valency and dependency, while inheritance mechanisms are used to capture lexical generalizations. The underlying concurrent computation model relies upon the actor paradigm. We consider message-passing protocols for establishing dependency relations and for handling ambiguity.

Submitted 24 October, 1994; originally announced October 1994.

Comments: 90 kB, 7 pages, PostScript

Report number: CLIF Report 9/94

Journal ref: Proc. 15th Intl. Conference on Computational Linguistics, Kyoto, Japan, August 1994, pp. 379-385

Showing 1–6 of 6 results for author: Schacht, S