Showing 1–23 of 23 results for author: Papi, S

  1. arXiv:2406.14177  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: This paper describes the FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024. For this year's submission in the speech-to-text translation (ST) sub-track, we propose SimulSeamless, which is realized by combining AlignAtt and SeamlessM4T in its medium configuration. The SeamlessM4T model is used "off-the-shelf" and its simultaneous inference is enabled through the…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.06097  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: Streaming speech-to-text translation (StreamST) is the task of automatically translating speech while incrementally receiving an audio stream. Unlike simultaneous ST (SimulST), which deals with pre-segmented speech, StreamST faces the challenges of handling continuous and unbounded audio streams. This requires additional decisions about what to retain of the previous history, which is impractical…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 main conference

  3. arXiv:2405.10741  [pdf, other]

    cs.CL

    SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

    Abstract: Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subta…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 main conference

  4. arXiv:2402.13208  [pdf, other]

    cs.CL cs.AI

    How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

    Abstract: The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity. Consequently, research efforts in the last few years focused on finding more efficient alternatives. Among them, Hyena (Poli et al., 2023) stands out for achieving competitive results in both language modeling and image classification,…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  5. arXiv:2402.12025  [pdf, other]

    cs.CL

    Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

    Abstract: The field of natural language processing (NLP) has recently witnessed a transformative shift with the emergence of foundation models, particularly Large Language Models (LLMs) that have revolutionized text-based NLP. This paradigm has extended to other modalities, including speech, where researchers are actively exploring the combination of Speech Foundation Models (SFMs) and LLMs into single, uni…

    Submitted 17 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to the ACL 2024 main conference

  6. arXiv:2310.15752  [pdf, other]

    cs.CL cs.AI

    Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

    Authors: Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

    Abstract: When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST d…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  7. arXiv:2310.14806  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

    Authors: Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

    Abstract: The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficiencies in c…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  8. arXiv:2309.15554  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023

    Authors: Sara Papi, Marco Gaido, Matteo Negri

    Abstract: This paper describes the FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign. Our submission focused on the use of direct architectures to perform both tasks: for the simultaneous one, we leveraged the knowledge already acquired by offline-trained models and directly applied a policy to obtain the real-time inference; for the su…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Published at IWSLT 2023

    Journal ref: Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

  9. arXiv:2307.03354  [pdf, other]

    cs.CL cs.SD eess.AS

    Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

    Authors: Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

    Abstract: In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transformer-Transducer that jointly generates automatic speech recognition (ASR) and speech translation (ST) outputs using a single decoder. To produce ASR and…

    Submitted 2 October, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted at ASRU 2023

  10. arXiv:2305.11408  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation

    Authors: Sara Papi, Marco Turchi, Matteo Negri

    Abstract: Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. Among these studies, attention has proved to be a useful source of insight into word alignment, even when the input text is substituted with audio segments, as in the cas…

    Submitted 19 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

    Journal ref: Proceedings of INTERSPEECH 2023

  11. arXiv:2303.16166  [pdf, other]

    cs.CL cs.AI

    When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP

    Authors: Sara Papi, Marco Gaido, Andrea Pilzer, Matteo Negri

    Abstract: Despite its crucial role in research experiments, code correctness is often presumed only on the basis of the perceived quality of results. This assumption comes with the risk of erroneous outcomes and potentially misleading findings. To address this issue, we posit that the current focus on reproducibility should go hand in hand with the emphasis on software quality. We present a case study in wh…

    Submitted 4 July, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted at ACL 2024 main conference

  12. Attention as a Guide for Simultaneous Speech Translation

    Authors: Sara Papi, Matteo Negri, Marco Turchi

    Abstract: The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this…

    Submitted 11 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023

    Journal ref: Proceedings of ACL 2023

  13. Joint Speech Translation and Named Entity Recognition

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Marco Turchi

    Abstract: Modern automatic translation systems aim to place the human at the center by providing contextual support and knowledge. In this context, a critical task is enriching the output with information regarding the mentioned entities, which is currently achieved by processing the generated translation with named entity recognition (NER) and entity linking systems. In light of the recent promising results s…

    Submitted 20 May, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted at INTERSPEECH 2023

  14. arXiv:2209.13192  [pdf, other]

    cs.CL

    Direct Speech Translation for Automatic Subtitling

    Authors: Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi

    Abstract: Automatic subtitling is the task of automatically translating the speech of audiovisual content into short pieces of timed text, i.e. subtitles and their corresponding timestamps. The generated subtitles need to conform to space and time requirements, while being synchronised with the speech and segmented in a way that facilitates comprehension. Given its considerable complexity, the task has so f…

    Submitted 25 July, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted at TACL

  15. arXiv:2209.10608  [pdf, other]

    cs.CL

    Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora

    Authors: Sara Papi, Alina Karakanta, Matteo Negri, Marco Turchi

    Abstract: Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant to specific displaying guidelines. Similar to speech translation (ST), model training requires parallel data comprising audio inputs paired with their textual translations. In SubST, however, the text has to be also annotated with subtitle…

    Submitted 16 November, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Journal ref: AACL 2022

  16. Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL). In this paper we highlight that, despite its widespread adoption, AL provides underestimated scores for systems that generate longer predictions compared to the corresponding references. We also show that this problem has pr…

    Submitted 20 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: AutoSimTrans Workshop @ NAACL2022

    Journal ref: Proceedings of the Third Workshop on Automatic Simultaneous Translation (AutoSimTrans 2022)

  17. Efficient yet Competitive Speech Translation: FBK@IWSLT2022

    Authors: Marco Gaido, Sara Papi, Dennis Fucci, Giuseppe Fiameni, Matteo Negri, Marco Turchi

    Abstract: The primary goal of FBK's systems submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. As such, we first question the need for ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method that looks at the ra…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: IWSLT 2022 System Description

    Journal ref: Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

  18. Does Simultaneous Speech Translation need Simultaneous Models?

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. To meet the latency constraints posed by the different application scenarios, multiple dedicated SimulST models are usually trained and maintained, generating high computational costs. In this paper, motivated by the increased social and environmental imp…

    Submitted 16 November, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Findings of EMNLP 2022

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2022

  19. arXiv:2111.00514  [pdf, ps, other]

    cs.CL

    Visualization: the missing factor in Simultaneous Speech Translation

    Authors: Sara Papi, Matteo Negri, Marco Turchi

    Abstract: Simultaneous speech translation (SimulST) is the task in which output generation has to be performed on partial, incremental speech input. In recent years, SimulST has become popular due to the spread of cross-lingual application scenarios, like international live conferences and streaming lectures, in which on-the-fly speech translation can facilitate users' access to audio-visual content. In thi…

    Submitted 8 November, 2021; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Accepted at CLIC-it 2021

    Journal ref: Italian Conference on Computational Linguistics 2021

  20. Speechformer: Reducing Information Loss in Direct Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Transformer-based models have gained increasing popularity achieving state-of-the-art performance in many research fields including speech translation. However, Transformer's quadratic complexity with respect to the input sequence length prevents its adoption as is with audio signals, which are typically represented by long sequences. Current solutions resort to an initial sub-optimal compression…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021 Main Conference

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

  21. arXiv:2107.08807  [pdf, other]

    cs.CL

    Simultaneous Speech Translation for Live Subtitling: from Delay to Display

    Authors: Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi

    Abstract: With the increased audiovisualisation of communication, the need for live subtitles in multilingual events is more relevant than ever. In an attempt to automatise the process, we aim at exploring the feasibility of simultaneous speech translation (SimulST) for live subtitling. However, the word-for-word rate of generation of SimulST systems is not optimal for displaying the subtitles in a comprehe…

    Submitted 20 July, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Journal ref: Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW 2021)

  22. arXiv:2106.12607  [pdf, other]

    cs.CL cs.SD eess.AS

    Dealing with training and test segmentation mismatch: FBK@IWSLT2021

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: This paper describes FBK's system submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model, which is a Transformer-based architecture trained to translate English speech audio data into German texts. The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning st…

    Submitted 28 June, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted at IWSLT2021

    Journal ref: Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

  23. Mixtures of Deep Neural Experts for Automated Speech Scoring

    Authors: Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna

    Abstract: The paper copes with the task of automatic assessment of second language proficiency from the language learners' spoken responses to test prompts. The task has significant relevance to the field of computer assisted language learning. The approach presented in the paper relies on two separate modules: (1) an automatic speech recognition system that yields text transcripts of the spoken interaction…

    Submitted 23 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of INTERSPEECH 2020