
Showing 1–41 of 41 results for author: Estève, Y

Searching in archive cs.
  1. arXiv:2407.05746

    cs.AI cs.SD eess.AS

    MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition

    Authors: Jarod Duret, Mickael Rouvier, Yannick Estève

    Abstract: In this work, we detail our submission to the 2024 edition of the MSP-Podcast Speech Emotion Recognition (SER) Challenge. This challenge is divided into two distinct tasks: Categorical Emotion Recognition and Emotional Attribute Prediction. We concentrated our efforts on Task 1, which involves the categorical classification of eight emotional states using data from the MSP-Podcast dataset. Our app…

    Submitted 8 July, 2024; originally announced July 2024.

    Journal ref: Odyssey 2024, Jun 2024, Quebec, Canada

  2. arXiv:2407.04533

    cs.CL cs.SD eess.AS

    Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect

    Authors: Salima Mdhaffar, Haroun Elleuch, Fethi Bougares, Yannick Estève

    Abstract: Speech encoders pretrained through self-supervised learning (SSL) have demonstrated remarkable performance in various downstream tasks, including Spoken Language Understanding (SLU) and Automatic Speech Recognition (ASR). For instance, fine-tuning SSL models for such tasks has shown significant potential, leading to improvements in the SOTA performance across challenging datasets. In contrast to e…

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted in ArabicNLP 2024

  3. arXiv:2407.00463

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, et al. (7 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper prese…

    Submitted 18 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  4. arXiv:2406.13269

    cs.AI cs.CL cs.HC eess.SP

    Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets

    Authors: Lucas Druart, Valentin Vielzeuf, Yannick Estève

    Abstract: In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide…

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: 27th International Conference on Text, Speech and Dialogue, Sep 2024, Brno, Czech Republic

  5. arXiv:2406.12141

    cs.CL cs.SD eess.AS

    A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding

    Authors: Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Yannick Estève

    Abstract: Self-Supervised Learning is widely used to efficiently represent speech for Spoken Language Understanding, gradually replacing conventional approaches. Meanwhile, textual SSL models have been proposed to encode language-agnostic semantics. The SAMU-XLSR framework employed this semantic information to enrich multilingual speech representations. A recent study investigated SAMU-XLSR in-domain semantic enrichm…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: In Proceedings of Interspeech 2024

  6. arXiv:2405.19342

    cs.SD cs.CL cs.LG eess.AS

    Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants

    Authors: Chloé Sekkat, Fanny Leroy, Salima Mdhaffar, Blake Perry Smith, Yannick Estève, Joseph Dureau, Alice Coucke

    Abstract: Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tags. This paper introduces the Sonos Voice Control Bias Assessment Dataset, an open dataset composed of voice assistant requests for North American Engl…

    Submitted 14 May, 2024; originally announced May 2024.

  7. arXiv:2405.04296

    cs.CL cs.LG

    Open Implementation and Study of BEST-RQ for Speech Processing

    Authors: Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

    Abstract: Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such a…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted in IEEE ICASSP 2024 workshop on Self-supervision in Audio, Speech and Beyond (SASB 2024)

  8. arXiv:2311.04923

    cs.CL cs.AI eess.AS eess.SP

    Is one brick enough to break the wall of spoken dialogue state tracking?

    Authors: Lucas Druart, Valentin Vielzeuf, Yannick Estève

    Abstract: In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's requests (a.k.a. dialogue state tracking) is key to a smooth interaction. Traditionally, TOD systems perform this update in three steps: transcription of the user's utterance, semantic extraction of the key concepts, and contextualization with the previously identified concepts. Such cascad…

    Submitted 1 July, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

  9. arXiv:2310.07279

    cs.SD cs.CL eess.AS

    Enhancing expressivity transfer in textless speech-to-speech translation

    Authors: Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet

    Abstract: Textless speech-to-speech translation systems are rapidly advancing, thanks to the integration of self-supervised learning techniques. However, existing state-of-the-art systems fall short when it comes to capturing and transferring expressivity accurately across different languages. Expressivity plays a vital role in conveying emotions, nuances, and cultural subtleties, thereby enhancing communic…

    Submitted 11 October, 2023; originally announced October 2023.

    Journal ref: ASRU, Dec 2023, Taipei, Taiwan

  10. arXiv:2310.04481

    eess.AS cs.AI cs.CL cs.LG

    Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations

    Authors: Manon Macary, Marie Tahon, Yannick Estève, Daniel Luzzati

    Abstract: The goal of our research is to automatically retrieve satisfaction and frustration in real-life call-center conversations. This study focuses on an industrial application in which customer satisfaction is continuously tracked to improve customer services. To compensate for the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form…

    Submitted 6 October, 2023; originally announced October 2023.

    ACM Class: I.2.7

  11. arXiv:2309.05472

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech and Language. Preprint allowed

  12. Semantic enrichment towards efficient speech representations

    Authors: Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian, Yannick Estève

    Abstract: Over the past few years, self-supervised learned speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual models trained on massive textual data were introduced to encode language-agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to benefit from…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: INTERSPEECH 2023

    Journal ref: Proc. Interspeech 2023, 705-709

  13. arXiv:2306.17199

    eess.AS cs.CL cs.SD

    Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

    Authors: Jarod Duret, Titouan Parcollet, Yannick Estève

    Abstract: We propose a method for emotion-preserving speech-to-speech translation that operates at the level of discrete speech units. Our approach relies on a multilingual emotion embedding that can capture affective information in a language-independent manner. We show that this embedding can be used to predict the pitch and duration of speech units in a target language, allowing us to resynthesiz…

    Submitted 29 June, 2023; originally announced June 2023.

    Journal ref: Speech Synthesis Workshop (SSW), Aug 2023, Grenoble, France

  14. arXiv:2306.03773

    eess.AS cs.CL cs.LG cs.SD

    Some voices are too common: Building fair speech recognition systems using the Common Voice dataset

    Authors: Lucas Maison, Yannick Estève

    Abstract: Automatic speech recognition (ASR) systems are becoming increasingly efficient thanks to new advances in neural network training like self-supervised learning. However, they are known to be unfair toward certain groups, for instance, people speaking with an accent. In this work, we use the French Common Voice dataset to quantify the biases of a pre-trained wav2vec 2.0 model toward several demographic gr…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures. Accepted to Interspeech 2023

  15. arXiv:2304.11073

    eess.AS cs.AI cs.CL cs.SD

    OLISIA: a Cascade System for Spoken Dialogue State Tracking

    Authors: Léo Jacqmin, Lucas Druart, Yannick Estève, Benoît Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf

    Abstract: Though Dialogue State Tracking (DST) is a core component of spoken dialogue systems, recent work on this task mostly deals with chat corpora, disregarding the discrepancies between spoken and written language. In this paper, we propose OLISIA, a cascade system which integrates an Automatic Speech Recognition (ASR) model and a DST model. We introduce several adaptations in the ASR and DST modules to…

    Submitted 31 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  16. arXiv:2303.07924

    cs.LG cs.CL cs.SD eess.AS

    Improving Accented Speech Recognition with Multi-Domain Training

    Authors: Lucas Maison, Yannick Estève

    Abstract: Thanks to the rise of self-supervised learning, automatic speech recognition (ASR) systems now achieve near-human performance on a wide variety of datasets. However, they still lack generalization capability and are not robust to domain shifts like accent variations. In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustn…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: 5 pages, 2 figures. Accepted to ICASSP 2023

  17. arXiv:2302.10790

    eess.AS cs.LG cs.SD

    Federated Learning for ASR based on Wav2vec 2.0

    Authors: Tuan Nguyen, Salima Mdhaffar, Natalia Tomashenko, Jean-François Bonastre, Yannick Estève

    Abstract: This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self-supervision. Carried out on the well-known TED-LIUM 3 dataset, our experiments show that such a model can obtain, with no use of a language model, a word error rate of 10.92% on the official TED-LIUM 3 test set, without sharing any data from the different users. We al…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 5 pages, accepted in ICASSP 2023

  18. arXiv:2210.05291

    cs.CL cs.SD eess.AS

    On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding

    Authors: Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève

    Abstract: In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently-introduced SAMU-XLSR model, which is designed to generate a single embedding that captures the semantics at the utterance level, semantically aligned across different languages. This model combines the acoustic frame-level speech representation…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted in IEEE SLT 2022. This work was performed using HPC resources from GENCI/IDRIS (grant 2022 AD011012565) and received funding from the EU H2020 research and innovation programme under the Marie Sklodowska-Curie ESPERANTO project (grant agreement No 101007666), through the SELMA project (grant No 957017) and from the French ANR through the AISSPER project (ANR-19-CE23-0004)

  19. arXiv:2205.01987

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

    Authors: Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, Yannick Estève

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tu…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: IWSLT 2022 system paper

  20. arXiv:2204.01397

    cs.CL cs.SD eess.AS

    A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

    Authors: Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

    Abstract: Self-supervised models for speech processing emerged recently as popular foundation blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to und…

    Submitted 5 July, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

  21. arXiv:2204.00803

    cs.CL cs.SD eess.AS

    End-to-end model for named entity recognition from speech without paired training data

    Authors: Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève

    Abstract: Recent works showed that end-to-end neural approaches are becoming very popular for spoken language understanding (SLU). By the term end-to-end, one means the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with semantic annotation. In this paper, we propose an app…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  22. arXiv:2201.05051

    cs.CL

    Speech Resources in the Tamasheq Language

    Authors: Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève

    Abstract: In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours)…

    Submitted 11 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: Accepted to LREC 2022

  23. arXiv:2111.04194

    cs.CL cs.SD eess.AS

    Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

    Authors: Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

    Abstract: The widespread use of powerful personal devices capable of collecting their users' voices has opened the opportunity to build speaker-adapted automatic speech recognition (ASR) systems or to participate in collaborative learning of ASR. In both cases, personalized acoustic models (AMs), i.e., AMs fine-tuned on specific speaker data, can be built. A question that naturally arises is whether the dissemination of p…

    Submitted 7 November, 2021; originally announced November 2021.

  24. arXiv:2111.03777

    cs.CL cs.CR cs.SD eess.AS

    Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

    Authors: Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

    Abstract: This paper investigates methods to effectively retrieve speaker information from the personalized speaker-adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models where a global model is learnt on the server based on the updates received from multiple clients. We propose an a…

    Submitted 14 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6972-6976

  25. arXiv:2106.13045

    cs.CL cs.SD eess.AS

    Where are we in semantic concept extraction for Spoken Language Understanding?

    Authors: Sahar Ghannay, Antoine Caubrière, Salima Mdhaffar, Gaëlle Laperrière, Bassam Jabaian, Yannick Estève

    Abstract: The field of spoken language understanding (SLU) has seen a lot of progress over the last three years, with the emergence of end-to-end neural approaches. Spoken language understanding refers to natural language processing tasks related to semantic extraction from the speech signal, like named entity recognition from speech or slot filling in the context of human-machine dialogue. Classically, SLU tasks were…

    Submitted 11 October, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted in the SPECOM 2021 conference

  26. arXiv:2104.14470

    cs.CL cs.SD eess.AS

    Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

    Authors: Ha Nguyen, Yannick Estève, Laurent Besacier

    Abstract: Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed. They consist of incrementally encoding a speech input (in a source language) and decoding the corresponding text (in a target language) with the best possible trade-off between latency and translation quality. This paper investigates two key aspects o…

    Submitted 14 June, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted for presentation at Interspeech 2021

  27. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee…

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  28. arXiv:2103.03233

    cs.CL

    An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies

    Authors: Ha Nguyen, Yannick Estève, Laurent Besacier

    Abstract: This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: This paper has been accepted for presentation at IEEE ICASSP 2021

  29. End2End Acoustic to Semantic Transduction

    Authors: Valentin Pelloin, Nathalie Camelin, Antoine Laurent, Renato De Mori, Antoine Caubrière, Yannick Estève, Sylvain Meignier

    Abstract: In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism. It reliably selects contextual acoustic features in order to hypothesize semantic contents. An initial architecture capable of extracting all pronounced words and concepts from acoustic spans is designed and tested. With a shallow fusion language model, this system re…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted at IEEE ICASSP 2021

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  30. arXiv:2011.09212

    cs.CL

    On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition

    Authors: Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau

    Abstract: Pre-training for feature extraction is an increasingly studied approach to get better continuous representations of audio and text content. In the present work, we use wav2vec and camemBERT as self-supervised learned models to represent our data in order to perform continuous emotion recognition from speech (SER) on AlloSat, a large French emotional database describing the satisfaction dimension,…

    Submitted 18 November, 2020; originally announced November 2020.

    Comments: Accepted in IEEE SLT 2021

  31. arXiv:2007.15296

    cs.CL

    Leverage Unlabeled Data for Abstractive Speech Summarization with Self-Supervised Learning and Back-Summarization

    Authors: Paul Tardy, Louis de Seynes, François Hernandez, Vincent Nguyen, David Janiszek, Yannick Estève

    Abstract: Supervised approaches for Neural Abstractive Summarization require large annotated corpora that are costly to build. We present a French meeting summarization task where reports are predicted based on the automatic transcription of the meeting audio recordings. In order to build a corpus for this task, it is necessary to obtain the (automatic or manual) transcription of each meeting, and then to s…

    Submitted 17 September, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: To be published in Proceedings of SPECOM 2020

  32. arXiv:2007.07841

    cs.CL

    Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation

    Authors: Paul Tardy, David Janiszek, Yannick Estève, Vincent Nguyen

    Abstract: Summarizing texts is not a straightforward task. Before even considering text summarization, one should determine what kind of summary is expected. How much should the information be compressed? Is it relevant to reformulate or should the summary stick to the original phrasing? The state of the art in automatic text summarization mostly revolves around news articles. We suggest that considering a wide…

    Submitted 15 July, 2020; originally announced July 2020.

    Journal ref: LREC 2020, Proceedings of The 12th Language Resources and Evaluation Conference, 2020, pp. 6718-6724

  33. arXiv:2005.11861

    cs.CL eess.AS

    ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

    Authors: Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention…

    Submitted 24 May, 2020; originally announced May 2020.

  34. arXiv:2003.06894

    eess.AS cs.CL cs.SD

    Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

    Authors: Natalia Tomashenko, Yuri Khokhlov, Yannick Esteve

    Abstract: In this paper we investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. The adaptation of the DNN trained on GMMD features is done through the maximum a posteriori (MAP) adaptation of the auxiliary GMM model used for GMMD feature extraction. We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC featu…

    Submitted 15 March, 2020; originally announced March 2020.

    Comments: 36 pages; originally was submitted to CSL in February 2017

  35. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems

    Authors: Natalia Tomashenko, Christian Raymond, Antoine Caubriere, Renato De Mori, Yannick Esteve

    Abstract: This work investigates the embeddings for representing dialog history in spoken language understanding (SLU) systems. We focus on the scenario where the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. We propose to integrate dialogue history into an end-to-end signal-to-concept SLU system. The dialog history is represented in…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted for ICASSP 2020 (Submitted: October 21, 2019)

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  36. arXiv:1910.13689

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  37. Recent Advances in End-to-End Spoken Language Understanding

    Authors: Natalia Tomashenko, Antoine Caubriere, Yannick Esteve, Antoine Laurent, Emmanuel Morin

    Abstract: This work investigates spoken language understanding (SLU) systems in the scenario where the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). For these tasks, in order to improve the model performance, we explore various techniques inclu…

    Submitted 29 September, 2019; originally announced September 2019.

    Journal ref: Statistical Language and Speech Processing. SLSP 2019

  38. arXiv:1906.07601

    cs.CL cs.SD eess.AS

    Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability

    Authors: Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève

    Abstract: We present an end-to-end approach to extract semantic concepts directly from the speech audio signal. To overcome the lack of data available for this spoken language understanding approach, we investigate the use of a transfer learning strategy based on the principles of curriculum learning. This approach allows us to exploit out-of-domain data that can help to prepare a fully neural architecture.…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to the INTERSPEECH 2019 conference. Submitted on March 29, 2019 (Paper submission deadline)

  39. arXiv:1805.12045

    cs.CL

    End-to-end named entity extraction from speech

    Authors: Sahar Ghannay, Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin

    Abstract: Named entity recognition (NER) is among SLU tasks that usually extract semantic information from textual documents. Until now, NER from speech has been performed through a pipeline process that first applies automatic speech recognition (ASR) to the audio and then applies NER to the ASR outputs. Such an approach has some disadvantages (error propagation, metric to tune ASR systems sub-opt…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: Submitted to Interspeech 2018

    ACM Class: I.2.7

  40. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

    Authors: François Hernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève

    Abstract: In this paper, we present the TED-LIUM release 3 corpus, dedicated to speech recognition in English, which more than doubles the data available to train acoustic models compared with TED-LIUM 2. We present recent developments in Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014. We demonstrate that, passing…

    Submitted 13 June, 2019; v1 submitted 12 May, 2018; originally announced May 2018.

    Comments: Submitted to SPECOM 2018, 20th International Conference on Speech and Computer; TED-LIUM 3 corpus available on https://lium.univ-lemans.fr/en/ted-lium3/

    ACM Class: I.2.7

    Journal ref: SPECOM 2018. Lecture Notes in Computer Science, vol 11096, pp 198-208

  41. arXiv:1705.09515

    cs.CL cs.AI cs.NE

    ASR error management for improving spoken language understanding

    Authors: Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori

    Abstract: This paper addresses the problem of automatic speech recognition (ASR) error detection and the use of detected errors for improving spoken language understanding (SLU) systems. In this study, the SLU task consists of automatically extracting, from ASR transcriptions, semantic concepts and concept/value pairs in, e.g., a touristic information system. An approach is proposed for enriching the set of semantic labels w…

    Submitted 26 May, 2017; originally announced May 2017.

    Comments: Interspeech 2017, Aug 2017, Stockholm, Sweden. 2017
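Several entries in this listing report results as word error rate (WER), for example the 10.92% figure on the official TED-LIUM 3 test set in "Federated Learning for ASR based on Wav2vec 2.0". As a minimal sketch, assuming the usual definition (word-level Levenshtein distance, i.e. substitutions + deletions + insertions, summed over the test set and divided by the total number of reference words), WER can be computed as below. The function names `word_edit_distance` and `corpus_wer` are hypothetical, and real scoring tools also apply text normalization that is omitted here.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edits divided by total reference words (whitespace tokens)."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("the references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    test_set = [
        ("the cat sat on the mat", "the cat sit on mat"),  # 1 substitution + 1 deletion
        ("hello world", "hello world"),                     # exact match
    ]
    print(f"WER = {100 * corpus_wer(test_set):.2f}%")  # prints "WER = 25.00%"
```

Summing edits over the whole set before dividing (rather than averaging per-utterance WERs) keeps long utterances from being under-weighted; note that WER can exceed 100% when the hypothesis contains many insertions.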
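"Open Implementation and Study of BEST-RQ for Speech Processing" studies BEST-RQ, whose main simplification over other SSL methods is a random-projection quantizer that is never trained. The sketch below follows the quantizer as described in the original BEST-RQ paper (Chiu et al., 2022): a frozen, randomly initialized matrix projects each (already mean/variance-normalized) feature frame to a lower dimension, and the frame's discrete target is the index of the nearest vector in a frozen, randomly initialized, l2-normalized codebook. This is an illustrative pure-Python sketch, not the authors' implementation; the class name, the dimensions, the Xavier-uniform initialization and the seed are assumptions, and frame stacking, masking and the encoder itself are omitted.

```python
import math
import random
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class RandomProjectionQuantizer:
    """Hypothetical helper: frozen random projection + frozen random codebook."""

    def __init__(self, input_dim: int, code_dim: int, codebook_size: int, seed: int = 0):
        rng = random.Random(seed)
        # Xavier-uniform init of the (code_dim x input_dim) projection; never updated.
        limit = math.sqrt(6.0 / (input_dim + code_dim))
        self.projection = [
            [rng.uniform(-limit, limit) for _ in range(input_dim)] for _ in range(code_dim)
        ]
        # Codebook vectors drawn from a standard normal, then l2-normalized; never updated.
        self.codebook = [
            l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
            for _ in range(codebook_size)
        ]

    def quantize(self, frame: Sequence[float]) -> int:
        """Index of the codebook vector closest to the l2-normalized projection of frame."""
        projected = [sum(w * x for w, x in zip(row, frame)) for row in self.projection]
        z = l2_normalize(projected)

        def squared_distance(code: List[float]) -> float:
            return sum((c - zi) ** 2 for c, zi in zip(code, z))

        return min(range(len(self.codebook)), key=lambda i: squared_distance(self.codebook[i]))

    def targets(self, frames: Sequence[Sequence[float]]) -> List[int]:
        """One discrete label per frame, to be predicted at masked positions."""
        return [self.quantize(frame) for frame in frames]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical input: 5 frames of 80-dimensional, already normalized features.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(5)]
    quantizer = RandomProjectionQuantizer(input_dim=80, code_dim=16, codebook_size=64)
    print(quantizer.targets(frames))
```

Because neither the projection nor the codebook is learned, the targets depend only on the input and the random seed, so they can be computed once and reused; only the speech encoder that predicts them is trained.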
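"Federated Learning for ASR based on Wav2vec 2.0" and "Privacy attacks for automatic speech recognition acoustic models in a federated learning framework" both involve a server that learns a global model from updates received from multiple clients. The truncated abstracts do not state which aggregation rule was used; as an assumption, the sketch below shows Federated Averaging (FedAvg, McMahan et al., 2017), a common baseline in which the server averages the clients' parameters weighted by each client's number of local training examples. The function name and the representation of parameters as flat lists of floats keyed by name are hypothetical simplifications.

```python
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, List[float]]


def federated_average(client_updates: Sequence[Tuple[int, Params]]) -> Params:
    """Weighted average of client parameters; each weight is the client's example count."""
    if not client_updates:
        raise ValueError("at least one client update is required")
    total_examples = sum(n for n, _ in client_updates)
    if total_examples <= 0:
        raise ValueError("the total number of examples must be positive")

    # Every client is assumed to share the same parameter names and shapes.
    reference = client_updates[0][1]
    averaged: Params = {}
    for name, values in reference.items():
        averaged[name] = [
            sum(n * params[name][k] for n, params in client_updates) / total_examples
            for k in range(len(values))
        ]
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients: one with 1 local utterance, one with 3.
    updates = [
        (1, {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}),
        (3, {"layer.weight": [4.0, 6.0], "layer.bias": [1.0]}),
    ]
    print(federated_average(updates))
    # -> {'layer.weight': [3.0, 5.0], 'layer.bias': [1.0]}
```

Only parameters travel to the server, never audio, which matches the "without sharing any data from the different users" setting; the privacy-attack entry above shows that such parameter updates can nonetheless leak speaker information.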