
Showing 1–17 of 17 results for author: Vielzeuf, V

  1. arXiv:2406.13269  [pdf, other]

    cs.AI cs.CL cs.HC eess.SP

    Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets

    Authors: Lucas Druart, Valentin Vielzeuf, Yannick Estève

    Abstract: In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide…

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: 27th International Conference on Text, Speech and Dialogue, Sep 2024, Brno, Czech Republic

  2. arXiv:2406.07696  [pdf, other]

    cs.CL

    Sustainable self-supervised learning for speech representations

    Authors: Luis Lugo, Valentin Vielzeuf

    Abstract: Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning models for speech representations are computationally expensive, generating environmental concerns because of their high energy consumption. Thus, we propose a sustainable self-supervised model to learn speech representation…

    Submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2405.08402  [pdf, other]

    cs.CL

    Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining

    Authors: Valentin Vielzeuf

    Abstract: Self-supervised learning has shown great success in Speech Recognition. However, it has been observed that finetuning all layers of the learned model leads to lower performance compared to resetting top layers. This phenomenon is attributed to the "autoencoder" behavior: top layers contain information closer to the input and are less suitable for tasks that require linguistic information, such a…

    Submitted 14 May, 2024; originally announced May 2024.

  4. arXiv:2312.11142  [pdf, other]

    cs.CL

    Efficiency-oriented approaches for self-supervised speech representation learning

    Authors: Luis Lugo, Valentin Vielzeuf

    Abstract: Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and speech. In particular, the state-of-the-art in several speech processing applications, such as automatic speech recognition or speaker identification, are model…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 16 pages, 3 figures

    MSC Class: A.1

  5. arXiv:2311.04923  [pdf, other]

    cs.CL cs.AI eess.AS eess.SP

    Is one brick enough to break the wall of spoken dialogue state tracking?

    Authors: Lucas Druart, Valentin Vielzeuf, Yannick Estève

    Abstract: In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's requests (a.k.a. dialogue state tracking) is key to a smooth interaction. Traditionally, TOD systems perform this update in three steps: transcription of the user's utterance, semantic extraction of the key concepts, and contextualization with the previously identified concepts. Such cascad…

    Submitted 1 July, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

  6. arXiv:2311.04922  [pdf, other]

    cs.CL cs.AI eess.AS eess.SP

    Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?

    Authors: Lucas Druart, Léo Jacqmin, Benoît Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf

    Abstract: In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's needs is key to a smooth interaction. Traditionally TOD systems are composed of several modules that interact with one another. While each of these components is the focus of active research communities, their behavior in interaction can be overlooked. This paper proposes a comprehensive analysis o…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Submitted to IEEE ICASSP 2024. © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  7. arXiv:2304.11073  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    OLISIA: a Cascade System for Spoken Dialogue State Tracking

    Authors: Léo Jacqmin, Lucas Druart, Yannick Estève, Benoît Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf

    Abstract: Though Dialogue State Tracking (DST) is a core component of spoken dialogue systems, recent work on this task mostly deals with chat corpora, disregarding the discrepancies between spoken and written language. In this paper, we propose OLISIA, a cascade system which integrates an Automatic Speech Recognition (ASR) model and a DST model. We introduce several adaptations in the ASR and DST modules to…

    Submitted 31 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  8. arXiv:2112.12572  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Are E2E ASR models ready for an industrial usage?

    Authors: Valentin Vielzeuf, Grigory Antipov

    Abstract: The Automated Speech Recognition (ASR) community experiences a major turning point with the rise of the fully-neural (End-to-End, E2E) approaches. At the same time, the conventional hybrid model remains the standard choice for the practical usage of ASR. According to previous studies, the adoption of E2E ASR in real-world applications was hindered by two main limitations: their ability to generali…

    Submitted 21 October, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  9. arXiv:2109.01163  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition

    Authors: Maxime Burchi, Valentin Vielzeuf

    Abstract: The recently proposed Conformer architecture has shown state-of-the-art performances in Automatic Speech Recognition by combining convolution with attention to model both local and global dependencies. In this paper, we study how to reduce the Conformer architecture complexity with a limited computing budget, leading to a more efficient architecture design that we call Efficient Conformer. We intr…

    Submitted 8 September, 2021; v1 submitted 31 August, 2021; originally announced September 2021.

    Journal ref: ASRU 2021, Dec 2021, Cartagena, Colombia

  10. arXiv:1911.03222  [pdf, other]

    cs.LG stat.ML

    Towards a General Model of Knowledge for Facial Analysis by Multi-Source Transfer Learning

    Authors: Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, Frédéric Jurie

    Abstract: This paper proposes a step toward obtaining general models of knowledge for facial analysis, by addressing the question of multi-source transfer learning. More precisely, the proposed approach consists in two successive training steps: the first one consists in applying a combination operator to define a common embedding for the multiple sources materialized by different existing trained models. T…

    Submitted 8 November, 2019; originally announced November 2019.

  11. arXiv:1903.06496  [pdf, other]

    cs.LG cs.CV cs.NE

    MFAS: Multimodal Fusion Architecture Search

    Authors: Juan-Manuel Pérez-Rúa, Valentin Vielzeuf, Stéphane Pateux, Moez Baccouche, Frédéric Jurie

    Abstract: We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. In order to find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored for the problem. We demonst…

    Submitted 15 March, 2019; originally announced March 2019.

    Comments: CVPR 2019, Jun 2019, Long Beach, United States http://cvpr2019.thecvf.com/

  12. arXiv:1811.02447  [pdf, other]

    cs.CV

    Multi-Level Sensor Fusion with Deep Learning

    Authors: Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, Frédéric Jurie

    Abstract: In the context of deep learning, this article presents an original deep network, namely CentralNet, for the fusion of information coming from different sensors. This approach is designed to efficiently and automatically balance the trade-off between early and late fusion (i.e. between the fusion of low-level vs high-level information). More specifically, at each level of abstraction-the different…

    Submitted 5 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: text overlap with arXiv:1808.07275

  13. arXiv:1810.13197  [pdf, other]

    cs.NE cs.AI cs.CV

    The Many Moods of Emotion

    Authors: Valentin Vielzeuf, Corentin Kervadec, Stéphane Pateux, Frédéric Jurie

    Abstract: This paper presents a novel approach to the facial expression generation problem. Building upon the assumption of the psychological community that emotion is intrinsically continuous, we first design our own continuous emotion representation with a 3-dimensional latent space issued from a neural network trained on discrete emotion classification. The so-obtained representation can be used to annot…

    Submitted 31 October, 2018; originally announced October 2018.

  14. arXiv:1808.07275  [pdf, other]

    cs.AI cs.CV cs.MM

    CentralNet: a Multilayer Approach for Multimodal Fusion

    Authors: Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, Frédéric Jurie

    Abstract: This paper proposes a novel multimodal fusion approach, aiming to produce best possible decisions by integrating information coming from multiple media. While most of the past multimodal approaches either work by projecting the features of different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from bo…

    Submitted 22 August, 2018; originally announced August 2018.

    Journal ref: European Conference on Computer Vision Workshops: Multimodal Learning and Applications, Sep 2018, Munich, Germany. https://mula2018.github.io/

  15. arXiv:1808.02668  [pdf, other]

    cs.AI cs.CV cs.NE stat.ML

    An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

    Authors: Valentin Vielzeuf, Corentin Kervadec, Stéphane Pateux, Alexis Lechervy, Frédéric Jurie

    Abstract: This paper presents a light-weight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets, always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding allows to reduce the dimensionality of t…

    Submitted 8 August, 2018; originally announced August 2018.

    Journal ref: ICMI (EmotiW) 2018, Oct 2018, Boulder, Colorado, United States

  16. arXiv:1807.11215  [pdf, other]

    cs.AI cs.CV cs.NE

    CAKE: Compact and Accurate K-dimensional representation of Emotion

    Authors: Corentin Kervadec, Valentin Vielzeuf, Stéphane Pateux, Alexis Lechervy, Frédéric Jurie

    Abstract: Numerous models describing the human emotional states have been built by the psychology community. Alongside, Deep Neural Networks (DNN) are reaching excellent performances and are becoming interesting features extraction tools in many computer vision tasks. Inspired by works from the psychology community, we first study the link between the compact two-dimensional representation of the emotion kno…

    Submitted 3 August, 2018; v1 submitted 30 July, 2018; originally announced July 2018.

    Journal ref: Image Analysis for Human Facial and Activity Recognition (BMVC Workshop), Sep 2018, Newcastle, United Kingdom. http://juz-dev.myweb.port.ac.uk/BMVCWorkshop/index.html

  17. arXiv:1709.07200  [pdf, other]

    cs.CV cs.LG cs.MM

    Temporal Multimodal Fusion for Video Emotion Classification in the Wild

    Authors: Valentin Vielzeuf, Stéphane Pateux, Frédéric Jurie

    Abstract: This paper addresses the question of emotion classification. The task consists in predicting emotion labels (taken among a set of possible labels) best describing the emotions contained in short video clips. Building on a standard framework -- lying in describing videos by audio and visual features used by a supervised classifier to infer the labels -- this paper investigates several novel directi…

    Submitted 21 September, 2017; originally announced September 2017.

    Journal ref: ACM - ICMI 2017, Nov 2017, Glasgow, United Kingdom