Showing 1–14 of 14 results for author: Carbonneau, M

Searching in archive cs.
  1. arXiv:2407.12881

    cs.CL cs.AI

    BinaryAlign: Word Alignment as Binary Sequence Labeling

    Authors: Gaetan Lopez Latouche, Marc-André Carbonneau, Ben Swanson

    Abstract: Real-world deployments of word alignment are almost certain to cover both high- and low-resource languages. However, the state-of-the-art for this task recommends a different model class depending on the availability of gold alignment training data for a particular language pair. We propose BinaryAlign, a novel word alignment technique based on binary sequence labeling that outperforms existing app…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024

  2. arXiv:2407.11854

    cs.CL cs.AI

    Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection

    Authors: Gaetan Lopez Latouche, Marc-André Carbonneau, Ben Swanson

    Abstract: Grammatical Error Detection (GED) methods rely heavily on human-annotated error corpora. However, these annotations are unavailable in many low-resource languages. In this paper, we investigate GED in this context. Leveraging the zero-shot cross-lingual transfer capabilities of multilingual pre-trained language models, we train a model using data from a diverse set of languages to generate synthet…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Submitted to EMNLP 2024

  3. arXiv:2404.14634

    cs.CV

    UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues

    Authors: Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, Ali Etemad

    Abstract: We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image…

    Submitted 9 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to ECCV 2024, 32 pages, 12 figures

  4. arXiv:2312.13091

    cs.CV cs.GR cs.LG

    MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading

    Authors: Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-André Carbonneau

    Abstract: Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light…

    Submitted 21 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: https://ubisoft-laforge.github.io/character/mosar/

    MSC Class: 68T45 (Primary) 68T07; 68T01 (Secondary)

    ACM Class: I.2.10; I.4; I.3.3; I.5

  5. arXiv:2311.08667

    cs.SD eess.AS

    EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

    Authors: Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan

    Abstract: Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate in the latent domain with cascaded phase recovery modules to reconstruct the waveform. This poses challenges when generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). Combining wit…

    Submitted 18 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS Workshop: Machine Learning for Audio (Camera Ready)

  6. arXiv:2307.06040

    eess.AS cs.LG cs.SD

    Rhythm Modeling for Voice Conversion

    Authors: Benjamin van Niekerk, Marc-André Carbonneau, Herman Kamper

    Abstract: Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic, an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representatio…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: 5 pages, 4 figures, 4 tables, submitted to IEEE Signal Processing Letters

  7. arXiv:2209.07556

    cs.GR cs.LG cs.SD

    ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech

    Authors: Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje, Marc-André Carbonneau

    Abstract: We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scalin…

    Submitted 23 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  8. A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

    Authors: Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Mathew Baas, Hugo Seuté, Herman Kamper

    Abstract: The goal of voice conversion is to transform source speech into a target voice, keeping the content unchanged. In this paper, we focus on self-supervised representation learning for voice conversion. Specifically, we compare discrete and soft speech units as input features. We find that discrete representations effectively remove speaker information but discard some linguistic content, leading to…

    Submitted 8 June, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 5 pages, 2 figures, 2 tables. Accepted at ICASSP 2022

  9. Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis

    Authors: Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, Marc-André Carbonneau

    Abstract: This paper presents Daft-Exprt, a multi-speaker acoustic model advancing the state-of-the-art for cross-speaker prosody transfer on any text. This is one of the most challenging, and rarely directly addressed, tasks in speech synthesis, especially for highly expressive data. Daft-Exprt uses FiLM conditioning layers to strategically inject different prosodic information in all parts of the architect…

    Submitted 5 April, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

    Comments: Submitted to Interspeech 2022, 5 pages, 5 figures, 2 tables

    Journal ref: Proc. Interspeech (2022) 4591-4595

  10. arXiv:2103.12177

    cs.LG eess.SP

    Energy Disaggregation using Variational Autoencoders

    Authors: Antoine Langevin, Marc-André Carbonneau, Mohamed Cheriet, Ghyslain Gagnon

    Abstract: Non-intrusive load monitoring (NILM) is a technique that uses a single sensor to measure the total power consumption of a building. Using an energy disaggregation method, the consumption of individual appliances can be estimated from the aggregate measurement. Recent disaggregation algorithms have significantly improved the performance of NILM systems. However, the generalization capability of the…

    Submitted 19 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: 13 pages, 2 figures, results for the REFIT dataset added

  11. arXiv:2012.09276

    cs.LG cs.AI

    Measuring Disentanglement: A Review of Metrics

    Authors: Marc-André Carbonneau, Julian Zaïdi, Jonathan Boilard, Ghyslain Gagnon

    Abstract: Learning to disentangle and represent factors of variation in data is an important problem in AI. While many advances have been made to learn these representations, it is still unclear how to quantify disentanglement. While several metrics exist, little is known about their implicit assumptions, what they truly measure, and their limits. Consequently, it is difficult to interpret results when compa…

    Submitted 9 May, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

  12. Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems

    Authors: Marc-André Carbonneau, Eric Granger, Ghyslain Gagnon

    Abstract: A growing number of applications, e.g. video surveillance and medical image analysis, require training recognition systems from large amounts of weakly annotated data while some targeted interactions with a domain expert are allowed to improve the training process. In such cases, active learning (AL) can reduce labeling costs for training a classifier by querying the expert to provide the labels o…

    Submitted 6 October, 2017; originally announced October 2017.

  13. Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Authors: Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon

    Abstract: Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. Consequently, it has been used in diverse application fields such as computer vision and document c…

    Submitted 10 December, 2016; originally announced December 2016.

  14. Feature Learning from Spectrograms for Assessment of Personality Traits

    Authors: Marc-André Carbonneau, Eric Granger, Yazid Attabi, Ghyslain Gagnon

    Abstract: Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf toolboxes. To achieve high accuracy, numerous features are typically extracted using complex and highly parameterized algorithms. In this paper, a new method based on…

    Submitted 4 October, 2016; originally announced October 2016.

    Comments: 12 pages, 3 figures
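
Entry 1 frames word alignment as binary sequence labeling. The sketch below illustrates only that label framing, under the assumption that for each source word a model emits one aligned/not-aligned decision per target word. The encoder, how the source word is marked in the input, and any symmetrization step are not shown, and the helper names (`alignment_to_labels`, `labels_to_alignment`) are hypothetical rather than taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an alignment is a set of
# (source_index, target_index) pairs.
Alignment = Set[Tuple[int, int]]


def alignment_to_labels(n_src: int, n_tgt: int, alignment: Alignment) -> List[List[int]]:
    """Turn a gold alignment into one binary label sequence per source word.

    Row i holds a 0/1 label for every target word j: 1 if (i, j) is aligned.
    A row of zeros marks an unaligned source word; several 1s in a row mark
    a one-to-many link.
    """
    return [[1 if (i, j) in alignment else 0 for j in range(n_tgt)]
            for i in range(n_src)]


def labels_to_alignment(probs: List[List[float]], threshold: float = 0.5) -> Alignment:
    """Decode per-source-word label probabilities back into alignment pairs."""
    return {(i, j)
            for i, row in enumerate(probs)
            for j, p in enumerate(row)
            if p >= threshold}


src = ["the", "house"]
tgt = ["la", "maison"]
gold = {(0, 0), (1, 1)}
print(alignment_to_labels(len(src), len(tgt), gold))        # [[1, 0], [0, 1]]
print(sorted(labels_to_alignment([[0.9, 0.2], [0.1, 0.8]])))  # [(0, 0), (1, 1)]
```

Because every source/target word pair receives its own binary decision in this representation, unaligned words and one-to-many links need no special handling.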
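
Entry 13 describes the multiple instance learning (MIL) setting, in which instances are grouped into bags and only the bag carries a label. The minimal sketch below assumes the commonly used standard MI assumption (a bag is positive if and only if at least one of its instances is positive) and scores a bag by max-pooling instance scores. The function names and the 0.5 threshold are hypothetical illustrations, not taken from the survey.

```python
from typing import List, Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Derive a bag label from (normally hidden) instance labels.

    Standard MI assumption: the bag is positive (1) if and only if
    at least one instance is positive.
    """
    return int(any(label == 1 for label in instance_labels))


def predict_bag(instance_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Classify a bag from instance-level scores by max pooling.

    Under the standard MI assumption the most positive instance decides
    the bag, so the bag score is the maximum instance score.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return int(max(instance_scores) >= threshold)


# Two toy bags: during training only the bag labels would be observed.
bags: List[List[int]] = [[0, 0, 1], [0, 0, 0]]
print([bag_label(b) for b in bags])  # [1, 0]
print(predict_bag([0.1, 0.7, 0.2]))  # 1
```

Under other MI assumptions, the `max` would be replaced by a different pooling rule (for example, a mean over instance scores).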