Showing 1–4 of 4 results for author: Futeral, M

Searching in archive cs.
  1. arXiv:2407.13579

    cs.CL

    Towards Zero-Shot Multimodal Machine Translation

    Authors: Matthieu Futeral, Cordelia Schmid, Benoît Sagot, Rachel Bawden

    Abstract: Current multimodal machine translation (MMT) systems rely on fully supervised data (i.e., models are trained on sentences with their translations and accompanying images). However, this type of data is costly to collect, limiting the extension of MMT to other language pairs for which such data does not exist. In this work, we propose a method to bypass the need for fully supervised data to train MMT…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Preprint. Under review

  2. arXiv:2406.08707

    cs.CL cs.CV

    mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

    Authors: Matthieu Futeral, Armel Zebaze, Pedro Ortiz Suarez, Julien Abadji, Rémi Lacroix, Cordelia Schmid, Rachel Bawden, Benoît Sagot

    Abstract: Multimodal Large Language Models (mLLMs) are trained on a large amount of text-image data. While most mLLMs are trained on caption-like data only, Alayrac et al. [2022] showed that additionally training them on interleaved sequences of text and images can lead to the emergence of in-context learning capabilities. However, the dataset they used, M3W, is not public and is only in English. There have…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Preprint. Under review

  3. arXiv:2404.10419

    eess.AS cs.CL

    MAD Speech: Measures of Acoustic Diversity of Speech

    Authors: Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov

    Abstract: Generative spoken language models produce speech in a wide range of voices, prosody, and recording conditions, seemingly approaching the diversity of natural speech. However, the extent to which generated speech is acoustically diverse remains unclear due to a lack of appropriate metrics. We address this gap by developing lightweight metrics of acoustic diversity, which we collectively refer to as…

    Submitted 16 April, 2024; originally announced April 2024.

  4. Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation

    Authors: Matthieu Futeral, Cordelia Schmid, Ivan Laptev, Benoît Sagot, Rachel Bawden

    Abstract: One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as images. However, recent work in multimodal MT (MMT) has shown that obtaining improvements from images is challenging, limited not only by the difficulty of building effective cross-modal representations, but also by the lack of specific evaluation and training d…

    Submitted 26 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023