Showing 1–2 of 2 results for author: Bihan, E L

Searching in archive cs.
  1. arXiv:2406.00038  [pdf, ps, other]

    cs.CL cs.AI

    ViSpeR: Multilingual Audio-Visual Speech Recognition

    Authors: Sanath Narayan, Yasser Abdelaziz Dahou Djilali, Ankit Singh, Eustache Le Bihan, Hakim Hacid

    Abstract: This work presents an extensive and detailed study on Audio-Visual Speech Recognition (AVSR) for five widely spoken languages: Chinese, Spanish, English, Arabic, and French. We have collected large-scale datasets for each language except for English, and have engaged in the training of supervised learning models. Our model, ViSpeR, is trained in a multi-lingual setting, resulting in competitive pe…

    Submitted 27 May, 2024; originally announced June 2024.

  2. arXiv:2311.14063  [pdf, other]

    cs.CV cs.CL cs.LG

    Do VSR Models Generalize Beyond LRS3?

    Authors: Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Haithem Boussaid, Ebtessam Almazrouei, Merouane Debbah

    Abstract: The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years. As a result, there is an increased risk of overfitting to its excessively used test set, which is only one hour duration. To alleviate this issue, we build a new VSR test set named WildVSR, by closely following the LRS3 dataset creation process…

    Submitted 23 November, 2023; originally announced November 2023.