
Showing 1–13 of 13 results for author: Coavoux, M

  1. arXiv:2406.12621

    cs.CL

    Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech

    Authors: Adrien Pupier, Maximin Coavoux, Jérôme Goulian, Benjamin Lecouteux

    Abstract: Direct dependency parsing of the speech signal -- as opposed to parsing speech transcriptions -- has recently been proposed as a task (Pupier et al. 2022), as a way of incorporating prosodic information in the parsing system and bypassing the limitations of a pipeline approach that would consist of using first an Automatic Speech Recognition (ASR) system and then a syntactic parser. In this articl…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  2. arXiv:2403.02173

    cs.CL

    What has LeBenchmark Learnt about French Syntax?

    Authors: Zdravko Dugonjić, Adrien Pupier, Benjamin Lecouteux, Maximin Coavoux

    Abstract: The paper reports on a series of experiments aiming at probing LeBenchmark, a pretrained acoustic model trained on 7k hours of spoken French, for syntactic information. Pretrained acoustic models are increasingly used for downstream speech tasks such as automatic speech recognition, speech translation, spoken language understanding or speech parsing. They are trained on very low level information…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  3. arXiv:2309.05472

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech and Language. Preprint allowed

  4. arXiv:2302.09350

    cs.CL

    BERT is not The Count: Learning to Match Mathematical Statements with Proofs

    Authors: Weixian Waylon Li, Yftah Ziser, Maximin Coavoux, Shay B. Cohen

    Abstract: We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research article…

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted to the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023; 14 pages. arXiv admin note: substantial text overlap with arXiv:2102.02110

  5. On Detecting Policy-Related Political Ads: An Exploratory Analysis of Meta Ads in 2022 French Election

    Authors: Vera Sosnovik, Romaissa Kessi, Maximin Coavoux, Oana Goga

    Abstract: Online political advertising has become the cornerstone of political campaigns. The budget spent solely on political advertising in the U.S. has increased by more than 100% from $700 million during the 2017-2018 U.S. election cycle to $1.6 billion during the 2020 U.S. presidential elections. Naturally, the capacity offered by online platforms to micro-target ads with political content has been w…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: Proceedings of the ACM Web Conference 2023 (WWW '23), May 1--5, 2023, Austin, TX, USA

  6. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  7. arXiv:2102.02110

    cs.CL

    Learning to Match Mathematical Statements with Proofs

    Authors: Maximin Coavoux, Shay B. Cohen

    Abstract: We introduce a novel task consisting in assigning a proof to a given mathematical statement. The task is designed to improve the processing of research-level mathematical texts. Applying Natural Language Processing (NLP) tools to research level mathematical articles is both challenging, since it is a highly specialized domain which mixes natural language and mathematical formulae. It is also an im…

    Submitted 3 February, 2021; originally announced February 2021.

  8. arXiv:2004.14754

    cs.CL cs.LG

    Self-Supervised and Controlled Multi-Document Opinion Summarization

    Authors: Hady Elsahar, Maximin Coavoux, Matthias Gallé, Jos Rozen

    Abstract: We address the problem of unsupervised abstractive summarization of collections of user generated reviews with self-supervision and control. We propose a self-supervised setup that considers an individual document as a target summary for a set of similar documents. This setting makes training simpler than previous approaches by relying only on standard log-likelihood loss. We address the problem o…

    Submitted 30 April, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: 18 pages including 5 pages appendix

  9. arXiv:1912.05372

    cs.CL cs.LG

    FlauBERT: Unsupervised Language Model Pre-training for French

    Authors: Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab

    Abstract: Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely…

    Submitted 12 March, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: Accepted to LREC 2020

  10. arXiv:1904.00615

    cs.CL

    Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle

    Authors: Maximin Coavoux, Shay B. Cohen

    Abstract: We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack --i.e. a data structure with linear-time sequential access-- the proposed system uses a set of parsing items, with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly $4n - 2$ transitions for a sentence of length…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted for publication at NAACL 2019; 14 pages

  11. arXiv:1902.08912

    cs.CL

    Unlexicalized Transition-based Discontinuous Constituency Parsing

    Authors: Maximin Coavoux, Benoît Crabbé, Shay B. Cohen

    Abstract: Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it to lexicalized parsing mo…

    Submitted 24 February, 2019; originally announced February 2019.

    Comments: To appear in Transactions of the Association for Computational Linguistics (TACL); 17 pages

  12. arXiv:1808.09408

    cs.CL

    Privacy-preserving Neural Representations of Text

    Authors: Maximin Coavoux, Shashi Narayan, Shay B. Cohen

    Abstract: This article deals with adversarial attacks towards deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such a scenario may arise in situations when the computation of a neural networ…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  13. arXiv:1701.02946

    cs.CL

    Cross-lingual RST Discourse Parsing

    Authors: Chloé Braud, Maximin Coavoux, Anders Søgaard

    Abstract: Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic th…

    Submitted 11 January, 2017; originally announced January 2017.

    Comments: To be published in EACL 2017, 13 pages