
Showing 1–7 of 7 results for author: Crabbé, B

  1. arXiv:2402.15343  [pdf, other]

    cs.CL cs.AI cs.LG

    NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data

    Authors: Sergei Bogdanov, Alexandre Constantin, Timothée Bernard, Benoit Crabbé, Etienne Bernard

    Abstract: Large Language Models (LLMs) have shown impressive abilities in data annotation, opening the way for new approaches to solve classic NLP problems. In this paper, we show how to use LLMs to create NuNER, a compact language representation model specialized in the Named Entity Recognition (NER) task. NuNER can be fine-tuned to solve downstream NER problems in a data-efficient way, outperforming simil…

    Submitted 23 February, 2024; originally announced February 2024.

  2. arXiv:2212.04523  [pdf, other]

    cs.CL

    Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement

    Authors: Bingzhi Li, Guillaume Wisniewski, Benoît Crabbé

    Abstract: The long-distance agreement, evidence for syntactic structure, is increasingly used to assess the syntactic generalization of Neural Language Models. Much work has shown that transformers are capable of high accuracy in varied agreement tasks, but the mechanisms by which the models accomplish this behavior are still not well understood. To better understand transformers' internal working, this wor…

    Submitted 4 January, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: To appear in TACL 2022 and EMNLP 2022

  3. arXiv:2202.13972  [pdf, other]

    cs.CL

    The impact of lexical and grammatical processing on generating code from natural language

    Authors: Nathanaël Beau, Benoît Crabbé

    Abstract: Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms. To study the impact of these components, we use a state-of-the-art architecture that relies on BERT encoder and a grammar-based decoder for which a formalization is provided…

    Submitted 16 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Article accepted to the Findings of Association for Computational Linguistics 2022

  4. arXiv:2109.10133  [pdf, ps, other]

    cs.CL

    Are Transformers a Modern Version of ELIZA? Observations on French Object Verb Agreement

    Authors: Bingzhi Li, Guillaume Wisniewski, Benoit Crabbé

    Abstract: Many recent works have demonstrated that unsupervised sentence representations of neural networks encode syntactic information by observing that neural language models are able to predict the agreement between a verb and its subject. We take a critical look at this line of research by showing that it is possible to achieve high accuracy on this agreement task with simple surface heuristics, indica…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Camera-ready for EMNLP'21

  5. arXiv:2101.02258  [pdf, other]

    cs.CL

    Can RNNs learn Recursive Nested Subject-Verb Agreements?

    Authors: Yair Lakretz, Théo Desbordes, Jean-Rémi King, Benoît Crabbé, Maxime Oquab, Stanislas Dehaene

    Abstract: One of the fundamental principles of contemporary linguistics states that language processing requires the ability to extract recursively nested tree structures. However, it remains unclear whether and how this code could be implemented in neural circuits. Recent advances in Recurrent Neural Networks (RNNs), which achieve near-human performance in some language tasks, provide a compelling model to…

    Submitted 6 January, 2021; originally announced January 2021.

  6. arXiv:1912.05372  [pdf, ps, other]

    cs.CL cs.LG

    FlauBERT: Unsupervised Language Model Pre-training for French

    Authors: Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab

    Abstract: Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely…

    Submitted 12 March, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: Accepted to LREC 2020

  7. arXiv:1902.08912  [pdf, other]

    cs.CL

    Unlexicalized Transition-based Discontinuous Constituency Parsing

    Authors: Maximin Coavoux, Benoît Crabbé, Shay B. Cohen

    Abstract: Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it to lexicalized parsing mo…

    Submitted 24 February, 2019; originally announced February 2019.

    Comments: To appear in Transactions of the Association for Computational Linguistics (TACL); 17 pages