Showing 1–9 of 9 results for author: Gourru, A

Searching in archive cs.
  1. arXiv:2407.13358

    cs.CL cs.LG

    Capturing Style in Author and Document Representation

    Authors: Enzo Terreau, Antoine Gourru, Julien Velcin

    Abstract: A wide range of Deep Natural Language Processing (NLP) models integrates continuous and low-dimensional representations of words and documents. Surprisingly, very few models study representation learning for authors. These representations can be used for many NLP tasks, such as author identification and classification, or in recommendation systems. A strong limitation of existing works is that the…

    Submitted 18 July, 2024; originally announced July 2024.

  2. QAnswer: Towards Question Answering Search over Websites

    Authors: Kunpeng Guo, Clement Defretiere, Dennis Diefenbach, Christophe Gravier, Antoine Gourru

    Abstract: Question Answering (QA) is increasingly used by search engines to provide results to their end-users, yet very few websites currently use QA technologies for their search functionality. To illustrate the potential of QA technologies for the website search practitioner, we demonstrate web searches that combine QA over knowledge graphs and QA over free text, two tasks that are usually tackled separately.…

    Submitted 17 January, 2024; originally announced January 2024.

  3. Fine-tuning Strategies for Domain Specific Question Answering under Low Annotation Budget Constraints

    Authors: Kunpeng Guo, Dennis Diefenbach, Antoine Gourru, Christophe Gravier

    Abstract: The progress introduced by pre-trained language models and their fine-tuning has resulted in significant improvements in most downstream NLP tasks. The unsupervised training of a language model combined with further target task fine-tuning has become the standard QA fine-tuning procedure. In this work, we demonstrate that this strategy is sub-optimal for fine-tuning QA models, especially under a l…

    Submitted 17 January, 2024; originally announced January 2024.

  4. Wikidata as a seed for Web Extraction

    Authors: Kunpeng Guo, Dennis Diefenbach, Antoine Gourru, Christophe Gravier

    Abstract: Wikidata has grown into a knowledge graph of impressive size. To date, it contains more than 17 billion triples collecting information about people, places, films, stars, publications, proteins, and many more. On the other hand, most of the information on the Web is not published in highly structured data repositories like Wikidata, but rather as unstructured and semi-structured content, more c…

    Submitted 15 January, 2024; originally announced January 2024.

  5. An investigation of structures responsible for gender bias in BERT and DistilBERT

    Authors: Thibaud Leteno, Antoine Gourru, Charlotte Laclau, Christophe Gravier

    Abstract: In recent years, large Transformer-based Pre-trained Language Models (PLM) have changed the Natural Language Processing (NLP) landscape by pushing the performance boundaries of the state-of-the-art on a wide variety of tasks. However, this performance gain comes with an increase in complexity, and as a result, the size of such models (up to billions of parameters) represents a constraint for…

    Submitted 12 January, 2024; originally announced January 2024.

    Journal ref: 21st International Symposium on Intelligent Data Analysis, IDA 2023

  6. arXiv:2311.12689

    cs.CL cs.CY cs.LG

    Fair Text Classification with Wasserstein Independence

    Authors: Thibaud Leteno, Antoine Gourru, Charlotte Laclau, Rémi Emonet, Christophe Gravier

    Abstract: Group fairness is a central research topic in text classification, where reaching fair treatment between sensitive groups (e.g., women vs. men) remains an open challenge. This paper presents a novel method for mitigating biases in neural text classification that is agnostic to the model architecture. Given the difficulty of distinguishing fair from unfair information in a text encoder, we take inspirat…

    Submitted 21 November, 2023; originally announced November 2023.

  7. arXiv:2209.09670

    cs.AI cs.LG

    Explainable Clustering via Exemplars: Complexity and Efficient Approximation Algorithms

    Authors: Ian Davidson, Michael Livanos, Antoine Gourru, Peter Walker, Julien Velcin, S. S. Ravi

    Abstract: Explainable AI (XAI) is an important and developing area but remains relatively understudied for clustering. We propose an explainable-by-design clustering approach that finds not only clusters but also exemplars to explain each cluster. The use of exemplars for understanding is supported by the exemplar-based school of concept definition in psychology. We show that finding a small set of exemplars to…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: 22 pages; 4 figures

  8. arXiv:2004.03621

    cs.IR cs.SI

    New Datasets and a Benchmark of Document Network Embedding Methods for Scientific Expert Finding

    Authors: Robin Brochier, Antoine Gourru, Adrien Guille, Julien Velcin

    Abstract: The scientific literature is growing faster than ever. Finding an expert in a particular scientific domain has never been harder than it is today, because of the increasing number of publications and the ever-growing diversity of fields of expertise. To tackle this challenge, automatic expert finding algorithms rely on the vast heterogeneous scientific network to match textual queries with potenti…

    Submitted 7 April, 2020; originally announced April 2020.

  9. arXiv:2001.05727

    cs.IR cs.CL

    Document Network Projection in Pretrained Word Embedding Space

    Authors: Antoine Gourru, Adrien Guille, Julien Velcin, Julien Jacques

    Abstract: We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g., a citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector avera…

    Submitted 16 January, 2020; originally announced January 2020.