Showing 1–5 of 5 results for author: Lupo, L

Searching in archive cs.
  1. arXiv:2403.05700

    cs.CL

    DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods

    Authors: Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz

    Abstract: Social scientists increasingly use demographically stratified social media data to study the attitudes, beliefs, and behavior of the general public. To facilitate such analyses, we construct, validate, and release publicly the representative DADIT dataset of 30M tweets of 20k Italian Twitter users, along with their bios and profile pictures. We enrich the user data with high-quality labels for gen…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  2. arXiv:2311.11844

    cs.CL cs.CY

    Towards Human-Level Text Coding with LLMs: The Case of Fatherhood Roles in Public Policy Documents

    Authors: Lorenzo Lupo, Oscar Magnusson, Dirk Hovy, Elin Naurin, Lena Wängnerud

    Abstract: Recent advances in large language models (LLMs) like GPT-3.5 and GPT-4 promise automation with better results and less programming, opening up new opportunities for text analysis in political science. In this study, we evaluate LLMs on three original coding tasks involving typical complexities encountered in political science settings: a non-English language, legal and political jargon, and comple…

    Submitted 28 August, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    ACM Class: J.4; I.2

  3. arXiv:2302.06459

    cs.CL

    Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation

    Authors: Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

    Abstract: Context-aware translation can be achieved by processing a concatenation of consecutive sentences with the standard Transformer architecture. This paper investigates the intuitive idea of providing the model with explicit information about the position of the sentences contained in the concatenation window. We compare various methods to encode sentence positions into token representations, includin…

    Submitted 4 April, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Insights2023 camera-ready

  4. arXiv:2210.13388

    cs.CL

    Focused Concatenation for Context-Aware Neural Machine Translation

    Authors: Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

    Abstract: A straightforward approach to context-aware neural machine translation consists in feeding the standard encoder-decoder architecture with a window of consecutive sentences, formed by the current sentence and a number of sentences from its context concatenated to it. In this work, we propose an improved concatenation approach that encourages the model to focus on the translation of the current sent…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: WMT 2022 (camera ready)

  5. Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models

    Authors: Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

    Abstract: Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is undertaken by contextual parameters, trained on document-level data. In this work, we discuss the difficulty of training these parameters effectively, due to the…

    Submitted 15 March, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: ACL 2022 (camera ready)