Showing 1–4 of 4 results for author: Tonneau, M

Searching in archive cs.
  1. arXiv:2404.17874

    cs.CL

    From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets

    Authors: Manuel Tonneau, Diyi Liu, Samuel Fraiberger, Ralph Schroeder, Scott A. Hale, Paul Röttger

    Abstract: Perceptions of hate can vary greatly across cultural contexts. Hate speech (HS) datasets, however, have traditionally been developed by language. This hides potential cultural biases, as one language may be spoken in different countries home to different cultures. In this work, we evaluate cultural bias in HS datasets by leveraging two interrelated cultural proxies: language and geography. We cond…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted at WOAH (NAACL 2024)

  2. arXiv:2403.19260

    cs.CL

    NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data

    Authors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel P. Fraiberger

    Abstract: To address the global issue of online hate, hate speech detection (HSD) systems are typically developed on datasets from the United States, thereby failing to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on non-representative samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce Nai…

    Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: ACL 2024 main conference. Data and models available at https://github.com/worldbank/NaijaHate

  3. Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

    Authors: Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

    Abstract: Large Language Models (LLMs), now used daily by millions, can encode societal biases, exposing their users to representational harms. A large body of scholarship on LLM bias exists but it predominantly adopts a Western-centric frame and attends comparatively less to bias levels and potential harms in the Global South. In this paper, we quantify stereotypical bias in popular LLMs according to an In…

    Submitted 9 August, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: To be published in GoodIT '24, doi:10.1145/3677525.3678666. 14 pages

  4. Multilingual Detection of Personal Employment Status on Twitter

    Authors: Manuel Tonneau, Dhaval Adjodah, João Palotti, Nir Grinberg, Samuel Fraiberger

    Abstract: Detecting disclosures of individuals' employment status on social media can provide valuable information to match job seekers with suitable vacancies, offer social protection, or measure labor market flows. However, identifying such personal disclosures is a challenging task due to their rarity in a sea of social media content and the variety of linguistic forms used to describe them. Here, we exa…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference. Data and models available at https://github.com/manueltonneau/twitter-unemployment