Showing 1–7 of 7 results for author: Waseem, Z

  1. arXiv:2106.11410  [pdf, other]

    cs.CL

    A Survey of Race, Racism, and Anti-Racism in NLP

    Authors: Anjalie Field, Su Lin Blodgett, Zeerak Waseem, Yulia Tsvetkov

    Abstract: Despite inextricable ties between race and language, little work has considered race in NLP research and development. In this work, we survey 79 papers from the ACL anthology that mention race. These papers reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies. However, pe…

    Submitted 15 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL 2021

  2. arXiv:2104.14337  [pdf, other]

    cs.CL cs.AI

    Dynabench: Rethinking Benchmarking in NLP

    Authors: Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams

    Abstract: We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary model…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  3. arXiv:2101.11974  [pdf, ps, other]

    cs.AI cs.CL cs.CY

    Disembodied Machine Learning: On the Illusion of Objectivity in NLP

    Authors: Zeerak Waseem, Smarika Lulz, Joachim Bingel, Isabelle Augenstein

    Abstract: Machine Learning seeks to identify and encode bodies of knowledge within provided datasets. However, data encodes subjective content, which determines the possible outcomes of the models trained on it. Because such subjectivity enables marginalisation of parts of society, it is termed (social) 'bias' and sought to be removed. In this paper, we contextualise this discourse of bias in the ML communi…

    Submitted 28 January, 2021; originally announced January 2021.

    Comments: In review

  4. arXiv:2012.15761  [pdf, other]

    cs.CL cs.LG

    Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

    Authors: Bertie Vidgen, Tristan Thrush, Zeerak Waseem, Douwe Kiela

    Abstract: We present a human-and-model-in-the-loop process for dynamically generating datasets and training better performing and more robust hate detection models. We provide a new dataset of ~40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. It includes ~15,000 challenging perturbations and each hateful entry has fine-grained labels for the type and ta…

    Submitted 3 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  5. HateCheck: Functional Tests for Hate Speech Detection Models

    Authors: Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, Janet B. Pierrehumbert

    Abstract: Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increas…

    Submitted 27 May, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Accepted at ACL 2021 (Main Conference)

  6. arXiv:2005.03909  [pdf, other]

    cs.CL cs.CY cs.SI

    Detecting East Asian Prejudice on Social Media

    Authors: Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, Scott Hale

    Abstract: The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostility directed against East Asia. In this paper we report on the creation of a classifier that detects and categorizes social media posts from Twitter in…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: 12 pages

  7. arXiv:1705.09899  [pdf, ps, other]

    cs.CL

    Understanding Abuse: A Typology of Abusive Language Detection Subtasks

    Authors: Zeerak Waseem, Thomas Davidson, Dana Warmsley, Ingmar Weber

    Abstract: As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse we propose a typology that captures central similarities and differences between subtasks and we discuss its implications for data a…

    Submitted 30 May, 2017; v1 submitted 28 May, 2017; originally announced May 2017.

    Comments: To appear in the proceedings of the 1st Workshop on Abusive Language Online. Please cite that version.