-
GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages
Authors:
Spencer Rarrick,
Ranjita Naik,
Sundar Poudel,
Vishal Chowdhary
Abstract:
Neural Machine Translation (NMT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies on gender bias in translations into English from weakly gendered-languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an…
▽ More
Neural Machine Translation (NMT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies on gender bias in translations into English from weakly gendered-languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present a translation gender rewriting solution built with GPT-4 and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Reducing Gender Bias in Machine Translation through Counterfactual Data Generation
Authors:
Ranjita Naik,
Spencer Rarrick,
Vishal Chowdhary
Abstract:
Recent advances in neural methods have led to substantial improvement in the quality of Neural Machine Translation (NMT) systems. However, these systems frequently produce translations with inaccurate gender (Stanovsky et al., 2019), which can be traced to bias in training data. Saunders and Byrne (2020) tackle this problem with a handcrafted dataset containing balanced gendered profession words.…
▽ More
Recent advances in neural methods have led to substantial improvement in the quality of Neural Machine Translation (NMT) systems. However, these systems frequently produce translations with inaccurate gender (Stanovsky et al., 2019), which can be traced to bias in training data. Saunders and Byrne (2020) tackle this problem with a handcrafted dataset containing balanced gendered profession words. By using this data to fine-tune an existing NMT model, they show that gender bias can be significantly mitigated, albeit at the expense of translation quality due to catastrophic forgetting. They recover some of the lost quality with modified training objectives or additional models at inference. We find, however, that simply supplementing the handcrafted dataset with a random sample from the base model training corpus is enough to significantly reduce the catastrophic forgetting. We also propose a novel domain-adaptation technique that leverages in-domain data created with the counterfactual data generation techniques proposed by Zmigrod et al. (2019) to further improve accuracy on the WinoMT challenge test set without significant loss in translation quality. We show its effectiveness in NMT systems from English into three morphologically rich languages French, Spanish, and Italian. The relevant dataset and code will be available at Github.
△ Less
Submitted 27 November, 2023;
originally announced November 2023.
-
Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English
Authors:
Spencer Rarrick,
Ranjita Naik,
Sundar Poudel,
Vishal Chowdhary
Abstract:
Machine Translation (MT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies into gender bias in translations from gender-neutral languages such as Turkish into more strongly gendered languages like English, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies.…
▽ More
Machine Translation (MT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies into gender bias in translations from gender-neutral languages such as Turkish into more strongly gendered languages like English, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants for each possible gender interpretation. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.
△ Less
Submitted 12 December, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
GATE: A Challenge Set for Gender-Ambiguous Translation Examples
Authors:
Spencer Rarrick,
Ranjita Naik,
Varun Mathur,
Sundar Poudel,
Vishal Chowdhary
Abstract:
Although recent years have brought significant progress in improving translation of unambiguously gendered sentences, translation of ambiguously gendered input remains relatively unexplored. When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generat…
▽ More
Although recent years have brought significant progress in improving translation of unambiguously gendered sentences, translation of ambiguously gendered input remains relatively unexplored. When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage. To encourage better performance on this task we present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations. We also provide tools for evaluation and system analysis when using GATE and use them to evaluate our translation rewriter system.
△ Less
Submitted 7 March, 2023;
originally announced March 2023.
-
Achieving Human Parity on Automatic Chinese to English News Translation
Authors:
Hany Hassan,
Anthony Aue,
Chang Chen,
Vishal Chowdhary,
Jonathan Clark,
Christian Federmann,
Xuedong Huang,
Marcin Junczys-Dowmunt,
William Lewis,
Mu Li,
Shujie Liu,
Tie-Yan Liu,
Renqian Luo,
Arul Menezes,
Tao Qin,
Frank Seide,
Xu Tan,
Fei Tian,
Lijun Wu,
Shuangzhi Wu,
Yingce Xia,
Dongdong Zhang,
Zhirui Zhang,
Ming Zhou
Abstract:
Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human…
▽ More
Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft's machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.
△ Less
Submitted 29 June, 2018; v1 submitted 14 March, 2018;
originally announced March 2018.