Neural machine translation (NMT) systems have achieved outstanding performance and have been widely deployed in the real world. However, the undertranslation problem caused by the distribution of high-translation-entropy words in source sentences still exists and can be aggravated by poisoning attacks. In this paper, we propose a new backdoor attack on NMT models that poisons only a small fraction of the parallel training data. Our attack increases the translation entropy of words once a backdoor trigger is injected, making them more easily discarded by the NMT model; as a result, the final output is only part of the target translation, and the position of the injected trigger determines the scope of the truncation. We also propose a defense method, Backdoor Defense by Semantic Representation Change (BDSRC), against our attack. Specifically, we select backdoor candidates based on the similarity between the semantic representation of each word in a sentence and the overall sentence representation, and then identify the injected backdoor by computing the semantic deviation caused by each candidate. Experiments show that our attack strategy achieves a nearly 100% attack success rate while leaving the main translation task almost unaffected, with a performance degradation of less than 1 BLEU point. Nevertheless, our defense method can effectively identify backdoor triggers and alleviate this performance degradation.
Keywords: backdoor attack; data poisoning; neural machine translation.