Textual Manifold-based Defense Against Natural Language Adversarial Examples

Nguyen, Dang Minh; Tuan, Luu Anh

arXiv:2211.02878 (cs)

[Submitted on 5 Nov 2022]

Abstract: Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, making it significantly harder for current models to classify them correctly. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find that a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models: the embeddings of adversarial texts tend to diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. In extensive experiments, our method consistently and significantly outperforms previous defenses under various attack settings without trading off clean accuracy. To the best of our knowledge, this is the first NLP defense that leverages the manifold structure against adversarial attacks. Our code is publicly available.
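
The project-then-classify idea can be illustrated with a toy sketch. The abstract does not say how TMD approximates the embedding manifold, so the example below swaps in a plain linear stand-in (a PCA subspace fitted to synthetic "natural" embeddings) purely for illustration. All names (`fit_linear_manifold`, `project`, `off_manifold_distance`) and the random data are assumptions, not the authors' implementation.

```python
import numpy as np

def fit_linear_manifold(embeddings, k):
    """Fit a k-dimensional affine subspace (PCA) to natural embeddings.

    Assumption: a linear subspace stands in for the approximated manifold.
    """
    mean = embeddings.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    """Orthogonally project embedding(s) x onto the fitted subspace."""
    return mean + (x - mean) @ basis.T @ basis

def off_manifold_distance(x, mean, basis):
    """Distance between x and its projection onto the subspace."""
    return np.linalg.norm(x - project(x, mean, basis), axis=-1)

rng = np.random.default_rng(0)

# Synthetic "natural" embeddings: 200 points close to a 2-D plane in 8-D.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
natural = latent @ mixing + 0.01 * rng.normal(size=(200, 8))

mean, basis = fit_linear_manifold(natural, k=2)

# Synthetic "adversarial" embedding: a natural point pushed off the plane.
perturbed = natural[0] + 0.5 * rng.normal(size=8)

# Defense step: project back onto the subspace before any classifier sees it.
projected = project(perturbed, mean, basis)
```

In this toy setting, `off_manifold_distance` plays the role of the divergence the abstract describes, and a downstream classifier would receive `projected` instead of `perturbed`. A real embedding manifold is unlikely to be linear, so this shows only the shape of the pipeline.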

Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2211.02878 [cs.CL]
	(or arXiv:2211.02878v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2211.02878

Submission history

From: Dang Nguyen
[v1] Sat, 5 Nov 2022 11:19:47 UTC (171 KB)
