License: arXiv.org perpetual non-exclusive license
arXiv:2308.00319v2 [cs.CL] 10 Jan 2024

LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack

Hai Zhu1 3, Qingyang Zhao2, Weiwei Shang1, Yuren Wu3, Kai Liu4 Corresponding author.
Abstract

Natural language processing models are vulnerable to adversarial examples. Previous textual adversarial attacks adopt model internal information (gradients or confidence scores) to generate adversarial examples. However, this information is unavailable in the real world. Therefore, we focus on a more realistic and challenging setting, named hard-label attack, in which the attacker can only query the model and obtain a discrete prediction label. Existing hard-label attack algorithms tend to initialize adversarial examples by random substitution and then utilize complex heuristic algorithms to optimize the adversarial perturbation. These methods require many model queries, and their attack success rate is restricted by the adversary initialization. In this paper, we propose a novel hard-label attack algorithm named LimeAttack, which leverages a local explainable method to approximate word importance ranking, and then adopts beam search to find the optimal solution. Extensive experiments show that LimeAttack achieves better attack performance than existing hard-label attacks under the same query budget. In addition, we evaluate the effectiveness of LimeAttack on large language models and some defense methods, and the results indicate that adversarial examples remain a significant threat to large language models. The adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.

Introduction

Deep Neural Networks (DNNs) are widely applied in the natural language processing field and have achieved great success (Kim 2014; Devlin et al. 2019; Minaee et al. 2021; Hochreiter and Schmidhuber 1997). However, DNNs are vulnerable to adversarial examples, which are correctly classified samples altered by some slight perturbations (Jin et al. 2020; Papernot et al. 2017; Kurakin, Goodfellow, and Bengio 2016). These adversarial perturbations are imperceptible to humans but can mislead the model. Adversarial examples seriously threaten the robustness and reliability of DNNs, especially in some security-critical applications (e.g., autonomous driving and toxic text detection (Yang et al. 2021; Kurakin, Goodfellow, and Bengio 2018)). Therefore, adversarial attacks and defenses have attracted enormous attention in computer vision, natural language processing and speech (Szegedy et al. 2013; Carlini and Wagner 2018; Yu et al. 2022). Crafting textual adversarial examples is more challenging because language is discrete and the examples must satisfy lexical, semantic, and fluency constraints.

According to different scenarios, textual adversarial attacks can be briefly divided into white-box attacks, score-based attacks and hard-label attacks. In a white-box setting, the attacker utilizes the model’s parameters and gradients to generate adversarial examples (Goodman, Zhonghou et al. 2020; Jiang et al. 2020). Score-based attacks only adopt class probabilities or confidence scores to craft adversarial examples (Jin et al. 2020; Li et al. 2020; Ma, Shi, and Guan 2020; Zhu, Zhao, and Wu 2023). However, these attack methods perform poorly in reality because DNNs are deployed through application programming interfaces (APIs), and the attacker has no access to the model’s parameters, gradients or probability distributions of all labels (Ye et al. 2022b). In contrast, under a hard-label scenario, the model’s internal structures, gradients, training data and even confidence scores are unavailable. The attacker can only query the black-box victim model and get a discrete prediction label, which is more challenging and realistic. Additionally, most realistic models (e.g., HuggingFace API, OpenAI API) usually have a limit on the number of calls. In practice, therefore, an adversarial attack operates in a hard-label setting with a tiny query budget.

Some hard-label attack algorithms have been proposed (Yu et al. 2022; Ye et al. 2022b; Maheshwary, Maheshwary, and Pudi 2021; Ye et al. 2022a). They follow a two-stage strategy: i) generate low-quality adversarial examples by randomly replacing several original words with synonyms, and then ii) adopt complex heuristic algorithms (e.g., genetic algorithm) to optimize the adversarial perturbation. Therefore, these attack methods usually require a lot of queries, and the attack success rate and quality of adversarial examples are limited by the adversary initialization. In contrast, score-based attacks calculate the word importance based on the change in confidence scores after deleting one word. Word importance ranking improves attack efficiency by preferring to attack words that have a significant impact on the model’s predictions (Jin et al. 2020). However, score-based attacks cannot calculate the word importance in a hard-label setting because deleting one token hardly changes the discrete prediction label. Therefore, we investigate the following question: how can word importance ranking be calculated in a hard-label setting to improve attack efficiency?

Word importance ranking can reveal the decision boundary and thus determine a better attack path, but existing hard-label algorithms ignore this useful information because it is hard to obtain. We are inspired by local explainable methods (Ribeiro, Singh, and Guestrin 2016; Lundberg and Lee 2017; Shrikumar et al. 2016) for DNNs, which are often used to explain the outputs of black-box models, and we use them to estimate the token sensitivity on the benign sample. A previous study (Chai et al. 2023) simply replaced the deletion-based method with a local explainable method to calculate word importance in score-based attacks. However, our experiments in Appendix B show that a local explainable method does not have a significant advantage over the deletion-based method in a score-based scenario. Because the probability distribution of the model’s output is available, the influence of each word on the output is already well reflected by the deletion-based method. Therefore, we believe a local explainable method offers a greater advantage in hard-label attacks, where the deletion-based method is useless, than in score-based attacks. We adopt the most fundamental and straightforward local explainable method, namely LIME. LIME is easy to understand and is closest in spirit to the deletion-based method proposed in score-based attacks, since our goal is to bridge the gap between score-based attacks and hard-label attacks by introducing an interpretability method. In fact, local explainable methods are model-agnostic and suitable for conducting word importance estimation for hard-label attacks. However, applying LIME to hard-label attacks raises the following difficulties: 1) how to allocate queries between LIME and the search under a tiny query budget to achieve optimal results; 2) how to establish a mapping between LIME and word importance in adversarial samples without the model’s logits output; 3) how to sample reasonably during perturbation execution to achieve optimal results.
In the following sections we explain in detail how to solve these difficulties.

In this work, we propose a novel hard-label attack algorithm named LimeAttack. The application of LIME in hard-label attacks was inspired by the deletion method used in score-based attacks. We verify the effectiveness of the inside-to-outside attack path in hard-label attacks, which suggests that many excellent score-based attacks may offer further insight for hard-label attacks. To evaluate the attack performance and efficiency, we compare LimeAttack with other hard-label attacks and take several score-based attacks as references for two NLP tasks on seven common datasets. We also evaluate LimeAttack on current state-of-the-art large language models (e.g., ChatGPT). Experiments show that LimeAttack achieves the highest attack success rate among the baselines under a tiny query budget. Our contributions are summarized as follows:

  • We summarize the shortcomings of the existing hard-label attacks, apply LIME to connect score-based attacks and hard-label attacks, and verify the effectiveness of the inside-to-outside attack path in hard-label attacks.

  • Extensive experiments show that LimeAttack achieves a higher attack success rate than existing hard-label attack algorithms under a tiny query budget. Meanwhile, adversarial examples crafted by LimeAttack are of high quality and difficult for humans to distinguish. (Code is available at https://github.com/zhuhai-ustc/limeattack)

  • In addition, we conduct attacks and evaluations on current state-of-the-art large language models. Results indicate that adversarial examples remain a significant threat to large language models. We also report attack performance against defense methods and convergence results for the attack success rate and perturbation rate.

Related Work

Hard-Label Adversarial Attacks

In a hard-label setting, the attacker can only query the victim model and get a discrete prediction label. Therefore, the hard-label setting is more practical and challenging. Existing hard-label attacks follow a two-stage strategy, i.e., adversary initialization and perturbation optimization. HLBB (Maheshwary, Maheshwary, and Pudi 2021) initializes an adversarial example and adopts a genetic algorithm to optimize the perturbation. TextHoaxer (Ye et al. 2022b) and LeapAttack (Ye et al. 2022a) utilize semantic similarity and perturbation rate as the optimization objective to search for a better perturbation matrix in the continuous word embedding space. TextHacker (Yu et al. 2022) adopts a hybrid local search algorithm and a word importance table learned from attack history to guide the local search. These attack methods often require a lot of queries to reduce the perturbation rate, and the attack success rate and quality of the adversarial examples are limited by the initialization. Therefore, in this work, we attempt to craft an adversarial example directly from the benign sample. This approach can generate high-quality adversarial examples with fewer queries.

Local Explainable Methods

To improve DNN interpretability and aid decision-making, various methods for explaining DNNs have been proposed and broadly categorized as global or local explainable methods. Global explainable methods focus on the model itself by using the overall knowledge about the model’s architecture and parameters. In contrast, local methods fit a simple and interpretable model (e.g., decision tree) to a single input to measure the contribution of each token. In detail, local explainable methods (Lundberg and Lee 2017; Shrikumar et al. 2016; Štrumbelj and Kononenko 2014) associate all input tokens by defining a linear interpretability model and assume that the contribution of each token in the input is additive. This is also called the additive feature attribution method. In this paper, local interpretable model-agnostic explanation (LIME) (Ribeiro, Singh, and Guestrin 2016), a fundamental and representative local explainable method, is applied to calculate word importance. The intuition of LIME is to generate many neighborhood samples by deleting some original words in the benign example. These samples are then used to train a linear model whose number of features equals the number of words in the benign sample. The parameters of this linear model approximate the importance of each word. As LIME is model-agnostic, it is suitable for hard-label attacks.

Limitation of Existing Hard-Label Attack

To compare LimeAttack with existing hard-label attack algorithms intuitively, we visualize their attack search paths in Figure 1. LimeAttack’s search paths are represented by green lines, and they move from inside to outside. LimeAttack utilizes a local explainable method to learn word importance ranking and generates adversarial examples iteratively from benign samples. This helps LimeAttack to find the direction of the nearest decision boundary, and it costs fewer model queries because keywords are attacked first. In contrast, previous hard-label attack algorithms’ search paths are represented by blue lines, and they move from outside to inside. These algorithms typically begin with a randomly initialized adversarial example and optimize the perturbation by maximizing semantic similarity between the initialized example and the benign sample, which requires a lot of model queries to achieve a low perturbation rate. Furthermore, their attack success rate and adversary quality are also limited by the adversary initialization.

Figure 1: Search paths of existing hard-label attacks and LimeAttack.

Methodology

Problem Formulation

Given a sentence of $n$ words $\bm{X} = [x_1, x_2, \cdots, x_n]$ and its ground truth label $Y$, an adversarial example $\bm{X}' = [x_1', x_2', \cdots, x_n']$ is crafted by replacing one or more original words with synonyms to mislead the victim model $\mathcal{F}$, i.e.,

$$\mathcal{F}(\bm{X}') \neq \mathcal{F}(\bm{X}), \quad \mathrm{s.t.} \quad D(\bm{X}, \bm{X}') < \epsilon \tag{1}$$

$D(\cdot,\cdot)$ is an edit distance that measures the modifications between the benign sample $\bm{X}$ and the adversarial example $\bm{X}'$:

$$D(\bm{X}, \bm{X}') = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}(x_i, x'_i) \tag{2}$$

$\mathbb{E}(\cdot,\cdot)$ is a binary variable that equals 0 if $x_i = x'_i$ and 1 otherwise. A high-quality adversarial example should be similar to the benign sample, and human readers should hardly be able to distinguish the difference. LimeAttack is a hard-label attack: it does not use the model’s parameters, gradients or confidence scores. The attacker can only query the victim model to obtain a predicted label $\hat{\bm{Y}} = \mathcal{F}(\hat{\bm{X}})$.
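As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the perturbation rate and checks the hard-label success condition. The names `perturbation_rate`, `is_adversarial` and `toy_victim` are hypothetical and not taken from the released code; the toy classifier is only a stand-in for a real victim model that returns a discrete label.

```python
from typing import Callable, Sequence

Label = int
Victim = Callable[[Sequence[str]], Label]  # hard-label oracle: words -> label only


def perturbation_rate(original: Sequence[str], adversarial: Sequence[str]) -> float:
    """Eq. (2): fraction of positions whose word was replaced."""
    if len(original) != len(adversarial) or not original:
        raise ValueError("expected two non-empty sentences of equal length")
    return sum(x != x_adv for x, x_adv in zip(original, adversarial)) / len(original)


def is_adversarial(victim: Victim, original: Sequence[str],
                   adversarial: Sequence[str], epsilon: float) -> bool:
    """Eq. (1): the predicted label changes and D(X, X') stays below epsilon."""
    label_changed = victim(adversarial) != victim(original)
    return label_changed and perturbation_rate(original, adversarial) < epsilon


def toy_victim(words: Sequence[str]) -> Label:
    """Hypothetical stand-in classifier: positive (1) iff 'great' is present."""
    return int("great" in words)


benign = "the movie was great fun".split()
attacked = "the movie was fine fun".split()
print(perturbation_rate(benign, attacked))  # one of five words was replaced
print(is_adversarial(toy_victim, benign, attacked, epsilon=0.25))
```

Because synonym substitution keeps the sentence length fixed, the sketch rejects sentence pairs of different lengths instead of computing a general edit distance.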

The Proposed LimeAttack Algorithm

The overall flow chart is shown in Figure 2. LimeAttack follows two steps, i.e., word importance ranking and perturbation execution.

Figure 2: Overview of LimeAttack. It consists of two modules, i.e., word importance ranking and perturbation execution. We first generate some neighborhood examples by masking some words in the benign sample, and then adopt a linear model to approximate the importance of each word $x_i$. Then, we select candidate sets in the counter-fitted embedding space for each word. Finally, we adopt beam search (beam size $b=2$ in the figure) to generate adversarial examples iteratively.

Word Importance Ranking.

Given a sentence of $n$ words $\bm{X}$, we assume that the contribution of all words is additive, and their sum is positively related to the model’s prediction. As shown in Figure 2, we generate neighborhood samples $\mathcal{X} = [\bm{X}'_1, \bm{X}'_2, \cdots, \bm{X}'_n]$ from a benign example $\bm{X}$ by randomly replacing some words with ’[MASK]’. Sentences with more words usually require more neighborhood samples to approximate the word importance. Therefore, we keep the number of neighborhood samples equal to the number of tokens. We then feed $\mathcal{X}$ to the victim model $\mathcal{F}$ to obtain discrete prediction labels $\hat{\mathcal{Y}} = [\hat{\bm{Y}}'_1, \hat{\bm{Y}}'_2, \cdots, \hat{\bm{Y}}'_n]$. Subsequently, we fit a linear interpretability model to classify these neighborhood samples:

$$g(\bm{X}, \bm{\theta}) = \theta_0 + \sum_{i=1}^{n} \theta_i \mathbb{I}(x_i, \bm{X}) \tag{3}$$

where $\bm{\theta}$ is the parameter vector of the linear model and $\mathbb{I}(\cdot,\cdot)$ is a binary variable that equals 1 if word $x_i$ is in $\bm{X}$ and 0 otherwise. Therefore, the parameter $\theta_i, i \in [1, n]$ reflects the change without word $x_i$ and approximates the word importance. In Appendix O, we verify through experiments that a linear model (such as LIME) has the same effect as some advanced interpretation methods (such as SHAP) or non-linear models (such as a decision tree) under tiny query budgets. SHAP and non-linear models also have a higher computational complexity. The advantages of advanced interpretation methods or non-linear models only appear when a large number of neighborhood samples and queries are available.

In detail, we transform each neighborhood sample $\bm{X}'_i$ into a binary vector $\bm{V}'_i$. If an original word is removed in $\bm{X}'_i$, the corresponding dimension of $\bm{V}'_i$ is 0; otherwise it is 1. Therefore, $\bm{V}'_i$ has the same length as $\bm{X}'_i$, which is the length of the benign example. The benign example $\bm{X}$ is also transformed to $\bm{V}$. Because neighborhood samples are not necessarily linearly separable, LIME adopts a Gaussian kernel to weight the loss of each sample so that the fit concentrates on the points closest to the original sample, which helps with linear fitting. We assign a weight $\pi(\bm{V}'_i, \bm{V})$ to each neighborhood sample according to its distance from the benign sample (Ribeiro, Singh, and Guestrin 2016):

$$\pi(\bm{V}'_i, \bm{V}) = \exp\left(-d(\bm{V}'_i, \bm{V})^2 / \sigma^2\right) \tag{4}$$

where $d(\cdot,\cdot)$ is a distance function. We adopt the cosine similarity as the distance metric:

$$d(\bm{V}'_i, \bm{V}) = \frac{\bm{V}'_i \cdot \bm{V}}{\sqrt{\lvert \bm{V}'_i \rvert \lvert \bm{V} \rvert}} \tag{5}$$
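The kernel weighting of Eqs. (4) and (5) can be sketched as follows. This is a minimal illustration that follows the equations as written, assuming that $\lvert \bm{V} \rvert$ denotes the number of ones in a binary vector (so that the denominator equals the product of the Euclidean norms) and that an all-masked sample is assigned $d = 0$. The function names are hypothetical. If a distance such as one minus the similarity is intended instead, only the line that computes `d` would change.

```python
import numpy as np


def cosine_similarity(v_prime, v) -> float:
    """Eq. (5) for binary vectors, reading |V| as the number of ones in V."""
    v_prime = np.asarray(v_prime, dtype=float)
    v = np.asarray(v, dtype=float)
    denominator = np.sqrt(v_prime.sum() * v.sum())
    if denominator == 0.0:  # assumption: an all-masked sample gets d = 0
        return 0.0
    return float(v_prime @ v / denominator)


def kernel_weight(v_prime, v, sigma: float = 25.0) -> float:
    """Eq. (4): pi = exp(-d^2 / sigma^2)."""
    d = cosine_similarity(v_prime, v)
    return float(np.exp(-(d ** 2) / sigma ** 2))


benign_vector = [1, 1, 1, 1]    # all four words kept
neighbor_vector = [1, 0, 1, 1]  # second word replaced by [MASK]
print(cosine_similarity(neighbor_vector, benign_vector))
print(kernel_weight(neighbor_vector, benign_vector))
```

With $d$ bounded by 1 and a kernel width of 25, all weights in this reading lie very close to 1, so the weighting has only a mild effect on the fit.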

Finally, we calculate the optimal parameters $\bm{\theta}^*$:

$$\bm{\theta}^* = \underset{\bm{\theta}}{\arg\min} \sum_{i=1}^{n} \pi(\bm{V}'_i, \bm{V}) \left\{\hat{\bm{Y}}'_i - g(\bm{X}'_i)\right\}^2 + \Omega(\bm{\theta}) \tag{6}$$

where $\Omega(\bm{\theta})$ is the number of non-zero parameters, which measures the complexity of the linear model. After optimizing $\bm{\theta}$, the importance of each word $x_i$ is equal to $\theta_i$. LIME can be seen as an approximation of the model’s decision boundary around the original sample. The parameters can be interpreted as the margin: the larger the margin, the greater the importance of the word in approximating the decision boundary. We first filter out stop words using NLTK (https://www.nltk.org/) and then calculate the importance of each remaining word. To ensure that LimeAttack generates high-quality adversarial examples rather than just negative examples, we only adopt a synonym replacement strategy and construct the synonym candidate set $\mathcal{C}(x_i)$ for each word $x_i$ by selecting the top $k$ nearest synonyms in the counter-fitted embedding space (Mrkšić et al. 2016). Additionally, we present the results of human evaluation and more qualitative adversarial examples in Appendix I.
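The word importance ranking step can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the released implementation: the names `lime_word_importance` and `toy_victim` are hypothetical, the regression target is encoded as 1 when a neighborhood sample keeps the original label and 0 otherwise, an L2 (ridge) penalty is swapped in for the non-zero-count penalty $\Omega(\bm{\theta})$ so that the fit has a closed form, and stop-word filtering is omitted.

```python
import numpy as np


def lime_word_importance(words, victim, num_samples=None, sigma=25.0,
                         ridge=1.0, seed=0):
    """Estimate one importance score per word from hard labels only."""
    rng = np.random.default_rng(seed)
    n = len(words)
    # The paper uses as many neighborhood samples as there are tokens.
    num_samples = n if num_samples is None else num_samples
    original_label = victim(words)

    # Binary vectors V'_i: 1 keeps the word, 0 replaces it with [MASK].
    masks = rng.integers(0, 2, size=(num_samples, n))
    # Hard-label targets (assumption): 1.0 if the neighbor keeps the original label.
    targets = np.array([
        float(victim([w if keep else "[MASK]" for w, keep in zip(words, row)])
              == original_label)
        for row in masks
    ])

    # Kernel weights, Eqs. (4)-(5), with V the all-ones vector:
    # V'.V / sqrt(|V'| |V|) = kept / sqrt(kept * n) = sqrt(kept / n).
    kept = masks.sum(axis=1).astype(float)
    d = np.sqrt(kept / n)
    pi = np.exp(-(d ** 2) / sigma ** 2)

    # Weighted ridge regression with an unpenalized intercept theta_0.
    design = np.hstack([np.ones((num_samples, 1)), masks])
    penalty = ridge * np.eye(n + 1)
    penalty[0, 0] = 0.0
    weighted = design * pi[:, None]
    theta = np.linalg.solve(design.T @ weighted + penalty, weighted.T @ targets)
    return theta[1:]  # theta_1..theta_n, one score per word


def toy_victim(words):
    """Hypothetical stand-in classifier: positive (1) iff 'great' is present."""
    return int("great" in words)


sentence = "the movie was great fun tonight".split()
scores = lime_word_importance(sentence, toy_victim, num_samples=200)
ranking = np.argsort(-scores)  # indices of words, most important first
print([sentence[i] for i in ranking])
```

The usage example draws 200 neighborhood samples only to make the toy ranking stable; under the query budget discussed in this paper, `num_samples` would default to the sentence length.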

Perturbation Execution.

Adversarial example generation is a combinatorial optimization problem. A score-based attack iterates by selecting, at each step, the token that causes the greatest change in the model’s logits. However, no such information is available in a hard-label attack. Therefore, we can only rely on the similarity between the adversarial sample and the original sample for iteration. The problem is that the similarity and the attack success rate are not completely linearly correlated. As shown in Table 7, greedily selecting the adversarial sample with the lowest similarity each time cannot ensure that the final attack success rate is optimal. We therefore want each round of sampling to be spread evenly over the similarity ranking to balance attack success rate and semantic similarity. For each original word $x_i$, we replace it with $c \in \mathcal{C}(x_i)$ to generate an adversarial example $\bm{X}' = [x_1, \cdots, x_{i-1}, c, x_{i+1}, \cdots, x_n]$, and then calculate the semantic similarity between the benign sample $\bm{X}$ and the adversarial example $\bm{X}'$ with the Universal Sentence Encoder (USE, https://tfhub.dev/google/universal-sentence-encoder). We first sort candidates by similarity and sample $b$ adversarial examples each time to enter the next iteration.
In detail, we formulate the following sampling rules: (1) sample $\lfloor b/3 \rfloor$ adversarial examples with the highest semantic similarity; (2) sample $\lfloor b/3 \rfloor$ adversarial examples with the lowest semantic similarity; (3) sample $\lfloor b/3 \rfloor$ of the remaining adversarial examples randomly. The analysis of the hyper-parameter $b$ and the LimeAttack algorithm are summarized in Appendix C and H.
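The three sampling rules can be sketched as a single beam selection step. This is a minimal illustration, assuming that the similarity scores have already been computed (for example with USE) and are passed in alongside each candidate; `select_beam` is a hypothetical name, and the fallback for small beams or small candidate pools is an assumption of this sketch rather than something specified in the text.

```python
import random
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float]  # (adversarial text, similarity to the benign sample)


def select_beam(candidates: Sequence[Candidate], b: int,
                rng: random.Random) -> List[Candidate]:
    """Keep floor(b/3) most similar, floor(b/3) least similar, floor(b/3) random."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    share = b // 3
    # Assumption: with b < 3 or too few candidates, fall back to the top b by similarity.
    if share == 0 or len(ranked) <= 3 * share:
        return ranked[:b]
    most_similar = ranked[:share]                  # rule (1)
    least_similar = ranked[-share:]                # rule (2)
    remaining = ranked[share:-share]
    randomly_drawn = rng.sample(remaining, share)  # rule (3)
    return most_similar + least_similar + randomly_drawn


pool = [(f"candidate {i}", i / 10) for i in range(10)]
beam = select_beam(pool, b=6, rng=random.Random(0))
print(beam)
```

Read literally, the rules keep $3\lfloor b/3 \rfloor$ candidates, so a beam size of 10 yields 9 survivors per iteration; the sketch does not fill the remaining slot.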

Experiments

Analyses of the transferability and adversarial training of LimeAttack are presented in Appendix D and E.

Tasks, Datasets and Models

We adopt seven common datasets: MR (Pang and Lee 2005), SST-2 (Socher et al. 2013), AG (Zhang, Zhao, and LeCun 2015) and Yahoo (Yoo et al. 2020) for text classification, and SNLI (Bowman et al. 2015) and MNLI (Williams, Nangia, and Bowman 2018) for textual entailment, where MNLI includes a matched version (MNLIm) and a mismatched version (MNLImm). In addition, we have trained three neural networks as victim models, including CNN (Kim 2014), LSTM (Hochreiter and Schmidhuber 1997) and BERT (Devlin et al. 2019). The parameters of the models and the detailed information of datasets are listed in Appendix A.

Baselines

We have chosen the following existing hard-label attack algorithms as our baselines: HLBB (Maheshwary, Maheshwary, and Pudi 2021), TextHoaxer (Ye et al. 2022b), LeapAttack (Ye et al. 2022a) and TextHacker (Yu et al. 2022). Additionally, we have included some classic score-based attack algorithms, such as TextFooler (TF) (Jin et al. 2020), PWWS (Ma, Shi, and Guan 2020) and Bert-Attack (Li et al. 2020), for reference. These score-based attacks obtain additional confidence scores and are implemented on the TextAttack framework (Morris et al. 2020).

Automatic Evaluation Metrics

We use four metrics to evaluate the attack performance: attack success rate (ASR), perturbation rate (Pert), semantic similarity (Sim) and query number (Query). Specifically, given a dataset $\mathcal{D} = \{(\bm{X}_i, \bm{Y}_i)\}_{i=1}^{N}$ consisting of $N$ samples $\bm{X}_i$ and corresponding ground truth labels $\bm{Y}_i$, the attack success rate of an adversarial attack method $\mathcal{A}$, which generates adversarial examples $\mathcal{A}(\bm{X})$ given an input $\bm{X}$ to attack a victim model $\mathcal{F}$, is defined as (Wang et al. 2021):

$$ASR = \sum_{(\bm{X}, \bm{Y}) \in \mathcal{D}} \frac{\mathbb{I}[\mathcal{F}(\mathcal{A}(\bm{X})) \neq \bm{Y}]}{|\mathcal{D}|} \tag{7}$$

The perturbation rate is the ratio of the number of substitutions to the number of original tokens, as defined in Eq. 2. Semantic similarity is measured with the Universal Sentence Encoder (USE); since most prior work (Maheshwary, Maheshwary, and Pudi 2021; Ye et al. 2022a) adopts USE, we also use it for consistency and comparability. Query number is the number of model queries issued during the attack. A higher attack success rate indicates a less robust victim model, while the perturbation rate and semantic similarity together reveal the quality of adversarial examples. Query number reveals the attack efficiency.
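The two simplest metrics can be sketched in a few lines of Python. This is a minimal illustration of Eq. 7 and of the substitution-based perturbation rate, not the evaluation code used in the paper; the function names and the callable interfaces for the victim model and the attack are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def attack_success_rate(
    model: Callable[[Tokens], int],
    attack: Callable[[Tokens], Tokens],
    dataset: Sequence[Tuple[Tokens, int]],
) -> float:
    """Eq. 7: fraction of samples whose adversarial example is misclassified."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    fooled = sum(1 for x, y in dataset if model(attack(x)) != y)
    return fooled / len(dataset)


def perturbation_rate(original: Tokens, adversarial: Tokens) -> float:
    """Share of token positions that were substituted.

    Assumes a substitution-only attack, so both sequences have equal length.
    """
    if len(original) != len(adversarial) or not original:
        raise ValueError("expected two non-empty sequences of equal length")
    changed = sum(o != a for o, a in zip(original, adversarial))
    return changed / len(original)
```

Semantic similarity and query number are omitted here because they depend on the sentence encoder and on how the attack counts its calls to the victim model.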

Implementation Details

We set the kernel width $\sigma=25$, the number of neighborhood samples equal to the number of tokens in the benign sample, and the beam size $b=10$. For a fair comparison, all baselines follow the same settings: synonyms are selected from the counter-fitted embedding space, the size of each candidate set is $k=50$, and the same 1000 texts are sampled for all baselines to attack. The results are averaged over five runs with different seeds (1234, 2234, 3234, 4234 and 5234) to reduce randomness. To ensure the quality of adversarial examples, an attack counts as successful only if the perturbation rate of the adversarial example is less than 10%. We set a tiny query budget of 100 for hard-label attacks, which corresponds to real-world settings (e.g., the HuggingFace free Inference API typically limits calls to 200 per minute).
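The query budget and the success criterion described above can be made concrete with a small wrapper around the victim model. The following is a hedged sketch: the class and function names are hypothetical, and the only facts taken from the text are the budget of 100 queries, the label-only output, and the 10% perturbation threshold.

```python
from typing import Callable


class QueryBudgetExceeded(RuntimeError):
    """Raised when the attacker asks for more labels than the budget allows."""


class HardLabelOracle:
    """Wraps a victim model so that it returns labels only and counts queries."""

    def __init__(self, model: Callable[[str], int], budget: int = 100) -> None:
        self._model = model
        self.budget = budget
        self.queries = 0

    def __call__(self, text: str) -> int:
        if self.queries >= self.budget:
            raise QueryBudgetExceeded(f"budget of {self.budget} queries used up")
        self.queries += 1
        return self._model(text)


def attack_succeeded(adv_label: int, true_label: int, pert_rate: float,
                     max_pert: float = 0.10) -> bool:
    """Success requires a label flip and a perturbation rate below the threshold."""
    return adv_label != true_label and pert_rate < max_pert
```

Routing every baseline through the same wrapper is one way to guarantee that all attacks are compared under an identical budget.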

Experiments Results

Attack Performance.

Tables 1 and 2 show that LimeAttack outperforms existing hard-label attacks on text classification and textual entailment tasks, achieving higher attack success rates and lower perturbation rates in datasets such as SST-2, AG, and MNLI. Unlike existing hard-label attacks that require many queries to optimize the perturbation, LimeAttack adopts a local explainable method to calculate word importance ranking and attacks key words first. This approach can generate adversarial examples with a high attack success rate, even under tiny query budgets. Appendix G includes a t-test and the mean and variance of LimeAttack’s success rate compared to other methods. In Appendices K and L, we list the semantic similarity and the comparison between LimeAttack and several score-based attacks.

| Model | Attack | MR ASR↑ | MR Pert↓ | SST-2 ASR↑ | SST-2 Pert↓ | AG ASR↑ | AG Pert↓ | Yahoo ASR↑ | Yahoo Pert↓ |
|---|---|---|---|---|---|---|---|---|---|
| CNN | HLBB | 44.4 | 5.4 | 33.4 | 5.6 | 17.7 | 3.3 | 41.8 | 3.6 |
| CNN | TextHoaxer | 44.2 | 5.2 | 38.1 | 5.6 | 15.7 | 2.9 | 39.9 | 3.3 |
| CNN | LeapAttack | 43.1 | 5.3 | 40.0 | 5.7 | 20.2 | 3.2 | 40.4 | 3.4 |
| CNN | TextHacker | 49.4 | 6.2 | 38.1 | 6.3 | 20.5 | 6.2 | 38.1 | 5.9 |
| CNN | LimeAttack | 49.9 | 5.3 | 42.8 | 5.6 | 20.9 | 2.9 | 43.7 | 3.7 |
| LSTM | HLBB | 41.2 | 5.2 | 33.1 | 5.7 | 15.2 | 3.1 | 38.4 | 3.3 |
| LSTM | TextHoaxer | 39.3 | 5.4 | 36.4 | 5.6 | 14.7 | 2.7 | 37.1 | 3.3 |
| LSTM | LeapAttack | 40.0 | 5.3 | 39.8 | 5.6 | 15.9 | 3.1 | 37.6 | 3.3 |
| LSTM | TextHacker | 45.8 | 6.1 | 35.2 | 6.4 | 16.5 | 6.2 | 36.8 | 5.9 |
| LSTM | LimeAttack | 47.6 | 5.4 | 40.1 | 5.5 | 17.3 | 2.7 | 40.3 | 3.7 |
| BERT | HLBB | 26.6 | 5.6 | 23.0 | 5.8 | 12.7 | 3.2 | 36.3 | 3.6 |
| BERT | TextHoaxer | 27.0 | 5.5 | 24.9 | 5.8 | 9.8 | 3.0 | 32.7 | 3.3 |
| BERT | LeapAttack | 26.5 | 5.4 | 26.1 | 5.8 | 13.7 | 2.9 | 34.1 | 3.4 |
| BERT | TextHacker | 26.5 | 6.5 | 25.4 | 6.3 | 12.9 | 5.5 | 31.3 | 6.3 |
| BERT | LimeAttack | 29.2 | 5.9 | 27.8 | 5.7 | 14.6 | 2.9 | 37.4 | 3.8 |

Table 1: The attack success rate (ASR, %, ↑) and perturbation rate (Pert, %, ↓) of different hard-label attack algorithms on three models for text classification under a query budget of 100.

| Dataset | HLBB ASR↑ | HLBB Pert↓ | TextHoaxer ASR↑ | TextHoaxer Pert↓ | LeapAttack ASR↑ | LeapAttack Pert↓ | TextHacker ASR↑ | TextHacker Pert↓ | LimeAttack ASR↑ | LimeAttack Pert↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| SNLI | 24.9 | 8.3 | 24.7 | 8.3 | 28.3 | 8.3 | 22.8 | 8.3 | 29.1 | 8.4 |
| MNLI-m | 41.9 | 7.8 | 40.9 | 7.7 | 49.1 | 7.7 | 38.2 | 7.8 | 49.7 | 7.7 |
| MNLI-mm | 47.8 | 7.5 | 45.6 | 7.6 | 56.0 | 7.6 | 44.3 | 7.7 | 56.3 | 7.6 |

Table 2: The ASR (%, ↑) and Pert (%, ↓) of LimeAttack and other baselines on BERT for textual entailment under a query budget of 100.

Query Budget.

As illustrated in Figure 3, LimeAttack maintains a stable attack success rate and a smoother attack curve across query budgets, which means that LimeAttack performs consistently well whether the query budget is high or low. The trend of the perturbation rate is shown in Appendix N. Comparing attack performance under both low and high query budgets provides a more comprehensive evaluation. However, attacking without any query limit is an idealized setting that only shows the upper limit of an attack algorithm. Because a large number of queries is expensive, we believe attack performance under a low query budget is more practical. We also list the attack success rates and perturbation rates of different attacks under a query budget of 2000 in Appendix N.

Figure 3: Attack success rate of different attacks under different query budgets on CNN-MR.
Adversary Quality.

High-quality adversarial examples should be both fluent and context-aware, while also being similar to benign samples to evade human detection. We use LanguageTool (https://www.languagetool.org/) and USE to detect grammatical errors and measure semantic similarity, respectively. As shown in Table 3, LimeAttack has the lowest perturbation rate and the fewest grammatical errors, though its semantic similarity is lower than that of HLBB, TextHoaxer, and LeapAttack, because these methods take similarity into account during the attack. Considering all metrics, LimeAttack is still the strongest overall. To intuitively contrast the quality of adversarial examples, some qualitative examples are provided in Appendix I.

| Attack | ASR↑ | Pert↓ | Sim↑ | Gram↓ |
|---|---|---|---|---|
| HLBB | 23.0 | 5.8 | 99.2 | 1.6 |
| TextHoaxer | 24.9 | 5.8 | 99.2 | 1.7 |
| LeapAttack | 26.1 | 5.8 | 99.1 | 1.5 |
| TextHacker | 25.4 | 6.3 | 96.0 | 1.9 |
| LimeAttack | 27.8 | 5.7 | 96.4 | 1.5 |

Table 3: ASR (%, ↑), Pert (%, ↓), Sim (%, ↑) and Gram (↓) of different hard-label attack algorithms on the SST-2 dataset for BERT under a query budget of 100.

Evaluation on Large Language Models.

| Model (size) | ASR↑ | Pert↓ | Sim↑ | Acc↑ |
|---|---|---|---|---|
| BART-L (407M) | 42.0 | 5.15 | 93.7 | 87.0 |
| DeBERTa-L (435M) | 52.0 | 5.82 | 92.9 | 79.0 |
| T5-L (780M) | 28.0 | 5.59 | 95.1 | 93.0 |
| GPT-3 (175B) | 61.0 | 4.82 | 95.2 | 82.0 |
| ChatGPT (175B) | 25.0 | 5.62 | 95.3 | 92.0 |

Table 4: The evaluation of LimeAttack on large language models. We attack these large language models on the MR dataset under a query budget of 100.

Large language models (LLMs), also known as foundation models (Bommasani et al. 2021), have achieved impressive performance on various natural language processing tasks. However, their robustness to adversarial examples remains unclear (Wang et al. 2023). To evaluate the effectiveness of LimeAttack on LLMs, we select some popular models: DeBERTa-L (Kojima et al. 2022), BART-L (Lewis et al. 2019), Flan-T5 (Raffel et al. 2020), GPT-3 (text-davinci-003) and ChatGPT (gpt-3.5-turbo) (Brown et al. 2020). Due to the limited number of API calls, we sample 100 texts from the MR dataset and attack the zero-shot classification task of these models. As Table 4 shows, LimeAttack successfully attacks most LLMs under tight query budgets. Although these models have high accuracy on zero-shot tasks, their robustness to adversarial examples still needs to be improved. ChatGPT and T5-L are more robust to adversarial examples than the other models. The robustness of the victim model appears to be related to its original accuracy: the higher the original accuracy, the stronger the victim model’s ability to defend against adversarial examples. Further analysis of other hard-label attacks and experimental details are discussed in Appendix F.

| Defense Method | HLBB ASR↑ | HLBB Pert↓ | TextHoaxer ASR↑ | TextHoaxer Pert↓ | LeapAttack ASR↑ | LeapAttack Pert↓ | TextHacker ASR↑ | TextHacker Pert↓ | LimeAttack ASR↑ | LimeAttack Pert↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| None | 24.9 | 8.3 | 24.7 | 8.3 | 28.3 | 8.3 | 22.8 | 8.3 | 29.1 | 8.4 |
| A2T | 20.6 | 9.3 | 21.4 | 9.5 | 23.5 | 9.4 | 19.8 | 9.1 | 24.5 | 9.4 |
| ASCC | 13.2 | 6.5 | 13.4 | 6.5 | 14.3 | 6.4 | 12.5 | 7.2 | 15.8 | 6.7 |

Table 5: The evaluation of hard-label attacks on defense methods based on BERT-SNLI under a query budget of 100.

Attack Performance on Defense Methods.

To evaluate the effectiveness of LimeAttack against defense methods, we use A2T (Yoo and Qi 2021) and ASCC (Dong et al. 2021) to enhance the defense ability of BERT on SNLI, and conduct attack experiments on the defended models. As shown in Table 5, LimeAttack remains effective and outperforms the other baselines against these defense methods. More results on defense methods are listed in Appendix M.

Ablation Study

Effect of Word Importance Ranking.

To validate the effectiveness of word importance ranking, we remove the ranking strategy and instead randomly select the words to perturb. Table 6 shows that without the word importance ranking, the attack success rate decreases by 9% and 6% on the MR and SST-2 datasets, respectively. Furthermore, adversarial examples generated by random selection have higher perturbation rates and require more queries. This indicates that the word importance ranking guides LimeAttack to focus on crucial words, leading to a more efficient attack with lower perturbation rates.
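The difference between the two settings of this ablation comes down to the order in which word positions are visited. The sketch below illustrates that contrast only; the importance scores are assumed to come from some upstream estimator (LIME in the paper), and both function names are hypothetical.

```python
import random
from typing import List, Sequence


def ranked_positions(importance: Sequence[float]) -> List[int]:
    """Visit the most important word positions first."""
    return sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)


def random_positions(num_tokens: int, seed: int) -> List[int]:
    """Ablation baseline: visit word positions in a seeded random order."""
    order = list(range(num_tokens))
    random.Random(seed).shuffle(order)
    return order
```

With a fixed query budget, a ranked order spends the first queries on the words most likely to flip the label, which is consistent with the lower perturbation rates and query counts reported for the LIME setting.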

Effect of Sampling Rules.

To verify the effectiveness of LimeAttack’s sampling rules, we replace this strategy with one of three common sampling rules: (1) selecting the $b$ adversarial examples with the highest semantic similarity, (2) selecting the $b$ adversarial examples with the lowest semantic similarity, or (3) randomly selecting $b$ adversarial examples. The results in Table 7 show that LimeAttack outperforms the other sampling rules with a higher attack success rate and lower perturbation rate. Additionally, it has a comparable (second highest) semantic similarity and a comparable number of queries.
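The three alternative rules are simple enough to state in code. The following sketch covers only these three baselines, not LimeAttack's own rule; the candidate representation (a text paired with its similarity to the benign sample) and the function name are assumptions for illustration.

```python
import random
from typing import List, Tuple

# (adversarial text, semantic similarity to the benign sample)
Candidate = Tuple[str, float]


def select_candidates(candidates: List[Candidate], b: int, rule: str,
                      seed: int = 0) -> List[Candidate]:
    """Keep b candidates for the next beam-search step under a baseline rule."""
    if rule == "highest_similarity":      # rule (1)
        return sorted(candidates, key=lambda c: c[1], reverse=True)[:b]
    if rule == "lowest_similarity":       # rule (2)
        return sorted(candidates, key=lambda c: c[1])[:b]
    if rule == "random":                  # rule (3)
        rng = random.Random(seed)
        return rng.sample(candidates, min(b, len(candidates)))
    raise ValueError(f"unknown rule: {rule}")
```

Rule (1) favors candidates that stay close to the original text, while rule (2) favors candidates that have drifted further and may therefore be nearer to the decision boundary.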

| Metric | MR Random | MR LIME | SST-2 Random | SST-2 LIME |
|---|---|---|---|---|
| Pert↓ | 6.1 | 5.6 | 6.4 | 5.9 |
| ASR↑ | 30.1 | 39.3 | 32.1 | 36.5 |
| Sim↑ | 94.6 | 94.8 | 94.2 | 94.6 |
| Query↓ | 157.2 | 153.3 | 148.1 | 132.5 |

Table 6: Comparison between word importance ranking learned by LIME and random selection for BERT under a query budget of 1000.

| Sample Rule | ASR↑ | Pert↓ | Sim↑ | Query↓ |
|---|---|---|---|---|
| Method 1 | 35.8 | 5.76 | 95.02 | 164.65 |
| Method 2 | 31.5 | 6.13 | 93.79 | 87.45 |
| Method 3 | 32.1 | 6.09 | 94.50 | 107.05 |
| LimeAttack | 39.3 | 5.65 | 94.81 | 153.03 |

Table 7: Comparison between different sample rules on the MR dataset for BERT under a query budget of 1000.

Human Evaluation

We selected 200 adversarial examples for BERT-MR. Each adversarial example was evaluated by two human judges for semantic similarity, fluency and prediction accuracy. The entire human evaluation procedure is consistent with TextFooler (Jin et al. 2020). In detail, we ask human judges to rate the similarity and fluency of adversarial examples and benign samples on a 5-point Likert scale (1 to 5 correspond to very not fluent/similar, not fluent/similar, uncertain, fluent/similar, and very fluent/similar, respectively). The results are listed in Table 8. The semantic similarity is 4.5, which means the adversarial examples are similar to the original samples. For prediction accuracy, human judges are asked to predict the label of each sentence (e.g., positive or negative for sentiment analysis). The accuracy of 76.7% means that, from the human perspective, the majority of adversarial examples keep the same label as the original samples while misleading the victim model.

Ori Adv
Prediction Accuracy 81.2% 76.7%
Fluency 4.4 4.1
Semantic Similarity 4.5
Table 8: The semantic similarity, fluency and prediction accuracy of original texts and adversarial examples evaluated by human judges for BERT-MR.

Conclusion

In this work, we summarize previous score-based and hard-label attacks and propose a novel hard-label attack algorithm called LimeAttack. LimeAttack adopts a local explainable method to approximate the word importance ranking, and then utilizes beam search to generate high-quality adversarial examples under a tiny query budget. Experiments show that LimeAttack achieves a higher attack success rate than other hard-label attacks. In addition, we have evaluated LimeAttack’s attack performance on large language models and some defense methods. The adversarial examples crafted by LimeAttack are high-quality and highly transferable, and they improve the victim model’s robustness in adversarial training. LimeAttack verifies the effectiveness of the inside-to-outside attack path in the hard-label setting, so many excellent score-based attacks may provide further insight for hard-label attacks.

References

  • Bommasani et al. (2021) Bommasani, R.; Hudson, D. A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M. S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
  • Bowman et al. (2015) Bowman, S. R.; Angeli, G.; Potts, C.; and Manning, C. D. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 632–642.
  • Brown et al. (2020) Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. 2020. Language models are few-shot learners. In Advances in neural information processing systems, volume 33, 1877–1901.
  • Carlini and Wagner (2018) Carlini, N.; and Wagner, D. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops, 1–7.
  • Chai et al. (2023) Chai, Y.; Liang, R.; Samtani, S.; Zhu, H.; Wang, M.; Liu, Y.; and Jiang, Y. 2023. Additive Feature Attribution Explainable Methods to Craft Adversarial Attacks for Text Classification and Text Regression. IEEE Transactions on Knowledge and Data Engineering.
  • Devlin et al. (2019) Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4171–4186.
  • Dong et al. (2021) Dong, X.; Luu, A. T.; Ji, R.; and Liu, H. 2021. Towards robustness against natural language word substitutions. International Conference on Learning Representations.
  • Goodman, Zhonghou et al. (2020) Goodman, D.; Zhonghou, L.; et al. 2020. FastWordBug: A fast method to generate adversarial text against NLP applications. arXiv preprint arXiv:2002.00760.
  • Hochreiter and Schmidhuber (1997) Hochreiter, S.; and Schmidhuber, J. 1997. Long short-term memory. In Neural Computation, volume 9, 1735–1780.
  • Jiang et al. (2020) Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; and Zhao, T. 2020. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2177–2190.
  • Jin et al. (2020) Jin, D.; Jin, Z.; Zhou, J. T.; and Szolovits, P. 2020. Is bert really robust? a strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 8018–8025.
  • Kim (2014) Kim, Y. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1746–1751.
  • Kojima et al. (2022) Kojima, T.; Gu, S. S.; Reid, M.; Matsuo, Y.; and Iwasawa, Y. 2022. Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916.
  • Kurakin, Goodfellow, and Bengio (2016) Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
  • Kurakin, Goodfellow, and Bengio (2018) Kurakin, A.; Goodfellow, I. J.; and Bengio, S. 2018. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security, 99–112.
  • Lewis et al. (2019) Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; and Zettlemoyer, L. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Li et al. (2020) Li, L.; Ma, R.; Guo, Q.; Xue, X.; and Qiu, X. 2020. BERT-ATTACK: Adversarial Attack Against BERT Using BERT. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 6193–6202.
  • Lundberg and Lee (2017) Lundberg, S. M.; and Lee, S.-I. 2017. A unified approach to interpreting model predictions. In Advances in neural information processing systems.
  • Ma, Shi, and Guan (2020) Ma, G.; Shi, L.; and Guan, Z. 2020. Adversarial Text Generation via Probability Determined Word Saliency. In International Conference on Machine Learning for Cyber Security, 562–571.
  • Maheshwary, Maheshwary, and Pudi (2021) Maheshwary, R.; Maheshwary, S.; and Pudi, V. 2021. Generating natural language attacks in a hard label black box setting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 13525–13533.
  • Minaee et al. (2021) Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; and Gao, J. 2021. Deep learning–based text classification: a comprehensive review. In ACM Computing Surveys, volume 54, 1–40.
  • Morris et al. (2020) Morris, J.; Lifland, E.; Yoo, J. Y.; Grigsby, J.; Jin, D.; and Qi, Y. 2020. TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 119–126.
  • Mrkšić et al. (2016) Mrkšić, N.; Ó Séaghdha, D.; Thomson, B.; Gašić, M.; Rojas-Barahona, L. M.; Su, P.-H.; Vandyke, D.; Wen, T.-H.; and Young, S. 2016. Counter-fitting Word Vectors to Linguistic Constraints. In Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 142–148.
  • Pang and Lee (2005) Pang, B.; and Lee, L. 2005. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 115–124.
  • Papernot et al. (2017) Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z. B.; and Swami, A. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 506–519.
  • Raffel et al. (2020) Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. In The Journal of Machine Learning Research, volume 21, 5485–5551.
  • Ribeiro, Singh, and Guestrin (2016) Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 1135–1144.
  • Shrikumar et al. (2016) Shrikumar, A.; Greenside, P.; Shcherbina, A.; and Kundaje, A. 2016. Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713.
  • Socher et al. (2013) Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C. D.; Ng, A. Y.; and Potts, C. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–1642.
  • Štrumbelj and Kononenko (2014) Štrumbelj, E.; and Kononenko, I. 2014. Explaining prediction models and individual predictions with feature contributions. In Knowledge and information systems, volume 41, 647–665.
  • Szegedy et al. (2013) Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  • Wang et al. (2021) Wang, B.; Xu, C.; Wang, S.; Gan, Z.; Cheng, Y.; Gao, J.; Awadallah, A. H.; and Li, B. 2021. Adversarial glue: A multi-task benchmark for robustness evaluation of language models. arXiv preprint arXiv:2111.02840.
  • Wang et al. (2023) Wang, J.; Hu, X.; Hou, W.; Chen, H.; Zheng, R.; Wang, Y.; Yang, L.; Huang, H.; Ye, W.; Geng, X.; et al. 2023. On the robustness of chatgpt: An adversarial and out-of-distribution perspective. arXiv preprint arXiv:2302.12095.
  • Williams, Nangia, and Bowman (2018) Williams, A.; Nangia, N.; and Bowman, S. 2018. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1112–1122.
  • Yang et al. (2021) Yang, X.; Liu, W.; Tao, D.; and Liu, W. 2021. BESA: BERT-based Simulated Annealing for Adversarial Text Attacks. In International Joint Conference on Artificial Intelligence, 3293–3299.
  • Ye et al. (2022a) Ye, M.; Chen, J.; Miao, C.; Wang, T.; and Ma, F. 2022a. LeapAttack: Hard-Label Adversarial Attack on Text via Gradient-Based Optimization. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2307–2315.
  • Ye et al. (2022b) Ye, M.; Miao, C.; Wang, T.; and Ma, F. 2022b. TextHoaxer: Budgeted Hard-Label Adversarial Attacks on Text. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 3877–3884.
  • Yoo et al. (2020) Yoo, J. Y.; Morris, J.; Lifland, E.; and Qi, Y. 2020. Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 323–332.
  • Yoo and Qi (2021) Yoo, J. Y.; and Qi, Y. 2021. Towards Improving Adversarial Training of NLP Models. Findings of the Association for Computational Linguistics: EMNLP.
  • Yu et al. (2022) Yu, Z.; Wang, X.; Che, W.; and He, K. 2022. Learning-based Hybrid Local Search for the Hard-label Textual Attack. arXiv preprint arXiv:2201.08193.
  • Zhang, Zhao, and LeCun (2015) Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. Advances in neural information processing systems.
  • Zhu, Zhao, and Wu (2023) Zhu, H.; Zhao, Q.; and Wu, Y. 2023. BeamAttack: Generating High-quality Textual Adversarial Examples through Beam Search and Mixed Semantic Spaces. arXiv preprint arXiv:2303.07199.

Appendix A: Victim Model and Datasets

We carry out all experiments on an NVIDIA Tesla V100 16G GPU. We adopt three neural networks, CNN, LSTM and BERT, from TextFooler. The CNN uses three window sizes of 3, 4, and 5, with 100 filters for each window size. The LSTM consists of a bidirectional LSTM layer with 150 hidden states. Both CNN and LSTM have a dropout rate of 0.3 and use 200-dimensional GloVe word embeddings pre-trained on 6B tokens. BERT$_{base}$ consists of 12 layers with 768 units and 12 heads. The original accuracy of the victim models is listed in Table 9. Details of the datasets are listed in Table 10. We select datasets with different text lengths and numbers of classes.

Table 9: The original accuracy of victim models on various datasets.

| Dataset | CNN | LSTM | BERT |
|---|---|---|---|
| MR | 78.0 | 80.7 | 86.0 |
| SST-2 | 82.7 | 84.5 | 92.4 |
| AG | 91.5 | 91.3 | 94.2 |
| Yahoo | 73.7 | 73.7 | 79.1 |
| SNLI | - | - | 89.1 |
| MNLI-m | - | - | 85.1 |
| MNLI-mm | - | - | 82.1 |

Table 10: Overview of datasets and NLP tasks.

| Task | Dataset | Train | Test | Classes | Length |
|---|---|---|---|---|---|
| Classification | MR | 9K | 1K | 2 | 18 |
| Classification | SST-2 | 70K | 2K | 2 | 8 |
| Classification | AG | 120K | 8K | 4 | 43 |
| Classification | Yahoo | 12K | 4K | 10 | 151 |
| Entailment | SNLI | 570K | 3K | 3 | 20 |
| Entailment | MNLI (m/mm) | 433K | 10K | 3 | 11 |

Appendix B: The Effectiveness of LIME in Score-based Attacks

Traditional score-based attacks utilize deletion-based methods to calculate word importance ranking. They drop a word $x_{i}$ from the benign sample $X$ and query the victim model $\mathcal{F}$ with the new sample $X/x_{i}=[x_{1},x_{2},\cdots,x_{i-1},x_{i+1},\cdots,x_{n}]$. The difference in the model’s confidence score before and after deletion reflects the importance of this word:

$$I(x_{i})=\mathcal{F}(X)-\mathcal{F}(X/x_{i}) \qquad (8)$$

To verify the effectiveness of the local explainable method, we replace the deletion-based method with the local explainable method in a score-based attack. We test on the MR dataset, and the results are shown in Table 11. The two methods achieve similar attack success rates, but the deletion-based method achieves a lower perturbation rate. Because the probability distribution of the model’s output is available in the score-based setting, the deletion-based method can accurately reflect the influence of each word on the output. Therefore, we think local explainable methods offer a greater advantage in hard-label attacks, where the deletion-based method is useless because confidence scores are unavailable.
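For comparison, the deletion-based ranking of Eq. 8 can be sketched as follows. This is an illustrative implementation under the assumption that the victim model exposes a `confidence` callable returning the score of the originally predicted class; that callable and the function name are hypothetical.

```python
from typing import Callable, List


def deletion_importance(tokens: List[str],
                        confidence: Callable[[List[str]], float]) -> List[float]:
    """Eq. 8: I(x_i) = F(X) - F(X / x_i) for every position i.

    Uses len(tokens) + 1 model queries: one for X and one per deleted word.
    """
    base = confidence(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]   # X / x_i
        scores.append(base - confidence(reduced))
    return scores
```

The sketch makes the limitation visible: when the model returns only a discrete label, `base - confidence(reduced)` is zero for almost every word, so the ranking carries little information in the hard-label setting.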

Table 11: The comparison with the deletion-based method. ASR (%, ↑) is attack success rate and Pert (%, ↓) is perturbation rate.

| Dataset | Victim Model | Deletion-based ASR↑ | Deletion-based Pert↓ | LIME ASR↑ | LIME Pert↓ |
|---|---|---|---|---|---|
| MR | CNN | 1.0 | 11.9 | 1.0 | 12.4 |
| MR | LSTM | 0.6 | 12.3 | 0.6 | 12.8 |
| MR | BERT | 8.2 | 16.3 | 8.1 | 17.4 |

Appendix C: The Effectiveness of Beam Size $b$

Beam size $b$ directly determines the size of the search space. A bigger search space helps find a better solution (e.g., lower perturbation rate and higher semantic similarity), but it also requires more model queries. The beam size must therefore be chosen to balance the number of queries against the attack success rate. As shown in Figure 4, we test on the MR and SST-2 datasets using BERT with different beam sizes. As the beam size $b$ increases, the search space is effectively expanded, and the attack success rate and the quality of adversarial examples improve (the perturbation rate is reduced). As $b$ increases further, the number of queries also gradually increases, which reduces the attack success rate under a fixed query budget. Considering the overall effect, we set the beam size $b=10$.

Figure 4: The attack success rate (%) ↑, perturbation rate (%) ↓ and semantic similarity (%) ↑ of LimeAttack on BERT using the MR and SST-2 datasets under different beam sizes $b$. Panels: (a) attack success rate; (b) perturbation rate; (c) semantic similarity.

Appendix D: Transferability

Transferability refers to the property that adversarial examples crafted against a particular victim model can also fool another model. In detail, we calculate the prediction accuracy of the CNN and LSTM models on adversarial examples crafted to attack BERT on the MR dataset. As shown in Figure 5, adversarial examples generated by LimeAttack achieve higher transferability than the baselines. They reduce the prediction accuracy of the CNN and LSTM models from 80.7% and 78.0% to 58.5% and 58.4%, respectively.
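The transfer measurement itself is a plain accuracy computation on a fixed set of adversarial examples. A minimal sketch, assuming the adversarial examples were already generated against the source model and that `target_model` is any callable returning a label (both names are illustrative):

```python
from typing import Callable, Sequence, Tuple


def transfer_accuracy(target_model: Callable[[str], int],
                      adversarial_set: Sequence[Tuple[str, int]]) -> float:
    """Accuracy of a second model on adversarial examples crafted for another model.

    Each item pairs an adversarial text with the ground-truth label of its
    benign source. Lower accuracy means higher transferability.
    """
    if not adversarial_set:
        raise ValueError("adversarial_set must not be empty")
    correct = sum(1 for text, label in adversarial_set if target_model(text) == label)
    return correct / len(adversarial_set)
```

No queries to the target model are needed while crafting the examples, which is what distinguishes a transfer evaluation from a direct attack.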

Figure 5: Transferability of adversarial examples on MR dataset for BERT. Lower accuracy indicates higher transferability.

Appendix E: Adversarial Training

Adversarial training is a prevalent technique to improve the victim model’s robustness by adding adversarial examples into the training data. We randomly selected 1000 adversarial examples from the MR dataset, retrained the CNN model, and then attacked the CNN model again. The results are shown in Table 12. After adversarial training, the CNN model achieves higher test accuracy. In addition, LimeAttack’s attack success rate decreases by 3%, and the attack requires more queries and a higher perturbation rate. Adversarial examples generated by LimeAttack thus effectively improve the victim model’s robustness and generalization.

Table 12: The performance of the CNN model with and without adversarial training on the MR dataset.

| | Ori Acc↑ | ASR↑ | Pert↓ | Sim↑ | Query↓ |
|---|---|---|---|---|---|
| Original | 80.27 | 38.18 | 3.90 | 97.00 | 22.21 |
| +Adv. Training | 81.53 | 35.09 | 3.94 | 97.01 | 24.90 |

Appendix F: Large Language Models

Settings

In this section, we provide a brief introduction to the large language models used in our experiments.

  • BART-L BART is a transformer-based model that can handle both generation and understanding tasks. It is trained on a combination of auto-regressive and denoising objectives, which is primarily focused on understanding tasks.

  • DeBERTa-L DeBERTa enhances BERT with a disentangled attention mechanism and an improved decoding scheme. This allows it to capture contextual information between different tokens more effectively and generate higher quality natural language sentences.

  • Flan-T5 Flan-T5 uses a text-to-text approach where both input and output are natural language sentences, enabling it to perform a variety of tasks including text generation, summarization, and classification. By taking an input sentence as a prompt, Flan-T5 can accomplish common NLP tasks.

  • Text-davinci-003 and ChatGPT These models are based on GPT3 and GPT3.5. They can perform tasks specified through natural language inputs and produce higher quality and more faithful output.

To ensure stable outputs from the large language models, we use the same prompt for every model in the zero-shot text classification task: Please classify the following sentence into either positive or negative. Answer me with “positive” or “negative”, just one word.
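A hard-label attack needs a discrete label, so the free-text answer of an LLM has to be mapped back to a class. The sketch below shows one way to do this with the prompt quoted above. How the sentence is appended to the prompt and how malformed answers are handled are assumptions, and the helper names are hypothetical; no API call is made.

```python
from typing import Optional

PROMPT = (
    "Please classify the following sentence into either positive or negative. "
    'Answer me with "positive" or "negative", just one word.'
)


def build_prompt(sentence: str) -> str:
    """Assumed layout: instruction first, then the sentence on a new line."""
    return f"{PROMPT}\n{sentence}"


def parse_label(response: str) -> Optional[str]:
    """Map a model answer to a hard label, or None if it is not a single valid word."""
    word = response.strip().strip(".\"'").lower()
    return word if word in {"positive", "negative"} else None
```

Returning `None` for unparseable answers keeps such cases visible instead of silently counting them as a successful attack.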

Discussion

Generalization Error.

In this subsection, we provide some analysis of the models’ generalization error, also known as the out-of-sample error. It measures how accurately an algorithm is able to predict outcome values for previously unseen data. Let $\mathcal{F}$ be a finite hypothesis set and $m$ the number of training samples. For each $f\in\mathcal{F}$, probably approximately correct (PAC) theory gives:

$$P\Bigg(|\mathbb{E}(f)-\hat{\mathbb{E}}(f)|\leq\sqrt{\frac{\ln|\mathcal{F}|+\varsigma}{2m}}\Bigg)\geq 1-\delta \qquad (9)$$

where $\mathbb{E}(f)$ and $\hat{\mathbb{E}}(f)$ are the ideal and empirical risk of classifier $f$. According to Table 4 in the main text, the robustness of the victim model is related to its original accuracy: the higher the original accuracy, the stronger the victim model’s ability to defend against adversarial examples. The generalization error bound depends on two factors: the training sample size ($m$) and the hypothesis space ($\mathcal{F}$). Large language models, like ChatGPT, excel in performance due to their extensive training data (large $m$). Moreover, because the bound in Eq. 9 grows only logarithmically with $|\mathcal{F}|$ but shrinks with $1/\sqrt{m}$, a sufficiently large $m$ can still reduce the generalization error even when $|\mathcal{F}|$ is large. This observation helps explain why such models excel in zero-shot classification for certain tasks.
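To see how the two factors trade off, the right-hand side of Eq. 9 can be evaluated numerically. The text does not define $\varsigma$; the sketch below assumes the common choice $\varsigma=\ln(2/\delta)$ that follows from Hoeffding's inequality combined with a union bound over a finite hypothesis set. That assumption and the function name are not taken from the paper.

```python
import math


def generalization_gap_bound(hypothesis_count: float, m: int, delta: float) -> float:
    """Right-hand side of Eq. 9: sqrt((ln|F| + varsigma) / (2m)).

    Assumption: varsigma = ln(2 / delta), the usual Hoeffding + union-bound term.
    """
    if hypothesis_count < 1 or m <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need |F| >= 1, m > 0 and 0 < delta < 1")
    varsigma = math.log(2.0 / delta)
    return math.sqrt((math.log(hypothesis_count) + varsigma) / (2.0 * m))
```

Under this assumption the bound tightens as $m$ grows and loosens, though only logarithmically, as $|\mathcal{F}|$ grows.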

Attack ChatGPT.

To validate the effectiveness of hard-label attack algorithms in the real world, we evaluate LimeAttack, HLBB, LeapAttack, TextHoaxer and TextHacker on ChatGPT. Because OpenAI limits the number of API calls, we select 20 adversarial examples generated by the different hard-label attack algorithms against BERT on the MR dataset, and feed them to ChatGPT to check whether it predicts the opposite label to that of the original samples. As shown in Table 13, under a tight query budget LimeAttack reaches the highest attack success rate (20.0%, tied with LeapAttack and TextHacker) while producing higher-quality adversarial examples than those two methods, with a lower perturbation rate and higher semantic similarity.

Table 13: Attack success rate (ASR., %), perturbation rate (Pert., %), semantic similarity (Sim., %) of various hard-label attacks on ChatGPT under the query budget of 100.
Attack ASR.↑ Pert.↓ Sim.↑
HLBB 10.0 3.70 96.80
LeapAttack 20.0 8.57 88.85
TextHoaxer 10.0 4.61 89.71
TextHacker 20.0 7.61 90.21
LimeAttack 20.0 4.51 95.30
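The transfer evaluation described above can be sketched as follows: each original sentence and its adversarial counterpart are sent to the model with the fixed zero-shot prompt, and an attack counts as successful when the two returned labels differ. The `query_label` callable stands in for a real API client and is hypothetical, as are the way the prompt and sentence are joined and the toy data. This is a minimal sketch, not the authors' evaluation script.

```python
from typing import Callable, List, Tuple

# Prompt quoted from the zero-shot setup described earlier in this appendix.
PROMPT = (
    "Please classify the following sentence into either positive or negative. "
    'Answer me with "positive" or "negative", just one word.'
)


def build_query(sentence: str) -> str:
    """Prepend the fixed prompt to the sentence (the joining format is an assumption)."""
    return f"{PROMPT}\n{sentence}"


def transfer_success_rate(
    pairs: List[Tuple[str, str]],
    query_label: Callable[[str], str],
) -> float:
    """Percentage of (original, adversarial) pairs whose predicted labels differ."""
    if not pairs:
        raise ValueError("need at least one (original, adversarial) pair")
    flipped = sum(
        query_label(build_query(original)) != query_label(build_query(adversarial))
        for original, adversarial in pairs
    )
    return 100.0 * flipped / len(pairs)


def toy_label(query: str) -> str:
    """Hypothetical stand-in for the hard-label API: keys on a single word."""
    return "positive" if "enjoy" in query else "negative"


toy_pairs = [
    ("those outside will enjoy a close look", "those outside will recieve a close look"),
    ("we enjoy it", "we enjoy this"),
]
rate = transfer_success_rate(toy_pairs, toy_label)
print(f"transfer attack success rate: {rate:.1f}%")
```

Only the returned label is used, which matches the hard-label setting: no scores or logits are assumed to be available.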

Appendix G: Significance Test

We run a t-test and list the mean and variance of LimeAttack’s attack success rate, together with the p-values against the other methods, in Table 14. LimeAttack is run with five additional seeds and the results are averaged, consistent with the other baselines. As shown in Table 14, LimeAttack achieves better results than the other baselines under a tight query budget.

Table 14: The mean, variance, and p-value of LimeAttack against other methods on the success rate in 5 runs.
Model_dataset LimeAttack:Mean LimeAttack:Variance HLBB:p-value TextHoaxer:p-value LeapAttack:p-value TextHacker:p-value
CNN_MR 49.9 9.00E-02 2.74E-05 2.38E-05 1.19E-05 8.20E-02
LSTM_MR 47.6 2.50E-01 8.98E-05 3.45E-05 4.78E-05 5.69E-03
BERT_MR 29.2 1.42E-01 2.85E-02 6.70E-02 2.37E-02 2.37E-02
CNN_SST 42.8 2.91E-01 1.58E-05 1.24E-04 4.42E-04 1.24E-04
LSTM_SST 40.1 8.02E-01 1.53E-03 5.66E-03 6.41E-02 3.30E-03
BERT_SST 27.8 4.22E-02 1.73E-05 3.39E-05 5.71E-05 4.17E-05
CNN_AG 20.9 2.28E-01 1.38E-02 2.27E-03 6.30E-01 3.09E-01
LSTM_AG 17.3 5.18E-02 1.50E-03 5.07E-04 1.38E-02 3.17E-01
BERT_AG 14.6 1.02E-02 4.41E-03 5.16E-04 3.77E-02 5.86E-03
CNN_Yahoo 43.7 1.56E-01 6.30E-02 1.26E-03 2.68E-03 1.82E-04
LSTM_Yahoo 40.3 4.22E-02 1.39E-03 3.45E-04 5.49E-04 2.69E-04
BERT_Yahoo 37.4 2.25E-02 4.22E-01 1.59E-03 4.04E-03 8.47E-04
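The text does not state which t-test variant was used. As one possible reading, the sketch below computes Welch's two-sample t statistic and its degrees of freedom from per-seed success rates using only the standard library. The per-seed numbers are made-up placeholders, not results from the paper. Turning the statistic into a p-value requires the Student t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical attack success rates (%) over five seeds; placeholders only.
method_runs = [50.0, 51.0, 49.0, 50.5, 49.5]
baseline_runs = [46.0, 47.0, 45.0, 46.5, 45.5]

t_stat, dof = welch_t(method_runs, baseline_runs)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A large positive t with few degrees of freedom corresponds to a small p-value, which is how the entries in Table 14 would be read.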

Appendix H: LimeAttack Algorithm

The full procedure of LimeAttack is summarized in Algorithm 1.

Algorithm 1 The LimeAttack algorithm

Input: Original text $X$, target model $\mathcal{F}$
Output: Adversarial example $X_{\text{adv}}$

1:  $X_{\text{adv}} \leftarrow X$
2:  $set(X_{\text{adv}}) \leftarrow X_{\text{adv}}$
3:  Compute the importance score $I(x_i)$ by LIME
4:  Sort the words by importance score $I(x_i)$
5:  for $i = 1$ to $n$ do
6:     Generate the candidate set $\mathcal{C}(x_i)$
7:  end for
8:  for $X_{\text{adv}}$ in $set(X_{\text{adv}})$ do
9:     $i \leftarrow$ index of the original word
10:     for $c_k$ in $\mathcal{C}(x_i)$ do
11:        $X'_{\text{adv}} \leftarrow$ Replace $x_i$ with $c_k$ in $X_{\text{adv}}$
12:        Add $X'_{\text{adv}}$ to $set(X_{\text{adv}})$
13:     end for
14:     for $X'_{\text{adv}}$ in $set(X_{\text{adv}})$ do
15:        if $\mathcal{F}(X'_{\text{adv}}) \neq y_{true}$ then
16:           return $X'_{\text{adv}}$ with highest semantic similarity
17:        end if
18:     end for
19:     $set(X_{\text{adv}}) \leftarrow$ Sample $b$ adversarial examples in $set(X_{\text{adv}})$ by rules
20:  end for
21:  return adversarial examples $X_{\text{adv}}$
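The beam-search part of Algorithm 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the importance scores and candidate sets are passed in precomputed, `classify` is a hypothetical hard-label oracle, similarity is a toy word-overlap measure instead of a sentence encoder, and the "sample $b$ examples by rules" step is replaced by keeping the $b$ most similar texts.

```python
from typing import Callable, Dict, List, Optional, Sequence

Text = List[str]


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy similarity: fraction of positions with identical words (assumption)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def lime_attack_sketch(
    words: Sequence[str],
    true_label: int,
    classify: Callable[[Text], int],   # hypothetical hard-label oracle
    importance: Sequence[float],       # one score per word, e.g. from LIME
    candidates: Dict[int, List[str]],  # position -> substitute words
    beam_size: int = 2,
    similarity: Callable[[Sequence[str], Sequence[str]], float] = overlap_similarity,
) -> Optional[Text]:
    # Steps 3-4: visit positions from most to least important.
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    beam: List[Text] = [list(words)]  # steps 1-2: the set starts with X itself
    for i in order:
        # Steps 10-13: substitute position i in every text currently kept.
        expanded: List[Text] = []
        for text in beam:
            for sub in candidates.get(i, []):
                if sub != text[i]:
                    expanded.append(text[:i] + [sub] + text[i + 1:])
        # Steps 14-18: query the model; return the most similar label flip.
        successes = [t for t in expanded if classify(t) != true_label]
        if successes:
            return max(successes, key=lambda t: similarity(words, t))
        # Step 19: keep b texts. Ranking by similarity is an assumed rule.
        pool = beam + expanded
        pool.sort(key=lambda t: similarity(words, t), reverse=True)
        beam = pool[:beam_size]
    return None  # no adversarial example found


def toy_classifier(text: Text) -> int:
    """Hypothetical victim: predicts 1 while any positive word remains."""
    return 1 if ("great" in text or "good" in text) else 0


result = lime_attack_sketch(
    words=["a", "great", "good", "movie"],
    true_label=1,
    classify=toy_classifier,
    importance=[0.0, 0.9, 0.6, 0.1],
    candidates={1: ["fine"], 2: ["okay"]},
)
print(result)
```

In the toy run, no single substitution flips the label, so the attack succeeds only because the beam keeps a partially perturbed text and extends it at the next position.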

Appendix I: Qualitative Examples

More qualitative adversarial examples are listed in Tables 20-28.

Appendix J: Limitations

  • Exploring more LLMs. Due to limited resources, this paper only tests some popular large language models. However, there are victim models based on other LLMs, e.g., LLaMA. Hence, victim models based on more LLMs could be studied in future work.

  • More NLP tasks. In this paper, we only attack some classification tasks (e.g., text classification, textual entailment and zero-shot classification). It would be interesting to attack other NLP applications, such as dialogue, text summarization, and machine translation.

Appendix K: Semantic Similarity of Different Attack Algorithms

We report semantic similarity in Table 15. Some baselines take similarity into account during the attack, so LimeAttack exhibits lower similarity than those methods. Considering all metrics, LimeAttack still performs best overall.

Table 15: The semantic similarity (%) of different attack algorithms.
Dataset Model HLBB TextHoaxer LeapAttack TextHacker LimeAttack
MR CNN 97.20 97.11 97.17 94.56 95.21
MR LSTM 97.27 97.27 97.22 95.01 95.31
MR BERT 97.13 97.16 97.09 94.16 94.77
SST CNN 97.18 97.22 97.14 94.02 94.41
SST LSTM 97.22 97.21 97.18 94.58 94.69
SST BERT 97.22 97.07 97.13 93.77 94.56
AG CNN 97.64 97.62 97.62 95.71 96.27
AG LSTM 97.64 97.58 97.62 95.46 96.11
AG BERT 97.57 97.61 97.56 95.14 96.53
Yahoo CNN 97.75 97.72 97.71 95.33 96.21
Yahoo LSTM 97.71 97.66 97.67 95.41 96.41
Yahoo BERT 97.73 97.68 97.63 95.12 96.55

Appendix L: Comparison with Score-based Attacks

Since LimeAttack follows the same two-stage strategy as score-based attacks, we also include some classic score-based attacks for reference. LimeAttack and these score-based attacks use exactly the same settings, except that score-based attacks can obtain the probability distribution of the output while LimeAttack cannot. Therefore, we do not limit the query budget for either LimeAttack or the score-based attacks. As shown in Table 16, LimeAttack still achieves a higher attack success rate and semantic similarity in most cases. Its advantage can be attributed to its focus on crucial words through the learned word importance ranking and to the search space expanded by beam search. However, LimeAttack requires more queries to compute the word importance ranking because it has no access to the output probability distribution. This is more pronounced on long texts.

Table 16: Comparison with other score-based attacks. ASR. (%, ↑) is the attack success rate, Pert. (%, ↓) is the perturbation rate, Sim. (%, ↑) is the semantic similarity and Query. (↓) is the number of model queries.
Dataset Model Attack ASR.↑ Pert.↓ Sim.↑ Query.↓
MR CNN TF 60.9 5.88 94.21 51.84
MR CNN PWWS 62.4 5.88 92.34 144.37
MR CNN Bert-Attack 46.3 5.75 94.51 28.25
MR CNN LimeAttack 62.5 5.60 95.33 268.94
MR LSTM TF 65.8 5.63 94.56 49.96
MR LSTM Bert-Attack 50.2 5.77 94.4 28.53
MR LSTM LimeAttack 61.2 5.51 95.44 253.07
MR BERT TF 46.5 5.68 94.43 51.48
MR BERT Bert-Attack 35.0 5.82 94.64 28.59
MR BERT LimeAttack 47.6 5.59 94.99 821.28
AG CNN TF 32.1 5.96 94.65 43.67
AG CNN PWWS 32.1 5.94 94.85 47.68
AG CNN LimeAttack 38.1 4.55 96.53 879.23
AG LSTM TF 30.5 5.51 95.40 46.93
AG LSTM PWWS 32.1 5.94 94.85 47.68
AG LSTM LimeAttack 35.4 4.55 96.13 975.35
SST-2 CNN TF 51.0 5.96 93.83 51.67
SST-2 CNN LimeAttack 51.0 5.99 94.90 150.08
SST-2 LSTM TF 52.1 5.93 93.54 50.7
SST-2 LSTM LimeAttack 50.5 6.13 94.70 320.45

Appendix M: Evaluation on Defense Methods

We use A2T (whose core component is a new and cheaper word-substitution attack optimized for adversarial training) and ASCC to strengthen the defenses of BERT on the MR and SST datasets, and then run the attack experiments against the defended models. As shown in Table 17, even after adversarial training and enhancement, our algorithm still retains some attack effectiveness against these defense methods. Compared with A2T, ASCC provides a stronger defense and improves model robustness to a certain degree.

Table 17: The attack performance of different attack algorithms on A2T and ASCC defense methods and original target models in BERT-MR and BERT-SST.
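Attack BERT-MR/origin:ASR BERT-MR/origin:PERT BERT-MR/A2T:ASR BERT-MR/A2T:PERT BERT-MR/ASCC:ASR BERT-MR/ASCC:PERT BERT-SST/origin:ASR BERT-SST/origin:PERT BERT-SST/A2T:ASR BERT-SST/A2T:PERT BERT-SST/ASCC:ASR BERT-SST/ASCC:PERT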
HLBB 26.6 5.6 23.5 5.6 20.1 5.6 23.0 5.8 21.3 6.0 19.3 6.1
TextHoaxer 27.0 5.5 24.3 5.6 21.2 5.7 24.9 5.8 21.8 5.9 20.1 5.9
LeapAttack 26.5 5.4 24.0 5.6 22.3 5.6 26.1 5.8 21.7 5.9 19.6 6.1
TextHacker 26.5 6.5 24.1 6.6 22.5 6.6 25.4 6.3 22.1 6.3 19.1 6.6
LimeAttack 29.2 5.9 25.7 5.8 23.4 5.8 27.8 5.7 22.7 5.9 20.3 6.1

Appendix N: Convergence of Attack Performance

Convergence of Attack Success Rate.

We further evaluate the attacks under a larger query budget of 1000. As shown in Table 18, LimeAttack achieves a higher attack success rate than the other attacks in most settings. The attack success rate without a query budget constraint is an idealized measure that shows the upper limit of an attack algorithm. A high query budget is equivalent to traversing the solution space, so the results approach the upper limits of ASR and Pert for the victim model. However, ASR and Pert interact with each other, so their optima are not necessarily reached together. Therefore, for some victim models (LSTM-AG and BERT-Yahoo), LimeAttack has the lowest Pert but not the best ASR, although it is very close.

Table 18: Attack performance of different attack algorithms on different models and datasets under a query budget of 1000.
Attack CNN_MR:ASR CNN_MR:PERT CNN_SST:ASR CNN_SST:PERT LSTM_MR:ASR LSTM_MR:PERT LSTM_SST:ASR LSTM_SST:PERT LSTM_AG:ASR LSTM_AG:PERT BERT_SST:ASR BERT_SST:PERT BERT_Yahoo:ASR BERT_Yahoo:PERT
HLBB 55.6 5.6 43.4 6.4 54.5 5.6 43.3 6.4 30.4 5.5 30.3 6.7 62.2 6.7
TextHoaxer 55.6 5.4 43.9 6.4 52.9 5.4 45.5 6.3 31.1 5.8 35.9 6.6 63.2 6.6
LeapAttack 56.4 5.5 44.3 6.5 54.6 5.5 44.3 6.2 31.3 5.3 37.5 6.2 63.1 6.4
TextHacker 59.2 5.6 38.0 6.7 56.0 5.6 44.0 6.5 32.0 5.8 38.0 6.0 67.2 6.4
LimeAttack 59.4 5.7 48.6 6.0 59.3 5.5 45.5 5.9 31.2 5.3 42.5 6.1 66.0 6.2

Convergence of Perturbation Rate.

Figure 6 shows the convergence behavior of the different attacks. Because previous algorithms rely on complex optimization procedures, they require a large number of queries to complete this optimization; given enough queries, they therefore often reach good perturbation rates.

Figure 6: Perturbation rate of different attacks on CNN-MR.

Appendix O: Comparison with SHAP and Non-linear Models

In the hard-label setting, the model’s logits are unavailable and the query budget is tiny. We report the attack success rate obtained with different word importance ranking methods under different query budgets. As shown in Table 19, compared with LIME, SHAP and non-linear models offer no significant advantage in attack success rate or perturbation rate under tiny query budgets. Considering time complexity, we adopt LIME to calculate the word importance ranking in the main text.

Table 19: Evaluation of different word importance ranking methods on CNN-MR and BERT-SST under query budgets of 100 and 2000 (column suffixes @100 and @2000).
Method CNN-MR@100:ASR CNN-MR@100:PERT BERT-SST@100:ASR BERT-SST@100:PERT CNN-MR@2000:ASR CNN-MR@2000:PERT BERT-SST@2000:ASR BERT-SST@2000:PERT
LIME 49.9 5.3 27.8 5.7 59.4 5.7 42.5 6.1
SHAP 49.7 5.2 27.7 5.7 61.2 5.8 44.3 6.3
Decision Tree 50.1 5.3 27.9 5.8 61.6 5.8 44.1 6.4
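To make the LIME-based ranking concrete, the following is a simplified, pure-Python sketch of a LIME-style linear surrogate in the hard-label setting. It does not use the `lime` library. It samples random word-deletion masks, records whether the predicted label still matches the original one, weights each sample by an exponential kernel on the fraction of deleted words, and fits a weighted linear model by gradient descent. The kernel, the distance, the optimizer and all default values are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def lime_word_importance(
    words: Sequence[str],
    classify: Callable[[List[str]], int],  # hypothetical hard-label oracle
    num_samples: int = 100,
    kernel_width: float = 0.75,
    steps: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Return one surrogate coefficient per word; larger means more important."""
    rng = random.Random(seed)
    n = len(words)
    original_label = classify(list(words))
    masks: List[List[int]] = []
    targets: List[float] = []
    weights: List[float] = []
    for _ in range(num_samples):
        mask = [rng.randint(0, 1) for _ in range(n)]  # 1 keeps the word
        kept = [w for w, keep in zip(words, mask) if keep]
        masks.append(mask)
        # Hard-label target: 1 if the prediction is unchanged, else 0.
        targets.append(1.0 if classify(kept) == original_label else 0.0)
        distance = 1.0 - sum(mask) / n  # fraction of deleted words
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    total = sum(weights)
    step = 0.5 / (n + 1)  # small enough for stable descent on binary features
    coef, bias = [0.0] * n, 0.0
    for _ in range(steps):
        grad_coef, grad_bias = [0.0] * n, 0.0
        for mask, y, w in zip(masks, targets, weights):
            err = bias + sum(c for c, z in zip(coef, mask) if z) - y
            grad_bias += w * err
            for j, z in enumerate(mask):
                if z:
                    grad_coef[j] += w * err
        bias -= step * 2.0 * grad_bias / total
        coef = [c - step * 2.0 * g / total for c, g in zip(coef, grad_coef)]
    return coef


def toy_classifier(text: List[str]) -> int:
    """Hypothetical victim: predicts 1 only when 'great' is present."""
    return 1 if "great" in text else 0


scores = lime_word_importance(["a", "great", "movie", "indeed"], toy_classifier)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Each sampled mask costs one model query, so `num_samples` is the quantity that trades ranking quality against the query budget discussed above.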
Table 20: Adversarial examples crafted by different attack algorithms on CNN using the SST-2 dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

It allows us hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker.

0
HLBB

It allows us hope that nolan is poised to incur a major career as a commercial yet ingenuity filmmaker.

2062
TextHoaxer

It allows us hope that nolan is poised to start a major career as a commercial yet contrivance filmmaker.

48
LeapAttack

It allows us hope that nolan is poised to embark a major career as a commercial yet contrivance filmmaker.

30
TextHacker

It allows us hope that nolan is readies to embark a major career as a commercial yet creative filmmaker.

101
LimeAttack

It allows us hope that nolan is poised to embark a major career as a commercial yet contrivance filmmaker.

43
Table 21: Adversarial examples crafted by different attack algorithms on BERT using the SST-2 dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

The acting,costumes,music,cinematogrtaphy and sound are all astounding given the production’s austere locales.

0
HLBB

The acting,costumes,music,cinematogrtaphy and sound are all stupendous given the production’s austere locales.

35
TextHoaxer

The acting,costumes,music,cinematogrtaphy and sound are all staggering given the production’s austere locales.

45
LeapAttack

the acting,costumes,music,cinematogrtaphy and sound are all astounding dispensed the production’s austere locales.

35
TextHacker

the provisonal,costumes,music,cinematogrtaphy and sound sunt all startling given the production’s stoic locales.

101
LimeAttack

the acting,costumes,music,cinematogrtaphy and sound are all staggering given the production’s austere locales.

25
Table 22: Adversarial examples crafted by different attack algorithms on LSTM using the Yahoo dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

In basketball whats a suicide? is it like running back and forth? its an exercise where you run the entire court touching down in intnervals until youve completed the exercise on both sides of the court.

0
HLBB

In basket whats a suicide? is it like running back and forth? its an exercise where you run the entire court touching down in intnervals until youve completed the exercise on both sides of the court.

6
TextHoaxer

In wildcats whats a suicide? is it like running back and forth? its an exercise where you run the entire court touching down in intnervals until youve completed the exercise on both sides of the court

6
LeapAttack

In wildcats whats a suicide? is it like running back and forth? its an exercise where you run the entire court touching down in intnervals until youve completed the exercise on both sides of the court.

6
TextHacker

In basketball whats a suicide? is it like running back and forth? its an exercise where you run the entire court touching down in intnervals until havent completed the exercise on both sides of the court.

101
LimeAttack

In basketballs whats a suicide? is it like running back and forth? its an exercise where you run the entire court touching down in intnervals until youve completed the exercise on both sides of the court.

39
Table 23: Adversarial examples crafted by different attack algorithms on CNN using the Yahoo dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

Who was the first indian who became the member of english parliament? dadabhai naoroji preeminent pioneer of indian nationalism freedom fighter and educationist the first indian to become member of british parliament 1862 congress president thrice the grand old man of india.

0
HLBB

Who was the first indian who became the member of english parliament? dadabhai naoroji preeminent groundbreaking of indian nationalistic freedom hunter and educationist the first indian to become member of british parliament 1862 congress president thrice the grand old man of india.

116
TextHoaxer

Who was the first indian who became the member of english parliament? dadabhai naoroji preeminent pioneer of indian nationalism freedom fighter and educationist the first indian to become member of british parliament 1862 congress president thrice the immense old man of indian.

440
LeapAttack

Who was the first indian who became the member of english parliament? dadabhai naoroji preeminent pioneer of indian nationalism liberty hunters and educationist the first indian to become member of british parliament 1862 congress president thrice the grand old man of india.

1411
TextHacker

Who was the first indian who became the member of english parliament? dadabhai naoroji preeminent pioneers of indian nationalism freedom fighter and educationist the first indian to become member of british chambre 1862 congress president thrice the grand old man of india.

101
LimeAttack

Who was the first indian who became the member of english parliament? dadabhai naoroji preeminent pioneer of indian nationalism freedom fighter and educationist the first indian to become member of british legislature 1862 congress president thrice the grand old man of india.

45
Table 24: Adversarial examples crafted by different attack algorithms on CNN using the MR dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

Those outside show business will enjoy a close look at people they do n’t really want to know.

0
HLBB

Those outside show business will enjoy a nearby look at people they do n’t really want to know.

2241
TextHoaxer

Those outside show business will recieve a close look at people they do n’t really want to know.

202
LeapAttack

Those outside show business will like a close glanced at people they do n’t really want to know.

1431
TextHacker

Those outside show companies will experience a close glance at volk they do n’t really want to know.

103
LimeAttack

Those outside show business will recieve a close glanced at people they do n’t really want to know

53
Table 25: Adversarial examples crafted by different attack algorithms on LSTM using the MR dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

I’m convinced i could keep a family of five blind , crippled , amish people alive in this situation better than these british soldiers do at keeping themselves kicking.

0
HLBB

I’m convinced i could keep a family of five blind , invalids , amish people alive in this situation better than these british soldiers do at keeping themselves kicking.

2110
TextHoaxer

I’m gratified i could keep a family of five blind , crippled , amish people alive in this situation better than these british soldiers do at keeping themselves kicking.

219
LeapAttack

I’m contented i could keep a family of five blind , paralytic, amish people alive in this plight better than these british soldiers do at keeping themselves kicking.

2162
TextHacker

I’m convinced i could keep a family of five blind , handicapped , amish people lively in this situation better than these british soldiers do at keeping themselves kicking.

101
LimeAttack

I’m gratified i could keep a family of five blind , crippled , amish people alive in this situation better than these british soldiers do at keeping themselves kicking.

50
Table 26: Adversarial examples crafted by different attack algorithms on LSTM using the AG dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

Spaniards to run luton airport after 551 m deal luton , cardiff and belfast international airports are to fall into the hands of a spanish toll motorways operator through a 551 m takeover of the aviation group tbi by a barcelona based abertis infrastructure.

0
HLBB

spaniards to executes luton airport after 551 m deal luton , cardiff and belfast international airports are to fall into the hands of a spanish toll motorways operator through a 551 m takeover of the aeroplanes group tbi by a barcelona based abertis infrastructure.

969
TextHoaxer

Spaniards to run luton airport after 551 m deal luton , cardiff and belfast international airports are to fall into the manaus of a spanish toll motorways exploiter through a 551 m coup of the aviation group tbi by a barcelona based abertis infrastructure.

727
LeapAttack

Spaniards to run luton airport after 551 m deal luton , cardiff and belfast international airports are to fall into the hands of a spanish toll motorways operator through a 551 m takeover of the aeroplanes group tbi by a barcelona based abertis infrastructure.

2148
TextHacker

Spaniards to implementing luton airport after 551 m deal luton , cardiff and belfast international airports represent to fall into the hands of a spanish toll motorways operator through a 551 m takeover of the aviation group tbi by a barcelona based abertis infrastructure.

101
LimeAttack

Spaniards to run luton luton after 551 m deal luton , cardiff and belfast international airports are to fall into the hands of a spanish toll motorways operator through a 551 m takeover of the aviation group tbi by a barcelona based abertis infrastructure.

92
Table 27: Adversarial examples crafted by different attack algorithms on CNN using the AG dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

Eisner says ovitz required oversight daily michael d eisner appeared for a second day of testimony in the shareholder lawsuit over the lucrative severance package granted to michael s ovitz.

0
HLBB

Eisner says ovitz required oversight daily michael d eisner appeared for a second weekly of testimony in the shareholder lawsuit over the lucrative severance package granted to michael s ovitz.

31
TextHoaxer

Eisner says ovitz required oversight daily michael d eisner appeared for a second day of testimony in the shareholder lawsuit over the interesting severance package granted to michael s ovitz.

48
LeapAttack

Eisner says ovitz needing oversight daily michael d eisner appeared for a second day of testimony in the shareholder lawsuit over the lucrative severance package granted to michael s ovitz.

14
TextHacker

Eisner says ovitz required surveillance everyday michael d eisner appeared for a second day of testimonies in the shareholder lawsuit over the rewarding severance package granted to michael s ovitz.

101
LimeAttack

Eisner says ovitz required oversight daily michael d eisner appeared for a second day of testimony in the proprietors lawsuit over the lucrative severance package granted to michael s ovitz.

34
Table 28: Adversarial examples crafted by different attack algorithms on BERT using the AG dataset. Replacement words are shown in red. Query.↓ is the number of model queries.
Attack Texts Query.
No Attack

Cray promotes two execs ly huong pham becomes the supercomputer maker’s senior vice presdent of operations,and peter ungaro is made senior vice president for sales,marketing and services.

0
HLBB

Cray promotes two execs ly huong pham buys the supercomputer maker’s senior vice presdent of operations,and peter ungaro is made senior obscene chairperson for sales,marketing and services.

3811
TextHoaxer

Hucknall promotes two execs ly huong pham becomes the supercomputer maker’s senior vice president of surgical, and peter ungaro is made senior vice president for sales, marketing and services.

94
LeapAttack

Cray promotes two execs ly huong pham becomes the quadrillion maker’s senior vice presdent of operations,and peter ungaro is made senior vice president for sales,marketing and services.

42
TextHacker

Cray promotes two ceos ly huong pham becomes the supercomputer maker’s senior prostitution presdent of operations,and peter ungaro is made senior vice president for selling,marketing and services.

101
LimeAttack

Cray promotes two execs ly huong pham becomes the thermonuclear maker’s senior vice presdent of operations,and peter ungaro is made senior vice president for sales,marketing and services.

39