Attack Recipes CommandLine Use

We provide a number of pre-built attack recipes, which correspond to attacks from the literature.

Help: textattack --help

TextAttack’s main features can all be accessed via the textattack command. Two very common commands are textattack attack <args>, and textattack augment <args>. You can see more information about all commands using

textattack --help 

or a specific command using, for example,

textattack attack --help

The examples/ folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.

The documentation website contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint..

Running Attacks: textattack attack --help

The easiest way to try out an attack is via the command-line interface, textattack attack.

Tip: If your machine has multiple GPUs, you can distribute the attack across them using the --parallel option. For some attacks, this can really help performance.

Here are some concrete examples:

TextFooler on BERT trained on the MR sentiment classification dataset:

textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100

DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset:

textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100

Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM:

textattack attack --model lstm-mr --num-examples 20 \
 --search-method beam-search^beam_width=4 --transformation word-swap-embedding \
 --constraints repeat stopword max-words-perturbed^max_num_words=2 embedding^min_cos_sim=0.8 part-of-speech \
 --goal-function untargeted-classification

Tip: Instead of specifying a dataset and number of examples, you can pass --interactive to attack samples inputted by the user.

Attacks and Papers Implemented (”Attack Recipes”): textattack attack --recipe [recipe_name]

We include attack recipes which implement attacks from the literature. You can list attack recipes using textattack list attack-recipes.

To run an attack recipe: textattack attack --recipe [recipe_name]

Attack Recipe Name Goal Function ConstraintsEnforced Transformation Search Method Main Idea

Attacks on classification tasks, like sentiment classification and entailment:
alzantot Untargeted {Classification, Entailment} Percentage of words perturbed, Language Model perplexity, Word embedding distance Counter-fitted word embedding swap Genetic Algorithm from (["Generating Natural Language Adversarial Examples" (Alzantot et al., 2018)](https://arxiv.org/abs/1804.07998))
bae Untargeted Classification USE sentence encoding cosine similarity BERT Masked Token Prediction Greedy-WIR BERT masked language model transformation attack from (["BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2019)](https://arxiv.org/abs/2004.01970)).
bert-attack Untargeted Classification USE sentence encoding cosine similarity, Maximum number of words perturbed BERT Masked Token Prediction (with subword expansion) Greedy-WIR (["BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020)](https://arxiv.org/abs/2004.09984))
checklist {Untargeted, Targeted} Classification checklist distance contract, extend, and substitutes name entities Greedy-WIR Invariance testing implemented in CheckList . (["Beyond Accuracy: Behavioral Testing of NLP models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118))
clare Untargeted {Classification, Entailment} USE sentence encoding cosine similarity RoBERTa Masked Prediction for token swap, insert and merge Greedy ["Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020)](https://arxiv.org/abs/2009.07502))
deepwordbug {Untargeted, Targeted} Classification Levenshtein edit distance {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} Greedy-WIR Greedy replace-1 scoring and multi-transformation character-swap attack (["Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018)](https://arxiv.org/abs/1801.04354)
fast-alzantot Untargeted {Classification, Entailment} Percentage of words perturbed, Language Model perplexity, Word embedding distance Counter-fitted word embedding swap Genetic Algorithm Modified, faster version of the Alzantot et al. genetic algorithm, from (["Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019)](https://arxiv.org/abs/1909.00986))
hotflip (word swap) Untargeted Classification Word Embedding Cosine Similarity, Part-of-speech match, Number of words perturbed Gradient-Based Word Swap Beam search (["HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017)](https://arxiv.org/abs/1712.06751))
iga Untargeted {Classification, Entailment} Percentage of words perturbed, Word embedding distance Counter-fitted word embedding swap Genetic Algorithm Improved genetic algorithm -based word substitution from (["Natural Language Adversarial Attacks and Defenses in Word Level (Wang et al., 2019)"](https://arxiv.org/abs/1909.06723)
input-reduction Input Reduction Word deletion Greedy-WIR Greedy attack with word importance ranking , Reducing the input while maintaining the prediction through word importance ranking (["Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018)](https://arxiv.org/pdf/1804.07781.pdf))
kuleshov Untargeted Classification Thought vector encoding cosine similarity, Language model similarity probability Counter-fitted word embedding swap Greedy word swap (["Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018)](https://openreview.net/pdf?id=r1QZ3zbAZ))
pruthi Untargeted Classification Minimum word length, Maximum number of words perturbed {Neighboring Character Swap, Character Deletion, Character Insertion, Keyboard-Based Character Swap} Greedy search simulates common typos (["Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019)](https://arxiv.org/abs/1905.11268)
pso Untargeted Classification HowNet Word Swap Particle Swarm Optimization (["Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.540/))
pwws Untargeted Classification WordNet-based synonym swap Greedy-WIR (saliency) Greedy attack with word importance ranking based on word saliency and synonym swap scores (["Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019)](https://www.aclweb.org/anthology/P19-1103/))
textbugger : (black-box) Untargeted Classification USE sentence encoding cosine similarity {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} Greedy-WIR ([(["TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018)](https://arxiv.org/abs/1812.05271)).
textfooler Untargeted {Classification, Entailment} Word Embedding Distance, Part-of-speech match, USE sentence encoding cosine similarity Counter-fitted word embedding swap Greedy-WIR Greedy attack with word importance ranking (["Is Bert Really Robust?" (Jin et al., 2019)](https://arxiv.org/abs/1907.11932))

Attacks on sequence-to-sequence models:
morpheus Minimum BLEU Score Inflection Word Swap Greedy search Greedy to replace words with their inflections with the goal of minimizing BLEU score (["It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations"](https://www.aclweb.org/anthology/2020.acl-main.263.pdf)
seq2sick :(black-box) Non-overlapping output Counter-fitted word embedding swap Greedy-WIR Greedy attack with goal of changing every word in the output translation. Currently implemented as black-box with plans to change to white-box as done in paper (["Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)](https://arxiv.org/abs/1803.01128))

Recipe Usage Examples

Here are some examples of testing attacks from the literature from the command-line:

TextFooler against BERT fine-tuned on SST-2:

textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examples 10

seq2sick (black-box) against T5 fine-tuned for English-German translation:

 textattack attack --model t5-en-de --recipe seq2sick --num-examples 100