Attack Recipes: Command-Line Use

We provide a number of pre-built attack recipes, which correspond to attacks from the literature.

Help: textattack --help

TextAttack’s main features can all be accessed via the textattack command. Two very common commands are textattack attack <args> and textattack augment <args>. You can see more information about all commands using

textattack --help

or a specific command using, for example,

textattack attack --help

The examples/ folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.

The documentation website contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.

Running Attacks: textattack attack --help

The easiest way to try out an attack is via the command-line interface, textattack attack.

Tip: If your machine has multiple GPUs, you can distribute the attack across them with the --parallel option. For some attacks, this can significantly reduce runtime.
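As a sketch, the --parallel flag can simply be appended to any of the example commands that follow (this assumes a multi-GPU machine and reuses the TextFooler example's model and recipe):

```shell
# Run the TextFooler recipe against BERT fine-tuned on MR, distributing
# the 100 examples across all available GPUs via --parallel.
textattack attack --recipe textfooler --model bert-base-uncased-mr \
    --num-examples 100 --parallel
```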

Here are some concrete examples:

TextFooler on BERT trained on the MR sentiment classification dataset:

textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100

DeepWordBug on DistilBERT trained on the CoLA linguistic acceptability dataset:

textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100

Beam search (beam width 4) with a word embedding transformation and an untargeted goal function on an LSTM:

textattack attack --model lstm-mr --num-examples 20 \
 --search-method beam-search^beam_width=4 --transformation word-swap-embedding \
 --constraints repeat stopword max-words-perturbed^max_num_words=2 embedding^min_cos_sim=0.8 part-of-speech \
 --goal-function untargeted-classification

Tip: Instead of specifying a dataset and number of examples, you can pass --interactive to attack samples entered by the user.
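For instance, a minimal interactive session might look like the following (this sketch reuses the lstm-mr model from the example above and assumes --interactive composes with --recipe the same way the dataset options do):

```shell
# Attack sentences typed in by the user, one at a time, instead of
# drawing examples from a fixed dataset.
textattack attack --model lstm-mr --recipe textfooler --interactive
```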

Attacks and Papers Implemented ("Attack Recipes"): textattack attack --recipe [recipe_name]

We include attack recipes which implement attacks from the literature. You can list attack recipes using textattack list attack-recipes.

To run an attack recipe: textattack attack --recipe [recipe_name]

Attacks on classification tasks, like sentiment classification and entailment:

| Attack Recipe Name | Goal Function | Constraints Enforced | Transformation | Search Method | Main Idea |
| --- | --- | --- | --- | --- | --- |
| alzantot | Untargeted {Classification, Entailment} | Percentage of words perturbed, Language Model perplexity, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | From "Generating Natural Language Adversarial Examples" (Alzantot et al., 2018) |
| bae | Untargeted Classification | USE sentence encoding cosine similarity | BERT Masked Token Prediction | Greedy-WIR | BERT masked language model transformation attack, from "BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2019) |
| bert-attack | Untargeted Classification | USE sentence encoding cosine similarity, Maximum number of words perturbed | BERT Masked Token Prediction (with subword expansion) | Greedy-WIR | From "BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020) |
| checklist | {Untargeted, Targeted} Classification | checklist distance | contract, extend, and substitute named entities | Greedy-WIR | Invariance testing implemented in CheckList, from "Beyond Accuracy: Behavioral Testing of NLP models with CheckList" (Ribeiro et al., 2020) |
| clare | Untargeted {Classification, Entailment} | USE sentence encoding cosine similarity | RoBERTa masked prediction for token swap, insert, and merge | Greedy | From "Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020) |
| deepwordbug | {Untargeted, Targeted} Classification | Levenshtein edit distance | {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} | Greedy-WIR | Greedy replace-1 scoring and multi-transformation character-swap attack, from "Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018) |
| faster-alzantot | Untargeted {Classification, Entailment} | Percentage of words perturbed, Language Model perplexity, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | Modified, faster version of the Alzantot et al. genetic algorithm, from "Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019) |
| hotflip (word swap) | Untargeted Classification | Word Embedding Cosine Similarity, Part-of-speech match, Number of words perturbed | Gradient-Based Word Swap | Beam search | From "HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017) |
| iga | Untargeted {Classification, Entailment} | Percentage of words perturbed, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | Improved genetic algorithm-based word substitution, from "Natural Language Adversarial Attacks and Defenses in Word Level" (Wang et al., 2019) |
| input-reduction | Input Reduction | | Word deletion | Greedy-WIR | Greedy attack that reduces the input while maintaining the prediction through word importance ranking, from "Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018) |
| kuleshov | Untargeted Classification | Thought vector encoding cosine similarity, Language model similarity probability | Counter-fitted word embedding swap | Greedy word swap | From "Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018) |
| pruthi | Untargeted Classification | Minimum word length, Maximum number of words perturbed | {Neighboring Character Swap, Character Deletion, Character Insertion, Keyboard-Based Character Swap} | Greedy search | Simulates common typos, from "Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019) |
| pso | Untargeted Classification | | HowNet Word Swap | Particle Swarm Optimization | From "Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020) |
| pwws | Untargeted Classification | | WordNet-based synonym swap | Greedy-WIR (saliency) | Greedy attack with word importance ranking based on word saliency and synonym swap scores, from "Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019) |
| textbugger (black-box) | Untargeted Classification | USE sentence encoding cosine similarity | {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} | Greedy-WIR | From "TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018) |
| textfooler | Untargeted {Classification, Entailment} | Word Embedding Distance, Part-of-speech match, USE sentence encoding cosine similarity | Counter-fitted word embedding swap | Greedy-WIR | Greedy attack with word importance ranking, from "Is BERT Really Robust?" (Jin et al., 2019) |

Attacks on sequence-to-sequence models:

| Attack Recipe Name | Goal Function | Constraints Enforced | Transformation | Search Method | Main Idea |
| --- | --- | --- | --- | --- | --- |
| morpheus | Minimum BLEU Score | | Inflection Word Swap | Greedy search | Greedily replaces words with their inflections to minimize BLEU score, from "It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations" (Tan et al., 2020) |
| seq2sick (black-box) | Non-overlapping output | | Counter-fitted word embedding swap | Greedy-WIR | Greedy attack with the goal of changing every word in the output translation; currently implemented as black-box, with plans to change to white-box as done in the paper, from "Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018) |

Recipe Usage Examples

Here are some examples of running attacks from the literature via the command line:

TextFooler against BERT fine-tuned on SST-2:

textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examples 10

seq2sick (black-box) against T5 fine-tuned for English-German translation:

textattack attack --model t5-en-de --recipe seq2sick --num-examples 100