Attack Recipes: Command-Line Usage
We provide a number of pre-built attack recipes, which correspond to attacks from the literature.
Help: `textattack --help`

TextAttack's main features can all be accessed via the `textattack` command. Two very common commands are `textattack attack <args>` and `textattack augment <args>`. You can see more information about all commands using `textattack --help`, or about a specific command using, for example, `textattack attack --help`.
The `examples/` folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. The documentation website contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
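As a sketch of the pattern those walkthroughs cover: a transformation proposes perturbed texts, and a constraint filters the candidates. The classes below are simplified, self-contained stand-ins (real TextAttack transformations and constraints subclass `textattack.transformations.Transformation` and `textattack.constraints.Constraint`; the names and swap table here are purely illustrative):

```python
class NumberSwap:
    """Toy transformation: propose one candidate per digit word swapped."""
    SWAPS = {"one": "1", "two": "2", "three": "3"}

    def get_transformations(self, text):
        words = text.split()
        candidates = []
        for i, w in enumerate(words):
            if w.lower() in self.SWAPS:
                perturbed = words.copy()
                perturbed[i] = self.SWAPS[w.lower()]
                candidates.append(" ".join(perturbed))
        return candidates


class MaxWordsPerturbed:
    """Toy constraint: reject candidates differing from the original in more
    than max_num_words positions (in the spirit of max-words-perturbed)."""
    def __init__(self, max_num_words=2):
        self.max_num_words = max_num_words

    def check(self, original, perturbed):
        diffs = sum(a != b for a, b in zip(original.split(), perturbed.split()))
        return diffs <= self.max_num_words


transformation = NumberSwap()
constraint = MaxWordsPerturbed(max_num_words=1)
original = "one fish two fish"
candidates = [c for c in transformation.get_transformations(original)
              if constraint.check(original, c)]
print(candidates)  # → ['1 fish two fish', 'one fish 2 fish']
```

Because each transformation only changes one position, every candidate passes the one-word constraint; a real attack would score these candidates with a goal function.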
Running Attacks: `textattack attack --help`
The easiest way to try out an attack is via the command-line interface, `textattack attack`.

Tip: If your machine has multiple GPUs, you can distribute the attack across them using the `--parallel` option. For some attacks, this can significantly improve performance.
Here are some concrete examples:
TextFooler on BERT trained on the MR sentiment classification dataset:

```bash
textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100
```
DeepWordBug on DistilBERT trained on the CoLA linguistic acceptability dataset:

```bash
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```
Beam search with beam width 4, a word embedding transformation, and an untargeted goal function on an LSTM:

```bash
textattack attack --model lstm-mr --num-examples 20 \
 --search-method beam-search^beam_width=4 --transformation word-swap-embedding \
 --constraints repeat stopword max-words-perturbed^max_num_words=2 embedding^min_cos_sim=0.8 part-of-speech \
 --goal-function untargeted-classification
```
Tip: Instead of specifying a dataset and number of examples, you can pass `--interactive` to attack samples inputted by the user.
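To illustrate how a search method like the beam search composed above explores the space of perturbations, here is a minimal, self-contained sketch. The swap table and scoring function are toy stand-ins for the transformation and goal function that TextAttack would supply; none of these names come from the TextAttack API:

```python
# Toy synonym table standing in for a word-swap transformation.
SWAPS = {"good": ["fine", "nice"], "movie": ["film", "flick"]}

def neighbors(text):
    """All texts reachable by swapping one word for a listed alternative."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        for alt in SWAPS.get(w, []):
            cand = words.copy()
            cand[i] = alt
            out.append(" ".join(cand))
    return out

def score(text):
    """Stand-in for a goal function: pretend the model dislikes these words."""
    return text.count("film") + 0.5 * text.count("fine")

def beam_search(text, beam_width=4, steps=2):
    """Keep the beam_width highest-scoring candidates at each step."""
    beam = [text]
    for _ in range(steps):
        candidates = {c for t in beam for c in neighbors(t)}
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]

print(beam_search("good movie"))  # → fine film
```

A beam width of 1 reduces this to plain greedy search; wider beams trade more model queries for a better chance of escaping local optima.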
Attacks and Papers Implemented ("Attack Recipes"): `textattack attack --recipe [recipe_name]`
We include attack recipes which implement attacks from the literature. You can list attack recipes using `textattack list attack-recipes`.

To run an attack recipe: `textattack attack --recipe [recipe_name]`
Attack Recipe Name | Goal Function | Constraints Enforced | Transformation | Search Method | Main Idea |
---|---|---|---|---|---|
Attacks on classification tasks, like sentiment classification and entailment: | |||||
`alzantot` | Untargeted {Classification, Entailment} | Percentage of words perturbed, Language Model perplexity, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | From "Generating Natural Language Adversarial Examples" (Alzantot et al., 2018) |
`bae` | Untargeted Classification | USE sentence encoding cosine similarity | BERT Masked Token Prediction | Greedy-WIR | BERT masked language model transformation attack from "BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2019) |
`bert-attack` | Untargeted Classification | USE sentence encoding cosine similarity, Maximum number of words perturbed | BERT Masked Token Prediction (with subword expansion) | Greedy-WIR | From "BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020) |
`checklist` | {Untargeted, Targeted} Classification | CheckList distance | Contract, extend, and substitute named entities | Greedy-WIR | Invariance testing implemented in CheckList, from "Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" (Ribeiro et al., 2020) |
`clare` | Untargeted {Classification, Entailment} | USE sentence encoding cosine similarity | RoBERTa masked prediction for token swap, insertion, and merge | Greedy | From "Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020) |
`deepwordbug` | {Untargeted, Targeted} Classification | Levenshtein edit distance | {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} | Greedy-WIR | Greedy replace-1 scoring and multi-transformation character-swap attack, from "Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018) |
`faster-alzantot` | Untargeted {Classification, Entailment} | Percentage of words perturbed, Language Model perplexity, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | Modified, faster version of the Alzantot et al. genetic algorithm, from "Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019) |
`hotflip` (word swap) | Untargeted Classification | Word Embedding Cosine Similarity, Part-of-speech match, Number of words perturbed | Gradient-Based Word Swap | Beam search | From "HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017) |
`iga` | Untargeted {Classification, Entailment} | Percentage of words perturbed, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | Improved genetic-algorithm-based word substitution, from "Natural Language Adversarial Attacks and Defenses in Word Level" (Wang et al., 2019) |
`input-reduction` | Input Reduction | | Word deletion | Greedy-WIR | Greedy attack with word importance ranking that reduces the input while maintaining the prediction, from "Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018) |
`kuleshov` | Untargeted Classification | Thought vector encoding cosine similarity, Language model similarity probability | Counter-fitted word embedding swap | Greedy word swap | From "Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018) |
`pruthi` | Untargeted Classification | Minimum word length, Maximum number of words perturbed | {Neighboring Character Swap, Character Deletion, Character Insertion, Keyboard-Based Character Swap} | Greedy search | Simulates common typos, from "Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019) |
`pso` | Untargeted Classification | | HowNet Word Swap | Particle Swarm Optimization | From "Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020) |
`pwws` | Untargeted Classification | | WordNet-based synonym swap | Greedy-WIR (saliency) | Greedy attack with word importance ranking based on word saliency and synonym swap scores, from "Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019) |
`textbugger` (black-box) | Untargeted Classification | USE sentence encoding cosine similarity | {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} | Greedy-WIR | From "TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018) |
`textfooler` | Untargeted {Classification, Entailment} | Word Embedding Distance, Part-of-speech match, USE sentence encoding cosine similarity | Counter-fitted word embedding swap | Greedy-WIR | Greedy attack with word importance ranking, from "Is BERT Really Robust?" (Jin et al., 2019) |
Attacks on sequence-to-sequence models: | |||||
`morpheus` | Minimum BLEU Score | | Inflection Word Swap | Greedy search | Greedily replaces words with their inflections to minimize BLEU score, from "It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations" (Tan et al., 2020) |
`seq2sick` (black-box) | Non-overlapping output | | Counter-fitted word embedding swap | Greedy-WIR | Greedy attack with the goal of changing every word in the output translation; currently implemented as black-box, with plans for a white-box version as in the paper, from "Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018) |
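Many of the recipes above use Greedy-WIR: greedy search with word importance ranking. The idea is to rank words by how much removing each one changes the model's score, then try perturbations at positions in that order until the goal is met. The model and synonym table below are toy stand-ins for a real victim model and transformation, just to make the ranking-then-swapping loop concrete:

```python
# Toy synonym table standing in for a word-swap transformation.
SYNONYMS = {"terrible": "bad", "boring": "dull"}

def model_score(text):
    """Stand-in victim model: 'confidence' that the text is negative."""
    return 0.6 * ("terrible" in text) + 0.4 * ("boring" in text)

def greedy_wir_attack(text, threshold=0.5):
    words = text.split()
    base = model_score(text)
    # Word importance = score drop when the word is deleted.
    importance = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        importance.append((base - model_score(reduced), i))
    # Perturb positions from most to least important until score < threshold.
    for _, i in sorted(importance, reverse=True):
        if words[i] in SYNONYMS:
            words[i] = SYNONYMS[words[i]]
            if model_score(" ".join(words)) < threshold:
                return " ".join(words)
    return None  # attack failed

print(greedy_wir_attack("a terrible and boring movie"))
# → a bad and boring movie
```

Here "terrible" contributes most to the score, so it is swapped first and a single substitution already drops the score below the threshold. Real recipes differ mainly in how they compute importance (deletion, masking, saliency) and which transformations and constraints they apply at each position.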
Recipe Usage Examples
Here are some examples of running attacks from the literature from the command line:
TextFooler against BERT fine-tuned on SST-2:

```bash
textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examples 10
```
seq2sick (black-box) against T5 fine-tuned for English-German translation:

```bash
textattack attack --model t5-en-de --recipe seq2sick --num-examples 100
```