Attack Recipes: Command-Line Usage
We provide a number of pre-built attack recipes, which correspond to attacks from the literature.
Help: `textattack --help`

TextAttack's main features can all be accessed via the `textattack` command. Two very common commands are `textattack attack <args>` and `textattack augment <args>`. You can see more information about all commands using `textattack --help`, or about a specific command using, for example, `textattack attack --help`.
The `examples/` folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. The documentation website contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
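As a sketch of the pattern those walkthroughs cover: a transformation proposes perturbed texts, and a constraint filters the candidates. The classes below are simplified, self-contained stand-ins (real TextAttack transformations and constraints subclass `textattack.transformations.Transformation` and `textattack.constraints.Constraint`; the names and swap table here are purely illustrative):

```python
class NumberSwap:
    """Toy transformation: propose one candidate per digit word swapped."""
    SWAPS = {"one": "1", "two": "2", "three": "3"}

    def get_transformations(self, text):
        words = text.split()
        candidates = []
        for i, w in enumerate(words):
            if w.lower() in self.SWAPS:
                perturbed = words.copy()
                perturbed[i] = self.SWAPS[w.lower()]
                candidates.append(" ".join(perturbed))
        return candidates


class MaxWordsPerturbed:
    """Toy constraint: reject candidates differing from the original in more
    than max_num_words positions (in the spirit of max-words-perturbed)."""
    def __init__(self, max_num_words=2):
        self.max_num_words = max_num_words

    def check(self, original, perturbed):
        diffs = sum(a != b for a, b in zip(original.split(), perturbed.split()))
        return diffs <= self.max_num_words


transformation = NumberSwap()
constraint = MaxWordsPerturbed(max_num_words=1)
original = "one fish two fish"
candidates = [c for c in transformation.get_transformations(original)
              if constraint.check(original, c)]
print(candidates)  # → ['1 fish two fish', 'one fish 2 fish']
```

Because each transformation only changes one position, every candidate passes the one-word constraint; a real attack would score these candidates with a goal function.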
Running Attacks: `textattack attack --help`
The easiest way to try out an attack is via the command-line interface, `textattack attack`.

Tip: If your machine has multiple GPUs, you can distribute the attack across them using the `--parallel` option. For some attacks, this can significantly improve performance.
Here are some concrete examples:
TextFooler on BERT trained on the MR sentiment classification dataset:

```bash
textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100
```
DeepWordBug on DistilBERT trained on the CoLA linguistic acceptability dataset:

```bash
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```
Beam search with beam width 4, a word embedding transformation, and an untargeted goal function on an LSTM:

```bash
textattack attack --model lstm-mr --num-examples 20 \
 --search-method beam-search^beam_width=4 --transformation word-swap-embedding \
 --constraints repeat stopword max-words-perturbed^max_num_words=2 embedding^min_cos_sim=0.8 part-of-speech \
 --goal-function untargeted-classification
```
Tip: Instead of specifying a dataset and number of examples, you can pass `--interactive` to attack samples inputted by the user.
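To illustrate how a search method like the beam search composed above explores the space of perturbations, here is a minimal, self-contained sketch. The swap table and scoring function are toy stand-ins for the transformation and goal function that TextAttack would supply; none of these names come from the TextAttack API:

```python
# Toy synonym table standing in for a word-swap transformation.
SWAPS = {"good": ["fine", "nice"], "movie": ["film", "flick"]}

def neighbors(text):
    """All texts reachable by swapping one word for a listed alternative."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        for alt in SWAPS.get(w, []):
            cand = words.copy()
            cand[i] = alt
            out.append(" ".join(cand))
    return out

def score(text):
    """Stand-in for a goal function: pretend the model dislikes these words."""
    return text.count("film") + 0.5 * text.count("fine")

def beam_search(text, beam_width=4, steps=2):
    """Keep the beam_width highest-scoring candidates at each step."""
    beam = [text]
    for _ in range(steps):
        candidates = {c for t in beam for c in neighbors(t)}
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]

print(beam_search("good movie"))  # → fine film
```

A beam width of 1 reduces this to plain greedy search; wider beams trade more model queries for a better chance of escaping local optima.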
Attacks and Papers Implemented ("Attack Recipes"): `textattack attack --recipe [recipe_name]`
We include attack recipes which implement attacks from the literature. You can list attack recipes using `textattack list attack-recipes`.

To run an attack recipe: `textattack attack --recipe [recipe_name]`
Attack Recipe Name | Goal Function | Constraints Enforced | Transformation | Search Method | Main Idea |
---|---|---|---|---|---|
Attacks on classification tasks, like sentiment classification and entailment: | |||||
`alzantot` | Untargeted {Classification, Entailment} | Percentage of words perturbed, Language Model perplexity, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | From "Generating Natural Language Adversarial Examples" (Alzantot et al., 2018) |
`bae` | Untargeted Classification | USE sentence encoding cosine similarity | BERT Masked Token Prediction | Greedy-WIR | BERT masked language model transformation attack from "BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2019) |
`bert-attack` | Untargeted Classification | USE sentence encoding cosine similarity, Maximum number of words perturbed | BERT Masked Token Prediction (with subword expansion) | Greedy-WIR | From "BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020) |
`checklist` | {Untargeted, Targeted} Classification | CheckList distance | Contract, extend, and substitute named entities | Greedy-WIR | Invariance testing implemented in CheckList, from "Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" (Ribeiro et al., 2020) |
`clare` | Untargeted {Classification, Entailment} | USE sentence encoding cosine similarity | RoBERTa masked prediction for token swap, insertion, and merge | Greedy | From "Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020) |
`deepwordbug` | {Untargeted, Targeted} Classification | Levenshtein edit distance | {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} | Greedy-WIR | Greedy replace-1 scoring and multi-transformation character-swap attack, from "Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018) |
`faster-alzantot` | Untargeted {Classification, Entailment} | Percentage of words perturbed, Language Model perplexity, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | Modified, faster version of the Alzantot et al. genetic algorithm, from "Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019) |
`hotflip` (word swap) | Untargeted Classification | Word Embedding Cosine Similarity, Part-of-speech match, Number of words perturbed | Gradient-Based Word Swap | Beam search | From "HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017) |
`iga` | Untargeted {Classification, Entailment} | Percentage of words perturbed, Word embedding distance | Counter-fitted word embedding swap | Genetic Algorithm | Improved genetic-algorithm-based word substitution, from "Natural Language Adversarial Attacks and Defenses in Word Level" (Wang et al., 2019) |
`input-reduction` | Input Reduction | | Word deletion | Greedy-WIR | Greedy attack with word importance ranking that reduces the input while maintaining the prediction, from "Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018) |
`kuleshov` | Untargeted Classification | Thought vector encoding cosine similarity, Language model similarity probability | Counter-fitted word embedding swap | Greedy word swap | From "Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018) |
`pruthi` | Untargeted Classification | Minimum word length, Maximum number of words perturbed | {Neighboring Character Swap, Character Deletion, Character Insertion, Keyboard-Based Character Swap} | Greedy search | Simulates common typos, from "Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019) |
`pso` | Untargeted Classification | | HowNet Word Swap | Particle Swarm Optimization | From "Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020) |
`pwws` | Untargeted Classification | | WordNet-based synonym swap | Greedy-WIR (saliency) | Greedy attack with word importance ranking based on word saliency and synonym swap scores, from "Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019) |
`textbugger` (black-box) | Untargeted Classification | USE sentence encoding cosine similarity | {Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution} | Greedy-WIR | From "TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018) |
`textfooler` | Untargeted {Classification, Entailment} | Word Embedding Distance, Part-of-speech match, USE sentence encoding cosine similarity | Counter-fitted word embedding swap | Greedy-WIR | Greedy attack with word importance ranking, from "Is BERT Really Robust?" (Jin et al., 2019) |
Attacks on sequence-to-sequence models: | |||||
`morpheus` | Minimum BLEU Score | | Inflection Word Swap | Greedy search | Greedily replaces words with their inflections to minimize BLEU score, from "It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations" (Tan et al., 2020) |
`seq2sick` (black-box) | Non-overlapping output | | Counter-fitted word embedding swap | Greedy-WIR | Greedy attack with the goal of changing every word in the output translation; currently implemented as black-box, with plans for a white-box version as in the paper, from "Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018) |
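Many of the recipes above use Greedy-WIR: greedy search with word importance ranking. The idea is to rank words by how much removing each one changes the model's score, then try perturbations at positions in that order until the goal is met. The model and synonym table below are toy stand-ins for a real victim model and transformation, just to make the ranking-then-swapping loop concrete:

```python
# Toy synonym table standing in for a word-swap transformation.
SYNONYMS = {"terrible": "bad", "boring": "dull"}

def model_score(text):
    """Stand-in victim model: 'confidence' that the text is negative."""
    return 0.6 * ("terrible" in text) + 0.4 * ("boring" in text)

def greedy_wir_attack(text, threshold=0.5):
    words = text.split()
    base = model_score(text)
    # Word importance = score drop when the word is deleted.
    importance = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        importance.append((base - model_score(reduced), i))
    # Perturb positions from most to least important until score < threshold.
    for _, i in sorted(importance, reverse=True):
        if words[i] in SYNONYMS:
            words[i] = SYNONYMS[words[i]]
            if model_score(" ".join(words)) < threshold:
                return " ".join(words)
    return None  # attack failed

print(greedy_wir_attack("a terrible and boring movie"))
# → a bad and boring movie
```

Here "terrible" contributes most to the score, so it is swapped first and a single substitution already drops the score below the threshold. Real recipes differ mainly in how they compute importance (deletion, masking, saliency) and which transformations and constraints they apply at each position.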
Recipe Usage Examples
Here are some examples of running attacks from the literature from the command line:
TextFooler against BERT fine-tuned on SST-2:

```bash
textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examples 10
```
seq2sick (black-box) against T5 fine-tuned for English-German translation:

```bash
textattack attack --model t5-en-de --recipe seq2sick --num-examples 100
```