TextAttack & AllenNLP

This example tests adversarial attacks from TextAttack against pretrained models provided by AllenNLP.

In a few lines of code, we load a sentiment analysis model trained on the Stanford Sentiment Treebank and wrap it in a TextAttack model wrapper. Then we initialize the TextBugger attack and run it on a few samples from the SST-2 train set.

For more information on AllenNLP pretrained models: https://docs.allennlp.org/models/main/

For more information about the TextBugger attack: https://arxiv.org/abs/1812.05271
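
To give a feel for what TextBugger does before we run it, here is a minimal standard-library sketch of its four character-level "bugs": inserting a space, deleting a character, swapping two neighboring characters, and substituting a look-alike character. This is an illustration only, not TextAttack's implementation. The function names and the tiny `HOMOGLYPHS` table are assumptions made for this example, and the fifth transformation (swapping in a nearby word from an embedding space) is left out because it needs pretrained embeddings.

```python
import random

# Tiny look-alike table, assumed for illustration; real tables are larger.
HOMOGLYPHS = {"o": "0", "l": "1", "a": "@", "e": "3", "i": "1", "s": "$"}


def insert_space(word, rng):
    """Insert a space at a random interior position."""
    if len(word) < 2:
        return word
    i = rng.randrange(1, len(word))
    return word[:i] + " " + word[i:]


def delete_char(word, rng):
    """Delete one random interior character; first and last are kept."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]


def swap_neighbors(word, rng):
    """Swap two adjacent interior characters; first and last are kept."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def homoglyph_swap(word, rng):
    """Replace one character that has a look-alike in HOMOGLYPHS."""
    positions = [i for i, c in enumerate(word) if c in HOMOGLYPHS]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + HOMOGLYPHS[word[i]] + word[i + 1:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    for bug in (insert_space, delete_char, swap_neighbors, homoglyph_swap):
        print(f"{bug.__name__}: {bug('worst', rng)!r}")
```

In the real attack, candidate perturbations like these are generated for the words that matter most to the model's prediction, and a semantic-similarity constraint filters out candidates that drift too far from the original sentence.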


[1]:
!pip install textattack allennlp allennlp_models > /dev/null
[2]:
from allennlp.predictors import Predictor
import allennlp_models.classification

import textattack

class AllenNLPModel(textattack.models.wrappers.ModelWrapper):
    def __init__(self):
        self.predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz")
        self.model = self.predictor._model
        self.tokenizer = self.predictor._dataset_reader._tokenizer

    def __call__(self, text_input_list):
        outputs = []
        for text_input in text_input_list:
            # Use the Predictor (not the underlying model), which accepts raw text.
            outputs.append(self.predictor.predict(sentence=text_input))
        # For each output, output['logits'] contains the logits, where
        # index 0 corresponds to the positive score and index 1 to the
        # negative score. We reverse each list (by reverse slicing, [::-1])
        # so that negative comes first and positive comes second.
        return [output['logits'][::-1] for output in outputs]

model_wrapper = AllenNLPModel()
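
The reordering in `__call__` matters because the SST-2 labels in GLUE use 0 for negative and 1 for positive, so the wrapper's output columns have to line up with those label indices. The sketch below shows the effect on a single made-up logit pair. The values in `raw_logits` are hypothetical, and `softmax` is a small helper written for this example rather than a TextAttack or AllenNLP function.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw output for one sentence, in the model's own order:
# index 0 = positive, index 1 = negative.
raw_logits = [2.0, -1.0]

# After reversing: index 0 = negative, index 1 = positive (SST-2 order).
reordered = raw_logits[::-1]

probs = softmax(reordered)
predicted_label = probs.index(max(probs))  # 1 means positive
print(reordered, predicted_label)
```

Without the reversal, a confidently positive sentence would be read as label 0 (negative), and every sample would look misclassified before the attack even starts.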
[3]:
from textattack.datasets import HuggingFaceDataset
from textattack.attack_recipes import TextBuggerLi2018
from textattack.attacker import Attacker


dataset = HuggingFaceDataset("glue", "sst2", "train")
attack = TextBuggerLi2018.build(model_wrapper)

attacker = Attacker(attack, dataset)
attacker.attack_dataset()
Reusing dataset glue (/p/qdata/jy2ma/.cache/textattack/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
textattack: Loading datasets dataset glue, subset sst2, split train.
textattack: Unknown if model of class <class 'allennlp.predictors.text_classifier.TextClassifierPredictor'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
  0%|          | 0/10 [00:00<?, ?it/s]
Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapRandomCharacterInsertion(
        (random_one):  True
      )
    (1): WordSwapRandomCharacterDeletion(
        (random_one):  True
      )
    (2): WordSwapNeighboringCharacterSwap(
        (random_one):  True
      )
    (3): WordSwapHomoglyphSwap
    (4): WordSwapEmbedding(
        (max_candidates):  5
        (embedding):  WordEmbedding
      )
    )
  (constraints):
    (0): UniversalSentenceEncoder(
        (metric):  angular
        (threshold):  0.8
        (window_size):  inf
        (skip_text_shorter_than_window):  False
        (compare_against_original):  True
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
)

Using /p/qdata/jy2ma/.cache/textattack to cache modules.
[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2:  20%|██        | 2/10 [00:06<00:27,  3.46s/it]
--------------------------------------------- Result 1 ---------------------------------------------
Negative (95%) --> Positive (93%)

hide new secretions from the parental units

concealing new secretions from the parental units


--------------------------------------------- Result 2 ---------------------------------------------
Negative (96%) --> [FAILED]

contains no wit , only labored gags


[Succeeded / Failed / Skipped / Total] 1 / 2 / 1 / 4:  40%|████      | 4/10 [00:07<00:10,  1.80s/it]
--------------------------------------------- Result 3 ---------------------------------------------
Positive (100%) --> [FAILED]

that loves its characters and communicates something rather beautiful about human nature


--------------------------------------------- Result 4 ---------------------------------------------
Positive (82%) --> [SKIPPED]

remains utterly satisfied to remain the same throughout


[Succeeded / Failed / Skipped / Total] 2 / 2 / 1 / 5:  50%|█████     | 5/10 [00:07<00:07,  1.52s/it]
--------------------------------------------- Result 5 ---------------------------------------------
Negative (98%) --> Positive (52%)

on the worst revenge-of-the-nerds clichés the filmmakers could dredge up

on the pire reveng-of-the-nerds clichés the filmmakers could dragging up


[Succeeded / Failed / Skipped / Total] 2 / 3 / 1 / 6:  60%|██████    | 6/10 [00:07<00:05,  1.32s/it]
--------------------------------------------- Result 6 ---------------------------------------------
Negative (99%) --> [FAILED]

that 's far too tragic to merit such superficial treatment


[Succeeded / Failed / Skipped / Total] 3 / 4 / 1 / 8:  80%|████████  | 8/10 [00:09<00:02,  1.13s/it]
--------------------------------------------- Result 7 ---------------------------------------------
Positive (98%) --> Negative (62%)

demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an emotional wallop .

shows that the directors of such tinseltown blockbusters as patriot games can still turning out a tiny , personal movies with an emotional batting .


--------------------------------------------- Result 8 ---------------------------------------------
Positive (90%) --> [FAILED]

of saucy


[Succeeded / Failed / Skipped / Total] 4 / 5 / 1 / 10: 100%|██████████| 10/10 [00:09<00:00,  1.06it/s]
--------------------------------------------- Result 9 ---------------------------------------------
Negative (99%) --> [FAILED]

a depressed fifteen-year-old 's suicidal poetry


--------------------------------------------- Result 10 ---------------------------------------------
Positive (79%) --> Negative (65%)

are more deeply thought through than in most ` right-thinking ' films

are more seriously thought through than in most ` right-thinking ' films



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 4      |
| Number of failed attacks:     | 5      |
| Number of skipped attacks:    | 1      |
| Original accuracy:            | 90.0%  |
| Accuracy under attack:        | 50.0%  |
| Attack success rate:          | 44.44% |
| Average perturbed word %:     | 20.95% |
| Average num. words per input: | 9.5    |
| Avg num queries:              | 34.67  |
+-------------------------------+--------+
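
The three percentages in the summary follow directly from the success, failure, and skip counts. A sample is skipped when the model already misclassifies it before any perturbation, so skipped samples count against the original accuracy and are excluded from the attack success rate. The helper below is a sketch written for this walkthrough (`attack_summary` is not a TextAttack function) that reproduces the arithmetic.

```python
def attack_summary(successful, failed, skipped):
    """Return (original accuracy, accuracy under attack, attack success rate) in percent."""
    total = successful + failed + skipped
    attempted = successful + failed  # samples the model classified correctly
    if total == 0 or attempted == 0:
        raise ValueError("need at least one attempted attack")
    original_accuracy = 100.0 * attempted / total
    accuracy_under_attack = 100.0 * failed / total  # still correct after the attack
    attack_success_rate = 100.0 * successful / attempted
    return (
        round(original_accuracy, 2),
        round(accuracy_under_attack, 2),
        round(attack_success_rate, 2),
    )


# Counts taken from the table above: 4 successful, 5 failed, 1 skipped.
print(attack_summary(4, 5, 1))  # (90.0, 50.0, 44.44)
```

With 10 samples, 9 were classified correctly at the start (90.0%), 5 of those survived the attack (50.0%), and 4 of the 9 attempted attacks flipped the prediction (44.44%).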

[3]:
[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb68d0028b0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb685f0dbb0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb689188040>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7fb695031250>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb695031760>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb694b7abb0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb67cd36df0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb694b7a880>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb694b7a790>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb689ab1be0>]