TextAttack on Keras Model

Please remember to run pip3 install textattack[tensorflow] in your notebook environment before running the code below.

This notebook runs TextAttack on a trained Keras model.

Training

The code below trains a basic neural network on movie reviews from the IMDB dataset, loaded using TensorFlow’s datasets module. Each review is encoded as a sequence of integers, where each integer is a word’s index in the vocabulary. Each review also has a class label denoting positive or negative sentiment.

See here for more information on the IMDB dataset.

[1]:

import tensorflow as tf
import keras
import numpy as np
from keras.utils import to_categorical
from keras import models
from keras import layers
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Dropout

from nltk.tokenize import word_tokenize, RegexpTokenizer

from textattack.models.wrappers import ModelWrapper
from textattack.datasets import HuggingFaceDataset
from textattack.attack_recipes import PWWSRen2019

Below, we load the IMDB dataset from TensorFlow and transform it into a Bag-of-Words format for our classifier.

[2]:

NUM_WORDS = 1000

(x_train_tokens, y_train), (x_test_tokens, y_test) = tf.keras.datasets.imdb.load_data(
    path="imdb.npz",
    num_words=NUM_WORDS,
    skip_top=0,
    maxlen=None,
    seed=113,
    start_char=1,
    oov_char=2,
    index_from=3,
)


def transform(x):
    # Convert each sequence of word indices into a vector of word counts
    x_transform = []
    for word_indices in x:
        BoW_array = np.zeros((NUM_WORDS,))
        for index in word_indices:
            if index < len(BoW_array):
                BoW_array[index] += 1
        x_transform.append(BoW_array)
    return np.array(x_transform)


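# Keep the first 90% of the training set (22,500 reviews) for training
index = int(0.9 * len(x_train_tokens))
x_train = transform(x_train_tokens)[:index]
# Validate on the test-set reviews from position `index` onward (2,500 reviews)
x_test = transform(x_test_tokens)[index:]
y_train = np.array(y_train[:index])
y_test = np.array(y_test[index:])
# One-hot encode the labels to match the two-unit output layer
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)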

vocabulary = tf.keras.datasets.imdb.get_word_index(path="imdb_word_index.json")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
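
To make the Bag-of-Words step concrete, here is a minimal sketch of the same counting logic applied to a toy input. The vocabulary size of 6, the name toy_transform, and the token sequences are made up for illustration and are not taken from the IMDB data.

```python
import numpy as np

TOY_NUM_WORDS = 6  # hypothetical vocabulary size, for illustration only


def toy_transform(sequences, num_words=TOY_NUM_WORDS):
    """Turn lists of word indices into fixed-length count vectors."""
    rows = []
    for word_indices in sequences:
        counts = np.zeros((num_words,))
        for index in word_indices:
            if index < num_words:  # ignore indices outside the vocabulary
                counts[index] += 1
        rows.append(counts)
    return np.array(rows)


# Two made-up "reviews"; index 9 is out of range and is dropped
toy_reviews = [[1, 4, 4, 5], [1, 2, 9]]
toy_bow = toy_transform(toy_reviews)
print(toy_bow)
```

Word order is discarded: every review becomes a vector of the same length no matter how many tokens it contains, which is what lets a plain Dense network consume it.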

With our data successfully loaded, we can now design and train our model.

[3]:

# Model Created with Keras
model = Sequential()
model.add(Dense(512, activation="relu", input_dim=NUM_WORDS))
model.add(Dropout(0.3))
model.add(Dense(100, activation="relu"))
model.add(Dense(2, activation="sigmoid"))
opt = keras.optimizers.Adam(learning_rate=0.00001)

model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])


results = model.fit(
    x_train, y_train, epochs=18, batch_size=512, validation_data=(x_test, y_test)
)


print(results.history)

Epoch 1/18
44/44 [==============================] - 0s 9ms/step - loss: 0.9584 - accuracy: 0.4987 - val_loss: 0.7314 - val_accuracy: 0.5056
Epoch 2/18
44/44 [==============================] - 0s 6ms/step - loss: 0.9078 - accuracy: 0.5064 - val_loss: 0.7149 - val_accuracy: 0.5332
Epoch 3/18
44/44 [==============================] - 0s 6ms/step - loss: 0.8743 - accuracy: 0.5264 - val_loss: 0.7000 - val_accuracy: 0.5600
Epoch 4/18
44/44 [==============================] - 0s 6ms/step - loss: 0.8534 - accuracy: 0.5385 - val_loss: 0.6840 - val_accuracy: 0.5904
Epoch 5/18
44/44 [==============================] - 0s 6ms/step - loss: 0.8329 - accuracy: 0.5564 - val_loss: 0.6754 - val_accuracy: 0.6064
Epoch 6/18
44/44 [==============================] - 0s 6ms/step - loss: 0.8168 - accuracy: 0.5615 - val_loss: 0.6637 - val_accuracy: 0.6348
Epoch 7/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7942 - accuracy: 0.5767 - val_loss: 0.6568 - val_accuracy: 0.6460
Epoch 8/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7798 - accuracy: 0.5895 - val_loss: 0.6464 - val_accuracy: 0.6632
Epoch 9/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7624 - accuracy: 0.6000 - val_loss: 0.6357 - val_accuracy: 0.6772
Epoch 10/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7523 - accuracy: 0.6096 - val_loss: 0.6275 - val_accuracy: 0.6932
Epoch 11/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7391 - accuracy: 0.6185 - val_loss: 0.6196 - val_accuracy: 0.6996
Epoch 12/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7265 - accuracy: 0.6324 - val_loss: 0.6126 - val_accuracy: 0.7076
Epoch 13/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7140 - accuracy: 0.6408 - val_loss: 0.6047 - val_accuracy: 0.7196
Epoch 14/18
44/44 [==============================] - 0s 6ms/step - loss: 0.7042 - accuracy: 0.6476 - val_loss: 0.5981 - val_accuracy: 0.7268
Epoch 15/18
44/44 [==============================] - 0s 6ms/step - loss: 0.6944 - accuracy: 0.6586 - val_loss: 0.5906 - val_accuracy: 0.7340
Epoch 16/18
44/44 [==============================] - 0s 6ms/step - loss: 0.6798 - accuracy: 0.6677 - val_loss: 0.5826 - val_accuracy: 0.7432
Epoch 17/18
44/44 [==============================] - 0s 6ms/step - loss: 0.6702 - accuracy: 0.6766 - val_loss: 0.5741 - val_accuracy: 0.7524
Epoch 18/18
44/44 [==============================] - 0s 6ms/step - loss: 0.6643 - accuracy: 0.6834 - val_loss: 0.5667 - val_accuracy: 0.7580
{'loss': [0.9584308862686157, 0.9078119993209839, 0.8743314146995544, 0.8533967733383179, 0.8329190015792847, 0.816802442073822, 0.7941828966140747, 0.7797670960426331, 0.7623777985572815, 0.7523201107978821, 0.7390732765197754, 0.7265127897262573, 0.714047372341156, 0.7041717767715454, 0.6944125294685364, 0.6798228025436401, 0.6702008247375488, 0.6643370985984802], 'accuracy': [0.49871110916137695, 0.5064444541931152, 0.5264000296592712, 0.5385333299636841, 0.5563555359840393, 0.5614666938781738, 0.5766666531562805, 0.5895110964775085, 0.6000000238418579, 0.6095555424690247, 0.6185333132743835, 0.6323555707931519, 0.6407999992370605, 0.647599995136261, 0.6585777997970581, 0.6676889061927795, 0.6765778064727783, 0.6834222078323364], 'val_loss': [0.731362521648407, 0.7148647904396057, 0.7000304460525513, 0.6839893460273743, 0.6753506064414978, 0.6637153625488281, 0.6567765474319458, 0.6463953852653503, 0.6357491612434387, 0.6274867057800293, 0.6196037530899048, 0.6126242280006409, 0.6046810746192932, 0.5980660915374756, 0.590559184551239, 0.582603931427002, 0.5741293430328369, 0.5667080283164978], 'val_accuracy': [0.5055999755859375, 0.5332000255584717, 0.5600000023841858, 0.590399980545044, 0.6064000129699707, 0.6348000168800354, 0.6460000276565552, 0.6632000207901001, 0.6772000193595886, 0.6931999921798706, 0.6995999813079834, 0.7075999975204468, 0.7196000218391418, 0.7268000245094299, 0.734000027179718, 0.7432000041007996, 0.7523999810218811, 0.7580000162124634]}

Attacking

With our model trained, we can create a ModelWrapper that allows us to run TextAttack on a custom Keras model. Each ModelWrapper must implement a single method, __call__, which takes a list of strings and returns a list, np.ndarray, or torch.Tensor of predictions.

[4]:

class CustomKerasModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        # Convert each input string into a Bag-of-Words count vector
        x_transform = []
        for review in text_input_list:
            # Split on whitespace and strip commas attached to words
            tokens = [x.strip(",") for x in review.split()]
            BoW_array = np.zeros((NUM_WORDS,))
            for word in tokens:
                # Count only words whose vocabulary index fits in the vector
                if word in vocabulary and vocabulary[word] < len(BoW_array):
                    BoW_array[vocabulary[word]] += 1
            x_transform.append(BoW_array)
        x_transform = np.array(x_transform)
        # One row of class scores per input string
        prediction = self.model.predict(x_transform)
        return prediction


CustomKerasModelWrapper(model)(["bad bad bad bad bad", "good good good good"])

[4]:

array([[0.44404104, 0.5262513 ],
       [0.49010894, 0.49974558]], dtype=float32)
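
One detail worth checking when mapping raw text back to indices: tf.keras.datasets.imdb.load_data shifts every word index by index_from (3 in the call above), whereas get_word_index returns the unshifted indices. The wrapper above indexes the Bag-of-Words vector with the unshifted values. The sketch below shows how the offset could be applied. The tiny toy_vocabulary and the helper name encode_text are hypothetical, and the effect of this change on the trained model and the attack results has not been measured here.

```python
import numpy as np

INDEX_FROM = 3      # same value passed to load_data above
TOY_NUM_WORDS = 10  # hypothetical vocabulary size, for illustration only

# Hypothetical stand-in for the dictionary returned by get_word_index()
toy_vocabulary = {"the": 1, "movie": 2, "good": 3, "bad": 4, "rare": 50}


def encode_text(text, vocabulary, num_words, index_from=INDEX_FROM):
    """Encode a string as word counts, using the shifted training indices."""
    counts = np.zeros((num_words,))
    for word in text.split():
        word = word.strip(",")
        if word in vocabulary:
            # Shift so the index lines up with the training data
            index = vocabulary[word] + index_from
            if index < num_words:
                counts[index] += 1
    return counts


encoded = encode_text("the movie, good good rare", toy_vocabulary, TOY_NUM_WORDS)
print(encoded)
```

Here "the" lands at position 4 rather than 1, and "rare" is dropped because its shifted index falls outside the vector.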

With our ModelWrapper constructed, we can use TextAttack’s HuggingFaceDataset class to load reviews for testing, alongside TextAttack’s PWWSRen2019 recipe to build our attack.

The code below uses TextAttack’s Attacker class, which runs an attack against an entire dataset.

[5]:

from textattack import AttackArgs
from textattack.datasets import Dataset
from textattack import Attacker

model_wrapper = CustomKerasModelWrapper(model)
dataset = HuggingFaceDataset("rotten_tomatoes", None, "test", shuffle=True)

attack = PWWSRen2019.build(model_wrapper)

attack_args = AttackArgs(num_examples=10, checkpoint_dir="checkpoints")

attacker = Attacker(attack, dataset, attack_args)

attacker.attack_dataset()

Using custom data configuration default
Reusing dataset rotten_tomatoes_movie_review (/p/qdata/jy2ma/.cache/textattack/datasets/rotten_tomatoes_movie_review/default/1.0.0/9c411f7ecd9f3045389de0d9ce984061a1056507703d2e3183b1ac1a90816e4d)
textattack: Loading datasets dataset rotten_tomatoes, split test.
textattack: Unknown if model of class <class 'tensorflow.python.keras.engine.sequential.Sequential'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
[Succeeded / Failed / Skipped / Total] 0 / 0 / 1 / 1:  10%|█         | 1/10 [00:00<00:00, 17.58it/s]

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  weighted-saliency
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapWordNet
  (constraints):
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)

--------------------------------------------- Result 1 ---------------------------------------------
Negative (50%) --> [SKIPPED]

lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .

[Succeeded / Failed / Skipped / Total] 0 / 1 / 1 / 2:  20%|██        | 2/10 [00:00<00:00,  8.59it/s]

--------------------------------------------- Result 2 ---------------------------------------------
Positive (50%) --> [FAILED]

consistently clever and suspenseful .

[Succeeded / Failed / Skipped / Total] 1 / 1 / 3 / 5:  50%|█████     | 5/10 [00:00<00:00,  5.88it/s]

--------------------------------------------- Result 3 ---------------------------------------------
Positive (50%) --> Negative (50%)

it's like a " big chill " reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .

it's similar a " big chill " reunion of the baader-meinhof bunch , only these guys are more harmless pranksters than political activists .


--------------------------------------------- Result 4 ---------------------------------------------
Negative (51%) --> [SKIPPED]

the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .


--------------------------------------------- Result 5 ---------------------------------------------
Negative (50%) --> [SKIPPED]

red dragon " never cuts corners .

[Succeeded / Failed / Skipped / Total] 2 / 1 / 5 / 8:  80%|████████  | 8/10 [00:01<00:00,  6.08it/s]

--------------------------------------------- Result 6 ---------------------------------------------
Positive (50%) --> Negative (51%)

fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .

fresnadillo has something serious to tell about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .


--------------------------------------------- Result 7 ---------------------------------------------
Negative (51%) --> [SKIPPED]

throws in enough clever and unexpected twists to make the formula feel fresh .


--------------------------------------------- Result 8 ---------------------------------------------
Negative (51%) --> [SKIPPED]

weighty and ponderous but every bit as filling as the treat of the title .

[Succeeded / Failed / Skipped / Total] 3 / 1 / 5 / 9:  90%|█████████ | 9/10 [00:01<00:00,  4.89it/s]

--------------------------------------------- Result 9 ---------------------------------------------
Positive (50%) --> Negative (50%)

a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .

a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance society office .

[Succeeded / Failed / Skipped / Total] 4 / 1 / 5 / 10: 100%|██████████| 10/10 [00:02<00:00,  4.86it/s]

--------------------------------------------- Result 10 ---------------------------------------------
Positive (51%) --> Negative (50%)

generates an enormous feeling of empathy for its characters .

generates an enormous look of empathy for its characters .



+-------------------------------+-------+
| Attack Results                |       |
+-------------------------------+-------+
| Number of successful attacks: | 4     |
| Number of failed attacks:     | 1     |
| Number of skipped attacks:    | 5     |
| Original accuracy:            | 50.0% |
| Accuracy under attack:        | 10.0% |
| Attack success rate:          | 80.0% |
| Average perturbed word %:     | 7.24% |
| Average num. words per input: | 15.4  |
| Avg num queries:              | 103.2 |
+-------------------------------+-------+

[5]:

[<textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d494c67f0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7f2d40ab9520>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d46675be0>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d4740da60>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d40aca130>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d4289e9d0>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d42fa9820>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d3b54eb50>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d4905ce50>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d49032940>]
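
The percentages in the summary table follow from the succeeded, failed, and skipped counts. As a rough sketch (the function name summarize_counts is hypothetical, and the formulas are inferred from the numbers in the table rather than copied from TextAttack's source): skipped examples are those the model already misclassified, so they count against the original accuracy, and the success rate is measured only over the examples that were actually attacked.

```python
def summarize_counts(succeeded, failed, skipped):
    """Recompute the headline percentages from the three result counts."""
    total = succeeded + failed + skipped
    attacked = succeeded + failed  # examples the model originally got right
    if total == 0 or attacked == 0:
        raise ValueError("need at least one attacked example")
    return {
        # Correct before the attack: everything that was not skipped
        "original_accuracy": 100.0 * attacked / total,
        # Still correct after the attack: only the failed attacks
        "accuracy_under_attack": 100.0 * failed / total,
        # Share of attacked examples whose prediction was flipped
        "attack_success_rate": 100.0 * succeeded / attacked,
    }


# Counts taken from the summary table above
summary = summarize_counts(succeeded=4, failed=1, skipped=5)
print(summary)
```

With the counts from this run, the sketch reproduces the 50.0%, 10.0%, and 80.0% figures reported in the table.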

Conclusion

Great! We trained a binary classifier, created a custom ModelWrapper for Keras models, and successfully ran adversarial attacks against our trained Keras model! This serves as a basic demo of how to use TextAttack within your own environment.