TextAttack with Custom Dataset and Word Embedding

This tutorial shows how to use TextAttack with any dataset and word embedding you may want to use.

Importing the Model

We start by choosing a pretrained model to attack. In this example we use the ALBERT base v2 model from HuggingFace. This model was trained on data from IMDb, a set of movie reviews labeled as either positive or negative.

[1]:
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper

# https://huggingface.co/textattack
model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/albert-base-v2-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/albert-base-v2-imdb")
# We wrap the model so it can be used by textattack
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)





Creating A Custom Dataset

TextAttack takes a dataset in the form of a list of tuples. Each tuple can have the form (“string”, label) or (“string”, label, label). In this case we use the former, since we want to create a custom movie review dataset in which label 0 represents a negative review and label 1 represents a positive review.

For simplicity, I created a dataset of 4 reviews: the 1st and 4th reviews have “correct” labels, while the 2nd and 3rd reviews have “incorrect” labels. We will see how this affects perturbation later in this tutorial.

[2]:
# dataset: An iterable of (text, ground_truth_output) pairs.
# 0 means the review is negative
# 1 means the review is positive
custom_dataset = [
    ('I hate this movie', 0),  # A negative comment, with a negative label
    ('I hate this movie', 1),  # A negative comment, with a positive label
    ('I love this movie', 0),  # A positive comment, with a negative label
    ('I love this movie', 1),  # A positive comment, with a positive label
]
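
Because the attack loop later unpacks each item as a (text, label) pair, it can help to sanity-check a hand-written dataset before running anything. The following is a minimal sketch in plain Python. `validate_dataset` is a hypothetical helper written for this tutorial, not part of TextAttack, and it assumes binary integer labels (0 or 1) as used here.

```python
def validate_dataset(dataset, allowed_labels=(0, 1)):
    """Raise ValueError if any item is not a (text, label) pair.

    Hypothetical helper for illustration; not a TextAttack function.
    Returns the number of examples when everything looks well-formed.
    """
    for position, item in enumerate(dataset):
        if not (isinstance(item, tuple) and len(item) == 2):
            raise ValueError(f"item {position} is not a (text, label) pair: {item!r}")
        text, label = item
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"item {position} has empty or non-string text")
        if label not in allowed_labels:
            raise ValueError(f"item {position} has unexpected label {label!r}")
    return len(dataset)


# The same four reviews as in the cell above
custom_dataset = [
    ("I hate this movie", 0),
    ("I hate this movie", 1),
    ("I love this movie", 0),
    ("I love this movie", 1),
]

num_examples = validate_dataset(custom_dataset)
print(f"{num_examples} examples look well-formed")
```

Note that this check only looks at the shape of the data. It cannot tell whether a label is “correct” for its text, which is exactly the property the 2nd and 3rd reviews violate on purpose.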

Creating An Attack

[4]:
from textattack import Attack
from textattack.search_methods import GreedySearch
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.goal_functions import UntargetedClassification
from textattack.transformations import WordSwapEmbedding

# We'll use untargeted classification as the goal function.
goal_function = UntargetedClassification(model_wrapper)
# We'll use WordSwapEmbedding as the attack transformation.
transformation = WordSwapEmbedding()
# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]
# We'll use the Greedy search method
search_method = GreedySearch()
# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)
textattack: Unknown if model of class <class 'transformers.models.albert.modeling_albert.AlbertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.

Attack Results With Custom Dataset

As you can see, the attack fools the model by changing a few words in the 1st and 4th reviews.

The attack skipped the 2nd and 3rd reviews: because they were labeled incorrectly, the model's prediction already disagreed with the label, so they “fooled” the model without any modifications.

[6]:
for example, label in custom_dataset:
    result = attack.attack(example, label)
    print(result.__str__(color_method='ansi'))
0 (99%) --> 1 (81%)

I hate this movie

did hateful this footage
0 (99%) --> [SKIPPED]

I hate this movie
1 (96%) --> [SKIPPED]

I love this movie
1 (96%) --> 0 (99%)

I love this movie

I iove this movie
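
The skipping behaviour can be pictured as a check that happens before any search starts: if the model's prediction on the unmodified text already differs from the supplied label, there is nothing left to flip. The sketch below illustrates that idea with a toy keyword classifier standing in for the real model. `toy_predict` and `initial_status` are hypothetical names used for illustration only and do not reflect TextAttack's internals.

```python
def toy_predict(text):
    """Stand-in classifier: 1 (positive) if the text contains 'love', else 0."""
    return 1 if "love" in text.lower() else 0


def initial_status(text, ground_truth_label, predict=toy_predict):
    """Return 'skipped' when the model is already wrong, otherwise 'attack'."""
    if predict(text) != ground_truth_label:
        # The prediction already disagrees with the label: no perturbation needed.
        return "skipped"
    return "attack"


examples = [
    ("I hate this movie", 0),
    ("I hate this movie", 1),
    ("I love this movie", 0),
    ("I love this movie", 1),
]

statuses = [initial_status(text, label) for text, label in examples]
print(statuses)
```

With this stand-in classifier, the 2nd and 3rd examples come out as skipped and the 1st and 4th as candidates for an attack, mirroring the pattern in the output above.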

Creating A Custom Word Embedding

In TextAttack, a pre-trained word embedding is needed by the transformation to find synonym replacements, and by constraints to check the semantic validity of a transformation. To use custom pre-trained word embeddings, you can either create a new class that inherits from the AbstractWordEmbedding class, or use the WordEmbedding class, which takes 4 parameters.

[7]:
from textattack.shared import WordEmbedding

# 2-D array of shape N x D, where N is the vocab size and D is the dimension of the embedding vectors.
embedding_matrix = [[1.0], [2.0], [3.0], [4.0]]
# Dictionary that maps each word to its index within the embedding matrix.
word2index = {"hate": 0, "despise": 1, "like": 2, "love": 3}
# Dictionary that maps each index back to its word.
index2word = {0: "hate", 1: "despise", 2: "like", 3: "love"}
# 2-D integer array of shape N x K, where N is the vocab size and K is the number of nearest neighbours kept per word.
nn_matrix = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]]

embedding = WordEmbedding(embedding_matrix, word2index, index2word, nn_matrix)
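
For a real vocabulary you would not want to write the nearest-neighbour matrix by hand. One way to obtain it, sketched below in plain Python, is to rank all words by Euclidean distance to each word's vector. This is an assumption about how such a matrix could be built: the helper `build_nn_matrix` is hypothetical, and other metrics such as cosine similarity are equally plausible. For the toy vectors above it happens to reproduce the hand-written matrix.

```python
def build_nn_matrix(embedding_matrix, top_k):
    """Rank every vocabulary index by squared Euclidean distance to each vector.

    Hypothetical helper for illustration. Ties are broken by the lower index,
    because Python's sort is stable.
    """
    nn_matrix = []
    for query in embedding_matrix:
        distances = [
            sum((q - c) ** 2 for q, c in zip(query, candidate))
            for candidate in embedding_matrix
        ]
        ranked = sorted(range(len(embedding_matrix)), key=lambda i: distances[i])
        nn_matrix.append(ranked[:top_k])
    return nn_matrix


embedding_matrix = [[1.0], [2.0], [3.0], [4.0]]
computed = build_nn_matrix(embedding_matrix, top_k=4)
print(computed)
# [[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]]
```

Each row starts with the word's own index, since every vector is at distance zero from itself. This brute-force ranking is quadratic in the vocabulary size, so it is only meant for small examples like this one.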

Attack Results With Custom Dataset and Word Embedding

Now, if we run the attack again with the custom word embedding, you will notice that the modifications are limited to the vocabulary provided by our custom word embedding.

[8]:
# Consider at most 3 nearest neighbours per word, drawn from our custom embedding
transformation = WordSwapEmbedding(max_candidates=3, embedding=embedding)

attack = Attack(goal_function, constraints, transformation, search_method)

for example, label in custom_dataset:
    result = attack.attack(example, label)
    print(result.__str__(color_method='ansi'))
0 (99%) --> 1 (98%)

I hate this movie

I like this movie
0 (99%) --> [SKIPPED]

I hate this movie
1 (96%) --> [SKIPPED]

I love this movie
1 (96%) --> 0 (99%)

I love this movie

I despise this movie
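
These substitutions follow directly from the neighbour table. As a rough sketch of the lookup (a simplification in plain Python, not TextAttack's actual code; `candidate_swaps` is a hypothetical helper), a word outside the vocabulary yields no candidates, while a word inside it yields up to `max_candidates` neighbours other than itself:

```python
word2index = {"hate": 0, "despise": 1, "like": 2, "love": 3}
index2word = {0: "hate", 1: "despise", 2: "like", 3: "love"}
nn_matrix = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]]


def candidate_swaps(word, max_candidates=3):
    """Return replacement candidates for a word, nearest neighbours first.

    Hypothetical helper for illustration; words missing from the vocabulary
    get an empty list, so they can never be swapped.
    """
    if word not in word2index:
        return []
    own_index = word2index[word]
    neighbours = [i for i in nn_matrix[own_index] if i != own_index]
    return [index2word[i] for i in neighbours[:max_candidates]]


for word in "I hate this movie".split():
    print(word, "->", candidate_swaps(word))
```

Only "hate" and "love" have entries in the toy vocabulary, which is consistent with the results above, where every other word in the reviews is left untouched.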