The importance of constraints

Constraints determine which potential adversarial examples are valid inputs to the model. When determining the efficacy of an attack, constraints are everything. After all, an attack that looks very powerful may just be generating nonsense. Or, perhaps more nefariously, an attack may generate a real-looking example that changes the original label of the input. That’s why you should always clearly define the constraints your adversarial examples must meet.

Please remember to run pip3 install textattack[tensorflow] in your notebook environment before running the code below:

Classes of constraints

TextAttack evaluates constraints using methods from three groups:

  • Overlap constraints determine if a perturbation is valid based on character-level analysis. For example, some attacks are constrained by edit distance: a perturbation is only valid if it changes at most some small number of characters.

  • Grammaticality constraints filter inputs based on syntactical information. For example, an attack may require that adversarial perturbations do not introduce grammatical errors.

  • Semantic constraints try to ensure that the perturbation is semantically similar to the original input. For example, we may design a constraint that uses a sentence encoder to encode the original and perturbed inputs, and enforce that the sentence encodings be within some fixed distance of one another. (This is what happens in subclasses of textattack.constraints.semantics.sentence_encoders.)
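
To make the first class concrete, here is a minimal sketch of an edit-distance check written in plain Python. It is not TextAttack's implementation; the function names `levenshtein` and `within_edit_distance` and the `max_edits` parameter are assumptions made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning `a` into `b`."""
    # `previous[j]` holds the distance between the processed prefix of `a`
    # and the first `j` characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def within_edit_distance(original: str, perturbed: str, max_edits: int) -> bool:
    """Hypothetical overlap check: valid if at most `max_edits` characters changed."""
    return levenshtein(original, perturbed) <= max_edits


print(within_edit_distance("great movie", "graet movie", max_edits=2))
print(within_edit_distance("great movie", "terrible movie", max_edits=2))
```

The first call accepts a small typo-style perturbation, while the second rejects a whole-word replacement because it changes too many characters.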

A new constraint

To add our own constraint, we need to create a subclass of textattack.constraints.Constraint. We can implement one of two functions, either _check_constraint or _check_constraint_many:

  • _check_constraint determines whether candidate AttackedText transformed_text, transformed from current_text, fulfills a desired constraint. It returns either True or False.

  • _check_constraint_many determines whether each candidate in a list transformed_texts fulfills the constraint relative to current_text. This is here in case your constraint can be vectorized. If not, just implement _check_constraint, and it will be executed for each (transformed_text, current_text) pair.
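
The relationship between the two methods can be sketched with a toy base class. This is only an illustration of the fallback described above, not TextAttack's own code: `ToyConstraint` and `SameLength` are hypothetical names, plain strings stand in for AttackedText objects, and returning the filtered list of candidates is an assumption of this sketch.

```python
class ToyConstraint:
    """Hypothetical stand-in for a constraint base class (not TextAttack's code)."""

    def _check_constraint(self, transformed_text, current_text):
        raise NotImplementedError()

    def _check_constraint_many(self, transformed_texts, current_text):
        # Default behavior: run the single-candidate check on every candidate
        # and keep the ones that pass. A vectorizable constraint would
        # override this method with a batched computation instead.
        return [
            candidate
            for candidate in transformed_texts
            if self._check_constraint(candidate, current_text)
        ]


class SameLength(ToyConstraint):
    """Toy constraint: the candidate must keep the same number of words."""

    def _check_constraint(self, transformed_text, current_text):
        return len(transformed_text.split()) == len(current_text.split())


candidates = ["a b c", "a b", "x y z"]
print(SameLength()._check_constraint_many(candidates, "one two three"))
```

Only `_check_constraint` is implemented in `SameLength`, yet the batched method still works because the base class loops over the candidates.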

A custom constraint

For fun, we’re going to see what happens when we constrain an attack to only allow perturbations that substitute out a named entity for another. In linguistics, a named entity is a proper noun, the name of a person, organization, location, product, etc. Named Entity Recognition is a popular NLP task (and one that state-of-the-art models can perform quite well).

NLTK and Named Entity Recognition

NLTK, the Natural Language Toolkit, is a Python package that helps developers write programs that process natural language. NLTK comes with predefined algorithms for lots of linguistic tasks, including Named Entity Recognition.

First, we’re going to write a constraint class. In the _check_constraint method, we’re going to use NLTK to find the named entities in both current_text and transformed_text. We will only return True (that is, our constraint is met) if transformed_text has substituted one named entity in current_text for another.

Let’s import NLTK and download the required modules:

[1]:
# cd ..
[11]:
import tensorflow as tf
print(tf.__version__)
2.4.0
[2]:
!pip3 install .

import nltk
nltk.download('punkt') # The NLTK tokenizer
nltk.download('maxent_ne_chunker') # NLTK named-entity chunker
nltk.download('words') # NLTK list of words
nltk.download('averaged_perceptron_tagger')
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
WARNING: You are using pip version 20.1.1; however, version 21.1.3 is available.
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m pip install --upgrade pip' command.
[nltk_data] Downloading package punkt to /Users/ccy/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /Users/ccy/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /Users/ccy/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/ccy/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[2]:
True

NLTK NER Example

Here’s an example of using NLTK to find the named entities in a sentence:

[3]:
sentence = ('In 2017, star quarterback Tom Brady led the Patriots to the Super Bowl, '
           'but lost to the Philadelphia Eagles.')

# 1. Tokenize using the NLTK tokenizer.
tokens = nltk.word_tokenize(sentence)

# 2. Tag parts of speech using the NLTK part-of-speech tagger.
tagged = nltk.pos_tag(tokens)

# 3. Extract entities from tagged sentence.
entities = nltk.chunk.ne_chunk(tagged)
print(entities)
(S
  In/IN
  2017/CD
  ,/,
  star/NN
  quarterback/NN
  (PERSON Tom/NNP Brady/NNP)
  led/VBD
  the/DT
  (ORGANIZATION Patriots/NNP)
  to/TO
  the/DT
  (ORGANIZATION Super/NNP Bowl/NNP)
  ,/,
  but/CC
  lost/VBD
  to/TO
  the/DT
  (ORGANIZATION Philadelphia/NNP Eagles/NNP)
  ./.)

It looks like nltk.chunk.ne_chunk gives us an nltk.tree.Tree object where named entities are also nltk.tree.Tree objects within that tree. We can take this a step further and grab the named entities from the tree of entities:

[4]:
# 4. Filter entities to just named entities.
named_entities = [entity for entity in entities if isinstance(entity, nltk.tree.Tree)]
print(named_entities)
[Tree('PERSON', [('Tom', 'NNP'), ('Brady', 'NNP')]), Tree('ORGANIZATION', [('Patriots', 'NNP')]), Tree('ORGANIZATION', [('Super', 'NNP'), ('Bowl', 'NNP')]), Tree('ORGANIZATION', [('Philadelphia', 'NNP'), ('Eagles', 'NNP')])]

Caching with @functools.lru_cache

A little-known feature of Python 3 is functools.lru_cache, a decorator that allows users to easily cache the results of a function in an LRU cache. We’re going to be using the NLTK library quite a bit to tokenize, parse, and detect named entities in sentences. These sentences might repeat themselves. As such, we’ll use this decorator to cache named entities so that we don’t have to perform this expensive computation multiple times.
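
As a quick illustration of the decorator on its own, the sketch below caches a cheap stand-in for the expensive NLTK pipeline. The function `expensive_analysis` and the counter `call_count` are hypothetical names used only here.

```python
import functools

call_count = 0  # counts how often the function body actually runs


@functools.lru_cache(maxsize=128)
def expensive_analysis(sentence):
    """Stand-in for an expensive computation on a sentence."""
    global call_count
    call_count += 1
    return tuple(sentence.lower().split())


expensive_analysis("Tom Brady led the Patriots")
expensive_analysis("Tom Brady led the Patriots")  # served from the cache

info = expensive_analysis.cache_info()
print(info.hits, info.misses, call_count)  # prints: 1 1 1
```

Note that lru_cache requires hashable arguments, which is one reason to cache on the sentence string rather than on a richer object.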

Putting it all together: getting a list of Named Entity Labels from a sentence

Now that we know how to tokenize, parse, and detect named entities using NLTK, let’s put it all together into a single helper function. Later, when we implement our constraint, we can query this function to easily get the entity labels from a sentence. We can even use @functools.lru_cache to try and speed this process up.

[5]:
import functools

@functools.lru_cache(maxsize=2**14)
def get_entities(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
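    # Setting `binary=True` makes NLTK label every named-entity chunk
    # as 'NE' instead of using detailed labels like 'ORGANIZATION',
    # 'GPE' (geo-political entity), etc.
    entities = nltk.chunk.ne_chunk(tagged, binary=True)
    # `leaves()` flattens the tree into (word, part-of-speech tag) tuples,
    # so the chunk labels themselves are not part of the return value.
    return entities.leaves()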

And let’s test our function to make sure it works:

[6]:
sentence = 'Jack Black starred in the 2003 film classic "School of Rock".'
get_entities(sentence)
[6]:
[('Jack', 'NNP'),
 ('Black', 'NNP'),
 ('starred', 'VBD'),
 ('in', 'IN'),
 ('the', 'DT'),
 ('2003', 'CD'),
 ('film', 'NN'),
 ('classic', 'JJ'),
 ('``', '``'),
 ('School', 'NNP'),
 ('of', 'IN'),
 ('Rock', 'NNP'),
 ("''", "''"),
 ('.', '.')]

We flattened the tree of entities, so the return format is a list of (word, tag) tuples, where the tag is the part of speech of the word. 'NNP' (a proper noun, according to NLTK) is what we treat as the indicator of a named entity. Looks like four words were tagged this way here: ‘Jack’, ‘Black’, ‘School’, and ‘Rock’. (With the detailed labels instead of binary=True, the labeler appears to mark ‘Rock’ as a ‘GPE’, so it seems to think Rock is the name of a place, a city or something.) Whatever technique NLTK uses for named entity recognition may be a bit rough, but it did a pretty decent job here!
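
If you would rather keep the chunk label (for example 'NE') when flattening, one option is to walk the top level of the tree yourself. The sketch below is an assumption-laden illustration, not part of the original tutorial: `flatten_with_entity_labels` is a hypothetical helper, and `FakeChunk` is a tiny stand-in that mimics only the `label()` and `leaves()` methods of nltk.tree.Tree so the example runs without NLTK data. With NLTK you would pass it the result of nltk.chunk.ne_chunk(tagged, binary=True).

```python
class FakeChunk:
    """Hypothetical stand-in mimicking the two nltk.tree.Tree methods used below."""

    def __init__(self, label, leaves):
        self._label = label
        self._leaves = leaves

    def label(self):
        return self._label

    def leaves(self):
        return list(self._leaves)


def flatten_with_entity_labels(chunked):
    """Flatten one level of a chunk tree, tagging chunked words with the chunk label."""
    flat = []
    for node in chunked:
        if hasattr(node, "label"):
            # A subtree is a named-entity chunk: replace the POS tag of each
            # of its words with the chunk label (e.g. 'NE').
            flat.extend((word, node.label()) for word, _pos in node.leaves())
        else:
            # A plain (word, POS tag) tuple is not part of any entity chunk.
            word, pos = node
            flat.append((word, pos))
    return flat


chunked = [
    FakeChunk("NE", [("Jack", "NNP"), ("Black", "NNP")]),
    ("starred", "VBD"),
    ("in", "IN"),
]
print(flatten_with_entity_labels(chunked))
# [('Jack', 'NE'), ('Black', 'NE'), ('starred', 'VBD'), ('in', 'IN')]
```

With this variant, a proper noun that the chunker did not group into an entity keeps its 'NNP' tag, while recognized entities carry 'NE', so the two cases can be told apart.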

Creating our NamedEntityConstraint

Now that we know how to detect named entities using NLTK, let’s create our custom constraint.

[7]:
from textattack.constraints import Constraint

class NamedEntityConstraint(Constraint):
    """ A constraint that ensures `transformed_text` only substitutes named entities from `current_text` with other named entities.
    """
    def _check_constraint(self, transformed_text, current_text):
        transformed_entities = get_entities(transformed_text.text)
        current_entities = get_entities(current_text.text)
        # `get_entities` returns one (word, tag) pair per token. If the current
        # text has no tokens at all, there is nothing to swap, so we return
        # False (the attack will eventually fail).
        if len(current_entities) == 0:
            return False
        # If the two sentences have a different number of tokens, we cannot
        # align them word by word. In this case, the constraint is violated,
        # and we return False.
        if len(current_entities) != len(transformed_entities):
            return False
        # Here we compare all of the words, in order, to make sure that they match.
        # If we find two words that don't match, this means a word was swapped
        # between `current_text` and `transformed_text`. That word must be a
        # named entity in both texts to fulfill our constraint.
        for (word_1, label_1), (word_2, label_2) in zip(current_entities, transformed_entities):
            if word_1 != word_2:
                if (label_1 not in ['NNP', 'NE']) or (label_2 not in ['NNP', 'NE']):
                    return False
        # If we get here, every swapped word was a named entity. Return True!
        return True
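
To see the comparison logic in isolation, here is a small sketch that applies the same word-by-word check to plain lists of (word, tag) tuples, without TextAttack or NLTK. The function name `only_entities_swapped` and the example sentences are made up for this illustration.

```python
ENTITY_LABELS = {"NNP", "NE"}


def only_entities_swapped(current, transformed):
    """Return True if every differing word is tagged as an entity in both lists."""
    if len(current) == 0 or len(current) != len(transformed):
        return False
    for (word_1, label_1), (word_2, label_2) in zip(current, transformed):
        if word_1 != word_2:
            if label_1 not in ENTITY_LABELS or label_2 not in ENTITY_LABELS:
                return False
    return True


current = [("Turner", "NNP"), ("says", "VBZ"), ("hello", "NN")]
entity_swap = [("Knapp", "NNP"), ("says", "VBZ"), ("hello", "NN")]
verb_swap = [("Turner", "NNP"), ("claims", "VBZ"), ("hello", "NN")]

print(only_entities_swapped(current, entity_swap))  # prints: True
print(only_entities_swapped(current, verb_swap))    # prints: False
```

One thing this makes visible: a candidate identical to the current text also passes, because the loop never finds a mismatched word to reject.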

Testing our constraint

We need to create an attack and a dataset to test our constraint on. We went over all of this in the transformations tutorial, so let’s gloss over this part for now.

[8]:
# Import the model
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper

model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/albert-base-v2-ag-news")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/albert-base-v2-ag-news")

model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Create the goal function using the model
from textattack.goal_functions import UntargetedClassification
goal_function = UntargetedClassification(model_wrapper)

# Import the dataset
from textattack.datasets import HuggingFaceDataset
dataset = HuggingFaceDataset("ag_news", None, "test")





textattack: Unknown if model of class <class 'transformers.models.albert.modeling_albert.AlbertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
Using custom data configuration default
Reusing dataset ag_news (/Users/ccy/.cache/huggingface/datasets/ag_news/default/0.0.0/fb5c5e74a110037311ef5e904583ce9f8b9fbc1354290f97b4929f01b3f48b1a)
textattack: Loading datasets dataset ag_news, split test.
[9]:
from textattack.transformations import WordSwapEmbedding
from textattack.search_methods import GreedyWordSwapWIR
from textattack import Attack
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification

# We're going to use the `WordSwapEmbedding` transformation. Using the default settings, this
# will try substituting words with their neighbors in the counter-fitted embedding space.
transformation = WordSwapEmbedding(max_candidates=20)

# We'll use the greedy search with word importance ranking method again
search_method = GreedyWordSwapWIR()

# Our constraints will be the same as Tutorial 1, plus the named entity constraint
constraints = [RepeatModification(),
               StopwordModification(),
               NamedEntityConstraint(False)]

# Now, let's make the attack using these parameters.
attack = Attack(goal_function, constraints, transformation, search_method)


Now, let’s use our attack. We’re going to attack samples until we achieve 5 successes. (There’s a lot to check here, and since we’re using a greedy search over all potential word swap positions, each sample will take a few minutes. This will take a few hours to run on a single core.)

[ ]:
from textattack.loggers import CSVLogger # tracks a dataframe for us.
from textattack.attack_results import SuccessfulAttackResult
from textattack import Attacker, AttackArgs

attack_args = AttackArgs(num_successful_examples=5, log_to_csv="results.csv", csv_coloring_style="html")
attacker = Attacker(attack, dataset, attack_args)

attacker.attack_dataset()
textattack: Logging to CSV at path results.csv

  0%|                                                                                                                                             | 0/5 [00:00<?, ?it/s]
Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  unk
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  20
    (embedding):  WordEmbedding
  )
  (constraints):
    (0): NamedEntityConstraint(
        (compare_against_original):  False
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
)


 20%|██████████████████████████▍                                                                                                         | 1/5 [02:47<11:10, 167.69s/it]
[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  20%|███████████████▌                                                              | 1/5 [02:47<11:10, 167.72s/it]
--------------------------------------------- Result 1 ---------------------------------------------
[[Business (75%)]] --> [[Sci/tech (61%)]]

Fears for T N pension after talks Unions representing workers at [[Turner]]   Newall say they are 'disappointed' after talks with stricken parent firm Federal [[Mogul]].

Fears for T N pension after talks Unions representing workers at [[Knapp]]   Newall say they are 'disappointed' after talks with stricken parent firm Federal [[Titan]].



[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2:  20%|███████████████                                                            | 1/5 [16:59<1:07:57, 1019.36s/it]
--------------------------------------------- Result 2 ---------------------------------------------
[[Sci/tech (100%)]] --> [[[FAILED]]]

The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the  #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.



[Succeeded / Failed / Skipped / Total] 1 / 2 / 0 / 3:  20%|███████████████                                                            | 1/5 [25:29<1:41:59, 1529.85s/it]
--------------------------------------------- Result 3 ---------------------------------------------
[[Sci/tech (100%)]] --> [[[FAILED]]]

Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.


Now let’s visualize our 5 successes in color:

[ ]:
import pandas as pd
pd.options.display.max_colwidth = 480 # increase column width so we can actually read the examples

from IPython.display import display, HTML

logger = attacker.attack_log_manager.loggers[0]
successes = logger.df[logger.df["result_type"] == "Successful"]
display(HTML(successes[['original_text', 'perturbed_text']].to_html(escape=False)))

Conclusion

Our constraint seems to have done its job: it filtered out attacks that did not swap out a named entity for another, according to the NLTK named entity detector. However, we can see some problems inherent in the detector: it often thinks the title of the news article or the first word of a given sentence is a named entity, probably due to capitalization.