The importance of constraints¶
Constraints determine which potential adversarial examples are valid inputs to the model. When determining the efficacy of an attack, constraints are everything. After all, an attack that looks very powerful may just be generating nonsense. Or, perhaps more nefariously, an attack may generate a real-looking example that changes the original label of the input. That’s why you should always clearly define the constraints your adversarial examples must meet.
Remember to run pip3 install textattack[tensorflow] in your notebook environment before running the code below:
Classes of constraints¶
TextAttack evaluates constraints using methods from three groups:
- Overlap constraints determine if a perturbation is valid based on character-level analysis. For example, some attacks are constrained by edit distance: a perturbation is only valid if it perturbs some small number of characters (or fewer).
- Grammaticality constraints filter inputs based on syntactical information. For example, an attack may require that adversarial perturbations do not introduce grammatical errors.
- Semantic constraints try to ensure that the perturbation is semantically similar to the original input. For example, we may design a constraint that uses a sentence encoder to encode the original and perturbed inputs, and enforce that the sentence encodings be within some fixed distance of one another. (This is what happens in subclasses of
A new constraint¶
To add our own constraint, we need to create a subclass of textattack.constraints.Constraint. We can implement one of two functions, either _check_constraint or _check_constraint_many:
- _check_constraint determines whether a candidate transformed_text, transformed from current_text, fulfills a desired constraint. It returns either True or False.
- _check_constraint_many determines whether each of a list of candidates transformed_texts fulfills the constraint relative to current_text. This is here in case your constraint can be vectorized. If not, just implement _check_constraint, which will be executed for each candidate.
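The division of labor between the two methods can be sketched without TextAttack installed. What follows is a minimal illustration, not TextAttack's actual code: SketchConstraint is a hypothetical stand-in base class whose batched method is assumed to fall back on the per-candidate check and return the candidates that pass, MaxAddedWords is an invented rule, and plain strings stand in for TextAttack's text objects.

```python
class SketchConstraint:
    """Hypothetical stand-in for a constraint base class (NOT TextAttack's code)."""

    def _check_constraint(self, transformed_text, current_text):
        # Subclasses decide whether a single candidate is acceptable.
        raise NotImplementedError

    def _check_constraint_many(self, transformed_texts, current_text):
        # Assumed fallback: run the single-candidate check on every candidate
        # and keep only the ones that pass.
        return [
            candidate
            for candidate in transformed_texts
            if self._check_constraint(candidate, current_text)
        ]


class MaxAddedWords(SketchConstraint):
    """Illustrative rule: a candidate may add at most `limit` words."""

    def __init__(self, limit):
        self.limit = limit

    def _check_constraint(self, transformed_text, current_text):
        added = len(transformed_text.split()) - len(current_text.split())
        return added <= self.limit


constraint = MaxAddedWords(limit=1)
candidates = [
    "a quick brown fox",
    "a very quick brown fox",
    "a very very quick brown fox",
]
kept = constraint._check_constraint_many(candidates, "a quick brown fox")
print(kept)
```

A constraint that can score all candidates in one batch (for example, by encoding them together) would override the batched method instead of relying on this fallback.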
A custom constraint¶
For fun, we’re going to see what happens when we constrain an attack to only allow perturbations that substitute out a named entity for another. In linguistics, a named entity is a proper noun, the name of a person, organization, location, product, etc. Named Entity Recognition is a popular NLP task (and one that state-of-the-art models can perform quite well).
NLTK and Named Entity Recognition¶
NLTK, the Natural Language Toolkit, is a Python package that helps developers write programs that process natural language. NLTK comes with predefined algorithms for lots of linguistic tasks, including Named Entity Recognition.
First, we’re going to write a constraint class. In the _check_constraint method, we’re going to use NLTK to find the named entities in both current_text and transformed_text. We will only return True (that is, our constraint is met) if transformed_text has substituted one named entity in current_text for another.
Let’s import NLTK and download the required modules:
!pip3 install textattack[tensorflow]

import nltk

nltk.download("punkt")  # The NLTK tokenizer
nltk.download("maxent_ne_chunker")  # NLTK named-entity chunker
nltk.download("words")  # NLTK list of words
nltk.download("averaged_perceptron_tagger")
[nltk_data] Downloading package punkt to /Users/ccy/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /Users/ccy/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /Users/ccy/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/ccy/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
NLTK NER Example¶
Here’s an example of using NLTK to find the named entities in a sentence:
sentence = (
    "In 2017, star quarterback Tom Brady led the Patriots to the Super Bowl, "
    "but lost to the Philadelphia Eagles."
)

# 1. Tokenize using the NLTK tokenizer.
tokens = nltk.word_tokenize(sentence)

# 2. Tag parts of speech using the NLTK part-of-speech tagger.
tagged = nltk.pos_tag(tokens)

# 3. Extract entities from tagged sentence.
entities = nltk.chunk.ne_chunk(tagged)
print(entities)
(S In/IN 2017/CD ,/, star/NN quarterback/NN (PERSON Tom/NNP Brady/NNP) led/VBD the/DT (ORGANIZATION Patriots/NNP) to/TO the/DT (ORGANIZATION Super/NNP Bowl/NNP) ,/, but/CC lost/VBD to/TO the/DT (ORGANIZATION Philadelphia/NNP Eagles/NNP) ./.)
It looks like
nltk.chunk.ne_chunk gives us an
nltk.tree.Tree object where named entities are also
nltk.tree.Tree objects within that tree. We can take this a step further and grab the named entities from the tree of entities:
# 4. Filter entities to just named entities.
named_entities = [entity for entity in entities if isinstance(entity, nltk.tree.Tree)]
print(named_entities)
[Tree('PERSON', [('Tom', 'NNP'), ('Brady', 'NNP')]), Tree('ORGANIZATION', [('Patriots', 'NNP')]), Tree('ORGANIZATION', [('Super', 'NNP'), ('Bowl', 'NNP')]), Tree('ORGANIZATION', [('Philadelphia', 'NNP'), ('Eagles', 'NNP')])]
A little-known feature of Python 3 is
functools.lru_cache, a decorator that allows users to easily cache the results of a function in an LRU cache. We’re going to be using the NLTK library quite a bit to tokenize, parse, and detect named entities in sentences. These sentences might repeat themselves. As such, we’ll use this decorator to cache named entities so that we don’t have to perform this expensive computation multiple times.
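To make the caching behavior concrete, here is a small self-contained sketch. The function expensive_analysis is a hypothetical stand-in for a slow NLP pipeline (it only splits on whitespace), and the calls dictionary exists purely to count how often the function body really runs.

```python
import functools

# Counts real executions of the function body (illustration only).
calls = {"count": 0}


@functools.lru_cache(maxsize=128)
def expensive_analysis(sentence):
    """Hypothetical stand-in for a slow NLP pipeline."""
    calls["count"] += 1
    # Return a tuple so the cached value cannot be mutated by callers.
    return tuple(sentence.split())


expensive_analysis("Tom Brady led the Patriots")
expensive_analysis("Tom Brady led the Patriots")  # served from the cache
expensive_analysis("The Eagles won")

info = expensive_analysis.cache_info()
print(calls["count"], info.hits, info.misses)
```

The second call with the same sentence never reaches the function body, which is the saving we want when an attack evaluates the same text repeatedly. Note that lru_cache requires hashable arguments, so it works with plain strings but not with lists.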
Putting it all together: getting a list of Named Entity Labels from a sentence¶
Now that we know how to tokenize, parse, and detect named entities using NLTK, let’s put it all together into a single helper function. Later, when we implement our constraint, we can query this function to easily get the entity labels from a sentence. We can even use
@functools.lru_cache to try and speed this process up.
import functools


@functools.lru_cache(maxsize=2**14)
def get_entities(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    # Setting `binary=True` makes NLTK label every named-entity chunk simply
    # as 'NE' instead of detailed labels like 'ORGANIZATION', 'GPE', etc.
    entities = nltk.chunk.ne_chunk(tagged, binary=True)
    # `leaves()` flattens the tree into (word, part-of-speech tag) tuples,
    # so named entities show up as words tagged 'NNP'.
    return entities.leaves()
And let’s test our function to make sure it works:
sentence = 'Jack Black starred in the 2003 film classic "School of Rock".' get_entities(sentence)
[('Jack', 'NNP'), ('Black', 'NNP'), ('starred', 'VBD'), ('in', 'IN'), ('the', 'DT'), ('2003', 'CD'), ('film', 'NN'), ('classic', 'JJ'), ('``', '``'), ('School', 'NNP'), ('of', 'IN'), ('Rock', 'NNP'), ("''", "''"), ('.', '.')]
We flattened the tree of entities, so the return format is a list of (word, entity type) tuples. Because flattening keeps only the tagged tokens, the entity type here is the word's part-of-speech tag. 'NNP' is the indicator of a named entity (a proper noun, according to NLTK). Looks like we identified three named entities here: ‘Jack Black’, ‘School’, and ‘Rock’. Whatever technique NLTK uses for named entity recognition may be a bit rough, but it did a pretty decent job here!
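To read names off a flattened list like the one printed above, consecutive 'NNP' tokens can be joined into multi-word spans. This is a minimal sketch using only the standard library: the tagged list is copied by hand from the output above, and proper_noun_spans is a hypothetical helper, not part of NLTK or TextAttack.

```python
from itertools import groupby

# (word, tag) pairs copied from the get_entities output shown above.
tagged = [
    ("Jack", "NNP"), ("Black", "NNP"), ("starred", "VBD"), ("in", "IN"),
    ("the", "DT"), ("2003", "CD"), ("film", "NN"), ("classic", "JJ"),
    ("``", "``"), ("School", "NNP"), ("of", "IN"), ("Rock", "NNP"),
    ("''", "''"), (".", "."),
]


def proper_noun_spans(tagged_tokens):
    """Join runs of consecutive 'NNP' tokens into multi-word names."""
    spans = []
    # groupby splits the list into alternating runs of NNP / non-NNP tokens.
    for is_nnp, group in groupby(tagged_tokens, key=lambda pair: pair[1] == "NNP"):
        if is_nnp:
            spans.append(" ".join(word for word, _tag in group))
    return spans


print(proper_noun_spans(tagged))
```

Because 'of' is tagged as a preposition, the film title is split into two separate spans, which illustrates how rough tag-based entity detection can be.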
Creating our NamedEntityConstraint¶
Now that we know how to detect named entities using NLTK, let’s create our custom constraint.
from textattack.constraints import Constraint


class NamedEntityConstraint(Constraint):
    """A constraint that ensures `transformed_text` only substitutes named entities
    from `current_text` with other named entities."""

    def _check_constraint(self, transformed_text, current_text):
        transformed_entities = get_entities(transformed_text.text)
        current_entities = get_entities(current_text.text)
        # If there is nothing to compare (and so no named entities), let's
        # return False (the attack will eventually fail).
        if len(current_entities) == 0:
            return False
        if len(current_entities) != len(transformed_entities):
            # If the two sentences have a different number of labeled words, then
            # they definitely don't have the same labels. In this case, the
            # constraint is violated, and we return False.
            return False
        # Here we compare all of the words, in order, to make sure that they match.
        # If we find two words that don't match, this means a word was swapped
        # between `current_text` and `transformed_text`. That word must be a
        # named entity to fulfill our constraint.
        for (word_1, label_1), (word_2, label_2) in zip(
            current_entities, transformed_entities
        ):
            if word_1 != word_2:
                # Make sure that words swapped between `current_text` and
                # `transformed_text` are named entities. If they're not,
                # then we also return False.
                if (label_1 not in ["NNP", "NE"]) or (label_2 not in ["NNP", "NE"]):
                    return False
        # If we get here, every swapped word is a named entity. Return True!
        return True
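The word-by-word comparison at the heart of this constraint can be tried out in isolation, without NLTK or TextAttack. The sketch below assumes hand-written (word, tag) lists in place of real tagger output; only_proper_nouns_swapped is a hypothetical helper that mirrors the checks described above.

```python
PROPER_NOUN_TAGS = {"NNP", "NE"}


def only_proper_nouns_swapped(current_tagged, transformed_tagged):
    """Return True if every differing word pair is tagged as a named entity."""
    # Empty input or a length mismatch violates the constraint outright.
    if not current_tagged or len(current_tagged) != len(transformed_tagged):
        return False
    for (word_1, tag_1), (word_2, tag_2) in zip(current_tagged, transformed_tagged):
        # A changed word is only allowed if both versions are named entities.
        if word_1 != word_2 and not (
            tag_1 in PROPER_NOUN_TAGS and tag_2 in PROPER_NOUN_TAGS
        ):
            return False
    return True


# Hand-tagged examples (assumed tags, not produced by a real tagger).
original = [("Brady", "NNP"), ("led", "VBD"), ("the", "DT"), ("Patriots", "NNP")]
entity_swap = [("Manning", "NNP"), ("led", "VBD"), ("the", "DT"), ("Patriots", "NNP")]
verb_swap = [("Brady", "NNP"), ("guided", "VBD"), ("the", "DT"), ("Patriots", "NNP")]

print(only_proper_nouns_swapped(original, entity_swap))
print(only_proper_nouns_swapped(original, verb_swap))
```

Swapping one name for another passes, while swapping the verb is rejected. Note that two identical sentences also pass, since no word pair differs.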
Testing our constraint¶
We need to create an attack and a dataset to test our constraint on. We went over all of this in the transformations tutorial, so let’s gloss over this part for now.
# Import the model
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper

model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/albert-base-v2-ag-news"
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/albert-base-v2-ag-news"
)

model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Create the goal function using the model
from textattack.goal_functions import UntargetedClassification

goal_function = UntargetedClassification(model_wrapper)

# Import the dataset
from textattack.datasets import HuggingFaceDataset

dataset = HuggingFaceDataset("ag_news", None, "test")
textattack: Unknown if model of class <class 'transformers.models.albert.modeling_albert.AlbertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>. Using custom data configuration default Reusing dataset ag_news (/Users/ccy/.cache/huggingface/datasets/ag_news/default/0.0.0/fb5c5e74a110037311ef5e904583ce9f8b9fbc1354290f97b4929f01b3f48b1a) textattack: Loading datasets dataset ag_news, split test.
from textattack.transformations import WordSwapEmbedding
from textattack.search_methods import GreedyWordSwapWIR
from textattack import Attack
from textattack.constraints.pre_transformation import (
    RepeatModification,
    StopwordModification,
)

# We're going to use the `WordSwapEmbedding` transformation. Using the default settings, this
# will try substituting words with their neighbors in the counter-fitted embedding space.
transformation = WordSwapEmbedding(max_candidates=20)

# We'll use the greedy search with word importance ranking method again
search_method = GreedyWordSwapWIR()

# Our constraints will be the same as Tutorial 1, plus the named entity constraint
constraints = [
    RepeatModification(),
    StopwordModification(),
    NamedEntityConstraint(False),
]

# Now, let's make the attack using these parameters.
attack = Attack(goal_function, constraints, transformation, search_method)
Now, let’s use our attack. We’re going to attack samples until we achieve 5 successes. (There’s a lot to check here, and since we’re using a greedy search over all potential word swap positions, each sample will take a few minutes. This will take a few hours to run on a single core.)
from textattack.loggers import CSVLogger  # tracks a dataframe for us.
from textattack.attack_results import SuccessfulAttackResult
from textattack import Attacker, AttackArgs

attack_args = AttackArgs(
    num_successful_examples=5, log_to_csv="results.csv", csv_coloring_style="html"
)
attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()
textattack: Logging to CSV at path results.csv 0%| | 0/5 [00:00<?, ?it/s]
Attack( (search_method): GreedyWordSwapWIR( (wir_method): unk ) (goal_function): UntargetedClassification (transformation): WordSwapEmbedding( (max_candidates): 20 (embedding): WordEmbedding ) (constraints): (0): NamedEntityConstraint( (compare_against_original): False ) (1): RepeatModification (2): StopwordModification (is_black_box): True )
20%|██████████████████████████▍ | 1/5 [02:47<11:10, 167.69s/it] [Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1: 20%|███████████████▌ | 1/5 [02:47<11:10, 167.72s/it]
--------------------------------------------- Result 1 --------------------------------------------- [[Business (75%)]] --> [[Sci/tech (61%)]] Fears for T N pension after talks Unions representing workers at [[Turner]] Newall say they are 'disappointed' after talks with stricken parent firm Federal [[Mogul]]. Fears for T N pension after talks Unions representing workers at [[Knapp]] Newall say they are 'disappointed' after talks with stricken parent firm Federal [[Titan]].
[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2: 20%|███████████████ | 1/5 [16:59<1:07:57, 1019.36s/it]
--------------------------------------------- Result 2 --------------------------------------------- [[Sci/tech (100%)]] --> [[[FAILED]]] The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.
[Succeeded / Failed / Skipped / Total] 1 / 2 / 0 / 3: 20%|███████████████ | 1/5 [25:29<1:41:59, 1529.85s/it]
--------------------------------------------- Result 3 --------------------------------------------- [[Sci/tech (100%)]] --> [[[FAILED]]] Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.
Now let’s visualize our 5 successes in color:
import pandas as pd
from IPython.display import display, HTML
from textattack.loggers import CSVLogger

# Increase column width so we can actually read the examples.
pd.options.display.max_colwidth = 480

# `loggers` is a list, so pick out the CSV logger created by `log_to_csv`.
logger = next(
    log for log in attacker.attack_log_manager.loggers if isinstance(log, CSVLogger)
)
successes = logger.df[logger.df["result_type"] == "Successful"]
display(HTML(successes[["original_text", "perturbed_text"]].to_html(escape=False)))
Our constraint seems to have done its job: it filtered out attacks that did not swap out a named entity for another, according to the NLTK named entity detector. However, we can see some problems inherent in the detector: it often thinks the title of the news article or the first word of a given sentence is a named entity, probably due to capitalization.