Attack API Reference

Attack

An attack is composed of four components:

Goal Functions: stipulate the goal of the attack, such as changing the prediction score of a classification model or changing every word in a translation output.
Constraints: determine if a potential perturbation is valid with respect to the original input.
Transformations: take a text input and transform it by inserting and deleting characters, words, and/or phrases.
Search Methods: explore the space of possible transformations within the defined constraints and attempt to find a successful perturbation that satisfies the goal function.

The Attack class represents an adversarial attack composed of a goal function, search method, transformation, and constraints.
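
How the four components interact can be sketched with a toy example in plain Python. Nothing below is TextAttack code: the function names, the synonym table, and the "model" (which simply looks for the word "good") are assumptions made for illustration only.

```python
from typing import List, Optional

# Toy stand-ins for the four components; none of these are TextAttack classes.

def goal_reached(text: str) -> bool:
    """Goal function: succeed once the toy 'model' no longer sees 'good'."""
    return "good" not in text.split()

def within_budget(candidate: str, original: str, max_changed: int = 1) -> bool:
    """Constraint: allow at most `max_changed` word positions to differ."""
    changed = sum(a != b for a, b in zip(candidate.split(), original.split()))
    return changed <= max_changed

# Hypothetical synonym table used by the toy transformation.
SYNONYMS = {"good": ["fine", "decent"], "movie": ["film"]}

def swap_words(text: str) -> List[str]:
    """Transformation: propose every single-word synonym swap."""
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            candidates.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return candidates

def greedy_search(original: str) -> Optional[str]:
    """Search method: return the first valid candidate that reaches the goal."""
    for candidate in swap_words(original):
        if within_budget(candidate, original) and goal_reached(candidate):
            return candidate
    return None

result = greedy_search("a good movie")
print(result)
```

The search method only ever sees candidates that the transformation proposed and the constraint accepted, which is the same division of labour the Attack class enforces between its components.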

class textattack.Attack(goal_function: GoalFunction, constraints: List[Constraint | PreTransformationConstraint], transformation: Transformation, search_method: SearchMethod, transformation_cache_size=32768, constraint_cache_size=32768)

An attack generates adversarial examples on text.

An attack is composed of a goal function, constraints, a transformation, and a search method. Use the attack() method to attack one sample at a time.

Parameters:

goal_function (GoalFunction) – A function for determining how well a perturbation is doing at achieving the attack’s goal.
constraints (list of Constraint or PreTransformationConstraint) – A list of constraints to add to the attack, defining which perturbations are valid.
transformation (Transformation) – The transformation applied at each step of the attack.
search_method (SearchMethod) – The method for exploring the search space of possible perturbations.
transformation_cache_size (int, optional, defaults to 2**15) – The number of items to keep in the transformations cache.
constraint_cache_size (int, optional, defaults to 2**15) – The number of items to keep in the constraints cache.

Example:

>>> import textattack
>>> import transformers

>>> # Load model, tokenizer, and model_wrapper
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
>>> model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

>>> # Construct our four components for `Attack`
>>> from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
>>> from textattack.constraints.semantics import WordEmbeddingDistance
>>> from textattack.transformations import WordSwapEmbedding
>>> from textattack.search_methods import GreedyWordSwapWIR

>>> goal_function = textattack.goal_functions.UntargetedClassification(model_wrapper)
>>> constraints = [
...     RepeatModification(),
...     StopwordModification(),
...     WordEmbeddingDistance(min_cos_sim=0.9)
... ]
>>> transformation = WordSwapEmbedding(max_candidates=50)
>>> search_method = GreedyWordSwapWIR(wir_method="delete")

>>> # Construct the actual attack
>>> attack = textattack.Attack(goal_function, constraints, transformation, search_method)

>>> input_text = "I really enjoyed the new movie that came out last month."
>>> label = 1  # Positive
>>> attack_result = attack.attack(input_text, label)

attack(example, ground_truth_output)

Attack a single example.

Parameters:

example (str, OrderedDict[str, str] or AttackedText) – Example to attack. It can be a single string or an OrderedDict where the keys represent the input fields (e.g. “premise”, “hypothesis”) and the values are the actual input texts. An AttackedText that wraps the input is also accepted.
ground_truth_output (int, float or str) – Ground truth output of example. For classification tasks, it should be an integer representing the ground truth label. For regression tasks (e.g. STS), it should be the target value. For seq2seq tasks (e.g. translation), it should be the target string.

Returns:

AttackResult that represents the result of the attack.
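
The two plain input shapes described above can be built as follows. This is a minimal sketch: the field names and sentences are illustrative, and attack_one is a hypothetical helper that assumes an Attack object has already been constructed as in the class example.

```python
from collections import OrderedDict

# Single-field input: a plain string.
single_input = "I really enjoyed the new movie that came out last month."

# Multi-field input: keys name the input fields, values hold the text.
# The field names and sentences here are illustrative only.
pair_input = OrderedDict(
    [
        ("premise", "A man is playing a guitar on stage."),
        ("hypothesis", "A musician is performing."),
    ]
)

def attack_one(attack, example, ground_truth_output):
    """Hypothetical helper: forward one example to an already-built Attack."""
    return attack.attack(example, ground_truth_output)
```

The ground truth passed alongside either input follows the rules listed for ground_truth_output: an integer label for classification, a target value for regression, or a target string for seq2seq tasks.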

cpu_()[source]: Move any torch.nn.Module models that are part of Attack to CPU.

cuda_()[source]: Move any torch.nn.Module models that are part of Attack to GPU.

filter_transformations(transformed_texts, current_text, original_text=None)

Filters a list of potential transformed texts based on self.constraints. Uses an LRU cache to avoid recomputing common transformations.

Parameters:

transformed_texts – A list of candidate transformed AttackedText to filter.
current_text – The current AttackedText on which the transformation was applied.
original_text – The original AttackedText from which the attack started.
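
The idea of filtering candidates through a constraint while reusing earlier verdicts from an LRU cache can be sketched in plain Python. This is an assumption-laden illustration, not the library's implementation: TinyLRU, filter_with_cache, the cache key, and the same_length constraint are all hypothetical names, and plain strings stand in for AttackedText.

```python
from collections import OrderedDict

class TinyLRU:
    """Minimal least-recently-used cache (illustrative, not TextAttack's)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


def filter_with_cache(candidates, current_text, constraint, cache, stats):
    """Keep candidates that pass `constraint`, reusing cached verdicts."""
    kept = []
    for candidate in candidates:
        key = (current_text, candidate)
        verdict = cache.get(key)
        if verdict is None:  # cache miss: evaluate the constraint once
            verdict = constraint(candidate, current_text)
            cache.put(key, verdict)
            stats["evaluated"] += 1
        if verdict:
            kept.append(candidate)
    return kept


def same_length(candidate, current_text):
    """Toy constraint: the word count must not change."""
    return len(candidate.split()) == len(current_text.split())


cache = TinyLRU(capacity=2**15)
stats = {"evaluated": 0}
candidates = ["a fine movie", "a really fine movie", "a fine movie"]
kept = filter_with_cache(candidates, "a good movie", same_length, cache, stats)
print(kept, stats)
```

The repeated candidate is answered from the cache, so the constraint runs twice for three candidates. A bounded cache of this kind is what the constraint_cache_size parameter sizes.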

get_indices_to_order(current_text, **kwargs)

Applies pre_transformation_constraints to text to get all the indices that can be used to search and order.

Parameters:

current_text – The current AttackedText for which to find the indices that are eligible to be ordered.

Returns:

The length and the filtered list of indices that search methods can use to search and order.
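
A toy version of this index filtering is sketched below. It assumes that the returned length is the number of eligible indices, and it uses two hypothetical constraint functions over a list of words. These are stand-ins for pre-transformation constraints, not the library's classes.

```python
# Hypothetical stopword list for the toy constraint below.
STOPWORDS = {"a", "the", "of", "is"}

def no_stopwords(words):
    """Toy pre-transformation constraint: stopword positions are off limits."""
    return {i for i, word in enumerate(words) if word.lower() not in STOPWORDS}

def not_already_modified(words, modified_indices):
    """Toy pre-transformation constraint: never edit the same position twice."""
    return set(range(len(words))) - set(modified_indices)

def indices_to_order(words, modified_indices):
    """Intersect what every constraint allows; return (count, sorted indices)."""
    allowed = no_stopwords(words) & not_already_modified(words, modified_indices)
    ordered = sorted(allowed)
    return len(ordered), ordered

words = "The plot of the movie is great".split()
count, indices = indices_to_order(words, modified_indices=[1])
print(count, indices)
```

A search method would then rank only the surviving positions (here the words "movie" and "great") instead of every position in the text.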

get_transformations(current_text, original_text=None, **kwargs)

Applies self.transformation to text, then filters the list of possible transformations through the applicable constraints.

Parameters:

current_text – The current AttackedText on which to perform the transformations.
original_text – The original AttackedText from which the attack started.

Returns:

A filtered list of transformations in which each transformation satisfies the constraints.

AttackRecipe

An attack recipe is a subclass of the Attack class with a special method, build(), that returns a pre-built Attack corresponding to an attack from the literature.

class textattack.attack_recipes.AttackRecipe(goal_function: GoalFunction, constraints: List[Constraint | PreTransformationConstraint], transformation: Transformation, search_method: SearchMethod, transformation_cache_size=32768, constraint_cache_size=32768)

A recipe for building an NLP adversarial attack from the literature.

abstract static build(model_wrapper, **kwargs)

Creates a pre-built Attack that corresponds to an attack from the literature.

Parameters:

model_wrapper (ModelWrapper) – ModelWrapper that contains the victim model and tokenizer. This is passed to GoalFunction when constructing the attack.
kwargs – Additional keyword arguments.

Returns:

Attack
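
The recipe pattern (a subclass whose abstract static build() assembles all four components and returns a ready attack) can be illustrated with stand-in classes. ToyAttack, ToyRecipe, SynonymSwapRecipe, and the placeholder component values are hypothetical and exist only to show the shape of the pattern. A real recipe would construct actual goal function, constraint, transformation, and search method objects.

```python
from abc import ABC, abstractmethod

class ToyAttack:
    """Stand-in for an attack: just stores its four components."""

    def __init__(self, goal_function, constraints, transformation, search_method):
        self.goal_function = goal_function
        self.constraints = constraints
        self.transformation = transformation
        self.search_method = search_method


class ToyRecipe(ToyAttack, ABC):
    """Stand-in for a recipe base class with an abstract static build()."""

    @staticmethod
    @abstractmethod
    def build(model_wrapper, **kwargs):
        """Return a fully configured ToyAttack for `model_wrapper`."""


class SynonymSwapRecipe(ToyRecipe):
    """Hypothetical recipe that fixes all four components in one place."""

    @staticmethod
    def build(model_wrapper, **kwargs):
        max_candidates = kwargs.get("max_candidates", 50)
        # Placeholder values stand in for real component objects.
        goal_function = ("untargeted", model_wrapper)
        constraints = ["no_repeat", "no_stopwords"]
        transformation = ("synonym_swap", max_candidates)
        search_method = "greedy"
        return ToyAttack(goal_function, constraints, transformation, search_method)


attack = SynonymSwapRecipe.build("my-model-wrapper", max_candidates=10)
print(type(attack).__name__, attack.transformation)
```

Because build() is static, callers never instantiate the recipe class directly. They pass a model wrapper and receive a configured attack, with the model wrapper handed on to the goal function.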