Attack API Reference

Attack

An attack is composed of four components:

  • Goal Functions: stipulate the goal of the attack, such as changing the prediction score of a classification model or changing all of the words in a translation output.

  • Constraints: determine if a potential perturbation is valid with respect to the original input.

  • Transformations: take a text input and transform it by inserting and deleting characters, words, and/or phrases.

  • Search Methods: explore the space of possible transformations within the defined constraints and attempt to find a successful perturbation which satisfies the goal function.
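The interplay of the four components can be illustrated with a toy, standard-library-only sketch. Everything in it (the synonym table, `toy_model`, and the function names) is hypothetical and chosen for illustration; it is not TextAttack's implementation, only one way the roles described above might fit together.

```python
from typing import List, Optional

# Hypothetical toy data; none of these names come from TextAttack.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"]}


def toy_model(text: str) -> float:
    """Pretend classifier: the 'positive' score is high whenever 'good' appears."""
    return 0.9 if "good" in text.split() else 0.2


def goal_reached(text: str) -> bool:
    """Goal function: succeed once the positive score drops below 0.5."""
    return toy_model(text) < 0.5


def transformation(text: str) -> List[str]:
    """Transformation: propose every single-word synonym swap."""
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            candidates.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return candidates


def constraint(candidate: str, original: str) -> bool:
    """Constraint: allow at most one changed word relative to the original."""
    changed = sum(a != b for a, b in zip(candidate.split(), original.split()))
    return changed <= 1


def search(original: str) -> Optional[str]:
    """Search method: return the first valid candidate that meets the goal."""
    for candidate in transformation(original):
        if constraint(candidate, original) and goal_reached(candidate):
            return candidate
    return None


adversarial = search("a good movie")
print(adversarial)
```

Here the search is a plain linear scan; a real search method would typically rank or prune candidates rather than take the first one that works.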

The Attack class represents an adversarial attack composed of a goal function, search method, transformation, and constraints.

class textattack.Attack(goal_function: textattack.goal_functions.goal_function.GoalFunction, constraints: List[Union[textattack.constraints.constraint.Constraint, textattack.constraints.pre_transformation_constraint.PreTransformationConstraint]], transformation: textattack.transformations.transformation.Transformation, search_method: textattack.search_methods.search_method.SearchMethod, transformation_cache_size=32768, constraint_cache_size=32768)

An attack generates adversarial examples on text.

An attack comprises a goal function, constraints, a transformation, and a search method. Use the attack() method to attack one sample at a time.

Parameters
  • goal_function (GoalFunction) – A function for determining how well a perturbation is doing at achieving the attack’s goal.

  • constraints (list of Constraint or PreTransformationConstraint) – A list of constraints to add to the attack, defining which perturbations are valid.

  • transformation (Transformation) – The transformation applied at each step of the attack.

  • search_method (SearchMethod) – The method for exploring the search space of possible perturbations.

  • transformation_cache_size (int, optional, defaults to 2**15) – The number of items to keep in the transformations cache.

  • constraint_cache_size (int, optional, defaults to 2**15) – The number of items to keep in the constraints cache.

Example:

>>> import textattack
>>> import transformers

>>> # Load model, tokenizer, and model_wrapper
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
>>> model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

>>> # Construct our four components for `Attack`
>>> from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
>>> from textattack.constraints.semantics import WordEmbeddingDistance
>>> from textattack.transformations import WordSwapEmbedding
>>> from textattack.search_methods import GreedyWordSwapWIR

>>> goal_function = textattack.goal_functions.UntargetedClassification(model_wrapper)
>>> constraints = [
...     RepeatModification(),
...     StopwordModification(),
...     WordEmbeddingDistance(min_cos_sim=0.9)
... ]
>>> transformation = WordSwapEmbedding(max_candidates=50)
>>> search_method = GreedyWordSwapWIR(wir_method="delete")

>>> # Construct the actual attack
>>> attack = textattack.Attack(goal_function, constraints, transformation, search_method)

>>> input_text = "I really enjoyed the new movie that came out last month."
>>> label = 1  # Positive
>>> attack_result = attack.attack(input_text, label)

attack(example, ground_truth_output)

Attack a single example.

Parameters
  • example (str, OrderedDict[str, str] or AttackedText) – Example to attack. It can be a single string or an OrderedDict whose keys name the input fields (e.g. “premise”, “hypothesis”) and whose values are the actual input texts. An AttackedText that wraps the input is also accepted.

  • ground_truth_output (int, float or str) – Ground truth output of example. For classification tasks, it should be an integer representing the ground truth label. For regression tasks (e.g. STS), it should be the target value. For seq2seq tasks (e.g. translation), it should be the target string.

Returns

AttackResult that represents the result of the attack.
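The accepted input shapes and the per-task ground-truth types described above can be sketched with the standard library alone. The helper `normalize_example` is hypothetical, and storing a lone string under a single "text" field is an assumption made for illustration rather than documented behavior.

```python
from collections import OrderedDict
from typing import Union


def normalize_example(example: Union[str, OrderedDict]) -> OrderedDict:
    """Hypothetical helper: put single-string and multi-field inputs in one shape."""
    if isinstance(example, str):
        # Assumption: a lone string is stored under a single "text" field.
        return OrderedDict([("text", example)])
    if isinstance(example, OrderedDict):
        return example
    raise TypeError(f"unsupported example type: {type(example).__name__}")


# Single-field input, e.g. sentiment classification.
single = normalize_example("I really enjoyed the new movie.")

# Multi-field input, e.g. natural language inference.
pair = normalize_example(
    OrderedDict([("premise", "A man is sleeping."), ("hypothesis", "A man is awake.")])
)

# Ground-truth outputs differ by task type.
classification_label = 1               # integer class label
regression_target = 3.8                # e.g. an STS similarity score
seq2seq_target = "Ein Mann schläft."   # target string for translation
```

An OrderedDict is used for multi-field inputs because the field order matters when the fields are joined into a single model input.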

cpu_()

Move any torch.nn.Module models that are part of Attack to CPU.

cuda_()

Move any torch.nn.Module models that are part of Attack to GPU.

filter_transformations(transformed_texts, current_text, original_text=None)

Filters a list of potential transformed texts based on self.constraints. Uses an LRU cache to avoid recomputing constraint checks for transformations that have already been seen.

Parameters
  • transformed_texts – A list of candidate transformed AttackedText to filter.

  • current_text – The current AttackedText on which the transformation was applied.

  • original_text – The original AttackedText from which the attack started.
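A minimal sketch of constraint filtering backed by an LRU cache follows. The class `ConstraintCache`, its key layout of (current text, candidate) pairs, and the `misses` counter are assumptions made for illustration; the library's actual cache structure is not described in this reference.

```python
from collections import OrderedDict
from typing import Callable, List


class ConstraintCache:
    """Toy LRU cache keyed by (current_text, candidate) pairs; names are hypothetical."""

    def __init__(self, max_size: int = 2**15):
        self.max_size = max_size
        self._store = OrderedDict()
        self.misses = 0  # counts how often the constraints were actually evaluated

    def check(self, current: str, candidate: str,
              constraints: List[Callable[[str, str], bool]]) -> bool:
        key = (current, candidate)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = all(c(candidate, current) for c in constraints)
        self._store[key] = result
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


def filter_candidates(candidates, current, constraints, cache):
    """Keep only candidates that satisfy every constraint, consulting the cache first."""
    return [c for c in candidates if cache.check(current, c, constraints)]


def same_length(candidate: str, current: str) -> bool:
    """Example constraint: the word count must not change."""
    return len(candidate.split()) == len(current.split())


cache = ConstraintCache(max_size=2)
first = filter_candidates(["a fine day", "a day"], "a good day", [same_length], cache)
second = filter_candidates(["a fine day", "a day"], "a good day", [same_length], cache)
```

The second call is answered entirely from the cache, so `cache.misses` stays at the value reached after the first call.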

get_transformations(current_text, original_text=None, **kwargs)

Applies self.transformation to text, then filters the list of possible transformations through the applicable constraints.

Parameters
  • current_text – The current AttackedText on which to perform the transformations.

  • original_text – The original AttackedText from which the attack started.

Returns

A filtered list of transformations in which each transformation satisfies the constraints.
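The transform-then-filter flow can be sketched as a three-stage pipeline. This is an illustration under stated assumptions, not the library's code: the stopword list, the synonym table, and all function names are hypothetical, and the split into a pre-transformation stage (which positions may change) and a post-transformation stage (which candidates survive) is one plausible reading of the two constraint types in the class signature.

```python
from typing import List, Optional, Set

STOPWORDS = {"a", "the", "is"}  # hypothetical stopword list
SYNONYMS = {"good": ["great"], "the": ["this"], "movie": ["film", "motion picture"]}


def modifiable_indices(words: List[str], already_modified: Set[int]) -> Set[int]:
    """Pre-transformation stage: skip stopwords and previously modified positions."""
    return {
        i for i, w in enumerate(words)
        if w not in STOPWORDS and i not in already_modified
    }


def apply_transformation(words: List[str], indices: Set[int]) -> List[List[str]]:
    """Transformation stage: one synonym swap per candidate, only at allowed indices."""
    out = []
    for i in sorted(indices):
        for synonym in SYNONYMS.get(words[i], []):
            out.append(words[:i] + [synonym] + words[i + 1:])
    return out


def single_token_swaps(candidate: List[str]) -> bool:
    """Post-transformation constraint: reject replacements that span several words."""
    return all(" " not in token for token in candidate)


def get_candidates(current: str, already_modified: Optional[Set[int]] = None) -> List[str]:
    """Restrict positions, transform, then filter the resulting candidates."""
    words = current.split()
    allowed = modifiable_indices(words, already_modified or set())
    transformed = apply_transformation(words, allowed)
    return [" ".join(c) for c in transformed if single_token_swaps(c)]


candidates = get_candidates("the movie is good")
```

Passing `already_modified={1}` would additionally exclude the word "movie", mirroring the idea of not modifying the same position twice.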

AttackRecipe

An attack recipe is a subclass of the Attack class with a special build() method that returns a pre-built Attack corresponding to an attack from the literature.

class textattack.attack_recipes.AttackRecipe(goal_function: textattack.goal_functions.goal_function.GoalFunction, constraints: List[Union[textattack.constraints.constraint.Constraint, textattack.constraints.pre_transformation_constraint.PreTransformationConstraint]], transformation: textattack.transformations.transformation.Transformation, search_method: textattack.search_methods.search_method.SearchMethod, transformation_cache_size=32768, constraint_cache_size=32768)

A recipe for building an NLP adversarial attack from the literature.

abstract static build(model_wrapper, **kwargs)

Creates a pre-built Attack that corresponds to an attack from the literature.

Parameters
  • model_wrapper (ModelWrapper) – ModelWrapper that contains the victim model and tokenizer. This is passed to GoalFunction when constructing the attack.

  • kwargs – Additional keyword arguments.

Returns

Attack
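The recipe pattern, a subclass that exposes an abstract static build() factory, can be sketched with toy classes. `ToyAttack`, `ToyRecipe`, and `ToySynonymRecipe` are hypothetical stand-ins, and the components are plain placeholder values rather than real goal functions or transformations.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ToyAttack:
    """Hypothetical stand-in for an attack: it just holds its four components."""
    goal_function: Any
    constraints: List[Any]
    transformation: Any
    search_method: Any


class ToyRecipe(ToyAttack, ABC):
    """Recipe base class: subclasses implement a static build() factory."""

    @staticmethod
    @abstractmethod
    def build(model_wrapper: Any, **kwargs) -> ToyAttack:
        raise NotImplementedError


class ToySynonymRecipe(ToyRecipe):
    @staticmethod
    def build(model_wrapper: Any, **kwargs) -> ToyAttack:
        # The wrapped model is handed to the goal function, as described above.
        goal_function = ("untargeted-classification", model_wrapper)
        constraints = ["no-repeat-modification", "no-stopword-modification"]
        # Extra keyword arguments tune the pre-built configuration.
        transformation = ("synonym-swap", kwargs.get("max_candidates", 50))
        search_method = "greedy"
        return ToyAttack(goal_function, constraints, transformation, search_method)


# build() is static, so no recipe instance is needed to obtain an attack.
attack = ToySynonymRecipe.build(model_wrapper="my-model", max_candidates=10)
```

Making build() static lets callers obtain a fully configured attack from the class itself, given only a model wrapper.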