Goal Functions API Reference

GoalFunction determines both the conditions under which the attack is successful (in terms of the model outputs) and the heuristic score that we want to maximize when searching for the solution.

GoalFunction

class textattack.goal_functions.GoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576)[source]

Evaluates how well a perturbed attacked_text object is achieving a specified goal.

Parameters
  • model_wrapper (ModelWrapper) – The victim model to attack.

  • maximizable (bool, optional, defaults to False) – Whether the goal function is maximizable, as opposed to a boolean result of success or failure.

  • query_budget (float, optional, defaults to float("in")) – The maximum number of model queries allowed.

  • model_cache_size (int, optional, defaults to 2**20) – The maximum number of items to keep in the model results cache at once.

get_output(attacked_text)[source]

Returns output for display based on the result of calling the model.

get_result(attacked_text, **kwargs)[source]

A helper method that queries self.get_results with a single AttackedText object.

get_results(attacked_text_list, check_skip=False)[source]

For each attacked_text object in attacked_text_list, returns a result consisting of whether or not the goal has been achieved, the output for display purposes, and a score.

Additionally returns whether the search is over due to the query budget.

init_attack_example(attacked_text, ground_truth_output)[source]

Called before attacking attacked_text to ‘reset’ the goal function and set properties for this example.

ClassificationGoalFunction

class textattack.goal_functions.classification.ClassificationGoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576)[source]

A goal function defined on a model that outputs a probability for some number of classes.

TargetedClassification

class textattack.goal_functions.classification.TargetedClassification(*args, target_class=0, **kwargs)[source]

A targeted attack on classification models which attempts to maximize the score of the target label.

Complete when the arget label is the predicted label.

UntargetedClassification

class textattack.goal_functions.classification.UntargetedClassification(*args, target_max_score=None, **kwargs)[source]

An untargeted attack on classification models which attempts to minimize the score of the correct label until it is no longer the predicted label.

Parameters

target_max_score (float) – If set, goal is to reduce model output to below this score. Otherwise, goal is to change the overall predicted class.

InputReduction

class textattack.goal_functions.classification.InputReduction(*args, target_num_words=1, **kwargs)[source]

Attempts to reduce the input down to as few words as possible while maintaining the same predicted label.

From Feng, Wallace, Grissom, Iyyer, Rodriguez, Boyd-Graber. (2018). Pathologies of Neural Models Make Interpretations Difficult. https://arxiv.org/abs/1804.07781

TextToTextGoalFunction

class textattack.goal_functions.text.TextToTextGoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576)[source]

A goal function defined on a model that outputs text.

model: The PyTorch or TensorFlow model used for evaluation. original_output: the original output of the model

MinimizeBleu

class textattack.goal_functions.text.MinimizeBleu(*args, target_bleu=0.0, **kwargs)[source]

Attempts to minimize the BLEU score between the current output translation and the reference translation.

BLEU score was defined in (BLEU: a Method for Automatic Evaluation of Machine Translation).

ArxivURL

This goal function is defined in (It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations).

ArxivURL2

NonOverlappingOutput

class textattack.goal_functions.text.NonOverlappingOutput(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576)[source]

Ensures that none of the words at a position are equal.

Defined in seq2sick (https://arxiv.org/pdf/1803.01128.pdf), equation (3).