Goal Functions API Reference

GoalFunction determines both the conditions under which the attack is successful (in terms of the model outputs) and the heuristic score that we want to maximize when searching for the solution.

GoalFunction

class textattack.goal_functions.GoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)[source]

Evaluates how well a perturbed attacked_text object is achieving a specified goal.

Parameters:

model_wrapper (ModelWrapper) – The victim model to attack.
maximizable (bool, optional, defaults to False) – Whether the goal function is maximizable, as opposed to a boolean result of success or failure.
query_budget (float, optional, defaults to float("inf")) – The maximum number of model queries allowed.
model_cache_size (int, optional, defaults to 2**20) – The maximum number of items to keep in the model results cache at once.

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

get_output(attacked_text)[source]: Returns output for display based on the result of calling the model.

get_result(attacked_text, **kwargs)[source]: A helper method that queries self.get_results with a single AttackedText object.

get_results(attacked_text_list, check_skip=False)[source]

For each attacked_text object in attacked_text_list, returns a result consisting of whether or not the goal has been achieved, the output for display purposes, and a score.

Additionally returns whether the search is over due to the query budget.
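The interaction between the query budget, the results cache, and the "search is over" flag can be illustrated with a small stand-alone sketch. ToyGoalFunction, its model callable, and the 0.5 success cutoff are hypothetical names and values chosen for illustration; this is not TextAttack's implementation, and it assumes every requested text counts against the budget even when the output comes from the cache.

```python
import math


class ToyGoalFunction:
    """Hypothetical stand-in that mimics the documented get_results contract."""

    def __init__(self, model, query_budget=math.inf, use_cache=True):
        self.model = model  # callable: list of str -> list of float scores
        self.query_budget = query_budget
        self.use_cache = use_cache
        self.num_queries = 0  # texts requested so far (counted against the budget)
        self.model_calls = 0  # texts actually sent to the model
        self._cache = {}

    def _call_model(self, texts):
        if not self.use_cache:
            self.model_calls += len(texts)
            return self.model(texts)
        # Only query the model for texts that are not cached yet.
        missing = [t for t in texts if t not in self._cache]
        if missing:
            self.model_calls += len(missing)
            for text, output in zip(missing, self.model(missing)):
                self._cache[text] = output
        return [self._cache[t] for t in texts]

    def get_results(self, texts):
        # Truncate the batch so the budget is never exceeded.
        if self.query_budget < math.inf:
            queries_left = int(self.query_budget - self.num_queries)
            texts = texts[:queries_left]
        self.num_queries += len(texts)
        outputs = self._call_model(texts)
        # Each result: (goal achieved?, score). The 0.5 cutoff is an assumption.
        results = [(score > 0.5, score) for score in outputs]
        search_over = self.num_queries == self.query_budget
        return results, search_over
```

With a budget of 3, a first batch of two texts leaves one query; a second batch of three texts is truncated to its first element, after which the search is reported as over.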

init_attack_example(attacked_text, ground_truth_output)[source]: Called before attacking attacked_text to ‘reset’ the goal function and set properties for this example.

LogitSum

class textattack.goal_functions.LogitSum(*args, target_logit_sum=None, first_element_threshold=None, **kwargs)[source]

A goal function that minimizes the sum of output logits for classification models.

This can be used for tasks where the objective is to suppress the model’s overall confidence, or specifically the logit of the most probable label.

Behavior:

If target_logit_sum is set, the attack succeeds when the sum of all logits is less than target_logit_sum.
If first_element_threshold is set (or left to default to 0.5), the attack succeeds when the first logit’s value is less than that threshold.

Parameters:

target_logit_sum (float, optional) – A threshold for the total sum of logits.
first_element_threshold (float, optional) – A fallback threshold for the first logit only.

Note

Only one of target_logit_sum or first_element_threshold may be set.
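The success rule described above can be sketched in plain Python. The function name logit_sum_goal_complete is hypothetical, and the handling of the 0.5 fallback is an assumption based on the behavior notes in this section rather than on the library's source.

```python
def logit_sum_goal_complete(logits, target_logit_sum=None, first_element_threshold=None):
    """Return True if the (assumed) LogitSum success condition holds."""
    if target_logit_sum is not None and first_element_threshold is not None:
        raise ValueError("Set only one of target_logit_sum or first_element_threshold.")
    if target_logit_sum is not None:
        # Succeed when the sum of all logits drops below the target.
        return sum(logits) < target_logit_sum
    # Otherwise compare only the first logit, falling back to 0.5 (assumed default).
    threshold = 0.5 if first_element_threshold is None else first_element_threshold
    return logits[0] < threshold
```

For example, logits of [1.0, 2.0] satisfy target_logit_sum=4.0 but not target_logit_sum=3.0, because the comparison is strict.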

NamedEntityRecognition

class textattack.goal_functions.NamedEntityRecognition(*args, target_suffix: str, **kwargs)[source]

A goal function for attacking named entity recognition (NER) models.

Expects model outputs to be a list of dictionaries, each containing at least:

‘entity’: the predicted entity label (e.g., “PER”, “ORG”)
‘score’: the confidence score associated with that entity

The goal is to reduce the total confidence of all entities ending with a specified suffix (e.g., “PER” for person names), effectively suppressing target entity types.
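As a rough illustration of the quantity being reduced, the sketch below sums the confidence of all entities whose label ends with the target suffix. The function name ner_target_confidence and the sample labels ("B-PER", "I-PER", "B-ORG") are assumptions made for the example; how the library turns this total into a score is not shown here.

```python
def ner_target_confidence(model_output, target_suffix):
    """Total confidence of predicted entities whose label ends with target_suffix."""
    return sum(
        entity["score"]
        for entity in model_output
        if entity["entity"].endswith(target_suffix)
    )


# Hypothetical model output in the expected list-of-dictionaries format.
sample_output = [
    {"entity": "B-PER", "score": 0.9},
    {"entity": "I-PER", "score": 0.8},
    {"entity": "B-ORG", "score": 0.7},
]
per_confidence = ner_target_confidence(sample_output, "PER")
```

An attack makes progress when a perturbed input yields a lower total for the target suffix than the original input did.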

TargetedStrict

class textattack.goal_functions.TargetedStrict(*args, target_class=0, **kwargs)[source]

A modified targeted attack on classification models which only sets _is_goal_complete to True if argmax(model_output) matches the target_class.

By contrast, in TargetedClassification, _is_goal_complete returns True if either argmax(model_output) == target_class or ground_truth_output == target_class.
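The difference between the two completion checks can be shown side by side. The helper names below are hypothetical and operate on plain lists of scores; they only restate the conditions given in this section.

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def targeted_is_complete(model_output, ground_truth_output, target_class):
    # Condition described for TargetedClassification.
    return argmax(model_output) == target_class or ground_truth_output == target_class


def targeted_strict_is_complete(model_output, target_class):
    # Condition described for TargetedStrict: only the prediction matters.
    return argmax(model_output) == target_class
```

When the ground truth already equals the target class but the model predicts something else, the first check reports completion while the strict check does not.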

TargetedBonus

class textattack.goal_functions.TargetedBonus(*args, target_class=0, **kwargs)[source]

A modified targeted attack on classification models which awards a bonus score of 1 if the class with the highest predicted probability is exactly equal to the target_class.
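One way to picture the bonus is the sketch below. It assumes the base score is the predicted probability of the target class, which this page does not state for TargetedBonus; the function name targeted_bonus_score is likewise hypothetical.

```python
def targeted_bonus_score(model_output, target_class):
    """Assumed base score (target-class probability) plus a bonus of 1 on a match."""
    predicted = max(range(len(model_output)), key=model_output.__getitem__)
    bonus = 1.0 if predicted == target_class else 0.0
    return model_output[target_class] + bonus
```

Under this assumption, any input whose predicted class is the target scores above 1, so a search method that maximizes the score ranks it ahead of every input that merely raises the target probability.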

ClassificationGoalFunction

class textattack.goal_functions.classification.ClassificationGoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)[source]

A goal function defined on a model that outputs a probability for some number of classes.

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

TargetedClassification

class textattack.goal_functions.classification.TargetedClassification(*args, target_class=0, **kwargs)[source]

A targeted attack on classification models which attempts to maximize the score of the target label.

Complete when the target label is the predicted label.

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

UntargetedClassification

class textattack.goal_functions.classification.UntargetedClassification(*args, target_max_score=None, **kwargs)[source]

An untargeted attack on classification models which attempts to minimize the score of the correct label until it is no longer the predicted label.

Parameters:

target_max_score (float) – If set, goal is to reduce model output to below this score. Otherwise, goal is to change the overall predicted class.
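The two modes selected by target_max_score can be sketched as follows. The function name untargeted_is_complete is hypothetical, model_output is taken to be a plain list of class scores, and "model output" is interpreted as the score of the correct label, which is an assumption.

```python
def untargeted_is_complete(model_output, ground_truth_output, target_max_score=None):
    """Sketch of the untargeted completion rule described above."""
    if target_max_score is not None:
        # Goal: push the correct label's score below the threshold.
        return model_output[ground_truth_output] < target_max_score
    # Goal: make any other label the predicted one.
    predicted = max(range(len(model_output)), key=model_output.__getitem__)
    return predicted != ground_truth_output
```

With scores [0.6, 0.4] and ground truth 0, the attack is incomplete by default but complete when target_max_score=0.7, since 0.6 is already below that threshold.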

InputReduction

class textattack.goal_functions.classification.InputReduction(*args, target_num_words=1, **kwargs)[source]

Attempts to reduce the input down to as few words as possible while maintaining the same predicted label.

From Feng, Wallace, Grissom, Iyyer, Rodriguez, Boyd-Graber. (2018). Pathologies of Neural Models Make Interpretations Difficult. https://arxiv.org/abs/1804.07781
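A minimal sketch of the completion condition, assuming the goal is met once the label is unchanged and the text has at most target_num_words words. The function name is hypothetical, and whitespace splitting stands in for whatever tokenization the library uses.

```python
def input_reduction_is_complete(predicted_label, original_label, text, target_num_words=1):
    """True when the label is preserved and the text is short enough."""
    num_words = len(text.split())  # whitespace tokenization is an assumption
    return predicted_label == original_label and num_words <= target_num_words
```

A reduced input that flips the predicted label never satisfies the goal, no matter how short it is.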

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

TextToTextGoalFunction

class textattack.goal_functions.text.TextToTextGoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)[source]

A goal function defined on a model that outputs text.

model: The PyTorch or TensorFlow model used for evaluation.

original_output: The original output of the model.

MinimizeBleu

class textattack.goal_functions.text.MinimizeBleu(*args, target_bleu=0.0, **kwargs)[source]

Attempts to minimize the BLEU score between the current output translation and the reference translation.

BLEU score was defined in (BLEU: a Method for Automatic Evaluation of Machine Translation).

This goal function is defined in (It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations).

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

NonOverlappingOutput

class textattack.goal_functions.text.NonOverlappingOutput(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)[source]

Ensures that no word in the perturbed output equals the word at the same position in the original output.

Defined in seq2sick (https://arxiv.org/pdf/1803.01128.pdf), equation (3).
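The position-wise comparison can be sketched as below. The function name non_overlap_score, the whitespace tokenization, and the normalization by the reference length are assumptions made for illustration, not a transcription of the library code or of the seq2sick equation.

```python
def non_overlap_score(output, reference):
    """Fraction of reference positions at which the output word differs."""
    out_words = output.split()
    ref_words = reference.split()
    # Compare only the positions that exist in both sequences.
    n = min(len(out_words), len(ref_words))
    if n == 0:
        return 0.0
    differing = sum(o != r for o, r in zip(out_words[:n], ref_words[:n]))
    return differing / len(ref_words)
```

In this sketch the goal would count as achieved when the score reaches 1.0, meaning every position in the reference has been changed.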

MaximizeLevenshtein

class textattack.goal_functions.text.MaximizeLevenshtein(*args, target_distance=None, **kwargs)[source]

Attempts to maximize the Levenshtein distance between the current output translation and the reference translation.

Levenshtein distance is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.
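The definition above corresponds to the standard dynamic-programming recurrence, sketched here in pure Python. The function name levenshtein is chosen for the example; the library may rely on a dedicated package for this computation.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,  # delete char_a
                    cur[j - 1] + 1,  # insert char_b
                    prev[j - 1] + (char_a != char_b),  # substitute or match
                )
            )
        prev = cur
    return prev[-1]
```

Assuming target_distance acts as a threshold, an attack would compare levenshtein(current_output, reference_output) against it to decide whether the goal has been reached.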

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.