Goal Functions API Reference
GoalFunction determines both the conditions under which the attack is successful (in terms of the model outputs)
and the heuristic score that we want to maximize when searching for the solution.
GoalFunction
- class textattack.goal_functions.GoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)
Evaluates how well a perturbed attacked_text object is achieving a specified goal.
- Parameters:
model_wrapper (ModelWrapper) – The victim model to attack.
maximizable (bool, optional, defaults to False) – Whether the goal function is maximizable, as opposed to a boolean result of success or failure.
query_budget (float, optional, defaults to float("inf")) – The maximum number of model queries allowed.
model_cache_size (int, optional, defaults to 2**20) – The maximum number of items to keep in the model results cache at once.
- get_output(attacked_text)
Returns output for display based on the result of calling the model.
- get_result(attacked_text, **kwargs)
A helper method that queries self.get_results with a single AttackedText object.
- get_results(attacked_text_list, check_skip=False)
For each attacked_text object in attacked_text_list, returns a result consisting of whether or not the goal has been achieved, the output for display purposes, and a score.
Additionally returns whether the search is over due to the query budget.
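The interplay between the results cache, the query budget, and the "search is over" flag can be illustrated with a toy class. This is a minimal sketch under stated assumptions, not TextAttack's implementation: ToyGoalFunction and ToyResult are hypothetical names, the model is any callable returning class probabilities, every candidate counts as one query even when its output is served from the cache, and the score and success rule are borrowed from the untargeted classification case for concreteness.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class ToyResult:
    text: str
    output: Sequence[float]  # model output, e.g. class probabilities
    score: float             # heuristic score the search tries to maximize
    succeeded: bool          # whether the goal has been achieved


class ToyGoalFunction:
    """Hypothetical, simplified stand-in for a goal function (not TextAttack code)."""

    def __init__(self, model: Callable[[str], Sequence[float]], ground_truth: int,
                 query_budget: float = float("inf")):
        self.model = model
        self.ground_truth = ground_truth
        self.query_budget = query_budget
        self.num_queries = 0   # queries charged against the budget
        self.model_calls = 0   # actual model invocations (cache misses)
        self._cache: Dict[str, Sequence[float]] = {}

    def _call_model(self, text: str) -> Sequence[float]:
        # Reuse cached outputs so repeated candidates do not re-run the model.
        if text not in self._cache:
            self._cache[text] = self.model(text)
            self.model_calls += 1
        return self._cache[text]

    def get_results(self, texts: List[str]) -> Tuple[List[ToyResult], bool]:
        # Drop candidates that would exceed the remaining query budget.
        if self.query_budget != float("inf"):
            remaining = max(int(self.query_budget - self.num_queries), 0)
            texts = texts[:remaining]
        self.num_queries += len(texts)
        results = []
        for text in texts:
            output = self._call_model(text)
            predicted = max(range(len(output)), key=lambda i: output[i])
            results.append(ToyResult(
                text=text,
                output=output,
                score=1.0 - output[self.ground_truth],  # assumed untargeted-style score
                succeeded=predicted != self.ground_truth,
            ))
        # The search is over once the budget has been used up.
        search_over = self.num_queries >= self.query_budget
        return results, search_over
```

With query_budget=3 and four candidates, only the first three are evaluated and the returned flag tells the search to stop; a duplicated candidate is charged as a query but answered from the cache.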
LogitSum
- class textattack.goal_functions.LogitSum(*args, target_logit_sum=None, first_element_threshold=None, **kwargs)
A goal function that minimizes the sum of output logits for classification models.
This can be used for tasks where the objective is to suppress the model’s overall confidence, or specifically the logit of the most probable label.
- Behavior:
If target_logit_sum is set, the attack succeeds when the sum of all logits is less than target_logit_sum.
Otherwise, the attack succeeds when the first logit is less than first_element_threshold, which defaults to 0.5 if not given.
- Parameters:
target_logit_sum (float, optional) – A threshold for the total sum of logits.
first_element_threshold (float, optional) – A fallback threshold for the first logit only.
Note
Only one of target_logit_sum or first_element_threshold may be set.
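The success rule described above can be written as a small predicate. This is a sketch that follows the prose, not LogitSum's source: logit_sum_goal_complete and logit_sum_score are hypothetical helpers, and treating the search score as the negated logit sum is an assumption about how "minimizes the sum" maps onto a score that the search maximizes.

```python
from typing import Optional, Sequence


def logit_sum_goal_complete(logits: Sequence[float],
                            target_logit_sum: Optional[float] = None,
                            first_element_threshold: Optional[float] = None) -> bool:
    """Hypothetical helper mirroring the LogitSum success rule described above."""
    if target_logit_sum is not None and first_element_threshold is not None:
        raise ValueError("Set only one of target_logit_sum or first_element_threshold.")
    if target_logit_sum is not None:
        # Succeed once the total of all logits drops below the target.
        return sum(logits) < target_logit_sum
    # Otherwise fall back to a threshold on the first logit (0.5 if unset).
    threshold = 0.5 if first_element_threshold is None else first_element_threshold
    return logits[0] < threshold


def logit_sum_score(logits: Sequence[float]) -> float:
    # Assumption: maximizing the negated sum is the same as minimizing the sum.
    return -sum(logits)
```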
NamedEntityRecognition
- class textattack.goal_functions.NamedEntityRecognition(*args, target_suffix: str, **kwargs)
A goal function for attacking named entity recognition (NER) models.
- Expects model outputs to be a list of dictionaries, each containing at least:
‘entity’: the predicted entity label (e.g., “PER”, “ORG”)
‘score’: the confidence score associated with that entity
The goal is to reduce the total confidence of all entities ending with a specified suffix (e.g., “PER” for person names), effectively suppressing target entity types.
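The quantity this goal function drives down can be sketched directly from the description of the model output. The helper target_entity_confidence is hypothetical, and the "B-PER"/"I-PER" labels in the usage below are assumed BIO-style tags chosen for illustration; the point is that the suffix match lets one target_suffix cover every tag of an entity type.

```python
from typing import Dict, List, Union

Entity = Dict[str, Union[str, float]]


def target_entity_confidence(entities: List[Entity], target_suffix: str) -> float:
    """Sum the scores of entities whose label ends with target_suffix (sketch)."""
    # An attack following the description above tries to push this total down.
    return sum(float(e["score"]) for e in entities
               if str(e["entity"]).endswith(target_suffix))


entities = [
    {"entity": "B-PER", "score": 0.98},
    {"entity": "I-PER", "score": 0.95},
    {"entity": "B-ORG", "score": 0.90},
]
per_confidence = target_entity_confidence(entities, "PER")  # about 1.93
```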
TargetedStrict
- class textattack.goal_functions.TargetedStrict(*args, target_class=0, **kwargs)
A modified targeted attack on classification models which only sets _is_goal_complete to True if argmax(model_output) matches the target_class.
By contrast, in TargetedClassification, _is_goal_complete returns True if either argmax(model_output) == target_class or ground_truth_output == target_class.
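The difference between the two completion rules shows up when the ground-truth label already equals the target class. The functions below are hypothetical restatements of the two conditions above, operating on a plain list of class scores rather than a model output tensor.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    # Index of the largest value (the first one on ties).
    return max(range(len(values)), key=lambda i: values[i])


def targeted_classification_complete(output: Sequence[float], target_class: int,
                                     ground_truth_output: int) -> bool:
    # Looser rule: also true when the ground-truth label already is the target.
    return argmax(output) == target_class or ground_truth_output == target_class


def targeted_strict_complete(output: Sequence[float], target_class: int) -> bool:
    # Strict rule: only the model's current prediction counts.
    return argmax(output) == target_class
```

For output [0.8, 0.2] with target_class=1 and ground_truth_output=1, the looser rule reports success even though the model still predicts class 0, while the strict rule does not.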
TargetedBonus
ClassificationGoalFunction
- class textattack.goal_functions.classification.ClassificationGoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)
A goal function defined on a model that outputs a probability for some number of classes.
TargetedClassification
UntargetedClassification
- class textattack.goal_functions.classification.UntargetedClassification(*args, target_max_score=None, **kwargs)
An untargeted attack on classification models which attempts to minimize the score of the correct label until it is no longer the predicted label.
- Parameters:
target_max_score (float) – If set, goal is to reduce model output to below this score. Otherwise, goal is to change the overall predicted class.
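The two success modes of an untargeted attack, and a score that rewards lowering the correct label's probability, can be sketched as follows. The helpers untargeted_score and untargeted_complete are hypothetical; using 1 minus the correct-label probability as the score is an assumption consistent with "minimize the score of the correct label", not a quote of the library.

```python
from typing import Optional, Sequence


def untargeted_score(probs: Sequence[float], ground_truth: int) -> float:
    # Higher is better for the attacker: less probability on the correct label.
    return 1.0 - probs[ground_truth]


def untargeted_complete(probs: Sequence[float], ground_truth: int,
                        target_max_score: Optional[float] = None) -> bool:
    if target_max_score is not None:
        # Threshold mode: push the correct label's probability below the target.
        return probs[ground_truth] < target_max_score
    # Default mode: succeed once another class becomes the prediction.
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return predicted != ground_truth
```

With probabilities [0.55, 0.45] and ground truth 0, the default mode has not succeeded yet (class 0 is still predicted), but target_max_score=0.6 already counts as success.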
InputReduction
- class textattack.goal_functions.classification.InputReduction(*args, target_num_words=1, **kwargs)
Attempts to reduce the input down to as few words as possible while maintaining the same predicted label.
From Feng, Wallace, Grissom, Iyyer, Rodriguez, Boyd-Graber. (2018). Pathologies of Neural Models Make Interpretations Difficult. https://arxiv.org/abs/1804.07781
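To make the objective concrete, here is a toy greedy loop that deletes one word at a time as long as the predicted label stays the same, stopping at target_num_words. It is a simplified stand-in: the function greedy_input_reduction and the classify callable are hypothetical, and the loop tries deletions left to right rather than using the word-importance ordering of Feng et al. or whatever search method is paired with InputReduction.

```python
from typing import Callable, List


def greedy_input_reduction(words: List[str], classify: Callable[[List[str]], int],
                           target_num_words: int = 1) -> List[str]:
    """Toy reduction: drop single words while the predicted label is unchanged."""
    original_label = classify(words)
    current = list(words)
    while len(current) > target_num_words:
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if classify(candidate) == original_label:
                current = candidate  # keep the deletion, then try again
                break
        else:
            # No single deletion preserves the label, so stop early.
            break
    return current


# Toy classifier: label 1 whenever "great" is present.
clf = lambda ws: 1 if "great" in ws else 0
reduced = greedy_input_reduction("this movie was great".split(), clf)  # ["great"]
```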
TextToTextGoalFunction
- class textattack.goal_functions.text.TextToTextGoalFunction(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)
A goal function defined on a model that outputs text.
model: The PyTorch or TensorFlow model used for evaluation.
original_output: The original output of the model.
MinimizeBleu
- class textattack.goal_functions.text.MinimizeBleu(*args, target_bleu=0.0, **kwargs)
Attempts to minimize the BLEU score between the current output translation and the reference translation.
The BLEU score is defined in “BLEU: a Method for Automatic Evaluation of Machine Translation”.
This goal function is defined in “It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations”.
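For intuition about what is being minimized, the following is a simplified, unsmoothed sentence-level BLEU against a single reference (clipped n-gram precisions for n = 1..4, geometric mean, brevity penalty), followed by a success check. It is a sketch for illustration only: sentence_bleu and minimize_bleu_complete are hypothetical names, the library may compute BLEU differently (for example with smoothing or another tokenizer), and the "BLEU at or below target_bleu" success rule is an assumption.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with one reference (simplified sketch)."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty punishes hypotheses shorter than the reference.
    c, r = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def minimize_bleu_complete(hypothesis: List[str], reference: List[str],
                           target_bleu: float = 0.0) -> bool:
    # Assumed rule: succeed once BLEU falls to target_bleu or below.
    return sentence_bleu(hypothesis, reference) <= target_bleu
```

An identical output scores 1.0, an output sharing no words with the reference scores 0.0, and a partially changed output lands in between.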
NonOverlappingOutput
- class textattack.goal_functions.text.NonOverlappingOutput(model_wrapper, maximizable=False, use_cache=True, query_budget=inf, model_batch_size=32, model_cache_size=1048576, allow_skip=True)
Attempts to produce an output in which no word matches the original output's word at the same position.
Defined in seq2sick (https://arxiv.org/pdf/1803.01128.pdf), equation (3).
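A position-wise overlap score makes the goal concrete: count aligned positions whose words differ and declare success only when every position differs. This is a hedged sketch; non_overlap_score and non_overlap_complete are hypothetical helpers, and comparing only the overlapping prefix while normalizing by the original output's length is an assumption, not a statement of how the library computes it.

```python
from typing import List


def non_overlap_score(output: List[str], original: List[str]) -> float:
    """Fraction of aligned positions whose words differ (sketch)."""
    if not original:
        return 0.0
    # zip() compares only the positions both sequences share.
    differing = sum(1 for a, b in zip(output, original) if a != b)
    return differing / len(original)


def non_overlap_complete(output: List[str], original: List[str]) -> bool:
    # Success only when no aligned position repeats the original word.
    return non_overlap_score(output, original) == 1.0
```

Under this normalization, an output shorter than the original can never reach a score of 1.0, because the unmatched trailing positions are not counted as differing.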
MaximizeLevenshtein
- class textattack.goal_functions.text.MaximizeLevenshtein(*args, target_distance=None, **kwargs)
Attempts to maximize the Levenshtein distance between the current output translation and the reference translation.
Levenshtein distance is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.
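The definition above corresponds to the standard dynamic-programming recurrence, sketched here with a single rolling row. The helper maximize_levenshtein_complete is hypothetical, and the rule "succeed once the distance reaches target_distance" is an assumption; the sketch requires an explicit target and does not model the target_distance=None case.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def maximize_levenshtein_complete(output: str, reference: str, target_distance: int) -> bool:
    # Assumed rule: succeed once the output is at least target_distance edits away.
    return levenshtein(output, reference) >= target_distance
```

For example, levenshtein("kitten", "sitting") is 3: two substitutions and one insertion.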