textattack.goal_functions.custom package

Custom goal functions

Submodules

Goal Function for Logit sum

class textattack.goal_functions.custom.logit_sum.LogitSum(*args, target_logit_sum=None, first_element_threshold=None, **kwargs)

Bases: GoalFunction

A goal function that minimizes the sum of output logits for classification models.

This can be used for tasks where the objective is to suppress the model’s overall confidence, or specifically the logit of the most probable label.

Behavior:

If target_logit_sum is set, the attack succeeds when the sum of all logits is less than target_logit_sum.
Otherwise, the attack succeeds when the first logit is less than first_element_threshold, which falls back to 0.5 when it is not set.

Parameters:

target_logit_sum (float, optional) – A threshold for the total sum of logits.
first_element_threshold (float, optional) – A fallback threshold for the first logit only.

Note

At most one of target_logit_sum and first_element_threshold may be set.
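The success criteria above can be sketched as a stand-alone function. This is a minimal illustration and not the library's implementation: the function name logit_sum_goal_complete is hypothetical, the logits are assumed to be a plain sequence of floats for a single example, and raising ValueError when both thresholds are given is an assumption about how the mutual exclusivity is enforced.

```python
from typing import Optional, Sequence


def logit_sum_goal_complete(
    logits: Sequence[float],
    target_logit_sum: Optional[float] = None,
    first_element_threshold: Optional[float] = None,
) -> bool:
    """Hypothetical stand-alone version of the documented success check."""
    # Assumption: supplying both thresholds is rejected with a ValueError.
    if target_logit_sum is not None and first_element_threshold is not None:
        raise ValueError(
            "Set only one of target_logit_sum or first_element_threshold."
        )

    # Case 1: compare the sum of all logits against the target.
    if target_logit_sum is not None:
        return sum(logits) < target_logit_sum

    # Case 2: compare only the first logit, falling back to 0.5.
    threshold = 0.5 if first_element_threshold is None else first_element_threshold
    return logits[0] < threshold


# Sum-based criterion: 0.2 + 0.1 is below 0.5, so the goal is met.
sum_based = logit_sum_goal_complete([0.2, 0.1], target_logit_sum=0.5)

# First-logit criterion with the 0.5 fallback: 0.6 is not below 0.5.
first_logit_based = logit_sum_goal_complete([0.6, -3.0])
```

Note that the two criteria can disagree on the same output: in the second call the logits sum to a negative number, yet the check fails because only the first logit is inspected.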

Goal Function for NamedEntityRecognition

class textattack.goal_functions.custom.named_entity_recognition.NamedEntityRecognition(*args, target_suffix: str, **kwargs)

Bases: GoalFunction

A goal function for attacking named entity recognition (NER) models.

Expects model outputs to be a list of dictionaries, each containing at least:

‘entity’: the predicted entity label (e.g., “PER”, “ORG”)
‘score’: the confidence score associated with that entity

The goal is to reduce the total confidence of all entities whose label ends with target_suffix (e.g., “PER” for person names), effectively suppressing the targeted entity type.
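As a rough illustration of the quantity being reduced, the sketch below sums the scores of all predictions whose entity label ends with the target suffix. The helper name suffix_confidence and the sample predictions are hypothetical. How the class turns this total into an attack score or a completion check is not stated here, so the sketch stops at the total itself.

```python
from typing import Dict, List, Union

# One NER prediction, e.g. {"entity": "B-PER", "score": 0.9}.
Prediction = Dict[str, Union[str, float]]


def suffix_confidence(predictions: List[Prediction], target_suffix: str) -> float:
    """Hypothetical helper: total confidence of entities ending with target_suffix."""
    return sum(
        (
            float(pred["score"])
            for pred in predictions
            if str(pred["entity"]).endswith(target_suffix)
        ),
        0.0,  # start value keeps the result a float when nothing matches
    )


# Made-up model output for illustration only.
sample_predictions: List[Prediction] = [
    {"entity": "B-PER", "score": 0.9},
    {"entity": "I-PER", "score": 0.8},
    {"entity": "B-ORG", "score": 0.7},
]

# Both "B-PER" and "I-PER" end with "PER", so their scores are added together.
person_confidence = suffix_confidence(sample_predictions, "PER")
```

Matching on the suffix rather than the full label means that tagging-scheme prefixes such as "B-" and "I-" are covered by a single target_suffix value.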

Goal Function for Targeted classification with bonus score

class textattack.goal_functions.custom.targeted_bonus.TargetedBonus(*args, target_class=0, **kwargs)

Bases: GoalFunction

A modified targeted attack on classification models that awards a bonus score of 1 when the class with the highest predicted probability is the target_class.
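The bonus rule can be sketched as follows. The function name targeted_bonus_score is hypothetical, and the base score is an assumption: the description only specifies the bonus, so this sketch uses the predicted probability of the target class as the base.

```python
from typing import Sequence


def targeted_bonus_score(probs: Sequence[float], target_class: int = 0) -> float:
    """Hypothetical sketch: base score plus a bonus of 1 when the target wins."""
    # Assumption: the base score is the probability assigned to the target class.
    base = probs[target_class]

    # Index of the highest probability (the first index wins on ties).
    predicted = max(range(len(probs)), key=lambda i: probs[i])

    # Documented behavior: add 1 when the top class is exactly the target class.
    bonus = 1.0 if predicted == target_class else 0.0
    return base + bonus


# Target class 0 is the top prediction, so the bonus applies.
with_bonus = targeted_bonus_score([0.7, 0.3], target_class=0)

# Target class 1 is not the top prediction, so only the base score remains.
without_bonus = targeted_bonus_score([0.7, 0.3], target_class=1)
```

Under this assumption the bonus creates a clear gap between outputs that already predict the target class and outputs that merely assign it a high probability.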

Goal Function for Strict targeted classification

class textattack.goal_functions.custom.targeted_strict.TargetedStrict(*args, target_class=0, **kwargs)

Bases: GoalFunction

A modified targeted attack on classification models in which _is_goal_complete returns True only if argmax(model_output) matches the target_class.

By contrast, in TargetedClassification, _is_goal_complete returns True if either argmax(model_output) == target_class or ground_truth_output == target_class.
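The difference between the two completion checks can be made concrete with a small sketch. The function names strict_goal_complete and lenient_goal_complete are hypothetical stand-ins for the two _is_goal_complete behaviors described above, and model_output is assumed to be a plain sequence of per-class scores.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def strict_goal_complete(model_output: Sequence[float], target_class: int) -> bool:
    # TargetedStrict behavior: only the model's prediction counts.
    return argmax(model_output) == target_class


def lenient_goal_complete(
    model_output: Sequence[float], target_class: int, ground_truth_output: int
) -> bool:
    # TargetedClassification behavior as described above: the check also
    # passes when the ground-truth label already equals the target class.
    return (
        argmax(model_output) == target_class
        or ground_truth_output == target_class
    )


# The model predicts class 0, but the ground-truth label is the target class 1.
output = [0.9, 0.1]
lenient = lenient_goal_complete(output, target_class=1, ground_truth_output=1)
strict = strict_goal_complete(output, target_class=1)
```

In this example the lenient check passes while the strict check does not, which is the case TargetedStrict is meant to rule out.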