Attack API Reference
Attack
Attack is composed of four components:
Goal Functions: specify the goal of the attack, such as changing the prediction score of a classification model or changing all of the words in a translation output.
Constraints: determine if a potential perturbation is valid with respect to the original input.
Transformations: take a text input and transform it by inserting and deleting characters, words, and/or phrases.
Search Methods: explore the space of possible transformations within the defined constraints and attempt to find a successful perturbation which satisfies the goal function.
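The way the four components fit together can be sketched with toy stand-ins. This is a minimal illustration in plain Python, not TextAttack code: the model, the synonym table, and every function name below are hypothetical and exist only to show the division of labor.

```python
from typing import List, Optional

# Toy stand-ins for the four components; none of these are TextAttack classes.

def toy_model(text: str) -> int:
    """Hypothetical victim model: predicts 1 (positive) if 'good' appears."""
    return 1 if "good" in text.split() else 0

def goal_reached(text: str, ground_truth: int) -> bool:
    """Goal function: succeed when the prediction differs from the label."""
    return toy_model(text) != ground_truth

def max_words_changed(candidate: str, original: str, limit: int = 1) -> bool:
    """Constraint: at most `limit` word positions may differ from the original."""
    changed = sum(a != b for a, b in zip(candidate.split(), original.split()))
    return changed <= limit

# Hypothetical synonym table used by the toy transformation.
SYNONYMS = {"good": ["fine", "decent"], "movie": ["film"]}

def swap_words(text: str) -> List[str]:
    """Transformation: every single-word synonym swap of `text`."""
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        for replacement in SYNONYMS.get(word, []):
            candidates.append(" ".join(words[:i] + [replacement] + words[i + 1:]))
    return candidates

def first_success_search(original: str, ground_truth: int) -> Optional[str]:
    """Search method: return the first valid candidate that meets the goal."""
    for candidate in swap_words(original):
        if max_words_changed(candidate, original) and goal_reached(candidate, ground_truth):
            return candidate
    return None

adversarial = first_success_search("a good movie", 1)
print(adversarial)
```

Each function can be replaced independently, which is the point of the decomposition: a different search strategy, for example, can reuse the same transformation and constraints.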
The Attack class represents an adversarial attack composed of a goal function, search method, transformation, and constraints.
- class textattack.Attack(goal_function: GoalFunction, constraints: List[Union[Constraint, PreTransformationConstraint]], transformation: Transformation, search_method: SearchMethod, transformation_cache_size=32768, constraint_cache_size=32768)[source]
An attack generates adversarial examples on text.
An attack is composed of a goal function, constraints, a transformation, and a search method. Use the attack() method to attack one sample at a time.
- Parameters
goal_function (GoalFunction) – A function for determining how well a perturbation is doing at achieving the attack’s goal.
constraints (list of Constraint or PreTransformationConstraint) – A list of constraints to add to the attack, defining which perturbations are valid.
transformation (Transformation) – The transformation applied at each step of the attack.
search_method (SearchMethod) – The method for exploring the search space of possible perturbations.
transformation_cache_size (int, optional, defaults to 2**15) – The number of items to keep in the transformations cache.
constraint_cache_size (int, optional, defaults to 2**15) – The number of items to keep in the constraints cache.
Example:
>>> import textattack
>>> import transformers
>>> # Load model, tokenizer, and model_wrapper
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
>>> model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)
>>> # Construct our four components for `Attack`
>>> from textattack import Attack
>>> from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
>>> from textattack.constraints.semantics import WordEmbeddingDistance
>>> from textattack.search_methods import GreedyWordSwapWIR
>>> from textattack.transformations import WordSwapEmbedding
>>> goal_function = textattack.goal_functions.UntargetedClassification(model_wrapper)
>>> constraints = [
...     RepeatModification(),
...     StopwordModification(),
...     WordEmbeddingDistance(min_cos_sim=0.9),
... ]
>>> transformation = WordSwapEmbedding(max_candidates=50)
>>> search_method = GreedyWordSwapWIR(wir_method="delete")
>>> # Construct the actual attack
>>> attack = Attack(goal_function, constraints, transformation, search_method)
>>> input_text = "I really enjoyed the new movie that came out last month."
>>> label = 1  # Positive
>>> attack_result = attack.attack(input_text, label)
- attack(example, ground_truth_output)[source]
Attack a single example.
- Parameters
example (str, OrderedDict[str, str] or AttackedText) – Example to attack. It can be a single string or an OrderedDict where keys represent the input fields (e.g. “premise”, “hypothesis”) and the values are the actual input text. An AttackedText that wraps the input is also accepted.
ground_truth_output (int, float or str) – Ground truth output of example. For classification tasks, it should be an integer representing the ground truth label. For regression tasks (e.g. STS), it should be the target value. For seq2seq tasks (e.g. translation), it should be the target string.
- Returns
AttackResult that represents the result of the attack.
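The two plain input shapes described above can be illustrated without loading a model. In this sketch the helper as_fields and the field name "text" for single-string inputs are assumptions made for illustration; they are not part of the documented API.

```python
from collections import OrderedDict

def as_fields(example):
    """Hypothetical helper: view either input shape as ordered, named fields."""
    if isinstance(example, str):
        # Assumption: a bare string is treated as one field named "text".
        return OrderedDict([("text", example)])
    return OrderedDict(example)

# Single-field input, as used for sentiment classification.
single = as_fields("I really enjoyed the new movie that came out last month.")

# Multi-field input, as used for entailment-style tasks.
pair = as_fields(
    OrderedDict(
        [
            ("premise", "A man is playing a guitar."),
            ("hypothesis", "A person makes music."),
        ]
    )
)

# The ground truth type depends on the task: int label, float target, or str target.
ground_truths = {"classification": 1, "regression": 3.5, "seq2seq": "Ein Mann spielt Gitarre."}

print(list(single), list(pair))
```

Using an OrderedDict rather than a plain tuple keeps the field names attached to the text, so a multi-input example stays unambiguous about which string is the premise and which is the hypothesis.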
- filter_transformations(transformed_texts, current_text, original_text=None)[source]
Filters a list of potential transformed texts based on self.constraints. Uses an LRU cache to avoid recomputing common transformations.
- Parameters
transformed_texts – A list of candidate transformed AttackedText to filter.
current_text – The current AttackedText on which the transformation was applied.
original_text – The original AttackedText from which the attack started.
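The idea of caching constraint results can be sketched as follows. This is an illustrative stand-alone example, not the library's implementation: the LRUCache class, the cache key of (current, candidate), and the short_enough constraint are all assumptions chosen to keep the sketch small.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache (illustrative, not TextAttack's own)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def get(self, key):
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

constraint_calls = 0  # counts how often the constraint is actually evaluated

def short_enough(candidate, current):
    """Hypothetical constraint: the candidate may grow by at most 3 characters."""
    global constraint_calls
    constraint_calls += 1
    return len(candidate) - len(current) <= 3

def filter_candidates(candidates, current, cache):
    """Keep candidates that pass the constraint, consulting the cache first."""
    kept = []
    for candidate in candidates:
        key = (current, candidate)
        if key in cache:
            ok = cache.get(key)
        else:
            ok = short_enough(candidate, current)
            cache.put(key, ok)
        if ok:
            kept.append(candidate)
    return kept

cache = LRUCache(capacity=2**15)
candidates = ["a fine movie", "a truly wonderful movie"]
first = filter_candidates(candidates, "a good movie", cache)
second = filter_candidates(candidates, "a good movie", cache)
print(first, second, constraint_calls)
```

The second call returns the same result without re-evaluating the constraint, which matters when constraints are expensive (for example, ones that run an embedding model) and a search method revisits the same candidates.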
- get_indices_to_order(current_text, **kwargs)[source]
Applies pre_transformation_constraints to current_text to get all the indices that can be used to search and order.
- Parameters
current_text – The current AttackedText for which we need to find the indices that are eligible to be ordered.
- Returns
The length and the filtered list of indices which search methods can use to search/order.
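A rough sketch of how pre-transformation constraints narrow the set of modifiable word indices is shown below. The constraint functions, the stopword set, and the choice to return the count of eligible indices alongside the list are assumptions made for illustration, based only on the description above.

```python
# Hypothetical stopword list for the toy constraint below.
STOPWORDS = {"a", "the", "of"}

def skip_stopwords(words, modified):
    """Toy pre-transformation constraint: stopwords may not be modified."""
    return {i for i, word in enumerate(words) if word.lower() not in STOPWORDS}

def skip_modified(words, modified):
    """Toy pre-transformation constraint: do not modify the same index twice."""
    return {i for i in range(len(words)) if i not in modified}

def indices_to_order(words, modified, pre_constraints):
    """Intersect the index sets allowed by every pre-transformation constraint."""
    allowed = set(range(len(words)))
    for constraint in pre_constraints:
        allowed &= constraint(words, modified)
    ordered = sorted(allowed)
    return len(ordered), ordered

words = "the plot of the movie was good".split()
count, indices = indices_to_order(
    words, modified={6}, pre_constraints=[skip_stopwords, skip_modified]
)
print(count, indices)
```

Because these constraints depend only on the current text, they can be applied before any transformation is generated, so a search method never spends effort ranking positions it is not allowed to change.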
- get_transformations(current_text, original_text=None, **kwargs)[source]
Applies self.transformation to current_text, then filters the list of possible transformations through the applicable constraints.
- Parameters
current_text – The current AttackedText on which to perform the transformations.
original_text – The original AttackedText from which the attack started.
- Returns
A filtered list of transformations where each transformation satisfies the constraints.
AttackRecipe
An attack recipe is a subclass of the Attack class with a special build() method, which returns a pre-built Attack corresponding to an attack from the literature.
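The recipe pattern, an abstract static build() that assembles the four components and returns a ready attack, can be sketched with stand-in classes. ToyAttack, ToyRecipe, ToySynonymRecipe, and the max_candidates keyword are hypothetical names used only for this illustration; the component values are placeholder strings rather than real objects.

```python
from abc import ABC, abstractmethod

class ToyAttack:
    """Stand-in for an attack: it only stores its four components."""

    def __init__(self, goal_function, constraints, transformation, search_method):
        self.goal_function = goal_function
        self.constraints = constraints
        self.transformation = transformation
        self.search_method = search_method

class ToyRecipe(ToyAttack, ABC):
    """Stand-in for a recipe base class with an abstract static factory."""

    @staticmethod
    @abstractmethod
    def build(model_wrapper, **kwargs):
        """Return a fully assembled attack for `model_wrapper`."""
        raise NotImplementedError

class ToySynonymRecipe(ToyRecipe):
    """Hypothetical recipe that fixes one combination of components."""

    @staticmethod
    def build(model_wrapper, **kwargs):
        # Optional keyword arguments tune the pre-built components.
        max_candidates = kwargs.get("max_candidates", 50)
        goal_function = ("untargeted", model_wrapper)  # the wrapper goes to the goal function
        constraints = ["no-repeat", "no-stopwords"]
        transformation = ("synonym-swap", max_candidates)
        search_method = "greedy"
        return ToyAttack(goal_function, constraints, transformation, search_method)

attack = ToySynonymRecipe.build(model_wrapper="toy-model", max_candidates=10)
print(attack.transformation, attack.search_method)
```

Making build() static means a caller needs only a model wrapper to obtain a complete attack, without knowing which constraints or search method the recipe prescribes.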
- class textattack.attack_recipes.AttackRecipe(goal_function: GoalFunction, constraints: List[Union[Constraint, PreTransformationConstraint]], transformation: Transformation, search_method: SearchMethod, transformation_cache_size=32768, constraint_cache_size=32768)[source]
A recipe for building an NLP adversarial attack from the literature.
- abstract static build(model_wrapper, **kwargs)[source]
Creates a pre-built Attack that corresponds to an attack from the literature.
- Parameters
model_wrapper (ModelWrapper) – ModelWrapper that contains the victim model and tokenizer. This is passed to GoalFunction when constructing the attack.
kwargs – Additional keyword arguments.
- Returns