textattack.shared package

Shared TextAttack Functions

This package includes functions shared across packages.

Attacked Text Class

A helper class that represents a string that can be attacked.

class textattack.shared.attacked_text.AttackedText(text_input, attack_attrs=None)[source]

Bases: object

A helper class that represents a string that can be attacked.

Models that take multiple sentences as input separate them by SPLIT_TOKEN. Attacks “see” the entire input, joined into one string, without the split token.

AttackedText instances that were perturbed from other AttackedText objects contain a pointer to the previous text (attack_attrs["previous_attacked_text"]), so that the full chain of perturbations can be reconstructed by following this key as a linked list.

Parameters
  • text_input (str) – The string that this AttackedText represents

  • attack_attrs (dict) – Dictionary of various attributes stored during the course of an attack.
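
A brief usage sketch (the input strings are illustrative):

    from collections import OrderedDict
    from textattack.shared import AttackedText

    # Single-sequence input: words are parsed with punctuation stripped.
    text = AttackedText("The quick brown fox jumped over the lazy dog.")
    print(text.num_words)  # 9
    print(text.words[1])   # 'quick'

    # Multi-sequence input (e.g. an entailment pair): pass an OrderedDict,
    # whose keys become the column labels.
    pair = AttackedText(
        OrderedDict([("premise", "A man is eating."), ("hypothesis", "Someone eats.")])
    )
    print(pair.column_labels)  # ['premise', 'hypothesis']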

align_with_model_tokens(model_wrapper)[source]

Aligns the AttackedText’s words with the target model’s tokenization scheme (e.g. word, character, or subword). Specifically, maps each word to the list of indices of the tokens that compose it (e.g. “embedding” –> [“em”, “##bed”, “##ding”]).

Parameters

model_wrapper (textattack.models.wrappers.ModelWrapper) – ModelWrapper of the target model

Returns

Dictionary that maps the i-th word to a list of token indices.

Return type

word2token_mapping (dict[int, list[int]])
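
A hedged sketch, assuming a HuggingFace model wrapper around a subword-tokenized model:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from textattack.models.wrappers import HuggingFaceModelWrapper
    from textattack.shared import AttackedText

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

    text = AttackedText("An embedding example")
    mapping = text.align_with_model_tokens(model_wrapper)
    # e.g. {0: [0], 1: [1, 2, 3], 2: [4]} -- "embedding" may span several subwords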

all_words_diff(other_attacked_text)[source]

Returns the set of indices for which this and other_attacked_text have different words.

convert_from_original_idxs(idxs)[source]

Takes indices of words from the original string and converts them to indices of the same words in the current string.

Uses information from self.attack_attrs['original_index_map'], which maps word indices from the original to perturbed text.

delete_word_at_index(index)[source]

Returns a new AttackedText object with the word at index removed.

first_word_diff(other_attacked_text)[source]

Returns the first word in self.words that differs from other_attacked_text.

Useful for word swap strategies.

first_word_diff_index(other_attacked_text)[source]

Returns the index of the first word in self.words that differs from other_attacked_text.

Useful for word swap strategies.
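
For example, comparing a text against a one-word perturbation of itself (a minimal sketch):

    from textattack.shared import AttackedText

    original = AttackedText("the movie was great")
    perturbed = original.replace_word_at_index(3, "terrible")

    print(original.all_words_diff(perturbed))         # {3}
    print(original.first_word_diff_index(perturbed))  # 3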

free_memory()[source]

Delete items that take up memory.

Can be called once the AttackedText is only needed for display.

generate_new_attacked_text(new_words)[source]

Returns a new AttackedText object in which the old list of words is replaced with new_words, while preserving the punctuation and spacing of the original text.

self.words is a list of the words in the current text with punctuation removed. However, each “word” in new_words may be an empty string, representing a word deletion, or a string with multiple space-separated words, representing an insertion of one or more words.
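
A sketch of word replacement, deletion (empty string), and multi-word insertion through new_words (outputs are illustrative):

    from textattack.shared import AttackedText

    text = AttackedText("the movie was great")

    # Replace "movie", delete "was", and expand "great" into two words.
    new = text.generate_new_attacked_text(["the", "film", "", "really great"])
    print(new.text)  # "the film really great"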

get_deletion_indices()[source]
insert_text_after_word_index(index, text)[source]

Inserts a string after the word at index index and attempts to add appropriate spacing.

insert_text_before_word_index(index, text)[source]

Inserts a string before the word at index index and attempts to add appropriate spacing.

ith_word_diff(other_attacked_text, i)[source]

Returns whether the word at index i differs from other_attacked_text.

ner_of_word_index(desired_word_idx)[source]

Returns the NER tag of the word at index desired_word_idx.

Uses the FLAIR NER tagger.

pos_of_word_index(desired_word_idx)[source]

Returns the part-of-speech tag of the word at index desired_word_idx.

Uses the FLAIR part-of-speech tagger.
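
A minimal sketch (requires the flair package; the exact tag strings depend on FLAIR’s tag sets):

    from textattack.shared import AttackedText

    text = AttackedText("Paris is the capital of France")
    print(text.pos_of_word_index(0))  # e.g. a proper-noun tag
    print(text.ner_of_word_index(0))  # e.g. a location entity tag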

printable_text(key_color='bold', key_color_method=None)[source]

Represents full text input. Adds field descriptions.

For example, entailment inputs look like:

premise: ... hypothesis: ...

replace_word_at_index(index, new_word)[source]

Returns a new AttackedText object with the word at index replaced by new_word.

replace_words_at_indices(indices, new_words)[source]

Returns a new AttackedText object with the words at the given indices replaced by the corresponding new words.
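
Each such method returns a fresh object, and attack_attrs["previous_attacked_text"] links it back to its predecessor, forming the perturbation chain described above. A short sketch:

    from textattack.shared import AttackedText

    t0 = AttackedText("the movie was great")
    t1 = t0.replace_word_at_index(3, "terrible")
    t2 = t1.replace_words_at_indices([0, 1], ["this", "film"])

    print(t2.text)                                          # "this film was terrible"
    print(t2.attack_attrs["previous_attacked_text"] is t1)  # True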

text_after_word_index(i)[source]

Returns the text after the end of word at index i.

text_until_word_index(i)[source]

Returns the text before the beginning of word at index i.

text_window_around_index(index, window_size)[source]

Returns a text window of window_size words centered around index.
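
For instance (a sketch; exact boundary handling follows the implementation):

    from textattack.shared import AttackedText

    text = AttackedText("one two three four five")
    print(text.text_window_around_index(2, 3))  # "two three four"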

words_diff_num(other_attacked_text)[source]
words_diff_ratio(x)[source]

Returns the ratio of differing words between the current text and x.

Note that the current text and x must have the same number of words.
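
For example (both texts must have the same word count):

    from textattack.shared import AttackedText

    a = AttackedText("the movie was great")
    b = a.replace_word_at_index(3, "terrible")
    print(a.words_diff_ratio(b))  # 0.25 -- 1 of 4 words differs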

SPLIT_TOKEN = '<SPLIT>'
property column_labels

Returns the labels for this text’s columns.

For single-sequence inputs, this simply returns [‘text’].

property num_words

Returns the number of words in the sequence.

property text

Represents full text input.

Multiple inputs are joined with a line break.

property tokenizer_input

The tuple of inputs to be passed to the tokenizer.

property words
property words_per_input

Returns a list of lists of words corresponding to each input.

Misc Checkpoints

The AttackCheckpoint class saves in-progress attacks and loads saved attacks from disk.

class textattack.shared.checkpoint.AttackCheckpoint(attack_args, attack_log_manager, worklist, worklist_candidates, chkpt_time=None)[source]

Bases: object

An object that stores necessary information for saving and loading checkpoints.

Parameters
  • attack_args (textattack.AttackArgs) – Arguments of the original attack

  • attack_log_manager (textattack.loggers.AttackLogManager) – Object for storing attack results

  • worklist (deque[int]) – List of examples that will be attacked. Examples are represented by their indices within the dataset.

  • worklist_candidates (deque[int]) – List of other available examples that can be attacked. Used to get the next dataset element when attack_n=True.

  • chkpt_time (float) – Unix epoch time at which the checkpoint was created
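
A hedged sketch of resuming from disk (the checkpoint path below is hypothetical; checkpoints are normally written automatically when checkpointing is enabled in AttackArgs):

    from textattack.shared.checkpoint import AttackCheckpoint

    checkpoint = AttackCheckpoint.load("checkpoints/1621889000.ta.chkpt")
    print(checkpoint.num_remaining_attacks)
    print(checkpoint.dataset_offset)  # where to resume in the dataset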

classmethod load(path)[source]
save(quiet=False)[source]
property dataset_offset

Calculates the offset into the dataset to start from.

property datetime
property num_failed_attacks
property num_maximized_attacks
property num_remaining_attacks
property num_skipped_attacks
property num_successful_attacks
property results_count

Returns the number of attacks made so far.

Shared data fields

Lists of named entities: countries, nationalities, cities.

Lists of person names, first and last.

Misc Validators

Validators ensure compatibility between search methods, transformations, constraints, and goal functions.

textattack.shared.validators.transformation_consists_of(transformation, transformation_classes)[source]

Determines if transformation is, or consists only of, instances of a class in transformation_classes.

textattack.shared.validators.transformation_consists_of_word_swaps(transformation)[source]

Determines if transformation is a word swap or consists of only word swaps.

textattack.shared.validators.transformation_consists_of_word_swaps_and_deletions(transformation)[source]

Determines if transformation is a word swap or consists of only word swaps and deletions.
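
For example, a search method can verify up front that a (possibly composite) transformation only performs word swaps (a minimal sketch):

    from textattack.shared import validators
    from textattack.transformations import CompositeTransformation, WordSwapEmbedding

    transformation = CompositeTransformation([WordSwapEmbedding()])
    print(validators.transformation_consists_of_word_swaps(transformation))  # True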

textattack.shared.validators.validate_model_goal_function_compatibility(goal_function_class, model_class)[source]

Determines if model_class is task-compatible with goal_function_class.

For example, a text-generative model, like one intended for translation or summarization, would not be compatible with a goal function that requires probability scores, such as UntargetedClassification.

textattack.shared.validators.validate_model_gradient_word_swap_compatibility(model)[source]

Determines if model is task-compatible with GradientBasedWordSwap.

We can only take the gradient with respect to an individual word if the model uses a word-based tokenizer.