textattack.transformations.word_insertions package

word_insertions package

Submodules

WordInsertion Class

Word insertion transformations insert a new word at a specific word index. For example, inserting “new” at position 3 (counting from 0) in the text “I like the movie” yields “I like the new movie”. Subclasses implement the abstract WordInsertion class by overriding self._get_new_words.
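The pattern described above can be sketched in plain Python. This is a minimal illustration, not TextAttack's implementation: ToyWordInsertion, InsertFixedWord, transform, and the signature used for _get_new_words are hypothetical stand-ins, and the text is split on whitespace rather than tokenized by the library.

```python
from abc import ABC, abstractmethod


class ToyWordInsertion(ABC):
    """Hypothetical stand-in for an abstract word-insertion base class."""

    @abstractmethod
    def _get_new_words(self, words, index):
        """Return candidate words to insert before position `index`."""

    def transform(self, text, index):
        # Split on whitespace; word positions are counted from 0.
        words = text.split()
        # Build one new text per candidate word.
        return [
            " ".join(words[:index] + [new_word] + words[index:])
            for new_word in self._get_new_words(words, index)
        ]


class InsertFixedWord(ToyWordInsertion):
    """Toy subclass that always proposes the same word."""

    def __init__(self, word):
        self.word = word

    def _get_new_words(self, words, index):
        return [self.word]


results = InsertFixedWord("new").transform("I like the movie", 3)
print(results)  # ['I like the new movie']
```

The base class owns the insertion mechanics, so a subclass only has to decide which words to propose for a given position.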

class textattack.transformations.word_insertions.word_insertion.WordInsertion[source]

Bases: Transformation

A base class for word insertions.

WordInsertionMaskedLM Class

class textattack.transformations.word_insertions.word_insertion_masked_lm.WordInsertionMaskedLM(masked_language_model='bert-base-uncased', tokenizer=None, max_length=512, window_size=inf, max_candidates=50, min_confidence=0.0005, batch_size=16)[source]

Bases: WordInsertion

Generates potential insertions for a word using a masked language model.

Based on “CLARE: Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020): https://arxiv.org/abs/2009.07502

Parameters:

masked_language_model (Union[str, transformers.AutoModelForMaskedLM]) – Either the name of a pretrained masked language model from the transformers model hub or the model itself. Default is bert-base-uncased.
tokenizer (obj) – The tokenizer for the corresponding model. If you pass the name of a pretrained model as masked_language_model, you can skip this argument because the correct tokenizer can be inferred from the name. If you pass the model itself, you must provide a tokenizer.
max_length (int) – The maximum sequence length the masked language model is designed to work with. Default is 512.
window_size (int) – The number of surrounding words to include when predicting the top words. For each insertion position, window_size // 2 words to the left and window_size // 2 words to the right are taken, and the text within that window is passed to the masked language model. Default is float(“inf”), which is equivalent to using the whole text.
max_candidates (int) – The maximum number of candidates to consider inserting at each position. Candidates are ranked by the model’s confidence. Default is 50.
min_confidence (float) – The minimum confidence threshold each new word must pass. Default is 0.0005.
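How window_size, max_candidates, and min_confidence interact can be sketched in plain Python. This is an illustration under assumptions, not the library's code: context_window and select_candidates are hypothetical helpers, the confidence scores are made up, and treating a score equal to min_confidence as passing is an assumption.

```python
import math


def context_window(words, index, window_size=math.inf):
    """Return the words given to the model for an insertion before `index`."""
    # An infinite window means the whole text is used.
    if math.isinf(window_size):
        return list(words)
    half = int(window_size) // 2
    start = max(0, index - half)        # up to `half` words on the left
    end = min(len(words), index + half)  # up to `half` words on the right
    return list(words[start:end])


def select_candidates(scored_words, max_candidates=50, min_confidence=0.0005):
    """Rank (word, confidence) pairs and keep the best ones above the threshold."""
    ranked = sorted(scored_words, key=lambda pair: pair[1], reverse=True)
    # Assumption: a score equal to the threshold counts as passing.
    kept = [word for word, score in ranked if score >= min_confidence]
    return kept[:max_candidates]


words = "the quick brown fox jumps over the lazy dog".split()
window = context_window(words, index=4, window_size=4)

# Made-up confidences standing in for masked language model output.
scores = [("new", 0.4), ("old", 0.0001), ("great", 0.3), ("long", 0.2)]
top = select_candidates(scores, max_candidates=2)
```

A smaller window_size trades context for speed on long inputs, while min_confidence and max_candidates together bound how many insertions are generated per position.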

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

WordInsertionRandomSynonym Class

Random synonym insertion transformation

class textattack.transformations.word_insertions.word_insertion_random_synonym.WordInsertionRandomSynonym[source]

Bases: WordInsertion

Transformation that inserts synonyms of words that are already in the sequence.
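The idea can be sketched in plain Python as follows. This is a hedged illustration only: TOY_SYNONYMS is a made-up lookup table standing in for a real synonym source, and insert_random_synonym is a hypothetical helper, not part of TextAttack.

```python
import random

# Made-up synonym table for illustration; a real transformation would
# consult a proper lexical resource instead.
TOY_SYNONYMS = {
    "movie": ["film", "picture"],
    "like": ["enjoy", "love"],
}


def insert_random_synonym(words, index, synonyms=TOY_SYNONYMS, rng=random):
    """Insert, before `index`, a synonym of some word already in `words`."""
    # Collect synonyms of every word in the sequence.
    candidates = [syn for word in words for syn in synonyms.get(word, [])]
    if not candidates:
        return list(words)  # nothing to insert
    new_word = rng.choice(candidates)
    return list(words[:index]) + [new_word] + list(words[index:])


rng = random.Random(0)  # seeded so the sketch is repeatable
original = ["I", "like", "the", "movie"]
augmented = insert_random_synonym(original, 3, rng=rng)
```

Because the inserted word is drawn at random, repeated calls on the same input can produce different outputs unless the random generator is seeded.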

property deterministic

textattack.transformations.word_insertions.word_insertion_random_synonym.check_if_one_word(word)[source]