Word Insertion transformations act by inserting a new word at a specific word index.
For example, inserting “new” at word index 3 (counting from 0) in the text “I like the movie” gives “I like the new movie”.
Subclasses can implement the abstract WordInsertion class by overriding the method that generates the new words to insert.
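The index-based insertion described above can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper name `insert_word` is hypothetical, words are assumed to be whitespace-separated, and indices are assumed to be zero-based.

```python
def insert_word(text: str, index: int, new_word: str) -> str:
    """Insert new_word so that it ends up at the given word index (zero-based).

    Hypothetical helper for illustration; it splits on whitespace only.
    """
    words = text.split()
    if not 0 <= index <= len(words):
        raise IndexError(f"word index {index} out of range for {len(words)} words")
    # Everything before the index stays in place; the rest shifts right by one.
    return " ".join(words[:index] + [new_word] + words[index:])


print(insert_word("I like the movie", 3, "new"))  # prints: I like the new movie
```

An index equal to the number of words appends the new word at the end of the text.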
- class textattack.transformations.word_insertions.word_insertion.WordInsertion
A base class for word insertions.
- class textattack.transformations.word_insertions.word_insertion_masked_lm.WordInsertionMaskedLM(masked_language_model='bert-base-uncased', tokenizer=None, max_length=512, window_size=inf, max_candidates=50, min_confidence=0.0005, batch_size=16)
Generate potential insertions for a word using a masked language model.
Based on “CLARE: Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020): https://arxiv.org/abs/2009.07502
masked_language_model (Union[str|transformers.AutoModelForMaskedLM]) – Either the name of a pretrained masked language model from the transformers model hub or the actual model. Default is bert-base-uncased.
tokenizer (obj) – The tokenizer of the corresponding model. If you passed the name of a pretrained model for masked_language_model, you can skip this argument, as the correct tokenizer can be inferred from the name. However, if you’re passing the actual model, you must provide a tokenizer.
max_length (int) – The maximum sequence length the masked language model is designed to work with. Default is 512.
window_size (int) – The number of surrounding words to include when predicting the top words. For each insertion position, we take window_size // 2 words to the left and window_size // 2 words to the right and pass the text within the window to the masked language model. Default is float("inf"), which is equivalent to using the whole text.
max_candidates (int) – The maximum number of candidates to consider inserting at each position. Candidates are ranked by the model’s confidence.
min_confidence (float) – The minimum confidence threshold each new word must pass.
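To make the window_size, max_candidates and min_confidence parameters concrete, here is a minimal sketch in plain Python. It only mirrors the behaviour described in the parameter list and is not the library's code: the helper names `masked_window` and `filter_candidates`, the `[MASK]` token string, and the inclusive comparison against min_confidence are all assumptions made for illustration.

```python
import math


def masked_window(words, index, window_size=math.inf, mask_token="[MASK]"):
    """Put mask_token before words[index] and keep only the nearby words.

    Hypothetical helper: takes window_size // 2 words on each side of the mask.
    """
    if window_size == math.inf:
        # An infinite window is equivalent to using the whole text.
        left, right = words[:index], words[index:]
    else:
        half = int(window_size) // 2
        left = words[max(0, index - half):index]
        right = words[index:index + half]
    return " ".join(left + [mask_token] + right)


def filter_candidates(scored_words, max_candidates=50, min_confidence=0.0005):
    """Rank (word, confidence) pairs and keep the best ones above the threshold.

    Assumption: a word passes if its confidence is >= min_confidence.
    """
    ranked = sorted(scored_words, key=lambda pair: pair[1], reverse=True)
    kept = [word for word, score in ranked if score >= min_confidence]
    return kept[:max_candidates]


words = "the quick brown fox jumps over the lazy dog".split()
print(masked_window(words, 4, window_size=4))  # prints: brown fox [MASK] jumps over

scores = [("good", 0.4), ("bad", 0.0001), ("new", 0.3), ("old", 0.2)]
print(filter_candidates(scores, max_candidates=2))  # prints: ['good', 'new']
```

In this sketch the windowed string is what would be passed to the masked language model, and the (word, confidence) pairs stand in for its predictions at the mask position.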
Random synonym insertion transformation
- class textattack.transformations.word_insertions.word_insertion_random_synonym.WordInsertionRandomSynonym
Transformation that inserts synonyms of words that are already in the sequence.
- property deterministic
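As a rough illustration of inserting a synonym of a word that is already in the sequence, the sketch below uses a hypothetical toy synonym table (`TOY_SYNONYMS`) and a hypothetical helper (`insert_random_synonym`). A real implementation would draw synonyms from a lexical resource; none of these names come from the library.

```python
import random

# Hypothetical toy synonym table, used only for this illustration.
TOY_SYNONYMS = {
    "movie": ["film", "picture"],
    "like": ["enjoy", "love"],
}


def insert_random_synonym(text, index, synonyms=TOY_SYNONYMS, rng=None):
    """Insert, at the given word index, a random synonym of a word in the text.

    Returns the text unchanged when no word in it has a known synonym.
    """
    rng = rng or random.Random()
    words = text.split()
    # Collect synonyms of every word present, skipping ones already in the text.
    candidates = [
        syn
        for word in words
        for syn in synonyms.get(word.lower(), [])
        if syn not in words
    ]
    if not candidates:
        return text
    chosen = rng.choice(candidates)
    return " ".join(words[:index] + [chosen] + words[index:])


# Passing a seeded random.Random makes the choice reproducible across runs.
print(insert_random_synonym("I like the movie", 2, rng=random.Random(0)))
```

Because the synonym is chosen at random, repeated calls without a seeded generator may produce different outputs for the same input.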