Word Insertion transformations act by inserting a new word at a specific word index.
For example, if we insert “new” at word index 3 (0-indexed) in the text “I like the movie”, we get “I like the new movie”.
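In plain Python, independent of any library, the same operation is a list insertion on the word sequence:

    # Reproduce the example above: insert "new" at word index 3 (0-indexed).
    words = "I like the movie".split()   # ['I', 'like', 'the', 'movie']
    words.insert(3, "new")
    print(" ".join(words))               # I like the new movie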
Subclasses can implement the abstract WordInsertion class by overriding self._get_new_words.

WordInsertion¶
A base class for word insertions.
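For illustration, a minimal subclass could look like the sketch below. The class name WordInsertionFixedList and its word list are hypothetical, and the sketch assumes the base class converts the words returned by self._get_new_words into transformed texts:

    from textattack.transformations import WordInsertion

    class WordInsertionFixedList(WordInsertion):
        """Hypothetical subclass: proposes words from a fixed list at every index."""

        def __init__(self, candidate_words=("new", "great", "old")):
            self.candidate_words = list(candidate_words)

        def _get_new_words(self, current_text, index):
            # Words that may be inserted at word index `index` of `current_text`.
            return self.candidate_words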
WordInsertionMaskedLM(masked_language_model='bert-base-uncased', tokenizer=None, max_length=512, window_size=inf, max_candidates=50, min_confidence=0.0005, batch_size=16)¶
Generate potential insertions for a position using a masked language model.

Based on “CLARE: Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020): https://arxiv.org/abs/2009.07502
- masked_language_model (Union[str, transformers.AutoModelForMaskedLM]) – Either the name of a pretrained masked language model from the transformers model hub or the actual model. Default is bert-base-uncased.
- tokenizer (obj) – The tokenizer of the corresponding model. If you passed in the name of a pretrained model for masked_language_model, you can skip this argument, as the correct tokenizer can be inferred from the name. However, if you’re passing the actual model, you must provide a tokenizer.
- max_length (int) – The maximum sequence length the masked language model is designed to work with. Default is 512.
- window_size (int) – The number of surrounding words to include when making a top-word prediction. For each insertion position, we take window_size // 2 words to the left and window_size // 2 words to the right and pass the text within the window to the masked language model. Default is float(“inf”), which is equivalent to using the whole text.
- max_candidates (int) – Maximum number of candidates to consider inserting at each position. Candidates are ranked by the model’s confidence.
- min_confidence (float) – Minimum confidence threshold each new word must pass.
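A usage sketch follows. It assumes WordInsertionMaskedLM is importable from textattack.transformations and that calling a transformation on an AttackedText returns the candidate texts; parameter values are illustrative:

    from textattack.shared import AttackedText
    from textattack.transformations import WordInsertionMaskedLM

    transformation = WordInsertionMaskedLM(
        masked_language_model="bert-base-uncased",
        max_candidates=10,     # keep only the 10 highest-confidence words per position
        min_confidence=0.001,  # drop predictions the model is unsure about
        window_size=10,        # pass ~5 words of context on each side to the MLM
    )

    text = AttackedText("I like the movie")
    for candidate in transformation(text):
        print(candidate.text)  # e.g. "I really like the movie", ...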
extra_repr_keys()¶
Extra fields to be included in the representation of a class.
Random Synonym Insertion Transformation
Transformation that inserts synonyms of words that are already in the sequence.
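As a rough illustration of how such candidates could be generated (not TextAttack’s exact implementation), the sketch below uses NLTK’s WordNet and assumes the corpus has been downloaded via nltk.download('wordnet'):

    from nltk.corpus import wordnet

    def synonym_insertions(words, index):
        """Sentences formed by inserting a WordNet synonym of words[index] at `index`."""
        synonyms = set()
        for synset in wordnet.synsets(words[index]):
            for lemma in synset.lemmas():
                name = lemma.name().replace("_", " ")
                if name.lower() != words[index].lower():
                    synonyms.add(name)
        return [" ".join(words[:index] + [syn] + words[index:]) for syn in sorted(synonyms)]

    # synonym_insertions("I like the movie".split(), 3)
    # -> inserts synonyms of "movie" ("film", "picture", ...) before it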