textattack.transformations.word_swaps package
Word Swap
Word swap transformations act by replacing some words in the input. Subclasses can implement the abstract WordSwap
class by overriding self._get_replacement_words.
- class textattack.transformations.word_swaps.word_swap.WordSwap(letters_to_insert=None)[source]
Bases:
Transformation
An abstract class that takes a sentence and transforms it by replacing some of its words.
letters_to_insert (string): letters allowed for insertion into words (used by some char-based transformations)
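The override pattern described above can be sketched without TextAttack installed. MiniWordSwap and UppercaseSwap are hypothetical stand-ins for illustration only; the real WordSwap works on TextAttack's own text objects and has a richer interface than this toy base class.

```python
class MiniWordSwap:
    """Hypothetical stand-in for an abstract word-swap base class."""

    def _get_replacement_words(self, word):
        # Subclasses decide which candidate words may replace `word`.
        raise NotImplementedError()

    def transform(self, sentence):
        """Return one new sentence per (position, replacement) pair."""
        words = sentence.split()
        results = []
        for i, word in enumerate(words):
            for replacement in self._get_replacement_words(word):
                if replacement != word:
                    results.append(" ".join(words[:i] + [replacement] + words[i + 1:]))
        return results


class UppercaseSwap(MiniWordSwap):
    """Toy subclass: the only candidate for a word is its uppercase form."""

    def _get_replacement_words(self, word):
        return [word.upper()]


print(UppercaseSwap().transform("I am fabulous"))
```

Only the candidate-generation method differs between subclasses in this sketch; the loop that swaps one word at a time lives in the base class.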
Word Swap by Changing Location
- class textattack.transformations.word_swaps.word_swap_change_location.WordSwapChangeLocation(n=3, confidence_score=0.7, language='en', consistent=False, **kwargs)[source]
Bases:
WordSwap
- textattack.transformations.word_swaps.word_swap_change_location.idx_to_words(ls, words)[source]
Given a list generated from cluster_idx, return a list of sub-lists, where the first element of each sub-list is the idx and the second element is the words corresponding to that idx.
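A minimal sketch of the pairing described above, assuming each entry of ls is a list of word indices that together form one multi-word phrase. That input shape is an assumption, since the description does not spell out what cluster_idx produces, and idx_to_words_sketch is an illustrative name rather than the library function.

```python
def idx_to_words_sketch(clusters, words):
    """Pair each cluster of word indices with the words it covers."""
    output = []
    for cluster in clusters:
        # Join the words at the cluster's indices into one phrase.
        phrase = " ".join(words[i] for i in cluster)
        output.append([cluster, phrase])
    return output


words = "I flew from New York to Los Angeles".split()
clusters = [[3, 4], [6, 7]]  # assumed shape: one index list per phrase
print(idx_to_words_sketch(clusters, words))
```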
Word Swap by Changing Name
- class textattack.transformations.word_swaps.word_swap_change_name.WordSwapChangeName(num_name_replacements=3, first_only=False, last_only=False, confidence_score=0.7, language='en', consistent=False, **kwargs)[source]
Bases:
WordSwap
Word Swap by Changing Number
- class textattack.transformations.word_swaps.word_swap_change_number.WordSwapChangeNumber(max_change=1, n=3, **kwargs)[source]
Bases:
WordSwap
- textattack.transformations.word_swaps.word_swap_change_number.idx_to_words(ls, words)[source]
Given a list generated from cluster_idx, return a list of sub-lists, where the first element of each sub-list is the idx and the second element is the words corresponding to that idx.
Word Swap by Contraction
- class textattack.transformations.word_swaps.word_swap_contract.WordSwapContract(letters_to_insert=None)[source]
Bases:
WordSwap
Transforms an input by performing contraction on recognized combinations.
- reverse_contraction_map = {'I am': "I'm", 'I have': "I've", 'I will': "I'll", 'I would': "I'd", 'are not': "aren't", 'cannot': "can't", 'cannot have': "can't've", 'could have': "could've", 'could not': "couldn't", 'did not': "didn't", 'do not': "don't", 'does not': "doesn't", 'had not': "hadn't", 'has not': "hasn't", 'have not': "haven't", 'he is': "he's", 'he will': "he'll", 'he would': "he'd", 'he would have': "he'd've", 'how did': "how'd", 'how do you': "how'd'y", 'how is': "how's", 'how will': "how'll", 'i am': "i'm", 'i have': "i've", 'i will': "i'll", 'i would': "i'd", 'is not': "isn't", "isn't": "ain't", 'it is': "it's", 'it will': "it'll", 'it would': "it'd", 'madam': "ma'am", 'might have': "might've", 'might not': "mightn't", 'must have': "must've", 'must not': "mustn't", 'need not': "needn't", 'ought not': "oughtn't", 'shall not': "shan't", 'she is': "she's", 'she will': "she'll", 'she would': "she'd", 'should have': "should've", 'should not': "shouldn't", 'that is': "that's", 'that would': "that'd", 'there is': "there's", 'there would': "there'd", 'they are': "they're", 'they have': "they've", 'they will': "they'll", 'they would': "they'd", 'was not': "wasn't", 'we are': "we're", 'we have': "we've", 'we will': "we'll", 'we would': "we'd", 'were not': "weren't", 'what are': "what're", 'what is': "what's", 'when is': "when's", 'where did': "where'd", 'where have': "where've", 'where is': "where's", 'who have': "who've", 'who is': "who's", 'who will': "who'll", 'why is': "why's", 'will not': "won't", 'would have': "would've", 'would not': "wouldn't", 'you are': "you're", 'you have': "you've", 'you will': "you'll", 'you would': "you'd", 'you would have': "you'd've"}
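How such a map can drive a contraction pass is sketched below, using a four-entry excerpt of the map above. The helper contract and its greedy left-to-right strategy are assumptions made for illustration; the library may instead produce one candidate per recognized combination.

```python
# Small excerpt of the map above; the full map has many more entries.
CONTRACTIONS = {"I am": "I'm", "you are": "you're", "do not": "don't", "is not": "isn't"}


def contract(sentence, mapping=CONTRACTIONS):
    """Greedily replace adjacent word pairs found in mapping, left to right."""
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in mapping:
            out.append(mapping[pair])
            i += 2  # both words were consumed by the contraction
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(contract("I am happy and you are not"))
```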
Word Swap by Embedding
Based on paper: arxiv.org/abs/1603.00892
Paper title: Counter-fitting Word Vectors to Linguistic Constraints
- class textattack.transformations.word_swaps.word_swap_embedding.WordSwapEmbedding(max_candidates=15, embedding=None, **kwargs)[source]
Bases:
WordSwap
Transforms an input by replacing its words with synonyms in the word embedding space.
- Parameters:
max_candidates (int) – maximum number of synonyms to pick
embedding (textattack.shared.AbstractWordEmbedding) – Wrapper for word embedding
>>> from textattack.transformations import WordSwapEmbedding
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapEmbedding()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
- textattack.transformations.word_swaps.word_swap_embedding.recover_word_case(word, reference_word)[source]
Makes the case of word like the case of reference_word.
Supports lowercase, UPPERCASE, and Capitalized.
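The three supported cases can be sketched as follows. match_case is an illustrative name, not the library function, and two choices are assumptions of this sketch: a one-letter uppercase reference is treated as ambiguous, and any other casing pattern leaves the word unchanged.

```python
def match_case(word, reference_word):
    """Give `word` the casing pattern of `reference_word` when it is recognizable."""
    if reference_word.islower():
        return word.lower()
    if reference_word.isupper() and len(reference_word) > 1:
        return word.upper()
    if reference_word[:1].isupper() and reference_word[1:].islower():
        return word.capitalize()
    # Mixed or ambiguous casing: leave the word as it is.
    return word


for reference in ("fabulous", "FABULOUS", "Fabulous"):
    print(reference, "->", match_case("great", reference))
```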
Word Swap by Extension
- class textattack.transformations.word_swaps.word_swap_extend.WordSwapExtend(letters_to_insert=None)[source]
Bases:
WordSwap
Transforms an input by performing extension on recognized combinations.
Word Swap by Gradient
- class textattack.transformations.word_swaps.word_swap_gradient_based.WordSwapGradientBased(model_wrapper, top_n=1)[source]
Bases:
WordSwap
Uses the model’s gradient to suggest replacements for a given word.
Based on HotFlip: White-Box Adversarial Examples for Text Classification (Ebrahimi et al., 2018). https://arxiv.org/pdf/1712.06751.pdf
- Parameters:
model_wrapper – Wrapper around the model to attack. The wrapped model must have a word_embeddings matrix and a convert_id_to_word function.
top_n (int) – the number of top words to return at each index
>>> from textattack.transformations import WordSwapGradientBased
>>> from textattack.augmentation import Augmenter
>>> # model_wrapper is a required argument; it is assumed to be defined already and to wrap the model to attack
>>> transformation = WordSwapGradientBased(model_wrapper)
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
Word Swap by Homoglyph
- class textattack.transformations.word_swaps.word_swap_homoglyph_swap.WordSwapHomoglyphSwap(random_one=False, **kwargs)[source]
Bases:
WordSwap
Transforms an input by replacing its words with visually similar words using homoglyph swaps.
>>> from textattack.transformations import WordSwapHomoglyphSwap
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapHomoglyphSwap()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
- property deterministic
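What a homoglyph swap does to a single word can be sketched in a few lines. The three-entry HOMOGLYPHS table and the helper homoglyph_candidates are assumptions for illustration; the table the library actually uses is not shown in this page.

```python
# Illustrative excerpt only: Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}


def homoglyph_candidates(word, mapping=HOMOGLYPHS):
    """Return every variant of `word` with exactly one character swapped."""
    candidates = []
    for i, ch in enumerate(word):
        if ch in mapping:
            candidates.append(word[:i] + mapping[ch] + word[i + 1:])
    return candidates


for candidate in homoglyph_candidates("note"):
    # Each candidate looks like "note" but compares unequal to it.
    print(candidate, candidate == "note")
```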
Word Swap by OpenHowNet
- class textattack.transformations.word_swaps.word_swap_hownet.WordSwapHowNet(max_candidates=-1, **kwargs)[source]
Bases:
WordSwap
Transforms an input by replacing its words with synonyms from the stored synonym bank generated by OpenHowNet.
- PATH = 'transformations/hownet'
- textattack.transformations.word_swaps.word_swap_hownet.recover_word_case(word, reference_word)[source]
Makes the case of word like the case of reference_word.
Supports lowercase, UPPERCASE, and Capitalized.
Word Swap by Inflections
- class textattack.transformations.word_swaps.word_swap_inflections.WordSwapInflections(**kwargs)[source]
Bases:
WordSwap
Transforms an input by replacing its words with their inflections.
For example, the inflections of ‘schedule’ are {‘schedule’, ‘schedules’, ‘scheduling’}.
Based on “It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations”.
Word Swap by BERT-Masked LM
- class textattack.transformations.word_swaps.word_swap_masked_lm.WordSwapMaskedLM(method='bae', masked_language_model='bert-base-uncased', tokenizer=None, max_length=512, window_size=inf, max_candidates=50, min_confidence=0.0005, batch_size=16, **kwargs)[source]
Bases:
WordSwap
Generate potential replacements for a word using a masked language model.
- Based on the following papers:
“Robustness to Modification with Shared Words in Paraphrase Identification” (Shi et al., 2019) https://arxiv.org/abs/1909.02560
“BAE: BERT-based Adversarial Examples for Text Classification” (Garg et al., 2020) https://arxiv.org/abs/2004.01970
“BERT-ATTACK: Adversarial Attack Against BERT Using BERT” (Li et al., 2020) https://arxiv.org/abs/2004.09984
“CLARE: Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020) https://arxiv.org/abs/2009.07502
BAE and CLARE simply mask the word we want to replace and select replacements predicted by the masked language model.
- BERT-Attack instead performs replacement at the token level. For words that consist of two or more sub-word tokens,
it takes the top-K replacements for each sub-word token and produces all possible combinations of the top replacements. Then, it selects the top-K combinations based on their perplexity calculated using the masked language model.
Choose which method to use by specifying “bae” or “bert-attack” for the method argument.
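The combination step of the BERT-Attack method can be sketched with itertools.product. The candidate tokens and log-probabilities below are made up, and ranking by summed log-probability is only a stand-in for the perplexity computed with the masked language model; top_k_combinations is a hypothetical helper.

```python
import heapq
import itertools


def top_k_combinations(subword_candidates, k):
    """subword_candidates: one list of (token, log_prob) pairs per sub-word position."""
    scored = []
    for combo in itertools.product(*subword_candidates):
        tokens = [tok for tok, _ in combo]
        score = sum(lp for _, lp in combo)  # stand-in for the perplexity ranking
        scored.append((score, tokens))
    # Keep the k combinations with the highest score.
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [tokens for _, tokens in best]


candidates = [
    [("play", -0.1), ("work", -1.2)],   # top replacements for the first sub-word
    [("##ing", -0.2), ("##ed", -0.9)],  # top replacements for the second sub-word
]
print(top_k_combinations(candidates, k=2))
```

With K candidates at each of n sub-word positions there are K**n combinations, which is why a final top-K selection is needed.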
- Parameters:
method (str) – the name of replacement method (e.g. “bae”, “bert-attack”)
masked_language_model (Union[str|transformers.AutoModelForMaskedLM]) – Either the name of a pretrained masked language model from the transformers model hub or the actual model. Default is bert-base-uncased.
tokenizer (obj) – The tokenizer of the corresponding model. If you passed in the name of a pretrained model for masked_language_model, you can skip this argument, as the correct tokenizer can be inferred from the name. However, if you’re passing the actual model, you must provide a tokenizer.
max_length (int) – the max sequence length the masked language model is designed to work with. Default is 512.
window_size (int) – The number of surrounding words to include when making top word prediction. For each word to swap, we take window_size // 2 words to the left and window_size // 2 words to the right and pass the text within the window to the masked language model. Default is float(“inf”), which is equivalent to using the whole text.
max_candidates (int) – maximum number of candidates to consider as replacements for each word. Replacements are ranked by model’s confidence.
min_confidence (float) – minimum confidence threshold each replacement word must pass.
batch_size (int) – Size of batch for “bae” replacement method.
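The window_size behaviour can be sketched by following the parameter description literally: window_size // 2 words on each side of the target word, or the whole text when the window is infinite. window_around is a hypothetical helper; the library's internal windowing may handle edges differently.

```python
def window_around(words, index, window_size=float("inf")):
    """Return the words within window_size // 2 positions of words[index]."""
    if window_size == float("inf"):
        return list(words)  # the default: use the whole text
    half = window_size // 2
    start = max(0, index - half)
    end = min(len(words), index + half + 1)
    return words[start:end]


words = "the quick brown fox jumps over the lazy dog".split()
print(window_around(words, index=4, window_size=4))
```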
- textattack.transformations.word_swaps.word_swap_masked_lm.recover_word_case(word, reference_word)[source]
Makes the case of word like the case of reference_word.
Supports lowercase, UPPERCASE, and Capitalized.
Word Swap by Neighboring Character Swap
- class textattack.transformations.word_swaps.word_swap_neighboring_character_swap.WordSwapNeighboringCharacterSwap(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]
Bases:
WordSwap
Transforms an input by replacing its words with a neighboring character swap.
- Parameters:
random_one (bool) – Whether to return a single word with two characters swapped. If not, returns all possible options.
skip_first_char (bool) – Whether to disregard perturbing the first character.
skip_last_char (bool) – Whether to disregard perturbing the last character.
>>> from textattack.transformations import WordSwapNeighboringCharacterSwap
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapNeighboringCharacterSwap()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
- property deterministic
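The effect of skip_first_char and skip_last_char on the set of neighboring swaps can be sketched as below. neighbor_swaps is an illustrative helper that enumerates all options, which corresponds to random_one=False; with random_one=True a single one of these candidates would be picked at random.

```python
def neighbor_swaps(word, skip_first_char=False, skip_last_char=False):
    """Return every variant of `word` with one pair of adjacent characters swapped."""
    start = 1 if skip_first_char else 0
    end = len(word) - 2 if skip_last_char else len(word) - 1
    candidates = []
    for i in range(start, end):
        # Swap the characters at positions i and i + 1.
        candidates.append(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return candidates


print(neighbor_swaps("word"))
print(neighbor_swaps("word", skip_first_char=True, skip_last_char=True))
```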
Word Swap by Swapping Characters with QWERTY-Adjacent Keys
- class textattack.transformations.word_swaps.word_swap_qwerty.WordSwapQWERTY(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]
Bases:
WordSwap
- property deterministic
Word Swap by Random Character Deletion
- class textattack.transformations.word_swaps.word_swap_random_character_deletion.WordSwapRandomCharacterDeletion(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]
Bases:
WordSwap
Transforms an input by deleting its characters.
- Parameters:
random_one (bool) – Whether to return a single word with a random character deleted. If not, returns all possible options.
skip_first_char (bool) – Whether to disregard deleting the first character.
skip_last_char (bool) – Whether to disregard deleting the last character.
>>> from textattack.transformations import WordSwapRandomCharacterDeletion
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapRandomCharacterDeletion()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
- property deterministic
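A minimal sketch of character deletion with the two skip flags, enumerating all options as with random_one=False. char_deletions is a hypothetical helper, and leaving single-character words untouched is a choice made in this sketch so that no empty word is produced.

```python
def char_deletions(word, skip_first_char=False, skip_last_char=False):
    """Return every variant of `word` with exactly one character removed."""
    if len(word) <= 1:
        return []  # sketch choice: never delete a word's only character
    start = 1 if skip_first_char else 0
    end = len(word) - 1 if skip_last_char else len(word)
    return [word[:i] + word[i + 1:] for i in range(start, end)]


print(char_deletions("word"))
print(char_deletions("word", skip_first_char=True, skip_last_char=True))
```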
Word Swap by Random Character Insertion
- class textattack.transformations.word_swaps.word_swap_random_character_insertion.WordSwapRandomCharacterInsertion(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]
Bases:
WordSwap
Transforms an input by inserting a random character.
- Parameters:
random_one (bool) – Whether to return a single word with a random character inserted. If not, returns all possible options.
skip_first_char (bool) – Whether to disregard inserting as the first character.
skip_last_char (bool) – Whether to disregard inserting as the last character.
>>> from textattack.transformations import WordSwapRandomCharacterInsertion
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapRandomCharacterInsertion()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
- property deterministic
Word Swap by Random Character Substitution
- class textattack.transformations.word_swaps.word_swap_random_character_substitution.WordSwapRandomCharacterSubstitution(random_one=True, **kwargs)[source]
Bases:
WordSwap
Transforms an input by replacing one character in a word with a random new character.
- Parameters:
random_one (bool) – Whether to return a single word with a random character substituted. If not set, returns all possible options.
>>> from textattack.transformations import WordSwapRandomCharacterSubstitution
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapRandomCharacterSubstitution()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
- property deterministic
Word Swap by swapping synonyms in WordNet
- class textattack.transformations.word_swaps.word_swap_wordnet.WordSwapWordNet(language='eng')[source]
Bases:
WordSwap
Transforms an input by replacing its words with synonyms provided by WordNet.
>>> from textattack.transformations import WordSwapWordNet
>>> from textattack.augmentation import Augmenter
>>> transformation = WordSwapWordNet()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)