textattack.transformations.word_swaps package

word_swaps package

Word Swap

Word swap transformations act by replacing some words in the input. Subclasses implement the abstract WordSwap class by overriding self._get_replacement_words.

class textattack.transformations.word_swaps.word_swap.WordSwap(letters_to_insert=None)[source]

Bases: Transformation

An abstract class that takes a sentence and transforms it by replacing some of its words.

letters_to_insert (string): letters allowed for insertion into words (used by some char-based transformations)
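As a rough illustration of the override pattern, the sketch below uses a toy stand-in base class so that it runs without the library installed. ToyWordSwap, its transform method, and the sample sentence are all hypothetical and are not TextAttack's actual code; only the idea of overriding _get_replacement_words comes from the description above.

```python
class ToyWordSwap:
    """Hypothetical stand-in for the abstract WordSwap base class."""

    def _get_replacement_words(self, word):
        # Subclasses override this to propose replacements for one word.
        raise NotImplementedError()

    def transform(self, words, indices_to_modify):
        """Return one new word list per (index, replacement) pair."""
        results = []
        for i in indices_to_modify:
            for replacement in self._get_replacement_words(words[i]):
                if replacement == words[i]:
                    continue  # skip swaps that change nothing
                results.append(words[:i] + [replacement] + words[i + 1:])
        return results


class BananaWordSwap(ToyWordSwap):
    """Replaces any word with 'banana'."""

    def _get_replacement_words(self, word):
        return ["banana"]


swapped = BananaWordSwap().transform(["I", "am", "fabulous"], [1, 2])
```

Each entry of swapped differs from the input in exactly one position, which is the general shape of a word swap transformation.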

Word Swap by Changing Location

class textattack.transformations.word_swaps.word_swap_change_location.WordSwapChangeLocation(n=3, confidence_score=0.7, language='en', consistent=False, **kwargs)[source]

Bases: WordSwap

textattack.transformations.word_swaps.word_swap_change_location.idx_to_words(ls, words)[source]: Given a list generated from cluster_idx, return a list of sub-lists, each holding the idx as its first element and the words corresponding to that idx as its second element.
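A minimal sketch of the behavior this description suggests, assuming ls is a list of index groups and that the words of each group are joined with single spaces. The name idx_to_words_sketch and the sample data are illustrative and not taken from the library.

```python
def idx_to_words_sketch(ls, words):
    """Pair each group of indices with the phrase those indices spell out."""
    output = []
    for sub_ls in ls:
        # Assumption: the words of one group are joined by single spaces.
        phrase = " ".join(words[i] for i in sub_ls)
        output.append([sub_ls, phrase])
    return output


words = ["I", "live", "in", "New", "York", "near", "Boston"]
pairs = idx_to_words_sketch([[3, 4], [6]], words)
```

Here pairs holds one entry for the two-word group and one for the single-word group.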

Word Swap by Changing Name

class textattack.transformations.word_swaps.word_swap_change_name.WordSwapChangeName(num_name_replacements=3, first_only=False, last_only=False, confidence_score=0.7, language='en', consistent=False, **kwargs)[source]

Bases: WordSwap

Word Swap by Changing Number

class textattack.transformations.word_swaps.word_swap_change_number.WordSwapChangeNumber(max_change=1, n=3, **kwargs)[source]

Bases: WordSwap

textattack.transformations.word_swaps.word_swap_change_number.idx_to_words(ls, words)[source]: Given a list generated from cluster_idx, return a list of sub-lists, each holding the idx as its first element and the words corresponding to that idx as its second element.

Word Swap by Contraction

class textattack.transformations.word_swaps.word_swap_contract.WordSwapContract(letters_to_insert=None)[source]

Bases: WordSwap

Transforms an input by performing contraction on recognized combinations.

reverse_contraction_map = {'I am': "I'm", 'I have': "I've", 'I will': "I'll", 'I would': "I'd", 'are not': "aren't", 'cannot': "can't", 'cannot have': "can't've", 'could have': "could've", 'could not': "couldn't", 'did not': "didn't", 'do not': "don't", 'does not': "doesn't", 'had not': "hadn't", 'has not': "hasn't", 'have not': "haven't", 'he is': "he's", 'he will': "he'll", 'he would': "he'd", 'he would have': "he'd've", 'how did': "how'd", 'how do you': "how'd'y", 'how is': "how's", 'how will': "how'll", 'i am': "i'm", 'i have': "i've", 'i will': "i'll", 'i would': "i'd", 'is not': "isn't", "isn't": "ain't", 'it is': "it's", 'it will': "it'll", 'it would': "it'd", 'madam': "ma'am", 'might have': "might've", 'might not': "mightn't", 'must have': "must've", 'must not': "mustn't", 'need not': "needn't", 'ought not': "oughtn't", 'shall not': "shan't", 'she is': "she's", 'she will': "she'll", 'she would': "she'd", 'should have': "should've", 'should not': "shouldn't", 'that is': "that's", 'that would': "that'd", 'there is': "there's", 'there would': "there'd", 'they are': "they're", 'they have': "they've", 'they will': "they'll", 'they would': "they'd", 'was not': "wasn't", 'we are': "we're", 'we have': "we've", 'we will': "we'll", 'we would': "we'd", 'were not': "weren't", 'what are': "what're", 'what is': "what's", 'when is': "when's", 'where did': "where'd", 'where have': "where've", 'where is': "where's", 'who have': "who've", 'who is': "who's", 'who will': "who'll", 'why is': "why's", 'will not': "won't", 'would have': "would've", 'would not': "wouldn't", 'you are': "you're", 'you have': "you've", 'you will': "you'll", 'you would': "you'd", 'you would have': "you'd've"}
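The sketch below illustrates how a map like this could drive contraction, assuming candidates are formed by joining each adjacent word pair and looking it up. The function contract_pairs and the three-entry subset of the map are illustrative; the library's own lookup logic may differ.

```python
# Small subset of reverse_contraction_map, copied from the listing above.
CONTRACTIONS = {"I am": "I'm", "do not": "don't", "it is": "it's"}


def contract_pairs(words):
    """Return one candidate word list per contractible adjacent pair."""
    results = []
    for i in range(len(words) - 1):
        pair = words[i] + " " + words[i + 1]
        if pair in CONTRACTIONS:
            # Replace the two words with the single contracted form.
            results.append(words[:i] + [CONTRACTIONS[pair]] + words[i + 2:])
    return results


candidates = contract_pairs(["I", "am", "sure", "it", "is", "fine"])
```

Each candidate contracts exactly one pair, so a sentence with two contractible pairs yields two candidates.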

Word Swap by Embedding

Based on paper: arxiv.org/abs/1603.00892

Paper title: Counter-fitting Word Vectors to Linguistic Constraints

class textattack.transformations.word_swaps.word_swap_embedding.WordSwapEmbedding(max_candidates=15, embedding=None, **kwargs)[source]

Bases: WordSwap

Transforms an input by replacing its words with synonyms in the word embedding space.

Parameters:

max_candidates (int) – maximum number of synonyms to pick
embedding (textattack.shared.AbstractWordEmbedding) – Wrapper for word embedding

>>> from textattack.transformations import WordSwapEmbedding
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapEmbedding()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
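
To make the role of max_candidates concrete, here is a toy nearest-neighbor lookup by cosine similarity. The two-dimensional vectors, the TOY_VECTORS table, and the nearest_words function are invented for illustration; the real transformation relies on the embedding object passed to it, not on this code.

```python
import math

# Toy 2-d vectors, invented for illustration only.
TOY_VECTORS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "fine": (0.7, 0.3),
    "bad": (-0.9, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def nearest_words(word, max_candidates=15):
    """Return up to max_candidates other words, most similar first."""
    if word not in TOY_VECTORS:
        return []  # unknown words get no candidates
    target = TOY_VECTORS[word]
    others = [w for w in TOY_VECTORS if w != word]
    others.sort(key=lambda w: cosine(target, TOY_VECTORS[w]), reverse=True)
    return others[:max_candidates]


top_two = nearest_words("good", max_candidates=2)
```

Lowering max_candidates trims the ranked list from the far end, so the most similar words are kept.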

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

textattack.transformations.word_swaps.word_swap_embedding.recover_word_case(word, reference_word)[source]

Makes the case of word like the case of reference_word.

Supports lowercase, UPPERCASE, and Capitalized.
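
A minimal sketch of case recovery covering the three supported patterns. The name match_case is hypothetical, and leaving the word untouched when the reference fits none of the patterns is an assumption, not documented behavior.

```python
def match_case(word, reference_word):
    """Give word the casing pattern of reference_word."""
    if reference_word.islower():
        return word.lower()
    if reference_word.isupper() and len(reference_word) > 1:
        return word.upper()
    if reference_word[:1].isupper() and reference_word[1:].islower():
        return word.capitalize()
    # Assumption: mixed or otherwise unrecognized casing leaves word as-is.
    return word


lowered = match_case("Terrific", "fabulous")
uppered = match_case("terrific", "FABULOUS")
capitalized = match_case("terrific", "Fabulous")
```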

Word Swap by Extension

class textattack.transformations.word_swaps.word_swap_extend.WordSwapExtend(letters_to_insert=None)[source]

Bases: WordSwap

Transforms an input by performing extension on recognized combinations.

Word Swap by Gradient

class textattack.transformations.word_swaps.word_swap_gradient_based.WordSwapGradientBased(model_wrapper, top_n=1)[source]

Bases: WordSwap

Uses the model’s gradient to suggest replacements for a given word.

Based on HotFlip: White-Box Adversarial Examples for Text Classification (Ebrahimi et al., 2018). https://arxiv.org/pdf/1712.06751.pdf

Parameters:

model_wrapper – Wrapper around the model (nn.Module) to attack. The model must have a word_embeddings matrix and a convert_id_to_word function.
top_n (int) – the number of top words to return at each index

>>> from textattack.transformations import WordSwapGradientBased
>>> from textattack.augmentation import Augmenter

>>> # model_wrapper is a required argument and must be created beforehand
>>> transformation = WordSwapGradientBased(model_wrapper)
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

Word Swap by Homoglyph

class textattack.transformations.word_swaps.word_swap_homoglyph_swap.WordSwapHomoglyphSwap(random_one=False, **kwargs)[source]

Bases: WordSwap

Transforms an input by replacing its words with visually similar words using homoglyph swaps.

>>> from textattack.transformations import WordSwapHomoglyphSwap
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapHomoglyphSwap()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
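
The idea of a homoglyph swap can be sketched with a small lookup table. The three-entry HOMOGLYPHS map and the homoglyph_swaps function below are assumptions made for illustration; the library ships its own, larger mapping.

```python
# Illustrative subset: ASCII letter -> visually similar Unicode character.
HOMOGLYPHS = {
    "a": "\u0251",  # LATIN SMALL LETTER ALPHA
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
}


def homoglyph_swaps(word):
    """Return every variant of word with exactly one character swapped."""
    candidates = []
    for i, ch in enumerate(word):
        if ch in HOMOGLYPHS:
            candidates.append(word[:i] + HOMOGLYPHS[ch] + word[i + 1:])
    return candidates


swaps = homoglyph_swaps("fabulous")
```

The candidates look nearly identical to the original when rendered, yet they are different strings.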

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

property deterministic

Word Swap by OpenHowNet

class textattack.transformations.word_swaps.word_swap_hownet.WordSwapHowNet(max_candidates=-1, **kwargs)[source]

Bases: WordSwap

Transforms an input by replacing its words with synonyms from the stored synonym bank generated by OpenHowNet.

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

PATH = 'transformations/hownet'

textattack.transformations.word_swaps.word_swap_hownet.recover_word_case(word, reference_word)[source]

Makes the case of word like the case of reference_word.

Supports lowercase, UPPERCASE, and Capitalized.

Word Swap by Inflections

class textattack.transformations.word_swaps.word_swap_inflections.WordSwapInflections(**kwargs)[source]

Bases: WordSwap

Transforms an input by replacing its words with their inflections.

For example, the inflections of ‘schedule’ are {‘schedule’, ‘schedules’, ‘scheduling’}.

Based on It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations.

Paper URL

Word Swap by BERT-Masked LM

class textattack.transformations.word_swaps.word_swap_masked_lm.WordSwapMaskedLM(method='bae', masked_language_model='bert-base-uncased', tokenizer=None, max_length=512, window_size=inf, max_candidates=50, min_confidence=0.0005, batch_size=16, **kwargs)[source]

Bases: WordSwap

Generate potential replacements for a word using a masked language model.

Based on the following papers:

“Robustness to Modification with Shared Words in Paraphrase Identification” (Shi et al., 2019) https://arxiv.org/abs/1909.02560
“BAE: BERT-based Adversarial Examples for Text Classification” (Garg et al., 2020) https://arxiv.org/abs/2004.01970
“BERT-ATTACK: Adversarial Attack Against BERT Using BERT” (Li et al., 2020) https://arxiv.org/abs/2004.09984
“CLARE: Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020) https://arxiv.org/abs/2009.07502

BAE and CLARE simply mask the word we want to replace and select replacements predicted by the masked language model.

BERT-Attack instead performs replacement at the token level. For words that consist of two or more sub-word tokens, it takes the top-K replacements for each sub-word token and produces all possible combinations of the top replacements. Then, it selects the top-K combinations based on their perplexity calculated using the masked language model.

Choose which method to use by specifying “bae” or “bert-attack” for the method argument.
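
The combination step described for BERT-Attack can be sketched as a Cartesian product followed by ranking. Everything here is a stand-in: the TOY_COST table replaces the perplexity a masked language model would supply, and top_combinations is a hypothetical helper, not the library's function.

```python
import itertools

# Made-up per-piece costs standing in for model perplexity (lower is better).
TOY_COST = {"play": 1.0, "work": 2.2, "ing": 0.5, "ed": 1.5}


def toy_score(pieces):
    """Stand-in for a perplexity score over a sequence of sub-word pieces."""
    return sum(TOY_COST[p] for p in pieces)


def top_combinations(candidates_per_token, score, top_k):
    """Combine per-token candidates and keep the top_k lowest-scoring words."""
    combos = itertools.product(*candidates_per_token)
    ranked = sorted(combos, key=score)
    return ["".join(pieces) for pieces in ranked[:top_k]]


# Top-2 candidates for each of two sub-word positions.
best = top_combinations([["play", "work"], ["ing", "ed"]], toy_score, top_k=2)
```

With K candidates per sub-word token and m tokens, the product has K to the power m entries, which is why only the top-K combinations are kept.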

Parameters:

method (str) – the name of replacement method (e.g. “bae”, “bert-attack”)
masked_language_model (Union[str|transformers.AutoModelForMaskedLM]) – Either the name of a pretrained masked language model from the transformers model hub or the actual model. Default is bert-base-uncased.
tokenizer (obj) – The tokenizer of the corresponding model. If you pass in the name of a pretrained model for masked_language_model, you can skip this argument, as the correct tokenizer can be inferred from the name. However, if you pass the actual model, you must provide a tokenizer.
max_length (int) – the max sequence length the masked language model is designed to work with. Default is 512.
window_size (int) – The number of surrounding words to include when making top word prediction. For each word to swap, we take window_size // 2 words to the left and window_size // 2 words to the right and pass the text within the window to the masked language model. Default is float(“inf”), which is equivalent to using the whole text.
max_candidates (int) – maximum number of candidates to consider as replacements for each word. Replacements are ranked by model’s confidence.
min_confidence (float) – minimum confidence threshold each replacement word must pass.
batch_size (int) – Size of batch for “bae” replacement method.
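
The window_size behavior described above can be sketched as a slice around the target word. The helper window_around is hypothetical, and the clipping at the text boundaries is an assumption about how the edges are handled.

```python
import math


def window_around(words, index, window_size=math.inf):
    """Return the words within window_size // 2 of index on each side."""
    if math.isinf(window_size):
        return list(words)  # an infinite window means the whole text
    half = int(window_size) // 2
    start = max(0, index - half)  # clip at the start of the text
    end = min(len(words), index + half + 1)  # clip at the end of the text
    return words[start:end]


text = "the quick brown fox jumps over the lazy dog".split()
window = window_around(text, index=4, window_size=4)
```

The infinite default is checked explicitly because float("inf") // 2 evaluates to nan in Python and cannot be used as a slice bound.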

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

textattack.transformations.word_swaps.word_swap_masked_lm.recover_word_case(word, reference_word)[source]

Makes the case of word like the case of reference_word.

Supports lowercase, UPPERCASE, and Capitalized.

Word Swap by Neighboring Character Swap

class textattack.transformations.word_swaps.word_swap_neighboring_character_swap.WordSwapNeighboringCharacterSwap(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]

Bases: WordSwap

Transforms an input by replacing its words with a neighboring character swap.

Parameters:

random_one (bool) – Whether to return a single word with two characters swapped. If not, returns all possible options.
skip_first_char (bool) – Whether to disregard perturbing the first character.
skip_last_char (bool) – Whether to disregard perturbing the last character.

>>> from textattack.transformations import WordSwapNeighboringCharacterSwap
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapNeighboringCharacterSwap()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)
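
A sketch of what returning all possible options could look like for a neighboring character swap, including the two skip flags. The function neighbor_swaps is illustrative and written from the parameter descriptions above, not copied from the library.

```python
def neighbor_swaps(word, skip_first_char=False, skip_last_char=False):
    """Return every variant of word with one adjacent pair transposed."""
    start = 1 if skip_first_char else 0
    # The last swappable position is the second-to-last character.
    end = len(word) - 2 if skip_last_char else len(word) - 1
    candidates = []
    for i in range(start, end):
        candidates.append(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return candidates


all_swaps = neighbor_swaps("word")
inner_only = neighbor_swaps("word", skip_first_char=True, skip_last_char=True)
```

With random_one=True, a single entry from such a list would be returned instead of the whole list.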

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

property deterministic

Word Swap by Swapping Characters with QWERTY Adjacent Keys

class textattack.transformations.word_swaps.word_swap_qwerty.WordSwapQWERTY(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]

Bases: WordSwap

property deterministic
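
Since this class has no description here, the following sketch shows one plausible reading of a QWERTY-adjacent swap: each character is replaced by a key next to it on the keyboard. The PARTIAL_ADJACENCY table is a hand-written fragment covering three letters, and qwerty_swaps is a hypothetical helper; neither is taken from the library.

```python
# Hand-written fragment of a QWERTY adjacency map (illustrative only).
PARTIAL_ADJACENCY = {
    "a": ["q", "w", "s", "z"],
    "c": ["x", "d", "f", "v"],
    "t": ["r", "f", "g", "y"],
}


def qwerty_swaps(word, skip_first_char=False, skip_last_char=False):
    """Return variants of word with one character replaced by a nearby key."""
    start = 1 if skip_first_char else 0
    end = len(word) - 1 if skip_last_char else len(word)
    candidates = []
    for i in range(start, end):
        for key in PARTIAL_ADJACENCY.get(word[i].lower(), []):
            candidates.append(word[:i] + key + word[i + 1:])
    return candidates


cat_swaps = qwerty_swaps("cat")
keep_first = qwerty_swaps("cat", skip_first_char=True)
```

Such swaps imitate typing slips, so the perturbed words tend to look like plausible typos.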

Word Swap by Random Character Deletion

class textattack.transformations.word_swaps.word_swap_random_character_deletion.WordSwapRandomCharacterDeletion(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]

Bases: WordSwap

Transforms an input by deleting its characters.

Parameters:

random_one (bool) – Whether to return a single word with a random character deleted. If not, returns all possible options.
skip_first_char (bool) – Whether to disregard deleting the first character.
skip_last_char (bool) – Whether to disregard deleting the last character.

>>> from textattack.transformations import WordSwapRandomCharacterDeletion
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapRandomCharacterDeletion()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

property deterministic

Word Swap by Random Character Insertion

class textattack.transformations.word_swaps.word_swap_random_character_insertion.WordSwapRandomCharacterInsertion(random_one=True, skip_first_char=False, skip_last_char=False, **kwargs)[source]

Bases: WordSwap

Transforms an input by inserting a random character.

Parameters:

random_one (bool) – Whether to return a single word with a random character inserted. If not, returns all possible options.
skip_first_char (bool) – Whether to disregard inserting as the first character.
skip_last_char (bool) – Whether to disregard inserting as the last character.

>>> from textattack.transformations import WordSwapRandomCharacterInsertion
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapRandomCharacterInsertion()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

property deterministic

Word Swap by Random Character Substitution

class textattack.transformations.word_swaps.word_swap_random_character_substitution.WordSwapRandomCharacterSubstitution(random_one=True, **kwargs)[source]

Bases: WordSwap

Transforms an input by replacing one character in a word with a random new character.

Parameters:

random_one (bool) – Whether to return a single word with a random character substituted. If not set, returns all possible options.

>>> from textattack.transformations import WordSwapRandomCharacterSubstitution
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapRandomCharacterSubstitution()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)

extra_repr_keys()[source]: Extra fields to be included in the representation of a class.

property deterministic

Word Swap by Swapping Synonyms in WordNet

class textattack.transformations.word_swaps.word_swap_wordnet.WordSwapWordNet(language='eng')[source]

Bases: WordSwap

Transforms an input by replacing its words with synonyms provided by WordNet.

>>> from textattack.transformations import WordSwapWordNet
>>> from textattack.augmentation import Augmenter

>>> transformation = WordSwapWordNet()
>>> augmenter = Augmenter(transformation=transformation)
>>> s = 'I am fabulous.'
>>> augmenter.augment(s)