textattack.augmentation package
TextAttack augmentation package:
Transformations and constraints can be used outside of an attack for simple NLP data augmentation.
This is done with the Augmenter class, which returns all possible transformations for a given string.
Augmenter Class
- class textattack.augmentation.augmenter.Augmenter(transformation, constraints=[], pct_words_to_swap=0.1, transformations_per_example=1, high_yield=False, fast_augment=False, enable_advanced_metrics=False)[source]
Bases:
object
A class for performing data augmentation using TextAttack.
- Returns all possible transformations for a given string. Currently only
supports transformations which are word swaps.
- Parameters:
transformation (textattack.Transformation) – the transformation that suggests new texts from an input.
constraints (list(textattack.Constraint)) – constraints that each transformation must meet.
pct_words_to_swap (float) – fraction of words to swap per augmented example, in the range [0., 1.].
transformations_per_example (int) – maximum number of augmentations per input.
high_yield (bool) – whether to return a set of augmented texts that will be relatively similar, or to return only a single one.
fast_augment (bool) – stops additional transformation runs once the number of successful augmentations reaches transformations_per_example.
enable_advanced_metrics (bool) – whether to also return the perplexity and USE score of the augmentation.
- Example::
>>> from textattack.transformations import WordSwapRandomCharacterDeletion, WordSwapQWERTY, CompositeTransformation
>>> from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
>>> from textattack.augmentation import Augmenter
>>> transformation = CompositeTransformation([WordSwapRandomCharacterDeletion(), WordSwapQWERTY()])
>>> constraints = [RepeatModification(), StopwordModification()]
>>> # initialize the augmenter
>>> augmenter = Augmenter(
...     transformation=transformation,
...     constraints=constraints,
...     pct_words_to_swap=0.5,
...     transformations_per_example=3
... )
>>> # additional parameters can be set afterwards if they were not passed at initialization
>>> augmenter.enable_advanced_metrics = True
>>> augmenter.fast_augment = True
>>> augmenter.high_yield = True
>>> s = 'What I cannot create, I do not understand.'
>>> results = augmenter.augment(s)
>>> # with enable_advanced_metrics set, the scores are returned alongside the augmentations
>>> augmentations = results[0]
>>> perplexity_score = results[1]
>>> use_score = results[2]
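To give a feel for what pct_words_to_swap=0.5 means for the sentence in the example above, the following is a minimal sketch. It assumes that the number of perturbed words is the word count multiplied by pct_words_to_swap, rounded down, with a minimum of one word. The helper name num_words_to_swap and the whitespace tokenization are illustrative assumptions, not part of the TextAttack API.

```python
def num_words_to_swap(text: str, pct_words_to_swap: float) -> int:
    """Hypothetical helper: estimate how many words one augmentation perturbs."""
    if not 0.0 <= pct_words_to_swap <= 1.0:
        raise ValueError("pct_words_to_swap must be in [0., 1.]")
    # Assumption: whitespace splitting stands in for TextAttack's own word splitting.
    num_words = len(text.split())
    # Assumption: round down, but always perturb at least one word.
    return max(int(pct_words_to_swap * num_words), 1)


s = "What I cannot create, I do not understand."
print(num_words_to_swap(s, 0.5))  # half of the 8 whitespace-separated words
print(num_words_to_swap(s, 0.1))  # rounds down to 0, so the minimum of 1 applies
```

Under these assumptions, small values of pct_words_to_swap still change at least one word in short inputs.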
- augment_many(text_list, show_progress=False)[source]
Returns all possible augmentations of a list of strings according to self.transformation.
- Parameters:
text_list (list(string)) – a list of strings for data augmentation
show_progress (bool) – show progress during augmentation
Returns a list(string) of augmented texts.
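The shape of the result can be illustrated with a small stand-alone sketch. It assumes that augment_many applies the single-string augment step to each input in order and collects one result per input. The names augment_many_sketch and toy_augment are hypothetical stand-ins and do not call TextAttack.

```python
from typing import Callable, List


def augment_many_sketch(
    augment: Callable[[str], List[str]], text_list: List[str]
) -> List[List[str]]:
    """Apply a single-string augmentation function to every input, in order."""
    # Assumption: one list of augmented texts is produced per input string.
    return [augment(text) for text in text_list]


def toy_augment(text: str) -> List[str]:
    """Hypothetical stand-in for Augmenter.augment: returns two trivial variants."""
    return [text.lower(), text.upper()]


inputs = ["Hello world", "Data augmentation"]
results = augment_many_sketch(toy_augment, inputs)
for original, variants in zip(inputs, results):
    print(original, "->", variants)
```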
Augmenter Recipes:
Transformations and constraints can be used for simple NLP data augmentation. The following recipes are available:
- class textattack.augmentation.recipes.BackTranslationAugmenter(**kwargs)[source]
Bases:
Augmenter
Sentence level augmentation that uses MarianMTModel to back-translate.
- class textattack.augmentation.recipes.CLAREAugmenter(model='distilroberta-base', tokenizer='distilroberta-base', **kwargs)[source]
Bases:
Augmenter
Li, Zhang, Peng, Chen, Brockett, Sun, Dolan.
“Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020)
https://arxiv.org/abs/2009.07502
CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. We propose three contextualized perturbations, Replace, Insert and Merge, allowing for generating outputs of varied lengths.
- class textattack.augmentation.recipes.CharSwapAugmenter(**kwargs)[source]
Bases:
Augmenter
Augments words by swapping characters out for other characters.
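As an illustration of character-level perturbation, here is a minimal plain-Python sketch of three typical edits: swapping neighboring characters, deleting a character, and substituting a character. All function names are hypothetical, and the choice of edits is an assumption for illustration rather than the recipe's actual implementation.

```python
import random
import string


def swap_neighbors(word: str, i: int) -> str:
    """Swap the characters at positions i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def delete_char(word: str, i: int) -> str:
    """Remove the character at position i."""
    return word[:i] + word[i + 1:]


def substitute_char(word: str, i: int, ch: str) -> str:
    """Replace the character at position i with ch."""
    return word[:i] + ch + word[i + 1:]


def random_char_edit(word: str, rng: random.Random) -> str:
    """Apply one randomly chosen character-level edit; very short words are left alone."""
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "delete", "substitute"])
    if op == "swap":
        return swap_neighbors(word, rng.randrange(len(word) - 1))
    if op == "delete":
        return delete_char(word, rng.randrange(len(word)))
    new_char = rng.choice(string.ascii_lowercase)
    return substitute_char(word, rng.randrange(len(word)), new_char)


rng = random.Random(0)  # seeded so that repeated runs give the same edits
print([random_char_edit(w, rng) for w in "What I cannot create".split()])
```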
- class textattack.augmentation.recipes.CheckListAugmenter(**kwargs)[source]
Bases:
Augmenter
Augments words by using the transformation methods provided by CheckList INV testing, which combines:
Name Replacement
Location Replacement
Number Alteration
Contraction/Extension
“Beyond Accuracy: Behavioral Testing of NLP models with CheckList” (Ribeiro et al., 2020) https://arxiv.org/abs/2005.04118
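The Contraction/Extension step in the list above can be sketched in plain Python as follows. The mapping CONTRACTIONS is a deliberately tiny, hypothetical table, and contract and extend are illustrative names; this is a sketch of the idea, not the code used by the recipe.

```python
import re

# Hypothetical, deliberately small mapping from long forms to contractions.
CONTRACTIONS = {
    "do not": "don't",
    "is not": "isn't",
    "cannot": "can't",
    "I am": "I'm",
}


def contract(text: str) -> str:
    """Replace known long forms with their contractions, matching whole words only."""
    for long_form, short_form in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(long_form) + r"\b", short_form, text)
    return text


def extend(text: str) -> str:
    """Replace known contractions with their long forms, matching whole words only."""
    for long_form, short_form in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short_form) + r"\b", long_form, text)
    return text


sentence = "What I cannot create, I do not understand."
contracted = contract(sentence)
print(contracted)
print(extend(contracted))
```

Both directions leave the meaning of the sentence unchanged, which is what makes this kind of edit suitable for invariance testing.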
- class textattack.augmentation.recipes.EasyDataAugmenter(pct_words_to_swap=0.1, transformations_per_example=4)[source]
Bases:
Augmenter
An implementation of Easy Data Augmentation, which combines the following in one augmentation method:
- WordNet synonym replacement
Randomly replace words with their synonyms.
- Word deletion
Randomly remove words from the sentence.
- Word order swaps
Randomly swap the position of words in the sentence.
- Random synonym insertion
Insert a random synonym of a random word at a random location.
“EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks” (Wei and Zou, 2019) https://arxiv.org/abs/1901.11196
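The four EDA operations can be sketched on a list of words using only the standard library. The SYNONYMS dictionary is a hypothetical toy thesaurus standing in for WordNet, and the function names are illustrative; this is a simplified sketch under those assumptions, not the recipe's implementation.

```python
import random
from typing import Dict, List

# Hypothetical toy thesaurus standing in for WordNet.
SYNONYMS: Dict[str, List[str]] = {"create": ["make", "build"], "understand": ["grasp"]}


def synonym_replacement(words: List[str], rng: random.Random) -> List[str]:
    """Replace one randomly chosen word that has a known synonym."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    out = list(words)
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[words[i]])
    return out


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words: List[str], rng: random.Random) -> List[str]:
    """Swap the positions of two distinct, randomly chosen words."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_insertion(words: List[str], rng: random.Random) -> List[str]:
    """Insert a synonym of a random word at a random position."""
    candidates = [w for w in words if w in SYNONYMS]
    out = list(words)
    if candidates:
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out


rng = random.Random(0)  # seeded so that repeated runs give the same output
words = "what i cannot create i do not understand".split()
print(" ".join(synonym_replacement(words, rng)))
print(" ".join(random_deletion(words, 0.2, rng)))
print(" ".join(random_swap(words, rng)))
print(" ".join(random_insertion(words, rng)))
```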