textattack.augmentation package

TextAttack augmentation package:

Transformations and constraints can be used outside of an attack for simple NLP data augmentation. The Augmenter class applies them to a given string and returns all possible transformations.

Augmenter Class

class textattack.augmentation.augmenter.AugmentationResult(text1, text2)[source]

Bases: object

class tempResult(text)[source]

Bases: object

class textattack.augmentation.augmenter.Augmenter(transformation, constraints=[], pct_words_to_swap=0.1, transformations_per_example=1, high_yield=False, fast_augment=False, enable_advanced_metrics=False)[source]

Bases: object

A class for performing data augmentation using TextAttack.

Returns all possible transformations for a given string. Currently only supports transformations which are word swaps.

Parameters:
  • transformation (textattack.Transformation) – the transformation that suggests new texts from an input.

  • constraints (list(textattack.Constraint)) – constraints that each transformation must meet.

  • pct_words_to_swap (float) – fraction of words to swap per augmented example, in the range [0., 1.].

  • transformations_per_example (int) – maximum number of augmentations per input.

  • high_yield (bool) – whether to return a set of augmented texts that will be relatively similar, or to return only a single one.

  • fast_augment (bool) – stop additional transformation runs once the number of successful augmentations reaches transformations_per_example.

  • enable_advanced_metrics (bool) – return the perplexity and USE score of the augmentation.
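To make the effect of pct_words_to_swap concrete, here is a minimal sketch in plain Python. It assumes the per-example swap budget is the word count multiplied by pct_words_to_swap, truncated to an integer, with a floor of one word. The helper name words_to_swap is hypothetical, whitespace splitting stands in for the library's tokenization, and the library's exact rounding may differ.

```python
def words_to_swap(text: str, pct_words_to_swap: float) -> int:
    """Hypothetical helper: estimate how many words one augmentation modifies.

    Assumption: the budget is int(pct * word_count), raised to at least 1.
    """
    if not 0.0 <= pct_words_to_swap <= 1.0:
        raise ValueError("pct_words_to_swap must lie in [0., 1.]")
    word_count = len(text.split())  # whitespace split, an approximation
    return max(int(pct_words_to_swap * word_count), 1)


sentence = "What I cannot create, I do not understand."
budget_default = words_to_swap(sentence, 0.1)  # 8 words * 0.1 truncates to 0, raised to 1
budget_half = words_to_swap(sentence, 0.5)     # 8 words * 0.5 gives 4
```

Under this assumption, short inputs still receive at least one modification even at the default value of 0.1.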

Example::
>>> from textattack.transformations import WordSwapRandomCharacterDeletion, WordSwapQWERTY, CompositeTransformation
>>> from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
>>> from textattack.augmentation import Augmenter
>>> transformation = CompositeTransformation([WordSwapRandomCharacterDeletion(), WordSwapQWERTY()])
>>> constraints = [RepeatModification(), StopwordModification()]
>>> # initialize the augmenter
>>> augmenter = Augmenter(
...     transformation=transformation,
...     constraints=constraints,
...     pct_words_to_swap=0.5,
...     transformations_per_example=3
... )
>>> # parameters not set at initialization can be modified afterwards
>>> augmenter.enable_advanced_metrics = True
>>> augmenter.fast_augment = True
>>> augmenter.high_yield = True
>>> s = 'What I cannot create, I do not understand.'
>>> results = augmenter.augment(s)
>>> augmentations = results[0]
>>> perplexity_score = results[1]
>>> use_score = results[2]

augment(text)[source]

Returns all possible augmentations of text according to self.transformation.

augment_many(text_list, show_progress=False)[source]

Returns all possible augmentations of a list of strings according to self.transformation.

Parameters:
  • text_list (list(string)) – a list of strings for data augmentation

  • show_progress (bool) – show progress during augmentation

Returns a list(string) of augmented texts.

augment_text_with_ids(text_list, id_list, show_progress=True)[source]

Supplements a list of text with more text data.

Returns the augmented text along with the corresponding IDs for each augmented example.
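The pairing of texts and IDs can be sketched as follows. This is not the library's implementation: it assumes that each original text is kept in the output, followed by its augmentations, and that the original ID is repeated once per emitted text. The names augment_with_ids and upper_case_augment are hypothetical, and the latter is only a stand-in for Augmenter.augment.

```python
from typing import Callable, List, Sequence, Tuple


def augment_with_ids(
    text_list: Sequence[str],
    id_list: Sequence[int],
    augment: Callable[[str], List[str]],
) -> Tuple[List[str], List[int]]:
    """Hypothetical sketch: emit each text plus its augmentations,
    repeating the text's ID once for every emitted string."""
    if len(text_list) != len(id_list):
        raise ValueError("text_list and id_list must have the same length")
    all_texts: List[str] = []
    all_ids: List[int] = []
    for text, text_id in zip(text_list, id_list):
        augmented = augment(text)
        all_texts.extend([text] + augmented)
        all_ids.extend([text_id] * (1 + len(augmented)))
    return all_texts, all_ids


def upper_case_augment(text: str) -> List[str]:
    """Stand-in for Augmenter.augment: returns one trivially altered copy."""
    return [text.upper()]


texts, ids = augment_with_ids(["good film", "bad plot"], [7, 8], upper_case_augment)
# texts: ["good film", "GOOD FILM", "bad plot", "BAD PLOT"]
# ids:   [7, 7, 8, 8]
```

Keeping the two output lists aligned by position lets augmented rows be joined back to labels or other metadata keyed by the original ID.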

Augmenter Recipes:

Transformations and constraints can be used for simple NLP data augmentation. The following recipes are available:

class textattack.augmentation.recipes.BackTranscriptionAugmenter(**kwargs)[source]

Bases: Augmenter

Sentence level augmentation that uses back transcription (TTS+ASR).

class textattack.augmentation.recipes.BackTranslationAugmenter(**kwargs)[source]

Bases: Augmenter

Sentence level augmentation that uses MarianMTModel to back-translate.

https://huggingface.co/transformers/model_doc/marian.html

class textattack.augmentation.recipes.CLAREAugmenter(model='distilroberta-base', tokenizer='distilroberta-base', **kwargs)[source]

Bases: Augmenter

Li, Zhang, Peng, Chen, Brockett, Sun, Dolan.

“Contextualized Perturbation for Textual Adversarial Attack” (Li et al., 2020)

https://arxiv.org/abs/2009.07502

CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. The paper proposes three contextualized perturbations, Replace, Insert and Merge, which allow for generating outputs of varied lengths.

class textattack.augmentation.recipes.CharSwapAugmenter(**kwargs)[source]

Bases: Augmenter

Augments words by swapping characters out for other characters.

class textattack.augmentation.recipes.CheckListAugmenter(**kwargs)[source]

Bases: Augmenter

Augments words by using the transformation methods provided by CheckList INV testing, which combines:

  • Name Replacement

  • Location Replacement

  • Number Alteration

  • Contraction/Extension

“Beyond Accuracy: Behavioral Testing of NLP models with CheckList” (Ribeiro et al., 2020) https://arxiv.org/abs/2005.04118
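As an illustration of the Contraction/Extension item above, the following is a minimal sketch using only the standard library. It is not the CheckList or TextAttack implementation: the four-entry CONTRACTIONS table and the function names contract and expand are assumptions made for this example, and a real system would use a much larger table.

```python
import re

# Small illustrative table (assumption); maps full forms to contractions.
CONTRACTIONS = {
    "do not": "don't",
    "cannot": "can't",
    "it is": "it's",
    "I am": "I'm",
}
EXPANSIONS = {short: full for full, short in CONTRACTIONS.items()}


def contract(text: str) -> str:
    """Replace each full form in the table with its contraction."""
    for full, short in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(full)}\b", short, text)
    return text


def expand(text: str) -> str:
    """Replace each contraction in the table with its full form."""
    for short, full in EXPANSIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text


original = "What I cannot create, I do not understand."
contracted = contract(original)  # "What I can't create, I don't understand."
restored = expand(contracted)    # same string as original
```

Because the two forms mean the same thing, a model's prediction is expected to stay unchanged under this kind of edit, which is what makes it suitable for invariance (INV) testing.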

class textattack.augmentation.recipes.DeletionAugmenter(**kwargs)[source]

Bases: Augmenter

class textattack.augmentation.recipes.EasyDataAugmenter(pct_words_to_swap=0.1, transformations_per_example=4)[source]

Bases: Augmenter

An implementation of Easy Data Augmentation, which combines:

  • WordNet synonym replacement
    • Randomly replace words with their synonyms.

  • Word deletion
    • Randomly remove words from the sentence.

  • Word order swaps
    • Randomly swap the position of words in the sentence.

  • Random synonym insertion
    • Insert a random synonym of a random word at a random location.

in one augmentation method.

“EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks” (Wei and Zou, 2019) https://arxiv.org/abs/1901.11196
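Two of the four operations listed above, word deletion and word order swaps, need no thesaurus and can be sketched in plain Python. This is a simplified illustration rather than the recipe's actual implementation: the function names, the deletion probability p, and the seeded random.Random instance are assumptions chosen for the example. The two synonym-based operations are omitted because they require WordNet.

```python
import random
from typing import List


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Exchange the positions of two randomly chosen words, n_swaps times."""
    swapped = list(words)
    if len(swapped) < 2:
        return swapped
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(swapped)), 2)
        swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = "the quick brown fox jumps over the lazy dog".split()
deleted = random_deletion(tokens, p=0.2, rng=rng)
reordered = random_swap(tokens, n_swaps=2, rng=rng)
```

Both functions return new lists and leave the input untouched, so several augmented variants can be drawn from the same token list.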

augment(text)[source]

Returns all possible augmentations of text according to self.transformation.

class textattack.augmentation.recipes.EmbeddingAugmenter(**kwargs)[source]

Bases: Augmenter

Augments text by replacing words with their nearest neighbors in word-embedding space.

class textattack.augmentation.recipes.SwapAugmenter(**kwargs)[source]

Bases: Augmenter

class textattack.augmentation.recipes.SynonymInsertionAugmenter(**kwargs)[source]

Bases: Augmenter

class textattack.augmentation.recipes.WordNetAugmenter(**kwargs)[source]

Bases: Augmenter

Augments text by replacing words with synonyms from the WordNet thesaurus.