textattack.shared.utils package

class textattack.shared.utils.importing.LazyLoader(local_name, parent_module_globals, name)[source]

Bases: module

Lazily import a module, mainly to avoid pulling in large dependencies.

This allows them to be loaded only when they are first used.
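The deferred-import idea can be sketched with the standard library alone. The class below is a hypothetical stand-in, not TextAttack's implementation: LazyModule, _load, and the _loaded flag are illustrative names, and only the constructor arguments mirror the signature above.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical stand-in: defers the real import until first attribute access."""

    def __init__(self, local_name, parent_module_globals, name):
        super().__init__(name)
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        self._loaded = False

    def _load(self):
        # Import the real module and swap it into the caller's namespace.
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module
        # Copy the module's attributes so later lookups bypass __getattr__.
        self.__dict__.update(module.__dict__)
        self._loaded = True
        return module

    def __getattr__(self, item):
        # Reached only when normal lookup fails, i.e. before the module is loaded.
        return getattr(self._load(), item)


namespace = {}
lazy_json = LazyModule("json", namespace, "json")
loaded_before = lazy_json._loaded        # nothing imported yet
encoded = lazy_json.dumps({"a": 1})      # first use triggers the import
loaded_after = lazy_json._loaded
```

Here the standard-library json module stands in for a heavy dependency; the import cost is paid at the first attribute access rather than at module import time.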

textattack.shared.utils.importing.load_module_from_file(file_path)[source]

Uses importlib to dynamically open a file and load an object from it.
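A minimal sketch of loading a module from a file path with importlib. The helper name load_from_path and the module_name default are assumptions made for illustration; the library function may choose its module name differently.

```python
import importlib.util
import os
import tempfile


def load_from_path(file_path, module_name="dynamic_module"):
    """Hypothetical helper: execute the file at file_path as a module and return it."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Usage: write a throwaway file, load it, and read an object defined inside it.
with tempfile.TemporaryDirectory() as tmp_dir:
    recipe_path = os.path.join(tmp_dir, "my_recipe.py")
    with open(recipe_path, "w") as f:
        f.write("ANSWER = 6 * 7\n")
    loaded = load_from_path(recipe_path)
    answer = loaded.ANSWER
```

Objects defined in the file are then available as attributes of the returned module.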

textattack.shared.utils.install.download_from_s3(folder_name, skip_if_cached=True)[source]

The folder will be saved as <cache_dir>/textattack/<folder_name>. If it doesn’t exist on disk, the zip file will be downloaded and extracted.

Parameters:
  • folder_name (str) – path to folder or file in cache

  • skip_if_cached (bool) – If True, skip downloading if content is already cached.

Returns:

path to the downloaded folder or file on disk

Return type:

str

textattack.shared.utils.install.download_from_url(url, save_path, skip_if_cached=True)[source]

Downloaded file will be saved under <cache_dir>/textattack/<save_path>. If it doesn’t exist on disk, the zip file will be downloaded and extracted.

Parameters:
  • url (str) – URL path from which to download.

  • save_path (str) – path to which to save the downloaded content.

  • skip_if_cached (bool) – If True, skip downloading if content is already cached.

Returns:

path to the downloaded folder or file on disk

Return type:

str
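The skip_if_cached behaviour described for both download helpers can be sketched without any network access. Everything below is an assumption made for illustration: fetch_with_cache is a hypothetical name, and the fetch callable stands in for the actual download and extraction step.

```python
import os
import tempfile


def fetch_with_cache(cache_dir, save_path, fetch, skip_if_cached=True):
    """Return the cached path, calling fetch(target) only when needed."""
    target = os.path.join(cache_dir, "textattack", save_path)
    if skip_if_cached and os.path.exists(target):
        return target  # already on disk: skip the download
    os.makedirs(os.path.dirname(target), exist_ok=True)
    fetch(target)
    return target


calls = []


def fake_download(target):
    # Stand-in for a real network download; just writes a small file.
    calls.append(target)
    with open(target, "w") as f:
        f.write("payload")


with tempfile.TemporaryDirectory() as cache_dir:
    first = fetch_with_cache(cache_dir, "models/demo.txt", fake_download)
    second = fetch_with_cache(cache_dir, "models/demo.txt", fake_download)
    third = fetch_with_cache(
        cache_dir, "models/demo.txt", fake_download, skip_if_cached=False
    )
```

The second call returns the cached path without fetching, while skip_if_cached=False forces a fresh fetch.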

textattack.shared.utils.install.http_get(url, out_file, proxies=None)[source]

Get contents of a URL and save to a file.

https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py

textattack.shared.utils.install.path_in_cache(file_path)[source]
textattack.shared.utils.install.s3_url(uri)[source]
textattack.shared.utils.install.set_cache_dir(cache_dir)[source]

Sets all relevant cache directories to TA_CACHE_DIR.

textattack.shared.utils.install.unzip_file(path_to_zip_file, unzipped_folder_path)[source]

Unzips a .zip file to folder path.
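Extraction of this kind can be done with the standard zipfile module. The sketch below is an assumption about the general approach, not a copy of the library code; unzip_to_folder is a hypothetical name.

```python
import os
import tempfile
import zipfile


def unzip_to_folder(path_to_zip_file, unzipped_folder_path):
    """Extract every member of the archive into unzipped_folder_path."""
    with zipfile.ZipFile(path_to_zip_file, "r") as archive:
        archive.extractall(unzipped_folder_path)


# Usage: build a tiny archive, extract it, and read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("hello.txt", "hi")
    out_dir = os.path.join(tmp_dir, "out")
    unzip_to_folder(zip_path, out_dir)
    with open(os.path.join(out_dir, "hello.txt")) as f:
        extracted_text = f.read()
```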

textattack.shared.utils.misc.get_textattack_model_num_labels(model_name, model_path)[source]

Reads train_args.json and gets the number of labels for a trained model, if present.

textattack.shared.utils.misc.hashable(key)[source]
textattack.shared.utils.misc.html_style_from_dict(style_dict)[source]

Turns

{'color': 'red', 'height': '100px'}

into

style: "color: red; height: 100px"
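The conversion can be sketched as a join over the dictionary items. style_attribute_from_dict is a hypothetical name, and the exact separators or trailing semicolons emitted by the library function may differ from this sketch.

```python
def style_attribute_from_dict(style_dict):
    """Render a dict of CSS properties as an HTML style attribute string."""
    declarations = "; ".join(
        f"{prop}: {value}" for prop, value in style_dict.items()
    )
    return f'style="{declarations}"'


example = style_attribute_from_dict({"color": "red", "height": "100px"})
```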

textattack.shared.utils.misc.html_table_from_rows(rows, title=None, header=None, style_dict=None)[source]
textattack.shared.utils.misc.load_textattack_model_from_path(model_name, model_path)[source]

Loads a pre-trained TextAttack model from its name and path.

For example, model_name “lstm-yelp” and model path “models/classification/lstm/yelp”.

textattack.shared.utils.misc.set_seed(random_seed)[source]
textattack.shared.utils.misc.sigmoid(n)[source]
class textattack.shared.utils.strings.ANSI_ESCAPE_CODES[source]

Bases: object

Escape codes for printing color to the terminal.

BOLD = '\x1b[1m'
BROWN = '\x1b[38:5:52m'
CYAN = '\x1b[96m'
FAIL = '\x1b[91m'
GRAY = '\x1b[38:5:240m'
HEADER = '\x1b[95m'
OKBLUE = '\x1b[94m'
OKGREEN = '\x1b[92m'
ORANGE = '\x1b[38:5:208m'
PINK = '\x1b[95m'
PURPLE = '\x1b[35m'
STOP = '\x1b[0m'

This code stops the current color sequence.

UNDERLINE = '\x1b[4m'
WARNING = '\x1b[93m'
YELLOW = '\x1b[93m'
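A short sketch of how such codes are typically used: the text is prefixed with one or more codes and followed by the reset code. The three constants are copied from the list above; wrap_ansi is a hypothetical helper, not part of the package.

```python
BOLD = "\x1b[1m"
OKGREEN = "\x1b[92m"
STOP = "\x1b[0m"


def wrap_ansi(text, *codes):
    """Hypothetical helper: prefix text with escape codes and reset afterwards."""
    return "".join(codes) + text + STOP


message = wrap_ansi("attack succeeded", BOLD, OKGREEN)
print(message)  # shown in bold green on terminals that support ANSI codes
```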
class textattack.shared.utils.strings.ReprMixin[source]

Bases: object

Mixin for enhanced __repr__ and __str__.

extra_repr_keys()[source]

Extra fields to be included in the representation of a class.
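The pattern can be sketched as a mixin whose __repr__ reads the field names returned by extra_repr_keys(). This is an illustrative assumption about the design, with hypothetical class names; the output format of the real ReprMixin may differ.

```python
class SimpleReprMixin:
    """Hypothetical mixin: builds __repr__ from the keys a subclass exposes."""

    def extra_repr_keys(self):
        return []

    def __repr__(self):
        fields = ", ".join(
            f"{key}={getattr(self, key)!r}" for key in self.extra_repr_keys()
        )
        return f"{type(self).__name__}({fields})"

    __str__ = __repr__


class DemoTransformation(SimpleReprMixin):
    def __init__(self, max_candidates):
        self.max_candidates = max_candidates

    def extra_repr_keys(self):
        # Fields listed here appear in the printed representation.
        return ["max_candidates"]


description = repr(DemoTransformation(max_candidates=5))
```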

class textattack.shared.utils.strings.TextAttackFlairTokenizer[source]

Bases: Tokenizer

tokenize(text: str)[source]
textattack.shared.utils.strings.add_indent(s_, numSpaces)[source]
textattack.shared.utils.strings.check_if_punctuations(word)[source]

Returns True if word consists only of punctuation characters.

textattack.shared.utils.strings.check_if_subword(token, model_type, starting=False)[source]

Check if token is a subword token that is not a standalone word.

Parameters:
  • token (str) – token to check.

  • model_type (str) – type of model (options: “bert”, “roberta”, “xlnet”).

  • starting (bool) – Should be set to True if this token is the starting token of the overall text. This matters because models like RoBERTa do not add “Ġ” to the beginning token.

Returns:

True if token is a subword token.

Return type:

bool
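The check can be sketched from the usual tokenizer conventions: WordPiece (BERT) prefixes continuation pieces with "##", byte-level BPE (RoBERTa) prefixes word-initial tokens with "Ġ", and SentencePiece (XLNet) prefixes them with "▁". The function below is a hypothetical illustration under those assumptions, not the library's code.

```python
def looks_like_subword(token, model_type, starting=False):
    """Hypothetical check: is token a continuation piece rather than a word start?"""
    if model_type == "bert":
        # WordPiece marks continuation pieces explicitly.
        return token.startswith("##")
    if model_type == "roberta":
        # The first token of the text carries no "Ġ", so it cannot be judged by it.
        if starting:
            return False
        return not token.startswith("Ġ")
    if model_type == "xlnet":
        return not token.startswith("▁")
    raise ValueError(f"Unsupported model_type: {model_type!r}")


bert_piece = looks_like_subword("##ing", "bert")
roberta_word = looks_like_subword("Ġworld", "roberta")
roberta_first = looks_like_subword("Hello", "roberta", starting=True)
roberta_piece = looks_like_subword("ing", "roberta")
```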

textattack.shared.utils.strings.color_from_label(label_num)[source]

Arbitrary colors for different labels.

textattack.shared.utils.strings.color_from_output(label_name, label)[source]

Returns the correct color for a label name, like ‘positive’, ‘medicine’, or ‘entailment’.

textattack.shared.utils.strings.color_text(text, color=None, method=None)[source]
textattack.shared.utils.strings.default_class_repr(self)[source]
textattack.shared.utils.strings.flair_tag(sentence, tag_type='upos-fast')[source]

Tags a Sentence object using the flair part-of-speech tagger.

textattack.shared.utils.strings.has_letter(word)[source]

Returns True if word contains at least one character in [A-Za-z].

textattack.shared.utils.strings.is_one_word(word)[source]
textattack.shared.utils.strings.process_label_name(label_name)[source]

Takes a label name from a dataset and cleans it up for display.

Meant to correct common abbreviations and capitalize the name automatically.

textattack.shared.utils.strings.strip_BPE_artifacts(token, model_type)[source]

Strip characters such as “Ġ” that are left over from BPE tokenization.

Parameters:
  • token (str) – token to strip.

  • model_type (str) – type of model (options: “bert”, “roberta”, “xlnet”)
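A minimal sketch of the stripping step, assuming the common marker for each tokenizer family ("##" for BERT, "Ġ" for RoBERTa, "▁" for XLNet). strip_subword_markers and the marker table are hypothetical; the library may handle edge cases differently.

```python
def strip_subword_markers(token, model_type):
    """Hypothetical helper: remove the tokenizer's marker characters from token."""
    markers = {"bert": "##", "roberta": "Ġ", "xlnet": "▁"}
    if model_type not in markers:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return token.replace(markers[model_type], "")


bert_clean = strip_subword_markers("##ing", "bert")
roberta_clean = strip_subword_markers("Ġworld", "roberta")
xlnet_clean = strip_subword_markers("▁hello", "xlnet")
```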

textattack.shared.utils.strings.words_from_text(s, words_to_ignore=[])[source]

Lowercases a string, removes all non-alphanumeric characters, and splits into words.
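The three steps in this description can be sketched with a regular expression. This follows the docstring literally and assumes ASCII letters and digits only; simple_words_from_text is a hypothetical name, and the library function may keep additional characters.

```python
import re


def simple_words_from_text(s, words_to_ignore=()):
    """Lowercase, drop non-alphanumeric characters, split on whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", s.lower())
    return [word for word in cleaned.split() if word not in words_to_ignore]


words = simple_words_from_text(
    "The movie was GREAT, wasn't it?", words_to_ignore=("the",)
)
```

The sketch uses a tuple as the default for words_to_ignore to avoid sharing a mutable default between calls.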

textattack.shared.utils.strings.zip_flair_result(pred, tag_type='upos-fast')[source]

Takes a sentence tagging from flair and returns two lists: the words and their corresponding parts of speech.

textattack.shared.utils.strings.zip_stanza_result(pred, tagset='universal')[source]

Takes the first sentence from a document from stanza and returns two lists, one of words and the other of their corresponding parts of speech.

textattack.shared.utils.tensor.batch_model_predict(model_predict, inputs, batch_size=32)[source]

Runs prediction on iterable inputs using batch size batch_size.

Aggregates all predictions into an np.ndarray.
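The batching loop can be sketched as follows. To stay dependency-free this hypothetical version collects predictions in a plain list, whereas the documented function aggregates them into an np.ndarray; predict_in_batches and toy_model are illustrative names.

```python
def predict_in_batches(model_predict, inputs, batch_size=32):
    """Call model_predict on consecutive slices of inputs and collect the results."""
    inputs = list(inputs)
    outputs = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        outputs.extend(model_predict(batch))
    return outputs


batch_sizes_seen = []


def toy_model(batch):
    # Stand-in for a real model: "predicts" the length of each input string.
    batch_sizes_seen.append(len(batch))
    return [len(text) for text in batch]


predictions = predict_in_batches(
    toy_model, ["a", "bb", "ccc", "dddd", "eeeee"], batch_size=2
)
```

The final batch may be smaller than batch_size when the number of inputs is not a multiple of it.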