textattack.shared.utils package

Submodules

class textattack.shared.utils.importing.LazyLoader(local_name, parent_module_globals, name)

Bases: module

Lazily import a module, mainly to avoid pulling in large dependencies.

This way, they are loaded only when they are first used.
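The lazy-loading idea can be sketched as a `types.ModuleType` subclass that imports the real module on first attribute access and then replaces itself in the caller's globals. This is a minimal sketch under that assumption; the class name `LazyModule` is hypothetical and this is not presented as TextAttack's implementation.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical stand-in: defers `import name` until first attribute access."""

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        super().__init__(name)

    def _load(self):
        # Import the real module and swap it in for this placeholder.
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module
        # Copy attributes so later lookups on this object skip __getattr__.
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        # Only reached when normal lookup fails, i.e. before the module is loaded.
        module = self._load()
        return getattr(module, item)


# Usage: `json` is not imported until `json_lazy.dumps` is accessed.
json_lazy = LazyModule("json_lazy", globals(), "json")
print(json_lazy.dumps({"a": 1}))  # {"a": 1}
```

After the first access, the name `json_lazy` in the calling module refers to the real `json` module.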

textattack.shared.utils.importing.load_module_from_file(file_path): Uses importlib to dynamically open a file and load it as a module.
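A minimal sketch of loading a Python file as a module with the standard `importlib.util` API. Deriving the module name from the file name is an assumption made for illustration; the actual function may name the module differently.

```python
import importlib.util
import os
import tempfile


def load_module_from_file(file_path):
    """Sketch: execute the Python file at `file_path` and return it as a module."""
    # The module name is arbitrary; here it is derived from the file name.
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Usage: write a throwaway file, then call a function defined in it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_recipe.py")
    with open(path, "w") as f:
        f.write("def greet(name):\n    return f'hello {name}'\n")
    mod = load_module_from_file(path)
    print(mod.greet("textattack"))  # hello textattack
```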

textattack.shared.utils.install.download_from_s3(folder_name, skip_if_cached=True)

The folder is saved as <cache_dir>/textattack/<folder_name>. If it doesn’t exist on disk, its zip file is downloaded and extracted.

Parameters:

folder_name (str) – path to folder or file in cache
skip_if_cached (bool) – If True, skip downloading if content is already cached.

Returns:

path to the downloaded folder or file on disk

Return type:

str
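The caching behavior described above can be sketched as follows. This is a minimal sketch assuming the cache root is passed in explicitly and a hypothetical `fetch(dest_path)` callable performs the actual download and extraction; `cached_download` is a hypothetical name and the sketch does not reproduce TextAttack's S3 URL scheme.

```python
import os
import tempfile


def cached_download(folder_name, cache_dir, fetch, skip_if_cached=True):
    """Sketch: return <cache_dir>/textattack/<folder_name>, fetching only when needed.

    `fetch(dest_path)` is a hypothetical callable that downloads and extracts
    the content into `dest_path`.
    """
    dest_path = os.path.join(cache_dir, "textattack", folder_name)
    if skip_if_cached and os.path.exists(dest_path):
        return dest_path  # already on disk: no download
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    fetch(dest_path)
    return dest_path


calls = []


def fake_fetch(dest_path):
    # Stand-in for download + unzip: create the folder and record the call.
    calls.append(dest_path)
    os.makedirs(dest_path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    first = cached_download("models/lstm-yelp", tmp, fake_fetch)
    second = cached_download("models/lstm-yelp", tmp, fake_fetch)
    print(len(calls))  # 1: the second call was served from the cache
```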

textattack.shared.utils.install.download_from_url(url, save_path, skip_if_cached=True)

The downloaded file is saved under <cache_dir>/textattack/<save_path>. If it doesn’t exist on disk, the zip file is downloaded and extracted.

Parameters:

url (str) – URL from which to download.
save_path (str) – path, relative to the TextAttack cache directory, at which to save the downloaded content.
skip_if_cached (bool) – If True, skip downloading if content is already cached.

Returns:

path to the downloaded folder or file on disk

Return type:

str

textattack.shared.utils.install.http_get(url, out_file, proxies=None)

Get the contents of a URL and save them to a file.

Adapted from https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py
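Streaming a URL into an open file can be sketched with the standard library. This sketch swaps in `urllib.request` (the original may use a different HTTP client) and assumes `proxies` is a dict such as `{"https": "http://proxy:3128"}`; the `chunk_size` parameter is hypothetical.

```python
import pathlib
import shutil
import tempfile
import urllib.request


def http_get(url, out_file, proxies=None, chunk_size=1024 * 1024):
    """Sketch: stream the body of `url` into the open binary file `out_file`."""
    # An explicit ProxyHandler only when proxies are given; otherwise urllib's
    # default handlers apply (including proxies from environment variables).
    handlers = [urllib.request.ProxyHandler(proxies)] if proxies else []
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url) as response:
        # Copy in fixed-size chunks so large files never sit fully in memory.
        shutil.copyfileobj(response, out_file, chunk_size)


# Usage with a file:// URL so the example runs offline.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "data.txt")
    src.write_bytes(b"hello")
    with open(pathlib.Path(tmp, "copy.txt"), "wb") as out:
        http_get(src.as_uri(), out)
    print(pathlib.Path(tmp, "copy.txt").read_bytes())  # b'hello'
```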

textattack.shared.utils.install.path_in_cache(file_path)

textattack.shared.utils.install.s3_url(uri)

textattack.shared.utils.install.set_cache_dir(cache_dir): Sets all relevant cache directories to TA_CACHE_DIR.

textattack.shared.utils.install.unzip_file(path_to_zip_file, unzipped_folder_path): Unzips the .zip file at path_to_zip_file into the folder unzipped_folder_path.
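A minimal sketch of the unzip step using the standard `zipfile` module; the archive contents in the usage example are made up for illustration.

```python
import os
import tempfile
import zipfile


def unzip_file(path_to_zip_file, unzipped_folder_path):
    """Sketch: extract every member of the archive into `unzipped_folder_path`."""
    os.makedirs(unzipped_folder_path, exist_ok=True)
    with zipfile.ZipFile(path_to_zip_file, "r") as archive:
        archive.extractall(unzipped_folder_path)


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("model/config.json", '{"num_labels": 2}')
    unzip_file(zip_path, os.path.join(tmp, "out"))
    print(os.listdir(os.path.join(tmp, "out", "model")))  # ['config.json']
```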

textattack.shared.utils.misc.get_textattack_model_num_labels(model_name, model_path): Reads train_args.json and gets the number of labels for a trained model, if present.
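Reading the label count can be sketched as below. This assumes train_args.json sits directly in the model directory and stores the count under a `"num_labels"` key; both are assumptions, as are the hypothetical function name `read_num_labels` and its `default` parameter used when the file or key is missing.

```python
import json
import os
import tempfile


def read_num_labels(model_dir, default=None):
    """Sketch: return train_args.json["num_labels"] from `model_dir`, else `default`.

    The file location and the "num_labels" key are assumptions for illustration.
    """
    train_args_path = os.path.join(model_dir, "train_args.json")
    if not os.path.exists(train_args_path):
        return default
    with open(train_args_path) as f:
        train_args = json.load(f)
    return train_args.get("num_labels", default)


with tempfile.TemporaryDirectory() as tmp:
    print(read_num_labels(tmp))  # None: no train_args.json yet
    with open(os.path.join(tmp, "train_args.json"), "w") as f:
        json.dump({"num_labels": 5, "model": "lstm"}, f)
    print(read_num_labels(tmp))  # 5
```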

textattack.shared.utils.misc.hashable(key)

textattack.shared.utils.misc.html_style_from_dict(style_dict)

Turns a style dictionary such as

{'color': 'red', 'height': '100px'}

into an inline HTML style attribute:

style="color: red; height: 100px"
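The conversion can be sketched by joining `key: value` pairs. This is a minimal sketch; the exact separators (and whether a trailing semicolon is emitted) in TextAttack's own output may differ.

```python
def html_style_from_dict(style_dict):
    """Sketch: render {'color': 'red'} as the attribute string style="color: red"."""
    declarations = "; ".join(f"{key}: {value}" for key, value in style_dict.items())
    return f'style="{declarations}"'


print(html_style_from_dict({"color": "red", "height": "100px"}))
# style="color: red; height: 100px"
```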

textattack.shared.utils.misc.html_table_from_rows(rows, title=None, header=None, style_dict=None)

textattack.shared.utils.misc.load_textattack_model_from_path(model_name, model_path)

Loads a pre-trained TextAttack model from its name and path.

For example, model_name “lstm-yelp” and model_path “models/classification/lstm/yelp”.

textattack.shared.utils.misc.set_seed(random_seed)

textattack.shared.utils.misc.sigmoid(n)

class textattack.shared.utils.strings.ANSI_ESCAPE_CODES

Bases: object

Escape codes for printing color to the terminal.

BOLD = '\x1b[1m'

BROWN = '\x1b[38:5:52m'

CYAN = '\x1b[96m'

FAIL = '\x1b[91m'

GRAY = '\x1b[38:5:240m'

HEADER = '\x1b[95m'

OKBLUE = '\x1b[94m'

OKGREEN = '\x1b[92m'

ORANGE = '\x1b[38:5:208m'

PINK = '\x1b[95m'

PURPLE = '\x1b[35m'

STOP = '\x1b[0m': Resets the terminal style, ending the current color sequence.

UNDERLINE = '\x1b[4m'

WARNING = '\x1b[93m'

YELLOW = '\x1b[93m'
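A typical way to use these codes is to emit one or more style codes, then the text, then STOP. The sketch below copies a few values from the class documented above into a hypothetical `Codes` class so it runs without TextAttack installed; `ansi_wrap` is also a hypothetical helper.

```python
class Codes:
    """Hypothetical subset of ANSI_ESCAPE_CODES with the same values."""

    OKGREEN = "\x1b[92m"
    FAIL = "\x1b[91m"
    BOLD = "\x1b[1m"
    STOP = "\x1b[0m"


def ansi_wrap(text, *codes):
    # Emit the style codes, the text, then STOP so later output is unaffected.
    return "".join(codes) + text + Codes.STOP


print(ansi_wrap("[[Positive]]", Codes.BOLD, Codes.OKGREEN))
```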

class textattack.shared.utils.strings.ReprMixin

Bases: object

Mixin for enhanced __repr__ and __str__.

extra_repr_keys(): Extra fields to be included in the representation of a class.
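The mixin pattern can be sketched as a base class whose `__repr__` reads the attribute names returned by `extra_repr_keys()`. The one-line `Name(key=value)` format and the subclass `WordSwapExample` are hypothetical; TextAttack's actual representation format may differ.

```python
class ReprMixin:
    """Sketch: build __repr__/__str__ from the attribute names in extra_repr_keys()."""

    def extra_repr_keys(self):
        # Subclasses override this to list the attributes worth showing.
        return []

    def __repr__(self):
        fields = ", ".join(f"{key}={getattr(self, key)!r}" for key in self.extra_repr_keys())
        return f"{type(self).__name__}({fields})"

    __str__ = __repr__


class WordSwapExample(ReprMixin):
    """Hypothetical transformation used only to exercise the mixin."""

    def __init__(self, max_candidates):
        self.max_candidates = max_candidates

    def extra_repr_keys(self):
        return ["max_candidates"]


print(WordSwapExample(50))  # WordSwapExample(max_candidates=50)
```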

class textattack.shared.utils.strings.TextAttackFlairTokenizer

Bases: Tokenizer

tokenize(text: str)

textattack.shared.utils.strings.add_indent(s_, numSpaces)

textattack.shared.utils.strings.check_if_punctuations(word): Returns True if word consists only of punctuation characters.
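A minimal sketch of the check, assuming `string.punctuation` (ASCII only) as the definition of punctuation and treating the empty string as not punctuation. Both choices are assumptions; a Unicode-aware implementation would also accept characters such as «».

```python
import string


def check_if_punctuations(word):
    """Sketch: True if `word` is non-empty and every character is ASCII punctuation."""
    return len(word) > 0 and all(ch in string.punctuation for ch in word)


print(check_if_punctuations("?!..."))  # True
print(check_if_punctuations("don't"))  # False
```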

textattack.shared.utils.strings.check_if_subword(token, model_type, starting=False)

Check if token is a subword token that is not a standalone word.

Parameters:

token (str) – token to check.
model_type (str) – type of model (options: “bert”, “roberta”, “xlnet”).
starting (bool) – Should be set to True if this token is the first token of the overall text. This matters because models like RoBERTa do not add “Ġ” to the first token.

Returns:

True if token is a subword token.

Return type:

bool
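The check depends on each tokenizer family's marker convention. The sketch below assumes the common ones: BERT WordPiece prefixes continuation pieces with "##", RoBERTa byte-level BPE prefixes word-initial tokens with "Ġ" (U+0120), and XLNet SentencePiece prefixes them with "▁" (U+2581). It is an illustration under those assumptions, not TextAttack's code.

```python
def check_if_subword(token, model_type, starting=False):
    """Sketch of subword detection from tokenizer marker characters."""
    if model_type == "bert":
        # WordPiece marks continuation pieces with a leading "##".
        return token.startswith("##")
    if model_type == "roberta":
        # "\u0120" is "Ġ", an encoded leading space; the first token of a
        # text has no preceding space, so it is treated as a whole word.
        return False if starting else not token.startswith("\u0120")
    if model_type == "xlnet":
        # "\u2581" is "▁", SentencePiece's word-start marker.
        return not token.startswith("\u2581")
    raise ValueError(f"Unsupported model_type: {model_type!r}")


print(check_if_subword("##ing", "bert"))                    # True
print(check_if_subword("\u0120hello", "roberta"))           # False
print(check_if_subword("hello", "roberta", starting=True))  # False
```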

textattack.shared.utils.strings.color_from_label(label_num): Returns an arbitrary color for each label number.

textattack.shared.utils.strings.color_from_output(label_name, label): Returns the color associated with a label name, like ‘positive’, ‘medicine’, or ‘entailment’.

textattack.shared.utils.strings.color_text(text, color=None, method=None)

textattack.shared.utils.strings.default_class_repr(self)

textattack.shared.utils.strings.flair_tag(sentence, tag_type='upos-fast'): Tags a Sentence object using a flair part-of-speech tagger.

textattack.shared.utils.strings.has_letter(word): Returns True if word contains at least one character in [A-Za-z].
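The `[A-Za-z]` test maps directly onto a regular-expression search; this is a minimal sketch, not necessarily how TextAttack implements it.

```python
import re


def has_letter(word):
    """Sketch: True if `word` contains at least one ASCII letter [A-Za-z]."""
    return re.search(r"[A-Za-z]", word) is not None


print(has_letter("R2-D2"))  # True
print(has_letter("1984"))   # False
```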

textattack.shared.utils.strings.is_one_word(word)

textattack.shared.utils.strings.process_label_name(label_name)

Takes a label name from a dataset and normalizes it for display.

It corrects common abbreviations and capitalizes the result.
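A minimal sketch of this normalization, assuming a small lookup table of abbreviations. The table contents (`pos`, `neg`) are hypothetical examples; real datasets may need different entries.

```python
# Hypothetical abbreviation table used only for illustration.
ABBREVIATIONS = {"pos": "positive", "neg": "negative"}


def process_label_name(label_name):
    """Sketch: expand known abbreviations, then capitalize for display."""
    name = label_name.strip().lower()
    name = ABBREVIATIONS.get(name, name)
    return name.capitalize()


print(process_label_name("neg"))         # Negative
print(process_label_name("ENTAILMENT"))  # Entailment
```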

textattack.shared.utils.strings.strip_BPE_artifacts(token, model_type)

Strip characters such as “Ġ” that are left over from BPE tokenization.

Parameters:

token (str) – token to strip.
model_type (str) – type of model (options: “bert”, “roberta”, “xlnet”).
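Stripping can be sketched by removing each tokenizer family's marker characters. The markers assumed here are "##" for BERT WordPiece, "Ġ" (U+0120) for RoBERTa byte-level BPE, and "▁" (U+2581) for XLNet SentencePiece; this is an illustration, not TextAttack's code.

```python
def strip_BPE_artifacts(token, model_type):
    """Sketch: remove tokenizer marker characters so only the surface text remains."""
    if model_type == "bert":
        return token.replace("##", "")      # WordPiece continuation marker
    if model_type == "roberta":
        return token.replace("\u0120", "")  # "Ġ", byte-level BPE space marker
    if model_type == "xlnet":
        return token.replace("\u2581", "")  # "▁", SentencePiece space marker
    raise ValueError(f"Unsupported model_type: {model_type!r}")


print(strip_BPE_artifacts("\u0120hello", "roberta"))  # hello
print(strip_BPE_artifacts("##ing", "bert"))           # ing
```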

textattack.shared.utils.strings.words_from_text(s, words_to_ignore=[]): Lowercases a string, removes all non-alphanumeric characters, and splits into words.
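A minimal sketch of this tokenization, assuming ASCII alphanumerics, runs of other characters treated as word separators, and `words_to_ignore` as a list of lowercase words to drop. These are assumptions; TextAttack's own rules (for example around apostrophes) may differ.

```python
import re


def words_from_text(s, words_to_ignore=()):
    """Sketch: lowercase, split on runs of non-alphanumerics, drop ignored words."""
    words = re.split(r"[^a-z0-9]+", s.lower())
    ignored = set(words_to_ignore)
    # re.split yields "" at the edges when s starts or ends with a separator.
    return [w for w in words if w and w not in ignored]


print(words_from_text("Hello, World! It's 2024.", words_to_ignore=["it"]))
# ['hello', 'world', 's', '2024']
```

Using a tuple as the default avoids Python's shared mutable default argument pitfall.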

textattack.shared.utils.strings.zip_flair_result(pred, tag_type='upos-fast'): Takes a sentence tagging from flair and returns two lists: the words and their corresponding parts of speech.

textattack.shared.utils.strings.zip_stanza_result(pred, tagset='universal'): Takes the first sentence of a stanza document and returns two lists: the words and their corresponding parts of speech.

textattack.shared.utils.tensor.batch_model_predict(model_predict, inputs, batch_size=32)

Runs prediction on iterable inputs using batch size batch_size.

Aggregates all predictions into an np.ndarray.
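The batching loop can be sketched with `itertools.islice`, which works for any iterable, not just sequences. This assumes `model_predict` returns an array-like with one row per input; `toy_model` is a hypothetical model used only for the example.

```python
import itertools

import numpy as np


def batch_model_predict(model_predict, inputs, batch_size=32):
    """Sketch: call `model_predict` on chunks of `inputs`, stack results into one array."""
    iterator = iter(inputs)
    outputs = []
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        # Each call is assumed to return shape (len(batch), ...).
        outputs.append(np.asarray(model_predict(batch)))
    if not outputs:
        return np.empty((0,))
    return np.concatenate(outputs, axis=0)


batch_sizes = []


def toy_model(batch):
    # Hypothetical model: records its batch size and returns two scores per input.
    batch_sizes.append(len(batch))
    return [[len(text), -len(text)] for text in batch]


preds = batch_model_predict(toy_model, ["a", "bb", "ccc", "dddd", "eeeee"], batch_size=2)
print(preds.shape)  # (5, 2)
print(batch_sizes)  # [2, 2, 1]
```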