textattack.constraints.grammaticality.language_models.google_language_model package

Google Language Models:

Google Language Models from Alzantot

Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.

class textattack.constraints.grammaticality.language_models.google_language_model.alzantot_goog_lm.GoogLMHelper[source]

Bases: object

An implementation of https://arxiv.org/abs/1804.07998 adapted from https://github.com/nesl/nlp_adversarial_examples.

clear_cache()[source]
get_words_probs(prefix, list_words)[source]

Retrieves the probability of each word in list_words following the given prefix.

Parameters
  • prefix – The prefix text that the candidate words would follow.

  • list_words – The candidate words to score.

get_words_probs_uncached(prefix_words, list_words)[source]
CACHE_PATH = 'constraints/semantics/language-models/alzantot-goog-lm'
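The split between get_words_probs and get_words_probs_uncached, together with clear_cache and CACHE_PATH, suggests a memoized query interface. A minimal sketch of that caching pattern, with a toy unigram table standing in for the actual Google LM (ToyLMHelper and its scores are illustrative, not part of TextAttack):

```python
import functools


class ToyLMHelper:
    """Illustrative stand-in for GoogLMHelper's caching pattern.

    Scores come from a toy unigram table, not the real 1B-word model.
    """

    def __init__(self):
        self._unigram = {"the": 0.05, "cat": 0.01, "sat": 0.008}
        # Wrap the uncached method so repeated queries hit the cache,
        # mirroring the cached/uncached method pair in GoogLMHelper.
        self.get_words_probs = functools.lru_cache(maxsize=2 ** 14)(
            self._get_words_probs_uncached
        )

    def _get_words_probs_uncached(self, prefix, list_words):
        # The real helper scores each candidate conditioned on the prefix;
        # here we just look up unigram scores (unknown words get a floor).
        return tuple(self._unigram.get(w, 1e-12) for w in list_words)

    def clear_cache(self):
        self.get_words_probs.cache_clear()
```

Note that list_words must be hashable (e.g. a tuple) for the lru_cache wrapper to apply.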

Google 1-Billion Words Language Model

class textattack.constraints.grammaticality.language_models.google_language_model.google_language_model.GoogleLanguageModel(top_n=None, top_n_per_index=None, compare_against_original=True)[source]

Bases: textattack.constraints.constraint.Constraint

Constraint that uses the Google 1 Billion Words Language Model to determine the difference in perplexity between x and x_adv.

Parameters
  • top_n (int) –

  • top_n_per_index (int) –

  • compare_against_original (bool) – If True, compare new x_adv against the original x. Otherwise, compare it against the previous x_adv.

check_compatibility(transformation)[source]

Checks if this constraint is compatible with the given transformation. For example, the WordEmbeddingDistance constraint compares the embedding of the word inserted with that of the word deleted. Therefore it can only be applied in the case of word swaps, and not for transformations which involve only one of insertion or deletion.

Parameters

transformation – The Transformation to check compatibility with.

extra_repr_keys()[source]

Set the extra representation of the constraint using these keys.

To print customized extra information, you should reimplement this method in your own constraint. Both single-line and multi-line strings are acceptable.
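The constraint scores each perturbed candidate with the language model and keeps only the most probable ones, governed by top_n. A simplified sketch of that filtering step (filter_top_n is a hypothetical helper for illustration; in the real constraint the probabilities come from GoogLMHelper):

```python
def filter_top_n(candidates, probs, top_n):
    """Keep the top_n candidate texts ranked by language-model probability.

    A toy version of the filtering GoogleLanguageModel performs; in
    practice `probs` would be produced by GoogLMHelper.get_words_probs.
    """
    # Sort candidates by descending probability and truncate.
    ranked = sorted(zip(candidates, probs), key=lambda pair: -pair[1])
    return [c for c, _ in ranked[:top_n]]
```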

A library for loading 1B word benchmark dataset.

class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.CharsVocabulary(filename, max_word_length)[source]

Bases: textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary

Vocabulary containing character-level information.

encode_chars(sentence)[source]
word_to_char_ids(word)[source]
property max_word_length
property word_char_ids
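word_to_char_ids maps a word to a fixed-length vector of character ids, bounded by max_word_length. A toy sketch of the idea, using raw code points and zero padding (the real class reserves special begin-of-word and end-of-word marker characters and its own id scheme):

```python
def word_to_char_ids(word, max_word_length, pad_id=0):
    """Toy version of CharsVocabulary.word_to_char_ids.

    Truncates the word to max_word_length characters, maps each to its
    code point, and pads the result to a fixed length with pad_id.
    """
    ids = [ord(c) for c in word[:max_word_length]]
    return ids + [pad_id] * (max_word_length - len(ids))
```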
class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.LM1BDataset(filepattern, vocab)[source]

Bases: object

Utility class for 1B word benchmark dataset.

The current implementation reads the data from the tokenized text files.

get_batch(batch_size, num_steps, pad=False, forever=True)[source]
property vocab
class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary(filename)[source]

Bases: object

Class that holds a vocabulary for the dataset.

decode(cur_ids)[source]

Convert a list of ids to a sentence, with spaces inserted between words.

encode(sentence)[source]

Convert a sentence to a list of ids, with special tokens added.

id_to_word(cur_id)[source]

Converts an ID to the word it represents.

Parameters

cur_id – The ID

Returns

The word that cur_id represents.

word_to_id(word)[source]
property bos
property eos
property size
property unk
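A minimal sketch of the Vocabulary interface: encode adds the begin- and end-of-sentence ids, decode joins words with spaces, and unknown words map to unk. The ids and token strings here are illustrative; the real vocabulary is loaded from the 1B-word benchmark's vocabulary file:

```python
class ToyVocabulary:
    """Illustrative sketch of the Vocabulary interface (ids are made up)."""

    def __init__(self, words):
        # Special tokens first, then the ordinary vocabulary.
        self._id_to_word = ["<S>", "</S>", "<UNK>"] + list(words)
        self._word_to_id = {w: i for i, w in enumerate(self._id_to_word)}

    @property
    def bos(self):
        return 0

    @property
    def eos(self):
        return 1

    @property
    def unk(self):
        return 2

    @property
    def size(self):
        return len(self._id_to_word)

    def word_to_id(self, word):
        # Out-of-vocabulary words fall back to the unknown-word id.
        return self._word_to_id.get(word, self.unk)

    def id_to_word(self, cur_id):
        return self._id_to_word[cur_id]

    def encode(self, sentence):
        # Special tokens added, as in Vocabulary.encode.
        return [self.bos] + [self.word_to_id(w) for w in sentence.split()] + [self.eos]

    def decode(self, cur_ids):
        # Spaces inserted between words, as in Vocabulary.decode.
        return " ".join(self.id_to_word(i) for i in cur_ids)
```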
textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.get_batch(generator, batch_size, num_steps, max_word_length, pad=False)[source]

Read batches of input.
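A toy sketch of the batching idea: consume token sequences from a generator and yield fixed-shape (batch_size, num_steps) batches, padding short rows. The real get_batch takes max_word_length and additionally produces character ids and target weights; this simplified version handles word ids only:

```python
def get_batch(generator, batch_size, num_steps, pad_id=0):
    """Simplified batch reader over a generator of token-id sequences.

    Each yielded batch is a list of batch_size rows, each truncated or
    padded with pad_id to exactly num_steps ids. A trailing incomplete
    batch is dropped.
    """
    batch = []
    for tokens in generator:
        row = list(tokens[:num_steps])
        row += [pad_id] * (num_steps - len(row))
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
```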

Utils for loading 1B word benchmark dataset.

Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.

textattack.constraints.grammaticality.language_models.google_language_model.lm_utils.LoadModel(sess, graph, gd_file, ckpt_file)[source]

Load the model from GraphDef and AttackCheckpoint.

Parameters
  • gd_file – GraphDef proto text file.

  • ckpt_file – TensorFlow AttackCheckpoint file.

Returns

TensorFlow session and tensors dict.