textattack.constraints.grammaticality.language_models.google_language_model package

Google Language Models:

Google Language Models from Alzantot

Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.

class textattack.constraints.grammaticality.language_models.google_language_model.alzantot_goog_lm.GoogLMHelper[source]

Bases: object

An implementation of https://arxiv.org/abs/1804.07998 adapted from https://github.com/nesl/nlp_adversarial_examples.

clear_cache()[source]
get_words_probs(prefix, list_words)[source]

Retrieves the probability of each word in list_words following the given prefix.

Parameters
  • prefix – The prefix text that the candidate words would follow.

  • list_words – The candidate words to score.

get_words_probs_uncached(prefix_words, list_words)[source]
CACHE_PATH = 'constraints/semantics/language-models/alzantot-goog-lm'
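The split between get_words_probs and get_words_probs_uncached, together with clear_cache and CACHE_PATH, suggests a memoized query interface. A minimal sketch of that caching pattern, with a toy unigram table standing in for the actual Google LM (ToyLMHelper and its scores are illustrative, not part of TextAttack):

```python
import functools


class ToyLMHelper:
    """Illustrative stand-in for GoogLMHelper's caching pattern.

    Scores come from a toy unigram table, not the real 1B-word model.
    """

    def __init__(self):
        self._unigram = {"the": 0.05, "cat": 0.01, "sat": 0.008}
        # Wrap the uncached method so repeated queries hit the cache,
        # mirroring the cached/uncached method pair in GoogLMHelper.
        self.get_words_probs = functools.lru_cache(maxsize=2 ** 14)(
            self._get_words_probs_uncached
        )

    def _get_words_probs_uncached(self, prefix, list_words):
        # The real helper scores each candidate conditioned on the prefix;
        # here we just look up unigram scores (unknown words get a floor).
        return tuple(self._unigram.get(w, 1e-12) for w in list_words)

    def clear_cache(self):
        self.get_words_probs.cache_clear()
```

Note that list_words must be hashable (e.g. a tuple) for the lru_cache wrapper to apply.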

Google 1-Billion Words Language Model

class textattack.constraints.grammaticality.language_models.google_language_model.google_language_model.GoogleLanguageModel(top_n=None, top_n_per_index=None, compare_against_original=True)[source]

Bases: textattack.constraints.constraint.Constraint

Constraint that uses the Google 1 Billion Words Language Model to determine the difference in perplexity between x and x_adv.

Parameters
  • top_n (int) –

  • top_n_per_index (int) –

  • compare_against_original (bool) – If True, compare new x_adv against the original x. Otherwise, compare it against the previous x_adv.

check_compatibility(transformation)[source]

Checks if this constraint is compatible with the given transformation. For example, the WordEmbeddingDistance constraint compares the embedding of the word inserted with that of the word deleted. Therefore it can only be applied in the case of word swaps, and not for transformations which involve only one of insertion or deletion.

Parameters

transformation – The Transformation to check compatibility with.

extra_repr_keys()[source]

Set the extra representation of the constraint using these keys.

To print customized extra information, you should reimplement this method in your own constraint. Both single-line and multi-line strings are acceptable.
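The constraint scores each perturbed candidate with the language model and keeps only the most probable ones, governed by top_n. A simplified sketch of that filtering step (filter_top_n is a hypothetical helper for illustration; in the real constraint the probabilities come from GoogLMHelper):

```python
def filter_top_n(candidates, probs, top_n):
    """Keep the top_n candidate texts ranked by language-model probability.

    A toy version of the filtering GoogleLanguageModel performs; in
    practice `probs` would be produced by GoogLMHelper.get_words_probs.
    """
    # Sort candidates by descending probability and truncate.
    ranked = sorted(zip(candidates, probs), key=lambda pair: -pair[1])
    return [c for c, _ in ranked[:top_n]]
```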

A library for loading 1B word benchmark dataset.

class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.CharsVocabulary(filename, max_word_length)[source]

Bases: textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary

Vocabulary containing character-level information.

encode_chars(sentence)[source]
word_to_char_ids(word)[source]
property max_word_length
property word_char_ids
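word_to_char_ids maps a word to a fixed-length vector of character ids, bounded by max_word_length. A toy sketch of the idea, using raw code points and zero padding (the real class reserves special begin-of-word and end-of-word marker characters and its own id scheme):

```python
def word_to_char_ids(word, max_word_length, pad_id=0):
    """Toy version of CharsVocabulary.word_to_char_ids.

    Truncates the word to max_word_length characters, maps each to its
    code point, and pads the result to a fixed length with pad_id.
    """
    ids = [ord(c) for c in word[:max_word_length]]
    return ids + [pad_id] * (max_word_length - len(ids))
```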
class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.LM1BDataset(filepattern, vocab)[source]

Bases: object

Utility class for 1B word benchmark dataset.

The current implementation reads the data from the tokenized text files.

get_batch(batch_size, num_steps, pad=False, forever=True)[source]
property vocab
class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary(filename)[source]

Bases: object

Class that holds a vocabulary for the dataset.

decode(cur_ids)[source]

Convert a list of ids to a sentence, with spaces inserted between words.

encode(sentence)[source]

Convert a sentence to a list of ids, with special tokens added.

id_to_word(cur_id)[source]

Converts an ID to the word it represents.

Parameters

cur_id – The ID

Returns

The word that cur_id represents.

word_to_id(word)[source]
property bos
property eos
property size
property unk
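A minimal sketch of the Vocabulary interface: encode adds the begin- and end-of-sentence ids, decode joins words with spaces, and unknown words map to unk. The ids and token strings here are illustrative; the real vocabulary is loaded from the 1B-word benchmark's vocabulary file:

```python
class ToyVocabulary:
    """Illustrative sketch of the Vocabulary interface (ids are made up)."""

    def __init__(self, words):
        # Special tokens first, then the ordinary vocabulary.
        self._id_to_word = ["<S>", "</S>", "<UNK>"] + list(words)
        self._word_to_id = {w: i for i, w in enumerate(self._id_to_word)}

    @property
    def bos(self):
        return 0

    @property
    def eos(self):
        return 1

    @property
    def unk(self):
        return 2

    @property
    def size(self):
        return len(self._id_to_word)

    def word_to_id(self, word):
        # Out-of-vocabulary words fall back to the unknown-word id.
        return self._word_to_id.get(word, self.unk)

    def id_to_word(self, cur_id):
        return self._id_to_word[cur_id]

    def encode(self, sentence):
        # Special tokens added, as in Vocabulary.encode.
        return [self.bos] + [self.word_to_id(w) for w in sentence.split()] + [self.eos]

    def decode(self, cur_ids):
        # Spaces inserted between words, as in Vocabulary.decode.
        return " ".join(self.id_to_word(i) for i in cur_ids)
```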
textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.get_batch(generator, batch_size, num_steps, max_word_length, pad=False)[source]

Read batches of input.
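A toy sketch of the batching idea: consume token sequences from a generator and yield fixed-shape (batch_size, num_steps) batches, padding short rows. The real get_batch takes max_word_length and additionally produces character ids and target weights; this simplified version handles word ids only:

```python
def get_batch(generator, batch_size, num_steps, pad_id=0):
    """Simplified batch reader over a generator of token-id sequences.

    Each yielded batch is a list of batch_size rows, each truncated or
    padded with pad_id to exactly num_steps ids. A trailing incomplete
    batch is dropped.
    """
    batch = []
    for tokens in generator:
        row = list(tokens[:num_steps])
        row += [pad_id] * (num_steps - len(row))
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
```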

Utils for loading 1B word benchmark dataset.

Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.

textattack.constraints.grammaticality.language_models.google_language_model.lm_utils.LoadModel(sess, graph, gd_file, ckpt_file)[source]

Load the model from GraphDef and AttackCheckpoint.

Parameters
  • gd_file – GraphDef proto text file.

  • ckpt_file – TensorFlow AttackCheckpoint file.

Returns

TensorFlow session and tensors dict.