textattack.constraints.grammaticality.language_models.google_language_model package
Google Language Models:
Google Language Models from Alzantot
Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.
- class textattack.constraints.grammaticality.language_models.google_language_model.alzantot_goog_lm.GoogLMHelper[source]
Bases:
object
An implementation of https://arxiv.org/abs/1804.07998 adapted from https://github.com/nesl/nlp_adversarial_examples.
- get_words_probs(prefix, list_words)[source]
Retrieves the probability of each candidate word following the given prefix.
- Parameters:
prefix – The words preceding the candidates.
list_words – The candidate words to score.
- CACHE_PATH = 'constraints/semantics/language-models/alzantot-goog-lm'
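The real GoogLMHelper queries the downloaded Google 1-Billion Words model, so it is not runnable standalone. As an illustration of what get_words_probs computes, the sketch below scores candidate next words against a hard-coded bigram table; the table and function body are purely illustrative, not TextAttack's implementation.

```python
import numpy as np

# Illustrative bigram table standing in for the Google 1B Words model.
BIGRAM_PROBS = {
    ("the", "cat"): 0.40,
    ("the", "dog"): 0.35,
    ("the", "xylophone"): 0.01,
}

def get_words_probs(prefix, list_words):
    """Return a probability for each candidate word following the prefix,
    mirroring the shape of GoogLMHelper.get_words_probs."""
    last = prefix.split()[-1]
    # Unseen pairs get a tiny floor probability instead of zero.
    return np.array([BIGRAM_PROBS.get((last, w), 1e-6) for w in list_words])

probs = get_words_probs("the", ["cat", "dog", "xylophone"])
```

A fluent substitution ("cat") scores far higher than an implausible one ("xylophone"), which is exactly the signal the constraint uses to reject unnatural swaps.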
Google 1-Billion Words Language Model
- class textattack.constraints.grammaticality.language_models.google_language_model.google_language_model.GoogleLanguageModel(top_n=None, top_n_per_index=None, compare_against_original=True)[source]
Bases:
Constraint
Constraint that uses the Google 1 Billion Words Language Model to determine the difference in perplexity between x and x_adv.
- Parameters:
top_n (int) –
top_n_per_index (int) –
compare_against_original (bool) – If True, compare new x_adv against the original x. Otherwise, compare it against the previous x_adv.
- check_compatibility(transformation)[source]
Checks if this constraint is compatible with the given transformation. For example, the
WordEmbeddingDistance
constraint compares the embedding of the word inserted with that of the word deleted. Therefore it can only be applied in the case of word swaps, and not to transformations that involve only insertion or only deletion.
- Parameters:
transformation – The
Transformation
to check compatibility with.
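The top_n parameter prunes candidate substitutions to the most probable under the language model. A hedged sketch of that pruning step, with illustrative names rather than TextAttack's internal API:

```python
def filter_by_lm(candidates, word_probs, top_n):
    """Keep only the top_n candidate words with the highest
    language-model probability (illustrative, not TextAttack's code)."""
    ranked = sorted(zip(candidates, word_probs),
                    key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]

# Candidates scored by an LM; keep the two most fluent.
kept = filter_by_lm(["cat", "dog", "xylophone"], [0.40, 0.35, 0.01], top_n=2)
```

With top_n=2, the low-probability "xylophone" swap is discarded before the attack ever evaluates it against the victim model.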
A library for loading 1B word benchmark dataset.
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.CharsVocabulary(filename, max_word_length)[source]
Bases:
Vocabulary
Vocabulary containing character-level information.
- property max_word_length
- property word_char_ids
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.LM1BDataset(filepattern, vocab)[source]
Bases:
object
Utility class for 1B word benchmark dataset.
The current implementation reads the data from the tokenized text files.
- property vocab
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary(filename)[source]
Bases:
object
Class that holds a vocabulary for the dataset.
- id_to_word(cur_id)[source]
Converts an ID to the word it represents.
- Parameters:
cur_id – The ID
- Returns:
The word that
cur_id
represents.
- property bos
- property eos
- property size
- property unk
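The Vocabulary interface above amounts to bidirectional id/word maps plus special begin-of-sentence, end-of-sentence, and unknown tokens. A minimal self-contained sketch of that interface (the real class is file-backed; token strings and ordering here are assumptions):

```python
class ToyVocabulary:
    """Toy stand-in for lm_data_utils.Vocabulary: id <-> word maps
    with <S>, </S>, and UNK special tokens."""

    def __init__(self, words):
        self._words = ["<S>", "</S>", "UNK"] + list(words)
        self._ids = {w: i for i, w in enumerate(self._words)}

    def id_to_word(self, cur_id):
        """Converts an ID to the word it represents."""
        return self._words[cur_id]

    def word_to_id(self, word):
        # Out-of-vocabulary words map to the UNK id.
        return self._ids.get(word, self.unk)

    @property
    def bos(self):
        return self._ids["<S>"]

    @property
    def eos(self):
        return self._ids["</S>"]

    @property
    def unk(self):
        return self._ids["UNK"]

    @property
    def size(self):
        return len(self._words)

v = ToyVocabulary(["hello", "world"])
```

Round-tripping a known word through word_to_id and id_to_word recovers it, while unseen words collapse to the unk id.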
- textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.get_batch(generator, batch_size, num_steps, max_word_length, pad=False)[source]
Read batches of input.
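In the spirit of get_batch, the sketch below drains a token generator into fixed-shape (batch_size, num_steps) arrays, zero-padding the final short batch when pad=True. Parameter names mirror the signature above, but the body is a simplification, not the benchmark loader itself:

```python
import numpy as np

def get_batch(generator, batch_size, num_steps, pad=False):
    """Yield (batch_size, num_steps) arrays of token ids from a generator,
    optionally zero-padding the last incomplete batch."""
    tokens = list(generator)
    step = batch_size * num_steps
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + step]
        if len(chunk) < step:
            if not pad:
                break  # drop the ragged tail when padding is disabled
            chunk = chunk + [0] * (step - len(chunk))
        yield np.array(chunk).reshape(batch_size, num_steps)

batches = list(get_batch(iter(range(10)), batch_size=2, num_steps=3, pad=True))
```

Ten tokens with batch_size=2 and num_steps=3 yield two batches of shape (2, 3), the second padded with two trailing zeros.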
Utils for loading 1B word benchmark dataset.
Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.
- textattack.constraints.grammaticality.language_models.google_language_model.lm_utils.LoadModel(sess, graph, gd_file, ckpt_file)[source]
Load the model from GraphDef and checkpoint.
- Parameters:
gd_file – GraphDef proto text file.
ckpt_file – TensorFlow checkpoint file.
- Returns:
TensorFlow session and tensors dict.