textattack.constraints.grammaticality.language_models.google_language_model package
Google Language Models:
Google Language Models from Alzantot
Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.
- class textattack.constraints.grammaticality.language_models.google_language_model.alzantot_goog_lm.GoogLMHelper[source]
Bases:
object
An implementation of https://arxiv.org/abs/1804.07998 adapted from https://github.com/nesl/nlp_adversarial_examples.
- get_words_probs(prefix, list_words)[source]
Retrieves the probability of words.
- Parameters
prefix – The text that precedes the words being scored.
list_words – The words whose probabilities are retrieved.
- CACHE_PATH = 'constraints/semantics/language-models/alzantot-goog-lm'
Google 1-Billion Words Language Model
- class textattack.constraints.grammaticality.language_models.google_language_model.google_language_model.GoogleLanguageModel(top_n=None, top_n_per_index=None, compare_against_original=True)[source]
Bases:
textattack.constraints.constraint.Constraint
Constraint that uses the Google 1 Billion Words Language Model to determine the difference in perplexity between x and x_adv.
- Parameters
top_n (int) –
top_n_per_index (int) –
compare_against_original (bool) – If True, compare new x_adv against the original x. Otherwise, compare it against the previous x_adv.
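The class description refers to a difference in perplexity between x and x_adv. As a rough illustration of that quantity only, the sketch below computes perplexity from per-token probabilities using the standard definition (the exponential of the average negative log-probability). The function names `perplexity` and `perplexity_increase` and the probability values are hypothetical; this is not the library's implementation and does not load the Google 1 Billion Words model.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given per-token probabilities p(w_i | w_<i)."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    # Average negative log-probability, then exponentiate.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def perplexity_increase(probs_x, probs_x_adv):
    """Positive when x_adv is less likely under the model than x."""
    return perplexity(probs_x_adv) - perplexity(probs_x)


# Hypothetical per-token probabilities, not real model output.
probs_x = [0.5, 0.25, 0.5]
probs_x_adv = [0.5, 0.125, 0.5]
increase = perplexity_increase(probs_x, probs_x_adv)
```

Under this reading, a constraint would accept x_adv only when the increase stays small, with the reference sequence being either the original x or the previous x_adv depending on compare_against_original.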
- check_compatibility(transformation)[source]
Checks if this constraint is compatible with the given transformation. For example, the WordEmbeddingDistance constraint compares the embedding of the word inserted with that of the word deleted. Therefore it can only be applied in the case of word swaps, and not for transformations which involve only one of insertion or deletion.
- Parameters
transformation – The Transformation to check compatibility with.
A library for loading the 1B word benchmark dataset.
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.CharsVocabulary(filename, max_word_length)[source]
Bases:
textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary
Vocabulary containing character-level information.
- property max_word_length
- property word_char_ids
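To make "character-level information" concrete, here is a minimal sketch of one way a word could be turned into a fixed-length row of character IDs bounded by max_word_length. It is a simplified assumption: the helper `word_to_char_ids`, the use of UTF-8 byte values as IDs, and the padding ID of 0 are all hypothetical and may differ from what CharsVocabulary actually stores in word_char_ids.

```python
def word_to_char_ids(word, max_word_length, pad_id=0):
    """Encode a word as a fixed-length list of byte values.

    Words longer than max_word_length are truncated; shorter words are
    right-padded with pad_id (a hypothetical padding value).
    """
    codes = list(word.encode("utf-8"))[:max_word_length]
    return codes + [pad_id] * (max_word_length - len(codes))


# One row of character IDs per word, all rows the same length.
words = ["cat", "language"]
char_id_table = [word_to_char_ids(w, max_word_length=5) for w in words]
```

Fixing the row length lets every word in the vocabulary share one rectangular table, which is why a maximum word length has to be chosen up front.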
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.LM1BDataset(filepattern, vocab)[source]
Bases:
object
Utility class for the 1B word benchmark dataset.
The current implementation reads the data from the tokenized text files.
- property vocab
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary(filename)[source]
Bases:
object
Class that holds a vocabulary for the dataset.
- id_to_word(cur_id)[source]
Converts an ID to the word it represents.
- Parameters
cur_id – The ID
- Returns
The word that cur_id represents.
- property bos
- property eos
- property size
- property unk
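The interface listed above (id_to_word plus the bos, eos, size, and unk properties) can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the library's class: `ToyVocabulary` is hypothetical, it is built from a Python list rather than a vocabulary file, the special-token strings are placeholders, and `word_to_id` is an added helper that the listing above does not document.

```python
class ToyVocabulary:
    """Hypothetical in-memory vocabulary mirroring the documented interface."""

    def __init__(self, words, bos="<S>", eos="</S>", unk="<UNK>"):
        # Special tokens take the first three IDs; input words are assumed unique.
        specials = [bos, eos, unk]
        self._id_to_word = specials + [w for w in words if w not in specials]
        self._word_to_id = {w: i for i, w in enumerate(self._id_to_word)}
        self._bos, self._eos, self._unk = 0, 1, 2

    @property
    def bos(self):
        return self._bos

    @property
    def eos(self):
        return self._eos

    @property
    def unk(self):
        return self._unk

    @property
    def size(self):
        return len(self._id_to_word)

    def word_to_id(self, word):
        """Hypothetical helper: unknown words map to the unk ID."""
        return self._word_to_id.get(word, self._unk)

    def id_to_word(self, cur_id):
        """Converts an ID to the word it represents."""
        return self._id_to_word[cur_id]


vocab = ToyVocabulary(["the", "cat"])
sentence_ids = [vocab.bos, vocab.word_to_id("the"), vocab.word_to_id("dog"), vocab.eos]
```

Here "dog" is not in the toy vocabulary, so it falls back to the unk ID, and converting the IDs back with id_to_word yields the unknown-word token in its place.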
Utilities for loading the 1B word benchmark dataset.
Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.
- textattack.constraints.grammaticality.language_models.google_language_model.lm_utils.LoadModel(sess, graph, gd_file, ckpt_file)[source]
Load the model from a GraphDef and a checkpoint.
- Parameters
gd_file – GraphDef proto text file.
ckpt_file – TensorFlow checkpoint file.
- Returns
TensorFlow session and tensors dict.