textattack.constraints.grammaticality.language_models.google_language_model package
Google Language Models:
Submodules
Google Language Models from Alzantot
Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.
- class textattack.constraints.grammaticality.language_models.google_language_model.alzantot_goog_lm.GoogLMHelper[source]
Bases: object
An implementation of https://arxiv.org/abs/1804.07998 adapted from https://github.com/nesl/nlp_adversarial_examples.
- get_words_probs(prefix, list_words)[source]
Retrieves the probability of each word in list_words given the preceding prefix.
- Parameters:
prefix_words –
list_words –
- CACHE_PATH = 'constraints/semantics/language-models/alzantot-goog-lm'
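As a rough illustration of what "retrieving the probability of words" involves, the sketch below looks up candidate words in a next-word distribution. It is not the GoogLMHelper implementation: the function name `lookup_word_probs`, the dict-based distribution, and the `<UNK>` fallback are all assumptions made for the example.

```python
def lookup_word_probs(next_word_probs, list_words, unk="<UNK>"):
    """Return P(word | prefix) for each candidate word.

    next_word_probs is a hypothetical dict mapping word -> probability of
    that word following some fixed prefix. Words missing from the dict fall
    back to the probability of the (assumed) unknown token.
    """
    unk_prob = next_word_probs.get(unk, 0.0)
    return [next_word_probs.get(word, unk_prob) for word in list_words]


# Toy distribution for the word that follows the prefix "the movie was".
toy_probs = {"great": 0.5, "good": 0.3, "<UNK>": 0.01}
probs = lookup_word_probs(toy_probs, ["good", "great", "zzz"])
```

In a real language model the distribution would come from a forward pass over the prefix rather than from a fixed dict.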
Google 1-Billion Words Language Model
- class textattack.constraints.grammaticality.language_models.google_language_model.google_language_model.GoogleLanguageModel(top_n=None, top_n_per_index=None, compare_against_original=True)[source]
Bases: Constraint
Constraint that uses the Google 1 Billion Words Language Model to determine the difference in perplexity between x and x_adv.
- Parameters:
top_n (int) –
top_n_per_index (int) –
compare_against_original (bool) – If True, compare new x_adv against the original x. Otherwise, compare it against the previous x_adv.
- check_compatibility(transformation)[source]
Checks if this constraint is compatible with the given transformation. For example, the WordEmbeddingDistance constraint compares the embedding of the word inserted with that of the word deleted. Therefore it can only be applied in the case of word swaps, and not for transformations which involve only one of insertion or deletion.
- Parameters:
transformation – The Transformation to check compatibility with.
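The GoogleLanguageModel description refers to a difference in perplexity between x and x_adv. The sketch below shows how such a difference can be computed from per-token probabilities, using the standard definition of perplexity. The probability values are invented for illustration, and the code does not reproduce how the constraint itself scores or filters candidates.

```python
import math


def perplexity(token_probs):
    """Perplexity is exp of the negative mean log-probability per token."""
    log_probs = [math.log(p) for p in token_probs]
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical per-token probabilities assigned by a language model.
x_probs = [0.25, 0.25, 0.25, 0.25]        # original text x
x_adv_probs = [0.25, 0.125, 0.25, 0.125]  # perturbed text x_adv

ppl_x = perplexity(x_probs)
ppl_x_adv = perplexity(x_adv_probs)

# A positive difference means x_adv is less likely under the model than x.
difference = ppl_x_adv - ppl_x
```

A lower perplexity means the model finds the text more predictable, so a large positive difference suggests the perturbation made the text less fluent.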
A library for loading the 1B word benchmark dataset.
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.CharsVocabulary(filename, max_word_length)[source]
Bases: Vocabulary
Vocabulary containing character-level information.
- property max_word_length
- property word_char_ids
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.LM1BDataset(filepattern, vocab)[source]
Bases: object
Utility class for the 1B word benchmark dataset.
The current implementation reads the data from the tokenized text files.
- property vocab
- class textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.Vocabulary(filename)[source]
Bases: object
Class that holds a vocabulary for the dataset.
- id_to_word(cur_id)[source]
Converts an ID to the word it represents.
- Parameters:
cur_id – The ID
- Returns:
The word that cur_id represents.
- property bos
- property eos
- property size
- property unk
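To make the Vocabulary interface above more concrete, here is a minimal in-memory sketch exposing the same member names (id_to_word, bos, eos, unk, size). It is an assumption-laden stand-in, not the documented class: `ToyVocabulary` takes a list instead of a filename, the special-token spellings `<S>`, `</S>`, and `<UNK>` are assumed, and `word_to_id` is a hypothetical helper added for the example.

```python
class ToyVocabulary:
    """Hypothetical in-memory vocabulary (the documented class reads a file)."""

    def __init__(self, words):
        self._id_to_word = list(words)
        self._word_to_id = {w: i for i, w in enumerate(self._id_to_word)}
        # Assumed special-token spellings; the source does not state them.
        self._bos = self._word_to_id["<S>"]
        self._eos = self._word_to_id["</S>"]
        self._unk = self._word_to_id["<UNK>"]

    @property
    def bos(self):
        return self._bos

    @property
    def eos(self):
        return self._eos

    @property
    def unk(self):
        return self._unk

    @property
    def size(self):
        return len(self._id_to_word)

    def id_to_word(self, cur_id):
        """Convert an ID to the word it represents."""
        return self._id_to_word[cur_id]

    def word_to_id(self, word):
        """Hypothetical inverse lookup; unknown words map to the unk ID."""
        return self._word_to_id.get(word, self._unk)


vocab = ToyVocabulary(["<S>", "</S>", "<UNK>", "hello", "world"])
ids = [vocab.bos, vocab.word_to_id("hello"), vocab.word_to_id("zzz"), vocab.eos]
words = [vocab.id_to_word(i) for i in ids]
```

Mapping out-of-vocabulary words to a single unknown ID keeps every input encodable at the cost of losing the identity of rare words.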
- textattack.constraints.grammaticality.language_models.google_language_model.lm_data_utils.get_batch(generator, batch_size, num_steps, max_word_length, pad=False)[source]
Read batches of input.
Utils for loading the 1B word benchmark dataset.
Author: Moustafa Alzantot (malzantot@ucla.edu) All rights reserved.
- textattack.constraints.grammaticality.language_models.google_language_model.lm_utils.LoadModel(sess, graph, gd_file, ckpt_file)[source]
Load the model from a GraphDef and a checkpoint.
- Parameters:
gd_file – GraphDef proto text file.
ckpt_file – TensorFlow checkpoint file.
- Returns:
TensorFlow session and tensors dict.