+4 votes
4 views
in Machine Learning by (10.2k points)
edited by

I've trained a sentiment classifier using the Keras library, broadly following these steps:

  1. Convert the text corpus into sequences using a Tokenizer object
  2. Build and train a model using the model.fit() method
  3. Evaluate this model

For scoring with this model, I was able to save the model to a file and load it back from that file. However, I have not found a way to save the Tokenizer object to a file. Without this, I'll have to process the corpus every time I need to score even a single sentence. Is there a way around this?

3 Answers

+5 votes
by (46k points)
edited by

I would suggest using pickle to save the Tokenizer:

import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)

by (29.5k points)
thanks for the answer, this works
by (19.9k points)
This worked perfectly. Thank you.
by (19k points)
What if I need to pickle multiple objects?
by (33.1k points)
You can pickle them together, instead of separately.
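As a minimal sketch of pickling them together: put the objects in one dict and dump that dict in a single call. The file name artifacts.pickle, the dict keys, and the stand-in values are all assumptions for illustration; in practice the dict would hold your fitted tokenizer and anything else you need at scoring time.

```python
import os
import pickle
import tempfile

# Stand-in objects for illustration only; replace them with your fitted
# tokenizer and any other preprocessing state needed at scoring time.
artifacts = {
    "word_index": {"good": 1, "bad": 2},   # hypothetical tokenizer state
    "max_len": 50,                         # hypothetical padding length
    "labels": ["negative", "positive"],    # hypothetical label mapping
}

# Hypothetical file name, written to the temp directory for this sketch.
path = os.path.join(tempfile.gettempdir(), "artifacts.pickle")

# saving: one dump call stores every object in the dict
with open(path, "wb") as handle:
    pickle.dump(artifacts, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading: one load call restores all of them
with open(path, "rb") as handle:
    restored = pickle.load(handle)

max_len = restored["max_len"]
```

A tuple works the same way; a dict just makes it harder to mix up the order when unpacking.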
by (47.2k points)
They have two different roles. The tokenizer transforms text into vectors, so it is important to keep the same vector space between training and testing.
by (29.3k points)
Thanks for this comment, it helps me understand the roles.
+2 votes
by (108k points)
edited by

The most common way is to use either pickle or joblib. Here is an example of how to use pickle to save the Tokenizer:

import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)

by (19.7k points)
Thanks for this answer, it worked for me!
by (44.4k points)
Don't call tokenizer.fit_on_texts again; it could change the index.
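To illustrate why, here is a rough sketch using a toy stand-in class, not the Keras Tokenizer itself. It assumes the word index is rebuilt from accumulated word frequencies each time fit_on_texts is called, which is my understanding of how the Keras version behaves. ToyTokenizer and the sample sentences are made up for this example.

```python
from collections import Counter


class ToyTokenizer:
    """Stand-in, NOT the Keras class: indexes words by descending frequency."""

    def __init__(self):
        self.word_counts = Counter()
        self.word_index = {}

    def fit_on_texts(self, texts):
        # Counts accumulate across calls instead of being reset.
        for text in texts:
            self.word_counts.update(text.lower().split())
        # Rebuild the index from all counts seen so far (1 = most frequent).
        ranked = sorted(self.word_counts.items(),
                        key=lambda kv: kv[1], reverse=True)
        self.word_index = {word: i
                           for i, (word, _) in enumerate(ranked, start=1)}


tok = ToyTokenizer()
tok.fit_on_texts(["good movie", "good plot"])
before = dict(tok.word_index)   # index the model would have been trained on

# Fitting again on new text re-ranks the words.
tok.fit_on_texts(["bad bad bad acting"])
after = dict(tok.word_index)

print(before["good"], after["good"])
```

Here "good" moves from index 1 to index 2 after the second fit, so a model trained on the first index would receive the wrong integers. Loading the saved tokenizer and only calling texts_to_sequences avoids this.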
by (41.4k points)
Here, you have to save both the model and the tokenizer in order to run the model in the future.
0 votes
by (106k points)

To save the Tokenizer object to a file for scoring, you can use the Tokenizer class, which has a to_json() function that saves the data in JSON format. See the code below:

import io
import json

# to_json() returns the tokenizer configuration as a JSON string
tokenizer_json = tokenizer.to_json()

with io.open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))

The data is loaded using the tokenizer_from_json function from keras_preprocessing.text. See the code below:

import json
from keras_preprocessing.text import tokenizer_from_json

with open('tokenizer.json', encoding='utf-8') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
