I've trained a sentiment classifier model using the Keras library by following the steps below (broadly).
Now, for scoring with this model, I was able to save the model to a file and load it from a file. However, I haven't found a way to save the Tokenizer object to a file. Without this I'll have to process the corpus every time I need to score even a single sentence. Is there a way around this?
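For context, a minimal sketch of the kind of preprocessing involved (the corpus variable, vocabulary size, and sequence length here are assumptions, not details from the question) — the Tokenizer is fit on the training corpus and is exactly the object that needs to be persisted for later scoring:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

train_texts = ["great movie", "terrible plot"]  # placeholder corpus (assumed)

# fitting the tokenizer on the corpus is the step we want to avoid repeating at scoring time
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(train_texts)

# convert texts to padded integer sequences for the classifier
sequences = tokenizer.texts_to_sequences(train_texts)
X_train = pad_sequences(sequences, maxlen=100)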
I would suggest using pickle to save the Tokenizer:
import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
The most common way is to use either pickle or joblib. Here is an example of how to use pickle to save the Tokenizer:
import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
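Since joblib is mentioned as an alternative, here is a minimal sketch of the same round trip with joblib (the file name is an assumption):

import joblib

# saving
joblib.dump(tokenizer, 'tokenizer.joblib')

# loading
tokenizer = joblib.load('tokenizer.joblib')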
To save the Tokenizer object to a file for scoring, you can use the Tokenizer class's to_json() method, which serializes the tokenizer to JSON format. See the code below:
import io
import json

tokenizer_json = tokenizer.to_json()
with io.open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))
The data is then loaded with the tokenizer_from_json function from keras_preprocessing.text; see the code below:
import json
from keras_preprocessing.text import tokenizer_from_json

with open('tokenizer.json') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
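Once the tokenizer is restored (by either the pickle or the JSON approach above), scoring a single sentence no longer requires reprocessing the corpus. A minimal sketch, assuming the saved model lives at 'model.h5' and the training used a sequence length of 100 (both are assumptions, not details from the question):

from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences

model = load_model('model.h5')  # path is an assumption

# score a single sentence with the reloaded tokenizer
seq = tokenizer.texts_to_sequences(["this movie was great"])
padded = pad_sequences(seq, maxlen=100)  # must match the length used in training (assumed 100)
prediction = model.predict(padded)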