What is tokenization in NLP?

Tokenization is one of the preprocessing steps in NLP. In simple terms, tokenization splits text into smaller units, and each unit is called a token. We cannot simply jump into model building without tokenization, because models work on tokens rather than on raw text.

There are several types of tokenization, such as word tokenization and sentence tokenization. Word tokenization breaks a sentence down into words. Sentence tokenization breaks a paragraph down into sentences.
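To make the two types concrete, here is a minimal sketch that uses only Python's standard `re` module. The function names `simple_sentence_tokenize` and `simple_word_tokenize` are hypothetical helpers written for this illustration, and the rules are deliberately naive (for example, an abbreviation such as "Dr." would be treated as a sentence end).

```python
import re

def simple_sentence_tokenize(paragraph):
    # Hypothetical helper: split wherever ., ! or ? is followed by whitespace.
    # Naive on purpose: abbreviations like "Dr." would be split incorrectly.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def simple_word_tokenize(sentence):
    # Hypothetical helper: a token is a run of word characters
    # or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

paragraph = "Tokenization is a preprocessing step. It splits text into tokens!"

sentences = simple_sentence_tokenize(paragraph)
words = [simple_word_tokenize(s) for s in sentences]

print(sentences)
print(words)
```

Real tokenizers handle many more edge cases, so this is best read as an illustration of the idea rather than something to use in practice.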

In NLP, we can use the NLTK package or the spaCy library for sentence or word tokenization. We can also use the deep learning framework Keras for tokenization.
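As a rough example with NLTK, the sketch below uses `wordpunct_tokenize`, a regex-based tokenizer that needs no extra data download (NLTK's `word_tokenize` and `sent_tokenize` additionally require the Punkt models to be downloaded first). The `ImportError` fallback is an assumption added only so the snippet still runs where NLTK is not installed; it applies the same pattern with the standard `re` module.

```python
import re

try:
    # Regex-based NLTK tokenizer: splits text into runs of word characters
    # and runs of punctuation. It does not need any downloaded models.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Assumed fallback for this sketch only: same pattern, standard library.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

text = "Tokenization splits text. It is a preprocessing step."
tokens = wordpunct_tokenize(text)
print(tokens)
```

spaCy and Keras expose their own tokenizers, with different rules and output types, so the tokens you get can differ from library to library.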
