in AI and Deep Learning by (55.6k points)
Can anyone explain tokenization in simple words?

1 Answer

by (119k points)

Tokenization is one of the preprocessing steps in NLP. In simple words, tokenization splits text into smaller units, and each unit is called a token. We cannot jump straight into model building without tokenization, because models work on these tokens rather than on raw text.

There are several types of tokenization, such as word tokenization and sentence tokenization. Word tokenization breaks sentences down into words. Sentence tokenization breaks a paragraph down into sentences.
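To make the two kinds concrete, here is a minimal sketch using only Python's built-in `re` module. The helper names `split_sentences` and `split_words` are hypothetical and made up for illustration. The sentence rule is deliberately naive: it assumes a sentence ends at `.`, `!` or `?` followed by whitespace, so it would wrongly split after abbreviations like "Dr.", which real tokenizers handle.

```python
import re

# Hypothetical helper (naive rule): split a paragraph into sentences wherever
# ".", "!" or "?" is followed by whitespace. Abbreviations are NOT handled.
def split_sentences(text: str) -> list[str]:
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Hypothetical helper: split a sentence into words, keeping each punctuation
# mark as its own token.
def split_words(sentence: str) -> list[str]:
    return re.findall(r"\w+|[^\w\s]", sentence)

paragraph = "Tokenization is simple. It splits text into tokens!"
sentences = split_sentences(paragraph)
print(sentences)                            # sentence tokenization
print([split_words(s) for s in sentences])  # word tokenization of each sentence
```

Running it first produces the two sentences `'Tokenization is simple.'` and `'It splits text into tokens!'`, then the word tokens of each, e.g. `['Tokenization', 'is', 'simple', '.']`.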

In Python, we can use the NLTK package or the spaCy library for sentence or word tokenization. We can also use the deep learning framework Keras for tokenization.
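After tokenizing, a model usually needs each token mapped to an integer id. Keras's older `Tokenizer` utility builds a word-to-index mapping in which more frequent words get smaller indices, starting from 1. The standard-library sketch below only imitates that idea loosely. It is not Keras's implementation, and `build_vocab` and `texts_to_ids` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

def build_vocab(texts):
    # Count lowercased word tokens across the whole corpus.
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    # Most frequent word gets id 1; id 0 stays free (often used for padding).
    # Ties keep the order in which words were first seen.
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

def texts_to_ids(texts, vocab):
    # Replace each known word token with its integer id; unknown words are dropped.
    return [[vocab[w] for w in re.findall(r"\w+", t.lower()) if w in vocab]
            for t in texts]

corpus = ["The cat sat.", "The dog sat on the mat."]
vocab = build_vocab(corpus)
print(vocab)                        # {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4, 'on': 5, 'mat': 6}
print(texts_to_ids(corpus, vocab))  # [[1, 3, 2], [1, 4, 2, 5, 1, 6]]
```

These integer sequences, not the raw strings, are what a model such as a Keras embedding layer would consume.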

