
Consider a free-text box that records answers to the question, "What do you want to do before you die?"

Given a collection of response strings (max length 240), I'd like to sort, group, and count them by the idea they express.

Is there another or better way to do something like this?

  1. Is this any different than string similarity?

  2. Is this the right question to be asking?

The idea is to have many people answer in the text box, and for me to report a number that says, generally speaking, that 802 people wrote approximately the same thing.

1 Answer


These are the steps to perform:

  • Clean the text: strip punctuation characters and remove common "stop words".

  • Build a corpus (the collection of terms with their usage statistics) from the terms that occur in the answers.

  • Calculate a weight for every term, for example with TF-IDF.

  • Create a document vector from every answer (each term corresponds to one dimension in a very high-dimensional Euclidean space).

  • Run a clustering algorithm on the document vectors.
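The steps above can be sketched in plain Python. This is only a minimal illustration under several assumptions that are not part of the answer: the stop-word list is a tiny made-up sample, the term weights use a smoothed TF-IDF formula, and the clustering step is a simple single-pass "leader" clustering with an arbitrary cosine-similarity threshold of 0.3. The function names (tokenize, tfidf_vectors, cosine, cluster) are hypothetical helpers, and any other weighting scheme or clustering algorithm could be swapped in.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "i", "to", "and", "of", "in", "my",
              "want", "would", "like"}

def tokenize(text):
    """Step 1: lowercase, drop punctuation, remove stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def tfidf_vectors(answers):
    """Steps 2-4: return one unit-length sparse TF-IDF vector (dict) per answer."""
    docs = [Counter(tokenize(a)) for a in answers]
    n = len(docs)
    # Corpus statistics: in how many answers each term occurs.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Smoothed TF-IDF weight (one common variant, chosen as an assumption).
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def cluster(answers, threshold=0.3):
    """Step 5: single-pass leader clustering on the document vectors.

    Each answer joins the cluster whose first member (the leader) is most
    similar to it, provided the similarity reaches the threshold; otherwise
    it starts a new cluster. Returns lists of answer indices.
    """
    vectors = tfidf_vectors(answers)
    leaders, groups = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, leader in enumerate(leaders):
            sim = cosine(vec, vectors[leader])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            leaders.append(i)
            groups.append([i])
        else:
            groups[best].append(i)
    return groups

answers = [
    "I want to visit Japan.",
    "Visit Japan!",
    "Learn to play the piano",
    "I would like to learn piano",
    "Run a marathon",
]
groups = cluster(answers)
for members in groups:
    print(len(members), "answer(s), e.g.:", answers[members[0]])
```

With these five sample answers the sketch puts the two Japan answers together, the two piano answers together, and leaves the marathon answer on its own. The size of each group is the count of people who "wrote approximately the same thing". Because the vectors only capture shared words, answers that express the same idea in different vocabulary will not be grouped, which is one way this differs from a true measure of meaning.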
