Intellipaat

in Machine Learning by (4.2k points)

I am aware of the duplicates of this question:

Those questions are about how the algorithm actually works. My question is different: assume Google did not exist, or this feature did not exist, and we have no user input. How would one go about implementing an approximate version of this algorithm?

Why is this interesting?

OK. Try typing "qualfy" into Google and it tells you:

Did you mean: qualify

Fair enough. It uses statistical machine learning on data collected from billions of users to do this. But now try typing "Trytoreconnectyou" into Google and it tells you:

Did you mean: Try To Reconnect You

Now this is the more interesting part. How does Google determine this? Does it keep a dictionary handy and guess the most probable words, again using user input? And how does it tell a misspelled word apart from a sentence?
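The dictionary-plus-probable-words guess can be approximated without any user input. A minimal sketch, assuming a simple unigram model: choose the split of the string that maximizes the sum of the words' log-probabilities, using dynamic programming over split points. The word counts in `COUNTS` are a hypothetical toy table, and the length-based penalty for unknown words is one common heuristic, not necessarily what Google does. The same penalty gives a rough answer to the misspelling-versus-sentence question: a split into known words outscores one long unknown token.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real system would load these from a large corpus.
COUNTS = {"try": 500, "to": 5000, "reconnect": 20, "connect": 300,
          "re": 50, "you": 4000, "your": 1500}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumption: no word is longer than this


def word_log_prob(word: str) -> float:
    """Log-probability of one word under the unigram model.

    Unknown words get a penalty that grows with their length, so gluing
    many letters into a single unknown 'word' is discouraged.
    """
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Split text into the most probable sequence of words (memoized DP)."""

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[float, tuple[str, ...]]:
        # best(i) = (score, words) for the highest-scoring split of text[i:]
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            rest_score, rest_words = best(j)
            word = text[i:j]
            candidates.append((word_log_prob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])
```

With this toy table, `segment("trytoreconnectyou")` returns `['try', 'to', 'reconnect', 'you']`: the single word "reconnect" beats "re" + "connect" because its combined log-probability is higher.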

Considering that most programmers do not have access to input from billions of users, I am looking for the best way to implement an approximation of this algorithm, and for the resources available (datasets, libraries, etc.). Any suggestions?

1 Answer

by (108k points)

Suppose you have a dictionary of words (in the worst case, all the words in a dictionary; in the best case, all the phrases that appear in your system's data) and you know how often each word occurs. You can then estimate what the user meant by combining how similar each candidate word is to the input with how many hits that candidate has. The weights will need some trial and error, but in general the user is more interested in a popular result that is slightly less similar to the string they typed than in a valid word that is more similar but has only one or two hits in your system.
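A minimal sketch of this idea, assuming a tiny hypothetical hit table (`HITS`) and Levenshtein edit distance as the similarity measure. Every dictionary word within `max_distance` edits gets the score `log(hits) - distance_weight * distance`, and the highest score wins. The log-linear scoring and the names `distance_weight` and `max_distance` are illustrative assumptions standing in for the weights that have to be tuned by trial and error.

```python
import math

# Hypothetical toy "number of hits" per word; in practice this comes from
# a corpus or from your own system's data.
HITS = {"qualify": 1200, "quality": 3000, "quail": 40, "qualm": 15}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(query: str, hits: dict[str, int] = HITS,
            distance_weight: float = 2.0, max_distance: int = 2) -> str:
    """Return the known word with the best mix of similarity and popularity."""
    if query in hits:
        return query
    best_word, best_score = query, -math.inf  # fall back to the input itself
    for word, count in hits.items():
        d = edit_distance(query, word)
        if d > max_distance:
            continue
        score = math.log(count) - distance_weight * d
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

With these toy counts, `suggest("qualfy")` returns `"qualify"` (one edit away), while lowering `distance_weight` to `0.5` lets the more popular but two-edits-away `"quality"` win, which is exactly the trade-off described above. Scanning the whole dictionary is fine for a sketch; a large vocabulary would call for generating candidate edits or using an index instead.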


...