Find substring in text which has the highest similarity to a given keyword

Question

1 Answer

Anurag · Answer 1 · 2019-08-03T09:51:39+0000

The Jaccard index is a "lucky" similarity measure here, because its value can be updated when a new symbol is appended without recomputing everything from scratch. So you can view the text as a sequence of per-symbol changes (diffs) to the index value, and the problem then reduces to the maximum subarray problem: https://en.wikipedia.org/wiki/Maximum_subarray_problem.
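A minimal sketch of both ideas follows. It assumes the Jaccard index is taken over the *set of characters* of the substring and of the keyword (the question may intend words or n-grams instead). `best_jaccard_substring` keeps running intersection and union counts, so extending a substring by one symbol costs O(1) and all substrings are scanned in O(n²). Note that the size of each per-symbol diff depends on which symbols the substring already contains, so it is not a fixed weight per position. `max_subarray` is plain Kadane's algorithm; feeding it hypothetical fixed weights (+1 for a keyword symbol, -1 otherwise) is only a heuristic reading of the reduction. All names and the weighting are illustrative assumptions.

```python
def best_jaccard_substring(text: str, keyword: str) -> tuple[float, int, int]:
    """Exact search: return (score, start, end) for the substring text[start:end]
    whose character-set Jaccard index with `keyword` is highest (hypothetical helper)."""
    key_set = set(keyword)
    best = (0.0, 0, 0)
    for start in range(len(text)):
        seen = set()          # distinct characters in text[start:end + 1]
        inter = 0             # |seen ∩ key_set|
        union = len(key_set)  # |seen ∪ key_set|
        for end in range(start, len(text)):
            c = text[end]
            if c not in seen:  # O(1) incremental update, no recomputation
                seen.add(c)
                if c in key_set:
                    inter += 1
                else:
                    union += 1
            score = inter / union if union else 0.0
            if score > best[0]:
                best = (score, start, end + 1)
    return best


def max_subarray(weights: list[float]) -> tuple[float, int, int]:
    """Kadane's algorithm: return (best_sum, start, end), end exclusive."""
    best_sum, best_start, best_end = float("-inf"), 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, w in enumerate(weights):
        if cur_sum <= 0:  # a non-positive prefix never helps, restart here
            cur_sum, cur_start = w, i
        else:
            cur_sum += w
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end


if __name__ == "__main__":
    text, keyword = "xxhelloyy", "hello"
    print(best_jaccard_substring(text, keyword))  # (1.0, 2, 7) -> "hello"
    # Heuristic weights (assumption): +1 if the symbol occurs in the keyword, else -1
    weights = [1 if c in keyword else -1 for c in text]
    print(max_subarray(weights))  # (5, 2, 7)
```

Both calls locate `text[2:7] == "hello"` in this small example, but the fixed-weight heuristic can disagree with the exact Jaccard search on other inputs.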

Regarding your second paragraph: if you are doing NLP-style research, I'd suggest cleaning your data (removing the extra symbols and spaces wherever possible) before further processing. This falls under "spelling correction", for which there are many algorithms and libraries; choosing the appropriate one requires more information about your domain.
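As a rough illustration of that cleaning step, here is a sketch using only the Python standard library. It assumes "extra symbols" means anything that is not a letter, digit or whitespace, and it uses `difflib.get_close_matches` against a hypothetical vocabulary as a very basic stand-in for spelling correction; a real project would pick a dedicated tool suited to its domain.

```python
import difflib
import re


def clean_text(text: str) -> str:
    """Lowercase, drop characters that are not letters, digits or whitespace
    (underscore included), and collapse runs of whitespace into one space."""
    text = text.lower()
    text = re.sub(r"[^\w\s]|_", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def correct_word(word: str, vocabulary: list[str]) -> str:
    """Replace `word` with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


if __name__ == "__main__":
    vocab = ["hello", "help", "world"]  # hypothetical domain vocabulary
    cleaned = clean_text("  He*llo,   Wor_ld!! ")
    print(cleaned)  # hello world
    print([correct_word(w, vocab) for w in "helo wrld".split()])
```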
