in AI and Deep Learning by (50.2k points)

Of course, Google has been doing this for years! However, rather than start from scratch, spend 10+ years, and squander large sums of money :) I was wondering if anyone knows of a simple PHP library that would return a list of important words (and/or some sort of context) from a web page or chunk of text?

On a basic level, I am guessing that most spiders pull in the words, remove the words without real meaning (stop words), then count the rest. The most frequently occurring words would most likely be the ones I'm interested in.
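The approach guessed at above (split into words, drop stop words, count the rest) can be sketched in a few lines. This is only a minimal illustration in Python; the same logic ports to PHP with `preg_match_all` and `array_count_values`. The stop-word list and the `top_keywords` helper are assumptions made for the example, not part of any particular library.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in",
              "it", "on", "for", "with", "that", "this", "as"}

def top_keywords(text, limit=5):
    """Return the `limit` most frequent non-stop-words as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    # Count everything that is not a stop word and is longer than two letters.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(limit)

sample = "The chili is hot. Hot chili and hot spices are in the chili sauce."
print(top_keywords(sample, 2))
```

Raw counts like these favour words that are frequent everywhere, so in practice they are usually weighted (for example with TF-IDF) before being treated as "important".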

Any sort of pointers would be really appreciated!

1 Answer

by (108k points)

You can use LSA (Latent Semantic Analysis, also known as Latent Semantic Indexing), which offers a different approach to retrieving a document for a particular search term. You could just as easily use it to determine what a document is about.

One of the problems with the search engines of yesteryear was that they were based on keyword analysis. If you take Yahoo/AltaVista from the late 1990s through to probably 2002/03 (don't quote me on this), they were extremely dependent on using ONLY keywords as a factor in retrieving a document from their index. A keyword doesn't translate to anything other than the keyword it represents. However, the keyword "hot" means lots of things depending on the context in which it is placed. If you were to take the term "hot" and identify that it was placed around other terms such as "chilies", "spices" or "herbs", then conceptually it means something totally different from the term "hot" when surrounded by other terms such as "heat" or "warmth".
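To make the "hot" example concrete, here is a rough sketch that collects the words found near a target term. It is not LSA itself, only the kind of context signal that LSA builds on. The `context_words` helper and the window size of 5 are assumptions chosen for illustration.

```python
import re
from collections import Counter

def context_words(text, target, window=5):
    """Count the words within `window` positions of each occurrence of `target`."""
    words = re.findall(r"[a-z]+", text.lower())
    neighbours = Counter()
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            # Words before and after the target, excluding the target itself.
            nearby = words[start:i] + words[i + 1:i + 1 + window]
            neighbours.update(nearby)
    return neighbours

food = context_words("Add hot chilies and spices to the stew", "hot")
weather = context_words("The room was hot from the heat and warmth", "hot")
print(sorted(food))
print(sorted(weather))
```

The two neighbour sets barely overlap, which is what lets a context-aware method tell the two senses of "hot" apart.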

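LSA attempts to overcome these deficiencies by working on a matrix of term statistics (which you build yourself from your own documents), typically a term-document matrix of word counts or weights, so that terms which tend to appear in the same contexts are treated as related.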
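As a hedged sketch of what "building the matrix yourself" can look like, the snippet below constructs a small term-document count matrix and reduces it with a truncated singular value decomposition, which is the usual core of LSA. It assumes NumPy is installed; the four toy documents (already reduced to keywords), the choice of `k = 2`, and the `cosine` helper are all made up for the example.

```python
import numpy as np

# Toy corpus, already reduced to keywords (an assumption for brevity).
docs = [
    "hot chilies spices",
    "chilies spices herbs",
    "hot heat warmth",
    "heat warmth sun",
]

# Term-document count matrix: one row per term, one column per document.
vocab = sorted({w for d in docs for w in d.split()})
index = {term: row for row, term in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for col, doc in enumerate(docs):
    for word in doc.split():
        counts[index[word], col] += 1

# Truncated SVD: keep only the k strongest latent dimensions.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = (np.diag(S[:k]) @ Vt[:k]).T  # one k-dimensional row per document

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(doc_vectors[0], doc_vectors[1]))  # food vs. food
print(cosine(doc_vectors[0], doc_vectors[2]))  # food vs. weather
```

In the reduced space the first document sits much closer to the other food document than to the weather document, even though it shares the word "hot" with the latter. A real system would use a far larger corpus and usually TF-IDF weights rather than raw counts.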

