I have read many times around the web about this question:

How do you extract the meaning of a page?

And I know that I am not experienced enough to even try to suggest any solution. To me, this is the holy grail of web programming or maybe even computer technology as a whole.

But through the power of imagination let us assume that I have written the ultimate script that does exactly that. For example, I enter this text:

Imagination has brought mankind through the dark ages to its present state of civilization. Imagination led Columbus to discover America. Imagination led Franklin to discover electricity.

and my powerful script extracts the meaning and says this:

The ability of human beings to think leads them to discover new things.

For this example, I used a "String" to express the meaning of the text. But if I had to store this in a database, an array, or any other sort of storage, what datatype would I use?

Note that I could have another text that is worded differently, or uses a different analogy, but still has the same meaning. For example:

Imagination helps humankind advance.

Now I can enter a search query about the importance of imagination, and both of these texts should appear as results. But how will they be matched? Will it be a String comparison? Integers or floating-point numbers? Maybe even binary?

What will the meaning be stored as? I would like to hear from you.

Update: Let me restate the question simply.

How do you represent Meaning in data?

1 Answer


"Meaning" is represented as a configuration of neuronal connections, hormone levels, electrical activity, maybe even quantum fluctuations, and the interaction between all of these, the outside world, and other brains.

If you want to express the meaning of lexical entities (e.g., concepts, actions), you can use distributed models such as vector space models. In these models, meaning usually has a geometric component: each concept is represented as a vector, and the concepts are placed in the space so that similar concepts lie closer to each other.

A very simple way to build such a space is to pick a set of commonly used words (basis words) as its dimensions and, for each target concept, count how many times it occurs together with each basis word in speech or text.

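The counting approach described above can be sketched in a few lines of Python. This is only an illustration, not the answerer's own code: the toy corpus, the four basis words, the use of a sentence as the co-occurrence window, and cosine similarity as the measure of closeness are all assumptions made for the example. Under this scheme, the stored "meaning" of a concept is simply a list of numbers, and matching becomes a numeric comparison rather than a string comparison.

```python
import math
from collections import Counter

# Hypothetical basis words: each one is a dimension of the vector space.
BASIS_WORDS = ["discover", "think", "eat", "cook"]

# Made-up toy corpus, for illustration only.
CORPUS = [
    "Imagination led Columbus to discover America.",
    "Imagination led Franklin to discover electricity.",
    "Imagination lets people think.",
    "Creativity helps people discover new things.",
    "Creativity lets people think freely.",
    "Bread is something people eat and cook.",
    "People cook bread before they eat it.",
]


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [word.strip(".,;:!?\"'").lower() for word in text.split()]


def cooccurrence_vector(target, sentences, basis=BASIS_WORDS):
    """Count how often `target` shares a sentence with each basis word."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        if target in tokens:
            for token in tokens:
                if token in basis:
                    counts[token] += 1
    return [counts[word] for word in basis]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


imagination = cooccurrence_vector("imagination", CORPUS)
creativity = cooccurrence_vector("creativity", CORPUS)
bread = cooccurrence_vector("bread", CORPUS)

print(imagination, creativity, bread)
print(cosine_similarity(imagination, creativity))
print(cosine_similarity(imagination, bread))
```

In this toy corpus, "imagination" and "creativity" end up with vectors that point in nearly the same direction, while "bread" shares no basis words with either, so its similarity to both is zero. Real systems use far larger corpora and usually reweight or compress the raw counts, but the stored datatype stays the same: a fixed-length array of numbers per concept.
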
...