0 votes
in AI and Deep Learning by (50.2k points)

I am working on a project where I would like to achieve a sense of natural language understanding. However, I am going to start small and would like to train it on specific queries.

So, for example, starting out I might tell it:


Then if it sees a sentence like "Kanye Wests songs" it can match against that.

BUT then I would like to give it some extra sentences that could mean the same thing, so that it eventually learns to map unknown sentences onto the set I have trained it on.

So I might add the sentence: "Songs by

And of course, there would be a database of names it can match against.

I came across a neat website that does something like what I'm describing. However, it resolves matches to an intent, whereas I would like to match to a simplified query or, BETTER, a database-like query (like Facebook Graph Search).

I understand a context-free grammar would work well for this (is there anything else?). But what are good methods to train several CFGs that I declare to have a similar meaning, so that when the system sees unknown sentences it can try to predict a match?

Any thoughts would be great.

Basically, I would like to be able to take a natural language sentence and convert it to some form that can be better understood by my system and presented back to the user in a nice way.

1 Answer

0 votes
by (108k points)

Any grammar trained on a corpus is going to depend on the words in that training corpus. Poor performance on unknown words is a well-known concern not just in PCFG training, but in pretty much any probabilistic learning framework. What we can do is look at the problem as a paraphrasing issue: in the end, you want to group together sentences that have the same meaning. Identifying sentences or phrases with identical (or similar) meaning is commonly done with a technique known as distributional similarity, which aims at improving probability estimation for unseen co-occurrences. The basic idea is that words or phrases sharing the same distribution (occurring with the same set of words in the same contexts in a corpus) tend to have similar meanings.
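As a minimal sketch of the distributional-similarity idea (the toy corpus, the window size, and the slot tokens like <ARTIST> are all made-up assumptions for illustration, not a real system):

```python
from collections import Counter
from math import sqrt

# Toy corpus of request phrasings; entity slots are normalized to
# placeholder tokens so different phrasings become comparable.
corpus = [
    "songs by <ARTIST>".split(),
    "<ARTIST> songs".split(),
    "tracks by <ARTIST>".split(),
    "play <ARTIST> tracks".split(),
    "weather in <CITY>".split(),
]

def context_vector(word, sentences, window=2):
    """Count words co-occurring with `word` within `window` positions."""
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == word:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        counts[sent[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

v_songs = context_vector("songs", corpus)
v_tracks = context_vector("tracks", corpus)
v_weather = context_vector("weather", corpus)

# "songs" and "tracks" share contexts ("by", "<ARTIST>"), so they score high;
# "songs" and "weather" share none, so they score zero.
print(cosine(v_songs, v_tracks))
print(cosine(v_songs, v_weather))
```

With enough real text, the same comparison over context vectors is what lets you guess that "tracks by X" and "songs by X" belong in the same meaning group even if one phrasing was never trained on.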

You can also use only intrinsic features (e.g. production rules in a PCFG), or supplement such features with additional semantic knowledge (e.g. ontologies like FreeBase). Using additional semantic knowledge allows the generation of more complex sentences/phrases with similar meanings, but such methods normally work well only in specific domains. So if you want your system to work well only for music, it's a good fit.
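For the database-backed matching the question describes, here is a minimal sketch: paraphrase patterns that all resolve to one structured query, with the artist slot validated against a name set standing in for an ontology. The patterns, the `ARTISTS` set, and the output query shape are all illustrative assumptions, not a real FreeBase lookup.

```python
import re

# Hypothetical name database standing in for an ontology like FreeBase.
ARTISTS = {"kanye west", "taylor swift"}

# Paraphrase patterns that all map to the same structured query;
# the named group "artist" is the slot checked against the database.
PATTERNS = [
    r"songs by (?P<artist>.+)",
    r"(?P<artist>.+?)'?s? songs",
    r"tracks by (?P<artist>.+)",
]

def parse(sentence):
    """Return a structured query dict if the sentence matches a known paraphrase."""
    s = sentence.lower().strip()
    for pat in PATTERNS:
        m = re.fullmatch(pat, s)
        if m:
            name = m.group("artist").strip()
            if name in ARTISTS:
                return {"type": "songs", "artist": name}
    return None

print(parse("Kanye Wests songs"))     # {'type': 'songs', 'artist': 'kanye west'}
print(parse("songs by Taylor Swift"))
print(parse("weather in Paris"))      # None: no pattern and no artist match
```

Distributional similarity is then what would let you grow the `PATTERNS` list automatically, by discovering new phrasings that occur in the same contexts as the ones you seeded.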

Describing the actual distributional similarity algorithms would make this answer too long, so here's a link to a great survey article:

Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods
