in AI and Deep Learning by (50.2k points)

Is there a way to find all the sub-sentences of a sentence that are still meaningful and contain at least a subject, a verb, and a predicate/object?

For example, if we have a sentence like "I am going to do a seminar on NLP at SXSW in Austin next month". We can extract the following meaningful sub-sentences from this sentence: "I am going to do a seminar", "I am going to do a seminar on NLP", "I am going to do a seminar on NLP at SXSW", "I am going to do a seminar at SXSW", "I am going to do a seminar in Austin", "I am going to do a seminar on NLP next month", etc.

Please note that there are no deduced sentences here. For example, "There will be an NLP seminar at SXSW next month" is true, but we don't need it as part of this problem. All generated sentences are strictly part of the given sentence.

How can we approach solving this problem? I was thinking of creating annotated training data that has a set of legal sub-sentences for each sentence in the training data set. And then write some supervised learning algorithm(s) to generate a model.

I am quite new to NLP and Machine Learning, so it would be great if you guys could suggest some ways to solve this problem.

2 Answers

0 votes
by (108k points)

SVO (subject, verb, object) triples can help us understand what a particular sentence is about and, through this, make inferences about the whole body of text. To get this information, we need to take our tokenized sentences and run them through an n-gram part-of-speech tagger. To get more accurate results, I chose NLTK's trigram tagger, backed off by a bigram tagger, a unigram tagger, and a default tagger. A backoff tagger tags any word that the tagger ahead of it in the chain was unable to tag. To train this model, I used the Brown, CoNLL2000, and Treebank corpora, all of which are available through NLTK.
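To make the backoff idea concrete, here is a small dependency-free sketch. It is not NLTK's implementation and not the code from the project mentioned above; the class names, the toy training sentences, and the "NN" default tag are all assumptions made for illustration. In NLTK itself the corresponding classes are DefaultTagger, UnigramTagger, BigramTagger, and TrigramTagger, each of which accepts a backoff argument.

```python
from collections import Counter, defaultdict


class DefaultTagger:
    """Last link in the chain: always answers with the same tag."""

    def __init__(self, tag):
        self.default = tag

    def tag_one(self, word, history):
        return self.default


class NgramTagger:
    """Tags a word from the previous n-1 tags plus the word itself.

    If the context was never seen in training, the decision is handed
    to the backoff tagger. (Hypothetical class, for illustration only.)
    """

    def __init__(self, n, train_sents, backoff=None):
        self.n = n
        self.backoff = backoff
        counts = defaultdict(Counter)
        for sent in train_sents:
            tags = [tag for _, tag in sent]
            for i, (word, tag) in enumerate(sent):
                counts[self._context(word, tags[:i])][tag] += 1
        # Keep the most frequent tag for every context seen in training.
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def _context(self, word, history):
        prev = tuple(history[-(self.n - 1):]) if self.n > 1 else ()
        return (prev, word)

    def tag_one(self, word, history):
        tag = self.table.get(self._context(word, history))
        if tag is None and self.backoff is not None:
            return self.backoff.tag_one(word, history)
        return tag

    def tag(self, words):
        history = []
        for word in words:
            history.append(self.tag_one(word, history))
        return list(zip(words, history))


# Toy training data (assumption); a real tagger would be trained on corpora.
train = [
    [("I", "PRP"), ("am", "VBP"), ("going", "VBG"), ("to", "TO"),
     ("do", "VB"), ("a", "DT"), ("seminar", "NN")],
    [("I", "PRP"), ("do", "VBP"), ("a", "DT"), ("seminar", "NN")],
]

# Trigram -> bigram -> unigram -> default, as described in the answer.
default = DefaultTagger("NN")
unigram = NgramTagger(1, train, backoff=default)
bigram = NgramTagger(2, train, backoff=unigram)
trigram = NgramTagger(3, train, backoff=bigram)

tagged = trigram.tag(["I", "do", "a", "workshop"])
print(tagged)
```

The word "workshop" never appears in the toy training data, so the trigram, bigram, and unigram taggers all give up on it and the default tagger supplies "NN".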

The full code used in this project can be found here.

For finding the most important sentences using NLP and TF-IDF, refer to the following link:

https://hackernoon.com/finding-the-most-important-sentences-using-nlp-tf-idf-3065028897a3

0 votes
by (33.1k points)

You can use the dependency parser provided by Stanford CoreNLP. The collapsed dependency output for your sentence looks like this:

nsubj(going-3, I-1)

xsubj(do-5, I-1)

aux(going-3, am-2)

root(ROOT-0, going-3)

aux(do-5, to-4)

xcomp(going-3, do-5)

det(seminar-7, a-6)

dobj(do-5, seminar-7)

prep_on(seminar-7, NLP-9)

prep_at(do-5, SXSW-11)

prep_in(do-5, Austin-13)

amod(month-15, next-14)

tmod(do-5, month-15)

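The last five dependencies in this output (prep_on, prep_at, prep_in, amod, and tmod) are optional. You can remove one or more of these parts, since they are not essential to the sentence, and each combination you keep gives a sub-sentence.

Most of these optional parts are prepositional phrases and modifiers, e.g. prep_in, prep_on, advmod, tmod, etc.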
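As a rough sketch of this idea, the snippet below hard-codes the dependencies listed above instead of calling CoreNLP, treats every prep_*, tmod, and advmod dependent as an optional unit, and prints every sentence obtained by dropping some subset of those units. The function names and the choice of optional relations are assumptions for illustration, token 11 is taken to be SXSW from the word order of the sentence, and nested prepositional phrases are not handled.

```python
from itertools import combinations

TOKENS = "I am going to do a seminar on NLP at SXSW in Austin next month".split()

# (relation, head index, dependent index), 1-based as in the parser output.
DEPS = [
    ("nsubj", 3, 1), ("xsubj", 5, 1), ("aux", 3, 2), ("root", 0, 3),
    ("aux", 5, 4), ("xcomp", 3, 5), ("det", 7, 6), ("dobj", 5, 7),
    ("prep_on", 7, 9), ("prep_at", 5, 11), ("prep_in", 5, 13),
    ("amod", 15, 14), ("tmod", 5, 15),
]

# Relations treated as removable (assumption based on the answer above).
OPTIONAL_PREFIXES = ("prep_", "tmod", "advmod")


def subtree(root):
    """Indices of `root` and everything that depends on it, directly or not."""
    nodes = {root}
    changed = True
    while changed:
        changed = False
        for _, head, dep in DEPS:
            if head in nodes and dep not in nodes:
                nodes.add(dep)
                changed = True
    return nodes


def optional_units():
    """Token index sets that can each be dropped as a whole."""
    units = []
    for rel, _, dep in DEPS:
        if rel.startswith(OPTIONAL_PREFIXES):
            unit = subtree(dep)
            if rel.startswith("prep_"):
                # Collapsed dependencies fold the preposition into the
                # relation name. Assumption: it directly precedes its phrase.
                unit.add(min(unit) - 1)
            units.append(frozenset(unit))
    return units


def sub_sentences():
    """All sentences obtained by dropping any subset of the optional units."""
    units = optional_units()
    results = []
    for k in range(len(units) + 1):
        for dropped in combinations(units, k):
            removed = set().union(*dropped)
            kept = [tok for i, tok in enumerate(TOKENS, 1) if i not in removed]
            results.append(" ".join(kept))
    return results


for sentence in sub_sentences():
    print(sentence)
```

With four optional units this yields 16 candidates, from the full sentence down to "I am going to do a seminar". For real input you would build the dependency list from the parser output instead of typing it in by hand.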

Hope this answer helps you!
