Intellipaat

in AI and Deep Learning by (50.2k points)

I have a chat app that works with predefined messages. The database has about 80 predefined conversations, each with 5 possible responses. To clarify, here's an example:

Q: "How heavy is a polar bear?"

R1: "Very heavy?"

R2: "Heavy enough to break the ice."

R3: "I don't know. Silly question."

R4: ...

R5: ...

Let's say a user chooses R3: "I don't know. Silly question."

Then that response will have 5 possible responses, e.g.:

R1: "Why is that silly?"

R2: "You're silly!"

R3: "Ugh. I'm done talking to you now."

R4: ...

R5: ...

Each of those responses in turn has 5 possible responses, after which the conversation ends and a new one has to be started.

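So to recap, I have 80 manually written conversations, each with 5 possible responses, going 3 layers deep: 80 × 5³ = 10,000 distinct conversation paths (12,480 messages in total, counting the opening questions and every intermediate response).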
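To make the structure and the arithmetic concrete, here is a minimal Python sketch that models each conversation as a tree of messages. The `Node` class and the helper functions are hypothetical (not part of the original app or any library), and `build_full_tree` only fills in placeholder text. It counts messages, complete conversation paths, and (message, reply) pairs, the last being one plausible format for training a response generator.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One message in a conversation tree; `replies` are the possible answers to it."""
    text: str
    replies: list[Node] = field(default_factory=list)


def build_full_tree(depth: int, branching: int = 5, label: str = "Q") -> Node:
    """Placeholder tree: every message has `branching` replies, `depth` layers below the question."""
    node = Node(label)
    if depth > 0:
        node.replies = [
            build_full_tree(depth - 1, branching, f"{label}.R{i + 1}")
            for i in range(branching)
        ]
    return node


def count_messages(node: Node) -> int:
    """Every node in the tree, including the opening question."""
    return 1 + sum(count_messages(r) for r in node.replies)


def count_paths(node: Node) -> int:
    """Distinct complete conversations, i.e. the leaves of the tree."""
    if not node.replies:
        return 1
    return sum(count_paths(r) for r in node.replies)


def iter_pairs(node: Node):
    """Yield (message, reply) pairs by walking the tree depth-first."""
    for reply in node.replies:
        yield node.text, reply.text
        yield from iter_pairs(reply)


# The example from the question, partially filled in.
polar_bear = Node("How heavy is a polar bear?", [
    Node("Very heavy?"),
    Node("Heavy enough to break the ice."),
    Node("I don't know. Silly question.", [
        Node("Why is that silly?"),
        Node("You're silly!"),
        Node("Ugh. I'm done talking to you now."),
    ]),
])

conversations = [build_full_tree(depth=3) for _ in range(80)]
total_messages = sum(count_messages(c) for c in conversations)
total_paths = sum(count_paths(c) for c in conversations)
total_pairs = sum(len(list(iter_pairs(c))) for c in conversations)
print(total_messages, total_paths, total_pairs)  # 12480 10000 12400
```

Storing the data as a tree rather than as flat message lists keeps each reply attached to the message it answers, which preserves the question-response relation any generator would need to learn.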

My question: What would be the most accurate way of automatically generating more conversations such as these using machine learning?

I researched RNNs (Karpathy's RNN post). Although an RNN can generate new content based on the old, the new content is quite random and nonsensical.

1 Answer

by (108k points)

There is a helpful article that uses Python and Keras to generate text with an LSTM recurrent neural network (you can, however, also build the LSTM with TensorFlow). With a large, rich set of training data, the algorithm can indeed produce quite interesting text. As the article mentions, Project Gutenberg offers a large number of books for free, which is a good source of training text.

The next issue is the relation between a question and its possible responses. Your conversations carry semantics: the responses are not random, so a generated response should at least try to fit the question it answers. Something like Latent Dirichlet Allocation (LDA) could help here. LDA normally finds categories and topics in data; you would use it the other way round: given a topic (the question), find data (responses) that is at least somewhat relevant.
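A minimal sketch of the "reversed" LDA idea, assuming candidate responses already exist (for example, produced by an LSTM): fit a small LDA model with collapsed Gibbs sampling over the question plus the candidates, then rank candidates by how close their topic mixture is to the question's. The function names, the stopword list, the hyperparameters, and the choice of Hellinger distance are all assumptions for illustration, not taken from the article. On texts this short the topics are noisy; in practice you would fit LDA on the whole corpus (all 80 conversations) with a library such as gensim.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list; a real project would use a proper one.
STOPWORDS = {"a", "an", "the", "is", "i", "you", "to", "of", "that", "it", "how", "i'm", "you're"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=300, seed=0):
    """Collapsed Gibbs sampling for LDA; returns each document's topic mixture (theta)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    doc_topic = [[0] * num_topics for _ in docs]        # n_{d,k}
    topic_word = [Counter() for _ in topics]             # n_{k,w}
    topic_total = [0] * num_topics                       # n_k
    assignments = []
    for d, doc in enumerate(docs):                       # random initial topic per token
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) ∝ (n_{d,t} + alpha) * (n_{t,w} + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]


def hellinger(p, q):
    """Distance between two topic mixtures: 0 means identical."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def rank_responses(question, candidates, num_topics=2, **lda_kwargs):
    """Order candidates from most to least topically similar to the question."""
    docs = [tokenize(question)] + [tokenize(c) for c in candidates]
    theta = lda_gibbs(docs, num_topics, **lda_kwargs)
    scored = sorted(zip(candidates, theta[1:]), key=lambda item: hellinger(theta[0], item[1]))
    return [c for c, _ in scored]


question = "How heavy is a polar bear?"
candidates = [
    "Heavy enough to break the ice.",
    "A polar bear is heavier than a seal.",
    "I like pizza on Fridays.",
]
for reply in rank_responses(question, candidates):
    print(reply)
```

Including the question in the fitted documents keeps the sketch short; with a corpus-level model you would instead infer the question's topic mixture from the already trained model and then filter or rerank the generated responses.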
