Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in AI and Deep Learning by (50.2k points)

I have a database with 2.000.000 messages. When a user receipt a message I need to find relevant messages in my database based on the occurrence of words.

I had tried run a batch process to summarize my database: 1 - Store all words(except an, a, the, of, for...) of all messages. 2 - Create association between all messages and the words contained therein (I also store the frequency of this word appears in the message.)

Then, when I receipt a message: 1 - I parse words (it looks like with the first step of my batch process.) 2 - Perform query in the database to fetching messages sorted by numbers of coincident words.

However, the process of updating my word base and the query to fetching similar messages are very heavy and slow. The word base update lasts ~1.2111 seconds for a message of 3000 bytes. The query similar messages last ~9.8 seconds for a message with the same size.

The database tuning already been done and the code works fine.

I need a better algorithm to do it.

Any ideas?

1 Answer

0 votes
by (108k points)

I would suggest the book "Collective Intelligence". It's written for Python, but there is plenty of theory for you to implement it in another language. The very first chapter of the book involves what you are trying to do. 

Browse Categories