I have a database with 2.000.000 messages. When a user receipt a message I need to find relevant messages in my database based on the occurrence of words.
I had tried run a batch process to summarize my database: 1 - Store all words(except an, a, the, of, for...) of all messages. 2 - Create association between all messages and the words contained therein (I also store the frequency of this word appears in the message.)
Then, when I receipt a message: 1 - I parse words (it looks like with the first step of my batch process.) 2 - Perform query in the database to fetching messages sorted by numbers of coincident words.
However, the process of updating my word base and the query to fetching similar messages are very heavy and slow. The word base update lasts ~1.2111 seconds for a message of 3000 bytes. The query similar messages last ~9.8 seconds for a message with the same size.
The database tuning already been done and the code works fine.
I need a better algorithm to do it.
Any ideas?