I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:
Call ABC
Call ABC again
Call DEF
I'd like to have a data structure for the above sentences as follows:
Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)
In general, each entry has the form Word: (Word_it_appears_with, Frequency), ...
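As a rough sketch, the structure above could be built like this in Python. The function name `build_cooccurrence`, the whitespace tokenization, and counting each word at most once per sentence are all assumptions made for illustration, not requirements.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    table = defaultdict(Counter)
    for sentence in sentences:
        # Assumption: split on whitespace and count each word once per sentence.
        words = set(sentence.split())
        # Ordered pairs, so every count is stored in both directions.
        for word, other in permutations(words, 2):
            table[word][other] += 1
    return table


table = build_cooccurrence(["Call ABC", "Call ABC again", "Call DEF"])
print(dict(table["Call"]))
```

For the three sample sentences this produces the same counts as the listing above.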
Please note the inherent redundancy in this type of data: if the frequency of ABC is 2 under Call, then the frequency of Call is 2 under ABC. How can I store this without keeping every count twice?
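One possible way to remove the duplication, sketched here under assumptions, is to store each unordered pair exactly once under a canonical key (the two words in sorted order). The names `build_pair_counts` and `frequency` are hypothetical. The trade-off is that listing every partner of a given word then needs a scan or a separate index.

```python
from collections import Counter
from itertools import combinations


def build_pair_counts(sentences):
    """Count each unordered word pair once, keyed by the sorted pair."""
    counts = Counter()
    for sentence in sentences:
        # Sorting first means every key tuple is already in canonical order.
        words = sorted(set(sentence.split()))
        for first, second in combinations(words, 2):
            counts[(first, second)] += 1
    return counts


def frequency(counts, a, b):
    """Look up the shared count of a and b, whichever order they are given in."""
    key = (a, b) if a <= b else (b, a)
    return counts[key]


counts = build_pair_counts(["Call ABC", "Call ABC again", "Call DEF"])
print(frequency(counts, "Call", "ABC"), frequency(counts, "ABC", "Call"))
```

With the sample sentences this keeps 4 entries instead of the 8 in the two-directional listing.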
The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, the data shows that ABC is the word most likely to appear in the sentence, so it can be offered as the first suggestion, followed by again and DEF.
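The lookup described above might be sketched as follows. The `table` literal simply restates the sample data, and the `suggest` function, its `limit` parameter, and the choice to sum counts over all typed words are assumptions for illustration only.

```python
from collections import Counter

# The sample data from the listing, written out by hand.
table = {
    "Call": Counter({"ABC": 2, "again": 1, "DEF": 1}),
    "ABC": Counter({"Call": 2, "again": 1}),
    "again": Counter({"Call": 1, "ABC": 1}),
    "DEF": Counter({"Call": 1}),
}


def suggest(table, typed_words, limit=3):
    """Rank candidate words by their summed frequency with the typed words."""
    scores = Counter()
    for word in typed_words:
        scores.update(table.get(word, {}))
    # Do not suggest words that are already in the sentence.
    for word in typed_words:
        scores.pop(word, None)
    # sorted() is stable, so words with equal scores keep first-seen order.
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [word for word, _ in ranked[:limit]]


print(suggest(table, ["Call"]))         # ['ABC', 'again', 'DEF']
print(suggest(table, ["Call", "ABC"]))  # ['again', 'DEF']
```

Note that again and DEF are tied after Call, so their relative order here is only a side effect of the order in which they were stored.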
I realize this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.
Thanks