I have a big table in my database with a lot of words from various texts in the text order. I want to find the number of times/frequency that some set of words appears together.
Example: Supposing I have these 4 words in many texts: United | States | of | America. I will get as a result:
United States: 50
United States of: 45
United States of America: 40
(This is only an example with 4 words, but can there are with less and more than 4).
There is some algorithm that can do this or similar to this?
Edit: Some R or SQL code showing how to do is welcome. I need a practical example of what I need to do.
I have two tables: Token which haves id and text. The text is UNIQUE and each entry in this table represents a different word.
TextBlockHasToken is the table that keeps the text order. Each row represents a word in a text.
It haves textblockid that is the block of the text the token belongs. a sentence that is the sentence of the token, position that is the token position inside the sentence and tokenid that is the token table reference.