A naive Bayes classifier will most likely work for you. The method works like this:
Fix a set of categories and train on a data set of (document, category) pairs.
A feature vector for each document is something like a bag of words. E.g., take the 100 most common words, excluding stop words like "the" and "and". Each word gets a fixed component of the vector: the feature vector is an array of booleans, each indicating whether that word appears in the corresponding document.
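The feature extraction step above can be sketched as follows. The stop-word list, tokenizer (lowercased whitespace split), and function names are illustrative choices, not part of the original answer:

```python
# Build boolean bag-of-words feature vectors from raw documents.
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in"}

def build_vocabulary(documents, size=100):
    """Pick the `size` most common non-stop words across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(w for w in doc.lower().split() if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(size)]

def to_feature_vector(document, vocabulary):
    """Boolean vector: True where the vocabulary word appears in the document."""
    words = set(document.lower().split())
    return [word in words for word in vocabulary]
```

Each document is thus reduced to a fixed-length boolean vector, regardless of its original length.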
For your training set, calculate the probability of every class and of every feature within each class:
p(C) = number of documents of class C / total number of documents
Calculate the probability of a feature within a class: p(F|C) = number of documents of class C with the given feature (e.g., the word "food" appears in the text) / number of documents in class C.
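A minimal sketch of this training step, assuming the boolean feature vectors described above. The (+1)/(+2) Laplace smoothing is my addition, not part of the formulas above: it keeps p(F|C) away from exactly 0 or 1 for words never (or always) seen in a class, which would otherwise break the product later.

```python
# Estimate class priors p(C) and per-class feature probabilities p(F|C)
# from (feature_vector, category) training pairs.
from collections import defaultdict

def train(examples):
    """examples: iterable of (feature_vector, category) pairs."""
    class_counts = defaultdict(int)   # documents per class
    feature_counts = {}               # per class: how often each feature is True
    for features, category in examples:
        class_counts[category] += 1
        counts = feature_counts.setdefault(category, [0] * len(features))
        for i, present in enumerate(features):
            counts[i] += int(present)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}      # p(C)
    likelihoods = {                                               # p(F|C)
        c: [(k + 1) / (class_counts[c] + 2) for k in feature_counts[c]]
        for c in class_counts
    }
    return priors, likelihoods
```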
Given an unclassified document, the probability of it belonging to class C is proportional to
P(C|F1, ..., F100) = P(C) * P(F1|C) * P(F2|C) * ... * P(F100|C)
Since multiplying many small probabilities is numerically problematic (it underflows), you can use the sum of the logs instead, which is maximized by the same C:
log P(C|F1, ..., F100) = log P(C) + log P(F1|C) + log P(F2|C) + ... + log P(F100|C)
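The log-sum scoring can be sketched as follows. The `priors` and `likelihoods` arguments (class → p(C), class → list of p(Fi|C)) are assumed to come from a training step; scoring an absent feature with log(1 − p(Fi|C)) is the Bernoulli variant of naive Bayes, a common refinement that the formula above leaves implicit:

```python
import math

def classify(features, priors, likelihoods):
    """Return the class C maximizing log P(C) + sum of log-likelihoods."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for present, p in zip(features, likelihoods[c]):
            # A present feature contributes log p(F|C); an absent one
            # contributes log(1 - p(F|C)) (Bernoulli naive Bayes).
            score += math.log(p) if present else math.log(1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Working in log space means the per-class score is a simple sum, so even with hundreds of features there is no underflow.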
Hope this answer helps.