For your case, there are a few techniques that could solve your problem:
First Technique: step-wise similarity
Suppose you gather a number of techniques and rank them along two axes: ease of implementation (i.e., low inherent complexity) and performance. This technique would rank high on the first axis but might underperform state-of-the-art techniques on the second.
We found that combining the intersection of low-frequency keywords with the similarity of the documents' inbound traffic is a fairly strong predictor of how similar two documents are. If two documents share a similar set of very low-frequency terms (e.g., domain-specific terms such as 'decision manifold') and also have similar inbound traffic profiles, that combination is strongly probative of the documents being similar.
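Here is a minimal Python sketch of that combined signal, under several assumptions that are mine rather than part of the technique above: documents are already tokenized, a term counts as "low-frequency" if it appears in at most `max_df` documents, each document's inbound traffic profile is a dict of referrer to visit counts, and the two scores are blended with a simple weighted average. The function names, the toy data, the threshold and the weighting are all hypothetical and only illustrate the idea.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return df


def rare_terms(tokens, df, max_df):
    """Terms of one document that occur in at most max_df documents (assumed cutoff)."""
    return {t for t in tokens if df[t] <= max_df}


def jaccard(a, b):
    """Overlap of two term sets; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(p, q):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(v * q.get(k, 0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def combined_similarity(a, b, docs, df, traffic, max_df=2, weight=0.5):
    """Weighted blend of rare-keyword overlap and inbound-traffic similarity."""
    keyword_score = jaccard(rare_terms(docs[a], df, max_df),
                            rare_terms(docs[b], df, max_df))
    traffic_score = cosine(traffic[a], traffic[b])
    return weight * keyword_score + (1 - weight) * traffic_score


# Hypothetical toy corpus and traffic profiles (referrer -> visits).
docs = {
    "d1": "the decision manifold separates the classes".split(),
    "d2": "a decision manifold and the classes".split(),
    "d3": "the weather today is sunny".split(),
}
traffic = {
    "d1": {"search": 10, "forum": 5},
    "d2": {"search": 8, "forum": 4},
    "d3": {"news": 20},
}
df = document_frequency(docs)

print(f"d1 vs d2: {combined_similarity('d1', 'd2', docs, df, traffic):.2f}")
print(f"d1 vs d3: {combined_similarity('d1', 'd3', docs, df, traffic):.2f}")
```

On this toy data, d1 and d2 share rare terms and have proportional traffic, so they score about 0.75, while d1 and d3 share neither and score 0.00. On a real corpus you would choose `max_df` relative to corpus size and tune `weight` on labelled pairs rather than fixing it at 0.5.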
Machine learning algorithms can give you deeper insight into this problem. Since machine learning interview questions are often loosely based on this topic, mastering a course on these algorithms would also help you crack those interview questions.
Hope this answer helps you!