in Machine Learning by (6.8k points)

I want to build a web application that lets users upload documents, videos, images, and music, and then gives them the ability to search them. Think of it as Dropbox + Semantic Search.

When the user uploads a new file, e.g. Document1.docx, how could I automatically generate tags based on the content of the file? In other words, no user input should be needed to determine what the file is about. Suppose Document1.docx is a research paper on data mining; then when the user searches for "data mining", "research paper", or "document1", that file should be returned in the search results, since "data mining" and "research paper" would most likely be among the auto-generated tags for that document.
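On the search side, once tags exist, matching a query against them can be as simple as an inverted index from tag to files. The sketch below assumes the tags are produced by some other step (they are hard-coded here). `TagIndex` and `normalize` are hypothetical names, and matching is exact on the normalized tag, so a production system would more likely sit on a full-text search engine.

```python
import re
from collections import defaultdict
from typing import Iterable, Set


def normalize(text: str) -> str:
    """Lower-case text and keep only alphanumeric runs, single-space separated."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


class TagIndex:
    """Hypothetical in-memory index from normalized tags to file names."""

    def __init__(self) -> None:
        self._files_by_tag = defaultdict(set)

    def add(self, filename: str, tags: Iterable[str]) -> None:
        # Index the file-name stem too ("Document1.docx" -> "document1"),
        # so searching for the name itself also finds the file.
        stem = filename.rsplit(".", 1)[0]
        for tag in [stem, *tags]:
            self._files_by_tag[normalize(tag)].add(filename)

    def search(self, query: str) -> Set[str]:
        # Exact match on the normalized query; .get avoids creating empty entries.
        return set(self._files_by_tag.get(normalize(query), set()))


index = TagIndex()
# In the real application these tags would be generated automatically;
# here they are hard-coded for illustration.
index.add("Document1.docx", ["data mining", "research paper"])
index.add("Holiday.jpg", ["beach", "sunset"])
print(index.search("Data Mining"))  # {'Document1.docx'}
```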

1. Which algorithms would you recommend for this problem?

2. Is there a natural language library that could do this for me?

3. Which machine learning techniques should I look into to improve tagging precision?

4. How could I extend this to video and image automatic tagging?

Thanks in advance!

1 Answer

by (6.8k points)

The most common unsupervised machine learning model for this sort of task is Latent Dirichlet Allocation (LDA). This model automatically infers a set of topics over a corpus of documents based on the words in those documents. Running LDA on your set of documents assigns words to topics with certain probabilities; when a user searches for a word, you can then retrieve the documents with the highest probability of being relevant to that word.
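To make the retrieval idea concrete, here is a minimal, self-contained sketch in pure Python. It fits LDA with collapsed Gibbs sampling (one common inference method; the answer does not name one) and ranks documents for a query word by p(word | doc) = sum over topics k of theta[d][k] * phi[k][word]. The names `lda_gibbs`, `rank_documents`, and `suggest_tags`, the toy corpus, and the hyperparameters (alpha=0.1, beta=0.01, 300 sweeps) are illustrative assumptions. `suggest_tags` applies an assumed tagging rule (the top words of a document's most probable topic) that is not part of the answer.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=300, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] estimates p(topic k | doc d),
    phi[k][v] estimates p(vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: v for v, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][wid[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove this token from the counts, then resample its topic from
                # the full conditional, which is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def rank_documents(word, theta, phi, vocab):
    """Order document indices by p(word | doc) = sum_k theta[d][k] * phi[k][word]."""
    v = vocab.index(word)
    scores = [sum(theta_d[k] * phi[k][v] for k in range(len(phi))) for theta_d in theta]
    return sorted(range(len(theta)), key=lambda d: scores[d], reverse=True)


def suggest_tags(d, theta, phi, vocab, n=3):
    """Assumed tagging rule: the n most probable words of doc d's dominant topic."""
    k = max(range(len(phi)), key=lambda t: theta[d][t])
    return sorted(vocab, key=lambda w: phi[k][vocab.index(w)], reverse=True)[:n]


docs = [
    "data mining clustering data patterns mining".split(),
    "mining data patterns clustering data".split(),
    "music guitar melody song music".split(),
    "song melody music chords guitar".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
print(rank_documents("data", theta, phi, vocab))
print(suggest_tags(0, theta, phi, vocab))
```

With a fixed seed the sampler is deterministic, and on this tiny corpus it should separate the data-mining and music documents, so "data" ranks documents 0 and 1 first. Multi-word tags such as "data mining" would need an extra phrase-detection step that this sketch does not attempt; for real corpora, one of the implementations listed below is the better choice.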

There are also extensions of this approach to images and music; see http://cseweb.ucsd.edu/~dhu/docs/research_exam09.pdf.

LDA has efficient implementations in several languages:

  • Many implementations from the original researchers
  • http://mallet.cs.umass.edu/, written in Java and recommended by others on SO
  • PLDA: a fast, parallelized C++ implementation.
  • Alternative algorithms to LDA have also been proposed.

Automatic Tag Recommendation Algorithms for Social Recommender Systems: http://research.microsoft.com/pubs/79896/tagging.pdf

