in Machine Learning by (19k points)

I am embarking on a project that will involve crawling and processing huge amounts of data (hundreds of gigs), and mining it for structured data extraction, named entity recognition, deduplication, classification, etc.

I'm familiar with ML tools from both the Java and Python worlds: LingPipe, Mahout, NLTK, etc. However, when it comes to picking a platform for such a large-scale problem, I lack the experience to decide between Java and Python.

I know this sounds like a vague question, but I am looking for general advice on picking either Java or Python. The JVM offers better performance(?) than Python, but do libraries like LingPipe, etc. match up with the Python ecosystem? If I went with Python, how easy would it be to scale it and manage it across multiple machines?

Which one should I go with and why?

1 Answer

by (33.2k points)

For large-scale machine learning, Python is the language most commonly used in industry. On the JVM side, you may know that Apache produces excellent Java-based tools: Lucene/Solr/Nutch for search, Mahout for big-data machine learning, Hadoop for MapReduce, OpenNLP for NLP, and a lot of NoSQL stores. The best part is integration: these products work well with each other.
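To give a feel for how a deduplication job maps onto the MapReduce model mentioned above, here is a minimal sketch in plain Python. It runs in a single process and only imitates the map, sort, and reduce phases; it is not Hadoop code. The function names (`normalize`, `map_record`, `reduce_group`, `dedupe`) are hypothetical, and exact-match hashing after whitespace and case normalization is an assumption made for illustration. Real near-duplicate detection would need something stronger.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def normalize(text):
    # Assumption: documents count as duplicates if they match after
    # lowercasing and collapsing runs of whitespace.
    return " ".join(text.lower().split())


def map_record(doc_id, text):
    # Map phase: emit (content hash, document id) for each record.
    digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
    return digest, doc_id


def reduce_group(doc_ids):
    # Reduce phase: keep the smallest id as canonical, the rest are duplicates.
    ids = sorted(doc_ids)
    return ids[0], ids[1:]


def dedupe(records):
    # Sorting by key stands in for the shuffle/sort step between map and reduce.
    mapped = sorted(map_record(doc_id, text) for doc_id, text in records)
    result = {}
    for _, group in groupby(mapped, key=itemgetter(0)):
        canonical, duplicates = reduce_group([doc_id for _, doc_id in group])
        result[canonical] = duplicates
    return result


if __name__ == "__main__":
    docs = [
        ("a", "Hello   World"),
        ("b", "hello world"),
        ("c", "Something else"),
    ]
    print(dedupe(docs))
```

Because the map and reduce steps are separate functions that only communicate through keys, the same logic could in principle be split into mapper and reducer scripts for Hadoop Streaming, which lets you write both in Python by reading from stdin and writing to stdout.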

Hope this answer helps.
