in DevOps and Agile by (47.8k points)

Is the combination of Apache Spark and Docker a rational choice?

1 Answer


Spark is very powerful for data analysis, but Spark itself is just a set of libraries. Its real computational power comes from the underlying job distribution system, e.g. Hadoop YARN or Mesos. I assume you won't deploy Spark in *standalone* mode: standalone mode still supports cluster deployment, but it is much harder to manage once you have a lot of worker nodes. So, if you deploy Spark on top of Hadoop using YARN, the real computational power comes from the Hadoop nodes. If you can run those Hadoop nodes in Docker containers that are given ample CPU and memory, the combination can be an efficient system.
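To make the point about the cluster manager concrete, here is a minimal sketch of how the same Spark application can be submitted either to YARN or to a standalone master. It only assembles the `spark-submit` command line and does not run it. The script name `analysis_job.py`, the host `spark-master`, and the resource sizes are hypothetical placeholders, not values from any real cluster.

```python
import shlex


def build_spark_submit(app_path, master="yarn", deploy_mode="cluster",
                       num_executors=4, executor_memory="4g", executor_cores=2):
    """Assemble a spark-submit argument list for the chosen cluster manager.

    All defaults are illustrative assumptions; tune them for your own cluster.
    """
    args = [
        "spark-submit",
        "--master", master,            # "yarn" or a standalone URL like spark://host:7077
        "--deploy-mode", deploy_mode,  # "cluster" or "client"
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
    ]
    if master == "yarn":
        # Only pass --num-executors on YARN; it is not a standalone-mode option.
        args += ["--num-executors", str(num_executors)]
    args.append(app_path)
    return args


# Hypothetical application script and master host, for illustration only.
yarn_cmd = build_spark_submit("analysis_job.py")
standalone_cmd = build_spark_submit(
    "analysis_job.py",
    master="spark://spark-master:7077",
    deploy_mode="client",  # Python apps on a standalone cluster run in client mode
)

print(shlex.join(yarn_cmd))
print(shlex.join(standalone_cmd))
```

The application code is the same in both cases; only the `--master` value decides which system schedules and runs the work, which is why the capacity of the Hadoop nodes (containerized or not) is what matters.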


