
I am currently using Pandas and Spark for data analysis. I found that Dask provides parallelized NumPy arrays and Pandas DataFrames.

Pandas is easy and intuitive for doing data analysis in Python. But I find it difficult to handle multiple large dataframes in Pandas due to limited system memory.

I have researched Dask and learned some facts about it. Overall, I understand that Dask is simpler to use than Spark, and that it is as flexible as Pandas while offering more compute power by using multiple CPUs in parallel.

So, I want to know roughly how much data (in terabytes) can be processed with Dask?

1 Answer


Generally, Dask is smaller and lighter weight than Spark. This means it has fewer features and is instead used in conjunction with other libraries, particularly those in the numeric Python ecosystem. It couples with libraries like Pandas or Scikit-Learn to achieve high-level functionality.

Reasons you might choose Spark

  • You prefer Scala or SQL

  • You have mostly JVM infrastructure and legacy systems

  • You are mostly doing business analytics with some lightweight machine learning

Reasons you might choose Dask

  • You prefer Python, or have large legacy code bases that you do not want to entirely rewrite.

  • You have a complex use case, or your use case does not cleanly fit the Spark computing model

  • You want a lighter-weight transition from local computing to cluster computing

  • You intend to interoperate with other technologies and have no issue installing multiple packages
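The lighter-weight transition from local to cluster computing can be sketched as follows (assuming the `dask.distributed` package is installed). The same code runs on one machine or a cluster; only the address passed to `Client` changes:

```python
from dask.distributed import Client, LocalCluster
import dask.array as da

# Start a scheduler and workers on the local machine (threads, not
# separate processes, to keep the sketch simple). To move to a real
# cluster, you would instead pass the remote scheduler's address to
# Client(); the computation code below stays identical.
cluster = LocalCluster(n_workers=2, processes=False)
client = Client(cluster)

# A million-element array split into chunks, reduced in parallel
# across the workers.
x = da.ones(1_000_000, chunks=100_000)
total = x.sum().compute()
print(total)  # 1000000.0

client.close()
cluster.close()
```

This incremental path, from a laptop to a distributed scheduler without rewriting the analysis code, is one of the main reasons to pick Dask for Python-centric workloads.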
