SparkR vs sparklyr

Question

SparkR vs sparklyr

1 Answer

Amit Rawat · Answer 1 · 2019-07-10T09:04:59+0000

sparklyr is an R interface for Apache Spark that is well suited to working with large datasets interactively. It lets you filter and aggregate Spark datasets and then bring the results into R for analysis and visualization.

sparklyr lets you use Spark as a backend for dplyr, the popular data manipulation package, so dplyr code is translated into Spark SQL and executed by Spark.

sparklyr also provides a range of functions that give you access to Spark's tools for transforming and pre-processing data, such as the ft_* feature transformers and the ml_* machine learning functions.

SparkR is the R package that ships with Apache Spark itself and lets you run R code on Spark. To use it, you load the package with library(SparkR), start a session with sparkR.session(), and run your code. Its API is similar to the Python API (PySpark), except that it follows R syntax instead of Python syntax. For the most part, almost everything available in PySpark is also available in SparkR.

Which of the two is the better choice depends on the individual and on which criteria matter most to them.

In my experience, if you are learning SparkR together with the Scala Spark API, it seems much easier to pick up than sparklyr, because sparklyr differs much more from Spark's own APIs.

However, sparklyr is more powerful, as it supports dplyr, Spark ML and H2O (through the rsparkling extension).

The advantages of using sparklyr are:

Better data manipulation through compatibility with dplyr
Better function naming conventions
Better tools for quickly evaluating ML models
Easier to run arbitrary R code on a Spark DataFrame (via spark_apply())

