in Big Data Hadoop & Spark by (53.7k points)

Can anyone explain the RDD in Spark?

1 Answer

by (116k points)

RDD (Resilient Distributed Dataset) is the basic data structure in Spark. An RDD is an immutable, distributed collection of elements: the collection is divided into partitions spread across the nodes of the cluster, so that operations can be executed on the partitions in parallel.

An RDD can be created either with the parallelize() method or by referencing a dataset in an external storage system. We can also create a new RDD from an existing RDD by applying a transformation to it.

If you are interested in learning Spark, I would suggest this Spark Certification by Intellipaat.


