in Big Data Hadoop & Spark by (55.6k points)

Can anyone explain the RDD in Spark?

1 Answer

by (119k points)

An RDD (Resilient Distributed Dataset) is the basic data structure in Spark. It is an immutable, distributed collection of elements, divided into partitions that are spread across the nodes of the cluster so that operations on it can run in parallel.

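An RDD can be created either by calling the parallelize() method on an existing collection in the driver program, or by referencing a dataset in an external storage system such as HDFS or a shared file system. Applying a transformation to an existing RDD also produces a new RDD; the original is left unchanged.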
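The three ways of obtaining an RDD described above can be sketched in PySpark roughly as follows. This is a minimal illustration, not a reference implementation: it assumes PySpark is installed and runs in local mode, and the helper names (`square`, `is_even`, `run_example`), the app name, and the HDFS path are placeholders invented for this example.

```python
def square(x):
    """Function applied to every element by map()."""
    return x * x


def is_even(x):
    """Predicate used by filter()."""
    return x % 2 == 0


def run_example():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # Local session using all cores; the app name is an arbitrary placeholder.
    spark = (
        SparkSession.builder.master("local[*]").appName("rdd-sketch").getOrCreate()
    )
    sc = spark.sparkContext
    try:
        # 1. Create an RDD from an in-memory collection, split into 3 partitions.
        numbers = sc.parallelize([1, 2, 3, 4, 5, 6], numSlices=3)

        # 2. An RDD can also reference external storage (hypothetical path):
        # lines = sc.textFile("hdfs:///path/to/input.txt")

        # 3. Transformations return new RDDs; `numbers` itself is never modified.
        squares = numbers.map(square)
        even_squares = squares.filter(is_even)

        # Transformations are lazy; collect() is an action that runs the job
        # and brings the results back to the driver.
        return even_squares.collect()
    finally:
        spark.stop()
```

On a machine with PySpark available, calling `run_example()` should return `[4, 16, 36]`. Each partition of `numbers` is processed independently, which is what allows the work to run in parallel across a cluster.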

If you are interested in learning Spark, I would suggest the Spark Certification by Intellipaat.

