Can anyone explain what an RDD is in Spark?

1 Answer

An RDD (Resilient Distributed Dataset) is the basic data structure in Spark: an immutable, distributed collection of elements. Each RDD is divided into partitions that are spread across the nodes of the cluster, so operations on it can be executed in parallel.
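To make the idea of partitions concrete, here is a toy model in plain Python. It is not how Spark is implemented, and the names `split_into_partitions` and `map_partitions` are made up for this illustration. It only shows the shape of the idea: the data is cut into read-only chunks, each chunk is processed by its own worker, and the result is a new set of chunks while the original is left untouched.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Cut `data` into `num_partitions` contiguous, read-only chunks (tuples)."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(tuple(data[start:end]))
        start = end
    return tuple(partitions)


def map_partitions(partitions, func):
    """Apply `func` to every element, one worker per partition.

    Returns a new tuple of partitions; the input is never modified.
    """
    def process(partition):
        return tuple(func(x) for x in partition)

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map keeps the results in the same order as the input partitions.
        return tuple(pool.map(process, partitions))


original = split_into_partitions([1, 2, 3, 4, 5, 6, 7], 3)
squared = map_partitions(original, lambda x: x * x)

print(original)  # ((1, 2, 3), (4, 5), (6, 7))
print(squared)   # ((1, 4, 9), (16, 25), (36, 49))
```

In real Spark the partitions live on different machines rather than in threads of one process, but the pattern of "same operation, applied to each partition independently" is the same.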

An RDD can be created either by calling the parallelize() method on an existing collection in the driver program or by referencing a dataset in an external storage system, such as HDFS, a shared file system, or a database. Applying a transformation to an existing RDD also produces a new RDD.
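A minimal PySpark sketch of these three ways of getting an RDD could look like the following. `parallelize()`, `textFile()` and `map()` are standard PySpark calls; the wrapper function `build_rdds` and its `path` argument are hypothetical names used only for this example, and `textFile()` covers the text-file case only (reading from a database needs a different connector).

```python
def build_rdds(sc, path=None):
    """Create RDDs three ways, given a live pyspark SparkContext `sc`.

    `path` is a hypothetical text file location, e.g. a local or HDFS path.
    """
    # 1. From an in-memory collection, split into 2 partitions.
    numbers = sc.parallelize([1, 2, 3, 4, 5], 2)

    # 2. From an existing RDD: map() returns a new RDD and leaves
    #    `numbers` unchanged, because RDDs are immutable.
    doubled = numbers.map(lambda x: x * 2)

    # 3. From external storage, one element per line of the file
    #    (skipped here when no path is given).
    lines = sc.textFile(path) if path is not None else None

    return numbers, doubled, lines


# Usage sketch (assumes pyspark is installed and runs in local mode):
#   from pyspark import SparkContext
#   sc = SparkContext("local[2]", "rdd-creation-sketch")
#   numbers, doubled, _ = build_rdds(sc)
#   print(doubled.collect())   # [2, 4, 6, 8, 10]
#   sc.stop()
```

Note that `map()` is lazy: nothing is computed until an action such as `collect()` is called on the resulting RDD.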

