0 votes
in Big Data Hadoop & Spark by (55.6k points)

Can anyone tell me the need for RDD in Spark?

1 Answer

0 votes
by (119k points)

An RDD (Resilient Distributed Dataset) is the basic data structure in Spark: an immutable collection of records, split into partitions across the cluster, that lets Spark run MapReduce-style operations faster and more efficiently.

Data sharing in MapReduce takes a lot of time because of replication, serialization, and disk I/O: each job writes its output to disk, and the next job has to read it back. Hadoop applications can spend over 90 percent of their time on read-write operations. To address this, researchers came up with the RDD concept, which keeps intermediate data in memory between operations. Sharing data in memory is 10 to 100 times faster than sharing it over the network or through disk.
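To make the in-memory reuse idea concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: `ToyRDD`, `run`, and `read_source` are hypothetical names made up for this illustration, and the "expensive read" is just a counter. It only mimics two ideas that real RDDs have, lazy transformations and `cache()`, to show how caching avoids going back to the source for every action.

```python
# Toy illustration only: ToyRDD is a hypothetical class, NOT Spark's API.
class ToyRDD:
    """A lazily evaluated dataset that remembers how to rebuild itself."""

    def __init__(self, compute):
        self._compute = compute    # function that produces the records
        self._use_cache = False    # set by cache()
        self._cached = None        # in-memory copy, filled on first collect()

    def map(self, fn):
        # Lazy: only records the step; nothing runs until collect().
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        # Mark this dataset to be kept in memory after it is first computed.
        self._use_cache = True
        return self

    def collect(self):
        if not self._use_cache:
            return self._compute()          # recompute from the source every time
        if self._cached is None:
            self._cached = self._compute()  # compute once, then reuse
        return self._cached


def run(cached):
    """Run two actions on the same dataset and count reads of the source."""
    reads = {"count": 0}

    def read_source():
        # Stands in for an expensive disk or network read.
        reads["count"] += 1
        return list(range(10))

    squares = ToyRDD(read_source).map(lambda x: x * x)
    if cached:
        squares.cache()

    evens = squares.filter(lambda x: x % 2 == 0).collect()  # first action
    total = sum(squares.collect())                          # second action
    return evens, total, reads["count"]
```

Calling `run(False)` reads the source once per action, while `run(True)` reads it once and serves the second action from memory. In real Spark the same pattern is calling `cache()` or `persist()` on an RDD that several actions reuse.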

If you wish to learn Spark then check out this Spark Training course by Intellipaat.

