Why Spark RDD is immutable?

1 Answer


RDD stands for Resilient Distributed Dataset. In PySpark, an RDD is an immutable distributed collection of objects. Each RDD is logically partitioned, and its partitions are computed on multiple nodes of the cluster. An RDD can hold objects of various types, including Python, Scala, or Java objects as well as user-defined classes, and its elements can be operated on in parallel with built-in fault tolerance.
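
As a rough illustration, here is a minimal PySpark sketch of creating and partitioning an RDD. It assumes a local Spark installation; the app name and the sample data are placeholders:

```python
# A minimal sketch, assuming a local Spark installation; the app name
# "rdd-demo" and the sample data are placeholders.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# Distribute a local Python collection across the cluster as an RDD,
# explicitly split into two logical partitions.
nums = sc.parallelize([1, 2, 3, 4, 5], numSlices=2)

print(nums.getNumPartitions())              # 2
print(nums.map(lambda x: x * x).collect())  # [1, 4, 9, 16, 25]
```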

Spark RDDs are immutable collections of objects for the following reasons (see the sketch after this list):

  • Immutable data can be shared safely across multiple processes and threads, with no locking and no risk of concurrent modification
  • Immutability means a lost partition can always be recreated by replaying the RDD's lineage of transformations, which is the basis of Spark's fault tolerance
  • Because the data never changes, caching an RDD is always safe and speeds up repeated computations
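
Here is a short sketch of what immutability looks like in practice. The variable names are illustrative, and SparkContext.getOrCreate() reuses the context from the sketch above if one already exists:

```python
# A minimal sketch of immutability in practice; variable names are
# illustrative.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([1, 2, 3])

# Transformations never modify `rdd`; they return a new RDD whose
# lineage records how to recompute it if a partition is lost.
doubled = rdd.map(lambda x: x * 2)

print(rdd.collect())      # [1, 2, 3] -- the original RDD is unchanged
print(doubled.collect())  # [2, 4, 6]

# Because the underlying data can never change, caching is always safe.
doubled.cache()
```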

To learn more about Spark RDDs, you can enroll in the PySpark Course. In this course, you will also learn about PySpark SQL, the Spark framework, and PySpark Streaming. Further, you will gain experience with various tools used in the Spark ecosystem, such as Kafka, Spark MLlib, Flume, and Spark SQL.

