Why Spark RDD is immutable?

1 Answer


RDD stands for Resilient Distributed Dataset. In PySpark, an RDD is an immutable distributed collection of objects. Each RDD is logically partitioned, and its partitions are computed on multiple nodes of the cluster. An RDD can hold objects of various types, including Python, Scala, or Java objects as well as user-defined classes, and its elements can be operated on in parallel with built-in fault tolerance.
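
As a rough illustration, here is a minimal PySpark sketch of creating and partitioning an RDD. It assumes a local Spark installation; the app name and the sample data are placeholders:

```python
# A minimal sketch, assuming a local Spark installation; the app name
# "rdd-demo" and the sample data are placeholders.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# Distribute a local Python collection across the cluster as an RDD,
# explicitly split into two logical partitions.
nums = sc.parallelize([1, 2, 3, 4, 5], numSlices=2)

print(nums.getNumPartitions())              # 2
print(nums.map(lambda x: x * x).collect())  # [1, 4, 9, 16, 25]
```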

Spark RDDs are immutable collections of objects for the following reasons (see the sketch after this list):

  • Immutable data can be shared safely across multiple processes and threads, with no locking and no risk of concurrent modification
  • Immutability means a lost partition can always be recreated by replaying the RDD's lineage of transformations, which is the basis of Spark's fault tolerance
  • Because the data never changes, caching an RDD is always safe and speeds up repeated computations
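
Here is a short sketch of what immutability looks like in practice. The variable names are illustrative, and SparkContext.getOrCreate() reuses the context from the sketch above if one already exists:

```python
# A minimal sketch of immutability in practice; variable names are
# illustrative.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([1, 2, 3])

# Transformations never modify `rdd`; they return a new RDD whose
# lineage records how to recompute it if a partition is lost.
doubled = rdd.map(lambda x: x * 2)

print(rdd.collect())      # [1, 2, 3] -- the original RDD is unchanged
print(doubled.collect())  # [2, 4, 6]

# Because the underlying data can never change, caching is always safe.
doubled.cache()
```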

To learn more about Spark RDDs, you can enroll in the PySpark Course. In this course, you will also learn about PySpark SQL, the Spark framework, and PySpark Streaming. Further, you will gain experience with various tools used in the Spark ecosystem, such as Kafka, Spark MLlib, Flume, and Spark SQL.

