in Big Data Hadoop & Spark by (9k points)
What is RDD in PySpark?

1 Answer

by (45.3k points)

RDD in PySpark stands for Resilient Distributed Dataset, where:

  • Resilient: It is fault tolerant and can rebuild lost data automatically when a failure occurs
  • Distributed: The data is distributed across multiple nodes in a cluster
  • Dataset: It consists of the partitioned data along with their values
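Here is a minimal sketch of creating an RDD, assuming a local Spark installation; the master string "local[*]", the app name "RDDExample", and the sample data are illustrative choices, not part of the original answer:

from pyspark import SparkContext

# Start a local SparkContext; "local[*]" uses all available cores
sc = SparkContext("local[*]", "RDDExample")

# parallelize() distributes a local Python collection across partitions,
# producing an RDD
numbers = sc.parallelize([1, 2, 3, 4, 5])

# Each RDD is split into partitions that can live on different nodes
print(numbers.getNumPartitions())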

You can perform two kinds of operations on an RDD (see the sketch after this list):

  • Transformations: These operations create a new RDD from an existing one and are evaluated lazily
  • Actions: These are applied to an RDD to instruct Apache Spark to compute a result and send it to the driver
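Continuing the sketch above (still assuming the "numbers" RDD defined earlier), here is one transformation followed by one action:

# map() is a transformation: it returns a new RDD and triggers no computation yet
squares = numbers.map(lambda x: x * x)

# reduce() is an action: Spark now evaluates the chain and returns the
# result (1 + 4 + 9 + 16 + 25 = 55) to the driver
total = squares.reduce(lambda a, b: a + b)
print(total)  # 55

Note that nothing is computed until reduce() is called; this lazy evaluation lets Spark plan the whole chain of transformations before executing it.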

If you want to learn more about RDDs in PySpark, you can enroll in an online PySpark course. In such training, you work on assignments and projects that give you hands-on experience in solving real-world business problems.
