in Big Data Hadoop & Spark by (9k points)
What is RDD in PySpark?

1 Answer

by (45.3k points)

RDD in PySpark stands for Resilient Distributed Dataset, where:

  • Resilient: It is fault tolerant and can rebuild lost data automatically when a failure occurs
  • Distributed: The data is distributed across multiple nodes in a cluster
  • Dataset: It consists of the partitioned data along with their values
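Here is a minimal sketch of creating an RDD, assuming a local Spark installation; the master string "local[*]", the app name "RDDExample", and the sample data are illustrative choices, not part of the original answer:

from pyspark import SparkContext

# Start a local SparkContext; "local[*]" uses all available cores
sc = SparkContext("local[*]", "RDDExample")

# parallelize() distributes a local Python collection across partitions,
# producing an RDD
numbers = sc.parallelize([1, 2, 3, 4, 5])

# Each RDD is split into partitions that can live on different nodes
print(numbers.getNumPartitions())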

You can perform two kinds of operations on an RDD (see the sketch after this list):

  • Transformations: These operations create a new RDD from an existing one and are evaluated lazily
  • Actions: These are applied to an RDD to instruct Apache Spark to compute a result and send it to the driver
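Continuing the sketch above (still assuming the "numbers" RDD defined earlier), here is one transformation followed by one action:

# map() is a transformation: it returns a new RDD and triggers no computation yet
squares = numbers.map(lambda x: x * x)

# reduce() is an action: Spark now evaluates the chain and returns the
# result (1 + 4 + 9 + 16 + 25 = 55) to the driver
total = squares.reduce(lambda a, b: a + b)
print(total)  # 55

Note that nothing is computed until reduce() is called; this lazy evaluation lets Spark plan the whole chain of transformations before executing it.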

If you want to learn more about RDDs in PySpark, you can enroll in an online PySpark course. In such training, you work on assignments and projects that give you hands-on experience in solving real-world business problems.
