Back

Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in Big Data Hadoop & Spark by (32.9k points)
What exactly are PySpark RDDs (Resilient Distributed Datasets)?

1 Answer

0 votes
by (31.8k points)

Resilient Distributed Datasets (RDDs) are a fundamental data structure in PySpark, which is an open-source distributed computing framework for big data processing. RDDs are immutable distributed collections of objects that can be processed in parallel across multiple nodes in a cluster. RDDs are fault-tolerant, meaning they can recover from node failures and ensure data consistency.

If you are interested in learning more about it, then don’t miss checking out the below video tutorial on PySpark -

Browse Categories

...