What are RDDs (Resilient Distributed Datasets) in PySpark?

Question

1 Answer

Nisha S · Answer 1 · 2023-05-08T10:34:27+0000

Resilient Distributed Datasets (RDDs) are the fundamental data structure in Apache Spark, an open-source distributed computing framework for big data processing; PySpark is Spark's Python API. An RDD is an immutable collection of objects, split into partitions that can be processed in parallel across multiple nodes in a cluster. RDDs are fault-tolerant: each RDD records the lineage of transformations used to build it, so partitions lost to a node failure can be recomputed from their source data.

