0 votes
2 views
in Big Data Hadoop & Spark by (9k points)
What is PySpark DataFrame?

1 Answer

0 votes
by (45.3k points)

PySpark is the Python API for Apache Spark. It was developed so that Python programs can work with Spark's resilient distributed datasets (RDDs) and use Spark's distributed framework for Big Data analysis. Apache Spark itself can be used from several languages, including SQL, Java, Scala, Python, and R.
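
As a minimal sketch of this integration (assuming a local Spark installation with the pyspark package; the application name and data are made up for the example), the snippet below turns a plain Python list into an RDD and applies a simple transformation:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session; the application name is arbitrary.
spark = SparkSession.builder.appName("rdd-example").getOrCreate()

# Parallelize a plain Python list into an RDD and square each element.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
squared = rdd.map(lambda x: x * x)

print(squared.collect())  # [1, 4, 9, 16, 25]
```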

A PySpark DataFrame is a table of data organized into rows and columns. It has a two-dimensional structure in which every column holds the values of one particular variable and every row holds a single set of values, one from each column. Some characteristics of a PySpark DataFrame include (see the short sketch after this list):

  • Column names must be non-empty
  • Each column has a declared data type defined by the DataFrame's schema, such as string, numeric, or Boolean
  • Every column must contain the same number of data items, one per row
  • The data is distributed across the cluster and processed in parallel
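
To make the structure concrete, here is a minimal sketch that builds a small DataFrame with an explicit schema (the column names, sample rows, and application name are assumptions made for this illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("dataframe-example").getOrCreate()

# Explicit schema: every column has a non-empty name and a declared data type.
schema = StructType([
    StructField("name", StringType(), nullable=False),
    StructField("age", IntegerType(), nullable=True),
])

# Each row supplies one value per column, so all columns have the same length.
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], schema=schema)

df.printSchema()
df.show()
```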

To learn in detail about PySpark DataFrames and pursue a career in PySpark, register for the PySpark Course.
