in Big Data Hadoop & Spark by (11.4k points)

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column.

I've tried the following without any success:

type(randomed_hours) # => list

# Create in Python and transform to RDD
new_col = pd.DataFrame(randomed_hours, columns=['new_col'])
spark_new_col = sqlContext.createDataFrame(new_col)
my_df_spark.withColumn("hours", spark_new_col["new_col"])


I also got an error using this:

my_df_spark.withColumn("hours", sc.parallelize(randomed_hours))


So how do I add a new column (based on a Python vector) to an existing DataFrame with PySpark?

1 Answer


To add a column using a UDF:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

df = sqlContext.createDataFrame(
    [(1, "a", 23.0), (3, "B", -23.0)], ("x1", "x2", "x3"))

def valueToCategory(value):
    # Map a value from column x1 to a category label
    if value == 1:
        return 'cat1'
    elif value == 2:
        return 'cat2'
    # ... (further cases elided in the original)
    else:
        return 'n/a'

# NOTE: it seems that calls to udf() must be after SparkContext() is called
udfValueToCategory = udf(valueToCategory, StringType())

# Apply the UDF to column x1 and store the result in a new column
df_with_cat = df.withColumn("category", udfValueToCategory("x1"))
df_with_cat.show()

## +---+---+-----+--------+
## | x1| x2|   x3|category|
## +---+---+-----+--------+
## |  1|  a| 23.0|    cat1|
## |  3|  B|-23.0|     n/a|
## +---+---+-----+--------+
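Because a UDF wraps an ordinary Python function, the mapping logic can be checked locally before Spark is involved. The sketch below is one possible variant of valueToCategory, not the answer's own code: `CATEGORY_BY_VALUE`, `value_to_category` and `make_category_udf` are hypothetical names, and the lookup table only contains the two categories the answer spells out. Building the UDF inside a function also defers the udf() call until it is actually needed, which fits the note above about calling udf() after the SparkContext exists.

```python
# Hypothetical lookup table; the original answer elides the full set of categories.
CATEGORY_BY_VALUE = {1: "cat1", 2: "cat2"}


def value_to_category(value):
    """Plain Python function: return the category label, or 'n/a' if unknown."""
    return CATEGORY_BY_VALUE.get(value, "n/a")


def make_category_udf():
    """Build the UDF lazily, so udf() only runs when this function is called."""
    # Imported here so the pure-Python function above can be tested without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(value_to_category, StringType())


# The lookup logic can be checked locally before it is shipped to executors.
print(value_to_category(1))  # cat1
print(value_to_category(3))  # n/a
```

With a running context, the column would then be added the same way as before, for example df.withColumn("category", make_category_udf()("x1")).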

Another way to create a new column is with a literal, which gives every row the same constant value:

from pyspark.sql.functions import lit

df = sqlContext.createDataFrame(
    [(1, "a", 23.0), (3, "B", -23.0)], ("x1", "x2", "x3"))

df_with_x4 = df.withColumn("x4", lit(0))
df_with_x4.show()

## +---+---+-----+---+
## | x1| x2|   x3| x4|
## +---+---+-----+---+
## |  1|  a| 23.0|  0|
## |  3|  B|-23.0|  0|
## +---+---+-----+---+
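Neither a UDF nor a literal attaches a Python list such as randomed_hours directly, because withColumn expects a Column expression derived from the same DataFrame. One possible workaround, sketched here under assumptions, is to give both the DataFrame rows and the list elements a positional index and join on it. The names `index_values`, `add_column_from_list` and `_row_idx` are hypothetical. The sketch assumes the list has exactly one element per row, that the DataFrame's current row order is the order you mean (Spark does not guarantee row order in general), and that column types can be inferred again from the data.

```python
def index_values(values, index_col, value_col):
    """Pair each list element with its position, ready for createDataFrame."""
    rows = [(i, v) for i, v in enumerate(values)]
    return rows, [index_col, value_col]


def add_column_from_list(sqlContext, df, values, value_col, index_col="_row_idx"):
    """Return df with an extra column value_col taken from a Python list.

    Assumption: len(values) equals the number of rows in df. The inner join
    silently drops any row or list element that has no partner.
    """
    # zipWithIndex yields (row, position); append the position to each row.
    indexed_rows = df.rdd.zipWithIndex().map(
        lambda pair: tuple(pair[0]) + (pair[1],))
    indexed_df = sqlContext.createDataFrame(
        indexed_rows, df.columns + [index_col])

    # Build a second DataFrame holding (position, value) for the list.
    rows, names = index_values(values, index_col, value_col)
    values_df = sqlContext.createDataFrame(rows, names)

    # Join on the position, restore the original order, drop the helper column.
    joined = indexed_df.join(values_df, index_col)
    return joined.orderBy(index_col).drop(index_col)


# Usage with the names from the question (requires a running SQLContext):
# my_df_spark = add_column_from_list(
#     sqlContext, my_df_spark, randomed_hours, "hours")
```

The join involves a shuffle, so for large DataFrames this is noticeably more expensive than withColumn with a UDF or literal. If the new values can be computed from existing columns, that route is usually preferable.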
