Explore Courses Blog Tutorials Interview Questions
+2 votes
in Big Data Hadoop & Spark by (11.4k points)

I have a date pyspark dataframe with a string column in the format of MM-dd-yyyy and I am attempting to convert this into a date column.

I tried:'new_date')).show()

and I get a string of nulls. Can anyone help?

6 Answers

+3 votes
by (32.3k points)

Try this code:

> from pyspark.sql.functions import unix_timestamp

> from pyspark.sql.functions import from_unixtime

> df = spark.createDataFrame([("11/25/1991",), ("11/24/1991",), ("11/30/1991",)], ['date_str'])

> df2 ='date_str', from_unixtime(unix_timestamp('date_str', 'MM/dd/yyy')).alias('date'))

> df2

DataFrame[date_str: string, date: timestamp]



|  date_str|                date|


|11/25/1991|1991-11-25 00:00:...|

|11/24/1991|1991-11-24 00:00:...|

|11/30/1991|1991-11-30 00:00:...|


If you wish to know about Hadoop and Big Data visit this  Hadoop Certification.

Convert pyspark string to date format
by (19.9k points)
Very well explained. Thank you.
by (44.4k points)
Using a udf for this might destroy your performance. So this is the correct answer.
by (47.2k points)
I tried this option among many from AWS Glue pyspark, works like charm!
by (32.1k points)
Looks like this code helps solve your problem of null strings!
+1 vote
by (29.3k points)

If Strptime() approach doesn't help you then you could do this using cast:

from pyspark.sql.types import DateType

spark_df1 = spark_df.withColumn("record_date",spark_df['order_submitted_date'].cast(DateType()))

#below is the result'order_submitted_date','record_date').show(10,False)


|order_submitted_date |record_date|


|2015-08-19 12:54:16.0|2015-08-19 |

|2016-04-14 13:55:50.0|2016-04-14 |

|2013-10-11 18:23:36.0|2013-10-11 |

|2015-08-19 20:18:55.0|2015-08-19 |

|2015-08-20 12:07:40.0|2015-08-20 |

|2013-10-11 21:24:12.0|2013-10-11 |

|2013-10-11 23:29:28.0|2013-10-11 |

|2015-08-20 16:59:35.0|2015-08-20 |

|2015-08-20 17:32:03.0|2015-08-20 |

|2016-04-13 16:56:21.0|2016-04-13 |

0 votes
by (108k points)

In @Amit Rawat answer's you don't see the example for the to_date function, so another solution using it would be:

from pyspark.sql import functions as F

df=df.withColumn('new_date',F.to_date(F.unix_timestamp('STRINGCOLUMN', 'MM-dd-yyyy').cast('timestamp'))

Convert pyspark string to date format
by (19.7k points)
Thanks, your additional solution with @Amit Rawat's answer helped!
0 votes
by (29.5k points)

Try something like following to convert dates

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName("Python Spark SQL basic example")\
    .config("spark.some.config.option", "some-value").getOrCreate()

df = spark.createDataFrame([('2019-06-22',)], ['t'])
df1 =, 'yyyy-MM-dd').alias('dt'))
print df1

0 votes
by (40.7k points)

Try using the below code:

from datetime import datetime

from pyspark.sql.functions import col, udf

from pyspark.sql.types import DateType

+# Creation of a dummy dataframe:

df1 = sqlContext.createDataFrame([("11/25/1991","11/24/1991","11/30/1991"), 

                            ("11/25/1391","11/24/1992","11/30/1992")], schema=['first', 'second', 'third'])

# Setting an user define function:

# This function converts the string cell into a date:

func =  udf (lambda x: datetime.strptime(x, '%m/%d/%Y'), DateType())

df = df1.withColumn('test', func(col('first')))




|     first|    second|     third|      test|






 |-- first: string (nullable = true)

 |-- second: string (nullable = true)

 |-- third: string (nullable = true)

 |-- test: date (nullable = true)

0 votes
by (106k points)

You can convert date from string to date format in data frames by using to_date with Java SimpleDateFormat

See the syntax below:-


Code Example:


SELECT TO_DATE(CAST(UNIX_TIMESTAMP('08/26/2016', 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate""" ).show()


|        dt| 




Convert pyspark string to date format
by (100 points)
am still getting null value , below is the code which i have written

dataf=dataframe.withColumn('abc', from_unixtime(unix_timestamp(func.col("POLICYEFFECTIVEDATE"), "yyyy-MM-dd"), "yyyy-MM-dd"))

I am gettiing output as

|2019-08-06 09:52:52.1100000|null|

Kindly help on this

Related questions

Browse Categories