+2 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I have a PySpark dataframe with a string column in the format MM-dd-yyyy, and I am attempting to convert it into a date column.

I tried:

df.select(to_date(df.STRING_COLUMN).alias('new_date')).show()

and I get a column of nulls. Can anyone help?

6 Answers

+3 votes
by (32.3k points)

Try this code:

> from pyspark.sql.functions import unix_timestamp, from_unixtime
> df = spark.createDataFrame([("11/25/1991",), ("11/24/1991",), ("11/30/1991",)], ['date_str'])
> df2 = df.select('date_str', from_unixtime(unix_timestamp('date_str', 'MM/dd/yyyy')).alias('date'))
> df2
DataFrame[date_str: string, date: timestamp]
> df2.show()

+----------+--------------------+
|  date_str|                date|
+----------+--------------------+
|11/25/1991|1991-11-25 00:00:...|
|11/24/1991|1991-11-24 00:00:...|
|11/30/1991|1991-11-30 00:00:...|
+----------+--------------------+
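If you need a true date column instead of a midnight timestamp, a minimal sketch (assuming Spark 2.2+, where to_date accepts a format string) would be:

from pyspark.sql import functions as F

# Parse the string straight into a DateType column; the pattern
# 'MM/dd/yyyy' must match the input strings exactly.
df3 = df.select('date_str', F.to_date('date_str', 'MM/dd/yyyy').alias('date'))
df3.printSchema()  # date: date (nullable = true)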


by (19.9k points)
Very well explained. Thank you.
by (44.4k points)
Using a udf for this might destroy your performance. So this is the correct answer.
by (47.2k points)
I tried this option among many in AWS Glue PySpark, and it works like a charm!
by (32.1k points)
Looks like this code helps solve your problem of null strings!
+1 vote
by (29.3k points)

If the strptime() approach doesn't help you, then you could do this using a cast:

from pyspark.sql.types import DateType

spark_df1 = spark_df.withColumn("record_date", spark_df['order_submitted_date'].cast(DateType()))

# Below is the result:
spark_df1.select('order_submitted_date', 'record_date').show(10, False)

+---------------------+-----------+
|order_submitted_date |record_date|
+---------------------+-----------+
|2015-08-19 12:54:16.0|2015-08-19 |
|2016-04-14 13:55:50.0|2016-04-14 |
|2013-10-11 18:23:36.0|2013-10-11 |
|2015-08-19 20:18:55.0|2015-08-19 |
|2015-08-20 12:07:40.0|2015-08-20 |
|2013-10-11 21:24:12.0|2013-10-11 |
|2013-10-11 23:29:28.0|2013-10-11 |
|2015-08-20 16:59:35.0|2015-08-20 |
|2015-08-20 17:32:03.0|2015-08-20 |
|2016-04-13 16:56:21.0|2016-04-13 |
+---------------------+-----------+
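Note that a plain cast only succeeds when the string is already in an ISO-like yyyy-MM-dd layout; on the asker's MM-dd-yyyy strings it yields nulls. A minimal sketch illustrating both cases (the column names here are made up for the example):

from pyspark.sql.types import DateType

df = spark.createDataFrame([('2015-08-19', '08-19-2015')], ['iso', 'us'])
# ISO-style strings cast cleanly to DateType,
# but 'MM-dd-yyyy' strings come back as null:
df.select(df['iso'].cast(DateType()).alias('iso_date'),
          df['us'].cast(DateType()).alias('us_date')).show()
# +----------+-------+
# |  iso_date|us_date|
# +----------+-------+
# |2015-08-19|   null|
# +----------+-------+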

0 votes
by (108k points)

@Amit Rawat's answer doesn't show an example for the to_date function, so another solution using it would be:

from pyspark.sql import functions as F

df = df.withColumn('new_date', F.to_date(F.unix_timestamp('STRING_COLUMN', 'MM-dd-yyyy').cast('timestamp')))
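On Spark 2.2 and later you can skip the unix_timestamp round-trip, since to_date accepts the format string directly (same STRING_COLUMN as in the question):

from pyspark.sql import functions as F

# Parse 'MM-dd-yyyy' strings straight into a DateType column.
df = df.withColumn('new_date', F.to_date('STRING_COLUMN', 'MM-dd-yyyy'))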

by (19.7k points)
Thanks, your additional solution together with @Amit Rawat's answer helped!
0 votes
by (29.5k points)

Try something like the following to convert the dates:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName("Python Spark SQL basic example")\
    .config("spark.some.config.option", "some-value").getOrCreate()


df = spark.createDataFrame([('2019-06-22',)], ['t'])
df1 = df.select(to_date(df.t, 'yyyy-MM-dd').alias('dt'))
print(df1)
df1.show()
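For reference, the output should look roughly like this (print(df1) shows the schema, df1.show() the parsed value):

DataFrame[dt: date]
+----------+
|        dt|
+----------+
|2019-06-22|
+----------+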

0 votes
by (40.7k points)

Try using the below code:

from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

# Creation of a dummy dataframe:
df1 = sqlContext.createDataFrame([("11/25/1991", "11/24/1991", "11/30/1991"),
                                  ("11/25/1391", "11/24/1992", "11/30/1992")],
                                 schema=['first', 'second', 'third'])

# Setting a user-defined function:
# This function converts the string cell into a date:
func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y'), DateType())

df = df1.withColumn('test', func(col('first')))
df.show()
df.printSchema()

Output:

+----------+----------+----------+----------+
|     first|    second|     third|      test|
+----------+----------+----------+----------+
|11/25/1991|11/24/1991|11/30/1991|1991-11-25|
|11/25/1391|11/24/1992|11/30/1992|1391-11-25|
+----------+----------+----------+----------+

root

 |-- first: string (nullable = true)

 |-- second: string (nullable = true)

 |-- third: string (nullable = true)

 |-- test: date (nullable = true)
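As a comment above notes, a Python UDF can hurt performance because every row is serialized through Python. A built-in alternative (a sketch, assuming Spark 2.2+ so to_date takes a format) would be:

from pyspark.sql.functions import col, to_date

# Same conversion as the UDF, using Spark's built-in parser instead.
df = df1.withColumn('test', to_date(col('first'), 'MM/dd/yyyy'))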

0 votes
by (106k points)

You can convert a date from string to date format in DataFrames by using to_date with a Java SimpleDateFormat pattern.

See the syntax below:

TO_DATE(CAST(UNIX_TIMESTAMP(date, 'MM/dd/yyyy') AS TIMESTAMP))

Code Example:

spark.sql(""" 

SELECT TO_DATE(CAST(UNIX_TIMESTAMP('08/26/2016', 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate""" ).show()

+----------+
|   newdate|
+----------+
|2016-08-26|
+----------+
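The same expression works from the DataFrame API via selectExpr; a small sketch with a throwaway dataframe:

df = spark.createDataFrame([('08/26/2016',)], ['date'])
df.selectExpr(
    "TO_DATE(CAST(UNIX_TIMESTAMP(date, 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate"
).show()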

by (100 points)
I am still getting null values. Below is the code which I have written:

dataf=dataframe.withColumn('abc', from_unixtime(unix_timestamp(func.col("POLICYEFFECTIVEDATE"), "yyyy-MM-dd"), "yyyy-MM-dd"))

I am getting this output:

+---------------------------+----+
|POLICYEFFECTIVEDATE        |abc |
+---------------------------+----+
|2019-08-06 09:52:52.1100000|null|
+---------------------------+----+

Kindly help with this.
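A likely cause is that the pattern yyyy-MM-dd covers only the date part while the values also carry a time and fractional seconds, so the parse fails. One possible fix (a sketch, not verified against this data, assuming the first 10 characters are always the date):

from pyspark.sql import functions as func

# Take just the leading 'yyyy-MM-dd' portion, then parse it as a date.
dataf = dataframe.withColumn(
    'abc', func.to_date(func.substring('POLICYEFFECTIVEDATE', 1, 10), 'yyyy-MM-dd'))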
