
I have a CSV in which one field is a datetime in a specific format. I cannot import it directly into my DataFrame because the column needs to be a timestamp, so I import it as a string and convert it to a Timestamp like this:

import java.sql.Timestamp
import java.text.SimpleDateFormat
import org.apache.spark.sql.Row

// Parse a "MM/dd/yyyy HH:mm:ss" string into a Timestamp; an empty string maps to null.
def getTimestamp(x: Any): Timestamp = {
  val format = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss")
  if (x.toString.isEmpty) null
  else new Timestamp(format.parse(x.toString).getTime)
}

// Rebuild the row with the string in column 3 replaced by its parsed Timestamp.
def convert(row: Row): Row =
  Row(row(0), row(1), row(2), getTimestamp(row(3)))


Is there a better, more concise way to do this?

1 Answer


Spark >= 2.2

Since Spark 2.2, you can pass the format string directly to to_timestamp. Using the example DataFrame df defined in the Spark < 2.2 section below, try something like this:

import org.apache.spark.sql.functions.to_timestamp

val ts = to_timestamp($"dts", "MM/dd/yyyy HH:mm:ss")

df.withColumn("ts", ts).show(2, false)

// +---+-------------------+-------------------+
// |id |dts                |ts                 |
// +---+-------------------+-------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01|
// |2  |#$@#@#             |null               |
// +---+-------------------+-------------------+

Spark < 2.2

You can use the date processing functions that were introduced in Spark 1.5. Assume you have the following data:

val df = Seq((1L, "05/26/2016 01:01:01"), (2L, "#$@#@#")).toDF("id", "dts")

Use unix_timestamp to parse the string, then cast the result to timestamp:

import org.apache.spark.sql.functions.unix_timestamp

val ts = unix_timestamp($"dts", "MM/dd/yyyy HH:mm:ss").cast("timestamp")

df.withColumn("ts", ts).show(2, false)

// +---+-------------------+---------------------+
// |id |dts                |ts                   |
// +---+-------------------+---------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01.0|
// |2  |#$@#@#             |null                 |
// +---+-------------------+---------------------+


As the output shows, this covers both parsing and error handling: a string that does not match the format becomes null instead of raising an exception. In Spark versions before 3.0, the format string must be compatible with Java SimpleDateFormat.
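
To make the "parsing plus error handling" point concrete outside of Spark, here is a minimal plain-Scala sketch of the same idea using SimpleDateFormat directly. The helper name parseTimestamp is hypothetical and not part of Spark. Returning an Option and calling setLenient(false) are choices made for this sketch, and it is not a description of how Spark implements these functions internally.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import scala.util.Try

// Hypothetical helper (not a Spark API): parse a string with a
// SimpleDateFormat pattern and return None when it does not match,
// instead of throwing a ParseException.
def parseTimestamp(s: String, pattern: String = "MM/dd/yyyy HH:mm:ss"): Option[Timestamp] = {
  val format = new SimpleDateFormat(pattern)
  // Assumption of this sketch: reject out-of-range fields such as month 13
  // rather than letting them roll over into the next year.
  format.setLenient(false)
  Try(new Timestamp(format.parse(s).getTime)).toOption
}

// The same two inputs as the example DataFrame above.
val good = parseTimestamp("05/26/2016 01:01:01") // Some(...) holding the parsed timestamp
val bad  = parseTimestamp("#$@#@#")              // None, because the input does not match
```

The None case plays the role that null plays in the Spark output: malformed input is reported as a missing value rather than aborting the whole conversion.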
