
I have a CSV file in which one field is a datetime in a specific format. I cannot import it directly into my DataFrame because the column needs to be a timestamp, so I import it as a string and convert it to a Timestamp like this:

import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.Row

def getTimestamp(x: Any): Timestamp = {
    val format = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss")
    if (x.toString() == "")
        return null
    else {
        val d = format.parse(x.toString())
        val t = new Timestamp(d.getTime())
        return t
    }
}

def convert(row : Row) : Row = {
    val d1 = getTimestamp(row(3))
    return Row(row(0),row(1),row(2),d1)
}


Is there a better, more concise way to do this?

1 Answer


Spark >= 2.2

Since Spark 2.2, you can pass the format string directly to to_timestamp. Try something like this:

import org.apache.spark.sql.functions.to_timestamp

val ts = to_timestamp($"dts", "MM/dd/yyyy HH:mm:ss")

df.withColumn("ts", ts).show(2, false)

// +---+-------------------+-------------------+
// |id |dts                |ts                 |
// +---+-------------------+-------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01|
// |2  |#$@#@#             |null               |
// +---+-------------------+-------------------+
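The snippet above relies on an existing df and on spark.implicits._ for the $ syntax. Below is a minimal self-contained sketch, assuming Spark 2.2 or later with spark-sql on the classpath and a local master. The object name ToTimestampSketch, the helper withParsedTs, and the app name are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_timestamp}

object ToTimestampSketch {
  // Hypothetical helper: adds a "ts" column parsed from the string column "dts".
  def withParsedTs(df: DataFrame): DataFrame =
    df.withColumn("ts", to_timestamp(col("dts"), "MM/dd/yyyy HH:mm:ss"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("to-timestamp-sketch")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a Seq

    val df = Seq((1L, "05/26/2016 01:01:01")).toDF("id", "dts")
    val parsed = withParsedTs(df)

    parsed.printSchema() // the "ts" column should be reported as timestamp
    parsed.show(false)

    spark.stop()
  }
}
```

Using col("dts") instead of $"dts" keeps the helper independent of the implicits import, which is only needed where the DataFrame is built.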

Spark < 2.2

You can use the date processing functions introduced in Spark 1.5. Assume you have the following data:

val df = Seq((1L, "05/26/2016 01:01:01"), (2L, "#$@#@#")).toDF("id", "dts")

Use unix_timestamp to parse the string, then cast the result to a timestamp:

import org.apache.spark.sql.functions.unix_timestamp

val ts = unix_timestamp($"dts", "MM/dd/yyyy HH:mm:ss").cast("timestamp")

df.withColumn("ts", ts).show(2, false)

// +---+-------------------+---------------------+
// |id |dts                |ts                   |
// +---+-------------------+---------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01.0|
// |2  |#$@#@#             |null                 |
// +---+-------------------+---------------------+


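As the output shows, this covers both parsing and error handling: the malformed value becomes null instead of raising an exception. In the Spark versions covered here, the format string should be compatible with Java SimpleDateFormat. Spark 3.0 and later define their own datetime pattern syntax, so check the pattern against the documentation for your version.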
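To check a pattern outside Spark, the same format string can be tried against SimpleDateFormat directly. This is a rough sketch using only the JDK and the Scala standard library. The object name PatternCheck and the parse helper are hypothetical, and returning an Option is a choice made here to mimic the null that Spark produced for the malformed row.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import scala.util.Try

object PatternCheck {
  // Same pattern string as in the Spark calls above.
  val Pattern = "MM/dd/yyyy HH:mm:ss"

  // Returns Some(timestamp) for well-formed input and None otherwise.
  def parse(s: String): Option[Timestamp] = {
    // SimpleDateFormat is not thread-safe, so build one per call.
    val format = new SimpleDateFormat(Pattern)
    format.setLenient(false) // reject out-of-range fields such as month 13
    Try(new Timestamp(format.parse(s).getTime)).toOption
  }

  def main(args: Array[String]): Unit = {
    println(parse("05/26/2016 01:01:01")) // Some(2016-05-26 01:01:01.0)
    println(parse("#$@#@#"))              // None
    println(parse(""))                    // None
  }
}
```

Note that SimpleDateFormat.parse accepts input with trailing characters after a valid prefix, so this is a quick sanity check of the pattern rather than a strict validator.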
