
Suppose I'm doing something like:

val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" -> "true"))
df.printSchema()

root
 |-- year: string (nullable = true)
 |-- make: string (nullable = true)
 |-- model: string (nullable = true)
 |-- comment: string (nullable = true)
 |-- blank: string (nullable = true)

df.show()
year make  model comment              blank
2012 Tesla S     No comment               
1997 Ford  E350  Go get one now th...  

But I really want the year as an Int (and perhaps to transform some other columns as well).

Any suggestions?

1 Answer


For Spark 1.4+:

Apply the cast method with a DataType on the column:

import org.apache.spark.sql.types.IntegerType

val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
  .drop("year")
  .withColumnRenamed("yearTmp", "year")

If you are using SQL expressions you can also do:

val df2 = df.selectExpr("cast(year as int) year",
                        "make",
                        "model",
                        "comment",
                        "blank")
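Equivalently, you can register the DataFrame as a temporary table and do the cast in plain SQL (a sketch using the Spark 1.x SQLContext API; the table name "cars" is illustrative):

```scala
// Register the DataFrame as a temporary table (Spark 1.x API)
df.registerTempTable("cars")

// Cast year to INT in SQL; the other columns pass through unchanged
val df2 = sqlContext.sql(
  "SELECT CAST(year AS INT) AS year, make, model, comment, blank FROM cars")
```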

If you need a reusable helper method, note that `type` is a reserved word in Scala and cannot be used as a parameter name, so rename it:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object DFHelper {
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame = {
    df.withColumn(cn, df(cn).cast(tpe))
  }
}

which is used like:

import DFHelper._

val df2 = castColumnTo(df, "year", IntegerType)
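Since the question also mentions transforming other columns, the same idea extends to several columns at once by folding the casts over the DataFrame (a sketch; the column/type pairs here are illustrative):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, IntegerType}

// Cast several columns in one pass by folding (name -> type) pairs over df
def castColumns(df: DataFrame, casts: Map[String, DataType]): DataFrame =
  casts.foldLeft(df) { case (acc, (cn, tpe)) =>
    acc.withColumn(cn, acc(cn).cast(tpe))
  }

val df2 = castColumns(df, Map("year" -> IntegerType))
```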

