For Spark version 1.4+:
Call the cast method on the column, passing the target DataType:
import org.apache.spark.sql.types.IntegerType
// In Scala a column is accessed as df("year"); df.year is PySpark syntax.
val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
  .drop("year")
  .withColumnRenamed("yearTmp", "year")
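The temporary column above is not strictly required: withColumn replaces an existing column when it is given that column's name. Here is a minimal sketch of the in-place variant, assuming Spark 2.0+ for SparkSession; the object name CastInPlace, the helper castYear, and the sample rows are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object CastInPlace {
  // Reusing the name "year" replaces the existing column instead of adding a new one.
  def castYear(df: DataFrame): DataFrame =
    df.withColumn("year", col("year").cast(IntegerType))

  def main(args: Array[String]): Unit = {
    // SparkSession requires Spark 2.0+; on 1.x a SQLContext would be used instead.
    val spark = SparkSession.builder().master("local[*]").appName("cast-in-place").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; "year" starts out as a string column.
    val df = Seq(("2012", "Tesla", "S"), ("1997", "Ford", "E350")).toDF("year", "make", "model")

    castYear(df).printSchema() // "year" should now be reported as integer
    spark.stop()
  }
}
```

Unlike the drop-and-rename approach, this should keep "year" in its original position in the schema.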
If you are using SQL expressions you can also do:
val df2 = df.selectExpr("cast(year as int) year",
"make",
"model",
"comment",
"blank")
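Listing every column by hand gets tedious on wide tables. One possible way around it is to build the select list from df.columns and cast only the column of interest. The helper name castInSelect and the object SelectCast are hypothetical, and the sketch assumes column names without dots or other characters that would need backtick quoting.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object SelectCast {
  // Cast the column named `cn` to the SQL type name `sqlType` (for example "int")
  // and pass every other column through unchanged, preserving column order.
  def castInSelect(df: DataFrame, cn: String, sqlType: String): DataFrame = {
    val cols = df.columns.map { c =>
      if (c == cn) col(c).cast(sqlType).as(c) else col(c)
    }
    df.select(cols: _*)
  }
}
```

With this in scope, SelectCast.castInSelect(df, "year", "int") would play the same role as the selectExpr call above.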
In case you need a helper method, use:
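import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object DFHelper {
  // "type" is a reserved word in Scala, so the parameter is named tpe.
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame = {
    df.withColumn(cn, df(cn).cast(tpe))
  }
}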
which is used like:
import DFHelper._
val df2 = castColumnTo( df, "year", IntegerType )
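If several columns need converting, the same idea can be folded over a map of column names to target types. This is only a sketch: the object MultiCast and the method castColumns are hypothetical names, not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object MultiCast {
  // Apply one cast per (column name -> target type) entry, threading the
  // intermediate DataFrame through foldLeft.
  def castColumns(df: DataFrame, targets: Map[String, DataType]): DataFrame =
    targets.foldLeft(df) { case (acc, (cn, tpe)) =>
      acc.withColumn(cn, acc(cn).cast(tpe))
    }
}
```

A call such as MultiCast.castColumns(df, Map("year" -> IntegerType)) would then mirror the single-column helper; further entries can be added to the map as needed.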