in Big Data Hadoop & Spark by (11.5k points)

Suppose I'm doing something like:

val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" -> "true"))

Calling df.printSchema() then shows that every column has been loaded as a string:

 |-- year: string (nullable = true)
 |-- make: string (nullable = true)
 |-- model: string (nullable = true)
 |-- comment: string (nullable = true)
 |-- blank: string (nullable = true)

year make  model comment              blank
2012 Tesla S     No comment               
1997 Ford  E350  Go get one now th...  

But I really want the year as an Int (and perhaps to transform some other columns as well).

Any suggestions?

1 Answer


For Spark version 1.4+:

Apply the cast method with a DataType on the column:

import org.apache.spark.sql.types.IntegerType

// df.year is not valid Scala; use df("year") to select the column.
// Drop the original column before renaming, otherwise you end up
// with two columns called "year".
val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
  .drop("year")
  .withColumnRenamed("yearTmp", "year")
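
Since withColumn overwrites an existing column when given the same name, the temporary column and rename can be skipped entirely. A shorter equivalent sketch:

```scala
import org.apache.spark.sql.types.IntegerType

// Replace "year" in place with its Int-cast version; withColumn
// overwrites an existing column when the names match.
val df2 = df.withColumn("year", df("year").cast(IntegerType))
```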

If you are using SQL expressions you can also do:

val df2 = df.selectExpr("cast(year as int) year",
  "make", "model", "comment", "blank")


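The same cast can also be expressed as a plain SQL query against a registered temp table. A sketch using the Spark 1.x API (registerTempTable was replaced by createOrReplaceTempView in 2.x):

```scala
// Register the DataFrame as a temp table so it can be queried with SQL.
df.registerTempTable("cars")

val df2 = sqlContext.sql(
  "select cast(year as int) as year, make, model, comment, blank from cars")
```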

In case you need a helper method, use:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object DFHelper {
  // Cast column `cn` of `df` to the given DataType, keeping the column name.
  // Note: `type` is a reserved word in Scala, so the parameter is named `tpe`.
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame = {
    df.withColumn(cn, df(cn).cast(tpe))
  }
}

which is used like:

import DFHelper._

val df2 = castColumnTo( df, "year", IntegerType )
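
Since the question also mentions transforming other columns, the helper can be folded over a list of (column, type) pairs to cast several columns in one pass. A sketch, assuming the DFHelper object above:

```scala
import org.apache.spark.sql.types.{DataType, IntegerType}

// Hypothetical: list each column to cast together with its target type,
// then thread the DataFrame through castColumnTo with foldLeft.
val casts: Seq[(String, DataType)] = Seq("year" -> IntegerType)
val df2 = casts.foldLeft(df) { case (acc, (cn, t)) =>
  DFHelper.castColumnTo(acc, cn, t)
}
```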
