Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)
edited by

In spark SQL (perhaps only HiveQL) one can do:

select sex, avg(age) as avg_age
from humans
group by sex


which would result in a DataFrame with columns named "sex" and "avg_age".

How can avg(age) be aliased to "avg_age" without using textual SQL?

1 Answer

0 votes
by (32.3k points)

In order to rename a single column I would suggest you to use withColumnRenamed method:

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(

    Person("Alice", 2) :: Person("Bob", 5) :: Nil) 

df.withColumnRenamed("name", "first_name")

Alternatively approach will be to use alias method:

import org.apache.spark.sql.functions.avg

df.select(avg($"age").alias("average_age")) 

You can take it further with small helper:

import org.apache.spark.sql.Column

def normalizeName(c: Column) = {

  val pattern = "\\W+".r

  c.alias(pattern.replaceAllIn(c.toString, "_"))

}

df.select(normalizeName(avg($"age")))

You can learn in-depth about SQL statements, queries and become proficient in SQL queries by enrolling in our industry-recognized Microsoft SQL Certification.

...