0 votes
1 view
in Big Data Hadoop & Spark by (11.5k points)
edited by

In spark SQL (perhaps only HiveQL) one can do:

select sex, avg(age) as avg_age
from humans
group by sex


which would result in a DataFrame with columns named "sex" and "avg_age".

How can avg(age) be aliased to "avg_age" without using textual SQL?

1 Answer

0 votes
by (32.3k points)

In order to rename a single column I would suggest you to use withColumnRenamed method:

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(

    Person("Alice", 2) :: Person("Bob", 5) :: Nil) 

df.withColumnRenamed("name", "first_name")

Alternatively approach will be to use alias method:

import org.apache.spark.sql.functions.avg

df.select(avg($"age").alias("average_age")) 

You can take it further with small helper:

import org.apache.spark.sql.Column

def normalizeName(c: Column) = {

  val pattern = "\\W+".r

  c.alias(pattern.replaceAllIn(c.toString, "_"))

}

df.select(normalizeName(avg($"age")))

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...