in Big Data Hadoop & Spark by (11.4k points)

I need the resulting DataFrame from the line below to have the alias "maxDiff" for the max('diff') column after groupBy. However, the line below neither makes any change nor throws an error.

 grpdf = joined_df.groupBy(temp1.datestamp).max('diff').alias("maxDiff")

1 Answer


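This happens because you are aliasing the whole DataFrame object rather than a particular column: max('diff') on grouped data returns a DataFrame, and DataFrame.alias() names that DataFrame without renaming any of its columns. Here is an example of how to alias only the column: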

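import pyspark.sql.functions as func

# max('diff') produces a column named "max(diff)"; select it and rename it.
# Note: this select keeps only the renamed column.
grpdf = joined_df \
    .groupBy(temp1.datestamp) \
    .max('diff') \
    .select(func.col("max(diff)").alias("maxDiff"))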
