in Big Data Hadoop & Spark by (11.5k points)

I need the data frame produced by the line below to have the alias "maxDiff" for the max('diff') column after the groupBy. However, the line below neither makes any change nor throws an error.

 grpdf = joined_df.groupBy(temp1.datestamp).max('diff').alias("maxDiff")

1 Answer

by (32.2k points)

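This happens because you are aliasing the whole DataFrame object, not a particular column. DataFrame.alias() is a valid call (it names the DataFrame itself, which is useful in self-joins), so no error is raised, but the aggregated column is still called max(diff). Here is an example of how to alias only the column: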

import pyspark.sql.functions as func

grpdf = joined_df \
    .groupBy(temp1.datestamp) \
    .max('diff') \
    .select(func.col("max(diff)").alias("maxDiff"))
