I am trying to build, for each of my users, a vector containing the average number of records per hour of the day. Hence the vector needs to have 24 dimensions.
My original DataFrame has userID and hour columns, and I am starting by doing a groupBy and counting the number of records per user per hour as follows:
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"col" syntax; assumes a SparkSession named spark

val hourFreqDF = df.groupBy("userID", "hour").agg(count("*") as "hfreq")
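For reference, this gives one row per (userID, hour) pair; the exact column types depend on my original df:

hourFreqDF.printSchema()
// root
//  |-- userID: ... (same type as in df)
//  |-- hour: ... (same type as in df)
//  |-- hfreq: long (nullable = false)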
Now, in order to generate a vector per user I am doing this:
val hours = (0 to 23).map(_.toString).toArray // column names "0" through "23"
import org.apache.spark.ml.feature.VectorAssembler

val assembler = new VectorAssembler()
  .setInputCols(hours)
  .setOutputCol("hourlyConnections")
// One avg(when(...)) expression per hour-of-day column:
val exprs = hours.map(c => avg(when($"hour" === c, $"hfreq").otherwise(lit(0))).alias(c))
val transformed = assembler.transform(
  hourFreqDF.groupBy($"userID").agg(exprs.head, exprs.tail: _*))
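As a sanity check (just an inspection step, not part of the pipeline), each user should now end up with a 24-dimensional vector:

transformed.select("userID", "hourlyConnections").show(5, truncate = false)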
When I run this, I get the following warning:
Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
I presume this is because the 24 generated avg(when(...)) expressions make the query plan string very long?
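For what it's worth, the warning itself says the limit is adjustable. As far as I understand, it can be raised when the session is created (the appName below is just a placeholder, and I believe the key was renamed to spark.sql.debug.maxToStringFields in Spark 3.x):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hourly-vectors") // placeholder name
  .config("spark.debug.maxToStringFields", "100")
  .getOrCreate()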
But my actual question is: can I safely ignore this warning?