Broadcast Hash Joins:
In Spark SQL, you can see which type of join is being performed by inspecting queryExecution.executedPlan on the joined DataFrame. As with core Spark, if one of the tables is much smaller than the other, you may want a broadcast hash join. You can hint to Spark SQL that a given DataFrame should be broadcast for the join by calling the broadcast function (from org.apache.spark.sql.functions) on that DataFrame before joining it.
Example: largedataframe.join(broadcast(smalldataframe), "key")
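The hint and the plan inspection described above can be tried together in a small local program. This is only a sketch: it assumes Spark 2.0 or later (it uses SparkSession rather than sqlContext), and the object name, DataFrame names, and sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical object name; any Spark application entry point would do.
object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.0+ session, for illustration only.
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: a larger table and a much smaller lookup table.
    val largeDf = spark.range(0L, 1000000L).withColumnRenamed("id", "key")
    val smallDf = Seq((1L, "a"), (2L, "b")).toDF("key", "label")

    // Hint that the small side should be broadcast to every executor.
    val joined = largeDf.join(broadcast(smallDf), "key")

    // Print the physical plan; look for a BroadcastHashJoin node in it.
    println(joined.queryExecution.executedPlan)

    spark.stop()
  }
}
```

On recent Spark versions the printed plan may be wrapped in an adaptive-execution node, but the join strategy should still be visible inside it.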
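Is there a way to force a broadcast regardless of the spark.sql.autoBroadcastJoinThreshold setting?
Note that the command below sets the threshold to -1, which disables automatic broadcast joins rather than forcing one; an explicit broadcast hint is still honored with this setting: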
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold = -1")
Another way to hint that a DataFrame should be broadcast is to wrap it in broadcast in the join call:
left.join(broadcast(right), ...)
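To check how the threshold setting and the explicit hint interact, one option is to compare the physical plans of a plain join and a hinted join with automatic broadcasting turned off. The sketch below assumes Spark 2.0 or later and a local session; the object name, column names, and data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical object name, used only for this illustration.
object ThresholdVersusHint {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("threshold-vs-hint")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // -1 turns off size-based automatic broadcasting.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Illustrative data only.
    val left = Seq((1, "x"), (2, "y"), (3, "z")).toDF("key", "leftValue")
    val right = Seq((1, "a"), (2, "b")).toDF("key", "rightValue")

    // Render both physical plans as text so they can be searched.
    val plainPlan = left.join(right, "key").queryExecution.executedPlan.toString
    val hintedPlan = left.join(broadcast(right), "key").queryExecution.executedPlan.toString

    // Report whether each plan mentions a broadcast hash join.
    println(s"plain join uses broadcast:  ${plainPlan.contains("BroadcastHashJoin")}")
    println(s"hinted join uses broadcast: ${hintedPlan.contains("BroadcastHashJoin")}")

    spark.stop()
  }
}
```

With automatic broadcasting disabled, one would expect only the hinted join to report a broadcast hash join, but the exact plan depends on the Spark version and join type, so it is worth confirming on your own cluster.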