0 votes
in Big Data Hadoop & Spark by (11.4k points)

I am trying to do a left outer join in Spark (1.6.2) and it doesn't work. My SQL query looks like this:

sqlContext.sql("""select t.type, t.uuid, p.uuid
from symptom_type t LEFT JOIN plugin p
ON t.uuid = p.uuid
where t.created_year = 2016
and p.created_year = 2016""").show()

The result is like this:

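+--------------------+--------------------+--------------------+
|                type|                uuid|                uuid|
+--------------------+--------------------+--------------------+
|              tained|89759dcc-50c0-490...|89759dcc-50c0-490...|
|             swapper|740cd0d4-53ee-438...|740cd0d4-53ee-438...|
+--------------------+--------------------+--------------------+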

I get the same result whether I use LEFT JOIN or LEFT OUTER JOIN (the second uuid is never null).

I would expect the second uuid column to contain nulls. How do I do a left outer join correctly?

1 Answer

0 votes
by (32.3k points)

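I don't see any issues with the join syntax itself. Please check the data again: the rows you are showing are all matches. Note that the WHERE clause also filters on p.created_year = 2016. For an unmatched row every column of p is null, so that condition removes the row and the result looks like an inner join. Conditions on the right-hand table belong in the ON clause if unmatched rows should be kept.

Try performing the join with the DataFrame API: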

// Left outer join explicit

df1.join(df2, df1("col1") === df2("col1"), "left_outer")
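A fuller sketch of the same call, assuming Spark 1.6-style APIs (SQLContext, a local master) and made-up sample rows; the object name, the sample values, and the helper leftJoin are hypothetical and only there for illustration. Each side is filtered on created_year before the join, so the year condition cannot discard left rows that have no match:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object LeftOuterJoinDataFrameSketch {

  // Builds hypothetical sample data and returns the left outer join result.
  def leftJoin(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._

    // Assumed sample rows: uuid "u2" has no counterpart in plugin.
    val symptomType = Seq(("kind-a", "u1", 2016), ("kind-b", "u2", 2016))
      .toDF("type", "uuid", "created_year")
    val plugin = Seq(("u1", 2016)).toDF("uuid", "created_year")

    // Filter each side first, then alias so the two uuid columns can be told apart.
    val t = symptomType.filter($"created_year" === 2016).as("t")
    val p = plugin.filter($"created_year" === 2016).as("p")

    t.join(p, $"t.uuid" === $"p.uuid", "left_outer")
      .select($"t.type", $"t.uuid", $"p.uuid")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("left-outer-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      leftJoin(new SQLContext(sc)).show()
    } finally {
      sc.stop()
    }
  }
}
```

With this sample data the row for "u2" should survive the join with a null in the second uuid column.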

In SQL you can also write LEFT OUTER JOIN instead of LEFT JOIN, although Spark SQL treats the two as equivalent. For more information, see the Spark documentation.
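For the SQL form of the query in the question, one thing worth trying is moving the condition on p out of WHERE and into ON. A WHERE condition is applied after the join, and a null created_year never equals 2016, so unmatched rows are dropped there. The sketch below assumes Spark 1.6-style APIs and hypothetical sample rows; the object name and the helper leftJoin are not from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object LeftOuterJoinSqlSketch {

  // Registers hypothetical sample tables and runs the rewritten query.
  def leftJoin(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._

    // Assumed sample rows: uuid "u2" has no counterpart in plugin.
    // registerTempTable is the Spark 1.6 call (deprecated in later versions).
    Seq(("kind-a", "u1", 2016), ("kind-b", "u2", 2016))
      .toDF("type", "uuid", "created_year")
      .registerTempTable("symptom_type")
    Seq(("u1", 2016))
      .toDF("uuid", "created_year")
      .registerTempTable("plugin")

    // The filter on p lives in ON, so it limits which plugin rows can match
    // without removing symptom_type rows that have no match.
    sqlContext.sql(
      """SELECT t.type, t.uuid, p.uuid
        |FROM symptom_type t LEFT OUTER JOIN plugin p
        |  ON t.uuid = p.uuid AND p.created_year = 2016
        |WHERE t.created_year = 2016""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("left-outer-join-sql-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      leftJoin(new SQLContext(sc)).show()
    } finally {
      sc.stop()
    }
  }
}
```

The condition on t stays in WHERE because filtering the left-hand table there does not change which left rows count as unmatched.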
