in Big Data Hadoop & Spark by (11.4k points)

I am trying to do a left outer join in Spark (1.6.2), and it doesn't work. My SQL query looks like this:

sqlContext.sql("""select t.type, t.uuid, p.uuid
from symptom_type t LEFT JOIN plugin p
ON t.uuid = p.uuid
where t.created_year = 2016
and p.created_year = 2016""").show()

The result looks like this:

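+--------------------+--------------------+--------------------+
|                type|                uuid|                uuid|
+--------------------+--------------------+--------------------+
|              tained|89759dcc-50c0-490...|89759dcc-50c0-490...|
|             swapper|740cd0d4-53ee-438...|740cd0d4-53ee-438...|
+--------------------+--------------------+--------------------+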

I get the same result with either LEFT JOIN or LEFT OUTER JOIN (the second uuid is never null).

I expected the second uuid column to be null for symptom types that have no matching plugin. How do I do a left outer join correctly?

1 Answer

by (32.3k points)

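The join syntax itself is fine, and the rows you are showing are genuine matches. The problem is the condition p.created_year = 2016 in the WHERE clause. WHERE is evaluated after the join, and for a symptom_type row with no matching plugin, p.created_year is NULL. NULL = 2016 is not true, so exactly the rows that the left join added are filtered out again, and the query behaves like an inner join. Move that condition into the ON clause (ON t.uuid = p.uuid AND p.created_year = 2016) and keep only t.created_year = 2016 in WHERE; then the p condition only restricts which plugin rows can match, and unmatched symptom types come back with a null p.uuid.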
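To see why the placement of the filter matters, here is a minimal plain-Scala sketch (no Spark required) that models SQL's LEFT OUTER JOIN with Option, where None stands in for NULL. The case classes SymptomType and Plugin, the object LeftJoinFilterDemo, and the extra "orphan" row are hypothetical, made up only for illustration; this models the SQL semantics and is not how Spark implements joins.

```scala
// Hypothetical row types standing in for the symptom_type and plugin tables.
case class SymptomType(tpe: String, uuid: String, createdYear: Int)
case class Plugin(uuid: String, createdYear: Int)

object LeftJoinFilterDemo {
  // LEFT OUTER JOIN model: every left row is kept; when no right row satisfies
  // the ON predicate, the right side is None (standing in for SQL NULL).
  def leftOuterJoin[L, R](left: Seq[L], right: Seq[R])(on: (L, R) => Boolean): Seq[(L, Option[R])] =
    left.flatMap { l =>
      val matches = right.filter(r => on(l, r))
      if (matches.isEmpty) Seq(l -> Option.empty[R])
      else matches.map(r => l -> Option(r))
    }

  val symptoms = Seq(
    SymptomType("tained", "a", 2016),
    SymptomType("swapper", "b", 2016),
    SymptomType("orphan", "c", 2016) // hypothetical row with no matching plugin
  )
  val plugins = Seq(Plugin("a", 2016), Plugin("b", 2016))

  // Condition on p placed in WHERE: applied after the join. For an unmatched
  // row p is None (NULL), and p.exists(...) is false, so the row is dropped,
  // just as NULL = 2016 is not true in SQL.
  def filterInWhere: Seq[(SymptomType, Option[Plugin])] =
    leftOuterJoin(symptoms, plugins)((t, p) => t.uuid == p.uuid)
      .filter { case (t, p) => t.createdYear == 2016 && p.exists(_.createdYear == 2016) }

  // Condition on p placed in ON: it only decides which plugin rows match,
  // so unmatched symptom rows survive with p = None.
  def filterInOn: Seq[(SymptomType, Option[Plugin])] =
    leftOuterJoin(symptoms, plugins)((t, p) => t.uuid == p.uuid && p.createdYear == 2016)
      .filter { case (t, _) => t.createdYear == 2016 }

  def main(args: Array[String]): Unit = {
    println(filterInWhere.map { case (t, p) => (t.tpe, p.map(_.uuid)) })
    println(filterInOn.map { case (t, p) => (t.tpe, p.map(_.uuid)) })
  }
}
```

Only the second version keeps the orphan row with a missing plugin. In Spark SQL terms it corresponds to writing the join as LEFT JOIN plugin p ON t.uuid = p.uuid AND p.created_year = 2016, with only t.created_year = 2016 left in the WHERE clause.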

You can also express the join with the DataFrame API:

// Explicit left outer join (df1 and df2 are your two DataFrames)

df1.join(df2, df1("col1") === df2("col1"), "left_outer")

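Note that LEFT JOIN and LEFT OUTER JOIN are synonyms in Spark SQL, so switching keywords will not change the result; what matters is whether the condition on the right-hand table sits in ON or in WHERE. For more information, see the Spark SQL documentation.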

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

30.5k questions

32.6k answers


108k users

Browse Categories