0 votes
in Big Data Hadoop & Spark by (11.4k points)

I would like to include null values in an Apache Spark join. By default, Spark leaves out rows whose join key is null.

Here is the default Spark behavior.

val numbersDf = Seq(
  ("123"),
  ("456"),
  (null),
  ("")
).toDF("numbers")

val lettersDf = Seq(
  ("123", "abc"),
  ("456", "def"),
  (null, "zzz"),
  ("", "hhh")
).toDF("numbers", "letters")

val joinedDf = numbersDf.join(lettersDf, Seq("numbers"))

Here is the output of joinedDf.show():

+-------+-------+
|numbers|letters|
+-------+-------+
|    123|    abc|
|    456|    def|
|       |    hhh|
+-------+-------+

This is the output I would like:

+-------+-------+
|numbers|letters|
+-------+-------+
|    123|    abc|
|    456|    def|
|       |    hhh|
|   null|    zzz|
+-------+-------+

1 Answer

0 votes
by (32.3k points)

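I would suggest using the null-safe equality operator (<=>) in the join condition. Unlike a plain equality check, it treats two null values as equal, so rows with a null key are matched instead of dropped.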

I tried the approach and it worked.
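
For example, here is a minimal sketch of such a join using the sample data from the question. It assumes a local SparkSession; the object name NullSafeJoinExample and the helper joinIncludingNulls are just illustrative names, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullSafeJoinExample {

  // Hypothetical helper: inner-joins two DataFrames on `key` using the
  // null-safe equality operator, then drops the right-hand copy of the key.
  def joinIncludingNulls(left: DataFrame, right: DataFrame, key: String): DataFrame =
    left
      .join(right, left(key) <=> right(key))
      .drop(right(key))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-safe-join")
      .getOrCreate()
    import spark.implicits._

    val numbersDf = Seq("123", "456", null, "").toDF("numbers")
    val lettersDf = Seq(
      ("123", "abc"),
      ("456", "def"),
      (null, "zzz"),
      ("", "hhh")
    ).toDF("numbers", "letters")

    // The row with a null key is now matched with (null, "zzz").
    joinIncludingNulls(numbersDf, lettersDf, "numbers").show()

    spark.stop()
  }
}
```

Because the join condition is an expression rather than Seq("numbers"), both key columns end up in the result, which is why the right-hand one is dropped. If you prefer a method call over the operator, Column also provides eqNullSafe, which does the same thing as <=>.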


