Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

root
 |-- user_id: long (nullable = false)
 |-- event_id: long (nullable = false)
 |-- invited: integer (nullable = false)
 |-- day_diff: long (nullable = true)
 |-- interested: integer (nullable = false)
 |-- event_owner: long (nullable = false)
 |-- friend_id: long (nullable = false)

And the data is shown below:

+----------+----------+-------+--------+----------+-----------+---------+
|   user_id|  event_id|invited|day_diff|interested|event_owner|friend_id|
+----------+----------+-------+--------+----------+-----------+---------+
|   4236494| 110357109|      0|      -1|         0|  937597069|     null|
|  78065188| 498404626|      0|       0|         0| 2904922087|     null|
| 282487230|2520855981|      0|      28|         0| 3749735525|     null|
| 335269852|1641491432|      0|       2|         0| 1490350911|     null|
| 437050836|1238456614|      0|       2|         0|  991277599|     null|
| 447244169|2095085551|      0|      -1|         0| 1579858878|     null|
| 516353916|1076364848|      0|       3|         1| 3597645735|     null|
| 528218683|1151525474|      0|       1|         0| 3433080956|     null|
| 531967718|3632072502|      0|       1|         0| 3863085861|     null|
| 627948360|2823119321|      0|       0|         0| 4092665803|     null|
| 811791433|3513954032|      0|       2|         0|  415464198|     null|
| 830686203|  99027353|      0|       0|         0| 3549822604|     null|
|1008893291|1115453150|      0|       2|         0| 2245155244|     null|
|1239364869|2824096896|      0|       2|         1| 2579294650|     null|
|1287950172|1076364848|      0|       0|         0| 3597645735|     null|
|1345896548|2658555390|      0|       1|         0| 2025118823|     null|
|1354205322|2564682277|      0|       3|         0| 2563033185|     null|
|1408344828|1255629030|      0|      -1|         1|  804901063|     null|
|1452633375|1334001859|      0|       4|         0| 1488588320|     null|
|1625052108|3297535757|      0|       3|         0| 1972598895|     null|
+----------+----------+-------+--------+----------+-----------+---------+

 

I want to filter out the rows have null values in the field of "friend_id".

scala> val aaa = test.filter("friend_id is null")

scala> aaa.count
I got :res52: Long = 0 which is obvious not right. What is the right way to get it?

1 Answer

0 votes
by (32.3k points)

To create the filter condition that you are looking for, follow the example below:

val df = spark.createDataFrame(Seq(

  (0, "a1", "b1", "c1", "d1"),

  (1, "a2", "b2", "c2", "d2"),

  (2, "a3", "b3", null, "d3"),

  (3, "a4", null, "c4", "d4"),

  (4, null, "b5", "c5", "d5")

)).toDF("id", "col1", "col2", "col3", "col4")

+---+----+----+----+----+

| id|col1|col2|col3|col4|

+---+----+----+----+----+

|  0| a1|  b1| c1| d1|

|  1| a2|  b2| c2| d2|

|  2| a3|  b3|null| d3|

|  3| a4|null|  c4| d4|

|  4|null|  b5| c5| d5|

+---+----+----+----+----+

Browse Categories

...