
I have the following sample DataFrame:

a    | b    | c   |
1    | 2    | 4   |
0    | null | null|
null | 3    | 4   |


And I want to replace the null values only in the first two columns, "a" and "b":

a    | b    | c   |
1    | 2    | 4   |
0    | 0    | null|
0    | 3    | 4   |

 

Here is the code that creates the sample DataFrame:

# sc and sqlContext are the SparkContext and SQLContext provided by the pyspark shell
rdd = sc.parallelize([(1, 2, 4), (0, None, None), (None, 3, 4)])
df2 = sqlContext.createDataFrame(rdd, ["a", "b", "c"])


I know how to replace all null values using:

df2 = df2.fillna(0)


And when I try this, I lose the third column:

df2 = df2.select(df2.columns[0:2]).fillna(0)

1 Answer


First, create your DataFrame:

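A minimal sketch of that step, assuming a SparkSession named spark (any recent pyspark shell provides one; the sc/sqlContext code from the question works just as well, and the app name below is arbitrary):

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; in the pyspark shell `spark` already exists
spark = SparkSession.builder.appName("fillna-example").getOrCreate()

# Recreate the sample DataFrame from the question
df2 = spark.createDataFrame(
    [(1, 2, 4), (0, None, None), (None, 3, 4)],
    ["a", "b", "c"]
)
df2.show()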

Now, to replace the null values only in the first two columns, "a" and "b", without losing the third column, pass a dict that maps each column name to its replacement value:

df2.fillna({'a': 0, 'b': 0})

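fillna also accepts a subset argument that does the same thing; here is a quick sketch using the DataFrame built above (remember to assign the result, since fillna returns a new DataFrame):

# Fill nulls with 0 only in columns "a" and "b"; column "c" keeps its null
df2 = df2.fillna(0, subset=["a", "b"])
df2.show()
# +---+---+----+
# |  a|  b|   c|
# +---+---+----+
# |  1|  2|   4|
# |  0|  0|null|
# |  0|  3|   4|
# +---+---+----+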

