0 votes
4 views
in Data Science by (17.6k points)

I have two data frames df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) which is the difference between the two data frames?

In other words, a data frame that has all the rows/columns in df1 that are not in df2?


2 Answers

0 votes
by (36.8k points)

You can use drop_duplicates with keep=False:

pd.concat([df1,df2]).drop_duplicates(keep=False)

This works only when neither data frame contains duplicate rows of its own, because keep=False removes every row that occurs more than once. For example:

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})

df2=pd.DataFrame({'A':[1],'B':[2]})

Wrong output:

pd.concat([df1, df2]).drop_duplicates(keep=False)

Out[655]: 
   A  B
1  2  3

Correct output:

Out[656]: 
   A  B
1  2  3
2  3  4
3  3  4
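To see why the first result is wrong, here is a minimal runnable sketch using the same df1 and df2 as above. The names combined, dup_mask and result are only illustrative. It shows which rows pandas treats as duplicates once the two frames are stacked: the repeated (3, 4) rows from df1 are flagged along with the (1, 2) row shared with df2.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Stack both frames; the original index labels are kept (0, 1, 2, 3, 0).
combined = pd.concat([df1, df2])

# keep=False marks every member of a duplicate group, not just the repeats.
dup_mask = combined.duplicated(keep=False)

# Equivalent to combined.drop_duplicates(keep=False).
result = combined[~dup_mask]
print(result)
```

Only the (2, 3) row survives, which is why this approach should be limited to frames whose rows are already unique.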

There are two ways to get the correct result:

Method 1: Use isin on row tuples

df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]

Out[657]: 
   A  B
1  2  3
2  3  4
3  3  4
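One thing to watch with the tuple approach is that rows are compared position by position, so both frames need their columns in the same order. A small sketch of a helper that handles this is below. The function name rows_not_in is hypothetical, and it assumes df2 contains every column of df1.

```python
import pandas as pd

def rows_not_in(left, right):
    """Return the rows of `left` whose values do not appear as a row of `right`."""
    # Reorder right's columns to match left so tuples line up field by field.
    right = right[left.columns]
    left_rows = left.apply(tuple, axis=1)
    right_rows = right.apply(tuple, axis=1)
    return left[~left_rows.isin(right_rows)]

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
# Columns deliberately listed in a different order than in df1.
df2 = pd.DataFrame({'B': [2], 'A': [1]})

df3 = rows_not_in(df1, df2)
print(df3)
```

Because the result is selected from df1 with a boolean mask, it keeps df1's original index labels and its duplicate rows.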

Method 2: Use merge with indicator=True

df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']

Out[421]: 
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only
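If you do not want the helper _merge column in the final frame, you can drop it after filtering. The sketch below uses the same df1 and df2 as the example above, and the names merged and df3 are only illustrative. Note that merge builds a new index, so the row labels in the result are positions in the merged frame rather than df1's original labels.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# With no `on` argument, merge joins on the columns the frames share (A and B).
merged = df1.merge(df2, how='left', indicator=True)

# Keep rows found only in df1, then remove the indicator column.
df3 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(df3)
```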


0 votes
by (140 points)
I would like to learn how to get the result