in Data Science by (17.6k points)

I have two data frames, df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) that is the difference between the two data frames?

In other words, how do I get a data frame with all the rows of df1 that are not in df2?


2 Answers

by (36.8k points)

You can use drop_duplicates:

import pandas as pd

pd.concat([df1, df2]).drop_duplicates(keep=False)

However, this only works when neither data frame contains duplicate rows, because keep=False drops every copy of a repeated row. For example:

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})

df2=pd.DataFrame({'A':[1],'B':[2]})

Wrong output:

pd.concat([df1, df2]).drop_duplicates(keep=False)

Out[655]:
   A  B
1  2  3

Correct output (the rows of df1 that are not in df2):

Out[656]:
   A  B
1  2  3
2  3  4
3  3  4
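A minimal, self-contained sketch reproducing the failure, using the df1 and df2 from this example. It shows that the (3, 4) rows disappear because they are duplicated inside df1 itself, not because they appear in df2:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# keep=False removes EVERY copy of a duplicated row, not just the extra copies
wrong = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(wrong)  # only the (2, 3) row survives

# The (3, 4) rows were removed because they are duplicated *within* df1,
# even though they never appear in df2
print(df1.duplicated(keep=False).tolist())  # [False, False, True, True]
```

So a quick check such as df1.duplicated().any() tells you whether the concat approach is safe for your data.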

There are two ways to get the correct output:

Method 1: convert each row to a tuple and use isin

df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]

Out[657]:
   A  B
1  2  3
2  3  4
3  3  4
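A runnable version of Method 1, assuming df1 and df2 have the same columns in the same order (each row becomes a tuple of its values, so column order matters). As an alternative that avoids a Python-level apply, the sketch also builds a MultiIndex from each frame with pd.MultiIndex.from_frame and uses its isin; this variant is my substitution, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Method 1 as written: turn each row into a tuple (axis=1) and test membership
mask = df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))
df3 = df1[~mask]

# Variant: treat each row as a MultiIndex entry and test membership vectorised
mask_mi = pd.MultiIndex.from_frame(df1).isin(pd.MultiIndex.from_frame(df2))
df3_mi = df1[~mask_mi]

print(df3)
print(df3.equals(df3_mi))  # True
```

Both keep df1's original index labels (1, 2, 3 here), which is handy if you later need to look the rows up in df1.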

Method 2: merge with indicator=True and keep the rows that are not in both frames

df1.merge(df2, indicator=True, how='left').loc[lambda x: x['_merge'] != 'both']

Out[421]:
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only
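A sketch of Method 2 wrapped in a small helper that also drops the helper _merge column. The name anti_join is hypothetical, chosen here only for illustration; merge without an on= argument joins on all columns the two frames share:

```python
import pandas as pd

def anti_join(left, right):
    """Hypothetical helper: rows of `left` with no matching row in `right`."""
    merged = left.merge(right, how="left", indicator=True)
    # '_merge' is 'left_only' for rows of `left` that found no match in `right`
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

df3 = anti_join(df1, df2)
print(df3)
```

Note that merge returns a fresh 0..n-1 index. It matches df1's labels here only because df1 has a default index and each df1 row matches at most one df2 row; if df2 contains repeated rows, the matched rows are multiplied and the labels shift.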

by (140 points)
I would like to learn how to get the result
