0 votes
in Python by (19.9k points)

I'm new to pandas, and I'm having trouble finding the difference between two lists stored in the columns of a pandas DataFrame.

Example Input with ; separator:

ColA; ColB  

A,B,C,D; B,C,D  

A,C,E,F; A,C,F  

Expected Output:

ColA; ColB; ColC  

A,B,C,D; B,C,D; A  

A,C,E,F; A,C,F; E  

What I want to do is similar to:

df['ColC'] = np.setdiff1d( df['ColA'].str.split(','), df['ColB'].str.split(','))

But it raises an error:

raise ValueError('Length of values does not match length of index',data,index,len(data),len(index))

Kindly advise

1 Answer

0 votes
by (25.1k points)

You can apply a lambda function to get the difference, like this:

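import pandas as pd

# Each cell holds a Python list
df = pd.DataFrame([[['A', 'B', 'C', 'D'], ['B', 'C']]], columns=['ColA', 'ColB'])

# Row by row, keep the items of ColA that are not in ColB (label-based access avoids deprecated positional indexing)
df['ColC'] = df[['ColA', 'ColB']].apply(lambda x: [i for i in x['ColA'] if i not in x['ColB']], axis=1)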
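If your cells are comma-separated strings, as in the question's sample data, rather than Python lists, the same row-wise idea can be sketched as below. The likely reason the original attempt fails is that np.setdiff1d compares its two arguments as whole arrays instead of row by row, so the result does not line up with the DataFrame index. The helper name row_difference is made up for this illustration, and the sketch assumes the values contain no spaces around the commas.

```python
import pandas as pd

# Sample data from the question: each cell is a comma-separated string.
df = pd.DataFrame({
    "ColA": ["A,B,C,D", "A,C,E,F"],
    "ColB": ["B,C,D", "A,C,F"],
})

def row_difference(row):
    """Return the items of ColA that are absent from ColB, keeping ColA's order.

    Hypothetical helper for illustration; assumes no whitespace around commas.
    """
    exclude = set(row["ColB"].split(","))
    kept = [item for item in row["ColA"].split(",") if item not in exclude]
    return ",".join(kept)

# axis=1 passes one row at a time to the helper
df["ColC"] = df.apply(row_difference, axis=1)
print(df)
```

Using a set for the ColB items keeps the membership test cheap when the lists are long, while the list comprehension preserves the original order of ColA.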
