I am looking for a pythonic way to handle the following problem.

The pandas.get_dummies() function is a great way to create dummy variables from a categorical column of a DataFrame. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.

Now I need to handle a different situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, one per distinct string, but I only want 4 of them (A, B, C, D), so that a row such as 'A*C' can have multiple 1s.
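
To make the problem concrete, here is a minimal sketch of the behavior described above. The DataFrame `df` and its contents are assumed for illustration, built from the example values in the question; `dtype=int` is passed only so the output shows 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical data built from the values mentioned in the question
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# pd.get_dummies treats every distinct string as its own category,
# so 'A*C' and 'C*D' each get a separate column
naive = pd.get_dummies(df["label"], dtype=int)

print(naive.columns.tolist())
print(naive.shape[1])  # 6 columns instead of the desired 4
```

Each row here contains exactly one 1, which is why the combined labels are not spread across the A, B, C and D columns.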

Is there a pythonic way to do this? I could only think of a step-by-step algorithm, but that would not use get_dummies(). Thanks.

1 Answer

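Series.str.get_dummies() does this in one line: it splits each string on a separator and creates one indicator column per distinct token. See the documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html

In the example below the labels are stored as tuples, so they are first joined into a single string with '*' as the separator. If the column already holds strings such as 'A*C', the str.join() step is not needed.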

In [4]: df
Out[4]:
       label
0  (a, c, e)
1     (a, d)
2       (b,)
3     (d, e)

In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
Out[5]:
   a  b  c  d  e
0  1  0  1  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
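
Applied to the column from the question, the same idea might look like the sketch below. The DataFrame `df` is an assumption reconstructed from the example values in the question, and joining the dummies back onto `df` is an optional extra step, not something the question asked for.

```python
import pandas as pd

# Hypothetical data built from the values mentioned in the question
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Split each label on '*' and create one indicator column per token
dummies = df["label"].str.get_dummies(sep="*")
print(dummies)

# Optionally attach the indicator columns to the original frame
result = df.join(dummies)
print(result)
```

This yields four columns (A, B, C, D), and the rows for 'A*C' and 'C*D' each carry two 1s. Note that the default separator of Series.str.get_dummies() is '|', so `sep='*'` has to be passed explicitly.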
