in Data Science by (50.2k points)

I am looking for a pythonic way to handle the following problem.

The pandas.get_dummies() method is great for creating dummy variables from a categorical column of a dataframe. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.

Now, I need to handle this situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, one per distinct string, but I only want 4 of them (A, B, C, D), so that a row such as 'A*C' has multiple 1s.

Is there a way to handle this in a pythonic way? I could only think of some step-by-step algorithm to get it, but that would not include get_dummies(). Thanks

1 Answer

by (108k points)

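There is a one-liner for this: Series.str.get_dummies() splits each string on a separator and creates one dummy column per distinct token, so a row can have multiple 1s. It is documented here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html

In the example below the column holds tuples, so the elements are first joined into a '*'-delimited string with str.join() and then split into dummies: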

In [4]: df
Out[4]:
       label
0  (a, c, e)
1     (a, d)
2       (b,)
3     (d, e)

In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
Out[5]:
   a  b  c  d  e
0  1  0  1  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
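
For the 'label' column described in the question, the values are already '*'-delimited strings, so the str.join() step should not be needed. A minimal sketch, assuming a DataFrame built from the example values in the question (the variable names df and dummies are only illustrative):

```python
import pandas as pd

# Hypothetical data: the example values from the question.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Split each string on '*' and create one 0/1 column per distinct token.
dummies = df["label"].str.get_dummies(sep="*")

# Optionally attach the dummy columns to the original frame.
result = pd.concat([df, dummies], axis=1)
print(result)
```

This gives 4 dummy columns (A, B, C, D) rather than 6, and the rows 'A*C' and 'C*D' each have two 1s.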
