
I am looking for a pythonic way to handle the following problem.

The pandas.get_dummies() method is great to create dummies from a categorical column of a dataframe. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.

Now I need to handle a different situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, one per distinct string, but I only want 4 of them (A, B, C, D), so that a row such as 'A*C' has multiple 1s.
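To make the problem concrete, here is a minimal sketch of the behaviour described above. The DataFrame and the name `naive` are illustrative assumptions built from the values listed in the question, not code from the original post.

```python
import pandas as pd

# Illustrative data: the six label values mentioned in the question.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# pd.get_dummies treats every distinct string as its own category,
# so 'A*C' and 'C*D' each get a separate column.
naive = pd.get_dummies(df["label"])

print(naive.columns.tolist())  # six columns, including 'A*C' and 'C*D'
print(naive.sum(axis=1).tolist())  # exactly one flag set per row
```

Each row ends up with a single flag set, which is why the combined labels cannot be expressed as several 1s this way.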

Is there a way to handle this in a pythonic way? I could only think of some step-by-step algorithm to get it, but that would not include get_dummies(). Thanks

1 Answer


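There is a one-liner for this: Series.str.get_dummies() splits each string on a separator and creates one dummy column per distinct token. If the column already holds strings such as 'A*C', df['label'].str.get_dummies(sep='*') is enough. The example below stores tuples instead, so it first joins each tuple with '*'. See the documentation: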

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html

In [4]: df
Out[4]:
       label
0  (a, c, e)
1     (a, d)
2       (b,)
3     (d, e)

In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
Out[5]:
   a  b  c  d  e
0  1  0  1  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
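Applied to the column described in the question, the same idea might look like the sketch below. The DataFrame contents are assumed from the values listed in the question, and the names `labels_df`, `dummies`, and `result` are hypothetical. Because the values are already '*'-separated strings, no join step is needed.

```python
import pandas as pd

# Assumed data: the label values from the question, already '*'-separated strings.
labels_df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Split each label on '*' and build one indicator column per distinct token.
dummies = labels_df["label"].str.get_dummies(sep="*")

# Optionally attach the indicator columns to the original frame.
result = labels_df.join(dummies)

print(result)
```

This yields four indicator columns (A, B, C, D), and the rows for 'A*C' and 'C*D' each carry two 1s.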

