pd.get_dummies converts a categorical variable into dummy variables. Reconstructing the categorical variable from the dummies is trivial to do by hand, but is there a preferred/quick way to do it?

In [46]: s = pd.Series(list('aaabbbccddefgh')).astype('category')

In [47]: s

Out[47]:

0     a

1     a

2     a

3     b

4     b

5     b

6     c

7     c

8     d

9     d

10    e

11    f

12    g

13    h

dtype: category

Categories (8, object): [a < b < c < d < e < f < g < h]

In [48]: df = pd.get_dummies(s)

In [49]: df

Out[49]:

    a  b  c  d  e  f  g  h
0   1  0  0  0  0  0  0  0
1   1  0  0  0  0  0  0  0
2   1  0  0  0  0  0  0  0
3   0  1  0  0  0  0  0  0
4   0  1  0  0  0  0  0  0
5   0  1  0  0  0  0  0  0
6   0  0  1  0  0  0  0  0
7   0  0  1  0  0  0  0  0
8   0  0  0  1  0  0  0  0
9   0  0  0  1  0  0  0  0
10  0  0  0  0  1  0  0  0
11  0  0  0  0  0  1  0  0
12  0  0  0  0  0  0  1  0
13  0  0  0  0  0  0  0  1

In [50]: x = df.stack()

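# keep the column label of every nonzero entry; if ALL of the categories must survive
# (including ones that never occur), pass them to pd.Categorical via categories=

In [51]: pd.Series(pd.Categorical(x[x!=0].index.get_level_values(1)))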

Out[51]:

0     a

1     a

2     a

3     b

4     b

5     b

6     c

7     c

8     d

9     d

10    e

11    f

12    g

13    h

Name: level_1, dtype: category

Categories (8, object): [a < b < c < d < e < f < g < h]
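
The remark about specifying all of the categories matters when a category never occurs in the data, because pd.Categorical otherwise infers the categories from the labels it sees. Here is a minimal sketch of that case. The small dummy frame and the variable names are made up for illustration and are not part of the session above; it assumes the dummy frame has plain string column labels, for example after being read back from a file.

```python
import pandas as pd

# Illustrative dummy frame with plain string columns; 'z' is a category that never occurs.
df = pd.DataFrame({"a": [1, 1, 0], "b": [0, 0, 1], "z": [0, 0, 0]})

# Same stack-and-filter step as in the session: keep the column label of each nonzero entry.
x = df.stack()
labels = x[x != 0].index.get_level_values(1)

# Categories inferred from the observed labels only: 'z' is lost.
inferred = pd.Series(pd.Categorical(labels))

# Categories passed explicitly from the dummy columns: 'z' is kept.
explicit = pd.Series(pd.Categorical(labels, categories=list(df.columns)))

print(list(inferred.cat.categories))
print(list(explicit.cat.categories))
```

Both results hold the same values; they differ only in which categories the dtype records.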

I think pandas should have a function for this, since it seems to be a natural operation. Maybe something called get_categories() (just a suggested name, not an existing function). You can refer to the following link:

https://github.com/pandas-dev/pandas/issues/8745
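
Until such a function is available, a small helper is easy to write. The sketch below is one possible version, not an official pandas API: the name undummify and its parameters are hypothetical, and it uses idxmax instead of the stack approach shown above. It assumes every row has exactly one nonzero dummy.

```python
import pandas as pd


def undummify(frame, categories=None):
    """Hypothetical helper: rebuild a categorical Series from one-hot columns."""
    # idxmax(axis=1) returns, per row, the label of the first column holding the row maximum.
    # Caveat: a row of all zeros would silently map to the first column.
    labels = frame.astype(int).idxmax(axis=1)
    # Default to the dummy columns so that categories with no rows are still recorded.
    cats = list(frame.columns) if categories is None else list(categories)
    return pd.Series(pd.Categorical(labels, categories=cats), index=frame.index)


# Same data as in the session above.
s = pd.Series(list("aaabbbccddefgh")).astype("category")
dummies = pd.get_dummies(s)

restored = undummify(dummies)
print(restored)
```

Recent pandas releases (1.5 and later) also provide pd.from_dummies, which may make a hand-written helper unnecessary.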