pd.get_dummies converts a categorical variable into dummy (indicator) variables. Reconstructing the categorical variable from the dummies is trivial, but is there a preferred or quick way to do it?
In [46]: s = pd.Series(list('aaabbbccddefgh')).astype('category')
In [47]: s
Out[47]:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 d
9 d
10 e
11 f
12 g
13 h
dtype: category
Categories (8, object): [a < b < c < d < e < f < g < h]
In [48]: df = pd.get_dummies(s)
In [49]: df
Out[49]:
    a  b  c  d  e  f  g  h
0   1  0  0  0  0  0  0  0
1   1  0  0  0  0  0  0  0
2   1  0  0  0  0  0  0  0
3   0  1  0  0  0  0  0  0
4   0  1  0  0  0  0  0  0
5   0  1  0  0  0  0  0  0
6   0  0  1  0  0  0  0  0
7   0  0  1  0  0  0  0  0
8   0  0  0  1  0  0  0  0
9   0  0  0  1  0  0  0  0
10  0  0  0  0  1  0  0  0
11  0  0  0  0  0  1  0  0
12  0  0  0  0  0  0  1  0
13  0  0  0  0  0  0  0  1
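In [50]: x = df.stack()
# here you need to specify ALL of the categories
In [51]: pd.Series(pd.Categorical(x[x!=0].index.get_level_values(1)))
Out[51]:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 d
9 d
10 e
11 f
12 g
13 h
Name: level_1, dtype: category
Categories (8, object): [a < b < c < d < e < f < g < h]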
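The stack-based session above can be wrapped in a small helper. This is only a minimal sketch, not a pandas API: the name `undummify` is hypothetical, and it assumes every row has exactly one nonzero dummy. Passing the full column list as `categories` is one way to "specify ALL of the categories", so categories that never occur in the data are still kept.

```python
import pandas as pd


def undummify(dummies: pd.DataFrame) -> pd.Series:
    """Rebuild a categorical Series from one-hot columns (hypothetical helper)."""
    # Assumption: exactly one nonzero entry per row.
    if not (dummies != 0).sum(axis=1).eq(1).all():
        raise ValueError("each row must have exactly one nonzero dummy")
    stacked = dummies.stack()
    # Keep only the (row, column) pairs that are set, then read off the column labels.
    labels = stacked[stacked != 0].index.get_level_values(1)
    # Listing every column as a category preserves categories that have no rows.
    values = pd.Categorical(list(labels), categories=list(dummies.columns))
    return pd.Series(values, index=dummies.index)


s = pd.Series(list("aaabbbccddefgh")).astype("category")
restored = undummify(pd.get_dummies(s))

# A frame where column "b" is never set: "b" is still kept as a category.
sparse = undummify(pd.DataFrame({"a": [1, 1], "b": [0, 0]}))
```

The explicit row check is a design choice: without it, rows with no dummy set would silently disappear from the stacked result and the values would no longer line up with the original index.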
I think pandas should have a function for this, since it seems to be a natural operation, perhaps named get_categories() (a proposed name, not an existing function). See the following issue for the discussion:
https://github.com/pandas-dev/pandas/issues/8745
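For reference, pandas 1.5 and later ship `pd.from_dummies`, which covers this use case. The sketch below uses it when available and falls back to `idxmax` on older versions. The helper name `dummies_to_categorical` is hypothetical, and the code assumes a single encoded variable with exactly one dummy set per row.

```python
import pandas as pd


def dummies_to_categorical(dummies: pd.DataFrame) -> pd.Series:
    """Invert pd.get_dummies for one variable (hypothetical helper)."""
    categories = list(dummies.columns)
    # Use a plain (non-categorical) column index to keep the decoding simple.
    plain = dummies.set_axis(pd.Index(categories), axis=1)
    if hasattr(pd, "from_dummies"):
        # from_dummies returns a one-column DataFrame; take that column.
        labels = pd.from_dummies(plain).iloc[:, 0]
    else:
        # Fallback: the column holding the row maximum is the one that is set.
        labels = plain.idxmax(axis=1)
    values = pd.Categorical(list(labels), categories=categories)
    return pd.Series(values, index=dummies.index)


s = pd.Series(list("aaabbbccddefgh")).astype("category")
roundtrip = dummies_to_categorical(pd.get_dummies(s))
```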