Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)
edited by

I want to transform my data from df1 to df2:

df1:

   ID  a  b  c  d  a-d  c-d  a-c-d

0   1  0  0  0  0    0    0      1

1   2  0  0  1  0    1    0      0

2   3  0  1  0  0    0    1      0

3   4  0  0  0  0    1    0      1

4   5  0  0  1  1    0    0      0

And df2 is:

   ID  a  b  c  d

0   1  1  0  1  1

1   2  1  0  1  1

2   3  0  1  1  1

3   4  2  0  1  2

4   5  0  0  1  1

Basically, I want to get a total value of "a", from all my columns in which a letter "a" appears in the column name. E.g. in the 4th row of df1 there are 2 column names in which a letter "a" appears. If you sum up all values "a" from the 4th row, there would be my total of 2 a's there. I want my single column for apples in the new dataset (df2). Note that the 1 for "a-c-d" is the 1 for EACH "a", "b", "c".

1 Answer

0 votes
by (36.8k points)

You can try the melt to stack and the column names, then str.split followed by the explode to split a,b,c,d and duplicate your data:

(df1.melt('ID', var_name='col')

    .assign(col=lambda x: x['col'].str.split('-'))

    .explode('col')

    .pivot_table(index='ID',columns='col', 

                 values='value', aggfunc='sum')

    .reset_index()

)

Output:

col  ID  a  b  c  d

0     1  1  0  1  1

1     2  1  0  1  1

2     3  0  1  1  1

3     4  2  0  1  2

4     5  0  0  1  1

Improve your knowledge in data science from scratch using Data science online courses

Browse Categories

...