Transform dataset based on conditional addition - complicated

Question

asked Oct 27, 2020 in Data Science by blackindya (18.4k points)
edited Oct 29, 2020 by blackindya

I want to transform my data from df1 to df2:

df1:
ID a b c d a-d c-d a-c-d
0 1 0 0 0 0 0 0 1
1 2 0 0 1 0 1 0 0
2 3 0 1 0 0 0 1 0
3 4 0 0 0 0 1 0 1
4 5 0 0 1 1 0 0 0

And df2 is:

ID a b c d
0 1 1 0 1 1
1 2 1 0 1 1
2 3 0 1 1 1
3 4 2 0 1 2
4 5 0 0 1 1

Basically, I want to get a total value of "a", from all my columns in which a letter "a" appears in the column name. E.g. in the 4th row of df1 there are 2 column names in which a letter "a" appears. If you sum up all values "a" from the 4th row, there would be my total of 2 a's there. I want my single column for apples in the new dataset (df2). Note that the 1 for "a-c-d" is the 1 for EACH "a", "b", "c".

1 Answer

supriya · Answer 1 · 2020-10-28T07:25:13+0000

You can try the melt to stack and the column names, then str.split followed by the explode to split a,b,c,d and duplicate your data:

(df1.melt('ID', var_name='col')
.assign(col=lambda x: x['col'].str.split('-'))
.explode('col')
.pivot_table(index='ID',columns='col',
values='value', aggfunc='sum')
.reset_index()
)

Output:

col ID a b c d
0 1 1 0 1 1
1 2 1 0 1 1
2 3 0 1 1 1
3 4 2 0 1 2
4 5 0 0 1 1

Improve your knowledge in data science from scratch using Data science online courses

Transform dataset based on conditional addition - complicated

Transform dataset based on conditional addition - complicated

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Related questions

Browse Categories

Popular Courses

Top Tutorials

Top Articles

Top Interview Questions