Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)
edited by

I'm have an excel document formatted like so (Columns are datasets, Rows are cell types, values are comma-delineated gene names) Input Excel Format

I would like to reformat the sheet like so (Columns are still datasets, but Rows are now gene names, and values are cell types):

Output Excel Format

I was trying to do this in pandas. I imported the input as a dataFrame and called it 'test'. My logic was to loop over each column, and within loop over each row, take the comma-delineated values, split those, and then make each of those a new index.

This approach is obviously pretty inefficient, but I'm not even able to get it to work as expected yet (even though I'm not getting an error, just no output, Edit Note: 'blank' is the name of a new, blank Excel book)

Unworking attempt:

for dataSet in test.columns:

    for index, rows in test.iterrows():

        geneList = test[dataSet].loc[index].split(",")

        for gene in geneList:

            blank[dataSet].reindex([gene])

So two questions: 1. How can I get this example to work? 2. How can I accomplish this transformation more efficiently?

Thanks!

1 Answer

0 votes
by (41.4k points)

1. Firstly, create Series by stack.

2. Then, reshape by stack by using split

3.For columns from MultiIndex, call reset_index.

4.After that use groupby for list and then reshape by unstack:

df = df.stack().str.split(',', expand=True).stack().reset_index(name='a')

df = df.groupby(['a', 'level_1'])['level_0'].apply(','.join).unstack()

print (df)

level_1          dataset 1          dataset 2          dataset 3

a                                                               

gene1    cell1,cell2,cell3  cell1,cell2,cell3  cell1,cell2,cell3

gene2    cell1,cell2,cell3  cell1,cell2,cell3  cell1,cell2,cell3

gene3    cell1,cell2,cell3               None               None

gene4                 None  cell1,cell2,cell3               None

gene5                 None               None  cell1,cell2,cell3

If you wish to learn about Pandas visit this Pandas Tutorial by Intellipaat.

Related questions

0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

Browse Categories

...