How to export associated adjacent pandas dataframe data into a dictionary?

Question

asked Jul 27, 2019 in Data Science by sourav (17.6k points)

I want to convert the following style of dataframe into a nested dictionary.

Input:

>>> import pandas as pd
>>> df = pd.read_csv('file.csv')
>>> print(df)
   Market Rep Name Date  Amount
0      A1  B1   C1   D1       1
1      A1  B1   C1   D1       2
2      A1  B1   C1   D2       3
3      A1  B1   C1   D2       4
4      A1  B1   C2   D1       5
5      A1  B1   C2   D1       6
6      A1  B1   C2   D2       7
7      A1  B1   C2   D2       8
8      A1  B2   C3   D1       9
9      A1  B2   C3   D1      10
10     A1  B2   C3   D2      11
11     A1  B2   C3   D2      12
12     A2  B3   C4   D1      13
13     A2  B3   C4   D1      14

Desired Output:

>>> print(associated_data)
{'A1': {'B1': {'C1': {'D1': [1 + 2],
                      'D2': [3 + 4]},
               'C2': {'D1': [5 + 6],
                      'D2': [7 + 8]}},
        'B2': {'C3': {'D1': [9 + 10],
                      'D2': [11 + 12]}}},
 'A2': {'B3': {'C4': {'D1': [13 + 14]}}}}

This might not be the best way to go about organizing and sorting the data, so I'm open to suggestions.

I've tried a method I was hoping would work by going through a ton of for loops like this:

# Main function
for market in df['Market'].unique():
    market_data = df.loc[df['Market'] == market]
    associated_reps = market_data['Rep'].unique()
    # Repeat for each rep in this market
    for rep in associated_reps:
        rep_data = market_data.loc[market_data['Rep'] == rep]
        associated_names = rep_data['Name'].unique()
        # Repeat for each name under this rep
        for name in associated_names:
            name_data = rep_data.loc[rep_data['Name'] == name]
            associated_dates = name_data['Date'].unique()
            # Repeat for each date under this name
            for date in associated_dates:
                date_data = name_data.loc[name_data['Date'] == date]
                associated_amount = sum(date_data['Amount'].tolist())
                # Attempted solution code (total fail)
                breakdown[market][rep][name][date] = associated_amount

This does appropriately separate all of the data, and at the end attempts to put it all together. I was hoping that you could make a super nested dict like this, but it failed completely (as it turns out that's just not how dicts work unfortunately lmao).

How can I produce the desired output (ideally with shorter code as well)?

Thanks!

1 Answer

Shlok Pandey · Answer 1 · 2019-08-01T06:25:08+0000

First set all the "category" columns as the index. Then aggregate on that index so that duplicate index entries are collapsed, and finally build the output dict from the aggregated rows.

import pprint


def make_dict(ind_vals, d, v):
    """Accumulate index entries as keys in a dict."""
    p = d
    for ix in ind_vals[:-1]:
        # Create the intermediate level if it does not exist yet.
        # Replace with collections.OrderedDict if necessary.
        p = p.setdefault(ix, {})
    # Set the actual value of interest.
    p[ind_vals[-1]] = v


# df is the dataframe loaded in the question.
# Set indices correctly.
df = df.set_index(['Market', 'Rep', 'Name', 'Date'])
# Group values so we don't have duplicate indices.
df = df.groupby(level=df.index.names).sum()

dct = {}  # Replace with collections.OrderedDict if necessary.
for idx, val in df.iterrows():
    make_dict(idx, dct, val.Amount)

pprint.pprint(dct)
# {'A1': {'B1': {'C1': {'D1': 3, 'D2': 7}, 'C2': {'D1': 11, 'D2': 15}},
#         'B2': {'C3': {'D1': 19, 'D2': 23}}},
#  'A2': {'B3': {'C4': {'D1': 27}}}}
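
As a side note on why the loop in the question fails: assuming breakdown was a plain dict, breakdown[market][rep][name][date] = ... raises a KeyError because the intermediate levels do not exist yet. Below is a minimal sketch of an alternative that keeps that exact assignment syntax by using a self-nesting defaultdict. The helper names nested_dict and to_plain_dict are made up for this example, and the small inline dataframe is only a stand-in for file.csv with a few rows from the question.

```python
from collections import defaultdict

import pandas as pd


def nested_dict():
    """Return a dict that creates missing levels on first access."""
    return defaultdict(nested_dict)


def to_plain_dict(d):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    if isinstance(d, defaultdict):
        return {k: to_plain_dict(v) for k, v in d.items()}
    return d


# Stand-in for file.csv: a few rows copied from the question.
df = pd.DataFrame({
    'Market': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2'],
    'Rep':    ['B1', 'B1', 'B1', 'B1', 'B3', 'B3'],
    'Name':   ['C1', 'C1', 'C1', 'C1', 'C4', 'C4'],
    'Date':   ['D1', 'D1', 'D2', 'D2', 'D1', 'D1'],
    'Amount': [1, 2, 3, 4, 13, 14],
})

# Sum Amount per (Market, Rep, Name, Date). The result is a Series
# indexed by a four-level MultiIndex, one entry per unique combination.
totals = df.groupby(['Market', 'Rep', 'Name', 'Date'])['Amount'].sum()

breakdown = nested_dict()
for (market, rep, name, date), amount in totals.items():
    # int() turns the NumPy integer into a plain Python int.
    breakdown[market][rep][name][date] = int(amount)

result = to_plain_dict(breakdown)
print(result)
# {'A1': {'B1': {'C1': {'D1': 3, 'D2': 7}}}, 'A2': {'B3': {'C4': {'D1': 27}}}}
```

The groupby replaces all four nested loops, and converting back with to_plain_dict avoids surprises later, since a defaultdict silently creates keys when you merely read a missing one.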
