I'm wanting to take the following style of dataframe into a dictionary.
Input:
>>>import pandas as pd
>>>df = pd.read_csv('file.csv')
>>>print(df)
Market Rep Name Date Amount
0 A1 B1 C1 D1 1
1 A1 B1 C1 D1 2
2 A1 B1 C1 D2 3
3 A1 B1 C1 D2 4
4 A1 B1 C2 D1 5
5 A1 B1 C2 D1 6
6 A1 B1 C2 D2 7
7 A1 B1 C2 D2 8
8 A1 B2 C3 D1 9
9 A1 B2 C3 D1 10
10 A1 B2 C3 D2 11
11 A1 B2 C3 D2 12
12 A2 B3 C4 D1 13
13 A2 B3 C4 D1 14
Desired Output:
>>> print(associated_data)
{'A1': {'B1': {'C1': {'D1':[1 + 2],
{'D2':[3 + 4]},
'C2': {'D1':[5 + 6],
'D2':[7 + 8]}}
{'B2': {'C3': {'D1':[9 + 10],
'D2':[11 + 12]}}},
'A2': {'B3': {'C4': {'D1':[13 + 14]}}}}
This might not be the best way to go about organizing and sorting the data, so I'm open to suggesting.
I've tried a method I was hoping would work by going through a ton of for loops like this:
# Main function
for market in df['Market'].unique():
market_data = self.df.loc[self.df['Market'] == market]
associated_reps = market_data['Rep'].unique()
# Repeat
for rep in associated_reps:
rep_data = market_data.loc[market_data['Rep'] == rep]
associated_names = rep_data['Name'].unique()
# Repeat
for name in associated_names:
name_data = rep_data.loc[rep_data['Name'] == name]
associated_dates = name_data['Date'].unique()
# Repeat
for date in associated_dates:
date_data = name_data.loc[name_data['Date'] == date]
associated_amount = sum(date_data['Amount'].tolist())
# Attempted solution code (total fail)
breakdown[market][rep][name][date] = associated_amount
This does appropriately separate all of the data, and at the end attempts to put it all together. I was hoping that you could make a super nested dict like this, but it failed completely (as it turns out that's just not how dicts work unfortunately lmao).
How can you produce the desired output to produce the same result (maybe with shorter sorting code as well)?
Thanks!