Suppose I have a nested dictionary 'user_dict' with structure:

Level 1: UserId (Long Integer)

Level 2: Category (String)

Level 3: Assorted Attributes (floats, ints, etc..)

For example, an entry of this dictionary would be:

user_dict[12] = {
    "Category 1": {"att_1": 1,
                   "att_2": "whatever"},
    "Category 2": {"att_1": 23,
                   "att_2": "another"}}

Each item in "user_dict" has the same structure, and "user_dict" contains a large number of items that I want to feed to a pandas DataFrame, constructing the series from the attributes. A hierarchical index would be useful for this purpose.

Specifically, my question is whether there is a way to help the DataFrame constructor understand that the series should be built from the values at "level 3" of the dictionary.

If I try something like:

df = pandas.DataFrame(user_dict)

The items in "level 1" (the user IDs) are taken as columns, which is the opposite of what I want to achieve (user IDs as the index).
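The behaviour described above can be reproduced with a small sketch. The second user (id 15) and its values are made up purely for illustration; only the entry for user 12 comes from the example above.

```python
import pandas as pd

# Entry 12 is the example from the question; entry 15 is hypothetical.
user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Passing the nested dict straight to the constructor uses the
# outer keys (user ids) as columns and the second-level keys
# (categories) as the index.
naive = pd.DataFrame(user_dict)

print(naive.columns.tolist())  # the user ids
print(naive.index.tolist())    # the category names

# Each cell still holds the raw "level 3" dict rather than
# separate attribute columns.
print(type(naive.loc["Category 1", 12]))
```

The attributes are never expanded into their own columns here, which is why a different construction is needed.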

I know I could construct the series by iterating over the dictionary entries, but a more direct way would be very useful. A related question is whether it is possible to construct a pandas DataFrame from JSON objects listed in a file.

1 Answer

pd.concat accepts a dictionary, so we can use a dictionary comprehension to build a dictionary that maps each user ID to a sub-frame. The dictionary keys then become the outer level of a hierarchical index.

pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0)
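As a rough worked example of the one-liner above, the sketch below runs it on a two-user dictionary. The second user (id 15), its values, and the index level names "user_id" and "category" are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd

# Entry 12 is the example from the question; entry 15 is hypothetical.
user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# pd.DataFrame(v) puts categories in the columns and attributes in
# the index, so .T flips it: one row per category, one column per
# attribute. pd.concat then stacks the per-user frames and uses the
# dict keys as the outer index level.
df = pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0)

# Naming the index levels is optional; the names are illustrative.
df.index.names = ["user_id", "category"]

# Transposing mixed-type data leaves every column as object dtype;
# infer_objects() recovers numeric dtypes where possible.
typed = df.infer_objects()

print(typed)
print(typed.loc[(12, "Category 2"), "att_1"])
```

Rows can then be selected by user with `typed.loc[12]`, or by a (user, category) pair as shown in the last line.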
