Parsing a JSON string which was loaded from a CSV using Pandas

Question

asked Sep 23, 2019 in Data Science by ashely (50.2k points)

I am working with CSV files where several of the columns have a simple json object (several key value pairs) while other columns are normal. Here is an example:

name,dob,stats
john smith,1/1/1980,"{""eye_color"": ""brown"", ""height"": 160, ""weight"": 76}"
dave jones,2/2/1981,"{""eye_color"": ""blue"", ""height"": 170, ""weight"": 85}"
bob roberts,3/3/1982,"{""eye_color"": ""green"", ""height"": 180, ""weight"": 94}"

After using df = pandas.read_csv('file.csv'), what's the most efficient way to parse and split the stats column into additional columns?

After about an hour, the only thing I could come up with was:

import json
stdf = df['stats'].apply(json.loads)
stlst = list(stdf)
stjson = json.dumps(stlst)
df.join(pandas.read_json(stjson))

This seems like I'm doing it wrong, and it's quite a bit of work considering I'll need to do this on three columns regularly.

*Edit: desired output is the dataframe object below. added following lines of code to get there in my (crappy) way:

1 Answer

vinita · Answer 1 · 2019-09-23T06:04:40+0000

Applying the json.load is a great idea, but from there you can just simply directly convert it to dataframe columns instead of writing/loading it again:

stdf = df['stats'].apply(json.loads)
pd.DataFrame(stdf.tolist()) # or stdf.apply(pd.Series)

or you can also do this in one step:

df.join(df['stats'].apply(json.loads).apply(pd.Series))

If you wish to learn more about Pandas visit this Pandas Tutorial.

Parsing a JSON string which was loaded from a CSV using Pandas

1 Answer

Related questions

Browse Categories