How to parse pandas column where list stored as string with multiple values

Question

asked Apr 14, 2021 in Python by anonlvx (150 points)
closed Jun 17, 2023 by Anamika Chakravarty

I am grabbing data from an API that returns a Content-Type of "application/csv". I then read the CSV into a DataFrame and filter out some unnecessary columns.

This results in the following DataFrame:

Int64Index: 97 entries, 6 to 1234
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Pin          97 non-null     int64  
 1   Dto          97 non-null     object 
 2   MsgId        97 non-null     int64  
 3   Tki          97 non-null     float64
 4   Diagnostics  97 non-null     object 
dtypes: float64(1), int64(2), object(2)
memory usage: 4.5+ KB

The primary column of interest is "Diagnostics". Here's the data in that column:

 ["0x1101";"1";"NaN";"77.0";"77.0";"0x1102";"1";"NaN";"-12.0";"-12.0";"0x1103";"1";"NaN";"NaN";"NaN";"0x1104";"1";"0";"576";"575";"0x1105";"1";"0";"0";"0";"0x1106";"1";"1";"277";"18";"0x1107";"1";"72.21161";"0x1108";"1";"-15.60144";"0x1109";"1";"64.27636";"0x110A";"1";"-40.0";"0x110B";"1";"65.580734";"0x110C";"1";"-40.0";"0x110D";"1";"182";"0x110E";"1";"92";"0x110F";"1";"593";"0x1110";"1";"1.2894478E7";"0x1111";"1";"8308605.0";"0x1112";"1";"40.214996";"0x1113";"1";"4585831.0";"0x1114";"1";"1.1615938E7";"0x1115";"1";"9510470.0";"0x1116";"1";"1.0480007E7";"0x1117";"1";"275853.28";"0x1118";"1";"0";"0x1119";"1";"0";"0x111A";"1";"39";"0x1137";"1";"42.12033";"0x1138";"1";"28.678665";"0x1139";"1";"6.32547";"0x113A";"1";"0";"0x113B";"1";"0";"0x113C";"1";"115050.41";"0x113D";"1";"1.5.1.4";"0x113E";"0x113F"]

The value is a str, and I've converted it to a list.
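Because the items inside the brackets are separated by ";" rather than ",", passing this string to eval() or ast.literal_eval() would raise a SyntaxError. A minimal sketch of one way to convert it, assuming no item ever contains a ";" itself (the helper name parse_diagnostics is hypothetical, and the sample is a shortened excerpt of the data above):

```python
from typing import List


def parse_diagnostics(raw: str) -> List[str]:
    """Turn '["0x1101";"1";"NaN"]' into ['0x1101', '1', 'NaN'].

    Assumes items are separated by ";" and never contain ";" themselves.
    """
    inner = raw.strip()
    # Remove the surrounding square brackets, if present
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    if not inner:
        return []
    # Split on ";" and drop the double quotes around each item
    return [item.strip().strip('"') for item in inner.split(";")]


sample = '["0x1101";"1";"NaN";"77.0";"77.0";"0x1107";"1";"72.21161"]'
print(parse_diagnostics(sample))
```

All items stay strings at this stage, so "NaN" is the text "NaN" rather than a float.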

The first item in the list is the "Variable", e.g. 0x1101.

The second item is the "Component", which can be 1, 2, 3, 4, or 5.

The next 1 to 3 items are the actual values for that Component/Variable.

However, as the full data set above shows, a variable may report either 1 value or 3 values.

This sequence then repeats for every variable being captured.
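Since a record can hold 1 or 3 values, stepping through the list with a fixed stride will drift out of alignment. A minimal sketch of an alternative, assuming every variable id starts with "0x" and no value does: treat each "0x..." item as the start of a new record and collect everything up to the next one. The helper name group_records is hypothetical; trailing ids such as "0x113E";"0x113F" in the data above have no component, so the sketch returns None for them.

```python
from typing import List, Optional, Tuple

# (variable, component, values)
Record = Tuple[str, Optional[str], List[str]]


def group_records(items: List[str]) -> List[Record]:
    """Split a flat diagnostics list into (variable, component, values) records.

    Assumes every variable id starts with "0x" and that no value does.
    """
    records: List[Record] = []
    i = 0
    while i < len(items):
        variable = items[i]
        # Scan forward to the next "0x..." id (or the end of the list)
        j = i + 1
        while j < len(items) and not items[j].startswith("0x"):
            j += 1
        fields = items[i + 1 : j]
        # Ids at the very end of the data may have no component at all
        component = fields[0] if fields else None
        records.append((variable, component, fields[1:]))
        i = j
    return records


items = ["0x1101", "1", "NaN", "77.0", "77.0", "0x1107", "1", "72.21161", "0x113E", "0x113F"]
for record in group_records(items):
    print(record)
```

Each record then maps to one output row: the first value goes to Value1, and Value2/Value3 stay empty when only one value was reported.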

The output I need to arrive at is as follows:

<index>	Pin	Dto	MsgId	Variable	Component	Value1	Value2	Value3
0	123456789	2021-04-06T00:49:06.1.10	1751	0x1101	1	NaN	77.0	77.0
1	123456789	2021-04-06T00:49:06.1.10	1751	0x1102	1	NaN	-12.0	-12.0
2	123456789	2021-04-06T00:49:06.1.10	1751	0x1103	1	NaN	NaN	NaN
3	123456789	2021-04-06T00:49:06.1.10	1751	0x1104	1	0	576	575
...	...	...	...	...	...	...	...	...

I'm completely stuck on how to approach this and would greatly appreciate any recommendations on the steps needed to derive this output.


3 Answers

answered Jun 17, 2023 by Balram111 (25.7k points)
selected Jun 17, 2023 by Anamika Chakravarty

Best answer

To derive the desired output from the "Diagnostics" column, you can follow these steps:

Convert the string data in the "Diagnostics" column to a list. Because the items are separated by ";", split the string rather than calling eval() on it.

Create an empty list to collect the output rows.

Iterate over the list of data extracted from the "Diagnostics" column. Each record starts with a "0x..." variable, followed by a component and 1 to 3 values, so a record ends where the next "0x..." item begins.

In each iteration, extract the relevant values for 'Pin', 'Dto', 'MsgId', 'Variable', 'Component', 'Value1', 'Value2', 'Value3'.

Append the new row to the list.

Repeat the process for all records in the list.

Build a DataFrame from the collected rows and display it.

Here's an example code snippet that demonstrates these steps:

import pandas as pd

# Convert the string data in the "Diagnostics" column to a list.
# eval() fails here because the items are separated by ";", so split the string instead.
raw = df['Diagnostics'].iloc[0]
diagnostics_list = [item.strip().strip('"') for item in raw.strip().strip('[]').split(';')]

# Collect the output rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []

# Iterate over the list of diagnostics data. Each record is a "0x..." variable,
# a component and 1 to 3 values, so find where the next "0x..." variable starts.
i = 0
while i < len(diagnostics_list):
    variable = diagnostics_list[i]
    j = i + 1
    while j < len(diagnostics_list) and not diagnostics_list[j].startswith('0x'):
        j += 1
    component = diagnostics_list[i + 1] if j > i + 1 else None
    values = diagnostics_list[i + 2:j]

    # Extract relevant values and add a new row
    row = {
        'Pin': df['Pin'].iloc[0],
        'Dto': df['Dto'].iloc[0],
        'MsgId': df['MsgId'].iloc[0],
        'Variable': variable,
        'Component': component,
        'Value1': values[0] if len(values) >= 1 else None,
        'Value2': values[1] if len(values) >= 2 else None,
        'Value3': values[2] if len(values) >= 3 else None
    }
    rows.append(row)
    i = j

# Build and display the resulting DataFrame
output_df = pd.DataFrame(rows, columns=['Pin', 'Dto', 'MsgId', 'Variable', 'Component', 'Value1', 'Value2', 'Value3'])
print(output_df)

Make sure to adapt the code to fit your specific DataFrame structure and column names.
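The snippet above processes only the first row of df (it reads everything via .iloc[0]). A minimal sketch for expanding every row, under the same assumptions (items separated by ";", every variable id starts with "0x", at most three values per record). The helper names parse_diagnostics, to_records and expand_diagnostics are hypothetical, and the sample frame reuses the Pin/Dto/MsgId values from the expected output in the question with a shortened Diagnostics string.

```python
import pandas as pd

ID_COLS = ["Pin", "Dto", "MsgId"]
VALUE_COLS = ["Value1", "Value2", "Value3"]


def parse_diagnostics(raw):
    """'["0x1101";"1";"NaN"]' -> ['0x1101', '1', 'NaN'] (assumes no ';' inside items)."""
    inner = raw.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    return [item.strip().strip('"') for item in inner.split(";")] if inner else []


def to_records(raw):
    """Group one Diagnostics string into dicts, one per "0x..." variable."""
    items = parse_diagnostics(raw)
    # Positions of the variable ids; each record runs up to the next id
    starts = [k for k, item in enumerate(items) if item.startswith("0x")]
    records = []
    for start, end in zip(starts, starts[1:] + [len(items)]):
        fields = items[start + 1 : end]
        values = fields[1:4]  # keep at most three values
        record = {"Variable": items[start], "Component": fields[0] if fields else None}
        record.update({col: (values[k] if k < len(values) else None) for k, col in enumerate(VALUE_COLS)})
        records.append(record)
    return records


def expand_diagnostics(df):
    """Return one output row per Variable, repeating Pin/Dto/MsgId from the source row."""
    rows = []
    for source in df[ID_COLS + ["Diagnostics"]].itertuples(index=False):
        for record in to_records(source.Diagnostics):
            rows.append({"Pin": source.Pin, "Dto": source.Dto, "MsgId": source.MsgId, **record})
    return pd.DataFrame(rows, columns=ID_COLS + ["Variable", "Component"] + VALUE_COLS)


# Illustrative input only; the real df comes from the API response
df = pd.DataFrame({
    "Pin": [123456789],
    "Dto": ["2021-04-06T00:49:06.1.10"],
    "MsgId": [1751],
    "Diagnostics": ['["0x1101";"1";"NaN";"77.0";"77.0";"0x1107";"1";"72.21161";"0x113E"]'],
})
out = expand_diagnostics(df)
print(out)
```

The values are left as strings. pd.to_numeric(out["Value1"], errors="coerce") can convert a column, but note that an entry such as "1.5.1.4" would become NaN, so converting only selected columns may be safer.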

Similu · Answer 1 · 2023-06-17T10:49:53+0000

To derive the desired output from the "Diagnostics" column, follow these steps:

Convert the string data in the "Diagnostics" column to a list by splitting it on ";".
Create an empty list to collect the output rows.
Iterate over the list, starting a new record at each item that begins with "0x".
Extract the relevant values for each row, including 'Pin', 'Dto', 'MsgId', 'Variable', 'Component', 'Value1', 'Value2', and 'Value3'.
Add the row to the list.
Repeat the process for all records in the list.
Build and display the resulting DataFrame.

Here's a code snippet that demonstrates these steps:

import pandas as pd

# Convert the string data in the "Diagnostics" column to a list
# (eval() cannot parse it because the items are separated by ";")
raw = df['Diagnostics'].iloc[0]
diagnostics_list = [item.strip().strip('"') for item in raw.strip().strip('[]').split(';')]

# Index of every "0x..." variable; each record runs up to the next one
starts = [k for k, item in enumerate(diagnostics_list) if item.startswith('0x')] + [len(diagnostics_list)]

# Collect the rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []
for start, end in zip(starts, starts[1:]):
    variable = diagnostics_list[start]
    component = diagnostics_list[start + 1] if end > start + 1 else None
    values = diagnostics_list[start + 2:end]
    # Extract relevant values and add a new row
    rows.append({'Pin': df['Pin'].iloc[0], 'Dto': df['Dto'].iloc[0], 'MsgId': df['MsgId'].iloc[0],
                 'Variable': variable, 'Component': component,
                 'Value1': values[0] if len(values) >= 1 else None,
                 'Value2': values[1] if len(values) >= 2 else None,
                 'Value3': values[2] if len(values) >= 3 else None})

# Build and display the resulting DataFrame
output_df = pd.DataFrame(rows, columns=['Pin', 'Dto', 'MsgId', 'Variable', 'Component', 'Value1', 'Value2', 'Value3'])
print(output_df)

Anamika Chakravarty · Answer 2 · 2023-06-17T10:51:17+0000

To obtain the desired output from the "Diagnostics" column:

Convert the string data to a list.
Create an empty DataFrame with the required column names.
Iterate over the list, extracting the necessary values for each row.
Append a new row to the DataFrame with the extracted values.
Display the resulting DataFrame.

Here's a concise code snippet that outlines the steps:

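import pandas as pd

# Split on ";" (eval() cannot parse this string) and strip the quotes
raw = df['Diagnostics'].iloc[0]
diagnostics_list = [item.strip().strip('"') for item in raw.strip().strip('[]').split(';')]

# Each record starts at a "0x..." variable and ends where the next one begins
starts = [k for k, item in enumerate(diagnostics_list) if item.startswith('0x')] + [len(diagnostics_list)]

output_df = pd.DataFrame(columns=['Pin', 'Dto', 'MsgId', 'Variable', 'Component', 'Value1', 'Value2', 'Value3'])

for start, end in zip(starts, starts[1:]):
    variable = diagnostics_list[start]
    component = diagnostics_list[start + 1] if end > start + 1 else None
    values = diagnostics_list[start + 2:end]
    # Setting with enlargement appends one row per record
    output_df.loc[len(output_df)] = [df['Pin'].iloc[0], df['Dto'].iloc[0], df['MsgId'].iloc[0], variable, component,
                                     values[0] if len(values) >= 1 else None, values[1] if len(values) >= 2 else None,
                                     values[2] if len(values) >= 3 else None]

print(output_df)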

Remember to adapt the code to match your DataFrame structure and column names.
