in Data Science by (17.6k points)

I am trying to extract all the integer values (left, top, width and height) from a specific column in a CSV file with multiple rows and columns. I have used pandas to isolate the columns I am interested in, but I'm stuck on how to access specific parts of the values in that column.

Let me explain: I need to use the CSV file's column holding the "left", "top", "width" and "height" attributes to obtain xmin, ymin, xmax and ymax (these are the coordinates of boxes in images). An example of one row in this column looks like this:

[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]

I need to extract 171, 0, 163 and 137 to do the operations for finding xmin, ymin, xmax and ymax.
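A minimal sketch of that conversion for a single box, assuming the common convention that `left`/`top` give the top-left corner and that the far edges are `left + width` and `top + height` (some tools use `left + width - 1` for inclusive pixel coordinates instead). The helper name `to_bbox` is hypothetical:

```python
def to_bbox(box):
    """Convert one {"left", "top", "width", "height"} dict to (xmin, ymin, xmax, ymax).

    Assumes exclusive far edges: xmax = left + width, ymax = top + height.
    """
    xmin = box["left"]
    ymin = box["top"]
    xmax = xmin + box["width"]
    ymax = ymin + box["height"]
    return xmin, ymin, xmax, ymax


first_box = {"left": 171, "top": 0, "width": 163, "height": 137, "label": "styrofoam container"}
print(to_bbox(first_box))  # (171, 0, 334, 137)
```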

The line above is a single row of my pandas column. How do I extract the numbers I need for these operations?

Here is the code I have written so far to extract the column:

import os
import csv
import pandas
import numpy as np

csvPath = "/path/of/my/csvfile/csvfile.csv"
data = pandas.read_csv(csvPath)
csv_coords = data['Answer.annotation_data'].values  # column with the coordinates
image_name = data['Input.image_url'].values
print(csv_coords[2])
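Because `read_csv` does not parse the contents of a cell, `csv_coords[2]` is a plain string, not a list. The sample cell is valid JSON, so a minimal sketch of parsing one cell with the standard `json` module and pulling out the four numbers of each box could look like this (the literal `cell` stands in for `csv_coords[2]`):

```python
import json

# Stand-in for one cell such as csv_coords[2]; read_csv leaves it as a str.
cell = ('[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
        '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]')

boxes = json.loads(cell)  # list of dicts, one dict per box

# Collect (left, top, width, height) for every box in this cell.
numbers = [(b["left"], b["top"], b["width"], b["height"]) for b in boxes]
print(numbers)  # [(171, 0, 163, 137), (222, 42, 45, 70)]
```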

1 Answer

by (41.4k points)

Use the code below:

import ast

import numpy as np
import pandas as pd

d = {'Answer.annotation_data': ['[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
                                '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']}

df = pd.DataFrame(d)
print(df)

                              Answer.annotation_data
0  [{"left":171,"top":0,"width":163,"height":137,...
1  [{"left":170,"top":10,"width":173,"height":157...

# If the cells are strings (as they always are after read_csv), convert each one to a list of dicts
df['Answer.annotation_data'] = df['Answer.annotation_data'].apply(ast.literal_eval)
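Since the cells hold JSON, `json.loads` is a drop-in alternative to `ast.literal_eval` here, and it also accepts JSON's `true`, `false` and `null`, which `literal_eval` rejects with a `ValueError`. A minimal sketch on a one-row sample; the `"occluded":false` field is hypothetical and only there to show the difference:

```python
import json

import pandas as pd

# One-row sample; "occluded" is a made-up field carrying a JSON boolean.
df = pd.DataFrame({"Answer.annotation_data": [
    '[{"left":171,"top":0,"width":163,"height":137,'
    '"label":"styrofoam container","occluded":false}]'
]})

# json.loads maps false -> False; ast.literal_eval would raise ValueError on it.
df["Answer.annotation_data"] = df["Answer.annotation_data"].apply(json.loads)
print(df["Answer.annotation_data"][0][0]["left"])  # 171
```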

Next, extract the value of a given key from every dict in every row and return the results as a DataFrame:

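def get_val(val):
    # One inner list per row: the value of key `val` in each box dict (NaN if the key is missing)
    comb = [[y.get(val, np.nan) for y in x] for x in df['Answer.annotation_data']]
    # Columns 0, 1, ... (one per box) are renamed to e.g. left_0, left_1, ...
    return pd.DataFrame(comb).add_prefix('{}_'.format(val))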
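`get_val` reads the global `df`. A variant that takes the parsed column as an argument (the name `get_val_from` is hypothetical) is easier to reuse and keeps the original row index; rows with fewer boxes end up padded with NaN:

```python
import numpy as np
import pandas as pd


def get_val_from(annotations, key):
    """One column per box position holding `key`, e.g. left_0, left_1, ...

    `annotations` is a Series whose values are already-parsed lists of dicts.
    """
    comb = [[box.get(key, np.nan) for box in boxes] for boxes in annotations]
    # Shorter rows are padded with missing values by the DataFrame constructor.
    return pd.DataFrame(comb, index=annotations.index).add_prefix(f"{key}_")


annotations = pd.Series([
    [{"left": 171, "top": 0, "width": 163, "height": 137},
     {"left": 222, "top": 42, "width": 45, "height": 70}],
    [{"left": 170, "top": 10, "width": 173, "height": 157}],  # only one box
])
wide = pd.concat(
    [get_val_from(annotations, k) for k in ["left", "top", "width", "height"]], axis=1
)
print(wide)
```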

Finally, join the results for all keys side by side with concat:

cols = ['left','top','width','height']
df1 = pd.concat([get_val(x) for x in cols], axis=1)
print(df1)

   left_0  left_1  top_0  top_1  width_0  width_1  height_0  height_1
0     171     222      0     42      163       45       137        70
1     170     222     10     42      173       45       157        70
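The wide layout adds a column per box and needs padding when images have different numbers of boxes. A sketch of an alternative (not part of the answer above) that keeps one row per box and then computes the corners the question asks for, assuming pandas 1.0 or newer, `xmax = left + width` and `ymax = top + height`, and made-up image names in place of the real `Input.image_url` values:

```python
import json

import pandas as pd

# Made-up sample shaped like the question's CSV; normally: data = pd.read_csv(csvPath)
data = pd.DataFrame({
    "Input.image_url": ["img_a.jpg", "img_b.jpg"],
    "Answer.annotation_data": [
        '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
        '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
        '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"}]',
    ],
})

# Parse each JSON string, then give every box its own row.
# explode turns an empty list into NaN, so those rows are dropped.
boxes = (
    data.assign(box=data["Answer.annotation_data"].apply(json.loads))
        .explode("box")
        .dropna(subset=["box"])
        .reset_index(drop=True)
)

# Spread each box dict into columns: left, top, width, height, label.
fields = pd.json_normalize(boxes["box"].tolist())
boxes = pd.concat([boxes[["Input.image_url"]], fields], axis=1)

# Corner coordinates, assuming exclusive far edges.
boxes["xmin"] = boxes["left"]
boxes["ymin"] = boxes["top"]
boxes["xmax"] = boxes["left"] + boxes["width"]
boxes["ymax"] = boxes["top"] + boxes["height"]

print(boxes[["Input.image_url", "label", "xmin", "ymin", "xmax", "ymax"]])
```

Keeping the image URL on every row means each box can be matched back to its image without tracking column suffixes.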
