in Python by (19.9k points)

I want to process a CSV file on my local hard disk in chunks using pandas. My processing code is ready and works without error when I run it on the whole dataset. The problem arises when the same code is run on the chunks.

I thought the chunks might be of a different type, so I checked type(chunk), and it is the same as type(whole_dataframe).

What I tried:

import pandas as pd

whole_data = pd.read_csv('data.csv', sep=',', header=0)

whole_data['cuisines'] = whole_data.cuisines.apply(lambda x: ','+x)

This gives me the expected result. But when I try running the same code on chunks as:

for chunk in pd.read_csv('data.csv', sep=',', header=0, chunksize=1000):

    chunk['cuisines'] = chunk.cuisines.apply(lambda x: ','+x)

This gives me an error: TypeError: can only concatenate str (not "float") to str

I expect the output to be the same as the output I got when running the code on the whole dataset.

1 Answer

0 votes
by (25.1k points)

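The error most likely comes from missing values: pandas reads empty fields in the cuisines column as NaN, which is a float, so ','+x fails on those rows. Convert the column to strings before concatenating. Depending on the pandas version, astype(str) may turn missing values into the literal string 'nan'. You can do it like this: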

for chunk in pd.read_csv('data.csv', sep=',', header=0, chunksize=1000):

    chunk['cuisines'] = ',' + chunk['cuisines'].astype(str)
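
If the literal string 'nan' is not wanted for missing values, one option is to fill them with an empty string first. The following is a minimal sketch under stated assumptions, not the only way to do it: the inline csv_text is hypothetical sample data standing in for data.csv, the chunksize of 2 is chosen only to keep the example small, and collecting the chunks with pd.concat is an addition for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv.
# Row "B" has an empty cuisines field, which pandas reads as NaN (a float).
csv_text = "name,cuisines\nA,Thai\nB,\nC,Indian\nD,Italian\n"

processed = []
for chunk in pd.read_csv(io.StringIO(csv_text), sep=',', header=0, chunksize=2):
    # Replace missing values with '' so every value is a string,
    # then prepend the comma with a vectorized concatenation.
    chunk['cuisines'] = ',' + chunk['cuisines'].fillna('').astype(str)
    processed.append(chunk)

# Stitch the processed chunks back into a single DataFrame.
result = pd.concat(processed, ignore_index=True)
print(result['cuisines'].tolist())
```

With a real file, pass 'data.csv' in place of io.StringIO(csv_text). If the full result does not fit in memory, write each chunk out instead of appending it to a list.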
