
I want to process a CSV file on my local hard disk in chunks using pandas. My processing code is ready, and it runs without any error on the whole dataset. The problem arises when the same code is run on the chunks.

I thought the chunks might be of a different type, so I checked type(chunk), but it is the same as type(whole_dataframe).

What I tried:

import pandas as pd

whole_data = pd.read_csv('data.csv', sep=',', header=0)

whole_data['cuisines'] = whole_data.cuisines.apply(lambda x: ','+x)

This gives me the expected result. But when I try running the same code on chunks as:

for chunk in pd.read_csv('data.csv', sep=',', header=0, chunksize=1000):

    chunk['cuisines'] = chunk.cuisines.apply(lambda x: ','+x)

This gives me an error: TypeError: can only concatenate str (not "float") to str

I expect the output to be the same as the output I got when running the code on the whole dataset.

1 Answer


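The error means at least one value in the cuisines column is a float rather than a string, which usually happens when an empty cell is read as NaN. Convert the column's values to strings before concatenating:

for chunk in pd.read_csv('data.csv', sep=',', header=0, chunksize=1000):

    chunk['cuisines'] = ',' + chunk.cuisines.astype(str)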
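
If the float in the error is a NaN from an empty cell, one option is to fill the missing values before concatenating, so that an empty cell becomes a bare comma. The following is only a minimal sketch under that assumption: the in-memory sample data, the prefix_cuisines helper, and the chunk size of 2 are made up for illustration and are not part of the original question. Passing dtype={'cuisines': str} asks pandas to read the column as strings in every chunk.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; rows B and D have an empty cuisines cell.
SAMPLE_CSV = "name,cuisines\nA,Thai\nB,\nC,Indian\nD,\n"


def prefix_cuisines(chunk):
    """Return a copy of the chunk with ',' prepended to every cuisines value."""
    out = chunk.copy()
    # Empty cells are read as NaN (a float), so replace them with an
    # empty string before doing string concatenation.
    out["cuisines"] = "," + out["cuisines"].fillna("")
    return out


# dtype forces the column to be read as strings in every chunk.
reader = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"cuisines": str}, chunksize=2)

# Process each chunk, then stitch the results back together.
result = pd.concat([prefix_cuisines(chunk) for chunk in reader], ignore_index=True)
print(result["cuisines"].tolist())
```

To use this on the real file, replace io.StringIO(SAMPLE_CSV) with 'data.csv' and set chunksize back to 1000. Whether a bare comma is the right output for a missing value depends on what the later processing expects.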
