
0 votes
2 views
in Python by (19.9k points)

I just want to save a CSV file using Dask. I have a .dat file that is over 30 GB. There is no problem with read_csv, but after processing I need to save the result as a CSV file, and it doesn't work. Help me

import dask.dataframe as dd

df = dd.read_csv("E:/bigdata/H_2015_04.dat", sep="|", header=None)

df.to_csv("E:/bigdata/1.csv")

The error messages look like this:

File "pandas/_libs/parsers.pyx", line 894, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 993, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1122, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1167, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1215, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1905, in pandas._libs.parsers._try_int64
MemoryError

1 Answer

0 votes
by (25.1k points)

The file you are reading is too big to be read into memory at once. You need to read the data part by part so it can fit into memory. Like this:

import pandas as pd

reader = pd.read_csv(file_path, sep="|", header=None, iterator=True)

chunk = reader.get_chunk(1000)  # Reads the first 1000 lines into memory
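Building on that idea, you can loop over the chunks and append each one to the output CSV, so only one chunk is ever held in memory. A minimal sketch (the tiny generated sample file and the chunk size of 3 are just illustrative stand-ins for your 30 GB .dat file and a realistic chunk size):

```python
import os
import tempfile

import pandas as pd

# Stand-in for the large pipe-delimited .dat file
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "sample.dat")
dst = os.path.join(tmpdir, "out.csv")
with open(src, "w") as f:
    for i in range(10):
        f.write(f"{i}|{i * 2}|{i * 3}\n")

# Read the source a few rows at a time; each iteration yields one DataFrame chunk
reader = pd.read_csv(src, sep="|", header=None, chunksize=3)
for i, chunk in enumerate(reader):
    # Overwrite on the first chunk, append afterwards
    chunk.to_csv(dst, mode="w" if i == 0 else "a", header=False, index=False)
```

With a real file you would replace `src` and `dst` with your actual paths and raise `chunksize` to something like 100,000 rows.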
