
0 votes
2 views
in Python by (19.9k points)

I just want to save a CSV file using Dask. I have a .dat file that is over 30 GB. There is no problem with read_csv, but after processing I need to save the result as a CSV file, and it doesn't work. Help me

import dask.dataframe as dd

df = dd.read_csv("E:/bigdata/H_2015_04.dat", sep="|", header=None)

df.to_csv("E:/bigdata/1.csv")

The error messages look like this:

File "pandas/_libs/parsers.pyx", line 894, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 993, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1122, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1167, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1215, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1905, in pandas._libs.parsers._try_int64
MemoryError

1 Answer

0 votes
by (25.1k points)

The file you are reading is too big to be read into memory at once. You need to read the data part by part so it can fit into memory. Like this:

import pandas as pd

reader = pd.read_csv(file_path, sep="|", header=None, iterator=True)

chunk = reader.get_chunk(1000)  # Reads the first 1000 lines into memory
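Building on that idea, you can loop over the chunks and append each one to the output CSV, so only one chunk is ever held in memory. A minimal sketch (the tiny generated sample file and the chunk size of 3 are just illustrative stand-ins for your 30 GB .dat file and a realistic chunk size):

```python
import os
import tempfile

import pandas as pd

# Stand-in for the large pipe-delimited .dat file
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "sample.dat")
dst = os.path.join(tmpdir, "out.csv")
with open(src, "w") as f:
    for i in range(10):
        f.write(f"{i}|{i * 2}|{i * 3}\n")

# Read the source a few rows at a time; each iteration yields one DataFrame chunk
reader = pd.read_csv(src, sep="|", header=None, chunksize=3)
for i, chunk in enumerate(reader):
    # Overwrite on the first chunk, append afterwards
    chunk.to_csv(dst, mode="w" if i == 0 else "a", header=False, index=False)
```

With a real file you would replace `src` and `dst` with your actual paths and raise `chunksize` to something like 100,000 rows.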
