Let's say I have a 10 GB CSV file and I want to get its summary statistics using the DataFrame describe method.

In this case, I first need to create a DataFrame from all 10 GB of CSV data.




Does this mean all 10 GB will be loaded into memory before the statistics are calculated?

pandas.read_csv has no built-in limit on file size, but by default it reads the whole file into memory, so yes, a plain read_csv call would try to load all 10 GB.

Pass chunksize=N (the number of rows per chunk) to read the giant CSV file in pieces. iterator=True is not needed when chunksize is set.

After that you can calculate your statistics chunk by chunk.

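import pandas as pd

# With chunksize set, read_csv returns a TextFileReader: an iterator that yields DataFrames of up to 2000 rows each.
reader = pd.read_csv('some_data.csv', chunksize=2000)

# describe() exists on each chunk (a DataFrame), not on the reader itself.
partial_descs = [chunk.describe() for chunk in reader]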

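After this, aggregate the partial describe() results. count, mean, std, min and max can be combined exactly across chunks, but the percentiles (25%, 50%, 75%) cannot be recovered from per-chunk summaries, so they need a separate pass or an approximation.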
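
One way the aggregation could look, as a rough sketch rather than a definitive implementation: instead of merging describe() outputs, keep running totals (count, sum, sum of squares, min, max) per numeric column and derive mean and std at the end. The helper name describe_in_chunks and the small in-memory CSV are made up for illustration. The sketch assumes each column has the same numeric dtype in every chunk, and the sum-of-squares formula for std can lose precision on very large or badly scaled data.

```python
import io

import numpy as np
import pandas as pd


def describe_in_chunks(source, chunksize=2000):
    """Hypothetical helper: count, mean, std, min, max computed one chunk at a time."""
    count = total = total_sq = minimum = maximum = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Only numeric columns, as describe() does by default.
        num = chunk.select_dtypes(include="number").astype("float64")
        parts = (num.count(), num.sum(), (num ** 2).sum(), num.min(), num.max())
        if count is None:
            count, total, total_sq, minimum, maximum = parts
        else:
            count = count + parts[0]
            total = total + parts[1]
            total_sq = total_sq + parts[2]
            # fmin/fmax ignore NaN, so an all-NaN chunk does not wipe out the result.
            minimum = np.fmin(minimum, parts[3])
            maximum = np.fmax(maximum, parts[4])
    mean = total / count
    # Sample variance (ddof=1), matching describe(); clip guards against tiny negative rounding errors.
    var = ((total_sq - count * mean ** 2) / (count - 1)).clip(lower=0)
    stats = {"count": count, "mean": mean, "std": np.sqrt(var), "min": minimum, "max": maximum}
    return pd.DataFrame(stats).T


# Tiny stand-in for the real file; in practice pass the path to the 10 GB CSV.
csv_text = "a,b\n1,10.0\n2,20.0\n3,30.0\n4,40.0\n5,50.0\n"
summary = describe_in_chunks(io.StringIO(csv_text), chunksize=2)
print(summary)
```

Only one chunk is held in memory at a time, so peak memory depends on chunksize and not on the file size. The percentiles are left out on purpose, since they cannot be computed this way.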

