
Let's say I have a 10 GB CSV file and I want to get its summary statistics using the DataFrame describe method.

In this case I first need to create a DataFrame from all 10 GB of CSV data.

import pandas as pd

df = pd.read_csv("target.csv")  # read_csv already returns a DataFrame

df.describe()

Does this mean the entire 10 GB will be loaded into memory in order to calculate the statistics?

1 Answer


pandas.read_csv itself has no file-size limit, but by default it does load the whole file into memory, so a 10 GB CSV can easily exhaust RAM.

Instead, pass chunksize=xyz (or iterator=True) so the file is read in pieces.

You can then calculate your statistics chunk by chunk.

import pandas as pd

reader = pd.read_csv('some_data.csv', chunksize=2000)  # returns a TextFileReader, iterable in chunks of 2000 rows

partial_descs = [chunk.describe() for chunk in reader]  # describe each chunk, not the reader itself

Note that calling describe() directly on the TextFileReader raises an error; you have to iterate over it. After that, aggregate the partial results: counts, sums, mins and maxes combine directly across chunks, while means and standard deviations must be recombined using the per-chunk counts.
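The aggregation step can be sketched as below. This is a minimal example, not the only way to do it: it builds a small stand-in file (the name target.csv and the demo columns are hypothetical) and accumulates running count, sum, and sum of squares per numeric column, from which the overall mean and sample standard deviation (the same definition describe() uses) are recovered.

```python
import numpy as np
import pandas as pd

# Small demo CSV standing in for the 10 GB file (hypothetical data).
pd.DataFrame({'a': range(100), 'b': np.linspace(0.0, 1.0, 100)}).to_csv(
    'target.csv', index=False)

# Accumulators for the running statistics, one entry per numeric column.
count = total = total_sq = col_min = col_max = None

# Each chunk is an ordinary DataFrame of up to 25 rows.
for chunk in pd.read_csv('target.csv', chunksize=25):
    num = chunk.select_dtypes('number')
    if count is None:
        count, total, total_sq = num.count(), num.sum(), (num ** 2).sum()
        col_min, col_max = num.min(), num.max()
    else:
        count += num.count()
        total += num.sum()
        total_sq += (num ** 2).sum()
        col_min = np.minimum(col_min, num.min())
        col_max = np.maximum(col_max, num.max())

mean = total / count
# Sample standard deviation (ddof=1), matching DataFrame.describe().
std = np.sqrt((total_sq - count * mean ** 2) / (count - 1))

summary = pd.DataFrame({'count': count, 'mean': mean, 'std': std,
                        'min': col_min, 'max': col_max})
print(summary)
```

The trade-off of this running-sums approach is that only one chunk is ever resident in memory, at the cost of a second formula for std; quantiles (the 25%/50%/75% rows of describe()) cannot be combined this way and need an approximate streaming algorithm if you want them too.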
