I have a huge CSV file: 600 MB in size, with 11 million rows. I want to use it to produce statistics such as pivots, histograms, and graphs. I first tried reading the CSV file normally:
df = pd.read_csv('Check400_900.csv', sep='\t')
But that didn't work, so I used the iterator and chunksize options:
df = pd.read_csv('Check1_900.csv', sep='\t', iterator=True, chunksize=1000)
That worked. For example, I can run "print(df.get_chunk(5))", and I can go through the entire file with just:
for chunk in df:
    print(chunk)
This is where my problem arose: I don't know how to run code like the following on the entire df rather than on just one chunk:
plt.plot()
print(df.head())
print(df.describe())
print(df.dtypes)
customer_group3 = df.groupby('UserID')
y3 = customer_group3.size()
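For the groupby part, one possible approach is to compute the result for each chunk and then combine the partial results. This avoids holding the whole file in memory. The following is a minimal sketch, not a definitive solution. The function name summarize_in_chunks and the chunk size of 100000 are illustrative assumptions; only the tab separator and the 'UserID' column come from the code above.

```python
import pandas as pd


def summarize_in_chunks(path, key="UserID", sep="\t", chunksize=100000):
    """Stream a delimited file and combine per-chunk results.

    Hypothetical helper for illustration.
    Returns (first_rows, dtypes, group_sizes).
    """
    first_rows = None
    dtypes = None
    group_sizes = None

    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    for chunk in pd.read_csv(path, sep=sep, chunksize=chunksize):
        if first_rows is None:
            # head() and dtypes only need the first chunk. Note that the
            # dtypes inferred here may differ in later chunks.
            first_rows = chunk.head()
            dtypes = chunk.dtypes

        # Number of rows per key within this chunk only.
        sizes = chunk.groupby(key).size()

        # Add to the running totals; a key missing on one side counts as 0.
        if group_sizes is None:
            group_sizes = sizes
        else:
            group_sizes = group_sizes.add(sizes, fill_value=0)

    if group_sizes is not None:
        # Aligned addition with fill_value can produce floats; restore ints.
        group_sizes = group_sizes.astype("int64")
    return first_rows, dtypes, group_sizes
```

With this sketch, first_rows, dtypes, y3 = summarize_in_chunks('Check1_900.csv') would give a y3 comparable to the groupby size above, and y3 is small enough to plot directly. The same pattern works for anything that can be combined from partial results, such as counts, sums, minimums, and maximums. The percentiles reported by describe() cannot be combined exactly this way.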