in Data Science by (17.6k points)

I have a very large CSV file that I cannot read into memory all at once. I only want to read and process a few lines of it, so I am looking for a pandas function that can do this, the way basic Python can:

import itertools

with open('abc.csv') as f:
    # read and discard lines 0-999, then hand lines 1000-1999 to the loop one at a time
    for line in itertools.islice(f, 1000, 2000):
        ...  # process the line

However, when I try this in pandas, every call reads the first line again:

import pandas as pd

# both calls start from the top of the file, so both return the first row
datainput1 = pd.read_csv('matrix.txt', sep=',', header=None, nrows=1)
datainput2 = pd.read_csv('matrix.txt', sep=',', header=None, nrows=1)

I am looking for an easier way to handle this in pandas. For example, if I want to read only rows 1000 to 2000, how can I do this quickly?

I want to use pandas because I want the data read into a DataFrame.

1 Answer

by (39.1k points)

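Use the chunksize parameter. With it set, read_csv returns an iterator that yields the file as a series of DataFrames, each with at most chunksize rows, instead of loading everything at once: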

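import pandas as pd

# chunksize=1 yields one single-row DataFrame per iteration, continuing where the last one stopped
for df in pd.read_csv('matrix.txt', sep=',', header=None, chunksize=1):
    print(df)  # do something with each chunk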
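A minimal, self-contained sketch of the chunksize behaviour, assuming a small in-memory CSV built with io.StringIO in place of matrix.txt (the data and the helper name sum_first_column are hypothetical). It processes the file chunk by chunk and also uses the reader's get_chunk method, which shows that consecutive reads continue from where the previous one stopped, unlike the repeated nrows=1 calls in the question.

```python
import io

import pandas as pd


def sum_first_column(csv_text: str, chunksize: int) -> int:
    """Hypothetical helper: sum column 0 of a header-less CSV, one chunk at a time."""
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # (each with at most `chunksize` rows) instead of a single DataFrame.
    for chunk in pd.read_csv(io.StringIO(csv_text), header=None, chunksize=chunksize):
        total += int(chunk[0].sum())
    return total


# Hypothetical data: ten rows "0,0", "1,2", ..., "9,18"
data = "\n".join(f"{i},{2 * i}" for i in range(10))
print(sum_first_column(data, chunksize=3))  # 45

# Keeping one reader open gives successive rows rather than the first row twice.
reader = pd.read_csv(io.StringIO(data), header=None, chunksize=1)
first = reader.get_chunk()   # row 0
second = reader.get_chunk()  # row 1: the reader continues instead of restarting
print(first[0].iloc[0], second[0].iloc[0])  # 0 1
```

Because only one chunk is held at a time, memory use is bounded by the chunk size rather than the file size.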

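To answer the second part (reading only rows 1000 to 2000), combine skiprows with nrows:

df = pd.read_csv('matrix.txt', sep=',', header=None, skiprows=1000, nrows=1000)

This skips the first 1000 lines and parses only the next 1000, i.e. 0-based rows 1000 through 1999, and returns them as a single DataFrame. Passing chunksize=1000 instead of nrows=1000 would give you an iterator over the rest of the file rather than a DataFrame. Adjust the two numbers if you need the end points included differently.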
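A minimal sketch of the skiprows/nrows approach on a 5000-row CSV generated in memory (the data is hypothetical and stands in for matrix.txt). It also shows one way to keep a header line if your file has one: passing a range to skiprows skips only those line numbers, so skiprows=range(1, 1001) leaves line 0 in place as the header.

```python
import io

import pandas as pd

# Hypothetical header-less data: row i holds the values i and i * i.
rows = "\n".join(f"{i},{i * i}" for i in range(5000))

# Skip the first 1000 lines, then parse only the next 1000 (0-based rows 1000-1999).
block = pd.read_csv(io.StringIO(rows), header=None, skiprows=1000, nrows=1000)
print(len(block), block[0].iloc[0], block[0].iloc[-1])  # 1000 1000 1999

# Same idea for a file whose first line is a header: skip data lines 1-1000 only.
with_header = "a,b\n" + rows
block_h = pd.read_csv(io.StringIO(with_header), skiprows=range(1, 1001), nrows=1000)
print(list(block_h.columns), block_h["a"].iloc[0], block_h["a"].iloc[-1])  # ['a', 'b'] 1000 1999
```

With an integer, skiprows counts lines from the top of the file; with a list-like, it names the exact line numbers to drop, which is what lets the header survive.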

Welcome to Intellipaat Community. Get your technical queries answered by top developers!