
I am working with a very large (~11GB) text file on a Linux system. I am running it through a program that checks the file for errors. Once an error is found, I need to either fix the line or remove it entirely, and then run the check again.

Eventually, once I'm comfortable with the process, I'll automate it entirely. For now, however, let's assume I'm running this by hand.

What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python...but would be open to other examples. The line might be anywhere in the file.

If Python, assume the following interface:

def removeLine(filename, lineno):

Thanks,

1 Answer


Modify the file in place: the offending line is overwritten with spaces, so the remainder of the file does not need to be shuffled around on disk. The file keeps the same size and line count; the line simply becomes blank. You can also "fix" the line in place, as long as the fix is not longer than the line you are replacing.

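import os
from mmap import mmap

def removeLine(filename, lineno):
    # Open read/write so the mapping can be modified in place.
    f = os.open(filename, os.O_RDWR)
    try:
        with mmap(f, 0) as m:
            p = 0
            # Skip past the first lineno-1 newlines to reach the target line.
            for i in range(lineno - 1):
                p = m.find(b'\n', p) + 1
                if p == 0:
                    raise ValueError('file has fewer than %d lines' % lineno)
            q = m.find(b'\n', p)
            if q == -1:
                q = len(m)  # last line has no trailing newline
            # Overwrite the line with spaces; the file size does not change.
            m[p:q] = b' ' * (q - p)
    finally:
        os.close(f)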
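The "fix the line in place" case mentioned above can be handled with the same mapping trick. The following is only a minimal sketch: `replace_line` is a hypothetical helper name, it assumes Python 3, a non-empty file, and a replacement passed as `bytes`, and it pads the fix with spaces so the line keeps its original length.

```python
import os
from mmap import mmap

def replace_line(filename, lineno, new_text):
    """Overwrite line `lineno` (1-based) with `new_text`, padded with spaces.

    Hypothetical helper for illustration. `new_text` must be bytes, must not
    contain a newline, and must not be longer than the existing line.
    """
    if b"\n" in new_text:
        raise ValueError("replacement must be a single line")
    fd = os.open(filename, os.O_RDWR)
    try:
        with mmap(fd, 0) as m:
            # Locate the start of the target line.
            p = 0
            for _ in range(lineno - 1):
                p = m.find(b"\n", p) + 1
                if p == 0:
                    raise ValueError("file has fewer than %d lines" % lineno)
            # Locate the end of the target line (or end of file).
            q = m.find(b"\n", p)
            if q == -1:
                q = len(m)
            if len(new_text) > q - p:
                raise ValueError("replacement is longer than the original line")
            # Pad with spaces so the slice length is unchanged.
            m[p:q] = new_text.ljust(q - p)
    finally:
        os.close(fd)
```

For example, `replace_line("data.txt", 2, b"ok")` would turn a second line reading `bravo` into `ok` followed by three spaces. If trailing spaces are not acceptable to the checking program, this approach does not apply and the file has to be rewritten instead.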
