I need to read a large text file of around 5-6 GB line by line using Java.

How can I do this quickly?

1 Answer

A simple approach is to read all the lines of the file into memory. Both Guava and Apache Commons IO provide a quick way to do just that:

Files.readLines(new File(path), Charsets.UTF_8);

FileUtils.readLines(new File(path));

The problem with this approach is that every line of the file is kept in memory, which quickly leads to an OutOfMemoryError if the file is large enough.

For example, reading a ~1 GB file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    // Guava's com.google.common.io.Files: loads every line into a List<String> at once
    Files.readLines(new File(path), Charsets.UTF_8);
}

The test starts off with only a small amount of memory in use (about 12 MB):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, about 2 GB is in use:

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 GB of memory (2666 MB total minus 490 MB free) is consumed by the process. The reason is simple: all the lines of the file are now stored in memory.
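The figures in the log can be reproduced with the JDK's Runtime API. The following is only a minimal sketch of how such numbers might be obtained; the class name MemoryUsage and the helper usedMb are made up for illustration and are not taken from the original test.

```java
public class MemoryUsage {

    private static final long MB = 1024L * 1024L;

    // Used heap = heap currently reserved by the JVM minus the free part of it.
    // (Hypothetical helper, named here only for illustration.)
    public static long usedMb(long totalBytes, long freeBytes) {
        return (totalBytes - freeBytes) / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        System.out.println("Total Memory: " + total / MB + " Mb");
        System.out.println("Free Memory: " + free / MB + " Mb");
        System.out.println("Used Memory: " + usedMb(total, free) + " Mb");
    }
}
```

Applied to the values logged above, 2666 Mb total minus 490 Mb free gives 2176 Mb, which is the roughly 2.1 GB quoted.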

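To confirm what is filling the heap, you can capture a heap dump of the running JVM with jmap (6054 is the process ID here) and open it in a heap analyzer. Note that this only helps diagnose the problem; it does not fix it: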

jmap -dump:format=b,file=filename 6054

It should be clear by this point that keeping the contents of the file in memory will quickly exhaust the available memory, regardless of how much that is.

What's more, we usually don't need all of the lines in the file in memory at once. Instead, we just need to be able to iterate through each one, do some processing, and throw it away. So this is exactly what we're going to do: iterate through the lines without holding them in memory.
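One way to iterate without holding the lines in memory is sketched below, using only the JDK's BufferedReader. This is an illustrative example rather than the only option: the class name LineStreamer and the method countLines are hypothetical, and counting lines merely stands in for whatever per-line processing is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineStreamer {

    // Reads the file one line at a time. Only the current line (plus the
    // reader's internal buffer) is held in memory, so heap usage stays
    // roughly constant no matter how large the file is.
    public static long countLines(Path path) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Real per-line processing would go here; the line becomes
                // eligible for garbage collection after each iteration.
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The file path is expected as the first command-line argument.
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

The JDK's Files.lines(path) gives the same lazy behaviour as a Stream<String>, as long as the stream is closed (for example with try-with-resources).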
