How to Read Really Large Files in Java

Here’s a class that makes this really easy. The entire file is never read into memory, so it should be able to handle files of any size your operating system supports.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class BigFile implements Iterable<String>
{
    private final BufferedReader _reader;

    public BigFile(String filePath) throws IOException
    {
        _reader = new BufferedReader(new FileReader(filePath));
    }

    public void close()
    {
        try
        {
            _reader.close();
        }
        catch (IOException ex)
        {
            // Nothing sensible to do if closing fails.
        }
    }

    public Iterator<String> iterator()
    {
        return new FileIterator();
    }

    private class FileIterator implements Iterator<String>
    {
        private String _currentLine;
        private boolean _lineBuffered; // so repeated hasNext() calls don't skip lines

        public boolean hasNext()
        {
            if (_lineBuffered)
                return _currentLine != null;

            try
            {
                _currentLine = _reader.readLine();
            }
            catch (IOException ex)
            {
                _currentLine = null;
                ex.printStackTrace();
            }

            _lineBuffered = true;
            return _currentLine != null;
        }

        public String next()
        {
            if (!hasNext())
                throw new NoSuchElementException();

            _lineBuffered = false;
            return _currentLine;
        }

        public void remove()
        {
            throw new UnsupportedOperationException();
        }
    }
}

Here’s how you might use it:

BigFile file = new BigFile("C:\\Temp\\BigFile.txt");

for (String line : file)
    System.out.println(line);
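
Since the constructor can throw and the reader holds an open file handle, a slightly safer pattern (a minimal sketch) is to close it in a finally block:

BigFile file = new BigFile("C:\\Temp\\BigFile.txt");

try
{
    for (String line : file)
        System.out.println(line);
}
finally
{
    file.close();
}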

If you liked this, please tell someone else about it and subscribe to this blog. If not, please leave me a suggestion below.

70 Responses to “How to Read Really Large Files in Java”

  1. Hi,

    Thanks for sharing the code. I really liked the way you have made file reading very easy.

    Thanks a lot.

    Satish

  2. Hi,
    Have you tried this on files larger than 4GB? I have used a similar method but it can’t cope with anything >4GB
    Dave

  3. My apologies – my method did not check properly for EOF. I am sure yours will be OK for files > 4GB.
    Dave

  4. I haven’t tried anything larger than 4 GB that I can remember. What really matters is whether you’re trying to save the file contents into memory as you read it. If you do, then you will run into a memory limit based on your computer hardware. If you don’t (i.e., process it a line at a time), then there should be no limit other than hard drive space.
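
    For example, here is a sketch that finds the longest line in a huge file while only ever holding one line in memory (the file path is just a placeholder):

    BigFile file = new BigFile("C:\\Temp\\BigFile.txt");
    int longest = 0;

    // Only the current line and the running maximum live in memory,
    // so this works no matter how big the file is.
    for (String line : file)
        longest = Math.max(longest, line.length());

    file.close();
    System.out.println("Longest line: " + longest + " characters");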

  5. Hi, I'm a Java newbie. Where can I place the code

    BigFile file = new BigFile("C:\\Temp\\BigFile.txt");

    for (String line : file)
        System.out.println(line);

    ??
    Can you send me all the code?
    Thanks

  6. Hi David,

    You should be able to put the code into a Main.java class that would look something like this.

    public class Main
    {
        public static void main(String[] args) throws Exception
        {
            BigFile file = new BigFile("C:\\Temp\\BigFile.txt");

            for (String line : file)
                System.out.println(line);

            file.close();
        }
    }

    If that doesn’t help you, maybe you can post what you have, and we can try from there.

  7. Thanks for the help. I’ve created that class you suggested but it gives me the following error regarding the file variable: Type mismatch: cannot convert from element type Object to String. Any idea how to fix this? Thanks in advance.

  8. Hi David,

    I think I found the problem. Part of the code got chopped out by the blogging platform I'm using because it thought they were HTML tags. I've updated the code above. Basically, all I changed was anywhere it said Iterator to Iterator<String>. See above. This way, when it iterates through the file, it knows to look for String objects rather than Object objects. Hope that helps.

    -Steve

  9. Thanks for the help. That was exactly what I needed.

  10. Hi All,

    I want to read one large text file, split it at runtime, and then place the two resulting files into one folder. Please give me the solution; it's very urgent.

    Please help me ya
    Sreenivas

  11. Sreenivas,

    I'm afraid your description of what you're trying to do is not detailed enough for us to offer a solution. Please provide more details, even pseudocode. Also, I'm curious: what do you need it for?

  12. I have a txt file with 2,000,000 numbers that I upload to two servers using Java; each server gets 1,000,000 numbers, loaded into a MySQL DB.

  13. That clearly works, but if performance is important, you should use a MappedByteBuffer from the NIO package instead, like in this example.
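
    A minimal sketch of the idea (assuming the file fits in a single mapping, which is capped at 2 GB):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    RandomAccessFile raf = new RandomAccessFile("C:\\Temp\\BigFile.txt", "r");
    FileChannel channel = raf.getChannel();

    // The OS pages the file in on demand instead of Java reading it all up front.
    MappedByteBuffer buffer =
        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

    long lines = 0;
    while (buffer.hasRemaining())
        if (buffer.get() == '\n')
            lines++;

    System.out.println("Lines: " + lines);
    raf.close();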

  14. I have a large log file, which I have to read and then do the following:

    In the log file I would have an entry saying "processing 1234", and somewhere down the line in the file I would have a corresponding line
    saying "acknowledged 1234". Based on this number, I have to say whether or not it was acknowledged. If we don't receive an ack, we would not have the second line. I am trying to find a good, optimal solution. Any help appreciated.

  15. Girish, do you know what you're looking for in the file from the beginning? Or is the value (for example, "1234") different depending on the file? If the latter, then you'll probably need to use a technique called "regular expressions."
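
    For example, here is a rough sketch using regular expressions (the log file name is hypothetical, and it assumes the IDs are plain digits):

    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern processing = Pattern.compile("processing (\\d+)");
    Pattern acknowledged = Pattern.compile("acknowledged (\\d+)");
    Set<String> processed = new HashSet<String>();
    Set<String> acked = new HashSet<String>();

    // One pass over the file: collect both kinds of IDs, then diff them.
    BigFile file = new BigFile("C:\\Temp\\server.log");
    for (String line : file)
    {
        Matcher m = processing.matcher(line);
        if (m.find())
            processed.add(m.group(1));

        m = acknowledged.matcher(line);
        if (m.find())
            acked.add(m.group(1));
    }
    file.close();

    processed.removeAll(acked); // whatever remains was never acknowledged
    System.out.println("Unacknowledged: " + processed);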

  16. Could you implement delete functionality?

  17. Probably. Can you please give me more detail about what you want to do?

  18. I want to replace/delete string occurrences in a very big log file without writing it to another file.

  19. If you are working with a very large file, the problem you often run into is a lack of memory. If you didn’t have to worry about memory capacity, you could read the file contents into a StringBuilder (or array) object, modify it, and then spit it back out to the same file. But since we are talking about ways to deal with very large files, the only solution I can think of would be to read it from the source file (using the technique shown in this post), modify each line as you read it, and then spit it out to a new file. Then you could delete the original file and rename the new file to the name of the original file if you wanted. Hope that helps.
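
    A minimal sketch of that approach (the file names and the replacement rule are just examples):

    import java.io.*;

    BigFile source = new BigFile("C:\\Temp\\BigFile.txt");
    PrintWriter out = new PrintWriter(new BufferedWriter(
        new FileWriter("C:\\Temp\\BigFile.tmp")));

    // One line in memory at a time: read, modify, write.
    for (String line : source)
        out.println(line.replace("ERROR", "WARN")); // example edit

    source.close();
    out.close();

    // Swap the edited copy in place of the original.
    new File("C:\\Temp\\BigFile.txt").delete();
    new File("C:\\Temp\\BigFile.tmp").renameTo(new File("C:\\Temp\\BigFile.txt"));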

  20. I tried to read a very big file and write to another, and it works fine; but if I try to call this class from another one (like main, but from another class), I get an out-of-memory error. Any idea why?

  21. Not sure. Can you post a code sample of where it’s working and where it’s not?

  22. The BigFile class with

    public static void main(String[] args)
    {
        BigFile file = new BigFile("C:\\Temp\\BigFile.txt");

        for (String line : file)
            System.out.println(line);
    }

    works fine,
    but using another class with a function that does the same as above, I get out of memory on the line

    for (String line : file)
  23. Hi guys,
    I have a 2GB+ file. I need to read this file and split it into 9 files. I already used the provided code. It's working fine, but it takes more than 15 minutes to read the file. I need to read this file within 5 minutes.
    Is that possible in Java? Please give some suggestions.

  24. Hmmm…the focus of this post is enabling you to parse very large files, and it seems to be working for that purpose. If you need to read the file faster, you can try a few things. One is to buy a faster computer and/or hard drive. You could try a solid state drive and see how that goes.

    But what are you doing with each line of the file? For it to take 15 minutes, it seems you are probably parsing each line of the file, which is probably slowing it down. If you post a code sample, we might be able to help you figure something out.

  25. Hi

    The above method basically splits the file based on new lines. I have a large file which I'd like to split based on some other separator.
    Let's say my file content is abc||def||ghi, and I want the iterator to return the elements abc, def, and ghi.
    Can you please tell me how to do this? (I'm reading a 2 GB file and saving the content in a string ArrayList.) I'm fairly new to Java.

  26. PJ,

    Before I answer…

    Is your file still on multiple lines? Or is it one very long line?

  27. Hi Admin,

    My scenario is the same as PJ's: multiple lines, 1 GB text file. I need to parse it line by line, do further processing (I have the code), and write each processed line to a new file. My doubt is, can I use this class for writing to a file as well? If yes, will it be very slow? The current code I have writes the processed lines to a string, and I get a "java.lang.OutOfMemoryError: Java heap space" error when handling big files. Please let me know your comments.

  28. The best approach probably would be to write it to a file line by line. If you save the complete output to a String object and then write the object, you are limited by the amount of memory that the machine has. However, if you write it line by line, there is no limit. This should be somewhat simple code to write (probably simpler than the code in this post because you won’t need an iterator, etc.). You can make it faster by keeping a writer object open (rather than opening and closing with each line). Let me know if you’re able to get it to work.
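
    Something like this sketch, where processLine stands in for your own processing code and the file names are placeholders:

    import java.io.*;

    BigFile in = new BigFile("C:\\Temp\\input.txt");
    BufferedWriter out = new BufferedWriter(new FileWriter("C:\\Temp\\output.txt"));

    // The writer stays open for the whole run, and only one line
    // is ever held in memory at a time.
    for (String line : in)
    {
        out.write(processLine(line)); // processLine is your code
        out.newLine();
    }

    in.close();
    out.close();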

  29. Hi,

    I have 100 text files in a single folder. All the files contain just numerical data, in 3 columns and roughly 30k rows. I have to read the files and write them into a single text file. Can anyone help me with the code? I've managed to merge 2 files, but I don't know how to do it with multiple files. One more thing: for every file, the first two columns are the same, so the first two columns should be copied only once, from the first file, and only the third column should be copied from each of the 100 files.

  30. Thank you for this class. I had a 1 GB file I was trying to parse with java.util.Scanner. Since it was loading the whole file into memory, the program would crash about 20% of the way through. I plugged in BigFile instead of Scanner, and it not only finishes but runs at least 5 times faster.

  31. Is there any way we could read more than one line at a time? I have a requirement to validate N consecutive lines and process those N lines with some regular expression.

  32. It gives me an error at this line:

    _reader = new BufferedReader(filePath);

    saying: The constructor BufferedReader(String) is undefined.

  33. Thank you for sharing. I wonder how the class manages to never read the entire file into memory? Can you explain it? Thank you.

  34. Sorry for the late response. Yes, you can read more than one line at a time. You would need to have the Iterator return an ArrayList rather than a String, and then call readLine multiple times in each iteration to construct the ArrayList. Please let me know if this is not clear or if you found another solution.
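
    For example, without changing the class at all, you can batch lines inside the loop (a sketch; n = 5 is arbitrary):

    import java.util.ArrayList;
    import java.util.List;

    int n = 5; // validate 5 consecutive lines at a time
    List<String> batch = new ArrayList<String>(n);

    BigFile file = new BigFile("C:\\Temp\\BigFile.txt");
    for (String line : file)
    {
        batch.add(line);
        if (batch.size() == n)
        {
            // validate/process the n lines in batch here
            batch.clear();
        }
    }
    file.close();
    // any leftover lines (fewer than n) are still in batch here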

  35. If you are getting an error saying the constructor is undefined, you are probably passing the path straight to BufferedReader, which does not have a constructor that takes a String. Wrap the path in a FileReader, as in the code above: new BufferedReader(new FileReader(filePath)).

  36. The basic answer as to how this works so that the file is never read into memory is that you read only one line at a time, process it, and then it gets removed from memory by the Java runtime. If you were storing each line in an array or ArrayList or something else, you would be keeping it all in memory, and that's when you'd have a problem with really large files. Hope that helps.

  37. Thank you very much!!! You are the man!!!

  38. [...] a look at the following. http://code.hammerpig.com/how-to-rea…s-in-java.html It's a very common solution, yet it has some drawbacks as well. [...]

  39. I am unable to run this on Java 1.4.2, as there is no Iterable interface available. Is there any way I can run this program on JDK 1.4.2?

  40. I’m not very familiar with the differences between versions of Java. If at all possible, it would be worth your time to figure out a way to upgrade to a newer version.

  41. Hi,
    this is Kiran.
    I want to read a log file and write it automatically into a different file. Please help me.

  42. Hi!

    I hope you can help me solve this issue. I have two different large files that contain channels of information about customers of a store. Some channels of file 1 have to be completed with information from file 2's channels; that is to say, the channels share information separately. The thing is that file 1 is larger than file 2, because file 1 has (for example) 15 channels of information while file 2 has only 3, but both hold information about the same customer.

    I am using

    BigFile file = new BigFile(inputFile);
    BigFile file2 = new BigFile(inputFile2);

    to read both files, but I have a problem with reading file 2. I have this code (pseudocode):

    for (String line : file) { // reading file 1

        // at some point while reading file 1, I need one line of
        // file 2, because I need some information from it to
        // complete the information in file 1
        String line2 : file2; // <- this does not work
        String anotherstr = line2.substring(x, y);
    }

    but this does not work, because it seems that I have to read file 2 in a loop too, like file 1. I don't need that; I just want one line at a time, when I need it.

    How can I get one line of file 2 without using a for loop? Just taking one line at a time.

    I hope I explained well

    any help?

    thanks!

    raymundo

  43. Raymundo,

    How large are the files? Can you read both of them separately and then write some code to combine them?

  44. Hi, thanks for the answer.

    One of the files, file 1, is about 200 MB, but sometimes we handle files of 1 GB. File 2 is smaller than file 1, but it can be about 500 MB.

    The option of reading the files separately is good, but I think it could take more time to finish the process (I am in a factory).

    At this time I am trying the BufferedReader class to read file 2, since I read file 1 with the for loop and it's working. That's what I need, but it would be better if I could use the same BigFile routines to read both files.

    Thanks for help!

    raymundo

  45. Raymundo,

    Sounds like a bigger problem than I can answer here. But what you might consider is reading file 2 first. Parse out the information you need for each customer. Put that information in a HashMap object (or multiple HashMap objects). The key might be the customer ID, and the value(s) would be the values you parsed from file 2. Then when you are reading file 1, you can retrieve the values from the HashMap object for each customer that you encounter in file 1.
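
    A rough sketch of that approach (it assumes, purely for illustration, that each line starts with a customer ID followed by a comma):

    import java.util.HashMap;
    import java.util.Map;

    // Pass 1: index the smaller file by customer ID.
    Map<String, String> byCustomer = new HashMap<String, String>();
    BigFile file2 = new BigFile("C:\\Temp\\file2.txt");
    for (String line : file2)
    {
        String id = line.substring(0, line.indexOf(','));
        byCustomer.put(id, line);
    }
    file2.close();

    // Pass 2: stream the larger file and join against the index.
    BigFile file1 = new BigFile("C:\\Temp\\file1.txt");
    for (String line : file1)
    {
        String id = line.substring(0, line.indexOf(','));
        String fromFile2 = byCustomer.get(id); // null if no match in file 2
        // ...combine line and fromFile2 here...
    }
    file1.close();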

  46. Hello

    Yes, this is a good way to solve the problem. In fact, I think I'll do it the way you suggested. I did it the other way, but it is not efficient, because a customer in file 1 can have more information than others, depending on customer activity in the month (customers with no movement, for example), and this causes a lag in reading the second file that can be controlled with code but would require many variables.

    OK, thanks for everything, and keep helping the community dedicated to programming.

    Greetings from Mexico!

    raymundo

  47. Really helpful code.

  48. I am reading an XML file, but some of the code gets chopped.

  49. Can you provide a minimal working example of the code you are using and the input files you are working with?

  50. Good work…keep it up

  51. Hi, I am using a WebSphere 7 server. The problem I am facing is the same: I am reading .tif files from one server location and merging them into one .tif file, to display them again as one document. If the merged tif is small, about 40 MB, it merges fine and displays as one document; but when there are more than 500 images and the merged file grows to 50-100 MB, it will not open and the server hangs. The exact exception I am getting is "Could not lock User prefs. Unix error code 24.
    Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock." When I searched for it on the net, it pointed to memory space. Will your code be applicable to my problem, since this post is related to it? Please help; I am in big trouble, and it's urgent. I am also reading and writing the images the same way. Will your code work for me? Please reply ASAP; waiting eagerly.

  52. Hi Pravin,

    This code is designed to help you read the files without putting the data all into memory. You can process the file lines one at a time. But if you use this to read (one or more) files, one line at a time, but then put the lines into an object (such as ArrayList) in memory, you will have problems with memory filling up. It would be too hard for me to troubleshoot your exact problem, but you just have to make sure not to read the files into memory. One thing you could try is to open an output file object at the beginning, read one line at a time from each file and output each line to the output file as you go. However, if you are simply concatenating files, I would suggest the cat command since you are on Linux.

    Good luck!

  53. Sir, actually I want to read all the images stored in a particular folder on the server, e.g. 213_1.tif, 213_2.tif, ..., 213_800.tif, and so on. I have written a program which collects all the images of the same type (213 being the image ID) and copies them into one tif file, like 1234_PM.tif, so this one tif contains all the images from 213_1.tif to 213_800.tif. If the generated tif file is as small as 40 MB, it opens fine; but some documents are bigger, and when the file in the temp folder reaches 100 MB, opening the image gets stuck and the server throws the unable-to-lock-file exception mentioned in my previous post. After searching the net, I raised the JVM heap size to its maximum of 1024 MB, but some images still will not open. I am using a UNIX server. Do you have any solution for this? Will your code above work, or is there anything else you would suggest? Thanking you in advance.

  54. Pravin, my code would probably help, but it’s too difficult for me to help you troubleshoot your problem this way. I wish I could sit down and work through a solution with you. Maybe you can find someone who has more experience programming Java to help you through it. Or you could try the cat command in UNIX.

  55. Hi, I handled it programmatically. Generally all the connections, strings, input/output streams, and StringBuffers take up some space, and I took care to close them properly, even handling it in a finally clause.
    Lastly, after closing all the connections, I flushed the memory and called Runtime.getRuntime().gc(), which is working now.
    Where the whole image-processing run used to take 100% of memory, the program now frees 80% of the memory it allocates for a particular task.

    Anyway, thanks for the support.

  56. Hello admin.
    I'm here again. I have the following problem and hope you can help me. This is my explanation:

    I have a zip file containing nested directories; the deepest directory holds about 5 GB of XML files, plus a txt file that lists all the XML files in the directory (one short line per XML file), and I need that txt file. When I try to unzip the whole file, I can see it will take around 50 hours to finish on my slow laptop.

    I do not care about the XML files, but I need the txt file. Is there a way to extract or copy the txt file without decompressing the entire contents of the zip file?

    Thank you for your attention

    raymundo

  57. Raymundo,

    How are you unzipping the file? In Java? Or at the command line?

    Thanks for the answer.

    I'm unzipping the file (or trying to) using the WinZip tool. It takes a lot of time, so I have to pause it and search for another way to extract the txt file.

  59. Raymundo,

    Sounds like you have a very slow machine if it is taking that long, or else a small amount of memory. One suggestion is to try another zip program that doesn't try to store the whole file in memory; 7-Zip might work for this. If you want to do it in Java, you could try that too. There are a few explanations on the web of how to do this, such as here: http://java.sun.com/developer/technicalArticles/Programming/compression/
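
    For example, java.util.zip can decompress a single entry and leave the rest of the archive untouched (a sketch; the archive path and entry name are hypothetical):

    import java.io.*;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    ZipFile zip = new ZipFile("C:\\Temp\\archive.zip");
    ZipEntry entry = zip.getEntry("some/dir/list.txt"); // name inside the zip

    // Only this one entry is decompressed; the XML files are never read.
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(zip.getInputStream(entry)));

    String line;
    while ((line = reader.readLine()) != null)
        System.out.println(line);

    reader.close();
    zip.close();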

  60. Thanks! I will check the link.

  61. Thanks. Nice piece of code. I learned something, and it saved me hours!

  62. Hi Admin,

    I read all the posts, and my problem is somewhat similar to the others'.

    I want to read a pipe-delimited file which has more than 100 million records.
    My server RAM is 12 GB.

    I have set the heap memory to 8 GB. The OS is Windows Server 2008 64-bit.

    I am using Apache Tomcat 7.0.22 64-bit.

    I am using BufferedReader and Scanner to read and parse the file, and I am storing the records in an ArrayList.

    But it is taking too much time.
    Reading 3 lakh (300,000) records takes 2 minutes. So should I use Scanner inside the for loop, like the code below? Is this the right approach to extract the data from each line and store it in the ArrayList, given that I have 56 columns?

    BigFile file = new BigFile("C:\\Temp\\BigFile.txt");

    for (String line : file)
    {
        // System.out.println(line);
        // parse the line with Scanner and add the values to the ArrayList
    }

    So will it have a performance impact? Our code is in a production environment. Can I read and process 3 lakh records within a minute?

    Please help me

  63. Hi Devang,

    I think the problem is that the ArrayList object is getting too large. If you have a huge file, that will probably cause a problem. Do you have to load everything into an ArrayList? What do you want to do after the data values are in the ArrayList?

  64. I am doing data profiling, which is part of Data Quality (Master Data Management). For data profiling, after selecting a delimited file, I select some attributes, for example SSN, DOB, State, Country_Code. After selecting these attributes, I try to find the unique values and their occurrences for each attribute (that is, for particular columns). I also try to find how many patterns or different formats the SSN numbers contain, and we predict the datatype for each attribute. For these functions we have to check every domain value and process them all, so it is necessary to keep them in memory so that we get our results very fast. We have also tried another option, processing the file in chunks of data, but it is a time killer.

    You are right that the ArrayList will become very large, but it is necessary, and we have a dedicated server to process it. We don't have any other application deployed on that server.

    As I said, in our current code we are using BufferedReader and Scanner in the traditional way, and your code looks more optimized; but I'm not sure whether it will give us good performance or not, and that's why I am asking.

  65. The code in this post will help you parse through large files, but only if you don't read the contents into memory. You are putting a lot of data into the in-memory list, so you will eventually run out of memory. I suggest you look into using a micro database like HSQLDB. It will probably give you much better results. Here's a post on that: http://code.hammerpig.com/what-is-an-in-memory-database-and-what-is-it-good-for.html
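
    A rough sketch of the idea, assuming the HSQLDB jar is on the classpath (the table and column names are made up):

    import java.sql.*;

    // Driver class for HSQLDB 1.8; newer versions use org.hsqldb.jdbc.JDBCDriver.
    Class.forName("org.hsqldb.jdbcDriver");
    Connection conn =
        DriverManager.getConnection("jdbc:hsqldb:mem:profiling", "sa", "");

    Statement st = conn.createStatement();
    st.execute("CREATE TABLE records (ssn VARCHAR(16), dob VARCHAR(10), state VARCHAR(4))");

    PreparedStatement insert =
        conn.prepareStatement("INSERT INTO records VALUES (?, ?, ?)");

    BigFile file = new BigFile("C:\\Temp\\BigFile.txt");
    for (String line : file)
    {
        String[] cols = line.split("\\|"); // pipe-delimited
        insert.setString(1, cols[0]);
        insert.setString(2, cols[1]);
        insert.setString(3, cols[2]);
        insert.executeUpdate();
    }
    file.close();

    // The profiling queries then run in SQL instead of in an ArrayList.
    ResultSet rs = st.executeQuery("SELECT state, COUNT(*) FROM records GROUP BY state");
    while (rs.next())
        System.out.println(rs.getString(1) + ": " + rs.getInt(2));

    conn.close();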

  66. Thank you, Admin, for your help. We have decided to change our code to follow yours, and we'll do the performance testing. After getting the results, I'll definitely post our statistics on this forum.

    I'll also recommend this forum to my friends and colleagues.
    Again, thank you, thank you so much for your time and help.

  67. You have said that the entire file is not read into memory. Then how is the memory managed here to read a big file? Is there any difference between using your file and the Scanner provided by Java?

  68. Rashmi, it is similar to the Scanner; basically, it is just a thin wrapper around a BufferedReader. It reads one line at a time into memory and lets you parse each line before moving to the next.

  69. Hello admin,

    I need to read one big file, with 16,000 lines. I'm using the BufferedReader class, but my program always crashes,
    so I found this topic and tried your code; but in my IDE, an error appears on the line "BigFile file = new BigFile("C:\\Temp\\BigFile.txt");"
    saying: class not found. How do I fix this?
    Please help me.

  70. Maxwell,

    Maybe post your full code and the exact error message, and I’ll see if I can help.
