<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How to Read Really Large Files in Java</title>
	<atom:link href="http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/feed" rel="self" type="application/rss+xml" />
	<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html</link>
	<description>Tips and short tutorials on various programming technologies</description>
	<lastBuildDate>Sun, 05 Feb 2012 17:17:05 -0600</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Devang Bhalgama</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-2#comment-1038</link>
		<dc:creator>Devang Bhalgama</dc:creator>
		<pubDate>Tue, 08 Nov 2011 05:57:50 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-1038</guid>
		<description>Thank you, Admin, for your help. We have decided to change our code to follow yours, and we&#039;ll do the performance testing. After getting the results, I&#039;ll definitely post our statistics on this forum.

I&#039;ll also recommend this forum to my friends and colleagues.
Again, thank you so much for your time and help.</description>
		<content:encoded><![CDATA[<p>Thank you, Admin, for your help. We have decided to change our code to follow yours, and we&#8217;ll do the performance testing. After getting the results, I&#8217;ll definitely post our statistics on this forum.</p>
<p>I&#8217;ll also recommend this forum to my friends and colleagues.<br />
Again, thank you so much for your time and help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-2#comment-1037</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Tue, 08 Nov 2011 05:18:43 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-1037</guid>
		<description>The code in this post will help you parse through large files, but only if you don&#039;t read the contents into memory. You are putting a lot of data into the in-memory list, so you will eventually run out of memory. I suggest you look into using a micro database like HSQLDB. It will probably give you much better results. Here&#039;s a post on that: http://code.hammerpig.com/what-is-an-in-memory-database-and-what-is-it-good-for.html</description>
		<content:encoded><![CDATA[<p>The code in this post will help you parse through large files, but only if you don&#8217;t read the contents into memory. You are putting a lot of data into the in-memory list, so you will eventually run out of memory. I suggest you look into using a micro database like HSQLDB. It will probably give you much better results. Here&#8217;s a post on that: <a href="http://code.hammerpig.com/what-is-an-in-memory-database-and-what-is-it-good-for.html" rel="nofollow">http://code.hammerpig.com/what-is-an-in-memory-database-and-what-is-it-good-for.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Devang Bhalgama</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-2#comment-1032</link>
		<dc:creator>Devang Bhalgama</dc:creator>
		<pubDate>Fri, 04 Nov 2011 16:19:03 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-1032</guid>
		<description>I am doing data profiling, which is part of Data Quality (Master Data Management). After selecting a delimited file, I select some attributes, for example SSN, DOB, State, and Country_Code. For each selected attribute (that is, each selected column), I then try to find the unique values and their occurrence counts. I also try to find how many patterns, or different formats, the SSN values contain. We also predict the data type of each attribute. For these functions we have to check and process every value in the domain, so it is necessary to keep them in memory to get our results quickly. We have also tried another option, processing the file in chunks of data, but it takes too much time.

You are right that the ArrayList will become very large, but it is necessary, and we have a dedicated server to process the data. We don&#039;t have any other application deployed on that server.

As I said, in our current code we are using BufferedReader and Scanner in the traditional way. Your code looks optimized, but I am not sure whether it will give us good performance, which is why I am asking.</description>
		<content:encoded><![CDATA[<p>I am doing data profiling, which is part of Data Quality (Master Data Management). After selecting a delimited file, I select some attributes, for example SSN, DOB, State, and Country_Code. For each selected attribute (that is, each selected column), I then try to find the unique values and their occurrence counts. I also try to find how many patterns, or different formats, the SSN values contain. We also predict the data type of each attribute. For these functions we have to check and process every value in the domain, so it is necessary to keep them in memory to get our results quickly. We have also tried another option, processing the file in chunks of data, but it takes too much time.</p>
<p>You are right that the ArrayList will become very large, but it is necessary, and we have a dedicated server to process the data. We don&#8217;t have any other application deployed on that server.</p>
<p>As I said, in our current code we are using BufferedReader and Scanner in the traditional way. Your code looks optimized, but I am not sure whether it will give us good performance, which is why I am asking.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-2#comment-1031</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Fri, 04 Nov 2011 13:48:42 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-1031</guid>
		<description>Hi Devang,

I think the problem is that the ArrayList object is getting too large. If you have a huge file, that will probably cause a problem. Do you have to load everything into an ArrayList? What do you want to do after the data values are in the ArrayList?</description>
		<content:encoded><![CDATA[<p>Hi Devang,</p>
<p>I think the problem is that the ArrayList object is getting too large. If you have a huge file, that will probably cause a problem. Do you have to load everything into an ArrayList? What do you want to do after the data values are in the ArrayList?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Devang Bhalgama</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-2#comment-1030</link>
		<dc:creator>Devang Bhalgama</dc:creator>
		<pubDate>Fri, 04 Nov 2011 12:19:50 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-1030</guid>
		<description>Hi Admin,

I read all the posts, and my problem is somewhat similar to the others.

I want to read a pipe-delimited file that has more than 100 million records.
My server RAM is 12 GB.

I have set the heap memory to 8 GB. The OS is Windows Server 2008 64-bit.

I am using Apache Tomcat 7.0.22 64-bit.

I am using BufferedReader and Scanner to read and extract the file, and I am storing these records in an ArrayList.

But it is taking too much time.
Reading 3 lakh (300,000) records takes 2 minutes. Should I use Scanner inside the for loop, as in the code below? Is this the right approach to extract the data from one line and store it in the ArrayList, given that I have 56 columns?

BigFile file = new BigFile(&quot;C:\\Temp\\BigFile.txt&quot;);

for (String line : file)
{
    // System.out.println(line);
    // Scanner part: extract the fields from the line and add them to the ArrayList
}

Will this have any performance impact, given that our code runs in a production environment? Can I read and process 3 lakh records within a minute?

Please help me.</description>
		<content:encoded><![CDATA[<p>Hi Admin,</p>
<p>I read all the posts, and my problem is somewhat similar to the others.</p>
<p>I want to read a pipe-delimited file that has more than 100 million records.<br />
My server RAM is 12 GB.</p>
<p>I have set the heap memory to 8 GB. The OS is Windows Server 2008 64-bit.</p>
<p>I am using Apache Tomcat 7.0.22 64-bit.</p>
<p>I am using BufferedReader and Scanner to read and extract the file, and I am storing these records in an ArrayList.</p>
<p>But it is taking too much time.<br />
Reading 3 lakh (300,000) records takes 2 minutes. Should I use Scanner inside the for loop, as in the code below? Is this the right approach to extract the data from one line and store it in the ArrayList, given that I have 56 columns?</p>
<p>BigFile file = new BigFile(&quot;C:\\Temp\\BigFile.txt&quot;);</p>
<p>for (String line : file)<br />
{<br />
    // System.out.println(line);<br />
    // Scanner part: extract the fields from the line and add them to the ArrayList<br />
}</p>
<p>Will this have any performance impact, given that our code runs in a production environment? Can I read and process 3 lakh records within a minute?</p>
<p>Please help me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vahid</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-2#comment-1009</link>
		<dc:creator>Vahid</dc:creator>
		<pubDate>Wed, 28 Sep 2011 04:54:47 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-1009</guid>
		<description>Thanks. Nice piece of code. I learnt something, and it saved me hours!</description>
		<content:encoded><![CDATA[<p>Thanks. Nice piece of code. I learnt something, and it saved me hours!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: raymundo</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-1#comment-971</link>
		<dc:creator>raymundo</dc:creator>
		<pubDate>Thu, 18 Aug 2011 21:43:05 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-971</guid>
		<description>Thanks! I will check the link.</description>
		<content:encoded><![CDATA[<p>Thanks! I will check the link.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-1#comment-969</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Wed, 17 Aug 2011 17:52:53 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-969</guid>
		<description>Raymundo,

If it is taking that long, it sounds like you have either a very slow machine or a small amount of memory. One suggestion is to try another zip program that doesn&#039;t try to store the whole file in memory; 7-Zip might work for this. If you want to do it in Java, you could try that too. There are a few explanations on the web of how to do this, such as here: http://java.sun.com/developer/technicalArticles/Programming/compression/</description>
		<content:encoded><![CDATA[<p>Raymundo,</p>
<p>If it is taking that long, it sounds like you have either a very slow machine or a small amount of memory. One suggestion is to try another zip program that doesn&#8217;t try to store the whole file in memory; 7-Zip might work for this. If you want to do it in Java, you could try that too. There are a few explanations on the web of how to do this, such as here: <a href="http://java.sun.com/developer/technicalArticles/Programming/compression/" rel="nofollow">http://java.sun.com/developer/technicalArticles/Programming/compression/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: raymundo</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-1#comment-968</link>
		<dc:creator>raymundo</dc:creator>
		<pubDate>Wed, 17 Aug 2011 17:46:22 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-968</guid>
		<description>Thanks for the answer.

I unzip the file (or try to) using the WinZip tool. It takes a lot of time, so I had to pause it and search for another way to extract the txt file.</description>
		<content:encoded><![CDATA[<p>Thanks for the answer.</p>
<p>I unzip the file (or try to) using the WinZip tool. It takes a lot of time, so I had to pause it and search for another way to extract the txt file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://code.hammerpig.com/how-to-read-really-large-files-in-java.html/comment-page-1#comment-967</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Wed, 17 Aug 2011 16:42:15 +0000</pubDate>
		<guid isPermaLink="false">http://code.hammerpig.com/?p=231#comment-967</guid>
		<description>Raymundo,

How are you unzipping the file? In Java? Or at the command line?</description>
		<content:encoded><![CDATA[<p>Raymundo,</p>
<p>How are you unzipping the file? In Java? Or at the command line?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

