diumenge, 9 de març del 2014

My first take on reducing Garbage

I have been trying to grok several concepts regarding JIT, memory footprint, language abuse, mechanical sympathy, non-GC algorithms, etc... A problem I found is that there's so much to learn I was finding it hard to focus on a single idea and any attempt to do it faced a great difficulty to obtain clean info.


Few months ago I met Peter Lawrey of the http://vanillajava.blogspot.com  and  https://github.com/OpenHFT. At the moment I was not aware of who he was and so I missed the chance to ask him for some directions. Luckily enough, when I reach out to him earlier this year and presented my dilemma he was very kind and agreed to help. His first suggestion was to select one subject and then he offered to help me take small steps. The choice was no-GC algorythms and this is my first post on the subject.


Simple Garbage generation


First challenge is was presented was simple enough: "How can you iterate an ArrayList (without generating garbage)?". My first reaction was shock. It never occurred to me that a simple ArrayList iteration was creating garbage (which it sometimes is... not always...). 



Demonstrating there’s Garbage


From the challenge I knew there's garbage on:

    public static void main(String…args) {
        List l = new ArrayList();
        for (String s : l) {}
    }

This can be verified with several methods but first thing that came to mind was running this main with params:

  -verbose:gc -Xms4m -Xmx4m


which display the output:

...
[GC (Allocation Failure) 2436K->388K(5632K), 0.0003570 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0004620 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0004700 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0004410 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0006780 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0004480 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0004540 secs]
[GC (Allocation Failure) 2436K->388K(5632K), 0.0004570 secs]


which proves there's GC activity.

After a code review of the hotspot I found out the problem is caused in the foreach(). That is invoking ArrayList$iterator to iterate over the the list. In the (Hotspot) JVM that ArrayList$iterator returns a new instance of an inner class named ArrayList$Itr.

After checking out with Peter he confirmed I was on the right track but that I had to prove with a direct measure that my suspicion was correct.


Observing the memory consumption


At Peter’s direction I run the code and then checked the output of jmap (http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html). After playing around with parameters I fired up a ‘watch’ on my command:


   $watch “jmap -histo `jps|grep Main|awk -F\ '{print $1}’`"

With that I’d be constantly running jps and then filtering and asking to obtain y App’s PID. With my app’s PID I’d only need to jmap -histo.



 num     #instances         #bytes  class name
----------------------------------------------
   1:         65486        2095552  java.util.ArrayList$Itr
   2:          6619         853136  < methodKlass >
   3:          6619         764496  < constMethodKlass >
   4:           456         538304  < constantPoolKlass >
   5:           421         342368  < constantPoolCacheKlass >
   6:           456         308984  < instanceKlassKlass >
   7:          1373         135048  [C
   8:           111         117992  [I
   9:           702         116976  [B
...



What you don’t see in this output is that every 2 seconds (watch) the number of instances of java.util.ArrayList$Itr would change and increase until it’d disappear (GC wiping out old instances).

Go! Fire up your IDE of choice and try what we’ve done so far. I’ll wait.



Finding an alternative without Garbage



Now that I finally knew how to get some more insight re my memory usage and GC activity it was time to start searching for an alternative which didn’t generate Garbage. that was pretty easy:
        List l = new ArrayList<>();
        int lsize = l.size();
        for (long i = 0; i < GAZILLION; i++) {
            for (int j = 0; j < lsize; j++) {
            }
        }


Believe it or not. This plain old for-loop is better in terms of garbage generation.
Actually, if you don't believe you can go fire up your IDE and check. ;-)