Wednesday, January 6, 2010

Myth busting - String.intern() object allocations are never garbage collected

Java is becoming quite old (version 1.0 came out in 1996, if I'm not mistaken). When something gets old, legends, myths, and other perceived truths quickly form around it (just imagine an old Gothic mansion with its stack of scary tales).
Most of the accumulated knowledge is beneficial and helpful, but some of it is not relevant anymore or just plain wrong.
Remembering that Java is 14 years old (as of 2010), whenever I google for Java info or answers, I always check the date of the article I landed on.
If you stumble upon somebody claiming that Java can or can't do something, always check the date of the comment. If it's from 2001, you'd better search for newer references instead of accepting it as is.

Some sites, like http://Javaworld.com, have been around from the get-go; they were big then, but after losing popularity they are now a graveyard for old Java skeletons (I myself have a not-so-relevant article there).

The story with String.intern() is the same: you'll find people all over the place claiming that overusing it will exhaust the perm area, because the perm area is never garbage collected. As discussed here, that's just not true.
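A quick reminder of what intern() actually does, as a minimal sketch (the class name InternDemo is just for illustration): it returns the canonical copy of a String from the JVM's string pool, so equal Strings that have been interned are also identical under ==. String literals are interned automatically.

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "hello";          // literals are interned by the JVM
        String copy = new String("hello"); // a distinct object with the same characters

        System.out.println(copy == literal);          // false: two different objects
        System.out.println(copy.equals(literal));     // true: same contents
        System.out.println(copy.intern() == literal); // true: intern() returns the pooled copy
    }
}
```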

Something I enjoy doing is not taking so-called "facts" for granted, and re-validating them in my IDE.
Believing that those intern() allocations would never be GCed, I was planning a presentation on how a WeakHashMap-based solution can serve as an alternative cache repository for Strings. I wrote a program to demonstrate an OOME caused by intern(), only to find out that intern() is not as bad as I originally thought.
Try stuff yourself. You'd be surprised...
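For reference, the WeakHashMap-based String cache mentioned above could look roughly like this. It is a minimal sketch, not the code from that presentation; the class name WeakStringPool and its canonicalize() method are hypothetical. The values are wrapped in WeakReference because a WeakHashMap holds its values strongly, and a value that is the key itself would keep the entry alive forever.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical intern() alternative: a canonicalizing String cache whose entries the GC may reclaim. */
public class WeakStringPool {
    // Keys are held weakly by WeakHashMap. Values must be weak too: a strong value that
    // is the key itself would keep the key reachable, so the entry would never be cleared.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the pooled instance equal to s, adding s to the pool if none is present. */
    public synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    /** Number of live entries (stale ones are expunged by WeakHashMap itself). */
    public synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        WeakStringPool p = new WeakStringPool();
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(a == b);                                 // false
        System.out.println(p.canonicalize(a) == p.canonicalize(b)); // true
    }
}
```

Unlike intern(), nothing here is tied to a special memory area: once no one references a pooled String, its entry simply disappears on a later GC.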

Other myths I should write about some day are:

  1. Regular expressions in Java are slow - FALSE! I've tested this myself, and after compiling the regex, I was able to run more than 1 million matches per second (on small strings, of course).

  2. Always use StringBuffer to concatenate strings - dead wrong! If you have all the concatenations in a single statement, like the following, the compiler does it for you automatically:
    s = "Hi my name is: " + myName + ". my lucky number is: " + num;
    Run javap on a class file that uses an explicit StringBuffer and on one that doesn't, and you'll see essentially the same bytecode (on Java 5 and later the compiler emits StringBuilder rather than StringBuffer, so compare against an explicit StringBuilder there).
    This piece of code, though, could benefit from an explicit StringBuffer to prevent rapid object creation:
    for (...) {
        s += strOfThisCycle;
    }
    In any case, Java 5 introduced StringBuilder, the unsynchronized twin of the synchronized StringBuffer class. I guess you will rarely access the same builder from different threads, therefore StringBuilder should be the default choice for ya.
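Regarding item 1 above, the trick is to compile the pattern once with Pattern.compile() and reuse it, instead of calling String.matches(), which compiles the regex again on every call. A minimal sketch; the class name, pattern and inputs are made up for illustration, and no timing figures are implied.

```java
import java.util.regex.Pattern;

public class RegexReuse {
    // Compiled once; Pattern instances are immutable and safe to share between threads.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Counts how many inputs consist entirely of digits, reusing the precompiled pattern. */
    static int countNumeric(String[] inputs) {
        int count = 0;
        for (String s : inputs) {
            // matcher() is cheap; the expensive compilation already happened above.
            if (DIGITS.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"123", "abc", "42", "4x2"};
        System.out.println(countNumeric(inputs)); // 2
    }
}
```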
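For the loop case in item 2, here is a minimal sketch contrasting plain += with a single StringBuilder created before the loop (class and method names are illustrative). With +=, every iteration produces a brand-new intermediate String, and the javac versions discussed in this post also create a fresh StringBuilder per iteration; the hoisted builder produces one String at the end.

```java
public class BuilderVsPlusEquals {
    /** Plain +=: each iteration copies the accumulated text into a new String. */
    static String withPlusEquals(String[] parts) {
        String s = "";
        for (String strOfThisCycle : parts) {
            s += strOfThisCycle;
        }
        return s;
    }

    /** One StringBuilder reused for the whole loop; a single String is built at the end. */
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String strOfThisCycle : parts) {
            sb.append(strOfThisCycle);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"0", "1", "2", "3"};
        System.out.println(withPlusEquals(parts)); // 0123
        System.out.println(withBuilder(parts));    // 0123
    }
}
```

Both methods return the same text; the difference is only in how many temporary objects are created along the way, which matters mostly for long loops.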

7 comments:

  1. Hi,

    In Java 6, I think that if you run javap on the for-loop example you will
    see that it uses StringBuilder as well, but I might be mistaken :-)

    Eyal

  2. Eyal, you seem to be right!

    I've checked on both IBM Java5 and IBM Java6.
    Both loop and multi-line concatenations turn into a StringBuilder.
    Seems like there are fewer and fewer reasons to micro-tune your code.
    Thanks.

    ------------------------------------------------
    Concatenating in a loop - javap will show you StringBuilder
    ------------------------------------------------
    public class LoopConcatDemo {
        public static volatile String s = "";

        public static void main(String[] args) {
            for (int i = 0; i < 10; i++) {
                String strOfThisCycle = String.valueOf(i);
                s += strOfThisCycle; // shows up in javap as StringBuilder appends
            }
        }
    }

    ------------------------------------------------
    Concatenating over multiple lines - javap will show you StringBuilder
    ------------------------------------------------
    public class MultiLineConcatDemo {
        public static volatile String s = "";

        public static void main(String[] args) {
            String strOfThisCycle = "0";
            s += strOfThisCycle; // one StringBuilder append chain per statement
            strOfThisCycle = String.valueOf(System.currentTimeMillis());
            s += strOfThisCycle;
        }
    }

  3. In some cases you still want to avoid interning Strings. Java allocates interned strings in the permgen space. If you have a very heavy application with tons of classes (I've had some like these), the classes compete with the interned strings for that space. While the GC does clean up the permgen space, it does so only in a "stop the world" GC and not in CMS, which is what you typically configure for your webapp or desktop application. The problem is more typical of web applications, where you may have very large heaps and a very high object creation rate. If the GC has to collect the permgen too often, you will eventually get "OutOfMemoryError: PermGen space" errors (I've seen that as well).

    So the final answer is "it depends" :-)
    I would still recommend not using intern in the typical application, though there are a few edge cases where it makes sense.

  4. I think you can enable permgen cleaning with CMS:

    -XX:+CMSPermGenSweepingEnabled

  5. Thanks Eyal, this is pretty cool; I didn't know about this option.

  6. My main aim was to point out that common wisdom that was true yesterday may not hold true today, especially when it concerns low-level stuff that is likely to get attention from the JVM development team as time goes by. We also have to keep in mind that what can or can't be done is specific to the JVM implementation and version.

    From what Eishay is saying, it sounds like even though the garbage collector has a solid foothold in the perm area, it's still not completely trivial.
    If I were to make extensive use of intern, I'd make sure I have time to tune the GC's perm area during stress testing, plus do a back-of-the-envelope calculation of how many Strings I expect to place in the perm area, making sure enough space is left for everything other than the interned Strings.
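    A back-of-the-envelope calculation like that can be as simple as the arithmetic below. This is only a sketch: the per-String overhead (object header, fields, array header) varies by JVM vendor, version and bitness, so overheadBytesPerString is an assumed input you would measure on your own JVM, not a known constant. Counting 2 bytes per char assumes the UTF-16 char[] storage used by the JVMs discussed here. The class name InternBudget is hypothetical.

```java
public class InternBudget {
    /**
     * Rough memory estimate for a set of interned Strings.
     * overheadBytesPerString is an ASSUMPTION to be measured on the target JVM;
     * each char is counted as 2 bytes (UTF-16 char[] storage).
     */
    static long estimateBytes(long stringCount, int avgLengthChars, int overheadBytesPerString) {
        return stringCount * (overheadBytesPerString + 2L * avgLengthChars);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 1 million strings, 20 chars on average, 40 bytes of overhead each.
        long bytes = estimateBytes(1_000_000L, 20, 40);
        System.out.println(bytes / (1024 * 1024) + " MB"); // 76 MB
    }
}
```

    Compare the result against the perm size you plan to configure, leaving room for the classes themselves.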

    BTW, in my current project we employ hard limits on the number of sessions and users as a means to protect against out-of-memory situations. Have you seen or implemented something like that in any of the projects you've worked on?
    I also like to put caps on maps/lists and report if any of them grows beyond a certain limit, especially maps used as temporary storage to preserve information between two executions. This has proved useful as an early warning of memory leaks. Something like:

    // ht: the temporary map being capped
    if (ht.size() > 10000) {
        System.out.println("WARNING: ht size above 10K, purging it.");
        ht.clear(); // drop every entry and start over
    }
    ht.put(key, value);
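    The cap-and-purge check can also be wrapped once instead of being repeated before every put(). A minimal sketch, assuming the same purge-everything policy as the snippet; the class name CappedMap and its name and maxSize parameters are hypothetical, and real code would log rather than print.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical wrapper: on the first put() after the map has grown beyond maxSize, warn and purge. */
public class CappedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final String name;
    private final int maxSize;

    public CappedMap(String name, int maxSize) {
        this.name = name;
        this.maxSize = maxSize;
    }

    public void put(K key, V value) {
        if (map.size() > maxSize) {
            // Unexpected growth is treated as a likely leak: report it and start over.
            System.out.println("WARNING: " + name + " size above " + maxSize + ", purging it.");
            map.clear();
        }
        map.put(key, value);
    }

    public V get(K key) {
        return map.get(key);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        CappedMap<String, Integer> m = new CappedMap<>("demo", 2);
        for (int i = 0; i < 4; i++) {
            m.put("k" + i, i); // the 4th put finds 3 entries (> 2) and purges first
        }
        System.out.println(m.size()); // 1
    }
}
```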

  7. Yes, there are lots of myths out there, like 64-bit machines being better than 32-bit ones, one GC being better than another, IO myths, and so on. The common answer is "it depends", since one size does *not* fit all.

    All large-scale software should have limits on in-memory data growth. Almost every web framework limits the number of concurrent sessions per server, usually by limiting the size of the thread pool accepting inbound connections.
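    A thread pool with hard caps on both threads and queued work can be sketched with the JDK's ThreadPoolExecutor. This is a minimal illustration of the idea, not how any particular web framework or server implements it; the class name BoundedPool, its methods and the sizes are hypothetical. Work beyond the caps is rejected instead of piling up in memory.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    /** A pool with a fixed number of threads and a bounded backlog; extra work is rejected. */
    static ThreadPoolExecutor newBoundedPool(int maxWorkers, int maxQueued) {
        return new ThreadPoolExecutor(
                maxWorkers, maxWorkers,                      // fixed number of threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(maxQueued), // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());       // throw instead of growing
    }

    /** Submits 'requests' slow tasks and returns how many were rejected by the caps. */
    static int countRejected(int maxWorkers, int maxQueued, int requests) {
        ThreadPoolExecutor pool = newBoundedPool(maxWorkers, maxQueued);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a slow request holding its thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        release.countDown(); // let every accepted task finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 threads + 3 queue slots accept 5 requests; the other 3 of 8 are rejected.
        System.out.println(countRejected(2, 3, 8)); // 3
    }
}
```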

    If you have an in-memory map used as a cache, where you can discard some of the data if it gets too large, then check out Ehcache; you might also want to go distributed with memcached, since network IO is quite cheap (it all depends on your required latencies and the cost of computing new values).
    Creating a new Java object by itself is cheap, while keeping many long-term objects is a burden on the system's memory management. Given that, I would not throw in a cache (and String intern is a type of cache) without getting some numbers first.
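    If pulling in Ehcache is more than you need, the JDK alone can give a bounded in-memory cache that discards the least recently used entries, through LinkedHashMap's access-order constructor and its removeEldestEntry() hook. This is a plain-JDK technique swapped in for illustration, not something Ehcache or memcached does internally; the class name LruCache and the capacity are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded LRU cache: once capacity is exceeded, the least recently accessed entry is evicted. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: get() moves an entry to the end, so the eldest is the least recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put() after inserting; returning true drops the eldest entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used
        cache.put("c", 3); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

    Note that LinkedHashMap is not thread-safe; if the cache is shared between threads, wrap it with Collections.synchronizedMap() or guard it yourself.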
