Friday, February 27, 2009

Why catch Throwable is evil - A real life story

Disclaimer: I know this is an old idiom; I'm just presenting my own real-life incident, taken straight from the bloody Java trenches.

Exceptions can be thread assassins
When running on top of the WebSphere thread pool, any RuntimeException that isn't caught by the application code will bubble up the stack and end up killing that specific thread. WAS helps here by automatically creating a new thread to take the place of the murdered one, but still, killing a thread and immediately creating another is anything but the rationale of a thread pool.

Hiring a thread bodyguard
A simple way to avoid thread death is to wrap the first application layer (e.g., the run() method) in a try block that catches and swallows any Exception thrown from anywhere in the application code.
Our project's code also used this concept, but instead of catch (Exception e) it had a catch (Throwable t). When I noticed that, I didn't rush to fix it, just in case someone before me had done funky stuff with dynamic class loading that might throw NoClassDefFoundError (although this should be caught at a very localized level), or maybe it was there for some other historical reason that, not being one of the code's forefathers, I'm just not aware of. In any case, I did promise myself that I'd revisit this piece of code in the future.
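A minimal sketch of the bodyguard idea, using a plain Thread rather than the WebSphere pool. The Worker class, its counters, and runDemo() are hypothetical names made up for this illustration; they are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class BodyguardDemo {

    /** Hypothetical worker: runs tasks in a loop and must not die on a bad task. */
    static class Worker implements Runnable {
        private final List<Runnable> tasks;
        final AtomicInteger completed = new AtomicInteger();
        final AtomicInteger failed = new AtomicInteger();

        Worker(List<Runnable> tasks) {
            this.tasks = tasks;
        }

        @Override
        public void run() {
            for (Runnable task : tasks) {
                try {
                    task.run();
                    completed.incrementAndGet();
                } catch (Exception e) {
                    // The bodyguard: catches Exception (not Throwable), so a bad
                    // task cannot kill the thread, while Errors still propagate.
                    failed.incrementAndGet();
                }
            }
        }
    }

    /** Runs three tasks, the second of which throws; returns {completed, failed}. */
    static int[] runDemo() {
        List<Runnable> tasks = new ArrayList<>();
        tasks.add(() -> { });
        tasks.add(() -> Integer.parseInt("not a number")); // NumberFormatException
        tasks.add(() -> { });

        Worker worker = new Worker(tasks);
        Thread thread = new Thread(worker);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { worker.completed.get(), worker.failed.get() };
    }

    public static void main(String[] args) {
        int[] result = runDemo();
        System.out.println("completed=" + result[0] + ", failed=" + result[1]);
    }
}
```

Running main prints completed=2, failed=1: the thread survives the NumberFormatException and goes on to the third task.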

Getting some bulls to do correct things
Today I finally got the excuse I needed to change the catch Throwable into a catch Exception:
We were running stress tests when the server hit an OOME (OutOfMemoryError). Since the catch Throwable caught and swallowed the OOME (OutOfMemoryError is a subclass of Error, which is a subclass of Throwable), the thread that generated the OOME kept on living instead of dying right there, and so the JVM continued running, crippled and limping, instead of turning to an honorable solution like hara-kiri. Choosing the quick death route would have been rewarded with a quick resurrection, provided by the gracious NodeAgent and its watchdog mechanism, and the end result would have been a newborn, healthy server ready to get back in business. A retreat in order to attack, you might say.
Instead, the server had to limp along for long minutes, suffering a series of consecutive strokes (OOMEs), until things got so bad that the JVM just had to exit.
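The difference between the two catch clauses can be shown in a few lines. This is only a sketch: the OOME is simulated by throwing an OutOfMemoryError by hand (a real one comes from the JVM), and all class and method names are invented for the example.

```java
public class SwallowDemo {

    /** Stand-in for application code that runs out of memory (simulated). */
    static void allocate() {
        throw new OutOfMemoryError("simulated");
    }

    /** Guard written with catch (Throwable): the Error is swallowed. */
    static String guardWithThrowable() {
        try {
            allocate();
            return "completed";
        } catch (Throwable t) {
            return "swallowed " + t.getClass().getSimpleName();
        }
    }

    /** Guard written with catch (Exception): the Error gets past it. */
    static String guardWithException() {
        try {
            try {
                allocate();
                return "completed";
            } catch (Exception e) {
                return "swallowed " + e.getClass().getSimpleName();
            }
        } catch (Error e) {
            // This outer catch exists only so the demo can report the escape;
            // in real code the Error would keep propagating up the stack.
            return "escaped " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(guardWithThrowable());
        System.out.println(guardWithException());
    }
}
```

Running main prints "swallowed OutOfMemoryError" followed by "escaped OutOfMemoryError".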

Conclusions
The catch Throwable was causing downtime by preventing an imminent restart of the JVM after an OOME.

Open Questions

  1. I know that an uncaught exception kills only the specific thread; does the JVM treat an Error differently? In other words, if the OOME is not caught, will the entire JVM die or only the specific thread? I assume that the answer is the entire JVM; maybe this is implemented by the JVM itself, or maybe it's implemented somewhere in the WAS bedrock. If for some reason that's not the case, one could catch an Error and then execute System.exit(1); in order to hasten the process's imminent death.
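One possible shape for the "catch an Error and exit" idea is an uncaught exception handler, sketched below. In a plain JVM an uncaught Error normally ends only the thread that threw it, so something has to act on it explicitly. FailFastHandler and its onFatal callback are hypothetical names, the OOME is simulated, and whether a handler like this is ever invoked for WAS-managed threads is an assumption that would need checking.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FailFastDemo {

    /** Hypothetical handler: runs onFatal when a thread dies from an Error. */
    static class FailFastHandler implements Thread.UncaughtExceptionHandler {
        private final Runnable onFatal;

        FailFastHandler(Runnable onFatal) {
            this.onFatal = onFatal;
        }

        @Override
        public void uncaughtException(Thread thread, Throwable cause) {
            if (cause instanceof Error) {
                onFatal.run(); // in production this could be System.exit(1)
            } else {
                System.err.println(thread.getName() + " died from " + cause);
            }
        }
    }

    /** Starts a thread that throws a simulated OOME; reports whether onFatal ran. */
    static boolean handlerFiresOnError() {
        AtomicBoolean fatalSeen = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        });
        // The demo only records the event instead of exiting, so it stays observable.
        worker.setUncaughtExceptionHandler(new FailFastHandler(() -> fatalSeen.set(true)));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return fatalSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler fired: " + handlerFiresOnError());
        // JVM-wide variant (assumption: the hosting container does not override it):
        // Thread.setDefaultUncaughtExceptionHandler(
        //         new FailFastHandler(() -> System.exit(1)));
    }
}
```

Passing the exit action in as a Runnable keeps the handler testable; the commented-out line shows how the System.exit(1) mentioned above would be plugged in.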


6 comments:

  1. Thanks for the great article.
    I wrote some simple code that starts 2 threads, where one of them causes an out of memory error, and the other thread kept working.
    I've validated this on both the IBM and Sun JVMs.
    So the conclusion is that the JVM does not terminate when an OOME occurs.

  2. Hi Guy,

    Thanks for checking the thread thing. It would be nice to know which platform you tested on, as different platforms may implement threading differently. On Linux, I believe, threads are processes, unlike on Windows.

    Can we assume you used Windows, judging from the icons on your post?

    /Michael

  3. I believe Guy verified that on Windows.
    I'm not sure why Linux lightweight processes (LWP) would make a difference in this case. But as these behaviors are not specified anywhere, and are prone to be dependent on the OS and on the JVM vendor and version, anything goes. Better not to count on it.

    We need some clear specifications from the JCP on overall JVM behavior in case of OOME.

  4. [...] to use the above code in your programs since it is shorter but if catching Throwable in Java is evil then catching Throwable in Scala is ten times more [...]

  5. Well, I'd rather say that performing coarse-grained exception handling is evil (just like in your case, where you "swallow" an exception at thread level without concerning yourself with the nature of the exception and how to avoid it).

  6. I don't quite agree; I think it's more subtle than that:

    At the top-level loop (say, the thread's run() method) you are expected to behave in a coarse-grained way, because at this point all you really care about is preventing your thread from dying because of some random NumberFormatException or NPE, which is better to swallow than to allow to bubble up and kill the thread.

    Most programmers try to accomplish this by swallowing Throwable, causing them to also swallow Errors (unrecoverable JVM problems), while what they really wanted was to swallow Exceptions (recoverable problems).
