Java Tuning: 2010

Thursday, September 23, 2010

Case insensitive Map key - code smell

Here's the bug that had me working today (a Sukkot holiday):


_myMap.put(key.toLowerCase(), value);
...
_myMap.get(key); // without lower-casing the key, so the lookup misses

At first you might think of this as a simple human error, but I claim it's just as much a code smell:

Why trust yourself to always remember to lower/upper-case every interaction with the map? And why trust everyone else who touches it?
So, instead of using a HashMap, use Apache Commons Collections' CaseInsensitiveMap, which nicely and safely encapsulates the key-case concern.
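If pulling in Commons Collections isn't an option, the same idea can be sketched with plain JDK classes. `LowerCaseKeyMap` below is a hypothetical wrapper (not a library class) that normalizes every key in exactly one place. The JDK also offers a `TreeMap` built with `String.CASE_INSENSITIVE_ORDER`, which compares keys case-insensitively at the cost of O(log n) lookups instead of hashing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveKeys {

    /** Hypothetical wrapper: keys are normalized in exactly one place, so no caller can forget. */
    static final class LowerCaseKeyMap<V> {
        private final Map<String, V> delegate = new HashMap<>();

        private static String normalize(String key) {
            // Locale.ROOT avoids locale-sensitive surprises such as the Turkish dotless i
            return key.toLowerCase(Locale.ROOT);
        }

        V put(String key, V value) { return delegate.put(normalize(key), value); }
        V get(String key) { return delegate.get(normalize(key)); }
        boolean containsKey(String key) { return delegate.containsKey(normalize(key)); }
        V remove(String key) { return delegate.remove(normalize(key)); }
    }

    public static void main(String[] args) {
        LowerCaseKeyMap<Integer> wrapped = new LowerCaseKeyMap<>();
        wrapped.put("Content-Type", 1);
        System.out.println(wrapped.get("content-type")); // 1

        // JDK-only alternative: a sorted map that compares keys case-insensitively
        Map<String, Integer> sorted = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        sorted.put("Content-Type", 2);
        System.out.println(sorted.get("CONTENT-TYPE")); // 2
    }
}
```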

P.S.
I would expect CaseInsensitiveMap to become a part of the Java SDK.

Wednesday, September 22, 2010

My attempts with IP Spoofing – Revisited

Once upon a time (Jan 2009) I wrote this post, basically saying that you're not likely to be able to spoof an IP address over the Internet.
Turns out I was dead wrong!

It so happened that the very experienced Mr. Filipe, from Brazil, came across the post and left me a comment saying that spoofing over the Internet is quite possible.
I replied, surprised, and after a few rounds of comment ping-pong we started chatting online, and Filipe agreed to give me a live spoofing demo:
On my end, I configured my home router to forward TCP/UDP packets to my desktop, where I ran a Wireshark network capture to monitor any incoming packets.
Then Filipe sent a burst of packets from random source IP addresses, proving to me that IP spoofing over the Internet is indeed a reality.

(What do you think? Isn't this kind of stuff what makes the Internet so amazingly wonderful? Two people from two different parts of the world, united by a shared interest and kindness :))

So, thank you, Filipe!

A few notes on why spoofing might *not* work:

According to Filipe, the recipient's ISP is much more likely to block a spoofed packet than the sender's ISP is, for example when the recipient's ISP sees a bogon source IP (an address from a reserved or unallocated range).
That's a bit counter-intuitive, because, assuming the ISPs really do care about preventing spoofing, it's a very easy job for the sender's ISP to tell whether the packet's source IP is one of the IPs it handed out to its customers, or, more precisely, to that particular customer (the sender).
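The check the sender's ISP could run (commonly called ingress filtering, described in BCP 38) boils down to a prefix match on the source address. A minimal sketch, assuming the ISP knows the prefix assigned to the customer; `SourceAddressFilter`, `allowFromCustomer`, and the use of the 203.0.113.0/24 documentation range as a customer allocation are all hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class SourceAddressFilter {

    /** Parses an IP literal (no DNS lookup happens for a literal address). */
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + literal, e);
        }
    }

    /** True if addr lies inside prefix/prefixLen; IPv4 and IPv6 never match each other. */
    static boolean inPrefix(InetAddress addr, InetAddress prefix, int prefixLen) {
        byte[] a = addr.getAddress();
        byte[] p = prefix.getAddress();
        if (a.length != p.length) {
            return false;
        }
        if (prefixLen < 0 || prefixLen > a.length * 8) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLen);
        }
        int fullBytes = prefixLen / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (a[i] != p[i]) {
                return false;
            }
        }
        int remainingBits = prefixLen % 8;
        if (remainingBits == 0) {
            return true;
        }
        // Compare only the leading remainingBits of the next byte
        int mask = (0xFF << (8 - remainingBits)) & 0xFF;
        return (a[fullBytes] & mask) == (p[fullBytes] & mask);
    }

    /** Hypothetical sender-ISP rule: forward only packets sourced from the customer's own prefix. */
    static boolean allowFromCustomer(String sourceIp, String customerPrefix, int prefixLen) {
        return inPrefix(parse(sourceIp), parse(customerPrefix), prefixLen);
    }

    public static void main(String[] args) {
        // 203.0.113.0/24 is a documentation range, standing in for a customer's allocation
        System.out.println(allowFromCustomer("203.0.113.42", "203.0.113.0", 24)); // true
        System.out.println(allowFromCustomer("198.51.100.7", "203.0.113.0", 24)); // false: drop as spoofed
    }
}
```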

If you are behind a NAT device, then any source address you are planning to use (be it spoofed or real) will be overwritten by the NAT anyway, so make sure you are on a real public IP.

No reason to get excited, though. TCP spoofing is very limited, as you won't make it past the TCP handshake: the recipient sends its SYN-ACK response to the spoofed IP, which you probably don't have much control over.
On a LAN things are a bit different, if you can manipulate the recipient's ARP table into mapping the spoofed IP to your MAC address. I haven't dug deep.

Feel free to comment.

Thursday, April 15, 2010

“Hypervisor edition” – what’s that?

IBM has announced the WebSphere Application Server (WAS) Hypervisor Edition.

You get an OVF package with a ready-to-use WAS profile running on Linux. The OVF package can be deployed on VMWare ESX/ESXi and on IBM's CloudBurst appliance.
IBM also says it has applied WAS best-practice tuning to the OS. I'm not sure how much this tuning matters, considering the generic nature of WAS (different application, different tuning) and the generic drivers a VM uses.

[Image: Joys of installation]

I wonder how enterprise IT administrators will accept an OS different from the one they usually roll with.

It's important to mention that similar zero-install, preconfigured WAS environments are available on the IBM test cloud (in beta).

The really important message IBM is sending here is that the WAS Hypervisor Edition is only the beginning. Although a bare manual WAS installation is no biggie, installing the IBM products that run on WAS is. As the OVF standard matures and virtualization becomes the default production hosting environment, we will be seeing complex WAS-based products (say, Portal and Process Server) shipped as ultra-consumable OVF packages. Even a complete topology consisting of many servers can be delivered as a single OVF package.
This delivery mode is quite similar to VMWare's software appliances, except that, packaged as OVF, it should (theoretically) be deployable on more than one hypervisor.

Bad news for professional services people and installation-manager software developers.

Wednesday, April 14, 2010

IBM’s PLDE seminar 2010 – Review

I spent today at the IBM Programming Languages and Development Environments Seminar 2010, which took place at the beautiful IBM Haifa Research Lab campus on Mount Carmel. Things worth mentioning:

Gilad Bracha, father of Java generics and autoboxing, spent 60 minutes repenting for Sun's early Java 1.0 design mistakes, such as allowing primitives and static members into the language. IMHO the lecture itself was so-so: Gilad pointed out Java's soft spots, but didn't bother presenting to the crowd what he sees as the alternatives. What he did suggest was checking out his new baby programming language, Newspeak (something for the purists, I guess).

Perhaps some of Java's charm in the early days was its simplicity and low learning curve. I'm not sure that a semantically perfect Java (could there be such a thing?), using nested classes instead of static members, would have enjoyed the same mojo.

In another interesting lecture, Kathy Barabash talked about how data structures whose object graph is a long sequential chain of references (say, a LinkedList) prevent traditional concurrent GC tracing algorithms from scaling on many-core (i.e., massively multi-core) platforms.

What good is your new 1,024-core Intel processor if the desktop-widget nuclear explosion simulation flickers because it can only scale to 400 of the available cores, right?
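To see why a chain hurts, here's a toy model (my simplification, not the algorithm presented in the lecture): mark the object graph level by level and count how many objects each step could hand out to parallel markers. A chain never exposes more than one object at a time, while a binary tree doubles the available work at every level. The `Node` type and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class TracingWidthSketch {

    /** Minimal stand-in for a heap object: just its outgoing references. */
    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    /** Level-by-level marking from one root; each entry is how many objects that step could mark in parallel. */
    static List<Integer> frontierSizes(Node root) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        List<Integer> sizes = new ArrayList<>();
        List<Node> frontier = new ArrayList<>();
        marked.add(root);
        frontier.add(root);
        while (!frontier.isEmpty()) {
            sizes.add(frontier.size());
            List<Node> next = new ArrayList<>();
            for (Node n : frontier) {
                for (Node child : n.refs) {
                    if (marked.add(child)) { // add() returns false if the object was already marked
                        next.add(child);
                    }
                }
            }
            frontier = next;
        }
        return sizes;
    }

    /** A LinkedList-like chain: each node references only the next one. */
    static Node chain(int length) {
        Node head = new Node();
        Node current = head;
        for (int i = 1; i < length; i++) {
            Node next = new Node();
            current.refs.add(next);
            current = next;
        }
        return head;
    }

    /** A complete binary tree with the given number of levels. */
    static Node tree(int levels) {
        Node n = new Node();
        if (levels > 1) {
            n.refs.add(tree(levels - 1));
            n.refs.add(tree(levels - 1));
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(frontierSizes(chain(7))); // [1, 1, 1, 1, 1, 1, 1]
        System.out.println(frontierSizes(tree(3)));  // [1, 2, 4]
    }
}
```

Both graphs hold 7 objects, but the chain needs 7 strictly sequential steps while the tree needs 3.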

Wednesday, March 24, 2010

Software development podcast in Hebrew

Having recommended podcasts before, I now want to recommend Ran Tavori's and Ori Lahav's great software development podcast (in Hebrew): רברס עם פלטפורמה (Reversim).

Some of my favorite episodes include:

50. Content acceleration and CDNs

46. Multiple data-centers

45. References in Java

42. Garbage Collection (featuring myself as a guest)

39. Designing products for the military

28. MySQL

25. Data centers

22. Internet products and usability in general

20. Introduction to Django

18. Erlang

17. Key-Value DB products

15. ASP.NET with Yossi Tagori

13. Scalability

10. SundaySky real-time video generation

8. Think twice before debugging instead of tracing

If you know any other good software development podcasts in Hebrew, please comment here to let me and the world know.

Tuesday, March 23, 2010

ConcurrentHashMap fat memory footprint

While running product sizing tests, we found that overenthusiastic use of ConcurrentHashMap (CHM) had eaten up a good ~170MB of much-needed heap space (we ran with a 1.5GB heap).

As it turns out, an empty CHM weighs around 1,700 bytes. Yes, I'm talking about a map with no entries at all, just the plumbing!
We used a CHM to store user session attributes, so 100,000 user sessions produced 100K CHM instances worth 170MB of heap (100K times 1.7KB).
We took the measurements using the superb Eclipse Memory Analyzer (MAT).

The obvious way to save these scarce 170MB was to switch from a CHM to a Hashtable, which costs only around 150 bytes per instance (about 9% of a CHM).
Other possible solutions would have been moving to a list structure (seek time is not an issue, as we rarely have more than 4-5 attributes per session) or resorting to an array of Objects.
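The array option could look roughly like this: a hypothetical `SmallAttributeMap` (not a library class) that keeps keys and values in two small parallel arrays and does a linear scan, which is fine for 4-5 attributes. The `synchronized` methods are an assumption, meant to keep Hashtable-like per-operation thread safety.

```java
import java.util.Arrays;

/** Hypothetical compact attribute store: parallel arrays plus linear search. */
public class SmallAttributeMap {
    private String[] keys = new String[4];
    private Object[] values = new Object[4];
    private int size;

    private int indexOf(String key) {
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }

    public synchronized Object get(String key) {
        int i = indexOf(key);
        return i < 0 ? null : values[i];
    }

    public synchronized Object put(String key, Object value) {
        int i = indexOf(key);
        if (i >= 0) {
            Object old = values[i];
            values[i] = value;
            return old;
        }
        if (size == keys.length) { // grow only when full
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        keys[size] = key;
        values[size] = value;
        size++;
        return null;
    }

    public synchronized Object remove(String key) {
        int i = indexOf(key);
        if (i < 0) {
            return null;
        }
        Object old = values[i];
        // Move the last entry into the freed slot (insertion order is not preserved)
        size--;
        keys[i] = keys[size];
        values[i] = values[size];
        keys[size] = null;
        values[size] = null;
        return old;
    }

    public synchronized int size() {
        return size;
    }

    public static void main(String[] args) {
        SmallAttributeMap session = new SmallAttributeMap();
        session.put("user", "alice");
        session.put("locale", "he_IL");
        System.out.println(session.get("user")); // alice
        System.out.println(session.size());      // 2
    }
}
```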

Change implications:

1. Performance - The product has no user scenario in which multiple threads concurrently access the same session attributes map, so we don't expect any performance loss. On the contrary, I expect a Hashtable to prove faster than a CHM for single-threaded access.

2. Thread safety - a low-risk aspect, as both CHM and Hashtable provide the same basic guarantees for a single API operation (e.g., map.get(key)).

To conclude, a CHM is a good idea when you have a shared map that suffers from high read/write thread contention. But given the large memory footprint it drags along, a CHM is not ideal to use en masse, or when concurrent performance is not the focus.

P.S.
By default, a CHM allocates 16 segments (its default concurrencyLevel), each segment being a lock with its own hash table. One best practice is to measure the average map population (and how many threads really write concurrently) during your product's sizing tests, and to construct the CHM with the smallest initialCapacity and concurrencyLevel that cover your usage, choosing the loadFactor accordingly.
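A minimal sketch of sizing a CHM at construction time, using the three-argument constructor `ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel)`. The five-attribute figure is a made-up sizing result, and `SizedSessionMap` is a hypothetical name. On the segmented Java 5/6 implementation, `concurrencyLevel` is what sets the number of segments; on Java 8 and later it is only used as a sizing hint.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SizedSessionMap {
    // Hypothetical figures from a sizing test: ~5 attributes per session, one writer at a time
    static final int EXPECTED_ENTRIES = 5;
    static final float LOAD_FACTOR = 0.75f;

    static <K, V> Map<K, V> newSessionMap() {
        // Big enough that EXPECTED_ENTRIES fit without a resize
        int initialCapacity = (int) Math.ceil(EXPECTED_ENTRIES / LOAD_FACTOR);
        // concurrencyLevel = 1: a single segment instead of the default 16 on the segmented implementation
        return new ConcurrentHashMap<>(initialCapacity, LOAD_FACTOR, 1);
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = newSessionMap();
        attrs.put("user", "alice");
        System.out.println(attrs.get("user")); // alice
    }
}
```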

Monday, February 8, 2010

Concurrent Modification Exception

I ran into a ConcurrentModificationException (CME) during stress testing.
What does CME actually mean?
It means that your Collection was structurally modified (elements added or removed) while you were iterating over it. This usually happens in a multi-threaded setting, but it also occurs when a single thread modifies the collection directly while iterating over it.
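The single-thread case is easy to reproduce. In this sketch (class and method names are hypothetical), removing an element through the list itself, rather than through the iterator that the for-each loop is using, makes that iterator's next call to `next()` fail fast:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    /** Returns true if the fail-fast iterator detected the modification. */
    static boolean removeWhileIterating() {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String n : names) {      // the for-each loop uses the list's fail-fast iterator
                if (n.equals("a")) {
                    names.remove(n);      // structural change made behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating() ? "CME thrown" : "not detected"); // CME thrown
    }
}
```

Change the condition to `n.equals("b")` (the second-to-last element) and the loop simply ends without an exception, silently skipping "c": a small taste of how best-effort CME detection is.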

A few more things to note about CME:
Best-effort detection - If you see a CME printout, first off, consider yourself lucky: CMEs are thrown only on a best-effort basis. In another universe, the concurrent modification would have gone undetected and corrupted your collection, instead of failing fast with a CME.

IDing the problem - Like deadlocks, CMEs are easy to pinpoint once you inspect the exception's stack trace.

Avoiding CME:

ListIterator
To modify a collection from the same thread that is currently iterating over it, go through the iterator itself: a ListIterator lets you traverse and modify (remove, set, add) at the same time.
Drawbacks - single thread solution only.
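A short sketch of editing a list through its own `ListIterator` (the `edit` helper and the sample data are hypothetical): `remove()` deletes the element just returned, and `add()` inserts before the element the next `next()` call would return, so the loop never visits the inserted element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorEdit {
    /** Drops empty strings and inserts "b2" after every "b", all through the iterator itself. */
    static List<String> edit(List<String> input) {
        List<String> list = new ArrayList<>(input);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.isEmpty()) {
                it.remove();      // safe: the iterator knows about this change
            } else if (s.equals("b")) {
                it.add("b2");     // inserted right after "b"; not returned by the following next()
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(edit(Arrays.asList("a", "", "b", "c"))); // [a, b, b2, c]
    }
}
```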

Naive solution: Synchronizers
Use locks to mutually exclude traversal and modification operations.
Advantages - easy to code.
Drawbacks - very long lock periods while iterating.
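A sketch of the lock-based approach using `Collections.synchronizedList`: each single call is synchronized on the wrapper, but its javadoc requires callers to hold the wrapper's lock manually for the whole iteration. `LockedIteration` is a hypothetical name; note how any writer blocks for the entire traversal, which is exactly the drawback above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LockedIteration {
    // Every single call on the wrapper synchronizes on the wrapper object itself
    private final List<String> items = Collections.synchronizedList(new ArrayList<String>());

    void add(String s) {
        items.add(s);
    }

    int totalLength() {
        int total = 0;
        // Iteration spans many calls, so hold the same lock for the whole traversal;
        // writers calling add() wait until this block exits
        synchronized (items) {
            for (String s : items) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        LockedIteration li = new LockedIteration();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                li.add("x");
            }
        });
        writer.start();
        int seen = li.totalLength(); // no CME, whatever the writer has done so far
        writer.join();
        System.out.println(seen + " then " + li.totalLength());
    }
}
```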

CopyOnWrite
Take advantage of the java.util.concurrent collections, such as CopyOnWriteArrayList and CopyOnWriteArraySet. If you need a map, grab a CopyOnWriteMap from Apache (these guys have been doing Sun's dirty work for years now).
Advantages - very good read performance (reads take no locks; visibility is obtained via a volatile reference to the underlying copy).
Drawbacks - very poor write performance on large collections, since every write copies the whole structure.
Conclusion - use for seldom mutating collections.
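The JDK has no copy-on-write map, so this sketch uses `CopyOnWriteArrayList` instead (the class name and data are hypothetical). The loop iterates over the snapshot taken when it started, so adding elements inside it neither throws a CME nor extends the loop, but every `add` copies the whole backing array:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteIteration {
    static List<String> iterateAndAppend() {
        List<String> listeners = new CopyOnWriteArrayList<>(new String[] {"a", "b"});
        for (String l : listeners) {      // iterates over the array snapshot taken when the loop began
            listeners.add(l + "-copy");   // each add copies the whole backing array
        }
        return listeners;
    }

    public static void main(String[] args) {
        System.out.println(iterateAndAppend()); // [a, b, a-copy, b-copy]
    }
}
```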

toArray()
toArray will create a new array holding a copy of your Set (Map.keySet() for a Map).
You can then iterate over the array, freely modifying the original collection (the array doesn't change of course).
Advantages - write operations are cheap.
Disadvantages - copying the entire set could be expensive if it occurs too often, and/or the set is very large.
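A sketch of iterating over a `toArray` snapshot while removing from the original set (names and data are hypothetical). One caveat: `toArray` itself traverses the collection, so the copy must not race with writers. Here the set is wrapped with `Collections.synchronizedSet`, whose `toArray` holds the wrapper's lock while copying.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SnapshotIteration {
    static Set<String> removeShortNames(Set<String> names) {
        // Copy first; on a synchronizedSet wrapper, toArray holds the wrapper's lock while copying
        String[] snapshot = names.toArray(new String[0]);
        for (String n : snapshot) {
            if (n.length() < 3) {
                names.remove(n); // changes the original set, never the array being iterated
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = Collections.synchronizedSet(new HashSet<>(Arrays.asList("al", "bob", "jo", "carol")));
        System.out.println(new TreeSet<>(removeShortNames(names))); // [bob, carol]
    }
}
```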

Concurrent Collections
If you want to go heavyweight, consider ConcurrentHashMap (or one of its java.util.concurrent siblings).
An iterator over a ConcurrentHashMap (CHM) does not freeze the collection for traversal: updates made during the traversal may or may not be reflected (weakly consistent), and no CME is thrown.
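A minimal single-threaded sketch (class name and data are hypothetical): entries can be removed from a CHM while iterating over its entry set, without a CME. With other threads writing at the same time, the loop might or might not see their changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChmIteration {
    static Map<String, Integer> removeZeroCounts(ConcurrentHashMap<String, Integer> counts) {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 0) {
                counts.remove(e.getKey()); // no CME: CHM iterators are weakly consistent
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("a", 0);
        counts.put("b", 2);
        counts.put("c", 0);
        System.out.println(removeZeroCounts(counts)); // {b=2}
    }
}
```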

The approach I ended up taking
My use case was a ~10-item cache that is seldom modified, so I used a copy-on-write map.
In other cases, ConcurrentHashMap was the easiest solution (just make sure your code can live in peace with CHM's weak consistency).

[Image: Best pic idea I could think of to visualize Threads :)]

Monday, February 1, 2010

NAT in VMWare vSphere/ESX – In a nutshell

This post is about NATing an ESX VM, but first, why do I need NAT:

The SIP protocol is not NAT-oblivious. To traverse a NAT, our application has to replace the host name in the SIP message's Contact header with the external FQDN that the message receiver will send its responses to (a NAT with static routing configured).
Therefore I needed to test our software in a NAT topology.
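A rough sketch of that Contact rewrite, assuming a single `Contact:` header line holding one SIP URI. `ContactRewriter`, `rewriteContactHost`, and `sip.example.com` are hypothetical names; real SIP stacks must also handle the compact `m:` header form, IPv6 literals, and multiple contacts, all of which this ignores.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContactRewriter {
    // "<sip:" or "<sips:", an optional "user@", then the host (group 1) up to ':', ';' or '>'
    private static final Pattern CONTACT_URI = Pattern.compile("<sips?:(?:[^@>]*@)?([^:;>]+)");

    /** Replaces the host part of a Contact header's URI with the externally reachable FQDN. */
    static String rewriteContactHost(String headerLine, String externalHost) {
        if (!headerLine.regionMatches(true, 0, "Contact:", 0, "Contact:".length())) {
            return headerLine; // not a Contact header: leave it untouched
        }
        Matcher m = CONTACT_URI.matcher(headerLine);
        if (!m.find()) {
            return headerLine;
        }
        return headerLine.substring(0, m.start(1)) + externalHost + headerLine.substring(m.end(1));
    }

    public static void main(String[] args) {
        String line = "Contact: <sip:alice@192.168.1.199:5060;transport=udp>";
        System.out.println(rewriteContactHost(line, "sip.example.com"));
        // Contact: <sip:alice@sip.example.com:5060;transport=udp>
    }
}
```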

In the past, when we used VMWare Player/Workstation, we had a built-in NAT network. Unfortunately, the ESX hypervisor does not provide a NATed network option.
Looking for alternatives in VMWare's appliance marketplace, I found and downloaded Vyatta's Community Edition (VC5) router appliance (also downloadable from SourceForge), which comes under the GPL license.
After 3-4 hours - guided by the official quick start guide - I had a working NAT configuration in the ESX. Hurray!
Overall, not a hard nut to crack ;), though I wish VMWare would wise up and just add a built-in NAT option to vSphere.

Left to do:
Obtain some static IPs, so the config won't break each time the VM reboots and the DHCP lease expires.
Tip #1:
If you want to access your NATed VM by RDP/VNC without setting up extra NAT rules, consider adding an additional un-NATed NIC to the VM. When doing so, make sure the OS routing table sends traffic through the NATed NIC.
Tip #2:
This short Vyatta user installation report also helped me a bit.

Here's the complete configuration script I ended up feeding to the appliance console (the network topology is similar to the one presented in Vyatta's getting started guide):
Where:
1.2.3.4 is your department's DNS server
192.168.1.199 is the VM's NATed private IP address (handed out by the appliance's DHCP server).
The script contains a NAT forward rule for VNC (port 5900)


configure
set system host-name vyatta-nat
set interfaces ethernet eth0 address dhcp
set service ssh
set service https
commit
save
# restart the appliance, then switch from the VM console to an SSH session:

# log in with your user and password
configure
show interfaces

set interfaces ethernet eth1 address 192.168.1.254/24

commit

delete service dhcp-server
set service dhcp-server shared-network-name ETH1_POOL subnet 192.168.1.0/24 start 192.168.1.100 stop 192.168.1.199
set service dhcp-server shared-network-name ETH1_POOL subnet 192.168.1.0/24 default-router 192.168.1.254
set service dhcp-server shared-network-name ETH1_POOL subnet 192.168.1.0/24 dns-server 1.2.3.4
commit
show service dhcp-server

set service nat rule 1 source address 192.168.1.0/24
set service nat rule 1 outbound-interface eth0
set service nat rule 1 type masquerade
commit
show service nat
save
exit
show nat rules
configure
set service nat rule 20 type destination
set service nat rule 20 inbound-interface eth0
# optional: match a negated (fake) destination address so that all incoming traffic is NATed
#set service nat rule 20 destination address !192.168.50.0
# forward the traffic to 192.168.1.199
set service nat rule 20 inside-address address 192.168.1.199
set service nat rule 20 protocol tcp
set service nat rule 20 destination port 5900
commit
save
exit

Wednesday, January 6, 2010

Myth busting - String.intern() object allocations are never garbage collected

Java is becoming quite old (version 1.0 came out in 1996, if I'm not mistaken). When something gets old, legends, myths, and other perceived truths are quick to form around it (just imagine an old Gothic mansion with its stack of scary tales).
Most of the accumulated knowledge is beneficial and helpful, but some of it is not relevant anymore or just plain wrong.
Since Java is 14 years old (as of 2010), whenever I google for Java info/answers, I always check the date of the article I landed on.
If you stumble upon somebody claiming that Java can/can't do something, always check the date of the claim. If it's from 2001, you'd better search for newer references instead of accepting it as is.

Some sites, like http://Javaworld.com, have been there from the get-go and were big then, but after losing popularity they are now a graveyard for old Java skeletons (I myself have a not-so-relevant article there).

The story with String.intern() is the same: you'll find people all over the place claiming that overusing it will exhaust the perm area, because the perm area is never garbage collected. As discussed here, that's just not true.
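A quick way to test the claim yourself: intern a large number of unique strings without keeping any reference to them. If interned strings were never reclaimed, running this with a small limit (for example `-XX:MaxPermSize=16m` on Java 6, or `-Xmx64m` on Java 7 and later, where interned strings live in the regular heap) should eventually end in an `OutOfMemoryError`; the finding described here is that it doesn't. The class name and the iteration count are arbitrary.

```java
public class InternChurn {
    /** Interns n unique strings, keeping no reference to any of them after each iteration. */
    static long internMany(int n) {
        long totalLength = 0;
        for (int i = 0; i < n; i++) {
            String s = ("key-" + i).intern();
            totalLength += s.length(); // use the result so the loop isn't trivially dead code
        }
        return totalLength;
    }

    public static void main(String[] args) {
        // intern() returns the canonical instance, the same one string literals use
        System.out.println(new String("hello").intern() == "hello"); // true

        System.out.println("done: " + internMany(1_000_000));
    }
}
```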

Something I enjoy doing is not taking so-called "facts" for granted, and re-validating them in my IDE.
Believing that intern() allocations would never be GCed, I was planning a presentation on how a WeakHashMap-based solution can serve as an alternative cache repository for Strings. I wrote a program to demonstrate an OOME caused by intern(), only to find out that intern() is not as bad as I originally thought.
Try stuff yourself. You'll be surprised...
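A WeakHashMap-based string cache of the kind mentioned above could look roughly like this. `WeakStringPool` is a hypothetical class, not a library API, and not necessarily what the planned presentation would have shown. The value is wrapped in a `WeakReference` because a strongly held value pointing back at its own key would keep the entry alive forever.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakStringPool {
    // Keys are held weakly; the value must not strongly reference its key,
    // otherwise the entry could never be cleared, so it is a WeakReference too
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the pooled instance equal to s, adding s if no equal instance is currently alive. */
    public synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        WeakStringPool p = new WeakStringPool();
        String a = p.canonicalize(new String("abc"));
        String b = p.canonicalize(new String("abc"));
        System.out.println(a == b); // true: both calls return the first instance
    }
}
```

WeakHashMap is not thread-safe on its own, hence the `synchronized` method.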

Other myths I should write about some day:

Regular expressions in Java are slow - FALSE! I've tested this myself: after compiling the regex once, I was able to run more than 1 million matches per second (on small strings, of course).
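A rough sketch of that kind of measurement: compile the `Pattern` once and reuse it, instead of calling `String.matches`, which compiles the regex on every call. The pattern, inputs, and class name are hypothetical, and a single timing loop like this is only indicative (no JIT warm-up, no benchmark harness).

```java
import java.util.regex.Pattern;

public class PrecompiledRegex {
    // Compiled once; Pattern instances are immutable and safe to share between threads
    private static final Pattern ID = Pattern.compile("[A-Z]{3}-\\d{4}");

    static boolean isId(String s) {
        return ID.matcher(s).matches(); // a Matcher is cheap to create, but not thread-safe
    }

    public static void main(String[] args) {
        String[] inputs = {"ABC-1234", "abc-1234", "XYZ-9"};
        int iterations = 1_000_000;
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (isId(inputs[i % inputs.length])) {
                hits++;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hits + " hits, " + iterations + " matches in " + elapsedMs + " ms");
    }
}
```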

Always use StringBuffer to concatenate strings - dead wrong! If all the concatenations are in a single expression, like the following, the compiler does it for you automatically:
s = "Hi my name is: " + myName + ". my lucky number is: " + num;
Run javap -c on a class compiled with and without an explicit StringBuilder (StringBuffer before Java 5) to see that the bytecode is essentially the same.
This piece of code, though, could benefit from an explicit StringBuffer, to avoid creating a new String object on every iteration:
for (...) {
s += strOfThisCycle;
}
In any case, Java 5 introduced StringBuilder, the unsynchronized twin of the synchronized StringBuffer class. You will rarely access the same builder from different threads, so StringBuilder should be your default choice.
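A sketch of the loop case (class and method names are hypothetical): each `+=` produces a new String holding a copy of everything concatenated so far, so the total copying grows quadratically with the number of parts, while a single StringBuilder appends in place.

```java
import java.util.Arrays;
import java.util.List;

public class ConcatInLoop {
    // Each += creates a new String that copies everything built so far
    static String withPlusEquals(List<String> parts) {
        String s = "";
        for (String part : parts) {
            s += part;
        }
        return s;
    }

    // One builder for the whole loop; StringBuilder is the unsynchronized twin of StringBuffer
    static String withBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(withPlusEquals(parts)); // abc
        System.out.println(withBuilder(parts));    // abc
    }
}
```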

Monday, January 4, 2010

New Java blog out there

A new baby blog was born: Java Tech Sharing.
Proud father: Guy Moshkovich.

I recommend adding it to your RSS/Atom reader.