Thursday, September 4, 2008

Miniature scale pro-active IT - Creating windows services dependencies

Yesterday night we had a scheduled power shutdown in the Dev lab. Today morning, the lab manager rose up early to get all servers running before the armies of developers arrive to the office. My work includes interacting with a Lotus Sametime server (IBM's Instant Messaging (IM) server), so I run my own private IM server. starting the day's work, I quickly noticed that my IM client fail to log in to the IM server. In fact, all of the developers could not log in to their own private IM servers.
Remembering that during the client log in process the IM server validates the client supplied log in credentials against a central LDAP server, the LDAP server became an immediate suspect.

The LDAP server we're using is an old IBM ITDS LDAP server running on Win2000. It's comprised out of two processes: the ITDS process that parses and execute LDAP queries, and a DB process (DB2) that takes care of data persistency. Both processes are registered as Windows services.

The investigation commenced! Maybe the LDAP server is down? I went a head and checked the ITDS and DB2 services status, both were running. Hmm... I moved on to inspect the ITDS log, and saw that during its start up stage it failed to create a connection to the DB2, therefore it resorted to starting in a, "crippled", configuration only mode. That means that it just sits there, wasting random CPU cycles, giving the illusion that it's there to provide service, but not actually answering any queries.
To remedy the situation I simply re-started the ITDS service. It started up normally and began servicing incoming LDAP queries from the IM servers.
At this point, you'll be tempted to announce world wide: "I fixed it!", but before you do that, stop and think about it for a minute; what is it exactly that you fixed? In did, the ITDS began servicing queries, and the client can log in, meaning that the current manifestation of the problem was eliminated, but did you fix the problem itself? Part of being pro-active means that you solve future problems before they actually occur. What stops the problem from re-occurring the next time someone decides to restart the server? in order to solve it for good, you first need to understand what was the cause of the problem.

So, what happens during a server start up? The ITDS and DB2 services startup-type is set on Automatic, thus they start when the OS starts. The db connection error message fits a scenario in which the ITDS service started and tried to connect to the DB2 before the DB2 service finished starting up.
We would like to instruct the ITDS service to be less hasty, and wait for the DB2 service to finish starting, before stepping into itself start up process. Educating it can be achieved by defining a service dependency, stating that the ITDS service is dependent on the DB2 service.

Implementing it: Dependencies can't be created using the windows MMC "computer manager" snap-in GUI, so you'll have to get your hands dirty with registry mud using the following procedure.

Problem uprooted! You won you pro-activity badge.