Wednesday, 12 December 2007

Republished: The Myth of Transparent Clustering

Since moving the JBoss Cache blogs to blogger.com, I've noticed that some of my old blogs on jboss.org seem to have disappeared. People have been specifically asking for this one, The Myth of Transparent Clustering - first published on jboss.org on the 5th of June 2006 and referenced by this TSS thread.

So here it is again, in its original, unmodified form. Enjoy!

- Manik



I recently sat through a webinar by a certain unnamed vendor of Java clustering components, and was surprised to note that engineers experienced in the art of clustering Java applications still tout transparent clustering. It is one thing to see the concept of transparent clustering upheld by engineers whose core focus is not clustering, but coming from engineers who spend most of their waking hours focused on this task is another story.

Let's first explore what I understand to be - and what people seem to promote as - transparent clustering. In my mind, if clustering is to be truly transparent to your application - or more specifically, to you, the application's author - clustering should be enabled, configured, and tuned independently of the application, as a separate aspect. In addition, no changes in application design or implementation should be necessary for the application to be successfully clustered or for it to scale efficiently. Let's refer to these arguments as enablement and performance. In this article I only refer to the server-side aspects of clustering and not client-side code which needs to be able to fail over transparently.

Consider the first argument, enablement. Enabling, configuring and tuning the way your application is clustered in a transparent fashion is well within the grasp of what Java has to offer. Between XML configuration and annotations, AOP frameworks and runtime bytecode instrumentation, your code can be clean and have absolutely no dependency on clustering libraries or APIs. This is commonly called the API-less model. (In reality this is not completely API-less, as there will always be a dependency on an XML schema or annotations which, while it may not be a compile-time Java dependency, is still a dependency on an API, albeit a weaker one. I will not focus on this and am willing to overlook this dependency for the time being.) It is after achieving this that some stop, rest on their laurels and claim to have achieved transparent clustering.
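
As a rough illustration of the API-less style, here is a minimal sketch in which the application class carries only an annotation and a separate piece of infrastructure discovers what to replicate via reflection. The `@Replicated` annotation, the `ShoppingCart` class and the `replicatedState` helper are all hypothetical names invented for this example; they do not belong to JBoss Cache or any other real library, and a real framework would more likely use AOP or bytecode instrumentation than plain reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ApiLessSketch {

    // Hypothetical marker annotation: the application's only "clustering dependency".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Replicated {}

    // Application code: no calls into any clustering API, just metadata.
    static class ShoppingCart {
        @Replicated
        List<String> items = new ArrayList<>();

        // Not annotated, so the infrastructure below ignores it.
        String renderedHtml = "<ul></ul>";
    }

    // Hypothetical infrastructure code: collects the state that would be
    // shipped to other nodes. A real framework would do this transparently.
    static Map<String, Object> replicatedState(Object target) {
        Map<String, Object> state = new TreeMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Replicated.class)) {
                field.setAccessible(true);
                try {
                    state.put(field.getName(), field.get(target));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();
        cart.items.add("book");
        System.out.println("Would replicate: " + replicatedState(cart));
    }
}
```

Note that `ShoppingCart` compiles without any clustering library on the classpath other than the annotation itself, which is exactly the weaker dependency mentioned above.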

What is usually overlooked is the second half of my argument, performance, which is usually let down by applications that were not designed and implemented with clustering in mind. This is waved away as unimportant, on the grounds that modern computing resources mean that 'minor inefficiencies' will not be a problem. Such wave-awayers obviously haven't heard of Sun Fellow Peter Deutsch's Fallacies of Distributed Computing. As much as people would like to think that with modern techniques like AOP, bytecode injection and annotations, along with a healthy dose of ignorance of reality, wishing upon a star and belief in the tooth fairy, clustering can be a truly decoupled aspect that can be applied to anything, they are wrong. I've heard a comment about JVMs handling garbage collection transparently and why clustering should similarly be transparent. The bitter truth is that clustering can at best be thought of as half an aspect.

In a literal sense, with an API-less model, clustering can be implemented transparently. In all but the simplest of applications, though, this will not be sufficient to meet the goals of the clustering attempt without some consideration for clustering in the application. Large objects that need serialisation, careless use of transient and static variables, sub-optimal calls between layers of business logic and tight loops of synchronisation can all lead to very inefficient clustering, even though they may be perfectly acceptable and performant in a standalone application. If your application wasn't written with clustering in mind, don't expect it to scale very well with the heavily promoted transparent clustering frameworks available.
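
To make a couple of these pitfalls concrete, here is a small sketch using plain Java serialisation as a stand-in for whatever mechanism a clustering framework might use to copy state between nodes (an assumption; real frameworks may replicate differently). The `Session` class and its fields are hypothetical. It shows that a large field is shipped in full on every copy, that a `transient` field arrives as `null`, and that a `static` field is never part of the serialised state at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialisationPitfalls {

    // Hypothetical piece of application state that a cluster would replicate.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;

        // static: one value per JVM, never written to the stream,
        // so each node in a cluster keeps its own independent count.
        static int loginCount = 0;

        final String user;
        final byte[] attachment;         // large payload, copied in full each time
        transient String cachedGreeting; // transient: skipped by serialisation

        Session(String user, int attachmentBytes) {
            this.user = user;
            this.attachment = new byte[attachmentBytes];
            this.cachedGreeting = "Hello, " + user;
            loginCount++;
        }
    }

    // Turns an object into the bytes that would travel over the network.
    static byte[] serialise(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object as a remote node would see it.
    static Object deserialise(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session original = new Session("manik", 1024 * 1024);
        byte[] wire = serialise(original);
        Session copy = (Session) deserialise(wire);

        System.out.println("Bytes on the wire: " + wire.length);
        System.out.println("Greeting on original: " + original.cachedGreeting);
        System.out.println("Greeting on copy: " + copy.cachedGreeting);
    }
}
```

None of this matters in a standalone application, where the object never leaves the heap. Once the same class is clustered, the size of `attachment` and the assumptions made about `cachedGreeting` and `loginCount` become design decisions.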

So what does this mean for being able to cluster your application? Simple. Keep clustering in mind when designing it, even if there is no immediate requirement for clustering. It will save you the headache, cost and effort of refactoring your code at a later date when you find that your application does not scale as well as you thought it would. What about proprietary COTS applications, for which you don't have the source code? Well, let's just hope that your friendly neighbourhood proprietary COTS vendor had the foresight to design the application with clustering in mind!

--
Manik Surtani
Lead, JBoss Cache

