Thursday, 17 July 2008

JBoss Cache 3.0.0 "Naga" - first Alpha now available!


So to follow up from my post yesterday on MVCC, I'd like to announce that 3.0.0.ALPHA1 - codenamed Naga after the Naga Jolokia pepper - is now ready for download.

This release, in addition to a few major features like MVCC and a new configuration file format, contains a number of internal refactorings which will improve performance and stability even on previous locking schemes.

Have a look at the complete changelog of what is included in 3.0.0.ALPHA1. If you are interested, visit our JIRA page to look at what else is scheduled for 3.0.0.GA as well and vote on things - it helps us organise and prioritise features.

As usual, the download is available on SourceForge and the forums are open for your comments. :-) Please do stress-test this release with your performance tests; I'd love to hear how it deals with load and massive concurrency.

Cheers
Manik

Configuration changes in JBoss Cache 3

JBoss Cache 3 comes with a new, improved layout of the XML configuration file. Certain constraints were dropped (e.g. explicit MBean support in the XML file), so we took a step forward and refactored the entire configuration to be more readable, concise and consistent.

Summary of changes


  • new elements were created to gather related configurations; many of the 2.x configuration elements are attributes of these new elements

  • an XML schema file is now shipped to verify configuration correctness. The schema can also be used with editing tools and IDEs to make it simpler to write configuration files. The parser was also updated to validate against this XSD (validation errors are reported in the logs)

  • we supply tools for converting from an old XML file format to the new one, so that one does not have to migrate the scripts by hand

  • we kept the old parser, so old-style configuration files are still fully supported (not recommended though, as they cannot specify features like MVCC and custom interceptors)

  • the Configuration API is unchanged, so if you used programmatic configuration in your code it will work just fine (see the sketch after this list)

  • all time-related configurations are now in milliseconds (the old eviction configuration elements were in seconds). The XSLT script that transforms an old configuration file into a new one knows how to convert seconds to milliseconds where necessary.

  • it is possible to configure custom interceptors declaratively, in the configuration file
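
As a quick illustration of the unchanged programmatic API, the sketch below builds a cache entirely from a Configuration object (the specific setters shown are just examples; any 2.x-style programmatic configuration should behave the same way):

// build a Configuration programmatically, exactly as in 2.x
Configuration c = new Configuration();
c.setCacheMode(Configuration.CacheMode.REPL_SYNC);
c.setLockAcquisitionTimeout(10000);

// create the cache from the programmatic configuration
Cache cache = new DefaultCacheFactory().createCache(c);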

Migrating old files to the new format

The distribution comes with scripts that will automatically handle the transformation. You can find them at 'DISTR_ROOT/resources/config-samples/config2to3.sh', with an equivalent script for DOS. Executing the script will provide usage and help.

More on the new elements...

The distribution comes with several sample configuration files (DISTR_ROOT/resources/config-samples), which are a good starting point if you need to write your own, customised configuration.
Below are snippets from the new configuration file, followed by a description of their meaning.

<locking isolationLevel="REPEATABLE_READ" lockAcquisitionTimeout="10000" nodeLockingScheme="mvcc"/>

The cache will use the MVCC locking scheme (other possible values are 'pessimistic' and 'optimistic'), with the isolation level set to REPEATABLE_READ.
<transaction transactionManagerLookupClass="org.jboss.cache.transaction.GenericTransactionManagerLookup"/>

Specifies a lookup class for the transaction manager.
<stateRetrieval timeout="20000" fetchInMemoryState="true"/>

This instance will retrieve in-memory state from another existing instance, using the given timeout.
<transport clusterName="JBossCache-Cluster"/>

Specifies the JGroups cluster name, and also uses the default JGroups configuration for transport.
<replication>
   <sync replTimeout="15000"/>
   <buddy enabled="true" poolName="myBuddyPoolReplicationGroup" communicationTimeout="2000">
      <dataGravitation auto="false" removeOnFind="true" searchBackupTrees="true"/>
      <locator class="org.jboss.cache.buddyreplication.NextMemberBuddyLocator">
         <properties>
            numBuddies = 1
            ignoreColocatedBuddies = true
         </properties>
      </locator>
   </buddy>
</replication>

This is a synchronously replicated cache using buddy replication.
To use the cache in asynchronous invalidation mode without a replication queue, the following configuration might be used:
<invalidation>
   <async useReplQueue="false"/>
</invalidation>

If you want a local cache, simply specify neither a 'replication' nor an 'invalidation' element.

<startup regionsInactiveOnStartup="true"/>

Regions will not be active when the cache starts, so they will not respond to replication messages.

<serialization useRegionBasedMarshalling="true"/>

Tells the cache to use different class loaders for different regions when (de)serializing data.
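
As a hedged sketch of how this ties in with regions at runtime (the Fqn and classloader below are made up for illustration, and assume the Region API from JBoss Cache 2.x is unchanged), a region that starts inactive can be activated and given its own classloader for marshalling:

// look up (or create) the region for a subtree
Region region = cache.getRegion(Fqn.fromString("/org/jboss/data"), true);

// classloader used to (de)serialize data belonging to this region
region.registerContextClassLoader(myClassLoader);

// start responding to replication messages for this region
region.activate();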

<jmxStatistics enabled='true'/>

Tells the cache to gather usage information and expose it as JMX statistics.


<eviction wakeUpInterval="5000" defaultPolicyClass="org.jboss.cache.eviction.LRUPolicy" defaultEventQueueSize="200000">
   <default>
      <attribute name="maxNodes">5000</attribute>
      <attribute name="timeToLiveSeconds">1000</attribute>
   </default>
   <region name="/org/jboss/data">
      <attribute name="maxNodes">5000</attribute>
      <attribute name="timeToLiveSeconds">1000</attribute>
   </region>
</eviction>

This is an eviction configuration. The default policy and queue size (attributes of the 'eviction' element) define default values for the regions that follow. The 'default' element defines a cache-wide eviction strategy, which can be overridden on a per-region basis by subsequent 'region' elements.


<loaders passivation="false" shared="true">
   <preload>
      <node fqn="/"/>
   </preload>
   <!-- if passivation is true, only the first cache loader is used; the rest are ignored -->
   <loader class="org.jboss.cache.loader.JDBCCacheLoader" async="false" fetchPersistentState="true"
           ignoreModifications="false" purgeOnStartup="false">
      <properties>
         cache.jdbc.table.name=jbosscache
         cache.jdbc.table.create=true
         cache.jdbc.table.drop=true
         cache.jdbc.table.primarykey=jbosscache_pk
         ...
      </properties>
   </loader>
</loaders>

This is a cache loader configuration for a shared JDBCCacheLoader with no passivation. The root node will be preloaded on startup, as specified by the 'preload' element.


<customInterceptors>
   <interceptor position="first" class="org.jboss.cache.sample.AaaCustomInterceptor">
      <attribute name="attrOne">value1</attribute>
      <attribute name="attrTwo">value2</attribute>
      <attribute name="attrThree">value3</attribute>
   </interceptor>
</customInterceptors>

This configures the addition of a custom interceptor to the interceptor chain, after the default interceptor chain for the current configuration is built. The custom interceptor must have setters corresponding to 'attrOne', 'attrTwo' and 'attrThree', must extend org.jboss.cache.interceptors.base.CommandInterceptor, and must have a default constructor.
Its location in the chain is determined by the value of the 'position' attribute.
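
For illustration, the interceptor referenced above might look something like this sketch (the class body is an assumption; real interception logic would go in the visit methods inherited from CommandInterceptor):

public class AaaCustomInterceptor extends CommandInterceptor
{
   private String attrOne, attrTwo, attrThree;

   // a public no-arg constructor is required
   public AaaCustomInterceptor() {}

   // setters matching the 'attribute' names declared in the configuration
   public void setAttrOne(String value) { this.attrOne = value; }
   public void setAttrTwo(String value) { this.attrTwo = value; }
   public void setAttrThree(String value) { this.attrThree = value; }

   // override the relevant visit methods from CommandInterceptor to add behaviour
}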

Wednesday, 16 July 2008

MVCC has landed

I will soon be releasing the first alpha of JBoss Cache 3.0.0 - codenamed Naga after the Naga Jolokia pepper, naturally keeping with the pepper-naming theme. This is an alpha and doesn't have nearly all of what the GA release will have, but it does have two very significant features which I would like to share with the world. One is MVCC, and the other is a new configuration file format - which I've left for Mircea Markus to talk about in his post.

UPDATE: This has now been released. See my post on the release.

So on with MVCC. This is something people have been waiting for, for a long while now. Historic locking schemes - pessimistic locking and optimistic locking - have always been somewhat slow, inefficient, and synchronization-heavy. So we've redesigned concurrency and locking in the cache from scratch and now have MVCC. Our designs for MVCC have been online for a while now. I do need to update the design page with more implementation details though, which did differ slightly from the original designs.

Feature Summary

In a nutshell, MVCC is a new locking scheme (MultiVersion Concurrency Control, something that most good database implementations have been using for a while now). Our implementation is particularly interesting in that it is completely lock-free for readers. This makes it very efficient for a read-mostly system like a cache. A summary of the features:

  • Writers don't block readers. At all.
  • Completely lock-free for readers.
  • Writers are fail-fast (so no DataVersionExceptions at commit time, like when using our old Optimistic Locking implementation)
  • Deadlock-safe (thanks to non-blocking readers)
  • Memory efficient since we use lock striping (for writer locks) rather than one-lock-per-node.
  • Write locks implemented using the JDK's AbstractQueuedSynchronizer - which makes it free of implicit locks and synchronized blocks.
  • Locks are held on Fqns and not nodes, which solves problems with concurrent create-and-removes seen occasionally with Pessimistic Locking under high load.

So what does this mean to everyone? Basically, a much more efficient locking scheme (expect statistics and comparisons here soon), with the best of what the previous two locking schemes had to offer (fail-fast writers from PL; concurrent readers and writers and deadlock-freedom from OL).

Switching it on

In the upcoming alpha, you enable this by setting your node locking scheme to MVCC:

Configuration c = new Configuration();
c.setNodeLockingScheme(Configuration.NodeLockingScheme.MVCC);

MVCC allows two isolation levels - READ_COMMITTED and REPEATABLE_READ. All other isolation levels - if configured - will be upgraded or downgraded accordingly. There are a further two configuration elements for MVCC: concurrencyLevel and writeSkewCheck.
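
Programmatically, this might look like the following sketch (an assumption that Configuration exposes setters for these two parameters; both are described in the sections below):

Configuration c = new Configuration();
c.setNodeLockingScheme(Configuration.NodeLockingScheme.MVCC);
c.setIsolationLevel(IsolationLevel.READ_COMMITTED); // or REPEATABLE_READ
c.setConcurrencyLevel(500);   // number of shared lock segments - see below
c.setWriteSkewCheck(true);    // throw an exception when a write skew is detected - see below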

Concurrency levels

Concurrency level is a tuning parameter that helps determine the number of segments used when creating a collection of shared locks for all writers. This is similar to the concurrencyLevel parameter passed in to the JDK's ConcurrentHashMap constructor. Unlike CHMs, this defaults to 50 - and should be tuned accordingly. Bigger = greater concurrency but more memory usage - thanks to additional segments.

Write skews

A write skew is a condition where, under REPEATABLE_READ semantics, a transaction reads a value and then changes it to something based on what was read, but another thread has changed it between the two operations. Consider:

// start transaction

line 1: int counterValue = readSomething();
line 2: writeSomething(counterValue + 1);

// end transaction


Since MVCC offers concurrent reads and writes, thread T1 could be at line 1 while another thread (T2) is at line 2, updating the value after T1 has read it. Then, when T1 comes to write the value, it would overwrite the update that T2 had made. This is a write skew.

Now depending on your application, you may not care about such overwrites, and hence by default we don't do anything about it and allow the overwrite to continue. However, there are still plenty of cases where write skews are bad. If the writeSkewCheck configuration parameter is set to true, an exception is thrown every time a write skew is detected. So in the above example, when T1 attempts to writeSomething(), an exception will be thrown forcing T1 to retry its transaction.
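
In cache terms, the counter example above might look like the following sketch (the Fqn, key and JTA transaction handling are made up for illustration):

tm.begin(); // tm is the JTA TransactionManager configured for the cache
Integer counterValue = (Integer) cache.get(Fqn.fromString("/counters/hits"), "value"); // line 1
cache.put(Fqn.fromString("/counters/hits"), "value", counterValue + 1);                // line 2: with writeSkewCheck enabled, this throws if another transaction changed the value since line 1
tm.commit();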

What about other locking schemes?

With our final release, we hope to make MVCC the default option. Optimistic and Pessimistic locking will still be available but deprecated, and will be removed in a future release based on the success of MVCC.

Great! Where can I get it?

I will announce the release on this blog soon. Once the release is available, please download it and try it out - feedback is very important, especially in highly concurrent environments under a lot of stress.

If you are that impatient, you're welcome to check out JBoss Cache core trunk from SVN and have a play-around. :-)

Cheers
Manik

Friday, 4 July 2008

JBoss Cache Searchable

For those who don't know yet, I've been working on a GSoC (Google Summer of Code) project, which is the integration package between JBoss Cache and Hibernate Search. Basically, this enables users to search through JBoss Cache using Hibernate Search. According to another Surtani at JBoss, it's a very cool project and as far as I'm concerned it's just a fun piece of code to be working on :). It's been an interesting, topsy-turvy ride over the past couple of months getting all this stuff to work - and it is still not optimised, but that will be done over the next couple of weeks. Here is some basic information I copied out from the wiki.

About this package

The goal is to add search capabilities to JBoss Cache. We achieve this by using Hibernate Search to index user objects as they are added to the cache and modified. The cache is queried by passing in a valid Apache Lucene query which is then used to search through the indexes and retrieve matching objects from the cache.


Usage

How will I use jbosscache-searchable?

You can create an instance of a searchable-cache. People who use jbosscache-core frequently will find this quite easy as it is very similar to creating an instance of the cache.

The SearchableCache interface is a sub-interface of Cache. Hence, it has the usual put(), get() and remove() methods on it. However, it also has the createQuery() method. This will return a CacheQuery object on which the list() method can be called. This will return all the results from the search in a list. You can also call an iterator() method on it - which returns a QueryResultIterator, a sub-interface of the JDK's ListIterator.

How to create a searchable cache, code example: -

Start by creating a core cache.

Cache cache = new DefaultCacheFactory().createCache();


Similarly, create a searchable cache. As parameters, you must pass in the cache instance that you have created and the classes that you wish to be searched.

SearchableCache searchable = new
SearchableCacheFactory().createSearchableCache(cache, Person.class);


Let's say that I have 100 objects of this class and I want to search for the people with the name John. As with Hibernate Search, create a Lucene query and a QueryParser.

QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());


"name" is the field within Person that I want to be searching through.

Query luceneQuery = queryParser.parse("John");


"John" is the word within the name field that I want to be searching for.

CacheQuery cacheQuery = searchable.createQuery(luceneQuery);


The cacheQuery object will now have all the instances of Person with name John. I can now put all of these into a List: -

List results = cacheQuery.list();
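
Alternatively, as mentioned earlier, the results can be walked with iterator() (a sketch; the cast assumes only Person instances match the query):

QueryResultIterator iterator = cacheQuery.iterator();
while (iterator.hasNext())
{
   Person person = (Person) iterator.next();
   // work with each matching Person
}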


Annotations on your classes.

For the classes that you want to be searched, you will have to annotate them with Hibernate Search annotations.

  1. @ProvidedId - Firstly, you should not annotate a field with @DocumentId as you normally would with Hibernate Search. The @ProvidedId annotation is so that Hibernate Search will not expect a @DocumentId and will know that you will provide one later. This is to say that a class with @ProvidedId doesn't need a @DocumentId. Although at this point it has not been tested, a @DocumentId on a field in a class that has a @ProvidedId should not break your system. This is a class annotation.
  2. @Indexed - This is so that Hibernate Search will index the class in Lucene and is annotated on the class.
  3. @Field - Hibernate Search will put all class fields with this annotation into the index. With each @Field, you must also set the 'store' property to Store.YES, otherwise the field will not be stored. For example: - @Field(store = Store.YES)
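
Putting these together, the Person class from the earlier example might be annotated as follows (a sketch; the field names are assumptions):

@ProvidedId
@Indexed
public class Person implements Serializable
{
   @Field(store = Store.YES)
   private String name;

   @Field(store = Store.YES)
   private String blurb;

   // constructors, getters and setters omitted
}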

Also see http://www.hibernate.org/hib_docs/search/api/overview-summary.html for more information on annotations.


For more information, see the wiki page.

Wednesday, 2 July 2008

CRs everywhere. POJO Edition as well.

So there have been a whole host of CRs on JBoss Cache Poblano. We've been releasing a whole bunch of these fairly close to one another, and have finally come to a pretty stable and solid CR that may well be the last before GA. 2.2.0.CR6 it is - and you know where to get it.

In addition, after a long wait, we also finally have the POJO Edition catching up with the release numbers. jbosscache-pojo 2.2.0.CR5 has been released with a CR6 coming out pretty shortly. The wait, primarily due to dependencies on fixes and new features in javassist and JBoss-AOP, is worth it - POJO Edition now handles array interception, more of which I expect Jason to blog about shortly.

Since we are at the final stages of release candidates for 2.2.0, I urge you to download and try out this release, and feed back as much as possible.

Important links are as follows:

Enjoy!
Manik