Wednesday 16 July 2008

MVCC has landed

I will soon be releasing the first alpha of JBoss Cache 3.0.0 - codenamed Naga after the Naga Jolokia pepper, naturally keeping with the pepper-naming theme. This is an alpha and doesn't have nearly everything the GA release will have, but it does have two very significant features which I would like to share with the world. One is MVCC, and the other is a new configuration file format - which I've left for Mircea Markus to talk about in his post.

UPDATE: This has now been released. See my post on the release.

So, on with MVCC. This is something people have been waiting for, for a long while now. Our historic locking schemes - pessimistic locking and optimistic locking - have always been somewhat slow, inefficient, and synchronization-heavy. So we've redesigned concurrency and locking in the cache from scratch, and the result is MVCC. Our designs for MVCC have been online for a while now, though I do need to update the design page with more implementation details, since the implementation differs slightly from the original designs.

Feature Summary

In a nutshell, MVCC (MultiVersion Concurrency Control) is a new locking scheme - something that most good database implementations have been using for a while now. Our implementation is particularly interesting in that it is completely lock-free for readers. This makes it very efficient for a read-mostly system like a cache. A summary of the features:

  • Writers don't block readers. At all.
  • Completely lock-free for readers.
  • Writers are fail-fast (so no DataVersionExceptions at commit time, like when using our old Optimistic Locking implementation).
  • Deadlock-safe (thanks to non-blocking readers).
  • Memory-efficient, since we use lock striping (for writer locks) rather than one lock per node.
  • Write locks implemented using the JDK's AbstractQueuedSynchronizer - which makes it free of implicit locks and synchronized blocks.
  • Locks are held on Fqns and not on Nodes, which solves the problems with concurrent creates-and-removes seen occasionally with Pessimistic Locking under high load.

So what does this mean for everyone? Basically, a much more efficient locking scheme (expect statistics and comparisons here soon), with the best of what the previous two locking schemes had to offer: fail-fast writes (as with Pessimistic Locking) and concurrent, deadlock-free readers and writers (as with Optimistic Locking).

Switching it on

In the upcoming alpha, you enable this by setting your node locking scheme to MVCC:

Configuration c = new Configuration();
c.setNodeLockingScheme(Configuration.NodeLockingScheme.MVCC);

MVCC allows two isolation levels - READ_COMMITTED and REPEATABLE_READ. All other isolation levels - if configured - will be upgraded or downgraded accordingly. There are a further two configuration elements for MVCC: concurrencyLevel and writeSkewCheck.
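
Programmatically, the isolation level is just another property on the same Configuration object. A quick sketch (check the Configuration javadocs if the property names differ):

import org.jboss.cache.lock.IsolationLevel;

c.setIsolationLevel(IsolationLevel.REPEATABLE_READ); // or IsolationLevel.READ_COMMITTED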

Concurrency levels

Concurrency level is a tuning parameter that determines the number of segments used when creating the collection of shared locks for all writers. This is similar to the concurrencyLevel parameter passed into the JDK's ConcurrentHashMap constructor. Unlike CHM's default of 16, ours defaults to 50 - and should be tuned accordingly: bigger means greater concurrency, but more memory usage thanks to the additional segments.
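
Tuning it follows the same pattern (again a sketch - I'm assuming a setConcurrencyLevel() property on Configuration alongside the others):

c.setConcurrencyLevel(500); // ten times the default: more lock segments, so more concurrent writers, at the cost of memory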

Write skews

A write skew is a condition where, under REPEATABLE_READ semantics, a transaction reads a value and then changes it to something based on what was read, but another thread has changed it between the two operations. Consider:

// start transaction

line 1: int counterValue = readSomething();
line 2: writeSomething(counterValue + 1);

// end transaction


Since MVCC offers concurrent reads and writes, thread T1 could be at line 1 while another thread (T2) is at line 2, updating the value after T1 has read it. Then, when T1 comes to write the value, it would overwrite the update that T2 had made. This is a write skew.

Now, depending on your application, you may not care about such overwrites, and hence by default we don't do anything about them and allow the overwrite to proceed. However, there are still plenty of cases where write skews are bad. If the writeSkewCheck configuration parameter is set to true, an exception is thrown every time a write skew is detected. So in the above example, when T1 attempts to writeSomething(), an exception will be thrown, forcing T1 to retry its transaction.
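
To make this concrete, here is a minimal retry sketch for a transactional counter increment. It assumes writeSkewCheck has been enabled in the configuration, that you already have a Cache and a TransactionManager in hand, and it catches exceptions broadly since I'm glossing over the exact exception type thrown on a detected skew:

import javax.transaction.Status;
import javax.transaction.TransactionManager;
import org.jboss.cache.Cache;
import org.jboss.cache.Fqn;

public void incrementCounter(Cache<String, Integer> cache, TransactionManager tm, Fqn fqn) throws Exception {
   boolean done = false;
   while (!done) {
      tm.begin();
      try {
         Integer v = cache.get(fqn, "counter");            // this read is stable under REPEATABLE_READ
         cache.put(fqn, "counter", v == null ? 1 : v + 1); // fails fast here if a write skew is detected
         tm.commit();
         done = true;
      } catch (Exception e) {
         // a write skew (or some other failure) - roll back if a transaction is still active, then retry
         if (tm.getStatus() != Status.STATUS_NO_TRANSACTION) tm.rollback();
      }
   }
}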

What about other locking schemes?

With our final release, we hope to make MVCC the default option. Optimistic and Pessimistic locking will still be available but deprecated, and will be removed in a future release based on the success of MVCC.

Great! Where can I get it?

I will announce the release on this blog soon. Once the release is available, please download it and try it out - feedback is very important, especially in highly concurrent environments under a lot of stress.

If you are that impatient, you're welcome to check out JBoss Cache core trunk from SVN and have a play-around. :-)

Cheers
Manik

9 comments:

Ivelin said...

Congratulations!

Question - is locking at tree node granularity, or does it go deeper, to the node map entry level?

Manik Surtani said...

It is still at node level. We may consider moving to a finer granularity later, but that depends on how successful/popular a flat-map view of the cache becomes (post-JSR-107).

Galder Zamarreno said...

Fantastic job Manik!! On top of all the technical benefits, I can't wait to have to support only one locking mechanism ;)

Anjan said...

Hi Manik,

congrats on the new release.

Do you know how this compares to the Ehcache/Terracotta combination in terms of features and performance?

Also, regarding the link to the MVCC documentation: what tool did you use to do the UML diagrams?

Thank you,

BR,
~A

Manik Surtani said...

Thanks.

There is a lot I can say about how this compares, but I will hold back until I publish benchmarks, which should be very soon. Let's put it this way - *I'm* very pleased with the way the numbers look so far! ;)

Regarding UML diagrams, I use MagicDraw.

Unknown said...

Great post!

One question that came to my mind (this might be obvious to you so bear with me for a bit) is about the write skew situation you mentioned.

The article mentions that write skew may happen under the REPEATABLE_READ isolation level. But isn't that contrary to the REPEATABLE_READ definition? I mean, isn't REPEATABLE_READ supposed to enforce that any schedule of transactions (threads in our case) will read the same value? Hence the "REPEATABLE" in REPEATABLE_READ.

Thanks

Manik Surtani said...

@Yousuf write skews have nothing to do with R_R. R_R guarantees that your reads are consistent. So, T1 reads x=1. T2 also reads x=1. T2 sets x = x+1. T1 still reads x=1 thanks to R_R. T1 now tries to update x = x+1. Logically, for a shared counter like this, you would expect x = 3 after both T1 and T2 complete. But due to a write skew (T1 overwrites/ignores what T2 has done) this does not happen.

Hope this clarifies.

Unknown said...

Aah. I see. In fact now that you say this I am wondering why I got confused in the first place.

Anyway thanks.

anothermarkus said...

Does the writeSkewCheck actually work?

I've enabled the parameter and performed a simple test:

Node rootNode = getRoot();
Node sequenceNode = rootNode.getChild(Fqn.fromString(SEQUENCE_FQN));

Integer sequence = (Integer) sequenceNode.get(SEQ);

sequenceNode.put(SEQ, sequence + 1);


I had two threads do this; the first had a longer delay between reading and writing.


What would I need to do to simulate (test) that writeSkewCheck actually works and is enabled?

Thanks so much!