Singleton Service

Being a data grid, Coherence is very good at doing things in a distributed way across all nodes in the cluster. However, it doesn’t (currently) offer any functionality for running a service exactly once, in a reliable manner. Most applications solve this problem by running a separate process; for example, you might start a second process that reads data off a queue and keeps your cluster up to date. It would be nice, however, if you could leverage Coherence’s fault tolerance to ensure that, whenever the cluster is running, your QueueListener is running too. In fact this is fairly simple to do and can be used for a host of common tasks, including loading data, keeping it up to date, adding indexes and regulating a cluster-wide timestamp (article to follow).

What we want is a service that will always run on one of our Coherence nodes no matter what happens to the cluster.

This solution is conceptually simple. You have lots of processes in your cluster. When each node starts it simply checks whether the service has already been started elsewhere by attempting to lock a fictitious, well-known key:

lockCache.lock("SingletonLockKey");

Only one of the processes in the cluster will obtain the lock. The process that obtains it starts the singleton service, adds indexes, loads data or whatever. Simple. If the node running the service dies, the lock is released, another process acquires it and the singleton service starts there.

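// Run in a new thread on a wrapped DefaultCacheServer, i.e. this should run on every node
long blockUntilLockAcquired = -1; // -1 means wait indefinitely for the lock
while (true) {
   // Use exactly the same key on every node; blocks until this node owns the lock
   boolean locked = lockCache.lock("SingletonLockKey", blockUntilLockAcquired);
   if (locked) {
      // start singletons here
      synchronized (this) {
         // Park this thread while the node lives; wait() must be called while
         // holding the monitor and throws InterruptedException, which the
         // enclosing run() method needs to handle
         wait();
      }
   }
}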
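
If you want to see the hand-over behaviour without starting a cluster, the following is a minimal sketch that simulates it inside one JVM. It does not use Coherence at all: the Semaphore is a hypothetical stand-in for the cluster-wide lock on the well-known key, the Node class and its latches are names invented for this illustration, and a node "dying" is simulated by releasing the permit by hand, whereas in a real cluster the lock is released for you when the owning node goes away.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class SingletonElectionSketch {

    /** One simulated node: waits for the lock, runs the singleton, then "dies". */
    static final class Node extends Thread {
        private final Semaphore clusterLock;   // stand-in for the Coherence lock (assumption)
        private final List<String> owners;     // records which node ran the singleton
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch die = new CountDownLatch(1);

        Node(String name, Semaphore clusterLock, List<String> owners) {
            super(name);
            this.clusterLock = clusterLock;
            this.owners = owners;
        }

        @Override
        public void run() {
            try {
                clusterLock.acquire();         // like lock(key, -1): block until we own it
            } catch (InterruptedException e) {
                return;                        // never obtained the lock
            }
            try {
                owners.add(getName());         // "start singletons here"
                started.countDown();
                die.await();                   // hold the lock for as long as the node lives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                clusterLock.release();         // simulated here; automatic on node death in a cluster
            }
        }
    }

    /** Starts two nodes, kills the first, and returns the singleton owners in order. */
    static List<String> simulateFailover() {
        Semaphore clusterLock = new Semaphore(1);
        List<String> owners = new CopyOnWriteArrayList<>();
        Node a = new Node("node-A", clusterLock, owners);
        Node b = new Node("node-B", clusterLock, owners);
        try {
            a.start();
            a.started.await();                 // node-A wins the lock
            b.start();
            // node-B must stay blocked while node-A is alive
            if (b.started.await(200, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("two singletons running at once");
            }
            a.die.countDown();                 // simulate node-A dying
            b.started.await();                 // node-B takes over
            b.die.countDown();
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return owners;
    }

    public static void main(String[] args) {
        System.out.println(simulateFailover()); // prints [node-A, node-B]
    }
}
```

The point to notice is that node-B does nothing clever. It just sits blocked on the lock, which is exactly what every non-owning node in the cluster does until the current owner disappears.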

Posted on November 5th, 2011 in Coherence

Rjw

November 7th, 2011
23:26 GMT

That’s a nice technique. Any downsides? Have to admit I haven’t used locking much – always scared me!

What I have done to achieve something similar-ish is a partitioned service that’s storage enabled, with a backing map listener that ensures it’s doing “something” for each entry it’s primary for. This takes advantage of the redistribution. You can make that service storage enabled just where you want it to run. You can easily add more entries dynamically; the entry holds the parameters for what you want the service to do. You’ll get a reasonable distribution if most tasks carry the same load. (You can screw with the keys and partitioning strategy to sort of load balance if you like.)

Certainly heavier weight but may be worthwhile for some situations.

ben

November 8th, 2011
8:31 GMT

That’s an interesting pattern. I like it. I like the fact that Coherence owns it all. My method obviously involves wrapping DefaultCacheServer, which isn’t as clean. Intuitively the idea of having a long-running process inside a BML seems a bit odd, but I can’t think of any reason why this would be a problem, so long as there are sufficient worker threads. In fact the nice thing about your pattern is that the threads in the service are easy to monitor through JMX.

Very nice 🙂

rjw

November 8th, 2011
16:00 GMT

In our case we quickly hand off from the BML to another pool to do the actual work. (We might also have more work than service threads.) Not quite sure whether having it running forever in the BML would cause an issue or not… not tried doing it that way.
One thing is that the long-running process can occasionally put a status or heartbeat into either the same key or an associated one, so you can have a look in there to monitor it from elsewhere.

ben stopford
