Comments on: Joins: Simple joins using CQC or Key-Association

By: Bret Calvey

Bret Calvey — Wed, 14 Sep 2011 14:15:49 +0000

Have also just noticed Jonathan’s post after refreshing the page…

All great info, thanks for this – got plenty to try out now!

Ta,

-Bret

By: Bret Calvey

Bret Calvey — Wed, 14 Sep 2011 14:12:59 +0000

Hi Ben,

Again, thanks for the quick response.

I did think about maintaining our own index, but didn’t think of using Triggers to maintain them – nice idea!

I’ve got several things to try out now – thanks for your help + will keep you posted!

Cheers,

-Bret

By: ben

ben — Wed, 14 Sep 2011 13:39:43 +0000

Thanks JK 🙂

By: Jonathan Knight

Jonathan Knight — Wed, 14 Sep 2011 11:14:22 +0000

Hi Bret,

I work with Ben on ODC.

If I understand you correctly then all your related facts are pinned to the same partition using key association so when you want to do a query you know all the data you want is on a single node.

The obvious way to query the related facts is to get the relevant backing map and iterate over it looking for what you want. This is easy enough to code but could be a little slow for big backing maps. As Ben said using a PartitionAwareBackingMap would result in smaller backing maps to search through.
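Here is a minimal plain-Java sketch of the two options, with no Coherence classes involved. One option scans a single large map. The other scans only the partition that owns the parent. The `ChildValue` class, the `hash % partitionCount` placement rule and the partition count are assumptions for illustration. Coherence's real partitioning and PartitionAwareBackingMap differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical child value: only parentId matters for the join.
class ChildValue {
    final String parentId;
    final String data;
    ChildValue(String parentId, String data) { this.parentId = parentId; this.data = data; }
}

class PartitionedScan {
    // Assumed placement rule: children are placed by their parent's id (key association).
    static int partitionOf(Object associatedKey, int partitionCount) {
        return Math.floorMod(associatedKey.hashCode(), partitionCount);
    }

    static void put(Map<Integer, Map<String, ChildValue>> partitions, int partitionCount,
                    String childKey, ChildValue value) {
        partitions.computeIfAbsent(partitionOf(value.parentId, partitionCount), p -> new HashMap<>())
                  .put(childKey, value);
    }

    // Full scan: every entry in the map is examined.
    static List<String> scanAll(Map<String, ChildValue> backingMap, String parentId) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, ChildValue> e : backingMap.entrySet()) {
            if (e.getValue().parentId.equals(parentId)) keys.add(e.getKey());
        }
        return keys;
    }

    // Partition-scoped scan: only the smaller map for the parent's partition is examined.
    static List<String> scanPartition(Map<Integer, Map<String, ChildValue>> partitions,
                                      int partitionCount, String parentId) {
        Map<String, ChildValue> partition = partitions.get(partitionOf(parentId, partitionCount));
        if (partition == null) return new ArrayList<>();
        return scanAll(partition, parentId);
    }
}
```

If keys spread evenly, the partition-scoped scan touches roughly 1/partitionCount of the entries. That is why a high partition count helps.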

Alternatively you could do Filter queries against the backing maps, just as you do against a normal cache. Coherence has a utility class called InvocableMapHelper that can run Filter queries against any Map. Note, though, that a query against a backing map returns a Set of Map.Entry instances containing the Binary key and value. If you want them as proper objects, you need to convert them. This is not really any different from iterating over the backing map yourself; it is just done in a single method call, but…

InvocableMapHelper has a query method that also lets you provide a map of indexes. This makes queries more efficient, because they can use the indexes that already exist on the specific cache. You just need to be able to get the index Map. As Ben has said, the code for getting the indexes is a bit awkward prior to 3.7, but it is possible.
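Here is a plain-Java sketch of why passing the index map matters. An index is modelled as a map from extracted value to the set of matching keys, which is how an inverted index generally works. With it, an equality query becomes one lookup instead of a scan. Coherence's MapIndex exposes broadly similar contents, but `SimpleIndex` and `IndexedQuery` are hypothetical classes, not Coherence API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical inverted index: extracted value -> keys of the entries with that value.
class SimpleIndex<K, V, E> {
    private final Function<V, E> extractor;
    private final Map<E, Set<K>> contents = new HashMap<>();

    SimpleIndex(Function<V, E> extractor) { this.extractor = extractor; }

    void insert(K key, V value) {
        contents.computeIfAbsent(extractor.apply(value), e -> new HashSet<>()).add(key);
    }

    void delete(K key, V value) {
        E extracted = extractor.apply(value);
        Set<K> keys = contents.get(extracted);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) contents.remove(extracted);
        }
    }

    Set<K> keysFor(E extracted) { return contents.getOrDefault(extracted, Set.of()); }
}

class IndexedQuery {
    // One index lookup plus one get per match, instead of testing every entry in the map.
    static <K, V, E> Map<K, V> equalsQuery(Map<K, V> map, SimpleIndex<K, V, E> index, E wanted) {
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : index.keysFor(wanted)) {
            V value = map.get(key);
            if (value != null) result.put(key, value);
        }
        return result;
    }
}
```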

Maintaining your own indexes is also possible, but there could be timing issues between your indexes and the caches being mutated. You would not have all the locking associated with the built-in indexes, so a query that runs at the same time as a mutation might see inconsistent results.

Below are three different versions of a method that will allow you to query any backing map on the same service as the original BinaryEntry using a Filter and will also use any available relevant indexes.

3.5

@SuppressWarnings({"unchecked"})
public Set<Map.Entry> queryBackingMap(String nameOfCacheToSearch, Filter filter, BinaryEntry entry) {
    Set<Map.Entry> results;
    BackingMapManagerContext context = ((BinaryEntry)entry).getContext();
    DistributedCache distributedCache = (DistributedCache) context.getCacheService();
    ValueExtractor storageExtractor = new ReflectionExtractor("getStorage");
    Object storage = storageExtractor.extract(distributedCache);
    if (storage != null) {
        ValueExtractor indexExtractor = new ReflectionExtractor("getIndexMap");
        Map indexMap = (Map) indexExtractor.extract(storage);
        Map backingMapToSearch = context.getBackingMap(nameOfCacheToSearch);
        results = InvocableMapHelper.query(backingMapToSearch, indexMap, filter, true, false, null);
    } else {
        results = Collections.emptySet();
    }
    return results;
}

3.6

@SuppressWarnings({"unchecked"})
public Set<Map.Entry> queryBackingMap(String nameOfCacheToSearch, Filter filter, BinaryEntry entry) {
    Set<Map.Entry> results;
    PartitionedCache partitionedCache = (PartitionedCache) entry.getContext().getCacheService();
    Object storage = partitionedCache.getStorage(nameOfCacheToSearch);
    if (storage != null) {
        ValueExtractor extractor = new ReflectionExtractor("getIndexMap");
        Map indexMap = (Map) extractor.extract(storage);
        Map backingMapToSearch = entry.getContext().getBackingMap(nameOfCacheToSearch);
        results = InvocableMapHelper.query(backingMapToSearch, indexMap, filter, true, false, null);
    } else {
        results = Collections.emptySet();
    }
    return results;
}

3.7

@SuppressWarnings({"unchecked"})
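public Set<Map.Entry> queryBackingMap(String nameOfCacheToSearch, Filter filter, BinaryEntry entry) {
    // Use the context of the cache being searched, not the entry's own cache,
    // so that the index map matches the backing map being queried
    BackingMapContext searchContext = entry.getContext().getBackingMapContext(nameOfCacheToSearch);
    Map backingMapToSearch = searchContext.getBackingMap();
    Map indexMap = searchContext.getIndexMap();
    return InvocableMapHelper.query(backingMapToSearch, indexMap, filter, true, false, null);
}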

In the 3.5 and 3.6 versions we use ValueExtractors to call methods on various classes via reflection. The non-public parts of Coherence are not written in Java but in something called TDE (Tangosol Development Environment), which compiles to bytecode. The Java compiler has trouble with this code, so we have to use reflection. I use IntelliJ as an IDE, and although IntelliJ complains about the 3.6 code and highlights it as errors, it still compiles. Also, because this code is non-public, it can change between releases without any documentation or release notes. Version 3.7 is by far the easiest, as Oracle seem to be opening up more of the internals and exposing them via the public API.
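If you are unfamiliar with the trick, this plain-Java sketch shows roughly what calling a method through a ReflectionExtractor amounts to. A public no-argument method is looked up by name at runtime and then invoked, so the calling code needs no compile-time reference to the internal class. `ReflectiveCall` is a hypothetical helper, not part of Coherence, and the real extractor may differ in details.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class ReflectiveCall {
    // Find a public no-arg method by name on the target's runtime class and call it.
    static Object invokeNoArg(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException("Cannot call " + methodName + " on " + target.getClass(), e);
        } catch (InvocationTargetException e) {
            // The invoked method itself threw; surface its original exception
            throw new RuntimeException(e.getCause());
        }
    }
}
```

For example, `ReflectiveCall.invokeNoArg("hello", "length")` returns 5 (boxed as an Integer).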

If you want to have a Set of the real key and value rather than the Binary versions then you can use the Coherence converter classes like this:

BackingMapManagerContext context = entry.getContext();
Converter keyUpConverter = context.getKeyFromInternalConverter();
Converter keyDownConverter = context.getKeyToInternalConverter();
Converter valueUpConverter = context.getValueFromInternalConverter();
Converter valueDownConverter = context.getValueToInternalConverter();
Set converted = new ConverterCollections.ConverterEntrySet(results, keyUpConverter, keyDownConverter, valueUpConverter, valueDownConverter);

The code above basically wraps your Set in a ConverterEntrySet using the converters from the cache service. When you access Map.Entry values from the set the key and value in these will be converted to the proper Object values.
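The idea behind ConverterEntrySet can be sketched in plain Java as a set view that converts each entry only when it is read, so nothing is converted up front. `ConvertingEntrySet` and its `Function`-based converters are hypothetical stand-ins for Coherence's Converter classes. This version is read-only and converts only upwards (internal to real), whereas ConverterEntrySet also takes the down converters.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Read-only view over "internal" entries that converts keys and values lazily on access.
class ConvertingEntrySet<KI, VI, K, V> extends AbstractSet<Map.Entry<K, V>> {
    private final Set<Map.Entry<KI, VI>> internal;
    private final Function<KI, K> keyUp;
    private final Function<VI, V> valueUp;

    ConvertingEntrySet(Set<Map.Entry<KI, VI>> internal, Function<KI, K> keyUp, Function<VI, V> valueUp) {
        this.internal = internal;
        this.keyUp = keyUp;
        this.valueUp = valueUp;
    }

    @Override
    public Iterator<Map.Entry<K, V>> iterator() {
        Iterator<Map.Entry<KI, VI>> it = internal.iterator();
        return new Iterator<Map.Entry<K, V>>() {
            public boolean hasNext() { return it.hasNext(); }
            public Map.Entry<K, V> next() {
                Map.Entry<KI, VI> e = it.next();
                // Conversion happens here, one entry at a time, as the caller iterates
                return new AbstractMap.SimpleImmutableEntry<>(keyUp.apply(e.getKey()), valueUp.apply(e.getValue()));
            }
        };
    }

    @Override
    public int size() { return internal.size(); }
}
```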

One other comment: using an EntryProcessor to perform your queries would be quite slow, as there is a lot of locking involved. It would be better to use a custom EntryAggregator. Aggregators are read-only, so they involve much less locking, and tests have shown they run much quicker. The aggregator's aggregate method is passed a Set of entries. These are BinaryEntry instances, so you can work with them the same way you would in an EntryProcessor; you just cannot update them.
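Here is a plain-Java sketch of the read-only aggregator shape. The aggregator is handed a set of entries and returns a result. In this sketch the entries come from an unmodifiable view, so any attempt to update them fails. `ReadOnlyAggregator`, `ChildKeysAggregator` and `AggregatorRunner` are hypothetical. In Coherence you would implement EntryAggregator and receive BinaryEntry instances instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical contract, loosely modelled on an aggregator's aggregate(Set) method.
interface ReadOnlyAggregator<K, V, R> {
    R aggregate(Set<Map.Entry<K, V>> entries);
}

// Collects the keys of child entries whose value (the parent id) matches.
class ChildKeysAggregator implements ReadOnlyAggregator<String, String, List<String>> {
    private final String parentId;
    ChildKeysAggregator(String parentId) { this.parentId = parentId; }

    @Override
    public List<String> aggregate(Set<Map.Entry<String, String>> entries) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : entries) {
            if (parentId.equals(e.getValue())) keys.add(e.getKey());
        }
        Collections.sort(keys);
        return keys;
    }
}

class AggregatorRunner {
    // Reads are allowed; Entry.setValue on this unmodifiable view throws UnsupportedOperationException.
    static <K, V, R> R run(Map<K, V> data, ReadOnlyAggregator<K, V, R> aggregator) {
        return aggregator.aggregate(Collections.unmodifiableMap(data).entrySet());
    }
}
```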

By: ben

ben — Tue, 13 Sep 2011 16:50:47 +0000

Hey Bret

So in our model, all of your data would be “facts”. We do this too (joining facts), but we ensure that all downward references are key based. It sounds like your keys are the other way around. Our model is not necessary for your use case, so it would add unnecessary complexity.

One quite simple option is to manage your own index by creating a cache that contains the ‘reverse index’ you need to join so you don’t have to do a scan. This is easy to maintain by simply configuring a trigger to keep these index caches in order when you add and remove from the caches.
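Here is a minimal plain-Java sketch of the trigger-maintained reverse index. `MutationTrigger` is a hypothetical callback standing in for a Coherence trigger; it is not the MapTrigger signature. The index is an in-memory map rather than a separate cache. The synchronized methods keep the data and the index in step within this single JVM. In a real grid, keeping a separate index cache consistent with the data cache is where the timing issues Jonathan mentions come in.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical hook fired on every mutation of the child cache (newValue == null means removal).
interface MutationTrigger<K, V> {
    void onMutation(K key, V oldValue, V newValue);
}

// Maintains parentId -> {childKey}, so a join becomes a lookup instead of a scan.
class ReverseIndexTrigger implements MutationTrigger<String, String> {
    final Map<String, Set<String>> childrenByParent = new HashMap<>();

    @Override
    public void onMutation(String childKey, String oldParent, String newParent) {
        if (oldParent != null) {
            Set<String> old = childrenByParent.get(oldParent);
            if (old != null) {
                old.remove(childKey);
                if (old.isEmpty()) childrenByParent.remove(oldParent);
            }
        }
        if (newParent != null) {
            childrenByParent.computeIfAbsent(newParent, p -> new HashSet<>()).add(childKey);
        }
    }
}

// Toy "cache" of childKey -> parentId that fires the trigger on put and remove.
class TriggeredCache {
    private final Map<String, String> data = new HashMap<>();
    private final MutationTrigger<String, String> trigger;

    TriggeredCache(MutationTrigger<String, String> trigger) { this.trigger = trigger; }

    synchronized void put(String childKey, String parentId) {
        String old = data.put(childKey, parentId);
        trigger.onMutation(childKey, old, parentId);
    }

    synchronized void remove(String childKey) {
        String old = data.remove(childKey);
        if (old != null) trigger.onMutation(childKey, old, null);
    }
}
```

Finding all children of a parent is then `childrenByParent.get(parentId)` rather than a scan.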

I’d suggest trying it first with a PartitionAwareBackingMap. You may find that the performance is actually OK if you have a high partition count. If it’s too slow, implement the reverse index (or try using the Coherence ones, but as I said the code is a bit crazy).

I’ll ask JK to comment too (a very knowledgeable colleague of mine)

By: Bret Calvey

Bret Calvey — Tue, 13 Sep 2011 13:11:22 +0000

Hi Ben,

Thanks for the quick reply.

In our system, we have a domain object called “Event” that has several child type objects. Some of these child objects in turn may have child type objects. Basically, everything under an event uses Key Association so that all of the related data about an event is stored on the same partition as the event.

Therefore, I think option 1) above may work for us.

We’re still in the experimental stage with this at the moment – we are basically using just Coherence as a map and we want to start using some of these more advanced features.

I’d be very interested in anything you can send me for either approach you suggest (I appreciate some of the information may be in the “draft” stage)

Thank you very much for your help,

-Bret

By: ben

ben — Mon, 12 Sep 2011 16:34:52 +0000

Hi Bret

This is an interesting topic (for me anyway) and one close to my heart. First, though: are you sure you can collocate all your data based on key affinity? It is of course possible, but most domain models will not support it due to crosscutting keys (i.e. there is no single key that all objects share that can be used for partitioning).

I’ll answer your question twice based on the answer to this:
(1) If the answer is yes, i.e. you can partition everything with the same key association, then there is no easy way to access the Coherence indexes from an entry processor. It is possible, but it involves doing some reflection on some non-Java classes that exist deep in the Coherence core (the Coherence guys have some crazy language that generates bytecode). If you are really keen, I can dig out the code. However, you should be able to use a PartitionAwareBackingMap to reduce the length of your traversals significantly, by limiting them to a single partition, without having to do anything too crazy.

(2) However, if you can’t ensure everything shares the same key – the more general use case (and the one we have on ODC) – you could try our solution. This is quite different. We split entities into Facts and Dimensions (like in a data warehouse snowflake schema) and then replicate the dimension data using CQCs. The result is “query nodes” that hold all the Dimensions (stuff with different keys). This is similar to the approach you refer to, but it is more efficient for two reasons. We can apply indexes to the CQCs, and we have this funky Connected Replication Pattern that minimises our memory utilisation (important when you are replicating data).

I described the whole approach in a presentation which you can view here if you are interested: http://www.benstopford.com/2011/01/27/beyond-the-data-grid-building-a-normalised-data-store-using-coherence

I am also on the cusp of publishing a write up but it’s not quite ready yet. I’m happy to send you a draft copy if that is of use.

By: Bret Calvey

Bret Calvey — Mon, 12 Sep 2011 14:52:45 +0000

Hi,

I have read this article with interest and I am looking at doing something similar in our system to reduce the number of network hops. It would be nice to just make one call to get all of our data instead of several calls (i.e. get the parent, get the child type As, get the child type Bs etc).

The use case I want to experiment with is to look up all child items of a parent item by accessing the backing map directly.

Let’s say I have two classes, P and C (parent and child) in a one-to-many relationship.

The key of the C class contains the key of the P class so we use key association to locate related items on the same storage node.
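Here is a plain-Java sketch of the key association described here. The child key carries the parent's id and exposes it as the associated key. Placement hashes only that associated key, so a parent and all its children share a partition. `ChildKey`, `Placement` and the modulo placement rule are hypothetical. In Coherence the key class would implement com.tangosol.net.cache.KeyAssociation, and the real partitioning strategy may differ from a plain modulo.

```java
import java.util.Objects;

// Hypothetical child key embedding its parent's id.
final class ChildKey {
    final String parentId;
    final String childId;

    ChildKey(String parentId, String childId) {
        this.parentId = parentId;
        this.childId = childId;
    }

    // Same idea as KeyAssociation.getAssociatedKey(): route by the parent, not the full key
    Object getAssociatedKey() { return parentId; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ChildKey)) return false;
        ChildKey k = (ChildKey) o;
        return parentId.equals(k.parentId) && childId.equals(k.childId);
    }

    @Override public int hashCode() { return Objects.hash(parentId, childId); }
}

class Placement {
    // Assumed rule: partition chosen from the associated key when there is one
    static int partitionFor(Object key, int partitionCount) {
        Object routingKey = (key instanceof ChildKey) ? ((ChildKey) key).getAssociatedKey() : key;
        return Math.floorMod(routingKey.hashCode(), partitionCount);
    }
}
```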

In our domain, we cannot derive what the child keys will be given the parent keys and it would be difficult to change our legacy system so that it generates child keys in this way (we would also have to totally change our database schema – so not an option). But the child type contains the ID of the parent type. We have indexes set up so that this “parentId” on the child is indexed.

From what I have read above, it would be simple to find the child items if there was only one of them and it had the same ID as the parent.

In my case, it looks like I have to scan through all of the child entries looking for matching items (i.e. where the child’s parent ID = the ID of the parent object).

I am not too comfortable with having to iterate over potentially millions of items in order to find 2 or 3 records, so I wondered if there was any way I could take advantage of the indexes?

Is there a way within an EntryProcessor to get the index information and look up the child entries given the parent ID so we do not have to iterate over everything?

We are currently using Coherence 3.5, but I am aware that support for accessing backing maps from EntryProcessors has been improved in 3.7.

In the “BackingMapContext” API docs (3.7), I can see this method…

—————————————————
java.util.Map getIndexMap()

Return a map of indexes defined for the cache that this BackingMapContext is associated with. The returned map must be treated in the read-only manner.

http://download.oracle.com/docs/cd/E18686_01/coh.37/e18683/com/tangosol/net/BackingMapContext.html#getIndexMap__
——————————————–

I’m not sure, but I think this may be able to help me…??

Has anyone tried anything similar before?

I will keep experimenting and post any findings here…

Thanks in advance,

-Bret

By: Nicolas

Nicolas — Wed, 16 Feb 2011 09:51:56 +0000

Hi Ben,
Thx for answering the question and sorry for taking a month to get back to you 🙂

Cheers,
Nico

P.S.: I think Dave recommended your blog actually so thx to Dave as well !!!

By: ben

ben — Wed, 29 Dec 2010 17:41:49 +0000

Hi Nico

Merry Christmas!

A very good question.

My example was very simple. As you say, it is more usual to join caches that don’t share the same primary key. The most usual case would be an OrderDetails object holding a foreign key reference back to the Order object via the orderId (a many-to-one relation).

OrderDetails.orderId => Order.orderId

This presents a problem if we are querying the Orders cache. The presence of the foreign key gives the join an implicit direction: OrderDetails => Orders. However, this direction doesn’t help us much, as we’d like to query Orders and join in the opposite direction, TO the relevant OrderDetails. We really need a reverse index that points in this direction, but we don’t have one.

One solution I alluded to in the post is to just scan the keys in the cache. That is not a scalable solution, which is why I didn’t recommend it. The bit you refer to suggests an alternative approach. It uses a suitable heuristic to do the reverse lookup efficiently, by making the OrderDetailsId derivable at runtime.

For example you might define OrderDetailsId as the association of the OrderId and a monotonically incrementing integer. Then you can simply code something to join all OrderDetails onto each Order via a limited set of HashMap lookups. In pseudocode:

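import com.tangosol.io.pof.PofContext;
import com.tangosol.io.pof.reflect.PofValue;
import com.tangosol.io.pof.reflect.PofValueParser;
import com.tangosol.net.BackingMapManagerContext;
import com.tangosol.util.Binary;
import com.tangosol.util.BinaryEntry;
import com.tangosol.util.Converter;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public Object aggregate(Set setEntries) {
   Set results = new HashSet();
   for (Object o : setEntries) {
      BinaryEntry entry = (BinaryEntry) o;
      Binary binaryOrder = entry.getBinaryValue();
      // Read just the order id from the POF stream, without deserialising the whole Order
      PofValue pofOrder = PofValueParser.parse(binaryOrder, (PofContext) entry.getSerializer());
      Object orderId = pofOrder.getChild(Order.ORDER_ID_POF_VAL).getValue();
      Map detailsCache = entry.getContext().getBackingMap("OrderDetails");

      //different from here
      addDetails(entry.getContext(), detailsCache, results, orderId, binaryOrder);
   }
   return results;
}

private void addDetails(BackingMapManagerContext context, Map detailsCache, Set results, Object orderId, Binary binaryOrder) {
   // Backing maps are keyed by the internal (Binary) form of the key, so convert each derived key
   Converter keyToInternal = context.getKeyToInternalConverter();
   int monotonicId = 0;
   while (true) {
      Object binaryDetailsKey = keyToInternal.convert(new OrderDetailsId(orderId, monotonicId));
      Object binaryOrderDetails = detailsCache.get(binaryDetailsKey);
      if (binaryOrderDetails == null) {
         break; // first missing id: no more details for this order
      }
      results.add(new Object[]{binaryOrder, binaryOrderDetails});
      monotonicId++;
   }
}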
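Stripped of the Coherence plumbing, the derivable-key lookup can be tried out in plain Java. An ordinary HashMap stands in for the OrderDetails backing map. `DetailsKey` and `DerivedKeyJoin` are hypothetical names. The scheme carries an assumption: the probe stops at the first missing sequence number. Gaps (e.g. from deleting a middle OrderDetails) would therefore hide later entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical OrderDetails key: the order id plus a per-order sequence number 0, 1, 2, ...
final class DetailsKey {
    final String orderId;
    final int seq;

    DetailsKey(String orderId, int seq) { this.orderId = orderId; this.seq = seq; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof DetailsKey)) return false;
        DetailsKey k = (DetailsKey) o;
        return orderId.equals(k.orderId) && seq == k.seq;
    }

    @Override public int hashCode() { return Objects.hash(orderId, seq); }
}

class DerivedKeyJoin {
    // Probe seq = 0, 1, 2, ... until the first miss: k details cost k + 1 hash lookups, no scan
    static List<String> detailsFor(Map<DetailsKey, String> details, String orderId) {
        List<String> found = new ArrayList<>();
        for (int seq = 0; ; seq++) {
            String d = details.get(new DetailsKey(orderId, seq));
            if (d == null) break;
            found.add(d);
        }
        return found;
    }
}
```

For example, if an order has details at sequence numbers 0, 1 and 3, only 0 and 1 are found.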

Does this answer your question?