Sample Answers: Are You an HPC Architect?

Question 1: Worked Answer:


The point of this question is to illustrate that Coherence and Datasynapse can each solve this problem on their own. However, Datasynapse is the better solution. The team should consider:

A grid solution:

A non-grid solution:


In almost all situations the grid solution would be preferable for this use case. The reasons are as follows:

But for equally sized clusters, Coherence can process tasks faster (at lower latency) than the grid due to its ability to collocate data and processing. The question is whether the latency penalty of shipping the data to the engine prior to computation represents a significant proportion of the total task duration.
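One rough way to answer that question is to estimate what fraction of a grid task's duration is spent shipping its input data to the engine. The Python sketch below assumes a deliberately simple model (transfer time = fixed network latency + data size / bandwidth) and uses illustrative numbers, not measurements; `shipping_overhead_fraction` is a hypothetical helper.

```python
def shipping_overhead_fraction(data_bytes: float, bandwidth_bytes_per_s: float,
                               network_latency_s: float, compute_s: float) -> float:
    """Fraction of a grid task's total duration spent moving its input data.

    Assumed model: the grid must ship the data to the engine before computing,
    whereas a data-collocated (Coherence-style) task skips that step.
    """
    transfer_s = network_latency_s + data_bytes / bandwidth_bytes_per_s
    return transfer_s / (transfer_s + compute_s)


# Hypothetical tasks: each ships 1 MB over a 100 MB/s link with 1 ms latency.
short_task = shipping_overhead_fraction(1e6, 100e6, 0.001, compute_s=0.005)
long_task = shipping_overhead_fraction(1e6, 100e6, 0.001, compute_s=10.0)
print(f"short task: {short_task:.0%} of time spent shipping data")
print(f"long task:  {long_task:.2%} of time spent shipping data")
```

When that fraction is small, as for long compute-bound tasks, the grid pays little for not collocating data and processing; when it dominates, Coherence's ability to run the processing where the data lives becomes the deciding factor.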

Question 2: Worked Answer:

This is actually supposed to be a trick question. I was hoping that candidates would look at the problem and realise that the requirements should easily be met by the solution as it stands. There is no need to add extra technology; the team should simply look at optimising what they currently have. The most likely scenario is that the database query is badly designed and is table scanning, slowing the whole thing down.

This conclusion can be corroborated by the simple calculation:

10,000 x 3KB = 30MB = 240Mb of data transported in up to 60s

=> The minimum transfer rate required is 240Mb / 60s = 4Mb/s, which is well below the theoretical throughput of such a DB.
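The same back-of-envelope arithmetic in Python, assuming decimal units (1 MB = 1,000 KB) and 8 bits per byte, as the calculation above does:

```python
TRADES = 10_000
TRADE_SIZE_KB = 3          # per-trade record size from the question
MAX_SECONDS = 60           # time budget from the question

total_mb = TRADES * TRADE_SIZE_KB / 1_000      # kilobytes -> megabytes (decimal)
total_megabits = total_mb * 8                  # bytes -> bits
required_mbps = total_megabits / MAX_SECONDS   # megabits per second

print(f"{total_mb:.0f} MB = {total_megabits:.0f} Mb -> {required_mbps:.0f} Mb/s")
```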

However, if we were to prove that the database was adequately optimised (let's assume it has some excessively normalised structure that is somehow leading to the latencies observed), it would be reasonable to seek a performance improvement by adding a caching layer over the database.

Using Coherence, the trades would be loaded into a cache keyed on the trade ID. An index could be added on the user field to serve the queries that drive the application's home page.
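A plain-Python sketch of that cache shape: a map keyed on trade ID plus a secondary index from user to trade IDs, so the home-page query becomes an index lookup rather than a scan. `Trade` and `TradeCache` are hypothetical names, and this is a stand-in for the idea, not Coherence's API; in Coherence the cache's own indexing support would play the role of `_ids_by_user`.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    trade_id: str
    user: str
    notional: float


class TradeCache:
    """Hypothetical in-memory cache keyed on trade ID, indexed on user."""

    def __init__(self) -> None:
        self._by_id: dict[str, Trade] = {}
        self._ids_by_user: dict[str, set[str]] = defaultdict(set)

    def put(self, trade: Trade) -> None:
        old = self._by_id.get(trade.trade_id)
        if old is not None:
            # Keep the index consistent if a trade moves to another user.
            self._ids_by_user[old.user].discard(trade.trade_id)
        self._by_id[trade.trade_id] = trade
        self._ids_by_user[trade.user].add(trade.trade_id)

    def get(self, trade_id: str) -> Trade | None:
        return self._by_id.get(trade_id)

    def trades_for_user(self, user: str) -> list[Trade]:
        # Index lookup instead of scanning every cached trade.
        ids = self._ids_by_user.get(user, set())
        return sorted((self._by_id[t] for t in ids), key=lambda t: t.trade_id)


cache = TradeCache()
cache.put(Trade("T1", "alice", 1e6))
cache.put(Trade("T2", "bob", 2e6))
cache.put(Trade("T3", "alice", 5e5))
print([t.trade_id for t in cache.trades_for_user("alice")])
```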

It should be noted that any performance benefit is more likely to come from the fact that such a change would force the data to be held in a denormalised form, and this would likely contribute more of a speed increase than the caching technology itself.

Note to Facilitator: I suggest that you encourage them down both paths, i.e. make sure they notice that the solution should meet its requirements without additional technology, as well as having them come up with a caching solution in its own right.

Question 3: Worked Answer:

Candidate Guide:

  1. Transactionality: Data writes must be transactional for most financial applications of this sort. This means we need to make sure the record hits a disk before control returns to the user. Candidates should be encouraged to consider using a Coherence CacheStore to do this (i.e. write through to the DB). Should it be sync or async? Is async safe enough? Alternatively, could a messaging system be quicker than hitting the DB? Why?
  2. Replicated or Partitioned: There is a lot of data. Holding all of it in the cache would imply using a Partitioned topology. Replicated caching could only be used in a read-through mode with an eviction policy to ensure that only a subset of the data was cached. Note that replicated caching can also be provided by a wealth of open source technologies.
  3. Move Read Load to a Cache: Decrease the load on the RAC cluster by moving read load to a cache. The question is whether or not to preload the cache. Preloading the cache would completely remove read load from the database, but the cache would need to be large and expensive. A more cost-effective option would be not to preload the cache but instead to use a simple in-process caching scheme that does ‘read-through’, that is to say the first read gets the value from the database but subsequent reads only hit the cache. Coherence would be needed for the first solution but not the second (another open source cache could be used).
  4. Move Write Load to a Messaging System: The above would significantly decrease the load on the RAC cluster, but this may not be enough. To scale further we need to address write performance. Of course we could make the RAC cluster bigger, i.e. the brute-force approach, but scaling RAC will produce a well below linear performance improvement as machines are added, particularly when writes predominate. Hence the best option is to decouple load from the database via some disk-based resource. Do this with a store-and-forward messaging system such as Tibco EMS. EMS will provide a lower-latency, higher-bandwidth solution that will buffer the DB and provide on-disk queuing should load get too high. As an aside, you can actually do this in Coherence too if you are using it. There is a pattern for store-and-forward messaging published here:
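Items 3 and 4 can be sketched in plain Python as a read-through cache (the first read of a key goes to the database loader, later reads come from memory) and a store-and-forward queue that lets the caller return immediately while writes are forwarded to the database in batches. All class and function names here are hypothetical, and the in-memory `deque` is not durable: a real store-and-forward system such as EMS persists the queue to disk, which is exactly what answers the "is async safe enough?" question in item 1.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class ReadThroughCache:
    """In-process read-through cache (no preloading, no eviction)."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._store: dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._store:
            self._store[key] = self._loader(key)   # miss: hit the DB once
        return self._store[key]


class StoreAndForwardQueue:
    """Buffers writes and forwards them to the DB in batches.

    In-memory stand-in only: unlike an on-disk message queue, anything
    still pending here is lost if the process dies.
    """

    def __init__(self, writer: Callable[[list[tuple[str, str]]], None]) -> None:
        self._writer = writer
        self._pending: deque[tuple[str, str]] = deque()

    def enqueue(self, key: str, value: str) -> None:
        self._pending.append((key, value))         # caller returns immediately

    def drain(self, max_batch: int = 100) -> int:
        batch = []
        while self._pending and len(batch) < max_batch:
            batch.append(self._pending.popleft())
        if batch:
            self._writer(batch)                    # one DB round trip per batch
        return len(batch)


# Usage with a dict standing in for the database.
db = {"T1": "trade-1", "T2": "trade-2"}
db_reads = []

def load(key: str) -> str:
    db_reads.append(key)
    return db[key]

reads = ReadThroughCache(load)
reads.get("T1"); reads.get("T1"); reads.get("T2")   # T1 is loaded only once

written = []
writes = StoreAndForwardQueue(written.extend)
for i in range(3):
    writes.enqueue(f"T{i}", f"v{i}")
writes.drain(max_batch=2)
```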

Comments on the Technologies Suggested:

Question 4: Worked Answer:

This question is quite open-ended, so this is just one possible solution for you to think about.

Assuming the application will have preloaded the market data that the user is interested in (either current or some previous snapshot):

(a) Perturb trade parameters

If (pricing strategy is simple) => local computation

else => send to server

(b) Perturb market data

If (pricing strategy is simple & single trade pricing request) => local computation

else => send to server
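Rules (a) and (b) can be written as a single routing function. The `PricingRequest` fields, in particular what counts as a "simple" pricing strategy, are hypothetical stand-ins for whatever the application actually knows about a request.

```python
from dataclasses import dataclass


@dataclass
class PricingRequest:
    perturbs_market_data: bool   # True for case (b), False for case (a)
    strategy_is_simple: bool     # hypothetical flag, e.g. closed form vs Monte Carlo
    trade_count: int = 1


def route(request: PricingRequest) -> str:
    """Return 'local' or 'server' following rules (a) and (b)."""
    if not request.perturbs_market_data:
        # (a) perturbing trade parameters: only the pricing strategy matters.
        return "local" if request.strategy_is_simple else "server"
    # (b) perturbing market data: must also be a single-trade request.
    if request.strategy_is_simple and request.trade_count == 1:
        return "local"
    return "server"


print(route(PricingRequest(perturbs_market_data=True, strategy_is_simple=True, trade_count=50)))
```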

Due to the scaling requirements, the compute grid would be the only sensible place to perform complex pricing routines such as these (long-running routines such as Monte Carlo simulations are bound to be used, so a cache-only solution is not appropriate).

Next we want to pre-cache as much data as possible on different engines.

For long-running pricing routines based on Monte Carlo simulations, the aim would be to parallelise each “path” calculation. To get minimum latency on such computations it would be ideal to utilise the full grid, meaning all engines would need the same market data (for the same identifier). However, it is unlikely that all engines could pre-cache all market data, as the data size would be too great. The options are thus to either:

a) Load the data from the cache for the appropriate CCY on all engines and perform the pricing routines.

b) Split engines so that different engines have different types of market data pre-cached. Use a custom discriminator that is sensitive to the market data identifier (probably CCY) to route computations to engines that are likely to have that market data pre-cached.
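Option (b) amounts to partitioning engines by currency and routing each task to an engine in the matching group. The sketch below is a hypothetical illustration of that routing policy (round-robin within each CCY group, with a fallback pool for currencies no group pre-caches, which would then have to load their data from the cache as in option (a)). It is not DataSynapse's discriminator API, and the engine names are made up.

```python
from __future__ import annotations

import itertools


class CcyRouter:
    """Hypothetical CCY-aware router: engines are grouped by the market
    data they have pre-cached; tasks go to an engine in the matching group."""

    def __init__(self, engines_by_ccy: dict[str, list[str]], fallback: list[str]) -> None:
        self._groups = {ccy: itertools.cycle(engines)
                        for ccy, engines in engines_by_ccy.items()}
        # Assumes a non-empty fallback pool.
        self._fallback = itertools.cycle(fallback)

    def engine_for(self, ccy: str) -> str:
        # Round-robin within the CCY's group; unknown CCYs use the fallback pool.
        return next(self._groups.get(ccy, self._fallback))


router = CcyRouter({"USD": ["e1", "e2"], "EUR": ["e3"]}, fallback=["e4"])
print([router.engine_for(c) for c in ["USD", "USD", "EUR", "JPY"]])
```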

The requirement to model the behaviour of the trader’s complete position, potentially thousands of trades, for sensitivity to market conditions is the generalisation of the single-trade case. Not only does one need to allow for the behaviour of each trade as market conditions vary, but one must also take account of how changes in the value of the trades might be correlated. Do they reinforce each other, or does a gain in one offset a loss in another? The results of the initial calculations on the individual trades form an intermediate dataset required to compute the overall results. The factors that determine the size of the intermediate result set would include the following:

Which do you use? This depends on the market data size, i.e. is the extra time needed to load the data for a given market data identifier onto all engines greater than the penalty paid by pricing on only the subset of engines assigned to that identifier? This is a function of how complex the pricing
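That trade-off can be made concrete with a simple cost model, assumed here for illustration: option (a) pays a load time L on every engine but spreads W engine-seconds of pricing work across all N engines, taking L + W/N; option (b) skips the load but uses only the k engines assigned to the identifier, taking W/k. Under this model option (a) wins when L < W(1/k - 1/N), so the more complex (longer-running) the pricing, the more the full grid pays off. `choose_option` is a hypothetical helper and the numbers are illustrative.

```python
from __future__ import annotations


def choose_option(work_engine_s: float, total_engines: int,
                  engines_for_ccy: int, load_s: float) -> tuple[str, float, float]:
    """Compare option (a) (all engines, load data first) with option (b)
    (only the engines that already hold the data) under a simple model."""
    time_all = load_s + work_engine_s / total_engines     # option (a)
    time_subset = work_engine_s / engines_for_ccy         # option (b)
    return ("a" if time_all < time_subset else "b"), time_all, time_subset


# Heavy Monte Carlo job vs light job, 100 engines, 10 per CCY, 5 s load time.
print(choose_option(1000.0, 100, 10, load_s=5.0))
print(choose_option(10.0, 100, 10, load_s=5.0))
```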
