Database Y

A popular essay looking at today’s database, NoSQL and Big Data markets & thoughts on what will come next.

Posted at Nov 22nd |Filed Under: Distributed Data Storage, Top4 - read on

Big Data: An Epic Essay

A longer, retrospective look at Big Data: where it came from and the forces that shaped it.

Posted at Jul 28th |Filed Under: Analysis, Top4 - read on

A Story about George

A lighthearted look at Big Data using a metaphorical format. The style won’t suit everyone, but it’s meant to be fun :)

Posted at Jun 3rd |Filed Under: Analysis, Top4 - read on

Coherence Part I: An Introduction

An introduction to how you can store and process data in this unique technology.

Posted at Mar 4th |Filed Under: Coherence, Top4 - read on

Blog/News



Transactions in KV stores

Something close to my own heart – an interesting paper on lightweight multi-key transactions for KV stores.

http://hyperdex.org/papers/warp.pdf

Posted at Feb 25th |Filed Under: Blog - read on


Scaling Data Slides from EEP

Posted at Feb 4th |Filed Under: Blog, Uncategorized - read on


A little bit of Clojure

Slides for today’s talk at RBS Techstock:

Posted at Nov 15th |Filed Under: Blog - read on


Slides from JAX London

Similar name to the Big Data 2013 talk, but a very different deck:

Posted at Nov 1st |Filed Under: Blog - read on


The Return of Big Iron? (Big Data 2013)

Posted at Mar 27th |Filed Under: Blog - read on


Slides from Advanced Databases Lecture 27/11/12

The slides from yesterday’s guest lecture on NoSQL, NewSQL and Big Data can be found here.

Posted at Nov 28th |Filed Under: Blog - read on


Big Data & the Enterprise

Slides from today’s European Trading Architecture Summit 2012 are here.

Posted at Nov 22nd |Filed Under: Blog - read on


Problems with Feature Branches

Over the last few years we’ve had a fair few discussions around the various different ways to branch and how they fit into a world of Continuous Integration (and more recently Continuous Delivery). It’s so fundamental that it’s worth a post of its own!

Dave Farley (the man who literally wrote the book on it) penned the best advice I’ve seen on the topic a while back. Worth a read, or even a reread (it gets better towards the end).

http://www.davefarley.net/?p=160 (in case Dave’s somewhat flaky site is down again, the article is republished here)

Posted at Nov 10th |Filed Under: Blog - read on


Where does Big Data meet Big Database


InfoQ published the video for my Where does Big Data meet Big Database talk at QCon this year.

Thoughts appreciated.

Posted at Aug 17th |Filed Under: Blog - read on


A Brief Summary of the NoSQL World

James Phillips (co-founder of Couchbase) did a nice talk on NoSQL Databases at QCon:

Memcached – the simplest and the original. A pure key-value store, memory focussed.

Redis – extends the simple map-like semantic with operations that manipulate specific data structures stored as values, so values can be treated as lists, queues etc. (a quick sketch follows this summary). Redis is primarily memory focussed.

Membase – extends the memcached approach to include persistence, the ability to add nodes, and backups on other nodes.

Couchbase – a cross between Membase and CouchDB: Membase on the front, CouchDB on the back. The addition of CouchDB means you can store and reflect on more complex documents (in JSON). To query Couchbase you need to write JavaScript mapping functions that effectively materialise the schema (think index) so that you can create a query model. Couchbase is CA not AP (i.e. not eventually consistent).

MongoDB – uses BSON (a binary version of JSON which is open source but only really used by Mongo). Mongo differs from Couchbase in that the query language is dynamic: Mongo doesn’t require the declaration of indexes. This makes it better at ad hoc analysis but slightly weaker from a production perspective.

Cassandra – a column-oriented key-value store. Values are split into columns which are pre-indexed before the information can be retrieved. Eventually consistent (unlike Couchbase). This makes it better for highly distributed use cases or ones where the data is spread over unreliable networks.

Neo4J – Graph oriented database. Much more niche. Not distributed.

There are obviously a few more that could have been covered (Voldemort, Dynamo etc.), but a good summary from James nonetheless.
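To make the Redis entry above concrete, here is a minimal sketch using the Jedis client – the client choice, key names and values are my own, not from James’s talk – showing the same connection serving a plain key/value get/set alongside a list manipulated as a queue:

    import redis.clients.jedis.Jedis;

    public class RedisStructuresSketch {
        public static void main(String[] args) {
            // Assumes a Redis instance on localhost; host and port are illustrative.
            try (Jedis redis = new Jedis("localhost", 6379)) {
                // Plain key/value, memcached-style.
                redis.set("user:42:name", "George");
                System.out.println(redis.get("user:42:name"));

                // The same API treats a value as a list, used here as a FIFO queue:
                // push onto the left, pop from the right.
                redis.lpush("orders:pending", "order-1001", "order-1002");
                System.out.println(redis.rpop("orders:pending")); // prints "order-1001"
            }
        }
    }

The structures live server-side; the client just issues structure-aware commands rather than fetching, mutating and rewriting a blob.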

Full slides/video can be found here.

Posted at Aug 11th |Filed Under: Blog - read on


Looking at Intel Xeon Phi (Knights Corner)

Characteristics:

  • Intel’s new MIC ‘Knights Corner’ coprocessor (in the Intel Xeon Phi line) is targeted at the high-concurrency market, previously dominated by GPGPUs, but without the need for code to be rewritten into CUDA etc. (note Knights Ferry is the older prototype version).
  • The chip has 64 cores and 8GB of RAM with a 512-bit vector engine. Clock speed is ~1.1GHz with a 512KB L1 cache. The Linux kernel runs on two 2.2GHz processors.
  • It comes on a card that drops into a PCI slot so machines can install multiple units.
  • It uses a MESI protocol for cache coherence.
  • There is a slimmed-down Linux OS that can run on the processor.
  • Code must be compiled to two binaries, one for the main processor and one for Knights Corner.
  • Compilers are currently available only for C++ and Fortran. Only Intel compilers at present.
  • It’s on the cusp of being released (Q4 this year) for NDA partners (though we – GBM – have access to one off-site at Maidenhead). Due to be announced at the Supercomputing conference in November(?).
  • KC is 4-6 GFLOPS/W – which works out at 0.85-1.8 TFLOPS for double precision.
  • It is expected to be GA Q1 ‘13.
  • It’s a large ‘device’ – the wafer is a 70mm-square form factor!
  • Access to a separate board over PCI is a temporary step. Expected that future versions will be a tightly-coupled co-processor. This will also be on the back of the move to the 14nm process.
  • A single host can (depending on OEM design) support several PCI cards.
  • Similarly, power draw and heat dispersal are an OEM decision.
  • Reduced instruction set e.g. no VM support instructions or context-switch optimisations.
  • Performance now being expressed as GFlops per Watt. This is a result of US Government (efficiency) requirements.
  • A single machine can now go faster than the room-filling supercomputer of ‘97 – ASCI Red!
  • The main constraint to doing even more has been the limited volume production pipeline.
  • Pricing not announced, but expected to be ‘consistent with’ GPGPUs.
  • The key goal is to make programming it ‘easy’, or rather a lot easier than platform-dedicated approaches or abstraction mechanisms such as OpenCL.
  • Once booted (probably by a push of an OS image from the main host’s store to the device) it can appear as a distinct host over the network.

Commentary:

  • The key point is that Knights Corner provides most of the advantages of a GPGPU but without the painful and costly exercise of migrating software from one language to another (that is to say it is based on the familiar x86 programming model).
  • Offloading work to the card is directed through the offload pragma, or through offload keywords using shared virtual memory.
  • Computation occurs in a heterogeneous environment that spans both the main CPU and the MIC card, which is how execution can be performed with minimal code changes.
  • There is a reduced instruction set for Knights Corner but the majority of the x86 instructions are there.
  • There is support for OpenCL, although Intel are not recommending that route to customers due to performance constraints.
  • Real-world testing has shown a provisional 4x improvement in throughput using an early version of the card running some real programs, although results from a sample test show perfect scaling. Some restructuring of the code was necessary – not huge but not insignificant.
  • There are currently only C++ and Fortran interfaces (so not much use if you’re running Java or C#).
  • You need to remember that you are on PCI Express so you don’t have the memory bandwidth you might want.

References:

Other things worth thinking about:

http://www.altera.com/

Thanks to Mark Atwell for his help with this post.

Posted at Aug 9th |Filed Under: Blog - read on


Progressive Architectures at RBS

Michael Stal wrote a nice article about our Progressive Architectures talk from this year’s QCon. The video is up too.

Read the article here.

Watch the video here.

A big thanks to Fuzz, Mark and Ciaran for making this happen.

Posted at Jul 6th |Filed Under: Blog - read on


Harvey Raja’s ‘Pof Art’ Slides

I really enjoyed Harvey’s ‘POF Art’ talk at the Coherence SIG. Slides are here if you’re into that kind of thing: POF-Art.

Posted at Jun 15th |Filed Under: Blog - read on


Simply Being Helpful?

What if, more than anything else, we valued helping each other out? What if this were the ultimate praise – not being the best technologist, not an ability to hit deadlines, not production stability? What if the ultimate accolade was to consistently help others get things done? Is that crazy? It’s certainly not always natural; we innately divide into groups, building psychological boundaries. Politics erupts from trivial things. And what about the business? How would we ever deliver anything if we spent all our time helping each other out? But maybe we’d deliver quite a lot.

If helping each other out were our default position wouldn’t we be more efficient? We’d have less politics, less conflict, fewer empires and we’d spend less money managing them.

We probably can’t change who we are. We’ll always behave a bit like we do now. Conflict will always arise and it will always cause problems: we all have tempers, we play games, we frustrate others and react to slights and injustices.

But what if it were simply our default position, our core value, the thing we fall back on? It wouldn’t change the world, but it might make us a little bit more efficient.

… right back to the real world

Posted at May 30th |Filed Under: Blog - read on


Valve Handbook

Valve handbook. Very cool:

http://newcdn.flamehaus.com/Valve_Handbook_LowRes.pdf

Posted at May 16th |Filed Under: Blog - read on


Welcome Jon ‘The Gridman’ Knight

Jon ‘The Gridman’ Knight has finally dusted off his keyboard and entered the blogosphere with a fantastic post on how we implement a reliable version of Coherence’s putAll() over here on ODC. One to add to your feed if you are interested in all things Coherence.

http://thegridman.com/coherence/coherence-alternative-putall-2/

Posted at Jan 24th |Filed Under: Blog - read on


Interesting Links Dec 2011

Hardware

FPGA

High Performance Java

Distributed Data Storage

Interesting:

Posted at Dec 31st |Filed Under: Blog, Links - read on


Interesting Links Oct 2011

High Performance Java

Distributed Data Storage:

Distributed Computing:

Coherence related:

Just Interesting:

Posted at Oct 25th |Filed Under: Blog, Links - read on


Slides for Financial Computing course @ UCL

Posted at Oct 23rd |Filed Under: Blog, Talks - read on


Fast Joins in Distributed Data Grids @JavaOne

Here are the slides for the talk I gave at JavaOne:

Balancing Replication and Partitioning in a Distributed Java Database

This session describes the ODC, a distributed, in-memory database built in Java that holds objects in a normalized form in a way that alleviates the traditional degradation in performance associated with joins in shared-nothing architectures. The presentation describes the two patterns that lie at the core of this model. The first is an adaptation of the Star Schema model, used to hold data either replicated or partitioned depending on whether the data is a fact or a dimension. In the second pattern, the data store tracks arcs on the object graph to ensure that only the minimum amount of data is replicated. Through these mechanisms, almost any join can be performed across the various entities stored in the grid, without the need for key shipping or iterative wire calls.
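Purely as an illustration of that idea (the classes and the partition-local join below are invented for this sketch, not taken from the ODC code), replicating dimensions while partitioning facts means a join can be resolved entirely inside a partition, with no key shipping and no extra wire calls:

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical fact: large and high-cardinality, so partitioned across the grid.
    record Trade(String tradeId, String counterpartyId, double notional) {}

    // Hypothetical dimension: small, so replicated to every node.
    record Counterparty(String counterpartyId, String name, String country) {}

    public class PartitionLocalJoinSketch {
        private final List<Trade> localTrades;                       // facts owned by this partition
        private final Map<String, Counterparty> counterpartyReplica; // full local copy of the dimension

        PartitionLocalJoinSketch(List<Trade> localTrades,
                                 Map<String, Counterparty> counterpartyReplica) {
            this.localTrades = localTrades;
            this.counterpartyReplica = counterpartyReplica;
        }

        // The join never leaves the node: each fact is resolved against the local replica.
        List<String> tradeIdsForCountry(String country) {
            return localTrades.stream()
                    .filter(t -> country.equals(counterpartyReplica.get(t.counterpartyId()).country()))
                    .map(Trade::tradeId)
                    .collect(Collectors.toList());
        }
    }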

See Also

Posted at Oct 5th |Filed Under: Blog, Talks - read on


JavaOne

I’m heading to JavaOne in October to talk about some of the stuff we’ve been doing at RBS. The talk is entitled “Balancing Replication and Partitioning in a Distributed Java Database”.

Is anyone else going?

Posted at Aug 9th |Filed Under: Blog - read on


Interesting Links July 2011

Because the future will inevitably be in-memory databases:

Other interesting stuff:

Posted at Jul 20th |Filed Under: Blog, Links - read on


A better way of Queuing

The LMAX guys have open-sourced their Disruptor queue implementation. Their stats show some significant improvements (over an order of magnitude) over standard ArrayBlockingQueues in a range of concurrent tests. Both interesting and useful.

http://code.google.com/p/disruptor/
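For a feel of the programming model, here is a minimal sketch against the Disruptor 3.x API (a later version than the one linked above, so treat the exact class and method names as assumptions): events are pre-allocated in a ring buffer, producers claim and publish slots, and handlers consume as the sequence advances.

    import com.lmax.disruptor.RingBuffer;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.util.DaemonThreadFactory;

    public class DisruptorSketch {
        // Mutable event slot, pre-allocated once in the ring buffer (no per-message allocation).
        static class LongEvent {
            long value;
        }

        public static void main(String[] args) {
            int bufferSize = 1024; // must be a power of two

            Disruptor<LongEvent> disruptor =
                    new Disruptor<>(LongEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

            // Consumer: called as the sequence advances, without lock contention.
            disruptor.handleEventsWith(
                    (event, sequence, endOfBatch) -> System.out.println("got " + event.value));
            disruptor.start();

            // Producer: claim a slot, write into it, publish the sequence.
            RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
            for (long i = 0; i < 10; i++) {
                ringBuffer.publishEvent((event, seq, v) -> event.value = v, i);
            }

            // Waits for outstanding events to be handled before halting the consumers.
            disruptor.shutdown();
        }
    }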

Posted at Jun 27th |Filed Under: Blog - read on


QCon Slides/Video: Beyond The Data Grid: Coherence, Normalization, Joins and Linear Scalability

The slides/video from my talk at QCon London have been put up on InfoQ.

http://www.infoq.com/presentations/ODC-Beyond-The-Data-Grid

Posted at Jun 17th |Filed Under: Blog - read on


The NoSQL Bible

An effort well worthy of its own post: http://www.christof-strauch.de/nosqldbs.pdf

Posted at Apr 27th |Filed Under: Blog - read on


QCon Slides

Thanks to everyone that attended the talk today at QCon London. You can find the slides here. Hard copies here too: [pdf] [ppt]

Posted at Mar 9th |Filed Under: Blog - read on


Interesting Links Feb 2011

Thinking local:

Thinking Distributed:

Posted at Feb 20th |Filed Under: Blog, Links - read on


QCon 2011

Just a little plug for the 5th annual QCon London on March 7-11, 2011. There are a bunch of cool speakers including Craig Larman and Juergen Hoeller, as well as the obligatory set of ex-TW types. I’ll be doing a session on Going beyond the Data Grid.

You can save £100 and give £100 to charity if you book with this code: STOP100

Posted at Jan 11th |Filed Under: Blog - read on


Interesting Links Dec 2010

More discussions on the move to in memory storage:

Posted at Jan 3rd |Filed Under: Blog, Links - read on


Talk Proposal: Managing Normalised Data in a Distributed Store

I’ve been working on a medium-sized data store (around half a TB) that provides high-bandwidth, low-latency access to data.

Caching and warehousing techniques push you towards denormalisation, but this becomes increasingly problematic when you move to a highly distributed environment (certainly if the data is long-lived). We’ve worked on a model that is semi-normalised whilst retaining the performance benefits associated with denormalisation.
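A toy example of the problem (the classes below are invented for illustration, not the store’s actual model): once a dimension value is denormalised into every fact, a single change to it has to be rewritten into every copy, on every partition, for as long as that data lives.

    import java.util.List;

    public class DenormalisationSketch {
        // Hypothetical denormalised fact: the counterparty name is copied into each trade.
        static class Trade {
            String tradeId;
            String counterpartyName; // duplicated dimension data

            Trade(String tradeId, String counterpartyName) {
                this.tradeId = tradeId;
                this.counterpartyName = counterpartyName;
            }
        }

        // Reads are cheap, but a simple rename must now visit every trade holding the old
        // value – an update that fans out across partitions and grows with data volume.
        static void renameCounterparty(List<Trade> allTrades, String from, String to) {
            for (Trade t : allTrades) {
                if (t.counterpartyName.equals(from)) {
                    t.counterpartyName = to;
                }
            }
        }
    }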

The other somewhat novel attribute of the system is its use of Messaging as a system of record.

I put together a talk abstract, which David Felcey from Oracle very kindly helped with, describing the work in brief. You can find it here.

I’ll also be adding some more posts in the near future to flesh out how this all works.

Posted at Nov 14th |Filed Under: Blog - read on


Submissions being accepted for RefTest @ ICSE

Submissions are being accepted for RefTest at the IEEE International Conference on Testing, Verification and Validation.

Submissions can be short (2-page) or full-length conference papers. The deadline is Jan 4th 2011.

Full details are here.

Posted at Nov 13th |Filed Under: Blog - read on