The Data Dichotomy

Data Systems are about exposing data; Services are about hiding it.

Dec 14th, 2016

Elements of Scale: Composing and Scaling Data Platforms

This transcribed talk explores a range of data platforms through a lens of basic hardware and software tradeoffs.

Apr 28th, 2015

Log Structured Merge Trees

A detailed look at the interesting LSM file organisation seen in BigTable, Cassandra and, most recently, MongoDB.

Feb 14th, 2015

Building a Career in Technology

I gave a short talk to some young technologists about their career path in technology. These are my notes.

Jan 2nd, 2015


Handling GDPR: How to make Kafka Forget
Dec 4th, 2017

If you follow the press around Kafka you’ll probably know it’s pretty good at tracking and retaining messages, but sometimes removing messages is important too. GDPR is a good example of this as, amongst other things, it includes the right to be forgotten. This raises an obvious question: how do you delete arbitrary data from Kafka? It’s an immutable log, after all.

Posted Dec 4th | Filed Under: Blog

What could academia or industry do (short or long term) to promote more collaboration?
Oct 14th, 2017

I did a little poll of friends and colleagues about this question. Here are some of the answers, which I found quite thought-provoking:


Posted Oct 14th | Filed Under: Blog

Delete Arbitrary Messages from Kafka
Oct 6th, 2017

I’ve been asked a few times how you can delete messages from a topic in Kafka. For example, if you work for a company with a central Kafka instance, you might want to be able to delete any arbitrary message because of, say, regulatory or data protection requirements, or simply in case something gets corrupted.

A potential trick is to use a combination of (a) a compacted topic, (b) a custom partitioner and (c) a pair of interceptors.

The process would follow:

  • Use a producer interceptor to add a GUID to the end of the key before it is written.
  • Use a custom partitioner to ignore the GUID for the purposes of partitioning.
  • Use a compacted topic so you can then delete any individual message you need via producer.send(key+GUID, null).
  • Use a consumer interceptor to remove the GUID on read.
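The steps above can be sketched in Java. This is a minimal, untested sketch that models only the key handling with the standard library; the `GuidKeys` class, its `|` separator and its method names are hypothetical. In a real deployment, `wrap` would be called from a `ProducerInterceptor.onSend` (which Kafka invokes before the record is serialized and partitioned), `partition` from a custom `Partitioner.partition`, and `unwrap` from a `ConsumerInterceptor.onConsume`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical helper holding the key logic the interceptors and partitioner would share.
final class GuidKeys {
    // Assumed separator between the original key and the GUID.
    static final String SEP = "|";

    private GuidKeys() {}

    // Producer interceptor: append a fresh GUID so every message gets a unique key,
    // which stops compaction from collapsing messages that share an original key.
    static String wrap(String key) {
        return key + SEP + UUID.randomUUID();
    }

    // Consumer interceptor: strip the GUID to restore the original key.
    // lastIndexOf is used because a UUID never contains '|', so a '|' inside
    // the original key is still handled correctly.
    static String unwrap(String wrappedKey) {
        int i = wrappedKey.lastIndexOf(SEP);
        return i < 0 ? wrappedKey : wrappedKey.substring(0, i);
    }

    // Custom partitioner: hash only the original key, so every message for that key
    // still lands in the same partition (preserving per-key ordering).
    // Arrays.hashCode stands in for Kafka's murmur2-based default hashing.
    static int partition(String wrappedKey, int numPartitions) {
        byte[] keyBytes = unwrap(wrappedKey).getBytes(StandardCharsets.UTF_8);
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }
}
```

Deleting a message then means sending a record with the full wrapped key and a null value, e.g. `producer.send(new ProducerRecord<>(topic, wrappedKey, null))`. Because the consumer interceptor strips the GUID, whoever issues the delete needs some other way to learn the wrapped key; the sketch leaves that open.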

Two caveats: (1) Log compaction never touches the active (most recent) segment, so values will only be deleted once the segment holding the delete has rolled and the cleaner has run. This means it may take some time for the ‘delete’ to actually take effect. (2) I haven’t tested this!
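To see why the delete in caveat (1) is delayed, here is a toy Java model of compaction. It is a simplification under stated assumptions, not Kafka's actual log cleaner: the `CompactionModel` class and `Rec` type are hypothetical, a `null` value stands for a tombstone, and the last segment in the list plays the active segment, which is never compacted. Unlike real Kafka, which keeps a tombstone for `delete.retention.ms` before discarding it, the model drops tombstones immediately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a compacted topic's log; hypothetical, not Kafka's LogCleaner.
final class CompactionModel {
    // One record; a null value is a tombstone (delete marker).
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private CompactionModel() {}

    // Compacts every segment except the last (active) one, which is copied unchanged.
    // Within the closed segments only the latest record per key survives, and keys
    // whose latest record is a tombstone disappear entirely.
    static List<Rec> compact(List<List<Rec>> segments) {
        List<Rec> closed = new ArrayList<>();
        for (int s = 0; s < segments.size() - 1; s++) {
            closed.addAll(segments.get(s));
        }

        // Position of the latest record for each key in the closed segments.
        Map<String, Integer> latest = new HashMap<>();
        for (int i = 0; i < closed.size(); i++) {
            latest.put(closed.get(i).key, i);
        }

        List<Rec> result = new ArrayList<>();
        for (int i = 0; i < closed.size(); i++) {
            Rec r = closed.get(i);
            if (latest.get(r.key) == i && r.value != null) {
                result.add(r);
            }
        }

        // The active segment is never touched, so a tombstone written there
        // cannot yet remove older values for the same key.
        if (!segments.isEmpty()) {
            result.addAll(segments.get(segments.size() - 1));
        }
        return result;
    }
}
```

Compacting a log whose tombstone still sits in the active segment leaves the old value readable; only after a roll (in Kafka, driven by topic settings such as `segment.ms` and `segment.bytes`) does the next compaction remove it.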


Posted Oct 6th | Filed Under: Blog, Kafka/Confluent

Slides: Kafka Summit SF – Building Event-Driven Services with Stateful Streams
Aug 28th, 2017

Posted Aug 28th | Filed Under: Blog

Devoxx 2017 – Rethinking Services With Stateful Streams
May 12th, 2017

Posted May 12th | Filed Under: Blog
