This transcribed talk explores a range of data platforms through a lens of basic hardware and software tradeoffs.
Streams have many benefits, from promoting reactive architecture and asynchronicity to bridging operational and analytic worlds. This post explores how.
Essay exploring whether products like MongoDB are viable threats to the incumbent database vendors.
A lighthearted look at Oracle & Google using a metaphorical format. The style won’t suit everyone, but it’s a bit of fun!
- CodeMesh-2015: Contemporary Approaches to Data at Scale (video)
- Øredev-2015: The Future of Data Technology (video)
- JAXLondon-2015: Intuitions for Scaling Data-Centric Architectures
- ProgsCon/JAXF-2015: Elements of Scale (video)
- RBS-2014: Scaling Data
- BigDataCon-2013: The Return of Big Iron?
- JAX-2013: The Return of Big Iron?
- QCon-2012: Where Big Data meets Big Database (video)
- QCon-2012: Progressive Architectures at RBS (video)
- JavaOne-2011: Balancing Replication and Partitioning in a Distributed Java Database
- QCon-2011: Beyond the Data Grid (video)
- UCL-2011: A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access
- CoSIG-2011: Oracle Coherence Implementation Patterns (Special Interest Group)
- ICST-2011: Test-Oriented Languages: a new era?
- ICST-2011: Enabling Development Practices in Remote Locations
- Birkbeck-2011: Data Storage for Extreme Use Cases
- RefTest-2010: Has Mocking Gone Wrong?
- RBS-2009: Data Grids with Oracle Coherence
- Brunel-2008: The Architect's Two Hats
- Brunel-2007: Architecture and Design in Industry
- Elements of Scale: Composing and Scaling Data Platforms (2015)
- Upside Down Databases: Bridging the Operational and Analytic Worlds with Streams (2015)
- Log Structured Merge Trees (2015)
- A World of Chinese Whispers (2014)
- Database Y (2013)
- The Big Data Conundrum (2012)
- Where does Big Data meet Big Database? (2012)
- A Story about George (2012)
- The Rebirth of the In-Memory Database (2011)
- Is the Traditional Database a Thing of the Past? (2009)
- Shared Nothing vs. Shared Disk Architectures: An Independent View (2009)
- Component Software. Where is it going? (2005)
- Do Metrics Have a Place in Software Engineering Today? (2004)
Test Driven Development
- Test Oriented Languages: Is it Time for a New Era? (2011)
- Beyond Stubs: Why We Need Interaction Testing (2010)
- Isolating Functional Units: Why We Need Stubs (2010)
- Are Mocks All They Are Cracked Up To Be? (2010)
Data Tech
- Best of VLDB 2014 (2015)
- A Guide to building a Central, Consolidated Data Store for a Company (2014)
- An initial look at Actian’s ‘SQL in Hadoop’ (2014)
- The Best of VLDB 2012 (2012)
- Thinking in Graphs: Neo4J (2012)
- A Brief Summary of the NoSQL World (2012)
- ODC – RBS’s Distributed Datastore (2012)
- Looking at Intel Xeon Phi (Knights Corner) (2012)
Team / Process / Interviewing
- Building a Career in Technology (2015)
- The Iffy Tractor (Can they code OO?) (2011)
- The Business Analyst Test (2011)
- Distributing Skills Across a Continental Divide (2011)
- Learning Practices for Distributed Teams (ICST) (2011)
- Interviewing: The Importance of Examining Applied Knowledge (2010)
- Mapping Personal Practices (2010)
- Four HPC Architecture Questions – With Answers (2009)
Abstract for Code Mesh 2015 (Jul 20th, 2015)
Contemporary Approaches to Data at Scale (tbc)
We use a host of tricks these days for handling data at scale. Disk structures are tuned to specific workloads. Streams are used to create continuous pipelines of processing. Hardware offers incredible diversity in terms of latency and throughput.
The tools available (Cassandra, Postgres, Hadoop, Kafka, Hazelcast, Storm, etc.) each come with their own unique tradeoffs. We’ll look at these as individual elements. We’ll also look at compositions that leverage their individual sweet spots to create more powerful, holistic platforms.
Abstracts for Øredev 2015 (Jul 9th, 2015)
The Future of Data Technology (6th Nov 15.40)
One size no longer fits all when it comes to data technology, at least not for many of today’s use cases. Will this ever change? Will we continue to diversify? Will we go full circle? Certainly ours is an industry in flux. NoSQL, Big Data and stream technology, containerisation, commodity PCIe storage, non-volatile memory and a host of other forces will shape the data technologies of the future.
In this talk I will make a case for what the future may look like, what challenges we’ll encounter and how it will likely change the applications we build.
Elements of Scale: Composing And Scaling Data Platforms (5th Nov 14.20)
Today there are a host of data-centric challenges that need more than a single technology to solve. Data platforms step in, blending different technologies to achieve a common goal.
But to compose such platforms we need an understanding of the tradeoffs of each constituent part: their sweet spots, how they complement one another and what sacrifices they make in return.
This talk is really a grand tour of these evolutionary forces. We’ll cover a lot of ground, building up from disk formats right through to fully distributed, streaming and batch driven architectures. In the end we should see how these various pieces come together to form a pleasant and useful whole.
Abstract for JAX London 2015 (Jul 9th, 2015)
Intuitions for Scaling Data-Centric Architectures (14th Oct 11.20)
This talk will examine the various intuitions and trade-offs needed to scale a data-centric application or architecture. Building from the fundamentals of data locality, immutability and parallelism, attendees will gain a sense for how fully blown architectures can be sewn together. The result: a balance of real-time storage, streaming and analytics that plays to the relative strengths of different component parts.
Elements of Scale: Composing and Scaling Data Platforms (Apr 28th, 2015)
This post is the transcript of a talk of the same name, given at ProgsCon & JAX Finance 2015.
There is also a video.
As software engineers we are inevitably affected by the tools we surround ourselves with. Languages, frameworks, even processes all act to shape the software we build.
Likewise databases, which have trodden a very specific path, inevitably affect the way we treat mutability and share state in our applications.
Over the last decade we’ve explored what the world might look like had we taken a different path. Small open source projects try out different ideas. These grow. They are composed with others. The platforms that result utilise suites of tools, with each component often leveraging some fundamental hardware or systemic efficiency. The result: platforms that solve problems too unwieldy or too specific to work within any single tool.
So today’s data platforms range greatly in complexity, from simple caching layers or polyglot persistence right through to wholly integrated data pipelines. There are many paths. They go to many different places. In some of these places at least, nice things are found.
So the aim for this talk is to explain how and why some of these popular approaches work. We’ll do this by first considering the building blocks from which they are composed. These are the intuitions we’ll need to pull together the bigger stuff later on.
Remember the days when people would write entire applications, embedded inside a database? It seems a bit crazy now when you think about it. Imagine writing an entire application in SQL. I worked on a beast like that, very briefly, in the late 1990s. It had a few shell scripts but everything else was SQL. Everything. Suffice it to say it wasn’t much fun – you can probably imagine – but there was a slightly perverse simplicity to the whole thing.
So Martin Kleppmann gave a talk recently on the idea of turning databases inside out. I like this idea. It’s a nice way to frame a problem that has lurked unresolved for years. To paraphrase somewhat… databases do very cool stuff: caching, indexes, replication, materialised views. These are very cool things. They do them well too. It’s a shame that they’re locked in a world dislocated from general consumer programs.
There are also a few things missing. Databases don’t really do events, streams, messaging, whatever you want to call it. Some newer ones do, but none cover what you might call ‘general purpose’ streams. This means the query-driven paradigm often leaks into the application space. Applications end up circling around centralised mutable state. Whilst there are valid use cases for this, the rigid and synchronous world produced can be counterproductive for many types of programs.
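To make the ‘inside out’ idea a little more concrete, here is a minimal Python sketch, assuming a single-process, in-memory setting: state changes are appended to an immutable log, and a materialised view is derived by folding over that log, first replaying history and then following live events. The `Log`, `Event` and `BalanceView` names are hypothetical illustrations, not Kleppmann’s design or any product’s API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """A hypothetical immutable fact: money moved into (or out of) an account."""
    account: str
    amount: int  # positive = deposit, negative = withdrawal


@dataclass
class Log:
    """An append-only log: the source of truth, rather than a mutable table."""
    events: List[Event] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self.events.append(event)
        for notify in self.subscribers:  # push the new event to every view
            notify(event)
        return len(self.events) - 1  # offset of the appended event

    def subscribe(self, handler: Callable[[Event], None], from_offset: int = 0) -> None:
        # Replay history from the given offset, then receive live events.
        for event in self.events[from_offset:]:
            handler(event)
        self.subscribers.append(handler)


class BalanceView:
    """A materialised view: derived state that can always be rebuilt from the log."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = defaultdict(int)

    def apply(self, event: Event) -> None:
        self.balances[event.account] += event.amount


log = Log()
log.append(Event("alice", 100))

view = BalanceView()
log.subscribe(view.apply)        # replays alice's deposit
log.append(Event("alice", -30))  # live events update the view as they arrive
log.append(Event("bob", 50))

late_view = BalanceView()        # a view added later catches up from offset 0
log.subscribe(late_view.apply)
```

The design choice here is that the log, not the view, holds the truth, so a new view can be added at any time simply by replaying it, as `late_view` does.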
Best of VLDB 2014 (Mar 8th, 2015)
Interesting paper on write-ahead logs in persistent, in-memory media. Recent non-volatile memory (NVM) technologies, such as PCM, STT-MRAM and ReRAM, can act as both main memory and storage. This has led to research into NVM programming models, where persistent data structures remain in memory and are accessed directly through CPU loads and stores. The paper’s system, REWIND, outperforms state-of-the-art approaches for data structure recoverability, as well as general-purpose and NVM-aware DBMS-based recovery schemes, by up to two orders of magnitude.
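The paper’s own scheme isn’t reproduced here, but the underlying idea of write-ahead logging for recoverability can be sketched in a few lines of Python. This is a generic undo-log sketch, not REWIND’s algorithm: ordinary Python objects stand in for data structures in persistent memory, and `UndoLoggedStore` and its methods are hypothetical names. Before each in-place update the old value is logged; on recovery, updates from transactions that never committed are rolled back.

```python
from typing import Dict, List, Optional, Tuple

# Log record: (kind, txid, key, old_value); kind is "undo" or "commit".
Record = Tuple[str, int, Optional[str], Optional[int]]


class UndoLoggedStore:
    """Toy store using an undo log. Both `data` and `log` stand in for
    structures living in persistent memory (an assumption for illustration)."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.log: List[Record] = []

    def write(self, txid: int, key: str, value: int) -> None:
        # 1. Log the old value first (write-ahead). On real NVM hardware this
        #    record would have to be made durable before step 2 happens.
        self.log.append(("undo", txid, key, self.data.get(key)))
        # 2. Then update the data structure in place.
        self.data[key] = value

    def commit(self, txid: int) -> None:
        self.log.append(("commit", txid, None, None))

    def recover(self) -> None:
        """Roll back every transaction that has no commit record."""
        committed = {tx for kind, tx, _, _ in self.log if kind == "commit"}
        # Undo newest-first so the oldest before-image is restored last.
        for kind, tx, key, old in reversed(self.log):
            if kind == "undo" and tx not in committed:
                if old is None:
                    self.data.pop(key, None)  # key did not exist before
                else:
                    self.data[key] = old
        # Discard log records belonging to rolled-back transactions.
        self.log = [rec for rec in self.log if rec[1] in committed]


store = UndoLoggedStore()
store.write(1, "x", 10)
store.commit(1)
store.write(2, "x", 20)
store.write(2, "y", 5)  # transaction 2 "crashes" before committing
store.recover()
```

After recovery only the committed write survives: `x` is back to 10 and `y` is gone.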
Asterix is an academically established hierarchical store. It’s now an Apache Incubator project. It utilises sets of LSM structures, tied transactionally together. Additional index structures can also be formed, for example R-Trees.
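For readers new to LSM structures, a minimal single-index sketch in Python may help. It assumes everything lives in memory (real implementations write sorted runs to disk files), omits deletes and logging, and shows just one tree, whereas Asterix ties several together transactionally. `TinyLSM` and its methods are hypothetical names.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """A toy log-structured merge tree: a mutable memtable plus immutable sorted runs.
    Deletes (tombstones), write-ahead logging and on-disk files are omitted."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[str, str] = {}
        # Each run is (sorted keys, matching values); the newest run is last.
        self.runs: List[Tuple[List[str], List[str]]] = []
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value  # writes only ever touch memory...
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # ...and are periodically written out as an immutable, sorted run.
        if self.memtable:
            items = sorted(self.memtable.items())
            self.runs.append(([k for k, _ in items], [v for _, v in items]))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # the newest run wins
            i = bisect.bisect_left(keys, key)     # binary search within a run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self) -> None:
        # Merge all runs into one, oldest first so newer values overwrite older.
        merged: Dict[str, str] = {}
        for keys, values in self.runs:
            merged.update(zip(keys, values))
        items = sorted(merged.items())
        self.runs = [([k for k, _ in items], [v for _, v in items])] if items else []


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")  # memtable full: flushed to the first run
db.put("a", "3")
db.put("c", "4")  # flushed to a second, newer run
```

Writes are cheap because they only touch the in-memory table; reads may have to search several runs, which is why background compaction matters.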
As the number of cores increases, the complexity of coordinating competing accesses to data will likely diminish the gains from increased core counts. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware.
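One rough way to build intuition for this, offered as an illustration rather than the paper’s own model, is Amdahl’s law: if a fraction $s$ of the work is serialised (for example, waiting on coordination over shared data), the speedup on $N$ cores is bounded no matter how many cores are added.

$$
S(N) = \frac{1}{s + \frac{1 - s}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{s}
$$

With $s = 0.05$, $S(64) \approx 15.4$ and $S(1000) = 1/(0.05 + 0.00095) \approx 19.6$, against a ceiling of 20. Amdahl’s law assumes the serial fraction stays fixed; if coordination costs themselves grow with the core count, as the quote suggests, the curve can flatten further or even fall.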