Thoughts on Big Data Technologies (1)

It may not have been its intention, but the undercurrent of the NoSQL movement seems to be something of a two-finger salute to the apathy of the database community. A community that was once the height of technological innovation seems to have rested on its laurels in recent years, propped up by the lucrative support contracts of its corporate dependents. NoSQL and the Big Data trends have blown a welcome breath of fresh air through the cobwebbed world of rows and columns, forcing a rethink about how data should be stored and accessed. Data has become sexy again!

But the marketing surge that has come with it is something of a mixed blessing. You can barely move for the hype. There are also many misconceptions, particularly around the technology itself. One of the most notable is the misconception that handling large data sets necessitates something that looks like Hadoop. At the other end of the spectrum the ‘big three’ database vendors are touting products that look very much like their products of ten years ago. Big Data seems to have caught them off guard and they seem to be floundering somewhat, pinning Big Data emblems to their existing products without really rethinking their approach to the problem in the context of today’s world.

This apathy is clearly evidenced by the host of upstart database and NoSQL technologies that have achieved market penetration. The database market is not an easy one to get into. It is an oligopoly, in economic terms: a market dominated by a small number of key players. The barrier to entry is high, making it hard for smaller companies to penetrate. The products are similar and largely interchangeable, and no one vendor has total monopolistic control. In fact many markets end up in this state: mobile technology and service provision, oil, airlines and so on. The database industry is one of these too and has been for twenty years. The fact that a series of fledgling brands are gaining real traction in a market like this is a sign that the mainstream is lagging behind the curve.

Their ‘way in’ has been products that pander to subtly different use-cases. Some are sold as databases, some as NoSQL stores, some as Big Data. The categories are starting to blur but the general theme favours simplicity and scalability over more traditional worries about keeping data safe and consistent. Clayton M. Christensen might well term these new approaches disruptive: innovation that drives new markets and value networks and, in the process, forces the base market to change or even be replaced. Whatever they are, they are bringing change.

Certainly if you are building a system today you will likely consider more than the core products from the top three database vendors, even if it is just to probe what the whole NoSQL and Big Data movement really is. But if you find the current breadth of choice confusing you are not alone. With the NoSQL and Relational fields taking very different approaches, each having a variety of upstarts that specialise further, there is a huge array of choice and it’s very hard to cut through the marketing spiel to where and why we might find these different technologies useful. We are bombarded by terminology: NoSQL, MapReduce, Big Data, Hadoop, ACID, BASE, Eventual Consistency, Shared Nothing Architectures, OLAP, OLTP, BI, MPP, Column Orientation … the list goes on. It has become downright c0nfuZ1nG.

This is the first in a set of articles that will drill a little further into the history of Big Data, where it has come from and what it is for (for a comprehensive treatment of the various technologies look elsewhere – like the NoSQL resource here). The focus here will be on what ‘big’ really means and how data size affects your ability to process different types of data… a little on the field’s history… some disruption… a peek past the marketing blurb and ‘Big Data Mania’ to examine why you might choose one approach over another.

