Shared Nothing vs. Shared Disk Architectures: An Independent View
The Shared Nothing Architecture is a relatively old pattern that has had a resurgence of late in data storage technologies, particularly in the NoSQL, Data Warehousing and Big Data spaces. There are fairly dramatic performance trade-offs between the two architectural styles. This article contrasts Shared Nothing with Shared Disk Architectures; the comparison is largely equivalent to the trade-offs between sharding and replication.
Shared Disk and Shared Nothing?
Shared Nothing is a data architecture for distributed data storage in a clustered environment. Data is partitioned in some manner and spread across a set of machines with each machine having sole access, and hence sole responsibility, for the data it holds.
By comparison Shared Disk is exactly what it says: disk accessible from all cluster nodes. These two architectural styles are described in the two figures. In the Shared Nothing Architecture notice how there is complete segregation of data, i.e. each piece is owned solely by a single node. In Shared Disk any node can access any piece of data, and no single piece of data has a dedicated owner.
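The ownership property of Shared Nothing can be sketched as a simple routing function: each key hashes to exactly one owning node. This is a minimal illustration only; the node names and the modulo scheme are assumptions, not any product's implementation.

```python
# Minimal sketch of Shared Nothing routing: each record key is hashed to
# exactly one node, which has sole ownership of (and responsibility for)
# that record. Node names and the modulo scheme are illustrative.
import hashlib

NODES = ["node1", "node2", "node3"]

def owner_of(key: str) -> str:
    """Return the single node responsible for this key."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Every request for the same key is routed to the same, single owner.
assert owner_of("user:42") == owner_of("user:42")
```

In a Shared Disk system no such function exists: any node may serve any key, which is precisely what gives rise to the coherence problems discussed next.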
Understanding the Trade-offs for Writing
When persisting data in a Shared Disk architecture, writes can be performed against any node. If nodes 1 and 2 both attempt to write a tuple then, to ensure consistency with the other nodes, the management system must either use a disk-based lock table or else communicate its intention to lock the tuple to the other nodes in the cluster. Both methods present scalability issues. Adding more nodes either increases contention on the lock table or increases the number of nodes over which lock agreement must be reached.
To explain this a little further consider the case described by the below diagram. The clustered Shared Disk database contains a record with PK = 1 and data = foo. For efficiency both nodes have cached local copies of record 1 in memory. A client then tries to update record 1 so that ‘foo’ becomes ‘bar’. To do this in a consistent manner the DBMS must take a distributed lock on all nodes that may have cached record 1. Such distributed locks become slower and slower as you increase the number of machines in the cluster and as a result can impede scalability.
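The cache-coherence cost described above can be shown with a toy model: both nodes cache record 1, so an update on one node must invalidate every other cached copy before it can commit, and that coordination step grows with cluster size. All class and method names here are illustrative assumptions, not any vendor's API.

```python
# Toy model of the Shared Disk cache-coherence problem: an update must
# invalidate the stale cached copies on every other node before committing.
class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}

class SharedDiskCluster:
    def __init__(self, nodes):
        self.disk = {1: "foo"}   # the shared disk, holding record PK=1
        self.nodes = nodes

    def read(self, node, key):
        node.cache[key] = self.disk[key]  # populate the local cache
        return node.cache[key]

    def write(self, writer, key, value):
        # Distributed lock step: every other node's cached copy must be
        # invalidated before the write commits. The number of messages
        # grows with the number of nodes -- the scalability limit above.
        for other in self.nodes:
            if other is not writer:
                other.cache.pop(key, None)
        self.disk[key] = value
        writer.cache[key] = value

n1, n2 = Node("node1"), Node("node2")
cluster = SharedDiskCluster([n1, n2])
cluster.read(n1, 1)
cluster.read(n2, 1)            # both nodes now cache 'foo'
cluster.write(n1, 1, "bar")    # n2's stale copy is invalidated
assert 1 not in n2.cache and cluster.read(n2, 1) == "bar"
```

The `write` loop is the crux: in a real cluster each iteration is a network round trip, so the cost of a single consistent write scales with the number of nodes that may hold a cached copy.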
The other mechanism, locking explicitly on disk, is rarely done in practice in modern systems as caching is so fundamental to performance.
The Shared Nothing Architecture, however, does not suffer from this distributed locking problem. Assuming that the client is directed to the correct node (that is to say a client writing 'A', in the figure above, directs that write at Node 1), the write can flow straight through to disk, with any lock mediation performed in memory. This is because only one machine has ownership of any single piece of data, hence by definition there only ever needs to be one lock.
Thus Shared Nothing Architectures can scale linearly from a write perspective without increasing the overhead of locking data items, because each node has sole responsibility for the data it owns.
However Shared Nothing will still have to execute a distributed lock for transactional writes that span data on multiple nodes (i.e. a distributed two-phase commit). These are not as large an impedance on scalability as the caching problem above, as they span only the nodes involved in the transaction (as opposed to the caching case, which spans all nodes), but they add a scalability limit nonetheless (and they are also likely to be quite slow when compared to the shared disk case).
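The two-phase commit mentioned above can be sketched as follows. Note that the coordinator touches only the shards participating in the transaction, not the whole cluster; the `Shard` class and method names are illustrative assumptions, and real 2PC additionally logs state for crash recovery.

```python
# Minimal two-phase-commit sketch for cross-shard writes. Deliberately
# simplified: no durability logging, no timeout or recovery handling.
class Shard:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.committed = []

    def prepare(self, txn):
        return self.can_commit        # phase 1: vote yes or no

    def commit(self, txn):
        self.committed.append(txn)    # phase 2: make the write durable

    def abort(self, txn):
        pass                          # phase 2: discard the write

def two_phase_commit(shards, txn):
    votes = [shard.prepare(txn) for shard in shards]   # phase 1: prepare
    if all(votes):
        for shard in shards:
            shard.commit(txn)                          # phase 2: commit all
        return True
    for shard in shards:
        shard.abort(txn)                               # phase 2: abort all
    return False

# Coordination cost spans only the participating shards, not every node.
assert two_phase_commit([Shard(), Shard()], "transfer-1")
```

A single dissenting vote aborts the transaction everywhere, which is why even this narrower form of coordination still caps write scalability for cross-shard workloads.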
So Shared Nothing is great for systems needing high-throughput writes, if you can shard your data and steer clear of transactions that span different shards. The trick is to find the right partitioning strategy; for instance you might partition data for an online banking system such that all aspects of a user's account are on the same machine. If the data set can be partitioned in such a way that distributed transactions are avoided, then linear scalability is at your fingertips.
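One way to realise the banking example above is to shard by user id, so that accounts, balances and transaction history for one user always colocate on the same node and an intra-user transfer never needs a cross-shard transaction. The three-shard layout and record shapes are illustrative assumptions.

```python
# Partitioning strategy sketch: shard every record type by user id, so all
# of a user's data lives on one node and stays in single-node transactions.
N_SHARDS = 3

def shard_for(user_id: int) -> int:
    """All record types for a user map to the same shard."""
    return user_id % N_SHARDS

# All records for user 42 -- regardless of type -- share one shard, so a
# transfer between that user's own accounts is a single-node write.
records = [("account", 42), ("balance", 42), ("txn_log", 42)]
assert len({shard_for(user_id) for _, user_id in records}) == 1
```

The design choice here is that the shard key follows the transaction boundary: if transactions are per-user, shard per-user. A transfer between two different users would still cross shards and fall back to the two-phase commit case.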
In summary, Shared Disk Architectures are write-limited, as locks must be coordinated across the cluster. Writes to Shared Nothing architectures are limited only where they require a distributed two-phase commit.
The counter, from the Shared Disk camp, is that they can use partitioning too. Just because the disk is shared does not mean that data can't be partitioned logically, with different nodes servicing different partitions. There is some truth to this, assuming you can set up your architecture so that write requests are routed to the correct machine, as this tactic will reduce the amount of lock (or block) shipping taking place (and is exactly how you optimise Oracle RAC). However the cache coherence issue is fundamental and hence still exists: the cache on each node can contain data from any part of the shared disk, so committing a transaction means ensuring that all cached copies of potentially affected data have been flushed. This is often the limit to scalability.
Considering the Retrieval of Data
The retrieval of data in these architectures suffers from different constraints to writes. Looking first at Shared Disk, we find it has two significant drawbacks when compared with Shared Nothing:
The first is the potential for resource starvation, most notably disk contention on the SAN/NAS drives. This can be combated, to a certain extent, through the use of partitioning, but then the Shared Disk architecture starts to suffer from one of the downsides of Shared Nothing: the data must be physically partitioned in advance.
The second issue is that caching is less efficient due to the scope of each cache. Unlike the Shared Nothing architecture, where queries for a certain data set will be consistently directed to the node that owns the data, Shared Disk architectures spread query load across all nodes. This increases the data churn in the cache on each node and hence makes caching less effective (put another way, in Shared Disk each cache must serve the whole data set, while in Shared Nothing each cache must serve only 1/n of the data, where n is the number of nodes).
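The 1/n effect is easy to see with a back-of-the-envelope calculation. The data and cache sizes below are illustrative assumptions, not measurements.

```python
# Cache coverage comparison: with a fixed per-node cache, a Shared Disk
# cache must cover the whole data set, while a Shared Nothing cache only
# covers its own 1/n slice. Figures are illustrative assumptions.
data_gb, cache_gb, n_nodes = 1000, 50, 10

shared_disk_hit = cache_gb / data_gb                 # cache vs. all data
shared_nothing_hit = cache_gb / (data_gb / n_nodes)  # cache vs. its slice

print(f"shared disk cache coverage:    {shared_disk_hit:.0%}")     # 5%
print(f"shared nothing cache coverage: {shared_nothing_hit:.0%}")  # 50%
```

With identical hardware, each Shared Nothing node caches ten times more of the data it is actually asked about, which is why read-heavy workloads often see markedly better cache hit rates under partitioned ownership.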
Shared Nothing has just one major flaw, but it can be a serious one: Shared Nothing works brilliantly if the query is self-sufficient, i.e. it can complete entirely on one node or through an efficient parallel processing pattern like MapReduce. However there will inevitably come a time when data from multiple nodes must be operated on together. Such cases require that data which will not be included in the final result be shipped from one node to another, and this degrades query performance.
The reality is that the number of queries requiring data shipping will depend on the use case and the partitioning strategy, and in many cases it can be minimised or eliminated (commercial search engines are one example). However, for general business cases, for example ones with related fact tables, some data shipping is inevitable. In some cases this can be quite crippling, for example the 'Complex Case' discussed here. This is why all self-respecting Shared Nothing solutions today require at least a 10GbE network, and today's fast networks serve them well. Five years ago it was a far worse (roughly ten times worse) problem.
There is one additional issue with reads in a Shared Nothing architecture. Locating unindexed data (i.e. where the partitioning key cannot be leveraged) requires sending the query to all machines (partitions). This presents a limit to the scalability of the architecture. For example, adding more users will increase the number of such queries and this, in turn, will increase the number of queries that each machine must service, independent of how many machines you have. This becomes prohibitive as the database scales, as each such query contributes a base latency on every node.
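This scatter-gather pattern can be sketched as follows: a query on a column other than the partition key cannot be routed to one node, so it must be broadcast to every partition and the partial results merged. The data and the three-partition layout are illustrative assumptions.

```python
# Scatter-gather sketch: records are partitioned by id, so a query on the
# unindexed 'city' column must visit every partition.
partitions = [
    [{"id": 1, "city": "Oslo"}, {"id": 4, "city": "Lima"}],
    [{"id": 2, "city": "Lima"}, {"id": 5, "city": "Kiev"}],
    [{"id": 3, "city": "Lima"}],
]

def find_by_city(city):
    # Every partition is queried regardless of where the matches live,
    # so each such query adds base load to every node in the cluster.
    results = []
    for node in partitions:
        results.extend(r for r in node if r["city"] == city)
    return results

assert [r["id"] for r in find_by_city("Lima")] == [4, 2, 3]
```

Note that even a query with zero matches still costs a visit to every partition; that fixed per-node overhead is the scalability limit the paragraph above describes.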
The retort is that the problem can be managed by reducing the amount of data stored per machine, so that each query is faster and hence the average load per query is reduced. This is achieved by adding more machines to the cluster while keeping the total disk usage constant. The problem is that there is always a base latency: an overhead incurred on every machine. The reality is that such queries present problems for both architectures, so to really scale you need a combination. For example, MongoDB uses shared nothing plus data replication to achieve the benefits of both architectures, replication being similar in its trade-offs to shared disk.
So Which Should You Use?
If you are Google or Amazon then you will inevitably choose scalability over consistency and use a Shared Nothing architecture. If you are a business system that is never going to need more than two or three servers, then the complexities of partitioning a complex domain model will be prohibitive and you should favour the Shared Disk route.
These are the two ends of the spectrum, and most of us are likely to lie somewhere in between. The other alternative is to blend the two: Shared Disk is conceptually very similar to replication, and balancing these patterns for read and write workloads can be powerful, for example through the use of replica sets in MongoDB. If that's you, you may find it useful to consider the following three points:
- Do you require complex transactions containing multiple entities or can you manage with a simpler, less transactional model?
- Can you naturally partition your data such that queries can be node sufficient? How predictable are your queries (i.e. will your partitioning strategy have to change regularly)?
- Do you require extreme scalability (10+ machines)?
To look into this issue a little further there are three papers that are particularly good. The first is the seminal work of Michael Stonebraker back in the early 80s. Michael was one of the original Shared Nothing evangelists. His paper The Case for Shared Nothing still makes good reading, even if it does skip some issues.
The next two are both excellent, but each is biased in its own way. The first is Shared-Disk vs. Shared Nothing by the makers of ScaleDB, a Shared Disk database. It eloquently makes the case for Shared Disk and enumerates the downsides of Shared Nothing. However the treatment is biased towards the vendor's chosen technology.
The last paper presents the opposite view. How to Build a High Performance Data Warehouse is well written, eloquently mapping the pros and cons of each architecture. However don't be sucked in by the academic URL. The authors are all affiliated with Vertica and the paper noticeably favours a Shared Nothing Columnar Architecture model, like the one used by Vertica. Nevertheless it's a good read.
See also Are Databases a Thing of the Past?