Learn All About Database Sharding
Publisher: Psychz Networks, May 24, 2019
In the following article, we will understand what Database Sharding is and the purpose of using sharding in various scenarios.
In simple terms, Database Sharding is a “shared-nothing” partitioning scheme in which a big database is split up and stored across numerous servers. Sharding lets a database scale out while keeping a high level of performance. To picture it, think of breaking a pane of glass: the smaller pieces, called “shards”, are spread across a number of distributed servers.
It covers the need for Database Sharding, several techniques used for database partitioning, and the key considerations before you implement sharding.
- What is Database Sharding?
- Database Sharding Methods
- Points to consider before you choose Database Sharding
So what is Database Sharding anyway?
Database sharding is where you:
- Take a big monolithic database
- Break it up into small pieces, across many servers
- Run them in parallel
A sharded database consists of multiple nodes deployed across many servers. Combined with replication, this layout can keep the database running when a piece of hardware or a network link fails. When you shard a database, it is divided into smaller chunks that are spread across the data nodes in the cluster. Each node holds, and is solely responsible for, its own subset of the data, which creates a shared-nothing environment.
Unlike a shared-disk clustered database, where every cluster node can access all of the data and simultaneous reads and writes can therefore cause contention, the nodes of a shared-nothing clustered database operate only on their own subset of the data. Data nodes are replicated to provide redundancy, which supports high availability and scalability. Even in a worst-case scenario, where a node holding a subset of a table's data and its replica both become unavailable, the other nodes, which hold different subsets of the data, remain online and available.
Methods used for data sharding
There are a number of approaches to sharding, and which one is right depends on several factors. Two concerns need to be addressed before choosing a technique. First, in a shared-nothing design, a partitioning scheme must be devised to apportion the data across the nodes of the database. Second, a challenge arises whenever data must be accessed or modified across multiple partitions. Here we review three sharding approaches and the factors that guide you to each one.
Algorithmic Sharding
Algorithmically sharded databases use a sharding function to locate data. In this method, where a piece of data lives is determined by the sharding function alone. One challenge is that the function doesn’t consider payload size or space utilization, so you need to make sure the partitions end up similarly sized if data is to be distributed uniformly. Another challenge is that queries without a partition key require searching every database node.
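As a minimal sketch of the idea, assuming a fixed cluster of four shards and an MD5-based sharding function (both are illustrative choices, not something the method prescribes), locating data could look like this in Python. The names NUM_SHARDS, shard_for, and shards_to_query are hypothetical.

```python
import hashlib
from typing import List, Optional

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard index using a stable hash."""
    # hashlib returns the same digest on every run and machine, unlike the
    # built-in hash(), which is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_to_query(key: Optional[str], num_shards: int = NUM_SHARDS) -> List[int]:
    """With a partition key one shard is enough; without one, all are searched."""
    if key is None:
        return list(range(num_shards))
    return [shard_for(key, num_shards)]
```

Calling shards_to_query("user:42") returns a single shard index, while shards_to_query(None) returns every shard, which is the full scan described above. Note that the function looks only at the key, never at how much data each shard already holds.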
Algorithmic Sharding can be very tricky when you try to add a new node or server to the database. Each of the existing servers needs a corresponding new hash value and, above all, many of your existing entries need to be remapped to their new hash values before they are migrated to their new server. If this is not done cautiously, neither the new nor the old hashing function will be valid during the transition, which stops any new data from being written to the database.
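To make the remapping cost concrete, here is a small Python sketch that assumes the sharding function is a hash taken modulo the number of shards (one common choice, used here only for illustration). The helper keys_to_migrate and the sample keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Assumed sharding function: stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def keys_to_migrate(keys, old_count: int, new_count: int):
    """Return the keys whose shard changes when the shard count changes."""
    return [
        key for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    ]


# Made-up keys, used only to measure how much data would have to move.
keys = [f"user:{i}" for i in range(10_000)]
moved = keys_to_migrate(keys, old_count=4, new_count=5)
print(f"{len(moved)} of {len(keys)} keys must move to a different shard")
```

With a well-mixed hash, a key keeps its shard only when its hash gives the same remainder for 4 and for 5, which happens for about one key in five. Growing from four to five shards therefore moves most of the data, which is why the migration has to be planned carefully.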
Dynamic Sharding
Dynamic sharding uses an external locator service to determine the location of entries. This helps address the problems that occur in Algorithmic Sharding. The external locator service provides the location of the shard where the data resides. This makes it possible to relocate users individually, as opposed to in large groups, from one shard to another to relieve hot spots.
However, the locator service becomes a single point of contention and failure. Every database operation needs to access it, so its performance and availability are critical. At the same time, locators cannot simply be cached or replicated, because out-of-date locators will route operations to incorrect databases. Misrouted writes are especially bad: they become undiscoverable after the routing issue is resolved.
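The following Python sketch illustrates the idea with an in-memory stand-in for the locator service. The classes LocatorService and ShardedStore, their methods, and the shard names are all hypothetical; a real locator would be a separate networked service.

```python
class LocatorService:
    """Hypothetical in-memory stand-in for an external locator service."""

    def __init__(self, default_shard: str):
        self._default_shard = default_shard
        self._locations = {}  # key -> shard name, for keys that were moved

    def locate(self, key: str) -> str:
        """Return the shard that currently holds the key."""
        return self._locations.get(key, self._default_shard)

    def assign(self, key: str, shard: str) -> None:
        """Record that the key now lives on the given shard."""
        self._locations[key] = shard


class ShardedStore:
    """Routes every read and write through the locator service."""

    def __init__(self, shard_names, locator: LocatorService):
        self.shards = {name: {} for name in shard_names}
        self.locator = locator

    def write(self, key: str, value) -> None:
        self.shards[self.locator.locate(key)][key] = value

    def read(self, key: str):
        return self.shards[self.locator.locate(key)].get(key)

    def relocate(self, key: str, target: str) -> None:
        """Move one key to another shard, for example to relieve a hot spot."""
        source = self.locator.locate(key)
        if source == target:
            return
        if key in self.shards[source]:
            self.shards[target][key] = self.shards[source].pop(key)
        self.locator.assign(key, target)
```

For example, after store = ShardedStore(["shard-a", "shard-b"], LocatorService("shard-a")) and store.write("user:1", "Ada"), calling store.relocate("user:1", "shard-b") moves only that one user, and store.read("user:1") still returns "Ada". Every read and write calls locate first, so the store depends on the locator for each operation.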
Directory-Based Sharding
In this method, you create a lookup table that uses a shard key to keep track of the shards and the data they hold. The lookup table holds a static set of information on where the desired data can be found: for each shard key value, it records the shard where the matching rows are written to or fetched from. This method is effective and can be faster than range-based sharding because each key is tied directly to its own shard.
Compared with other sharding techniques, this one is more flexible. It lets you use whatever system you want, dynamic or algorithmic, to assign entries to shards, and it is relatively easy to work with. There is also a downside to this method. The need to connect to the lookup table before every query or write can have a detrimental impact on an application's performance. The lookup table can also become a single point of failure. If it fails or is corrupted, you may be unable to write new data or access your existing data.
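The lookup table described in this section can be sketched in Python as a plain dictionary. The shard key ("region"), the table contents, and the function shard_for_row are assumptions made for illustration; in practice the table would usually live in its own highly available store.

```python
# Hypothetical lookup table: shard key value -> shard that holds those rows.
LOOKUP_TABLE = {
    "north": "shard-1",
    "south": "shard-2",
    "east": "shard-1",
    "west": "shard-3",
}


def shard_for_row(row: dict, shard_key: str = "region") -> str:
    """Consult the lookup table to find the shard for a row."""
    value = row[shard_key]
    try:
        return LOOKUP_TABLE[value]
    except KeyError:
        # Without a directory entry the row cannot be routed at all.
        raise LookupError(
            f"no shard registered for {shard_key}={value!r}"
        ) from None
```

Here shard_for_row({"id": 7, "region": "east"}) returns "shard-1". The mapping is arbitrary, so two regions can share a shard and a region can be reassigned by editing one entry. The cost is that every query or write goes through the table first, and a value missing from the table cannot be routed.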
Points to consider before you choose data sharding
If your database is growing at a rapid rate, it is highly recommended that you first improve your hardware infrastructure. If your hardware has already hit the ceiling, sharding is the option you may want to consider. Database sharding can improve the performance of your database, but it also increases your operational costs. The cost of maintaining, accessing, and processing the data can rise sharply when you implement sharding.
Sharding can be the best possible solution for your business, but you must understand that it comes with great complexity and has more points of failure than a traditional single-server database. You must analyze all your options and choose the one that suits your requirements. We hope that this article has helped you understand sharding conceptually.