Cloud computing is now growing at a very fast pace for over 4 years now. Strictly related to the increased adoption of cloud technologies there is also an increasing interest in software as a service, as companies of all sizes around the world realized the benefits of paying a subscription for the software they use. In these perspective, software development companies and independent developers around the world build now multi tenant applications. However, multi tenant applications are a tricky from the planning phase, since customer data needs to be strictly isolated, the application itself must be highly available and easily scalable. And, as I also briefly mentioned in my previous article, everything starts from the database. That;s why I would like to briefly point out the main SQL design patterns for multi tenant applications.
In practical examples I will refer mostly to Azure SQL databases and the .NET ecosystem, although the main design patterns are still valid for any relational databases you might want to use. A lot of these aspects are described in different Microsoft Azure documentation articles. My goal is to summarize information that is otherwise dissipated in different sources.
Based on my experience there are two different database design patterns that, whereas a hybrid approach is also possible. These main design patterns are: one database per tenant and a sharded database approach where information generated by several tenants is stored in the same shard of a database.
1. Database per tenant
As a concept, this is not very complicated. You need to create one database for each tenant. In Azure, this is something that can be highly automated using Azure Automation. Here’s also an overview of this pattern:
So why do we need the catalog database? Well, as a developer imagine that if Customer A wants to access some information, you would need a database specific connection string. If customer B wants to access some information, you need a different connection string. If you have 100 customer, than it will be really a huge pain to programmatically play wit connection strings or environment variables and so on. That’s where the catalog database comes in. You can use this database to hold information and mappings to each database. And developers can use the Elastic Database Client Library to manage all this information. A major point here is that we will need only one connection string, the one for the catalog database.
A major advantage for this pattern is that we can have a total isolation between the data generated by the different tenant, which reduces the risk of disclosing information about tenant A to tenant B and so on. A drawback for this model is the higher cost and most probably a lower profit margin. However, here we can use Azure Elastic Pools. The main idea of elastic pool is that you pay for a pool of resources that is shared among all databases that belong to the elastic pool. This makes the application also scalable to usage peaks, because you have to scale up only the elastic pool, making more resources available to the databases that are part of the pool. With elastic pools, the entire pattern will look similar to this:
2. Sharded database approach
In a certain way, this pattern is similar to the previous one, only that we use different database shards, that contain one or more tenants. A catalog database is still needed for the exact same reason we discussed for the one database per tenant approach. This entire model can be resumed by the following picture:
However, sharding adds complexity to both the design and operational management. A catalog is required in which to maintain the mapping between tenants and databases. In addition, management procedures are required to manage the shards and the tenant population. For example, procedures must be designed to add and remove shards, and to move tenant data between shards. One way to scale is to by adding a new shard and populating it with new tenants. At other times you might split a densely populated shard into two less-densely populated shards. After several tenants have been moved or discontinued, you might merge sparsely populated shards together.
Another drawback is the lack of isolation. It’s true that a logical isolation can still be achieved, however the risks of misplacing information are clearly higher. On the other side, this model has some positive economical advantages which translate into lower costs and higher profit margins.
Some things to know
The easiest way to implement both design patterns programmatically is using Entity Framework and the Elastic Database Client Library (EDCL). I’ve spent some real time till I managed to find out how to implement these patterns since EDCl was the missing puzzle piece. However, beware that the EDCL is not working with .NET Core and Entity Framework Core for now. But there is also some good news: Microsoft and the entire community are already on it. Even if there’s still no official NPM Package, there is a branch where EF Core support is already implemented. You can clone it, deploy it as a local NPM Package and play around with it.
From a devops perspective there is still a lot of knowledge needed to automate atabase deployment, scaling and so on. So if you seriously think about building a multi tenant application, make sure that you have the necessary knowledge on the team.
Overview of the discussed patterns and links to code samples
A deeper description of all these patterns.
Your feedback is highly appreciated so if you want to drop your two cents on this topic, you’re more than welcome!