Despite some companies recently gaining a lot of attention for abandoning them, microservices remain a prevalent, and potentially powerful, architectural pattern.
Given the enormous complexity of applications, though, “loosely coupled services” can still have strict dependencies on one another. Because these relationships rely on data ferried between services, it’s important to have a clear understanding of each service’s database and its characteristics. For example, if one service requires anomaly-free inputs, but another service feeds it data from an eventually consistent database, those inconsistencies can cascade into thorny issues.
While there are no panaceas to the problem of sharing data between services, understanding the characteristics of each database will help you make these decisions much more confidently.
Evaluating Databases in a Microservice Context
Microservices have different requirements and pose different challenges than monolithic applications, so it’s important to understand how a database works with that context in mind. Here are the factors you should take into consideration:
Deployability: How simple is it to deploy and scale the database using DevOps tools like Docker and Kubernetes?
Response Time: How easily can you control the database’s latency, especially for geographically distributed users?
Access Patterns: Which kinds of queries does the database support?
Consistency: Does the database provide a consistent view of its data, or does it allow for anomalies?
Availability: How does the database handle machine failures?
Traditional SQL Databases
Into the early 2000s, traditional relational databases dominated the field. They have since been supplanted in many large-scale applications by more scalable technologies.
Examples: MySQL, PostgreSQL
Deployability: These databases are simple to deploy, but because they run on a single machine, you can only scale them vertically, by moving to newer, bigger machines.
Response Time: Their response times cannot be easily improved because they must run on a single machine, which is slow for users who are geographically distant. To improve throughput, you must upgrade to entirely new hardware.
Access Patterns: Using a relational data model gives you good performance for most access patterns, including complex joins across tables.
Consistency: Traditional RDBMSs only run on one machine, so they easily offer strong consistency.
Availability: These databases represent a single point of failure in your application; if the database goes down, so do all services that depend on it.
Traditional SQL databases are sufficient for small, non-critical services that need to prioritize consistency over availability.
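The strong consistency guarantee described above comes from transactions: a group of writes either applies in full or not at all, so no reader ever sees a partial state. Here’s a minimal sketch using Python’s built-in sqlite3 as a stand-in for a single-machine RDBMS (the table and column names are hypothetical):

```python
import sqlite3

# An in-memory database stands in for a single-machine RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# Transfer funds atomically: either both updates apply, or neither does.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
except sqlite3.Error:
    pass  # on failure, no partially applied state is ever visible

# Every reader sees the same, anomaly-free view of the data.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 80}
```

Because everything lives on one machine, this guarantee is cheap; distributing the same guarantee across machines is exactly the hard problem the later categories wrestle with.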
NoSQL Databases
NoSQL databases are built specifically for scalable applications, which makes them well suited for many microservices. However, given their limited access patterns and tolerance of database anomalies, they aren’t always the best choice.
Examples: MongoDB, Cassandra
Deployability: Standing up and scaling NoSQL clusters is easy and can be done in many different ways — for example, with either one cluster per service, or one large cluster with individual services partitioned by a namespace. They also work well with most infrastructure tools.
Access Patterns: NoSQL databases work best when they only need to access one document at a time — for example, a social media post, which might contain both its content and its comments. For complex joins across documents, they’re not a great choice.
Response Time: Because you can easily scale NoSQL, you can make sure it’s placed close to users and has sufficient hardware to handle all of your service’s requests. However, if a node containing a document is down or far away, response times can still be slow.
Availability: NoSQL favors availability above all else, requiring little more than a single node being active. While any individual piece of data might be unavailable, the service as a whole will remain up. This can provide remarkable uptime.
Consistency: NoSQL makes a very conscious tradeoff: forgoing consistency for availability. This means that most NoSQL databases offer “eventual consistency,” but that is somewhat misleading because it makes it sound like values will eventually converge in a consistent state. Unfortunately, with many nodes being able to accept writes, that isn’t necessarily the case.
It’s best to choose NoSQL databases when you need uptime, but the data isn’t mission-critical and your services can tolerate potential anomalies.
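The failure to converge described above is easy to simulate. In this sketch, two replicas each accept a write for the same counter during a network partition; a naive last-write-wins merge then makes the replicas agree, but silently discards one of the updates (the replica structure and merge rule here are illustrative, not any particular database’s implementation):

```python
# Each replica stores key -> (timestamp, value).
replica_a = {"likes": (1, 10)}
replica_b = {"likes": (1, 10)}

# During a partition, both replicas accept writes independently.
# Each client read 10 and incremented it, unaware of the other.
replica_a["likes"] = (2, 11)  # first user's increment lands on replica A
replica_b["likes"] = (3, 11)  # second user's increment lands on replica B

def merge(a, b):
    # Last-write-wins: keep the entry with the highest timestamp per key.
    merged = {}
    for key in a.keys() | b.keys():
        merged[key] = max(a.get(key, (0, None)), b.get(key, (0, None)))
    return merged

# When the partition heals, the replicas converge on a single value...
converged = merge(replica_a, replica_b)
print(converged["likes"])  # (3, 11) -- but two increments should have given 12
```

The replicas do reach agreement, so the system is “eventually consistent” in a narrow sense, yet one user’s write is permanently lost — exactly the kind of anomaly a downstream service must be prepared to tolerate.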
Databases as a Service (DBaaS)
Over the past few years, infrastructure companies have expanded their offerings to include managed databases. The obvious advantage is that someone else runs the infrastructure for you; the tradeoff is that you give up some flexibility in tuning the database to work exactly as you want.
Examples: Amazon DynamoDB, Google Cloud SQL
Deployability: In this scenario, you’re paying someone to handle the infrastructure, which makes things dramatically simpler.
One downside, though, is that you can only run where the vendor offers the service, which might not be every region, and that can limit your flexibility.
Access Patterns: This depends on the underlying database technology. Most services map to either a SQL or NoSQL paradigm, so the above descriptions of the access patterns still apply.
Response Time: This again depends on the underlying technology, but it also requires you to be mindful of the service’s geography. For example, if the service uses active-passive replication (i.e., all writes must go to a single primary node), writes from users who are far from that node will be slow.
Availability: DBaaS platforms have historically provided great uptime, as keeping the service available is a top priority for the vendor.
Consistency: This, again, depends on the underlying technology.
DBaaS is ideal when you don’t have the expertise or the time to run a service’s database.
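The active-passive geography point above can be made concrete with a small sketch. Reads can be served by the nearest replica, but writes always pay the round trip to the primary, so write latency tracks a client’s distance from the primary region (the region names and latency figures below are hypothetical):

```python
# Hypothetical round-trip latencies (ms) from client regions to replica regions.
LATENCY_MS = {
    ("eu-west", "us-east"): 80,
    ("eu-west", "eu-west"): 5,
    ("ap-south", "us-east"): 220,
    ("ap-south", "eu-west"): 120,
}

PRIMARY = "us-east"                   # active-passive: all writes go here
REPLICAS = ("us-east", "eu-west")     # reads may hit any replica

def write_latency(client_region):
    # Writes always pay the trip to the primary, wherever the client is.
    return LATENCY_MS[(client_region, PRIMARY)]

def read_latency(client_region):
    # Reads are served by whichever replica is closest to the client.
    return min(LATENCY_MS[(client_region, r)] for r in REPLICAS)

print(write_latency("ap-south"))  # 220: far from the primary, writes are slow
print(read_latency("ap-south"))   # 120: the nearest replica helps reads only
```

This is why choosing a DBaaS region close to your write-heavy users matters even when the vendor replicates reads globally.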
NewSQL Databases
NewSQL databases aim to combine the easy, distributed deployment of NoSQL with the relational model and strong consistency of traditional SQL databases.
Examples: CockroachDB, Google Cloud Spanner
Deployability: NewSQL databases are built to scale, so they’re easy to deploy and easy to add capacity. They also work well with orchestration tools like Kubernetes.
Note that Spanner is a DBaaS platform, so it’s a bit of a hybrid in this category.
Access Patterns: Because they use relational schemas, NewSQL databases offer access patterns similar to traditional RDBMSs — both flexible and powerful for many query types.
Response Times: Because NewSQL databases are built to be widely distributed, there are no intrinsic guarantees that data is localized near the service — a query in one zone might need to fetch data from another. However, some NewSQL databases offer features to help mitigate these issues.
Availability: NewSQL databases are highly available and will continue handling requests as long as a majority of the replicas for any given piece of data are reachable. This isn’t quite as robust as NoSQL, which requires only a bare minimum of nodes to be online, but it is a conscious trade-off in favor of consistency.
Consistency: This is the primary strength of NewSQL databases. They’re both widely and easily deployable (similar to NoSQL), while offering strong consistency (like a traditional RDBMS). This consistency ensures that there are no anomalies in your data, which is valuable in a microservices context.
NewSQL databases are great choices when you have mission-critical services that you need to scale, such as processing customers’ financial transactions.
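The majority rule in the availability description above can be sketched as a simple quorum check. This is a simplification of the consensus protocols (such as Raft or Paxos) that databases like CockroachDB and Spanner build on; the node counts are illustrative:

```python
def has_quorum(live_replicas, total_replicas):
    # A write (or a consistent read) for a piece of data can proceed only
    # if a strict majority of its replicas are reachable. This is what
    # prevents two partitioned halves of a cluster from accepting
    # conflicting writes for the same data.
    return live_replicas > total_replicas // 2

# With 3 replicas per piece of data, losing one node is survivable;
# losing two stalls requests for that data rather than risk anomalies.
print(has_quorum(3, 3))  # True
print(has_quorum(2, 3))  # True: still available after one failure
print(has_quorum(1, 3))  # False: requests wait instead of diverging
```

Contrast this with the NoSQL example earlier: where NoSQL keeps accepting writes on any live node and reconciles later (possibly losing data), a quorum-based system refuses writes it cannot make consistent.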
Microservices are complex environments, with many moving, connected pieces. With a little more context about how each type of database operates, we hope you’ll be able to make the right choice for your own microservices.
InApps Technology is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.