When I first started building TiDB with my co-founders, we encountered countless challenges, pitfalls, and critical design choices that could have made or broken the project. To build an enterprise-grade distributed database like TiDB from scratch, we have to constantly make difficult decisions that balance the speed of development with long-term considerations for our customers and our team.
Three years and two big releases later, TiDB 2.0 is now deployed in-production in more than 200 companies. Along the way, our team answered many questions from our users as they evaluated TiDB and other distributed databases. Choosing what to use in your infrastructure stack is an important decision and not an easy one. So I’ve gathered our learning from over the years, and summarized them into the following questions that every engineer should ask when looking at a distributed database.
1. Why Is the Distributed Database Not a Silver Bullet?
There’s no single technology that can be the elixir to all your problems. The database realm is no different. If your data can fit on a single MySQL instance without too much pressure on your server, or if your performance requirement for complex queries isn’t high, then a distributed database may not be a good choice. Choosing to use a distributed database typically means additional maintenance cost, which may not be worthwhile for small workloads. If you need High Availability (HA) for small workloads, MySQL’s master-slave replication model plus its GTID solution might just be enough to get the job done. If not, you can extend it with Group Replication. With a community as active and large as MySQL’s, you can Google pretty much any issue and find the solution. In short, if a single instance database is enough, stick with MySQL. When we first began building TiDB to be MySQL compatible, our goal was never to replace MySQL, but to solve problems that single instance databases inherently cannot solve.
If your data can fit on a single MySQL instance without too much pressure on your server, then a distributed database may not be a good choice.
So when is the right time to deploy a distributed system like TiDB? Like any answer to a hard question, it depends. I don’t want to give a one-size-fits-all answer, but there are signals to look for when deciding whether a distributed database is the right answer. For example, are you:
- Finding yourself thinking about how to replicate, migrate, or scale your database for extra capacity?
- Looking for ways to optimize your existing storage capacity?
- Getting concerned about slow query performance?
- Researching middleware scaling solutions or implementing manual sharding policy?
If you find yourself asking these types of questions on a regular basis, it’s time to consider whether a solution like TiDB can help you.
MySQL and TiDB are not mutually exclusive choices. In fact, we spent an enormous amount of work helping our users continue to use MySQL by building tools to make the migration from MySQL to TiDB seamless. This preserves the option of simultaneously using MySQL for single instance workloads where it shines.
2. Why Separate SQL from Storage?
There are several reasons why you’d want to separate SQL from the underlying storage.
Easy Maintenance. We separated the TiDB platform’s SQL layer (stateless and also called TiDB) from its storage layer (persistent and called TiKV) in order to make deployment, operations, and maintenance more simple. It is one of the most important design choices we made. This may seem counter-intuitive (don’t more components make deployment more complex?). Well, being a DevOps isn’t just about conducting deployment, but also quickly isolating issues, system debugging, and overall maintenance — a modular design really shines in supporting these responsibilities. One example: if you found a bug in your SQL layer that needs an urgent update, a rolling update can be quite time-consuming, disruptive, even risky, if your entire system is fused together and not layered separately. However, if your SQL layer is stateless and separated, an update is easy and doesn’t disrupt other parts of your system.
Better Resource Usage. A modular design is also better for resource allocation and usage. Storage and SQL processing rely on different kinds of computing resources. Storage is heavily dependent on input-output (I/O) and affected by the kind of hardware you use, e.g. PCIe/NVMe/Optane SSDs. SQL processing relies more on CPU power and RAM size. Thus, putting SQL and storage in separate layers help make the entire system more efficient in using the right kind of resources for the right kind of work.
TiDB is designed to support an HTAP (hybrid transactional and analytical processing) architecture, where both OLTP and OLAP workloads can be handled with great performance together. Each type of workload is optimized differently and needs different kinds of physical resources to yield good performance. OLAP requests are typically long, complex queries that need a lot of RAM to run quickly. OLTP requests are short and fast, so must optimize for latency and throughput in terms of OPS (operations per second). Because TiDB’s SQL layer is stateless and separated from storage, it can intelligently determine which kind of workload should use which kind of physical resources by applying its SQL parser and Cost-Based Optimizer to analyze the query before execution.
Development Flexibility and Efficiency. Using a separate key-value abstraction to build a low-level storage layer effectively increases the entire system’s flexibility. It can easily provide horizontal scalability — auto-sharding along the keys is more straightforward to implement than a table with complex schema and structure. A separate storage layer also opens up new possibilities to take advantage of different computing modes in a distributed system.
TiSpark, our OLAP engine that leverages Spark, is one such example, where it takes advantage of our layers design and sits on top of TiKV to read data directly from it without any dependency or interference from TiDB. From a development angle, separation allows multiple programming languages to be used. We chose Go, a highly efficient language to build TiDB and boost our productivity, and Rust, a highly performant systems language to develop TiKV, where speed and performance is critical. If the entire system isn’t modularized, a multiprogramming language approach would be impossible. Our layered architecture allows our SQL team to work in parallel with our storage team, and we use RPC (more specifically gRPC in TiDB) for communications between the components.
3. Why Isn’t Latency the Premier Measuring Stick?
Many people have asked me: can TiDB replace Redis? Unfortunately no, because TiDB is simply not a caching service. TiDB is a distributed database solution that first and foremost supports distributed transactions with strong consistency, high availability, and horizontal scalability, where your data is persisted and replicated across multiple machines (or multiple data centers). Simply put, TiDB’s goal is to be every system’s “source of truth.” But to make all these features happen in a distributed system, some tradeoffs with latency is unavoidable (and any argument to the contrary is defying physics). Thus, if your production scenario requires very low latency, say less than 1ms, you should use a caching solution like Redis! In fact, many of our customers use Redis on top of TiDB, so they can have low latency with a reliable “source of truth.” That way, when the caching layer goes down, there’s a database that’s always on with your data — consistent, available, and with unlimited capacity.
A more meaningful measuring stick of a distributed database is throughput, in the context of latency. If a system’s throughput increases linearly with the number of machines added to the system (more machines, more throughput), while latency is holding steady, that is the true sign of a robust distributed database. Some of our production users are already processing up to 1 million queries per second on a TiDB cluster; this level of throughput is nearly impossible to achieve on a single machine.
We have a saying inside PingCAP, “make it right before making it fast.” Correctness, consistency, and reliability always trump speed. Without them, what good does low latency do?
4. Is Range-Based Sharding Better than Hash-based?
We implemented range-based sharding inside TiKV instead of hash-based, because our goal from the beginning was to support a full-featured relational database, and a relational database must support various types of Scan operations, like Table Scan, Index Scan, etc. Even though range-based sharding is harder to build, hash-based sharding can’t maintain a given table’s data in sequence, thus even a small scan on a few rows in a hash-based setting could mean jumping around multiple shards between multiple nodes. Range-based sharding won’t have this issue.
Of course, that doesn’t mean range-based sharding doesn’t have its own tradeoffs. For example, if your query pattern is mostly sequential writes where write operations far exceed reads with no Update or Delete, like log, then hotspots could form. For this scenario, we plan to use Partition Table as a solution to alleviate the system’s pressure caused by this write pattern (this is on our roadmap for 2018). Another situation that is much harder to resolve is if your request pattern contains frequent reads/writes on a small table that happens to be in the same Raft region. Right now, each TiKV Raft region is split along ordered key-value pairs and defaulted to 96MB. If such a small table with frequent reads/writes is less than 96MB, a hotspot would no doubt form, and the only solution (so far) is to manually split the region into two and redistribute them on different nodes. This “fix” won’t solve the hotspot issue, but only somewhat alleviate it by making the hotspot smaller. This is a problem that a distributed database isn’t well-suited to solve, so if you have this type of request pattern, we suggest you use an in-memory caching layer, like Redis, or put this small table directly in your application layer.
5. Why Is Scalability So Important for Business?
When a company starts out small, any database and infrastructure would work, but when that company starts to experience explosive (often unexpected) growth, your choice of infrastructure technology matters a great deal. Making the right choice that can easily scale in either direction depending on your business needs could mean the life and death of your company.
We all remember Twitter’s infamous “fail whale” when that service was constantly going down due to not just user growth, but also poor infrastructure. When your database becomes the bottleneck, the solution is not to manually shard your tables, sacrifice relational capabilities, make a bunch of copies of the same table, and rinse and repeat this vicious cycle. The correct (and most cost-effective) way is to use a distributed database that scales elastically, so all you have to do is add new machines to increase capacity. Adding more machines may seem like a growing cost, but it’s far cheaper comparing to what you save in human resources and the precious engineering time you need to respond and adapt in a competitive environment.
When we first started designing and building TiDB, we carried a lot of these lessons, from both our own experiences and other fellow DBAs and infrastructure engineers, with us. A truly useful and robust distributed database should instantly solve scalability bottlenecks and render all the “quick fixes” like manual sharding obsolete, so application developers can focus on doing what they do best — serve customers and grow the business, not managing database shards.
Recently, we’ve seen two of our users, Mobike (dockless bikesharing) and Zhuan Zhuan (online marketplace), do exactly that. Both companies have been experiencing explosive growth, and because they deployed TiDB right as this growth was taking off, their database infrastructure was not the bottleneck that prevented them from growing the way they did. In fact, Zhuan Zhuan is all-in on TiDB, because it knows that a well-built distributed database is mission critical to its future.
Feature image via Pixabay.