Xianlin Chen
Xianlin Chen is the Head of Technology Hub at PalFish and also the person who initiated and built the Technology Hub at PalFish from scratch. He is an expert in the architecture of distributed systems, with rich experience in service governance, stability construction and high concurrency architecture. He has a great passion for cloud-native distributed databases and advocates the simple and elegant design of distributed databases.

PalFish is a fast-growing online education platform that focuses on English learning. It offers tailored English speaking experience to English as a Second Language (ESL) students. As of 2020, PalFish has over 35 million users, of which more than 1 million are paid users.

As our business rapidly grew, the surge of data posed a severe challenge to solve these problems, we migrated to an open-source, MySQL-compatible, distributed SQL database that supports Hybrid Transactional/Analytical Processing (HTAP) workloads. This turned out to be the right move.

In this post, we’ll share with you why we chose TiDB over MySQL. We hope our experience can help you find the most appropriate database for your application.

Why We Chose TiDB

TiDB is an open source, distributed SQL database. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

From our past impressions, distributed databases are good at coping with massive data that traditional standalone databases cannot handle. But it’s troublesome to guarantee consistency in distributed storage, not to mention ACID transactions.

Therefore, in our initial investigation, what surprised us most was that as a distributed database, TiDB supports ACID transactions. It uses the Raft consensus algorithm to achieve data consistency across multiple replicas, the two-phase commit (2PC) protocol to ensure the atomicity of transactions, and optimistic concurrency control, combined with MVCC, to implement a repeatable read level of isolation.

As we have mentioned, TiDB is an open source project. The primary maintainer behind it is the PingCAP team. Before we put TiDB into full production, we met with engineers at PingCAP. After learning about the wide usage of TiDB in various companies, we felt more than optimistic about TiDB’s prospects.

Read More:   Update What’s New in Prometheus 2.0

TiDB vs. Other NewSQL Databases

Before choosing TiDB, we compared it with other mainstream NewSQL databases. In this section, I’ll take one of those databases, CockroachDB, and compare it with TiDB. Among all the things we look for in a database, we value two the most: whether the database can meet the demands of the current application, and whether it can continue to innovate in the future.

For our current use scenarios, both TiDB and CockroachDB meet our requirements in respect of features, storage capacity, and read/write performance.

As for future development, we measured the potential of databases by two standards: the open source community and the system architecture.

  • The development of an open source project over the long haul is closely associated with its influence and popularity in the open source community. The more influential and popular the project gets, the more contributors it attracts, and thus the better its development in the future. TiDB and CockroachDB both appeal to their contributors, but on GitHub, TiDB has a larger number of watches, stars, and forks than CockroachDB. Besides, TiKV, the storage layer of TiDB, is an incubating project of the Cloud Native Computing Foundation (CNCF). In sum, we think TiDB has a more active community than CockroachDB.
  • A project’s system architecture has long-term influence on its evolution. With an advanced architecture, a database can stand the test of time and go further in the future.
  • Inspired by Google Cloud Spanner, TiDB separates computing from storage. This separated architecture allows the computing layer and storage layer to choose a different programming language that is more suitable: Go for computing and Rust for storage. By virtue of this architecture, TiDB achieves better performance and separate iteration of computing and storage. This provides many possibilities for integration with other ecosystems in the future.
  • CockroachDB has a decentralized architecture. In a cloud native era, when the number of nodes and data volumes grow to an unprecedented extent, elastic scaling becomes a must, and the intelligent scheduling capability is the key ingredient that determines the performance and stability of a database. However, the decentralized architecture makes scheduling difficult, especially when the scheduler needs a global view and multinode coordination.

Based on these considerations, we chose TiDB among all the NewSQL databases.

MySQL vs. TiDB

Some may ask, if we want ACID transactions so badly, why go through all that trouble to try out an unfamiliar database, when MySQL, a long-established database with transaction guarantees, is just a click away?

The motivation behind this is complicated. For mature businesses, choosing a database is never a simple decision. It’s not just about the technology today, but where it’s going. It’s a reflection of how you view technology and how you position your company in this fast-changing world.

Attitudes Toward New Technology: NewSQL Is the Future

When it first comes to the comparison between MySQL and TiDB, the latter doesn’t seem a strong competitor. MySQL is proven to be a stable database. For all our requirements, it has available solutions, such as high availability, scalability, although it is not quite elegant. (You have to do data sharding manually.)

Read More:   Apache Kafka Reaches 1.1 Trillion Messages Per Day – InApps 2022

However, at PalFish, the real problem is that a standalone database can’t handle the huge amount of data, and in this regard, a distributed database is a better fit. With TiDB, which is a horizontally scalable database, sharding is permanently taken out of mind. It provides a solution, not a compromise. Indeed, TiDB is less stable and mature than MySQL, but we believe given enough time, it will outperform traditional relational databases.

The Cost of the Machine or the Cost of the Talent?

Cost and efficiency were also important factors in our decision-making. At first glance, the hardware resources needed by TiDB are daunting. But we know that machines take up only a portion of the total cost—not all of it.

TiDB needs more and better machines than MySQL does, but it also saves more time for developers and DBAs. Remember the rule of economy in the Unix philosophy? “Programmer time is expensive; conserve it in preference to machine time.” At PalFish, we always expect the machine and software to do more, and our people to do less. That’s exactly what TiDB has to offer.

MySQL does provide high availability, but our DBA and infrastructure team needs to spend time on the implementation. MySQL can handle big tables, but the data sharding must be done by DBAs and engineers. It takes time, time that we could invest more wisely. With MySQL, the consumption of talent is also the cost: it’s just not so apparent and tangible as the extra machines TiDB requires. As machines become cheaper and talent more expensive, we must be aware of such invisible expenses.

The Latecomer Advantage

Another incentive is our role as a startup. In areas of mature technologies, we can hardly compete with tech giants. They have spent years accumulating their strengths, while we just built our business from scratch. Often, we have no choice but to be a follower of old tech; but when new technology is arriving, what will become of us if we don’t jump at the opportunity?

As latecomers, we must turn our weakness into our edge. That is, when established companies are still trudging along the old way, we’ll predict the trends and choose the technology that faces the future. This is how we avoid technical debt and overtake older tech companies like a fast car passing a slower one on a corner.

Tech Ecosystem

Choosing a technology is also choosing the ecosystem that comes with it. MySQL has a well-rounded ecosystem, which enables us to do more with less. Thanks to the MySQL-compatible strategy, as a TiDB user, we can share the MySQL ecosystem, as well as enjoying the benefits of NewSQL.

Read More:   What Happens to Your App When an npm Module Retires? – InApps 2022

All things considered, PalFish decided to go all-in with TiDB.

How We’re Using TiDB

Here’s our current status with TiDB:

  • We have 10 TiDB clusters, over 110 instances, and 6 core clusters, each of which processes more than 10,000 transactions per second (TPS).
  • Our 99.9th percentile latency remains as low as 16 ms.
  • Response time and stability meet our expectations.

Looking back, we can say with confidence that moving to TiDB was the right decision. We have overtaken established companies in database technology and enjoyed the benefits of NewSQL databases. Since the migration, the R&D and DBA teams are working with much greater efficiency. What’s more, TiDB’s new releases never fail to impress us.

Lesson Learned

While we reap the benefits of TiDB, we also have to bear the cost of the unknown. Here, I’d like to show you the problems we encountered in our adjustment period.

Optimizer Index Selection

There were several times when the optimizer couldn’t select a correct index, and this resulted in failure.

For example, in a table with 300,000 rows of data and over 10 concurrent requests, when we added a new index, the previous query selected the wrong index. The CPU of TiKV instances soared to full capacity, and the system failed.

Again, in another big table with 1.4 billion rows of data, somehow certain query conditions didn’t use the index and instead performed full table scans, which affected the applications.

PingCAP engineers investigated the issue and found it was caused by a bug in the optimizer index selection. Luckily, they took measures to fix it. From TiDB 1.x to 3.x, the optimizer has grown more and more efficient. With performance monitoring and slow log monitoring, our DBA team can now quickly spot the problem. We also force indexes on big tables to avoid potential risks.

Big Data Replication

To do data analysis, we gathered data from many upstream TiDB clusters into a single TiDB cluster to be used for big data analytics. However, the data replication was slow, and the data and data encoding were inconsistent, which caused replication failure.

As we deepened our knowledge of TiDB, and with the support of the PingCAP team, we got better at managing TiDB and successfully resolved the data replication problem.

Conclusion

As a NewSQL database, TiDB is compatible with the MySQL protocol and supports horizontal scalability, high availability, ACID transactions, and real-time analytics for large amounts of data.

In this day and age, when Moore’s law is starting to fail, and high availability and cost optimization have become high-level concerns, a distributed architecture inevitably gains traction. For stateless services, we have traffic routing strategies plus multi replica deployments (like microservices); for caching, we have schemes like Redis Cluster and Codis; with Kubernetes, the operating system also completed its own course of evolution.

And what about databases? If you ask me, the NewSQL database is surely the next big thing.

Feature image via Pixabay.