Home
>
Data Science
>
Update Embracing NewSQL: Why PalFish Chose TiDB

March 25, 2022 by Phu Nguyen

Update Embracing NewSQL: Why PalFish Chose TiDB

Main Contents:

Embracing NewSQL: Why PalFish Chose TiDB is an article under the topic Data Science Many of you are most interested in today !! Today, let’s InApps.net learn Embracing NewSQL: Why PalFish Chose TiDB in today’s post !

Why We Chose TiDB

TiDB is an open source, distributed SQL database. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

From our past impressions, distributed databases are good at coping with massive data that traditional standalone databases cannot handle. But it’s troublesome to guarantee consistency in distributed storage, not to mention ACID transactions.

Therefore, in our initial investigation, what surprised us most was that as a distributed database, TiDB supports ACID transactions. It uses the Raft consensus algorithm to achieve data consistency across multiple replicas, the two-phase commit (2PC) protocol to ensure the atomicity of transactions, and optimistic concurrency control, combined with MVCC, to implement a repeatable read level of isolation.

As we have mentioned, TiDB is an open source project. The primary maintainer behind it is the PingCAP team. Before we put TiDB into full production, we met with engineers at PingCAP. After learning about the wide usage of TiDB in various companies, we felt more than optimistic about TiDB’s prospects.

TiDB vs. Other NewSQL Databases

Before choosing TiDB, we compared it with other mainstream NewSQL databases. In this section, I’ll take one of those databases, CockroachDB, and compare it with TiDB. Among all the things we look for in a database, we value two the most: whether the database can meet the demands of the current application, and whether it can continue to innovate in the future.

For our current use scenarios, both TiDB and CockroachDB meet our requirements in respect of features, storage capacity, and read/write performance.

As for future development, we measured the potential of databases by two standards: the open source community and the system architecture.

The development of an open source project over the long haul is closely associated with its influence and popularity in the open source community. The more influential and popular the project gets, the more contributors it attracts, and thus the better its development in the future. TiDB and CockroachDB both appeal to their contributors, but on GitHub, TiDB has a larger number of watches, stars, and forks than CockroachDB. Besides, TiKV, the storage layer of TiDB, is an incubating project of the Cloud Native Computing Foundation (CNCF). In sum, we think TiDB has a more active community than CockroachDB.
A project’s system architecture has long-term influence on its evolution. With an advanced architecture, a database can stand the test of time and go further in the future.
Inspired by Google Cloud Spanner, TiDB separates computing from storage. This separated architecture allows the computing layer and storage layer to choose a different programming language that is more suitable: Go for computing and Rust for storage. By virtue of this architecture, TiDB achieves better performance and separate iteration of computing and storage. This provides many possibilities for integration with other ecosystems in the future.
CockroachDB has a decentralized architecture. In a cloud native era, when the number of nodes and data volumes grow to an unprecedented extent, elastic scaling becomes a must, and the intelligent scheduling capability is the key ingredient that determines the performance and stability of a database. However, the decentralized architecture makes scheduling difficult, especially when the scheduler needs a global view and multinode coordination.

Based on these considerations, we chose TiDB among all the NewSQL databases.

MySQL vs. TiDB

Some may ask, if we want ACID transactions so badly, why go through all that trouble to try out an unfamiliar database, when MySQL, a long-established database with transaction guarantees, is just a click away?

The motivation behind this is complicated. For mature businesses, choosing a database is never a simple decision. It’s not just about the technology today, but where it’s going. It’s a reflection of how you view technology and how you position your company in this fast-changing world.

Attitudes Toward New Technology: NewSQL Is the Future

When it first comes to the comparison between MySQL and TiDB, the latter doesn’t seem a strong competitor. MySQL is proven to be a stable database. For all our requirements, it has available solutions, such as high availability, scalability, although it is not quite elegant. (You have to do data sharding manually.)

However, at PalFish, the real problem is that a standalone database can’t handle the huge amount of data, and in this regard, a distributed database is a better fit. With TiDB, which is a horizontally scalable database, sharding is permanently taken out of mind. It provides a solution, not a compromise. Indeed, TiDB is less stable and mature than MySQL, but we believe given enough time, it will outperform traditional relational databases.

The Cost of the Machine or the Cost of the Talent?

Cost and efficiency were also important factors in our decision-making. At first glance, the hardware resources needed by TiDB are daunting. But we know that machines take up only a portion of the total cost—not all of it.

TiDB needs more and better machines than MySQL does, but it also saves more time for developers and DBAs. Remember the rule of economy in the Unix philosophy? “Programmer time is expensive; conserve it in preference to machine time.” At PalFish, we always expect the machine and software to do more, and our people to do less. That’s exactly what TiDB has to offer.

MySQL does provide high availability, but our DBA and infrastructure team needs to spend time on the implementation. MySQL can handle big tables, but the data sharding must be done by DBAs and engineers. It takes time, time that we could invest more wisely. With MySQL, the consumption of talent is also the cost: it’s just not so apparent and tangible as the extra machines TiDB requires. As machines become cheaper and talent more expensive, we must be aware of such invisible expenses.

The Latecomer Advantage

Another incentive is our role as a startup. In areas of mature technologies, we can hardly compete with tech giants. They have spent years accumulating their strengths, while we just built our business from scratch. Often, we have no choice but to be a follower of old tech; but when new technology is arriving, what will become of us if we don’t jump at the opportunity?

As latecomers, we must turn our weakness into our edge. That is, when established companies are still trudging along the old way, we’ll predict the trends and choose the technology that faces the future. This is how we avoid technical debt and overtake older tech companies like a fast car passing a slower one on a corner.

Tech Ecosystem

Choosing a technology is also choosing the ecosystem that comes with it. MySQL has a well-rounded ecosystem, which enables us to do more with less. Thanks to the MySQL-compatible strategy, as a TiDB user, we can share the MySQL ecosystem, as well as enjoying the benefits of NewSQL.

All things considered, PalFish decided to go all-in with TiDB.

How We’re Using TiDB

Here’s our current status with TiDB:

We have 10 TiDB clusters, over 110 instances, and 6 core clusters, each of which processes more than 10,000 transactions per second (TPS).
Our 99.9th percentile latency remains as low as 16 ms.
Response time and stability meet our expectations.

Looking back, we can say with confidence that moving to TiDB was the right decision. We have overtaken established companies in database technology and enjoyed the benefits of NewSQL databases. Since the migration, the R&D and DBA teams are working with much greater efficiency. What’s more, TiDB’s new releases never fail to impress us.

Lesson Learned

While we reap the benefits of TiDB, we also have to bear the cost of the unknown. Here, I’d like to show you the problems we encountered in our adjustment period.

Optimizer Index Selection

There were several times when the optimizer couldn’t select a correct index, and this resulted in failure.

For example, in a table with 300,000 rows of data and over 10 concurrent requests, when we added a new index, the previous query selected the wrong index. The CPU of TiKV instances soared to full capacity, and the system failed.

Again, in another big table with 1.4 billion rows of data, somehow certain query conditions didn’t use the index and instead performed full table scans, which affected the applications.

PingCAP engineers investigated the issue and found it was caused by a bug in the optimizer index selection. Luckily, they took measures to fix it. From TiDB 1.x to 3.x, the optimizer has grown more and more efficient. With performance monitoring and slow log monitoring, our DBA team can now quickly spot the problem. We also force indexes on big tables to avoid potential risks.

Big Data Replication

To do data analysis, we gathered data from many upstream TiDB clusters into a single TiDB cluster to be used for big data analytics. However, the data replication was slow, and the data and data encoding were inconsistent, which caused replication failure.

As we deepened our knowledge of TiDB, and with the support of the PingCAP team, we got better at managing TiDB and successfully resolved the data replication problem.

Conclusion

As a NewSQL database, TiDB is compatible with the MySQL protocol and supports horizontal scalability, high availability, ACID transactions, and real-time analytics for large amounts of data.

In this day and age, when Moore’s law is starting to fail, and high availability and cost optimization have become high-level concerns, a distributed architecture inevitably gains traction. For stateless services, we have traffic routing strategies plus multi replica deployments (like microservices); for caching, we have schemes like Redis Cluster and Codis; with Kubernetes, the operating system also completed its own course of evolution.

And what about databases? If you ask me, the NewSQL database is surely the next big thing.

Cockroach Labs and Redis Labs are sponsors of InApps Technology.

Feature image via Pixabay.

Source: InApps.net

Rate this post

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.