Update Intel Gives the Etcd Key-Value Store a Needed Boost

March 30, 2022 by Anh Hoang



Key Summary

This InApps.net article, published in 2022, details Intel’s enhancements to the etcd key-value store to improve scalability for CoreOS’s Tectonic platform, which integrates Kubernetes. Written with an informative, technical tone, it aligns with InApps Technology’s mission to cover data science and software development trends, offering an accessible overview of distributed systems optimization.

Key Points:

Context: Etcd, a distributed key-value store used by Kubernetes, Cloud Foundry, and Red Hat, faced scalability limitations in handling high write volumes for cluster coordination and state management.
Core Insight: Intel addressed Kubernetes’ scaling issues by optimizing etcd with Asynchronous DRAM Refresh (ADR), roughly doubling its write performance to support Tectonic, a CoreOS platform combining Kubernetes and a container-centric Linux distribution.
Key Features:
- Etcd’s Role: Acts as a “source of truth” for cluster coordination in distributed systems, supporting platforms like Tectonic across various operating systems.
- ADR Enhancement: Intel’s lab tests showed etcd’s write capacity increased from 4,000–5,000 to 10,000 writes per second with ADR in a five-node cluster.
- Tectonic Platform: Combines CoreOS’s Linux distribution with Kubernetes, enabling enterprise-grade container orchestration, with Intel planning commercial availability via Redapt and Supermicro.
Outcome: Intel’s ADR optimization enhances etcd’s performance, enabling scalable, data-driven microservices architectures and highlighting the need for new communication patterns in containerized environments.

This article reflects InApps.net’s focus on innovative data science and software development, providing an inclusive, practical overview of etcd’s role in advancing distributed systems.

Understanding the Problem

To make the Tectonic stack a reality, Intel had to understand a particular scaling problem that the Kubernetes community had commented on: Kubernetes had scale limits.

How they solved the problem shows that new technologies are not always the answer to the challenges posed by distributed architectures. Sometimes, it’s the technologies developed decades ago that make the difference.

Some Background

Google developed Kubernetes, an orchestration system, basing it on Borg, its own container system that manages just about everything at Google. Tectonic is a new service developed by CoreOS that combines its OS and components with Kubernetes. The hope is that this combination will make Google’s form of infrastructure available to any enterprise customer that needs to scale its operations.

CoreOS, which last month received $12 million from Google Ventures, has made its mark with a container-centric Linux distribution for large-scale server deployments and its secure, distributed platform for auto-updating servers. Much as the Chrome browser updates automatically, so does CoreOS for Linux server deployments. It’s used by companies such as MemSQL, Rackspace and Atlassian.

Etcd uses the Raft consensus algorithm. On the Raft consensus algorithm web page, consensus is described as a fundamental problem in fault-tolerant distributed systems. Consensus involves multiple servers agreeing on values. Once a decision is made about a value, the decision is final:

Typical consensus algorithms make progress when any majority of their servers are available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).
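
The majority rule in the quoted passage can be checked with a few lines of arithmetic. This is a minimal sketch, not code from etcd or Raft; the function names `quorum_size` and `tolerated_failures` are hypothetical.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of servers that forms a majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Number of servers that can fail while a majority remains available."""
    return cluster_size - quorum_size(cluster_size)


# Print cluster size, majority size, and tolerated failures for odd sizes.
for n in (3, 5, 7, 9):
    print(n, quorum_size(n), tolerated_failures(n))
```

For a cluster of 5 servers this gives a majority of 3 and 2 tolerated failures, matching the example in the quote. An even-sized cluster tolerates no more failures than the next smaller odd size, which is one reason odd cluster sizes are the usual choice.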

A paper written by the Raft creators provides a more detailed analysis of the algorithm.

In etcd, the Raft consensus algorithm is most efficient in small clusters — between three and nine peers. For clusters larger than nine peers, etcd selects a subset of instances to participate in the algorithm in order to keep it efficient.

According to CoreOS, when writing to etcd, the peers redirect the write to the leader of the cluster, which then replicates it to the peers immediately:

A write is only considered successful when a majority of the peers acknowledge the write.

As described by CoreOS, that means in a cluster of five peers, the write operation is only as fast as the third-fastest machine. Leaders are elected by a majority of the active peers before cluster operations can continue. With this process, write performance becomes an issue when running at scale, because a majority of the peers must acknowledge each write before the operation can continue. The cost is greatest in high-latency environments, such as a cluster spanning multiple data centers.
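
The "third-fastest machine" observation can be sketched as a small calculation. The model below is a simplification under stated assumptions: each peer, leader included, has one fixed acknowledgement latency, and the write commits once a majority has acknowledged. The function name `commit_latency_ms` and the latency figures are hypothetical, not measurements from the article.

```python
from typing import Sequence


def commit_latency_ms(ack_latencies_ms: Sequence[float]) -> float:
    """Time until a majority of peers has acknowledged a write.

    ack_latencies_ms holds one entry per peer, leader included.
    The write commits when the majority-th fastest peer has acknowledged.
    """
    if not ack_latencies_ms:
        raise ValueError("need at least one peer")
    majority = len(ack_latencies_ms) // 2 + 1
    return sorted(ack_latencies_ms)[majority - 1]


# Hypothetical five-peer cluster: three peers local, two in a remote data center.
mostly_local = [0.5, 2.0, 3.0, 40.0, 45.0]
# Same cluster size, but only two peers local and three remote.
mostly_remote = [0.5, 2.0, 40.0, 45.0, 50.0]

print(commit_latency_ms(mostly_local))   # 3.0
print(commit_latency_ms(mostly_remote))  # 40.0
```

In this model the two slowest peers never affect the commit time, but as soon as the majority has to include a remote peer, every write pays the cross-data-center latency.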

Intel’s team under Nicholas Weaver, director of emerging technologies, looked at the Raft protocol for the first place they could find a bottleneck.

They discovered in the etcd code that every inbound entry to a follower was writing to disk before acknowledging to the leader. Weaver has a background in storage. He looked at the volume of entries and recognized how expensive things could get at scale. The likely reason: a requirement for what looked like “stable storage.”

The concept of stable storage runs deep in tech history. According to Wikipedia, “stable storage is a classification of computer data storage technology that guarantees atomicity for any given write operation and allows software to be written that is robust against some hardware and power failures. To be considered atomic, upon reading back a just written-to portion of the disk, the storage subsystem must return either the write data or the data that was on that portion of the disk before the write operation.”
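
The atomicity property in that definition (a read returns either the new data or the old data, never a mix) has a familiar file-level analogue. The sketch below is an illustration only: it is not how RAID controllers, etcd, or ADR implement stable storage, and `atomic_write` is a hypothetical helper. It assumes a POSIX-style filesystem where renaming over an existing file is atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` so readers see old or new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new content to a temporary file in the same directory,
    # so the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the new bytes to the device first
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Usage: overwrite a file twice and read back the latest content.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "state.bin")
    atomic_write(target, b"old value")
    atomic_write(target, b"new value")
    with open(target, "rb") as f:
        print(f.read())
```

A fully durable version would also sync the containing directory after the rename; that step is omitted here to keep the sketch short.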

Stable storage offers a view into the late 1980s, when the demand for mirroring data grew, largely because of the cost of using mainframes, the massive computers of the day. Again, from Wikipedia, the RAID controller served as a way to implement the disk-writing algorithms, which then allowed the disks to act as a means of stable storage.

But therein lies a problem: the Raft protocol requires that writes be acknowledged, and each acknowledgement waits on a disk write. That’s where DRAM enters the picture, and more specifically “Asynchronous DRAM Refresh” (ADR), a feature that, according to Intel, automatically flushes memory controller buffers into system memory and places the DDR into self-refresh mode in the event of a power failure.

To reduce the latency impact of storing to disk, Weaver’s team looked to buffering as a means to absorb the writes and sync them to disk periodically, rather than for each entry.
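
The buffering idea can be sketched as an append-only log that syncs once per batch instead of once per entry. This is a minimal illustration, not Intel's patch or etcd's write-ahead log: the class `BufferedLog`, the helper `count_syncs`, and the `sync_every` parameter are hypothetical, and the sketch batches by entry count rather than by a timer so that its behavior is deterministic.

```python
import os
import tempfile


class BufferedLog:
    """Append-only log that fsyncs once per batch of entries."""

    def __init__(self, path: str, sync_every: int) -> None:
        if sync_every < 1:
            raise ValueError("sync_every must be at least 1")
        self._file = open(path, "ab")
        self._sync_every = sync_every
        self._pending = 0      # entries written since the last sync
        self.sync_count = 0    # number of fsync calls issued so far

    def append(self, entry: bytes) -> None:
        """Buffer one entry; sync only when the batch is full."""
        self._file.write(entry + b"\n")
        self._pending += 1
        if self._pending >= self._sync_every:
            self.sync()

    def sync(self) -> None:
        """Flush buffered entries and force them to the device."""
        if self._pending == 0:
            return
        self._file.flush()
        os.fsync(self._file.fileno())
        self._pending = 0
        self.sync_count += 1

    def close(self) -> None:
        self.sync()
        self._file.close()


def count_syncs(n_entries: int, sync_every: int) -> int:
    """Write n_entries to a temporary log and return how many fsyncs occurred."""
    with tempfile.TemporaryDirectory() as workdir:
        log = BufferedLog(os.path.join(workdir, "raft.log"), sync_every)
        for i in range(n_entries):
            log.append(b"entry-%d" % i)
        log.close()
        return log.sync_count


print(count_syncs(1000, 1))    # one sync per entry
print(count_syncs(1000, 100))  # one sync per 100 entries
```

With `sync_every=1` the log behaves like the per-entry write the team found in etcd. Larger batches cut the number of syncs, but any entries still pending when power is lost are gone, which is exactly the stable storage concern.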

Tradeoffs? They knew memory buffers would help, but there would be potential difficulties with smaller clusters if they violated the stable storage requirement.

Instead, they turned to Intel’s silicon architects to ask about features available in the Xeon line. After describing the core problem, they found out it had already been solved in other areas with ADR. After some work to prove out a Linux-supported use for it, they were confident they had a best-of-both-worlds angle.

And it worked. As Weaver detailed in his CoreOS Fest discussion, the response time proved stable. ADR can grab a section of memory, persist it to disk and power it back. It can return entries back to disk and restore back to the buffer. ADR provides the ability to make small (<100MB) segments of memory “stable” enough for Raft log entries. It means it does not need battery-backed memory. It can be orchestrated using Linux or Windows OS libraries. ADR allows the capability to define target memory and determine where to recover. It can also be exposed directly into libs for runtimes like Golang. And it uses silicon features that are accessible on current Intel servers.

Using the new capability, Intel did its work with Raft and etcd in the lab. They tested a five-node etcd cluster and found that the maximum without ADR was about 4,000 to 5,000 writes per second:

[Figure: benchmark chart labeled "etcd2.08", without ADR]

With ADR, etcd could handle about 10,000 writes per second, essentially doubling the transaction speed:

[Figure: benchmark chart labeled "etcdpatched", with ADR]
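
The reported figures can be turned into a speedup and an average time budget per write. This is back-of-the-envelope arithmetic on the numbers quoted above; the function names are hypothetical, and the per-write budget is an amortized interval that assumes a steady rate, not a measured latency.

```python
def speedup(before_wps: float, after_wps: float) -> float:
    """Ratio of the new write rate to the old write rate."""
    return after_wps / before_wps


def per_write_budget_us(writes_per_second: float) -> float:
    """Average interval between completed writes, in microseconds."""
    return 1_000_000 / writes_per_second


baseline_low, baseline_high, with_adr = 4_000, 5_000, 10_000

print(speedup(baseline_high, with_adr))    # 2.0
print(speedup(baseline_low, with_adr))     # 2.5
print(per_write_budget_us(baseline_high))  # 200.0
print(per_write_budget_us(with_adr))       # 100.0
```

So "essentially doubling" corresponds to a speedup of 2x to 2.5x, depending on which end of the baseline range is used.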

What This All Means

Weaver and his team have demonstrated how the state of a single machine has much in common with distributed systems. The difference lies in how the state gets managed. In some respects it comes down to what gets monitored. Until recently, the most valuable tools monitored the machines that existed in client/server environments.

I talked about this topic last week with SignalFx Founder and CEO Karthik Rau. In our conversation we discussed how the way containers behave requires people to collect and analyze data so that applications work as intended across clusters.

That, more than anything, means a change in how people communicate. Apps are at the center of the universe. Increasingly, the compute will swarm to the data, which will be automated and orchestrated through microservices architectures. The services will consist of disposable containers that are portable, connected to git environments and the rest. The way containers are programmed and behave on these architectures means people will need different communication patterns themselves. That’s a business issue that cannot be solved with org charts. The answers to questions such as speed and latency will surface in the data.

CoreOS, Rackspace, Red Hat and SignalFx are sponsors of InApps Technology.

Feature image via Flickr Creative Commons.

Source: InApps.net


Anh Hoang

Anh Hoang is Head of SEO Optimization at InApps Technology, ensuring that the message and research of InApps Technology reach the most people possible while adhering to our strict journalistic standards of excellence and integrity.
