The 3 Measures of Successful Site Reliability Engineering – InApps Technology 2022

Main Contents:

The 3 Measures of Successful Site Reliability Engineering – InApps Technology is an article under the topic Devops Many of you are most interested in today !! Today, let’s InApps.net learn The 3 Measures of Successful Site Reliability Engineering – InApps Technology in today’s post !

Site Reliability Engineering in 3D

The trick of SRE is to balance the need to please the customer against the unnecessary expense of over-provisioning operations, or stifling innovation. Three key dimensions can cover this, according to Coulter.

“You need to consider all three dimensions for success,” Coulter said. Roughly, they are:

Service Level Indicators (SLIs): These are the numbers that describe the state of the running system. SLIs are defined at system boundaries or team boundaries. SLIs should measure system slowdowns, not outages, which happen less often these days. The numbers could be captured by an Application Monitoring Platform (APM) such as AppDynamics, DataDog or New Relic, or any one of a number of new observability tools like Honeycomb.io of IBM’s Instana.
Service Level Objectives (SLOs): These are the benchmarks that the SLIs numbers need to hit, as agreed upon between the service provider and the end user. They can be expressed in terms of performance curves.
Service Level Agreements (SLAs): These are the agreed-upon actions that the provider must adhere to should the SLOs go unmet. It could be a refund, or perhaps the development cycle gets suspended for 28 days to address the ongoing issues.

“In a perfect world, [the SLA] is defined by the business or the customer and then you build the SLOs and SLIs underneath it,” he said.

In the case of the hospital, the cause of the slowdown were malformed packets — messages that did not meet the HL7 standard for hospital data — that were emitted by a proprietary application. The dev team had no control over this application, beyond filing a bug reporter to the vendor, but they did have control of how success was defined by the SLO, and the expectation of the end user.

In many cases, the engineering team doesn’t need to set SLOs to the highest possible performance level. In fact, such a level could be unduly expensive for the service provider to maintain. Rather, they should be set to customer expectation (One exception to this rule are financial institutions where the speed of a transaction is a fiercely competitive differentiator).

The most difficult part of the measure is understanding the end-user. In the case of the hospital, this involved “observing behavior in the wards and talking to nurses,” Coulter said. In this case, they had found out that the nurses had an “instinctive expectation” of when the lab results would come back — in about five minutes or so — though some nurses would hit the submit button repeatedly, particularly when the system was slow, dragging down the average response time even further.

With this knowledge, the service provider would be able to set an SLA that centered on returning the full results within five minutes, rather than the 10 second processing time.

“The SLAs are not there to beat each other up. They are there to capture the mutual understanding. You reach that mutual understanding through negotiation,” Coulter said. “Negotiating is a key skill for any SRE person.”

Enjoy the full presentation here:

Feature image by National Cancer Institute on Unsplash.

InApps Technology is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Honeycomb.io.

Source: InApps.net

Rate this post

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.