Site Reliability Engineering Is a Kind of Magic


Peter Waterhouse

Peter Waterhouse is a senior strategist at CA Technologies. He is a business technologist with more than 20 years’ experience with development, strategy, marketing, and executive management. Through his regular work with CA, Waterhouse covers key trends such as DevOps, mobility, cloud, and the Internet of Things.

A site reliability engineer (SRE) can be considered the IT equivalent of a wizard or, as Andrew Widdowson, an SRE at Google, described the job, “like being part of the world’s most intense pit crew… changing the tires of a race car as it’s going 100 mph.”

So how does site reliability engineering differ from traditional IT operations, and can a discipline that originated in the world of web-scale, cloud-native unicorns ever apply to the steady-as-she-goes state of enterprise IT?

Yes, it can. The scale-out way is really the new way of managing enterprise IT. The notion that enterprise IT sits behind closed walls no longer holds. Now, the only way to create and conduct business at scale is through reliability that is engineered and managed in ways enterprise IT has not needed before. The demand for mobile experiences and the advent of complex cloud architectures have shifted the operational focus. It’s no longer about keeping the lights on; it’s about performance. The apps have to work well, the experience has to be great, and the infrastructure behind it all needs continual monitoring.

Reliability, like any feature, isn’t something that’s retrofitted after deployment; it’s established and enhanced as software is developed, tested and released. That means establishing a new discipline, which Ben Treynor, Google’s original SRE lead, describes as “what happens when a software engineer is tasked with what used to be called operations.”


A Sobering Reality

It’s easy to throw out yet another three-letter acronym and claim it’s a magical elixir for all the problems involved with running complex IT systems. In reality, engineering reliability into distributed systems with thousands of containerized applications and microservices is a tough gig. That is partly because of all the moving parts, and partly because preconceived notions about predictable system behavior no longer apply.

Take, for example, keeping watch over a modern software application. It might consist of business logic written in several languages and linked to a legacy ERP system (custom built, packaged or both). There’ll also be a raft of databases: traditional relational stores for transactional support, yes, but more likely a smorgasbord of NoSQL data stores (in-memory, graph or document), perhaps fronted by a recently adopted Node.js layer.

Some of this componentry will be on-premises, and some will be containerized and moved to the public cloud. That might mean Docker and Kubernetes on AWS, or maybe Azure and Mesos. Heck, why not both for some hybrid-style resilience?

But like the old Monty Python sketch, “you’ll be lucky” if this is all you ever have to manage. Depending on the nature of the business, there’ll also be a glut of third-party services, including payment processing and reconciliation. That’s not to mention all the new web and mobile apps interacting with the core business systems through an API gateway, and possibly some analytics horsepower delivered by the likes of Hadoop and Elasticsearch. It’ll take a lot of operational wizardry to keep all that performant.

Fortune Favors the Bold

In a wonderful talk at SREcon earlier this year, Julia Evans from Stripe described the realities of managing today’s complex distributed systems. What was refreshing about her presentation was the open admission that she often finds the work difficult and how there’s always a ton of new stuff to learn. As she says in her abstract, she doesn’t always feel like a wizard (echoing the protestations of Harry Potter).

This honesty illustrates what’s exciting about being an SRE. With systems like the ones described above causing any number of thorny problems, it’ll be the inquisitive and the brave who keep the business on track. Being an SRE isn’t for the faint-hearted or those happy with a fire-fighting status quo. It’s for those within our ranks who get bored easily: the super sleuths who keep asking reliability questions, crafting improvements and learning as they go.


So, consider a typical business-critical problem that could impact our modern application. Let’s say a latency issue is causing an increasing number of mobile app users to abandon a booking service. How would teams address the issue? Problems like this might go unnoticed for some time, or there could be a deluge of alarms. Even when a problem is identified, where do teams find the root cause? Is it a problem with a new code release or at the API gateway? Is it down to some weird microservices auto-scaling issue? And was that earlier CPU increase, the one we thought was OK, actually really bad?

With an SRE-style approach, business-critical problems are never addressed in knee-jerk fashion. Using modern tooling in areas such as application performance management and app analytics, SREs can observe the real-time behavior of applications, with systems collecting and correlating information from all related components. Rather than react after the fact, these solutions continuously identify anomalous patterns (like those mobile app abandonments) and compare them to historical trends, meaning SREs can be alerted well before the business is impacted.
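To make the idea of comparing current behavior with historical trends concrete, here is a minimal sketch in Python. It is not how any particular monitoring product works; it assumes a simple trailing-window z-score, and the function name `find_anomalies`, the window size, the threshold and the sample abandonment rates are all hypothetical values chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(series, window=5, threshold=3.0):
    """Return the indices of points that deviate from the trailing
    window's mean by more than `threshold` standard deviations.

    `window` and `threshold` are illustrative defaults, not tuned values.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]  # the recent "historical trend"
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against.
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical share of mobile sessions abandoning the booking flow,
# sampled once per minute.
abandonment_rate = [0.050, 0.052, 0.049, 0.051, 0.050,
                    0.048, 0.051, 0.090, 0.052]

print(find_anomalies(abandonment_rate))  # → [7]
```

Only the jump to 0.090 is flagged; the ordinary minute-to-minute wobble stays inside the threshold. Real tooling would also have to account for seasonality and correlate signals across components, which this sketch does not attempt.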

But beyond exposing new-normal application weirdness and “unknown unknowns,” modern tools also encourage and stimulate more of the SRE detective work, which is the really valuable stuff. These tools won’t just detect anomalies and then leave teams scrambling to find the needle in a haystack of needles. Instead, they’ll analytically gather all the evidence and lead teams in fact-based fashion towards a solution. One example is using an SRE-inspired monitoring service to detect a performance anomaly introduced with a new software build and then tracing it to the actual code causing the problem.

Like Harry Potter, operations professionals might have a hard time accepting they’re wizards. But ask yourself this: do you want to remain a silly muggle getting burnt out by constant fire-fighting? Of course not. It’s career limiting and it sucks. Time, then, for some SRE magic: gaining the skills and tools needed to adopt new tech like containers and microservices, and becoming an essential part of future-proofing your business.
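As a rough illustration of tying a performance anomaly to a software build, the sketch below groups latency samples by build and flags any build whose median latency jumps relative to the build before it. Everything here is an assumption made for the example: the function name `regressed_builds`, the build identifiers, the latency figures and the 1.5x tolerance. It narrows the search to a build only; tracing down to the offending code would need the kind of transaction tracing that APM tools provide.

```python
from statistics import median


def regressed_builds(samples, tolerance=1.5):
    """Given (build_id, latency_ms) pairs in deployment order, return the
    builds whose median latency exceeds the previous build's median by
    more than `tolerance` times.
    """
    by_build = {}
    for build, latency in samples:
        by_build.setdefault(build, []).append(latency)

    builds = list(by_build)  # dicts keep insertion (deployment) order
    flagged = []
    for prev, cur in zip(builds, builds[1:]):
        if median(by_build[cur]) > tolerance * median(by_build[prev]):
            flagged.append(cur)
    return flagged


# Hypothetical request latencies (ms) tagged with the build that served them.
samples = [
    ("build-41", 120), ("build-41", 130), ("build-41", 125),
    ("build-42", 128), ("build-42", 122), ("build-42", 131),
    ("build-43", 310), ("build-43", 290), ("build-43", 305),
]

print(regressed_builds(samples))  # → ['build-43']
```

Using the median rather than the mean keeps a single slow request from flagging an otherwise healthy build.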


CA Technologies is a sponsor of InApps.

Feature image via Pixabay.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.

Source: InApps.net
