The Art of DevOps Communication, at Scale and On-Call – InApps Technology 2022

March 21, 2022 by Phu Nguyen

The Art of DevOps Communication at Scale

DevOps takes a certain personality profile. Since companies are in constant flux, like Tetris pieces continually trying to fit together in a new way, you need a team of individuals committed to a common vision. For Laura Stone, senior site reliability engineer at Klaviyo, this all comes down to hiring the right people.

Klaviyo, a marketing automation tool focused on applying big data and segmentation to highly personalized marketing, is five years old and has been DevOps from the start. Of course, for the first two years it was just the founder. But in the last three years, it has had to scale up its DevOps to a team that’s grown to 150 people. That’s why the interview process is more of a culture and code test than a Q&A.

The first step is a take-home assignment, which Stone calls a simulation of what it’s like to work at Klaviyo. Candidates have to write a small CRUD (create, read, update, and delete) application that deals with weather data and sends people a personalized email. The right candidates don’t have to be masters of certain languages, but they must show that they are eager to learn and that they are thinking of the next person who has to use that code, with attention to documentation, algorithms, readability, and cleanliness.

“I love it when people write tests — it shows that it’ll be easier to use your code in the future. I think a lot of people should test their code and document their code and they just don’t do it,” Stone told InApps Technology.

And DevOps isn’t just about scaling a company but scaling a code base.

Stone gave the example: “Let’s say you had a 100 people signed up to this service and 40 people are in Boston, do you make one API call for each or create it and use it once and cache it?”
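Stone's question can be made concrete with a minimal sketch. Everything here is assumed for illustration: `fetch_weather` is a hypothetical stand-in for a real weather API, and the subscriber list is made up so that 40 of 100 people share one city. It is not Klaviyo's assignment or solution, only one way a candidate might cache a lookup per city.

```python
def fetch_weather(city):
    """Hypothetical stand-in for a real weather API call (an assumption)."""
    fetch_weather.calls += 1  # count calls so the saving is visible
    return {"city": city, "forecast": "placeholder"}

fetch_weather.calls = 0

def weather_for_subscribers(subscribers):
    """Return (name, weather) pairs, calling the API once per distinct city."""
    cache = {}
    reports = []
    for name, city in subscribers:
        if city not in cache:
            # First subscriber in this city: make the call and remember it.
            cache[city] = fetch_weather(city)
        reports.append((name, cache[city]))
    return reports

# 100 subscribers: 40 in Boston, the other 60 each in a different made-up city.
subscribers = [(f"user{i}", "Boston") for i in range(40)]
subscribers += [(f"user{i}", f"City{i}") for i in range(40, 100)]

reports = weather_for_subscribers(subscribers)
print(len(reports), "emails,", fetch_weather.calls, "API calls")
```

With this made-up data the cache turns 100 lookups into 61: one for Boston plus one for each of the other 60 cities. A real service would also need to decide how long a cached forecast stays valid.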

“I don’t look at a ton of resumes — they’re very difficult for SRE roles. Show me the code.”
— Laura Stone on importance of hands-on DevOps interviews.

Once you pass your first test, you come in for some collaborative coding.

“The interview process that we have is set up to simulate what it is like to work here,” said Stone.

“You are put in charge of refactoring. They don’t need to look stuff up. They can ask us anything.” She continued that it’s to understand how they work and look for candidates’ “openness around when they don’t know something.”

The next part of the interview is about DevOps ownership. The interviewers tell candidates that they now own this code and ask them questions like:

Where do you want it to run?
How do you want to be notified if it fails?
What’s the infrastructure it’s running on going to look like?
How would you scale to different user needs?

Stone says they aren’t looking for candidates who know all the answers, especially among those fresh out of university, but they want to see a desire to be on call and to own a service from creation through maintenance.

The team at Klaviyo isn’t looking for precise answers but for a candidate’s ability to think ahead. Candidates should suggest running their code on a dedicated machine rather than their own computer, so it isn’t tied to one person and their personal infrastructure. They should also think about where automation can speed up processes.

“Someone who is ready to be in a DevOps culture would have this culture,” Stone said.

Most importantly they need to see signs of customer empathy and a desire to make sure the solutions are as stable as possible.

Stone says they are looking for “People who are motivated to learn and who are technically savvy and who can show empathy. Given the right structures and resources in place, then they can be successful owning their service from start to finish.”

She continued that “As the company scales, we’re being very thoughtful about how we codify processes within the engineering team because even people who are motivated and highly skilled and empathetic, if they don’t have the right structure and resources in place, things still won’t work.”

When Stone joined 18 months ago, there was just one product team and an SRE team. Now there are an additional four or five product engineering teams focused on specific areas of the solution. They have a greater need to concentrate on “knowledge transfer as engineers can no longer know the entire product and, in many cases, can no longer have deep relationships with other engineers on the team.”

Scaling DevOps all comes down to one question: How can communication flow and how can people still have ownership?

One form of scalable knowledge transfer they use is mob programming. It’s like agile’s pair programming, but the whole team is working on the same thing at the same time on the same computer. They did this when adopting Terraform to automate their infrastructure.

“No one was familiar with it within our organization so I had to come in and teach Terraform. We did a mob programming session where the SREs acted as mentors to the product teams,” Stone said.

She thinks this knowledge transfer is working because it used to be exceptional if she didn’t get paged on-call. Now it’s exceptional if she does.

The Art of DevOps Communication During Incidents

One of the important aspects of DevOps is breaking down barriers and providing cross-functional training so everyone feels an equal responsibility for code that’s being released. For most DevOps teams, that means streamlining incident response and instituting all-hands-on-deck on-call rotations. When you are trying to create a world of always-on continuous delivery and integration, you need people willing to support that uptime at any time of day or night.

“DevOps is a buzzword, so given that we moved to DevOps, it was really up to us to define what that means for our team, which for us meant everyone is responsible for infrastructure and you’re on-call,” Senior Site Reliability Engineer Nida Farrukh told InApps Technology, describing her experience of the DevOps journey at Microsoft Social Engagement and Market Insights (MSE), which began a year ago.

This small, nimble team is working to minimize downtime for thousands of customers. Following a recent restructuring, a handful of SREs now share infrastructure and on-call responsibilities with developers.

“This generally increases the health of your monitoring system because the people who are writing the code are fixing it and feeling the pain points too,” explained Farrukh.

Each team member is on call for one week at a time. To start, each trainee has a shadow week, acting as back-up for an SRE or another fully-trained dev. Then, within a few weeks, the trainee takes on the role of primary administrator on duty or AOD, and the more experienced person shadows. There are also ample tutorials, docs and regular simulated outage exercises to assist.
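The shadow-then-primary handoff described above can be sketched as a tiny scheduling helper. The names `Shift`, `onboarding_shifts`, and `gap_weeks` are hypothetical, and the two-week default gap is an assumption standing in for "within a few weeks"; the article does not describe the MSE team's actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Shift:
    week: int      # week number of the one-week on-call shift
    primary: str   # administrator on duty (AOD) for that week
    shadow: str    # back-up person for that week

def onboarding_shifts(trainee, mentor, start_week, gap_weeks=2):
    """Plan a trainee's first two on-call weeks.

    First the trainee shadows an experienced AOD; gap_weeks later
    (an assumed value) the roles swap and the mentor shadows.
    """
    return [
        Shift(week=start_week, primary=mentor, shadow=trainee),
        Shift(week=start_week + gap_weeks, primary=trainee, shadow=mentor),
    ]

for shift in onboarding_shifts("new-dev", "experienced-sre", start_week=10):
    print(f"week {shift.week}: AOD={shift.primary}, shadow={shift.shadow}")
```

Keeping the same mentor for both weeks is a design choice in this sketch, so the trainee's first week as AOD is backed by someone who already knows what they have seen.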

“People generally are very nervous when they go on call for a service for the first time and they’ve never been on call for anything,” Farrukh said.

She continued that it isn’t about knowing a product inside out, it’s about knowing where to find what you need:

Do you have the tools to debug a new problem that comes up?
Do you know where to find the answer in documentation?
Do you know who can answer your questions and how to contact them?

On the MSE team, while there is some leeway to choose appropriate tools for tasks, the infrastructure team works hard to keep standardization across their stacks, with a limited number of logging systems, languages, libraries and monitoring systems, so everyone shares a baseline knowledge.

Ask for Help

A lot of times new folks can feel afraid to ask for help, but Farrukh’s team works hard to emphasize that everyone can and should ask for help because, in DevOps, the entire product is the entire team’s responsibility.

“In the MSE team, as AOD you are first responder, not sole responder. You can pull in anyone in the entire team to help you during an incident,” she said.

The engineering leads and SREs have even gone out of their way to volunteer to respond any time.

“An AOD has the power to call anyone but there is some sort of psychological barrier so these people have come out and said ‘Please call me’,” Farrukh said.

She said software architects are usually a good first contact for issues within a DevOps organization because if they don’t know exactly what the problem is, they’ll know who to call.

Farrukh says that AODs can and should delegate tasks that distract from their main goal: debugging efficiently. Even looking for the right contacts and then calling them can be a distraction.

She suggests two roles to help limit these interruptions: bridge manager and comms person.

The bridge manager can be in charge of bringing on everyone that should be on a conference call about the issue. The bigger the issue, the more people may be called in from the escalation matrix.

Other engineers, execs and even marketing and customer service often have loads of questions that can distract the AODs from their work. The suggested comms role — something Farrukh says she often volunteers for — fields those questions and is the main point of contact for the AOD to disseminate information through.

For smaller issues, the comms and bridge manager roles may be filled by the same person, and for really small issues the AOD may cover both herself. In reality, on the MSE team, the AOD often has five to six engineers helping her with larger problems, including SREs for faster deployments and rollbacks.

“We tend to know how the infrastructure works and how to put in place workarounds. We either advise the AOD or take responsibility for specific tasks,” Farrukh explained.
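The split between AOD, bridge manager, and comms person can be sketched as a small assignment function. The severity labels ("minor", "moderate", "major") and the function name `assign_roles` are assumptions made for illustration; the article only says that roles merge as issues get smaller.

```python
def assign_roles(aod, helpers, severity):
    """Map incident roles to people, merging roles for smaller incidents.

    severity is one of "minor", "moderate", "major" (hypothetical labels).
    helpers is an ordered list of people the AOD can pull in.
    """
    if severity == "minor" or not helpers:
        # Really small issue: the AOD handles the bridge and comms herself.
        return {"aod": aod, "bridge_manager": aod, "comms": aod}
    if severity == "moderate" or len(helpers) == 1:
        # Smaller issue: one helper covers both supporting roles.
        return {"aod": aod, "bridge_manager": helpers[0], "comms": helpers[0]}
    # Larger issue: separate people run the bridge and field questions.
    return {"aod": aod, "bridge_manager": helpers[0], "comms": helpers[1]}

print(assign_roles("aod", ["sre-1", "dev-2"], "major"))
```

In every branch the AOD stays the first responder, and the supporting roles exist only to keep questions and call logistics away from the debugging work.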

To learn more about DevOps communication in incident response, Farrukh recommends studying examples of active, analog response systems, such as the Incident Command System developed for wildfire response in California and the practices of the U.S. National Transportation Safety Board.

The opinions and views expressed in this article are those of the interviewees and do not necessarily state or reflect those of Microsoft.

Feature image via Pixabay.

Source: InApps.net
