SRE Tips to Prepare for Black Friday – InApps Technology 2022

March 19, 2022 by Anh Hoang




Review Past Incidents

Reviewing past incidents is a powerful way to understand how your system has failed before, and it offers real insight into how the system actually behaves in production. Armed with this insight, you'll respond more confidently when an outage happens. It also gives you a checklist of questions to ask your teams:

Have we validated fixes for past incidents in light of any new code changes? To prevent drift into failure, revisit fixes for past bugs and confirm they still hold after recent code and configuration updates.
Are we prepared with the right amount of infrastructure and correct autoscaling rules to handle a surge in traffic?
Have we tested the reliability of our application’s critical paths? Confirming that our core functionality keeps working under stress makes a massive difference to the company’s bottom line.
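The infrastructure question above comes down to simple arithmetic: the forecast peak, divided by what one instance can sustainably serve, has to fit under the autoscaling ceiling. The sketch below assumes you have a peak forecast and a load-tested per-instance throughput; the `ScalingPlan` fields, the 70% target utilization, and all of the numbers are hypothetical inputs for illustration, not recommendations.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPlan:
    expected_peak_rps: float   # hypothetical forecast of peak requests per second
    rps_per_instance: float    # hypothetical load-tested throughput of one instance
    target_utilization: float  # fraction of capacity to run at, leaving headroom
    autoscaling_max: int       # max instances the autoscaling rules allow


def instances_needed(plan: ScalingPlan) -> int:
    """Instances required to serve the peak while staying at target utilization."""
    usable_rps = plan.rps_per_instance * plan.target_utilization
    return math.ceil(plan.expected_peak_rps / usable_rps)


def check_plan(plan: ScalingPlan) -> str:
    """Compare the required instance count with the autoscaling ceiling."""
    needed = instances_needed(plan)
    if needed > plan.autoscaling_max:
        return f"raise autoscaling max: need {needed}, max is {plan.autoscaling_max}"
    return f"ok: need {needed} of {plan.autoscaling_max} allowed instances"


plan = ScalingPlan(expected_peak_rps=12_000, rps_per_instance=400,
                   target_utilization=0.7, autoscaling_max=40)
print(check_plan(plan))  # → raise autoscaling max: need 43, max is 40
```

Running at a target utilization below 100% is a deliberate choice: it leaves room for the lag between a traffic jump and new instances becoming healthy.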

Get to Know Your ‘Problem Services’

A pragmatic way to identify “problem services” is to ask your team “which services do folks avoid writing code for?” Once you have a list of these services, you can start looking into how to make sure those services don’t cause any headaches on the big day.

Do a little digging to see how those services tend to fail and how the rest of the system responds. Once you understand a service’s failure patterns, the right reliability mechanisms become more obvious. Does the service need more redundancy? Does it have trouble autoscaling properly? Is its connection to an upstream service fragile?
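For a fragile upstream connection, one common mechanism (swapped in here as an example; the article does not prescribe it) is retrying transient failures with capped exponential backoff and full jitter, so that many clients don't retry in lockstep. In this minimal sketch, `UpstreamError` and `flaky_upstream` are hypothetical stand-ins for your client library's transient error and the real call; the attempt count and delays are assumptions to tune.

```python
import random
import time


class UpstreamError(Exception):
    """Hypothetical transient failure raised by an upstream client."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call fn(), retrying UpstreamError with capped exponential backoff and full jitter.

    Re-raises the last error once all attempts are used up.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except UpstreamError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time between 0 and the capped backoff.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))


# Hypothetical upstream that fails twice, then recovers.
failures = {"left": 2}


def flaky_upstream():
    if failures["left"] > 0:
        failures["left"] -= 1
        raise UpstreamError("connection reset")
    return "ok"


print(call_with_backoff(flaky_upstream, sleep=lambda s: None))  # → ok
```

The injectable `sleep` parameter keeps the retry logic testable without real waits. Retries only help with transient faults; if the upstream is down for minutes, a fallback or circuit breaker is the better fit.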

Run a Remote FireDrill to Test Your Observability and Runbooks

A FireDrill is a planned event that validates people and processes. Specifically, it is designed to run a team through the proper actions to take when a specific problem arises. Like business continuity plans, FireDrills should be a regular and expected facet of our incident management preparation.

Now that we’re working from home, it’s important to hold a dress rehearsal so that we find the gaps in our process before we end up troubleshooting an incident from the living room in the middle of Thanksgiving. Are our alerts set up properly, or are we getting paged for non-issues while missing real problems? Will our dashboards give us the right data to resolve an incident quickly? Are our runbooks up to date, complete, and accurate?
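Part of the alerting question can be checked before the FireDrill even starts: replay recorded or synthetic metric samples through the alert logic and confirm it stays quiet on noise but pages on a sustained problem. The sketch below models a threshold rule with a "must hold for N samples" condition, loosely similar to the `for` clause in Prometheus alerting rules. The 5% threshold, the three-sample window, and the sample series are hypothetical.

```python
from typing import Sequence


def alert_fires(error_rates: Sequence[float], threshold: float, for_samples: int) -> bool:
    """Return True if error_rates stays above threshold for for_samples consecutive samples."""
    run = 0
    for rate in error_rates:
        # Count consecutive breaching samples; any healthy sample resets the run.
        run = run + 1 if rate > threshold else 0
        if run >= for_samples:
            return True
    return False


# Hypothetical one-minute samples of a checkout error rate (fraction of failed requests).
brief_spike = [0.01, 0.09, 0.01, 0.02, 0.01]
real_incident = [0.01, 0.06, 0.08, 0.12, 0.15]

# Rule under test: page if the error rate exceeds 5% for 3 consecutive minutes.
assert not alert_fires(brief_spike, threshold=0.05, for_samples=3)  # no page for noise
assert alert_fires(real_incident, threshold=0.05, for_samples=3)    # pages on the real problem
```

This only exercises the rule's logic; the FireDrill itself still has to confirm the page reaches the right person and the runbook gets them to a fix.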

Create a One-Pager for Your Whole Company About the Event

One of the more time-consuming elements of incident management is making sure that everyone is on the same page. Publishing a company wiki page about the traffic spike and sharing it across your organization will save valuable minutes in the event of an outage.

Here’s a starter list of topics you can include:

Why you expect the traffic spike and how long you estimate it to last.
Contact information for all on-call people and a link to the rotation calendar (this should be easily accessible in the first place).
Known system trouble spots, like potential bottlenecks or single points of failure. This allows everyone in the organization to keep an eye out for potential problems.
Primary database query plans and any expected changes in query patterns, including how long these queries take to run under normal conditions.
Scaling bounds and known capacity limits, such as the concurrency limit for AWS Lambda functions.
Results from Chaos Engineering experiments run on services.
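For the query-plan item in the list above, a quick automated check can flag hot queries that would fall back to full table scans when traffic spikes. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module purely as a stand-in; PostgreSQL and MySQL have their own `EXPLAIN` output formats, so the string matching would differ. The `orders` table, index name, and query are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN for one query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return [row[-1] for row in rows]


def has_full_scan(plan):
    """Flag plan steps that scan a table without using any index."""
    return any(step.startswith("SCAN") and "INDEX" not in step for step in plan)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
hot_query = "SELECT * FROM orders WHERE customer_id = ?"  # hypothetical hot query

before = query_plan(conn, hot_query, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, hot_query, (42,))

print(before, has_full_scan(before))
print(after, has_full_scan(after))
```

The exact wording of plan steps varies between SQLite versions (for example `SCAN orders` versus `SCAN TABLE orders`), which is why the check looks only at the leading keyword and the presence of an index.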

Reproduce Past Incidents with Chaos Engineering

Sometimes we think we have a fix for our past incidents, but we never actually go and test that the fix works. This can happen for a number of reasons: inadequate tooling, hesitance to test in production, or perhaps even laziness. But this is a core use case for Chaos Engineering. Because Chaos Engineering lets engineers precisely and repeatedly recreate turbulent production conditions, we can often reproduce what led to a major incident and verify that the fix works.
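The core loop of such an experiment is: inject the fault that caused the incident, run the affected path, and assert that the fix holds. Dedicated Chaos Engineering tools inject faults at the network or infrastructure level; this in-process sketch only illustrates the idea. It assumes a hypothetical past incident in which a recommendations outage broke checkout and the fix was a fallback to an empty list. `fetch_recommendations`, `render_checkout`, and `DependencyDown` are all hypothetical names.

```python
import random


class DependencyDown(Exception):
    """Injected failure standing in for the outage from the past incident."""


def make_faulty(fn, failure_rate, rng):
    """Wrap fn so that roughly failure_rate of calls raise DependencyDown."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise DependencyDown("injected fault")
        return fn(*args, **kwargs)
    return wrapper


def fetch_recommendations(user_id):
    # Hypothetical downstream call that failed during the incident.
    return ["gift-card", "headphones"]


def render_checkout(user_id, recommend):
    """The fix under test: checkout degrades gracefully if recommendations fail."""
    try:
        recs = recommend(user_id)
    except DependencyDown:
        recs = []  # fallback added after the incident
    return {"user": user_id, "can_pay": True, "recommendations": recs}


def run_experiment(failure_rate, trials=1000, seed=7):
    """Inject faults, verify checkout still works, and return how many calls degraded."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    faulty = make_faulty(fetch_recommendations, failure_rate, rng)
    results = [render_checkout(i, faulty) for i in range(trials)]
    assert all(r["can_pay"] for r in results), "fix failed: checkout broke"
    return sum(1 for r in results if not r["recommendations"])


print(run_experiment(failure_rate=1.0))  # → 1000 (every call degraded, checkout still works)
```

Seeding the random generator is what makes the run repeatable: the same injected failures occur on every rerun, so a passing result after the fix is directly comparable to a failing one before it.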

Uneventful Black Fridays

There’s an apt quote for 2020 that goes, “may you live in interesting times.” But when it comes to our on-call rotations and system behavior, we’d prefer things be boring and predictable. We hope that the above list can help your team prepare for a Black Friday full of happy customers and plenty of downtime with your loved ones.

Source: InApps.net
