Make SRE More Proactive by Shifting Left – InApps Technology
Key Summary
This article, authored by Andreas Grabner of Dynatrace, discusses how AIOps (Artificial Intelligence for IT Operations) can transform Site Reliability Engineering (SRE) by adopting a proactive, “shift-left” approach. Presented in the context of DevOps, it addresses the limitations of traditional manual troubleshooting and older AIOps solutions, proposing a more integrated, automated strategy to enhance software delivery and system reliability. Key points include:
- Current Challenges in DevOps:
- Manual Troubleshooting: According to the Puppet State of DevOps Report and Dynatrace Autonomous Cloud Survey, 90% of organizations rely on manual troubleshooting and remediation, which is unsustainable with the expected tenfold increase in production deployments over the next 12 months (as of 2022).
- Dynamic Environments: Modern multicloud, containerized, microservices-based architectures with frequent deployments (e.g., blue/green, canary, feature flags) make root-cause analysis complex due to millions of dependencies.
- Limitations of Gen 1 AIOps:
- How It Works: Early AIOps solutions ingested logs, metrics, and traces to find correlations for root-cause analysis, suitable for low-frequency, predictable deployments.
- Shortcomings: Struggle in dynamic, high-frequency deployment environments where correlating data across numerous services is inefficient, failing to provide fast, precise insights.
- Shifting AIOps Left:
- Concept: Integrate AIOps into development, testing, and pre-production stages (shifting left) to create test-driven operations, similar to test-driven development.
- Implementation:
- Use Keptn (a CNCF open-source project) to orchestrate pre-production environments, where AIOps monitors load tests, chaos engineering, and auto-remediation scripts.
- Validate AIOps’ ability to detect anomalies and trigger remediation before production, ensuring proactive fixes rather than reactive responses after user issues (a minimal sketch of such a validation gate follows this summary).
- Benefits: Reduces downtime, improves mean time to repair (MTTR), and ensures consistent digital experiences by battle-testing remediation in chaotic scenarios.
- Integration with Platforms and Processes:
- Holistic Approach: Embed AIOps into CI/CD pipelines, development, testing, and SRE practices to automatically learn and adapt to intentional/unintentional behavior changes.
- Outcome: Enhances anomaly detection and auto-remediation, enabling faster, more reliable software delivery and healthier production systems.
- Future Outlook:
- A follow-up article will explore additional AIOps best practices, but this shift-left approach lays the foundation for proactive SRE, aligning with modern DevOps needs.
- InApps Insight:
- Shifting AIOps left aligns with modern DevOps trends, enabling proactive reliability and scalability in complex, cloud-native environments.
- InApps Technology can integrate AIOps solutions like Dynatrace and tools like Keptn into client workflows, enhancing CI/CD pipelines and ensuring robust, automated operations for microservices-based applications.
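The validation step above can be pictured as a simple quality gate: measurements taken while exercising a pre-production stage are compared against SLO thresholds, and the build is only promoted when every objective holds. The sketch below is illustrative only; the metric names and thresholds are assumptions, not values from the article or from any specific tool.

```python
# Minimal sketch of an SLO-style quality gate, assuming illustrative metric
# names and thresholds; real gates (e.g. Keptn quality gates) pull these
# values from the monitoring/AIOps backend rather than hard-coded samples.

SLOS = {
    "error_rate_percent": 1.0,    # error rate must stay below 1%
    "p95_latency_ms": 500.0,      # 95th-percentile latency under 500 ms
}

def evaluate_gate(measured: dict) -> bool:
    """Return True only if every measured value meets its SLO threshold."""
    failures = {
        name: value
        for name, value in measured.items()
        if value > SLOS.get(name, float("inf"))
    }
    for name, value in failures.items():
        print(f"SLO breached: {name}={value} (limit {SLOS[name]})")
    return not failures

# Example: metrics captured while load-testing a pre-production stage.
sample = {"error_rate_percent": 0.4, "p95_latency_ms": 620.0}
print("promote" if evaluate_gate(sample) else "hold back and remediate")
```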

Andreas Grabner
Andreas is a DevOps activist at Dynatrace. He has over 20 years of experience as a software developer, tester and architect, and is an advocate for high-performing cloud operations. As a champion of DevOps initiatives, Andreas is dedicated to helping developers, testers and operations teams become more efficient in their jobs with Dynatrace’s software intelligence platform.
Many organizations are turning to AIOps in hopes of creating better, more secure software faster. But the ability to create robust and fast software delivery pipelines is constantly hampered by the need to troubleshoot and remediate issues in production environments manually. According to both the Puppet State of DevOps Report and the Dynatrace Autonomous Cloud Survey, that is still the approach 90% of organizations are taking.
At the same time, these surveys also show that organizations expect to grow the frequency of production deployments tenfold over the next 12 months. This is almost certainly doomed to fail, if 90% of these organizations continue to rely on manual troubleshooting, remediation and root-cause analysis.
Organizations have begun to tap into the potential for AIOps to reduce this level of manual work and provide faster, automated solutions that yield more precise insights into the performance and security of their applications, microservices and infrastructure. Not all AIOps solutions, however, are equal. Older “Gen 1” solutions — solutions that try to find patterns across independent, disconnected data sources — are not as efficient or effective at helping teams create better software faster as they could, or should, be.
In this article, and an accompanying article I’ll post later this month, I will describe what it looks like to deploy AIOps “the right way,” to ensure that you’re deriving maximum value from your AIOps solutions and identify where older iterations may have gone awry. To start, I’ll break down why Gen 1 AIOps solutions did not deliver this value and then outline a few examples of how AIOps is done best, beginning with shifting AIOps left to create more “test-driven operations.”
Why Gen 1 AIOps Solutions Fall Short
The first wave of AIOps solutions provided observability by ingesting data, including logs, metrics and traces, and analyzing this data for possible correlations to explain the root cause of technical problems or changed user behavior. At the time, IT teams could count the deployment and configuration changes affecting production workloads on an annual basis, so this use of AIOps worked well for that relatively small number of changes. Because the frequency of changes was so low and predictable, it was easier for ITOps teams to manage maintenance windows and keep downtime and mean time to repair (MTTR) to a minimum.
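As a rough illustration of that correlation-centric approach, the sketch below flags metric pairs that move together during an incident window; the metric names and values are invented for the example. With a handful of slow-changing services this kind of pairwise comparison is tractable, which is part of why it worked when deployments were infrequent.

```python
# Rough sketch of the Gen 1 idea: pairwise correlation between metric series
# to suggest which signals moved together during an incident. Metric names
# and values are made up for illustration.

from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

metrics = {
    "checkout_error_rate": [0.1, 0.1, 0.2, 2.5, 3.1, 2.8],
    "db_connection_count": [40, 42, 41, 95, 110, 102],
    "cpu_user_percent":    [35, 36, 34, 37, 36, 35],
}

# Flag strongly correlated pairs as candidate explanations.
for (a, sa), (b, sb) in combinations(metrics.items(), 2):
    r = pearson(sa, sb)
    if abs(r) > 0.9:
        print(f"{a} and {b} move together (r={r:.2f})")
```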
But that is not the environment digital teams are living in today. Now, production deployments are counted in days, not years. Multicloud environments have grown increasingly dynamic and containerized. Most new application architectures leverage microservices that are deployed as containers in multicluster, multicloud environments, making it even harder to keep track of changes and find root causes.
Teams are moving toward progressive delivery models for deployments (blue/green, canary, feature flags), where instead of replacing entire systems, individual services are upgraded and replaced with new iterations on a piecemeal basis. Environments change too quickly for correlation-based machine learning algorithms to establish a baseline of what’s normal. Also, with potentially millions or billions of dependencies between applications, infrastructure, containers and microservices, it’s harder to correlate logs, metrics, and traces for conclusions. There are too many services involved.
As dynamic multicloud environments drive new changes in delivery and operations, AIOps must adapt accordingly for DevOps teams and site reliability engineers (SREs) to maximize the value they, and their organization, can get out of it. In other words, teams need to ensure they’re doing AIOps the right way.
Tighter Integration Between Processes and Platforms
A more dynamic, comprehensive approach to AIOps goes beyond simply updating your AIOps tools. It means integrating AIOps solutions into everything — development processes, testing, DevOps and SRE practices — and embedding it within your internal platforms. Closing the gap between your AIOps solutions and your internal platforms and processes is what enables AIOps to precisely, and automatically, absorb and learn about both intentional and unintentional behavior changes occurring in your CI/CD pipelines.
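One concrete way to close that gap is to have the CI/CD pipeline announce every intentional change to the AIOps backend, so the AI can separate expected behavior shifts from genuine anomalies. The sketch below assumes a hypothetical events endpoint, token and payload schema; it is not the API of any particular product.

```python
# Hedged sketch: notify the monitoring/AIOps backend about an intentional
# change from the CI/CD pipeline. The endpoint, token and payload schema are
# hypothetical placeholders; real tools each define their own
# deployment-event APIs.

import os
import requests

AIOPS_EVENTS_URL = os.environ.get(
    "AIOPS_EVENTS_URL", "https://aiops.example.com/api/events"  # placeholder
)

def announce_deployment(service: str, version: str, stage: str) -> None:
    """Tell the AIOps backend that an intentional change is rolling out,
    so behavior shifts that follow are not treated as unexplained anomalies."""
    payload = {
        "eventType": "deployment",
        "service": service,
        "version": version,
        "stage": stage,
        "source": "ci-pipeline",
    }
    resp = requests.post(
        AIOPS_EVENTS_URL,
        json=payload,
        headers={"Authorization": f"Bearer {os.environ.get('AIOPS_TOKEN', '')}"},
        timeout=10,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    announce_deployment("checkout", "1.42.0", "staging")
```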
The more that ITOps teams can leverage AIOps as part of chaos engineering, the more battle-tested and validated those solutions become at anomaly detection. That validation then gives teams confidence in their AIOps solution’s ability to auto-remediate issues in production environments. If it can handle itself in chaotic scenarios, its automated anomaly detection can deliver fast, precise answers — along with the remediation to back them up — in any situation.
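A chaos-style validation of that kind can be as small as the hedged sketch below: delete one pod in a replicated staging workload, then assert that the AIOps backend opens a problem within a time budget. The kubectl usage is standard; the problems endpoint, its response shape and the pod name are assumed placeholders.

```python
# Hedged sketch of "battle-testing" anomaly detection with a chaos experiment:
# kill one pod, then assert that the AIOps backend opens a problem within a
# time budget. The problems endpoint and its response shape are hypothetical.

import subprocess
import time
import requests

PROBLEMS_URL = "https://aiops.example.com/api/problems?status=open"  # placeholder

def inject_chaos(pod: str, namespace: str) -> None:
    # Deleting a pod is a simple, reversible fault for a replicated workload.
    subprocess.run(["kubectl", "delete", "pod", pod, "-n", namespace], check=True)

def detected_within(seconds: int) -> bool:
    deadline = time.time() + seconds
    while time.time() < deadline:
        open_problems = requests.get(PROBLEMS_URL, timeout=10).json()
        if open_problems:              # assumed: list of open problem records
            return True
        time.sleep(15)
    return False

inject_chaos("carts-7d9f-abc12", "staging")   # hypothetical pod name
assert detected_within(300), "AIOps did not detect the injected fault in time"
```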
Creating More Proactive ‘Test-Driven Operations’
SREs use service-level objectives (SLOs) to validate and track how systems behave in production, under different workloads or conditions, and write auto-remediation scripts to make whatever adjustments are needed to maintain availability and a consistent digital experience. But this is a reactive position, so engineers are often only deploying the auto-remediation code after a user has had a problem and their digital experience has been compromised.
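A typical auto-remediation script of this kind is short; the issue the article raises is when it first runs, not what it does. The hedged sketch below checks one error-rate SLO and scales a deployment out when it is breached; the metrics endpoint, service name and replica count are illustrative assumptions.

```python
# Minimal sketch of a reactive auto-remediation script: if the error-rate SLO
# is breached, scale the deployment out. The metrics endpoint is a
# hypothetical placeholder; kubectl scale is a standard command.

import subprocess
import requests

METRICS_URL = "https://metrics.example.com/api/error_rate?service=checkout"  # placeholder
SLO_ERROR_RATE_PERCENT = 1.0

def current_error_rate() -> float:
    # Assumed response shape: {"error_rate_percent": 2.3}
    return requests.get(METRICS_URL, timeout=10).json()["error_rate_percent"]

def remediate() -> None:
    # Simplest possible remediation: add capacity to the affected deployment.
    subprocess.run(
        ["kubectl", "scale", "deployment/checkout", "--replicas=6", "-n", "production"],
        check=True,
    )

if current_error_rate() > SLO_ERROR_RATE_PERCENT:
    remediate()
```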
Shifting AIOps left enables a more proactive approach, where resiliency and auto-remediation scripts are tested before they enter production. One way to do this: engineers can use Keptn, an open-source CNCF project, to orchestrate a pre-production environment monitored by the AIOps solution for load tests, chaos injection and auto-remediation validation. This is the “shift left” part: By integrating the AIOps solution into this “test-driven operations” environment, you validate the ability of AIOps to trigger auto-remediation scripts in the event of an issue. Rather than engineers having to script and deploy auto-remediation code after a user has experienced an issue, the AIOps tool can proactively deploy the fix immediately, because it’s already been battle-tested for those scenarios ahead of time.
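To make that flow concrete, the hedged sketch below outlines one test-driven operations run: generate load, inject a fault, and promote the build only if remediation was triggered and the SLO recovered. The helper scripts, endpoints and timings are assumptions, and in practice Keptn would orchestrate this sequence declaratively rather than from a hand-written script.

```python
# Hedged sketch of a "test-driven operations" run in a pre-production stage:
# generate load, inject a fault, and only promote the build if the AIOps
# tooling both triggered remediation and the SLO recovered. The helper
# scripts and endpoints below are hypothetical placeholders showing the flow.

import subprocess
import time
import requests

def run_load_test() -> None:
    # Placeholder: any load generator could sit behind this script.
    subprocess.run(["./scripts/run-load-test.sh", "staging"], check=True)

def inject_fault() -> None:
    subprocess.run(["./scripts/inject-latency.sh", "staging"], check=True)

def remediation_triggered() -> bool:
    # Assumed: the AIOps backend records which remediation sequences it ran.
    url = "https://aiops.example.com/api/remediations?stage=staging"  # placeholder
    return bool(requests.get(url, timeout=10).json())

def slo_recovered() -> bool:
    url = "https://metrics.example.com/api/slo-status?stage=staging"  # placeholder
    return requests.get(url, timeout=10).json().get("healthy", False)

run_load_test()
inject_fault()
time.sleep(120)  # give detection and remediation time to act

if remediation_triggered() and slo_recovered():
    print("remediation validated in staging -- safe to promote")
else:
    raise SystemExit("remediation not validated -- hold the release")
```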
In my next article, I’ll delve into a couple more examples of how engineers can leverage AIOps the right way, but this use case should hopefully begin to highlight how AIOps, when done right, helps ensure healthy systems in production. Just as test-driven development processes help developers create better quality code, test-driven operations will help engineers maintain more stable production systems and more consistent digital experiences for users, in turn driving more value for the organization overall.
Source: InApps.net