LitmusChaos and Argo Bring Chaos Workflows to Kubernetes
Key Summary
This article, presented at KubeCon+CloudNativeCon North America 2020, highlights LitmusChaos, an open-source chaos engineering framework for Kubernetes, and its integration with Argo, a GitOps-oriented CI/CD tool, to create scalable chaos workflows. Umasankar Mukkara (MayaData) and Sumit Nagal (Intuit) explain how these tools enhance application resiliency in cloud-native environments. Key points include:
- Why Chaos Engineering for Kubernetes?:
- Rationale: 90% of an application’s resiliency depends on other cloud-native applications, making chaos engineering critical to identify weaknesses.
- Process: Introduces random faults into a steady-state system to test stability, revealing vulnerabilities if the system fails to maintain stability.
- LitmusChaos Overview:
- Framework: A CNCF sandbox project (joined June 2020) providing CustomResourceDefinitions (CRDs) for declarative chaos orchestration on Kubernetes clusters.
- Capabilities: Supports chaos at infrastructure, application, and node levels (e.g., CPU, memory, disk, network). Includes 22 generic experiments like pod delete (most popular), container kill, network latency, and disk fill.
- ChaosHub: Offers “off-the-shelf” chaos experiments for easy onboarding in three steps, extensible via the Litmus SDK for custom “bring your own chaos” contributions.
- Integration with Argo:
- Purpose: Combines LitmusChaos with Argo (2020 CNCF sandbox) to create scalable chaos workflows within GitOps pipelines, consolidating experiment results.
- Workflow: Declarative chaos configurations are stored in Git, executed via Argo, and metrics/events are uploaded to Prometheus for monitoring.
- Benefits: Simplifies automation, supports complex scenarios, and integrates seamlessly with CI/CD pipelines.
- Intuit’s Implementation:
- Scale: Intuit’s Developer Platform manages 2,500 services across 230 Kubernetes clusters with 4,000 developers.
- Approach: Since February 2020, Intuit’s reliability team developed a Litmus plug-in infrastructure using CRDs, role-based access control, and Jenkins pipelines to target specific applications/namespaces.
- Execution: Chaos experiments are embedded in containers, defined in Git, and executed via Argo workflows, triggered by Jenkins, ensuring predictable and automated testing.
- Benefits:
- Cost savings through optimized resource use.
- Enhanced reliability by combining chaos and performance testing.
- Simplified onboarding and lifecycle management.
- Confidence in predictable, automated chaos execution.
- Why Argo Workflows?:
- Advantages: Single YAML configuration simplifies management across numerous clusters, avoiding multiple YAML files. Fits Intuit’s CI/CD pipeline, supporting infrastructure as code and automation.
- Outcomes: Enables complex scenario testing, stateless chaos/performance execution, and rapid self-service adoption.
- InApps Insight:
- LitmusChaos and Argo provide a powerful combination for implementing chaos engineering in Kubernetes, ensuring robust, resilient cloud-native applications.
- InApps Technology can leverage these tools to enhance client projects, integrating chaos workflows into CI/CD pipelines to proactively identify and mitigate system vulnerabilities, improving reliability and performance.
Honeycomb is sponsoring InApps’s coverage of Kubecon+CloudNativeCon North America 2020.
“Why do chaos engineering for Kubernetes? It’s because your application resiliency depends on other cloud-native applications.”
This is how Umasankar Mukkara, chief operating officer of cloud native storage software provider MayaData, began his talk at KubeCon+CloudNativeCon last week. He noted that 90% of the average application’s resiliency relies on other applications.
He described chaos engineering as the process of introducing a random fault into a system that is running at a steady state. If it remains steady, you’re good. If it does not, you’ve found a weakness.
Mukkara was joined by Sumit Nagal, principal engineer at Intuit. Both are maintainers of LitmusChaos, an open-source cloud native chaos engineering framework for Kubernetes, which entered the Cloud Native Computing Foundation sandbox last June. They presented how Intuit, as a CNCF end user, uses LitmusChaos to manage and orchestrate cloud native chaos experiments, including creating DevOps chaos workflows.
Litmus’s Declarative Flavor of Chaos Engineering
LitmusChaos provides custom APIs via CustomResourceDefinitions or CRDs to orchestrate chaos on Kubernetes clusters.
LitmusChaos “works in cloud native, totally declarative way,” Mukkara said, which means it allows you to define chaos as a custom resource within Kubernetes. This works the same at the infrastructure level, at the application level, and within Kubernetes nodes, as well as for other resources inside the node like memory, CPU, and disks.
He went on to say that “Litmus provides all that’s required to run chaos engineering at scale across your enterprises.”
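As a sketch of this declarative model, a minimal ChaosEngine custom resource might look like the following. The names, namespace, and target labels here are hypothetical; the field layout follows the Litmus `v1alpha1` API:

```yaml
# Hypothetical ChaosEngine targeting an nginx deployment (Litmus v1alpha1 API)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos          # illustrative name
  namespace: default
spec:
  appinfo:
    appns: default           # namespace of the application under test
    applabel: app=nginx      # label selector for the target pods
    appkind: deployment
  chaosServiceAccount: pod-delete-sa   # RBAC-scoped service account
  experiments:
    - name: pod-delete       # one of the generic ChaosHub experiments
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "30"    # run chaos for 30 seconds
            - name: CHAOS_INTERVAL
              value: "10"    # delete a target pod every 10 seconds
```

Once applied, the Litmus chaos operator picks up this resource and runs the experiment against the selected pods, so the chaos definition lives alongside the rest of the cluster configuration.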
This includes the ChaosHub, which allows even users with limited experience to introduce “off-the-shelf” chaos into their systems, onboarding in three simple steps. It now includes 22 generic experiments:
- Pod delete
- Container kill
- Pod network latency
- Pod network loss
- Pod CPU hog
- Pod memory hog
- Disk fill
- Disk loss
- Node CPU hog
- Node memory hog
- Node drain
- Kubelet service kill
- Pod network duplication
- Node taint
- Docker service kill
- Pod autoscaler
- Service pod — application
- Application service
- Cluster pod — kiam
- Pod IO stress
- Node IO stress
Pod delete is by far the most popular chaos template, while memory and service reliability are also used often.
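The three-step onboarding mentioned above typically boils down to applying three manifests. The filenames below are illustrative placeholders; the actual manifests come from the Litmus release and ChaosHub:

```shell
# 1. Install the Litmus chaos operator and its CRDs (manifest from the Litmus release)
kubectl apply -f litmus-operator.yaml

# 2. Install the desired experiment definition from ChaosHub, e.g. pod-delete
kubectl apply -f pod-delete-experiment.yaml -n default

# 3. Create a ChaosEngine that binds the experiment to a target application
kubectl apply -f chaosengine.yaml -n default
```

After step 3, the operator launches the experiment and records the verdict in a ChaosResult resource that can be inspected with `kubectl`.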
Mukkara said that LitmusChaos is highly extensible and users can use the Litmus SDK in BYOC “bring your own chaos,” which they are then encouraged to contribute to the project.
Litmus uses Argo, another 2020 CNCF sandbox addition and a GitOps-oriented Cloud Native Continuous Integration/Continuous Deployment (CI/CD) tool, to create chaos workflows at scale, allowing consolidation of the results of different experiments.
Mukkara explained that “because this entire workflow is configured declaratively you can practice chaos engineering using GitOps,” where “you set up a chaos workflow which results in a set of chaos metrics and events, which are uploaded to Prometheus,” the CNCF monitoring tool.
These chaos workflows were initiated by the Intuit team in order to execute chaos while simulating other workload behavior in parallel.
Intuit Applies Litmus Chaos Workflows to a DevOps Pattern
The Intuit Developer Platform has 4,000 software developers with 2,500 services on 230 clusters — and growing. The reliability team, which Nagal leads, has been working with chaos engineering for about three years now.
Nagal and Mukkara began their Litmus proof of concept last February. In October they open sourced the Litmus plug-in infrastructure and the Litmus Python and Argo workflows, including the base Argo workflow, performance-plus-chaos with Argo, and the Argo workflow triggered via Jenkins.
At Intuit, the team built a plug-in infrastructure where all the work is done by custom resources. They use role-based access control to target specific applications and specific Kubernetes namespaces. All of this data is then pushed to various monitoring and observability tooling, with execution driven by the company’s Jenkins pipeline. The chaos operator watches for the custom resources.
These experiments are embedded within containers. The team writes chaos experiment tests and puts them in an Argo workflow, defined in Git and integrated with Jenkins. Argo then executes the workflow, picking and launching the specified experiment.
Why use these workflows instead of maintaining individual YAML manifests?
Nagal said, “Logically speaking if you really want to execute everything as part of pipeline, many scenarios, it becomes very challenging. So automation was one thing. Now Argo workflow, everything is coming as a one YAML where we can just use one of the parameters to the go-submit.”
He went on to say that since everything is code, you don’t have to maintain the different kinds of YAML across their hundreds of software clusters. It also fits right into the Intuit CI/CD pipeline with automation and infrastructure as code.
Nagal continued to list the benefits of the Litmus-Argo workflow including:
- Cost savings with optimum resource utilization
- Reliability with chaos for performance
- Ability to build complex scenarios
- Ease of rapid self-service onboarding
- Coverage of the whole lifecycle
He said it also provides not only statelessness of the chaos but statelessness of the performance testing.
“As this whole execution is happening in a manner that is very predictable, it brings a lot of confidence in the whole set-up,” Nagal said.
InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.