When to Use, and When to Avoid, the Operator Pattern – InApps 2022

March 30, 2022 by Phu Nguyen


Applications Aren’t Services

The second misuse of operators is to use them to expose non-Kubernetes applications as Kubernetes services.

“I think better paradigms will come along for how to do that than what operators are doing right now,” Shepherd said. “I’m looking at frameworks like CUE where I think we’ll see the ability to consolidate a bunch of primitives into a higher-level primitive without requiring a lot of custom programming and orchestration logic.”

Rancher’s experience writing controllers for Kubernetes has shown that there are some standard patterns. “They’re largely data in, data out,” Shepherd says.

“If you had a better configuration management like a configuration language, you could largely automate away a lot of the complexity of controllers and operators and whatnot. Because most of the time it’s ‘I want to take this input data, and then I want to render it with a bunch of existing data or produce new data, and then reconcile that state’; it’s a very common pattern.”
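The "data in, data out" pattern Shepherd describes can be sketched in a few lines. This is a minimal illustration, not Rancher's code or any real controller framework: the `render` and `reconcile` functions, the dictionary shapes and the resource names are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical types for illustration: a resource name mapped to its spec.
State = Dict[str, dict]
Action = Tuple[str, str]  # (verb, resource name)


def render(inputs: Dict[str, int]) -> State:
    """Render input data into desired state: one resource per app name."""
    return {name: {"replicas": replicas} for name, replicas in inputs.items()}


def reconcile(desired: State, observed: State) -> List[Action]:
    """Return the actions needed to move the observed state to the desired state."""
    actions: List[Action] = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Input data in, a plan of changes out.
desired = render({"web": 3, "worker": 2})
observed = {"web": {"replicas": 1}, "cache": {"replicas": 1}}
plan = reconcile(desired, observed)
print(plan)
```

A real controller would run this comparison repeatedly against the Kubernetes API rather than once against in-memory dictionaries, but the shape of the logic is the same, which is why Shepherd expects much of it could be generated from configuration.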

Operators don’t turn complex applications into something you can consume as a service, he noted. “It’s a little idealistic to think that I’m going to give you this operator and it just automatically does everything for you. No, it’s just going to make it a little bit easier but you’re still operating a very complex system.”

Security Concerns

Although there are a few use cases where operators are so extremely useful that they’re invaluable, they’re also a highly abused concept and it’s not clear that people are thinking through the security implications before diving into writing operators, Alcide founder and Chief Technology Officer Gadi Naor told us. In the majority of cases, he would pass on building operators.

Operators perform tasks that become part of the platform infrastructure; that means a lot of moving parts that human operators need to understand clearly.

“Operators are a high-privilege component: by design, they run persistently inside your cluster and from a security standpoint, that’s introducing risk. Having many of them, which is the current situation in the ecosystem, introduces a lot of risks to your cluster. There are a lot of complexities around building an operator and getting it right and making it run fully autonomously in a way that is bulletproof; that’s a lot of heavy lifting. Why would I want to run an operator, which is a highly privileged component, either at the cluster level or namespace level, when it performs its intent only a very small fraction of the time?”

Needing an operator may be an indication that you’ve created too complex an architecture and Kubernetes isn’t the best solution for the application you’re building, Naor pointed out. You should also consider how it will be consumed.

“One test of whether you should write an operator or not would be if this is a one-to-one relationship where your operator is managing just one instance: there has to be a different way of building sophisticated, highly privileged automation.”

“One option is that you can just run jobs or use existing components and tie them together, instead of building an operator that would do the same thing. Instead of writing your own operator, you can declaratively articulate what your operator should do and then a component that is more hardened and well designed can take care of the heavy lifting.”

“Think about what the lifecycle is that you think your operator should manage. If it’s something as generic as backup and restore, probably someone already did that in a generic way that would fit your application. If you’re going to build an operator that senses there’s a new microservice version available and you need to upgrade stuff, GitOps is a pretty generic way of achieving the same thing: you commit that there’s a new version and the GitOps agent will synchronize that to your target environment and everybody’s happy.”

Naor suggested KUDO as a declarative alternative for automating the deployment, installation and lifecycle of complex applications on Kubernetes. “You write in a declarative way something that is equivalent to an operator; not as full blown but it gives good coverage for the majority of use cases. And where you would have run many operators, instead you run just a single one that performs all the heavy lifting.”

Helm and Operators

KUDO bridges the gap between what Helm can do and building the entire dependency tree of an application, Naor explained.

Helm may even do what you need, Shepherd noted. “The Prometheus operator installs a Helm chart and then you get a bunch of CRDs to do things. All an operator is, is a set of controllers so why did I have to make this a first-class concept?” He suggested the pattern of using a Helm chart where the operator is just controllers inside that to deliver more types. “That makes more sense to me than writing an operator that’s a very specialized approach; Helm is a generic approach that applies to both my own and third-party applications. What is the value that I’m getting out of this one-off [operator]?”

While Helm maintainer Matt Butcher has previously rejected the idea of using Helm to perform “operator-like tasks,” more operators are stretching the original definition and acting as custom installers in ways that overlap with Helm. The project is currently discussing how Helm 4 will deal with the fragility of CRDs as cluster-wide shared global resources without sacrificing the usability of Helm, and there are questions about handling the operator pattern as part of this.

That discussion describes CRD handling as “the most intractable problem in Helm’s history” and suggests that the real problem is that “Kubernetes is not yet mature enough” for Helm to be able to deliver “robust support” for CRDs. There’s also concern that needing to understand CRD management to consume a Helm chart safely would change the current assumption of the project that Helm users should not need significant Kubernetes knowledge.


Another automation option Naor suggests considering is the OpenKruise project. This project addresses some shortcomings in Kubernetes’ basic scheduling constructs with the advanced stateful set, an enhanced version of the default Kubernetes stateful set for building stateful services on Kubernetes. Another of its ideas, the sidecar set, is a declarative way of injecting sidecars.

“Open source projects building technologies based on sidecars were implementing the same thing over and over again, which is a mutating admission controller that injects sidecars. So they built something that you install once and then declaratively say what to inject and where to inject,” Naor said.
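The "say what to inject and where" idea can be illustrated with a small sketch. It is not OpenKruise's implementation or the Kubernetes admission webhook API (a real mutating webhook returns a patch to the API server); the `RULE` structure, the `inject_sidecar` function and the image name are hypothetical, and pods are modeled as plain dictionaries.

```python
import copy
from typing import Dict

# Hypothetical declarative rule: which pods to match and what to inject.
RULE = {
    "selector": {"app": "web"},
    "container": {"name": "log-agent", "image": "example/log-agent:1.0"},
}


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True when every selector key/value pair is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def inject_sidecar(pod: dict, rule: dict) -> dict:
    """Return a copy of the pod with the sidecar added if the selector matches."""
    labels = pod.get("metadata", {}).get("labels", {})
    if not matches(labels, rule["selector"]):
        return pod
    patched = copy.deepcopy(pod)
    containers = patched.setdefault("spec", {}).setdefault("containers", [])
    # Skip the append if the sidecar is already there, so injection is idempotent.
    if all(c["name"] != rule["container"]["name"] for c in containers):
        containers.append(dict(rule["container"]))
    return patched


web_pod = {
    "metadata": {"labels": {"app": "web"}},
    "spec": {"containers": [{"name": "web", "image": "example/web:2.0"}]},
}
db_pod = {
    "metadata": {"labels": {"app": "db"}},
    "spec": {"containers": [{"name": "db", "image": "example/db:5.0"}]},
}

patched_web = inject_sidecar(web_pod, RULE)
patched_db = inject_sidecar(db_pod, RULE)
print([c["name"] for c in patched_web["spec"]["containers"]])
```

The point of the declarative form is that the rule is data: one hardened component evaluates many such rules, instead of every project shipping its own injection code.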

Karl Isenberg previously worked with the KUDO team at Mesosphere and was technical lead manager of the PaaS team at Cruise Automation that built Isopod, a YAML-free tool to help manage common resources across clusters before multi-cluster addon management became more sophisticated. He has compiled a useful taxonomy of Kubernetes app deployment tools that includes both operators and alternatives.

“The ecosystem is still trying to work out the best way to manage Kubernetes workloads: there’s a lot of options and they all have pros and cons,” he told us. “With so many ingress and service mesh options around, there’s no standard to build deployment automation on top of, so solutions tend to be non-transferable and limited to a custom stack.”

He noted that similar patterns to operators were found in Mesos, where the requirement for applications and services to have a custom workload scheduler as part of the two-level scheduler led to a mix of generic schedulers for stateless applications and specific schedulers for complex distributed systems like Spark and Cassandra that needed lifecycle management. “Operators built on that. But after a while, they took over and everybody was writing operators to handle custom lifecycles, partially because it became harder to upstream feature changes into the Kubernetes scheduler.”

Improvements to the Kubernetes scheduler may obviate the need for operators in some cases, Isenberg suggested: “When operators emerged people were using their own custom controllers and operators to manage the workflow or lifecycle of their application because they couldn’t customize the scheduler or plug in a custom scheduler. It’s now possible to set annotations on your workload so the primary scheduler ignores it and it gets scheduled by a custom scheduler.”

But custom schedulers are still rare: “I only know of two that aren’t just operators,” he noted.
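As a rough illustration of how a workload can opt out of the primary scheduler: in current Kubernetes a pod names its scheduler in the `spec.schedulerName` field, and pods that leave it unset are handled by `default-scheduler`. The sketch below only models that selection step on plain dictionaries; the `pods_for` function, the pod names and the `my-batch-scheduler` name are assumptions for the example, and no real scheduling happens.

```python
from typing import List

DEFAULT_SCHEDULER = "default-scheduler"


def pods_for(scheduler_name: str, pods: List[dict]) -> List[str]:
    """Return the names of the pods that the given scheduler is responsible for."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("spec", {}).get("schedulerName", DEFAULT_SCHEDULER) == scheduler_name
    ]


pods = [
    {"metadata": {"name": "web-1"}, "spec": {}},
    {"metadata": {"name": "etl-1"}, "spec": {"schedulerName": "my-batch-scheduler"}},
    {"metadata": {"name": "web-2"}, "spec": {"schedulerName": "default-scheduler"}},
]

default_pods = pods_for(DEFAULT_SCHEDULER, pods)
custom_pods = pods_for("my-batch-scheduler", pods)
print(default_pods, custom_pods)
```

Each scheduler only considers the pods addressed to it, so a custom scheduler can coexist with the primary one without either touching the other's workloads.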

Sets and Scale

Operators are helpful for workloads with “data weight,” Isenberg suggested. “Data has gravity and it might take more than 30 seconds to evacuate a node, or it might require something more complicated than stateful sets, which were designed for the etcd pattern where you have named nodes and those names aren’t supposed to change, they’re just supposed to come back up. That’s a design that predates cloud native, for bare metal or VMs that didn’t autoscale. Those pre-cloud native deployment designs needed to be adapted to Kubernetes and so people did that a lot with operators.”

If you’re writing a stateless application, you don’t need to write an operator at all. “But nobody just writes a stateless application anymore; everybody’s writing complex microservice distributed systems, and if you have any sort of data pipeline or streaming, you end up with dozens of services and many of those you pull off the shelf and they might come with their own operator because the way somebody who had a more complicated product adapted that to Kubernetes was with an operator.”

Sometimes, what you think is an operator is really a controller, he noted. “The API was designed to be used both declaratively and imperatively, and operators are an imperative way to use Kubernetes. Otherwise, most imperative use is either direct API usage with a controller that doesn’t have its own CRD, or just scripting and your CI/CD workflow. Because more people are using Kubernetes, more people are writing code to exercise the imperative workflows and most of them are calling those operators because they end up with a CRD in them. But sometimes there’s a little bit of name conflation with controllers and they mistakenly get called operators just because it’s trendy.”

Operators are fundamentally a way to get around the lack of dependency management or resource hierarchies in Kubernetes, Isenberg suggested (echoing much of the Helm discussion about CRDs), and neither of those are problems that will be solved quickly so sometimes an operator is the answer.

“Writing an operator isn’t as easy as making a deployment but if a deployment doesn’t work for you and a stateful set doesn’t work for you, it’s not a problem to write an operator. But you’re building a piece of software that you’re going to have to maintain indefinitely and not just a configuration that you can tweak and replace. The more code you write, the more locked into that platform you are. So when you write operators, you’re locked into Kubernetes, whereas a deployment is like a configuration file; you could just throw away the deployment and take your service somewhere else. GitOps is definitely more portable, writing Terraform is more portable.”

Because writing operators commits you to maintaining the code, you will also have to commit to hiring more developers and a Kubernetes platform team with operator skills in the future, to make sure your use of Kubernetes is robust and secure.

Operators are the ‘break glass’ option for when the resources in Kubernetes aren’t expressive enough for what you need to do, Isenberg noted. But if you’re adapting applications and workloads to run on Kubernetes and operators aren’t already available for them, it’s possible Kubernetes isn’t the best place to run them.

“Everybody wants to maximize their investment in Kubernetes but I feel there’s a little bit of a sunk cost fallacy that goes on. Just because you can run something on Kubernetes doesn’t mean you should. There are some applications that are just easier to manage on VMs because they were built in an era where that’s how they were designed to be deployed and managed.”

Isolate Operators with Side Clusters

If you do decide you need to write operators to do what you need, “You should really think about having something that reduces privileges for the duration that the operator is idle,” Naor suggested.

“Think about having something that externally reprovisions those permissions so that you don’t have something highly privileged running all the time. Or maybe running those operators outside of the cluster and disconnecting and reconnecting them, rather than running them forever. From a security standpoint, it makes more sense to not have everything running in the same place, because if the operator is compromised, then potentially the entire cluster can be compromised.”
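One way to picture reprovisioning permissions only while work is being done is a grant that is always revoked afterwards. This is a sketch under stated assumptions, not a Kubernetes RBAC integration: `PermissionStore` is a hypothetical stand-in for whatever external system manages the operator's permissions, and the permission string is invented.

```python
from contextlib import contextmanager
from typing import Iterable, Iterator, Set


class PermissionStore:
    """Hypothetical stand-in for an external system that grants and revokes permissions."""

    def __init__(self) -> None:
        self.granted: Set[str] = set()

    def grant(self, perms: Iterable[str]) -> None:
        self.granted |= set(perms)

    def revoke(self, perms: Iterable[str]) -> None:
        self.granted -= set(perms)


@contextmanager
def elevated(store: PermissionStore, perms: Iterable[str]) -> Iterator[None]:
    """Grant permissions for the duration of a task and always revoke them afterwards."""
    wanted = set(perms)
    store.grant(wanted)
    try:
        yield
    finally:
        # Revocation runs even if the task raises, so nothing stays elevated.
        store.revoke(wanted)


store = PermissionStore()
with elevated(store, {"deployments:update"}):
    during = set(store.granted)  # privileges exist only inside this block
after = set(store.granted)
print(during, after)
```

The design choice that matters is that the grant and the revocation are controlled from outside the operator, so an idle or compromised operator holds no standing privileges.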

Side clusters — a term he coined in analogy to the sidecar pattern — are particularly useful for security and monitoring services and could also improve the security and privilege concerns with operators.

“The recommended way to monitor the Kubernetes audit log is not from inside the cluster, but rather from an external security operations cluster that connects to the target cluster and performs the monitoring, which means that the cluster is more resilient to a threat actor trying to neutralize your security.”

With Prometheus, if you’re running inside the cluster, then the Prometheus instance is susceptible to the cluster conditions, and if the cluster became unstable it would destabilize your monitoring system. This is why you have projects like Thanos and others that are trying to push the monitoring piece outside of the cluster and then just scrape the cluster from the outside world.

From a security standpoint, you have similar sets of challenges. “With some of our customers, where they are running multiple clusters, we are building side clusters that are responsible for operating, in our case, security-related tasks that need to run or be kept outside of the application clusters. If I’m running my operators on a side cluster and regulating the permissions to the primary cluster, that can improve the security side of things, though not necessarily the overhead of managing many operators,” Naor said.

And if the namespace and RBAC considerations of hosting operators outside the cluster seem complex, it’s probably another sign that you should take a different approach from operators.

Feature image by Adriano Gadini via Pixabay.

Source: InApps.net

