In 2016, CoreOS coined the term Operator and started a movement around a whole new type of managed application: one that automates Day-2 operations with a user experience that feels native to Kubernetes.
Since then, the extension mechanisms that underpin the Operator pattern have evolved significantly. Custom Resource Definitions (CRDs), an integral part of any Operator, became stable and gained validation as well as versioning with conversion between versions. The experience the Kubernetes community gained writing and running Operators has also reached critical mass. If you’ve attended any KubeCon in the past two years, you will have noticed the increased coverage and countless sessions focusing on Operators.
Operators owe their popularity to the possibility of a cloud-like service experience for almost any workload, available wherever your cluster runs. In that sense, Operators strive to be the world's best provider of their workload as a service.
But what actually makes a good Operator? User experience is certainly an important pillar, and it is mostly defined through the interaction between the cluster user running kubectl and the Custom Resources that the Operator defines.
This is possible because Operators are extensions of the Kubernetes control plane. As such, they are global entities that run on your cluster for a potentially very long time, often with wide privileges. This has implications that require forethought.
For this kind of application, best practices have evolved to mitigate potential issues and security risks, or simply to make the Operator more maintainable in the future. The Operator Framework community has published a collection of these practices: https://github.com/operator-framework/community-operators/blob/master/docs/best-practices.md
They cover recommendations on the design of an Operator as well as behavioral best practices that come into play at runtime. They reflect the accumulated experience of the Kubernetes community writing Operators for a broad range of use cases, in particular the observations the Operator Framework community made while developing tooling for writing and lifecycling Operators.
Some highlights include the following development practices:
- One Operator per managed application
- Multiple Operators should be used for complex, multi-tier application stacks
- A CRD can only be owned by a single Operator; shared CRDs should be owned by a separate Operator
- One controller per custom resource definition
The document lists many others.
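To make the ownership rules above concrete, here is a minimal sketch that models two of them as plain data-structure checks: every CRD maps to exactly one owning Operator, and inside an Operator every kind is handled by exactly one controller. Python is used only for illustration, and the names `assign_crd_owners` and `ControllerManager`, as well as the Operator and CRD names, are hypothetical. Real Operator frameworks wire controllers up differently; the sketch only shows the invariants.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A reconciler receives one custom resource (as a dict) and drives it
# toward its desired state. Hypothetical signature for this sketch.
Reconciler = Callable[[dict], None]


def assign_crd_owners(operators: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every CRD to exactly one owning Operator.

    `operators` maps a hypothetical Operator name to the CRDs it owns.
    A CRD claimed by two Operators is rejected; per the practice above,
    such a shared CRD belongs in a separate Operator instead.
    """
    owners: Dict[str, str] = {}
    for operator, crds in operators.items():
        for crd in crds:
            if crd in owners and owners[crd] != operator:
                raise ValueError(
                    f"{crd} is owned by both {owners[crd]} and {operator}"
                )
            owners[crd] = operator
    return owners


@dataclass
class ControllerManager:
    """Hypothetical stand-in for the controller manager inside one Operator."""

    operator: str
    controllers: Dict[str, Reconciler] = field(default_factory=dict)

    def register(self, kind: str, reconcile: Reconciler) -> None:
        # One controller per custom resource definition: registering a
        # second reconciler for the same kind is treated as a design error.
        if kind in self.controllers:
            raise ValueError(f"{self.operator} already has a controller for {kind}")
        self.controllers[kind] = reconcile

    def dispatch(self, obj: dict) -> None:
        # Route the object to the single controller responsible for its kind.
        self.controllers[obj["kind"]](obj)
```

For example, `assign_crd_owners({"db-operator": ["databases.example.com"], "backup-operator": ["databases.example.com"]})` raises a `ValueError`, signaling that the shared CRD should move into its own Operator.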
Among the best practices for runtime behavior, these are worth pointing out:
- Do not self-register CRDs
- Be capable of updating from a previous version of the Operator
- Be capable of managing an Operand from an older Operator version
- Use CRD conversion (webhooks) if you change your API or CRDs
The document contains additional runtime practices worth reading (please, don’t run as root).
This list, being a community effort, is of course open to contributions and suggestions. Maybe you are planning to write an Operator in the near future and are wondering how a certain problem would be best solved using this pattern? Or you recently wrote an Operator and want to share some of your own learnings as your users started to adopt this tool? Let us know via GitHub issues or file a PR with your suggestions and improvements. Finally, if you want to publish your Operator or use an existing one, check out OperatorHub.io.