At a high level, the Kubernetes control plane executes a relatively simple control loop: requests to the API are parsed by the master API service and, if accepted, are stored in etcd. A set of controllers and custom operators watch for changes in etcd and take action to converge the current state to the desired state.

Inside the master API process, a request goes through several stages before being accepted and stored in etcd. The stages are shown in the diagram below (additional details can be found here):

The primary steps consist of:

Authentication: the subject making the request is authenticated. Several authentication mechanisms can be used at the same time. For an overview of the authentication methods supported in OpenShift, see here.

Authorization: the request is authorized. In Kubernetes, both Attribute Based Access Control (ABAC) and Role Based Access Control (RBAC) authorization schemes are available, but OpenShift supports only RBAC. In addition, it is possible to configure exactly one authorization webhook.

Conversion and Defaulting: the request is converted to the format supported by the specific master API version, and missing fields are set to their default values.

Mutating Admission Controllers: mutating admission controllers can modify the body of the request. They are in-code plugins that can be enabled via the master API configuration; the standard OpenShift admission controllers are described here. Starting with Kubernetes 1.10, it is possible to extend this standard behavior by declaratively adding mutating admission webhooks, which are invoked only for the CREATE and UPDATE verbs.

Object Schema Validation: the JSON object in the request is validated against the API definition. This also works for CRDs for which a validation schema is defined.

Validating Admission Controllers: validating admission controllers work the same way as the mutating variety, except that they cannot modify the request. The validating admission webhook mechanism is described here in detail.

Etcd Access: the resource is retrieved from, created in, or updated in etcd.

This new webhook mechanism allows cluster administrators to declaratively add one or more validation steps. Typically, these are fine-grained validations that cannot be handled by the in-code admission controllers.

As this new option became more popular, more admission controllers were created in the open source space to implement validation use cases that were not previously handled by the platform.

While this was a step forward, in an enterprise environment an uncontrolled proliferation of admission controllers is organizationally untenable, and it introduces inefficiencies because each controller webhook must be called in sequence.

An improved approach would provide a centralized location where all additional custom admission policies could be defined and stored, and would expose a single webhook endpoint for admitting all requests.

Enter Open Policy Agent

Open Policy Agent (OPA) is a project that aims to provide a general-purpose policy definition and decision platform. The architecture looks as follows:


As depicted in the diagram above, the policy enforcement point is not part of OPA. In the OPA model, some external agent enforces policies for the system that requires protection. This is accomplished by querying OPA about the compliance of a request: OPA responds with a pass or deny result, and it is then the responsibility of the agent to enforce that decision.

OPA can make contextual decisions, where the context is not just the request being evaluated but also a database of facts. This approach allows for finer-grained policies than one could create by looking at the request alone, and it is one of the key strengths of OPA.

Policies and facts in OPA are expressed in the Rego language, a declarative language derived from Datalog, which is designed to answer queries by applying a set of rules (policies) to a database of facts.
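
To make this concrete, here is a minimal, purely illustrative Rego rule (the package name, data layout, and message are hypothetical, and it uses a simpler deny format than the policies shown later in this article). It combines the incoming request (input) with the database of facts (data) to reject requests that target a namespace OPA does not know about:

package example_admission

# Illustrative sketch: deny the request when the namespace it targets is not
# among the namespaces loaded into OPA as facts under data.kubernetes.
deny[msg] {
  not data.kubernetes.namespaces[input.request.namespace]
  msg := sprintf("namespace %v is unknown", [input.request.namespace])
}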

OPA can be integrated with several platforms (see the Get Started section of the documentation). Besides the integration with OpenShift, which is described below, the integration with Istio, which is outside the scope of this article, affords the opportunity to create fine-grained microservices authorization policies.

OPA Integration with OpenShift

OPA can present itself to OpenShift as both a mutating webhook and a validating webhook. In addition, OPA can offer a webhook for the authorization phase. The diagram below illustrates the integration architecture in the Kubernetes Deployment Language (KDL) notation:


Besides the OPA container, the architecture consists of two more components:

  1. kube-mgmt: this component is responsible for watching OpenShift resources and loading them into OPA as facts (see the sketch after this list). It is also responsible for loading policies into OPA; policies can be specified as ConfigMaps with a specific annotation.
  2. gatekeeper: this component receives the webhook callbacks from OpenShift and translates them into policy queries for OPA.
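
As a point of reference, once kube-mgmt has replicated the cluster state, policies can read it under data.kubernetes, organized by kind, namespace, and name (the same layout used by the policies later in this article). Here is a purely illustrative sketch, with a hypothetical namespace name:

# Illustrative only: collect the names of the pods that kube-mgmt has cached
# for a hypothetical namespace called "myproject".
pods_in_myproject[name] {
  data.kubernetes.pods["myproject"][name]
}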

In addition, OPA also offers an endpoint to execute an audit. The audit functionality allows you to run the policy rules against all previously accepted resources and produce a report on the resources that violate policies.

Installation

A Helm chart is available to deploy OPA with the above architecture. You can find detailed instructions here.

Note that additional manual steps are necessary if you want to enable the authorization webhook.

As a precaution, the Helm chart configures OPA to evaluate policies only for namespaces labeled with opa-controlled: 'true'. You can modify this constraint later on.

Policy Examples

The repository also contains a library of reusable policies.
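
Several of the policies below rely on helper rules from that library, such as matches, which pairs a resource kind with the objects kube-mgmt has cached under data.kubernetes. The actual definition in the repository may be more elaborate, but a simplified sketch of the idea is:

# Simplified sketch (the library's real definition may differ): yield
# [kind, namespace, name, resource] tuples for every cached object.
matches[[kind, namespace, name, resource]] {
  resource := data.kubernetes[kind][namespace][name]
}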

In these examples we have focused on admission and mutation policies. We haven’t explored authorization policies or the audit functionality.

Here are some examples:

Latest tag and IfNotPresent image pull policy not allowed together.

The latest tag is also known as a floating tag, and it is supposed to move to new images as they are added to a docker registry repository. The IfNotPresent image pull policy makes OpenShift pull a new image only if an image with the same name and tag is not already present in the local container runtime image storage. When used together, one can enter situations where a new image has been released, but it’s not being pulled by the container runtime. Given these undesirable results, it may make sense to not allow this combination of settings. An OPA policy to enforce this behavior would look like the following:

# a container violates the rule when it combines the IfNotPresent pull
# policy with an image tagged "latest"
validate_containers(containers) {
  containers.imagePullPolicy = "IfNotPresent"
  endswith(containers.image, ":latest")
}

deny[{
   "id": "pods-imagepullpolicy-latest",
   "resource": {"kind": "pods", "namespace": namespace, "name": name},
   "resolution": {"message": "image pull policy and image tag cannot be respectively IfNotPresent and latest at the same time"}
}] {
  matches[["pods", namespace, name, matched_workload]]
  containers := matched_workload.object.spec.containers[_]
  validate_containers(containers)
}

Setting a quota on LoadBalancer-type services.

LoadBalancer services, which are typically provisioned by cloud providers, are a billed resource. Having several of them allocated for a long period of time can become expensive. It may be beneficial to have a policy that limits the number of LoadBalancer services a project can create. Here is an example of how to implement that use case:

deny[{
   "id": "loadbalancer-service-quota",
   "resource": {"kind": "services", "namespace": namespace, "name": name},
   "resolution": {"message": "you cannot have more than 2 loadbalancer services in each namespace"}
}] {
  service := data.kubernetes.services[namespace][name]
  # collect all the LoadBalancer services already known in this namespace
  loadbalancers := [s | s := data.kubernetes.services[namespace][_]; s.object.spec.type == "LoadBalancer"]
  2 < count(loadbalancers)
}

CMDB Integration

Especially in enterprise environments, there may be a need to track OpenShift workloads in a Configuration Management Database (CMDB) for compliance reasons. One strategy to meet this requirement is to label workloads deployed in OpenShift with metadata so that it is possible to reference these resources in the CMDB. Often, additional fields are also added as labels (for example: tier or emergency contact). To make sure that all the workloads are deployed with the required labels, we can use the following policy:

 

required_labels = ["cmdb_id", "emergency_contact", "tier"]

deny[{
   "id": "cmdb-labels",
   "resource": {"kind": "deployments", "namespace": namespace, "name": name},
   "resolution": {"message": "all deployments must have the cmdb_id, emergency_contact and tier labels"}
}] {
  matches[["deployments", namespace, name, matched_deployment]]
  l := required_labels[_]
  not check_labels(matched_deployment.object.metadata.labels, l)
}

# succeeds when the given label key is present on the object
check_labels(obj, key) {
  obj[key]
}

Enforcing software licenses

There are several ways to license commercial software (per seat, per instance, per core, etc.), but they all rely on some counter that needs to stay below the licensed count. It can be relatively easy to help ensure that a single project is compliant with a license count, but how can compliance be managed at the cluster level? Assuming the license count can be ascertained from the workload description, an OPA policy can be used to prevent the cluster from exceeding the maximum value allowed. In this example, imagine the hypothetical scenario in which the software is licensed by CPU core and the workload can be identified via the image name; compliance can then be enforced with the following policy:

default max_cpu_requests = 500
default licensed_image = "myrepo/myimage:v3.2"

deny[{
   "id": "software-license",
   "resolution": {"message": sprintf("we cannot have more than %v total cpu cores for the %v workload", [max_cpu_requests, licensed_image])}
}] {
  # all the containers in the cluster that run the licensed image
  containers := [c | c := data.kubernetes.pods[_][_].object.spec.containers[_]; c.image == licensed_image]
  # CPU requests expressed in millicores (e.g. "250m") and in cores (e.g. "2")
  container_millicore_requests := [s | num := containers[_]; s = process_millicore_cpu(num.resources.requests.cpu)]
  container_core_requests := [s | num := containers[_]; s = process_core_cpu(num.resources.requests.cpu)]
  total_requests := sum(container_millicore_requests) + sum(container_core_requests)
  total_requests > max_cpu_requests
}

# converts a millicore request such as "250m" to its value in cores (0.25)
process_millicore_cpu(obj) = millicore_cpu_result {
  re_match("m$", obj)
  regex.split("m$", obj, parsed_obj)
  to_number(parsed_obj[0], int_obj)
  millicore_cpu_result = int_obj / 1000
}

# a request without the "m" suffix is already expressed in cores
process_core_cpu(obj) = core_cpu_result {
  not re_match("m$", obj)
  to_number(obj, int_obj)
  core_cpu_result = int_obj
}

Preventing mounting the service account secret

By default, all pods deployed to OpenShift will mount secrets associated with the service account specified. Arguably, the default should be the opposite for the following reasons:

  1. Most applications do not need to talk to the master API and therefore do not need that secret.
  2. Currently, the mounted secret represents the identity of the service account, not of the workload. Multiple workloads can run under the same service account and will therefore have the same identity. Also, service account tokens are very hard to revoke. A redesigned service account model will be made available starting in Kubernetes 1.13, but for now, it is more secure not to mount that secret if it is not strictly necessary.
  3. The service account secret makes the kubelet poll the master API to check for secret updates. This constant polling creates unnecessary load on the master API.

A better design would be to not have that secret mounted unless explicitly requested by the user (for example, with an annotation such as requires-service-account-secret: "true"). Here is an example of how such a policy would look:

default no_sa_annotation = "requires-service-account-secret"

deny[{
  "id": "no-serviceaccount-secret",
  "resource": {"kind": "pods", "namespace": namespace, "name": name},
  "resolution": {"patches": p, "message": "service account secret not mounted"}
}] {
  matches[["pods", namespace, name, matched_pod]]
  isCreateOrUpdate(matched_pod)
  isMissingOrFalseAnnotation(matched_pod, no_sa_annotation)
  # automountServiceAccountToken is a boolean field, so the patch value is a boolean
  p = [{"op": "add", "path": "/spec/automountServiceAccountToken", "value": false}]
}

Conclusion

OPA seems like a promising approach for managing fine-grained policies in OpenShift. The project is still evolving, so expect changes as the open source community matures the solution.

While there is an initial learning curve to becoming proficient at writing policy rules in Rego, once mastered, the language is well suited to writing these types of policies.


About the author

Raffaele is a full-stack enterprise architect with 20+ years of experience. Raffaele started his career in Italy as a Java Architect then gradually moved to Integration Architect and then Enterprise Architect. Later he moved to the United States to eventually become an OpenShift Architect for Red Hat consulting services, acquiring, in the process, knowledge of the infrastructure side of IT.
