Kubernetes 1.8 was released recently, and plenty of exciting features have been introduced or graduated to beta or general availability. In this post, I’m going to report on two features I’ve been working on and that shipped with Kubernetes 1.8.
My adventure with Kubernetes development started two and a half years ago when my first proposal was presented to the public. It was about implementing a job resource. It started from my reading of the Borg paper, where batch tasks co-existed with long-running services, serving as a one-time execution environment. Although my original proposal combined what are now known as Job and CronJob into one, the initial reviewers suggested we should keep the two separate. This is how CronJobs were born. It’s also worth mentioning that initially they were named ScheduledJobs. Because the majority agreed to build CronJobs on top of Jobs, the latter took most of my time and were released in Kubernetes 1.1 in September 2015. Finishing that part of the story was definitely one of the best birthday presents I could get :)
The simplest example that allows running a job is the following:
$ kubectl run pi --image=perl --restart=OnFailure -- perl -Mbignum=bpi -wle 'print bpi(2000)'
This creates a very simple job that runs one pod, which computes pi to 2000 places and prints it out.
The CronJob implementation was released a year later, as part of Kubernetes 1.4, again in September (2016): another birthday present. Although the API resources were already available in 1.3, there’s some “mystery” behind CronJobs: they apparently always slip releases.
The simplest example that runs a CronJob is:
$ kubectl run pi --schedule="0/5 * * * ?" --image=perl --restart=OnFailure -- perl -Mbignum=bpi -wle 'print bpi(2000)'
This, in turn, creates a CronJob which will spin up a new job every 5 minutes. The result of running each job is exactly the same as before.
Again, a year later in September 2017, we’ve finally migrated CronJobs to beta. Maybe the September release is the release where all jobs-related resources should migrate, and looking at the past two years I’m willing to make that a rule.
In the meantime, many people provided feedback and greatly helped improve both Jobs and CronJobs. The main new features that landed as part of the 1.8 release are:
- For Job, it’s the failure policy, which lets us specify how many failures a job can have before it is marked as failed, with a default value of 6. Previously, the job controller would retry a job indefinitely, causing a lot of problems; see kubernetes/kubernetes#30243, kubernetes/kubernetes#43512, socialwifi/kubepy#18, and this tweet.
- For CronJob, we’ve migrated to batch/v1beta1, which guarantees a higher level of stability of the API. With that, we’ve changed the default successful and failed history limits to 3 and 1, respectively.
- In an upcoming release, we plan to remove the batch/v2alpha1 version and entirely remove support for the old ScheduledJob resource name.
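To make the new fields concrete, here is a minimal sketch of a CronJob manifest that spells out the 1.8 defaults explicitly. The field names (`backoffLimit`, `successfulJobsHistoryLimit`, `failedJobsHistoryLimit`) are the ones I understand the batch API to use for the features above. The file name, the resource name `pi`, and the standard five-field schedule `*/5 * * * *` are illustrative choices rather than anything taken from the commands earlier in this post.

```shell
# Write a CronJob manifest that sets the 1.8 default values explicitly.
# File name and resource name are illustrative.
cat > pi-cronjob.yaml <<'EOF'
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pi
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 3   # keep the 3 most recent successful jobs
  failedJobsHistoryLimit: 1       # keep only the most recent failed job
  jobTemplate:
    spec:
      backoffLimit: 6             # mark the job failed after 6 retries
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: pi
            image: perl
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
EOF

# Against a 1.8 cluster you could then create it with:
#   kubectl create -f pi-cronjob.yaml
```

Since these values match the defaults described above, leaving the three fields out should behave the same way; writing them down mostly serves as documentation and as a starting point for tuning.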
A lot of work is still needed to improve CronJobs in the upcoming releases. That includes being able to manually kick off a CronJob and, most importantly, improving the controller implementation.
If you’re interested in the development here, I suggest you keep an eye on Issue 19 and propose any improvements you find useful for CronJobs.
Because of my previous experience in writing auditing systems, I was tasked with writing a simple add-on to OpenShift allowing a cluster administrator to log user actions. A few weeks later a similar request appeared in Kubernetes, so I moved the logic upstream.
With this addition, I started gathering feedback about further improvements to this feature. I put together a proposal for advanced auditing and talked about it during this year's KubeCon EU. The initial implementation landed in the 1.7 timeframe as an alpha feature. Kubernetes 1.8 promotes it to beta and additionally expands it with the following:
- Improved policy rules, which now allow specifying resource and sub-resource matching, as well as omitting stages every request goes through.
- A new output format (JSON) when logging to a file.
- Logging of failed authentication requests; previously, only authenticated requests were logged.
Although a lot has changed between the initial implementation and the current state of audit, its inner workings are still the same. Audit is yet another filter in the chain every request has to go through; see the section Request Flow and Processing in this API Server deep dive post for details about what a filter is.
Keep an eye on Issue 22 for the timeline and improvements to be implemented for audit.
To set up advanced audit in your cluster, all you need is a simple policy file, which can look like this:
- level: Metadata
This logs every request at the Metadata level. There are four possible levels:
- None: don’t log events that match this rule.
- Metadata: log request metadata (requesting user, timestamp, resource, verb, etc.) but not request or response body.
- Request: log event metadata and request body but not response body.
- RequestResponse: log event metadata, request, and response bodies.
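To show how the four levels can be combined, here is a sketch of a fuller policy file. It assumes the `audit.k8s.io/v1beta1` API version that accompanies the beta feature in 1.8, and it assumes that rules are evaluated in order with the first match deciding the level. The file name and the monitoring service account are hypothetical.

```shell
# Write a sample audit policy that uses each level once.
# File name and the monitoring user are illustrative.
cat > audit-policy.yaml <<'EOF'
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
# Drop read-only requests from a (hypothetical) monitoring account.
- level: None
  users: ["system:serviceaccount:monitoring:scraper"]
  verbs: ["get", "list", "watch"]
# Full request and response bodies for changes to pods.
- level: RequestResponse
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: ""            # "" is the core API group
    resources: ["pods"]
# Sub-resource matching: metadata only for pod logs and status.
- level: Metadata
  resources:
  - group: ""
    resources: ["pods/log", "pods/status"]
# Request bodies, but not responses, for configmaps.
- level: Request
  resources:
  - group: ""
    resources: ["configmaps"]
# Catch-all: everything else at Metadata, skipping the RequestReceived stage.
- level: Metadata
  omitStages:
  - "RequestReceived"
EOF
```

Because of the first-match assumption, the order matters: the narrow `None` rule comes first so that it is not shadowed by the broader rules below it, and the catch-all goes last.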
Next, you need to specify the following flags when launching the API server:
- --audit-policy-file, which points to the policy file defined above
- --audit-log-path, which is the location to place the audit logs
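Put together, the invocation might look like the sketch below. Both paths are hypothetical, and `--audit-log-format=json` is my assumption for how to select the JSON output mentioned earlier, so check the flag against the docs for your version. The snippet only assembles and prints the command line, so it is safe to run anywhere.

```shell
# Hypothetical locations; adjust to wherever your cluster keeps these files.
AUDIT_POLICY=/etc/kubernetes/audit-policy.yaml
AUDIT_LOG=/var/log/kubernetes/audit.log

# The two flags from the text, plus an assumed flag for JSON output.
APISERVER_AUDIT_FLAGS="--audit-policy-file=${AUDIT_POLICY} --audit-log-path=${AUDIT_LOG} --audit-log-format=json"

# Print the resulting command line instead of starting the API server.
echo "kube-apiserver ${APISERVER_AUDIT_FLAGS}"
```

In a real deployment these flags would be appended to whatever arguments your API server already runs with, for example in its static pod manifest or systemd unit.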
With the above setup, you should now be able to see basic information about every request hitting the API server. If you’re interested in how to tweak advanced audit to squeeze more information out of it, check out the official docs.