Today, I’d like to talk a bit more about some of the base Kubernetes functions and some of the specific features available on Red Hat OpenShift to support those functions.

Following a previous blog I wrote on Kubernetes and OpenShift, the Red Hat platform built around it, I wanted to expand a bit more on some of the basic concepts around cluster and application scalability in Kubernetes.

All right, so how do we size an application in Kubernetes?

One of the questions that keeps coming back as part of my job is how to scale workloads in Kubernetes. Most people in our industry have a good understanding of scaling applications in a bare-metal or virtualized environment (typically by providing more resources to the VM or server running the application, also called Vertical Scaling) but are not always very familiar with how to do this for Kubernetes.

As we saw in the previous blog, applications in Kubernetes are effectively made of a group of PoDs distributed on the various worker nodes (please see the previous blog for more details on worker nodes and PoDs).

Scaling in Kubernetes is handled using a Replication Controller or a Replica Set (there are some subtle differences between the two, but they effectively provide the same function: controlling how many PoDs should be running in the environment based on metrics or design decisions). The Replication Controller's job is to ensure that a specified number of PoD replicas is running at all times. As always with Kubernetes, a Replication Controller or Replica Set can be entirely described using a YAML file. An example of a Replication Controller resource, as a YAML file, is just below.

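Here is a minimal sketch of what such a resource looks like (the ostoy name, labels, and image are illustrative placeholders, not the exact file from the workshop):

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: ostoy-rc
    spec:
      replicas: 3                # how many PoD replicas to keep running
      selector:
        app: ostoy               # PoDs carrying this label are managed by this RC
      template:                  # PoD template used to create new replicas
        metadata:
          labels:
            app: ostoy
        spec:
          containers:            # container specifications
          - name: ostoy
            image: quay.io/ostoylab/ostoy-frontend:1.6.0   # illustrative image
            ports:
            - containerPort: 8080
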
As you can see, the Replication Controller describes how many replicas of the PoDs should be deployed (the replicas field), as well as the container specifications (the containers section of the PoD template).

One of the limitations of Replication Controllers, though, is that they do not provide the ability to roll out changes and roll them back if necessary. So, for example, if a user needed to deploy an update to an application and then, for whatever reason, needed to roll back to the previous version of this application, this would require a fair bit of manual work.

This is where Deployments come in.

With a Deployment, it is possible to:

  • Create a deployment (for example, deploying an application)
  • Update a deployment (for example, deploying a new version)
  • Do rolling updates (zero downtime deployments)
  • Roll back to a previous version
  • Pause/Resume a deployment (for example, to roll out to only a certain percentage)
  • Scale the deployment up or down to follow the load (for example, increase or decrease the number of PoDs for this application).

Under the hood, Deployments make use of Replica Sets.
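For reference, here is a sketch of how the operations listed above map to kubectl commands (the my-app deployment and image names are illustrative):

    # Create a deployment from a YAML file
    kubectl apply -f my-app-deployment.yaml

    # Update the deployment with a new image version (triggers a rolling update)
    kubectl set image deployment/my-app my-app=registry.example.com/my-app:v2

    # Roll back to the previous version
    kubectl rollout undo deployment/my-app

    # Pause/resume a rollout
    kubectl rollout pause deployment/my-app
    kubectl rollout resume deployment/my-app

    # Scale the deployment up or down manually
    kubectl scale deployment/my-app --replicas=5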

To make it more practical, I’ll use an example taken from one of the workshops I run. One of the applications in this workshop, os-toy, allows the user to play with Kubernetes and discover various aspects of the technology.

One part of the tutorial asks the student to use a Deployment to create an application. The following is an extract of the os-toy deployment YAML file:

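A sketch of what that extract looks like (the image path, port, and resource values are illustrative assumptions, not necessarily the exact workshop file):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ostoy-frontend
      labels:
        app: ostoy
    spec:
      replicas: 1                      # how many PoDs to run
      selector:
        matchLabels:
          app: ostoy
      template:
        metadata:
          labels:
            app: ostoy
        spec:
          containers:
          - name: ostoy-frontend
            image: quay.io/ostoylab/ostoy-frontend:1.6.0   # illustrative image
            ports:
            - containerPort: 8080      # port exposed by the container
              protocol: TCP
            resources:                 # resources associated with the container
              requests:
                cpu: 100m
                memory: 256Mi
              limits:
                cpu: 200m
                memory: 512Mi
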
As you can see from the YAML file, the Deployment contains all the information required to describe the application: how many PoDs to run (the replicas field), what container image to use (the image field), the ports associated with the containers (the ports section), and the resources associated with the containers (the resources section).

If we use the command line to display the Deployment, we can see that it refers to the Replica Set, and when we display the PoDs, they carry the name of the RS (cf8bfb4c):
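The output looks something like this (illustrative values; note the cf8bfb4c hash shared by the RS and the PoDs):

    $ kubectl get deployment
    NAME             READY   UP-TO-DATE   AVAILABLE   AGE
    ostoy-frontend   1/1     1            1           5m

    $ kubectl get rs
    NAME                      DESIRED   CURRENT   READY   AGE
    ostoy-frontend-cf8bfb4c   1         1         1       5m

    $ kubectl get pods
    NAME                            READY   STATUS    RESTARTS   AGE
    ostoy-frontend-cf8bfb4c-7x2kp   1/1     Running   0          5m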

Now, back to the point of this blog (specifically, scaling applications): by issuing the following command, it is possible to manually increase the number of PoDs for this application (please note, all the PoDs follow the naming used for the RS, with the cf8bfb4c hash):
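A sketch of that manual scale-up (the deployment name and PoD suffixes are illustrative):

    $ kubectl scale deployment/ostoy-frontend --replicas=3
    deployment.apps/ostoy-frontend scaled

    $ kubectl get pods
    NAME                            READY   STATUS    RESTARTS   AGE
    ostoy-frontend-cf8bfb4c-7x2kp   1/1     Running   0          9m
    ostoy-frontend-cf8bfb4c-b6d4n   1/1     Running   0          15s
    ostoy-frontend-cf8bfb4c-w9zqt   1/1     Running   0          15s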

However, a manual process to scale applications up and down is not really convenient, so to solve this problem and automate this capability, Horizontal PoD Autoscalers (HPAs) were introduced in Kubernetes.

The idea with those HPAs is to provide an automated control loop that watches the PoDs of a specific deployment and, based on metric utilization (CPU, or any other defined metric), decides to either increase or decrease the number of PoDs.

This is shown in the following figure:

So going back to our os-toy application and associated deployment, let’s create an HPA for it:
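One straightforward way is the kubectl autoscale command (the CPU threshold and min/max bounds below are illustrative choices):

    # Keep between 1 and 10 PoDs, targeting 80% average CPU utilization
    $ kubectl autoscale deployment/ostoy-frontend --cpu-percent=80 --min=1 --max=10
    horizontalpodautoscaler.autoscaling/ostoy-frontend autoscaled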

By clicking on the link in the os-toy application, we then generate load for this application (that is, increase its CPU usage), and thanks to the HPA, we then see multiple PoDs running. I have included both the CLI output (from running the kubectl command) as well as the OpenShift web UI:
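On the CLI side, the HPA status looks something like this (illustrative values):

    $ kubectl get hpa
    NAME             REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    ostoy-frontend   Deployment/ostoy-frontend   92%/80%   1         10        4          3m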

How about the Kubernetes cluster itself?

Now, it’s great that Kubernetes can dynamically scale the PoDs up and down using Deployments, Replica Sets, and HPAs. However, for this feature to be useful, the underlying infrastructure must also be able to follow a similar pattern. Otherwise, we fall back into the old way of handling infrastructure: overprovisioning, typically by designing it to handle peak workload conditions that may occur only a few times a day, a week, a month, or a year.

Similarly to the notion of Replica Sets, in OpenShift we also define Machines and Machine Sets. Machines represent hosts (think EC2 instances in AWS, for example), while Machine Sets group machines of the same type together (think EC2 instance type in AWS).

Those Machine Sets are used to scale the number of machines up or down (similar to what Replica Sets do for PoDs). This operation of increasing or decreasing the number of machines can be done manually, but again, for automation, Machine Autoscalers (which define the minimum and maximum number of machines within a Machine Set) are used.
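A sketch of a Machine Autoscaler resource (the Machine Set name and replica bounds are illustrative):

    apiVersion: autoscaling.openshift.io/v1beta1
    kind: MachineAutoscaler
    metadata:
      name: worker-us-east-1a
      namespace: openshift-machine-api
    spec:
      minReplicas: 1               # never scale this Machine Set below 1 machine
      maxReplicas: 6               # never scale it above 6 machines
      scaleTargetRef:              # the Machine Set this autoscaler controls
        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        name: mycluster-worker-us-east-1a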

Finally, to make sure that there is an overall control for the size of the Kubernetes cluster, a cluster autoscaler is then used to define the overall min-max for all the Machine Autoscalers.
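And a sketch of the corresponding Cluster Autoscaler resource (the limits are illustrative; in OpenShift this is a single cluster-wide resource named default):

    apiVersion: autoscaling.openshift.io/v1
    kind: ClusterAutoscaler
    metadata:
      name: default
    spec:
      resourceLimits:
        maxNodesTotal: 12          # overall ceiling on the number of nodes
        cores:                     # total CPU cores across the cluster
          min: 8
          max: 96
        memory:                    # total memory across the cluster, in GiB
          min: 4
          max: 256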

This is presented in the figure below:

If you want to see some of those elements in action, I have uploaded several short videos (around 2-3 minutes) demonstrating some of the basic capabilities presented in this section.

You can have a look at:

What’s next?

In this blog, we’ve looked at how Deployments are used in Kubernetes to control the number of PoDs for a specific application and how they can be automatically scaled up and down using Horizontal PoD Autoscalers (HPAs).

Similarly, to support the dynamic nature of those applications, we have covered how Machines, Machine Sets, and Cluster Autoscalers can be used in OpenShift to scale the infrastructure up and down automatically as well.

For those of you who are interested in trying the workshop, I have added a link (you can do most of it yourself).