Let’s face it: networks are not always reliable. Your datacenter robot accidentally disconnects an Ethernet cable. A UPS goes bad and you lose your switch. Any number of things can interfere with your application continuity. In the world of bare metal and virtual machines, testing that interference is well understood. Unplug the cable or disconnect the virtual NIC, and you have simulated an outage. But what can you do for more nuanced circumstances like general network latency? What if we are not even running on bare metal or virtual machines but instead on your favorite Kubernetes distribution, OpenShift? Testing these conditions in pods can get much more subtle and confusing when you consider the number of layers that make up the entirety of the OpenShift networking stack. In this blog post, we will show you how to use the Traffic Control (tc) utility and initContainers on OpenShift 4.7 (and earlier) to introduce latency on a pod’s virtual interface. This lets you test in lower environments and helps your application remain resilient to local bandwidth issues, or to conditions after a swapover during a disaster recovery or continuity-of-business event.

A Brief Introduction to tc

tc is a utility used to configure Traffic Control in the Linux kernel. Traffic Control allows you to shape, schedule, police, and drop traffic by using three kinds of objects: qdiscs, classes, and filters. We will focus on qdiscs as the queue mechanism to introduce a traffic delay.

The command we will use to introduce the delay is:

tc qdisc add dev eth0 root netem delay 1s

What is that command actually doing? It is modifying the queuing discipline (qdisc) and adding a new rule to device eth0 (add dev eth0) on the root (egress qdisc) by emulating a WAN property (netem) with a delay of one second (delay 1s).

Setting up the Sandbox

For our sandbox, we will be using OpenShift 4.7, but these steps have been tested on OpenShift 4.6 and 4.5. This has also been tested with both ubi7-minimal and ubi8-minimal images. All setup on OpenShift was done with a user that has system:admin permissions.

Part 1: OpenShift Setup

First, we will create a project named “latency” and deploy the httpd-example template. Using the httpd-example template is not required; any Deployment/DeploymentConfig should work, but it does give us a good baseline for testing. We will also create a ServiceAccount to which we can attach the tc-scc Security Context Constraint (SCC) that we will create:

$ oc new-project latency
$ oc new-app httpd-example
$ oc create serviceaccount tc-sa

We will also need to create a custom SCC to grant the tc container the permissions it needs to actually change the pod interface. In this case, we need the NET_ADMIN capability granted by the following SCC:

$ cat tc-scc.yaml
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: tc-scc
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowHostNetwork: false
allowedCapabilities:
- NET_ADMIN
runAsUser:
  type: MustRunAsNonRoot
seLinuxContext:
  type: RunAsAny
volumes:
- configMap
- downwardAPI
- persistentVolumeClaim
- projected
- secret

$ oc apply -f tc-scc.yaml

# Add tc-scc to the tc-sa ServiceAccount
$ oc adm policy add-scc-to-user tc-scc -z tc-sa

One note of caution on this SCC is that generally NET_ADMIN is too broad of a permission to allow, except in circumstances where it is absolutely necessary. Even in those cases, it is safest to use something like a seccomp profile to reduce the available syscalls to the containers using this SCC. In this SCC, we are mitigating some of the risk by setting allowHostNetwork: false (the default), enforcing that the container must run as non-root, and keeping this SCC only in a development environment.
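As a sketch of that hardening idea, SCCs support a seccompProfiles field that can pin pods to the container runtime's default seccomp profile, trimming the syscall surface (verify the field against your cluster version's SCC schema before relying on it):

```yaml
# Hypothetical hardened variant of tc-scc: restrict pods using this SCC
# to the runtime's default seccomp profile. All other tc-scc fields
# remain as shown above.
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: tc-scc
seccompProfiles:
- runtime/default
```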

Next, we need a place to put our image once it is built. In this sandbox, we have exposed the internal image registry, so registry references from here forward point to our sandbox cluster. You can use whatever image registry you would like as long as OpenShift can reach it. In this case, we will just create an ImageStream where we can push the image:

$ oc create imagestream tc

There is one final step that we need to take to allow the tc container to run. Since tc is manipulating kernel functionality, certain kernel modules must be loaded depending on which subcommand you execute. In this case, we need the sch_netem kernel module. We will use a MachineConfig to put a file in /etc/modules-load.d/ which contains the name of our module.

First we need to get the base64 representation of the file contents to include in the MachineConfig:

$ echo -n "sch_netem" | base64 
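If you want to sanity-check the encoding before pasting it into the MachineConfig, you can confirm that it round-trips back to the original module name:

```shell
# Encode the module name exactly as the MachineConfig expects
# (-n: no trailing newline, which would change the encoding)
encoded=$(echo -n "sch_netem" | base64)
echo "$encoded"              # c2NoX25ldGVt
# Decode it back to confirm the round trip
echo "$encoded" | base64 -d  # sch_netem
```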

Then we can create the MachineConfig YAML file and include the base64 we generated on the source: line:

$ cat mc.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-modprobe-sch-netem
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,c2NoX25ldGVt
          verification: {}
        path: /etc/modules-load.d/sch_netem.conf

To wrap up this step, all you need to do is apply the YAML to the cluster with oc apply -f mc.yaml and wait for the machine-config-operator to apply it to your worker nodes.

Part 2: Creating the tc Image

Before we can effect any change on a pod, we will need to get tc into a container image. We will be using podman to accomplish this.

First, create a Dockerfile with the following content (which may need to be adjusted if you are pulling images from a non-Red Hat repository):

$ cat Dockerfile
FROM registry.redhat.io/ubi8-minimal
RUN echo -e "\
[rhel8-3] \n\
" > /etc/yum.repos.d/rhel8-3.repo
# Install tc along with ping and ps for testing and debugging
RUN microdnf -y install --enablerepo=rhel8-3 --nodocs iproute-tc iputils procps-ng && microdnf clean all
# Grant the tc binary the NET_ADMIN capability so a non-root user can modify qdiscs
RUN setcap cap_net_admin+ep /usr/sbin/tc
USER 1000

Second, build the image and verify:

$ podman build -t tc-ubi8-minimal -f Dockerfile
$ podman images | grep tc-ubi8-minimal

Lastly, we need to tag and push the image to a registry. We will push the image to the tc ImageStream we created in Part 1:

$ podman tag localhost/tc-ubi8-minimal cluster.example.com/latency/tc
$ podman push cluster.example.com/latency/tc

Slow It Down!

Now that all of the setup work is done, we can put our creation into practice. Introducing networking latency at a fundamental level in the pod has very noticeable effects, so before making any changes, we should develop a baseline. We will do this using file download time as a metric by making a curl request to https://access.redhat.com:

$ oc rsh dc/httpd-example
sh-4.4$ curl https://access.redhat.com -o redhat.html
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  114k  100  114k    0     0   266k      0 --:--:-- --:--:-- --:--:--  266k

We can see in the output that there’s virtually no lag in downloading the file. Now that our baseline is established, we can move forward with introducing the tc container to the DeploymentConfig. Edit the DeploymentConfig with oc edit dc, then under the spec.template.spec key add the following YAML:

      serviceAccount: tc-sa
      initContainers:
      - command: ["tc","qdisc","add","dev","eth0","root","netem","delay","1s"]
        name: tc
        image: image-registry.openshift-image-registry.svc:5000/latency/tc
        securityContext:
          capabilities:
            add:
            - NET_ADMIN

There are two key pieces of information here that are influenced by the YAML hierarchy. The first is the serviceAccount; that is added to the spec to allow the pod to leverage the SCC that we created and linked to the tc-sa service account. The second is initContainers; this stanza tells the deployment to run the tc container before the other container(s) in the DeploymentConfig. We define the command that we want to run, a name for the container, and the image to use, and we are requesting that the NET_ADMIN capability be added to the container.
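For orientation, here is roughly how those pieces nest inside the DeploymentConfig (a sketch; the existing httpd-example container definition is abbreviated and its name is assumed from the template):

```yaml
spec:
  template:
    spec:
      serviceAccount: tc-sa
      initContainers:
      - name: tc
        image: image-registry.openshift-image-registry.svc:5000/latency/tc
        command: ["tc","qdisc","add","dev","eth0","root","netem","delay","1s"]
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
      containers:
      - name: httpd-example
        # ...existing container definition unchanged...
```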

After saving your changes, a new deployment will happen automatically. You will notice that there is now an init step during the deployment and also a general slowdown with the deployment compared to the initial deployment. Now we will collect our metrics again to show the speed decrease:

$ oc rsh dc/httpd-example
sh-4.4$ curl https://access.redhat.com -o redhat.html
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  114k    0  114k    0     0   8857      0 --:--:--  0:00:13 --:--:-- 19205

Now we can see that it took around 13 seconds to download. Clearly, this is a measurable decrease in overall speed due to the latency introduced by the tc container. As an alternative measure, you can run a debug container with elevated permissions and use ping to see the before and after of using the tc command. The output has been snipped for readability, but you can see that after running the tc command, there is an additional 1000ms of latency on the pings:
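Why roughly 13 seconds for a file that previously downloaded with no perceptible lag? The netem delay applies to every packet the pod sends, so each round trip to the server now costs about an extra second, and even a small HTTPS download needs a number of round trips (TCP and TLS handshakes plus TCP window growth). A back-of-envelope sketch, where the round-trip count is an assumption for illustration:

```python
# Rough model: every round trip is stretched by the egress delay added by netem.
egress_delay_s = 1.0   # from "netem delay 1s"
round_trips = 13       # assumed: TCP + TLS handshakes plus TCP window growth
extra_time_s = egress_delay_s * round_trips
print(f"~{extra_time_s:.0f}s of added download time")  # ~13s of added download time
```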

$ oc debug dc/httpd-example -c tc --as-root
sh-4.4# ping access.redhat.com -c 3 | head -n 4 | tail -n 3 | cut -d " " -f 8

sh-4.4# tc qdisc add dev eth0 root netem delay 1s

sh-4.4# ping access.redhat.com -c 3 | head -n 4 | tail -n 3 | cut -d " " -f 8

What Next?

The example we have shown is relatively simplistic, but the tc utility has many more capabilities that can be leveraged to influence traffic. It can shape, drop, and filter traffic among many other more granular activities. Check out the documentation for your version of tc to see all of the capabilities available.

If you like using tc, you could explore creating a privileged daemonset with a web-based interface to control traffic to specific nodes or pods. Or, if you are like me and want to know that you have the power of Red Hat Support behind you, consider using the traffic management capabilities of OpenShift Service Mesh or split traffic between application revisions with OpenShift Serverless.

