This post updates the previous version, which was based on OpenShift 3.9, with the relevant changes for OpenShift 3.10. Notably, the device manager API is marked as GA in OpenShift 3.10.

Introduction

This blog post shows how to use NVIDIA GPUs in OpenShift 3.10. We start with a description of the environment, then show how to set up the host. Host setup includes the driver and container runtime hook installations, both required to use NVIDIA GPUs with OpenShift and Kubernetes. Next, in Part 2, we show how to install and enable the NVIDIA GPU device plugin on an OpenShift cluster.

Environment Overview

  • Red Hat Enterprise Linux 7.5, CentOS PostgreSQL 10 image
  • OpenShift Container Platform 3.10 Cluster running on AWS
  • Container Runtime: crio-1.10.5 or docker-1.13.1
  • Container Tools: podman-0.6.1, buildah-1.1, skopeo-0.1.30
  • Master node: m4.xlarge
  • Infra node: m4.xlarge
  • Compute node: p3.2xlarge (One NVIDIA Tesla V100 GPU, 8vCPUs and 61GB RAM)

Host Preparation

NVIDIA drivers for RHEL must be installed on the host as a prerequisite for using GPUs with OpenShift. Let’s prepare the host by installing NVIDIA drivers and NVIDIA container enablement. The following procedures will make containerized GPU workloads possible in OpenShift, leveraging the Device Plugin feature in OpenShift 3.10.
The yaml and configuration files used for this blog can be found at https://github.com/redhat-performance/openshift-psap/tree/master/blog/gpu/device-plugin

Check out the needed files; we will refer to them as device-plugin/<file> throughout the blog.

# git clone https://github.com/redhat-performance/openshift-psap

Part 1: NVIDIA Driver Installation

NVIDIA drivers are compiled from source. The build process requires the kernel-devel package to be installed.

# yum install kernel-devel-`uname -r`

The xorg-x11-drv-nvidia package requires the DKMS package. DKMS is not supported or packaged by Red Hat; work is underway to remove the NVIDIA driver’s dependency on DKMS for Red Hat distributions. In the meantime, DKMS can be installed from the EPEL repository.

First, install the EPEL repository.

# yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

The newest NVIDIA drivers are located in the following repository.

# yum install -y https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.2.88-1.x86_64.rpm

Auxiliary tools and libraries are contained in the following packages. This will also install the nvidia-kmod package, which includes the NVIDIA kernel modules.

# yum -y install xorg-x11-drv-nvidia xorg-x11-drv-nvidia-devel

Remove the nouveau kernel module, otherwise the nvidia kernel module will not load. The installation of the NVIDIA driver package blacklists the driver on the kernel command line (nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off), so the nouveau driver will not be loaded on subsequent reboots.

# modprobe -r nouveau
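
To confirm that nouveau is gone, check the loaded modules; after a reboot, the blacklist parameters mentioned above should also show up on the kernel command line (no output from the first command means the module is unloaded):

# lsmod | grep nouveau
# cat /proc/cmdline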

Load the NVIDIA and the unified memory kernel modules.

# nvidia-modprobe && nvidia-modprobe -u

Verify that the installation succeeded and the drivers are working. The extracted GPU name can later be used to label the node in OpenShift, as illustrated below.

# nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'
Tesla-V100-SXM2-16GB
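
As an illustration, the extracted model name could be attached to the node as a label once the cluster is available. The nvidia.com/gpu-model key below is only an example; Part 2 of this blog uses a simpler openshift.com/gpu-accelerator=true label instead.

# GPU_NAME=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g')
# oc label node <your-gpu-node> nvidia.com/gpu-model=${GPU_NAME}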

Adding the nvidia-container-runtime-hook

The version of docker shipped by Red Hat includes support for OCI runtime hooks. Because of this, we only need to install the nvidia-container-runtime-hook package and create a hook file. On other distributions of docker, additional steps may be necessary. See NVIDIA’s documentation for more information.

The next step is to add the libnvidia-container and nvidia-container-runtime repositories.

# curl -so /etc/yum.repos.d/nvidia-container-runtime.repo https://nvidia.github.io/nvidia-container-runtime/centos7/nvidia-container-runtime.repo

The next step installs an OCI prestart hook. The prestart hook is responsible for making NVIDIA libraries and binaries available in a container (by bind-mounting them in from the host). Without the hook, users would have to include the libraries and binaries in each and every container image that might use a GPU. Hooks simplify the management of container images by ensuring that only a single copy of the libraries and binaries is required. The prestart hook is triggered by the presence of certain environment variables in the container’s Dockerfile, such as NVIDIA_DRIVER_CAPABILITIES=compute,utility.

# yum -y install nvidia-container-runtime-hook

The next step is to make the container runtime aware of the hook. To activate the hook in CRI-O or podman, create the following JSON file and install podman.

# cat <<'EOF' >> /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
{
    "hook": "/usr/bin/nvidia-container-runtime-hook",
    "arguments": ["prestart"],
    "annotations": ["sandbox"],
    "stage": [ "prestart" ]
}
EOF
# yum -y install podman

To use the hook with docker, create the following bash script and make it executable.

# cat <<'EOF' >> /usr/libexec/oci/hooks.d/oci-nvidia-hook
#!/bin/bash
/usr/bin/nvidia-container-runtime-hook "$@"
EOF

# chmod +x /usr/libexec/oci/hooks.d/oci-nvidia-hook

Everything is now set up for running a GPU-enabled container. To verify correct operation of the driver and container enablement, try running a cuda-vector-add container. We can run the container with either podman or docker.

# podman run --privileged -it --rm docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
# docker run --privileged -it --rm docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

If the test passes, the drivers, hooks and the container runtime are functioning correctly and we can move on to configuring OpenShift.

Part 2: OpenShift 3.10 with the GPU Device Plugin

The Device Plugin API is now GA in OpenShift 3.10, and is enabled by default. Readers of previous versions of our OpenShift GPU blogs will note that we no longer have to enable the device-plugin or accelerators feature-gate on each node.

After successful installation of OpenShift 3.10, the first step is to create a new project:

# oc new-project nvidia

The project is necessary for the creation of additional service accounts that will be bound to different security context constraints depending on the pods scheduled. The nvidia-deviceplugin service account will have different responsibilities and capabilities compared to service accounts governed by the standard restricted SCC. First, let’s create a service account that will be tied to the new security context constraint.

# oc create serviceaccount nvidia-deviceplugin

Use the following security context constraint from the cloned repository (device-plugin/nvidia-deviceplugin-scc.yaml). This SCC will be associated with the nvidia-deviceplugin service account. The Device Plugin creates sockets and mounts host volumes. For increased security, these two capabilities are disabled by default in OpenShift. However, device plugins require elevated privileges. This makes them a perfect candidate for a custom SCC to be used when starting the device plugin pod.
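
For orientation, here is a minimal sketch of what such an SCC could look like; it matches the columns reported by oc get scc further down (privileged allowed, all capabilities, RunAsAny for SELinux, user, FS group and supplemental groups, priority 10, all volume types) and grants the SCC to the nvidia-deviceplugin service account in the nvidia project. Treat the repository file device-plugin/nvidia-deviceplugin-scc.yaml as the authoritative version.

apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: nvidia-deviceplugin
allowHostDirVolumePlugin: true
allowHostIPC: true
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegedContainer: true
allowedCapabilities:
- '*'
fsGroup:
  type: RunAsAny
priority: 10
readOnlyRootFilesystem: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:nvidia:nvidia-deviceplugin
volumes:
- '*'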

Create the new SCC and make it available to OpenShift.

# oc create -f nvidia-deviceplugin-scc.yaml

Verify the newly installed SCC.

# oc get scc | grep nvidia
nvidia-deviceplugin true [*] RunAsAny RunAsAny RunAsAny RunAsAny 10 false [*]

To schedule the Device Plugin on nodes that include GPUs, label the node as follows:

# oc label node <your-gpu-node> openshift.com/gpu-accelerator=true
node "<your-gpu-node>" labeled

This label will be used in the next step.

Deploy the NVIDIA Device Plugin Daemonset

The next step is to deploy the NVIDIA Device Plugin. Note that the NVIDIA Device Plugin (and more generally, any hardware manufacturer’s plugin) is supported by the vendor, and is not shipped or supported by Red Hat.

Here is an example daemonset (device-plugin/nvidia-deviceplugin.yaml). It uses the nvidia-deviceplugin service account and a node selector based on the label we created in the last step, so that the plugin pods will only run on nodes where GPU hardware is available.
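
A trimmed-down sketch of the daemonset’s essential pieces follows; the image tag and some details are illustrative, so treat the repository file as the authoritative version.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      nodeSelector:
        openshift.com/gpu-accelerator: "true"
      serviceAccount: nvidia-deviceplugin
      containers:
      - name: nvidia-device-plugin-ctr
        image: nvidia/k8s-device-plugin:1.10
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins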

Now create the NVIDIA Device Plugin daemonset.

# oc create -f nvidia-deviceplugin.yaml

Let’s verify the correct execution of the Device Plugin. You can see that there is only one pod running, since only one node was labeled in the previous step.

# oc get pods
NAME READY STATUS RESTARTS AGE
nvidia-device-plugin-daemonset-s9ngg 1/1 Running 0 1m

Once the pod is running, let’s have a look at the logs.

# oc logs nvidia-device-plugin-daemonset-s9ngg
2018/07/12 12:29:38 Loading NVML
2018/07/12 12:29:38 Fetching devices.
2018/07/12 12:29:38 Starting FS watcher.
2018/07/12 12:29:38 Starting OS watcher.
2018/07/12 12:29:38 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2018/07/12 12:29:38 Registered device plugin with Kubelet

At this point the node itself will advertise the nvidia.com/gpu extended resource in its capacity:

# oc describe node ip-172-31-15-xxx.us-west-2.compute.internal | egrep 'Capacity|Allocatable|gpu'
Capacity:
nvidia.com/gpu: 1
Allocatable:
nvidia.com/gpu: 1

Nodes that do not have GPUs installed will not advertise GPU capacity.

Deploy a pod that requires a GPU

Let’s run a GPU-enabled container on the cluster. We can use the cuda-vector-add image that was used in the Host Preparation step. Use the following file (device-plugin/cuda-vector-add.yaml) as a pod description for running the cuda-vector-add image in OpenShift. Note the last line requests one NVIDIA GPU from OpenShift. The OpenShift scheduler will see this and schedule the pod to a node that has a free GPU. Once the pod create request arrives at a node, the Kubelet will coordinate with the Device Plugin to start the pod with a GPU resource.
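
As with the daemonset, the file in the repository is the reference; a sketch of its essential content is shown below, with the GPU request on the last line.

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  serviceAccount: nvidia-deviceplugin
  nodeSelector:
    openshift.com/gpu-accelerator: "true"
  containers:
  - name: cuda-vector-add
    image: docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
    resources:
      limits:
        nvidia.com/gpu: 1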

Create the file and start the pod.

# oc create -f cuda-vector-add.yaml

After a couple of seconds the container finishes.

# oc get pods
NAME READY STATUS RESTARTS AGE
cuda-vector-add 0/1 Completed 0 3s
nvidia-device-plugin-daemonset-s9ngg 1/1 Running 0 9m

Let’s have a look at the logs for any errors.

# oc logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

This output is the same as when we ran the container directly using podman. If you see a permission denied error, check to see that you have the correct SELinux label.
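
A good first place to look is the SELinux labels on the NVIDIA device files and any recent AVC denials (a troubleshooting hint rather than a complete procedure):

# ls -Z /dev/nvidia*
# ausearch -m avc -ts recent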

Settings to run a custom GPU container

If you build a custom GPU container, make sure to include the following environment variables in the Dockerfile or in the pod yaml description; a sketch of the pod-level form follows the Dockerfile snippet below. See device-plugin/cuda-vector-add.yaml for an example of how to use a pod with these environment variables.

# nvidia-container-runtime-hook triggers
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=8.0" # depending on the driver
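
If you prefer to set the variables in the pod description instead of baking them into the image, the container’s env section would look roughly like this (same values as above; adjust NVIDIA_REQUIRE_CUDA to match your driver):

    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: compute,utility
    - name: NVIDIA_REQUIRE_CUDA
      value: "cuda>=8.0"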

Now that we have a fully configured OpenShift cluster with GPU support, let’s create a more sophisticated workload on the cluster. See part two of this installment to learn how to use GPU-accelerated SQL queries with PostgreSQL and PG-Strom in OpenShift 3.10.

Conclusion

This blog is meant to get you started with using GPUs on OpenShift 3.10, leveraging the device plugin feature.

We’re working with hardware accelerator vendors (such as NVIDIA and others) to streamline the installation and to improve the administrator and user experience for popular machine learning and artificial intelligence application frameworks.

We encourage you to give this procedure a try, and give us feedback by posting comments to this blog!