This article explains how to set up automatic ETCD backups in OCP 4.x clusters. Regular backups are essential for a successful disaster recovery.

This solution has been tested on versions 4.7 and later.

You may be wondering how automated ETCD backups can help you recover one or more master nodes of an OpenShift 4 cluster. Below I will show which resources you need in order to create automatic backups with an OpenShift CronJob.

Let’s get to what matters. First, you must create the following resources:

  • Namespace
  • Service Account
  • Cluster Role
  • Cluster Role Binding
  • Set Privileges for Service Account
  • CronJob

Namespace

A dedicated namespace must be created for running the ETCD backup pods. The CronJob section explains in more detail the pods that are created to perform the backup. For example:

$ oc new-project ocp-etcd-backup --description "Openshift Backup Automation Tool" --display-name "Backup ETCD Automation"

Service Account

You must create a service account in the namespace dedicated to the ETCD backup. This service account is responsible for running the backup commands on the master nodes:

---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: openshift-backup
  namespace: ocp-etcd-backup
  labels:
    app: openshift-backup
---
$ oc apply -f sa-etcd-bkp.yml

Cluster Role

As a security measure, the service account created earlier should not have excessive permissions on the cluster, so you must create a Cluster Role with only the permissions needed to run the backup.

This Cluster Role grants specific verbs on specific resources:

  • Resource: nodes; Verbs: get and list
  • Resource: pods and pods/log; Verbs: get, list, create, delete, and watch

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-etcd-backup
rules:
- apiGroups: [""]
  resources:
    - "nodes"
  verbs: ["get", "list"]
- apiGroups: [""]
  resources:
    - "pods"
    - "pods/log"
  verbs: ["get", "list", "create", "delete", "watch"]
---
$ oc apply -f cluster-role-etcd-bkp.yml

See the OpenShift documentation on RBAC for more information about these permissions.

Cluster Role Binding

After creating the Service Account and the Cluster Role, you need to bind the two resources together:

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: openshift-backup
  labels:
    app: openshift-backup
subjects:
  - kind: ServiceAccount
    name: openshift-backup
    namespace: ocp-etcd-backup
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-etcd-backup
---
$ oc apply -f cluster-role-binding-etcd-bkp.yml
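Once the binding is applied, it can be useful to confirm that the service account really holds the permissions listed in the Cluster Role. The following is a minimal sketch, assuming you are logged in as a user who is allowed to impersonate service accounts (for example a cluster administrator). The function name check_backup_rbac is hypothetical; it only wraps the standard oc auth can-i command.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the backup service account holds the permissions
# granted by the cluster-etcd-backup ClusterRole.
# check_backup_rbac is a hypothetical helper name, not an oc subcommand.

check_backup_rbac() {
  local sa="system:serviceaccount:ocp-etcd-backup:openshift-backup"
  local failed=0 check
  # Each entry is "<verb> <resource>", taken from the ClusterRole rules.
  for check in "get nodes" "list nodes" "get pods" "list pods" \
               "create pods" "delete pods" "watch pods"; do
    # "oc auth can-i" exits 0 when the action is allowed, non-zero otherwise.
    # $check is intentionally unquoted so it splits into verb and resource.
    if oc auth can-i $check --as="$sa" -n ocp-etcd-backup >/dev/null 2>&1; then
      echo "allowed: $check"
    else
      echo "DENIED:  $check"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling check_backup_rbac prints one line per check and returns a non-zero status if any permission is missing, so it can also be used in a script.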

Service Account With Special Privileges

The service account needs special privileges, because the commands that create the backup directory and run the ETCD backup are executed on the node with sudo:

$ oc adm policy add-scc-to-user privileged -z openshift-backup

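In the command above, the -z parameter names a service account in the current project, and the "privileged" SCC (Security Context Constraints) is granted to it. If you are working in a different project, add -n ocp-etcd-backup to the command.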

CronJob

Finally, you need to create the CronJob.

The CronJob uses the OpenShift client image to run the backup and to create the debug pods.

The backup is written to /home/core/backup on each master node. If this directory does not exist, it is created automatically.

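The CronJob also deletes backup files older than 1 minute (the -mmin +"1" argument to find), which avoids unnecessary disk usage on the master nodes. With a daily schedule this effectively keeps only the most recent backup on each node; increase the value if you want to retain more.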
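To see how this age-based cleanup behaves before relying on it, the same find expression can be tried on a throwaway directory. This is a small sketch, assuming GNU find and GNU touch; prune_backups, the file names, and the 60-minute threshold are made up for the demonstration and are not part of the CronJob.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: delete regular files under directory $1 that were
# last modified more than $2 minutes ago (same find expression as the CronJob).
prune_backups() {
  local dir="$1" minutes="$2"
  find "$dir" -type f -mmin +"$minutes" -delete
}

# Demonstration on a temporary directory, not on /home/core/backup.
demo_dir="$(mktemp -d)"
touch -d '3 hours ago' "$demo_dir/snapshot_old.db"   # simulated old backup
touch "$demo_dir/snapshot_new.db"                    # simulated fresh backup

prune_backups "$demo_dir" 60   # keep only files modified in the last hour
ls "$demo_dir"                 # only snapshot_new.db remains
```

Changing the second argument changes the retention window. For example, a value of 10080 would keep roughly one week of backups.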

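To create the CronJob, apply the following manifest. The schedule "56 23 * * *" runs the backup every day at 23:56. On OpenShift 4.7, where the CronJob resource is not yet available in batch/v1, use apiVersion: batch/v1beta1 instead: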

---
kind: CronJob
apiVersion: batch/v1
metadata:
  name: openshift-backup
  namespace: ocp-etcd-backup
  labels:
    app: openshift-backup
spec:
  schedule: "56 23 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      labels:
        app: openshift-backup
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            app: openshift-backup
        spec:
          containers:
            - name: backup
              image: "registry.redhat.io/openshift4/ose-cli"
              command:
                - "/bin/bash"
                - "-c"
                - oc get no -l node-role.kubernetes.io/master --no-headers -o name | xargs -I {} --  oc debug {}  --to-namespace=ocp-etcd-backup -- bash -c 'chroot /host sudo -E /usr/local/bin/cluster-backup.sh /home/core/backup/ && chroot /host sudo -E find /home/core/backup/ -type f -mmin +"1" -delete'
          restartPolicy: "Never"
          terminationGracePeriodSeconds: 30
          activeDeadlineSeconds: 500
          dnsPolicy: "ClusterFirst"
          serviceAccountName: "openshift-backup"
          serviceAccount: "openshift-backup"
---
$ oc apply -f cronjob-etcd-bkp.yml
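The single command line in the CronJob does several things at once, so it may help to see it unrolled. The sketch below is roughly equivalent to that pipeline, written as a loop. The function name backup_all_masters and its three parameters are hypothetical and exist only to make the steps explicit; the oc commands, the label selector, and the paths are the ones used in the manifest above.

```shell
#!/usr/bin/env bash
# Sketch: the CronJob one-liner unrolled into a loop for readability.
# backup_all_masters and its parameters are hypothetical names.

backup_all_masters() {
  local ns="${1:-ocp-etcd-backup}"
  local dir="${2:-/home/core/backup/}"
  local minutes="${3:-1}"
  local node rc=0

  # 1. List the master nodes as "node/<name>", one per line.
  for node in $(oc get nodes -l node-role.kubernetes.io/master --no-headers -o name); do
    # 2. Start a debug pod for the node in the backup namespace.
    # 3. Inside it, switch to the host filesystem, run the backup script,
    #    then remove backup files older than $minutes minutes.
    oc debug "$node" --to-namespace="$ns" -- bash -c \
      "chroot /host sudo -E /usr/local/bin/cluster-backup.sh $dir && chroot /host sudo -E find $dir -type f -mmin +$minutes -delete" \
      || rc=1
  done
  # Non-zero if the backup failed on at least one node.
  return "$rc"
}
```

Because the nodes are processed one after another, only one debug pod exists at a time, which matches the pod listings shown later in this article.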

After creating the CronJob, you can trigger a run manually to validate it:

$ oc create job backup --from=cronjob/openshift-backup

While the job runs, it creates a debug pod on each master node in turn to execute the backup.

You can watch the pods created for a running backup with these commands:

$ oc get pods -n ocp-etcd-backup
NAME                               READY   STATUS    RESTARTS   AGE
pod/backup-jcn87                   1/1     Running   0          40s
pod/zmaciel-f9fbb-master-0-debug   1/1     Running   0          18s

$ oc get pods -n ocp-etcd-backup
NAME                               READY   STATUS    RESTARTS   AGE
pod/backup-jcn87                   1/1     Running   0          56s
pod/zmaciel-f9fbb-master-1-debug   1/1     Running   0          14s

$ oc get pods -n ocp-etcd-backup
NAME                               READY   STATUS    RESTARTS   AGE
pod/backup-jcn87                   1/1     Running   0          67s
pod/zmaciel-f9fbb-master-2-debug   0/1     Running   0          6s

These pods are deleted automatically when the backup execution finishes.
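To confirm the outcome of the manually triggered job rather than watching the pods, one option is to wait for the job to complete and then print its log. The following is a minimal sketch; check_backup_job is a hypothetical helper, the job name backup is the one used in the oc create job command above, and the 600-second timeout is an assumption chosen to be longer than the activeDeadlineSeconds of the pod.

```shell
#!/usr/bin/env bash
# Sketch: wait for the manually created backup job and show its log.
# check_backup_job is a hypothetical helper name.

check_backup_job() {
  local job="${1:-backup}"
  local ns="${2:-ocp-etcd-backup}"

  # "oc wait" blocks until the job reports the Complete condition
  # or the timeout expires.
  if oc wait --for=condition=complete "job/$job" -n "$ns" --timeout=600s; then
    # Print the output of the backup pod that belongs to the job.
    oc logs "job/$job" -n "$ns"
  else
    echo "job/$job did not complete; inspect it with: oc describe job/$job -n $ns" >&2
    return 1
  fi
}
```

A failed job never reaches the Complete condition, so in that case the function returns a non-zero status once the timeout expires.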


If the CronJob ran successfully, the result will be similar to the images below.

[Image: Backup Pod, Job, and CronJob]

[Image: Backups Created on Master Nodes]
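The backup files can also be checked from the command line. The sketch below lists the backup directory on every master node through a debug pod. list_backups is a hypothetical helper name, and the sketch assumes you are logged in as a user who is allowed to run oc debug against nodes.

```shell
#!/usr/bin/env bash
# Sketch: list the backup files on each master node.
# list_backups is a hypothetical helper name.

list_backups() {
  local dir="${1:-/home/core/backup}"
  local node

  for node in $(oc get nodes -l node-role.kubernetes.io/master --no-headers -o name); do
    echo "== $node =="
    # The debug pod mounts the node filesystem under /host.
    oc debug "$node" -- chroot /host ls -lh "$dir"
  done
}
```

On a typical cluster, each node should show an etcd snapshot file and an archive of the static pod resources written by cluster-backup.sh, with timestamps matching the last run.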

Final Thoughts

So, now you can see how easy it is to create ETCD backups through the OpenShift platform.