When you need to build something people can depend on, you start with a strong foundation. For OpenShift’s foundation, we have been building on Kubernetes for over a year.
We have enjoyed the journey, but are not simply along for the ride. OpenShift has tirelessly helped make Kubernetes one of the fastest growing and sought after container orchestration engines available. OpenShift 3.3 is leveraging Kubernetes 1.3, and for that reason, Red Hat has contributed to the following projects in the Kubernetes 1.3 release:
- Init Containers
- Rolling Update Status
- Disk Attach Controllers
- Pod Security Policies
- Pod Evictions
- Quota Controlled LoadBalancer Services
- Quota Controlled nodePorts
- Scale to 1,000 nodes
- Dynamic Provisioning of Storage
- Multiple Schedulers in Parallel
- Seccomp Policy Support
This work represents a significant investment from Red Hat in the Kubernetes community and technology. We truly feel this is the place to work on cloud native container enabled solutions.
While we are working upstream in Kubernetes, we also work upstream in Origin (the open source community for OpenShift), where we draw on a vibrant gathering of over 200 corporations that have been driving innovation into OpenShift. We tailor the features of Kubernetes towards enterprise-class use cases and deliver an out-of-the-box experience that people can start using immediately on private and public clouds. I’d like to call out a few new features that will grab your attention in OpenShift 3.3.
- Cluster Longevity
- Framework Services
Four main features in OpenShift 3.3 open the door to easier usage of the solution in production environments. Due to the unbelievable popularity of OpenShift 3, the product has found itself in some critical production situations with customer revenue generating applications. People are leveraging OpenShift 3 to run critical business services...today. OpenShift 3.3 delivers the following four features to assist in that usage:
Controllable Source IP
Tenants on the platform would like to leverage data sources or end points that live outside of the platform such as HRM, CRM, or ERM systems. These could be anything from specialized hardware appliances to decade old deployments that simply have stood the test of time. Due to where they are located in the datacenter, customers have chosen to guard access to them by firewalling them off and only allowing for approved IP address connections. This access design was common before the age of API management and is still used today for many business services.
The problem with cloud architectures and containers is that as you increase the mobility and manageability of a container, it has a higher chance of moving around the cluster. This means that the source IP of its outbound packets, which comes from the node the container is running on, can change as the container moves around the cluster. If that address keeps changing, it becomes difficult to grant access from an application living on a cloud platform to a service that is behind corporate firewalls. Thus we have a problem getting to the good stuff!
In come controllable source IPs in OpenShift 3.3. Now a platform administrator can identify a node in the cluster and allocate a number of static IP addresses to the node (at the host level). If a tenant needs an unchanging source IP for their application service, they can request one during the process they use to ask for firewall access. The platform admin will then deploy an egress router from the tenant’s project, leveraging a nodeSelector in the deploymentConfig to ensure the pod lands on the host with the pre-allocated static IP address.
The egress pod’s deployment will declare one of the sourceIPs, the destinationIP of the protected service, and a gatewayIP to reach the destination. Once the pod is deployed, the platform admin can create a service to access the egress router pod. They will then add that sourceIP to the corporate firewall and close out the ticket. The tenant will now have access information for the egress router service that was created in their project (e.g. service.project.cluster.domainname.com).
When the tenant would like to reach the external, firewalled service, they will call out to the egress router pod’s service (e.g. service.project.cluster.domainname.com) in their application (e.g. in the JDBC connection information) rather than the actual protected service URL.
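To make the reasoning concrete, here is a minimal Python sketch of why a fixed source IP matters. It is a toy model, not OpenShift code: the node names, the addresses, and the `source_ip` and `firewall_allows` helpers are all hypothetical, chosen only to show that traffic leaving through the egress router presents the same allow-listed address no matter where the application pod is scheduled.

```python
# Toy model only: every name and address below is a hypothetical example.
NODE_IPS = {"node-a": "198.51.100.11", "node-b": "198.51.100.12"}
EGRESS_SOURCE_IP = "192.0.2.10"            # static IP pre-allocated by the admin
FIREWALL_ALLOW_LIST = {EGRESS_SOURCE_IP}   # the only address the firewall admits


def source_ip(app_node: str, via_egress_router: bool) -> str:
    """Return the source IP the protected service would see."""
    if via_egress_router:
        # The egress router is pinned to the host that owns the static IP.
        return EGRESS_SOURCE_IP
    # Without the router, the packet carries the IP of whichever node runs the pod.
    return NODE_IPS[app_node]


def firewall_allows(ip: str) -> bool:
    """Corporate firewall check: admit only allow-listed source addresses."""
    return ip in FIREWALL_ALLOW_LIST


for node in NODE_IPS:
    direct = firewall_allows(source_ip(node, via_egress_router=False))
    routed = firewall_allows(source_ip(node, via_egress_router=True))
    print(f"{node}: direct={direct} via_egress_router={routed}")
```

In this model the direct path is rejected from every node, while the routed path is accepted from every node, which is the property the firewall ticket relies on.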
Our customers have been leveraging the platform to offer a multi-tenant, docker compliant platform. As such, they are placing thousands of tenants on the platform from all different walks of life. In some cases, the tenants are subsidiary corporations or have drastically different affiliations. With such diversity, business rules and regulatory requirements will often dictate that tenants not flow through the same routing tier. To solve this issue, OpenShift 3.3 releases router sharding. With router sharding, a platform administrator can group specific routes or namespaces into shards and then assign those shards to routers that may be up and running on the platform or be external to the platform. This allows tenants to have their traffic separated at the routing tier.
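The grouping idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the router's implementation: the `Route` and `Router` classes, the `route_selector` field, and the label names are hypothetical, and the sketch assumes shards are expressed as label selectors on routes.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    host: str
    namespace: str
    labels: dict = field(default_factory=dict)


@dataclass
class Router:
    name: str
    route_selector: dict  # hypothetical: labels a route must carry to join this shard


def matches(selector: dict, labels: dict) -> bool:
    """A route belongs to a shard when it carries every label in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def shard_routes(routers, routes):
    """Map each router name to the hosts of the routes it would serve."""
    return {
        router.name: [r.host for r in routes if matches(router.route_selector, r.labels)]
        for router in routers
    }


routes = [
    Route("shop.example.com", "retail", {"tenant": "retail"}),
    Route("claims.example.com", "insurance", {"tenant": "insurance"}),
]
routers = [
    Router("router-retail", {"tenant": "retail"}),
    Router("router-insurance", {"tenant": "insurance"}),
]
print(shard_routes(routers, routes))
```

Each tenant's routes end up on a different router, so the two never share a routing tier.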
OpenShift has always been able to support non-standard TCP ports via SNI routing with SSL. As the Internet of Things (IoT) has exploded, so too has the need to speak to dumb devices or aggregation points without SNI routing. At the same time, with more and more people running data sources (such as databases) on OpenShift, many more people want to expose ports other than 80/443 for their applications so that people outside of the platform can leverage their service.
Until today, the solution for this in Kubernetes was to leverage NodePorts or External IPs. The problem with NodePorts is that only 1 tenant can have the port on all the nodes in the cluster. The problem with External IPs is that duplications can be common if the admin is not carefully assigning them out.
OpenShift 3.3 solves this problem through the clever use of edge routers. The platform administrator will either select one or more of the nodes in the cluster (more than one for high availability) to become edge routers, or they can just run additional pods on the HAProxy nodes. The additional pods we are going to run are IP failover pods. But this time, we will specify a pool of available Ingress IPs that are routable to the nodes in the cluster and resolvable externally via the corporate DNS.
This pool of IP addresses is going to be served out to tenants who want to use a port other than 80 and 443. In these use cases, we have services outside of the cluster trying to connect to services inside the cluster that are running on ports other than 80/443. This means they are coming into the cluster (ingress) as opposed to leaving the cluster (egress). By resolving through the edge routers, we are able to ensure each tenant gets the port they desire by pairing it with an Ingress IP from the available pool rather than giving them a random port.
In order to trigger this allocation of an Ingress IP, the tenant will just declare ‘LoadBalancer’ as the type in the service JSON for their application. Afterwards they can use ‘oc get service $servicename’ to see which Ingress IP was assigned to them.
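The pairing of a tenant's desired port with an address from the pool can be sketched as follows. This is a simplified model, not the platform's allocator: the `IngressIPPool` class, the CIDR range, and the service names are hypothetical, and real allocation also has to deal with releases and failover.

```python
import ipaddress


class IngressIPPool:
    """Hands out one address per service from an admin-defined pool (toy model)."""

    def __init__(self, cidr: str):
        # hosts() skips the network and broadcast addresses of the range.
        self._free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self._assigned = {}

    def allocate(self, service: str) -> str:
        """Return the service's Ingress IP, assigning a free one on first use."""
        if service in self._assigned:
            return self._assigned[service]
        if not self._free:
            raise RuntimeError("ingress IP pool exhausted")
        ip = self._free.pop(0)
        self._assigned[service] = ip
        return ip


pool = IngressIPPool("192.0.2.0/29")  # hypothetical pool with six usable addresses
# Two tenants both want port 5432; each gets it on a distinct Ingress IP.
endpoint_a = (pool.allocate("tenant-a/postgres"), 5432)
endpoint_b = (pool.allocate("tenant-b/postgres"), 5432)
print(endpoint_a, endpoint_b)
```

Because the address differs per service, the port never has to, which is the contrast with NodePorts described above.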
A/B Service Annotation
This OpenShift 3.3 feature is one of my favorites. We have always been able to do A/B testing with OpenShift, but it was not an "easy to use" feature. Now in OpenShift 3.3 we have added service lists to routes. Each route can now have multiple services assigned to it, and those services can come from different applications or pods. We then designed automation with HAProxy to read weight annotations on the route for the services. A tenant can now very easily declare, from the command line or web console, that 70% of traffic will flow to appA and 30% will flow to appB.
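As a rough illustration of what a 70/30 weighting means for individual requests, the Python sketch below picks a backend per request in proportion to its weight. It is not how HAProxy is configured or implemented; the `traffic_share` and `pick_backend` helpers are hypothetical, and only the service names and weights come from the example above.

```python
import random


def traffic_share(weights: dict) -> dict:
    """Convert per-service weights into the fraction of traffic each receives."""
    total = sum(weights.values())
    return {service: weight / total for service, weight in weights.items()}


def pick_backend(weights: dict, rng: random.Random) -> str:
    """Choose one service for a request, with probability proportional to its weight."""
    services = list(weights)
    return rng.choices(services, weights=[weights[s] for s in services], k=1)[0]


weights = {"appA": 70, "appB": 30}
print(traffic_share(weights))

# Simulate a batch of requests with a seeded generator for repeatability.
rng = random.Random(0)
hits = [pick_backend(weights, rng) for _ in range(10_000)]
print(hits.count("appA") / len(hits))
```

Over many requests the observed split approaches the declared weights, while any single request still lands on exactly one service.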
Three main improvements to the security of the cluster come in the form of stronger AUTH control, the ability to disable system calls for containers via security context constraints, and an easier way to keep track of and update the CERTs we use for the SSL traffic between OpenShift framework pieces.
SCC Profiles for seccomp
seccomp is a relatively unknown feature in RHEL that has been enabled for docker 1.10 or higher. seccomp restricts a container's interactions with the kernel through syscall filtering. This reduces the risk of a malicious container exploiting a kernel vulnerability, thereby reducing the guest attack surface. We have added the ability to create seccomp policies with OpenShift 3.3 security context constraints (SCC). This allows platform administrators to set SCC policies on tenants that impose a filter on their containers for Linux-level system calls.
Kerberos Support in the oc client for Linux
We can now recognize and handle the kinit process of generating a Kerberos ticket during a tenant’s interaction with the oc client on Linux.
$ kinit user1@MYDOMAIN.COM (password = 'password')
$ oc login <OPENSHIFT_MASTER>
OpenShift leverages TLS encryption and token based authentication between its framework components. In order to accelerate and ease the installation of the product, we will self sign CERTs during a hands free installation. OpenShift 3.3 adds the ability to update and change those CERTs that govern the communication between our framework components. This will allow platform administrators to more easily maintain the life cycles of their OpenShift installations.
Once you stand up a cluster and people start using it, your attention as a platform administrator will turn to care and feeding of the cluster. The platform should possess features that help it remain stable under frequent and constant use. In OpenShift 3.3 we spent some time focusing on features that will help. We take advantage of Kubernetes workload priority and eviction policies, we offer an ability to idle and unidle workloads, we increased the number of pods per node and nodes per cluster, and we help the tenant find the right persistent storage for their deployment needs.
OpenShift 3.3 allows platform administrators more control over what happens during the life cycle of a workload on the cluster once the process (container) is started. By leveraging the request and limit settings declared at deployment time, we can figure out automatically how the tenant wants us to treat their workload in terms of resources. We can take one of three positions.
If the tenant declares no resource requirements (best effort), we can offer them slack resources on the cluster. But more importantly, that choice allows us to decide to re-deploy their workloads first should an individual node become exhausted. If the tenant tells us their minimum resource requirements but does not ask for a very specific range of consumption (burstable), we can offer them their minimum while also giving them the ability to consume slack resources should any exist. We will consider this workload more important than best effort in terms of re-deployment during a node eviction. Lastly, if a tenant tells us both the minimum and maximum resource requirements (guaranteed), we will find a node with those resources and lock them in as the most important workload on the node. These workloads will remain as the last survivors on a node should it go into a memory starvation situation.
The decision to evict is an intimate one for the platform administrator. With that in mind, we have made it configurable. It is up to the platform administrator to turn on the ability to hand a pod (container) back to the scheduler for re-deployment on a different node should out of memory errors start to occur.
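The three positions can be sketched as a small classifier. This is a simplified model for a single-container workload, not the kubelet's code: the `qos_class` and `eviction_order` helpers and the pod dictionaries are hypothetical. It follows the Kubernetes convention that a workload counts as guaranteed only when CPU and memory limits are declared and any declared requests equal those limits.

```python
RESOURCES = ("cpu", "memory")
# Workloads are handed back for re-deployment in this order under memory pressure.
EVICTION_PRIORITY = ["BestEffort", "Burstable", "Guaranteed"]


def qos_class(requests=None, limits=None) -> str:
    """Classify a workload from its declared requests (minimum) and limits (maximum)."""
    requests = requests or {}
    limits = limits or {}
    if not requests and not limits:
        return "BestEffort"
    guaranteed = all(
        resource in limits and requests.get(resource, limits[resource]) == limits[resource]
        for resource in RESOURCES
    )
    return "Guaranteed" if guaranteed else "Burstable"


def eviction_order(pods):
    """Sort pods so the first entries are the first candidates for eviction."""
    return sorted(
        pods,
        key=lambda pod: EVICTION_PRIORITY.index(qos_class(pod.get("requests"), pod.get("limits"))),
    )


pods = [
    {"name": "db", "requests": {"cpu": 1, "memory": 512}, "limits": {"cpu": 1, "memory": 512}},
    {"name": "batch"},
    {"name": "web", "requests": {"cpu": 0.5, "memory": 256}},
]
print([pod["name"] for pod in eviction_order(pods)])
```

With these sample pods, the undeclared batch job is the first eviction candidate and the fully specified database is the last.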
For OpenShift 3.3 we have taken the time to qualify the solution on larger environments. You can see some of this work publicly via the Cloud Native Computing Foundation work we completed recently, as well as increased information within the product documentation on expectations. We are now up to 1,000 nodes per cluster at 250 pods per node (with a recommendation of 10 pods per hyper-threaded core). That is a quarter of a million containers per cluster, a truly remarkable milestone considering we are not just talking about starting a container. We are talking about establishing developer projects, enforcing quota, running multi-tier application services, exposing public routes, offering persistent storage, and all the other intricacies of deploying real applications.
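The arithmetic behind those numbers is simple enough to check in a few lines of Python. The constants come straight from the figures above; the helper names are hypothetical, and the cores-per-node result is only what the 10 pods per core recommendation implies for a fully packed node.

```python
import math

NODES_PER_CLUSTER = 1000
PODS_PER_NODE = 250
PODS_PER_CORE = 10  # recommended pods per hyper-threaded core


def max_pods_per_cluster(nodes=NODES_PER_CLUSTER, pods_per_node=PODS_PER_NODE) -> int:
    """Upper bound on pods when every node runs at its pod ceiling."""
    return nodes * pods_per_node


def cores_needed(pods: int, pods_per_core=PODS_PER_CORE) -> int:
    """Hyper-threaded cores implied by the pods-per-core recommendation."""
    return math.ceil(pods / pods_per_core)


print(max_pods_per_cluster())       # 250000
print(cores_needed(PODS_PER_NODE))  # 25
```

In other words, a node would need about 25 hyper-threaded cores to stay within the recommendation at the full 250 pods.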
Wouldn’t it be great if we lived in a world where developers did not have to care about giving back resources from innovation projects they have paused while working on emergencies? OpenShift 3.3 delivers something that will help. New in OpenShift is an API to idle an application’s pods (containers). The idea is to have your monitoring solution call the API when the threshold for a metric of interest is crossed. The magic happens at the routing tier. HAProxy keeps the declared route URL, which is connected to the service, open while we shut down the pods. Should someone hit this application URL, we will re-launch the pods on available resources in the cluster and connect them to the existing route.
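The idle and unidle cycle can be modeled as a small state machine. The Python sketch below is a toy, not the idling API: the `IdlableApp` class, the `monitor` function, and the requests-per-minute metric are hypothetical stand-ins for the application, the monitoring solution, and the metric of interest.

```python
class IdlableApp:
    """Toy model of an application whose route stays open while its pods are idled."""

    def __init__(self, name: str, replicas: int):
        self.name = name
        self.replicas = replicas
        self.previous_replicas = replicas
        self.idled = False

    def idle(self) -> None:
        """Scale to zero but remember the previous scale for a later wake-up."""
        if not self.idled:
            self.previous_replicas = self.replicas
            self.replicas = 0
            self.idled = True

    def handle_request(self) -> str:
        """Traffic on the held route wakes the application before it is served."""
        if self.idled:
            self.replicas = self.previous_replicas
            self.idled = False
        return f"{self.name} served by {self.replicas} pod(s)"


def monitor(app: IdlableApp, requests_per_minute: float, threshold: float) -> None:
    """Stand-in for a monitoring solution that idles quiet applications."""
    if requests_per_minute < threshold:
        app.idle()


app = IdlableApp("prototype", replicas=3)
monitor(app, requests_per_minute=0.0, threshold=1.0)
print(app.replicas)          # 0 while idled
print(app.handle_request())  # pods are restored on the first request
```

The key design point is that the route outlives the pods, so the first visitor triggers the wake-up instead of hitting a dead URL.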
Ephemeral containers (ones that erase once they are rebooted) are extremely powerful. But figure out how to give them persistence in a fluid manner across a 1,000 node cluster and you have got something to write home about. OpenShift has had the ability to offer remote persistent block and file based storage for over a year. In OpenShift 3.3, we increase the ability of the application designer or tenant to select a storage provider on the cluster in a more granular manner than stating just file or block. Storage labels can help people call out to a specific provider in a simple manner by adding a label request to their persistent volume claim (PVC).
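The label request can be pictured as a filter applied before a claim is bound to a volume. The Python sketch below is a simplified model, not the platform's binder: the `PersistentVolume` and `Claim` classes, the `bind` helper, and the label names are hypothetical, and choosing the smallest volume that fits is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersistentVolume:
    name: str
    capacity_gib: int
    labels: dict = field(default_factory=dict)


@dataclass
class Claim:
    request_gib: int
    match_labels: dict = field(default_factory=dict)  # the tenant's label request


def bind(claim: Claim, volumes) -> Optional[PersistentVolume]:
    """Pick the smallest volume that is big enough and carries every requested label."""
    candidates = [
        volume
        for volume in volumes
        if volume.capacity_gib >= claim.request_gib
        and all(volume.labels.get(k) == v for k, v in claim.match_labels.items())
    ]
    return min(candidates, key=lambda volume: volume.capacity_gib, default=None)


volumes = [
    PersistentVolume("pv-nfs-1", 50, {"provider": "nfs"}),
    PersistentVolume("pv-ssd-1", 100, {"provider": "ssd-block"}),
    PersistentVolume("pv-ssd-2", 20, {"provider": "ssd-block"}),
]
bound = bind(Claim(request_gib=10, match_labels={"provider": "ssd-block"}), volumes)
print(bound.name if bound else None)
```

Without the label request the claim could land on any volume that is large enough; with it, the tenant steers the claim to a specific provider.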
OpenShift provides resource usage metrics and log access to tenants. These are native framework services that run on the platform and are based on the Hawkular and Elasticsearch open source projects. With every release of OpenShift, these services become stronger and more feature rich.
We have delivered a log curator utility to help platform administrators deal with the storage requirements of storing tenant logs over time. We have also enhanced the integration with existing ELK stacks you might already own or be invested in by allowing logs to more easily be sent to multiple locations.
Metric Install Enhancement
We added network usage attributes to the core metrics we track for tenants in this release. But we also made metrics a core installation feature instead of a post-install activity. Now the OpenShift installer will guide you through the Ansible playbooks required to successfully deploy metrics, which should drive more usage of the feature in the user interface and CloudForms.
The point of cluster management is to enable the tenants to get the most out of the platform without knowing the details. We try to remove as many barriers that the underlying technologies, infrastructure, or runtimes may impose and allow developers and operators to focus on delivering business services in a high velocity pattern at low operational risk. We hope you enjoy OpenShift 3.3. Please be sure to check out the user interface enhancements and improved developer experience!
If you want to learn more about the new features of OpenShift 3.3, don't miss the following blog posts from our engineering team: