Part 3: Evolving Our Infrastructure

Our journey to OpenShift across multiple clouds has taken three parallel paths: Changing our culture, rethinking the application lifecycle, and evolving our infrastructure. This post, the last in our three-part series, describes how we're working around the infrastructure differences among our various clouds.

Making Cloud Differences Invisible to Developers

If you're a developer, you shouldn't have to think about whether you're deploying an image to an on-premises or public cloud, or to a live or backup data center. We want all of our data centers to be interchangeable.

We took a major step toward that goal when we built a single CI/CD pipeline for deploying images in any OpenShift environment. But developers still had to follow different processes for different clouds. For example, they had to write logic to adapt to infrastructure differences. Following a consistent security model was difficult because of confusing network ACLs. And database and storage resources available in one environment were not available in others.

Now we're taking steps to make the cloud environments appear identical to developers, even though they're not. Our method? Abstracting the differences—moving them "under the hood"—by creating common infrastructure services that plug into our application lifecycle. The same services work with all of our clouds: On-premises OpenShift clouds, AWS, and other public clouds we might use in the future.

We're starting by building infrastructure services for data mobility and automated infrastructure management.

Data Mobility

Using OpenShift across multiple clouds requires moving data between clouds and keeping it synchronized. We store many types of data at Red Hat: Traditional relational databases, MongoDB, and unstructured data such as the documents that customers submit through the Red Hat Customer Portal.

To synchronize the data, we use Galera Cluster for MySQL, a solution for active-active database clusters. Galera uses synchronous replication to build a single cluster entity so that all servers always have identical data. We're currently investigating other data solutions that complement Galera. We're also working to make our application architectures support eventual consistency, which allows for asynchronous replication and other data mobility options.
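
To make the synchronous-replication guarantee concrete, here's a minimal Python sketch using the mysql-connector-python client; the hostnames, credentials, and table are placeholders. A row committed through one Galera node can be read back immediately from another node, whereas an eventually consistent store would force the application to tolerate replication lag at that point.

    import mysql.connector  # pip install mysql-connector-python

    WRITE_NODE = "galera-node1.example.com"  # hypothetical on-premises node
    READ_NODE = "galera-node2.example.com"   # hypothetical node in AWS
    CONN_ARGS = {"user": "app", "password": "secret", "database": "portal"}

    # Commit a row through one node of the cluster.
    writer = mysql.connector.connect(host=WRITE_NODE, **CONN_ARGS)
    cur = writer.cursor()
    cur.execute("INSERT INTO assets (name) VALUES (%s)", ("web-01",))
    writer.commit()
    writer.close()

    # Read it back from a different node. With Galera's synchronous
    # replication, the committed row is already visible here.
    reader = mysql.connector.connect(host=READ_NODE, **CONN_ARGS)
    cur = reader.cursor()
    cur.execute("SELECT name FROM assets WHERE name = %s", ("web-01",))
    print(cur.fetchone())
    reader.close()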

For local storage on OpenShift, we already use Red Hat Gluster Storage in both AWS and our on-premises data centers. Replicating across AWS and on-premises storage is still in the works. For unstructured data replication, we're working with NetApp to synchronize data from our on-premises storage into AWS Simple Storage Service (S3) using ONTAP Cloud.
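
A simple way to spot-check that replication is to compare what has landed in S3 against expectations. Here's a minimal sketch with boto3; the bucket and prefix names are hypothetical, and credentials come from the usual AWS environment or an IAM role:

    import boto3

    s3 = boto3.client("s3")

    # List the documents that ONTAP Cloud has synchronized into the bucket.
    resp = s3.list_objects_v2(Bucket="customer-portal-docs", Prefix="replicated/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["LastModified"])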

Automated Infrastructure Configuration and Management

We're taking several approaches to make configuration and management the same in all of our OpenShift clouds.

Asset management with Red Hat CloudForms

It may sound boring, but asset management is very important for governance. In the past we took inventory using Puppet and homegrown scripts. Now we're switching to Red Hat CloudForms because it:

  • Scales more easily
  • Integrates more easily with ServiceNow, which is part of our configuration management database
  • Augments inventory with functions like governance and provisioning
  • Produces more detailed reports (see the sketch after this list)
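
Part of what makes the reporting better is that everything CloudForms inventories is reachable over its REST API. Here's a minimal sketch of pulling the VM inventory across all providers; the appliance URL and credentials are placeholders:

    import requests

    CFME = "https://cloudforms.example.com"  # placeholder appliance URL

    resp = requests.get(
        f"{CFME}/api/vms",
        params={"expand": "resources", "attributes": "name,vendor,power_state"},
        auth=("admin", "secret"),
    )
    resp.raise_for_status()

    # Print a one-line inventory entry per VM, regardless of which
    # cloud or virtualization provider it runs on.
    for vm in resp.json()["resources"]:
        print(vm["name"], vm["vendor"], vm["power_state"])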

Datacenter abstraction with standardized deployment templates

We're creating common playbooks for on-premises data centers and AWS. The plan is that a developer rolling out a new service will use the same interface to deploy an app in any cloud. To make that happen, we're abstracting the infrastructure differences using Red Hat CloudForms and Ansible Tower by Red Hat.
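
In practice, a deployment could boil down to launching a single Tower job template and passing the target cloud as a variable. Here's a minimal sketch against Tower's REST API; the hostname, template ID, and the target_cloud variable are all hypothetical, and the template would need to prompt for extra variables on launch:

    import requests

    TOWER = "https://tower.example.com"  # placeholder Tower host
    TEMPLATE_ID = 42                     # hypothetical "deploy-app" template

    # The playbook behind the template resolves the on-premises vs. AWS
    # details; the developer only chooses the target.
    resp = requests.post(
        f"{TOWER}/api/v2/job_templates/{TEMPLATE_ID}/launch/",
        json={"extra_vars": {"target_cloud": "aws"}},  # or "onprem"
        auth=("deployer", "secret"),
    )
    resp.raise_for_status()
    print("Launched job", resp.json()["job"])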

Image-based deployment

AWS is designed for image-based deployment, which we already use for our existing OpenShift and Red Hat Gluster Storage deployments there. Now we're making image-based deployment our standard for our other clouds as well. The appeal includes:

  • Image deployment in minutes instead of hours, enabling self-healing and auto-scaling
  • More consistent security, because security checks and standards validation are performed throughout the image build and deployment lifecycle
  • Simplified patch management

We've built a prototype for image-based deployment in our on-premises Red Hat Virtualization clouds, using Ansible and Jenkins.
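
On the AWS side, image-based deployment is what makes minutes-not-hours possible: instances boot straight from a pre-built image rather than being configured after the fact. A minimal boto3 sketch; the AMI ID, region, and instance sizing are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Boot three nodes directly from a pre-baked image. Because there is
    # no post-install configuration step, self-healing and auto-scaling
    # can reuse the same image.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # hypothetical pre-built node image
        InstanceType="m4.xlarge",
        MinCount=3,
        MaxCount=3,
    )
    for inst in resp["Instances"]:
        print(inst["InstanceId"], inst["State"]["Name"])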

Software-defined infrastructure

Instead of manually configuring servers, storage, and switches, we're automating configuration using Ansible by Red Hat and Red Hat CloudForms. Ansible works with most of our infrastructure, including Cisco and Juniper switches; NetApp and Red Hat Gluster Storage; and our OpenShift, Red Hat Virtualization, and Red Hat Enterprise Linux infrastructures. We're also working closely with NetApp, Cisco, and Juniper as they integrate Ansible by Red Hat into their solutions. The integration will make it easier for us to provision, deploy, and manage these platforms as software-defined infrastructure.
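
The practical effect is that one playbook run can touch servers, storage, and switches alike. Here's a minimal sketch of driving such a run from a script; the playbook, inventory, and group names are hypothetical:

    import subprocess
    import sys

    # One entry point configures every layer; Ansible's modules handle
    # the per-vendor differences (Cisco, Juniper, NetApp, Gluster, RHEL).
    result = subprocess.run(
        ["ansible-playbook", "-i", "inventory/all-clouds", "site.yml",
         "--limit", "switches:storage:openshift"],
    )
    sys.exit(result.returncode)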

Early Benefits

Infrastructure differences between AWS and our on-premises data centers still exist, but they don't slow us down as much:

  • We can deploy OpenShift and Red Hat Gluster Storage in AWS in a few minutes, down from hours. Scaling is simpler.
  • We can build an app using OpenShift on AWS and deploy it on-premises, or vice versa. Teams are gradually getting more comfortable that this actually works.
  • Soon we'll no longer make a distinction between primary and backup data centers, or between live and active-active data centers. All data centers will be endpoints with a consistent set of application services.

What's Next

We're still early in the journey to OpenShift in a multi-cloud environment. We'll continue to abstract out the differences in our OpenShift environments in the ways I've described. At the same time, we're also trying to minimize the differences between environments. We decide which differences to eliminate based on a cost/benefit analysis.

Our current infrastructure projects include:

  • Using CloudForms for cost reporting on infrastructure usage, showing each development team their resource consumption. We're starting with showback, and might later move to chargeback.
  • Image scanning (see the sketch after this list). Our idea is to initiate an image scan on demand with an API call. Then we'll do continuous passive scanning of all images to see if they comply with our latest security model. If an image is outdated, we'll either block it or initiate automated remediation using CloudForms.
  • Provisioning and controlling AWS services from within OpenShift. We'll use the new AWS Service Broker as we integrate it into our existing automation.
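
For the on-demand piece, here's a minimal sketch of what that API call could look like, assuming the CloudForms REST API's scan action on container images; the hostname, credentials, and image ID are placeholders:

    import requests

    CFME = "https://cloudforms.example.com"  # placeholder appliance URL
    IMAGE_ID = "42"                          # hypothetical container image ID

    # Kick off an analysis of a single image on demand.
    resp = requests.post(
        f"{CFME}/api/container_images/{IMAGE_ID}",
        json={"action": "scan"},
        auth=("admin", "secret"),
    )
    resp.raise_for_status()
    print(resp.json())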

To sum up this blog series, our OpenShift journey has taken three paths: Culture, application lifecycle, and infrastructure. The payoff is rapid service delivery, which accelerates innovation for our workforce and our customers.