The Capabilities and Advantages of Cartridges in OpenShift

Background

What are these things called cartridges in OpenShift and why are they so important? Well, let’s take a step back and look at a typical application. While some might argue the specifics, most applications are still multi-tier applications and utilize multiple technologies with some separation between them. A classic case is a web application and a database. While some of these applications might be experimenting with NoSQL backends, the general pattern largely holds. And maybe you throw in a caching tier or something more exotic, but at the end of the day, very few applications I’ve seen get very far with just a web application runtime and nothing else.

Composition

So if you’re still with me and not yet posting to the comments about that initial claim being ridiculous, let’s talk about how that process often plays out. When building an application, many developers think from a technology standpoint. They might think of Ruby and want to use Mongo for storage. Despite claims that the most effective route is to focus on the use cases first (e.g. I’m building a coupon generating web application that has to store a large amount of redundant data), at the end of the day, the technology decision is often a major factor. I often operate this way myself – half of the time, an idea I’m pursuing is driven as much by getting to try out some new technology as it is by a successful and fast implementation. Engineers like to learn and new technology is a great vehicle for that.

But while the learning curve around new technology has some benefits, it also has many disadvantages. It’s hard for me to argue that learning to wire up a MySQL database to a Ruby application server versus a Java application server has any practical benefit. It’s just something I need to do. I need a database driver for the language I’m using, authentication details and endpoints. It’s the same in theory but just different enough in every language and runtime to be a major pain. And databases are well known. The newer the technology gets, the more time is often wasted on the mundane aspects of integration. But don’t give up on being a developer yet because this is what cartridges in OpenShift eliminate.

The cartridge model in OpenShift is all about enabling choice in technology and language while also reducing the effort around the integration portions that can be automated. If you have an application that consists of a JBoss cartridge and a MySQL cartridge, the two are automatically wired together. You don’t need to know or care about what MySQL driver is being used in JBoss or how the data source is set up. You can just get down to writing code and queries. This is beneficial in both development and production. In development, this gives engineers the ability to trial a lot of different software to find the best solution to their problem. They can spend more time on the analysis and not the administrivia of learning the setup environment of each technology. But that same approach and power also extends to production. Cartridges don’t only automate things like wiring up different components, they can also implement functionality like scaling. For example, the JBoss cartridge has auto-scaling built in so that when the application is getting more load than it can handle, it will spin up new instances automatically. And for those who might be wondering, clustering is automatically set up as well – new instances automatically join the cluster. The goal of the cartridge model is to capture these capabilities in a standardized, easily consumable format that brings benefits throughout the entire lifecycle of application development.

The Technology

OpenShift cartridges have an amazing amount of functionality but there are two capabilities that are my favorite:

  • Providing a first class way to interact with each other, even across multiple machines
  • Giving the cartridges the ability to influence their deployment topology (i.e. can they run embedded with other cartridges or do they scale differently)

Publish / Subscribe

Let’s talk about the interaction model first. By interaction model, I simply mean having multiple cartridges communicate with each other. That sounds incredibly simple but it’s also amazingly powerful, especially as you consider building applications from many cartridges. The concept is that a cartridge like MySQL can publish information about itself that other cartridges might want to know. For example, when a new MySQL instance is created, you probably need to know the username, password and JDBC URL – all of that information can be published. That process is described with the cartridge in a file that we call a manifest. Here is an example of how MySQL actually publishes its connection information in its manifest:

Publishes:
  publish-db-connection-info:
    Type: ENV:NET_TCP:db:connection-info

That entry will invoke a script called publish-db-connection-info that will publish a collection of environment variables of type ENV:NET_TCP:db:connection-info. You can think of the type as an arbitrary string that consumers can use to filter out what they may or may not support. This published information can then be consumed by any other cartridge that subscribes to a matching type. For example, in the JBoss cartridge, you’ll see the following section in its manifest:

Subscribes:
  set-env:
    Type: ENV:*
    Required: false

This instructs the JBoss cartridge to listen to all environment variables set by publishing events whose type starts with the string ENV. More restrictive matching can also be done in cases where you might have a cartridge that is only compatible with a certain class of published information (e.g. subscribing to ENV:NET_TCP:db:connection-info instead of ENV:*). Either way, if the publish and subscribe strings match, the JBoss cartridge has access to the published MySQL information. With that information, the JBoss cartridge is then able to automatically wire up a datasource definition in standalone.xml by using those values:

 <datasource jndi-name="java:jboss/datasources/MysqlDS" 
 ...
 <connection-url>
   jdbc:mysql://${env.OPENSHIFT_MYSQL_DB_HOST}:${env.OPENSHIFT_MYSQL_DB_PORT}/${env.OPENSHIFT_APP_NAME}
 </connection-url>
 <driver>mysql</driver>
 <security>
   <user-name>${env.OPENSHIFT_MYSQL_DB_USERNAME}</user-name>
   <password>${env.OPENSHIFT_MYSQL_DB_PASSWORD}</password>
 </security>
 ...
 </datasource>

While this is just a simple example, hopefully the beauty of it to a developer is apparent. Just the act of adding a MySQL cartridge to your JBoss application will automatically wire up your application to it. Adding Mongo would do the same thing, as would Postgres, etc, etc. And this isn’t limited to databases either. It also works with monitoring cartridges, metrics cartridges, caching, and many others – the possibilities are limitless.
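To make the publish / subscribe flow a little more concrete, here is a minimal Python sketch of it. It is only an illustration, not how OpenShift itself is implemented: the functions subscription_matches, deliver and jdbc_url are hypothetical names, the sample values are made up, and I am assuming that the * in a subscription type behaves like a shell-style wildcard. The environment variable names and the type strings are the ones shown in the manifests and the datasource above.

```python
from fnmatch import fnmatchcase

# Type strings taken from the two manifest snippets in this article.
PUBLISHED_TYPE = "ENV:NET_TCP:db:connection-info"  # MySQL "Publishes"
SUBSCRIBED_TYPE = "ENV:*"                          # JBoss "Subscribes"


def subscription_matches(subscribed, published):
    """Return True if the subscriber's type pattern accepts the published type.

    Assumption: '*' acts as a shell-style wildcard. The real matching rules
    live inside OpenShift and are not described in this article.
    """
    return fnmatchcase(published, subscribed)


def deliver(published_type, published_env, subscribed_type):
    """Hand the published variables to the subscriber only if the types match."""
    if subscription_matches(subscribed_type, published_type):
        return dict(published_env)
    return {}


def jdbc_url(env):
    """Build the URL that the datasource template above expands to."""
    return "jdbc:mysql://{0}:{1}/{2}".format(
        env["OPENSHIFT_MYSQL_DB_HOST"],
        env["OPENSHIFT_MYSQL_DB_PORT"],
        env["OPENSHIFT_APP_NAME"],
    )


if __name__ == "__main__":
    # Made-up values; on a real gear the platform sets these variables.
    mysql_env = {
        "OPENSHIFT_MYSQL_DB_HOST": "127.0.0.1",
        "OPENSHIFT_MYSQL_DB_PORT": "3306",
        "OPENSHIFT_MYSQL_DB_USERNAME": "admin",
        "OPENSHIFT_MYSQL_DB_PASSWORD": "secret",
    }
    app_env = {"OPENSHIFT_APP_NAME": "myapp"}

    received = deliver(PUBLISHED_TYPE, mysql_env, SUBSCRIBED_TYPE)
    app_env.update(received)
    print(jdbc_url(app_env))
```

A subscriber with the narrower type ENV:NET_TCP:db:connection-info would receive the same variables in this sketch, while a publisher using an unrelated type would be ignored.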

Deployment Topology

The second capability isn’t about the development process as much as it is about production. We all know that different application technologies scale differently. You might have a Ruby application whose throughput is determined by the number of Passenger instances that are running. If it starts slowing down, you need to add more. However, if this same application depends on a database, you probably need to scale the data tier independently. You don’t want to add another MySQL instance every time you add a new Passenger instance. Not only is that unnecessary and expensive, it most likely wouldn’t even work. When scaling your web tier, you need to think about session affinity, connection persistence, stateless / stateful behavior and similar concepts. However, when scaling MySQL, you need to think about your master / slave model, how many to add of each and what type of query patterns you are using. In OpenShift, since these are different cartridges, each cartridge can approach scaling in a unique manner.

From the cartridge standpoint, the Ruby cartridge is going to respond to a scaling event very differently than MySQL. While this requires real work and thought from the cartridge authors, it captures the complexity in a model that is easily leveraged by developers. Developers are able to specify how they want scaling to occur (e.g. automatically or manually) and also put limits around how many of their resources they want each cartridge to be able to consume. They might want their Ruby tier to always start with pre-allocated resources (called gears in OpenShift) but still limit the maximum number of resources it could consume. Using the OpenShift command line tools, that would be as simple as:

 rhc scale-cartridge ruby -a myapp --min 5 --max 10

In my application, that would always start the Ruby cartridge with 5 gears and never consume more than 10. The best part though is that the cartridges themselves can also influence what sort of scaling is possible so that you aren’t blindly adding resources to a cartridge that can’t use them. The default Ruby cartridge supports scaling but the default MySQL cartridge can only run standalone. The MySQL cartridge is able to express this limitation by setting the scaling options to a single gear in the manifest:

 Scaling:
   Min: 1
   Max: 1

The end result is that when you are creating a scaled application, the Ruby runtimes and MySQL runtimes will get created on separate gears to give the maximum amount of resources to each tier, but the MySQL and Ruby cartridges will each implement their own unique scaling approach.
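One way to picture how a developer’s request and a cartridge’s own limits combine is the small Python sketch below. It is a sketch under stated assumptions rather than OpenShift’s actual logic: ScalingRange and effective_range are hypothetical names, the Ruby limits are invented for the example (only the MySQL Min/Max of 1 comes from the manifest above), and I am assuming an out-of-range request is clamped rather than rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRange:
    """A minimum and maximum number of gears."""
    min_gears: int
    max_gears: int


def effective_range(cartridge, requested):
    """Clamp the developer's requested range to what the cartridge allows.

    Assumption: requests outside the cartridge's limits are clamped. The
    platform might instead reject them; the article does not say.
    """
    low = max(cartridge.min_gears, min(requested.min_gears, cartridge.max_gears))
    high = max(low, min(requested.max_gears, cartridge.max_gears))
    return ScalingRange(low, high)


if __name__ == "__main__":
    mysql_limits = ScalingRange(1, 1)   # from the MySQL manifest shown above
    ruby_limits = ScalingRange(1, 20)   # hypothetical values for illustration
    requested = ScalingRange(5, 10)     # the --min 5 --max 10 request

    print(effective_range(ruby_limits, requested))
    print(effective_range(mysql_limits, requested))
```

With these numbers the Ruby tier keeps the requested 5 to 10 gears, while the same request against MySQL collapses to a single gear, which is the behavior the manifest is meant to express.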

At the end of the day, this is really about separation of concerns. Cartridges in OpenShift are used to describe lifecycle characteristics of the technology they represent as well as integration options with other cartridges. Since the OpenShift cartridge format is completely open, it’s easy for commercial vendors as well as open source users to create cartridges. For developers, that means they get to access a broad choice of technologies, both commercial and community. But in addition to choice, the most value comes from allowing developers to spend more time doing the thing they do best – coding.