New OpenShift Cartridge Format - Part 2

This is part two of a multi-part blog series to explain the new cartridge format. See Part One of the New Cartridge Series for more background. This post is quite a bit longer than the last post, so give yourself thirty minutes to an hour to complete it. We're introducing a lot of new concepts and hopefully everything will make sense by the end.

Prerequisites

This post builds off of work done in Part One so please make sure to complete the tasks in that post first to make the most of this one. Let's do a quick double check. The following files should be found in the httpd-example directory:

./run/
./modules/
./html/
./html/index.html
./etc/
./etc/httpd.conf
./logs/
./logs/access_log
./logs/error_log
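One way to run that double check is a short shell loop. This is only a sketch: check_layout is a made-up helper name, and it tests nothing more than whether each path from the listing above exists.

```shell
#!/bin/bash
# Hypothetical helper: report which entries from the Part One listing are
# missing under the given directory. Returns non-zero if anything is missing.
check_layout() {
    local root=${1%/} missing=0 entry
    for entry in run modules html html/index.html etc etc/httpd.conf \
                 logs logs/access_log logs/error_log; do
        if [ ! -e "$root/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the directory name from this series:
#   check_layout ~/httpd-example && echo "layout looks complete"
```

If the function prints nothing and returns zero, every path from the listing is present.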

The following command should also start Apache httpd:

httpd -f ~/httpd-example/etc/httpd.conf

In addition, confirm that the webserver can properly listen on and respond to requests at: http://127.1.2.3:8080/. Next let's take a look at the template cartridge we'll be working from. It's called a "mock cartridge" and it's actually used for automated testing by the OpenShift engineers when they are building a new release. It's also useful as a base for building new cartridges: https://github.com/openshift/origin-server/tree/master/cartridges/openshift-origin-cartridge-mock

So go ahead and clone the repo, or download the repo zip from GitHub (https://github.com/openshift/origin-server/archive/master.zip), then look for the cartridges/openshift-origin-cartridge-mock directory. Here's a high level overview of its layout:

# The cartridge “root” directory will get installed into the gear at ~/$CARTRIDGE_NAME/
./
./README.md
# ./conf.d is not really needed; we'll be removing it.
./conf.d
./conf.d/mock.conf.erb
# mock.conf is also not used and we'll be removing it.
./mock.conf
# The usr directory gets symlinked in instead of copied; we will talk about it later
./usr
./usr/shared-script
# This hook directory is a list of scripts that allow the cartridges to connect to each other
./hooks
./hooks/set-db-connection-info
# Where environment variables get set
./env
./env/OPENSHIFT_MOCK_SERVICE_URL.erb
./env/OPENSHIFT_MOCK_EXAMPLE.erb
./env/PATH
# This “template” directory is what will be placed in the application's git repo on creation.
./template
./template/.openshift
./template/.openshift/README.md
./template/.openshift/action_hooks
./template/.openshift/action_hooks/build
./template/.openshift/action_hooks/post-deploy
./template/.openshift/action_hooks/pre-build
./template/.openshift/action_hooks/deploy
./template/index.html
# Cartridge metadata, the most important of which is the manifest.yml
./metadata
./metadata/manifest.yml
# managed_files.yml for .erb processing and locked files
./metadata/managed_files.yml
# This is a spec file we use for packaging, not strictly required for cartridge creation.
./openshift-origin-cartridge-mock.spec
# This conf directory is also not strictly needed and we'll be removing it
./conf
./conf/httpd.conf.erb
# bin directory contains our basic scripts for the cartridge.
./bin
# control is your start/stop script so “./control start” should start the application.
./bin/control
# teardown is not required for most basic uses
./bin/teardown
# setup is run as part of application creation as well as updates.
./bin/setup

Cartridge Workflow

The mock cartridge actually "works" as-is, so it's important to know what happens when a new application is created. Below is a walk-through of roughly what would happen if a user created an application with mock via “rhc app create -a mymock -t mock-0.1”

  1. The client tools contact the broker via a REST API. The broker then finds a node to put it on. An empty gear is then created on that node.
  2. Once the gear is created, our cartridge gets copied into that gear. In the case of our mock cartridge - /var/lib/openshift/$OPENSHIFT_GEAR_UUID/mock/
  3. After the cartridge is copied, every file listed under locked_files in ./metadata/managed_files.yml (formerly ./metadata/locked_files.txt) is created and set as owned by the user.
  4. The setup script is run.
  5. All of the files in that locked files list are chowned to root so the user can no longer modify them.
  6. Connection hooks are run
  7. The cartridge is started via control start

The two most important steps there, especially for this example, are step 2 (the copy of the directories) and step 4 (where setup is run). The exception to copying of directories is the ./usr/ directory, which will typically get symlinked in. Why? Mostly to help save disk space and memory. ./usr/ should hold all of your service's binaries, blobs, images or other fairly large static data; the user will not have write access to this directory. Keep in mind that anything copied into the user's gear counts against their disk quota.

Note that in our example we're not using any binaries or blobs; we are just using the httpd that is provided by RHEL6. It's nice to be able to rely on those binaries already being there, but if they weren't, you could always provide your own httpd and put it in the usr directory.
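To make the copy-versus-symlink distinction concrete, here is a rough imitation of it in plain shell. It is illustrative only: install_cartridge is an invented name, the real node code is not shown in this post, and the sketch ignores ownership and dotfiles entirely.

```shell
#!/bin/bash
# Illustrative only: mimic "copy everything, but symlink usr/".
# Pass an absolute source path so the symlink target stays valid.
install_cartridge() {
    local src=${1%/} dest=${2%/} entry name
    mkdir -p "$dest"
    for entry in "$src"/*; do
        [ -e "$entry" ] || continue          # nothing to do for an empty source
        name=$(basename "$entry")
        if [ "$name" = "usr" ]; then
            ln -s "$entry" "$dest/usr"       # shared content, not duplicated per gear
        else
            cp -a "$entry" "$dest/"          # private copy, counts against quota
        fi
    done
}
```

After running it against a cartridge directory, ls -l on the destination would show usr as a link while bin, env and the rest are real copies.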

First steps

So now we have a good idea of the layout of our httpd application and cartridge and how the cartridge is installed. Let's start by removing all the bits in the mock cartridge we don't need.

cd ~/httpd-example-cartridge
# note: The rest of these commands are run from inside the httpd-example-cartridge directory unless otherwise specified.
rm -rf ./conf ./conf.d
rm ./openshift-origin-cartridge-mock.spec ./hooks/set-db-connection-info ./env/OPENSHIFT_MOCK_SERVICE_URL.erb ./mock.conf
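# Let's zero out the action hooks (paths are relative so we stay in the cartridge root)
echo > template/.openshift/action_hooks/build
echo > template/.openshift/action_hooks/deploy
echo > template/.openshift/action_hooks/post-deploy
echo > template/.openshift/action_hooks/pre-build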

Everything else seems fairly useful or at least not harmful to have in place. So next step, copy our original files into this repo.

cp -adv ~/httpd-example/* ./

The first somewhat non-obvious change we need to make is to our template directory. This directory contains the default git environment at app creation. In our case we want our default index.html “hello world” file to be the default home page. So let's move it and clean up.

mv html/index.html template/
rmdir html

We also need to make changes to our etc/httpd.conf file. Recall from part 1 we have some hard-coded paths in the config file, specifically ServerRoot. Here's the example provided:

ServerRoot "/home/mmcgrath/httpd-example/"

Lucky for us, OpenShift has a full template system that can easily substitute the proper value in instances like this. To enable it we need to rename the file to a .erb:

mv etc/httpd.conf etc/httpd.conf.erb

Note: the “httpd.conf.erb” name is important because after the template has been processed, the “.erb” suffix will be removed, leaving only “httpd.conf”. Next we need to inject the cartridge home directory into our ServerRoot string. So replace the ServerRoot example above with:

ServerRoot "<%= ENV['OPENSHIFT_HTTPD_DIR'] %>"

There are several environment variables provided; the full list is at https://github.com/openshift/origin-server/blob/master/node/README.writing_cartridges.md and is worth a close look, because “OPENSHIFT_HTTPD_DIR” isn't one of the variables explicitly listed, yet it is provided. Later in this post we define how the cartridge is referred to internally, with "Cartridge-Short-Name: HTTPD" in the manifest.yml. Once that is defined, OPENSHIFT_HTTPD_DIR will be set to something similar to “/var/lib/openshift/511e1b1e500d451fb70003e8/httpd/”

In part 1 of the series we set some other paths to be relative to this ServerRoot. We can keep all of those paths the same except for one. In earlier steps index.html was moved to a different directory. Apache httpd needs to be made aware of these changes. Look for:

DocumentRoot "html"

and change it to

DocumentRoot "<%= ENV['OPENSHIFT_REPO_DIR'] %>"

Using the same gear ID as above this directory would resolve to:

DocumentRoot "/var/lib/openshift/511e1b1e500d451fb70003e8/app-root/runtime/repo"

This is where the exploded git repository ends up when a user issues a git push. It's also, therefore, where the template/ directory in our cartridge ends up. At least until the user makes a change to it.

There was also the matter of our IP address and port. We had initially picked 127.1.2.3 which is a loopback address. OpenShift uses loopback addresses for intranode communication. When a cartridge is installed on a gear, it must use one of the IP addresses allocated to that gear. An IP address will be assigned to each cartridge (if it is needed) via the manifest.yml file (more details below). That IP will be available in an environment variable and it's one we can reference in the template just as we did above with OPENSHIFT_REPO_DIR. So let's do that. Find:

Listen 127.1.2.3:8080

and replace it with

Listen <%= ENV['OPENSHIFT_HTTPD_IP'] %>:<%= ENV['OPENSHIFT_HTTPD_PORT'] %>
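To see what the three edited lines turn into once the template is processed, here is a small stand-in for the ERB step written in bash. It understands only the <%= ENV['NAME'] %> form used in this post (real ERB is full Ruby templating), render_env_template is an invented name, and the IP address is a made-up example. The gear ID is the one quoted earlier.

```shell
#!/bin/bash
# Stand-in for ERB processing: replaces each <%= ENV['NAME'] %> with the value
# of the environment variable NAME. Not the real OpenShift template processor.
render_env_template() {
    local re="<%= *ENV\['([A-Za-z_][A-Za-z0-9_]*)'\] *%>"
    local line name
    while IFS= read -r line || [ -n "$line" ]; do
        while [[ $line =~ $re ]]; do
            name=${BASH_REMATCH[1]}
            line=${line/"${BASH_REMATCH[0]}"/${!name}}
        done
        printf '%s\n' "$line"
    done
}

# Example values only; OpenShift sets the real ones inside the gear.
export OPENSHIFT_HTTPD_DIR=/var/lib/openshift/511e1b1e500d451fb70003e8/httpd/
export OPENSHIFT_REPO_DIR=/var/lib/openshift/511e1b1e500d451fb70003e8/app-root/runtime/repo
export OPENSHIFT_HTTPD_IP=127.0.250.1      # made-up loopback address
export OPENSHIFT_HTTPD_PORT=8080

render_env_template <<'EOF'
ServerRoot "<%= ENV['OPENSHIFT_HTTPD_DIR'] %>"
DocumentRoot "<%= ENV['OPENSHIFT_REPO_DIR'] %>"
Listen <%= ENV['OPENSHIFT_HTTPD_IP'] %>:<%= ENV['OPENSHIFT_HTTPD_PORT'] %>
EOF
```

With those example values the last line comes out as Listen 127.0.250.1:8080, which is the shape Apache httpd expects.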

The last bit before we move on is simply cleanup from our earlier tests. When the log dir was copied over it had logs in it. Let's clean that out.

rm -f logs/*
echo "Logs are stored here" >> logs/README

Cartridge Metadata

The cartridge metadata is how the cartridge describes itself to the rest of the system. It also has some handy files for controlling how the cartridge behaves. The locked files list (locked_files.txt, since replaced by the locked_files section of managed_files.yml) was described earlier, but let's look a bit closer. All of the files in that list get created as part of the installation setup. Once setup is done, all of the files listed become owned by root. Any file created by the cartridge that is not in the list remains owned by the user.

For example, we have a logs/ dir that we want the user to be able to write to. Httpd will run as the 'gear' user and it's certainly going to generate logs. The env/ dir, on the other hand, has information in it that we want to remain static. Thus, after the cartridge is done installing, env/ gets chowned to root, which prevents further alterations.

The snapshot* files in metadata/ allow the cartridge writer to decide to ignore certain directories during the snapshot phase. This can be useful when creating a database cartridge. Perhaps you'd prefer the db dump be snapshotted instead of the raw db files. Remember a snapshot is more like a backup and will download files onto the user's workstation (it's usually called from the command line tools).

The last and most complex part of the cartridge metadata is the manifest.yml. This file contains all the naming, version and connection hook information about the cartridge. You can dig deeper into the cartridge creation documentation for more information, but for now let's just edit the metadata/manifest.yml file and make it look like this:

Name: httpd
Cartridge-Short-Name: HTTPD
Display-Name: HTTPD Cartridge
Description: "A httpd cartridge for development use only."
Version: '0.1'
License: "None"
Vendor: Custom Cartridges Inc
Cartridge-Version: 0.0.1
Cartridge-Vendor: customcarts
Categories:
  - service
  - web_framework
Provides:
  - httpd
Cart-Data:
  - Key: OPENSHIFT_HTTPD_EXAMPLE
    Type: environment
    Description: "An example environment variable using ERB processing"
Group-Overrides:
  - components:
    - httpd
Subscribes:
  set-db-connection-info:
    Type: "NET_TCP:db:connection-info"
    Required: false
Endpoints:
  - Private-IP-Name:   IP
    Private-Port-Name: PORT
    Private-Port:      8080
    Public-Port-Name:  PROXY_PORT
    Mappings:
      - Frontend:      ""
        Backend:       ""
        Options:       { websocket: false }
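The Endpoints entries and Cartridge-Short-Name together account for the variable names used in httpd.conf.erb: each *-Name value is prefixed with OPENSHIFT_ and the short name. The snippet below only illustrates that naming rule; endpoint_var is a hypothetical helper, not an OpenShift command.

```shell
#!/bin/bash
# Hypothetical helper showing how endpoint names from manifest.yml map to
# environment variable names: OPENSHIFT_<Cartridge-Short-Name>_<Name>.
endpoint_var() {
    local short_name=$1 endpoint_name=$2
    printf 'OPENSHIFT_%s_%s\n' "$short_name" "$endpoint_name"
}

endpoint_var HTTPD IP          # Private-IP-Name:   IP
endpoint_var HTTPD PORT        # Private-Port-Name: PORT
endpoint_var HTTPD PROXY_PORT  # Public-Port-Name:  PROXY_PORT
```

The first two results are the OPENSHIFT_HTTPD_IP and OPENSHIFT_HTTPD_PORT variables already referenced in the Listen line.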

managed_files.yml

The managed_files.yml file contains a list of files that OpenShift should manage. Previously some may have known this as locked_files.txt, but that has been deprecated in favor of managed_files.yml, which has additional features built into it. locked_files dictates which files become owned by root and therefore cannot be altered by the user. processed_templates should be a list of erb files to process as part of installing the system. Recall we made httpd.conf.erb earlier in this post.

---
locked_files:
- env/
- env/*
processed_templates:
- '**/*.erb'
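Before installing, it can help to check which files that pattern will pick up. The sketch below lists every .erb file under a cartridge directory next to the name it would have once the suffix is removed. list_templates is a made-up helper, and it approximates the pattern with find instead of using OpenShift's own matching.

```shell
#!/bin/bash
# Sketch: show each *.erb file under a directory and its post-processing name.
# Approximates the '**/*.erb' pattern with find; not part of OpenShift.
list_templates() {
    local root=${1%/} path rel
    find "$root" -type f -name '*.erb' | sort | while IFS= read -r path; do
        rel=${path#"$root"/}
        printf '%s -> %s\n' "$rel" "${rel%.erb}"
    done
}

# Usage from the cartridge root:
#   list_templates .
```

For the cartridge built in this post, etc/httpd.conf.erb and env/OPENSHIFT_MOCK_EXAMPLE.erb should both appear in the output.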

Creating your control script

The control script (found in bin/control) will look familiar to anyone who has written an init script before. It takes some of the expected arguments as well like start, stop, restart, status and reload. In OpenShift we have implemented a few other functions which I'll go over now:

  • tidy: tidy should be implemented to cause your script to make efforts to clean up disk space. It could, for example, remove temp files, or rotate and compress old logs. There are gear-level tidy commands as well; for example, a git garbage collection (git gc) is done.
  • pre-build: When a user does a build, usually as part of a git push, pre-build gets run before the build.
  • build: When a user does a build, the build script is called. This should attempt to auto-resolve deps, update cache, build the user's code, possibly even run tests, etc.
  • deploy: This is part of the deploy phase and should be run whenever the application is deployed (which would typically happen after it is built)
  • post-deploy: This script would typically be called after the application is deployed but before it is started.
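As an idea of what a tidy action might look like for an httpd cartridge, here is a minimal sketch that compresses the current logs and then empties them. It is one possible approach, not what any shipped cartridge does: the .1.gz naming and keeping a single old generation are assumptions made for the example.

```shell
#!/bin/bash
# Sketch of a tidy action: keep one compressed copy of each log, then empty it.
# The ".1.gz" suffix and single-generation policy are assumptions.
function tidy {
    local log_dir="${OPENSHIFT_HTTPD_DIR%/}/logs" log
    for log in "$log_dir/access_log" "$log_dir/error_log"; do
        if [ -s "$log" ]; then
            # Truncate in place so a running httpd keeps writing to the same file.
            gzip -c "$log" > "$log.1.gz" && : > "$log"
        fi
    done
}
```

Hooking it up would mean pointing the tidy branch of the case statement in bin/control at this function.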

For the purposes of this post, let's implement start, stop and restart. Pick your favorite editor and open the bin/control file. First, let's pick the start section. As you'll recall previously we started httpd with this command:

httpd -f ~/httpd-example/etc/httpd.conf

Perhaps a more correct command would have been:

/usr/sbin/httpd -f ~/httpd-example/etc/httpd.conf -k start

So let's put those into our functions. We want to replace the sections that look like:

function start {
  touch $MOCK_STATE/control_start
  touch $MOCK_STATE/mock_process
}

and change them to look like (remembering that /usr/sbin is not in the default PATH in OpenShift):

function start {
    /usr/sbin/httpd -f $OPENSHIFT_HTTPD_DIR/etc/httpd.conf -k start
}

Do the same for stop and restart. $OPENSHIFT_HTTPD_DIR is the same environment variable we used earlier to reference where the cartridge was installed (so ~/httpd/etc/httpd.conf would be used). While we're in there, let's also replace other MOCK specifics with something more generic. Something like this:

#!/bin/bash -e

# httpd lives in /usr/sbin, which is not in the default PATH in OpenShift.
PATH=/bin:/usr/bin:/usr/sbin:$PATH

source "$OPENSHIFT_CARTRIDGE_SDK_BASH"

# Config file rendered from etc/httpd.conf.erb when the cartridge is installed.
HTTPD_CONF="$OPENSHIFT_HTTPD_DIR/etc/httpd.conf"

function start {
    httpd -f "$HTTPD_CONF" -k start
}

function stop {
    httpd -f "$HTTPD_CONF" -k stop
}

function restart {
    httpd -f "$HTTPD_CONF" -k restart
}

function catchall {
    echo "not yet implemented"
}

case "$1" in
  start)       start ;;
  stop)        stop ;;
  restart)     restart ;;
  # No status function exists yet; calling an undefined one would abort under -e.
  status)      catchall ;;
  reload)      catchall ;;
  tidy)        catchall ;;
  pre-build)   catchall ;;
  build)       catchall ;;
  deploy)      catchall ;;
  post-deploy) catchall ;;
  *)           exit 0 ;;
esac

exit 0
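The case statement also accepts a status action, which this post does not implement. A status function could be sketched as below. It assumes httpd.conf contains "PidFile run/httpd.pid" relative to ServerRoot, which is not shown in this post, so adjust the path if your config differs.

```shell
#!/bin/bash
# Sketch of a status function. Assumes "PidFile run/httpd.pid" in httpd.conf.
function status {
    local pid_file="${OPENSHIFT_HTTPD_DIR%/}/run/httpd.pid"
    local pid
    # kill -0 sends no signal; it only checks that the process exists.
    if [ -f "$pid_file" ] && pid=$(cat "$pid_file") && kill -0 "$pid" 2>/dev/null; then
        echo "httpd is running (pid $pid)"
    else
        echo "httpd is not running"
    fi
}
```

To use it, define the function above the case statement in bin/control and have the status branch call it.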

I would also recommend making the control script non-executable until we're confident it works. A simple chmod will do the trick:

chmod -x bin/control

The reason for this is that when a cartridge is installed, the default action is to start it. If the resulting control start fails, OpenShift will assume the cartridge installation process has failed. By leaving it non-executable we can ssh into the gear, run bin/control start manually, and easily troubleshoot and catch any errors without having to check logs.

Next let's look at the setup script.

Creating setup

The setup script is called as part of installing the cartridge into the gear. It is run as the gear user and can help you further customize the cartridge. For example, let's say we want to pre-create a database or download a bunch of libraries. The setup script can help with this. In other cartridges we also use it to allow a single cartridge to support multiple versions. Take a look at the ruby cartridge for example: https://github.com/openshift/origin-server/tree/master/cartridges/openshift-origin-cartridge-ruby

In our case though, once the httpd cartridge is put in place there's very little we need to customize. In fact there's nothing we need to customize. So let's blank out the file and move on:

echo > bin/setup

I'll cover more advanced setup use cases in a future blog post. It's important to note that when doing a cartridge upgrade, setup may be called. We're building out a more advanced system so cartridge writers can specify 'patch' level updates. But it's important to remember that an update could very well be a “copy the cartridge in place of what's there, and re-run setup”. Setup will therefore be called multiple times, so make it re-entrant or idempotent.
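For a sense of what idempotent means in practice, here is a minimal sketch of a setup routine that can run any number of times with the same result. The directories, README text and marker file name are illustrative assumptions; our httpd cartridge does not actually need any of them.

```shell
#!/bin/bash
# Sketch of setup logic that is safe to run more than once.
# The marker file name (.setup_done) is an invented convention.
function setup_cartridge {
    local cart_dir="${OPENSHIFT_HTTPD_DIR%/}"

    # mkdir -p succeeds whether or not the directories already exist.
    mkdir -p "$cart_dir/run" "$cart_dir/logs"

    # One-time work is guarded by a marker, so a re-run leaves user changes alone.
    if [ ! -f "$cart_dir/run/.setup_done" ]; then
        echo "Logs are stored here" > "$cart_dir/logs/README"
        touch "$cart_dir/run/.setup_done"
    fi
}

# bin/setup would end by calling: setup_cartridge
```

The guard is the important part: an unguarded append (>>) would add a duplicate line every time setup re-ran during an upgrade.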

Testing

Unfortunately as of the writing of this blog there's no way for you to test your cartridge on OpenShift Online directly. I promise it's coming. For now though, to test your cartridge you'll need to use OpenShift Origin (our free community edition) or OpenShift Enterprise (our paid enterprise edition). Unfortunately the current version of Enterprise does not use the new cartridge format but a soon to be released version will. For this demo I'll assume you have Origin already up and running. If not check out the Origin docs at http://openshift.github.com/

Great post! I like the walk-through explaining what is happening "behind-the-scenes" which is vital for understanding how to hook/plug my custom cartridge into OpenShift.

%%%%%%%%%%%%%%%%%%%%%

Regarding "Cartridge Workflow step 2": currently this tutorial says:
"Once the gear is created, our cartridge gets copied into that gear. In the case of our mock cartridge - /var/lib/openshift/$GEAR_UUID/mock/".

Should it say instead:
"[...] /var/lib/openshift/$OPENSHIFT_GEAR_UUID/mock-0.1/"? (notice different environment variable and different cartridge type name. The tutorial states that the application was created using rhc app create -a mymock -t mock-0.1).

%%%%%%%%%%%%%%%%%%%%%

Regarding the specification of locked files: is that specification file called locked_files.txt (as it is named in this tutorial) or managed_files.yml (as can be seen in some cartridges' public git repositories)?

%%%%%%%%%%%%%%%%%%%%%

metadata/snapshot_*.txt doesn't seem to exist in the mock cartridge.

That's correct, this post was written before that part of the spec changed. I'll update the post now (managed_files.yml is correct and $GEAR_UUID was an oversight though it should be /mock/ not /mock-0.1/).

Also stay tuned, part 3 is coming early next week and will show you how to test cartridges in openshift online! No need to setup origin/enterprise first to test your cartridge.

Where is Part 3?

I think the quoting may be typoed for replacing the listen address with the openshift variable.

Nice catch! Updated the blog to remove the extra "

Any update when Part 3 will be posted?

We're hoping to publish part 3 later this week or early next week. Stay tuned.

As I was following this tutorial at the command: cp -adv ~/httpd-example/* ./

I wasn't sure if this should have been: cp -adv ~/httpd-example/* ~/httpd-example-cartridge/

From where the last command left off, the current working directory is:
~/httpd-example-cartridge/template/.openshift/action_hooks/

The command in the tutorial implies: cp -adv ~/httpd-example/* ~/httpd-example-cartridge/template/.openshift/action_hooks/

Is this correct? Pretty sure I just polluted the /action_hooks/ directory and assume I need to start over?

Also, and forgive me for being presumptuous and not completing part 2 yet either, but any w3rd on part 3?

Thanks
