Scaling in Action on OpenShift

Today's post is a follow-up to a talk I just gave at the OpenWest Conference in Utah called "Scale or Fail". While I don't like the title (I will plead that someone else submitted the talk), I had a lot of fun preparing and giving it. The idea was to discuss a bit of what scaling means and then show auto-scaling in action on OpenShift.

If you want to see my deck, I am hosting it on OpenShift using reveal.js. Here are all the talks I have moved over, and here is the specific scaling talk.

You can go read all sorts of things about scaling and "Web Scale" all over the internet, so I am not going to spend this post trying to do a lot of "teaching" about scaling.

Types of Scaling

Vertical versus horizontal application scaling

The two types of scaling I covered in the talk were horizontal versus vertical scaling. In my talk (and in the image above) I compared vertical scaling to being a sumo wrestler. When traffic picks up (the little kid in the image), you increase the size of your server's resources: more RAM, more CPU, faster disks. Bigger, stronger, faster. This represents the way most of us learned to scale: throw a bigger server at it.

As hardware has become commoditized and networks have gotten faster, we have started to move to horizontal scaling, which I compared to a swarm of ants. Each piece of your server infrastructure is cheap, small, and loosely coupled. When the load gets bigger you just throw more ants at the problem: as the load increases on your services, rather than bringing in a bigger server, you start adding more cheap, generic servers. If a server dies, just swap it out and throw in a new one.

The scaling strategy you pick depends a lot on (1) your budget and (2) your use case. With vertical scaling you have tighter coordination of resources and fewer moving parts; I used the example of a database at a bank. A social check-in service would be a great example for horizontal scaling because no individual transaction is critical and there is a ton of read activity.

When you move to "cloud hosting" such as a PaaS, your main scaling strategy should shift to horizontal scaling. As a matter of fact, that is how you would achieve scaling with OpenShift.

The Application and Scaling on OpenShift

The way we help you scale on OpenShift is by automagically adding new servers as the number of HTTP connections to your application tier increases. Right now we do not support scaling at the data tier, though you can get that through our partner MongoLabs, and you can expect a built-in solution by the end of the year.

How OpenShift Scales

The basic idea is that when you create a scalable application, OpenShift adds HAProxy, a software load balancer, to the main application gear. We also create a gear group for each tier in your application. In this way, your application tier might scale from 1 to 5 gears while your database tier scales to only 1 gear.

As your application starts to get more HTTP connections than the single application-tier gear can handle, OpenShift automatically spins up another gear with the same configuration as the current one, plugs it into HAProxy, briefly brings the new gear down to rsync the content over to it, brings it back up, and then HAProxy starts sending connections to it. OpenShift will keep doing this as connections pile up, up to the limit you have set on the number of gears allocated to the gear group. The criterion OpenShift uses for a scale-up event is more than 10 concurrent HTTP connections to your application.

We then watch connections over time, and when a gear has been idle for a while, OpenShift spins it down and removes it from HAProxy. All of this happens without any intervention on your part.
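
To make that behavior concrete, here is a rough sketch of that kind of control loop in Python. To be clear, this is not OpenShift's actual implementation; the threshold, check interval, and function names are illustrative only, with the 10-connection trigger taken from the description above.

    import time

    SCALE_UP_THRESHOLD = 10  # concurrent HTTP connections per gear (see above)
    CHECK_INTERVAL = 5       # seconds between checks; an illustrative value

    def autoscale_loop(current_connections, add_gear, remove_gear,
                       min_gears=1, max_gears=5):
        """Conceptual sketch of OpenShift-style autoscaling, not the real agent."""
        gears = min_gears
        idle_checks = 0
        while True:
            conns = current_connections()  # e.g. read from the HAProxy stats page
            if conns > SCALE_UP_THRESHOLD * gears and gears < max_gears:
                add_gear()     # spin up, rsync content, plug into HAProxy
                gears += 1
                idle_checks = 0
            elif conns < SCALE_UP_THRESHOLD * (gears - 1) and gears > min_gears:
                idle_checks += 1
                if idle_checks > 60:  # the gear has been idle "for a while"
                    remove_gear()     # pull out of HAProxy, then spin down
                    gears -= 1
                    idle_checks = 0
            else:
                idle_checks = 0
            time.sleep(CHECK_INTERVAL)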

If you want to see what HAProxy is doing on your application, all you need to do is add haproxy-status to the end of your base URL (http://<appname>-<namespace>.rhcloud.com/haproxy-status).

Enough talk, let's create a scalable app

OK, rather than just talk about this, let's get a scalable app going and then load test it. For the application today I am going to use the example I wrote to show a spatial REST web service using Python Flask and MongoDB. The GitHub repository has instructions on how to build the app, but there is one difference today: YOU NEED TO ADD THE -s FLAG to the app creation step.

   rhc app create pythonws python-2.6 mongodb-2.2 -s

You will need to make sure you have at least 2 free gears in your account to do this. One gear is needed for Python (Apache with mod_wsgi) plus HAProxy, and another gear for MongoDB.

Then follow the steps for adding the upstream GitHub repo, push the new code up to your gear, import the data into MongoDB, and add the index (a Python sketch of that step follows below), and you now have a scalable Python application. You are ready for WEB SCALE!
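
If you would rather add that geospatial index from Python than from the mongo shell, a minimal sketch with pymongo (using the older Connection API that matches this era of OpenShift) might look like the following. The database name, collection name, and the pos field holding the coordinates are assumptions; check the repository's instructions for the real names.

    import pymongo

    # On an OpenShift gear the host, port, and credentials come from
    # environment variables such as OPENSHIFT_MONGODB_DB_HOST and friends.
    conn = pymongo.Connection('127.0.0.1', 27017)
    db = conn['parks']  # assumed database name

    # MongoDB 2.2 'near' queries require a 2d geospatial index on the
    # location field; 'pos' is an assumed field name here.
    db.parks.ensure_index([('pos', pymongo.GEO2D)])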

BlazeMeter and Load Testing

Now that we are ready with a web scale application, how do we actually go about testing it? Well, on one of my trips to Israel I met with the fine folks at BlazeMeter and remembered the fantastic testing application they stand up as a service. They have basically stood up an Apache JMeter-compatible service in "the cloud". There is no need for you to set up servers, no need to learn some new language, and you pay for what you use with no fixed costs. You can read about all the nifty parts of their service on their features page.

I cannot say enough good things about how easy it was to sign up (instant access) and then create a test. All the test configuration happens on one screen.

If you have an existing JMeter script you can just upload and use that. Instead, today we are going to click on the "GET and POST requests" icon and use that screen.

First you need to pick a name for your test (since you can have multiple tests), and then you pick the data center where you want your requests to originate. It is great to be able to look at load and response time for your service from around the world without having to stand up servers in each of those locations. I am going to call my test "My great test" and pick U.S. East for the data center, the same data center as OpenShift, since I really want to look at load, not response time or latency.

Those two fields are highlighted in green in the following picture:

BlazeMeter test creation screen 1

After this we add the URLs we want to test. I recommend adding the request that gives back all the parks:

http://<appname>-<namespace>.rhcloud.com/ws/parks 

and some near queries (make sure to give them different names):

http://pythonws-spdemo.rhcloud.com/ws/parks/near?lat=45.5&lon=-82

http://pythonws-spdemo.rhcloud.com/ws/parks/near?lat=45.5&lon=-111.17
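
Before pointing a load test at the service, it is worth a quick smoke test to confirm the endpoints respond. Here is a minimal sketch using the Python requests library; the base URL mirrors the examples above, so swap in your own app name and namespace.

    import requests

    BASE = 'http://pythonws-spdemo.rhcloud.com'  # replace with your <appname>-<namespace>

    urls = [
        BASE + '/ws/parks',
        BASE + '/ws/parks/near?lat=45.5&lon=-82',
        BASE + '/ws/parks/near?lat=45.5&lon=-111.17',
    ]

    for url in urls:
        resp = requests.get(url)
        # Expect a 200 and a JSON payload of parks from each endpoint
        print resp.status_code, url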

Now let's set some of the load parameters. Set the number of users to maybe 10 or 20 with 5 seconds between requests if you only have 3 gears total. In my load testing, with 50 users and 2 seconds between requests, I ended up using 6 Python gears and 1 MongoDB gear.

You want to set the load type to extreme stress, as this will ramp the load up to the maximum very quickly (like landing on the front page of the App Store).

Your screen should now look like the following image:

BlazeMeter test creation screen 2

After you are done with this, hit save at the bottom of the page and you are all done setting up your test.

You should immediately be brought to a page where you can start your test. All you have to do is click the start button at the top of the page (highlighted in green below). Go ahead and hit the button, and then wait a little while as the tests spin up. After waiting, head over to the Load Report button to watch the test progress.

BlazeMeter test creation screen 3

What you see if you play the home game

Once the tests really start up there are several pages you are going to want to look at.

  1. Keep looking at the BlazeMeter logging page to watch what is happening with requests and other metrics

  2. You will want to keep some of the URLs for the actual service requests open in your browser. By hitting these during testing you can get a feel for what actually happens when other users try to hit your app. For me, it was great to see no change in app responsiveness even when I was hitting the servers at peak load. You can play around with some of the values on the near query to make sure the results are not cached.

  3. You are also going to want to watch your HAProxy status page:

    http://<appname>-<namespace>.rhcloud.com/haproxy-status

Be sure to look at this page before you start your testing. It is hard to find good documentation on how to interpret this page, but the important part for today is the bottom table (the table with the red express tab).

HAProxy status on OpenShift

When you look at this table for the first time there will be four rows. The one you care about is the one called local-gear. This is the gear running both your Python application server and HAProxy. Since we have not hit the application with any load yet, you will notice there is only one gear.

The columns you will want to watch are Sessions and Bytes. Sessions shows how many users are connecting to the application, and Bytes shows how much data is being served by the app. Again, I am not entirely clear on everything these columns mean and how to interpret them, but as the load goes up these numbers will increase.
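
If you would rather watch those numbers from a script than keep refreshing the page, HAProxy's stats page can usually also emit CSV when you append ;csv to the stats URI. Whether OpenShift passes that suffix through is an assumption you should verify; if it does, a quick sketch to dump sessions per gear looks like this:

    import csv
    import requests
    from StringIO import StringIO  # Python 2

    # Assumes the ';csv' stats suffix is reachable through OpenShift's routing.
    URL = 'http://<appname>-<namespace>.rhcloud.com/haproxy-status;csv'

    resp = requests.get(URL)
    body = resp.content.lstrip('# ')  # drop the leading '# ' from the header row
    for row in csv.DictReader(StringIO(body)):
        # 'svname' is the gear name (e.g. local-gear), 'scur' is current
        # sessions, and 'stot' is total sessions: standard HAProxy CSV fields.
        print row.get('svname'), row.get('scur'), row.get('stot')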

Once you start the testing, after about 3 to 4 minutes, you can refresh this page and you should start to see more green rows serving up content. This is OpenShift automatically adding in new gears and then using them to serve content.

HAProxy status on OpenShift

If you happen to refresh quite often as gears are being added you will occasionally see a row (gear) added as green, then turn red, then back to green. This corresponds to OpenShift bringing a gear online, then bringing it offline to rsync the contents, and then bringing it back online again to serve content.

HAProxy status on OpenShift

In the end you should see quite a few green rows serving up content. You can let the test run for the full hour if you want, or you can shut it down after you have seen enough. Remember to hit some of the URLs on your own during the testing so you can see what users would experience during the scaled-up load.

HAProxy status on OpenShift

At the end of the testing, if you wait a little while, you will see all of your scaled-up gears spin down, and you will go back to having two gears running: the MongoDB gear and the Python gear.

Conclusion

Besides being in awe of how easy it was to build and test a scalable application, I was also intrigued by the performance results. Despite the fact that we did not scale up the MongoDB tier, we never saw a slowdown in performance when serving requests. What I think this demonstrates is that for many read-heavy applications, the database tier is not going to be the bottleneck. I think this is predominantly due to the fact that the test was hitting the same queries over and over again, so MongoDB held the results in cache. My application was able to serve 50 users hitting the app roughly every two seconds and making large data requests.

I also hope you saw how easy it was to load test your application with BlazeMeter. Within a minute of signing up I had configured and begun load testing my application. There was no need to find and configure machines or EC2 instances; just simple and easy forms, perfect for developers who don't like to sysadmin.

I certainly plan to play with more of the features built into BlazeMeter in the future. When I get back to my spatial application, it will be a great way to look at how many resources are needed for a given load. If I were a Drupal dev I would certainly be digging deeper into BlazeMeter's specialized Drupal libraries. Maybe we can get them to do a blog piece with more in-depth coverage of their Drupal integration.

In real life you would probably put a caching layer in front of this, especially for the query that returns all the parks in the MongoDB collection. That way you would reduce the load on the application server and return results faster. There are several ways to achieve this caching. You could use a content delivery network (CDN), which would take your content and put it on their network to serve up. The problem there is how to cache the dynamic content from the near queries.

You could also install Memcached or a similar key-value store on OpenShift. There are two ways to do this. One is adding a DIY cartridge, though you would have to do that as a separate OpenShift application; because we made a scaled application, the Memcached DIY application should be able to see your original Python application. The other option is to install the Memcached binaries in your $OPENSHIFT_DATA_DIR and then start and stop the Memcached server using the start and stop action hooks in your git repository.
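
As a rough illustration of what the application side could look like, here is a minimal read-through cache for the expensive all-parks query using the python-memcached client. The Memcached address, database name, and cache key are all assumptions; you would point it at whichever of the two setups above you chose.

    import json
    import memcache
    import pymongo

    mc = memcache.Client(['127.0.0.1:11211'])  # assumed Memcached location
    db = pymongo.Connection()['parks']         # assumed database name

    def all_parks():
        """Return every park, hitting MongoDB only on a cache miss."""
        cached = mc.get('all_parks')
        if cached is not None:
            return json.loads(cached)
        parks = []
        for park in db.parks.find():
            park.pop('_id', None)  # ObjectId is not JSON serializable
            parks.append(park)
        mc.set('all_parks', json.dumps(parks), time=300)  # cache for 5 minutes
        return parks

The near queries are harder to cache this way, since the cache key would need to encode the lat and lon parameters.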

In the end, I hope you enjoyed building and testing autoscaling on a PaaS. Think about how much work would have been involved in trying to do this yourself. How many phone calls would you have gotten when your site was Slashdotted or hit the top 10 on the App Store? Remember, all of this happened with no human intervention. Of course, we will have more blog posts soon talking about scaling and application design, so stay tuned.

What's Next?