How HAProxy Scales OpenShift Apps

In this blog post we'll take a look at the basic components that make scaling possible on OpenShift and at what HAProxy does for us.

What is HAProxy?

Let's start with a quotation from HAProxy's homepage:

HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for web sites crawling under very high loads while needing persistence or Layer7 processing. Supporting tens of thousands of connections is clearly realistic with todays hardware. 

If you find this a bit cryptic, take a look at this picture, which explains its purpose:

HAProxy Proxy mode

HAProxy forms an entry point for your web application. HAProxy takes an incoming request and proxies it to one of the preconfigured backends. Yes, as simple as that.

Is it good? I've Never Heard of It!

Sometimes I hear people say they have never heard of HAProxy and thus aren't sure whether it's production-ready.

Well, the page listing HAProxy's users shows that many high-profile companies rely on it: DISQUS, GitHub, Zynga, Instagram, and Twitter.

And no, it's not a new product. HAProxy's history dates back to 1996 when the first version was created.

HAProxy is a robust, well-tested, and production-ready product.

Inside of HAProxy

HAProxy uses a well-known event-driven architecture. It is designed for Unix systems (sorry, Windows) and utilizes the best features of these systems to provide the best performance possible. HAProxy is a reverse proxy server and nothing else. Thus, the architecture is built to do one thing, and it excels at this one thing.

With a single-process architecture, HAProxy lowers the cost of context switches and memory usage while not locking up resources with long blocking operations.

According to the claims on the official web site, it can easily handle "between 3 and 4 Gbps of traffic 24 hours a day," handle 100k HTTP requests per second, or saturate a 10 Gbps network.

HAProxy is designed for speed and performance on Unix systems.

Running HAProxy

If you have a Unix-based system, you can test-run HAProxy yourself on your own machine, either by installing a package or by compiling it from source.

In the case of compilation, the result consists of a single binary and, optionally, a configuration file. Very lean and minimal.

The configuration file can be as simple as:

    global
        daemon
        maxconn 256
 
    defaults
        mode http
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms
 
    frontend http-in
        bind *:80
        default_backend servers
 
    backend servers
        server server1 127.0.0.1:8000 maxconn 32

This configures HAProxy to listen on port 80 for HTTP requests and proxy them to server1 on port 8000 of the same machine.
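
As a hedged illustration of proxying to "one of the preconfigured backends", the backend section could be extended to spread requests over more than one server. This is only a sketch: the second server, its port 8001, the round-robin algorithm, and the health checks are assumptions made for this example and are not part of the configuration above.

    backend servers
        # hand requests to the listed servers in turn
        balance roundrobin
        # "check" turns on periodic health checks, so an unreachable server is skipped
        server server1 127.0.0.1:8000 maxconn 32 check
        # server2 and port 8001 are hypothetical
        server server2 127.0.0.1:8001 maxconn 32 check

With two servers listed, HAProxy alternates between them and stops sending traffic to one that fails its health check.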

More details are in the configuration manual.

HAProxy & OpenShift

Now that we understand more about HAProxy, let's take a look at its place in OpenShift. It is said that one picture is worth a thousand words.

How OpenShift Scales

Next, let's look at scalable applications. To learn more about how to enable app scaling, take a look at this post I recently wrote on scaling a Ruby application.

When a request comes in, OpenShift's load-balancing layer translates the domain name (the Host: HTTP header) into an internal IP address and proxies the request there.

Then the HTTP request hits the main gear on port 8080 where HAProxy is listening. It is configured in a way that makes it aware of all the gears that are provisioned for that particular application. HAProxy's task is to load-balance the request among all those gears, and that is also the use-case it is designed for.
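
To make this more concrete, here is a rough sketch of what a gear-aware configuration on the main gear might look like. It is not the file OpenShift actually generates: the section name, the gear names, the internal addresses, and the choice of the leastconn algorithm are all placeholders assumed for illustration. Only port 8080 comes from the description above.

    listen app
        # HAProxy on the main gear accepts the proxied requests on port 8080
        bind *:8080
        mode http
        # prefer the gear with the fewest active connections (assumed algorithm)
        balance leastconn
        # one server line per gear provisioned for the application (hypothetical addresses)
        server gear-1 10.0.0.11:8080 check
        server gear-2 10.0.0.12:8080 check

In a layout like this, adding or removing a gear amounts to adding or removing one server line and reloading HAProxy.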

As your application scales up and down, OpenShift automatically reconfigures HAProxy so that it is aware of all actively deployed gears. Thus the user does not need to know that scaling is happening, and the application developer can sleep peacefully knowing that OpenShift is handling all the configuration automatically.

Our recent blog post Scaling In Action on OpenShift shows how to monitor HAProxy while scaling your application.

Conclusion

HAProxy is currently an integral part of OpenShift and forms the foundation for balancing scalable applications. OpenShift automates the configuration of HAProxy and thus relieves the developer (or deploying sysadmin) of the need to learn another piece of infrastructure.

Go ahead and grab a drink with your friends while OpenShift serves your application to your customers.

What's Next?