In this blog post we'll take a look at HAProxy, the basic component that makes scaling possible on OpenShift, and at what it does for us.
What is HAProxy?
Let's start with a quotation from HAProxy's homepage:
HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for web sites crawling under very high loads while needing persistence or Layer7 processing. Supporting tens of thousands of connections is clearly realistic with todays hardware.
If you find this a bit cryptic, take a look at this picture that explains its purpose:
HAProxy forms an entry point for your web application. HAProxy takes an incoming request and proxies it to one of the preconfigured backends. Yes, as simple as that.
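To make "proxies it to one of the preconfigured backends" a bit more concrete, here is a minimal sketch of round-robin selection, one of the balancing strategies HAProxy supports. The class name, method name and backend addresses are made up for illustration; this is a toy model, not HAProxy's actual code.

```python
from itertools import cycle


class RoundRobinProxy:
    """Toy stand-in for a proxy choosing among preconfigured backends."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        # cycle() yields the backends in order and wraps around forever.
        self._next_backend = cycle(self.backends)

    def pick_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._next_backend)


if __name__ == "__main__":
    # Hypothetical backend addresses, for illustration only.
    proxy = RoundRobinProxy(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
    for request_id in range(5):
        print(f"request {request_id} -> {proxy.pick_backend()}")
```

Each incoming request simply goes to the next backend in the list, so the load is spread evenly as long as requests cost roughly the same.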
Is It Good? I've Never Heard of It!
Sometimes I hear people say they have never heard of HAProxy and thus aren't sure if it's production-ready.
Well, let's take a look at the page listing the people who use it, where we can see many high-profile companies: DISQUS, GitHub, Zynga, Instagram and Twitter.
And no, it's not a new product. HAProxy's history dates back to 1996 when the first version was created.
HAProxy is a robust, well-tested and production-ready product.
Inside of HAProxy
HAProxy uses a well-known event-driven architecture. It is designed for Unix systems (sorry Windows) and utilizes the best features of these systems to provide the best performance possible. HAProxy is a reverse proxy server and nothing else. Thus, the architecture is built to do one thing, and it excels at this one thing.
With a single-process architecture, HAProxy lowers the cost of context switches and memory usage while not locking up resources with long blocking operations.
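If the event-driven, single-process idea is new to you, the following sketch may help. It uses Python's standard `selectors` module to serve several connections from one thread without blocking on any of them. The function names and the upper-casing "service" are invented for this example, and HAProxy itself is written in C and is far more sophisticated.

```python
import selectors
import socket


def run_event_loop(selector, idle_timeout=0.2):
    """Serve every ready socket from one thread until the loop goes idle."""
    handled = 0
    while True:
        events = selector.select(timeout=idle_timeout)
        if not events:
            return handled  # nothing left to do in this demo
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data.upper())  # tiny reply, will not block here
                handled += 1
            else:
                selector.unregister(conn)  # peer closed the connection
                conn.close()


def demo(client_count=3):
    """Simulate several clients talking to a single-process server."""
    selector = selectors.DefaultSelector()
    clients = []
    for index in range(client_count):
        # socketpair() gives us a connected client/server pair without a network.
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        selector.register(server_side, selectors.EVENT_READ)
        client.sendall(f"hello from client {index}".encode())
        clients.append(client)

    run_event_loop(selector)
    replies = [client.recv(4096).decode() for client in clients]

    # Clean up every socket we created.
    for client in clients:
        client.close()
    for key in list(selector.get_map().values()):
        selector.unregister(key.fileobj)
        key.fileobj.close()
    selector.close()
    return replies


if __name__ == "__main__":
    for reply in demo():
        print(reply)
```

The loop only touches sockets the operating system reports as ready, so no thread or process per connection is needed, which is where the savings in context switches and memory come from.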
According to the claims on the official web site, it can easily handle "between 3 and 4 Gbps of traffic 24 hours a day," serve 100k HTTP requests per second, or saturate a 10 Gbps network.
HAProxy is designed for speed and performance on Unix systems.
If you have a Unix-based system, you can test-run HAProxy yourself on your own machine:
- it may already be packaged for your favourite system
- or you can download the source code and compile it yourself
In the case of compilation, the result consists of a single binary and, possibly, a configuration file. Very lean and minimal.
The configuration file can be as simple as:
    global
        daemon
        maxconn 256

    defaults
        mode http
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms

    frontend http-in
        bind *:80
        default_backend servers

    backend servers
        server server1 127.0.0.1:8000 maxconn 32

This configures HAProxy to listen on port 80 for HTTP requests and proxy them to server1 on port 8000 of the same machine.
More details are in the configuration manual.
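To try the sample configuration, you need something listening on 127.0.0.1:8000 for HAProxy to proxy to. Here is a minimal sketch of such a backend using only Python's standard library. The handler name, the helper `make_backend` and the response text are my own choices for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a short plain-text body."""

    def do_GET(self):
        body = b"hello from server1\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def make_backend(host="127.0.0.1", port=8000):
    """Create (but do not start) the backend the sample config points at."""
    return HTTPServer((host, port), HelloHandler)
```

Calling `make_backend().serve_forever()` starts the backend. With HAProxy running on the configuration above, a request to port 80 should then come back with the greeting from server1.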
HAProxy & OpenShift
Now that we understand more about HAProxy, let's take a look at its place in OpenShift. It is said that one picture is worth a thousand words.
Next, let's look at scalable applications. To learn more about how to enable app scaling, take a look at this post I recently did on scaling a Ruby application.
When a request comes in, OpenShift's load-balancing layer translates the domain name (the Host: HTTP header) into an internal IP address and proxies the request there.
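Conceptually, this first hop is a lookup from host name to internal address. The sketch below shows the idea with a plain dictionary. The function name, the routing table and the addresses are hypothetical, and OpenShift's real routing layer is certainly more involved.

```python
def resolve_gear(headers, routing_table):
    """Map the Host header of a request to an internal address, or None."""
    host = headers.get("Host", "")
    # Drop an optional ":port" suffix and normalise case, since host names
    # are case-insensitive. (IPv6 literals are ignored in this sketch.)
    hostname = host.split(":", 1)[0].strip().lower()
    return routing_table.get(hostname)


if __name__ == "__main__":
    # Hypothetical application name and internal address.
    routing_table = {"myapp-mydomain.example.com": "10.0.0.12:8080"}
    request_headers = {"Host": "MyApp-MyDomain.example.com:80"}
    print(resolve_gear(request_headers, routing_table))
```

A request whose Host header is unknown yields `None`, which a real front end would turn into an error response.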
Then the HTTP request hits the main gear on port 8080 where HAProxy is listening. It is configured in a way that makes it aware of all the gears that are provisioned for that particular application. HAProxy's task is to load-balance the request among all those gears, and that is also the use-case it is designed for.
As your application scales up and down, OpenShift automatically reconfigures HAProxy so that it is aware of all actively deployed gears. Thus the user does not need to know that scaling is happening, and the application developer can sleep peacefully knowing that OpenShift is handling all the configuration automatically.
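One way to picture this automatic reconfiguration is as regenerating the backend section of the HAProxy configuration whenever the set of gears changes. The sketch below does just that. The function name, the `gearN` server names and the addresses are assumptions for illustration, not OpenShift's actual implementation.

```python
def render_backend(name, gears, maxconn=32):
    """Render an HAProxy backend section listing one server per gear."""
    lines = [f"backend {name}"]
    for index, (host, port) in enumerate(gears, start=1):
        # Same "server <name> <address>:<port> maxconn <n>" form as the
        # sample configuration earlier in this post.
        lines.append(f"    server gear{index} {host}:{port} maxconn {maxconn}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gear addresses: the app scales from one gear to three.
    print(render_backend("servers", [("10.0.0.12", 8080)]))
    print(render_backend("servers", [("10.0.0.12", 8080),
                                     ("10.0.0.13", 8080),
                                     ("10.0.0.14", 8080)]))
```

After writing the new section to the configuration file, HAProxy would have to be reloaded to pick it up. The point is that this happens without you editing anything by hand.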
Our recent blog post Scaling In Action on OpenShift shows how to monitor HAProxy while scaling your application.
HAProxy is currently an integral part of OpenShift and forms the foundation for balancing scalable applications. OpenShift automates the configuration of HAProxy and thus relieves the developer (or deploying sysadmin) of the need to learn another piece of infrastructure.
Go ahead and grab a drink with your friends while OpenShift serves your application to your customers.