As you know by now, this year’s Red Hat Summit 2020 was virtual, which allowed people from all over the world to attend our live keynote sessions. In one of the keynotes, we ran a demonstration built around a “Guess That Price” game. The concept was simple: a Red Hat swag item was presented to attendees, who had to guess its price either by entering a number or by drawing it on the screen (more on that later). The game tallied the results continuously, in real time, and displayed them to the players. While the game itself was simple, the architecture and implementation behind it were fairly sophisticated.

The game was built from components such as TensorFlow, Skupper, PostgreSQL, Infinispan, Node.js, Quarkus, and Kafka, managed by Advanced Cluster Management for Kubernetes, with OpenShift deployed both on premises (bare metal and OpenStack) and across multiple public clouds (AWS, GCE, Azure, and IBM Cloud). As you can imagine, this introduced some interesting challenges. We knew there would be thousands of attendees playing the game, but we didn’t know exactly how many, or from where, so our solution was a globally scaled-out architecture that could handle worst-case scenarios.

As the engineers responsible for constructing the demo, we had to come up with a strategy to route attendees to the most appropriate cluster. OpenShift clusters were deployed in Sydney, New York, Singapore, São Paulo, San Francisco, and London. We needed routing that gave each attendee a low-latency experience against the nearest OpenShift cluster and, in one case, demonstrated the failover of a datacenter. We also needed to make sure we could accommodate the amount of traffic that Summit brings in, while ensuring security measures were in place to stop malicious users from ruining the experience for others.

Geo-DNS

The two main considerations were having a single ingress entrypoint and routing users based on latency. We wanted to ensure that users playing the game in Australia, for example, were routed to the cluster on their continent. Some DNS providers, such as Amazon Web Services Route 53 and Cloudflare, offer geolocation and geo-steering features that route traffic to specific regions. For Summit, we chose geolocation routing in Route 53 to direct traffic by continent and country, which let us define specific mappings of players to clusters.

For each of our six locations we deployed HAProxy, which let us place tighter restrictions on the traffic. Within Route 53 we added an A record pointing to each HAProxy instance and then assigned it a specific geographic location:

  • New York
  • São Paulo
  • San Francisco
  • Singapore
  • Sydney
  • London

We supplied the Elastic IP addresses of our HAProxy instances to Route 53 instead of using an Elastic Load Balancer, because we could not properly manage session stickiness with an ELB in front of HAProxy. As an example, the Route 53 entry for the United Kingdom maps the United Kingdom geolocation to the Elastic IP of the London HAProxy.
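A record like this can also be created through the AWS CLI. The following is a minimal sketch rather than the exact Summit configuration: the hosted zone ID, TTL, and set identifier are placeholder assumptions, and the IP address is the London value seen in the dig output further down.

# Geolocation A record sending United Kingdom traffic to the London HAProxy
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "redhatkeynote.com",
        "Type": "A",
        "SetIdentifier": "uk-london",
        "GeoLocation": { "CountryCode": "GB" },
        "TTL": 60,
        "ResourceRecords": [{ "Value": "35.177.212.1" }]
      }
    }]
  }'

The San Francisco “default” record described below would use "CountryCode": "*" in place of a specific country.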

In the end we had 10 geolocation DNS entries. Why 10 entries when we only had six deployments? We wanted to take full advantage of geolocation routing and divide the traffic more precisely than by continent alone. Because of the time at which we presented the keynote, we knew San Francisco would see very little local traffic, so we sent all non-US North American traffic to San Francisco and also made it the “default” geolocation cluster: any request that did not match a specific country or continent rule was sent to San Francisco. Running dig from various regions returned a distinct IP address for redhatkeynote.com in each, letting us verify that the geolocation entries were indeed working.

# Holtsville NY, United States
dig +short @208.67.222.220 redhatkeynote.com
35.175.65.228

# Johannesburg, South Africa
dig +short @197.189.234.82 redhatkeynote.com
13.236.85.83

# Monterrey, Mexico
dig +short @200.56.224.11 redhatkeynote.com
3.101.23.244

# Madrid, Spain
dig +short @194.224.52.37 redhatkeynote.com
52.77.254.51

# Gloucester, United Kingdom
dig +short @109.228.25.186 redhatkeynote.com
35.177.212.1

HAProxy

The load balancer used at each location was HAProxy. We could have pointed the Route 53 entries directly at the OpenShift clusters, but that would have lacked some of the traffic management features we required.

We opted for HAProxy for session stickiness, security, performance, and controlled failover. It also provided SSL offloading for the front-end connection to the website. Two of the more interesting configurations were New York and London.

Given the time zone of the demo and the locations of Summit registrations, we were concerned about the amount of load coming into New York. To absorb this possible influx of requests, we used a round-robin backend that included both the east and west coast clusters, with stick tables to keep each user pinned to the same game server.

Here are the frontend and backend HAProxy configurations for New York.

frontend https
 bind *:443 ssl crt /etc/ssl/private/redhatkeynote.com.pem no-sslv3
 mode http
 http-request capture hdr(Host) len 100

 stick-table type ip size 1m expire 60m store conn_rate(3s),conn_cur,gpc0,http_req_rate(10s),http_err_rate(20s)

 acl abuse src_http_req_rate(https) ge 10
 acl flag_abuser src_inc_gpc0(https) -m bool
 acl scanner src_http_err_rate(https) ge 10
 http-request deny if abuse flag_abuser scanner

 capture request header X-Forwarded-Proto len 100
 capture request header X-Forwarded-Host len 100
 capture request header Host len 100
 maxconn 10000
 default_backend ny-backend

backend ny-backend
 mode http
 # source ip stick table for connectivity
 stick-table type ip size 1m expire 60m
 stick on src
 balance roundrobin
 option http-keep-alive
 option forwardfor
 option httpchk GET / HTTP/1.1\r\nHost:redhatkeynote.com
 cookie SERVERID insert indirect nocache
 http-request set-header X-Forwarded-Port %[dst_port]
 http-request add-header X-Forwarded-Proto https if { ssl_fc }
 http-request set-header X-Forwarded-Host %[req.hdr(Host)]
 http-request set-header Host %[req.hdr(Host)]
 # Game capped users at 2000 per location
 server nyc game-frontend.apps.summit-gcp-ny1.redhatgcpkeynote.com:80 check maxconn 2000
 server sf1 game-frontend.apps.summit-aws-sf1.openshift.redhatkeynote.com:80 check maxconn 2000

The stick table keys on source IP address, stores up to 1 million entries, and expires entries after 60 minutes. The `http_req_rate` counter tracks HTTP requests over a 10-second window, while the `http_err_rate` counter tracks HTTP error return codes over a 20-second window. Each counter is compared against a threshold of 10 for a given source IP, and the ACLs then deny that client HTTP access at the frontend. The HAProxy website has more information on security measures like these.

 stick-table type ip size 1m expire 60m store conn_rate(3s),conn_cur,gpc0,http_req_rate(10s),http_err_rate(20s)

 acl abuse src_http_req_rate(https) ge 10
 acl flag_abuser src_inc_gpc0(https) -m bool
 acl scanner src_http_err_rate(https) ge 10
 http-request deny if abuse flag_abuser scanner
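
While testing rules like these, the contents of a stick table can be inspected at runtime through HAProxy's stats socket, assuming one is enabled in the global section. A minimal sketch (the socket path here is an assumption):

# Dump the entries in the frontend's "https" stick table, including per-IP counters
echo "show table https" | socat stdio /var/run/haproxy.sock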

London has a similar configuration to New York, except that it adds a failover server.

In addition, we added a CNAME, stage.redhatkeynote.com, that bypassed the geolocation DNS entries, allowing our awesome presenters, Burr Sutter and Tracy Rankin, to use the London cluster even though they were located within the United States.

backend london
 mode http
 balance roundrobin
 option http-keep-alive
 option forwardfor
 # Rewrite host header to keep application available with subdomain
 acl is_stage hdr_dom(host) -i stage.redhatkeynote.com
 http-request set-header Host redhatkeynote.com if is_stage
 option httpchk GET / HTTP/1.1\r\nHost:redhatkeynote.com
 cookie SERVERID insert indirect nocache
 http-request set-header X-Forwarded-Port %[dst_port]
 http-request add-header X-Forwarded-Proto https if { ssl_fc }
 http-request set-header X-Forwarded-Host %[req.hdr(Host)]
 http-request set-header Host %[req.hdr(Host)]
 server lnd game-frontend.apps.summit-aws-lnd1.openshift.redhatkeynote.com:80 check
 # Set Frankfurt as the backup server in the event that London fails
 server fft game-frontend.apps.summit-gcp-ffm1.redhatgcpkeynote.com:80 check backup

OpenShift Routes

Each OpenShift cluster contained a route for redhatkeynote.com that mapped to our game-ui service. This route mattered both for the SSL certificate generated for redhatkeynote.com and for HAProxy's health check, `option httpchk GET / HTTP/1.1\r\nHost:redhatkeynote.com`, which verified that the cluster was indeed serving the game.
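
As a rough sketch, a route like this can be created with the oc CLI and the health check reproduced with curl; the exact route name and flags used for Summit may have differed:

# Expose the game-ui service on the public hostname
# (SSL is offloaded at HAProxy, so the route itself serves plain HTTP)
oc expose service game-ui --hostname=redhatkeynote.com

# Reproduce HAProxy's health check against one cluster's router
curl -i -H 'Host: redhatkeynote.com' \
  http://game-frontend.apps.summit-aws-lnd1.openshift.redhatkeynote.com/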

To perform the failover from London to Frankfurt, we removed both the OpenShift route and the deployment from the London cluster, which caused the health check in HAProxy to fail and traffic to be routed to Frankfurt.
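
On the London cluster that step amounts to a couple of oc commands; the resource names below are assumptions based on the service name used above:

# Removing the route and the deployment makes HAProxy's httpchk fail,
# shifting traffic to the Frankfurt backup server
oc delete route game-ui
oc delete deployment game-ui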

Useful Troubleshooting Tools

Verifying that the DNS records were correct was somewhat challenging, since we could not easily confirm which server a given geography would contact. The following website was helpful for quickly testing record resolution by location:

https://dnschecker.org/#A/redhatkeynote.com

TunnelBear is a browser-based VPN service that allows connections from locations around the world. Using TunnelBear, we were able to actually play the game from other regions and verify that we were routed to the correct game endpoints.

https://www.tunnelbear.com/

Summary

This post summarized how you can combine Red Hat OpenShift with external services such as geolocation DNS and HAProxy to create a global traffic management solution capable of serving thousands of users with no downtime, even while deliberately introducing an outage in one of the locations. This Summit demonstration was so engaging and popular that Burr had to pause the game to get the attendees’ attention back!

To see the full HAProxy configuration files, check out https://github.com/RHsyseng/summit-haproxy. If you missed the live demonstration, watch and share https://www.pscp.tv/w/1ynJOppLNeWxR.