Automatically scale HAProxy with confd and etcd

Load balancing with HAProxy is pretty easy; today we're going to use etcd and confd to automatically configure cluster nodes to make things more elastic.

For the unfamiliar, etcd is "a highly-available key value store for shared configuration and service discovery" built by the guys over at CoreOS. Each node in our cluster (which will be a CoreOS machine) runs etcd by default, which lets units deployed to the cluster register themselves when they start up and remove themselves when they shut down.

Confd is a configuration management tool that pulls data from etcd at set intervals and is responsible for generating updated configs when it detects a change.

Cluster configuration

The example cluster we're going to use looks a little like this:

1 Machine running Fedora

This is going to be our load balancer. I'm choosing Fedora for this one machine because it comes with systemd by default, which is going to make it super easy to set up HAProxy and confd. We also don't necessarily want this machine updating all the time like our CoreOS machines will; we want it to remain relatively static and we need it to keep a static IP address. A single load balancer is of course a single point of failure, which could be remedied by having multiple load balancers.

3 CoreOS nodes

For this test, we're going to run a cluster of CoreOS machines that will run our etcd cluster. When running etcd, it's a good idea to run at least 3 machines in order to maintain quorum across the cluster. We're also going to be using fleet (which also uses etcd) to schedule our test webservice to the cluster.
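To make the "at least 3 machines" advice concrete, here's a small sketch of the majority arithmetic behind quorum. The function names are my own and purely illustrative; the only assumption is that quorum means a strict majority of the cluster members.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a cluster with `members` nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum={quorum_size(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

With 3 members the cluster can lose one machine and keep going; with 2 it can't lose any, which is why 3 is the usual minimum.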

Note: to make configuring things easier, I will be using AWS and providing a cloud-config file when creating these machines.

Creating a CoreOS cluster

For the CoreOS cluster, we're going to provide some initialization data via a cloud-config file. This will tell CoreOS to start things like fleet, etcd, and docker and will also provide etcd with the discovery endpoint to use (note, this is etcd 0.4.6, not the new and improved 2.0 [yet]).

Note: you'll need to generate a discovery token by going to https://discovery.etcd.io/new

#cloud-config

coreos:
  etcd:
    discovery: https://discovery.etcd.io/<put your token here>
    addr: $public_ipv4:4001
    peer-addr: $public_ipv4:7001
  fleet:
    metadata: type=webserver
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start

When running this on AWS, make sure to open up the necessary ports for etcd (4001 and 7001) as well as the ports for your application.
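If you want a quick way to confirm the security group is doing what you expect, a plain TCP connect test is usually enough. This is just a sketch: `port_open` and `check_ports` are names I made up, and the host you pass in is whatever address your CoreOS node has. Keep in mind that 7001 is the peer port, so it mainly needs to be reachable between the CoreOS nodes themselves, while 4001 is the one confd will talk to.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def check_ports(host: str, ports=(4001, 7001)) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port) for port in ports}
```

Calling `check_ports("<address of a CoreOS node>")` from the machine that will run confd gives you a dictionary of port to True/False.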

Setting up HAProxy and confd

Now that our CoreOS cluster is running, we're going to start up a Fedora-based machine to run HAProxy and confd. In this case, I picked Fedora 21 as that was the most up-to-date version I could find on AWS.

The latest version of HAProxy (1.5.x) is available as an RPM and can be installed using yum:

yum install haproxy.x86_64

The latest version in this case is 1.5.10.

The config for HAProxy is located at /etc/haproxy/haproxy.cfg. What we're going to do now is install confd which will overwrite the config, so you may want to save the default config to reference later.

confd - installation and configuration

We're going to be installing version 0.7.1, which can be fetched from the releases page on the confd GitHub page. The release is a pre-built confd binary, so we don't need to worry about building it ourselves.

curl -OL https://github.com/kelseyhightower/confd/releases/download/v0.7.1/confd-0.7.1-linux-amd64

mv confd-0.7.1-linux-amd64 confd

cp confd /usr/bin && chmod +x /usr/bin/confd
cp confd /usr/sbin && chmod +x /usr/sbin/confd

Running the above commands will download the binary from GitHub, copy it to /usr/bin and /usr/sbin, and make it executable. If you were to just run confd at this point, you'd get some errors that look like this:

2015-01-30T18:51:54Z confd[840]: WARNING Skipping confd config file.
2015-01-30T18:51:54Z confd[840]: ERROR cannot connect to etcd cluster: http://127.0.0.1:4001

By default, confd will look for a config file in /etc/confd. The structure for /etc/confd will look something like this:

├── confd
│   ├── conf.d
│   │   └── haproxy.toml
│   ├── confd.toml
│   └── templates
│       └── haproxy.cfg.tmpl

confd.toml is the overall config for confd which will describe the backend we want to use (etcd), the interval to poll it at, the config directory, etc.

confd.toml

confdir = "/etc/confd"
interval = 20
backend = "etcd"
nodes = [
        "http://<address that points to one of your CoreOS nodes>:4001"
]
prefix = "/"
scheme = "http"
verbose = true

The "nodes" property needs at least one node specified and should point to one of your CoreOS nodes. You could also list each of your three nodes here so that if confd isn't able to reach one, it will try one of the others.
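To illustrate the "try one of the others" idea, here's a rough sketch of probing a list of nodes in order and picking the first one that answers. This is not confd's actual code, just the general shape of the fallback. The function names are mine, and probing the `/version` path is an assumption about what the etcd node exposes; swap in whatever health check suits your setup.

```python
import http.client
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to url comes back with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, timeouts and refused connections.
        return False


def first_reachable(nodes, probe=http_ok):
    """Return the first node whose probe URL answers, or None if none do."""
    for node in nodes:
        # Assumption: the node answers on /version when it is healthy.
        if probe(node.rstrip("/") + "/version"):
            return node
    return None
```

The `probe` argument is there so you can plug in a different check (or a fake one for testing) without touching the loop.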

Also note the "interval" value: here we're telling confd to poll etcd for changes every 20 seconds.

Now let's look at the HAProxy-specific config located at /etc/confd/conf.d/haproxy.toml:

[template]
src = "haproxy.cfg.tmpl"
dest = "/etc/haproxy/haproxy.cfg"
keys = [
        "/app/your_awesome_app"
]
reload_cmd = "echo restarting && /usr/bin/systemctl reload haproxy"

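The "keys" attribute lists the various keys within etcd we want confd to monitor. When we launch our app on our CoreOS cluster, each unit will register itself with etcd by creating a key in the /app/your_awesome_app directory that contains the information to insert into the HAProxy config (its IP address and the port to forward traffic to). Whatever path you list here has to match the path your units actually write to; the example service later in this post registers under /apps/stupid_server, so that is the key you'd use for it.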

The "reload_cmd" attribute is an optional command that confd can run whenever it writes a change to your config. Here we're printing a message and then asking systemd to reload HAProxy so that it picks up the new config.

Now let's take a look at what the HAProxy template will look like (/etc/confd/templates/haproxy.cfg.tmpl):

global
    log 127.0.0.1    local0
    log 127.0.0.1    local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy.sock mode 600 level admin    

defaults
    log    global
    mode    http
    option    httplog
    option    dontlognull
    retries    3
    option redispatch
    maxconn    2000
    contimeout    5000
    clitimeout    50000
    srvtimeout    50000
    option forwardfor
    option http-server-close

frontend stats
    bind *:8888
    stats enable
    stats uri /

frontend http-in
    bind *:80
    default_backend application-backend

backend application-backend
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix

    {{range getvs "/app/your_awesome_app/*"}}
    server {{.}} cookie A check
    {{end}}

Most of this is boilerplate from the default HAProxy config, so the sections we want to look at are the frontend and backend at the bottom.

frontend http-in
    bind *:80
    default_backend application-backend

backend application-backend
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix

    {{range getvs "/app/your_awesome_app/*"}}
    server {{.}} cookie A check
    {{end}}

With our frontend, we're accepting all traffic on port 80 and sending it to the "application-backend". In the backend we have some Go templates (confd is written in Go); this template will loop over the keys in the etcd directory we defined and print out their values. You can find more template examples in the confd docs.
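If Go templates are new to you, here's a rough Python imitation of what that range/getvs loop produces. It's a sketch, not confd's implementation: the `getvs` and `render_backend` functions below are my own stand-ins, the dictionary plays the role of etcd, and I'm assuming the pattern is matched the way Go's path.Match does it, where `*` stays within a single path segment (which is why the pattern needs the trailing `/*` to pick up keys inside a directory).

```python
import re


def glob_to_regex(pattern: str):
    """Translate a glob where '*' matches any run of characters except '/'."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^/]*".join(parts))


def getvs(store: dict, pattern: str) -> list:
    """Return the values of all keys matching pattern, ordered by key."""
    rx = glob_to_regex(pattern)
    return [store[key] for key in sorted(store) if rx.fullmatch(key)]


def render_backend(store: dict, pattern: str) -> str:
    """Emit one HAProxy 'server' line per matching value."""
    return "\n".join(f"    server {value} cookie A check"
                     for value in getvs(store, pattern))


if __name__ == "__main__":
    # Stand-in for etcd, using the key/value format from this post.
    etcd = {
        "/apps/stupid_server/stupid-server@1.service":
            "stupid-server-1 10.10.10.10:9000",
        "/apps/stupid_server/stupid-server@2.service":
            "stupid-server-2 10.10.10.11:9000",
    }
    print(render_backend(etcd, "/apps/stupid_server/*"))
```

Each value stored in etcd becomes the middle of a "server" line, so the value itself has to be a valid "name address:port" pair.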

Running confd using systemd

Since we need confd to constantly be monitoring etcd, we're going to use systemd to manage it. This way, if confd crashes or if the machine restarts, confd will always come back up.

Let's create the file /etc/systemd/system/confd.service:

[Unit]
Description=Confd
After=haproxy.service

[Service]
ExecStart=/usr/bin/confd
Restart=always

[Install]
WantedBy=basic.target

If you're unfamiliar with systemd's unit files, I would highly suggest reading the docs as there are a lot of available options and configurations. This one is pretty simple though. We're telling systemd where to find the confd binary and to always restart if the process dies. The line WantedBy=basic.target tells systemd to start the process on boot as well.

Now we can install and activate the service:

sudo systemctl enable confd.service
sudo systemctl start confd.service

Enabling our unit will symlink the file to /etc/systemd/system/basic.target.wants so that it starts on boot. Calling systemctl start actually starts it for the first time.

If you want to see the log output, you can do so by running:

journalctl -f -u confd.service

Registering your app with etcd

As an example service, we're going to look at a project I have called "stupid-server". It's a simple webserver written in Node.js. There's a Docker container over on quay.io that we'll be using and scheduling on our cluster using fleet.

stupid-server@.service

Here's what our unit file will look like:

[Unit]
Description=Stupid Server
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=/usr/bin/docker pull quay.io/seanmcgary/stupid-server:latest
ExecStart=/usr/bin/docker run --name stupidservice -p 9000:8000 quay.io/seanmcgary/stupid-server
# systemd has no ExecStopPre directive: stop in ExecStop, clean up in ExecStopPost
ExecStop=/usr/bin/docker kill stupidservice
ExecStopPost=/usr/bin/docker rm stupidservice
TimeoutStartSec=0
Restart=always
RestartSec=10s

[X-Fleet]
X-Conflicts=stupid-server@*.service

Each time we start the unit, we'll try to pull the latest container from quay then proceed with actually starting the server. Now we're going to modify it to register itself with etcd when it starts and de-register when it stops.

[Unit]
Description=Stupid Server
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=/usr/bin/docker pull quay.io/seanmcgary/stupid-server
ExecStart=/usr/bin/docker run --name stupidservice -p 9000:8000 quay.io/seanmcgary/stupid-server
ExecStartPost=/bin/bash -c 'etcdctl set /apps/stupid_server/%n "%p-%i $(curl http://169.254.169.254/latest/meta-data/public-ipv4/):9000"'
# systemd has no ExecStopPre directive: stop in ExecStop, clean up in ExecStopPost
ExecStop=/usr/bin/docker kill stupidservice
ExecStopPost=/usr/bin/docker rm stupidservice
ExecStopPost=/bin/bash -c 'etcdctl rm /apps/stupid_server/%n'
TimeoutStartSec=0
Restart=always
RestartSec=10s

[X-Fleet]
X-Conflicts=stupid-server@*.service

These are the two lines of note:

ExecStartPost=/bin/bash -c 'etcdctl set /apps/stupid_server/%n "%p-%i $(curl http://169.254.169.254/latest/meta-data/public-ipv4/):9000"'
ExecStopPost=/bin/bash -c 'etcdctl rm /apps/stupid_server/%n'

After our service starts, we make a curl request to the AWS metadata service to get the public IP of the machine that we're on (you can also get the private IP if you want) to build the name/IP of the server that will be written to the HAProxy config. The key/value that gets written to etcd looks like this:

Key: /apps/stupid_server/stupid-server@1.service
Value: stupid-server-1 10.10.10.10:9000

Note that the actual IP will be whatever the IP of the machine is.
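To see how the systemd specifiers turn into that key and value, here's a small sketch that does the same expansion by hand: %n is the full unit name, %p is the part before the "@", and %i is the instance name after it. The function names and the default base path are just for illustration.

```python
def expand_unit(unit_name: str):
    """Split e.g. 'stupid-server@1.service' into systemd's %n, %p and %i."""
    full = unit_name                             # %n: full unit name
    stem = unit_name.rsplit(".", 1)[0]           # drop the '.service' suffix
    prefix, _, instance = stem.partition("@")    # %p and %i
    return full, prefix, instance


def registration(unit_name: str, ip: str, port: int = 9000,
                 base: str = "/apps/stupid_server"):
    """Build the etcd key and value that the ExecStartPost line would write."""
    full, prefix, instance = expand_unit(unit_name)
    key = f"{base}/{full}"
    value = f"{prefix}-{instance} {ip}:{port}"
    return key, value


if __name__ == "__main__":
    print(registration("stupid-server@1.service", "10.10.10.10"))
```

Because the key includes the full unit name, every instance (stupid-server@1, stupid-server@2, and so on) gets its own entry under the same directory.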

On the ExecStopPost line that calls etcdctl rm, we delete the key from etcd, which in turn will cause confd to regenerate the config and reload HAProxy.
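Conceptually, each polling pass boils down to "render, compare, and only act if something changed". Here's a minimal sketch of that idea; it is not confd's code, and the four callables are hypothetical hooks I've made up so the logic can be shown without touching etcd, the filesystem, or systemd.

```python
def sync_once(render, read_current, write, reload) -> bool:
    """Run one polling pass; return True if the config was rewritten."""
    new_config = render()            # build the config from the current keys
    if new_config == read_current():
        return False                 # nothing changed, leave HAProxy alone
    write(new_config)                # replace the config on disk
    reload()                         # e.g. run the configured reload command
    return True


if __name__ == "__main__":
    state = {"config": "server a", "reloads": 0}

    def fake_reload():
        state["reloads"] += 1

    # First pass: a server was removed, so the config is rewritten.
    print(sync_once(lambda: "", lambda: state["config"],
                    lambda cfg: state.update(config=cfg), fake_reload))
    # Second pass: nothing changed, so no write and no reload.
    print(sync_once(lambda: "", lambda: state["config"],
                    lambda cfg: state.update(config=cfg), fake_reload))
```

The comparison step is what keeps HAProxy from being reloaded every 20 seconds when nothing in etcd has changed.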

Start your server

Now we can actually start our server by submitting it to fleet:

fleetctl start stupid-server@1.service

That's it! Now we can start as many stupid-servers as we want and they'll automatically show up in HAProxy when they start.