Sean McGary

Software Engineer @ ThirdLove.com, builder of webapps

Deploying Docker Containers on CoreOS Using Fleet


Docker containers are the hot tech du jour and today we're going to look at how to deploy your containers to CoreOS using Fleet.

In my previous post, I talked about how to deploy a NodeJS application using a pretty vanilla Docker container. Now I'm going to walk you through what CoreOS is and how to deploy your containers using Fleet.

What is CoreOS?

The masthead on the CoreOS website puts it perfectly:

Linux for Massive Server Deployments. CoreOS enables warehouse-scale computing on top of a minimal, modern operating system.

Without all the buzzwords, CoreOS is a stripped down Linux distro that gives you a bare-bones environment designed to run containers. That means you effectively get Systemd, Docker, fleet and etcd (as well as other low level things), all of which play a role in deploying our containers.

CoreOS is available on a bunch of different cloud platforms including EC2, Google Compute Engine, and Rackspace. You can even run a cluster locally using Vagrant. For today, we're going to be using EC2.

Fleet and Etcd

Bundled with CoreOS you'll find fleet and etcd. Etcd is a distributed key-value store, built on top of the Raft protocol, that acts as the backbone for your CoreOS cluster. Fleet is a low-level, cluster-wide init system that uses etcd and interacts with Systemd on each CoreOS machine. It handles everything from scheduling services, to migrating them should you lose a node, to restarting them should they go down. Think of it as Systemd, but for a distributed cluster.

Setting up a container

For this tutorial, I created a stupidly simple NodeJS webserver as well as a container for it. You can find it on GitHub at seanmcgary/stupid-server. All it does is print out the current time for every request you make to it on port 8000.

var http = require('http');

var server = http.createServer(function(req, res){
    res.end(new Date().toISOString());
});

server.listen(8000);

The Dockerfile for it is pretty simple too. It's built off another container that has NodeJS already built and installed. Like in my previous tutorial, it includes a start.sh script that clones the latest version of the git repo and runs the application each time the container is run. This way, updating your application only requires restarting your container.

Dockerfile

# DOCKER-VERSION 0.11.0

FROM quay.io/seanmcgary/nodejs-raw-base
MAINTAINER Sean McGary <sean@seanmcgary.com>


EXPOSE 8000

ADD start.sh start.sh

RUN chmod +x start.sh

CMD ./start.sh

start.sh

git clone https://github.com/seanmcgary/stupid-server.git stupid-server

node stupid-server

Creating a Systemd unit

Remember how I said fleet is like a distributed Systemd? That means that all we need to do is create a Systemd unit file (in this case a template) that we will submit to fleet for scheduling. Fleet is responsible for finding a machine to run it on, and once it has picked one, the unit file is copied directly to that machine to be run. This is what our unit file will look like:

stupidServerVanilla@.service

[Unit]
Description=Stupid Server
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=/usr/bin/docker pull quay.io/seanmcgary/stupid-server:latest
ExecStart=/usr/bin/docker run --name stupidservice -p 9000:8000 quay.io/seanmcgary/stupid-server
ExecStopPre=/usr/bin/docker kill stupidservice
ExecStop=/usr/bin/docker rm stupidservice
TimeoutStartSec=0
Restart=always
RestartSec=10s

[X-Fleet]
X-Conflicts=stupidServerVanilla@*.service
  • ExecStartPre: Before we start our service, we want to make sure that we not only have the container downloaded, but that we have the latest version of it.
  • ExecStart: Here we run our container, give it a name, and map port 9000 on our host to port 8000 in the container (the one our server is listening on).
  • ExecStopPre: The intent is to kill the container before removing it. Note that Systemd has no ExecStopPre directive, so this line is ignored with an "Unknown lvalue" warning (you can see it in the journal output later in this post). To get the intended behavior, list both the kill and the rm commands as ExecStop lines, which Systemd runs in order.
  • ExecStop: Then we can actually remove the container.
  • TimeoutStartSec: This is set to 0, telling Systemd not to time out during the startup process. We do this because containers can be large and, depending on your bandwidth, can take a while to download initially.
  • Restart: This tells Systemd to restart the unit if it dies while it is running. RestartSec makes it wait 10 seconds before each restart.
  • X-Conflicts: This line (and the X-Fleet block it lives in) is specific to fleet. It tells fleet not to schedule this unit on a machine that is already running a unit whose name matches the pattern. In this case, we want just one instance of the service per machine.

Spinning up some CoreOS nodes

We're going to spin up 3 instances of CoreOS on the beta channel (the current version in beta is 367.1.0). Simply search for "coreos-beta-367" if you're using the web console. You're looking for an AMI with an ID of "33e5e776".

Once you have found it, select which size you want (I picked the micro instance, but you can pick whichever you want). On the configuration details screen, we'll want to enter "3" for the number of instances. We're also going to provide a cloud config so that CoreOS starts etcd and fleet on startup, along with a discovery token for etcd so that the machines can all find each other.

NOTE: make sure to get your own discovery token and replace the one that is in the example. To get a new one, go to https://discovery.etcd.io/new.

#cloud-config

coreos:
  etcd:
    discovery: https://discovery.etcd.io/78c03094374cc2140d261d116c6d31f3
    addr: $public_ipv4:4001
    peer-addr: $public_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
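
Since every cluster needs its own discovery token, it can be handy to generate the cloud config instead of editing it by hand. Here is a minimal sketch of that idea in NodeJS. The helper name renderCloudConfig and the URL check are my own assumptions, not something CoreOS provides; the YAML it produces is the same config shown above.

```javascript
// Hypothetical helper (not part of CoreOS): renders the cloud-config from
// this post for a given etcd discovery URL.
function renderCloudConfig(discoveryUrl) {
    // Assumption: discovery URLs look like https://discovery.etcd.io/<hex token>,
    // as in the example token above. Anything else is rejected early.
    if (!/^https:\/\/discovery\.etcd\.io\/[0-9a-f]+$/.test(discoveryUrl)) {
        throw new Error('unexpected discovery URL: ' + discoveryUrl);
    }

    return [
        '#cloud-config',
        '',
        'coreos:',
        '  etcd:',
        '    discovery: ' + discoveryUrl,
        '    addr: $public_ipv4:4001',
        '    peer-addr: $public_ipv4:7001',
        '  units:',
        '    - name: etcd.service',
        '      command: start',
        '    - name: fleet.service',
        '      command: start',
        ''
    ].join('\n');
}

// Replace this token with your own from https://discovery.etcd.io/new
const exampleCloudConfig = renderCloudConfig(
    'https://discovery.etcd.io/78c03094374cc2140d261d116c6d31f3'
);
console.log(exampleCloudConfig);
```

The output can be pasted straight into the user data field when launching the instances.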

That's pretty much it. Hit the "launch and review" button and in a few moments you'll have three CoreOS instances up and running.

Scheduling Services with Fleet

Now that our cluster is running, we can start to schedule services on it using fleet. This can be done one of two ways: you can log directly into one of the machines in your cluster and run fleetctl there, or you can download the latest binary and run it locally. I'm going to run it locally to make things easier.

If you do decide to run it locally, I would suggest creating an alias as you'll need to specify some additional flags to tell fleetctl where to find your cluster. I have the following in my .zshrc:

alias fleetcluster="fleetctl --tunnel=path.to.a.node.com"

This way I can just run fleetcluster <command> each time.

To schedule a service on fleet, we need our unit file, so cd into the directory of your project (I'll be doing this based on the stupid-server from above). Scheduling a service is as easy as fleetcluster start <service>. To schedule the stupid-server, I would run:

$ fleetcluster start stupidServerVanilla@1.service
Job stupidServerVanilla@1.service launched on a33809a9.../10.10.10.10

If you look closely you'll realize that there is no stupidServerVanilla@1.service file. This is because stupidServerVanilla@.service is a Systemd template. Rather than creating a uniquely named file for each service, we have one file that is used as a template for all of them. Below the command, fleet responds with where it scheduled your service. Now, if we run fleetcluster list-units we should see it:

$ fleetcluster list-units

UNIT                                 STATE       LOAD      ACTIVE        SUB          DESC                     MACHINE
stupidServerVanilla@1.service        launched    loaded    activating    start-pre    Stupid Server            a33809a9.../10.10.10.10
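
To make the template naming a little more concrete, here is a small sketch of how an instance name maps back to its template file. The function name parseInstanceUnit is hypothetical; the naming rule itself (everything between the @ and the unit type suffix is the instance string) is standard Systemd behavior.

```javascript
// Splits an instance unit name such as "stupidServerVanilla@1.service"
// into the template file it is instantiated from and its instance string.
// Returns null for names that are not instances (plain units, or the
// template file itself, which has nothing between "@" and the suffix).
function parseInstanceUnit(unitName) {
    // prefix @ instance . type, where the type is the final dotted suffix
    const match = /^([^@]+)@(.+)\.([a-z]+)$/.exec(unitName);
    if (!match) {
        return null;
    }
    return {
        template: match[1] + '@.' + match[3],
        instance: match[2]
    };
}

console.log(parseInstanceUnit('stupidServerVanilla@1.service'));
console.log(parseInstanceUnit('docker.service'));
```

Inside the unit file, Systemd makes the instance string available as the %i specifier, which is useful if each instance needs its own container name or port.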

Fleet also lets you view logs. If we want to view the logs of our server, we just run:

$ fleetcluster journal -f stupidServerVanilla@1.service

-- Logs begin at Sun 2014-08-24 14:57:19 UTC. --
Aug 25 02:13:49 10.10.10.10 systemd[1]: [/run/fleet/units/stupidServerVanilla@1.service:9] Unknown lvalue 'ExecStopPre' in section 'Service'
Aug 25 02:13:49 10.10.10.10 systemd[1]: [/run/fleet/units/stupidServerVanilla@1.service:9] Unknown lvalue 'ExecStopPre' in section 'Service'
Aug 25 02:13:49 10.10.10.10 systemd[1]: Starting Stupid Server...
Aug 25 02:13:50 10.10.10.10 docker[3401]: Pulling repository quay.io/seanmcgary/stupid-server
Aug 25 02:16:41 10.10.10.10 systemd[1]: Started Stupid Server.
Aug 25 02:16:41 10.10.10.10 docker[3426]: Cloning into 'stupid-server'...

Fleet communicates with systemd and journald and then pipes the log over ssh to your local terminal session.

Launching a Fleet of Services

Since we created a Systemd template for our unit file, we can use fleet to launch as many as we want at once. If we wanted to launch three more services we would just run:

$ fleetcluster start stupidServerVanilla@{2,3,4}.service
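
The {2,3,4} part is brace expansion done by the shell (zsh and bash both support it), so fleetctl actually receives three separate unit names. As a rough sketch of what the shell hands over, with instanceUnits being a made-up helper name:

```javascript
// Mimics what the shell does with name@{2,3,4}.service before fleetctl
// ever sees the arguments: one unit name per listed instance.
function instanceUnits(templateName, instances) {
    // templateName is the template file name, e.g. "stupidServerVanilla@.service"
    return instances.map(function (instance) {
        return templateName.replace('@.', '@' + instance + '.');
    });
}

console.log(instanceUnits('stupidServerVanilla@.service', [2, 3, 4]));
```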

Now if we look at our units:

stupidServerVanilla@1.service        launched    loaded   deactivating    stop-sigterm        Stupid Server        a33809a9.../10.10.10.10
stupidServerVanilla@2.service        launched    loaded   activating      start-pre           Stupid Server        b4809b8d.../10.10.10.11
stupidServerVanilla@3.service        inactive    -        -               -                   Stupid Server        -
stupidServerVanilla@4.service        launched    loaded   activating      start-pre           Stupid Server        27b315e2.../10.10.10.12

You'll see that three of them have been deployed and we have one that's left as inactive. This is because we told fleet to only schedule one per machine.
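
If the effect of X-Conflicts is hard to picture, the following toy model may help. To be clear, this is not fleet's actual scheduling algorithm, just an illustration of the constraint under the assumption that every unit conflicts with every other unit (which is what our wildcard pattern amounts to). The function and machine names are made up.

```javascript
// Toy model of the X-Conflicts rule from our unit file: a unit may not be
// placed on a machine that already runs a conflicting unit. Units that
// cannot be placed anywhere are mapped to null (fleet shows them as inactive).
function scheduleWithConflicts(units, machines) {
    const placement = {};
    const busy = new Set();

    units.forEach(function (unit) {
        const free = machines.find(function (machine) {
            return !busy.has(machine);
        });
        if (free === undefined) {
            placement[unit] = null; // no eligible machine left
        } else {
            busy.add(free);
            placement[unit] = free;
        }
    });

    return placement;
}

const examplePlacement = scheduleWithConflicts(
    ['stupidServerVanilla@1.service', 'stupidServerVanilla@2.service',
     'stupidServerVanilla@3.service', 'stupidServerVanilla@4.service'],
    ['machine-a', 'machine-b', 'machine-c']
);
console.log(examplePlacement);
```

With four units and three machines, one unit always ends up without a home. Which one it is depends on the order in which the scheduler gets to them, which is why it was unit 3 rather than unit 4 in the output above.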

Stopping and Destroying Your Service

When you need to take down your service or upload a new version of your service file, stopping and destroying are very easy:

$ fleetcluster stop stupidServerVanilla@1.service

$ fleetcluster destroy stupidServerVanilla@1.service

Getting started with DeepLinkr


Being able to provide your customers with the best experience possible is a very important thing, especially in the ecommerce world. Doing so across platforms poses a challenge. Getting customers to the right place based on the platform they are currently on is an even bigger challenge, but very beneficial if you can do it. DeepLinkr is a service that aims to accomplish that for companies that have both a web and native app presence. DeepLinkr uses deep links (also recently called app links) to accomplish this.

DeepLinkr is a platform that allows you to get your customers where they need to be so that they have the best experience possible. A common case, especially among e-commerce companies, is to have a web-based store and a native application. Say you send out a marketing email to your customers. According to statistics, around 65% of the emails you just sent out will be opened by people on mobile devices. Knowing that, would you rather be able to drop your customers directly into your optimized, native application, or put them on your website where they might fumble around a little because it's not streamlined for mobile use? The same thing goes if you're posting on social media, which includes a large number of mobile users.

It all starts with a single link. This single link can take your customers in a number of directions including simply redirecting them onto your website, or dropping them into your iOS or Android app if you have one. If the customer doesn't have your native app installed, they will fall back to the website URL that you provide.

Let's pick a link to use. It so happens Etsy has deeplinking support built into their website. We'll use this link to a cool BMW M3 print that I purchased a little while back:
https://www.etsy.com/listing/94416200/classic-car-print-bmw-2002-turbo-m?ref=related-4

All we need to do is take the URL to this page and drop it into the URL field in the DeepLinkr link creator.

When we paste the link into the form, DeepLinkr crawls the linked page to see if it can find any meta information regarding deeplinks. As it turns out, Twitter and Facebook have done some heavy lifting for us already by establishing some standards for defining deeplinks for a given page. Applinks.org has all the documentation you need should you want to add these meta tags to your pages to make things a little easier.

If you don't have the meta tags implemented on your site, or you need some custom functionality, you can provide your own app handler URIs.

Now that we have our link, we can send it out over email, social media, etc., and we can track where people are coming from and what platforms they are on (browser, operating system, mobile device platform, etc.).

You'll be able to see links as they come in over time and even break down visits to see which time of day people are clicking your links.

App attribution

Since we've just launched, we're still working on adding some features. One of those is being able to flag a link click as having been opened in the app. Everything up until this point can be done with very little to no development at all. This however will require some integration within your app to communicate back to the (future) DeepLinkr API and say "hey, this click actually opened the application".

With this type of integration, you'll be able to accurately determine if customers are more apt to convert when being dropped directly into your app or not. Or you'll be able to accurately track clicks from Facebook, Twitter, or Google ads. There is a lot of potential in this space, so jump on board with us and help us build a system that can help you and other businesses out there!

How to use systemd timers


Having started to switch over to Fedora recently (as well as CoreOS), I needed to figure out how to run jobs at certain times and/or intervals. On an OS like Ubuntu this is accomplished using Cron. Cron is the world's largest pain in the ass, at least in comparison to how easy it is to create timers under Systemd.

For this example, we're going to set up a timer that runs every minute and then create a service that attaches and uses that timer.

Creating the timer

A timer is just like any other unit file except it has a [Timer] section. It looks a little bit like this:

/etc/systemd/system/minute-timer.timer

[Unit]
Description=Minute Timer

[Timer]
OnBootSec=5min
OnCalendar=*:0/1
Unit=minute-timer.target

[Install]
WantedBy=basic.target

Systemd is pretty powerful with its OnCalendar option. In this case we're telling it to run every minute (*:0/1 means "any hour, starting at minute 0 and repeating every 1 minute"), but we could get REALLY specific if we wanted. Have a look at the docs to learn more about what's possible.
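
To show how the start/step part of an OnCalendar field plays out, here is a small sketch that expands a single field into the values it matches. It only covers a tiny subset of the real syntax (a wildcard, a single value, or start/step), and expandCalendarField is a name I made up for illustration, so treat it as a way to reason about the notation rather than a reimplementation of Systemd's parser.

```javascript
// Expands one calendar field (e.g. the minutes part of "*:0/1") into the
// list of values it matches, for values between 0 and max inclusive.
// Supported forms: "*", "N", and "N/step". Everything else is rejected.
function expandCalendarField(spec, max) {
    const values = [];

    if (spec === '*') {
        for (let v = 0; v <= max; v++) {
            values.push(v);
        }
        return values;
    }

    const match = /^(\d+)(?:\/(\d+))?$/.exec(spec);
    if (!match) {
        throw new Error('unsupported field: ' + spec);
    }
    const start = parseInt(match[1], 10);
    const step = match[2] === undefined ? 0 : parseInt(match[2], 10);
    if (start > max) {
        throw new Error('value out of range: ' + spec);
    }

    // No step (or a step of 0) means just the single start value.
    if (step === 0) {
        return [start];
    }
    for (let v = start; v <= max; v += step) {
        values.push(v);
    }
    return values;
}

console.log(expandCalendarField('0/1', 59).length); // every minute of the hour
console.log(expandCalendarField('0/15', 59));       // quarter-hourly
```

So changing the timer to OnCalendar=*:0/15 would fire on minutes 0, 15, 30 and 45 of every hour.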

Now that we have a timer, we need to create a target that will be used by our actual services.

/etc/systemd/system/minute-timer.target

[Unit]
Description=Minute Timer Target
StopWhenUnneeded=yes

Let's now create a test service that will simply print the current date each time it is run:

/etc/systemd/system/testservice.service

[Unit]
Description=Prints the date every minute
Wants=minute-timer.timer

[Service]
ExecStart=/bin/date

[Install]
WantedBy=minute-timer.target

Start your timers

Now that we have our timer and test service created, we need to enable and start everything using systemctl:

systemctl enable /etc/systemd/system/minute-timer.timer
systemctl start  /etc/systemd/system/minute-timer.timer

systemctl enable /etc/systemd/system/testservice.service

Now if everything goes as planned, we can watch the logs of our service print the date every minute:

> journalctl -f -u testservice.service

Jul 07 21:24:00 ip-10-10-10-10 systemd[1]: Starting Prints the date every minute...
Jul 07 21:24:00 ip-10-10-10-10 systemd[1]: Started Prints the date every minute.
Jul 07 21:24:00 ip-10-10-10-10 date[20887]: Mon Jul  7 21:24:00 UTC 2014
Jul 07 21:25:00 ip-10-10-10-10 systemd[1]: Starting Prints the date every minute...
Jul 07 21:25:00 ip-10-10-10-10 systemd[1]: Started Prints the date every minute.
Jul 07 21:25:00 ip-10-10-10-10 date[20889]: Mon Jul  7 21:25:00 UTC 2014
Jul 07 21:26:00 ip-10-10-10-10 systemd[1]: Starting Prints the date every minute...
Jul 07 21:26:00 ip-10-10-10-10 systemd[1]: Started Prints the date every minute.

When it comes to segmenting data to be visualized, Elasticsearch has become my go-to database as it will basically do all the work for me. Need to find how many times a specific search term shows up in a data field? It can do that for you. Need to sum the totals of a collection of placed orders over a time period? It can do that too. Today though I'm going to be talking about generating a date histogram, but this one is a little special because it uses Elasticsearch's new aggregations feature (basically facets on steroids) that will allow us to fill in some empty holes.

First came facets

Back before v1.0, Elasticsearch started with this cool feature called facets. A facet was a built-in way to query and aggregate your data in a statistical fashion. Like I said in my introduction, you could analyze the number of times a term showed up in a field, or you could combine the values of a field to get a total, mean, median, etc. You could even have Elasticsearch generate a histogram or a date histogram (a histogram over time) for you. The date histogram was particularly interesting as you could give it an interval to bucket the data into. This could be anything from a second to a minute to two weeks, etc. That was about as far as you could go with it though.

Aggregations - facets on steroids

With the release of Elasticsearch v1.0 came aggregations. If you look at the aggregation syntax, it looks pretty similar to facets. A lot of the facet types are also available as aggregations. The general structure for aggregations looks something like this:

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
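
If you're building these requests from NodeJS, that structure is easy to assemble with a tiny helper. This is just a sketch: the aggregation function and the per_day name are my own inventions, and the field names are borrowed from the examples in this post.

```javascript
// Builds one aggregation entry: { <type>: <body>, aggregations: { ... } }.
// The nested "aggregations" key is only added when sub-aggregations exist.
function aggregation(type, body, subAggregations) {
    const agg = {};
    agg[type] = body;
    if (subAggregations && Object.keys(subAggregations).length > 0) {
        agg.aggregations = subAggregations;
    }
    return agg;
}

const exampleRequest = {
    query: { match_all: {} },
    aggregations: {
        per_day: aggregation(
            'date_histogram',
            { field: 'timestamp', interval: 'day' },
            { bucket_stats: aggregation('stats', { field: 'widgets_sold' }) }
        )
    }
};

// JSON.stringify quotes every key, which is what Elasticsearch expects
// when the body is sent over HTTP.
console.log(JSON.stringify(exampleRequest, null, 2));
```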

Let's take a quick look at a basic date histogram facet and aggregation:

// this is a facet
{
    query: {
        match_all: {}
    },
    facets: {
        some_date_facet: {
            date_histogram: {
                key_field: "timestamp",
                value_field: "widgets_sold",
                interval: "day"
            }
        }
    }
}

// this is an aggregation
{
    query: {
        match_all: {}
    },
    aggregations: {
        some_date_facet: {
            date_histogram: {
                field: "timestamp",
                interval: "day"
            }
        }
    }
}

They look pretty much the same, though they return fairly different data. The facet date histogram will return stats for each date bucket, whereas the aggregation will return a bucket with the number of matching documents for each. The reason for this is that aggregations can be combined and nested together. So if you wanted data similar to the facet, you could then run a stats aggregation on each bucket:

{
    query: {
        match_all: {}
    },
    aggregations: {
        some_date_facet: {
            date_histogram: {
                field: "timestamp",
                interval: "day"
            },
            aggregations: {
                bucket_stats: {
                    stats: {
                        field: "widgets_sold"
                    }
                }
            }
        }
    }
}
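
To get a feel for what the nested version computes, here is a rough in-memory equivalent in plain NodeJS. It is only a sketch of the idea (bucket by UTC day, then compute count, min, max, avg and sum per bucket) and says nothing about how Elasticsearch does it internally. The function name statsPerDay is made up.

```javascript
// Groups documents into UTC day buckets and computes the same numbers a
// stats sub-aggregation reports for each bucket.
function statsPerDay(docs, dateField, valueField) {
    const buckets = new Map();

    docs.forEach(function (doc) {
        // Truncate the timestamp to midnight UTC to get the bucket key.
        const date = new Date(doc[dateField]);
        const key = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
        const value = doc[valueField];

        const bucket = buckets.get(key) || { count: 0, min: Infinity, max: -Infinity, sum: 0 };
        bucket.count += 1;
        bucket.min = Math.min(bucket.min, value);
        bucket.max = Math.max(bucket.max, value);
        bucket.sum += value;
        buckets.set(key, bucket);
    });

    return Array.from(buckets.keys())
        .sort(function (a, b) { return a - b; })
        .map(function (key) {
            const b = buckets.get(key);
            return {
                key: key,
                doc_count: b.count,
                bucket_stats: { count: b.count, min: b.min, max: b.max, avg: b.sum / b.count, sum: b.sum }
            };
        });
}

const exampleStats = statsPerDay([
    { timestamp: '2014-05-21T08:30:00.000Z', widgets_sold: 10 },
    { timestamp: '2014-05-21T17:45:00.000Z', widgets_sold: 20 },
    { timestamp: '2014-05-22T09:00:00.000Z', widgets_sold: 5 }
], 'timestamp', 'widgets_sold');
console.log(JSON.stringify(exampleStats, null, 2));
```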

Filling in the missing holes

One of the issues that I've run into before with the date histogram facet is that it will only return buckets based on the applicable data. This means that if you are trying to get the stats over a date range and nothing matches, it will return nothing. If I'm trying to draw a graph, this isn't very helpful. One of the new features in the date histogram aggregation is the ability to fill in those holes in the data. I'll walk you through an example of how it works.

Let's first get some data into our Elasticsearch database. We're going to create an index called dates and a type called entry.

curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-21T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-22T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-23T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-26T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-30T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-06-10T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-11T00:00:00.000Z", "value": 10 }'
curl -XPOST http://elasticsearch.local:9200/dates/entry -d '{ "date": "2014-05-12T00:00:00.000Z", "value": 10 }'

Run that and it'll insert some dates that have some gaps in between. Let's now create an aggregation that calculates the number of documents per day:

curl -XGET http://elasticsearch.local:9200/dates/entry/_search -d '
{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "dates_with_holes": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      }
    }
  }
}
'

If we run that, we'll get a result with an aggregations object that looks like this:

"aggregations":{
  "dates_with_holes":{
     "buckets":[
        {
           "key_as_string":"2014-05-11T00:00:00.000Z",
           "key":1399766400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-12T00:00:00.000Z",
           "key":1399852800000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-21T00:00:00.000Z",
           "key":1400630400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-22T00:00:00.000Z",
           "key":1400716800000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-23T00:00:00.000Z",
           "key":1400803200000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-26T00:00:00.000Z",
           "key":1401062400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-30T00:00:00.000Z",
           "key":1401408000000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-06-10T00:00:00.000Z",
           "key":1402358400000,
           "doc_count":1
        }
     ]
  }
}

As you can see, it returned a bucket for each date that was matched. In this case, since each date we inserted was unique, it returned one for each. That's cool, but what if we want the gaps between dates filled in with a zero value? Turns out there is an option you can provide to do this, and it is min_doc_count. In this case we'll specify min_doc_count: 0. Our new query will then look like:

curl -XGET http://elasticsearch.local:9200/dates/entry/_search -d '
{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "dates_with_holes": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "min_doc_count": 0
      }
    }
  }
}
'

Alright, now we have some zeros:

"aggregations":{
  "dates_with_holes":{
     "buckets":[
        {
           "key_as_string":"2014-05-11T00:00:00.000Z",
           "key":1399766400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-12T00:00:00.000Z",
           "key":1399852800000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-13T00:00:00.000Z",
           "key":1399939200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-14T00:00:00.000Z",
           "key":1400025600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-15T00:00:00.000Z",
           "key":1400112000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-16T00:00:00.000Z",
           "key":1400198400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-17T00:00:00.000Z",
           "key":1400284800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-18T00:00:00.000Z",
           "key":1400371200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-19T00:00:00.000Z",
           "key":1400457600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-20T00:00:00.000Z",
           "key":1400544000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-21T00:00:00.000Z",
           "key":1400630400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-22T00:00:00.000Z",
           "key":1400716800000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-23T00:00:00.000Z",
           "key":1400803200000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-24T00:00:00.000Z",
           "key":1400889600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-25T00:00:00.000Z",
           "key":1400976000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-26T00:00:00.000Z",
           "key":1401062400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-27T00:00:00.000Z",
           "key":1401148800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-28T00:00:00.000Z",
           "key":1401235200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-29T00:00:00.000Z",
           "key":1401321600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-30T00:00:00.000Z",
           "key":1401408000000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-31T00:00:00.000Z",
           "key":1401494400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-01T00:00:00.000Z",
           "key":1401580800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-02T00:00:00.000Z",
           "key":1401667200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-03T00:00:00.000Z",
           "key":1401753600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-04T00:00:00.000Z",
           "key":1401840000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-05T00:00:00.000Z",
           "key":1401926400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-06T00:00:00.000Z",
           "key":1402012800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-07T00:00:00.000Z",
           "key":1402099200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-08T00:00:00.000Z",
           "key":1402185600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-09T00:00:00.000Z",
           "key":1402272000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-10T00:00:00.000Z",
           "key":1402358400000,
           "doc_count":1
        }
     ]
  }
}
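
If you're stuck on a version without min_doc_count, or you just want to see what the option is doing for you, the same gap filling can be sketched client-side. This assumes the buckets are sorted and keyed on midnight UTC like the ones above; fillDailyGaps is a hypothetical helper, not an Elasticsearch API.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Takes sorted daily buckets ({ key, doc_count }) and returns a new list
// with a zero-count bucket inserted for every missing day in between.
function fillDailyGaps(buckets) {
    if (buckets.length === 0) {
        return [];
    }

    const counts = new Map(buckets.map(function (b) {
        return [b.key, b.doc_count];
    }));
    const first = buckets[0].key;
    const last = buckets[buckets.length - 1].key;

    const filled = [];
    for (let key = first; key <= last; key += DAY_MS) {
        filled.push({
            key_as_string: new Date(key).toISOString(),
            key: key,
            doc_count: counts.get(key) || 0
        });
    }
    return filled;
}

// The 5/23 and 5/26 buckets from the first result, with the gap filled in.
const exampleFilled = fillDailyGaps([
    { key: 1400803200000, doc_count: 1 },
    { key: 1401062400000, doc_count: 1 }
]);
console.log(exampleFilled);
```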

All of the gaps are now filled in with zeroes. Here comes our next use case: say I want to aggregate documents by day for dates between 5/1/2014 and 5/30/2014. Our data starts at 5/11/2014, so within that range we'll have 7 data points present, with the gaps between them filled in with zeroes. But what about everything from 5/1/2014 to 5/10/2014? Turns out, we can actually tell Elasticsearch to populate that data as well by passing an extended_bounds object which takes a min and max value. This way we can generate any data that might be missing that isn't between existing datapoints. Our query now becomes:

curl -XGET http://elasticsearch.local:9200/dates/entry/_search -d '
{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "dates_with_holes": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "min_doc_count": 0,
        "extended_bounds": {
            "min": 1398927600000,
            "max": 1401433200000
        }
      }
    }
  }
}
'

The weird caveat to this is that the min and max values have to be numerical timestamps (milliseconds since the epoch), not date strings. Now our result set looks like this:

"aggregations":{
  "dates_with_holes":{
     "buckets":[
        {
           "key_as_string":"2014-05-01T00:00:00.000Z",
           "key":1398902400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-02T00:00:00.000Z",
           "key":1398988800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-03T00:00:00.000Z",
           "key":1399075200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-04T00:00:00.000Z",
           "key":1399161600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-05T00:00:00.000Z",
           "key":1399248000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-06T00:00:00.000Z",
           "key":1399334400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-07T00:00:00.000Z",
           "key":1399420800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-08T00:00:00.000Z",
           "key":1399507200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-09T00:00:00.000Z",
           "key":1399593600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-10T00:00:00.000Z",
           "key":1399680000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-11T00:00:00.000Z",
           "key":1399766400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-12T00:00:00.000Z",
           "key":1399852800000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-13T00:00:00.000Z",
           "key":1399939200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-14T00:00:00.000Z",
           "key":1400025600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-15T00:00:00.000Z",
           "key":1400112000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-16T00:00:00.000Z",
           "key":1400198400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-17T00:00:00.000Z",
           "key":1400284800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-18T00:00:00.000Z",
           "key":1400371200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-19T00:00:00.000Z",
           "key":1400457600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-20T00:00:00.000Z",
           "key":1400544000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-21T00:00:00.000Z",
           "key":1400630400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-22T00:00:00.000Z",
           "key":1400716800000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-23T00:00:00.000Z",
           "key":1400803200000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-24T00:00:00.000Z",
           "key":1400889600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-25T00:00:00.000Z",
           "key":1400976000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-26T00:00:00.000Z",
           "key":1401062400000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-27T00:00:00.000Z",
           "key":1401148800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-28T00:00:00.000Z",
           "key":1401235200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-29T00:00:00.000Z",
           "key":1401321600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-05-30T00:00:00.000Z",
           "key":1401408000000,
           "doc_count":1
        },
        {
           "key_as_string":"2014-05-31T00:00:00.000Z",
           "key":1401494400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-01T00:00:00.000Z",
           "key":1401580800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-02T00:00:00.000Z",
           "key":1401667200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-03T00:00:00.000Z",
           "key":1401753600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-04T00:00:00.000Z",
           "key":1401840000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-05T00:00:00.000Z",
           "key":1401926400000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-06T00:00:00.000Z",
           "key":1402012800000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-07T00:00:00.000Z",
           "key":1402099200000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-08T00:00:00.000Z",
           "key":1402185600000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-09T00:00:00.000Z",
           "key":1402272000000,
           "doc_count":0
        },
        {
           "key_as_string":"2014-06-10T00:00:00.000Z",
           "key":1402358400000,
           "doc_count":1
        }
     ]
  }
}

Elasticsearch returned a data point for every day in our min/max range, including the days where doc_count is 0.

That about does it for this particular feature. Now if we wanted to, we could take the returned data and drop it into a graph pretty easily, or we could go on to run a nested aggregation on the data in each bucket.
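As a rough sketch of the "drop it into a graph" step, each bucket can be mapped to a point. The three buckets below are copied from the response above; the function name bucketsToPoints and the { x, y } shape are my own choices for illustration, so adapt them to whatever your charting library expects.

```javascript
// Three buckets copied from the response above; in practice you would
// pass in the full "buckets" array from the Elasticsearch response.
var buckets = [
    { key_as_string: '2014-05-20T00:00:00.000Z', key: 1400544000000, doc_count: 0 },
    { key_as_string: '2014-05-21T00:00:00.000Z', key: 1400630400000, doc_count: 1 },
    { key_as_string: '2014-05-22T00:00:00.000Z', key: 1400716800000, doc_count: 1 }
];

// Convert each bucket into an { x, y } point: x is a Date built from the
// millisecond "key", y is the number of documents in that bucket.
function bucketsToPoints(buckets) {
    return buckets.map(function (bucket) {
        return { x: new Date(bucket.key), y: bucket.doc_count };
    });
}

var points = bucketsToPoints(buckets);
console.log(points);
```

Because the empty days come back with a doc_count of 0, the resulting series has no gaps and can be plotted as-is.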

Extending Underscore Templates with Partials

Back in March, I made a post on how to share Underscore templates between your client and server. Now it's time to take what I talked about there to the next level to make it even easier and add some structure to your templates.

Organization and Structure

Out of the few template systems that I've looked at, hardly any of them come with a nice way of structuring template files, and that's if they use files at all in the first place. A lot of templating engines rely on embedding your templates in <script> tags in your HTML files. To me this seems really messy and disorganized, and it isn't portable if you want to render things server side.

Last time, we were able to take a template that occupied a single file, dump it into Underscore (or LoDash if that's your preference) and use that compiled template in the same way on the server and client. The trouble here is that each template, no matter how small or large, must occupy its own file. That can potentially get out of hand really quickly. It also doesn't give you an easy way of referencing templates from within other templates.

Introducing node-partials

Node-partials is an npm module that wraps Underscore/Lodash templates and allows you to define multiple partials within one template file. In doing so, it also allows us to easily reference partials from another template/partial file altogether.

To install it, simply run:

npm install node-partials

Now, let's take a look at how our template files are structured. Previously, our template files looked something like this:

<div class="some class">
    Like some HTML for example
</div>
<div class="some-val">
    This is some content of the partial
</div>

With node-partials, we would structure it like this:

## some-page-component
<div class="some class">
    Like some HTML for example
</div>
<div class="some-val">
    <%= foo %>
</div>

## text-content-partial
This is some content of the partial

Here, we have two partials in our template file, some-page-component and text-content-partial. Each partial name is identified by the ## delimiter, followed by a space, followed by the name of your partial. Now, how do we use this?
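To make the file format concrete, here is a minimal sketch of how a file like the one above could be split into named partials. This is only an illustration of the idea, not node-partials' actual source; the function name splitPartials is hypothetical.

```javascript
// Split the contents of one template file into named partials.
// NOTE: illustrative only; this is not node-partials' actual source.
function splitPartials(source, delimiter) {
    delimiter = delimiter || '## ';
    var partials = {};
    var currentName = null;

    source.split('\n').forEach(function (line) {
        if (line.indexOf(delimiter) === 0) {
            // a line starting with the delimiter begins a new partial
            currentName = line.slice(delimiter.length).trim();
            partials[currentName] = [];
        } else if (currentName !== null) {
            // every other line belongs to the partial we are currently in
            partials[currentName].push(line);
        }
    });

    // join the collected lines back into one string per partial
    Object.keys(partials).forEach(function (name) {
        partials[name] = partials[name].join('\n').trim();
    });

    return partials;
}

var file = [
    '## some-page-component',
    '<div class="some-val">',
    '    <%= foo %>',
    '</div>',
    '',
    '## text-content-partial',
    'This is some content of the partial'
].join('\n');

var parsed = splitPartials(file);
console.log(Object.keys(parsed));
```

Each resulting string is what would then be handed to the Underscore/Lodash template function.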

Initialization and setup

We have a template with some partials and we have the node-partials module installed. Let's set it up:

var partials = require('node-partials');
var templatePath = '/some/path/to/your/templates';

partials = new partials({
    delimiter: '## ',           // defaults to '## ' but can be pretty much whatever you want
    validFileTypes: ['html'],   // defaults to 'html' only
});

var templates = partials.compile(templatePath);
var serializedTemplates = partials.serializeTemplates(templates);

For node-partials to work, we create an instance that takes some options so it knows which types of files to look for and which delimiter to use for parsing the partials out of each template file. If you pass absolutely nothing, it will default to looking for .html files and use the double hash (##) as the partial name delimiter. Once it is initialized, we can call the .compile() function, passing a path to it. The module will traverse the directory looking for the types of files you specified, parse out the partials, run their contents through the Underscore/Lodash template function and store the compiled templates in an object that is then returned.

When parsing the files, the path/name of the file and the partial name are used as the key to access the partial. For example, if a file with the name my-template.html has the two partials in the example above, we would get two entries in the returned object that look like this:

{
    'my-template/some-page-component': <compiled template>,
    'my-template/text-content-partial': <compiled template>
}

Since the path/name of the file is used to construct the key, we can even have nested directories of templates. If we had some-widget/some-template.html, we would address its partials like this:

{
    'some-widget/some-template/<partial name>': <compiled template>
}
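The key scheme described above can be sketched in a few lines. This is my own illustration of the naming rule, assuming a hypothetical helper called partialKey; the module may build its keys differently internally.

```javascript
var path = require('path');

// Build the lookup key for a partial from the template root, the file
// the partial lives in, and the partial's name.
// NOTE: hypothetical helper for illustration, not part of node-partials.
function partialKey(templateRoot, filePath, partialName) {
    // path of the file relative to the template directory
    var relative = path.relative(templateRoot, filePath);
    // drop the file extension (".html")
    var withoutExt = relative.slice(0, relative.length - path.extname(relative).length);
    // use forward slashes regardless of the platform's separator
    return withoutExt.split(path.sep).join('/') + '/' + partialName;
}

var flatKey = partialKey('/templates', '/templates/my-template.html', 'some-page-component');
var nestedKey = partialKey('/templates', '/templates/some-widget/some-template.html', 'header');

console.log(flatKey);
console.log(nestedKey);
```

The partial name header in the second call is made up for the example.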

Rendering templates

Rendering the partials is nearly the same as it was in my previous post. Rather than having a single variable, we have an object containing a bunch of templates. Putting it all together, it would look something like this:

var partials = require('node-partials');
var templatePath = '/some/path/to/your/templates';

partials = new partials({
    delimiter: '## ',           // defaults to '## ' but can be pretty much whatever you want
    validFileTypes: ['html'],   // defaults to 'html' only
});

var templates = partials.compile(templatePath);

// the some-page-component partial above references the variable foo
console.log(templates['my-template/some-page-component']({ foo: 'some value' }));

Sharing the partials and templates with the client

Since each compiled template is a function, we can serialize everything and dump the source into a file that we can serve to the client, much like we did in the previous post. Once your templates are compiled, you can pass that object to the serializeTemplates() function of your instantiated node-partials object, which will return a stringified representation of the templates object that can then be served to the client as a plain old Javascript file.

var partials = require('node-partials');
var templatePath = '/some/path/to/your/templates';

partials = new partials({
    delimiter: '## ',           // defaults to '## ' but can be pretty much whatever you want
    validFileTypes: ['html'],   // defaults to 'html' only
});

var templates = partials.compile(templatePath);
var serializedTemplates = partials.serializeTemplates(templates);

When your templates are loaded on the client, the object can be found in the window.__views global variable and works exactly the same as the templates object on the server.

Deploying a NodeJS Application Using Docker

Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, OpenStack clusters, public clouds and more.

A little over a year ago, Docker was released, built on top of Linux Containers, or LXC for short. Linux Containers have been around for a little while and are really interesting in that they provide operating system level virtualization. Rather than having a hypervisor running full operating systems on a piece of hardware (like Xen, if you're familiar), Linux Containers rely on the kernel of the host operating system. Think of LXCs as a fancy kind of chrooted environment. Docker then builds on top of that, essentially allowing you to run an operating system on an operating system. This makes containers a really attractive method for distributing applications in that you can build one container and it'll run on any host operating system that can support Docker.

Building a base container

Our ultimate goal here is to end up with a container that can run a simple NodeJS web server.

Note: I am going to assume that you have done some reading on Docker and have probably done their introduction/walkthrough. I'm also going to assume that your machine has Docker installed and running.

To start with, we need a base operating system for our container. I personally like Ubuntu, so we're gonna use 13.10 as our base image. Let's create a Dockerfile and populate it with the following:

# DOCKER-VERSION 0.10.0

FROM ubuntu:13.10

This tells Docker to go fetch Ubuntu from the registry and use version 13.10. In case you are wondering or have forgotten, containers can be used to build containers and can be publicly stored in the docker registry. If you were to publish your own container, you'd end up with the repository name looking like <username>/<container name>:<revision>. In this case, ubuntu happens to be a special repository that doesn't belong to a particular user.

Now, we have a very base container. If we wanted to build it, we could run:

docker build -t <username>/ubuntu-base .

*This assumes that you are in the same directory as your Dockerfile.

Installing node and npm

Remember how I said containers are effectively operating systems? This means that we can use the container exactly as we would our local machine. Just to show you that our container is basically a base Ubuntu image, try running:

docker run -i -t <username>/ubuntu-base /bin/bash

This will fire up our built container and execute /bin/bash. The -i flag tells docker to keep stdin open so that we can type into the process, and the -t flag tells it to allocate a tty, giving us an interactive session as if we had logged in or ssh'd into a machine.

Now let's get NodeJS and NPM installed. We're going to install git while we're at it so that we can clone our repository into our container later. We can do this through apt-get.

# DOCKER-VERSION 0.10.0

FROM ubuntu:13.10

# make sure apt is up to date
RUN apt-get update

# install nodejs and npm
RUN apt-get install -y nodejs npm git git-core

If it isn't obvious already, the RUN instruction takes a command and will run it. The interesting thing to note about Docker here though is that it will cache the state after each command. This is how we can incrementally build a system in a container and not have to rebuild the entire thing each time we deploy that container.

Another thing to note here is that Docker will run these commands without the use of stdin when building, so we need to pass the -y flag to apt-get install to tell it "yes, install these packages and their dependencies".

Let's build our container again.

docker build -t <username>/ubuntu-base .

Now, if we were to run our container and open up an interactive session, NodeJS, NPM, and git would be available to us on the commandline.

A simple Node webserver

Let's create a simple web server in node using express.

index.js

var express = require('express');

var app = express();

app.get('/', function(req, res){
    res.send('Hello from inside a container!');
});

app.listen(8080);

package.json

{
  "name": "my-cool-webserver",
  "version": "0.0.1",
  "description": "A NodeJS webserver to run inside a docker container",
  "main": "index.js",
  "author": "sean@seanmcgary.com",
  "license": "MIT",
  "dependencies": {
      "express": "*"
  }
}

To make this container easy to deploy and update, every time it runs it will pull the latest version of our app from a remote git repository, so go ahead and commit and push your app to a git repository.

Running the application

To run our application, we need to again modify our Dockerfile with a few things:

  • We need to expose/map port 8080 between the container and host. Remember, a container is basically a fancy chroot so unless we tell the host operating system to map a port to it, nothing can access the container from the outside, and nothing in the container can access the host.
  • We need to pull the app from the remote repository
  • Run npm install to make sure express is installed
  • Finally run our application

Let's modify our Dockerfile:

# DOCKER-VERSION 0.10.0

FROM ubuntu:13.10

# make sure apt is up to date
RUN apt-get update

# install nodejs and npm
RUN apt-get install -y nodejs npm git git-core

ADD start.sh /tmp/

RUN chmod +x /tmp/start.sh

CMD ./tmp/start.sh

So here we use the ADD instruction to copy a file called start.sh to /tmp/ in our container, make it executable, then run it. You're probably wondering what the hell is in start.sh. Aren't we supposed to be running a node app?

Here's what start.sh looks like:

cd /tmp

# try to remove the repo if it already exists
rm -rf <git repo name>; true

git clone <remote git repo>

cd <git repo name>

npm install

node .

The reason we put these commands into a script file is so that docker won't cache the result of it. See, unlike RUN, the CMD instruction is used to start and run whatever it is you want to run in your container. It is always the last thing in your Dockerfile and is run every time your container is started/restarted. This way, we clone the repository fresh every time. This makes deploying an update really easy - just restart your container!

Let's build this thing and name it something more descriptive:

docker build -t <username>/my-nodejs-webserver .

Now, to run it we're going to do something like this:

docker run -p 8080:8080 <username>/my-nodejs-webserver

You'll notice that we have a -p flag in there. This says "take port 8080 on the host operating system and map it to port 8080 in the container". Now, we can send/receive web traffic from our container. The other thing you'll notice is that once you run that command, there isn't any output. To see what's going on, run:

docker ps -a

This will give you something that looks like this:

$ docker ps -a
CONTAINER ID        IMAGE                                COMMAND                CREATED             STATUS                    PORTS                    NAMES
4acbdf4c6695        91f00a99f058                    /bin/sh -c ./start.s        2 days ago          Exited (0) 2 days ago    0.0.0.0:8080->8080/tcp    hopeful_hawking

That first column is the container ID, which we can use to view the container's logs.

docker logs -f 4acbdf4c6695

This will tail the log for you.

When you're ready to stop your container, simply run:

docker stop 4acbdf4c6695

You can also start and restart it in the same way:

docker start 4acbdf4c6695

docker restart 4acbdf4c6695

Now that we're all done, we can push our container to the public registry:

docker push <username>/my-nodejs-webserver

To be continued...

This is the first post in a series I plan on writing about my experiences with docker and implementing some technologies from CoreOS including etcd, fleet, and CoreOS itself to create an automated, distributed application environment.

A NodeJS Module for Delighted

Recently at work, we decided to integrate with a cool little service called Delighted to start getting feedback from our users after they have made a purchase from us.

Delighted is the easiest and most beautiful way to measure customer happiness. Are your customers delighted?

Delighted does one thing and does it really well - they provide a service for automating the sending of NPS emails to customers. For those that don't know, NPS, or "Net Promoter Score", is a system designed to gauge the loyalty of a company's customers. Ever gotten one of those emails or popups on a site that asks "how likely are you to recommend our product to someone?" That's exactly what Delighted does, but in a slightly more elegant way.

They send a simple and straightforward email asking you to provide a rating from 0 to 10. After picking a number, they will also give you the chance to provide a comment if you so choose.

Automation and Implementation

At ThirdLove, we needed a way to automate this. Lucky for us, Delighted provides a RESTful API. They provide endpoints for sending emails, fetching responses and even metrics. The thing we've found most useful so far is the ability to schedule emails to be sent to customers in the future. This way, when a customer makes a purchase, our backend will make a call to Delighted, telling it to send an email to the customer n days in the future.
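As a rough sketch of the scheduling idea, the request body for "send this customer a survey in n days" might be built like this. I'm assuming here that the delay is expressed in seconds in a field called delay alongside email, which is my reading of Delighted's API docs; double-check the field names there before relying on them. The helper name buildSurveyRequest and the example address are made up.

```javascript
var SECONDS_PER_DAY = 24 * 60 * 60;

// Build the body for a "send a survey later" request.
// ASSUMPTION: the `email` and `delay` (in seconds) field names come from
// my reading of Delighted's API docs; verify them before relying on this.
function buildSurveyRequest(email, daysFromNow) {
    return {
        email: email,
        delay: daysFromNow * SECONDS_PER_DAY
    };
}

// e.g. ask for feedback a week after the purchase
var request = buildSurveyRequest('customer@example.com', 7);
console.log(request);
```

Keeping the days-to-seconds conversion in one place makes it easy to change how long we wait after a purchase.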

When I started looking at the API docs, I noticed that they didn't have a Node module - just a Ruby gem and raw curl examples. With that being the case, and our backend being written using Node, I decided to create a module that wraps their API to make it easy to interact with.

The source can be found here on Github.

The module is very simple and basically wraps the API calls, forwarding on the JSON payloads that Delighted's API returns. This module also uses the Q promise library rather than traditional callbacks. Included in the repository are a few examples and a pretty detailed README that explains all of the methods and the parameters they take (nearly identical to the Delighted API docs).

So one day you decided to write a web application where the rendering of templates is shared between the server and client and you wonder to yourself "how in the world can I share templates between my server and client without needlessly duplicating code and/or templates?". Today we're going to look at how to address this problem using Underscore's (or Lo-Dash if you prefer) templating engine.

Templates in Underscore

For the uninformed, Underscore (and Lo-Dash, which is a fork of Underscore that has more functionality and is allegedly faster) is a Javascript utility library that provides a crap-load of useful (and cross-browser) helper functions, one of which is a templating system that is similar to both EJS and ERB for those of you that maybe have used Ruby. The even better thing is that Underscore and Lo-Dash work not only in the browser but in NodeJS as well making their templating system perfect for this use-case.

Templates look like this:

<div class="my-super-awesome-div">
    <%= mySuperAwesomeVariable %>
</div>
<ul class="things-that-are-awesome">
    <% _.each(thingsThatAreAwesome, function(thing){ %>
        <li><%= thing %></li>        
    <% }); %>
</ul>

Unlike templating languages like Mustache/Handlebars, you can use all of the features of Javascript in your templates. This is entirely up to you and really depends on your general idea of the purpose of templates and if logic, let alone ALL of Javascript's features should be accessible.

Generating templates

To begin with, we're going to start on the server-side. We're going to assume that our template above lives in a file on the filesystem exactly as you see it in the block above. We're going to read it and feed it as a string into the template engine.

var _ = require('lodash'); // or 'underscore' if you so choose
var fs = require('fs');

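fs.readFile('/path/to/your/template', function(err, data){
    if(err) throw err; // bail out if the file could not be read

    // readFile hands us a Buffer, so convert it to a string first
    data = data.toString();

    var template = _.template(data);
});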

The result of _.template() is a function that you can then pass a block of data to. To use our template we would do something like this:

var templateString = template({
    mySuperAwesomeVariable: 'Im super awesome',
    thingsThatAreAwesome: ['This is awesome', 'So is this', 'This is too!']
});

The interesting thing to note here is that we passed an object into our template, but we're referencing variables in the template itself. Turns out, when you evaluate your template, it will take the keys of the object you passed in and create variables out of those keys in the scope of your template. If we do a console.log(template); we can kinda see what is going on (I've formatted it to be a little more readable):

{ [Function]
  source: 'function(obj) {
      obj || (obj = {});
     var __t, __p = \'\', __e = _.escape, __j = Array.prototype.join;
     function print() { __p += __j.call(arguments, \'\') }\n
     with (obj) {
         __p += \'<div class="my-super-awesome-div">\\n\\t\' +\n
        ((__t = ( mySuperAwesomeVariable )) == null ? \'\' : __t) +\n\
        '\\n</div>\\n<ul class="things-that-are-awesome">\\n\\t\';\n 
        _.each(thingsThatAreAwesome, function(thing){ ;\n
                __p += \'\\n\\t\\t<li>\' 
                +\n((__t = ( thing )) == null ? \'\' : __t) +\n\
                '</li>    \\t\\n\\t\';\n 
         }); ;\n
        __p += \'\\n
        </ul>\\n\';\n\n}\n
        return __p\n
    }' 
}

In short, it creates a function that takes a single argument (the object that we pass in), and builds a string within a with block. The with block is the magic that takes our arguments object and creates variables in the template's scope from the keys and values of the object.
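To see the with trick in isolation, here is a hand-written, much smaller version of what a compiled template boils down to. This is a sketch of the mechanism only, not what Underscore or Lo-Dash actually generates; I'm using the Function constructor so that the with block is allowed even if the surrounding code runs in strict mode.

```javascript
// A tiny hand-built "compiled template". The body is passed as a string
// to the Function constructor so the with block is permitted even when
// the calling code is strict.
var render = new Function('obj',
    'obj || (obj = {});\n' +
    'var __p = "";\n' +
    'with (obj) {\n' +
    // inside the with block, keys of obj can be read like local variables
    '    __p += "<div>" + mySuperAwesomeVariable + "</div>";\n' +
    '}\n' +
    'return __p;'
);

var html = render({ mySuperAwesomeVariable: 'Im super awesome' });
console.log(html); // prints "<div>Im super awesome</div>"
```

If the object is missing a key that the template references, the lookup falls through to the outer scope and throws a ReferenceError, which is why every variable used in a template needs to be present in the data you pass in.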

Using your template in the client

Now that we have a compiled template, how in the hell do we get it to the client? As we just saw, our template is just a function that returns an evaluated string. All we really need to do is serve up the "source" function to the client. Let's take a look at how we can do that:

var viewString = 'var __views = {};';

viewString += '__views["ourCoolView"] = ' + template.source;

What we're doing here is programmatically building the source of a Javascript file that we're going to serve up to the client. If we view the whole thing as if we wrote it by hand, it would look something like this:

var __views = {};

__views["ourCoolView"] = function(obj) {\nobj || (obj = {});\nvar __t, __p = \'\', __e = _.escape, __j = Array.prototype.join;\nfunction print() { __p += __j.call(arguments, \'\') }\nwith (obj) {\n__p += \'<div class="my-super-awesome-div">\\n\\t\' +\n((__t = ( mySuperAwesomeVariable )) == null ? \'\' : __t) +\n\'\\n</div>\\n<ul class="things-that-are-awesome">\\n\\t\';\n _.each(thingsThatAreAwesome, function(thing){ ;\n__p += \'\\n\\t\\t<li>\' +\n((__t = ( thing )) == null ? \'\' : __t) +\n\'</li>    \\t\\n\\t\';\n }); ;\n__p += \'\\n</ul>\\n\';\n\n}\nreturn __p\n};

When sent to the client, the variable __views will be placed in the global scope (window.__views). To evaluate our template to get the string output like we did before we would do:

var renderedTemplate = window.__views['ourCoolView']({
    mySuperAwesomeVariable: 'Im super awesome',
    thingsThatAreAwesome: ['This is awesome', 'So is this', 'This is too!']
});

$('.someDomElement').html(renderedTemplate);
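The same approach extends to more than one template. Below is a minimal sketch, assuming a hypothetical sources map that holds the source string of each compiled template (hand-written stand-ins here rather than real Underscore output). Node's vm module plays the role of the browser's global scope so we can check that the generated file really does define __views.

```javascript
var vm = require('vm');

// Hand-written stand-ins for the `source` property of compiled templates
var sources = {
    greeting: 'function(obj) { return "<p>Hello " + obj.name + "</p>"; }',
    farewell: 'function(obj) { return "<p>Bye " + obj.name + "</p>"; }'
};

// Build the text of the Javascript file we would serve to the client
function buildViewsFile(sources) {
    var viewString = 'var __views = {};\n';
    Object.keys(sources).forEach(function (name) {
        // JSON.stringify quotes and escapes the view name safely
        viewString += '__views[' + JSON.stringify(name) + '] = ' + sources[name] + ';\n';
    });
    return viewString;
}

var fileContents = buildViewsFile(sources);

// Evaluate the file in a sandbox standing in for the browser's window
var sandbox = {};
vm.runInNewContext(fileContents, sandbox);

var rendered = sandbox.__views.greeting({ name: 'Sean' });
console.log(rendered);
```

One thing to keep in mind: real compiled templates reference _ (for _.escape and any helpers like _.each used in the template), so Underscore or Lo-Dash still has to be loaded on the client before the templates are rendered.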

That's pretty much it! In the next week or two, I will be following up this post with a post on how to extend this system even further by introducing a small library I built called node-partials that introduces inter-file partials as well as compiling multiple files and partials together.

Using SSL/HTTPS with HAProxy

Update (6/27/2014) - On June 19th, 2014, HAProxy 1.5.x was released and is now considered stable.

Last time I posted about HAProxy, I walked you through how to support domain access control lists (also known as "virtual hosts" for those of you using Apache and Nginx) so that you can route to different applications based on the incoming domain name. Since then, I've had a few requests on how to support SSL and HTTPS with HAProxy since it's not the most obvious thing.

The reason it's not obvious is because it's not "officially" supported yet in the current stable release (1.4), but it is available in the current 1.5 dev branch. If you intend to use this in a production setting, proceed with caution. As of June 19th, 2014, the 1.5.x branch is considered stable.

Compiling

For this example, we'll be using Ubuntu 12.04 LTS as our base operating system and will be building HAProxy from source. Before we start to build it, we need to make sure we have the dependencies installed.

sudo aptitude update
sudo aptitude install build-essential make g++ libssl-dev

Next, let's download the latest version of HAProxy and compile it with the SSL option.

wget http://haproxy.1wt.eu/download/1.5/src/devel/haproxy-1.5-dev21.tar.gz
tar -xzf haproxy-1.5-dev21.tar.gz
cd haproxy-1.5-dev21
make USE_OPENSSL=1
sudo make install

Setup

Cool, now we have HAProxy installed and it's time to set up our config file. In the following example config, we will set up HAProxy to accept connections on a single domain, but it will force a redirect to the secure connection.

global
    log 127.0.0.1    local0
    log 127.0.0.1    local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log    global
    mode    http
    option    httplog
    option    dontlognull
    option forwardfor
    option http-server-close
    stats enable
    stats auth someuser:somepassword
    stats uri /haproxyStats

frontend http-in
    bind *:80
    reqadd X-Forwarded-Proto:\ http
    default_backend application-backend

frontend https-in
    bind *:443 ssl crt /etc/ssl/*your ssl key*
    reqadd X-Forwarded-Proto:\ https
    default_backend application-backend

backend application-backend
    redirect scheme https if !{ ssl_fc }
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix

    #enter the IP of your application here
    server node1 10.0.0.1 cookie A check

A lot of the stuff at the top of the config is fairly basic boilerplate. What we want to pay attention to is everything below the defaults. As you can see, we're telling HAProxy to listen on both ports 80 and 443 (HTTP and HTTPS respectively) and each uses the backend "application-backend" as the default.

A side note here real quick; the things we learned in the previous post on access control lists can be directly applied to this situation.

The new section here is the additional https-in section. This tells HAProxy to listen on port 443 (the default port for HTTPS) and specifies the SSL certificate to use. Generating SSL certificates can be a huge pain in the ass and sometimes depends on the authority that is issuing it. The one thing to know though is that the certificate (unless it's a wildcard cert) MUST be issued for the domain that you are sending through HAProxy.

Now, in our backend definition, the first line is really the only thing that's different. Since we're using the same backend for both HTTP and HTTPS, this tells HAProxy to redirect any request that did not arrive over SSL (that's the !{ ssl_fc } condition) to the same route using HTTPS.

Wrap-up

That pretty much does it. It's not all that different from the config in the previous exercise, but it can be a little tricky to set up and configure, especially if your cert isn't configured correctly or doesn't have the correct permissions.

Beginner's Guide to Elasticsearch

For the uninitiated, Elasticsearch is a schema-free, JSON document-based search server built on top of the indexing library Lucene. While it does provide a powerful full-text search system, Elasticsearch provides many other features that make it great for things like aggregating statistics. In this post, I am going to walk through how to set up Elasticsearch and the basics of storing and querying data.

Setup

For this setup, we'll be using Ubuntu 12.04 LTS as our base operating system.

Because Elasticsearch is written in Java, we need to install the JVM.

sudo aptitude install openjdk-7-jre-headless

We won't be needing any of the UI related features of Java, so we can save some space and time by installing the headless version.

Next, we need to download Elasticsearch. We're going to just download the standalone tarball, but there is also a deb package available if you wish to install it with dpkg.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.9.tar.gz
tar -xvf elasticsearch-0.90.9.tar.gz

Now that we have it downloaded, running it is as simple as executing the included binary.

cd elasticsearch-0.90.9
./bin/elasticsearch

By default, Elasticsearch will run as a background daemon process. For this post, we're going to run it in the foreground so that we can watch the logs. This can be accomplished by providing the -f flag.

./bin/elasticsearch -f

Terminology

Before we start, there is some vocabulary that we need to familiarize ourselves with:

Node

A node is a Java process running Elasticsearch. Typically each node will run on its own dedicated machine.

Cluster

A cluster is one or more nodes with the same cluster name.

Shard

A shard is a Lucene index. Every index can have one or more shards. Shards can be either the primary shard or a replica.

Index

An index is the rough equivalent of a database in relational database land. The index is the top-most level and can be found at http://yourdomain.com:9200/<your index>

Types

Types are objects that are contained within indexes. Think of them like tables. Being a child of the index, they can be found at http://yourdomain.com:9200/<your index>/<some type>

Documents

Documents are found within types. These are basically JSON blobs that can be of any structure. Think of them like rows found in a table.
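To make the index/type hierarchy concrete, here is a small sketch that builds the URLs described above. The helper name resourceUrl is hypothetical, and port 9200 is simply the one used throughout this post.

```javascript
// Build the URL for an index, or for a type inside an index.
// NOTE: hypothetical helper for illustration only.
function resourceUrl(host, index, type) {
    var parts = [index];
    if (type !== undefined) {
        // a type is a child of the index, so it is appended to the path
        parts.push(type);
    }
    return 'http://' + host + ':9200/' + parts.map(encodeURIComponent).join('/');
}

var indexUrl = resourceUrl('localhost', 'testindex');
var typeUrl = resourceUrl('localhost', 'testindex', 'mySuperCoolType');

console.log(indexUrl);
console.log(typeUrl);
```

These are the same two URL shapes that the curl examples in the rest of this post make requests against.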

Querying Elasticsearch

Out of the box, Elasticsearch comes with a RESTful API that we'll be using to make our queries. I'm running Elasticsearch locally, so all examples will use localhost; simply replace localhost with your fully qualified domain. By default, this means the URL we'll be using is http://localhost:9200/

Creating an index

First thing we need to do is create an index. We're going to call our index "testindex". To do this, we simply need to make a POST request to http://localhost:9200/testindex

curl -X POST http://localhost:9200/testindex -d  '{}'

When you create an index, there are a number of options that you can pass along, such as mapping definitions and settings for the number of shards and replicas. For now, we're just going to post an empty object. We'll revisit mappings later on in a more advanced post.

Inserting a document

To insert our first document, we need a type. For this example, we'll be using mySuperCoolType and we'll be inserting a document with a full name field and a field for a twitter handle.

curl -X POST http://localhost:9200/testindex/mySuperCoolType -d  '
{
    "fullName": "Sean McGary",
    "twitterHandle": "@seanmcgary"
}'

// response
{"ok":true,"_index":"testindex","_type":"mySuperCoolType","_id":"N_c9-SQ8RRSrRwPIBqG6Ow","_version":1}

Querying

Now that we have a document, we can start to query our data. Since we didn't provide any specifics around field mappings, Elasticsearch will try to determine the type of each field (string, number, object, etc) and run the default analyzers and indexers on it.

To test this, we'll query our collection to try and match the full name field.

curl -X GET http://localhost:9200/testindex/mySuperCoolType/_search -d '
{
    "query": {
        "match": {
            "fullName": "Sean"
        }
    }
}'

// result
{
   "took":2,
   "timed_out":false,
   "_shards":{
      "total":5,
      "successful":5,
      "failed":0
   },
   "hits":{
      "total":1,
      "max_score":0.19178301,
      "hits":[
         {
            "_index":"testindex",
            "_type":"mySuperCoolType",
            "_id":"N_c9-SQ8RRSrRwPIBqG6Ow",
            "_score":0.19178301,
            "_source":{
               "fullName":"Sean McGary",
               "twitterHandle":"@seanmcgary"
            }
         }
      ]
   }
}

When you make a query, Elasticsearch will spit back a bunch of data: whether it timed out, how long it took (in milliseconds), how many shards it queried and how many of those succeeded or failed. The last field that it returns is the "hits" object. This is where all of your results will appear. The root hits object tells you the number of matches found, the max score of all the hits and, of course, the array of hits. Each hit includes meta info (prefixed with an underscore) such as the (auto-assigned) ID, the score and the source, which is the original document data you inserted.

Full-text searching is one of the strong features of Elasticsearch, so when it performs the search, it ranks all matches by their score. The score is a measure of how closely each document matches the original query. The score can be modified if you wish to add additional weight based on certain parameters. We'll cover that in a later, more advanced post.
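
To make the shape of that response a little more concrete, here is a small sketch that walks the hits array and pulls out the ID, score and source of each hit. The response object is copied from the result shown earlier; the summarizeHits helper is a hypothetical name used only for this example.

```javascript
// Sample response, copied from the search result shown earlier in this post.
var response = {
    took: 2,
    timed_out: false,
    _shards: { total: 5, successful: 5, failed: 0 },
    hits: {
        total: 1,
        max_score: 0.19178301,
        hits: [
            {
                _index: 'testindex',
                _type: 'mySuperCoolType',
                _id: 'N_c9-SQ8RRSrRwPIBqG6Ow',
                _score: 0.19178301,
                _source: {
                    fullName: 'Sean McGary',
                    twitterHandle: '@seanmcgary'
                }
            }
        ]
    }
};

// Hypothetical helper: reduce each hit to the fields we usually care about.
function summarizeHits(response) {
    return response.hits.hits.map(function(hit){
        return {
            id: hit._id,           // meta field (auto-assigned ID)
            score: hit._score,     // meta field (relevance score)
            document: hit._source  // the original document we inserted
        };
    });
}

var summary = summarizeHits(response);
summary.forEach(function(hit){
    console.log(hit.id, hit.score, hit.document.fullName);
});
```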

As you can see in our results, we got one match by querying for "Sean" in the fullName field. Because we didn't specify a mapping, Elasticsearch applied the "standard analyzer" to the fullName field. The standard analyzer takes the contents of the field (in this case a string), lowercases all letters, removes common stopwords (words like "and" and "the") and splits the string on spaces into individual words. The match query runs our search text through the same analysis, which is why querying "Sean" matches "Sean McGary".
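
The steps described above can be sketched as a toy analyzer. This is only an illustration of the idea and not Elasticsearch's actual implementation: the analyze and matches functions are hypothetical, the stopword list is a tiny made-up subset, and splitting on whitespace is a simplification.

```javascript
// Toy stand-in for the analysis step; NOT Elasticsearch's real implementation.
// Illustrative subset of stopwords (an assumption, not the real list).
var STOPWORDS = ['a', 'an', 'and', 'the', 'of', 'or'];

// Lowercase the text, split it on whitespace and drop stopwords.
function analyze(text) {
    return text
        .toLowerCase()
        .split(/\s+/)
        .filter(function(token){
            return token.length > 0 && STOPWORDS.indexOf(token) === -1;
        });
}

// A field "matches" when it shares at least one analyzed token with the query text.
function matches(queryText, fieldText) {
    var fieldTokens = analyze(fieldText);
    return analyze(queryText).some(function(token){
        return fieldTokens.indexOf(token) !== -1;
    });
}

console.log(analyze('Sean McGary'));          // [ 'sean', 'mcgary' ]
console.log(matches('Sean', 'Sean McGary'));  // true
console.log(matches('Se', 'Sean McGary'));    // false
```

Note that in this sketch "Se" does not match: whole tokens are compared, so a partial word is not enough.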

Let's take a look at another query. This time, though, we're going to apply a filter to the results.

curl -X GET http://localhost:9200/testindex/mySuperCoolType/_search -d '
{
    "query": {
        "match": {
            "fullName": "Sean"
        }
    },
    "filter": {
        "query": {
            "match": {
                "twitterHandle": "seanmcgary"
            }
        }
    }
}'

This particular request returns exactly what we had before, but let's break it down a little bit. To start, it's important to understand the difference between queries and filters. Queries are performed first, against the entire dataset, which here is the "mySuperCoolType" type. Elasticsearch then applies the filter to the result set of the query before returning the data. Unlike queries, though, filters are cached, which can improve performance.
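
The ordering described above (query first, then filter the query's results) can be modeled with a few lines of in-memory code. This is a toy sketch, not how Elasticsearch works internally: the second document, the crude tokens function and the search helper are all assumptions made for illustration, and caching is left out entirely.

```javascript
// Toy in-memory model of "query first, then filter".
var documents = [
    { fullName: 'Sean McGary', twitterHandle: '@seanmcgary' },
    { fullName: 'Sean Example', twitterHandle: '@someoneelse' } // hypothetical second document
];

// Crude tokenizer (assumption): lowercase, then split on anything that is not a letter or digit.
function tokens(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// True when the given field shares at least one token with the search text.
function fieldMatches(doc, field, text) {
    var fieldTokens = tokens(doc[field]);
    return tokens(text).some(function(token){
        return fieldTokens.indexOf(token) !== -1;
    });
}

function search(docs, query, filter) {
    // Step 1: the query runs against the entire dataset.
    var queryResults = docs.filter(function(doc){
        return fieldMatches(doc, query.field, query.text);
    });
    // Step 2: the filter is applied to the query's result set.
    return queryResults.filter(function(doc){
        return fieldMatches(doc, filter.field, filter.text);
    });
}

var results = search(
    documents,
    { field: 'fullName', text: 'Sean' },
    { field: 'twitterHandle', text: 'seanmcgary' }
);
console.log(results.length, results[0].fullName);
```

Here the query alone matches both documents, and the filter narrows the result down to the one whose handle matches.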

Conclusion

This concludes our introduction to Elasticsearch. In a follow-up post, I'll introduce some more advanced features, such as setting up mappings and custom analyzers and indexers, and get into how to use facets for things such as analytics and creating histograms.