Sean McGary

Software Engineer @ ThirdLove.com, builder of webapps

A NodeJS Module for Delighted

by Sean McGary on April 15, 2014

Recently at work, we decided to integrate with a cool little service called Delighted to start getting feedback from our users after they have made a purchase from us.

Delighted is the easiest and most beautiful way to measure customer happiness. Are your customers delighted?

Delighted does one thing and does it really well - they provide a service for automating the sending of NPS emails to customers. For those that don't know, NPS, or "Net Promoter Score", is a system designed to gauge the loyalty of a company's customers. Ever gotten one of those emails or popups on a site that asks "how likely are you to recommend our product to someone?" That's exactly what Delighted does, just in a more elegant way.

They send a simple and straightforward email asking you to provide a rating from 0 to 10. After picking a number, they will also give you the chance to provide a comment if you so choose.

Automation and Implementation

At ThirdLove, we needed a way to automate this. Lucky for us, Delighted provides a RESTful API, with endpoints for sending emails, fetching responses, and even pulling metrics. The thing we have found most useful so far is the ability to schedule emails to be sent to customers in the future. This way, when a customer makes a purchase, our backend makes a call to Delighted telling it to send an email to the customer n days in the future.

When I started looking at the API docs, I noticed that they didn't have a Node module - just a Ruby gem and raw curl examples. Since our backend is written in Node, I decided to create a module that wraps their API to make it easy to interact with.

The source can be found here on Github.

The module is very simple and basically wraps the API calls, forwarding on the JSON payloads that Delighted's API returns. It uses the Q promise library rather than traditional callbacks. Included in the repository are a few examples and a pretty detailed README that explains all of the methods and the parameters they take (nearly identical to the Delighted API docs).
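To give a rough idea of the shape of it, usage looks something like the sketch below. The module, method, and option names here are hypothetical placeholders; the README and the examples in the repository have the exact signatures.

// hypothetical usage sketch - see the README for the real method names
var Delighted = require('delighted'); // module name assumed
var delighted = new Delighted('YOUR_DELIGHTED_API_KEY');

// schedule an NPS survey email to go out a day after a purchase
delighted.addPerson({
    email: 'customer@example.com',
    delay: 86400 // seconds in the future
}).then(function(response){
    console.log(response); // JSON payload forwarded from Delighted's API
}, function(err){
    console.error(err);
});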

So one day you decide to write a web application where template rendering is shared between the server and the client, and you wonder to yourself, "how in the world can I share templates between my server and client without needlessly duplicating code and/or templates?" Today we're going to look at how to address this problem using Underscore's (or Lo-Dash's, if you prefer) templating engine.

Templates in Underscore

For the uninformed, Underscore (and Lo-Dash, a fork of Underscore with more functionality that is allegedly faster) is a Javascript utility library that provides a crap-load of useful (and cross-browser) helper functions, one of which is a templating system similar to EJS and, for those of you who have used Ruby, ERB. Even better, Underscore and Lo-Dash work not only in the browser but in NodeJS as well, making their templating system perfect for this use case.

Templates look like this:

<div class="my-super-awesome-div">
    <%= mySuperAwesomeVariable %>
</div>
<ul class="things-that-are-awesome">
    <% _.each(thingsThatAreAwesome, function(thing){ %>
        <li><%= thing %></li>       
    <% }); %>
</ul>

Unlike templating languages such as Mustache/Handlebars, you can use all of the features of Javascript in your templates. Whether you should is entirely up to you, and really depends on your view of what templates are for and whether logic, let alone ALL of Javascript's features, should be accessible in them.

Generating templates

To begin with, we're going to start on the server side. We'll assume that the template above lives in a file on the filesystem exactly as you see it in the block above, and we'll read it and feed it as a string into the template engine.

var _ = require('lodash'); // or 'underscore' if you so choose
var fs = require('fs');

fs.readFile('/path/to/your/template', function(err, data){
    if (err) {
        throw err;
    }

    // readFile hands us a Buffer, so convert it to a string before compiling
    data = data.toString();

    var template = _.template(data);
});

The result of _.template() is a function that you can then pass a block of data to. To use our template we would do something like this:

var templateString = template({
    mySuperAwesomeVariable: 'Im super awesome',
    thingsThatAreAwesome: ['This is awesome', 'So is this', 'This is too!']
});

The interesting thing to note here is that we passed an object into our template, but we're referencing variables in the template itself. It turns out that when you evaluate your template, it takes the keys of the object you passed in and creates variables out of those keys in the scope of your template. If we do a console.log(template); we can kinda see what is going on (I've formatted it to be a little more readable):

{ [Function]
  source: 'function(obj) {
     obj || (obj = {});
     var __t, __p = \'\', __e = _.escape, __j = Array.prototype.join;
     function print() { __p += __j.call(arguments, \'\') }\n
     with (obj) {
        __p += \'<div class="my-super-awesome-div">\\n\\t\' +\n
        ((__t = ( mySuperAwesomeVariable )) == null ? \'\' : __t) +\n\
        '\\n</div>\\n<ul class="things-that-are-awesome">\\n\\t\';\n 
        _.each(thingsThatAreAwesome, function(thing){ ;\n
                __p += \'\\n\\t\\t<li>\' 
                +\n((__t = ( thing )) == null ? \'\' : __t) +\n\
                '</li>    \\t\\n\\t\';\n 
         }); ;\n
        __p += \'\\n
        </ul>\\n\';\n\n}\n
        return __p\n
    }' 
}

In short, it creates a function that takes a single argument (the object that we pass in) and builds a string within a with block. The with block is the magic that takes our arguments object and creates variables in the template's scope from the keys and values of the object.

Using your template in the client

Now that we have a compiled template, how in the hell do we get it to the client? As we just saw, our template is just a function that returns an evaluated string. All we really need to do is serve up the "source" function to the client. Let's take a look at how we can do that:

var viewString = 'var __views = {};';

viewString += '__views["ourCoolView"] = ' + template.source;

What we're doing here is programmatically building the source of a Javascript file that we're going to serve up to the client. If we view the whole thing as if we wrote it by hand, it would look something like this:

var __views = {};

__views["ourCoolView"] = function(obj) {\nobj || (obj = {});\nvar __t, __p = \'\', __e = _.escape, __j = Array.prototype.join;\nfunction print() { __p += __j.call(arguments, \'\') }\nwith (obj) {\n__p += \'<div class="my-super-awesome-div">\\n\\t\' +\n((__t = ( mySuperAwesomeVariable )) == null ? \'\' : __t) +\n\'\\n</div>\\n<ul class="things-that-are-awesome">\\n\\t\';\n _.each(thingsThatAreAwesome, function(thing){ ;\n__p += \'\\n\\t\\t<li>\' +\n((__t = ( thing )) == null ? \'\' : __t) +\n\'</li>    \\t\\n\\t\';\n }); ;\n__p += \'\\n</ul>\\n\';\n\n}\nreturn __p\n};

When sent to the client, the variable __views will be placed in the global scope (window.__views). To evaluate our template to get the string output like we did before we would do:

var renderedTemplate = window.__views['ourCoolView']({
    mySuperAwesomeVariable: 'Im super awesome',
    thingsThatAreAwesome: ['This is awesome', 'So is this', 'This is too!']
});

$('.someDomElement').html(renderedTemplate);

That's pretty much it! In the next week or two, I will follow up this post with one on how to extend this system even further by introducing a small library I built called node-partials, which adds inter-file partials as well as compiling multiple files and partials together.

Using SSL/HTTPS with HAProxy

by Sean McGary on January 6, 2014

Last time I posted about HAProxy, I walked you through how to support domain access control lists (also known as "virtual hosts" for those of you using Apache or Nginx) so that you can route to different applications based on the incoming domain name. Since then, I've had a few requests on how to support SSL and HTTPS with HAProxy, since it's not the most obvious thing.

The reason it's not obvious is that it's not "officially" supported yet in the current stable release (1.4), but it is available in the current 1.5 dev branch. If you intend to use this in a production setting, proceed with caution.

Compiling

Because support for SSL is not available by default, we are going to need to compile HAProxy from source to include the SSL module. For this example, we'll be using Ubuntu 12.04 LTS as our base operating system. Since we're building from source, these directions should be relatively portable across other platforms built on the Linux kernel (folks on BSD or some other flavor of Unix, try this at your own risk).

Before we start to build it, we need to make sure we have the dependencies installed:

sudo aptitude update
sudo aptitude install build-essential make g++ libssl-dev

Next, let's download the latest version of HAProxy and compile it with the SSL option.

wget http://haproxy.1wt.eu/download/1.5/src/devel/haproxy-1.5-dev21.tar.gz
tar -xzf haproxy-1.5-dev21.tar.gz
cd haproxy-1.5-dev21
make USE_OPENSSL=1
sudo make install

Setup

Cool, now that we have HAProxy installed, it's time to set up our config file. In the following example config, we will set up HAProxy to accept connections for a single domain on both HTTP and HTTPS, but force a redirect to the secure connection.

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode    http
    option  httplog
    option  dontlognull
    option forwardfor
    option http-server-close
    stats enable
    stats auth someuser:somepassword
    stats uri /haproxyStats

frontend http-in
    bind *:80
    reqadd X-Forwarded-Proto:\ http
    default_backend application-backend

frontend https-in
    bind *:443 ssl crt /etc/ssl/*your ssl key*
    reqadd X-Forwarded-Proto:\ https
    default_backend application-backend

backend application-backend
    redirect scheme https if !{ ssl_fc }
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix

    #enter the IP of your application here
    server node1 10.0.0.1 cookie A check 

A lot of the stuff at the top of the config is fairly basic boilerplate. What we want to pay attention to is everything below the defaults. As you can see, we're telling HAProxy to listen on both ports 80 and 443 (HTTP and HTTPS respectively), and each frontend uses "application-backend" as its default backend.

A quick side note: the things we learned in the previous post on access control lists can be applied directly to this situation.

The new section here is the additional https-in section. This tells HAProxy to listen on port 443 (the default port for HTTPS) and specifies the SSL certificate to use. Generating SSL certificates can be a huge pain in the ass and sometimes depends on the authority that is issuing it. The one thing to know, though, is that the certificate (unless it's a wildcard cert) MUST be issued for the domain that you are sending through HAProxy.
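
One gotcha worth knowing: the crt option expects a single PEM file containing the certificate, any intermediate/chain certificates, and the private key concatenated together. Assuming your files are named something like the ones below, building that file is just a matter of concatenation:

cat example.com.crt intermediate.crt example.com.key > /etc/ssl/example.com.pem
chmod 600 /etc/ssl/example.com.pem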

Now, in our backend definition, the first line is really the only thing that's different. It tells HAProxy that if the incoming request (since we're using the same backend for both HTTP and HTTPS) did not arrive over SSL, it should redirect to the same route using HTTPS (that's the !{ ssl_fc } condition).

Wrap-up

That pretty much does it. It's not all that different from the config in the previous exercise, but it can be a little tricky to set up and configure, especially if your cert isn't built correctly or doesn't have the correct permissions.

Beginner's Guide to Elasticsearch

by Sean McGary on January 3, 2014

For the uninitiated, Elasticsearch is a schema-free, JSON-document-based search server built on top of the Lucene indexing library. While it does provide a powerful full-text search system, Elasticsearch provides many other features that make it great for things like aggregating statistics. In this post, I am going to walk through how to set up Elasticsearch and the basics of storing and querying data.

Setup

For this setup, we'll be using Ubuntu 12.04 LTS as our base operating system.

Because Elasticsearch is written in Java, we need to install the JVM.

sudo aptitude install openjdk-7-jre-headless

We won't be needing any of the UI-related features of Java, so we can save some space and time by installing the headless version.

Next, we need to download Elasticsearch. We're going to just download the standalone tarball, but there is also a deb package available if you wish to install it with dpkg.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.9.tar.gz
tar -xvf elasticsearch-0.90.9.tar.gz

Now that we have it downloaded, running it is as simple as executing the included binary.

cd elasticsearch-0.90.9
./bin/elasticsearch

By default, Elasticsearch will run as a background daemon process. For this post, we're going to run it in the foreground so that we can watch the logs. This can be accomplished by providing the -f flag.

./bin/elasticsearch -f

Terminology

Before we start, there is some vocabulary that we need to familiarize ourselves with:

Node

A node is a Java process running Elasticsearch. Typically each node will run on its own dedicated machine.

Cluster

A cluster is one or more nodes with the same cluster name.

Shard

A shard is a Lucene index. Every index can have one or more shards. Shards can be either the primary shard or a replica.

Index

An index is the rough equivalent of a database in relational database land. The index is the top-most level and can be found at http://yourdomain.com:9200/<your index>

Types

Types are objects that are contained within indexes. Think of them like tables. Being a child of the index, they can be found at http://yourdomain.com:9200/<your index>/<some type>

Documents

Documents are found within types. These are basically JSON blobs that can be of any structure. Think of them like rows found in a table.

Querying Elasticsearch

Out of the box, Elasticsearch comes with a RESTful API that we'll be using to make our queries. I'm running Elasticsearch locally, so all examples will reference localhost; simply replace localhost with your fully qualified domain. By default, this means the URL we'll be using is http://localhost:9200/

Creating an index

The first thing we need to do is create an index. We're going to call our index "testindex". To do this, we simply need to make a POST request to http://localhost:9200/testindex

curl -X POST http://localhost:9200/testindex -d  '{}'

When you create an index, there are a number of options that you can pass along, such as mapping definitions and settings for the number of shards and replicas. For now, we're just going to post an empty object. We'll revisit mappings later on in a more advanced post.
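
As a quick illustration, if you did want to set the shard and replica counts up front, you could pass settings in the request body instead of an empty object (the numbers here are arbitrary):

curl -X POST http://localhost:9200/testindex -d '
{
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
    }
}'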

Inserting a document

To insert our first document, we need a type. For this example, we'll be using mySuperCoolType and we'll be inserting a document with a full name field and a field for a twitter handle.

curl -X POST http://localhost:9200/testindex/mySuperCoolType -d  '
{
    "fullName": "Sean McGary",
    "twitterHandle": "@seanmcagry"
}'

// response
{"ok":true,"_index":"testindex","_type":"mySuperCoolType","_id":"N_c9-SQ8RRSrRwPIBqG6Ow","_version":1}

Querying

Now that we have a document, we can start to query our data. Since we didn't provide any specifics around field mappings, Elasticsearch will try to determine the type of each field (string, number, object, etc.) and run the default analyzers and indexers on it.

To test this, we'll query our collection to try and match the full name field.

curl -X GET http://localhost:9200/testindex/mySuperCoolType/_search -d '
{
    "query": {
        "match": {
            "fullName": "Sean"
        }
    }
}'

// result
{
   "took":2,
   "timed_out":false,
   "_shards":{
      "total":5,
      "successful":5,
      "failed":0
   },
   "hits":{
      "total":1,
      "max_score":0.19178301,
      "hits":[
         {
            "_index":"testindex",
            "_type":"mySuperCoolType",
            "_id":"N_c9-SQ8RRSrRwPIBqG6Ow",
            "_score":0.19178301,
            "_source":{
               "fullName":"Sean McGary",
               "twitterHandle":"@seanmcgary"
            }
         }
      ]
   }
}

When you make a query, Elasticsearch will spit back a bunch of data: whether the query timed out, how long it took, how many shards it queried against and how many succeeded/failed. The last field that it returns is the "hits" object; this is where all of your results will appear. In the root hits object, it will tell you the number of matches found, the max score of all the hits, and of course, the array of hits. Each hit includes meta info (prefixed with an underscore) such as the (auto-assigned) ID, the score and the source, which is the original document data you inserted. Full-text searching is one of the strong features of Elasticsearch, so when it performs the search, it will rank all matches based on their score; the score is how close of a match each document is to the original query. The score can be modified if you wish to add additional weight based on certain parameters. We'll cover that in a later, more advanced post.

As you can see here in our results, we got one match by querying for "Sean" in the fullName field. Because we didn't specify a mapping, Elasticsearch applied the "standard analyzer" to the fullName field. The standard analyzer takes the contents of the field (in this case a string), lowercases all letters, removes common stopwords (words like "and" and "the") and splits the string on spaces. This is why when we query "Sean" it matches "Sean McGary".

Let's take a look at another query. This time, though, we're going to apply a filter to the results.

curl -X GET http://localhost:9200/testindex/mySuperCoolType/_search -d '
{
    "query": {
        "match": {
            "fullName": "Sean"
        }
    },
    "filter": {
        "query": {
            "match": {
                "twitterHandle": "seanmcgary"
            }
        }
    }
}'

This particular request returns exactly what we had before, but let's break it down a little bit. To start, it's important to understand the difference between queries and filters. Queries are performed initially against the entire dataset, which here is the "mySuperCoolType" type. Elasticsearch then applies the filter to the result set of the query before returning the data. Unlike queries, though, filters are cached, which can improve performance.

Conclusion

This concludes our introduction to Elasticsearch. In a followup post, I'll introduce some more advanced features such as setting up mappings, custom analyzers and indexers, and get into how to use facets for things such as analytics and creating histograms.

Creating a blogging platform - TryCompose.com

by Sean McGary on October 30, 2013

A few months ago I started working on a blog platform for myself that quickly evolved into a larger product that I decided I would try to productize. The following is a blog post called "Just a blog" that I wrote over on the official TryCompose.com blog.

Just a blog

Over the last 5 years or so, I've often found myself bouncing from blog platform to blog platform in search of something that will make me happy. I've hosted my own Wordpress blog, I've used Tumblr, I've tried Github pages, and I've built my own blogging "engines" probably about three times by now (usually it's to learn something new). But now I've decided that I'm tired of moving around from place to place and want to build something that not only meets my needs, but hopefully (at least some of) the needs of the community as well.

Just a blog

For years, Wordpress has been the go-to blog platform. It's a veteran, (mostly) stable, has a shit load of plugins for basically everything you need and any theme you can think of, and you can customize it - all if you want to host and manage it yourself. Wordpress.com does exist, but the free version has ads and is very limiting: you can't install themes (you have to use what they have), you can't use any plugins, you can't use your own domain, and you can't use Google Analytics.

That's a problem...

Wordpress has become a great extendable content management system. People even go so far as to abuse and hack it into something entirely different. This problem needs solving...

Ideal features

So here are the features I think a great blogging platform, and ONLY a blogging platform, should have.

Completely hosted

As much as I like tending to servers from time to time, I don't want to think about a blog server. Running your own server means you have to not only keep your blog platform up to date, but keep the entire machine (or virtual machine) up to date and secure. I don't want to worry about that. I want to click a button and start writing.

Markdown

Wordpress really shows its age with its old rich text/HTML post editor. HTML is a pain to write for anyone, especially when using it to format a blog, and I don't want to have to click on various modifier buttons to format my content. Markdown is great in that it's simple and intuitive enough for anyone to learn and doesn't break the writing flow.

Take ownership of your brand/identity

I want to be able to use my own domain without paying some fee. I don't want a subdomain, I want MY domain. I also want to own the content I create and take it with me anywhere I go should I choose to switch platforms.

Programmer/Hacker friendly

A lot of hosted platforms (aside from Github pages) don't provide support for syntax-highlighted code blocks by default. That's definitely something I would love to have.

Google Analytics

Google Analytics is definitely a heavyweight in the world of web analytics, as it's super simple to set up and use. Just let me enter my site's ID and start tracking stuff. Don't give me some half-assed baked-in solution. Give me the option to use what I want.

Content that looks great

This is an area that I think Medium and Svbtle excel at. Both format your writing in a way that is free of distractions and very easy to read. They both have some drawbacks (Medium lacks control over your personal brand, and Svbtle is invite-only), but they both offer a really great content consumption experience. I don't care if your platform has a million themes if they're all crap. Give me a choice between a few themes that look great.

File/image hosting

Generally I like to include images with various posts, so a way to upload and manage files would be a great feature. This simplifies things greatly, as it means I don't need to upload things to third parties or try to host them myself on a private file server.

Shut up and take my money

Let me pay for a service like this. Make it affordable, but let me throw money at you so I can have it. I fully believe that services that charge for their product end up being better in the end because they manage to weed out users that expect everything for nothing and can provide great support and features to those that really want to support the product.

Try Compose

Compose strives to include all of the features listed above and that is just the beginning. Over the coming weeks, we'll be opening up a free beta for people to try out so that we can gather feedback to make the platform even better. If you're interested in getting access to the beta, head on over to the signup page, provide us with your email, and we'll let you know when you can start using it!

HAProxy - route by domain name

by Sean McGary on September 28, 2013

I tend to build a lot of web applications in NodeJS using the Express.js webserver. When you have a few of these apps running on one server, you generally want to run them on unique ports and put some kind of proxy in front of them. Nginx works great for this and Apache can be another decent, though more bloated, alternative. Recently I decided to branch out for the sake of variety and to learn something new. HAProxy filled that role.

For the uninformed, HAProxy is more than just a reverse proxy; it's a high-performance load balancer. Sites with lots of traffic will use something like HAProxy to funnel traffic to a cluster of web servers or even to balance traffic between database servers. But what happens when we want to route multiple domains (or subdomains) to different hosts or clusters?

Setting up a single proxy

First, let's take a look at the basics of proxying to an app server.

# config for haproxy 1.5.x

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 4096
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        option forwardfor
        option http-server-close
        stats enable
        stats auth someuser:somepassword
        stats uri /haproxyStats

frontend http-in
        bind :80
        default_backend web-app-cluster

backend web-app-cluster
        balance leastconn
        option httpclose
        cookie JSESSIONID prefix
        server node1 10.0.0.1:8080 cookie A check
        server node2 10.0.0.2:8080 cookie A check
        server node3 10.0.0.3:8080 cookie A check

So this is a pretty basic config that will load balance across 3 application servers, each of which has a unique IP and is probably on its own dedicated machine. Generally you'll also want to run your load balancer(s) on a different server.

So what does this all mean? global and defaults should be pretty self-explanatory, then we have a frontend and a backend. The frontend, as you can see, tells HAProxy what to bind to and defines a default backend. There are a lot of things that can be specified in the frontend, and you can also have multiple frontend definitions (for example, if you wanted to provide an insecure route on port 80 and SSL on port 443 and have different, or the same, backends for each). We'll go over some other options in the multiple-domain example.

Diving into multiple domains and ACLs

Now let's take a look at how to route to multiple domains by matching specific domain names.

# config for haproxy 1.5.x

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 4096
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        option forwardfor
        option http-server-close
        stats enable
        stats auth someuser:somepassword
        stats uri /haproxyStats

frontend http-in
        bind *:80

        # Define hosts
        acl host_bacon hdr(host) -i ilovebacon.com
        acl host_milkshakes hdr(host) -i bobsmilkshakes.com

        ## figure out which one to use
        use_backend bacon_cluster if host_bacon
        use_backend milkshake_cluster if host_milkshakes

backend bacon_cluster
        balance leastconn
        option httpclose
        option forwardfor
        cookie JSESSIONID prefix
        server node1 10.0.0.1:8080 cookie A check
        server node2 10.0.0.2:8080 cookie A check
        server node3 10.0.0.3:8080 cookie A check


backend milkshake_cluster
        balance leastconn
        option httpclose
        option forwardfor
        cookie JSESSIONID prefix
        server node1 10.0.0.4:8080 cookie A check
        server node2 10.0.0.5:8080 cookie A check
        server node3 10.0.0.6:8080 cookie A check

So here we are routing between two applications: ilovebacon.com and bobsmilkshakes.com. Each one has its own cluster of app servers that we want to load balance between. Let's take a closer look at where all the magic happens in the frontend.

frontend http-in
        bind *:80

        # Define hosts
        acl host_bacon hdr(host) -i ilovebacon.com
        acl host_milkshakes hdr(host) -i bobsmilkshakes.com

        ## figure out which one to use
        use_backend bacon_cluster if host_bacon
        use_backend milkshake_cluster if host_milkshakes

If you've ever used nginx or Apache as a reverse proxy, you'd generally set things up using virtual hosts. HAProxy instead uses the notion of access control lists (ACLs), which can be used to direct traffic.

After we bind to port 80, we set up two ACLs. The hdr (short for header) check looks at the hostname header. We also specify -i to make it case-insensitive, then provide the domain name that we want to match. You could also set up ACLs to match routes, file types, file names, and so on; if you want to know more, feel free to check the docs. So now we effectively have two variables: host_bacon and host_milkshakes. Then we tell HAProxy which backend to use by checking whether each variable is true. One other thing that we could do after the use_backend checks is specify a default_backend <backend name> like we did in the first example, to act as a catch-all for traffic hitting our load balancer that doesn't match either of the domain names.
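
For example, routing on the request path rather than the host looks roughly like this (the backend names here are made up):

        acl is_api path_beg /api
        acl is_static path_end .css .js .png .jpg

        use_backend api_cluster if is_api
        use_backend static_cluster if is_static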

Next time, I'll go over how to add HTTPS/SSL support to your websites.

FeedStash.net - Your news made simple

by Sean McGary on June 26, 2013

In less than 5 days, Google will be shutting down Google Reader. A lot of people are now scrambling to find a new service to migrate to. The closing of Reader has opened up a relatively large market for new news reader apps. Some focus on social and sharing, some are bold and are rethinking the news reading experience, and some are trying to be as simple as possible or as close to Reader as possible.

Today, I'm launching an app that I've been working on for the past few months (I actually started it before Google decided to close down Reader) to compete with the others to become the next Google Reader replacement.

Introducing FeedStash.net

FeedStash started out as a side experiment when I wanted to play around with Elasticsearch. I needed a fairly large set of textual data, and RSS feeds from blogs seemed to fit the bill quite nicely. I began writing a collection system to grab stories from a set of RSS feeds, quickly discovering that RSS is probably one of the most inconsistent formats in the world and every site does things a little differently than the next. After building out a solid collector, it seemed fairly natural to give it a web interface to use Elasticsearch through. This is where things started moving away from playing with Elasticsearch and toward building an RSS reader.

I had managed to get a basic web UI up and running to view and manage feeds. It was right around this time that Google dropped the bomb that it would be closing Reader on July 1st. Everyone was in an uproar over it, and it was very clear that the seemingly dormant RSS community was very much alive and boy, was it angry. In the few weeks immediately following the apocalyptic announcement, RSS and news reader apps started popping up all over the place. It looked like it was time to get my shit together if I wanted to build something people would actually use.

Keeping things simple

The benefit of all these other people and companies building apps early on is that they ended up doing a bit of market research that would benefit everyone else. The countless posts on Reddit and HackerNews about people launching new apps spawned threads of very useful feedback. Turns out people don't want super social applications; they just want to read their news in peace, with the ability to organize and filter their various subscriptions. They also want it to be super simple. There's no need to take a radical new approach to UI design here. Just make it easy for people to use.

This first iteration of FeedStash is just that - simple. You can import your feeds from Google Reader either by uploading the exported OPML file or by signing in with your Google account and letting FeedStash grab it using the Reader API. This will import and subscribe you to your existing subscriptions.

We decided to keep reading and organizing feeds really simple so that we could get this out the door in time for July 1st. On the right side, you’ll start with a single “folder” called “All Feeds”. This is the master list of all of your subscriptions. By default it will show a stream of all your feeds combined sorted by the date the headline was published. If you want to view just the headlines for a specific feed, simply click on the feed name on the left side of the screen.

You can create as many folders as you’d like. Just click the “Add Folder” link on the left side. It’ll prompt you for a display name and then allow you to add any of your current subscriptions to it. When you click on that folder on the left side, it will slide open much like the “All Feeds” folder and will give you a stream of only the feeds in that folder. Again, if you want to view headlines from just one feed, simply click on it.

Headlines are displayed on the right side of the screen in chronological order based on published date. Clicking on a headline title will expand it and show you the preview content. Content quality and length will vary from site to site: some sites put entire articles rich with pictures and video in their previews, and some put hardly anything at all. Clicking the “continue reading” button will open the headline in a new tab and navigate to its source. Some people actually like sharing, so I've included some very subtle social share buttons that are visible when you expand the headline. You can share on Facebook, Twitter, Google+, and App.net. Also, since a decent number of people were requesting it from other apps, I added a link to save the story to Pocket so that you can read it later. None of these require you to connect your account, as they are all handled through web intents. When you click one to share, it will pop open a small window in which that site renders its own share dialog, pre-populated with the story title and link.

Personally I like to view the headlines and stories that I've already read, rather than trying to dig through a huge list to find them, so I have included a dedicated page for displaying headlines that you have read in the past. Headlines are marked as read as soon as you click on them in the stream.

Favoriting is another pretty basic but widespread notion, so I've included a dedicated page for favorites as well. To favorite a headline, just look for the “favorite” link with the heart next to it and click it.

Feeding the servers

Having decided to launch FeedStash publicly, I have also decided to charge $24/year for it. The decision was based on a few things:

  1. Keeping servers running to support the app costs money.
  2. People have been quite vocal that they are willing to pay money for a service like this.
  3. Charging for it keeps me on top of everything. All too often I see people create free services that fall by the wayside because the creator either forgot about it or moved on. Having paying customers is far more motivating because often times they are more loyal and demand quality.

The road ahead

There are a number of features that I'm looking to add to make the FeedStash experience even better. Here's a look at what's to come.

A RESTful API

Syncing seems to be one of the most highly desired features of any news feed service. Everyone consumes news and content in their own way, and establishing an API that can sync with our core system would allow users to build applications to suit their needs.

Better mobile experience

This first iteration of FeedStash was built to be mostly usable on desktop browsers and tablets (it happens to fit perfectly on an iPad in landscape mode). We hope to bring a better mobile experience to smaller form factor devices like smart phones through the web interface and then (hopefully) eventually through native apps.

Bookmarklet

Sometimes finding that pesky RSS icon on the site you’re browsing is hard to do. With our bookmarklet you won’t have to search for it. Thankfully, there is a standard tag you can put in a page's head to define where the RSS feed for the current page is located. When you click the bookmarklet, it finds those tags, lists the feeds available, and allows you to subscribe right on the spot. No copying and pasting into the app or anything. We want to provide you with the quickest way to subscribe in order to not disrupt your web browsing.
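
For the curious, that feed auto-discovery tag looks like this (the title and href are placeholders):

<link rel="alternate" type="application/rss+xml" title="My Site's Feed" href="http://example.com/feed.xml">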

Searching

This is something that I haven’t seen all that often in news and RSS applications. I'd love to be able to search for new feeds and headlines, or even search through those that I have already read, or search through my favorites. FeedStash stores each feed uniquely in the database and keeps records of the posts that it pulls in. Indexing that bit of data to make it searchable could open up a world of new discovery options.

Personalization

Some people like to organize feeds into folders, some prefer to get even more specific and create tags on the post level. We want to expand the amount of personalization possible so that you can organize your feeds in almost any way you see fit. Allowing users to have such fine customization over the content they read could also allow us to further analyze your content to do things such as surface feed and story suggestions.

The state of RSS and news feeds

by Sean McGary on April 30, 2013

The nice thing about standards is that you have so many to choose from.

This quote comes to mind a lot while I have been trying to build an RSS/news feed reader. For formats like RSS and Atom that have nicely defined specs, it seems that hardly anyone actually follows them. A lot of feeds have undefined extra fields or custom namespaces, or are missing fields altogether. Why is it so hard to follow a spec?

RSS vs. Atom

Atom and RSS set out to solve what is effectively the same problem: provide a means for news syndication. Atom supposedly boasts an IETF spec, making it better than RSS (whose spec wasn't all that official and had numerous shortcomings that Atom sought to fix), yet I've seen the same problems and inconsistencies in both. Part of me thinks that these problems stem from the fact that both Atom and RSS are built on XML. At this point in time, XML seems like such an old and verbose markup language. Maybe it’s time to move on to something like JSON, which is more lightweight, can be parsed easily by basically every programming language, and is easy to read.

Lack of attention to detail

I think the main problem, however, is that people simply don’t know what they are doing half the time. For example, one feed that my automated crawler picked up was hosted by Rackspace and seemed to be used for tracking the status of something. Turns out they were serving all 13,000 entries (probably since the beginning of time) ALL AT ONCE. I was stumped for a little while as to why my feed collector was choking and taking forever, until I realized it was waiting to download each of these entries and then insert them into the database one at a time. By the way, this is all within the RSS 2.0 spec.

A channel may contain any number of <item>s.

Now if only there were a way to paginate through past entries…

Pick a mime-type, any mime-type

This makes me think that people set up their RSS feeds (build their own, use Wordpress, or whatever) and never properly configure their web server to serve the correct type of content. I've seen feeds served up as:

  • application/xml
  • text/xml
  • application/rss+xml
  • text/html
  • application/rdf+xml
  • application/atom+xml

Sure, all of these will produce some kind of valid XML document, but I'm of the belief that you should be sending the correct headers for your document. Sending an RSS feed? Your Content-Type should be application/rss+xml. Sending an Atom feed? You should be using application/atom+xml. C’mon, is it really that difficult? (Hint: no, it's not.)

At least provide the important fields

In the world of news, one of the more important fields is the date on which the article, story, event, item, whatever was published. Some feeds neglect to provide this important piece of information (that's right, I'm looking at you, HackerNews).

Defining the bare minimum

The way we consume news and media has changed a lot in the last few years. No longer are we looking at just a page of words. As we can see with apps like Flipboard, content is king. People like to see pictures and images. RSS doesn’t have a field to provide such images, and the spec for its thumbnail image is too small. Atom has a generic "media:thumbnail" element, but some people (cough Mashable cough) like to be difficult and define their own namespace for their thumbnails (e.g. "mash:thumbnail"). So let's get some things straight here:

On the top level, we need to describe the feed:

  • title
  • description
  • last updated
  • link
  • site logo

These are pretty standard. It's the feed/item/article definition where things get a bit messy. But here's what we need in a world like today's:

  • title
  • publish date
  • author's name
  • tags/categories
  • content
  • description (should be shortened preview of content)
  • image
  • unique id
  • original link/permalink

One of the more important fields in that list would be the unique id. Currently, it is rather difficult to determine if an article is unique. You can't go on title alone, as someone could easily have two articles in the same feed with the same name. So it ends up being a matter of normalizing and comparing a bunch of fields (the permalink/article link, the title, and the feed it came from) in order to tell whether an article is unique or not. So why not include something like a UUID? With a UUID, you could determine uniqueness on a feed-by-feed basis, which is more than acceptable.

Personally, in the end I'd love to see a new protocol built on JSON that people actually adhere to. The internet is already a series of APIs and web services using JSON as a payload medium, so why not extend that to RSS and other news-type feeds? Why not make it more like an API, where you can request a number of entries, or a date range of entries, or at the very least paginate through entries so that you aren't sending 13,000 of them all at once?
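
Purely as an illustration of the idea (this is not a real spec, and every field name here is made up), a feed in that world might look something like:

{
    "title": "Some Blog",
    "description": "A blog about things",
    "lastUpdated": "2013-04-30T12:00:00Z",
    "link": "http://someblog.com",
    "logo": "http://someblog.com/logo.png",
    "entries": [
        {
            "id": "9b2d8aa0-73cb-4f5e-8c0a-2f1f7d6f8e11",
            "title": "An article",
            "publishDate": "2013-04-29T15:30:00Z",
            "author": "Jane Doe",
            "tags": ["news", "example"],
            "description": "A shortened preview of the content...",
            "content": "<p>The full article content</p>",
            "image": "http://someblog.com/images/article.jpg",
            "link": "http://someblog.com/an-article"
        }
    ]
}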

Getting Started with Native NodeJS Modules

by Sean McGary on October 28, 2012

NodeJS has quickly gained popularity since its inception in 2009 thanks to wide adoption in the web app community, in large part because if you already know javascript, very little has to be learned to begin developing with it. As evidenced by its modules page on github, you can pretty much find any library you are looking for (it’s sort of starting to remind me of PHP, where if you can think of something, chances are there is a module for it).

One of the things that I think people forget is that you can develop NodeJS modules not only in javascript, but in native C/C++. For those of you that forgot, NodeJS is possible because it uses Google's V8 javascript engine. With V8 you can build extensions in C/C++ that are exposed to javascript. Recently, I decided to dive into the world of native modules because I needed a way to use the imagemagick image manipulation library directly from javascript. All of the libraries currently listed on the NodeJS modules page take a rather round-about approach by forking a new process and calling out to the command-line binaries provided by the imagemagick library. This is VERY VERY VERY SLOW, and since image manipulation can be very intensive, being able to use the library directly will make things MUCH faster.

Part one: babby’s first native module

This is the first part in what will hopefully be a multipart tutorial as I write a native module for imagemagick. Today, we will take a look at making the most basic of native modules and how to use it with Node.

To start off, let's create a file called testModule.cpp. This is where everything (for now) will happen. Here's what we need to start:

Note, this is assuming you have NodeJS installed already and in your path (if you don't, go do that!). We need to import both the Node header and the V8 header.
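
At a minimum, that boils down to the two headers and pulling in the v8 namespace:

#include <v8.h>
#include <node.h>

using namespace v8;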

To build our module, we will be using the node-waf tool that comes bundled with NodeJS. In the same directory as testModule.cpp, create a file called wscript and put the following stuff in it:
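
A wscript for a node-waf-era addon generally looks something like this; the flags and structure are the conventional ones rather than anything specific to this module:

def set_options(opt):
  opt.tool_options("compiler_cxx")

def configure(conf):
  conf.check_tool("compiler_cxx")
  conf.check_tool("node_addon")

def build(bld):
  t = bld.new_task_gen("cxx", "shlib", "node_addon")
  t.cxxflags = ["-g", "-Wall"]
  t.source = "testModule.cpp"
  t.target = "testModule"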

The wscript file sets up needed environment variables and libraries that need to be linked at compile time. Think of it as some kind of makefile. The t.target property needs to match up to the name of the export property in your module (I’ll point this out when we get there).

Now, to build your module simply run the following:
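
With node-waf that is typically:

node-waf configure build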

Alright, now that we have those basics out of the way, let's make a module that, when called, returns the string "Hello World".
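
With the V8 API of that era, such a function looks roughly like this (the function name is arbitrary):

// returns the string "Hello World" back to javascript
Handle<Value> HelloWorld(const Arguments& args) {
    HandleScope scope;
    return scope.Close(String::New("Hello World"));
}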

So to quote the V8 handbook:

A handle is a pointer to an object. All V8 objects are accessed using handles, they are necessary because of the way the V8 garbage collector works.

A scope can be thought of as a container for any number of handles. When you’ve finished with your handles, instead of deleting each one individually you can simply delete their scope.

So if we were to think of this in javascript, we’d basically have something like:
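
Purely as an analogy:

// what the handle/scope dance above boils down to conceptually
function helloWorld() {
    return 'Hello World';
}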

So now that we have a function that can do some kind of work, how do we expose it to Node? Let's take a look:
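
Again roughly, with illustrative names (the only hard requirement is that the NODE_MODULE name lines up with the wscript target):

// attach our function to the module's exports object
void TestModule(Handle<Object> target) {
    NODE_SET_METHOD(target, "helloWorld", HelloWorld);
}

// the first argument must match t.target in the wscript
NODE_MODULE(testModule, TestModule)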

The function TestModule takes an object handle and basically shoves our function in it. This is how exports work in C++. In javascript we’d have:
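
Again, just as an analogy:

// the javascript equivalent of what NODE_SET_METHOD is doing above
exports.helloWorld = function() {
    return 'Hello World';
};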

Now, a note on the NODE_MODULE(…) line. Before when I said t.target needed to match, this is where it needs to. The first argument of NODE_MODULE needs to be the same as your target value.

Once you have all of that, build your new node module. To try it out, run node and import your module.
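
Assuming the compiled addon ended up in build/Release (node-waf's output directory on more recent versions; older ones used build/default), that looks something like:

$ node
> var testModule = require('./build/Release/testModule');
> testModule.helloWorld();
'Hello World'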

That’s it! You now have your first native module. In my next post, we’ll dig deeper into building something a bit more substantial. One of the main draws of NodeJS is its asynchronous nature, so next time we’ll take a look at how to go about building a module with asynchronous functions using libuv, which sits at the very core of NodeJS.

Drag and Drop File Uploads with Javascript

by Sean McGary on September 30, 2012

The other day I was working on building a file upload interface in Javascript where a user could drag and drop files to upload to a server. I already knew that this was possible using the drag and drop API: users can drag files from their desktop or another folder to a defined dropzone on the page, and a FileList can be pulled from the drop event. I use Google Chrome as my default browser, so here's what I started with:
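
Something along these lines (the element id is just a placeholder):

var dropzone = document.getElementById('dropzone');

dropzone.addEventListener('drop', function(e){
    // stop the browser from navigating to / opening the file
    e.preventDefault();
    e.stopPropagation();

    var files = e.dataTransfer.files; // FileList of the dropped files
    console.log(files);
}, false);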

The first thing to keep in mind is that, by default, browsers will try to open a file when you drag it into the window. To prevent that, we need to prevent the default action as well as stop the event from propagating up the DOM tree. After that, we are able to access the FileList. Then I fired up Firefox just to make sure it worked across different browsers, knowing that just because something works in Chrome doesn’t mean it will work in other browsers. Upon trying it in Firefox, it loaded up the file in the browser. Turns out, non-Chrome browsers require a bit of an extra step: you need to listen for the ‘dragover’ event and prevent that from propagating and taking effect as well. Here’s the revised code:
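
In the same spirit as the sketch above:

var dropzone = document.getElementById('dropzone');

// Firefox (and others) need dragover cancelled as well,
// otherwise the drop event never reaches our element
dropzone.addEventListener('dragover', function(e){
    e.preventDefault();
    e.stopPropagation();
}, false);

dropzone.addEventListener('drop', function(e){
    e.preventDefault();
    e.stopPropagation();

    var files = e.dataTransfer.files;
    console.log(files);
}, false);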

Now our drop event will work in Chrome, Firefox, and Safari. I haven't had a chance yet to try it in Internet Explorer, but according to caniuse.com it looks like IE10 with Windows 8 will support drag and drop events. For those curious, here’s a jsfiddle of the above example.

A Lack of Usability in the Photo Sharing World

by Sean McGary on September 25, 2012

Recently I've noticed that photo sharing sites (e.g. flickr, Smugmug, etc.) have rather poor user interfaces and user experiences. UIs seem to have become overly complex, pushing rudimentary features out of the way to places that are not immediately accessible or take a bit of work to find. When I'm using a photo sharing site, there are a couple of really basic features that I believe should be very prominent upon logging in.

The “upload” button should be VERY easy to find.

When I first log into flickr, it is not immediately obvious how I can upload photos or videos.

Flickr

When I first see this page, I immediately look at the top row where the various navigation items and menus are. There's nothing at the root level that links to an upload page. However, under the “You” dropdown is an upload link. For a site that relies on users uploading photos, I'd think that they would make it a single click away and very obvious. Instead, they nest one of the most important actions in a dropdown menu. As you make your way down the page, you’ll realize that there IS an upload link on the main page, however it's styled the same as their section headers and doesn’t immediately stand out. This really should be styled more like a call to action so that it stands out better. Smugmug, in comparison, has an upload link in their top row of navigation, but they require you to first create a gallery if you don't have one, or pick a gallery to upload to if you have already created one. This is definitely a step in the right direction. I’ll get to my issues with their gallery structure in a little bit.

Smugmug

Galleries, sets, and categories oh my!

For those of you that are old enough, think back ten years or so. Chances are your mother, grandmother, or some other family member has a closet full of photo albums of photos from your childhood, or even their own childhood. Photo albums are the most basic and rudimentary method for organizing photos. Have a bunch of photos that happened all at once? Maybe a vacation, birthday party, or other special event? Put them all in one place, like a photo album, so that you can find them later! Even Facebook has this down: you can upload photos and then organize them into albums. It's simple and easy. Flickr and Smugmug, on the other hand, make it a bit more difficult. Flickr has the notion of “sets”. Sets are essentially the same idea as an album; you name the set and select photos to put in it. Beyond that it's really easy to get lost. Organizing photos into sets is relatively simple: select “your sets” from the “organize” dropdown. Upon doing this, though, you are brought to an entirely different user interface than the one you were just on. The entire root site navigation has disappeared and you are shown a pseudo full-screen page. Adding photos is pretty easy - just drag and drop from your “photostream” on the bottom. The flow of this organize process could be handled better, as it seems like they are trying to cram too many features into one page and have thus made it a bit complex to navigate. As a note, my mother, who is not the greatest with computers, has never been able to figure out how to use this particular interface on flickr.

Well, how about Smugmug? How does it stack up against flickr? The first thing that drives me nuts is that you HAVE to create a gallery (the equivalent of an album) in order to upload any photos at all. Flickr allows its users to simply upload photos and organize them later. Smugmug also doesn't allow you to include one photo in multiple galleries. There is absolutely no way to organize your photos a la flickr. Everything MUST go into a gallery, and one gallery only, and photos only get into that gallery by being uploaded to it. Smugmug also has categories. An extra level of hierarchy and organization seems like a good idea, but their flow is very limiting, much like their flow for adding photos to albums.

Help, help, my photos are being held hostage!

One large point of contention on the internet right now is over reclaiming your own data from a website that you are using. Google+ does a great job of addressing this by allowing users to “liberate” their data by downloading a zip archive of it. Recently, my flickr pro subscription ran out. Currently I have a few hundred photos hosted through them. However, when your pro subscription runs out, flickr essentially holds your photos hostage, allowing you access to only the 200 most recent photos. What if I didn't have a backup of my photos? (Yes, stupid, I know.) The only way to get them back would be to pay $25 to upgrade just to download them all. Flickr also doesn't provide a batch download feature to reclaim all of your uploaded photos. Smugmug is a little bit different. They don't follow a freemium model like flickr; they are purely a subscription service that you pay for yearly. So once your subscription runs out, you have to renew it to get back to your photos. Then again, Smugmug targets professional photographers who are using them as a showcase for their work as well as for white-label printing. Smugmug, as far as I can tell, also doesn't have a way to batch download photos that you have uploaded.

How I would do it differently

Both flickr and Smugmug have features that are good and features that are not so good, and if they were combined and implemented a little bit differently, you would have one hell of a site. So I am going to attempt to do just that: take features and ideas from both and improve upon them. The internet is a much different place now than it was when flickr and Smugmug first launched in the early 2000s. There is now a larger focus on building social communities and applications that are incredibly easy to use. Here is how I would do it:

Freemium Model

This app is going to be a “pay for what you use” type of deal. There will be a pricing model with a free tier where the amount you pay is based on the amount of space that you are consuming. The free tier will offer a certain amount of space rather than a limit on the number of photos. If you want to compress your photos down to a few kilobytes and upload a few hundred, go for it! However, if you want to upload photos at their full resolution that are a few megabytes apiece, then you may want to look into one of the paid tiers. This way, people who are into “casual” photography have a place to upload and share photos for free or cheap, and professionals have an affordable way to host their photos as well. Pricing will be focused on space consumed, and not necessarily the features available to you. There might come a point where some features appear that are more geared toward professionals and might be offered to paying users only, but for the most part everyone will be on an equal playing field as far as features are concerned.

Focus on the basics

As I explained above, doing the simple things, like uploading and organizing photos and albums, has gotten rather difficult. In this application, viewing photos and albums, uploading photos, and organizing your photos will be the primary focus. The interface is designed in a way that even my own mother will be able to use it without having to call me up to walk her through the process. I figure if she can do it without help, then most everyone else should be able to as well.

Building communities

What use would a hosting site be if you couldn't interact with people and talk about your love of photography? Users will have the option of enabling commenting on albums and photos. Flickr has some very large communities because of its commenting system, but there is also a high level of spam comments. Users will be able to monitor and moderate comment threads on their own photos and albums to hopefully keep spam from cropping up. Users will also be able to favorite photos and albums as well as follow other users. If two users follow each other, they will be classified as friends. On your dashboard, you will see activity on the things you have followed: when a new photo is added to an album, when comments are made, when a user creates a new album or uploads some photos. User activity and engagement will play a key role in this new application.

Liberate your data

Users will be able to download a zip file containing all of the photos that they have uploaded. ‘nuff said.

Building an Editor for MarkdownWiki

by Sean McGary on May 26, 2012

The MarkdownWiki Editor

In my free time lately, I have been building a web application to refresh the wiki market. MarkdownWiki is a new cloud-hosted platform that allows users to create and collaborate on wiki pages. It preserves the original purpose of wikis - providing a place for users to present their knowledge, information, notes, and documentation. The possibilities are endless.

The reason I started building MarkdownWiki was to build a wiki platform that is up-to-date with today's latest and greatest technologies. The first thing that I decided to start with was building an editor that makes editing and creating wikis easy for everyone (maybe even for my own mother!). In this post on the MarkdownWiki blog, I talk about the built in editor and how it will make creating wiki pages many times easier than it currently is.

Goodbye LAMP, Hello NodeJS

by Sean Mcgary on April 19, 2011

When I first started developing web applications about five years ago, PHP was my language of choice because it was super easy to learn and a LAMP stack was trivial to set up on my local machine to get started. At the dawn of Web 2.0, PHP was booming as a web language. People began developing and rapidly prototyping applications with PHP and MySQL. If you knew C or C++, PHP was even easier to dive into and get started with. When I started, I jumped on the Codeigniter bandwagon as it made development quick and provided the minimal amount of structure needed via the MVC design pattern, so that you didn't just end up with tons of files of unstructured spaghetti code. After a while with Codeigniter, I realized it was too simple. At the time, it lacked an autoloader and didn't play well with third-party libraries that didn't comply with its limited library interface. So I decided to build my own framework. Foundation-PHP (as I decided to call it) provided a similar structure to Codeigniter, but also allowed me to change how classes were loaded and how objects were created, including the features that Codeigniter lacked. At this point, I also had been introduced to MongoDB, so I decided to make that the default database handler instead of MySQL or Postgres. My framework wasn't quite on par in terms of features compared to Codeigniter, but it included all the features that I needed to build applications quickly.

Javascript comes to the server

During this time that I was hacking out applications built on PHP, other frameworks such as Ruby on Rails, Django, and Pylons were starting to mature, and developers were starting to move away from the battle tested LAMP stack. People started throwing around buzzwords like "high availability", "big data", and "NoSQL". PHP started its descent from being the popular web language, with people starting to move to languages like Ruby and Python. Things like HTML5 and CSS3 were skyrocketing in popularity and usage. Browsers, with tons of help from Google Chrome and its rapid release cycle, started to incorporate features that weren't even finalized by the W3C so that people could start using these new, exciting features. Javascript in particular started to become very popular due to developers using it to create applications with an almost desktop-like experience.

Then in 2009, NodeJS, created by Ryan Dahl, appeared. NodeJS consists of Google's V8 Javascript engine along with a set of core libraries providing APIs that expose common lower level features such as file I/O, sockets, IPC mechanisms, etc. Because NodeJS is built around Javascript, it is event driven and provides purely asynchronous I/O, minimizing overhead compared to traditional blocking I/O patterns.

Due to the asynchronous nature of NodeJS, it performs very well when writing servers. A community quickly formed around NodeJS and started to contribute to its feature set. Developers started writing libraries and modules to work with databases, the underlying operating system, websockets, and graphics, just to name a few. Soon, developers realized that NodeJS makes a fantastic web application server. Hell, an HTTP server library is built into Node's core.
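
For instance, here is a rough sketch of a bare-bones HTTP server using nothing but the core http module (the port and response body are arbitrary):

var http = require('http');

// respond to every request with a plain text greeting
var server = http.createServer(function(req, res){
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello from Node\n');
});

server.listen(8080);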

The great migration

Not too long ago, I decided to say farewell to PHP and begin moving over to NodeJS. The first thing I did? [Build a web framework](https://github.com/seanmcgary/NodeWebMVC). Node is still very young and does not (yet) have all the frameworks and tools that more mature languages such as PHP, Ruby, and Python have. To me, having a small framework to get up and running is a huge advantage for rapid prototyping and development of ideas.

Jumping in the express lane

By default, NodeJS comes with the necessary tools to create an, albeit simple, webserver. In less than 20 lines of code, you can have a "functioning" webserver that will send data to a browser upon connection. That however doesn't do us much good when trying to create an application. Fortunately, the guys over at Visionmedia decided to build a little library called expressjs to make developing HTTP servers a bit easier. Express is built on a number of connect libraries, giving you features like HTTP routing, sessions, and parsing of POST, PUT, GET, and DELETE requests. Having these features puts express on par in terms of features and functionality with libraries such as Sinatra. You can quickly build a server that has routing and session handling, which is great for developing API servers, but it still lacks a little bit of the structure needed for rich web applications.
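
As a rough sketch of what that looks like with the express 2.x-era API (the route and port are just placeholders):

var express = require('express');
var app = express.createServer();

// a single GET route that responds with some JSON
app.get('/hello', function(req, res){
    res.send({ message: 'hello world' });
});

app.listen(3000);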

Adding some structure

For my framework, I took express as a base and started to build around it. I very much liked the MVC pattern of Codeigniter, so I decided to sprinkle some MVC on top of express. Doing this allows the developer to clearly separate code into controllers, models, and views instead of just setting functions for specific HTTP requests. This also allows the developer to write code that is a bit more modular, taking advantage of inheritance for controllers and models. The one thing that I am loving about Javascript, and that is making development much easier, is using Mustache (in this case Handlebars.js, a fork of Mustache) for templating and view parsing. Now, instead of needing to write code in view files, views are written in HTML with Mustache placeholders and then parsed server-side before being sent to the user. This makes everything much cleaner and less intrusive. It also makes the flow of data from datastore/database to view much simpler. View content can be fetched from the database and, with very minimal modifications, sent directly to the view to be parsed. This is a HUUUUGE advantage.
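
As a rough sketch of that flow, assuming the handlebars module and a record already pulled from the datastore:

var Handlebars = require('handlebars');

// the view is plain HTML with Mustache-style placeholders
var source = '<h1>{{title}}</h1><p>{{body}}</p>';
var template = Handlebars.compile(source);

// data fetched from the database can be handed to the view almost untouched
var post = { title: 'Hello', body: 'Rendered server-side before being sent to the user' };
var html = template(post);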

Let's build some apps!

Since building this small framework to provide a bit of structure, I've been able to start rapidly prototyping applications. Right now, the app I am writing, [Markdownwiki.com](http://blog.markdownwiki.com), is a living test of my framework and allows me to constantly add features into the framework that I feel might be commonly needed by developers.

Now that my migration to NodeJS from PHP is nearly complete, I will be continually building out my NodeWebMVC framework for people to use. So if you're looking for a framework to get started with building an app, I encourage you to check it out. I would love to get some feedback on it. As I develop it, I'll tag stable points along the way so that you won't be confused if for some reason it doesn't work. If you run into bugs, file them here on Github and I'll get to fixing them. Alternatively, feel free to fork it, fix them yourself, and submit a pull request with your fix.

With NodeJS and server-side Javascript becoming a very prominent technology in web applications today, I figured I'd give an introduction to NodeJS to get everyone up and running. "But how can Javascript run in a server environment? I thought it required a browser and was only used to make my website interactive." NodeJS allows you to build applications written in Javascript with the help of Google's V8 Javascript engine, the same engine at the heart of the Google Chrome web browser. Coupled with the asynchronous nature of NodeJS, V8 contributes greatly to Node's performance and scalability.

"Okay, thats cool and all, but why should I use it?"

There are really two reasons: performance, and the fact that Javascript is becoming a pretty universal language. If you've ever dealt with web development of any kind, chances are you've had some experience with Javascript. The performance comes from a combination of Google's V8 engine and the fact that Node is asynchronous and event driven, very much like JS applications that run in your browser. NodeJS also runs on a single thread.

"But wait, how does that make it more efficient than multi threaded applications? Wouldnt a single thread be a bottle neck?

If this were a traditional server implementation that would be true, but Node is a bit different. Instead of needing to manually identify events and spawn new threads based on those events, you simply register events that will act as callbacks when fired - very similar, if not exactly the same, as most Javascript that runs in your browser. Since we're not manually creating new threads for each new event, and each event is asynchronous, none of the events will block the NodeJS event loop. This allows NodeJS to handle concurrent connections very efficiently and quickly by receiving an event, pushing the work to the background to run, and having the process notify the server when it's done.

"Thats pretty cool, how do I get started then?"

First off, you're going to need a computer with a non-Windows operating system (so anything Unix, Linux, or Mac OS X will work). They're working on a Windows build, but it's not quite done yet. Next, grab the source from Github here. We're going to be building version v0.4.x. NodeJS only has two build and install requirements - Python 2.4 (or higher) and libssl-dev (if you plan on using SSL encryption). Now to build Node, cd into the directory you checked out and run the following:
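
# standard configure/make steps for a v0.4-era checkout
./configure
make
sudo make install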

This will build Node and install it to your path. Now, all you need to do to run Node on the command line is just run: $ node <your file name>.

Now, let's take a look at a simple NodeJS TCP server implementation:
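
Here's a rough sketch of such a server (the echo behavior, port, and host are just placeholders):

var net = require('net');

// the callback fires for every client that connects
var server = net.createServer(function(socket){
    console.log('client connected');

    // fires whenever the client sends us data; echo it back
    socket.on('data', function(data){
        socket.write(data);
    });

    // fires when the client closes the connection
    socket.on('end', function(){
        console.log('client disconnected');
    });
});

server.listen(8000, '127.0.0.1');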

Now let's take a more detailed look at what this is doing. First, we need to include the "net" library that is built into Node. This is done using the "require" function. Node comes with a bunch of libraries that allow you to create sockets, interact with the filesystem, make system calls, and much, much more. You can also create your own libraries, both in C++ and plain Javascript, to include in your Node applications. Now that we have the "net" library included, we can proceed with creating our server object. The server callback takes a single argument, which we'll call "socket" because that's essentially what it is; we'll use it to write and read data to/from our connection. Because Node is asynchronous, everything is handled with listeners: the createServer callback fires when a client connects, and on the socket we add listeners for the data and end events, with the data event firing whenever the client sends data to the server. So, instead of creating a thread loop to block and wait for data from the client, we simply register an event for it that will listen while the rest of the server runs unblocked. When the server receives data, the callback function will be called and the logic it contains will be executed. The last line in our file simply tells the server to start listening on a provided port and host. And that's it! It's very simple: you just set some listeners and then just forget about it while it runs.

Send Mail Through Google Apps With PHP

by Sean Mcgary on January 19, 2011

Recently, while I was working on a site that I'm creating, I needed a way to easily send email out to users. Like a lot of people that have domain names and don't want to run their own mail server, I have decided to let Google Apps handle all of my email and app related needs. Now say you don't have a mail server and don't want to maintain one. Maybe you don't know how, or just don't want to deal with maintaining such a service. Well, as it turns out, you can use Google Apps as your mail server. In this example, I'm using PHP to send out a message using the SMTP server and an account that I've set up on my Google Apps domain (one that is not my primary, admin account).

First we have to install some libraries through PEAR.
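
Assuming the standard PEAR packages for sending mail over SMTP, that means Mail and Net_SMTP:

sudo pear install Mail
sudo pear install Net_SMTP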

Now that we have our two dependencies installed, we can write our code to send our mail. Basically what this does is connect to Gmail's (or Google Apps') SMTP server and send a message. This is pretty much the same thing that happens when you send an email from a desktop client like Thunderbird or Apple Mail. All you do is supply it with Google's SMTP server, port, and your username and password. NOTE: if you are using Google Apps, the username is going to be your-account-name@your-domain.com.
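
A minimal sketch of that script using PEAR's Mail factory (the addresses and credentials are obviously placeholders):

<?php
require_once 'Mail.php';

$from = 'Your Name <you@your-domain.com>';
$to = 'Recipient <recipient@example.com>';
$subject = 'Hello from Google Apps';
$body = 'This message was sent through Google\'s SMTP server.';

$headers = array(
    'From' => $from,
    'To' => $to,
    'Subject' => $subject
);

// Google's SMTP server over SSL on port 465
$smtp = Mail::factory('smtp', array(
    'host' => 'ssl://smtp.gmail.com',
    'port' => 465,
    'auth' => true,
    'username' => 'your-account-name@your-domain.com',
    'password' => 'your-password'
));

$result = $smtp->send($to, $headers, $body);

if (PEAR::isError($result)) {
    echo $result->getMessage();
} else {
    echo 'Message sent!';
}
?>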

And voila! You can now send mail through Google!

Threaded TCP Server in Python

by Sean Mcgary on October 15, 2010

Recently I finally decided to take some time to learn Python. I figured the best way to learn something new is to dive right in and write an application. This application happened to be a new server for Computer Science House's networked vending machines, 'Drink'. The general idea behind Drink is that it's a 'communal refrigerator' for CSH that the members 'donate' money to in order to stock it with delicious drinks (such as Coke products, since RIT is exclusively a Pepsi campus). Being the geeks we are, these vending machines that we have on our floor must be accessible via the internet in some way; that's where the server comes in. The server needs to facilitate connections to each machine as well as accept incoming connections from clients that want to drop drinks, and then shuffle those requests off to the tini-boards that control the physical machines. So immediately I was thrown into learning threading and sockets in Python.

Well, that's cool and all, but you're probably asking why you're here. I mean, the title does hint that you'll be learning something. This is true, we're getting there, so just sit tight for another minute. One of the issues that needed to be overcome while writing this server was having a single server instance that can serve multiple clients at the same time without one client blocking the socket connection. Python, being the flexible language it is, offers you multiple ways to handle sockets, threads, and the combination of the two. What we want to happen is to have a server bound to a specific address and port, but once a connection is accepted, we want the server to hand that connection off to its own thread so that we don't block other clients from connecting. First, let's take a look at the server implementation:
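
Here's a sketch of what that looks like (the handler just echoes data back to the client):

import socket
import thread

def handler(clientsocket, clientaddr):
    # each client gets its own thread running this function
    print 'accepted connection from', clientaddr
    while True:
        data = clientsocket.recv(1024)
        if not data:
            break
        # echo whatever the client sent right back
        clientsocket.send(data)
    clientsocket.close()

if __name__ == '__main__':
    host = 'localhost'
    port = 55567
    buffsize = 1024
    addr = (host, port)

    # a SOCK_STREAM socket is a TCP socket
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(addr)
    serversocket.listen(2)

    while True:
        clientsocket, clientaddr = serversocket.accept()
        # hand the accepted connection off to its own thread
        thread.start_new_thread(handler, (clientsocket, clientaddr))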

First we start off by importing the socket and thread modules. Now, if we wanted to make this a threaded class, we could 'from threading import Thread' so that our server could inherit from Thread (this is an example of one of the many threading modules I mentioned). If we look at our main "method", we define our host, port, and buffer size, and we create a tuple called 'addr' to hold the host and port. In this implementation, we have created a SOCK_STREAM socket, which is a TCP socket. When making this a server, you have to remember to bind the socket to the addr tuple (this is not the case with the client we will implement). And finally we tell the socket to listen for connections. The 2 we pass to the listen call tells the server that it can queue up 2 connections before it starts to refuse them.

Now for the magic/voodoo awesomeness of the server. In the while loop you’ll notice that when we call serversocket.accept() we get two variables back: a clientsocket and the clientaddr. Now, we take that socket connection and we hand it off to a thread by calling thread.start_new_thread. In this call, we pass it ‘handler’, which is the function you see defined at the top with the clientsocket and clientaddr as parameters. This function then runs, receiving and sending data with the client. Because we spawned a new thread to handle this connection, the server is free to keep accepting connections from other clients.

Now let's take a look at a simple client to interact with our server.
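
Again, as a rough sketch (it reads lines from standard in, sends them, and prints whatever comes back):

import socket
import sys

if __name__ == '__main__':
    host = 'localhost'
    port = 55567
    buffsize = 1024
    addr = (host, port)

    # same kind of TCP socket, but we connect instead of bind/listen
    clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clientsocket.connect(addr)

    while True:
        line = sys.stdin.readline()
        if not line:
            break
        clientsocket.send(line)
        data = clientsocket.recv(buffsize)
        print data
    clientsocket.close()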

99% of this should look the same as the server we just created. The only difference here is that we aren't creating new threads and we're not binding our socket to our address. Instead, we are just connecting to the server and then looping while we send data read from standard in and print what we receive back. Hopefully this has been helpful. There are a lot of Python resources out there, but it took me a while to find an implementation that worked in my situation. So hopefully this is simple enough for anyone to modify to fit their needs.

reCaptcha for Codeigniter 2

by Sean Mcgary on August 23, 2010

Every time I make a new website with user registration, I usually end up using a reCaptcha somewhere in the process. A while ago, I discovered a reCaptcha library on the Codeigniter forums. Since then, I've modified it a little bit to work with Codeigniter 2.0 and have placed it on Github where everyone can access it. Below is just an example of the controller (included in the repository) so you can see how it all comes together.

Improving Database Performance With Memcached

by Sean Mcgary on May 20, 2010

When it comes to web application performance, oftentimes your database will be the largest bottleneck and can really slow you down. So how can you speed up performance when you have a site or application that is constantly hitting your database, either to write new data or to fetch stored data? One of the easiest ways is to cache the data that is accessed the most. Today, I am going to show you a brief example of how to do this with Memcached using PHP and the Codeigniter framework. First off, what exactly is Memcached? "Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering." Basically, Memcached is a distributed system that keeps key-value pairs in RAM for super fast access. If you need to scale, all you do is add more RAM or more nodes with more RAM. Let's get started.

First, we need to install Memcached. I will show you how to do so on a Debian based system (Debian/Ubuntu). Run the following from your command line:

sudo aptitude install memcached libmemcached2 libmemcached-dev

This installs the base Memcached server along with the development libraries. Now we need a way for PHP to interact with Memcached, so we're going to go and grab the Memcache PHP extension from Pecl. If you haven't installed anything via Pecl before, it's basically a PHP extension repository and manager that works similar to aptitude, but specifically for PHP. To use Pecl, you need the PHP5 dev package as well as Pear.

sudo aptitude install php5-dev php-pear

Once you have those installed, you can install the PHP Memcache extension:

sudo pecl install memcache

Depending on which operating system you’re using, you may need additional packages or libraries. For me on Ubuntu 9.10, I needed to install make and g++ in order to build the extension. So just take a look at the end of the output when you install the extension, and it will tell you what is missing.

Now we're ready to start some coding. For this tutorial I'm going to be using Codeigniter since I really like its object oriented structure and its nice database class. I will be using the new (though not yet official) version 2.0. You can find it over on BitBucket and either clone the repository (note, you'll need Mercurial to do this) or just download the source in a compressed format. Now we're going to need a database to grab data out of. You can call it whatever you want. The only thing that you need is a table called "users". Since I did this with 1000-ish "users", you can make the table yourself, or you can grab a dump of the table here.

Now that we have everything set up, we can start some coding. Connecting to Memcache is really simple and is done in two lines:
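
$memcache = new Memcache();
$memcache->connect('localhost', 11211); // 11211 is memcached's default port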

So now, what we are going to do is create a controller and a model. In this case, the controller is just there to call our model and display the returned data. The model will be doing all actions associated with accessing the database and accessing Memcache. So we are going to create two files: a controller called main.php and a model called user_model.php.

In our controller, we load the model in the constructor since we will be using it with every function. The function cache_users() calls the model and tells it to take all the users in the database and put them into the cache; we'll get to the specifics of that in just a minute. The function get_user() tells the model to go into the cache and find a user based on their unique user_id.
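
A sketch of what main.php might look like, using the method names described above (assume a Codeigniter 2.0 controller):

<?php
class Main extends CI_Controller {

    function __construct(){
        parent::__construct();
        // the model is used by every function, so load it once here
        $this->load->model('user_model');
    }

    function cache_users(){
        $this->user_model->cache_users();
        echo 'users cached';
    }

    function get_user($user_id){
        // pull a single user straight out of the cache
        print_r($this->user_model->fetch_user_from_cache($user_id));
    }
}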

The model is a bit more involved. First off, in the constructor we connect to Memcached and assign the connection to a class level variable so that all of our other methods can access it. The first function, get_users(), is an all-in-one example. Before I explain the function, let's first figure out how to access Memcached. Memcached stores things as key-value pairs in memory. Keys need to be unique and can be as large as 250 bytes. The value can be anything - a string, array, object, pretty much anything in PHP that can be serialized can be stored as a value. This however does not include database connections or file handles. For our purposes, we are going to be storing each row of the users table in Memcached. A way to do that uniquely is to store the value under a hash of the SQL query that we would use if we were fetching the user from the database. So for example, if our user had the user_id of 43, we would hash the query used to fetch them from the database:

"SELECT * FROM users WHERE user_id=43"

The get_users() function is going to store ALL users in a single entry in Memcached, so the first line we come to is the query to fetch all the users from the database. Then we perform an MD5 hash on it and assign that to a variable. Now we check to see if that key is in Memcached by calling $this->memcache->get($key). If the key does not exist, it will return NULL. So we check to see if it's null. If it is, we know that we have to hit the database to grab the data. So we do that, and while we're at it, we put the resulting data into Memcache so that when we need it again, it's there. And of course if the key does exist, we don't even touch the database. It's all a pretty simple and straightforward process.
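
Here's a sketch of the model's constructor and get_users(), assuming memcached is running locally and a one-hour cache lifetime:

<?php
class User_model extends CI_Model {

    private $memcache;

    function __construct(){
        parent::__construct();
        // connect once and share the handle with the other methods
        $this->memcache = new Memcache();
        $this->memcache->connect('localhost', 11211);
    }

    function get_users(){
        $sql = "SELECT * FROM users";
        $key = md5($sql);

        // check the cache first
        $users = $this->memcache->get($key);
        if ($users == NULL){
            // cache miss - hit the database and cache the result
            $users = $this->db->query($sql)->result();
            $this->memcache->set($key, $users, 0, 3600);
        }
        return $users;
    }
}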

Let's take a look at cache_users(). Here what we are doing is grabbing all the users from the database and looping over them. The idea is that we want each user to be in the cache individually, versus all together like in the previous example. So while we are looping over the returned users, we build the SQL statement we would use to fetch each one individually from the database, and then we store the user in its own entry in Memcache under the hash of that query. To store something in Memcache, we call $this->memcache->set($key, $value, $compression, $time). $key and $value are pretty self explanatory. $compression is a flag (0 or 1) that specifies whether you want your data compressed. $time is the amount of time that you want the data to stay in the cache (in seconds); once that time has expired, the entry is flushed from the cache. Now that we have all of our users in the cache, we can call fetch_user_from_cache() and you will get your user!
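
And a sketch of the remaining two methods of the same User_model class, cache_users() and fetch_user_from_cache():

    function cache_users(){
        $users = $this->db->query("SELECT * FROM users")->result();
        foreach ($users as $user){
            // cache each user under a hash of the query that would fetch them individually
            $key = md5("SELECT * FROM users WHERE user_id=" . $user->user_id);
            // no compression, cached for an hour
            $this->memcache->set($key, $user, 0, 3600);
        }
    }

    function fetch_user_from_cache($user_id){
        $key = md5("SELECT * FROM users WHERE user_id=" . $user_id);
        return $this->memcache->get($key);
    }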

Hopefully this has given you a decent overview and an idea of how caching works so that you can apply it to your own applications. If you have any trouble or questions, leave a comment and I'll help you out!

Scaling Apps With Message Queues

by Sean Mcgary on May 13, 2010

When it comes to scaling a web application, one of the easiest ways to boost performance is with an asynchronous queue. Since web apps have started to become as complex as native desktop applications, users are expecting them to perform like them too. This is where asynchronous queues come into play. Typically with high traffic sites like Facebook, Digg, Twitter, et al, not everything needs to happen instantaneously, it just needs to look like it does. For example, when you choose to send a message to someone on Facebook, chances are that when you click that "send message" button, Facebook takes your message and shoves it into a queue of other messages to be processed. To you, the user, it looks like it has already been sent, but in reality there might be a little bit of a delay. This means Facebook doesn't have to handle sending thousands of messages per minute the instant people click that button. Granted, this is just a hypothetical example; I honestly have no idea how they actually handle such requests, but it makes for a good working example so that you can have an idea of what's going on.

So that's the general gist of an asynchronous queue; now let's dive in a little deeper. When looking for an asynchronous queue (or messaging system or message broker, as some are called), there are a variety of options to consider. Today we're going to be looking at Apache ActiveMQ because it takes advantage of the Java Messaging Service (JMS) and also integrates with a wide variety of languages including Java, PHP, C++, C, C#/.NET, and many others. The example I'm going to show you will be using PHP through the Stomp protocol. First off, let's install ActiveMQ.

First off, we need a server environment. Currently I'm using Ubuntu Server 9.04. ActiveMQ is a Java application, so we need to install Java.

sudo aptitude install openjdk-6-jre

We're also going to need Maven to pull in any necessary Java dependencies.

sudo aptitude install maven2

Now, somewhere (I'd recommend your home directory), download the ActiveMQ source and unpack it:

wget http://mirrors.ecvps.com/apache/activemq/apache-activemq/5.3.2/apache-activemq-5.3.2-bin.tar.gz

tar xvf apache-activemq-5.3.2-bin.tar.gz

Now, cd into that directory and run the following:

chmod 755 bin/activemq

Now we’re ready to build ActiveMQ using Maven

mvn clean install

And that's it! Simply run ./bin/activemq and ActiveMQ will start right up.

So now that we have a message broker set up, we need a way to start sending it messages. To do this, we are going to use the Stomp protocol with PHP. I'm going to show you a simple example that opens a Stomp connection, sends a message to the queue, and then retrieves the sent message and displays it on the screen. Typically you would separate this into a producer (a script that enqueues messages) and a consumer (usually a daemonized background process) that retrieves messages and decides what to do with them. The nice thing about ActiveMQ is that your producers and consumers can be written in the same language, or in different languages altogether, depending on your processing needs.

To start, go and grab the Stomp PHP library and unpack it. Since we will be including Stomp.php in our script, you are probably going to want to add the Stomp library to your php.ini include_path.

Now with that all installed, we are going to create our script.
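
A rough sketch of that script, assuming the Stomp PHP client and ActiveMQ's default Stomp port of 61613:

<?php
require_once 'Stomp.php';

// connect to the local ActiveMQ broker over Stomp
$con = new Stomp('tcp://localhost:61613');
$con->connect();

// produce: push a message onto a queue
$con->send('/queue/test', 'Hello from the producer side');

// consume: subscribe to the same queue and read the message back
$con->subscribe('/queue/test');
$msg = $con->readFrame();

if ($msg != null) {
    echo 'Received: ' . $msg->body . "\n";
    $con->ack($msg);
}

$con->disconnect();
?>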

The code in this example pretty much explains itself. Since this is a simple example, it sends a message, then reads it back. Typically your consumer would be running in a loop, checking the queue for new messages and then processing the messages when it receives them.

That is pretty much it. I wanted this to be just a short introduction, so hopefully in a week or so, I’ll come back and give a more involved tutorial that will show how you can separate that script into a consumer and producer to really get work done.

Codeigniter Google Calendar Library

by Sean Mcgary on June 17, 2009

So I was playing around with the Google Calendar portion of the Gdata API the other day, did some searching, and found that there wasn't a Codeigniter library for it, probably because Google has teamed up with the guys working on the Zend Framework to bring Gdata to the PHP world. So I took the ZendGdata API for Google Calendar and implemented it in Codeigniter so that you just need to make a few simple function calls to gain authorization to a calendar, add events, query events, etc. This is my first attempt at writing a "library" for something, so hopefully it turned out well. If you find any errors in it or have any suggestions for improvement, just let me know!

Files you’ll need:

I also have it managed through a Git repository on Github. Feel free to clone it or pull from it.

Setup:

First off, we need to edit your config file. Open up your application/config/config.php file. Scroll down to the uri_protocol option and change it to PATH_INFO. Then go to the permitted_uri_chars setting and place a question mark after 'a-z'. This allows you to put question marks in the URL. That's it!

  • Place the Gcal.php file in your Codeigniter system/application/libraries directory.
  • Install the ZendGdata library with one of the two options:
      • Option one: make a directory anywhere on your machine/server and place the ZendGdata directory in it. Open your php.ini file, find the include_path line, and add the directory that you placed the ZendGdata directory in, e.g. `/your/directory/ZendGdata/library`. Save and restart Apache.
      • Option two: place the ZendGdata directory wherever you want, then in the Gcal.php library file modify the require_once so that it points at the directory where ZendGdata is located, e.g. `require_once("/your/directory/ZendGdata/library/Zend/Loader.php");`
  • Place the calendar.php file in your controllers directory.

The calendar.php file is a sample controller. In it you'll find an example call to each function in the library. For each one, I go through and show how to manipulate the data returned, since it can get a little confusing at times (the return values are often multi-dimensional arrays that are a bit difficult to interpret just by looking at them).

AuthSub

AuthSub is basically Google's version of OAuth. Like OAuth, AuthSub sends the user off to log into their Google account and then returns them to a specified URL with an access token as a GET variable.

ClientLogin

ClientLogin is your basic, straightforward authentication with a user's Google account username and password. It's usually used for installed applications, but I included it anyway in case you want to use it.

And finally here is a list of the functions included in the library: