Gist by @pikesley, forked from mrchrisadams/01.md. Created October 28, 2012 18:25.

## Part one

In my spare time, I've recently been working with a few codebases that are either written in node or use enough node code that I'm keen to have a testing framework in place. I want the same kind of safety net I'm used to on Chef, Sinatra or Rails projects.

I lost a couple of weekends trying to find an approach to BDD-style development with node and getting my head around asynchronous coding concepts. I think I've now settled on an approach that feels enough like rspec to be comfortable for future server-side JS development.

It's way too much for a single post, so this is the first of a three-part series. It's meant to help other Ruby developers who are used to synchronous development with rspec adjust to asynchronous development with mocha, the closest thing to rspec I can find right now.

In this post I'll cover how Mocha syntax compares to rspec. In the next, I'll implement the code to pass these Mocha specs. In the last, I'll look at how to keep asynchronous code in node halfway manageable.

    ### How we do it in Ruby

I'm going to use a side project I've been hacking on for a few months. It shows how I'd add a new class that wraps calls to a persistence layer. This gives a degree of encapsulation, hiding the database technology behind the class's external interface.

Let's say all I want from a User class here is a method that finds a user. The user is stored as a hash in Redis, keyed on their machine's MAC address.

The spec I'd write might look a bit like this:

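```ruby
require 'redis'
require_relative 'user' # assumes the User class lives in user.rb

describe 'User' do

  before(:each) do
    redis = Redis.new
    # mapped_hmset stores a Ruby hash as the fields of a Redis hash
    redis.mapped_hmset("99:aa:44:33:01:3r", {
      :username    => "mrchrisadams",
      :name        => "Chris",
      :email       => "[email protected]",
      :mac_address => "99:aa:44:33:01:3r"
    })
  end

  it 'fetches the user object' do
    u = User.new
    c = u.find_by_mac('99:aa:44:33:01:3r')
    # hgetall hands back a plain hash with string keys
    c['username'].should eq('mrchrisadams')
  end

end
```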

I use the `before` block to store a hash in Redis, with a few fields set on it. Later, in the `it 'fetches the user object'` block, I create an instance of my User class. I then call its `find_by_mac` method to fetch the hash I just stored and check a value on what comes back.

The implementation code in Ruby might look like this:

```ruby
require 'redis'

class User

  def initialize
    @db = Redis.new
  end

  # HGETALL returns the stored hash, with string keys
  def find_by_mac(mac)
    @db.hgetall(mac)
  end

end
```

    So far so good - this is synchronous code - it feels comfortable, and is easy enough to work with.

    ### Doing this in node

Now let's take the same approach in node. We'll see how different it looks, and what we need to be aware of when learning to think in asynchronous terms. The [completed mocha test code is here on github](https://github.com/mrchrisadams/herenow/blob/users/test/herenow/users.js), and the [completed implementation code is here too](https://github.com/mrchrisadams/herenow/blob/users/herenow/db/user.js).

So, let's be good developers and write our test code first, in Mocha, the JavaScript-flavoured take on rspec. I'll paste the lot, then go through the interesting bits piece by piece.

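```javascript
// Assumed setup: should.js for the assertions, node-redis for `db`,
// and a User constructor exported from the implementation file
var should = require('should');
var redis = require('redis');
var db = redis.createClient();
var User = require('../../herenow/db/user'); // path assumed from the repo layout

describe('User', function() {
  describe('#findByDevice', function() {

    beforeEach(function(done) {
      db.hmset("mrchrisadams", {
        name: "Chris Adams",
        username: "mrchrisadams",
        devices: ["00:1e:c2:a4:d3:5e"],
        email_address: "[email protected]"
      }, done);
    });

    it('should fetch the user for that mac', function(done) {
      var user = new User();
      user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
        if (err) {
          done(err); // hand the error to Mocha so the test fails
        } else {
          res.username.should.be.ok
          res.username.should.equal('mrchrisadams')
          done()
        }
      }); // find by device
    }); // should fetch the user for that mac
  });
});
```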

First of all, much of the syntax is familiar. We have nested `describe` and `it` blocks. Even the assertion syntax is reassuring, with nice, readable `should`s, `be`s and `equal`s.

However, there are a few important additions that we need to allow for the asynchronous nature of node. First, let's look at the `beforeEach` function.

```javascript
beforeEach(function(done) {
  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  }, done);
})
```

In this case, we're calling `node-redis`, a popular Redis library for node that is completely asynchronous. We could have tried using the bog-standard `beforeEach` function with this asynchronous library, like this (note the lack of `done`):

```javascript
beforeEach(function() {
  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  });
})
```

Had we done this, Mocha would have called the `beforeEach` function, which would have sent the command to Redis and returned straight away. Mocha would then have raced ahead to run the tests below it, without waiting for our Redis setup to finish. Redis is fast, but you can't rely on that to make sure your tests are set up before they run. At best this code would have given us unpredictable results, and more likely failures across the board.
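You can see this race without a Redis server. In the minimal sketch below, a hypothetical `fakeHmset` function stands in for node-redis's `hmset`. Like a real network call, it only completes its write on a later turn of the event loop, which is simulated here with `setImmediate`. Reading the data straight after calling it finds nothing, while reading it inside the callback works.

```javascript
// Hypothetical stand-in for Redis: writes only land on a later
// turn of the event loop, as a real network round trip would.
const store = {};

function fakeHmset(key, fields, callback) {
  setImmediate(function () {
    store[key] = Object.assign({}, fields);
    callback(null, 'OK');
  });
}

// Without waiting: we look before the setup has finished.
fakeHmset('mrchrisadams', { username: 'mrchrisadams' }, function () {});
const seenTooEarly = store['mrchrisadams']; // undefined: the write hasn't happened yet
console.log('read without waiting:', seenTooEarly);

// Waiting for the callback: the data is there when we look.
fakeHmset('mrchrisadams', { username: 'mrchrisadams' }, function (err) {
  if (err) throw err;
  console.log('read in callback:', store['mrchrisadams'].username);
});
```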

    #### `done` to the rescue

Here's how we do it when working with an async library:

```javascript
beforeEach(function(done) {
  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  }, done);
})
```

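The difference this time round is the `done` parameter. Mocha hands us `done`, a function that tells it the setup has finished. We pass it to `hmset` as its callback, so it only gets called once Redis has stored our hash, and until then Mocha holds off running the tests. If Redis reports an error, that error is handed to `done` too, and Mocha fails the hook.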
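Mocha decides whether a hook or test is asynchronous by checking whether the function you give it declares a parameter (the function's `length`). The toy runner below sketches that idea in simplified form. It is not Mocha's actual code, and `runHook` is a made-up name. A hook with no parameters counts as finished when it returns. A hook that takes `done` holds everything up until `done` is called.

```javascript
// Toy runner: if the hook declares a parameter, wait for it to be called.
function runHook(fn, next) {
  if (fn.length === 0) {
    // synchronous hook: once it returns, it's finished
    fn();
    next(null);
  } else {
    // asynchronous hook: hand it a `done` and only move on when it's called
    fn(function done(err) {
      next(err || null);
    });
  }
}

const events = [];

runHook(function (done) {
  // simulate a slow Redis write
  setTimeout(function () {
    events.push('setup finished');
    done();
  }, 10);
}, function (err) {
  if (err) throw err;
  events.push('tests start');
  console.log(events); // [ 'setup finished', 'tests start' ]
});
```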

We take the same approach in the test itself, in our `it` function:

```javascript
it('should fetch the user for that mac', function(done) {
  var user = new User();
  user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
    if (err) {
      done(err); // hand the error to Mocha so the test fails
    } else {
      res.username.should.be.ok
      res.username.should.equal('mrchrisadams')
      done()
    }
  }); // close findByDevice
}) // close should fetch the user for that mac
```

    Here, we're doing something very different to the ruby approach of storing returned values from methods in variables, then testing the value of those.

    #### Our first real exposure to "continuation-passing style"

Look at this line in particular:

```javascript
user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res)
```

Understanding this was the key to getting my head around this initially very alien syntax. If you're having trouble with the shift from sync to async, the closest thing in typical synchronous Ruby code might be something like this. Here you set the variable `res`, then run assertions against it:

```ruby
res = user.find_by_mac('00:1e:c2:a4:d3:5e')
res['username'].should eq('mrchrisadams')
```

Now, there are two important things to remember when working with node:

1) Because we're working asynchronously, we only want to run our assertions once we know we have the values back from the call we just made.
2) Because we're working with JavaScript, we can pass functions around and execute the code inside them at a later time.

    So, our solution to the asynchronous problem, is to pass in a _function with the assertions we care about inside it_, as a parameter to our `findByDevice` call on the user object.

So what we're saying here is, "go fetch the results of `findByDevice` with the parameter `00:1e:c2:a4:d3:5e`, and here's the function I'd like you to execute when you're done, please":

```javascript
user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
  if (err) {
    done(err); // hand the error to Mocha so the test fails
  } else {
    res.username.should.be.ok
    res.username.should.equal('mrchrisadams')
    done()
  }
})
```
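Here's a minimal, self-contained sketch of the difference. It assumes a hypothetical in-memory `users` object in place of Redis. The synchronous version hands its answer back as a return value. The continuation-passing version hands it to the function we supply, which only runs once the result is ready.

```javascript
// Hypothetical in-memory data standing in for Redis.
const users = { '00:1e:c2:a4:d3:5e': { username: 'mrchrisadams' } };

// Synchronous style: the caller waits for a return value.
function findByDeviceSync(mac) {
  return users[mac];
}

// Continuation-passing style: the caller passes in what to do next.
function findByDevice(mac, callback) {
  setImmediate(function () {
    callback(null, users[mac]);
  });
}

const res = findByDeviceSync('00:1e:c2:a4:d3:5e');
console.log(res.username); // mrchrisadams

findByDevice('00:1e:c2:a4:d3:5e', function (err, res) {
  // the "assertions" live in here, and only run once the result exists
  console.log(res.username); // mrchrisadams
});
```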

You might be confused by the two parameters, `err` and `res`. They follow a widely accepted convention in asynchronous node code for passing the result of one call on to another. The function you pass as the last argument of a method call takes an error as its first parameter, which is `null` when all went well. It takes the result as its second parameter. These are usually named `error` and `result`, or some variation on those, and this shape is sometimes called an "error-first" callback. Passing functions along like this is often referred to as continuation-passing style. It's crucial to understand, because you won't get far without it.
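Here's a small sketch of the error-first convention from the implementing side. It uses a hypothetical in-memory `devices` lookup rather than Redis. On failure, the function calls back with an `Error` and nothing else. On success, it calls back with `null` followed by the result. Every caller checks the error slot before touching the result.

```javascript
// Hypothetical lookup table standing in for Redis.
const devices = { '00:1e:c2:a4:d3:5e': { owner: 'mrchrisadams' } };

// Error-first: callback(err) on failure, callback(null, result) on success.
function lookupDevice(mac, callback) {
  setImmediate(function () {
    if (typeof mac !== 'string') {
      return callback(new TypeError('mac must be a string'));
    }
    callback(null, devices[mac] || null);
  });
}

// Callers check the error slot first, then use the result.
function report(err, device) {
  if (err) {
    console.log('failed:', err.message);
  } else {
    console.log('owner:', device ? device.owner : '(none)');
  }
}

lookupDevice('00:1e:c2:a4:d3:5e', report); // owner: mrchrisadams
lookupDevice(42, report);                  // failed: mac must be a string
```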

    #### Where does `done()` fit into this?

You might notice the call to `done()` on the last line of the function we're passing in, after the assertions. When a test function takes a `done` parameter, Mocha can't tell from the function returning whether the test has passed or failed. So it waits for `done()` to be called before deciding whether that `it` block has passed. Calling `done(err)` with an error fails the test. If `done` is never called at all, Mocha fails the test once its timeout runs out.
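The waiting can be sketched with a toy runner, under simplifying assumptions. `runTest` below is hypothetical and much simpler than Mocha. It counts a call to `done()` as a pass and `done(err)` as a failure. No call at all within the time limit counts as a timeout failure (Mocha's own default timeout is 2000ms).

```javascript
// Toy test runner: a test passes or fails only when `done` is called,
// or fails if `done` is never called before the timeout.
function runTest(name, fn, timeoutMs, report) {
  let finished = false;

  const timer = setTimeout(function () {
    finish(new Error('timeout of ' + timeoutMs + 'ms exceeded'));
  }, timeoutMs);

  function finish(err) {
    if (finished) return; // ignore a late or second call
    finished = true;
    clearTimeout(timer);
    report(name, err ? 'failed: ' + err.message : 'passed');
  }

  fn(finish); // `finish` plays the role of Mocha's `done`
}

const results = {};
function record(name, outcome) {
  results[name] = outcome;
  console.log(name + ' -> ' + outcome);
}

runTest('calls done()', function (done) {
  setTimeout(function () { done(); }, 10);
}, 100, record);

runTest('calls done(err)', function (done) {
  setTimeout(function () { done(new Error('username mismatch')); }, 10);
}, 100, record);

runTest('never calls done', function (done) {
  // forgot to call done: the runner can only give up after the timeout
}, 100, record);
```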

    ### Now onto the implementation in node

So we've run through how Mocha works and how it compares to rspec. We've also seen how we rely on anonymous functions to run our assertions on the results of asynchronous functions. (Anonymous functions are functions with no name: just the `function` keyword, the parameters, and the code to execute, like `function(err, res) { /* do stuff */ }`.)

    It's worth re-reading the above, until you're really comfortable with the concepts, as the next section is unfortunately pretty messy.

## Part three

    In the last post, I implemented an asynchronous function that wrapped a call to Redis, using an existing node library, `node-redis`. The final implementation introduced nested asynchronous method calls, and the code ended up looking a bit like this, even after simplifying somewhat:

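```javascript
User.prototype.findByDevice = function(device_mac, callback) {

  // first, fetch the device hash keyed on its MAC address
  db.hgetall(device_mac, function (err, device) {
    if (err) { return callback(err); }

    // no such device (hgetall may reply null or {} for a missing key), so no user
    if (!device || !device.hasOwnProperty('mac')) { return callback(null, null); }

    // then use the device's owner to fetch the user hash
    db.hgetall(device.owner, function (err, user) {
      if (err) { return callback(err); }
      callback(err, user);
    });
  });
};
```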

The code took this form because we relied on the result of one asynchronous call to make the second one. Take a second to imagine how hard this would be to read at four, five or six levels of nesting. You'll quickly understand why so many developers are writing their own callback management libraries to make this easier to work with.

The one I've found most promising so far is `async.js`, a fairly comprehensive utility module. It provides a number of ways to make sure asynchronous functions are called in a specific order, or run in parallel with their results aggregated before your code continues, and so on.

In this case, I'll focus on `waterfall`, a function in the `async` module that takes an array of functions to be called in order, passing the results of each one to the next. Once the last one has finished, a final callback receives the end result, or the first error. It can then hand that back to the code that called our function.

```javascript
var async = require('async');

User.prototype.findByDevice = function(device_mac, callback) {

  async.waterfall([

    // fetch our device first
    function(cb) {
      db.hgetall(device_mac, function (err, device) {
        cb(err, device);
      });
    },

    // now we have our device, fetch the user
    function(device, cb) {
      // no device means no user
      if (!device) { return cb(null, null); }
      db.hgetall(device.owner, function (err, user) {
        cb(err, user);
      });
    }

  // return our user object (or the first error) to our caller
  ], function (err, user) {
    callback(err, user);
  });

};
```

In our case, we have our function `findByDevice` on `User`. We pass an array containing our two asynchronous functions as the first argument to `async.waterfall`, followed by a final anonymous function that returns our user object.

To be more specific: just like the earlier code, `findByDevice` takes our MAC address string as its first parameter, and the function to execute as its second, `callback`. We then make the asynchronous call to Redis to fetch a device object. We pass in a function that calls `cb` once Redis has given us our hash, which hands the hash on to the next function in the array.

    We then use the `owner` property of the device object passed into the second function, to make another call to fetch our user, again passing in `cb`, to execute once Redis has given us a user object, to pass to the final function.

    Once we have the user object, we can pass it on to the code that called `findByDevice` with `callback(err, user)`, completing the asynchronous callback chain.
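To see what a waterfall does under the hood, here's a minimal hand-rolled version. It sketches the behaviour described above and is not async.js's actual source. Each task runs in turn and receives the previous task's results followed by a `next` callback. The first error skips straight to the final callback. The `fakeHgetall` helper and its data are hypothetical stand-ins for node-redis.

```javascript
// Minimal waterfall: run tasks in order, feeding each one's results
// to the next, and bail out to the final callback on the first error.
function waterfall(tasks, finalCallback) {
  let index = 0;

  function next(err, ...results) {
    if (err || index === tasks.length) {
      return finalCallback(err || null, ...results);
    }
    const task = tasks[index++];
    task(...results, next); // prior results first, then `next`
  }

  next(null);
}

// Hypothetical stand-in for node-redis: hashes keyed by string.
const hashes = {
  '00:1e:c2:a4:d3:5e': { mac: '00:1e:c2:a4:d3:5e', owner: 'mrchrisadams' },
  mrchrisadams: { username: 'mrchrisadams', name: 'Chris Adams' }
};
function fakeHgetall(key, cb) {
  setImmediate(function () { cb(null, hashes[key] || null); });
}

waterfall([
  function (cb) { fakeHgetall('00:1e:c2:a4:d3:5e', cb); },
  function (device, cb) { fakeHgetall(device.owner, cb); }
], function (err, user) {
  if (err) throw err;
  console.log(user.username); // mrchrisadams
});
```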

    #### More than just waterfalls

    Of course, just because we now know how to execute asynchronous functions in a set order, one after the other like we're used to doesn't mean we _should_ always do so.

One of the advantages of node's asynchronous style is that it allows [parallel execution](https://github.com/caolan/async#parallel) of code. The same operation can be applied to an array of values at the same time, getting around bottlenecks, with the results passed on only once all the operations have completed.
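Here's a minimal sketch of the parallel pattern, hand-rolled rather than taken from async.js. Every task starts straight away. Results are stored by position, so they come back in the order the tasks were listed rather than the order they finished. The final callback fires exactly once, either when the last task completes or on the first error. The `slowLookup` helper is hypothetical and simulates lookups that finish out of order.

```javascript
// Start every task at once; call back once with all results (in task order),
// or with the first error.
function parallel(tasks, finalCallback) {
  const results = new Array(tasks.length);
  let remaining = tasks.length;
  let finished = false;

  if (remaining === 0) return finalCallback(null, results);

  tasks.forEach(function (task, i) {
    task(function (err, result) {
      if (finished) return;
      if (err) {
        finished = true;
        return finalCallback(err);
      }
      results[i] = result;
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        finalCallback(null, results);
      }
    });
  });
}

// Hypothetical lookups that take different amounts of time.
function slowLookup(value, delayMs) {
  return function (cb) {
    setTimeout(function () { cb(null, value); }, delayMs);
  };
}

parallel([
  slowLookup('first', 30),
  slowLookup('second', 10),
  slowLookup('third', 20)
], function (err, results) {
  if (err) throw err;
  console.log(results); // [ 'first', 'second', 'third' ]
});
```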

Alternatively, it lets us push values onto [queues](https://github.com/caolan/async#queueworker-concurrency) with a set number of workers to work through them. This works without the dedicated worker process you would need in Rails when using [delayed_job](https://github.com/collectiveidea/delayed_job) or [resque](https://github.com/defunkt/resque).
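Here's a similarly minimal sketch of a worker queue with a concurrency limit. It is loosely modelled on the idea behind async.js's `queue(worker, concurrency)`, but it is not that library's implementation. `createQueue` and the `onDrain` callback are names made up for this example. At most `concurrency` jobs run at once and the rest wait their turn. `onDrain` fires when everything has been processed.

```javascript
// A tiny in-process work queue: at most `concurrency` jobs run at once.
function createQueue(worker, concurrency, onDrain) {
  const waiting = [];
  let running = 0;

  function startMore() {
    while (running < concurrency && waiting.length > 0) {
      const job = waiting.shift();
      running += 1;
      worker(job, function () {
        running -= 1;
        if (running === 0 && waiting.length === 0) {
          onDrain();
        } else {
          startMore();
        }
      });
    }
  }

  return {
    push: function (job) {
      waiting.push(job);
      startMore();
    }
  };
}

// Hypothetical worker: "processes" a job after a short delay,
// tracking how many jobs are in flight at the same time.
let inFlight = 0;
let maxInFlight = 0;
const processed = [];

function worker(job, done) {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  setTimeout(function () {
    inFlight -= 1;
    processed.push(job);
    done();
  }, 10);
}

const q = createQueue(worker, 2, function () {
  console.log('all done:', processed.length, 'jobs, max in flight:', maxInFlight);
});
[1, 2, 3, 4, 5].forEach(function (n) { q.push(n); });
```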

    #### Doesn't all this seem like a lot of work though? The ruby you showed me first was much shorter, and easier to read

    In a word, yes.

Node isn't a magic bullet. Although it's popular, if you're building a basic CRUD app there are often very good reasons to choose Rails or Django over Node and Express.

That being said, it pays to understand your options when choosing a technology to solve the problem in front of you. If you're a fan of behaviour-driven development, it's worth knowing that such an approach is possible. Also, once you've got your head around async programming, it's good to know that there are some well-developed tools to help you apply these techniques in both server-side and client-side JavaScript.

    If anything's not clear in this series, please let me know - I've sunk a good few hours into these posts now, to make it easier to understand async node development if you're used to sync ruby development, and I'd really like to know where I can improve these for future visitors.

## Part two

This is the second of a three-part series covering how to move from developing synchronously with Rails and rspec to developing asynchronously with Node and Mocha. It picks up from the previous post, which introduced Mocha syntax and asynchronous testing.

In the previous post, we covered how we'd implement a class with instance methods in Ruby. Here's the simplified code again, for comparison:

```ruby
require 'redis'

class User

  def initialize
    @db = Redis.new
  end

  # HGETALL returns the stored hash, with string keys
  def find_by_mac(mac)
    @db.hgetall(mac)
  end

end
```

    It looks a bit different when working with asynchronous node code.

    #### What implementation in node looks like

JavaScript doesn't have a class system, at least not before the `class` syntax of ES2015. So if we want something that acts a bit like a class, the idiomatic approach is to use a constructor function. We add a bit of boilerplate code to make it easier to identify in stack traces or logs while developing.

In JavaScript, instead of defining class methods or instance methods as we do in Ruby, the idiomatic approach is to add methods to `User.prototype`. That makes them available to every object created with `new User()`.
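Here's a quick sketch of what that means in practice. It uses a cut-down, hypothetical `User` that just stores a name. A method added to `User.prototype` is available on every instance, and all instances share the one function rather than each getting a copy.

```javascript
// A constructor function plays the role of a Ruby class.
function User(name) {
  this.name = name;
}

// Methods go on the prototype, so every instance shares one copy.
User.prototype.greeting = function () {
  return 'Hello, ' + this.name;
};

const chris = new User('Chris');
const alice = new User('Alice');

console.log(chris.greeting()); // Hello, Chris
console.log(alice.greeting()); // Hello, Alice
console.log(chris.greeting === alice.greeting); // true: one shared function
console.log(chris.hasOwnProperty('greeting'));  // false: it lives on the prototype
```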

The chunk of code below is roughly analogous to declaring a `User` class in Ruby, mixing in methods from an `EventEmitter` module, and giving it a `to_s` method. That method means a readable string comes back when you log or print it:


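```javascript
var events = require('events');
var util = require('util'); // older code used the deprecated `sys` alias

function User() {
  // let callers omit `new` and still get a User instance
  if (false === (this instanceof User)) {
    return new User();
  }
  events.EventEmitter.call(this);
}
util.inherits(User, events.EventEmitter);

// the JavaScript counterpart of Ruby's to_s
User.prototype.toString = function() {
  return "User";
};
```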

Note that `User.prototype.toString` is synchronous: we can rely on it returning a value immediately, without thinking about callbacks.

    #### Writing asynchronous functions

    However, when we're working with asynchronous functions, we need to know how to define them as well as call them.

Here's a simplified version of an asynchronous function in use in `User`. We define a function on the prototype of `User` that accepts two parameters:

- `device_mac`, a string we use as the key when fetching a hash from Redis.
- `callback`, the function to execute once Redis has given us that hash.

In line with convention, `callback` itself takes two parameters, `err` and `res`. `res` is the hash Redis gives us if all is well. `err` is what we get if something goes wrong while Redis is fetching it.

```javascript
User.prototype.findByDevice = function(device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    // pass both the error (if any) and the hash on to our caller
    callback(err, res);
  });
};
```

    #### Checking this against our test code

    It might be helpful to show these side by side, to put the implemented function on `User`, next to the function we're passing with our test to see what it is we're passing into `findByDevice`:

    Our implemented function:

```javascript
User.prototype.findByDevice = function(device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    callback(err, res);
  });
};
```

    The test:

```javascript
user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
  if (err) {
    done(err); // hand the error to Mocha so the test fails
  } else {
    res.username.should.be.ok
    res.username.should.equal('mrchrisadams')
    done()
  }
})
```

When we have the value from Redis, `callback(err, res)` executes the function we passed in from the test, the one holding our `res.username.should.be.ok`-style assertions.
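You can watch that hand-off happen without a Redis server. In the self-contained sketch below, a hypothetical in-memory `db` object stands in for the node-redis client. Its `hgetall` mimics the `(key, callback)` shape and replies on a later tick. `findByDevice` is the simplified version above, and the anonymous function the caller passes in is what `callback(err, res)` ends up running. The "returned" message prints first, because `findByDevice` returns before the fake Redis has replied.

```javascript
// Hypothetical in-memory stand-in for a node-redis client.
const db = {
  data: {
    '00:1e:c2:a4:d3:5e': { username: 'mrchrisadams', name: 'Chris Adams' }
  },
  hgetall: function (key, callback) {
    const hash = this.data[key] || null;
    // reply asynchronously, like a real network call
    setImmediate(function () { callback(null, hash); });
  }
};

function User() {}

// The simplified implementation: forward Redis's (err, res) to our caller.
User.prototype.findByDevice = function (device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    callback(err, res);
  });
};

// The caller's function is what `callback` refers to above.
const user = new User();
user.findByDevice('00:1e:c2:a4:d3:5e', function (err, res) {
  if (err) throw err;
  console.log('got user:', res.username); // got user: mrchrisadams
});
console.log('findByDevice returned; waiting for Redis...');
```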


    ### When you need to call async functions from async functions

Once you've got your head around passing functions for asynchronous code, you'll often find yourself working with several asynchronous functions whose order you need to control, so that data passes from one to the next and gives you the result you want. Here's the implementation of `findByDevice` I ended up using in the project I'm working on. We still pass in the `callback` function as our final parameter. To return the value we want, though, we nest a second call to `db.hgetall` inside the anonymous function we pass into the first call to `db.hgetall`. We then use `callback(err, user)` to execute the function passed into `findByDevice`, passing the results along to the code that originally called `user.findByDevice`.

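```javascript
User.prototype.findByDevice = function(device_mac, callback) {

  // first, fetch the device hash keyed on its MAC address
  db.hgetall(device_mac, function (err, device) {
    if (err) { return callback(err); }

    // no such device (hgetall may reply null or {} for a missing key), so no user
    if (!device || !device.hasOwnProperty('mac')) { return callback(null, null); }

    // then use the device's owner to fetch the user hash
    db.hgetall(device.owner, function (err, user) {
      if (err) { return callback(err); }
      callback(err, user);
    });
  });
};
```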

### Avoiding callback hell

    Even with just two asynchronous function calls, this isn't very readable, and it seems that nearly every second developer on the planet playing with node has written their own callback handling library to make this easier to read and more maintainable.

In fact, there's a bewildering number of libraries out there that claim to make this problem much easier to understand. In the next post, I'll introduce `async.js`, a well-documented library I've found fairly straightforward to work with, to help mitigate callback hell.