Skip to content

Instantly share code, notes, and snippets.

@albertoperdomo
Created November 21, 2011 16:23
Show Gist options
  • Select an option

  • Save albertoperdomo/1383129 to your computer and use it in GitHub Desktop.

Select an option

Save albertoperdomo/1383129 to your computer and use it in GitHub Desktop.

Revisions

  1. Alberto Perdomo created this gist Nov 21, 2011.
    131 changes: 131 additions & 0 deletions mongify.txt
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,131 @@
    This is a case study of how we would model parts of the spotify app in MongoDB.

    MODEL
    =====

    users
    -----
    username
    name
    plain_txt_password
    plan
    list_ids: [id, id, id]
    friend_ids: [id, id, id]

    tracks
    ------
    id
    title
    album_id
    artist_ids [1,2,8]

    albums
    ------
    id
    title
    artist_id
    published_at
    produced_by
    track_count
    awards:[{id: new ObjectId(), title: "best pitbull featuring version", year: 2010, awarded_by: "MTV"}]

    artists
    -------
    id
    name
    bio
    picture: [{path: ..., filename: ..., version: "thumb", ...}]
    active: true
    overdose: true

    lists
    -----
    id
    user_id
    title
    public: true/false
    track_ids: [id, id, id, id]
    subscriber_count

    plays (this collection should be hosted on a different host & database, because it will have a lot of frequent high volume writes)
    -----
    user_id
    track_id
    played_at
    duration

    SAMPLE QUERIES
    ==============

    Your n favorite songs (the ones with the most plays by you)
    -----------------------------------------------------------

    One possibility is to use group but it has some limitations. We cannot sort by number of plays or limit the number of items we want to retrieve.

    Group query:

    db.plays.group({ query: { user_id: xxx }, key: 'track_id', reduce: function(doc, prev) { prev.n++; if (doc.played_at > prev.last_played_at) { prev.last_played_at = doc.played_at } }, initial: { n: 0, last_played_at: 0 })

    result:

    [ { _id: ..., value: { n: ..., last_played_at: ...}

    map reduce query
    ----------------

    db.plays.mapReduce(
    function () { emit( this.track_id, { n: 1, t: this.played_at }) },
    function (key, values) {


    //// Prettier implementation
    var n_res = 0;
    var max_t = 0;
    values.forEach(function(v){
    n_res+=v.n;
    if(v.t > max_t) max_t = v.t;
    })
    return {n: n_res, t: max_t};


    ///// Implementation with better performance
    var ac = values[0];
    for(var i = 1; i < values.length; i++) {
    ac.n += values[i].n;
    var v_t = values[i].t;
    if(v_t > ac.t) ac.t = v_t;
    }
    return ac;
    },
    out: "fooo"
    query: { user_id: xxx })

    db.foo.find().sort(..)

    How this map reduce works:

    Sample data:
    track_id played_at
    --------------
    1 100
    2 101
    1 102
    2 100
    1 200

    emit(1, { n: 1, t: 100 })
    emit(2, { n: 1, t: 101 })
    emit(1, { n: 1, t: 102 })
    emit(2, { n: 1, t: 100 })
    emit(1, { n: 1, t: 200 })


    reduce(1, [ { n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 } ])
    => { n: 3, t: 200 }

    reduce(2, [{ n: 1, t: 100 }, { n: 1, t: 101 }])
    => { n: 2, t: 101 }

    ...

    Ideally this query should run in background and the results (e.g. your 10 most played songs) stored in the user's document for fast retrieval
    34 changes: 34 additions & 0 deletions posts_and_comments.txt
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,34 @@
    Option A: comments are referencing a blogpost, like typically in ActiveRecord
    =============================================================================

    posts
    -----
    id
    title
    body
    author_id -> user

    comments
    --------
    id
    text
    author_id -> user
    post_id -> post

    Option B: comments are embedded in the blogpost document
    ========================================================

    posts
    -----
    id
    title
    body
    author_id -> user
    comments: [{_id: ...., text: ..., author: ...., approved: true}, {...}]


    An ID attribute is provided for the comment so that they can be handled invidually, as if they would be a separate entity, e.g. for approval process: Search blogpost by comment ID and set approved attribute.

    posts.find({"comments._id": xxx})

    It's also possible to define in the query params which parts of the document we do not want to retrieve, so we reduce the overhead when we are only interested in certain parts, e.g. the comment itself.