albertoperdomo · November 21, 2011 16:23 · Nov 21, 2011
diff --git a/mongify.txt b/mongify.txt
@@ -0,0 +1,131 @@
+This is a case study of how we would model parts of the spotify app in MongoDB.
+
+MODEL
+=====
+
+users
+-----
+username
+name
+plain_txt_password
+plan
+list_ids: [id, id, id]
+friend_ids: [id, id, id]
+
+tracks
+------
+id
+title
+album_id
+artist_ids [1,2,8]
+
+albums
+------
+id
+title
+artist_id
+published_at
+produced_by
+track_count
+awards:[{id: new ObjectId(), title: "best pitbull featuring version", year: 2010, awarded_by: "MTV"}]
+
+artists
+-------
+id
+name
+bio
+picture: [{path: ..., filename: ..., version: "thumb", ...}]
+active: true
+overdose: true
+
+lists
+-----
+id
+user_id
+title
+public: true/false
+track_ids: [id, id, id, id]
+subscriber_count
+
+plays (this collection should be hosted on a different host & database, because it will have a lot of frequent high volume writes)
+-----
+user_id
+track_id
+played_at
+duration
+
+SAMPLE QUERIES
+==============
+
+Your n favorite songs (the ones with the most plays by you)
+-----------------------------------------------------------
+
+One possibility is to use group but it has some limitations. We cannot sort by number of plays or limit the number of items we want to retrieve.
+
+Group query:
+
+db.plays.group({ query: { user_id: xxx }, key: 'track_id', reduce: function(doc, prev) { prev.n++; if (doc.played_at > prev.last_played_at) { prev.last_played_at = doc.played_at } }, initial: { n: 0, last_played_at: 0 })
+
+result:
+
+[ { _id: ..., value: { n: ...,  last_played_at: ...}
+
+map reduce query
+----------------
+
+db.plays.mapReduce(
+		function () { emit( this.track_id, { n: 1, t: this.played_at }) },
+		function (key, values) {
+
+
+			//// Prettier implementation
+			var n_res = 0;
+			var max_t = 0;
+			values.forEach(function(v){
+				n_res+=v.n;
+				if(v.t > max_t) max_t = v.t;
+			})
+			return {n: n_res, t: max_t};
+
+
+			///// Implementation with better performance
+			var ac = values[0];
+			for(var i = 1; i < values.length; i++) {
+				ac.n += values[i].n;
+				var v_t = values[i].t;
+				if(v_t > ac.t) ac.t = v_t;
+			}
+			return ac;
+		},
+			out: "fooo"
+		 query: { user_id: xxx })
+
+		db.foo.find().sort(..)
+
+How this map reduce works:
+
+Sample data:
+track_id played_at
+--------------
+1 100
+2 101
+1 102
+2 100
+1 200
+
+emit(1, { n: 1, t: 100 })
+emit(2, { n: 1, t: 101 })
+emit(1, { n: 1, t: 102 })
+emit(2, { n: 1, t: 100 })
+emit(1, { n: 1, t: 200 })
+
+
+reduce(1, [ { n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 } ])
+	=>				  { n: 3, t: 200 }
+
+reduce(2, [{ n: 1, t: 100 }, { n: 1, t: 101 }])
+  =>				  { n: 2, t: 101 }
+
+...
+
+Ideally this query should run in background and the results (e.g. your 10 most played songs) stored in the user's document for fast retrieval
diff --git a/posts_and_comments.txt b/posts_and_comments.txt
@@ -0,0 +1,34 @@
+Option A: comments are referencing a blogpost, like typically in ActiveRecord
+=============================================================================
+
+posts
+-----
+id
+title
+body
+author_id -> user
+
+comments
+--------
+id
+text
+author_id -> user
+post_id -> post
+
+Option B: comments are embedded in the blogpost document
+========================================================
+
+posts
+-----
+id
+title
+body
+author_id -> user
+comments: [{_id: ...., text: ..., author: ...., approved: true}, {...}]
+
+
+An ID attribute is provided for the comment so that they can be handled invidually, as if they would be a separate entity, e.g. for approval process: Search blogpost by comment ID and set approved attribute.
+
+posts.find({"comments._id": xxx})
+
+It's also possible to define in the query params which parts of the document we do not want to retrieve, so we reduce the overhead when we are only interested in certain parts, e.g. the comment itself.
No results found