@foxish
Forked from bprashanth/mongo.md
Last active August 9, 2016 09:07

Revisions

  1. foxish revised this gist Aug 9, 2016. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions mongo.md
    @@ -55,8 +55,8 @@ WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    OpLog size: depends on storage engine, 3 types: in-memory, wiredTiger, mmapv1.

    * mmapv1 is default and preferred, due to maturity.
    * wiredTiger known to have some issues in the past.
    * mmapv1 was default and preferred, due to maturity.
    * wiredTiger known to have some issues in the past but is default since 3.2.


    ## Failover
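
Which engine a running member actually uses can be checked from the shell; a minimal sketch, assuming a 3.0+ mongod (older versions don't report the field):

```
// Prints the active engine, e.g. { "name" : "wiredTiger", ... }.
// mongod can also be started with an explicit engine via --storageEngine.
db.serverStatus().storageEngine
```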
  2. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -1,4 +1,4 @@
    MongoDB is a document database that supports range and field queries (https://github.com/foxish/docker-mongodb)
    MongoDB is a document database that supports range and field queries (https://github.com/foxish/docker-mongodb/tree/master/kubernetes)

    # Concepts

  3. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -1,4 +1,4 @@
    MongoDB is a document database that supports range and field queries.
    MongoDB is a document database that supports range and field queries (https://github.com/foxish/docker-mongodb)

    # Concepts

  4. foxish revised this gist Aug 9, 2016. 1 changed file with 28 additions and 11 deletions.
    39 changes: 28 additions & 11 deletions mongo.md
    @@ -12,8 +12,6 @@ Arbiter: used to achieve majority vote with even members, do not hold data, don'

    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.

    ## Configuration
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    @@ -24,34 +22,53 @@ Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes ar
    * w: n; ack from n voting members.
    * w: `<tag set>`; ack from members having a particular tag.

    Roles:
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.
    Read concern: local/majority. Local means read from primary, majority might read from secondaries.

    ## Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: Typically hidden, records master copies with a delay to avoid e.g. human error.

    ## Initial Deployment
    A simple mongodb replicaset with three members. We start with an image that enables replica sets on the instance by supplying the right command-line flags. This becomes the image that we supply to our petset with 3 replicas.

    * After the pods are created, we pick any one pod and execute `rs.initiate()` after connecting to its mongo instance. That node becomes the primary. Then `rs.add()` the other two pods using their cluster domain names.
    * For example:

    ```
    rs.add("mongodb-1.mongodb.default.svc.cluster.local")
    rs.add("mongodb-2.mongodb.default.svc.cluster.local")
    ```

    ## Scaling

    * Automatic failover works with petsets out of the box.
    * Adding new nodes involves finding the PRIMARY and running the corresponding `rs.add(...)` commands on it.
    * Reading from slaves requires execution of `rs.slaveOk()` on connections to slaves.

    ## Fault tolerance
    Number of members that can become unavailable and the cluster can still elect primary. 50 members, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.


    OpLog size: depends on storage engine, 3 types: in-memory, wiredTiger, mmapv1.

    * mmapv1 is default and preferred, due to maturity.
    * wiredTiger known to have some issues in the past.

    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empty datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empty datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync. This could be done using snapshots.

    ## Changing hostnames
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.

    * Change hostnames of secondary members, remove the old hostname and add the new hostname to the replicaset.
    * Stop all members, reconfigure offline using same datadir.


    ## 2 problems
    rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propogates to a healthy reachable secondary, because it will become master.
    rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
    false elections
    Rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propagates to a healthy reachable secondary, because it will become master.

    Rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
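
The initiate-then-add flow in this revision can also be collapsed into a single call by passing a full config to `rs.initiate()`; a minimal sketch, assuming the set is named `rs0`, the initiating pod is `mongodb-0`, and the other two pods use the cluster domain names shown above:

```
// Run once on any one pod's mongo shell; listing all members here
// replaces the separate rs.add() calls.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-0.mongodb.default.svc.cluster.local:27017" },
    { _id: 1, host: "mongodb-1.mongodb.default.svc.cluster.local:27017" },
    { _id: 2, host: "mongodb-2.mongodb.default.svc.cluster.local:27017" }
  ]
})
// Reading from a secondary afterwards needs an explicit opt-in per connection:
rs.slaveOk()
```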
  5. foxish revised this gist Aug 9, 2016. 1 changed file with 17 additions and 8 deletions.
    25 changes: 17 additions & 8 deletions mongo.md
    @@ -1,5 +1,7 @@
    MongoDB is a document database that supports range and field queries.

    # Concepts

    ## Replication

    A single server can run either standalone or as part of a replica set.
    @@ -13,23 +15,30 @@ Failover: If a primary doesn't communicate with the others for > 10s, secondarie
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.

    ## Configuration
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: `<tag set>`; ack from members having a particular tag.

    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: Typically hidden, records master copies with a delay to avoid e.g. human error.

    ## Initial Deployment




    ## Fault tolerance
    Number of members that can become unavailable and the cluster can still elect primary. 50 members, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: `<tag set>`; ack from members having a particular tag.


    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.
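
The `w`/`j`/`wtimeout` fields compose per operation; a sketch against a hypothetical `events` collection, assuming MongoDB 3.2+ for the read-concern helper:

```
// Ack from a majority of voting members, journaled, or fail after 5s.
db.events.insert(
  { ts: new Date(), msg: "hello" },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
// Read concern is also chosen per query; "majority" only returns data
// acknowledged by a majority of the set.
db.events.find().readConcern("majority")
```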
  6. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -29,7 +29,7 @@ Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes ar
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: <tag set>; ack from members having a particular tag.
    * w: `<tag set>`; ack from members having a particular tag.

    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.
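
A `<tag set>` write concern points at a custom mode defined in the replica set settings; a sketch with a hypothetical `multiDC` mode that demands acks from members in two distinct `dc` tag values:

```
var cfg = rs.conf()
cfg.members[0].tags = { dc: "east" }
cfg.members[1].tags = { dc: "west" }
cfg.settings = cfg.settings || {}  // guard for configs without a settings subdocument
// "multiDC" is satisfied once members carrying 2 different dc values ack.
cfg.settings.getLastErrorModes = { multiDC: { dc: 2 } }
rs.reconfig(cfg)
// Usable afterwards as: { writeConcern: { w: "multiDC" } }
```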
  7. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -23,7 +23,7 @@ Number of members that can become unavailable and the cluster can still elect pr
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
    Write concern: { w: <value>, j: <boolean>, wtimeout: <number> }; How writes are acknowledged in the system.
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
  8. foxish revised this gist Aug 9, 2016. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion mongo.md
    @@ -23,7 +23,14 @@ Number of members that can become unavailable and the cluster can still elect pr
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
    Write concern: requests ack only from primary, overwrite per write operation to specify number of secondaries.
    Write concern: { w: <value>, j: <boolean>, wtimeout: <number> }; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: <tag set>; ack from members having a particular tag.

    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

  9. foxish revised this gist Aug 9, 2016. 1 changed file with 6 additions and 4 deletions.
    10 changes: 6 additions & 4 deletions mongo.md
    @@ -10,14 +10,16 @@ Arbiter: used to achieve majority vote with even members, do not hold data, don'

    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.

    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set, used to break ties.
    * Priority: Priority 0 members cannot trigger elections, cannot become primary. Can service reads and vote.
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid e.g. human error.
    * Delayed: Typically hidden, records master copies with a delay to avoid e.g. human error.

    ## Fault tolerance
    Number of memebers that can become unavailable and the cluster can still elect primary. 50 memebers, 7 voting members => 46 can go down (but only 3 of the voting members).
    Number of members that can become unavailable and the cluster can still elect primary. 50 members, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
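
All of these roles are plain member options in the replica set config; a sketch (the member indexes and arbiter hostname below are hypothetical):

```
var cfg = rs.conf()
cfg.members[1].priority = 0      // keeps voting, never becomes primary
cfg.members[2].priority = 0
cfg.members[2].hidden = true     // replicates data but is invisible to clients
cfg.members[2].slaveDelay = 3600 // delayed member: trails the primary by 1 hour
rs.reconfig(cfg)
// Arbiters have their own helper: they vote but hold no data.
rs.addArb("mongodb-arbiter.mongodb.default.svc.cluster.local:27017")
```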
  10. foxish revised this gist Aug 8, 2016. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions mongo.md
    @@ -11,7 +11,7 @@ Arbiter: used to achieve majority vote with even members, do not hold data, don'
    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set, used to break ties.
    * Priority: Priority 0 members cannot trigger elections, cannot become primary. Can service reads and vote.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid eg: human error.
    @@ -26,7 +26,7 @@ Read concern: local/majority. Local means read from primary, majority might read
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empty datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
  11. @bprashanth bprashanth revised this gist Aug 4, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -28,7 +28,7 @@ OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.
    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames:
    ## Changing hostnames
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.

  12. @bprashanth bprashanth revised this gist Aug 4, 2016. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions mongo.md
    @@ -16,24 +16,24 @@ Roles:
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid e.g. human error.

    ## Fault tolerance:
    ## Fault tolerance
    Number of memebers that can become unavailable and the cluster can still elect primary. 50 memebers, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration:
    ## Configuration
    Write concern: requests ack only from primary, overwrite per write operation to specify number of secondaries.
    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover:
    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames:
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.


    ## 2 problems:
    ## 2 problems
    rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propogates to a healthy reachable secondary, because it will become master.
    rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
    false elections
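
"Ask master to step down" is a single call on the primary; a minimal sketch of forcing and then observing an election:

```
// Refuse to seek re-election for 120s; the primary closes client
// sockets, which is the "Connection reset by peer" behavior above.
rs.stepDown(120)
// From any member, rs.status() shows who won the ensuing election.
rs.status().members.filter(function (m) { return m.stateStr === "PRIMARY" })
```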
  13. @bprashanth bprashanth revised this gist Aug 4, 2016. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions mongo.md
    @@ -1,5 +1,4 @@

    This example runs mongodb through a petset.
    MongoDB is a document database that supports range and field queries.

    ## Replication

  14. @bprashanth bprashanth created this gist Aug 4, 2016.
    40 changes: 40 additions & 0 deletions mongo.md
    @@ -0,0 +1,40 @@

    This example runs mongodb through a petset.

    ## Replication

    A single server can run either standalone or as part of a replica set.
    A "replica set" is set of mongod instances with 1 primary.
    Primary: receives writes, services reads. Can step down and become secondary.
    Secondary: replicate the primary's oplog. If the primary goes down, secondaries will hold an election.
    Arbiter: used to achieve majority vote with even members, do not hold data, don't need dedicated nodes. Never becomes primary.

    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Priority: Priority 0 members cannot trigger elections, cannot become primary. Can service reads and vote.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid e.g. human error.

    ## Fault tolerance:
    Number of memebers that can become unavailable and the cluster can still elect primary. 50 memebers, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration:
    Write concern: requests ack only from primary, overwrite per write operation to specify number of secondaries.
    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover:
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames:
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.


    ## 2 problems:
    rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propogates to a healthy reachable secondary, because it will become master.
    rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
    false elections
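
Option 1 (the rolling hostname change) maps onto one reconfig per member; a sketch, with hypothetical new hostnames:

```
// For each secondary in turn: point its config entry at the new name.
var cfg = rs.conf()
cfg.members[1].host = "new-mongodb-1.example.com:27017"
rs.reconfig(cfg)
// Once every secondary has caught up, demote the primary and repeat for it.
rs.stepDown(60)
```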