@foxish
Forked from bprashanth/mongo.md
Last active August 9, 2016 09:07

Revisions

  1. foxish revised this gist Aug 9, 2016. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions mongo.md
    @@ -55,8 +55,8 @@ WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    OpLog size: depends on storage engine, 3 types: in-memory, wiredTiger, mmapv1.

    * mmapv1 is default and preferred, due to maturity.
    * wiredTiger known to have some issues in the past.
    * mmapv1 was default and preferred, due to maturity.
    * wiredTiger known to have some issues in the past but is default since 3.2.


    ## Failover
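
Which engine a running member actually uses can be checked from the shell; a minimal sketch, assuming a 3.0+ mongod (older versions don't report the field):

```
// Prints the active engine, e.g. { "name" : "wiredTiger", ... }.
// mongod can also be started with an explicit engine via --storageEngine.
db.serverStatus().storageEngine
```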
  2. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -1,4 +1,4 @@
    MongoDB is a document database that supports range and field queries (https://github.com/foxish/docker-mongodb)
    MongoDB is a document database that supports range and field queries (https://github.com/foxish/docker-mongodb/tree/master/kubernetes)

    # Concepts

  3. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -1,4 +1,4 @@
    MongoDB is a document database that supports range and field queries.
    MongoDB is a document database that supports range and field queries (https://github.com/foxish/docker-mongodb)

    # Concepts

  4. foxish revised this gist Aug 9, 2016. 1 changed file with 28 additions and 11 deletions.
    39 changes: 28 additions & 11 deletions mongo.md
    @@ -12,8 +12,6 @@ Arbiter: used to achieve majority vote with even members, do not hold data, don'

    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.

    ## Configuration
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    @@ -24,34 +22,53 @@ Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes ar
    * w: n; ack from n voting members.
    * w: `<tag set>`; ack from members having a particular tag.

    Roles:
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.
    Read concern: local/majority. Local means read from primary, majority might read from secondaries.

    ## Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: Typically hidden, records master copies with a delay to avoid e.g. human error.

    ## Initial Deployment
    A simple mongodb replicaset with three members. We start with an image that enables replica sets on the instance by supplying the right command-line flags. This becomes the image that we supply to our petset with 3 replicas.

    * After the pods are created, we pick any one pod and execute `rs.initiate()` after connecting to its mongo instance. That node becomes the primary. Then `rs.add()` the other two pods using their cluster domain names.
    * For example:

    ```
    rs.add("mongodb-1.mongodb.default.svc.cluster.local")
    rs.add("mongodb-2.mongodb.default.svc.cluster.local")
    ```

    ## Scaling

    * Automatic failover works with petsets out of the box.
    * Adding new nodes involves finding the PRIMARY and running the corresponding `rs.add(...)` commands on it.
    * Reading from slaves requires execution of `rs.slaveOk()` on connections to slaves.

    ## Fault tolerance
    Number of members that can become unavailable and the cluster can still elect primary. 50 members, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.


    OpLog size: depends on storage engine, 3 types: in-memory, wiredTiger, mmapv1.

    * mmapv1 is default and preferred, due to maturity.
    * wiredTiger known to have some issues in the past.

    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empty datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empty datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync. This could be done using snapshots.

    ## Changing hostnames
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.

    * Change hostnames of secondary members, remove the old hostname and add the new hostname to the replicaset.
    * Stop all members, reconfigure offline using same datadir.


    ## 2 problems
    rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propogates to a healthy reachable secondary, because it will become master.
    rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
    false elections
    Rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propagates to a healthy reachable secondary, because it will become master.

    Rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
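
The initiate-then-add flow in this revision can also be collapsed into a single call by passing a full config to `rs.initiate()`; a minimal sketch, assuming the set is named `rs0`, the initiating pod is `mongodb-0`, and the other two pods use the cluster domain names shown above:

```
// Run once on any one pod's mongo shell; listing all members here
// replaces the separate rs.add() calls.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-0.mongodb.default.svc.cluster.local:27017" },
    { _id: 1, host: "mongodb-1.mongodb.default.svc.cluster.local:27017" },
    { _id: 2, host: "mongodb-2.mongodb.default.svc.cluster.local:27017" }
  ]
})
// Reading from a secondary afterwards needs an explicit opt-in per connection:
rs.slaveOk()
```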
  5. foxish revised this gist Aug 9, 2016. 1 changed file with 17 additions and 8 deletions.
    25 changes: 17 additions & 8 deletions mongo.md
    @@ -1,5 +1,7 @@
    MongoDB is a document database that supports range and field queries.

    # Concepts

    ## Replication

    A single server can run either standalone or as part of a replica set.
    @@ -13,23 +15,30 @@ Failover: If a primary doesn't communicate with the others for > 10s, secondarie
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.

    ## Configuration
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: `<tag set>`; ack from members having a particular tag.

    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: Typically hidden, records master copies with a delay to avoid e.g. human error.

    ## Initial Deployment




    ## Fault tolerance
    Number of members that can become unavailable and the cluster can still elect primary. 50 members, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: `<tag set>`; ack from members having a particular tag.


    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.
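
The `w`/`j`/`wtimeout` fields compose per operation; a sketch against a hypothetical `events` collection, assuming MongoDB 3.2+ for the read-concern helper:

```
// Ack from a majority of voting members, journaled, or fail after 5s.
db.events.insert(
  { ts: new Date(), msg: "hello" },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
// Read concern is also chosen per query; "majority" only returns data
// acknowledged by a majority of the set.
db.events.find().readConcern("majority")
```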
  6. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -29,7 +29,7 @@ Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes ar
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: <tag set>; ack from members having a particular tag.
    * w: `<tag set>`; ack from members having a particular tag.

    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.
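
A `<tag set>` write concern points at a custom mode defined in the replica set settings; a sketch with a hypothetical `multiDC` mode that demands acks from members in two distinct `dc` tag values:

```
var cfg = rs.conf()
cfg.members[0].tags = { dc: "east" }
cfg.members[1].tags = { dc: "west" }
cfg.settings = cfg.settings || {}  // guard for configs without a settings subdocument
// "multiDC" is satisfied once members carrying 2 different dc values ack.
cfg.settings.getLastErrorModes = { multiDC: { dc: 2 } }
rs.reconfig(cfg)
// Usable afterwards as: { writeConcern: { w: "multiDC" } }
```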
  7. foxish revised this gist Aug 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -23,7 +23,7 @@ Number of members that can become unavailable and the cluster can still elect pr
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
    Write concern: { w: <value>, j: <boolean>, wtimeout: <number> }; How writes are acknowledged in the system.
    Write concern: `{ w: <value>, j: <boolean>, wtimeout: <number> }`; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
  8. foxish revised this gist Aug 9, 2016. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion mongo.md
    @@ -23,7 +23,14 @@ Number of members that can become unavailable and the cluster can still elect pr
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
    Write concern: requests ack only from primary, overwrite per write operation to specify number of secondaries.
    Write concern: { w: <value>, j: <boolean>, wtimeout: <number> }; How writes are acknowledged in the system.
    * wtimeout is how long to wait for ack.
    * w: 0; no ack of write.
    * w: 1; ack when write has propagated to primary. (default)
    * w: majority; there needs to be an ack from a majority of voting nodes
    * w: n; ack from n voting members.
    * w: <tag set>; ack from members having a particular tag.

    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

  9. foxish revised this gist Aug 9, 2016. 1 changed file with 6 additions and 4 deletions.
    10 changes: 6 additions & 4 deletions mongo.md
    @@ -10,14 +10,16 @@ Arbiter: used to achieve majority vote with even members, do not hold data, don'

    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Priority values are assigned to each node as floating point numbers between 0 and 1000.
    Priority 0 members cannot become primary. Higher-priority members are more likely to call elections, and are more likely to win.

    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set, used to break ties.
    * Priority: Priority 0 members cannot trigger elections, cannot become primary. Can service reads and vote.
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid e.g. human error.
    * Delayed: Typically hidden, records master copies with a delay to avoid e.g. human error.

    ## Fault tolerance
    Number of memebers that can become unavailable and the cluster can still elect primary. 50 memebers, 7 voting members => 46 can go down (but only 3 of the voting members).
    Number of members that can become unavailable and the cluster can still elect primary. 50 members, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration
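
All of these roles are plain member options in the replica set config; a sketch (the member indexes and arbiter hostname below are hypothetical):

```
var cfg = rs.conf()
cfg.members[1].priority = 0      // keeps voting, never becomes primary
cfg.members[2].priority = 0
cfg.members[2].hidden = true     // replicates data but is invisible to clients
cfg.members[2].slaveDelay = 3600 // delayed member: trails the primary by 1 hour
rs.reconfig(cfg)
// Arbiters have their own helper: they vote but hold no data.
rs.addArb("mongodb-arbiter.mongodb.default.svc.cluster.local:27017")
```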
  10. foxish revised this gist Aug 8, 2016. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions mongo.md
    @@ -11,7 +11,7 @@ Arbiter: used to achieve majority vote with even members, do not hold data, don'
    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set, used to break ties.
    * Priority: Priority 0 members cannot trigger elections, cannot become primary. Can service reads and vote.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid eg: human error.
    @@ -26,7 +26,7 @@ Read concern: local/majority. Local means read from primary, majority might read
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empty datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
  11. @bprashanth bprashanth revised this gist Aug 4, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion mongo.md
    @@ -28,7 +28,7 @@ OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.
    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames:
    ## Changing hostnames
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.

  12. @bprashanth bprashanth revised this gist Aug 4, 2016. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions mongo.md
    @@ -16,24 +16,24 @@ Roles:
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid e.g. human error.

    ## Fault tolerance:
    ## Fault tolerance
    Number of memebers that can become unavailable and the cluster can still elect primary. 50 memebers, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration:
    ## Configuration
    Write concern: requests ack only from primary, overwrite per write operation to specify number of secondaries.
    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover:
    ## Failover
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames:
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.


    ## 2 problems:
    ## 2 problems
    rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propogates to a healthy reachable secondary, because it will become master.
    rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
    false elections
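
"Ask master to step down" is a single call on the primary; a minimal sketch of forcing and then observing an election:

```
// Refuse to seek re-election for 120s; the primary closes client
// sockets, which is the "Connection reset by peer" behavior above.
rs.stepDown(120)
// From any member, rs.status() shows who won the ensuing election.
rs.status().members.filter(function (m) { return m.stateStr === "PRIMARY" })
```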
  13. @bprashanth bprashanth revised this gist Aug 4, 2016. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions mongo.md
    @@ -1,5 +1,4 @@

    This example runs mongodb through a petset.
    MongoDB is a document database that supports range and field queries.

    ## Replication

  14. @bprashanth bprashanth created this gist Aug 4, 2016.
    40 changes: 40 additions & 0 deletions mongo.md
    @@ -0,0 +1,40 @@

    This example runs mongodb through a petset.

    ## Replication

    A single server can run either standalone or as part of a replica set.
    A "replica set" is set of mongod instances with 1 primary.
    Primary: receives writes, services reads. Can step down and become secondary.
    Secondary: replicate the primary's oplog. If the primary goes down, secondaries will hold an election.
    Arbiter: used to achieve majority vote with even members, do not hold data, don't need dedicated nodes. Never becomes primary.

    Replication is asynchronous.
    Failover: If a primary doesn't communicate with the others for > 10s, secondaries conduct election.
    Roles:
    * Arbiter: Only votes, holds no data. Don't deploy more than 1 per replica set.
    * Priority: Priority 0 members cannot trigger elections, cannot become primary. Can service reads and vote.
    * Hidden: just like priority 0 but cannot service reads, only vote. Does maintain a copy of master data.
    * Delayed: just like hidden but records master copies with a delay to avoid e.g. human error.

    ## Fault tolerance:
    Number of memebers that can become unavailable and the cluster can still elect primary. 50 memebers, 7 voting members => 46 can go down (but only 3 of the voting members).
    WAN deployment: 1 member per DC in 3 DCs, can tolerate a single DC going down.

    ## Configuration:
    Write concern: requests ack only from primary, overwrite per write operation to specify number of secondaries.
    Read concern: local/majority. Local means read from primary, majority might read from secondaries.
    OpLog size: depends on storage engine, 3 types: in-memory, wired tiger, mmapv1.

    ## Failover:
    New members or secondaries that fall behind too far must resync everything. Starting mongo with an empyt datadir will force an initial sync. Starting it with a copy of a recent datadir from another member in the set will also hasten the initial sync.

    ## Changing hostnames:
    1. Change hostnames of all secondaries, wait till they catch up, ask master to step down, bounce clients.
    2. Stop all members, reconfigure offline using same datadir but different port (so clients can't connect), write revised db config, start new hostnames normal way.


    ## 2 problems:
    rollbacks - network partition, secondary can't keep up with primary, primary goes down, stale secondary becomes master, master rejoins as primary -- master needs to rollback writes it accepted. Such a rollback will not happen if the write propogates to a healthy reachable secondary, because it will become master.
    rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) till one of the secondaries becomes available.
    false elections
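
Option 1 (the rolling hostname change) maps onto one reconfig per member; a sketch, with hypothetical new hostnames:

```
// For each secondary in turn: point its config entry at the new name.
var cfg = rs.conf()
cfg.members[1].host = "new-mongodb-1.example.com:27017"
rs.reconfig(cfg)
// Once every secondary has caught up, demote the primary and repeat for it.
rs.stepDown(60)
```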