Last active
April 27, 2020 11:46
-
-
Save pbailis/5660980 to your computer and use it in GitHub Desktop.
Revisions
-
pbailis revised this gist
May 28, 2013 . 1 changed file with 2 additions and 0 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -6,6 +6,8 @@ Notes from courses like Lorenzo Alvisi's [Distributed Computing](http://www.cs.u There are a bunch of classics on [causality](http://www.cs.princeton.edu/courses/archive/fall11/cos518/papers/time-clocks.pdf), [Paxos](http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf) (and more [practical takes on Paxos](http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf)), and [distributed snapshots](http://ece842.com/S13/readings/chandy85.pdf). **Edit: aside from these below, Alex Feinberg and Henry Robinson's lists at [this Quora post](http://www.quora.com/What-are-the-seminal-papers-in-distributed-systems-Why) contain a bunch of good practically-oriented but theoretically grounded papers.** Practical databases: * **Consistency in Partitioned Networks** [PDF](http://delab.csd.auth.gr/~dimitris/courses/mpc_fall05/papers/invalidation/acm_csur85_partitioned_network_consistency.pdf) [ACM](http://dl.acm.org/citation.cfm?id=5508) A nice, practical discussion of techniques database systems can employ to ensure consistency under partitions. This survey predates CAP by several decades but is well-written and summarizes several important ideas. -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -12,7 +12,7 @@ Practical databases: * **Megastore: Providing Scalable, Highly Available Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/papers/jbaker-megastore.pdf) Megastore gives a reasonable example of a Paxos-based database architecture. * **Consistency Tradeoffs in Modern Distributed Database System Design** [PDF](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf) [IEEE](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6127847) is a great paper from Daniel Abadi reminding us that, aside from behavior during failures, highly available ("AP") systems also achieve low latency. * There are many remnants of the **Bayou** project in many "AP" systems today. The project was aimed at disconnected operation in a proto-smartphone/mobile computing era; a good overview is [The Bayou Architecture: Support for Data Sharing among Mobile Users](http://www.dpi.ufv.br/~nacif/cmovel/bayou.pdf). Also good is [Managing update conflicts in Bayou, a weakly connected replicated storage system](http://www.cs.princeton.edu/courses/archive/fall11/cos518/papers/bayou.pdf). Definitely a more practically oriented paper. **Optimistic Replication** [PDF](http://www.ysaito.com/survey.pdf) [ACM](http://dl.acm.org/citation.cfm?id=1057980) is a great survey of similar techniques. More formal stuff: -
pbailis revised this gist
May 28, 2013 . 1 changed file with 3 additions and 3 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -11,11 +11,11 @@ Practical databases: * **Consistency in Partitioned Networks** [PDF](http://delab.csd.auth.gr/~dimitris/courses/mpc_fall05/papers/invalidation/acm_csur85_partitioned_network_consistency.pdf) [ACM](http://dl.acm.org/citation.cfm?id=5508) A nice, practical discussion of techniques database systems can employ to ensure consistency under partitions. This survey predates CAP by several decades but is well-written and summarizes several important ideas. * **Megastore: Providing Scalable, Highly Available Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/papers/jbaker-megastore.pdf) Megastore gives a reasonable example of a Paxos-based database architecture. * **Consistency Tradeoffs in Modern Distributed Database System Design** [PDF](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf) [IEEE](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6127847) is a great paper from Daniel Abadi reminding us that, aside from behavior during failures, highly available ("AP") systems also achieve low latency. * There are many remnants of the **Bayou** project in many "AP" systems today. The project was aimed at disconnected operation in a proto-smartphone/mobile computing era; a good overview is [The Bayou Architecture: Support for Data Sharing among Mobile Users](http://www.dpi.ufv.br/~nacif/cmovel/bayou.pdf). Also good is [Managing update conflicts in Bayou, a weakly connected replicated storage system](http://www.cs.princeton.edu/courses/archive/fall11/cos518/papers/bayou.pdf). Definitely a more practically oriented paper. *Optimistic Replication* [PDF](http://www.ysaito.com/survey.pdf) [ACM](http://dl.acm.org/citation.cfm?id=1057980) is a great survey of similar techniques. More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) * (Even better, **A short introduction to failure detectors for asynchronous distributed systems** [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806)) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 2 additions and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -16,5 +16,6 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) * (Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806)) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -17,4 +17,4 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) (Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806)) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 3 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -16,7 +16,5 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) (Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806)) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 0 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -18,4 +18,5 @@ More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) * Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 2 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -17,6 +17,5 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) * Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -17,6 +17,6 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) * Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 1 addition and 2 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -17,7 +17,6 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) * Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis revised this gist
May 28, 2013 . 1 changed file with 3 additions and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -17,5 +17,7 @@ Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/pa More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) ** Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story) -
pbailis created this gist
May 28, 2013 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,21 @@ Context: I was asked for a list of interesting reading relating to "distributed databases, behavior under partitions and failures, failure detection." Here's what I came up with in about an hour. For textbooks, "Introduction to Reliable and Secure Distributed Programming" is a superb introduction to distributed computing from a formal perspective; it's really not about "programming" or "engineering" but about distributed system fundamentals like consensus, distributed registers, and broadcast. Used in Berkeley's [Distributed Computing course](http://www.cs.berkeley.edu/~alig/cs294-91/) (and [HT to @lalithsuresh](https://twitter.com/lalithsuresh/status/339089590159814656)) [Book Site](http://www.distributedprogramming.net/) Notes from courses like Lorenzo Alvisi's [Distributed Computing](http://www.cs.utexas.edu/users/lorenzo/corsi/cs371d/13S/notes/) class can be great. There are a bunch of classics on [causality](http://www.cs.princeton.edu/courses/archive/fall11/cos518/papers/time-clocks.pdf), [Paxos](http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf) (and more [practical takes on Paxos](http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf)), and [distributed snapshots](http://ece842.com/S13/readings/chandy85.pdf). Practical databases: * **Consistency in Partitioned Networks** [PDF](http://delab.csd.auth.gr/~dimitris/courses/mpc_fall05/papers/invalidation/acm_csur85_partitioned_network_consistency.pdf) [ACM](http://dl.acm.org/citation.cfm?id=5508) A nice, practical discussion of techniques database systems can employ to ensure consistency under partitions. This survey predates CAP by several decades but is well-written and summarizes several important ideas. * **Megastore: Providing Scalable, Highly Available Storage for Interactive Services** [PDF](http://pdos.csail.mit.edu/6.824-2012/papers/jbaker-megastore.pdf) Megastore gives a reasonable example of a Paxos-based database architecture. * *Consistency Tradeoffs in Modern Distributed Database System Design* [PDF](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf) [IEEE](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6127847) is a great paper from Daniel Abadi reminding us that, aside from behavior during failures, highly available ("AP") systems also achieve low latency. * There are many remnants of the *Bayou* project in many "AP" systems today. The project was aimed at disconnected operation in a proto-smartphone/mobile computing era; a good overview is [The Bayou Architecture: Support for Data Sharing among Mobile Users](http://www.dpi.ufv.br/~nacif/cmovel/bayou.pdf). Also good is [Managing update conflicts in Bayou, a weakly connected replicated storage system](http://www.cs.princeton.edu/courses/archive/fall11/cos518/papers/bayou.pdf). Definitely a more practically oriented paper. *Optimistic Replication* [PDF](http://www.ysaito.com/survey.pdf) [ACM](http://dl.acm.org/citation.cfm?id=1057980) is a great survey of similar techniques. More formal stuff: * **Unreliable failure detectors for reliable distributed systems** [PDF](http://www.ecommons.cornell.edu/bitstream/1813/7192/1/95-1535.pdf) [ACM](http://dl.acm.org/citation.cfm?id=226647) A very theoretical but highly celebrated paper relating the problem of failure detection and consensus; together with **The Weakest Failure Detector for Solving Consensus** make for a great if tough tutorial on failure detectors (may be better off reading a textbook) [PDF](http://courses.csail.mit.edu/6.852/05/papers/jacm96.pdf) [ACM](http://dl.acm.org/citation.cfm?id=234549) ** Even better, *A short introduction to failure detectors for asynchronous distributed systems* [PS.GZ](ftp://ftp.irisa.fr/pub/techreports/2004/PI-1613.ps.gz) [ACM](http://dl.acm.org/citation.cfm?doid=1052796.1052806) * **The Byzantine Generals Problem** [PDF](http://www.eecs.harvard.edu/cs262/Readings/byzantineGenerals.pdf) [ACM](http://dl.acm.org/citation.cfm?id=357176) introduces the problem of byzantine fault tolerance, albeit in typical Lamport style (i.e., with a cute but sometimes distracting story)