Let's start meta: 1. Lamport - [State the Problem Before Describing the Solution](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/State-the-Problem-Before-Describing-the-Solution.pdf) (1978). … 1-page memo. Read it. 2. Herlihy - [Wait-free synchronization](http://www.diku.dk/OLD/undervisning/2005f/dat-os/skrifter/lockfree.pdf) (1991) … Truly seminal. Lucid + enough good ideas for 4 papers easily. 3. Cook - [How complex systems fail](https://www.researchgate.net/profile/Richard_Cook3/publication/228797158_How_complex_systems_fail/links/0c96053410db96a89c000000.pdf) (1998) … 4 pages that *anyone* working on/with complex systems should read. 4. Moffat, Turpin - [On the Implementation of Minimum Redundancy Prefix Codes](https://pdfs.semanticscholar.org/bda3/442cc6b1d10e4b36b574af0a34a668492230.pdf) (1997) … Much has been written about Huffman coding, a lot of it wrong. Almost everything worth knowing is in this (short!) paper. 5. a.Dybvig, Hieb, Butler - [Destination-Driven Code Generation](https://pdfs.semanticscholar.org/dcb8/8719880e1f76ad71fb1c5aebb118e2ecfe71.pdf)_(1990) … A simple, beautiful way to improve code gen forfast one-pass compilers. b. Milliken - [One-pass Code Generation in V8"](http://cs.au.dk/~mis/dOvs/slides/46b-codegeneration-in-V8.pdf) (year?) … makes a great companion piece. 6. Valmari - [Fast brief practical DFA minimization](http://www.cs.cmu.edu/~./cdm/pdf/Valmari12.pdf) (2012) … Takes a classic algorithm by Hopcroft that textbooks struggled to explain (or mostly omitted entirely) for decades, and simplifies it down to a (not particularly technical) 5-page paper. 7. Sarnak, Tarjan - [Planar Point Location Using Persistent Search Trees](http://users.cs.fiu.edu/~giri/teach/6405/s04/SarnakTarjanCACM86.pdf) (1986) … an interesting problem in its own right, and this paper solves it by introducing persistent search trees made efficient using a *beautiful* idea (section 3!). 8. Porter, Duff - [Compositing Digital Images](https://keithp.com/~keithp/porterduff/p253-porter.pdf) (1984) … Obligatory mention. Every half-year or so, another blog post crosses my timeline where someone excitedly sings the praises of premultiplied alpha. We need to get better about teaching this. 9. Brandt - [Hard Sync Without Aliasing](http://elthariel.free.fr/ebooks/icmc01-hardsync.pdf) (2001) … Introducing MinBLEPs for bandlimited synthesis. Such an awesome idea! 10. Veach - [Robust Monte Carlo Methods for Light Transport Simulation](http://graphics.stanford.edu/papers/veach_thesis/) (PhD thesis, 1997) … wherein he invents a big chunk of modern light transport sim. A tour de force. 11. Goto, van de Geijn-[Anatomy of high-performance matrix multiplication](http://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf) (2008) … - how a high-perf GEMM works. 12. Chazelle-[Filtering search: a new approach to query-answering](https://www.cs.princeton.edu/~chazelle/pubs/FilteringSearch.pdf) (1986) … - balancing preprocessing and query time, as applied to several geometric search problems. Not exactly easy reading but worth your time! 13. McIllroy-[A Killer Adversary for Quicksort](https://pdfs.semanticscholar.org/dd00/6b6383bd512c7645d8408fa6c97875c27318.pdf) (1999) … - how to do adversarial testing right. Worth studying! 14. Knuth-[Structured Programming with go to Statements](https://sbel.wisc.edu/Courses/ME964/Literature/knuthProgramming1974.pdf) (1974) … - oft-quoted, rarely read. It may surprise you. 15. Bryant-[Graph-Based Algorithms for Boolean Function Manipulation](http://mtv.ece.ucsb.edu/courses/ece156B_14/randy_obdd86.pdf) (1986) … - introducing ROBDDs and kickstarting a revolution in formal ASIC verification in the process. 16. Donoghue et al.-[Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding](http://stanford.edu/~boyd/papers/scs.html) (2016) … a great example of good theory making for an elegant, practical algorithm. 17. Braun et al.-[Simple and Efficient Construction of Static Single Assignment Form](http://pp.ipd.kit.edu/uploads/publikationen/braun13cc.pdf) (2013) … There are many SSA construction algorithms; this is the slickest one I know. 18. Lamport, Palais-[On the Glitch Phenomenon](http://lamport.azurewebsites.net/pubs/pubs.html#glitch) (1976) … - arbitration is hard. Read Lamport's notes on it too! 19. Curtsinger, Berger-[Stabilizer: Statistically sound performance evaluation](http://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf) (2013) - … on how the ubiquity of address-indexed caches distorts program performance measurements, and what to do about it. 20. Tomasulo-[An efficient algorihm for exploiting multiple arithmetic units](http://www.ece.umd.edu/class/enee646.F2014/Tomasulo.PDF) (1967) … - to my knowledged, the first Out of Order execution implementation that actually shipped - and applied to a small enough problem that it's easy to understand. 21. Wilson, Johnstone, Neely, Boles - [Dynamic storage allocation: A survey and critical review](http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15213-f98/doc/dsa.pdf) (1995) … Knuth got this one wrong. 22. Meyer, Tischer-[GLICBAWLS](http://www.academia.edu/download/40157928/Glicbawls_-_Grey_Level_Image_Compression20151118-23218-qu4fld.pdf) (2001) … Two Aussies make a dirty joke for an IOCCC entry, accidentally develop a fairly competitive lossless image coder in the process. It happens. 23. Hutton et al.-[Improving FPGA Performance and Area Using an Adaptive Logic Module](http://www2.engr.arizona.edu/~ece506/readings/project-reading/6-cad/altera-alm.pdf) (2004) … Mainly geekery for me; feel free to skip this one if you're not interested in FPGA arch, I just thought it was really cool when I first saw it! (2010 or so) 24. O'Neill-[PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation](http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf) (2014) … Introduces a new, pretty cool RNG and makes for a great introduction to the topic in general. 25. Cook, Podelski, Rybalchenko-[Termination Proofs for Systems Code](http://homedirs.ccs.neu.edu/wahl/Teaching/Software-Model-Checking/2016-Spring/Papers/termination.pdf) … The Halting Problem: SOLVED!* *for a limited but very practically relevant class of programs; e.g. device drivers. 26. O'Neill - [The Genuine Sieve of Eratosthenes](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) (2009) … The standard example implementation as a pure functional program is not, in fact, the same algorithm, and it's very instructive to see how it fails. 27. Qureshi, Jaleel et al. - [Adaptive Insertion Policies for High Performance Caching](http://web.ece.ucdavis.edu/~anhttran/files/download/caching/ISCA-2007-Qureshi-SetDuelingControl.pdf) (2007) … - this is a HW paper, but the design principles here have been a great inspiration for me in designing adaptive algs in SW. 28. Thompson-[Reflections on Trusting Trust](https://www.cs.colorado.edu/~jrblack/class/csci6268/s14/p761-thompson.pdf) (1984 Turing Award lecture) … a classic. If you haven't read it, do so! 29. Dijkstra, Lamport et. al-[On-the-fly Garbage Collection: an Exercise in Cooperation](http://lamport.azurewebsites.net/pubs/garbage.pdf) (1978) … introducing 30. Lamport-[Multiple Byte Processing with Full-Word Instructions](http://lamport.azurewebsites.net/pubs/multiple-byte.pdf) (1975) … I thought I wouldn't need to do this anymore when we got byte-wise SIMD instrs on CPUs, and then CUDA came along and it was suddenly profitable on GPUs.