A complete list of books, articles, blog posts, videos and neat pages that support Data Fundamentals (H), organised by Unit.

# Formatting

If the resource is available online (legally) I have included a link to it. Each entry has symbols following it.

* ⨕⨕⨕ indicates difficulty/depth, from ⨕ (easy to pick up intro, no background required) through ⨕⨕⨕⨕⨕ (graduate level textbook, maths heavy, expect equations)
* ⭐ indicates a particularly recommended resource; 🌟 is a **very strongly recommended** resource and you should look at it.

# General

* [SciPy lecture notes](http://www.scipy-lectures.org/) introduces all of the scientific Python infrastructure and has lots of overlap with Data Fundamentals (H) ⨕⨕⨕
* [Learning AI if you suck at math](https://hackernoon.com/learning-ai-if-you-suck-at-math-8bdfb4b79037) ⨕⨕

# Mathematical notation

* **Mathematical Notation: A Guide for Engineers and Scientists** by *Edward R. Scheinerman* covers all of the mathematical notation (and more) that we will use in a very concise form. ⨕ 🌟
* [Deep learning notation](https://github.com/omarsar/deep_learning_notations) covers much of the same terminology and symbols. ⨕⨕
* [Math As Code: A cheatsheet for Mathematical Notation](https://github.com/Jam3/math-as-code#pipes) A really nice explanation of mathematical notation in terms of simple code (in JavaScript, but easily applicable) ⨕⨕ 🌟

Excerpt from **Math as Code**:

---

The big Greek `Σ` (Sigma) is for [Summation](https://en.wikipedia.org/wiki/Summation). In other words: summing up some numbers.

$$\sum_{i=1}^{100}i$$

Here, `i=1` says to start at `1` and end at the number above the Sigma, `100`. These are the lower and upper bounds, respectively. The *i* to the right of the "E" tells us what we are summing. In code:

```js
var sum = 0
for (var i = 1; i <= 100; i++) {
  sum += i
}
```

The result of `sum` is `5050`.

---

# Python

If you don't know any Python, you will need to learn some.
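As a first taste, the summation from the Math as Code excerpt can be written in Python in much the same way. This is only a minimal sketch for illustration; the variable names `total` and `total_builtin` are made up here and do not come from the course materials.

```python
# Sum the integers from 1 to 100 with an explicit loop,
# mirroring the JavaScript version in the excerpt.
total = 0
for i in range(1, 101):  # range() excludes its upper bound, so use 101
    total += i

# The same sum using the built-in sum() over a range.
total_builtin = sum(range(1, 101))

print(total, total_builtin)
```

Both forms compute the same value; the second is the more idiomatic one for short sums.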
You will need to know:

* basic syntax: expressions and function calls
* printing
* lists
* dictionaries
* basic iteration (for, while)
* functions, parameters
* (maybe) list comprehensions

You will not need to know:

* classes
* exceptions
* file handling
* or anything more advanced

## References

* [Python cheat sheet](https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc.pdf) A quick reference card.
* [learnxinyminutes Python 3](https://learnxinyminutes.com/docs/python3/) A very concise reference.
* [Python for data science cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf) A quick reference card with a data science focus. ⭐
* ["Think Python!" by Allen Downey](http://greenteapress.com/thinkpython2/thinkpython2.pdf) A full textbook on Python. Easy to read.
* Try the online tutorials at [LearnPython](http://www.learnpython.org/)

# Jupyter

We'll be using Jupyter for everything in DF(H). While it's not hard to learn, there are some guides:

* [Jupyter notebook video tutorial](https://www.youtube.com/watch?v=HW29067qVWk)
* [Jupyter notebook tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/)
* [Keyboard shortcut cheatsheet](https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/pdf_bw/)

# Cheat sheets and API references

Quick references for when you get stuck while coding things up. These cover NumPy and Matplotlib, the two key software libraries we use in DF(H).

* [NumPy cheatsheet](https://github.com/juliangaal/python-cheat-sheet/blob/master/NumPy/NumPy.md)
* [NumPy API reference](https://docs.scipy.org/doc/numpy-1.13.0/reference/)
* [NumPy user guide](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.html)
* [Python for Data Science cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf)
* [Another NumPy Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)
* [Introduction to Matplotlib](https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html)
* [Matplotlib command summary](https://matplotlib.org/api/pyplot_summary.html)

# Unit 1: Vectorized computation I

* [From Python to Numpy](http://www.labri.fr/perso/nrougier/from-python-to-numpy/) ⨕⨕⨕⭐
* [100 numpy exercises](http://www.labri.fr/perso/nrougier/teaching/numpy.100/index.html) ⨕⨕⭐
* [NumPy tutorial](http://scipy.github.io/old-wiki/pages/Tentative_NumPy_Tutorial) ⨕⭐
* [Introduction to NumPy](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html) ⨕⨕
* [Linear algebra cheat sheet](https://medium.com/towards-data-science/linear-algebra-cheat-sheet-for-deep-learning-cd67aba4526c#.739w4i3m1) ⨕⨕
* [101 NumPy Exercises for Data Analysis](https://www.machinelearningplus.com/python/101-numpy-exercises-python/) ⨕⨕

# Unit 2: Vectorized computation II

## Articles on floating point

* [Floating point visually explained](http://fabiensanglard.net/floating_point_visually_explained/) ⨕ 🌟
* [A series of fascinating articles on the inner workings of floating point by a Google engineer](https://randomascii.wordpress.com/2012/01/11/tricks-with-the-floating-point-format/) ⨕⨕⨕⭐
* [Floating point numbers](http://pmihaylov.com/floating-point-numbers) ⨕
* [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) ⨕⨕⨕
* [Demystifying floating point precision](https://blog.demofox.org/2017/11/21/floating-point-precision/) ⨕⨕

## Advanced NumPy

* [Advanced NumPy](http://www.scipy-lectures.org/advanced/advanced_numpy/) ⨕⨕⨕⨕
* [NumPy tricks](http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html) and [Part I](http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html) ⨕⨕⨕⨕⨕

# Unit 3: Visualisation

## Aesthetics

* [An important GIF. Watch this](https://gfycat.com/ImprobableFemaleBasenji) ⨕ 🌟
* [Ten simple rules for better figures](http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003833&type=printable) and the accompanying [video](https://www.youtube.com/watch?v=p7Mj-4kASmI) (recommended; read this and watch the video) ⨕ 🌟
* [Graphics Principles Cheat Sheet](https://www.psiweb.org/docs/default-source/2018-psi-conference-posters/48-julie-jones.pdf?sfvrsn=cb68dedb_4) ⨕ 🌟
* [Grammar of graphics cheat sheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) ⨕⨕

## Uncertainty

* [The Hacker's Guide to uncertainty visualisation](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html) ⨕ 🌟
* [Understanding the Box plot](https://medium.com/@GalarnykMichael/understanding-boxplots-5e2df7bcbd51) Very thorough discussion of what Box plots are and how they should be used.
⨕⨕

## Matplotlib

* [Simple Matplotlib cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf) ⨕
* [Introduction to Matplotlib](https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html) ⨕⨕
* [Matplotlib command summary](https://matplotlib.org/api/pyplot_summary.html) ⨕⨕
* [Extensive matplotlib cheatsheet](http://nbviewer.jupyter.org/urls/gist.github.com/Jwink3101/e6b57eba3beca4b05ec146d9e38fc839/raw/f486ca3dcad44c33fc4e7ddedc1f83b82c02b492/Matplotlib_Cheatsheet) ⨕⨕⨕

### Visualisation

* [Effective use of colour in scientific visualisation](https://www.slideshare.net/RileyXBrady/effective-use-of-color-in-scientific-visualization) ⨕
* [How to choose a chart for data](http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf) ⨕⨕

### Example visualisations

* [Randal Olson's blog](http://www.randalolson.com) has many, many examples of good visualisation, mainly using Python for graph preparation. ⨕

## Books

* [Layered Grammar of Graphics](http://vita.had.co.nz/papers/layered-grammar.pdf) (long, but detailed) ⨕⨕⨕
* **The Grammar of Graphics,** *Leland Wilkinson*, Second ed.
⨕⨕⨕⨕
* **How to Lie with Statistics** *Darrell Huff* (short, easy to read, worth reading) ⨕⭐
* **Information Visualization: Perception for Design** *Colin Ware*: a serious book on advanced visualisations. ⨕⨕⨕
* The "Tufte" books
    * **The Visual Display of Quantitative Information** by *Edward Tufte* ⨕⨕⨕
    * **Visual Explanations: Images and Quantities, Evidence and Narrative** by *Edward Tufte* ⨕⨕⨕
    * **Envisioning Information** by *Edward Tufte* ⨕⨕⨕

# Unit 4: Computational Linear Algebra I

## Primers

* **WATCH THIS** [3blue1brown Linear Algebra series (strongly recommended)](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab) ⨕ 🌟
* [An introduction to linear algebra](https://jeremykun.com/2011/06/19/linear-algebra-a-primer/) *Jeremy Kun* ⨕⨕⨕⭐
* [A primer on inner product spaces](https://jeremykun.com/2011/07/25/inner-product-spaces-a-primer/) *Jeremy Kun* ⨕⨕⨕

## High-dimensional spaces

This can be mind-bending. Some further reading and viewing:

#### Videos

* [AI experiments: Visualizing high dimensional spaces](https://www.youtube.com/watch?v=wvsE8jm1GzE) ⨕⭐
* [3blue1Brown A Trick to Visualizing Higher Dimensions](https://youtu.be/zwAD6dRSVyI) ⨕⨕ 🌟

#### Texts

* [High-dimensional spaces chapter](https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/chap1-high-dim-space.pdf) ⨕⨕⨕
* [Geometry in Very High Dimension](https://www.math.wustl.edu/~feres/highdim) ⨕⨕⨕⨕
* [On the Surprising Behavior of Distance Metrics in High Dimensional Space](https://bib.dbvis.de/uploadedFiles/155.pdf) ⨕⨕⨕⨕⨕

## Books

* [**Introduction to Applied Linear Algebra**](http://vmls-book.stanford.edu/vmls.pdf) freely available. *Stephen Boyd and Lieven Vandenberghe* ⨕⨕⨕⭐
* **Coding the Matrix** *Philip N.
Klein* An excellent and thorough introduction to linear algebra through Python programming ⨕⨕⨕
* **Linear Algebra Done Right**, *Sheldon Axler* a more pure mathematics perspective ⨕⨕⨕

# Unit 5: Computational Linear Algebra II

## Eigenvectors

* [**A tutorial on principal components analysis**](https://arxiv.org/pdf/1404.1100.pdf) ⨕⨕⨕⭐
* [**Eigenvectors and eigenvalues**](http://setosa.io/ev/eigenvectors-and-eigenvalues/) ⨕⨕⭐
* [**An introduction to principal components and the geometric interpretation of the covariance matrix**](http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/) ⨕⨕⨕

## Beyond the course

* [**A tutorial on spectral graph theory and graph Laplacians**](https://csustan.csustan.edu/~tom/Clustering/GraphLaplacian-tutorial.pdf) ⨕⨕⨕⨕⨕

## The SVD

* [**Matrix decompositions**](http://nicolas-hug.com/blog/matrix_facto_1) ⨕⨕ 🌟
* [**A tutorial on the singular value decomposition**](https://blog.statsbot.co/singular-value-decomposition-tutorial-52c695315254) ⨕⨕⨕⭐
* [Toward an exploratory medium for mathematics](http://cognitivemedium.com/emm/emm.html) intuitive geometric explanation of the SVD ⨕⨕⨕⭐
* [SVD part 1](https://jeremykun.com/2016/04/18/singular-value-decomposition-part-1-perspectives-on-linear-algebra/) *Jeremy Kun* ⨕⨕⨕
* [SVD part 2](https://jeremykun.com/2016/05/16/singular-value-decomposition-part-2-theorem-proof-algorithm/) *Jeremy Kun* ⨕⨕⨕
* [**The Singular Value Decomposition**](http://theory.stanford.edu/~tim/s15/l/l9.pdf) ⨕⨕⨕⨕

### Books

* [**The Matrix Cookbook**](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf) *Kaare Brandt Petersen and Michael Syskind Pedersen*. If you need to do a tricky calculation with matrices, this book will probably tell you how to do it. ⨕⨕⨕⨕⨕
* **Introduction to Linear Algebra** *Gilbert Strang* The standard textbook on linear algebra ⨕⨕⨕⨕
* **A First Course in Numerical Methods** *Uri M.
Ascher and Chen Greif* ⨕⨕⨕⨕

# Unit 6: Numerical Optimization I

* [**On the Origin of Circuits**](https://www.damninteresting.com/on-the-origin-of-circuits/) covers genetic algorithms ⨕ ⭐
* [**Khan academy: Multivariable calculus**](https://www.khanacademy.org/math/multivariable-calculus), particularly "Thinking about multivariable functions", "Derivatives of multivariable functions" and "Applications of multivariable derivatives"
* [**Why have Sex? Information Acquisition and Evolution**](http://www.inference.org.uk/mackay/itprnn/ps/265.280.pdf) ⨕⨕⨕⨕

## Books

* **When least is best: How Mathematicians Discovered Many Clever Ways to Make Things as Small (or as Large) as Possible** *by Paul J. Nahin* An interesting and mathematically thorough description of the history of optimisation from a mathematical standpoint. ⨕⨕⨕⨕
* **The Blind Watchmaker** *Richard Dawkins* An excellent popular science book on how evolution (genetic algorithms in the wild) can work, including some early computer simulations.

# Unit 7: Numerical Optimization II

## Gradient descent

* [**Gradient based optimization**](https://exploringpirate.wordpress.com/2017/08/16/gradient-based-optimization/) ⨕⭐
* [**An overview of gradient descent optimization algorithms**](http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentvariants) ⨕⨕
* [**An introduction to algorithms for continuous optimization**](http://www.numerical.rl.ac.uk/people/nimg/course/lectures/paper/paper.pdf) by Nicholas Gould ⨕⨕⨕⨕⨕
* [**What is backpropagation**](https://www.youtube.com/watch?v=Ilg3gGewQ5U) if you want more detail on how first-order optimisation is used in deep learning. ⨕⨕

## Automatic differentiation

* [**How machines learn**](https://www.youtube.com/watch?v=IHZwWFHWa-w) *3blue1brown strikes again* ⨕ 🌟 **Recommended: WATCH THIS**
* [**Introduction to automatic differentiation**](https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/) ⨕⨕⨕⭐

## Pareto optimality

* [**The best Mario Kart character according to data science**](https://www.datasciencecentral.com/profiles/blogs/the-best-mario-kart-character-according-to-data-science-2) ⨕ 🌟

# Unit 8: Probability & Stochastics I

---

## Probability

* [Probability by Peter Norvig](http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb) ⨕⨕ 🌟

## Bayesian thinking and Bayes' rule

* [A visual guide to Bayesian thinking](https://www.youtube.com/watch?v=BrK7X_XlGB8) ⨕ 🌟
* [Veritasium explains Bayes' Theorem](https://www.youtube.com/watch?v=R13BD8qKeTg) ⨕
* [Count Bayesie's guide to Bayesian statistics](https://www.countbayesie.com/blog/2016/5/1/a-guide-to-bayesian-statistics) *A collection of very readable articles by Count Bayesie* ⨕⨕⨕⭐
* [Video by the author of Think Bayes](https://www.youtube.com/watch?v=TpgiFIGXcT4) ⨕⨕
* [Khan Academy materials on probability](https://www.khanacademy.org/math/statistics-probability/probability-library) (more in depth than we cover, but high quality stuff) ⨕⨕⨕

## Beyond the course

These provide a formal basis for probability theory, if you feel more comfortable having a rigorous mathematical basis. These go way beyond the course.

* [A formal introduction to probability for Scientists and Engineers](https://betanalpha.github.io/assets/case_studies/probability_theory.html) ⨕⨕⨕⨕⨕
* [Conditional Probability Theory (For Scientists and Engineers)](https://betanalpha.github.io/assets/case_studies/conditional_probability_theory.html) ⨕⨕⨕⨕⨕

## Books

* [Probability and statistics cookbook](http://statistics.zone/) ⨕⨕⨕ Like the Matrix Cookbook, this provides a dense, quick reference to many problems in statistics and probability.
* [**Think Bayes**](http://www.greenteapress.com/thinkbayes/thinkbayes.pdf), *Allen B. Downey* light, Python focused ⨕⨕⭐
* **All of Statistics: A Concise Course in Statistical Inference** *Larry Wasserman* *Outstanding; the best of these books, but somewhat maths heavy.* ⨕⨕⨕⨕⨕⭐
* [Chapters 2](http://www.inference.org.uk/mackay/itprnn/ps/22.40.pdf) [and 3](http://www.inference.org.uk/mackay/itprnn/ps/47.59.pdf) of Information Theory, Inference, and Learning Algorithms by David MacKay ⨕⨕⨕⨕
* **A First Course in Probability** *by Sheldon Ross* (standard textbook on probability) ⨕⨕⨕

### Beyond the course

* [**Probability theory: the logic of science**](https://bayes.wustl.edu/etj/prob/book.pdf) by *E. T.
Jaynes* *an excellent but controversial and very technical book* ⨕⨕⨕⨕⨕
* [**Information Theory, Inference and Learning Algorithms**](http://www.inference.org.uk/itprnn/book.pdf), *David MacKay* *Also excellent, and covers many interesting relations between probability, information and learning* ⨕⨕⨕⨕⨕
* [**Introduction to statistical learning**](http://www-bcf.usc.edu/~gareth/ISL/) (outstanding introduction to statistical learning, including a book, video and course notes) ⨕⨕⨕⨕

# Unit 9: Probability & Stochastics II

* [Why would I ever need Bayesian statistics](https://medium.com/@peadarcoyle/why-would-i-ever-need-bayesian-statistics-4cf844c4a23a) ⨕⨕
* [MCMC for dummies](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) ⨕⨕⨕⭐
* [Bayesian Linear Regression](https://www.chrisstucchio.com/blog/2017/bayesian_linear_regression.html) This is what the example above is based on. ⨕⨕⨕⨕
* [The Non-parametric Bootstrap as a Bayesian Model](http://www.sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/) How the bootstrap can be seen as a Bayesian model ⨕⨕⨕⨕

## Books

* [Bayesian methods for Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) a full "book" on Bayesian methods and inference ⨕⨕⨕⨕⭐

# Unit 10: Digital Signals and Time Series

* [Sampling, Quantization and Encoding](http://kaushanim.blogspot.co.uk/2010/03/sampling-quantization-encoding.html) (short introduction to sampling and quantization) ⨕⨕⭐
* [DSP for the Braindead](http://yehar.com/blog/?p=121) (not actually for the braindead, in fact much more advanced than we cover here!) ⨕⨕⨕

## Books

* [**The Scientist and Engineer's Guide to Signal Processing**](http://dspguide.com/) (free, online book) ⨕⨕⨕
* **Digital Signal Processing, A Computer Science Perspective**, *Jonathan (Y) Stein* A great introduction for CS students, but fantastically expensive.
* Handbook of Mathematics for engineers and scientists
* Mathematical Notation: A Guide for Engineers and Scientists
* Book of Proof

## Misc Links

* This Hacker News [post](https://news.ycombinator.com/item?id=18510528)
* A guide to writing mathematics [link](https://personal.math.ubc.ca/~cytryn/teaching/scienceOneF10W11/handouts/OS.writingMathCommented.pdf)