1. *Feature Learning*
   * [Learning Feature Representations with K-means](http://www.cs.stanford.edu/~acoates/papers/coatesng_nntot2012.pdf) by Adam Coates and Andrew Y. Ng
   * [The devil is in the details: an evaluation of recent feature encoding methods](http://www.robots.ox.ac.uk/~vgg/publications/2011/Chatfield11/chatfield11.pdf) by Chatfield et al.
   * [Emergence of Object-Selective Features in Unsupervised Feature Learning](http://web.stanford.edu/~acoates/papers/coateskarpathyng_nips2012.pdf) by Adam Coates, Andrej Karpathy and Andrew Y. Ng
   * [Scaling Learning Algorithms towards AI](http://yann.lecun.com/exdb/publis/pdf/bengio-lecun-07.pdf) by Yoshua Bengio and Yann LeCun
2. *Deep Neural Nets*
   * [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov
   * [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) by Xavier Glorot and Yoshua Bengio
   * [On the difficulty of training Recurrent Neural Networks](http://arxiv.org/pdf/1211.5063v2.pdf) by Razvan Pascanu, Tomas Mikolov and Yoshua Bengio
   * [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://arxiv.org/abs/1502.03167) by Sergey Ioffe and Christian Szegedy
   * [Deep Learning in Neural Networks: An Overview](http://arxiv.org/pdf/1404.7828v4.pdf) by Jürgen Schmidhuber
   * [Stochastic Gradient Descent Tricks](http://research.microsoft.com/pubs/192769/tricks-2012.pdf) by Léon Bottou
   * [Qualitatively characterizing neural network optimization problems](http://arxiv.org/abs/1412.6544) by Ian J. Goodfellow, Oriol Vinyals and Andrew M. Saxe
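
Two of the techniques in the list above reduce to a few lines of code. Below is a minimal NumPy sketch, not taken from the papers themselves: `xavier_init` implements the normalized uniform initialization from Glorot & Bengio, and `dropout` uses the common "inverted" variant (scaling by `1/p_keep` at train time) rather than the original paper's test-time weight rescaling. The function names and toy layer sizes are illustrative.

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Glorot & Bengio (2010) normalized initialization: sample weights
    uniformly from [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))] so that
    activation and gradient variances stay roughly constant across layers."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def dropout(h, p_keep=0.5, train=True, rng=None):
    """Dropout (Srivastava et al., 2014), inverted variant: zero each unit
    with probability 1 - p_keep during training and scale survivors by
    1/p_keep, so inference needs no extra rescaling."""
    if not train:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) < p_keep
    return h * mask / p_keep

# Toy forward pass: a batch of 32 inputs of dimension 784 through one
# 256-unit ReLU layer with dropout applied to the hidden activations.
rng = np.random.default_rng(0)
W = xavier_init(784, 256, rng)
x = rng.random((32, 784))
h = dropout(np.maximum(0.0, x @ W), p_keep=0.5, rng=rng)
```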