- Feature Learning
 
 - Learning Feature Representations with K-means by Adam Coates and Andrew Y. Ng
 - The devil is in the details: an evaluation of recent feature encoding methods by Chatfield et al.
 - Emergence of Object-Selective Features in Unsupervised Feature Learning by Coates and Ng
 - Scaling Learning Algorithms towards AI by Yoshua Bengio and Yann LeCun
 
- Deep Neural Nets
 
 - Dropout: A Simple Way to Prevent Neural Networks from Overfitting by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov
 - Understanding the difficulty of training deep feedforward neural networks by Xavier Glorot and Yoshua Bengio
 - On the difficulty of training Recurrent Neural Networks by Razvan Pascanu, Tomas Mikolov and Yoshua Bengio
 - Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift by Sergey Ioffe and Christian Szegedy
 - Deep Learning in Neural Networks: An Overview by Jürgen Schmidhuber
 - Qualitatively characterizing neural network optimization problems by Ian J. Goodfellow and Oriol Vinyals
 - On Recurrent and Deep Neural Networks, PhD thesis of Razvan Pascanu
 - Efficient BackProp by Yann LeCun, Léon Bottou et al.
 - Towards Biologically Plausible Deep Learning by Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Zhouhan Lin
 - Training Recurrent Neural Networks, PhD thesis of Ilya Sutskever
 
- Scalable Machine Learning
 
 - Bring the Noise: Embracing Randomness is the Key to Scaling Up Machine Learning Algorithms by Brian Dalessandro
 - Large Scale Machine Learning with Stochastic Gradient Descent by Léon Bottou
 - The Tradeoffs of Large Scale Learning by Léon Bottou and Olivier Bousquet
 - Hash Kernels for Structured Data by Qinfeng Shi et al.
 - Feature Hashing for Large Scale Multitask Learning by Weinberger et al.
 - Large-Scale Learning with Less RAM via Randomization by a group of authors from Google
 
- Gradient based Training
 
 - Practical Recommendations for Gradient-Based Training of Deep Architectures by Yoshua Bengio
 - Stochastic Gradient Descent Tricks by Léon Bottou
 
- Non-Linear Units
 
 - Rectified Linear Units Improve Restricted Boltzmann Machines by Nair and Hinton
 - Mathematical Intuition for Performance of Rectified Linear Unit in Deep Neural Networks by Alexandre Dalyec
 
- Interesting blog posts