# General Machine Learning Questions

## Difference between convex and non-convex cost function; what does it mean when a cost function is non-convex?
Convex:
- Every local minimum is a global minimum
- Efficient solvers exist
- Strong theoretical guarantees

Examples of ML algorithms:
- Linear regression and ridge regression (Tikhonov regularisation)
- Sparse linear regression with L1 regularisation, such as Lasso
- Support vector machines
- Parameter estimation in linear-Gaussian time series (Kalman filter and friends)

Non-convex:
- Multiple local minima may exist, so a solver can get stuck in one that is not the global minimum
- Many solvers are borrowed from the convex world
- Weak theoretical guarantees, if any

Examples of ML algorithms:
- Neural networks
- Maximum likelihood mixtures of Gaussians
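
The practical consequence of the lists above can be sketched with plain gradient descent. This is a minimal illustration, not a recommended solver: the two test functions, the learning rate, and the starting points are all assumptions chosen for the example. On the convex function every start reaches the same minimum; on the non-convex one the result depends on where you start.

```python
def gradient_descent(grad, x0, lr=0.01, steps=5000):
    """Plain 1-D gradient descent, given the derivative of the objective."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Convex example (assumed for illustration): f(x) = (x - 2)^2, one minimum at x = 2.
def convex_grad(x):
    return 2 * (x - 2)


# Non-convex example (assumed for illustration): g(x) = x^4 - 3x^2 + x.
# It has two local minima, one on each side of zero; only the left one is global.
def nonconvex_grad(x):
    return 4 * x**3 - 6 * x + 1


# Same algorithm, two different starting points each.
convex_results = [gradient_descent(convex_grad, x0) for x0 in (-3.0, 3.0)]
nonconvex_results = [gradient_descent(nonconvex_grad, x0) for x0 in (-2.0, 2.0)]

print("convex:", convex_results)          # both starts end at the same point
print("non-convex:", nonconvex_results)   # each start ends in a different minimum
```

For the non-convex function, the run started at `2.0` settles in the right-hand local minimum and never finds the lower one on the left, which is why random restarts or other heuristics are common in that setting.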

## What is overfitting?
https://en.wikipedia.org/wiki/Overfitting

## Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.
https://www2.isye.gatech.edu/~tzhao80/Lectures/Lecture_6.pdf

## Describe the criterion for a particular model selection. Why is dimension reduction important?
http://www.stat.cmu.edu/tr/tr759/tr759.pdf

## What are the assumptions for logistic and linear regression?
- Linear regression: linear relationship between the predictors and the response, independence of residuals, normally distributed residuals, equal variance of residuals (homoscedasticity).
http://blog.uwgb.edu/bansalg/statistics-data-analytics/linear-regression/what-are-the-four-assumptions-of-linear-regression/
- Logistic regression: the dependent variable is binary, observations are independent of each other, little or no multicollinearity among the independent variables, and a linear relationship between the independent variables and the log odds.
https://www.statisticssolutions.com/assumptions-of-logistic-regression/

## Compare Lasso and Ridge Regression.
https://blog.alexlenail.me/what-is-the-difference-between-ridge-regression-the-lasso-and-elasticnet-ec19c71c9028
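
A small sketch of the key difference, under a strong simplifying assumption: an orthonormal design (`X^T X = I`), with the ridge objective written as `0.5*||y - Xb||^2 + 0.5*lam*||b||^2` and the Lasso objective as `0.5*||y - Xb||^2 + lam*||b||_1`. In that special case both solutions have a closed form in terms of the ordinary least-squares coefficient. The function names and the numbers below are hypothetical.

```python
def ridge_shrink(b_ols, lam):
    """Ridge under an orthonormal design: scale every coefficient toward zero."""
    return b_ols / (1.0 + lam)


def lasso_shrink(b_ols, lam):
    """Lasso under an orthonormal design: soft-thresholding."""
    if abs(b_ols) <= lam:
        return 0.0  # small coefficients are set exactly to zero
    return b_ols - lam if b_ols > 0 else b_ols + lam


ols = [3.0, -0.4, 0.1]   # made-up least-squares coefficients
lam = 0.5                # made-up regularisation strength

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

print("ridge:", ridge)   # all coefficients shrunk, none exactly zero
print("lasso:", lasso)   # the two small coefficients become exactly zero
```

Ridge shrinks proportionally and keeps every feature, while Lasso subtracts a constant and zeroes out small coefficients, which is why Lasso is used for feature selection. With correlated features the closed forms above no longer apply, but the qualitative behaviour is similar.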

## What’s the difference between MLE and MAP inference?
https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/
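
As a minimal worked example, consider estimating the heads probability of a coin. The sketch below assumes a Bernoulli likelihood and a `Beta(a, b)` prior with `a, b >= 1`; the function names and the counts are made up for illustration. MLE maximises the likelihood alone, while MAP maximises likelihood times prior, so the prior acts like extra pseudo-observations.

```python
def bernoulli_mle(heads, n):
    """MLE of the heads probability: the observed frequency."""
    return heads / n


def bernoulli_map(heads, n, a, b):
    """MAP estimate under a Beta(a, b) prior: the mode of the Beta posterior."""
    return (heads + a - 1) / (n + a + b - 2)


heads, n = 3, 3  # three flips, all heads

mle = bernoulli_mle(heads, n)
map_est = bernoulli_map(heads, n, a=2, b=2)       # prior mildly favouring 0.5
map_uniform = bernoulli_map(heads, n, a=1, b=1)   # uniform prior

print(mle)          # 1.0: the MLE claims tails is impossible
print(map_est)      # 0.8: the prior pulls the estimate toward 0.5
print(map_uniform)  # 1.0: with a uniform prior, MAP coincides with MLE
```

With more data the likelihood dominates and the two estimates converge; the difference matters mostly for small samples.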

## How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic range?
- Explanation and pseudo-code: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
- Distance metrics: Euclidean distance, Manhattan distance.
https://pdfs.semanticscholar.org/a630/316f9c98839098747007753a9bb6d05f752e.pdf
- Normalization for K-means and how it can change the resulting clusters: https://www.edupristine.com/blog/k-means-algorithm
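
The points above can be sketched in a few lines of plain Python. This is an illustrative implementation of Lloyd's algorithm with squared Euclidean distance, not a production one: the initial centroids are simply the first `k` points (an assumption made so the run is deterministic), and the four data points are invented. The first feature separates two groups but has a tiny range; the second has a large range and dominates the distance until the features are standardized.

```python
import math


def standardize(points):
    """Z-score each feature: subtract its mean, divide by its standard deviation.
    Assumes no feature is constant (std > 0)."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init, illustration only
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Feature 1 separates two groups (near 0 vs near 1); feature 2 has a much larger range.
data = [(0.0, 100.0), (1.0, 400.0), (0.1, 600.0), (0.9, 900.0)]

raw_labels, _ = kmeans(data, k=2)
scaled_labels, _ = kmeans(standardize(data), k=2)

print(raw_labels)     # [0, 1, 1, 1]: driven by the large-range feature
print(scaled_labels)  # [0, 1, 0, 1]: groups by the first feature after scaling
```

In practice the initial centroids are usually chosen at random or with k-means++, and the algorithm is run several times because it only finds a local optimum.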

## How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.
https://towardsdatascience.com/2-latent-methods-for-dimension-reduction-and-topic-modeling-20ff6d7d547

## What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
https://cedar.buffalo.edu/~srihari/CSE574/Discriminative-Generative.pdf

## Why is scaling of the input important? For which learning algorithms is this important? What is the problem with Min-Max scaling?
https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
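
One commonly cited problem with Min-Max scaling is its sensitivity to outliers, since the minimum and maximum alone define the mapping. The sketch below uses made-up numbers and a hypothetical helper to show a single extreme value squeezing all the other values into a narrow band near zero.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] using the observed min and max.
    Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


clean = [1.0, 2.0, 3.0, 4.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme value added

scaled_clean = min_max_scale(clean)
scaled_outlier = min_max_scale(with_outlier)

print(scaled_clean)    # values spread evenly across [0, 1]
print(scaled_outlier)  # the first four values are all below 0.05
```

Distance-based and gradient-based methods then see almost no variation among the non-outlier points for that feature. Standardization is also affected by outliers, though it does not force the data into a fixed interval.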

## How can you plot ROC curves for multiple classes? 
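Use macro-averaging: treat each class as a one-vs-rest binary problem, compute one ROC curve per class, and average them with equal weight per class. The same idea applied to precision gives `PRE_macro = (PRE_1 + PRE_2 + ... + PRE_k) / k`.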
https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
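
The one-vs-rest macro-average can be sketched without any library by summarising each per-class ROC curve by its area (AUC). This is a minimal illustration under stated assumptions: the helper names, labels, and predicted probabilities are invented, and the AUC is computed through its rank interpretation (the probability that a positive example scores higher than a negative one) rather than by tracing the curve.

```python
def binary_auc(y_true, scores):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels, prob_rows, classes):
    """One-vs-rest AUC per class, then the unweighted (macro) average."""
    aucs = []
    for k, c in enumerate(classes):
        y_bin = [1 if y == c else 0 for y in labels]   # class c vs the rest
        scores = [row[k] for row in prob_rows]          # predicted P(class c)
        aucs.append(binary_auc(y_bin, scores))
    return aucs, sum(aucs) / len(aucs)


labels = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]

per_class, macro = macro_ovr_auc(labels, probs, classes=[0, 1, 2])
print(per_class)  # [1.0, 0.8125, 1.0]
print(macro)      # 0.9375
```

To plot the curves rather than summarise them, the same one-vs-rest binarisation is applied and one curve is drawn per class, as in the scikit-learn example linked above.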