# General Machine Learning Questions

## Difference between convex and non-convex cost function; what does it mean when a cost function is non-convex?

Convex:
- Every local minimum is a global minimum
- Efficient solvers
- Strong theoretical guarantees

Examples of ML algorithms:
- Linear regression / Ridge regression, with Tikhonov regularisation
- Sparse linear regression with L1 regularisation, such as Lasso
- Support vector machines
- Parameter estimation in Linear-Gaussian time series (Kalman filter and friends)

Non-convex:
- Multiple local minima
- Many solvers come from the convex world
- Weak theoretical guarantees, if any

Examples of ML algorithms:
- Neural networks
- Maximum likelihood mixtures of Gaussians

## What is overfitting?

https://en.wikipedia.org/wiki/Overfitting

## Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.

https://www2.isye.gatech.edu/~tzhao80/Lectures/Lecture_6.pdf

## Describe the criterion for a particular model selection. Why is dimension reduction important?

http://www.stat.cmu.edu/tr/tr759/tr759.pdf

## What are the assumptions for logistic and linear regression?

- Linear regression: Linearity of the relationship between the predictors and the mean of the response, Independence of residuals, Normal distribution of residuals, Equal variance of residuals. http://blog.uwgb.edu/bansalg/statistics-data-analytics/linear-regression/what-are-the-four-assumptions-of-linear-regression/
- Logistic regression: Dependent variable is binary, Observations are independent of each other, Little or no multicollinearity among the independent variables, Linearity of independent variables and log odds. https://www.statisticssolutions.com/assumptions-of-logistic-regression/

## Compare Lasso and Ridge Regression.

https://blog.alexlenail.me/what-is-the-difference-between-ridge-regression-the-lasso-and-elasticnet-ec19c71c9028

## What’s the difference between MLE and MAP inference?

https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/

## How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic range?

- Explanation and pseudo-code: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
- Distance metrics: Euclidean distance, Manhattan distance: https://pdfs.semanticscholar.org/a630/316f9c98839098747007753a9bb6d05f752e.pdf
- Normalization for K-means and the different results you can get: https://www.edupristine.com/blog/k-means-algorithm

## How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.

https://towardsdatascience.com/2-latent-methods-for-dimension-reduction-and-topic-modeling-20ff6d7d547

## What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?

https://cedar.buffalo.edu/~srihari/CSE574/Discriminative-Generative.pdf

## Why is scaling of the input important? For which learning algorithms is this important? What is the problem with Min-Max scaling?

https://sebastianraschka.com/Articles/2014_about_feature_scaling.html

## How can you plot ROC curves for multiple classes?

With macro-averaging, which computes the metric per class (one-vs-rest) and weights all classes equally. For precision, for example: PRE_macro = (PRE_1 + PRE_2 + ... + PRE_k) / k

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
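
The macro-averaging idea can be sketched for ROC curves as follows. This is a minimal pure-Python illustration, not the code from the linked scikit-learn example: it assumes class labels are integers `0..k-1` and that `y_score` holds one row of per-class scores per sample. The helper names (`roc_points`, `interp`, `auc`, `macro_average_roc`) are hypothetical and chosen only for this sketch. Each class gets a one-vs-rest ROC curve, the curves are interpolated onto a shared false-positive-rate grid, and the true-positive rates are averaged with equal weight per class.

```python
import bisect
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Binary ROC curve as (fpr, tpr) lists, sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation on a non-decreasing xs; at a vertical step, take the top."""
    j = bisect.bisect_right(xs, x) - 1  # last index with xs[j] <= x
    if j == len(xs) - 1 or xs[j] == x:
        return ys[j]
    return ys[j] + (ys[j + 1] - ys[j]) * (x - xs[j]) / (xs[j + 1] - xs[j])


def auc(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Area under a curve by the trapezoidal rule."""
    return sum((xs[k + 1] - xs[k]) * (ys[k + 1] + ys[k]) / 2 for k in range(len(xs) - 1))


def macro_average_roc(y_true, y_score, n_classes: int, grid_size: int = 101):
    """Average the one-vs-rest ROC curves of all classes on a common FPR grid."""
    grid = [g / (grid_size - 1) for g in range(grid_size)]
    total_tpr = [0.0] * grid_size
    for c in range(n_classes):
        binary = [1 if y == c else 0 for y in y_true]  # class c vs. the rest
        class_scores = [row[c] for row in y_score]
        fpr, tpr = roc_points(binary, class_scores)
        for g, x in enumerate(grid):
            total_tpr[g] += interp(x, fpr, tpr)
    mean_tpr = [t / n_classes for t in total_tpr]  # equal weight per class
    return grid, mean_tpr


if __name__ == "__main__":
    y_true = [0, 1, 2, 0, 1, 2]
    y_score = [
        [0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5],
        [0.4, 0.4, 0.2], [0.5, 0.3, 0.2], [0.1, 0.2, 0.7],
    ]
    grid, mean_tpr = macro_average_roc(y_true, y_score, n_classes=3)
    print("macro-average AUC:", auc(grid, mean_tpr))
```

The returned `grid` and `mean_tpr` can be passed to any plotting library as the x and y values of the macro-average curve, alongside the per-class curves from `roc_points`. In practice, scikit-learn's `roc_curve` and `auc` would typically replace the hand-written helpers.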