@pb8294
Forked from felipemoraes/0.useful.md
Created July 7, 2021 20:28

Revisions

  1. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 2 changed files with 15 additions and 8 deletions.
    9 changes: 2 additions & 7 deletions 4.data-science-prob-questions.md
    @@ -15,13 +15,8 @@ All the problems that can happen with overfitting.

    # Math and Probability

    ## Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?
    https://www.youtube.com/watch?v=UzeL2GcLx3Y

    ## How do you weigh 9 marbles three times on a balance scale to select the heaviest one?
    https://mattgadient.com/2013/02/03/9-marbles-and-a-weight-balance-which-is-the-heaviest-one/
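The linked post walks through the classic strategy, which actually needs only two weighings: split the marbles into three groups of three, weigh two groups against each other, then repeat the same trick inside the group that must hold the heavy marble. A small simulation of that strategy, assuming exactly one marble is heavier and the rest weigh the same (`find_heaviest` and its inner helpers are illustrative names):

```python
def find_heaviest(weights):
    """Return (index of the single heavier marble among 9, number of weighings used)."""
    assert len(weights) == 9
    weighings = 0

    def weigh(left, right):
        # Simulated balance: -1 if the left pan is heavier, 1 if the right pan is, 0 if balanced.
        nonlocal weighings
        weighings += 1
        lw = sum(weights[i] for i in left)
        rw = sum(weights[i] for i in right)
        return (lw < rw) - (lw > rw)

    def pick(result, a, b, c):
        # The heavy item is on the heavier pan, or off the scale if the pans balance.
        return a if result == -1 else b if result == 1 else c

    g1, g2, g3 = [0, 1, 2], [3, 4, 5], [6, 7, 8]
    group = pick(weigh(g1, g2), g1, g2, g3)  # weighing 1: three vs three
    m1, m2, m3 = group
    return pick(weigh([m1], [m2]), m1, m2, m3), weighings  # weighing 2: one vs one


weights = [1.0] * 9
weights[7] = 1.1
print(find_heaviest(weights))  # (7, 2)
```

Since two weighings always suffice, the third weighing allowed by the question is spare.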

    ## Estimate the disease probability in one city given the probability is very low nationwide.
    https://medium.com/acing-ai/interview-guide-to-probability-distributions-a6dfb08c3766

    ## You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
    ## Estimate the disease probability in one city given the probability is very low nationwide. You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
    https://medium.com/acing-ai/interview-guide-to-probability-distributions-a6dfb08c3766
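One common way to reason about this (a sketch, not necessarily the linked article's answer): with 0 positives out of n, the maximum-likelihood estimate is 0, so an upper bound is more informative. The exact one-sided 95% bound solves (1 - p)^n = 0.05, which is close to the "rule of three" value 3/n. Alternatively, a Beta prior can encode the low nationwide rate and be updated with the survey; the prior parameters below are hypothetical.

```python
def zero_positive_upper_bound(n, alpha=0.05):
    """Exact one-sided upper confidence bound for p when 0 of n sampled people are positive.

    Solves (1 - p)**n = alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)


def beta_posterior_mean(n_pos, n, a=1.0, b=1.0):
    """Posterior mean of p under a Beta(a, b) prior after n_pos positives out of n."""
    return (a + n_pos) / (a + b + n)


n = 1000
print(zero_positive_upper_bound(n))  # close to the rule-of-three value 3 / n = 0.003
print(beta_posterior_mean(0, n))     # uniform Beta(1, 1) prior: 1 / 1002
# Hypothetical prior centred on a low nationwide rate (mean 1e-4): Beta(1, 9999).
print(beta_posterior_mean(0, n, a=1.0, b=9999.0))
```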
    14 changes: 13 additions & 1 deletion 5.programming-questions.md
    @@ -1,5 +1,8 @@
    # Programming Questions

    ## Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?
    https://www.geeksforgeeks.org/trapping-rain-water/
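The linked article covers several approaches. A two-pointer version in Python, which runs in O(n) time with O(1) extra space, assuming each bar has width 1:

```python
def trapped_water(heights):
    """Units of water held between bars of the given heights (each bar has width 1)."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        # Water above a bar is bounded by the lower of the tallest bars on each side,
        # so always advance the pointer on the lower side.
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water


print(trapped_water([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # 6
```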

    ## Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
    Solution: a heap that keeps and updates the most profitable products.
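A sketch of the heap idea, assuming orders arrive as (city, product_id, profit, order_date) tuples (a hypothetical layout) and approximating "last 6 months" as 183 days: aggregate profit per product, keep the top 10 with a bounded heap (`heapq.nlargest`), then take the running sum.

```python
import heapq
from collections import defaultdict
from datetime import date, timedelta
from itertools import accumulate


def top_products_cumsum(orders, today, k=10, city="Seattle", days=183):
    """Cumulative profit of the k most profitable products for one city in a recent window."""
    start = today - timedelta(days=days)
    profit_by_product = defaultdict(float)
    for order_city, product_id, profit, order_date in orders:
        if order_city == city and start <= order_date <= today:
            profit_by_product[product_id] += profit
    # nlargest keeps a heap of size k internally: O(P log k) for P products.
    top = heapq.nlargest(k, profit_by_product.items(), key=lambda item: item[1])
    return list(accumulate(profit for _, profit in top))


today = date(2018, 12, 4)
orders = [
    ("Seattle", "a", 5.0, date(2018, 11, 1)),
    ("Seattle", "b", 3.0, date(2018, 10, 1)),
    ("Seattle", "a", 2.0, date(2018, 9, 1)),
    ("Portland", "c", 100.0, date(2018, 11, 1)),  # other city: ignored
    ("Seattle", "d", 50.0, date(2017, 1, 1)),     # older than the window: ignored
]
print(top_products_cumsum(orders, today, k=2))  # [7.0, 10.0]
```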

    @@ -10,4 +13,13 @@ https://www.geeksforgeeks.org/circular-queue-set-1-introduction-array-implementa
    Grep-like solution (stream through the file line by line); be careful with integer overflow!

    ## Given a function with inputs (an array of N numbers in random order and an int K), return an array with the K largest numbers.
    https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array/
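A min-heap of size K is one of the standard approaches (the linked article discusses several): the heap root is the smallest of the current top K, so each new number either replaces it or is discarded, for O(N log K) time. A Python sketch (`heapq.nlargest(k, nums)` does the same in one call):

```python
import heapq


def k_largest(nums, k):
    """Return the k largest numbers in descending order using a min-heap of size k."""
    if k <= 0:
        return []
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k, so it takes its place.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


print(k_largest([7, 2, 9, 4, 1, 8], 3))  # [9, 8, 7]
```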

    ## Given two strings, print all the interleavings of the strings in which the characters from each string stay in the same order as in the original string.

    e.g.
    for "abc" and "de", print all of these:
    adebc, abdec, adbce, deabc, dabce, etc.

    https://gist.github.com/geraldyeo/6c4eaea8a1a6bcc480cac5328cbff664
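A recursive Python sketch, independent of the linked gist: the first character of any interleaving comes from either the first or the second string, so recurse on both choices. Strings of lengths m and n have C(m + n, m) interleavings (10 for "abc" and "de"), so the output size dominates the cost.

```python
def interleavings(s, t):
    """Return every interleaving of s and t that keeps each string's own character order."""
    if not s:
        return [t]
    if not t:
        return [s]
    # Take the next character from s or from t, then interleave what remains.
    return [s[0] + rest for rest in interleavings(s[1:], t)] + [
        t[0] + rest for rest in interleavings(s, t[1:])
    ]


for word in interleavings("abc", "de"):
    print(word)
```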

  2. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 0.useful.md
    @@ -1,4 +1,4 @@
    ## Useful materials:
    # Useful materials:

    - Machine Learning cheatsheet: https://stanford.edu/~shervine/teaching/cs-229.html
    - [Pattern Recognition and Machine Learning Book](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf)
  3. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 3 changed files with 17 additions and 1 deletion.
    4 changes: 4 additions & 0 deletions 0.useful.md
    @@ -0,0 +1,4 @@
    ## Useful materials:

    - Machine Learning cheatsheet: https://stanford.edu/~shervine/teaching/cs-229.html
    - [Pattern Recognition and Machine Learning Book](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf)
    8 changes: 8 additions & 0 deletions 4.data-science-prob-questions.md
    @@ -1,19 +1,27 @@
    # Data Science in Production

    ## Given monthly time series data with a large number of records, how will you find out whether there is a significant difference between this month's values and the previous months' values?
    There are many possible answers here. Mine: sample an N large enough to reduce the uncertainty over the large data, then compare the samples with a statistical test.
    https://www.sas.upenn.edu/~fdiebold/Teaching104/Ch14_slides.pdf
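One way to implement "sample, then compare with a statistical test" is a two-sample permutation test on the difference of means. This is an illustrative choice (the linked slides cover time-series methods, and a t-test would also work); the data below is synthetic.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(current, previous, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = _mean(current) - _mean(previous)
    pooled = list(current) + list(previous)
    n = len(current)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the "this month" / "previous months" labels are exchangeable.
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n]) - _mean(pooled[n:])) >= abs(observed):
            extreme += 1
    # The +1 terms keep the estimate away from an impossible p-value of exactly 0.
    return observed, (extreme + 1) / (n_permutations + 1)


data_rng = random.Random(42)
# Synthetic samples drawn from this month's and from previous months' records.
this_month = [data_rng.gauss(11.0, 2.0) for _ in range(300)]
previous_months = [data_rng.gauss(10.0, 2.0) for _ in range(300)]
diff, p_value = permutation_test(this_month, previous_months)
print(f"difference in means = {diff:.3f}, p-value ~ {p_value:.4f}")
```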

    ## When users are navigating through the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?
    A sequential machine learning model that keeps track of the user's state and predicts their next action. Many options are possible here: HMMs, RNNs, bandits.
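Before reaching for an HMM or RNN, a first-order Markov chain is a simple sequential baseline: estimate P(next action | current action) by counting transitions in logged sessions. This is a deliberately simpler, swapped-in model; the action names and sessions are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Estimate P(next action | current action) by counting transitions in logged sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for current, nexts in counts.items()
    }


def purchase_probability(model, last_action, target="purchase"):
    """Probability that the next action is `target`, given the user's last action."""
    return model.get(last_action, {}).get(target, 0.0)


sessions = [
    ["search", "view", "add_to_cart", "purchase"],
    ["search", "view", "view", "add_to_cart", "search"],
    ["view", "add_to_cart", "purchase"],
]
model = fit_transitions(sessions)
print(purchase_probability(model, "add_to_cart"))  # 2 of 3 observed transitions
```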

    ## When you recommend a set of items in a horizontal layout, there is a problem called position bias. How do you use click data without position bias?
    Randomize the positions at which items are shown (sample positions uniformly), so that clicks are not confounded by position.
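A sketch of the randomization idea, assuming a small exploration rate (5% here, a hypothetical value): for those requests the items are shuffled uniformly, so each item appears in each slot equally often in expectation, and click-through rates computed only from randomized impressions are not confounded by position. Function names and the log format are illustrative.

```python
import random
from collections import Counter


def rank_for_display(ranked_items, rng, explore_rate=0.05):
    """Return (display order, randomized flag).

    With probability explore_rate the items are shuffled uniformly; otherwise the model ranking is kept.
    """
    if rng.random() < explore_rate:
        shuffled = list(ranked_items)
        rng.shuffle(shuffled)
        return shuffled, True
    return list(ranked_items), False


def unbiased_ctr(logs):
    """Per-item click-through rate using only randomized impressions.

    logs holds (item, position, clicked, randomized) tuples.
    """
    shown, clicked = Counter(), Counter()
    for item, _position, was_clicked, randomized in logs:
        if randomized:
            shown[item] += 1
            clicked[item] += was_clicked
    return {item: clicked[item] / shown[item] for item in shown}
```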

    ## If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in application?
    All the problems that come with overfitting: a perfect score on the available data likely means the model will not generalize to new data.

    # Math and Probability

    ## Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?
    https://www.youtube.com/watch?v=UzeL2GcLx3Y

    ## How do you weigh 9 marbles three times on a balance scale to select the heaviest one?
    https://mattgadient.com/2013/02/03/9-marbles-and-a-weight-balance-which-is-the-heaviest-one/

    ## Estimate the disease probability in one city given the probability is very low nationwide.
    https://medium.com/acing-ai/interview-guide-to-probability-distributions-a6dfb08c3766

    ## You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
    6 changes: 5 additions & 1 deletion 5.programming-questions.md
    @@ -1,9 +1,13 @@
    # Programming Questions

    ## Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
    Solution: a heap that keeps and updates the most profitable products.

    ## Implement a circular queue using an array.
    https://www.geeksforgeeks.org/circular-queue-set-1-introduction-array-implementation/
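A Python sketch of the array-backed version. It tracks the element count, which avoids the ambiguity of equal front and rear indices meaning either empty or full; the linked article tracks front and rear indices instead.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue backed by a list used as a ring buffer."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._head = 0  # index of the front element
        self._size = 0  # number of stored elements

    def is_empty(self):
        return self._size == 0

    def is_full(self):
        return self._size == len(self._data)

    def enqueue(self, value):
        if self.is_full():
            raise OverflowError("queue is full")
        # The tail position wraps around the end of the array.
        tail = (self._head + self._size) % len(self._data)
        self._data[tail] = value
        self._size += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        value = self._data[self._head]
        self._data[self._head] = None  # drop the stale reference
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return value
```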

    ## Given a ‘csv’ file with ID and Quantity columns, 50 million records and a data size of 2 GB, write a program in any language of your choice to aggregate the QUANTITY column.
    Grep-like solution (stream through the file line by line); be careful with integer overflow!
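A streaming sketch in Python: `csv.DictReader` reads one row at a time, so memory stays flat regardless of the 2 GB file size. It assumes a header row with `ID` and `Quantity` columns and integer quantities. Python integers do not overflow; in a language with fixed-width integers, use a 64-bit (or wider) accumulator.

```python
import csv
import io
from collections import defaultdict


def aggregate_quantity(lines, by_id=False):
    """Sum the Quantity column of a CSV stream whose header contains ID and Quantity.

    Rows are processed one at a time; only the per-ID totals (when by_id=True) grow in memory.
    """
    reader = csv.DictReader(lines)
    if by_id:
        totals = defaultdict(int)
        for row in reader:
            totals[row["ID"]] += int(row["Quantity"])
        return dict(totals)
    # Python ints never overflow; with fixed-width integers, use a 64-bit accumulator.
    return sum(int(row["Quantity"]) for row in reader)


sample = io.StringIO("ID,Quantity\n1,5\n2,7\n1,3\n")
print(aggregate_quantity(sample))  # 15
# For the real file ("data.csv" is a placeholder path):
# with open("data.csv", newline="") as f:
#     print(aggregate_quantity(f))
```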

    ## Given a function with inputs (an array of N numbers in random order and an int K), return an array with the K largest numbers.
    https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array/
  4. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 4 changed files with 28 additions and 47 deletions.
    17 changes: 4 additions & 13 deletions 2.neural-networks-questions.md
    @@ -1,31 +1,22 @@
    # Neural Networks

    - Is random weight assignment better than assigning the same weights to the units in the hidden layer?

    ## Is random weight assignment better than assigning the same weights to the units in the hidden layer?
    Yes, because of the symmetry problem: if all the units start with the same weights, they compute the same values during forward propagation and receive the same gradients during backpropagation, so they never learn different features. Identical weights also bias you toward a specific local minimum.

    https://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers

    https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94
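A tiny pure-Python illustration of the symmetry problem, using a hypothetical network with two tanh hidden units and a linear output trained on squared error: with identical initial weights, both hidden units receive exactly the same gradient, so gradient descent can never make them differ.

```python
import math
import random


def hidden_weight_gradients(w_hidden, w_out, x, y):
    """Gradients of 0.5 * (prediction - y)**2 w.r.t. each hidden unit's incoming weights.

    Network: h_j = tanh(w_hidden[j] . x), prediction = sum_j w_out[j] * h_j.
    """
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    prediction = sum(wo * hj for wo, hj in zip(w_out, h))
    error = prediction - y
    grads = []
    for j in range(len(w_hidden)):
        # Chain rule: dL/dw_hidden[j][i] = error * w_out[j] * (1 - h_j**2) * x[i]
        delta = error * w_out[j] * (1.0 - h[j] ** 2)
        grads.append([delta * xi for xi in x])
    return grads


x, y = [0.5, -1.0], 1.0
same = hidden_weight_gradients([[0.3, 0.3], [0.3, 0.3]], [0.2, 0.2], x, y)
print(same[0] == same[1])  # True: identical units get identical updates and stay identical

rng = random.Random(0)
w_rand = [[rng.gauss(0, 0.5) for _ in x] for _ in range(2)]
w_out_rand = [rng.gauss(0, 0.5) for _ in range(2)]
print(hidden_weight_gradients(w_rand, w_out_rand, x, y))  # a different gradient per unit
```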

    - Why is gradient checking important?

    ## Why is gradient checking important?
    Gradient checking can help find bugs in a backpropagation implementation. It is done by comparing the analytical gradient (derived with calculus and computed by backpropagation) with a numerical gradient estimated by finite differences.

    https://stackoverflow.com/questions/47506521/what-exactly-is-gradient-checking

    http://cs231n.github.io/optimization-1/
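A minimal gradient check in the spirit of the cs231n notes: estimate each partial derivative with a centered finite difference (f(w + eps) - f(w - eps)) / (2 * eps) and compare it with the analytical gradient through a relative error. The test function is an arbitrary example chosen for illustration.

```python
import math


def numerical_gradient(f, params, eps=1e-5):
    """Centered finite-difference estimate of the gradient of scalar f at params."""
    grad = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def relative_error(a, b):
    """Max elementwise relative error, a common gradient-check metric."""
    return max(abs(x - y) / max(abs(x) + abs(y), 1e-12) for x, y in zip(a, b))


def f(w):
    # Example objective: sum of sin(w_i) * w_i**2.
    return sum(math.sin(wi) * wi ** 2 for wi in w)


def analytical_gradient(w):
    # d/dw_i [sin(w_i) * w_i**2] = cos(w_i) * w_i**2 + 2 * w_i * sin(w_i)
    return [math.cos(wi) * wi ** 2 + 2 * wi * math.sin(wi) for wi in w]


w = [0.3, -1.2, 2.0]
err = relative_error(analytical_gradient(w), numerical_gradient(f, w))
print(err)  # tiny when the analytical gradient is correct
```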

    - What is the loss function in a NN?

    ## What is the loss function in a NN?
    The loss function depends on the type of problem:
    - Regression: Mean squared error
    - Binary classification: Binary cross entropy
    - Multiclass: Cross entropy
    - Ranking: Hinge loss
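Plain-Python versions of the losses listed above, as a sketch (deep learning frameworks provide numerically safer built-ins). The hinge loss is shown in its classification form with labels in {-1, +1}; ranking setups typically apply it to pairwise score differences.

```python
import math


def mse(y_true, y_pred):
    """Regression: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary classification: labels in {0, 1}, p_pred = predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def cross_entropy(labels, probs, eps=1e-12):
    """Multiclass: labels are class indices, probs are per-example probability vectors."""
    return -sum(math.log(max(p[k], eps)) for k, p in zip(labels, probs)) / len(labels)


def hinge(y_true, scores):
    """Hinge loss with labels in {-1, +1}."""
    return sum(max(0.0, 1 - t * s) for t, s in zip(y_true, scores)) / len(y_true)
```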

    - There is a neuron in the hidden layer that always has a large error in backpropagation. What can be the reason?

    ## There is a neuron in the hidden layer that always has a large error in backpropagation. What can be the reason?
    Either the weights from the input layer to that hidden neuron are to blame, or the activation function of that neuron should be changed.

    https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
    34 changes: 12 additions & 22 deletions 3.svm-logr-questions.md → 3.svm-logr-em.questions.md
    @@ -1,52 +1,42 @@
    # SVM and Logistic Regression (LogR)

    - Difference between SVM and LogR?

    ## Difference between SVM and LogR?
    http://www.cs.toronto.edu/~kswersky/wp-content/uploads/svm_vs_lr.pdf

    - What does LogR give?
    ## What does LogR give?
    The posterior probability P(y|x).


    - Does SVM give any probabilistic output?

    ## Does SVM give any probabilistic output?
    http://www.cs.cornell.edu/courses/cs678/2007sp/platt.pdf
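The linked paper is Platt's method: fit a sigmoid P(y = 1 | f) = 1 / (1 + exp(A*f + B)) to the SVM decision values f on held-out data. A simplified sketch that minimizes log loss with plain gradient descent (Platt also smooths the targets and uses a more careful optimizer); the scores and labels are hypothetical.

```python
import math


def platt_probability(a, b, f):
    """P(y = 1 | decision value f) under Platt's sigmoid."""
    return 1.0 / (1.0 + math.exp(a * f + b))


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit A and B by gradient descent on the mean log loss; labels are 0/1."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = platt_probability(a, b, f)
            # With z = a*f + b and p = 1 / (1 + exp(z)), d(log loss)/dz = y - p.
            grad_a += (y - p) * f
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical held-out SVM decision values and labels (not linearly separable).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
print(platt_probability(a, b, 1.5))  # above 0.5: positive decision values map to high probabilities
```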


    - What are the support vectors in SVM?
    ## What are the support vectors in SVM?
    The training points that lie on or inside the margin; they alone determine the separating hyperplane of the SVM.


    - How do you evaluate LogR?
    ## How do you evaluate LogR?
    You can use any classification metric, such as precision, recall, AUC or F1.

    - How does a logistic regression model know what the coefficients are?
    ## How does a logistic regression model know what the coefficients are?
    http://www-hsc.usc.edu/~eckel/biostat2/notes/notes14.pdf
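Logistic regression coefficients are maximum-likelihood estimates; there is no closed form, so they are found iteratively (statistics packages typically use Newton-Raphson / IRLS). A gradient-descent sketch on synthetic data, where the true intercept and slope are made up for the example; the fitted model's sigmoid output is the posterior P(y = 1 | x).

```python
import math
import random


def fit_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Maximum-likelihood coefficients via batch gradient descent on the mean negative log-likelihood.

    X: list of feature lists, y: 0/1 labels. Returns (intercept, weights).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, xi)))))
            # For one example, the gradient of the negative log-likelihood is (p - y) * x.
            grad_b += p - yi
            for j in range(d):
                grad_w[j] += (p - yi) * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return b, w


rng = random.Random(0)
true_b, true_w = -0.5, 2.0  # made-up generating parameters
X = [[rng.uniform(-2.0, 2.0)] for _ in range(300)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_b + true_w * x[0]))) else 0 for x in X]
b, w = fit_logistic_regression(X, y)
print(b, w)  # should land near -0.5 and [2.0], up to sampling noise
```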

    # Expectation-Maximization

    - How's EM done?

    ## How's EM done?
    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique

    - How are the params updated?

    ## How are the params of EM updated?
    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique


    - When doing EM for a GMM, how do you find the mixture weights?
    ## When doing EM for a GMM, how do you find the mixture weights?
    I replied that for 2 Gaussians, the prior (the mixture weight) can be assumed to follow a Bernoulli distribution. In the M-step, each mixture weight is updated to the average responsibility of its component over all data points.
    http://www.aishack.in/tutorials/expectation-maximization-gaussian-mixture-model-mixtures/
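With two components, each point's latent assignment is a Bernoulli draw whose parameter is the mixture weight, and EM's M-step sets each weight to the average responsibility of its component. A 1-D sketch on synthetic data (starting values are arbitrary):

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, mus, variances, weights, iterations=100):
    """EM for a 1-D Gaussian mixture; returns updated (mus, variances, weights)."""
    k = len(mus)
    for _ in range(iterations):
        # E-step: responsibility r[i][j] = P(component j | x_i) under the current parameters.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: effective counts drive the weight, mean and variance updates.
        counts = [sum(r[j] for r in resp) for j in range(k)]
        weights = [c / len(data) for c in counts]  # mixture weight = average responsibility
        mus = [sum(r[j] * x for r, x in zip(resp, data)) / counts[j] for j in range(k)]
        variances = [
            sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / counts[j]
            for j in range(k)
        ]
    return mus, variances, weights


rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(700)]
mus, variances, weights = em_gmm_1d(data, mus=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(mus, weights)  # weights should come out close to [0.3, 0.7]
```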

    - If x ~ N(0,1), what does 2x follow?
    ## If x ~ N(0,1), what does 2x follow?
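    N(0,4), writing N(mean, variance): scaling by 2 multiplies the variance by 2² = 4, so the standard deviation becomes 2.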
    https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables
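The scaling rule is Var(aX) = a² Var(X), so with the N(mean, variance) convention, 2X ~ N(0, 4). A quick Monte Carlo check (sample size and seed are arbitrary):

```python
import random
import statistics

rng = random.Random(0)
# Draw X ~ N(0, 1) and multiply each draw by 2.
samples = [2.0 * rng.gauss(0.0, 1.0) for _ in range(20_000)]
# Var(2X) = 2**2 * Var(X) = 4, i.e. standard deviation 2.
print(statistics.mean(samples), statistics.variance(samples))  # roughly 0 and 4
```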


    - How would you sample for a GMM?

    ## How would you sample for a GMM?
    http://www.robots.ox.ac.uk/~fwood/teaching/C19_hilary_2013_2014/gmm.pdf
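Sampling from a GMM is ancestral sampling: draw a component index according to the mixture weights, then draw from that component's Gaussian. A sketch with made-up parameters:

```python
import random


def sample_gmm(weights, means, stds, n, seed=None):
    """Pick a component by its mixture weight, then sample from that component's Gaussian."""
    rng = random.Random(seed)
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], stds[c]) for c in components]


samples = sample_gmm([0.3, 0.7], [-2.0, 3.0], [1.0, 0.5], n=10_000, seed=0)
```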

    - How do you sample from a normal distribution with known mean and variance?

    ## How do you sample from a normal distribution with known mean and variance?
    https://stats.stackexchange.com/questions/16334/how-to-sample-from-a-normal-distribution-with-known-mean-and-variance-using-a-co
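If only a uniform random number generator is available, the Box-Muller transform turns two uniforms into two independent standard normals z, and x = mean + sqrt(variance) * z then has the requested mean and variance. A sketch (in practice, `random.gauss` already provides normal draws):

```python
import math
import random


def sample_normal(mean, variance, n, seed=None):
    """Draw n samples from N(mean, variance) using the Box-Muller transform on uniform draws."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        # Each pair of uniforms yields two independent standard normals.
        for z in (r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)):
            out.append(mean + sigma * z)  # shift and scale the standard normal
    return out[:n]
```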
    16 changes: 8 additions & 8 deletions 4.data-science-prob-questions.md
    @@ -1,19 +1,19 @@
    # Data Science in Production

    - Given monthly time series data with a large number of records, how will you find out whether there is a significant difference between this month's values and the previous months' values?
    ## Given monthly time series data with a large number of records, how will you find out whether there is a significant difference between this month's values and the previous months' values?

    - When users are navigating through the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?
    ## When users are navigating through the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?

    - When you recommend a set of items in a horizontal layout, there is a problem called position bias. How do you use click data without position bias?
    ## When you recommend a set of items in a horizontal layout, there is a problem called position bias. How do you use click data without position bias?

    - If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in application?
    ## If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in application?

    # Math and Probability

    - Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?
    ## Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?

    - How do you weigh 9 marbles three times on a balance scale to select the heaviest one?
    ## How do you weigh 9 marbles three times on a balance scale to select the heaviest one?

    - Estimate the disease probability in one city given the probability is very low nationwide.
    ## Estimate the disease probability in one city given the probability is very low nationwide.

    - You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
    ## You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
    8 changes: 4 additions & 4 deletions 5.programming-questions.md
    @@ -1,9 +1,9 @@
    # Programming Questions

    - Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
    ## Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.

    - Implement a circular queue using an array.
    ## Implement a circular queue using an array.

    - Given a ‘csv’ file with ID and Quantity columns, 50 million records and a data size of 2 GB, write a program in any language of your choice to aggregate the QUANTITY column.
    ## Given a ‘csv’ file with ID and Quantity columns, 50 million records and a data size of 2 GB, write a program in any language of your choice to aggregate the QUANTITY column.

    - Given a function with inputs (an array of N numbers in random order and an int K), return an array with the K largest numbers.
    ## Given a function with inputs (an array of N numbers in random order and an int K), return an array with the K largest numbers.
  5. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 5 changed files with 0 additions and 0 deletions.
    File renamed without changes.
    File renamed without changes.
  6. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 1 changed file with 26 additions and 15 deletions.
    41 changes: 26 additions & 15 deletions general-ml-questions.md
    @@ -1,7 +1,6 @@
    # General Machine Learning Questions

    ## Difference between convex and non-convex cost functions; what does it mean when a cost function is non-convex?

    Convex:
    - local min = global min
    - efficient solvers
    @@ -22,29 +21,41 @@ Examples of ML algorithms:
    - Maximum likelihood mixtures of Gaussians

    ## What is overfitting?

    https://en.wikipedia.org/wiki/Overfitting

    - Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.
    ## Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.
    https://www2.isye.gatech.edu/~tzhao80/Lectures/Lecture_6.pdf

    - Describe the criteria for selecting a particular model. Why is dimension reduction important?
    ## Describe the criteria for selecting a particular model. Why is dimension reduction important?
    http://www.stat.cmu.edu/tr/tr759/tr759.pdf

    - What are the assumptions for logistic and linear regression?
    ## What are the assumptions for logistic and linear regression?
    - Linear regression: linearity of the relationship between the features and the target, independence of residuals, normal distribution of residuals, equal variance of residuals (homoscedasticity).
    http://blog.uwgb.edu/bansalg/statistics-data-analytics/linear-regression/what-are-the-four-assumptions-of-linear-regression/
    - Logistic regression: the dependent variable is binary, observations are independent of each other, there is little or no multicollinearity among the independent variables, and the independent variables are linearly related to the log odds.
    https://www.statisticssolutions.com/assumptions-of-logistic-regression/

    - Compare Lasso and Ridge Regression.

    - What’s the difference between MLE and MAP inference?
    ## Compare Lasso and Ridge Regression.
    https://blog.alexlenail.me/what-is-the-difference-between-ridge-regression-the-lasso-and-elasticnet-ec19c71c9028
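A compact way to see the difference, assuming an orthonormal design (XᵀX = I) where both estimators have closed forms in terms of the ordinary least-squares coefficients: ridge shrinks every coefficient proportionally, while lasso soft-thresholds and sets small coefficients exactly to zero. The penalty scaling follows the docstrings; other scalings only change the constants.

```python
def ridge_orthonormal(ols_coefs, lam):
    """Ridge for 0.5*||y - Xw||^2 + 0.5*lam*||w||^2 with X^T X = I: proportional shrinkage."""
    return [b / (1.0 + lam) for b in ols_coefs]


def soft_threshold(b, lam):
    """Move b toward 0 by lam, stopping at exactly 0."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso for 0.5*||y - Xw||^2 + lam*||w||_1 with X^T X = I: soft-thresholding."""
    return [soft_threshold(b, lam) for b in ols_coefs]


ols = [3.0, 0.5, -0.2]
print(ridge_orthonormal(ols, 1.0))  # [1.5, 0.25, -0.1]: everything shrinks, nothing is zero
print(lasso_orthonormal(ols, 1.0))  # [2.0, 0.0, 0.0]: small coefficients are removed
```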

    ## What’s the difference between MLE and MAP inference?
    https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/
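A coin-flip example of the difference: MLE maximizes the likelihood alone, while MAP maximizes likelihood times prior. With a Beta(alpha, beta) prior on the heads probability, the MAP estimate is the mode of the Beta posterior. The prior values below are illustrative.

```python
def mle_bernoulli(heads, n):
    """MLE of the heads probability: the observed frequency."""
    return heads / n


def map_bernoulli(heads, n, alpha, beta):
    """MAP under a Beta(alpha, beta) prior: mode of the Beta(heads + alpha, n - heads + beta) posterior.

    Assumes heads + alpha > 1 and n - heads + beta > 1 so that the mode is interior.
    """
    return (heads + alpha - 1) / (n + alpha + beta - 2)


print(mle_bernoulli(3, 3))             # 1.0: three heads in three flips
print(map_bernoulli(3, 3, 2.0, 2.0))   # 0.8: the prior pulls the estimate toward 0.5
```

With a uniform Beta(1, 1) prior, MAP reduces to MLE.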

    - How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?

    - How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.
    ## How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?
    - Explanation and pseudo-code: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
    - Distance metrics (e.g. Euclidean distance, Manhattan distance):
    https://pdfs.semanticscholar.org/a630/316f9c98839098747007753a9bb6d05f752e.pdf
    - Explain normalization for K-means and different results you can have: https://www.edupristine.com/blog/k-means-algorithm
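A pure-Python sketch of Lloyd's algorithm with squared Euclidean distance, plus z-score standardization so that a feature with a large dynamic range does not dominate the distance. Random initialization from the data is one simple choice; k-means++ is a common better one.

```python
import random


def standardize(points):
    """Z-score each feature; a constant feature keeps scale 1 to avoid division by zero."""
    d = len(points[0])
    means = [sum(p[j] for p in points) / len(points) for j in range(d)]
    stds = [
        (sum((p[j] - means[j]) ** 2 for p in points) / len(points)) ** 0.5 or 1.0
        for j in range(d)
    ]
    return [[(p[j] - means[j]) / stds[j] for j in range(d)] for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid recomputation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids
```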

    - What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
    ## How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.
    https://towardsdatascience.com/2-latent-methods-for-dimension-reduction-and-topic-modeling-20ff6d7d547

    - Why is scaling of the input important? For which learning algorithms is it important? What is the problem with Min-Max scaling?
    ## What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
    https://cedar.buffalo.edu/~srihari/CSE574/Discriminative-Generative.pdf

    - How can you plot ROC curves for multiple classes?
    ## Why is scaling of the input important? For which learning algorithms is it important? What is the problem with Min-Max scaling?
    https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
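One frequently cited problem with Min-Max scaling is its sensitivity to outliers: a single extreme value squeezes every other value into a small slice of [0, 1]. New values outside the training min/max also map outside [0, 1]. A small illustration with made-up numbers:

```python
def fit_min_max(values):
    """Learn the scaling range from training data."""
    return min(values), max(values)


def transform_min_max(values, lo, hi):
    """Map each value to (v - lo) / (hi - lo); a constant feature maps to 0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical incomes (in thousands) with one outlier.
incomes = [30.0, 35.0, 40.0, 45.0, 1000.0]
lo, hi = fit_min_max(incomes)
print(transform_min_max(incomes, lo, hi))   # the four ordinary values all land below 0.02
print(transform_min_max([2000.0], lo, hi))  # above 1: new data outside the training range
```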

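    There is something called macro-averaging, where $\mathrm{PRE} = (\mathrm{PRE}_1 + \mathrm{PRE}_2 + \dots + \mathrm{PRE}_K)/K$
    ## How can you plot ROC curves for multiple classes?
    With macro-averaging: compute the metric for each class separately (one-vs-rest) and take the unweighted mean over the $K$ classes, e.g. $\mathrm{PRE} = (\mathrm{PRE}_1 + \mathrm{PRE}_2 + \dots + \mathrm{PRE}_K)/K$.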
    https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
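A sketch of macro-averaging for ROC AUC: compute a one-vs-rest AUC per class (here via the pairwise "a random positive outranks a random negative" definition) and take the unweighted mean. In scikit-learn, `roc_auc_score(y, probs, multi_class="ovr", average="macro")` computes the same quantity.

```python
def binary_auc(scores, positives):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(labels, probs):
    """One-vs-rest AUC per class, then the unweighted mean over the K classes."""
    k = len(probs[0])
    per_class = [
        binary_auc([p[c] for p in probs], [y == c for y in labels]) for c in range(k)
    ]
    return sum(per_class) / k, per_class
```

The pairwise loop is O(P * N) per class, which is fine for illustration; sorting-based implementations scale better.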
  7. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 4 changed files with 111 additions and 0 deletions.
    19 changes: 19 additions & 0 deletions data-science-prob-questions.md
    @@ -0,0 +1,19 @@
    # Data Science in Production

    - Given monthly time series data with a large number of records, how will you find out whether there is a significant difference between this month's values and the previous months' values?

    - When users are navigating through the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?

    - When you recommend a set of items in a horizontal layout, there is a problem called position bias. How do you use click data without position bias?

    - If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in application?

    # Math and Probability

    - Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?

    - How do you weigh 9 marbles three times on a balance scale to select the heaviest one?

    - Estimate the disease probability in one city given the probability is very low nationwide.

    - You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
    31 changes: 31 additions & 0 deletions neural-networks-questions.md
    @@ -0,0 +1,31 @@
    # Neural Networks

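    - Is random weight assignment better than assigning the same weights to the units in the hidden layer?

    Yes, because of the symmetry problem: if all the units start with the same weights, they compute the same values during forward propagation and receive the same gradients during backpropagation, so they never learn different features. Identical weights also bias you toward a specific local minimum.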

    https://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers

    https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94

    - Why is gradient checking important?

    Gradient checking can help find bugs in a backpropagation implementation. It is done by comparing the analytical gradient (derived with calculus and computed by backpropagation) with a numerical gradient estimated by finite differences.

    https://stackoverflow.com/questions/47506521/what-exactly-is-gradient-checking

    http://cs231n.github.io/optimization-1/

    - What is the loss function in a NN?

    The loss function depends on the type of problem:
    - Regression: Mean squared error
    - Binary classification: Binary cross entropy
    - Multiclass: Cross entropy
    - Ranking: Hinge loss

    - There is a neuron in the hidden layer that always has a large error in backpropagation. What can be the reason?

    Either the weights from the input layer to that hidden neuron are to blame, or the activation function of that neuron should be changed.

    https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
    9 changes: 9 additions & 0 deletions programming-questions.md
    @@ -0,0 +1,9 @@
    # Programming Questions

    - Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.

    - Implement a circular queue using an array.

    - Given a ‘csv’ file with ID and Quantity columns, 50 million records and a data size of 2 GB, write a program in any language of your choice to aggregate the QUANTITY column.

    - Given a function with inputs (an array of N numbers in random order and an int K), return an array with the K largest numbers.
    52 changes: 52 additions & 0 deletions svm-logr-questions.md
    @@ -0,0 +1,52 @@
    # SVM and Logistic Regression (LogR)

    - Difference between SVM and LogR?

    http://www.cs.toronto.edu/~kswersky/wp-content/uploads/svm_vs_lr.pdf

    - What does LogR give?
    The posterior probability P(y|x).


    - Does SVM give any probabilistic output?

    http://www.cs.cornell.edu/courses/cs678/2007sp/platt.pdf


    - What are the support vectors in SVM?
    The training points that lie on or inside the margin; they alone determine the separating hyperplane of the SVM.


    - How do you evaluate LogR?
    You can use any classification metric, such as precision, recall, AUC or F1.

    - How does a logistic regression model know what the coefficients are?
    http://www-hsc.usc.edu/~eckel/biostat2/notes/notes14.pdf

    # Expectation-Maximization

    - How's EM done?

    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique

    - How are the params updated?

    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique


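    - When doing EM for a GMM, how do you find the mixture weights?
    I replied that for 2 Gaussians, the prior (the mixture weight) can be assumed to follow a Bernoulli distribution. In the M-step, each mixture weight is updated to the average responsibility of its component over all data points.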
    http://www.aishack.in/tutorials/expectation-maximization-gaussian-mixture-model-mixtures/

    - If x ~ N(0,1), what does 2x follow?
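    N(0,4), writing N(mean, variance): scaling by 2 multiplies the variance by 2² = 4, so the standard deviation becomes 2.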
    https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables


    - How would you sample for a GMM?

    http://www.robots.ox.ac.uk/~fwood/teaching/C19_hilary_2013_2014/gmm.pdf

    - How do you sample from a normal distribution with known mean and variance?

    https://stats.stackexchange.com/questions/16334/how-to-sample-from-a-normal-distribution-with-known-mean-and-variance-using-a-co
  8. @felipemoraes felipemoraes revised this gist Dec 4, 2018. 2 changed files with 50 additions and 146 deletions.
    50 changes: 50 additions & 0 deletions general-ml-questions.md
    @@ -0,0 +1,50 @@
    # General Machine Learning Questions

    ## Difference between convex and non-convex cost functions; what does it mean when a cost function is non-convex?

    Convex:
    - Local min = global min
    - Efficient solvers
    - Strong theoretical guarantees

    Examples of ML algorithms:
    - Linear regression / Ridge regression, with Tikhonov regularisation
    - Sparse linear regression with L1 regularisation, such as Lasso
    - Support vector machines
    - Parameter estimation in Linear-Gaussian time series (Kalman filter and friends)

    Non-convex:
    - Multiple local minima
    - Many solvers come from the convex world
    - Weak theoretical guarantees, if any

    Examples of ML algorithms:
    - Neural networks
    - Maximum likelihood mixtures of Gaussians
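A small illustration of the convex vs non-convex contrast, using functions chosen for the example: gradient descent on the convex (x - 1)² reaches the same minimum from any start, while on the non-convex x⁴ - 3x² + x the result depends on the starting point (local minima near x ≈ -1.30, the global one, and x ≈ 1.13).

```python
def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Plain fixed-step gradient descent in one dimension."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def convex_grad(x):
    # Gradient of f(x) = (x - 1)**2, which has a single minimum at x = 1.
    return 2.0 * (x - 1.0)


def nonconvex_grad(x):
    # Gradient of f(x) = x**4 - 3*x**2 + x, which has two local minima.
    return 4.0 * x ** 3 - 6.0 * x + 1.0


for x0 in (-2.0, 2.0):
    print(x0, gradient_descent(convex_grad, x0), gradient_descent(nonconvex_grad, x0))
```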

    ## What is overfitting?

    https://en.wikipedia.org/wiki/Overfitting

    - Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.

    - Describe the criteria for selecting a particular model. Why is dimension reduction important?

    - What are the assumptions for logistic and linear regression?

    - Compare Lasso and Ridge Regression.

    - What’s the difference between MLE and MAP inference?

    https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/

    - How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?

    - How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.

    - What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?

    - Why is scaling of the input important? For which learning algorithms is it important? What is the problem with Min-Max scaling?

    - How can you plot ROC curves for multiple classes?

    There is something called macro-averaging, where $\mathrm{PRE} = (\mathrm{PRE}_1 + \mathrm{PRE}_2 + \dots + \mathrm{PRE}_K)/K$
    146 changes: 0 additions & 146 deletions ml-interview-review.md
    @@ -1,146 +0,0 @@
    # General Machine Learning Questions

    - Difference between convex and non-convex cost functions; what does it mean when a cost function is non-convex?

    - What is overfitting?

    - Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.

    - Describe the criteria for selecting a particular model. Why is dimension reduction important?

    - What are the assumptions for logistic and linear regression?

    - Compare Lasso and Ridge Regression.

    - What’s the difference between MLE and MAP inference?

    https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/

    - How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?

    - How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.

    - What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?

    - Why is scaling of the input important? For which learning algorithms is it important? What is the problem with Min-Max scaling?

    - How can you plot ROC curves for multiple classes?

    There is something called macro-averaging, where $\mathrm{PRE} = (\mathrm{PRE}_1 + \mathrm{PRE}_2 + \dots + \mathrm{PRE}_K)/K$

    # Neural Networks

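    - Is random weight assignment better than assigning the same weights to the units in the hidden layer?

    Yes, because of the symmetry problem: if all the units start with the same weights, they compute the same values during forward propagation and receive the same gradients during backpropagation, so they never learn different features. Identical weights also bias you toward a specific local minimum.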

    https://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers

    https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94

    - Why is gradient checking important?

    Gradient checking can help find bugs in a backpropagation implementation. It is done by comparing the analytical gradient (derived with calculus and computed by backpropagation) with a numerical gradient estimated by finite differences.

    https://stackoverflow.com/questions/47506521/what-exactly-is-gradient-checking

    http://cs231n.github.io/optimization-1/

    - What is the loss function in a NN?

    The loss function depends on the type of problem:
    - Regression: Mean squared error
    - Binary classification: Binary cross entropy
    - Multiclass: Cross entropy
    - Ranking: Hinge loss

    - There is a neuron in the hidden layer that always has a large error in backpropagation. What can be the reason?

    Either the weights from the input layer to that hidden neuron are to blame, or the activation function of that neuron should be changed.

    https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

    # SVM and Logistic Regression (LogR)

    - Difference between SVM and LogR?

    http://www.cs.toronto.edu/~kswersky/wp-content/uploads/svm_vs_lr.pdf



    - What does LogR give?
    The posterior probability P(y|x).


    - Does SVM give any probabilistic output?

    http://www.cs.cornell.edu/courses/cs678/2007sp/platt.pdf


    - What are the support vectors in SVM?
    The training points that lie on or inside the margin; they alone determine the separating hyperplane of the SVM.


    - How do you evaluate LogR?
    You can use any classification metric, such as precision, recall, AUC or F1.

    - How does a logistic regression model know what the coefficients are?
    http://www-hsc.usc.edu/~eckel/biostat2/notes/notes14.pdf

    # Expectation-Maximization

    - How's EM done?

    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique

    - How are the params updated?

    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique


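    - When doing EM for a GMM, how do you find the mixture weights?
    I replied that for 2 Gaussians, the prior (the mixture weight) can be assumed to follow a Bernoulli distribution. In the M-step, each mixture weight is updated to the average responsibility of its component over all data points.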
    http://www.aishack.in/tutorials/expectation-maximization-gaussian-mixture-model-mixtures/

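    - If x ~ N(0,1), what does 2x follow?
    N(0,4). Multiplying a normal variable by a constant a multiplies its variance by a², so Var(2x) = 4 (standard deviation 2). N(0,2) is instead the distribution of x1 + x2 for two independent N(0,1) variables, the case covered by the link below.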
    https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables


    - How would you sample from a GMM?

    http://www.robots.ox.ac.uk/~fwood/teaching/C19_hilary_2013_2014/gmm.pdf

    - How do you sample from a normal distribution with known mean and variance?

    https://stats.stackexchange.com/questions/16334/how-to-sample-from-a-normal-distribution-with-known-mean-and-variance-using-a-co

    # Data Science in Production

    - Given monthly time-series data with a large number of records, how would you determine whether this month's values differ significantly from those of previous months?

    - When users navigate the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?

    - When you recommend a set of items laid out horizontally, clicks suffer from a problem called position bias. How do you use click data while correcting for position bias?

    - If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in applying it?

    # Math and Probability

    - Given a bar plot, imagine pouring water over it from the top. How would you quantify how much water the bar chart can hold?

    - How do you weigh 9 marbles three times on a balance scale to find the heaviest one?

    - Estimate the disease probability in one city, given that the probability is very low nationwide.

    - You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?

    # Programming Questions

    - Find the cumulative sum of the top 10 most profitable products over the last 6 months for customers in Seattle.

    - Implement circular queue using an array.

    - Given a CSV file with ID and Quantity columns, 50 million records, and a total size of 2 GB, write a program in any language of your choice to aggregate the Quantity column.

    - Write a function that takes an array of N randomly ordered numbers and an int K, and returns an array with the K largest numbers.
  9. @felipemoraes felipemoraes renamed this gist Dec 4, 2018. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  10. @felipemoraes felipemoraes created this gist Dec 4, 2018.
    146 changes: 146 additions & 0 deletions ml-interview-review
    @@ -0,0 +1,146 @@
    # General Machine Learning Questions

    - What is the difference between convex and non-convex cost functions? What does it mean when a cost function is non-convex?

    - What is overfitting?

    - Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.

    - Describe the criteria for selecting a particular model. Why is dimension reduction important?

    - What are the assumptions for logistic and linear regression?

    - Compare Lasso and Ridge Regression.
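For reference, the two objectives differ only in the penalty term (standard formulations with regularization strength λ ≥ 0; some texts scale the squared-error term by 1/(2n)):

```latex
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

The L1 penalty can set coefficients exactly to zero (implicit feature selection), while the L2 penalty shrinks them toward zero without zeroing them.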

    - What’s the difference between MLE and MAP inference?

    https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/
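The difference in one line each (standard definitions; X is the data, θ the parameters). MAP adds the log-prior to the log-likelihood, so with a flat prior the log p(θ) term is constant and the two estimates coincide:

```latex
\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \; \log p(X \mid \theta)
\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \; \log p(X \mid \theta) + \log p(\theta)
```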

    - How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic range?
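A minimal plain-Python sketch of Lloyd's algorithm (the standard K-means iteration) with squared Euclidean distance. Z-scoring each feature first is one common answer to the dynamic-range question; the function names and the random initialization from k data points are illustrative choices, not taken from the source.

```python
import math
import random


def standardize(points):
    """Z-score each feature so features with a large dynamic range don't dominate the distance."""
    dims = len(points[0])
    means = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / len(points)) or 1.0
        for d in range(dims)
    ]
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids
```

K-means only finds a local optimum, so in practice it is typically restarted from several initializations (or seeded with k-means++).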

    - How many topic modeling techniques do you know of? Formulate LSI and LDA techniques.


    - What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type is usually used, and why?

    - Why is scaling the input important? For which learning algorithms does it matter? What is the problem with Min-Max scaling?
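A small illustration, on made-up numbers, of one commonly cited problem with min-max scaling: a single outlier fixes the range, so the remaining values are squeezed into a tiny interval. (The formula also divides by zero for a constant feature.)

```python
def min_max_scale(xs):
    """Map values linearly onto [0, 1] using the observed min and max."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


data = [1, 2, 3, 4, 1000]  # hypothetical feature with one extreme value
scaled = min_max_scale(data)
# The four ordinary values all land below 0.01; the outlier alone spans the rest of [0, 1].
```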

    - How can you plot ROC curves for multiple classes?

    One option is one-vs-rest: plot an ROC curve for each class against all the others. To summarize them, macro-average the per-class metric, e.g. PRE = (PRE_1 + PRE_2 + ... + PRE_K)/K.
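The same one-vs-rest macro-averaging applies to the area under each ROC curve. A minimal sketch (plotting omitted), assuming each row of `prob_matrix` holds one score per class in the order given by `classes`; AUC is computed here via its rank interpretation rather than by integrating the curve.

```python
def binary_auc(scores, labels):
    """AUC = probability that a random positive scores higher than a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(prob_matrix, y_true, classes):
    """One-vs-rest AUC for each class, then an unweighted mean over the K classes."""
    per_class = []
    for k, c in enumerate(classes):
        scores = [row[k] for row in prob_matrix]
        labels = [y == c for y in y_true]  # class c vs. the rest
        per_class.append(binary_auc(scores, labels))
    return sum(per_class) / len(per_class)
```

The pairwise comparison is O(P·N) per class, which is fine for a sketch; a sort-based version scales better.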

    # Neural Networks

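    - Is random weight assignment better than assigning the same weights to the units in the hidden layer?

    Yes. Because of the symmetry problem, if all units start with the same weights they compute the same values in forward propagation and receive the same gradients in backpropagation, so they stay identical throughout training. Identical initialization also biases training toward a specific local minimum.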

    https://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers

    https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94
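A tiny hand-written backprop check of the symmetry argument, on an arbitrary toy setup: 2 inputs, 2 sigmoid hidden units, a linear output and squared-error loss. With identical initial weights both hidden units receive exactly the same gradient, so gradient descent can never make them differ; distinct weights break the tie.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_weight_grads(W1, w2, x, y):
    """Gradient of loss = (y_hat - y)^2 / 2 w.r.t. each hidden unit's incoming weights,
    for h = sigmoid(W1 x) and y_hat = w2 . h."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = sum(v * hj for v, hj in zip(w2, h))
    err = y_hat - y  # dLoss/dy_hat
    grads = []
    for j in range(len(W1)):
        delta = err * w2[j] * h[j] * (1.0 - h[j])  # chain rule through the sigmoid
        grads.append([delta * xi for xi in x])  # dLoss/dW1[j][i] = delta * x[i]
    return grads


x, y = [0.5, -1.2], 1.0
same_init = hidden_weight_grads([[0.3, 0.3], [0.3, 0.3]], [0.7, 0.7], x, y)
distinct_init = hidden_weight_grads([[0.3, -0.1], [0.8, 0.2]], [0.7, -0.4], x, y)
```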

    - Why is gradient checking important?

    Gradient checking helps find bugs in a backpropagation implementation. It compares the analytical gradient (derived with calculus and computed by backprop) against a numerical gradient approximated with finite differences; a large discrepancy points to a bug.

    https://stackoverflow.com/questions/47506521/what-exactly-is-gradient-checking

    http://cs231n.github.io/optimization-1/
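A minimal sketch of gradient checking on a toy function (f and both gradient functions below are made up for illustration). The centered difference (f(θ+ε) - f(θ-ε)) / 2ε approximates each partial derivative; a tiny relative error against the analytical gradient suggests the backprop code is right, and a large one flags a bug.

```python
def numerical_grad(f, theta, eps=1e-5):
    """Centered finite-difference estimate of the gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def relative_error(a, b):
    diff = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scale = sum(x * x for x in a) ** 0.5 + sum(y * y for y in b) ** 0.5
    return diff / scale if scale else 0.0


def f(w):
    return w[0] ** 2 + 3 * w[0] * w[1]


def analytic_grad(w):
    return [2 * w[0] + 3 * w[1], 3 * w[0]]


def buggy_grad(w):
    return [2 * w[0], 3 * w[0]]  # deliberately drops the 3 * w[1] term


theta = [1.5, -2.0]
numeric = numerical_grad(f, theta)
```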

    - What is the loss function in a NN?

    The loss function depends on the type of problem, for example:
    Regression: mean squared error
    Binary classification: binary cross-entropy
    Multiclass classification: categorical cross-entropy
    Ranking: (pairwise) hinge loss

    - A neuron in the hidden layer consistently shows a large error during backpropagation. What could be the reason?

    Either the weights from the input layer to that hidden neuron are to blame, or the neuron's activation function should be changed.

    https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

    # SVM and Logistic Regression (LogR)

    - Difference between SVM and LogR?

    http://www.cs.toronto.edu/~kswersky/wp-content/uploads/svm_vs_lr.pdf
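One way to see the difference: both (linear) models can be written as minimizing a loss on the margin m = y·f(x), with y in {-1, +1} and f(x) = w·x + b, plus regularization; they differ in the loss. Hinge loss is exactly zero once a point is beyond the margin, which is why only support vectors shape the SVM; log loss is always positive, so every point influences the LogR fit.

```python
import math


def hinge_loss(margin):
    """SVM hinge loss on the margin m = y * f(x)."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """Logistic-regression log loss log(1 + exp(-m)), written to avoid overflow for very negative m."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```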



    - What does LogR output?
    The posterior probability P(y|x).


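    - Does SVM give any probabilistic output?

    Not natively: it outputs a signed distance to the separating hyperplane. Platt scaling fits a sigmoid to these scores to map them to probabilities.

    http://www.cs.cornell.edu/courses/cs678/2007sp/platt.pdf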


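    - What are the support vectors in SVM?
    The training points that lie on the margin (or, with a soft margin, inside it or misclassified). They alone determine the separating hyperplane; removing any other training point leaves the solution unchanged.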


    - How do you evaluate LogR?
    With any standard classification metric, such as precision, recall, F1, or AUC.

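    - How does a logistic regression model learn its coefficients?
    By maximum likelihood estimation. There is no closed-form solution, so the coefficients are found iteratively (e.g. Newton-Raphson/IRLS or gradient descent).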
    http://www-hsc.usc.edu/~eckel/biostat2/notes/notes14.pdf
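A minimal maximum-likelihood sketch. Instead of Newton-Raphson/IRLS it uses plain batch gradient descent on the mean negative log-likelihood, whose gradient is the average of (p_i - y_i)·x_i; the learning rate and epoch count are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """X: list of feature lists (an intercept column is prepended here); y: list of 0/1 labels.
    Returns [intercept, coef_1, ..., coef_d]."""
    rows = [[1.0] + list(x) for x in X]
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(rows, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi  # p_i - y_i
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy, non-separable data (on separable data the MLE diverges to infinite weights).
w = fit_logistic([[0], [1], [2], [3], [4], [5]], [0, 0, 1, 0, 1, 1])
```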

    # Expectation-Maximization

    - How's EM done?

    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique

    - How are the params updated?

    https://stackoverflow.com/questions/11808074/what-is-an-intuitive-explanation-of-the-expectation-maximization-technique


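    - When running EM for a GMM, how do you find the mixture weights?
    I replied that for 2 Gaussians, the prior (mixture weight) can be modeled as a Bernoulli distribution. In the M-step, each mixture weight is updated to the average responsibility of its component: π_k = (1/N) Σ_i γ_ik, where γ_ik is the posterior probability that point i came from component k.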
    http://www.aishack.in/tutorials/expectation-maximization-gaussian-mixture-model-mixtures/
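A minimal 1-D sketch of EM for a Gaussian mixture, assuming the caller supplies the number of components through the initial parameters. The E-step computes responsibilities; the M-step sets each weight to the average responsibility, each mean to the responsibility-weighted mean, and each variance to the responsibility-weighted variance (floored at 1e-6, an arbitrary safeguard against a collapsing component).

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, mus, variances, weights, iters=100):
    """EM for a 1-D Gaussian mixture starting from the given initial parameters."""
    k, n = len(mus), len(xs)
    for _ in range(iters):
        # E-step: r[i][j] = P(component j | x_i) under the current parameters.
        r = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            r.append([pj / total for pj in p])
        # M-step: re-estimate the parameters from the soft assignments.
        nk = [sum(r[i][j] for i in range(n)) for j in range(k)]
        weights = [nk[j] / n for j in range(k)]  # mixture weight = average responsibility
        mus = [sum(r[i][j] * xs[i] for i in range(n)) / nk[j] for j in range(k)]
        variances = [
            max(sum(r[i][j] * (xs[i] - mus[j]) ** 2 for i in range(n)) / nk[j], 1e-6)
            for j in range(k)
        ]
    return mus, variances, weights


xs = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8, 3.05, 2.95]
mus, variances, weights = em_gmm_1d(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```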

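    - If x ~ N(0,1), what does 2x follow?
    N(0,4). Multiplying a normal variable by a constant a multiplies its variance by a², so Var(2x) = 4 (standard deviation 2). N(0,2) is instead the distribution of x1 + x2 for two independent N(0,1) variables, the case covered by the link below.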
    https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables


    - How would you sample from a GMM?

    http://www.robots.ox.ac.uk/~fwood/teaching/C19_hilary_2013_2014/gmm.pdf
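Sampling from a GMM is usually done by ancestral sampling: pick a component with probability equal to its mixture weight, then draw from that component's Gaussian. A 1-D sketch using Python's `random` module (the function name and parameters are illustrative):

```python
import random


def sample_gmm(weights, mus, sigmas, n, seed=None):
    """Draw n samples: choose a component by its weight, then sample from that Gaussian."""
    rng = random.Random(seed)
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(mus[c], sigmas[c]) for c in components]


samples = sample_gmm([0.3, 0.7], [-5.0, 5.0], [1.0, 1.0], 10000, seed=1)
```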

    - How do you sample from a normal distribution with known mean and variance?

    https://stats.stackexchange.com/questions/16334/how-to-sample-from-a-normal-distribution-with-known-mean-and-variance-using-a-co
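Given a standard normal draw z, a sample with mean μ and variance σ² is μ + σz (note σ is the square root of the variance). If no normal generator is available, the Box-Muller transform is one classic way to get z from two uniform draws; it is one of several methods, not necessarily the one in the linked answer.

```python
import math
import random


def sample_normal(mean, variance, rng=random):
    """Box-Muller: two independent Uniform(0,1) draws -> one standard normal z,
    then shift and scale: mean + sqrt(variance) * z."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is defined
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + math.sqrt(variance) * z


rng = random.Random(0)
draws = [sample_normal(3.0, 4.0, rng) for _ in range(20000)]
```

In everyday Python code, `random.gauss(mu, sigma)` already does this; the sketch only makes the transform explicit.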

    # Data Science in Production

    - Given monthly time-series data with a large number of records, how would you determine whether this month's values differ significantly from those of previous months?

    - When users navigate the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?

    - When you recommend a set of items laid out horizontally, clicks suffer from a problem called position bias. How do you use click data while correcting for position bias?

    - If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in applying it?

    # Math and Probability

    - Given a bar plot, imagine pouring water over it from the top. How would you quantify how much water the bar chart can hold?
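This is the classic "trapping rain water" problem. The water above bar i is min(tallest bar to its left, tallest bar to its right) minus its own height; a two-pointer scan computes the total in O(n) time and O(1) extra space. A sketch, assuming bar heights are given as a list of non-negative numbers:

```python
def trapped_water(heights):
    """Total water held between bars, scanning inward from both ends."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        # The lower side is bounded by its own running max, since the other side is at least as tall.
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water
```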

    - How do you weigh 9 marbles three times on a balance scale to find the heaviest one?

    - Estimate the disease probability in one city, given that the probability is very low nationwide.

    - You randomly ask 1000 people in this city, and all responses are negative (NO disease). What is the probability of disease in this city?
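One common way to frame these two questions (not necessarily the answer the interviewer expected): with 0 positives in n = 1000 people, the "rule of three" gives an approximate 95% upper bound of 3/n, and a Beta prior yields a posterior estimate. Encoding the low national rate as a Beta prior is an assumption here, and the Beta(1, 9999) values are made up for illustration.

```python
def rule_of_three_upper(n):
    """Approximate 95% upper bound on a rate after observing 0 events in n trials."""
    return 3.0 / n


def exact_upper_zero_events(n, alpha=0.05):
    """Exact one-sided bound: the largest p with (1 - p)^n >= alpha."""
    return 1.0 - alpha ** (1.0 / n)


def beta_posterior_mean(positives, n, a=1.0, b=1.0):
    """Posterior mean of p under a Beta(a, b) prior and a binomial likelihood."""
    return (positives + a) / (n + a + b)


approx_bound = rule_of_three_upper(1000)           # 0.003
exact_bound = exact_upper_zero_events(1000)        # about 0.00299
uniform_prior = beta_posterior_mean(0, 1000)       # 1 / 1002
national_prior = beta_posterior_mean(0, 1000, 1.0, 9999.0)  # hypothetical prior with mean 1e-4
```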

    # Programming Questions

    - Find the cumulative sum of the top 10 most profitable products over the last 6 months for customers in Seattle.
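A plain-Python sketch, assuming a hypothetical order schema of (order_date, customer_city, product_id, profit) tuples and a caller-supplied start date for the six-month window. In SQL the same idea is a filtered GROUP BY with ORDER BY ... LIMIT 10 and a running SUM window function.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def top10_cumulative_profit(orders, city, start):
    """Return [(product_id, profit, running_total), ...] for the 10 most profitable
    products bought by customers in `city` on or after `start`."""
    profit_by_product = defaultdict(float)
    for order_date, customer_city, product_id, profit in orders:
        if customer_city == city and order_date >= start:
            profit_by_product[product_id] += profit
    top = sorted(profit_by_product.items(), key=lambda kv: kv[1], reverse=True)[:10]
    running = accumulate(p for _, p in top)
    return [(pid, p, total) for (pid, p), total in zip(top, running)]


orders = [  # made-up rows
    (date(2018, 10, 1), "Seattle", "A", 50.0),
    (date(2018, 11, 5), "Seattle", "B", 80.0),
    (date(2018, 11, 9), "Seattle", "A", 40.0),
    (date(2018, 3, 1), "Seattle", "C", 500.0),     # before the window: excluded
    (date(2018, 11, 20), "Portland", "D", 999.0),  # other city: excluded
]
result = top10_cumulative_profit(orders, "Seattle", start=date(2018, 6, 1))
```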

    - Implement circular queue using an array.
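A minimal sketch of a fixed-capacity circular queue: a head index plus a size count, with positions wrapped modulo the array length. Raising on overflow is an illustrative choice; overwriting the oldest element is another common variant.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue backed by a Python list used as a ring buffer."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0  # index of the front element
        self._size = 0

    def enqueue(self, item):
        if self._size == len(self._buf):
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % len(self._buf)  # next free slot, wrapping around
        self._buf[tail] = item
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("dequeue from empty queue")
        item = self._buf[self._head]
        self._buf[self._head] = None  # drop the reference to the removed item
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return item

    def __len__(self):
        return self._size
```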

    - Given a CSV file with ID and Quantity columns, 50 million records, and a total size of 2 GB, write a program in any language of your choice to aggregate the Quantity column.
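A streaming sketch in Python, assuming a header row with columns named ID and Quantity and integer quantities. Reading one row at a time avoids loading the 2 GB file into memory; the per-ID dictionary still grows with the number of distinct IDs, so if only the grand total is needed, memory stays constant.

```python
import csv
from collections import defaultdict


def aggregate_quantity(lines):
    """Sum Quantity per ID and overall, reading rows one at a time from any iterable of CSV lines."""
    totals = defaultdict(int)
    grand_total = 0
    for row in csv.DictReader(lines):
        qty = int(row["Quantity"])
        totals[row["ID"]] += qty
        grand_total += qty
    return totals, grand_total


# Usage with a real file (the path is hypothetical):
# with open("data.csv", newline="") as f:
#     per_id, total = aggregate_quantity(f)
```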

    - Write a function that takes an array of N randomly ordered numbers and an int K, and returns an array with the K largest numbers.
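A sketch using a min-heap of size K: the heap root is the smallest of the K largest values seen so far, so each new number only needs to beat the root. This runs in O(N log K) time and O(K) extra space; Python's `heapq.nlargest(k, nums)` does the same job in one call.

```python
import heapq


def k_largest(nums, k):
    """Return the k largest values in descending order."""
    if k <= 0:
        return []
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the current smallest, push x
    return sorted(heap, reverse=True)
```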