# Data Science in Production

## You have a large monthly time series. How do you find out whether this month's values differ significantly from previous months' values?

There are many possible answers. Mine: draw a sample of size N large enough to reduce the uncertainty over the full data set, then compare this month against the previous months with a statistical test.

https://www.sas.upenn.edu/~fdiebold/Teaching104/Ch14_slides.pdf

## When users navigate the Amazon website, they perform several actions. What is the best way to model whether their next action will be a purchase?

Use a sequential machine learning model that keeps track of the user's state and predicts their next action. Many options are possible here: HMMs, RNNs, bandits.

## When you recommend a set of items in a horizontal layout, there is a problem called position bias. How do you use click data without position bias?

Sample the clicks by position so that the positions follow a uniform distribution.

## If you could build a perfect (100% accuracy) classification model to predict some customer behaviour, what would be the problem in application?

All the problems that come with overfitting: a model that scores perfectly on the data it was built on has most likely memorized it and may not generalize to new customers.

# Math and Probability

## How do you weigh 9 marbles three times on a balance scale to select the heaviest one?

Assuming exactly one marble is heavier than the rest, split the marbles into three groups of three and weigh two of the groups against each other. The heavier group, or the unweighed group if the scale balances, contains the heavy marble. Repeat with the three marbles of that group. This takes two weighings, so three are more than enough.

https://mattgadient.com/2013/02/03/9-marbles-and-a-weight-balance-which-is-the-heaviest-one/

## Estimate the disease probability in one city, given that the probability is very low nationwide. You randomly ask 1000 people in this city and all of them answer negative (no disease). What is the probability of disease in this city?

https://medium.com/acing-ai/interview-guide-to-probability-distributions-a6dfb08c3766
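
For the disease-probability question above, one common way to reason about "zero positives out of 1000" is to bound the rate instead of reporting it as exactly zero. The sketch below is not taken from the linked article. It assumes the 1000 people are an independent random sample and that every answer is accurate. The function names are hypothetical, and the uniform Beta(1, 1) prior is an illustrative assumption; a prior informed by the low nationwide rate would pull the estimate lower.

```python
def exact_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Upper bound on the disease rate p after observing 0 positives in n people.

    Solves (1 - p) ** n = alpha for p: any larger p would make
    "n negatives in a row" less likely than alpha.
    """
    return 1.0 - alpha ** (1.0 / n)


def rule_of_three(n: int) -> float:
    """Quick approximation of the 95% upper bound: 3 / n."""
    return 3.0 / n


def posterior_mean_uniform_prior(n: int) -> float:
    """Posterior mean of p under an assumed uniform Beta(1, 1) prior.

    With 0 positives and n negatives the posterior is Beta(1, n + 1),
    whose mean is 1 / (n + 2).
    """
    return 1.0 / (n + 2)


if __name__ == "__main__":
    n = 1000  # people asked, all negative
    print("exact 95% upper bound:", exact_upper_bound(n))
    print("rule-of-three bound:  ", rule_of_three(n))
    print("posterior mean:       ", posterior_mean_uniform_prior(n))
```

Under these assumptions the 95% upper bound is roughly 0.003, so a reasonable answer is "probably below about 0.3%, but not provably zero".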