Last active March 23, 2022
tomleo revised this gist Jan 17, 2015 (2 changed files; 67 additions, 13 deletions).

The first change adds a new file of Optimizely notes:

A/B Testing
===========

Optimizely
----------

click goals
    measure how often visitors click an element
url targeting
    where the experiment runs
audiences
    who sees the experiment

The setup is straightforward, and the interface makes it easy for
non-technical people to create experiments involving simple content and
visual changes; for example, you can change CSS and copy. When changing
markup, a developer should probably be involved, since other code or
cascading styles might rely on the document structure.

Setup
`````

    You'll need to paste that line of code into the <head> tag of any page
    you want to include in your experiment. Because Optimizely will
    actually control how your page displays, it's important to put the
    snippet as high in the <head> tag as possible. This lets Optimizely
    load variations in real time, before the visitor even sees the page.

The second change expands the glossary in the main notes file, adds a
reference, and adds a "Calculations in Python" section:

priors
    a distribution that encodes your prior belief about the
    parameter-of-interest (what we believe before we run the test)
likelihood
    a function that encodes how likely your data is given a range of
    possible parameters
posterior
    a distribution of the parameter-of-interest given your data, combining
    the prior and likelihood

- `Probabilistic Programming & Bayesian Methods for Hackers <http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/>`_

Calculations in Python
``````````````````````

[6]_

.. code-block:: python

    from numpy.random import beta as beta_dist
    import numpy as np

    N_samp = 10000   # number of samples to draw
    clicks_A = 450   # insert your own data here
    views_A = 56000
    clicks_B = 345   # ditto
    views_B = 49000
    alpha = 1.1      # just for the example - set your own!
    beta = 14.2

    A_samples = beta_dist(clicks_A + alpha, views_A - clicks_A + beta, N_samp)
    B_samples = beta_dist(clicks_B + alpha, views_B - clicks_B + beta, N_samp)

    # posterior probability that CTR_A > CTR_B
    np.mean(A_samples > B_samples)
    # probability that the lift of A relative to B is at least 3%
    np.mean(100. * (A_samples - B_samples) / B_samples > 3)

    # You can set the alpha and beta priors to 1 when you believe all
    # values are equally likely

.. [6] http://engineering.richrelevance.com/bayesian-ab-tests/
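As a sanity check on the "Calculations in Python" snippet above, the sketch below compares a uniform Beta(1, 1) prior against the example's informative Beta(1.1, 14.2) prior on the same data. The sample count and seed are arbitrary choices for the illustration; with this much traffic the data dominate and the posterior probability barely moves.

```python
import numpy as np
from numpy.random import beta as beta_dist

# data reused from the snippet above
clicks_A, views_A = 450, 56000
clicks_B, views_B = 345, 49000
N_samp = 100000  # arbitrary sample count for the illustration

np.random.seed(0)  # fixed seed so the sketch is reproducible
for alpha, beta in [(1.0, 1.0), (1.1, 14.2)]:
    # conjugate Beta update: prior + observed clicks / non-clicks
    A = beta_dist(clicks_A + alpha, views_A - clicks_A + beta, N_samp)
    B = beta_dist(clicks_B + alpha, views_B - clicks_B + beta, N_samp)
    print("prior Beta(%.1f, %.1f): P(CTR_A > CTR_B) = %.3f"
          % (alpha, beta, np.mean(A > B)))
```

Both priors give essentially the same answer here, which is why the choice of alpha and beta matters mostly when data are scarce.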
tomleo revised this gist Dec 9, 2014 (1 changed file; 47 additions, 0 deletions).

A "Test Procedure" section was added:

Test Procedure
--------------

CTR
    click-through rate
MDE
    minimum detectable effect size
significance/alpha
    the % of the time a difference will be detected, assuming one does not
    exist
power/beta
    the % of the time the MDE will be detected
impression
    the number of times X is exposed to a potential viewer
lift
    a measure of the performance of a targeting model (association rule) at
    predicting or classifying cases as having an enhanced response (with
    respect to the population as a whole), measured against a random-choice
    targeting model

- stop if P(CTR_B > CTR_A) is below a threshold
- if P(CTR_B < CTR_A) = 0.5, then the banners are the same
- run the test until the expected cost drops below a threshold

Calculating Cost
````````````````

.. code-block:: python

    import numpy as np
    from numpy.random import beta

    def expected_cost(a_clicks, a_impressions, b_clicks, b_impressions,
                      num_samples=10000):
        # form the posterior distribution over the CTRs
        a_dist = beta(1 + a_clicks, 1 + a_impressions - a_clicks, num_samples)
        b_dist = beta(1 + b_clicks, 1 + b_impressions - b_clicks, num_samples)
        # form the distribution over the cost
        cost_dist = np.maximum(b_dist - a_dist, 0)
        # return the expected cost
        return cost_dist.mean()

.. _`A/B Testing with Limited Data`: http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html
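One way to apply the "run until the expected cost drops below a threshold" rule is to recompute ``expected_cost`` as data accumulate and stop once it is negligible. In this sketch the daily traffic, the true CTRs (1.0% vs. 0.5%), and the threshold are all made-up illustration values, and ``expected_cost`` (with its uniform Beta(1, 1) prior) is repeated so the sketch runs on its own.

```python
import numpy as np
from numpy.random import beta

def expected_cost(a_clicks, a_impressions, b_clicks, b_impressions,
                  num_samples=10000):
    # posterior over each CTR under a uniform Beta(1, 1) prior
    a_dist = beta(1 + a_clicks, 1 + a_impressions - a_clicks, num_samples)
    b_dist = beta(1 + b_clicks, 1 + b_impressions - b_clicks, num_samples)
    # expected CTR lost by shipping A when B is actually better
    return np.maximum(b_dist - a_dist, 0).mean()

np.random.seed(1)          # fixed seed so the sketch is reproducible
threshold = 1e-4           # stop once the expected cost of shipping A is tiny
a_clicks = b_clicks = impressions = 0
for day in range(1, 61):
    # hypothetical traffic: 1000 impressions per arm per day
    impressions += 1000
    a_clicks += np.random.binomial(1000, 0.010)  # A truly converts at 1.0%
    b_clicks += np.random.binomial(1000, 0.005)  # B truly converts at 0.5%
    cost = expected_cost(a_clicks, impressions, b_clicks, impressions)
    if cost < threshold:
        print("stop on day %d; expected cost of choosing A = %.2e"
              % (day, cost))
        break
```

Because A is genuinely much better here, the expected cost of choosing it collapses after only a few days, so the test can stop early without waiting for a pre-committed sample size.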
tomleo created this gist Dec 6, 2014.

A/B Testing
===========

Null-hypothesis tests
---------------------

The p-value is used in the context of null-hypothesis testing to quantify
the statistical significance of evidence. [2]_

A common mistake is to run multiple null-hypothesis tests as the data come
in and stop the test early on the first significant result. [1]_ If you run
experiments:

    the best way to avoid repeated significance testing errors is to not
    test significance repeatedly. Decide on a sample size in advance and
    wait until the experiment is over before you start believing the
    "chance of beating original" figures that the A/B testing software
    gives you. [3]_

`Sample Size Calculator <http://www.evanmiller.org/ab-testing/sample-size.html>`_

Issues with the null-hypothesis method: [4]_

- Even if preliminary evidence says that one version is terrible, we keep
  losing conversions until we hit an arbitrary threshold.
- If we hit that threshold without having reached statistical proof, we
  cannot continue the experiment.
- Naive attempts to fix the former problems by running the same statistical
  test multiple times lead to far more mistakes than we are willing to
  accept.

`A/B Split Test Significance Calculator <https://vwo.com/ab-split-test-significance-calculator/>`_

Bayesian A/B testing
--------------------

Bayesian A/B testing is an alternative to the Student's t-test
(t-distributions) and, more generally, to p-value methods that require
large sample sizes.

- Unlike the Student's t-test, you can stop the test early if there is a
  clear winner, or run it for longer if you need more samples. While this
  is generally true, `A/B Testing with Limited Data`_ shows a workaround.

`bayesian_ab_test.py <https://gist.github.com/stucchio/9090456>`_

priors
    represent what we believe before we run the test

Advantages of Bayesian Testing
``````````````````````````````

1. Results are easier to interpret; p-values are confusing. Try to follow
   `A/B Testing with Limited Data`_ without your brain melting.
2. "measuring the probability at time t that B is better than A (or vice
   versa). You can look at the data, check if the test is finished, and
   stop the test early if the result is highly conclusive." [5]_
3. You can use your current posteriors as new priors for what is
   essentially the start of a new test, without any major interruption in
   your development flow. [5]_ This is probably the worst thing you can do
   with traditional hypothesis testing.
4. A Bayesian A/B test achieves the same lift as the standard procedure but
   typically uses fewer data points. [5]_

TODO: Re-learn Calculus
```````````````````````

- `Easy Evaluation of Decision Rules in Bayesian A/B testing <Easy Evaluation of Decision Rules in Bayesian A/B testing>`_
- `A Formula for Bayesian A/B Testing <http://www.evanmiller.org/bayesian-ab-testing.html>`_
- `Asymptotics of Evan Miller's Bayesian A/B formula <https://www.chrisstucchio.com/blog/2014/bayesian_asymptotics.html>`_
- Finish reading http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/

.. _`A/B Testing with Limited Data`: http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html

.. [1] http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/
.. [2] http://en.wikipedia.org/wiki/P-value
.. [3] http://www.evanmiller.org/how-not-to-run-an-ab-test.html
.. [4] http://elem.com/~btilly/ab-testing-multiple-looks/part1-rigorous.html
.. [5] http://www.bayesianwitch.com/blog/2014/bayesian_ab_test.html
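The repeated-significance-testing mistake described above can be made concrete with a small A/A simulation: both arms share the same true conversion rate, so any "significant" result is a false positive. The conversion rate, batch size, look count, and trial count below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
p, batches, batch_size, trials = 0.05, 20, 500, 400
z_crit = 1.96  # two-sided 5% threshold for a two-proportion z-test

def z_score(ca, cb, n):
    # pooled two-proportion z statistic; n is impressions per arm
    pooled = (ca + cb) / (2.0 * n)
    se = np.sqrt(2.0 * pooled * (1.0 - pooled) / n)
    return abs(ca - cb) / n / se if se > 0 else 0.0

peek_fp = final_fp = 0
for _ in range(trials):
    ca = cb = n = 0
    tripped = False
    for _ in range(batches):
        # both arms draw from the SAME conversion rate
        ca += rng.binomial(batch_size, p)
        cb += rng.binomial(batch_size, p)
        n += batch_size
        if z_score(ca, cb, n) > z_crit:
            tripped = True  # peeking declares a winner at this look
    final_fp += z_score(ca, cb, n) > z_crit  # test once, at the end
    peek_fp += tripped                       # test after every batch

print("final-only false positives: %.1f%%" % (100.0 * final_fp / trials))
print("peek-every-batch false positives: %.1f%%" % (100.0 * peek_fp / trials))
```

Testing only at the pre-committed end point keeps the false-positive rate near the nominal 5%, while stopping at the first significant look inflates it several-fold, which is exactly why the quote above says to decide on a sample size in advance.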