
@tomleo
Last active March 23, 2022 10:25

Revisions

  1. tomleo revised this gist Jan 17, 2015. 2 changed files with 67 additions and 13 deletions.
    33 changes: 33 additions & 0 deletions ab-testing-notes.rst
    @@ -0,0 +1,33 @@
    A/B Testing
    ===========

    Optimizely
    ----------

    click goals
    measures how often visitors click an element

    url targeting
    where the experiment runs

    audiences
    who sees the experiment

The setup is straightforward, and the interface makes it easy for non-technical
people to create experiments involving simple content and visual changes.

For example, you can change CSS and copy. When changing markup, a developer
should probably be involved, as other code or cascading styles might rely on
the document structure.

    Setup
    `````

You’ll need to paste the Optimizely snippet into the <head> tag of any page
you want to include in your experiment.

    Because Optimizely will actually control how your page displays, it’s important
    to put the snippet as high in the <head> tag as possible. This lets Optimizely
    load variations in real time, before the visitor even sees the page.


    47 changes: 34 additions & 13 deletions abtesting.rst
    @@ -51,7 +51,8 @@ sizes.
    `bayesian_ab_test.py <https://gist.github.com/stucchio/9090456>`_

priors
a distribution that encodes your prior belief about the
parameter-of-interest (i.e., what we believe before we run the test)

Advantages of Bayesian Testing
``````````````````````````````
    @@ -81,7 +82,8 @@ TODO Re-learn Calculus

    - `Asymptotics of Evan Miller's Bayesian A/B formula <https://www.chrisstucchio.com/blog/2014/bayesian_asymptotics.html>`_

    - Finish reading http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/
    - `Probabilistic Programming & Bayesian Methods for Hackers <http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/>`_



    Test Procedure
    @@ -108,27 +110,45 @@ lift
    respect to the population as a whole), measured against a random choice
    targeting model.

    likelihood
    a function that encodes how likely your data is given a range of possible
    parameters

    posterior
    a distribution of the parameter-of-interest given your data, combining the
    prior and likelihood
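
For the Beta-Binomial model used in the code below, these three pieces
combine in closed form (a standard conjugacy result, sketched here for
orientation): a Beta prior on the CTR plus binomial click data yields a Beta
posterior, so the update is simple addition.

.. math::

    \mathrm{CTR} \sim \mathrm{Beta}(\alpha,\ \beta)
    \quad\Rightarrow\quad
    \mathrm{CTR} \mid \mathrm{data} \sim \mathrm{Beta}(\alpha + \mathrm{clicks},\ \beta + \mathrm{views} - \mathrm{clicks})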



- stop if P(CTR_B > CTR_A) is below a threshold
- if P(CTR_B < CTR_A) = 0.5, the banners perform the same
- run the test until the expected cost drops below a threshold

    Calculating Cost
    ````````````````

    Calculations in Python
    ``````````````````````

    [6]_

.. code-block:: python

    import numpy as np
    from numpy.random import beta as beta_dist

    N_samp = 10000   # number of samples to draw
    clicks_A = 450   # insert your own data here
    views_A = 56000
    clicks_B = 345   # ditto
    views_B = 49000
    prior_alpha = 1.1  # just for the example - set your own!
    prior_beta = 14.2

    A_samples = beta_dist(clicks_A + prior_alpha, views_A - clicks_A + prior_beta, N_samp)
    B_samples = beta_dist(clicks_B + prior_alpha, views_B - clicks_B + prior_beta, N_samp)

    np.mean(A_samples > B_samples)  # posterior probability that CTR_A > CTR_B
    np.mean(100. * (A_samples - B_samples) / B_samples > 3)  # probability that the lift of A relative to B is at least 3%

    def expected_cost(a_clicks, a_impressions, b_clicks, b_impressions, num_samples=10000):
        # form the posterior distribution over the CTRs (uniform Beta(1, 1) priors)
        a_dist = beta_dist(1 + a_clicks, 1 + a_impressions - a_clicks, num_samples)
        b_dist = beta_dist(1 + b_clicks, 1 + b_impressions - b_clicks, num_samples)
        # form the distribution over the cost of choosing A
        cost_dist = np.maximum(b_dist - a_dist, 0)
        # return the expected cost
        return cost_dist.mean()

    # You can set the prior alpha and beta to 1 when you believe all values are equally likely
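
A minimal usage sketch of the "run the test until the expected cost drops
below a threshold" rule from the list above; the threshold value here is a
hypothetical illustration, not a recommendation:

.. code-block:: python

    # Hypothetical stopping check: stop once the expected cost (in absolute
    # CTR given up by shipping A) falls below a threshold we care about.
    threshold_of_caring = 0.001  # illustrative value - choose your own

    if expected_cost(clicks_A, views_A, clicks_B, views_B) < threshold_of_caring:
        print("Expected cost of shipping A is negligible; safe to stop")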
    @@ -140,3 +160,4 @@ Calculating Cost
    .. [3] http://www.evanmiller.org/how-not-to-run-an-ab-test.html
    .. [4] http://elem.com/~btilly/ab-testing-multiple-looks/part1-rigorous.html
    .. [5] http://www.bayesianwitch.com/blog/2014/bayesian_ab_test.html
    .. [6] http://engineering.richrelevance.com/bayesian-ab-tests/
  2. tomleo revised this gist Dec 9, 2014. 1 changed file with 47 additions and 0 deletions.
    47 changes: 47 additions & 0 deletions abtesting.rst
    @@ -84,6 +84,53 @@ TODO Re-learn Calculus
    - Finish reading http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/


    Test Procedure
    --------------

    CTR
    click through rate

MDE
minimum detectable effect size

significance/alpha
% of the time a difference will be detected, assuming one does not exist

power/beta
% of the time the MDE will be detected, assuming it exists

    impression
    number of times X is exposed to a potential viewer

lift
a measure of the performance of a targeting model (association rule) at
predicting or classifying cases as having an enhanced response (with
respect to the population as a whole), measured against a random choice
targeting model


- stop if P(CTR_B > CTR_A) is below a threshold (see the sketch below)
- if P(CTR_B < CTR_A) = 0.5, the banners perform the same
- run the test until the expected cost drops below a threshold
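
A minimal sketch of the first stopping rule, estimating P(CTR_B > CTR_A) by
sampling from each banner's posterior (uniform Beta(1, 1) priors assumed;
counts and threshold are placeholders):

.. code-block:: python

    import numpy as np
    from numpy.random import beta

    clicks_A, views_A = 450, 56000  # placeholder counts - use your own data
    clicks_B, views_B = 345, 49000

    # draw from each posterior and estimate P(CTR_B > CTR_A)
    a = beta(1 + clicks_A, 1 + views_A - clicks_A, 10000)
    b = beta(1 + clicks_B, 1 + views_B - clicks_B, 10000)
    p_b_beats_a = np.mean(b > a)

    if p_b_beats_a < 0.05:  # illustrative threshold
        print("B is very unlikely to beat A; stop the test")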

    Calculating Cost
    ````````````````

.. code-block:: python

    import numpy as np
    from numpy.random import beta

    def expected_cost(a_clicks, a_impressions, b_clicks, b_impressions, num_samples=10000):
        # form the posterior distribution over the CTRs
        a_dist = beta(1 + a_clicks, 1 + a_impressions - a_clicks, num_samples)
        b_dist = beta(1 + b_clicks, 1 + b_impressions - b_clicks, num_samples)
        # form the distribution over the cost
        cost_dist = np.maximum(b_dist - a_dist, 0)
        # return the expected cost
        return cost_dist.mean()
.. _`A/B Testing with Limited Data`: http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html
  3. tomleo created this gist Dec 6, 2014.
    95 changes: 95 additions & 0 deletions abtesting.rst
    @@ -0,0 +1,95 @@
    A/B Testing
    ===========


Null-hypothesis tests
---------------------

    The p-value is used in the context of null hypothesis testing in
    order to quantify the idea of statistical significance of
    evidence. [2]_

A common mistake is to run multiple null hypothesis tests as the
data are coming in and decide to stop the test early on the
first significant result. [1]_

    If you run experiments: the best way to avoid repeated significance testing
    errors is to not test significance repeatedly. Decide on a sample size in
    advance and wait until the experiment is over before you start believing the
    “chance of beating original” figures that the A/B testing software gives you. [3]_

    `Sample Size Calculator <http://www.evanmiller.org/ab-testing/sample-size.html>`_
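
The calculation behind such calculators can be sketched in a few lines. This
uses the standard two-proportion power formula and is not necessarily the
exact formula the linked calculator uses:

.. code-block:: python

    from scipy.stats import norm

    def sample_size_per_group(base_rate, mde, alpha=0.05, power=0.8):
        """Approximate visitors needed per variation for a two-sided
        two-proportion test, given an absolute minimum detectable effect."""
        p1, p2 = base_rate, base_rate + mde
        z_a = norm.ppf(1 - alpha / 2)  # critical value for significance
        z_b = norm.ppf(power)          # critical value for power
        p_bar = (p1 + p2) / 2
        n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
              + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
             / (p2 - p1) ** 2)
        return int(n) + 1

    # e.g. a 20% baseline conversion rate and a 2% absolute MDE
    print(sample_size_per_group(0.20, 0.02))  # about 6,500 per group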

Issues with the null-hypothesis method: [4]_

    - Even if preliminary evidence says that one version is terrible, we will keep
    losing conversions until we hit an arbitrary threshold.

    - If we hit that threshold without having reached statistical proof, we cannot
    continue the experiment.

    - Naive attempts to fix the former problems by using the same statistical test
    multiple times leads to our making far more mistakes than we are willing to
    accept.

    `A/B Split Test Significance Calculator <https://vwo.com/ab-split-test-significance-calculator/>`_
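
For reference, the significance computation itself is a pooled two-proportion
z-test; a minimal sketch (details may differ from the linked calculator):

.. code-block:: python

    from scipy.stats import norm

    def ab_p_value(clicks_a, views_a, clicks_b, views_b):
        """Two-sided p-value for the null hypothesis that both CTRs are equal."""
        p_a, p_b = clicks_a / views_a, clicks_b / views_b
        p_pool = (clicks_a + clicks_b) / (views_a + views_b)
        se = (p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b)) ** 0.5
        z = (p_b - p_a) / se
        return 2 * norm.sf(abs(z))

    print(ab_p_value(450, 56000, 345, 49000))  # placeholder counts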



    Bayesian A/B testing
    --------------------

Bayesian A/B testing is an alternative to Student's t-test
(t-distributions) and other p-value-based methods, which require large
sample sizes.

- unlike the Student's t-test, you can stop the test early if there
is a clear winner or run it for longer if you need more samples. While this is
generally true, `A/B Testing with Limited Data`_
shows a workaround.

    `bayesian_ab_test.py <https://gist.github.com/stucchio/9090456>`_

    priors
    represent what we believe before we run the test

Advantages of Bayesian Testing
``````````````````````````````

1. Easier to interpret results; p-values are confusing. Try to follow
`A/B Testing with Limited Data`_ without your brain melting.

    2. "measuring the probability at time t that B is better than A (or vice versa).
    You can look at the data, check if the test is finished, and stop the test
    early if the result is highly conclusive." [5]_

3. You can use your current posteriors as new priors for what is essentially
the start of a new test without any major interruptions in your development
flow. [5]_ This is probably the worst thing you can do with traditional
hypothesis testing.

4. A Bayesian A/B test achieves the same lift as the standard procedure, but
typically uses fewer data points. [5]_


    TODO Re-learn Calculus
    ``````````````````````

- Easy Evaluation of Decision Rules in Bayesian A/B testing

    - `A Formula for Bayesian A/B Testing <http://www.evanmiller.org/bayesian-ab-testing.html>`_

    - `Asymptotics of Evan Miller's Bayesian A/B formula <https://www.chrisstucchio.com/blog/2014/bayesian_asymptotics.html>`_

    - Finish reading http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/




.. _`A/B Testing with Limited Data`: http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html

    .. [1] http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/
    .. [2] http://en.wikipedia.org/wiki/P-value
    .. [3] http://www.evanmiller.org/how-not-to-run-an-ab-test.html
    .. [4] http://elem.com/~btilly/ab-testing-multiple-looks/part1-rigorous.html
    .. [5] http://www.bayesianwitch.com/blog/2014/bayesian_ab_test.html