Last active March 23, 2022
tomleo revised this gist Jan 17, 2015 (2 changed files; 67 additions, 13 deletions).

The first change adds a new file of Optimizely notes:

A/B Testing
===========

Optimizely
----------

click goals
    measure how often visitors click an element
url targeting
    where the experiment runs
audiences
    who sees the experiment

The setup is straightforward, and the interface makes it easy for
non-technical people to create experiments involving simple content and
visual changes; for example, you can change CSS and copy. When changing
markup, a developer should probably be involved, since other code or
cascading styles might rely on the document structure.

Setup
`````

    You'll need to paste that line of code into the <head> tag of any page
    you want to include in your experiment. Because Optimizely will
    actually control how your page displays, it's important to put the
    snippet as high in the <head> tag as possible. This lets Optimizely
    load variations in real time, before the visitor even sees the page.

The second change expands the glossary in the main notes file, adds a
reference, and adds a "Calculations in Python" section:

priors
    a distribution that encodes your prior belief about the
    parameter-of-interest (what we believe before we run the test)
likelihood
    a function that encodes how likely your data is given a range of
    possible parameters
posterior
    a distribution of the parameter-of-interest given your data, combining
    the prior and likelihood

- `Probabilistic Programming & Bayesian Methods for Hackers <http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/>`_

Calculations in Python
``````````````````````

[6]_

.. code-block:: python

    from numpy.random import beta as beta_dist
    import numpy as np

    N_samp = 10000   # number of samples to draw
    clicks_A = 450   # insert your own data here
    views_A = 56000
    clicks_B = 345   # ditto
    views_B = 49000
    alpha = 1.1      # just for the example - set your own!
    beta = 14.2

    A_samples = beta_dist(clicks_A + alpha, views_A - clicks_A + beta, N_samp)
    B_samples = beta_dist(clicks_B + alpha, views_B - clicks_B + beta, N_samp)

    # posterior probability that CTR_A > CTR_B
    np.mean(A_samples > B_samples)
    # probability that the lift of A relative to B is at least 3%
    np.mean(100. * (A_samples - B_samples) / B_samples > 3)

    # You can set the alpha and beta priors to 1 when you believe all
    # values are equally likely

.. [6] http://engineering.richrelevance.com/bayesian-ab-tests/
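As a sanity check on the "Calculations in Python" snippet above, the sketch below compares a uniform Beta(1, 1) prior against the example's informative Beta(1.1, 14.2) prior on the same data. The sample count and seed are arbitrary choices for the illustration; with this much traffic the data dominate and the posterior probability barely moves.

```python
import numpy as np
from numpy.random import beta as beta_dist

# data reused from the snippet above
clicks_A, views_A = 450, 56000
clicks_B, views_B = 345, 49000
N_samp = 100000  # arbitrary sample count for the illustration

np.random.seed(0)  # fixed seed so the sketch is reproducible
for alpha, beta in [(1.0, 1.0), (1.1, 14.2)]:
    # conjugate Beta update: prior + observed clicks / non-clicks
    A = beta_dist(clicks_A + alpha, views_A - clicks_A + beta, N_samp)
    B = beta_dist(clicks_B + alpha, views_B - clicks_B + beta, N_samp)
    print("prior Beta(%.1f, %.1f): P(CTR_A > CTR_B) = %.3f"
          % (alpha, beta, np.mean(A > B)))
```

Both priors give essentially the same answer here, which is why the choice of alpha and beta matters mostly when data are scarce.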
tomleo revised this gist Dec 9, 2014 (1 changed file; 47 additions, 0 deletions).

A "Test Procedure" section was added:

Test Procedure
--------------

CTR
    click-through rate
MDE
    minimum detectable effect size
significance/alpha
    the % of the time a difference will be detected, assuming one does not
    exist
power/beta
    the % of the time the MDE will be detected
impression
    the number of times X is exposed to a potential viewer
lift
    a measure of the performance of a targeting model (association rule) at
    predicting or classifying cases as having an enhanced response (with
    respect to the population as a whole), measured against a random-choice
    targeting model

- stop if P(CTR_B > CTR_A) is below a threshold
- if P(CTR_B < CTR_A) = 0.5, then the banners are the same
- run the test until the expected cost drops below a threshold

Calculating Cost
````````````````

.. code-block:: python

    import numpy as np
    from numpy.random import beta

    def expected_cost(a_clicks, a_impressions, b_clicks, b_impressions,
                      num_samples=10000):
        # form the posterior distribution over the CTRs
        a_dist = beta(1 + a_clicks, 1 + a_impressions - a_clicks, num_samples)
        b_dist = beta(1 + b_clicks, 1 + b_impressions - b_clicks, num_samples)
        # form the distribution over the cost
        cost_dist = np.maximum(b_dist - a_dist, 0)
        # return the expected cost
        return cost_dist.mean()

.. _`A/B Testing with Limited Data`: http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html
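One way to apply the "run until the expected cost drops below a threshold" rule is to recompute ``expected_cost`` as data accumulate and stop once it is negligible. In this sketch the daily traffic, the true CTRs (1.0% vs. 0.5%), and the threshold are all made-up illustration values, and ``expected_cost`` (with its uniform Beta(1, 1) prior) is repeated so the sketch runs on its own.

```python
import numpy as np
from numpy.random import beta

def expected_cost(a_clicks, a_impressions, b_clicks, b_impressions,
                  num_samples=10000):
    # posterior over each CTR under a uniform Beta(1, 1) prior
    a_dist = beta(1 + a_clicks, 1 + a_impressions - a_clicks, num_samples)
    b_dist = beta(1 + b_clicks, 1 + b_impressions - b_clicks, num_samples)
    # expected CTR lost by shipping A when B is actually better
    return np.maximum(b_dist - a_dist, 0).mean()

np.random.seed(1)          # fixed seed so the sketch is reproducible
threshold = 1e-4           # stop once the expected cost of shipping A is tiny
a_clicks = b_clicks = impressions = 0
for day in range(1, 61):
    # hypothetical traffic: 1000 impressions per arm per day
    impressions += 1000
    a_clicks += np.random.binomial(1000, 0.010)  # A truly converts at 1.0%
    b_clicks += np.random.binomial(1000, 0.005)  # B truly converts at 0.5%
    cost = expected_cost(a_clicks, impressions, b_clicks, impressions)
    if cost < threshold:
        print("stop on day %d; expected cost of choosing A = %.2e"
              % (day, cost))
        break
```

Because A is genuinely much better here, the expected cost of choosing it collapses after only a few days, so the test can stop early without waiting for a pre-committed sample size.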
tomleo created this gist Dec 6, 2014.

A/B Testing
===========

Null-hypothesis tests
---------------------

The p-value is used in the context of null-hypothesis testing to quantify
the statistical significance of evidence. [2]_

A common mistake is to run multiple null-hypothesis tests as the data come
in and stop the test early on the first significant result. [1]_ If you run
experiments:

    the best way to avoid repeated significance testing errors is to not
    test significance repeatedly. Decide on a sample size in advance and
    wait until the experiment is over before you start believing the
    "chance of beating original" figures that the A/B testing software
    gives you. [3]_

`Sample Size Calculator <http://www.evanmiller.org/ab-testing/sample-size.html>`_

Issues with the null-hypothesis method: [4]_

- Even if preliminary evidence says that one version is terrible, we keep
  losing conversions until we hit an arbitrary threshold.
- If we hit that threshold without having reached statistical proof, we
  cannot continue the experiment.
- Naive attempts to fix the former problems by running the same statistical
  test multiple times lead to far more mistakes than we are willing to
  accept.

`A/B Split Test Significance Calculator <https://vwo.com/ab-split-test-significance-calculator/>`_

Bayesian A/B testing
--------------------

Bayesian A/B testing is an alternative to the Student's t-test
(t-distributions) and, more generally, to p-value methods that require
large sample sizes.

- Unlike the Student's t-test, you can stop the test early if there is a
  clear winner, or run it for longer if you need more samples. While this
  is generally true, `A/B Testing with Limited Data`_ shows a workaround.

`bayesian_ab_test.py <https://gist.github.com/stucchio/9090456>`_

priors
    represent what we believe before we run the test

Advantages of Bayesian Testing
``````````````````````````````

1. Results are easier to interpret; p-values are confusing. Try to follow
   `A/B Testing with Limited Data`_ without your brain melting.
2. "measuring the probability at time t that B is better than A (or vice
   versa). You can look at the data, check if the test is finished, and
   stop the test early if the result is highly conclusive." [5]_
3. You can use your current posteriors as new priors for what is
   essentially the start of a new test, without any major interruption in
   your development flow. [5]_ This is probably the worst thing you can do
   with traditional hypothesis testing.
4. A Bayesian A/B test achieves the same lift as the standard procedure but
   typically uses fewer data points. [5]_

TODO: Re-learn Calculus
```````````````````````

- `Easy Evaluation of Decision Rules in Bayesian A/B testing <Easy Evaluation of Decision Rules in Bayesian A/B testing>`_
- `A Formula for Bayesian A/B Testing <http://www.evanmiller.org/bayesian-ab-testing.html>`_
- `Asymptotics of Evan Miller's Bayesian A/B formula <https://www.chrisstucchio.com/blog/2014/bayesian_asymptotics.html>`_
- Finish reading http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/

.. _`A/B Testing with Limited Data`: http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html

.. [1] http://ewulczyn.github.io/How_Naive_AB_Testing_Goes_Wrong/
.. [2] http://en.wikipedia.org/wiki/P-value
.. [3] http://www.evanmiller.org/how-not-to-run-an-ab-test.html
.. [4] http://elem.com/~btilly/ab-testing-multiple-looks/part1-rigorous.html
.. [5] http://www.bayesianwitch.com/blog/2014/bayesian_ab_test.html
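The repeated-significance-testing mistake described above can be made concrete with a small A/A simulation: both arms share the same true conversion rate, so any "significant" result is a false positive. The conversion rate, batch size, look count, and trial count below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
p, batches, batch_size, trials = 0.05, 20, 500, 400
z_crit = 1.96  # two-sided 5% threshold for a two-proportion z-test

def z_score(ca, cb, n):
    # pooled two-proportion z statistic; n is impressions per arm
    pooled = (ca + cb) / (2.0 * n)
    se = np.sqrt(2.0 * pooled * (1.0 - pooled) / n)
    return abs(ca - cb) / n / se if se > 0 else 0.0

peek_fp = final_fp = 0
for _ in range(trials):
    ca = cb = n = 0
    tripped = False
    for _ in range(batches):
        # both arms draw from the SAME conversion rate
        ca += rng.binomial(batch_size, p)
        cb += rng.binomial(batch_size, p)
        n += batch_size
        if z_score(ca, cb, n) > z_crit:
            tripped = True  # peeking declares a winner at this look
    final_fp += z_score(ca, cb, n) > z_crit  # test once, at the end
    peek_fp += tripped                       # test after every batch

print("final-only false positives: %.1f%%" % (100.0 * final_fp / trials))
print("peek-every-batch false positives: %.1f%%" % (100.0 * peek_fp / trials))
```

Testing only at the pre-committed end point keeps the false-positive rate near the nominal 5%, while stopping at the first significant look inflates it several-fold, which is exactly why the quote above says to decide on a sample size in advance.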