Last active: January 6, 2025 16:54
Revisions
rain-1 revised this gist
Jul 11, 2022 · 1 changed file with 9 additions and 9 deletions.

`@@ -106,16 +106,16 @@`

```
m = update_given_blue()
m
[(0, 0.0), (1, 0.05454545454545454), (2, 0.09696969696969698), (3, 0.12727272727272723), (4, 0.14545454545454545), (5, 0.1515151515151515), (6, 0.14545454545454545), (7, 0.12727272727272726), (8, 0.09696969696969694), (9, 0.05454545454545453), (10, 0.0)]
```

Here is the graph
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -58,7 +58,7 @@ p_red()`

Now we can get to the heart of the problem. Suppose we perform an event (we take a ball out, look at it and put it back). We learn a couple of things. If the ball is red, we learn that there is at least one red ball in the urn. This is significant: it means we can completely eliminate model M0. In other words, we can assign it probability 0. What probabilities will we assign to the rest of the models? 1/10 seems like a good option. But in fact we drew a red ball, so perhaps it would be reasonable to lean slightly more towards red than blue. We can use Bayes' theorem to work out the *posterior* probabilities for each model.
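The posterior computation this revision describes can be sketched as a small self-contained program; names mirror the gist's `m`, `size`, and `p_red`, and the rest is an assumption-free restatement of the Bayes formula above.

```python
# Posterior over the 11 urn models after one red draw, via Bayes' theorem:
#   P(M_i | red) = P(red | M_i) * P(M_i) / P(red)
size = 10
m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior

def p_red():
    # Law of total probability over the models
    return sum(p * i / size for (i, p) in m)

posterior = [(i, (i / size) * p / p_red()) for (i, p) in m]
print(posterior[0])  # (0, 0.0) -- model M0 is eliminated
```

Note that M0 gets exactly zero posterior mass, which is the "completely eliminate model M0" step in the text.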
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -22,7 +22,7 @@ m = [(i, 1/(size+1)) for i in range(size+1)]`

Let an *event* be that we pull a ball out, check if it is red or blue, then put it back in. We will call events *draws*. What is the probability of each? 1/2, right? This turns out to be true, but later we will not have all models equally likely. So how do we calculate probabilities then?

* $P(\text{red} | M_0) = 0$
* $P(\text{red} | M_1) = 1/10$
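The conditional probabilities listed above feed the law of total probability, $P(\text{red}) = \sum_i P(M_i) P(\text{red} | M_i)$; a minimal runnable sketch under the uniform prior:

```python
# P(red) = sum_i P(M_i) * P(red | M_i), with P(red | M_i) = i/10
size = 10
m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior over M0..M10

def p_red():
    return sum(p * i / size for (i, p) in m)

def p_blue():
    return 1 - p_red()

print(p_red())  # 0.5 under the uniform prior (up to float rounding)
```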
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -11,7 +11,7 @@ Suppose you **know** that there are 10 balls in an urn, some are red and some ar`

Initially we do not know which situation we are in. So a reasonable thing to do would be to assign an equal probability to every model. This is the *maximum entropy principle*, and we use it to set up our *prior* probability distribution. Later on we will have learned new information and changed our list of 11 probabilities to more accurately reflect what we have learned. This will enable us to home in on a more focused distribution around one or two models.

```python
size = 10
```
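The *maximum entropy principle* invoked above can be checked numerically: among distributions over the 11 models, the uniform one has the highest entropy. A sketch (the skewed comparison distribution is made up purely for illustration):

```python
import math

# Shannon entropy H(p) = -sum_i p_i * log2(p_i). The uniform prior over the
# 11 models maximizes H, i.e. it encodes no assumptions beyond the count.
def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

uniform = [1 / 11] * 11
skewed = [0.5] + [0.05] * 10  # hypothetical non-uniform prior

print(entropy(uniform))  # log2(11), about 3.46 bits
print(entropy(skewed))   # strictly lower
```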
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -171,6 +171,6 @@`

The key step that enabled us to iterate on and refine our models here was the application of Bayes' theorem to go from forward reasoning to backward reasoning. This let us compute probabilities for potential *models of the world* based on evidence or events that were observed. An agent using this type of Bayesian reasoning is able to work on partial knowledge of the universe it exists within, but still do its best based on that. Another fundamental concept is that for anything to get started in the first place we needed to set up a prior probability distribution. To do that, the concept of entropy was used. These concepts apply much more generally to any agent that aims to operate intelligently when it does not have absolutely perfect information. For more I recommend the book *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay, and also the wumpus chapter of AIMA.

* https://www.inference.org.uk/itprnn/book.pdf
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 0 additions and 8 deletions.

`@@ -130,14 +130,6 @@ As is universal in statistics, a bell curve starts to appear.`

This was all inspired by a question. What if we drew a red ball 6 times in a row, and then our friend came along and drew a blue? How surprised would we be? Should we accuse them of cheating?

```python
m = [(i, 1/(size+1)) for i in range(size+1)]
m = update_given_red()
m = update_given_red()
```
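The question above is answered by applying the red-draw update six times and then asking for the blue probability; a self-contained sketch (the helpers take `m` as a parameter rather than using the gist's module-level globals):

```python
size = 10

def p_red(m):
    return sum(p * i / size for (i, p) in m)

def update_given_red(m):
    # Bayes update: P(M_i | red) = (i/size) * P(M_i) / P(red)
    pr = p_red(m)
    return [(i, (i / size) * p / pr) for (i, p) in m]

m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior
for _ in range(6):
    m = update_given_red(m)

print(1 - p_red(m))  # about 0.086: a blue draw is unlikely but believable
```

This reproduces the ~8% figure quoted later in the gist.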
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 3 additions and 1 deletion.

`@@ -1,3 +1,5 @@`

I have included working code examples that can be run throughout, as well as graphs. I hope this helps make this easier to understand in a more hands-on way.

# The setup

Suppose you **know** that there are 10 balls in an urn, some are red and some are blue. So there are 11 different possible models for this situation:

`@@ -167,7 +169,7 @@ But if we were to tip out the urn and see it only had 1 red ball in it, that wou`



# What's the bigger picture

This was a very simple example of the general concept of an agent performing Bayesian reasoning to create a model of the world through uncertainty. This is one of the foundational steps required for accurate decision making to take place. The concepts here should apply in a very wide range of situations.
rain-1 renamed this gist
Jul 8, 2022 · 1 changed file with 4 additions and 4 deletions.

`@@ -169,13 +169,13 @@ But if we were to tip out the urn and see it only had 1 red ball in it, that wou`

# What is this really about

This was a very simple example of the general concept of an agent performing Bayesian reasoning to create a model of the world through uncertainty. This is one of the foundational steps required for accurate decision making to take place. The concepts here should apply in a very wide range of situations. Probabilities are fundamentally a subjective internal valuation of an agent's belief; probabilities are not objective. They are based on the agent's personal model of the world, which is formed by the information they have received over time. The key step that enabled us to iterate on and refine our models here was the application of Bayes' theorem to go from forward reasoning to backward reasoning. This let us compute probabilities for potential *models of the world* based on evidence or events that were observed. An agent using this type of Bayesian reasoning is able to work on partial knowledge of the universe it exists within, but still do its best based on that. Another fundamental concept is that for anything to get started in the first place we needed to set up a prior probability distribution. To do that, the concept of entropy was used. These concepts apply much more generally to any agent that aims to operate intelligently when it does not have absolutely perfect information. For more I recommend the book *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 5 additions and 2 deletions.

`@@ -116,7 +116,10 @@ m`

```
(10, 0.18181818181818182)]
```

Here is the graph



As is universal in statistics, a bell curve starts to appear.

`@@ -162,7 +165,7 @@ I'm getting a result of 8%, so a bit less than 1/10. It's believable.`

But if we were to tip out the urn and see it only had 1 red ball in it, that would be a less than 1 in a million chance, and we'd be very surprised about that.



# What is this really about
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -86,7 +86,7 @@ def p_model_given_blue(i):`

Here is a graph of the new probability distribution:



You can see that 0 reds has 0 chance, and all reds is preferred as the most likely model.
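This revision touches `p_model_given_blue`; a mirrored self-contained sketch of that update, showing that a blue draw eliminates the all-red model just as a red draw eliminated M0:

```python
# Posterior after one blue draw: P(M_i | blue) = (1 - i/size) * P(M_i) / P(blue)
size = 10
m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior

def p_blue():
    return 1 - sum(p * i / size for (i, p) in m)

posterior = [(i, (1 - i / size) * p / p_blue()) for (i, p) in m]
print(posterior[10])  # (10, 0.0) -- a blue draw eliminates the all-red model
```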
rain-1 revised this gist
Jul 8, 2022 · 3 changed files with 0 additions and 0 deletions.
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 14 additions and 4 deletions.

`@@ -1,7 +1,3 @@`

# The setup

Suppose you **know** that there are 10 balls in an urn, some are red and some are blue. So there are 11 different possible models for this situation:

`@@ -167,3 +163,17 @@ I'm getting a result of 8%, so a bit less than 1/10. It's believable.`

But if we were to tip out the urn and see it only had 1 red ball in it, that would be a less than 1 in a million chance, and we'd be very surprised about that.

red-red-red-red-red-red.png

# What is this really about

This was a very simple example of the general concept of an agent performing Bayesian reasoning under uncertainty. Probabilities are fundamentally about an agent's belief, based on their personal model of the world, which is formed by the information they have received. Probability and entropy are tightly connected. The key step that enabled us to work intelligently here was applying Bayes' theorem to go from forward reasoning to backward reasoning. Many people reading this will already be familiar with Bayes' theorem, but what I want to stress is that we applied Bayes' theorem to work out probabilities of *models of the world*. An agent using this type of Bayesian reasoning is able to admit that it only has partial knowledge of the universe, but still do its best based on that. And another fundamental concept is that for anything to get started in the first place we needed a prior probability distribution. These concepts apply much more generally to any agent that aims to operate intelligently when it does not have absolutely perfect information. For more I recommend the book *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay

* https://www.inference.org.uk/itprnn/book.pdf
rain-1 created this gist
Jul 8, 2022

`@@ -0,0 +1,169 @@`

# Prologue

# The setup

Suppose you **know** that there are 10 balls in an urn, some are red and some are blue. So there are 11 different possible models for this situation:

* M0: 0 red, 10 blue
* M1: 1 red, 9 blue
* ...
* M10: 10 red, 0 blue

Initially we do not know which situation we are in. So a reasonable thing to do would be to assign an equal probability to every model. This is the *maximum entropy principle* and we use it to set up our *prior* probability distribution. Later on we will have learned new information and changed our list of 11 probabilities to more accurately reflect what we have learned. This will home in on a more specific distribution.

```python
size = 10
m = [(i, 1/(size+1)) for i in range(size+1)]
```

# Calculating the forward probability of an event

Let an *event* be that we pull a ball out, check if it is red or blue, then put it back in. We will call events *draws*. What is the probability of each? 1/2, right? This turns out to be true, but later we will be leaning more towards some specific models than others. So how do we calculate probabilities then?

* $P(\text{red} | M_0) = 0$
* $P(\text{red} | M_1) = 1/10$
* ...
* $P(\text{red} | M_{10}) = 10/10$

and we know the probability of each model so we can sum it up

* $P(\text{red}) = \sum_{i} P(M_i) P(\text{red} | M_i)$

```python
def p_red():
    return sum([p * i/size for (i,p) in m])

def p_blue():
    return 1 - p_red()

p_red()
0.5
```

# A Joke

> A mathematician, a physicist, and an engineer are riding a train through Scotland.
> The engineer looks out the window, sees a black sheep, and exclaims, "Hey! They've got black sheep in Scotland!"
> The physicist looks out the window and corrects the engineer, "Strictly speaking, all we know is that there's at least one black sheep in Scotland."
> The mathematician looks out the window and corrects the physicist, "Strictly speaking, all we know is that at least one side of one sheep is black in Scotland."

# Calculating the backwards probability of a model, given an event

Now we can get to the heart of the problem. Suppose we perform an event (we take a ball out, look at it and put it back). We learn a couple of things. If the ball is red, we learn that there is at least one red ball in the urn. This is actually significant - it means we can completely eliminate model M0. In other words, we can assign it probability 0. What probabilities will we assign to the rest of the models? 1/10 seems like a good option. But in fact we pulled a red ball, so perhaps it would be reasonable to lean slightly more towards red than blue. We can use Bayes' theorem to work out the *posterior* probabilities for each model.

$$P(M_i | \text{red}) = \frac{P(\text{red} | M_i) P(M_i)}{P(\text{red})}$$

```python
def p_model_given_red(i):
    return i/size * m[i][1] / p_red()

def p_model_given_blue(i):
    return (1 - i/size) * m[i][1] / p_blue()

[p_model_given_red(i) for i in range(size+1)]
[0.0, 0.018181818181818184, 0.03636363636363637, 0.05454545454545454, 0.07272727272727274, 0.09090909090909091, 0.10909090909090909, 0.12727272727272726, 0.14545454545454548, 0.16363636363636364, 0.18181818181818182]
```

Here is a graph of the new probability distribution:

RED.png

You can see that 0 reds has 0 chance, and all reds is preferred as the most likely model.

What if we drew a red then a blue?

```python
def update_given_red():
    return [(i, p_model_given_red(i)) for i in range(size+1)]

def update_given_blue():
    return [(i, p_model_given_blue(i)) for i in range(size+1)]

m = update_given_red()
m = update_given_blue()
m
[(0, 0.0), (1, 0.018181818181818184), (2, 0.03636363636363637), (3, 0.05454545454545454), (4, 0.07272727272727274), (5, 0.09090909090909091), (6, 0.10909090909090909), (7, 0.12727272727272726), (8, 0.14545454545454548), (9, 0.16363636363636364), (10, 0.18181818181818182)]
```

Here is the graph

RED-BLUE.png

As is universal in statistics, a bell curve starts to appear.

# Answering a question

This was all inspired by a question. What if we drew a red ball 6 times in a row, and then our friend came along and drew a blue? How surprised would we be? Should we accuse them of cheating?

```python
m = [(i, 1/(size+1)) for i in range(size+1)]
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m
[(0, 0.0), (1, 5.054576792921572e-07), (2, 3.234929147469806e-05), (3, 0.00036847864820398234), (4, 0.002070354654380676), (5, 0.007897776238939953), (6, 0.02358263348505487), (7, 0.05946659051104296), (8, 0.13250269788036326), (9, 0.26862093454070324), (10, 0.505457679292157)]

p_blue()
0.08611103388841013
```

I'm getting a result of 8%, so a bit less than 1/10. It's believable.

But if we were to tip out the urn and see it only had 1 red ball in it, that would be a less than 1 in a million chance, and we'd be very surprised about that.

red-red-red-red-red-red.png
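The "less than 1 in a million" figure for the single-red-ball model can be checked directly; a sketch reusing the same six-fold update (helpers take `m` explicitly rather than using the gist's globals):

```python
size = 10

def p_red(m):
    return sum(p * i / size for (i, p) in m)

def update_given_red(m):
    # Bayes update after observing a red draw
    pr = p_red(m)
    return [(i, (i / size) * p / pr) for (i, p) in m]

m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior
for _ in range(6):
    m = update_given_red(m)

# Posterior probability of M1 (1 red, 9 blue) after six red draws
print(m[1][1])  # about 5.05e-07, i.e. roughly 1 in 2 million
```

This matches the `5.054576792921572e-07` entry in the output list above, so tipping out the urn and finding only 1 red ball really would be a less-than-one-in-a-million event.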