Last active: January 6, 2025 16:54
Revisions
rain-1 revised this gist
Jul 11, 2022 · 1 changed file with 9 additions and 9 deletions.

`@@ -106,16 +106,16 @@`

```
m = update_given_blue()
m
[(0, 0.0), (1, 0.05454545454545454), (2, 0.09696969696969698), (3, 0.12727272727272723), (4, 0.14545454545454545), (5, 0.1515151515151515), (6, 0.14545454545454545), (7, 0.12727272727272726), (8, 0.09696969696969694), (9, 0.05454545454545453), (10, 0.0)]
```

Here is the graph
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -58,7 +58,7 @@ p_red()`

Now we can get to the heart of the problem. Suppose we perform an event (we take a ball out, look at it and put it back). We learn a couple of things. If the ball is red, we learn that there is at least one red ball in the urn. This is significant: it means we can completely eliminate model M0. In other words, we can assign it probability 0. What probabilities will we assign to the rest of the models? 1/10 seems like a good option. But in fact we drew a red ball, so perhaps it would be reasonable to lean slightly more towards red than blue. We can use Bayes' theorem to work out the *posterior* probabilities for each model.
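The posterior computation this revision describes can be sketched as a small self-contained program; names mirror the gist's `m`, `size`, and `p_red`, and the rest is an assumption-free restatement of the Bayes formula above.

```python
# Posterior over the 11 urn models after one red draw, via Bayes' theorem:
#   P(M_i | red) = P(red | M_i) * P(M_i) / P(red)
size = 10
m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior

def p_red():
    # Law of total probability over the models
    return sum(p * i / size for (i, p) in m)

posterior = [(i, (i / size) * p / p_red()) for (i, p) in m]
print(posterior[0])  # (0, 0.0) -- model M0 is eliminated
```

Note that M0 gets exactly zero posterior mass, which is the "completely eliminate model M0" step in the text.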
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -22,7 +22,7 @@ m = [(i, 1/(size+1)) for i in range(size+1)]`

Let an *event* be that we pull a ball out, check if it is red or blue, then put it back in. We will call events *draws*. What is the probability of each? 1/2, right? This turns out to be true, but later we will not have all models equally likely. So how do we calculate probabilities then?

* $P(\text{red} | M_0) = 0$
* $P(\text{red} | M_1) = 1/10$
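The conditional probabilities listed above feed the law of total probability, $P(\text{red}) = \sum_i P(M_i) P(\text{red} | M_i)$; a minimal runnable sketch under the uniform prior:

```python
# P(red) = sum_i P(M_i) * P(red | M_i), with P(red | M_i) = i/10
size = 10
m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior over M0..M10

def p_red():
    return sum(p * i / size for (i, p) in m)

def p_blue():
    return 1 - p_red()

print(p_red())  # 0.5 under the uniform prior (up to float rounding)
```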
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -11,7 +11,7 @@ Suppose you **know** that there are 10 balls in an urn, some are red and some ar`

Initially we do not know which situation we are in. So a reasonable thing to do would be to assign an equal probability to every model. This is the *maximum entropy principle*, and we use it to set up our *prior* probability distribution. Later on we will have learned new information and changed our list of 11 probabilities to more accurately reflect what we have learned. This will enable us to home in on a more focused distribution around one or two models.

```python
size = 10
```
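The *maximum entropy principle* invoked above can be checked numerically: among distributions over the 11 models, the uniform one has the highest entropy. A sketch (the skewed comparison distribution is made up purely for illustration):

```python
import math

# Shannon entropy H(p) = -sum_i p_i * log2(p_i). The uniform prior over the
# 11 models maximizes H, i.e. it encodes no assumptions beyond the count.
def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

uniform = [1 / 11] * 11
skewed = [0.5] + [0.05] * 10  # hypothetical non-uniform prior

print(entropy(uniform))  # log2(11), about 3.46 bits
print(entropy(skewed))   # strictly lower
```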
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -171,6 +171,6 @@`

The key step that enabled us to iterate on and refine our models here was the application of Bayes' theorem to go from forward reasoning to backward reasoning. This let us compute probabilities for potential *models of the world* based on evidence or events that were observed. An agent using this type of Bayesian reasoning is able to work on partial knowledge of the universe it exists within, but still do its best based on that. Another fundamental concept is that for anything to get started in the first place we needed to set up a prior probability distribution. To do that, the concept of entropy was used. These concepts apply much more generally to any agent that aims to operate intelligently when it does not have absolutely perfect information. For more I recommend the book *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay, and also the wumpus chapter of AIMA.

* https://www.inference.org.uk/itprnn/book.pdf
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 0 additions and 8 deletions.

`@@ -130,14 +130,6 @@ As is universal in statistics, a bell curve starts to appear.`

This was all inspired by a question. What if we drew a red ball 6 times in a row, and then our friend came along and drew a blue? How surprised would we be? Should we accuse them of cheating?

```python
m = [(i, 1/(size+1)) for i in range(size+1)]
m = update_given_red()
m = update_given_red()
```
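The question above is answered by applying the red-draw update six times and then asking for the blue probability; a self-contained sketch (the helpers take `m` as a parameter rather than using the gist's module-level globals):

```python
size = 10

def p_red(m):
    return sum(p * i / size for (i, p) in m)

def update_given_red(m):
    # Bayes update: P(M_i | red) = (i/size) * P(M_i) / P(red)
    pr = p_red(m)
    return [(i, (i / size) * p / pr) for (i, p) in m]

m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior
for _ in range(6):
    m = update_given_red(m)

print(1 - p_red(m))  # about 0.086: a blue draw is unlikely but believable
```

This reproduces the ~8% figure quoted later in the gist.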
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 3 additions and 1 deletion.

`@@ -1,3 +1,5 @@`

I have included working code examples that can be run throughout, as well as graphs. I hope this helps make this easier to understand in a more hands-on way.

# The setup

Suppose you **know** that there are 10 balls in an urn, some are red and some are blue. So there are 11 different possible models for this situation:

`@@ -167,7 +169,7 @@ But if we were to tip out the urn and see it only had 1 red ball in it, that wou`



# What's the bigger picture

This was a very simple example of the general concept of an agent performing Bayesian reasoning to create a model of the world through uncertainty. This is one of the foundational steps required for accurate decision making to take place. The concepts here should apply in a very wide range of situations.
rain-1 renamed this gist
Jul 8, 2022 · 1 changed file with 4 additions and 4 deletions.

`@@ -169,13 +169,13 @@ But if we were to tip out the urn and see it only had 1 red ball in it, that wou`

# What is this really about

This was a very simple example of the general concept of an agent performing Bayesian reasoning to create a model of the world through uncertainty. This is one of the foundational steps required for accurate decision making to take place. The concepts here should apply in a very wide range of situations. Probabilities are fundamentally a subjective internal valuation of an agent's belief; probabilities are not objective. They are based on the agent's personal model of the world, which is formed by the information they have received over time. The key step that enabled us to iterate on and refine our models here was the application of Bayes' theorem to go from forward reasoning to backward reasoning. This let us compute probabilities for potential *models of the world* based on evidence or events that were observed. An agent using this type of Bayesian reasoning is able to work on partial knowledge of the universe it exists within, but still do its best based on that. Another fundamental concept is that for anything to get started in the first place we needed to set up a prior probability distribution. To do that, the concept of entropy was used. These concepts apply much more generally to any agent that aims to operate intelligently when it does not have absolutely perfect information. For more I recommend the book *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 5 additions and 2 deletions.

`@@ -116,7 +116,10 @@ m`

```
(10, 0.18181818181818182)]
```

Here is the graph



As is universal in statistics, a bell curve starts to appear.

`@@ -162,7 +165,7 @@ I'm getting a result of 8%, so a bit less than 1/10. It's believable.`

But if we were to tip out the urn and see it only had 1 red ball in it, that would be a less than 1 in a million chance, and we'd be very surprised about that.



# What is this really about
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 1 addition and 1 deletion.

`@@ -86,7 +86,7 @@ def p_model_given_blue(i):`

Here is a graph of the new probability distribution:



You can see that 0 reds has 0 chance, and all reds is preferred as the most likely model.
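This revision touches `p_model_given_blue`; a mirrored self-contained sketch of that update, showing that a blue draw eliminates the all-red model just as a red draw eliminated M0:

```python
# Posterior after one blue draw: P(M_i | blue) = (1 - i/size) * P(M_i) / P(blue)
size = 10
m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior

def p_blue():
    return 1 - sum(p * i / size for (i, p) in m)

posterior = [(i, (1 - i / size) * p / p_blue()) for (i, p) in m]
print(posterior[10])  # (10, 0.0) -- a blue draw eliminates the all-red model
```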
rain-1 revised this gist
Jul 8, 2022 · 3 changed files with 0 additions and 0 deletions.
rain-1 revised this gist
Jul 8, 2022 · 1 changed file with 14 additions and 4 deletions.

`@@ -1,7 +1,3 @@`

# The setup

Suppose you **know** that there are 10 balls in an urn, some are red and some are blue. So there are 11 different possible models for this situation:

`@@ -167,3 +163,17 @@ I'm getting a result of 8%, so a bit less than 1/10. It's believable.`

But if we were to tip out the urn and see it only had 1 red ball in it, that would be a less than 1 in a million chance, and we'd be very surprised about that.

red-red-red-red-red-red.png

# What is this really about

This was a very simple example of the general concept of an agent performing Bayesian reasoning under uncertainty. Probabilities are fundamentally about an agent's belief, based on their personal model of the world, which is formed by the information they have received. Probability and entropy are tightly connected. The key step that enabled us to work intelligently here was applying Bayes' theorem to go from forward reasoning to backward reasoning. Many people reading this will already be familiar with Bayes' theorem, but what I want to stress is that we applied Bayes' theorem to work out probabilities of *models of the world*. An agent using this type of Bayesian reasoning is able to admit that it only has partial knowledge of the universe, but still do its best based on that. And another fundamental concept is that for anything to get started in the first place we needed a prior probability distribution. These concepts apply much more generally to any agent that aims to operate intelligently when it does not have absolutely perfect information. For more I recommend the book *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay

* https://www.inference.org.uk/itprnn/book.pdf
rain-1 created this gist
Jul 8, 2022

`@@ -0,0 +1,169 @@`

# Prologue

# The setup

Suppose you **know** that there are 10 balls in an urn, some are red and some are blue. So there are 11 different possible models for this situation:

* M0: 0 red, 10 blue
* M1: 1 red, 9 blue
* ...
* M10: 10 red, 0 blue

Initially we do not know which situation we are in. So a reasonable thing to do would be to assign an equal probability to every model. This is the *maximum entropy principle* and we use it to set up our *prior* probability distribution. Later on we will have learned new information and changed our list of 11 probabilities to more accurately reflect what we have learned. This will home in on a more specific distribution.

```python
size = 10
m = [(i, 1/(size+1)) for i in range(size+1)]
```

# Calculating the forward probability of an event

Let an *event* be that we pull a ball out, check if it is red or blue, then put it back in. We will call events *draws*. What is the probability of each? 1/2, right? This turns out to be true, but later we will be leaning more towards some specific models than others. So how do we calculate probabilities then?

* $P(\text{red} | M_0) = 0$
* $P(\text{red} | M_1) = 1/10$
* ...
* $P(\text{red} | M_{10}) = 10/10$

and we know the probability of each model so we can sum it up

* $P(\text{red}) = \sum_{i} P(M_i) P(\text{red} | M_i)$

```python
def p_red():
    return sum([p * i/size for (i,p) in m])

def p_blue():
    return 1 - p_red()

p_red()
0.5
```

# A Joke

> A mathematician, a physicist, and an engineer are riding a train through Scotland.
> The engineer looks out the window, sees a black sheep, and exclaims, "Hey! They've got black sheep in Scotland!"
> The physicist looks out the window and corrects the engineer, "Strictly speaking, all we know is that there's at least one black sheep in Scotland."
> The mathematician looks out the window and corrects the physicist, "Strictly speaking, all we know is that at least one side of one sheep is black in Scotland."

# Calculating the backwards probability of a model, given an event

Now we can get to the heart of the problem. Suppose we perform an event (we take a ball out, look at it and put it back). We learn a couple of things. If the ball is red, we learn that there is at least one red ball in the urn. This is actually significant - it means we can completely eliminate model M0. In other words, we can assign it probability 0. What probabilities will we assign to the rest of the models? 1/10 seems like a good option. But in fact we pulled a red ball, so perhaps it would be reasonable to lean slightly more towards red than blue. We can use Bayes' theorem to work out the *posterior* probabilities for each model.

$$P(M_i | \text{red}) = \frac{P(\text{red} | M_i) P(M_i)}{P(\text{red})}$$

```python
def p_model_given_red(i):
    return i/size * m[i][1] / p_red()

def p_model_given_blue(i):
    return (1 - i/size) * m[i][1] / p_blue()

[p_model_given_red(i) for i in range(size+1)]
[0.0, 0.018181818181818184, 0.03636363636363637, 0.05454545454545454, 0.07272727272727274, 0.09090909090909091, 0.10909090909090909, 0.12727272727272726, 0.14545454545454548, 0.16363636363636364, 0.18181818181818182]
```

Here is a graph of the new probability distribution:

RED.png

You can see that 0 reds has 0 chance, and all reds is preferred as the most likely model.

What if we drew a red then a blue?

```python
def update_given_red():
    return [(i, p_model_given_red(i)) for i in range(size+1)]

def update_given_blue():
    return [(i, p_model_given_blue(i)) for i in range(size+1)]

m = update_given_red()
m = update_given_blue()
m
[(0, 0.0), (1, 0.018181818181818184), (2, 0.03636363636363637), (3, 0.05454545454545454), (4, 0.07272727272727274), (5, 0.09090909090909091), (6, 0.10909090909090909), (7, 0.12727272727272726), (8, 0.14545454545454548), (9, 0.16363636363636364), (10, 0.18181818181818182)]
```

Here is the graph

RED-BLUE.png

As is universal in statistics, a bell curve starts to appear.

# Answering a question

This was all inspired by a question. What if we drew a red ball 6 times in a row, and then our friend came along and drew a blue? How surprised would we be? Should we accuse them of cheating?

```python
m = [(i, 1/(size+1)) for i in range(size+1)]
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m
[(0, 0.0), (1, 5.054576792921572e-07), (2, 3.234929147469806e-05), (3, 0.00036847864820398234), (4, 0.002070354654380676), (5, 0.007897776238939953), (6, 0.02358263348505487), (7, 0.05946659051104296), (8, 0.13250269788036326), (9, 0.26862093454070324), (10, 0.505457679292157)]

p_blue()
0.08611103388841013
```

I'm getting a result of 8%, so a bit less than 1/10. It's believable.

But if we were to tip out the urn and see it only had 1 red ball in it, that would be a less than 1 in a million chance, and we'd be very surprised about that.

red-red-red-red-red-red.png
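The "less than 1 in a million" figure for the single-red-ball model can be checked directly; a sketch reusing the same six-fold update (helpers take `m` explicitly rather than using the gist's globals):

```python
size = 10

def p_red(m):
    return sum(p * i / size for (i, p) in m)

def update_given_red(m):
    # Bayes update after observing a red draw
    pr = p_red(m)
    return [(i, (i / size) * p / pr) for (i, p) in m]

m = [(i, 1 / (size + 1)) for i in range(size + 1)]  # uniform prior
for _ in range(6):
    m = update_given_red(m)

# Posterior probability of M1 (1 red, 9 blue) after six red draws
print(m[1][1])  # about 5.05e-07, i.e. roughly 1 in 2 million
```

This matches the `5.054576792921572e-07` entry in the output list above, so tipping out the urn and finding only 1 red ball really would be a less-than-one-in-a-million event.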