Last active: August 22, 2020 09:12
Revisions
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 39 additions and 25 deletions.
@@ -143,8 +143,8 @@
E[k_i] = p * (1 - p**i)
E[k_i] = p - p**(i + 1)

E[k] = E[k_0] + E[k_1] + ... E[k_n-1]
E[k] = (p - p**(0 + 1)) + (p - p**(1+1)) + ... (p - p**(n - 1 + 1))
E[k] = (p - p**1) + (p - p**2) + ... (p - p**n)

---

@@ -168,7 +168,7 @@ So for n=0, E[k] <= E[C]

Let's start induction from n=1,
When n=1,
E[C] = (1 - p) * 1 = 1 - p
E[k] = (p - p**1) = 0
So for n=1, E[k] <= E[C]

Inductive step:

@@ -177,15 +177,15 @@ We have to prove P(x + 1) is true.
P(x + 1) means
E[k_x+1 elements] < E[C_x+1 elements]
(p - p**1) + (p - p**2) + ... + (p - p**x) + (p - p**(x + 1)) <= (1 - p) * (x + 1)
(p - p**1) + (p - p**2) + ... + (p - p**x) + (p - p**(x + 1)) <= (1 - p) * x + (1 - p)

The first chunk of each side is just P(x), I will put them in square brackets:
[ (p - p**1) + (p - p**2) + ... + (p - p**x) ] + (p - p**(x + 1)) <= [ (1 - p) * x ] + (1 - p)

The part in the square brackets
[ (p - p**1) + (p - p**2) + ... + (p - p**x) ] <= [ (1 - p) * x ]
(p - p**1) + (p - p**2) + ... + (p - p**x) <= (1 - p) * x
E[k_x elements] <= E[C_x elements]
is just P(x), which we have assumed to be true.

@@ -195,25 +195,25 @@
If a <= b is true,
a + x <= b + y is also true if x <= y

In our case,
a = E[k_x elements] = (p - p**1) + (p - p**2) + ... + (p - p**x) = left terms inside square brackets
b = E[C_x elements] = (1 - p) * x = right terms inside square brackets
x = (p - p**(x + 1)) = left term outside square brackets
y = (1 - p) = right term outside square brackets

This leaves us with having to prove the following in order to prove P(x + 1). Note that this is just the expected value of the next term added when adding one more element to the array.
(p - p**(x + 1)) <= (1 - p)
p - p**(x + 1) <= 1 - p
2*p <= 1 + p**(x + 1)
Since 0 <= p <= 0.5, the greatest 2*p can be is 1, and 1 + p**(x + 1) is always greater than 1.
This means that the original x <= y inequality is true and P(x + 1) is true.

Thus, P(n) is true for n being any integer >= 0. In other words:
"If 0 <= p <= 0.5 and n is an integer >= 0, then E[k] <= E[C]"

This means that for 0 <= p <= 0.5, Solution 3 is always better.

---

@@ -226,26 +226,40 @@ Assumptions:

Proposition: "If 0.5 < p <= 1, then E[k] >= E[C]"

We will special case p=1.
E[C] = (1 - p) * n = (1 - 1) * n = (0) * n = 0
E[k] = (p - p**1) + (p - p**2) + ... (p - p**n)
     = (1 - 1**1) + (1 - 1**2) + ... (1 - 1**n)
     = (1 - 1) + (1 - 1) + ... (1 - 1)
     = (0) + (0) + ... (0)
     = 0
When p=1, the array is filled with non-zeros. No work is done and E[k] = E[C], but E[k] >= E[C] still holds true.

Now let's exclude p=1 and prove the rest where 0.5 < p < 1.

For this proposition, I will not do a formal proof. I will only analyze E[k] and E[C] and show that E[k] grows faster than E[C] for large n.

Let's start with E[C]
E[C] = (1 - p) * n
Since 0.5 < p < 1, we know that 0 < (1 - p) < 0.5
As n grows large, we keep adding more 0 < (1 - p) < 0.5 terms.

How about E[k]?
E[k] = (p - p**1) + (p - p**2) + ... (p - p**n)
Keep in mind 0.5 < p < 1.
For earlier terms such as (p - p**1) or (p - p**2), the terms will be pretty small, because the exponent is small, so the subtraction is large. However for later terms, and when n is large, (p - p**i) can be approximated as just (p), since p**i ~= 0, where i is large.
This is because 0.5 < p < 1, so each multiple of p decreases p**i.
When n is large, there will be many terms that will be approximated as just p.
As n grows large, we keep adding more approximate 0.5 < p < 1 terms.

@@ -260,7 +274,7 @@ This means the proposition holds true for large n, since E[k] grows faster than
Note that for low n, it's possible for the opposite to be true. E[C] might be greater. You can graph it out on a calculator to see visually. But for low n, optimization is unnecessary.

This means for large n and 0.5 < p <= 1, Solution 2 is always better.

---

@@ -337,7 +351,7 @@ n = 250 elements to sample

For large n, 250 elements isn't much. You can adjust E and d depending on how accurate you need to be.
Keep in mind, the cutoff for determining if you should use Solution 2 or 3 is at p = 0.5. So your E should be small enough to distinguish the cutoff, but large enough so n isn't too large to sample. Set d depending on how accurate you want your estimate to be.
If n gets too large, you can also set a 10% array size limit to the sample size n.
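As a quick numeric check of the corrected formulas in this revision, here is a small Python sketch (the helper names `expected_C` and `expected_k` are illustrative, not from the gist):

```python
# Sanity check of E[C] = (1 - p) * n and
# E[k] = (p - p**1) + (p - p**2) + ... + (p - p**n).
def expected_C(p, n):
    return (1 - p) * n

def expected_k(p, n):
    return sum(p - p**i for i in range(1, n + 1))

n = 1000
for p in (0.25, 0.5, 0.75):
    print(f"p={p}: E[C]={expected_C(p, n):.2f}, E[k]={expected_k(p, n):.2f}")
# p=0.25: E[C]=750.00, E[k]=249.67  -> E[k] <= E[C], Solution 3 wins
# p=0.5:  E[C]=500.00, E[k]=499.00  -> roughly the crossover
# p=0.75: E[C]=250.00, E[k]=747.00  -> E[k] >= E[C], Solution 2 wins
```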
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 3 additions and 1 deletion.
@@ -132,7 +132,9 @@ P(having at least one zero) = (1 - P(having all non-zeros))
Since the following two are independent events, they can be factored:
= P(non-zero at index i AND having at least one zero before index i)
= P(non-zero at index i) * P(having at least one zero before index i)
Since i is indexed starting at 0, there are i elements before index i and we can plug into above formulas:
= P(non-zero) * P(having at least one zero in i elements)
= P(non-zero) * (1 - P(having i non-zeros))
= p * (1 - p**i)
Since each element can contribute at most 1 incorrect non-zero,
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 1 addition and 0 deletions.
@@ -130,6 +130,7 @@ P(having at least one zero) = (1 - P(having all non-zeros))
P(incorrect non-zero at index i) = P(non-zero at index i with at least one zero before it)
Since the following two are independent events, they can be factored:
= P(non-zero at index i AND having at least one zero before index i)
= P(non-zero at index i) * P(having at least one zero before index i)
Since i is indexed starting at 0, we don't need to do (i - 1):
= p * (1 - p**i)
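The factored probability can also be checked empirically. A minimal Monte Carlo sketch, assuming the generation model used throughout the gist (each element independently non-zero with probability p):

```python
import random

# Estimate P(incorrect non-zero at index i) by simulation and
# compare it against the closed form p * (1 - p**i).
def simulate(i, p, trials=200_000):
    hits = 0
    for _ in range(trials):
        prefix_all_nonzero = all(random.random() < p for _ in range(i))
        nonzero_here = random.random() < p
        # incorrect non-zero: non-zero at i with at least one zero before it
        if nonzero_here and not prefix_all_nonzero:
            hits += 1
    return hits / trials

p, i = 0.5, 3
print(simulate(i, p))   # ~0.4375
print(p * (1 - p**i))   # 0.4375 exactly
```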
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 1 addition and 1 deletion.
@@ -41,7 +41,7 @@ knowledge will come in handy later.

Cost of Solution 2 for the general case:
For the first loop, we have:
Cost(first loop) = Cost(first loop iterations) + Cost(first loop moves ie. writes)
since k of the elements need to be moved:
Cost(first loop) = Cost(n iterations) + Cost(k moves)
Cost(first loop) = n + k
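To make this counting concrete, here is a sketch of Solution 2 with the cost model applied (a hypothetical Python rendering with an added cost counter; the original solutions are not reproduced in this gist):

```python
# Solution 2 sketch: move non-zeros forward, then zero out the tail.
# Cost model: 1 per loop iteration, 1 per write, reads/comparisons free.
def solution2_cost(nums):
    cost = 0
    lastNonZeroFoundAt = 0
    for i in range(len(nums)):              # n iterations
        cost += 1
        if nums[i] != 0:
            if lastNonZeroFoundAt != i:     # skip moving when already in place
                nums[lastNonZeroFoundAt] = nums[i]
                cost += 1                   # k moves total
            lastNonZeroFoundAt += 1
    for i in range(lastNonZeroFoundAt, len(nums)):  # C iterations (the "memset")
        cost += 1
        nums[i] = 0
        cost += 1                           # C writes
    return cost                             # n + k + 2*C
```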
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 3 additions and 1 deletion.
@@ -7,7 +7,7 @@ For Solution 2, you can skip moving when `lastNonZeroFoundAt == i`
For Solution 3, you can skip swapping when `lastNonZeroFoundAt == cur`

For Solution 2, even if you do a memset, that is an O(n) because it has to write each element to 0.
For Solution 3, we can say that swap is a 3 cost operation since it does 3 writes including the temporary variable.

---

@@ -17,6 +17,8 @@ Assumptions:
2. Reading is 0 cost
3. Comparisons are 0 cost
4. Incrementing `lastNonZeroFoundAt` is 0 cost since both solutions increment it by the same amount (ie. the number of non-zeros) anyway
5. Writing is 1 cost
6. Swapping is 3 costs because it consists of 3 writes.

Let n be the number of elements in the array
Let C be the number of zeros.
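Under these assumptions, a matching sketch of Solution 3, with each swap counted as 3 writes (again a hypothetical Python rendering):

```python
# Solution 3 sketch: swap each non-zero into place.
# A swap is 3 writes (two slots plus the temporary), so it costs 3.
def solution3_cost(nums):
    cost = 0
    lastNonZeroFoundAt = 0
    for cur in range(len(nums)):            # n iterations
        cost += 1
        if nums[cur] != 0:
            if lastNonZeroFoundAt != cur:   # skip swapping when already in place
                nums[lastNonZeroFoundAt], nums[cur] = nums[cur], nums[lastNonZeroFoundAt]
                cost += 3                   # k swaps total
            lastNonZeroFoundAt += 1
    return cost                             # n + 3*k
```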
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 1 addition and 1 deletion.
@@ -2,7 +2,7 @@ I don't think Solution 2 is always better than Solution 3
First of all, it doesn't matter unless n is large, since both are O(n).

For both Solutions 2 and 3, you can skip operations when there are non-zeros at the beginning of the array by doing the following:
For Solution 2, you can skip moving when `lastNonZeroFoundAt == i`
For Solution 3, you can skip swapping when `lastNonZeroFoundAt == cur`
dosentmatter revised this gist
Aug 22, 2020. 1 changed file with 1 addition and 1 deletion.
@@ -16,7 +16,7 @@ Assumptions:
1. Each loop iteration costs 1
2. Reading is 0 cost
3. Comparisons are 0 cost
4. Incrementing `lastNonZeroFoundAt` is 0 cost since both solutions increment it by the same amount (ie. the number of non-zeros) anyway

Let n be the number of elements in the array
Let C be the number of zeros.
dosentmatter revised this gist
Dec 2, 2019. 1 changed file with 58 additions and 3 deletions.
@@ -278,6 +278,61 @@ It's not worth determining C or k because that would take O(n).
You can determine the approximation of p by sampling a few values and averaging by how many non-zeros you see.

How many values should you sample? We can use the Chebyshev theorem:
See Chapter 8.1 of https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
and 5a of https://inst.eecs.berkeley.edu/~ee126/fa19/hw/hw5-sol.pdf

P(|X - u| >= E) <= V(X) / E**2

Let X be a Bernoulli trial with X_i as the random variable for each coin flip. There is a p chance for the coin to flip heads (non-zero element) and a (1 - p) chance for tails (zero element)

S_n = X_0 + X_1 + ... X_n-1

P(|S_n / n - p| >= E) <= V(S_n / n) / E**2
P(|S_n / n - p| >= E) <= Var(S_n) / (n**2 * E**2)
P(|S_n / n - p| >= E) <= n * Var(X_i) / (n**2 * E**2)
P(|S_n / n - p| >= E) <= Var(X_i) / (n * E**2)

Var(X_i) = E[X_i**2] - E[X]**2
= [ (1 - p) * 0**2 + p * 1**2 ] - [ (1 - p) * 0 + p * 1 ] ** 2
= [ p * 1**2 ] - [ p * 1 ] ** 2
= [ p ] - [ p ] ** 2
= p - p**2
= p * (1 - p)

P(|S_n / n - p| >= E) <= Var(X_i) / (n * E**2)
P(|S_n / n - p| >= E) <= p * (1 - p) / (n * E**2)

We set d = p * (1 - p) / (n * E ** 2)
n * E ** 2 = p * (1 - p) / d
n = p * (1 - p) / (d * E**2)

We can use this formula to determine a good value for n, the sample size.

E is the error threshold of estimated probability vs actual p:
|S_n / n - p| >= E
d is the probability of the estimated probability breaching the error threshold, E:
P(|S_n / n - p| >= E) <= d

Formula:
n = p * (1 - p) / (d * E**2)

To be more concrete, we can maximize p * (1 - p) to be the maximum, which is when p = 0.5
n = 0.5 * (0.5) / (d * E**2)
n = 1 / (4 * d * E**2)

we can set E = 0.1 and d = 0.1
n = 1 / (4 * 0.1 * 0.1**2)
n = 250 elements to sample

For large n, 250 elements isn't much. You can adjust E and d depending how accurate you need to be.
Keep in mind, the cutoff to determining if you should use Solution 2 or 3 is at p = 0.5. So your E should be small enough to distinguish the cutoff, but large enough so n isn't too large. Set d depending on how accurate your estimate should be.
If n gets too large, you can also set a 10% array size limit to the sample size n.
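Putting the sampling formula into code, a rough sketch (the name `estimate_p` and its defaults are illustrative assumptions; it also assumes `nums` is non-empty):

```python
import random

# Estimate p by sampling, with the Chebyshev-derived sample size
# n = 1 / (4 * d * E**2) (worst case p * (1 - p) = 0.25) and the
# optional 10% cap on sample size mentioned above.
def estimate_p(nums, E=0.1, d=0.1):
    sample_size = int(1 / (4 * d * E**2))   # 250 for E = d = 0.1
    sample_size = min(sample_size, max(1, len(nums) // 10), len(nums))
    indices = random.sample(range(len(nums)), sample_size)
    return sum(1 for i in indices if nums[i] != 0) / sample_size

# Usage: pick Solution 3 when estimate_p(nums) <= 0.5, Solution 2 otherwise.
```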
dosentmatter revised this gist
Dec 2, 2019. 1 changed file with 23 additions and 9 deletions.
@@ -172,25 +172,39 @@ We have to prove P(x + 1) is true.
P(x + 1) means
E[k_x+1 elements] < E[C_x+1 elements]
(p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) + (p - p**x) <= (1 - p) * (x + 1)
(p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) + (p - p**x) <= (1 - p) * x + (1 - p)

The first chunk of each side is just P(x), I will put them in square brackets:
[ (p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) ] + (p - p**x) <= [ (1 - p) * x ] + (1 - p)

The part in the square brackets
[ (p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) ] <= [ (1 - p) * x ]
(p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) <= (1 - p) * x
E[k_x elements] <= E[C_x elements]
is just P(x), which we have assumed to be true.

How can we use P(x) to prove P(x + 1)? We can use this fact:
If a <= b is true,
a + x <= b + y is also true if x <= y

In our case,
a = E[k_x elements] = (p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) = left term inside square brackets
b = E[C_x elements] = (1 - p) * x = right term inside square brackets
x = (p - p**x) = left term outside square brackets
y = (1 - p) = right term outside square brackets

This leaves us with having to prove the following in order to prove P(x + 1). Note that this is just the expected value of the next term added when adding one more element to the array.
(p - p**x) <= (1 - p)
p - p**x <= 1 - p
2*p <= 1 + p**x
Since 0 <= p <= 0.5, the greatest 2*p can be is 1, and 1 + p**x is always greater than 1.
This means that the original x <= y inequality is true and P(x + 1) is true.

Thus, P(n) is true for n being any integer >= 0. In other words:
dosentmatter revised this gist
Nov 27, 2019. 1 changed file with 3 additions and 1 deletion.
@@ -264,4 +264,6 @@ It's not worth determining C or k because that would take O(n).
You can determine the approximation of p by sampling a few values and averaging by how many non-zeros you see.

How many values should you sample? That is a question for another day, but I think there is a theorem on how many samples you need for a desired margin of error. It might be the Central Limit Theorem. Not sure yet.
dosentmatter revised this gist
Nov 27, 2019. 1 changed file with 1 addition and 1 deletion.
@@ -85,7 +85,7 @@ be worth it to do an O(n) count, so let's do some probability.

So now let's do some probability and calculate what we expect k or C to be given the following input:
1. We generate an array of length n.
2. Each element is generated independently with a probability p to be a non-zero, else zero.

What is the expected value of C and k? That is E[C] and E[k]?
If E[k] <= E[C], probability would suggest Solution 3 would be better
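For reference, this generation model is easy to express in code; a tiny Python sketch (using 1 as a stand-in non-zero value):

```python
import random

# The input model: a length-n array where each element is
# independently non-zero with probability p.
def generate(n, p):
    return [1 if random.random() < p else 0 for _ in range(n)]
```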
dosentmatter created this gist
Nov 27, 2019.
@@ -0,0 +1,267 @@
I don't think Solution 2 is always better than Solution 3

First of all, it doesn't matter unless n is large, since both are O(n).

For both Solutions 2 and 3, you can skip operations when there are non-zeros at the beginning of the array:
For Solution 2, you can skip moving when `lastNonZeroFoundAt == i`
For Solution 3, you can skip swapping when `lastNonZeroFoundAt == cur`

For Solution 2, even if you do a memset, that is an O(n) because it has to write each element to 0.
For Solution 3, we can say that swap is a 3 cost operation including the temporary variable.

---

Let's calculate the cost for the general case:

Assumptions:
1. Each loop iteration costs 1
2. Reading is 0 cost
3. Comparisons are 0 cost
4. Incrementing `lastNonZeroFoundAt` is 0 cost since both solutions increment it anyway

Let n be the number of elements in the array
Let C be the number of zeros.
Let k be the number of non-zeros in the incorrect location.

To explain k:
[1, 2, 3] - k = 0 because no zeros
[0, 1, 2, 3] - k = 3 because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
[0, 0, 1, 2, 3] - k = 3, because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
[0, 1, 2, 0, 3] - k = 3, because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively

Notice that another way to think about k is "the number of non-zeros with at least one zero before it". This is because if there is at least one zero before a non-zero, it is out of place. More than one zero before a non-zero does not change anything. That will only put more zeros at the end. This knowledge will come in handy later.

---

Cost of Solution 2 for the general case:
For the first loop, we have:
Cost(first loop) = Cost(first loop iterations) + Cost(first loop moves)
since k of the elements need to be moved:
Cost(first loop) = Cost(n iterations) + Cost(k moves)
Cost(first loop) = n + k

For the second loop (memset), we have:
Cost(second loop) = Cost(second loop iterations) + Cost(second loop writes)
Cost(second loop) = Cost(C iterations) + Cost(C writes)
Cost(second loop) = C + C
Cost(second loop) = 2*C

So in total,
Cost(Solution 2) = Cost(first loop) + Cost(second loop)
Cost(Solution 2) = n + k + 2*C

Cost of Solution 3 for the general case:
Cost(Solution 3) = Cost(loop iterations) + Cost(loop swaps)
Cost(Solution 3) = Cost(n iterations) + Cost(k swaps)
Since a swap costs 3,
Cost(Solution 3) = n + 3*k

---

So let's see when Solution 3 is better than or equal to Solution 2:
Cost(Solution 2) >= Cost(Solution 3)
n + k + 2*C >= n + 3*k
2*C >= 2*k
C >= k
k <= C

So this means that Cost(Solution 3) <= Cost(Solution 2) when k <= C.
k <= C means the number of non-zeros in the incorrect location is <= the number of zeros.

It would take O(n) to determine k or C. For C, you do an O(n) loop to count the number of zeros. For k, you do an O(n) loop to find the first zero and then count the number of non-zeros found afterwards.

Even though you know the optimal Solution to use after counting, it might not be worth it to do an O(n) count, so let's do some probability.

---

So now let's do some probability and calculate what we expect k or C to be given the following input:
1. We generate an array of length n.
2. Each element is generated independently with a probability P to be a non-zero, else zero.

What is the expected value of C and k? That is E[C] and E[k]?
If E[k] <= E[C], probability would suggest Solution 3 would be better

---

E[C]:
Let C_i be a random variable on whether the element at index i is a zero.
Because of Linearity of Expectation,
E[C] = E[C_0] + E[C_1] + ... E[C_n-1]

There is a (1 - p) chance to generate a zero.
Since each element can contribute at most 1 zero with probability (1 - p),
E[C_i] = (1 - p) * 1

E[C] = (1 - p) * 1 + (1 - p) * 1 + ... } n times
E[C] = (1 - p) * n

---

E[k]:
This one is a little trickier.
Again, we can use Linearity of Expectation.
Let k_i be a random variable on whether the element at index i is a non-zero in the incorrect location. Mentioned above, I said another way to think about this is, "a non-zero with at least one zero before it"

E[k] = E[k_0] + E[k_1] + ... E[k_n-1]

There is a p chance to generate a non-zero.
Each element can contribute at most 1 incorrect non-zero, but what is the probability of an incorrect non-zero?
The probability of an incorrect non-zero is the probability of generating a non-zero with at least one zero before it.

Let P mean "the probability of"
P(having at least one zero) = (1 - P(having all non-zeros))
P(incorrect non-zero at index i) = P(non-zero at index i with at least one zero before it)
Since the following two are independent events, they can be factored:
= P(non-zero at index i) * P(having at least one zero before index i)
Since i is indexed starting at 0, we don't need to do (i - 1):
= p * (1 - p**i)

Since each element can contribute at most 1 incorrect non-zero,
E[k_i] = (p * (1 - p**i)) * 1
E[k_i] = p * (1 - p**i)
E[k_i] = p - p**(i + 1)

E[k] = E[k_0] + E[k_1] + ... E[k_n-1]
E[k] = (p - p**(0 + 1)) + (p - p**(1+1)) + ... (p - p**(n - 1))
E[k] = (p - p**1) + (p - p**2) + ... (p - p**(n - 1))

---

I took some time to analyze this with graphs and stuff, so I will just prove some assumptions:
p is a probability so it is in the range [0, 1]

Proposition: "If 0 <= p <= 0.5, then E[k] <= E[C]"

Let's prove by induction on n:

Proof:
Let P(n), where n is the number of elements in the array, be the statement that E[k] <= E[C]

Base Case:
Let's special case when n=0 since the formula for E[k] above doesn't work for n=0
When n=0,
E[C] = (1 - p) * 0 = 0, or you can think of it as 0 because there are no zeros in an empty array
E[k] = 0, because there are no incorrect non-zeros in an empty array
So for n=0, E[k] <= E[C]

Let's start induction from n=1,
When n=1,
E[C] = (1 - p) * 1 = 1 - p
E[k] = (p - p**0) = 0
So for n=1, E[k] <= E[C]

Inductive step:
Assume that P(x) is true for some arbitrary x >= 1
We have to prove P(x + 1) is true.
P(x + 1) means
E[k_x+1 elements] < E[C_x+1 elements]
(p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) + (p - p**x) < (1 - p) * (x + 1)
(p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) + (p - p**x) < (1 - p) * x + (1 - p)

The first chunk of each side is just P(x), I will put them in square brackets:
[ (p - p**1) + (p - p**2) + ... + (p - p**(x - 1)) ] + (p - p**x) < [ (1 - p) * x ] + (1 - p)

Since we are assuming P(x) is true, the terms in the brackets can be removed from both sides without affecting the inequality, since the term on the left side of the inequality is less than the term on the right side according to P(x)

This leaves us with having to prove the following in order to prove P(x + 1). Note that this is just the expected value of the next term added when adding one more element to the array.
(p - p**x) < (1 - p)
p - p**x < 1 - p
2*p < 1 + p**x
Since 0 <= p <= 0.5, the greatest 2*p can be is 1, and 1 + p**x is always greater than 1.

Thus, P(n) is true for n being any integer >= 0. In other words:
"If 0 <= p <= 0.5 and n is an integer >= 0, then E[k] <= E[C]"

This means that for 0 <= p <= 0.5, Solution 3 is always better.

---

How about 0.5 < p <= 1?

Assumptions:
1. n is very large. This is an okay assumption since we only care about optimizing Solution 2/3 when n is large.

Proposition: "If 0.5 < p <= 1, then E[k] >= E[C]"

We will special case p=1. When p=1, the array is filled with non-zeros. No work is done and E[k] = E[C], but E[k] >= E[C] still holds true.

For this proposition, I will not do a formal proof. I will only analyze E[k] and E[C] and show that E[k] grows faster than E[C] for large n.

Let's start with E[C]
E[C] = (1 - p) * n
Since 0.5 < p < 1, we know that 0 < (1 - p) < 0.5
As n grows large, we keep adding more (1 - p) < 0.5 terms.

How about E[k]?
E[k] = (p - p**1) + (p - p**2) + ... (p - p**(n - 1))
For 0.5 < p < 1,
For earlier terms such as (p - p**1) or (p - p**2), the terms will be pretty small, because the exponent is small, so the subtraction is large. However for later terms, and when n is large, (p - p**i) can be approximated as just (p), since p**i ~= 0, where i is large.
When n is large, there will be many terms that will be approximated as just p.
As n grows large, we keep adding more approximate 0.5 < p < 1 terms.

Compare the statements above about E[C] and E[k].
For large n, E[C] keeps adding 0 < (1 - p) < 0.5 terms while E[k] keeps adding 0.5 < p < 1 terms.
The added E[k] terms are larger than the E[C] terms.
This means the proposition holds true for large n, since E[k] grows faster than E[C] as you increase n.

Note that for low n, it's possible for the opposite to be true. E[C] might be greater. You can graph it out on a calculator to see visually. But for low n, optimization is unnecessary.

This means for 0.5 < p <= 1, Solution 2 is always better.

---

So what have we learned?
Use Solution 3 for 0 <= p <= 0.5
Use Solution 2 for 0.5 < p <= 1 (for large n)

---

How do I apply this to the general case? In the general case, there are a lot of unknowns. If there are some knowns, you can use the above findings to determine the optimal algorithm.

For the general case, you are given a large n-size array. You don't know the p value used to generate the array. You don't know the value of C or k.

It's not worth determining C or k because that would take O(n).
You can determine the approximation of p by sampling a few values and averaging by how many non-zeros you see.

How many values should you sample? That is a question for another day.
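As a final sanity check, not part of the original argument, here is a self-contained Python simulation of the p = 0.5 cutoff using the cost formulas derived above:

```python
import random

# Generate random arrays, count C and k directly, and compare
# Cost(Solution 2) = n + k + 2*C against Cost(Solution 3) = n + 3*k.
def costs(nums):
    n = len(nums)
    C = sum(1 for v in nums if v == 0)
    first_zero = next((i for i, v in enumerate(nums) if v == 0), n)
    k = sum(1 for v in nums[first_zero:] if v != 0)  # non-zeros with a zero before them
    return n + k + 2 * C, n + 3 * k

random.seed(0)
n = 10_000
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    nums = [1 if random.random() < p else 0 for _ in range(n)]
    sol2, sol3 = costs(nums)
    winner = "Solution 3" if sol3 <= sol2 else "Solution 2"
    print(f"p={p}: cost2={sol2}, cost3={sol3} -> {winner}")
# Expect Solution 3 to win for p <= 0.5 and Solution 2 for p > 0.5.
```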