clinejj · January 4, 2016 04:19
diff --git a/Blending Algorithm b/Blending Algorithm
 ### Description 
 These functions will take a list of lists of sorted objects and
 blends them together based on input target percentages and a target length,
 where each element from a given list appears in the final list sorted relatively
 to other elements from that list (and backfill from other lists if necessary).

 For example, let's say you have sorted lists A, B, and C each of size 10. You
 want to blend them into a single list of size 20 with 50% from A, 30% from B,
 and 10% from C. In the final list, you would expect 10 elements from A, 6 from
 B, and 4 from C.

 This algorithm assumes that a) we want to get as close to the target length as
 possible, and b) to do that we may need to "backfill" from other lists if a
 given list can't meet its target percentage. This may mean that the input blend
 percentages will not match up to the output if a given list doesn't contain
 enough elements to meet its quota.

 ### Algorithm Overview 
 The basic blend function requires 3 inputs:
 - A list of sorted lists (each list can be any length)
 - A list of "blend percentages", where the nth element in the list is the 
  goal percentage of the blended list to contain elements from the nth list 
  in the list of lists
 - The target length of the blended lists
  
 From this function, the first step is to calculate the total size of our input
 lists. If all the elements across our list of lists is less than the target
 length, we can simply blend the lists according to our preferred blending method
 without having to do any extra work.*

 * This ends up ignoring the blend percentages since our algorithm calls to get
 * as close to the target length as possible

 When calculating the input sizes, we create a basic matrix with each list, the
 size of that list, the target number of elements from that list to include in
 the blended list, the difference between the size and the target, and then a
 "use" space that tells us later how many elements we need to use from that list.
 The matrix looks like:

 ID:	0	1	2	3	4 
 SIZE: 	5	5	5	5	5 
 GOAL: 	9	7	2	2	1 
 DIFF: 	-4	-2	3	3	4 
 USE:  	-4	-2	3	3	4

 After creating the matrix, we sort it based on the "use" field, from smallest to
 largest. If this value is negative, it means that a given list is not long
 enough to meet its target percentage and those slots need to be reallocated to
 the other lists. We reallocate slots to the remaining lists based on their
 relative blend percentages, keeping the ratio of reallocated slots in parity
 with the ratio of the goals.

 To do so, we iterate through the columns in the matrix, until we reach the first
 column that has a positive (greater than 0) "use" value. We assign this column a
 number of slots in proportion to its blend ratio with the remaining columns that
 have a positive use value. This is reflected by updating it's "use" value
 (decreasing it by the number of slots it was assigned) and increasing the first
 column's "use" value by the same amount. We continue iterating through the rest
 of the columns and reallocate as we go.

 After we iterate through all the columns once, the first column will be
 reallocated. We re-sort the matrix again in ascending order of "use" value, and
 proceed with the reallocation steps again. This sort and reallocate process is
 repeated until all columns in our matrix have 0 or positive use values and the
 size of our elements in use is equal to the original target blended list length.
 The size of our elements in use can be calculated by substracting the absolute
 value of the "use" field from the size of each column, and summing these values
 for each column.

 Once our size matrix has been completed, we can move to the blending step. Our
 size matrix now tells us how many elements to use from each list (the same value
 used to sum the total - the size of the list minus the absolute value of the
 "use" field). We can use various blending methods, from a round robin (popping
 the 0th element from each list), to a random assignment (pick a random list, add
 the 0th element, repeat until the 0th element has been added for each list, then
 move to the 1st element, etc.), where we add each element from our lists in
 order into our blended list until the size limit for the list has been reached
 and the target blended list length is reached. The blending algorithm may vary
 by need, so I will not go into more detail on this portion.

 After adding the elements from our lists of lists as dicated by our size matrix,
 we will end up with a singular blended list no longer than our target length
 (possibly less if the input lists were not big enough).

 See the example file for a walkthrough. I have created a Java implementation of
 this algorithm, using a round-robin blending method, available for use at:
 https://github.com/clinejj/ListBlender
diff --git a/Example b/Example
 Let's work through an example. Here are our inputs:

 LISTS
 0     1     2     3     4
 0-0   1-12  2-24  3-36  4-48
 0-1   1-13  2-25  3-37  4-49
 0-2   1-14  2-26  3-38  4-50
 0-3   1-15  2-27  3-39  4-51
 0-4   1-16  2-28  3-40  4-52

 PERCENTAGES
 44
 33
 11
 8
 4

 LENGTH
 20

 Our first step is to create our matrix:
 ID:	0	1	2	3	4	
 SIZE:	5	5	5	5	5	
 GOAL:	9	7	2	2	1	
 DIFF:	-4	-2	3	3	4	
 USE:	-4	-2	3	3	4

 In this case, our inputs are already sorted so we can skip that step and move
 directly into the reallocation process. We need to reallocate 4 slots from 
 column 0. The first column with a postive "use" value is column 2. It's ratio 
 is 2/5 (the sum of the goals of the columns with a positive "use" value is 5), 
 so with integer math 2/5 * 4 = 2. We reallocate two slots from column 0 to 
 column 2, like so:
 ID:	0	1	2	3	4	
 SIZE:	5	5	5	5	5	
 GOAL:	9	7	2	2	1	
 DIFF:	-4	-2	3	3	4	
 USE:	-2	-2	1	3	4

 The next column with a positive value is column 3, which gets the same 
 allocation of 2 slots since it has the same ratio.
 ID:	0	1	2	3	4	
 SIZE:	5	5	5	5	5	
 GOAL:	9	7	2	2	1	
 DIFF:	-4	-2	3	3	4	
 USE:	0	-2	1	1	4

 There are no more reallocations needed. We re-calculate our current size, 
 which shows we are  so we re-sort the matrix in ascending order based on the 
 "use" value:
 ID:	1	0	2	3	4	
 SIZE:	5	5	5	5	5	
 GOAL:	7	9	2	2	1	
 DIFF:	-2	-4	3	3	4	
 USE:	-2	0	1	1	4

 And start the reallocation process again. We need to reallocate 2 slots from 
 column 1. The first column with a positive value is column 2. We reallocate 1 
 slot to column 2 (2/5 * 2), like so:
 ID:	1	0	2	3	4	
 SIZE:	5	5	5	5	5	
 GOAL:	7	9	2	2	1	
 DIFF:	-2	-4	3	3	4	
 USE:	-1	0	0	1	4

 And again, column 3 gets 1 slot as well:
 ID:	1	0	2	3	4	
 SIZE:	5	5	5	5	5	
 GOAL:	7	9	2	2	1	
 DIFF:	-2	-4	3	3	4	
 USE:	0	0	0	0	4

 There are no more reallocations, so we conclude with that step. You'll note 
 that at this point there are 21 items to add to the list - this extra item 
 is taken care of in the blending step, as you will see. We'll resort the 
 matrix, this time based on ID:
 ID:	0	1	2	3	4	
 SIZE:	5	5 	5	5	5	
 GOAL:	9	7	2	2	1	
 DIFF:	-4	-2	3	3	4	
 USE:	0	0	0	0	4

 Now that we have our sizes and we know how much we need to take from each 
 list, we begin the blending process. This example will use a round-robin 
 algorithm, where on each pass it will take the first element from each list 
 (and then remove that element), until either the list size is met (in which 
 case it will skip the list) or the target blended list size is reached. For 
 reference here are our lists again:
 LISTS                               BLENDED LIST:
 0     1     2     3     4
 0-0   1-12  2-24  3-36  4-48
 0-1   1-13  2-25  3-37  4-49
 0-2   1-14  2-26  3-38  4-50
 0-3   1-15  2-27  3-39  4-51
 0-4   1-16  2-28  3-40  4-52

 In this example, we're taking the entirety of lists 0, 1, 2, and 3 to blend, 
 and 1 element from list 4. After the first pass, our lists look like this:
 LISTS                               BLENDED LIST:
 0     1     2     3     4           0-0
 0-1   1-13  2-25  3-37  4-49        1-12
 0-2   1-14  2-26  3-38  4-50        2-24
 0-3   1-15  2-27  3-39  4-51        3-36
 0-4   1-16  2-28  3-40  4-52        4-48

 After the second pass:
 LISTS                               BLENDED LIST:
 0     1     2     3     4           0-0
 0-2   1-14  2-26  3-38  4-49        1-12
 0-3   1-15  2-27  3-39  4-50        2-24
 0-4   1-16  2-28  3-40  4-51        3-36
                        4-52        4-48
                                    0-1
                                    1-13
                                    2-25
                                    3-37

 After the next pass:
 LISTS                               BLENDED LIST:
 0     1     2     3     4           0-0
 0-3   1-15  2-27  3-39  4-49        1-12
 0-4   1-16  2-28  3-40  4-50        2-24
                        4-51        3-36
                        4-52        4-48
                                    0-1
                                    1-13
                                    2-25
                                    3-37
                                    0-2
                                    1-14
                                    2-26
                                    3-38
                                    
 This continues until our final list, which looks like this:
 BLENDED LIST:
 0-0
 1-12
 2-24
 3-36
 4-48
 0-1
 1-13
 2-25
 3-37
 0-2
 1-14
 2-26
 3-38
 0-3
 1-15
 2-27
 3-39
 0-4
 1-16
 2-28
	### Description
	These functions will take a list of lists of sorted objects and
	blends them together based on input target percentages and a target length,
	where each element from a given list appears in the final list sorted relatively
	to other elements from that list (and backfill from other lists if necessary).

	For example, let's say you have sorted lists A, B, and C each of size 10. You
	want to blend them into a single list of size 20 with 50% from A, 30% from B,
	and 10% from C. In the final list, you would expect 10 elements from A, 6 from
	B, and 4 from C.

	This algorithm assumes that a) we want to get as close to the target length as
	possible, and b) to do that we may need to "backfill" from other lists if a
	given list can't meet its target percentage. This may mean that the input blend
	percentages will not match up to the output if a given list doesn't contain
	enough elements to meet its quota.

	### Algorithm Overview
	The basic blend function requires 3 inputs:
	- A list of sorted lists (each list can be any length)
	- A list of "blend percentages", where the nth element in the list is the
	goal percentage of the blended list to contain elements from the nth list
	in the list of lists
	- The target length of the blended lists

	From this function, the first step is to calculate the total size of our input
	lists. If all the elements across our list of lists is less than the target
	length, we can simply blend the lists according to our preferred blending method
	without having to do any extra work.*

	* This ends up ignoring the blend percentages since our algorithm calls to get
	* as close to the target length as possible

	When calculating the input sizes, we create a basic matrix with each list, the
	size of that list, the target number of elements from that list to include in
	the blended list, the difference between the size and the target, and then a
	"use" space that tells us later how many elements we need to use from that list.
	The matrix looks like:

	ID: 0 1 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 9 7 2 2 1
	DIFF: -4 -2 3 3 4
	USE: -4 -2 3 3 4

	After creating the matrix, we sort it based on the "use" field, from smallest to
	largest. If this value is negative, it means that a given list is not long
	enough to meet its target percentage and those slots need to be reallocated to
	the other lists. We reallocate slots to the remaining lists based on their
	relative blend percentages, keeping the ratio of reallocated slots in parity
	with the ratio of the goals.

	To do so, we iterate through the columns in the matrix, until we reach the first
	column that has a positive (greater than 0) "use" value. We assign this column a
	number of slots in proportion to its blend ratio with the remaining columns that
	have a positive use value. This is reflected by updating it's "use" value
	(decreasing it by the number of slots it was assigned) and increasing the first
	column's "use" value by the same amount. We continue iterating through the rest
	of the columns and reallocate as we go.

	After we iterate through all the columns once, the first column will be
	reallocated. We re-sort the matrix again in ascending order of "use" value, and
	proceed with the reallocation steps again. This sort and reallocate process is
	repeated until all columns in our matrix have 0 or positive use values and the
	size of our elements in use is equal to the original target blended list length.
	The size of our elements in use can be calculated by substracting the absolute
	value of the "use" field from the size of each column, and summing these values
	for each column.

	Once our size matrix has been completed, we can move to the blending step. Our
	size matrix now tells us how many elements to use from each list (the same value
	used to sum the total - the size of the list minus the absolute value of the
	"use" field). We can use various blending methods, from a round robin (popping
	the 0th element from each list), to a random assignment (pick a random list, add
	the 0th element, repeat until the 0th element has been added for each list, then
	move to the 1st element, etc.), where we add each element from our lists in
	order into our blended list until the size limit for the list has been reached
	and the target blended list length is reached. The blending algorithm may vary
	by need, so I will not go into more detail on this portion.

	After adding the elements from our lists of lists as dicated by our size matrix,
	we will end up with a singular blended list no longer than our target length
	(possibly less if the input lists were not big enough).

	See the example file for a walkthrough. I have created a Java implementation of
	this algorithm, using a round-robin blending method, available for use at:
	https://github.com/clinejj/ListBlender
	Let's work through an example. Here are our inputs:

	LISTS
	0 1 2 3 4
	0-0 1-12 2-24 3-36 4-48
	0-1 1-13 2-25 3-37 4-49
	0-2 1-14 2-26 3-38 4-50
	0-3 1-15 2-27 3-39 4-51
	0-4 1-16 2-28 3-40 4-52

	PERCENTAGES
	44
	33
	11
	8
	4

	LENGTH
	20

	Our first step is to create our matrix:
	ID: 0 1 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 9 7 2 2 1
	DIFF: -4 -2 3 3 4
	USE: -4 -2 3 3 4

	In this case, our inputs are already sorted so we can skip that step and move
	directly into the reallocation process. We need to reallocate 4 slots from
	column 0. The first column with a postive "use" value is column 2. It's ratio
	is 2/5 (the sum of the goals of the columns with a positive "use" value is 5),
	so with integer math 2/5 * 4 = 2. We reallocate two slots from column 0 to
	column 2, like so:
	ID: 0 1 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 9 7 2 2 1
	DIFF: -4 -2 3 3 4
	USE: -2 -2 1 3 4

	The next column with a positive value is column 3, which gets the same
	allocation of 2 slots since it has the same ratio.
	ID: 0 1 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 9 7 2 2 1
	DIFF: -4 -2 3 3 4
	USE: 0 -2 1 1 4

	There are no more reallocations needed. We re-calculate our current size,
	which shows we are so we re-sort the matrix in ascending order based on the
	"use" value:
	ID: 1 0 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 7 9 2 2 1
	DIFF: -2 -4 3 3 4
	USE: -2 0 1 1 4

	And start the reallocation process again. We need to reallocate 2 slots from
	column 1. The first column with a positive value is column 2. We reallocate 1
	slot to column 2 (2/5 * 2), like so:
	ID: 1 0 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 7 9 2 2 1
	DIFF: -2 -4 3 3 4
	USE: -1 0 0 1 4

	And again, column 3 gets 1 slot as well:
	ID: 1 0 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 7 9 2 2 1
	DIFF: -2 -4 3 3 4
	USE: 0 0 0 0 4

	There are no more reallocations, so we conclude with that step. You'll note
	that at this point there are 21 items to add to the list - this extra item
	is taken care of in the blending step, as you will see. We'll resort the
	matrix, this time based on ID:
	ID: 0 1 2 3 4
	SIZE: 5 5 5 5 5
	GOAL: 9 7 2 2 1
	DIFF: -4 -2 3 3 4
	USE: 0 0 0 0 4

	Now that we have our sizes and we know how much we need to take from each
	list, we begin the blending process. This example will use a round-robin
	algorithm, where on each pass it will take the first element from each list
	(and then remove that element), until either the list size is met (in which
	case it will skip the list) or the target blended list size is reached. For
	reference here are our lists again:
	LISTS BLENDED LIST:
	0 1 2 3 4
	0-0 1-12 2-24 3-36 4-48
	0-1 1-13 2-25 3-37 4-49
	0-2 1-14 2-26 3-38 4-50
	0-3 1-15 2-27 3-39 4-51
	0-4 1-16 2-28 3-40 4-52

	In this example, we're taking the entirety of lists 0, 1, 2, and 3 to blend,
	and 1 element from list 4. After the first pass, our lists look like this:
	LISTS BLENDED LIST:
	0 1 2 3 4 0-0
	0-1 1-13 2-25 3-37 4-49 1-12
	0-2 1-14 2-26 3-38 4-50 2-24
	0-3 1-15 2-27 3-39 4-51 3-36
	0-4 1-16 2-28 3-40 4-52 4-48

	After the second pass:
	LISTS BLENDED LIST:
	0 1 2 3 4 0-0
	0-2 1-14 2-26 3-38 4-49 1-12
	0-3 1-15 2-27 3-39 4-50 2-24
	0-4 1-16 2-28 3-40 4-51 3-36
	4-52 4-48
	0-1
	1-13
	2-25
	3-37

	After the next pass:
	LISTS BLENDED LIST:
	0 1 2 3 4 0-0
	0-3 1-15 2-27 3-39 4-49 1-12
	0-4 1-16 2-28 3-40 4-50 2-24
	4-51 3-36
	4-52 4-48
	0-1
	1-13
	2-25
	3-37
	0-2
	1-14
	2-26
	3-38

	This continues until our final list, which looks like this:
	BLENDED LIST:
	0-0
	1-12
	2-24
	3-36
	4-48
	0-1
	1-13
	2-25
	3-37
	0-2
	1-14
	2-26
	3-38
	0-3
	1-15
	2-27
	3-39
	0-4
	1-16
	2-28