Skip to content

Instantly share code, notes, and snippets.

@clinejj
Last active January 4, 2016 04:19
Show Gist options
  • Select an option

  • Save clinejj/8568197 to your computer and use it in GitHub Desktop.

Select an option

Save clinejj/8568197 to your computer and use it in GitHub Desktop.
An Algorithm to Blend Lists of Lists
### Description
These functions will take a list of lists of sorted objects and
blends them together based on input target percentages and a target length,
where each element from a given list appears in the final list sorted relatively
to other elements from that list (and backfill from other lists if necessary).
For example, let's say you have sorted lists A, B, and C each of size 10. You
want to blend them into a single list of size 20 with 50% from A, 30% from B,
and 10% from C. In the final list, you would expect 10 elements from A, 6 from
B, and 4 from C.
This algorithm assumes that a) we want to get as close to the target length as
possible, and b) to do that we may need to "backfill" from other lists if a
given list can't meet its target percentage. This may mean that the input blend
percentages will not match up to the output if a given list doesn't contain
enough elements to meet its quota.
### Algorithm Overview
The basic blend function requires 3 inputs:
- A list of sorted lists (each list can be any length)
- A list of "blend percentages", where the nth element in the list is the
goal percentage of the blended list to contain elements from the nth list
in the list of lists
- The target length of the blended lists
From this function, the first step is to calculate the total size of our input
lists. If all the elements across our list of lists is less than the target
length, we can simply blend the lists according to our preferred blending method
without having to do any extra work.*
* This ends up ignoring the blend percentages since our algorithm calls to get
* as close to the target length as possible
When calculating the input sizes, we create a basic matrix with each list, the
size of that list, the target number of elements from that list to include in
the blended list, the difference between the size and the target, and then a
"use" space that tells us later how many elements we need to use from that list.
The matrix looks like:
ID: 0 1 2 3 4
SIZE: 5 5 5 5 5
GOAL: 9 7 2 2 1
DIFF: -4 -2 3 3 4
USE: -4 -2 3 3 4
After creating the matrix, we sort it based on the "use" field, from smallest to
largest. If this value is negative, it means that a given list is not long
enough to meet its target percentage and those slots need to be reallocated to
the other lists. We reallocate slots to the remaining lists based on their
relative blend percentages, keeping the ratio of reallocated slots in parity
with the ratio of the goals.
To do so, we iterate through the columns in the matrix, until we reach the first
column that has a positive (greater than 0) "use" value. We assign this column a
number of slots in proportion to its blend ratio with the remaining columns that
have a positive use value. This is reflected by updating it's "use" value
(decreasing it by the number of slots it was assigned) and increasing the first
column's "use" value by the same amount. We continue iterating through the rest
of the columns and reallocate as we go.
After we iterate through all the columns once, the first column will be
reallocated. We re-sort the matrix again in ascending order of "use" value, and
proceed with the reallocation steps again. This sort and reallocate process is
repeated until all columns in our matrix have 0 or positive use values and the
size of our elements in use is equal to the original target blended list length.
The size of our elements in use can be calculated by substracting the absolute
value of the "use" field from the size of each column, and summing these values
for each column.
Once our size matrix has been completed, we can move to the blending step. Our
size matrix now tells us how many elements to use from each list (the same value
used to sum the total - the size of the list minus the absolute value of the
"use" field). We can use various blending methods, from a round robin (popping
the 0th element from each list), to a random assignment (pick a random list, add
the 0th element, repeat until the 0th element has been added for each list, then
move to the 1st element, etc.), where we add each element from our lists in
order into our blended list until the size limit for the list has been reached
and the target blended list length is reached. The blending algorithm may vary
by need, so I will not go into more detail on this portion.
After adding the elements from our lists of lists as dicated by our size matrix,
we will end up with a singular blended list no longer than our target length
(possibly less if the input lists were not big enough).
See the example file for a walkthrough. I have created a Java implementation of
this algorithm, using a round-robin blending method, available for use at:
https://github.com/clinejj/ListBlender
Let's work through an example. Here are our inputs:
LISTS
0 1 2 3 4
0-0 1-12 2-24 3-36 4-48
0-1 1-13 2-25 3-37 4-49
0-2 1-14 2-26 3-38 4-50
0-3 1-15 2-27 3-39 4-51
0-4 1-16 2-28 3-40 4-52
PERCENTAGES
44
33
11
8
4
LENGTH
20
Our first step is to create our matrix:
ID: 0 1 2 3 4
SIZE: 5 5 5 5 5
GOAL: 9 7 2 2 1
DIFF: -4 -2 3 3 4
USE: -4 -2 3 3 4
In this case, our inputs are already sorted so we can skip that step and move
directly into the reallocation process. We need to reallocate 4 slots from
column 0. The first column with a postive "use" value is column 2. It's ratio
is 2/5 (the sum of the goals of the columns with a positive "use" value is 5),
so with integer math 2/5 * 4 = 2. We reallocate two slots from column 0 to
column 2, like so:
ID: 0 1 2 3 4
SIZE: 5 5 5 5 5
GOAL: 9 7 2 2 1
DIFF: -4 -2 3 3 4
USE: -2 -2 1 3 4
The next column with a positive value is column 3, which gets the same
allocation of 2 slots since it has the same ratio.
ID: 0 1 2 3 4
SIZE: 5 5 5 5 5
GOAL: 9 7 2 2 1
DIFF: -4 -2 3 3 4
USE: 0 -2 1 1 4
There are no more reallocations needed. We re-calculate our current size,
which shows we are so we re-sort the matrix in ascending order based on the
"use" value:
ID: 1 0 2 3 4
SIZE: 5 5 5 5 5
GOAL: 7 9 2 2 1
DIFF: -2 -4 3 3 4
USE: -2 0 1 1 4
And start the reallocation process again. We need to reallocate 2 slots from
column 1. The first column with a positive value is column 2. We reallocate 1
slot to column 2 (2/5 * 2), like so:
ID: 1 0 2 3 4
SIZE: 5 5 5 5 5
GOAL: 7 9 2 2 1
DIFF: -2 -4 3 3 4
USE: -1 0 0 1 4
And again, column 3 gets 1 slot as well:
ID: 1 0 2 3 4
SIZE: 5 5 5 5 5
GOAL: 7 9 2 2 1
DIFF: -2 -4 3 3 4
USE: 0 0 0 0 4
There are no more reallocations, so we conclude with that step. You'll note
that at this point there are 21 items to add to the list - this extra item
is taken care of in the blending step, as you will see. We'll resort the
matrix, this time based on ID:
ID: 0 1 2 3 4
SIZE: 5 5 5 5 5
GOAL: 9 7 2 2 1
DIFF: -4 -2 3 3 4
USE: 0 0 0 0 4
Now that we have our sizes and we know how much we need to take from each
list, we begin the blending process. This example will use a round-robin
algorithm, where on each pass it will take the first element from each list
(and then remove that element), until either the list size is met (in which
case it will skip the list) or the target blended list size is reached. For
reference here are our lists again:
LISTS BLENDED LIST:
0 1 2 3 4
0-0 1-12 2-24 3-36 4-48
0-1 1-13 2-25 3-37 4-49
0-2 1-14 2-26 3-38 4-50
0-3 1-15 2-27 3-39 4-51
0-4 1-16 2-28 3-40 4-52
In this example, we're taking the entirety of lists 0, 1, 2, and 3 to blend,
and 1 element from list 4. After the first pass, our lists look like this:
LISTS BLENDED LIST:
0 1 2 3 4 0-0
0-1 1-13 2-25 3-37 4-49 1-12
0-2 1-14 2-26 3-38 4-50 2-24
0-3 1-15 2-27 3-39 4-51 3-36
0-4 1-16 2-28 3-40 4-52 4-48
After the second pass:
LISTS BLENDED LIST:
0 1 2 3 4 0-0
0-2 1-14 2-26 3-38 4-49 1-12
0-3 1-15 2-27 3-39 4-50 2-24
0-4 1-16 2-28 3-40 4-51 3-36
4-52 4-48
0-1
1-13
2-25
3-37
After the next pass:
LISTS BLENDED LIST:
0 1 2 3 4 0-0
0-3 1-15 2-27 3-39 4-49 1-12
0-4 1-16 2-28 3-40 4-50 2-24
4-51 3-36
4-52 4-48
0-1
1-13
2-25
3-37
0-2
1-14
2-26
3-38
This continues until our final list, which looks like this:
BLENDED LIST:
0-0
1-12
2-24
3-36
4-48
0-1
1-13
2-25
3-37
0-2
1-14
2-26
3-38
0-3
1-15
2-27
3-39
0-4
1-16
2-28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment