Skip to content

Instantly share code, notes, and snippets.

@johnelm
Last active October 4, 2023 19:18
Show Gist options
  • Select an option

  • Save johnelm/216d5eebf38c0f4ff1b54d27a31fcf2d to your computer and use it in GitHub Desktop.

Select an option

Save johnelm/216d5eebf38c0f4ff1b54d27a31fcf2d to your computer and use it in GitHub Desktop.
IdenticalPairs Coding Challenge

Identical Pairs ( algorithm challenge )

This gist is my solution for an online algorithm challenge I did recently.

The challenge:

Given an input array of N elements, find the number of identical pairs (R).

A pair (P, Q) is identical (P = Q) if 0 <= P < Q < N <= 100,000.  
    ( The first of a pair's indices must be lower than the second's, and not vice versa. )

-1,000,000,000 <= (each element) <= 1,000,000,000
0 <= R <= 1,000,000,000 ( if the number of pairs is > 1,000,000,000 return 1,000,000,000 )

Allowable time complexity was O(nlog(n), and space was O(n).

When I submitted my solution during the challenge, I was a little disappointed that I didn't score 100% on this one, and I'm pretty sure the reason was correctness - I'd forgotten to account for the ceiling of 1B for the return value.

Also, when reviewing the challenge afterward, I realized that the number of pairs for each distinct value in the array increases as the Triangular formula ( x * ( x + 1 ) / 2 ), with which I was somewhat familiar from some earlier practice. However, I was getting the counts via a count-distinct summary, so a value didn't show up in the counts unless there was at least one of them. This meant I had to subtract 1 from the counts to use the triangular formula.

My final solution

This gist includes my solution (IdenticalPairs.java), and my Spock tests (IdenticalPairsSpecification.groovy). The tests use my 'TweakSequence' Spock extension, which I wrote for use with algorithm challenges like this.

My solution features and highlights:

  • I've extracted all the formulae and algorithms I used into separate lambda functions constants
  • then used a switch statement to demonstrate various solution approaches, combining the lambda constants to show tradeoffs in readability, optimization, and use of language features.
  • and, handled the 1B ceiling!
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
class IdenticalPairs {
public enum Approach {
TRIANGULAR_FORMULA_STREAM,
TRIANGULAR_FORMULA_LOOP,
COUNTING_NESTED_LOOP,
ONE_LINE,
}
public static final int MAX_IDENTICAL_PAIRS = 1_000_000_000;
// result is a map of distinct values and the counts of their occurrences
// we actually only need the values, we can ignore the keys after we have the distinct counts
public static final Function< int[], Collection< Long > > COUNT_DISTINCT =
in -> IntStream.of( in )
.boxed().collect(
Collectors.groupingBy(
i -> i, Collectors.counting() ) ).values();
/*
*
*
*/
public static final IntUnaryOperator TRIANGULAR_FORMULA = x -> x * ( x + 1 ) / 2;
/*
* the triangular formula is based on zero-index sequences, i.e. 0=0, 1=0, 2=1, 3=3, 4=6, 5=10, 6=15...
* since we're getting the counts via a count-distinct, the occurrences of each distinct number will not show up
* in the counts unless there is at least 1. So we have to subtract 1 to align with the triangular formula.
* i.e. if there are zero of a number in the input, we won't have a count for it
*/
public static final IntUnaryOperator SUBTRACT_ONE = x -> x - 1;
/*
* combining the subtraction and the formula into one operation ( subtraction first ) that can be applied to a count
*/
public static final IntUnaryOperator IDENTICAL_PAIRS_FROM_OCCURRENCE_COUNT = TRIANGULAR_FORMULA.compose( SUBTRACT_ONE );
/*
* here's a non-formula version that uses a loop to calculate the pairs
*/
public static final IntUnaryOperator IDENTICAL_PAIRS_FROM_OCCURRENCE_COUNT_VIA_LOOP = distinctCount -> {
int pairs = 0;
int count = 2;
while ( count <= distinctCount ) {
pairs = pairs + count - 1;
count++;
}
return pairs;
};
public int solution( int[] A ) {
if ( A.length < 2 ) // can't have a pair from 0 or 1 elements;
return 0;
Approach approach = Approach.TRIANGULAR_FORMULA_LOOP;
switch ( approach ) {
case ONE_LINE: {
// explicit, not using any of the Function constants
return Math.min( MAX_IDENTICAL_PAIRS,
IntStream.of( A ).boxed()
.collect( Collectors.groupingBy( i -> i, Collectors.counting() ) ).values().stream()
.mapToInt( Long::intValue )
.map( x -> x - 1 )
.map( x -> x * ( x + 1 ) / 2 )
.sum()
);
}
case TRIANGULAR_FORMULA_STREAM: {
// still explicit but using Function constants for readability and clarity
return Math.min( MAX_IDENTICAL_PAIRS,
COUNT_DISTINCT.apply( A ).stream()
.mapToInt( Long::intValue )
.map( IDENTICAL_PAIRS_FROM_OCCURRENCE_COUNT )
.sum()
);
}
case TRIANGULAR_FORMULA_LOOP: {
// breaking things up a bit.. getting the distinctCounts first and looping over them
// can optimize a bit here, if the ceiling is reached before we've handled all the counts,
// we can exit early
int retval = 0;
Collection< Long > distinctCounts = COUNT_DISTINCT.apply( A );
for ( long distinctCount : distinctCounts ) {
if ( retval > MAX_IDENTICAL_PAIRS )
return MAX_IDENTICAL_PAIRS; // exit early if the ceiling is reached
int countForKey = Math.toIntExact( distinctCount );
retval += IDENTICAL_PAIRS_FROM_OCCURRENCE_COUNT.applyAsInt( countForKey );
}
return Math.min( MAX_IDENTICAL_PAIRS, retval );
}
case COUNTING_NESTED_LOOP: {
// breaking things up more.. accumulating the distinct counts with a loop instead of a stream
// lets us check whether we've reached the ceiling earlier.
// it's cheap to do so if we're using the triangular formula (i.e. not the loop version);
int retval = 0;
Map< Integer, Integer > distinctCounts = new HashMap<>();
for ( int i = 0; i < A.length; i++ ) {
int count = distinctCounts.getOrDefault( A[ i ], 0 );
distinctCounts.put( A[ i ], ++count );
if ( IDENTICAL_PAIRS_FROM_OCCURRENCE_COUNT.applyAsInt( retval + count ) > MAX_IDENTICAL_PAIRS )
return MAX_IDENTICAL_PAIRS; // exit early if the ceiling is reached
}
for ( int distinctCount : distinctCounts.values() ) {
if ( retval > MAX_IDENTICAL_PAIRS )
return MAX_IDENTICAL_PAIRS;
retval += IDENTICAL_PAIRS_FROM_OCCURRENCE_COUNT.applyAsInt( distinctCount );
}
return Math.min( MAX_IDENTICAL_PAIRS, retval );
}
default: {
return 0;
}
}
}
}
import spock.lang.*
class IdenticalPairsSpecification extends Specification {
def solution = new IdenticalPairs()
@Unroll
@TweakSequence( sequenceParameterName = "A" )
def "#iterationCount: result given input #A is #x"() {
expect:
solution.solution( A ) == x
where:
x | A
0 | [ ]
0 | [ 1 ]
0 | [ 3 ]
0 | [ 1, 2 ]
0 | [ 1, 2 ]
0 | [ 1, 3 ]
0 | [ 2, 3 ]
1 | [ 1, 1 ]
0 | [ 1, 2, 3 ]
1 | [ 1, 2, 1 ]
3 | [ 2, 2, 2 ]
4 | [ 2, 2, 2, 1, 1 ]
4 | [ 3, 5, 6, 3, 3, 5 ]
11 | [ 3, 5, 6, 3, 3, 5, 3, 3 ]
0 | [ 2, 3, 1, 5 ]
0 | [ 2, 3, 4, 1 ]
0 | [ -1_000_000_000, 3, 4, 1, 1_000_000_000 ]
}
@Unroll
@TweakSequence
def "#iterationCount: result is the expected #result"() {
expect:
solution.solution( sequence ) == result
where:
result | sequence | indexReplacements
10 | [ 1 ] * 5 | [ : ]
9 | [ 1, 2, 3 ] * 3 | [ : ]
18 | [ 1, 2, 3 ] * 4 | [ : ]
15 | [ 1, 2, 3, 4, 5 ] * 3 | [ : ]
50 | [ 1, 2, 3, 4, 5 ] * 5 | [ : ]
49995000 | [ 1 ] * 10_000 | [ : ]
1_000_000_000 | [ range: [ start: 0, end: 20, repeat: 10_000 ] ] | [ : ]
1_000_000_000 | [ range: [ start: -10, end: 10, repeat: 10_000 ] ] | [ : ]
1_000_000_000 | [ range: [ start: -10, end: 10, repeat: 9_999 ] ] | [ : ]
0 | [ range: [ start: 50_000, end: -49_999 ] ] | [ : ]
1 | [ range: [ start: 50_000, end: -49_999 ] ] | [ 0: 0 ]
5 | [ range: [ start: 50_000, end: -49_999 ] ] | [ 0: 0, 100: 1, 200: 2, 30_000: 3, 70_000: 4 ]
21 | [ range: [ start: 50_000, end: -49_999 ] ] | [ 0: 0, 100: 0, 200: 0, 30_000: 0, 70_000: 0, 90_000: 0 ]
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment