Counting pairs of numbers with a given difference out of a set

This is just a post detailing the solution to a small algorithmic challenge from HackerRank. The exercise is to find the pairs of numbers out of a set of numbers N that have difference K. The link to the actual HackerRank problem can be seen here. We will see some different implementations and strategies to approach this problem and how they could be optimized even further.

Counting pairs of numbers with a given difference out of a set of numbers

This is just a post detailing the solution to a small algorithmic challenge from HackerRank. The exercise is to find the pairs of numbers out of a set of numbers N that have difference K. The link to the actual HackerRank problem can be seen here. I am not sharing my solution because it’s unique or even good. There may be much better solutions out there and in fact if someone knows something better I encourage you to comment on it so that both me and the readers of this blog can learn something new!

All of the code mentioned in this tutorial can be cloned from Github

The actual text of the challenge is given below:

Given N numbers [N<=10^5], count the total pairs of numbers that have a difference of K. [K>0 and K<1e9]

Input Format:

1st line contains N & K (integers).
2nd line contains N numbers of the set. All the N numbers are assured to be distinct.

Output Format:

One integer saying the no of pairs of numbers that have a diff K.

Sample Input #00:

5 2
1 5 3 4 2

Sample Output #00:

3


Now with challenges like this, speed is what is important. We want our program to solve the problem for all possible input in the least amount of computational time. I will be using C to solve this problem. First of all we have to figure out a way to read in the input from the stdin as is described in the problem definition.

1
2
3
4
5
6
7
8
9
10
int N,K,*set,i=0,j;
char* buff,* token;
int result = 0;
 
 
//read 1st input line
scanf("%d %d \n",&N,&K);
//allocate space for the set numbers string and the actual set
buff = malloc(13*N);//13 here is for ULONG_MAX number of digits + space
set = malloc(sizeof(int)*N);

Above is the beginning of main() for the problem. N and K are the numbers given in the problem definition. set will be an array holding all of our numbers while buff will be a buffer holding the string of the second line from which to read the numbers into set. After reading both N and K with scanf() we allocate the buffer to hold the string. It may be a bit sloppy doing it like that but here we allocate 13 bytes for each number. This is an approximation judging from the typical maximum value of ULONG_MAX as shown here, plus the space separating each number. It should be safe since we are using ints and ULONG_MAX is for unsigned ints. We also allocate the array of numbers.

//read second line of input
gets(buff);
//find the individual numbers
token = strtok(buff," ");
while(token != NULL)
{
    sscanf(token,"%d",&set[i]);
    token = strtok(NULL," ");
    i++;
}

Further down in reading the input we can see from above that we read the second input line into the buffer. Then since we know from the problem definition that the numbers are space separated we can simply use libc's strtok() function to obtain each number of the set. This is what is accomplished by the while loop in there. It tokenizes the string located inside buff until all of the numbers are read into set.

Now since we have read all of the input we have to decide on the strategy with which to determine how many pairs of numbers have difference of K. The naive approach would be to simply go through the array NxN times and count the difference of all numbers increasing a counter for those which match K. Something like below

//naive approach
for(i=0;i<N;i++)
{
    for(j=0;j<N;j++)
    {
        if( abs(set[i] - set[j]) == K)
            result++;
    }
}

This approach is not very good and it will actually fail the test in the last 2 data sets in HackerRank. The complexity here is O(N2) and is not good. We can do better than that! For reference let me mention the processing times that HackerRank returned for this approach:

Status: Wrong Answer

# 0 0.06s : Success

# 1 0.08s : Success

# 2 0.14s : Success

# 3 0.23s : Success

# 4 0.36s : Success

# 5 0.54s : Success

# 6 0.74s : Success

# 7 0.98s : Success

# 8 1.26s : Success

# 9 1.58s : Success

# 10 1.93s : Success

# 11 2.32s : Success

# 12 2.74s : Success

# 13 3.22s : Time limit exceeded

# 14 3.22s : Time limit exceeded

So what could an alternate way to approach this problem be? Well let's see! We have an array of numbers and these numbers are also guaranteed to be positive. The relation between the numbers that we are trying to explore is their difference. What operation on the array could allow us to deal with the difference in a much better way? Yes you guessed it correctly! We can sort the array! By having a sorted array of numbers we can easily get the differences between numbers without wasting too much time.

Counting pairs of numbers with a given difference out of a set of numbers

//comparison function for qsort()
int compare(const void* a, const void* b)
{
    //a less than b
    if( *(int*)a < *(int*)b)
        return -1;
    //a greater than b
    return 1;
    //we don't take into account a == b since according to the problem definition it is impossible
}
//sort the array
qsort(set,N,sizeof(int),compare);

As we can see from the above code the C standard library includes an implementation of quicksort as long as you #include <stdlib.h>. Since C has no templates you need to implement a comparison function using the signature int compare(const void* a, const void* b) which will be returning -1 in case of a being less than b, 0 in case they are equal and 1 in case a is greater than b. But in our comparison implementation we omit one of those checks. The problem definition clearly states that the numbers in the array are unique and also that the difference K can't be 0. So there is no need for a and b being equal comparison.

So afterwards we can call qsort() on the array which will sort it in place. For more information on stdlib quicksort implementation take a look at the relevant documentation.

int diff;
for(i=0;i<N;i++)
{
    for(j=i+1;j<N;j++)
    {
        diff = set[j]-set[i];
        if(diff > K)
            break;
        if( diff == K)
        {
            result++;
            break;
        }
    }
}
printf("%d",result);

And here is the actual counting part of this approach. We check each number of the array against each other trying to find number pairs which satisfy the difference of K. Thanks to the sorting of the array we also know that if we go to differences greater than K we can safely proceed to the next number since all subsequent numbers would be greater and so would the difference. In addition, we don't need to be checking the previous numbers that were already checked and we know that since K>0, the second for() loop starts at i+1.

This approach shows a very big difference in the results as can be seen by the details from the HackerRank data sets.



# 0 0.03s : Success

# 1 0.03s : Success

# 2 0.03s : Success

# 3 0.04s : Success

# 4 0.03s : Success

# 5 0.04s : Success

# 6 0.04s : Success

# 7 0.05s : Success

# 8 0.05s : Success

# 9 0.05s : Success

# 10 0.05s : Success

# 11 0.06s : Success

# 12 0.08s : Success

# 13 0.08s : Success

# 14 0.08s : Success


There is a small way to squeeze out a bit more performance for this algorithm though. It does not involve a change of strategy but rather a change of implementation. C stdlib quicksort has one disadvantage. In order to provide for any possible type of input it uses void* and a function pointer for the comparison of the elements of the array that has to be defined by the user. C++ makes up for that with the use of templates but since we are using C there is another way. We can have a dedicated quicksort implementation only for integers. I actually had one lying around and so I inserted it into the code to compare.

int partition(int* a,int left,int right,int pivotI)
{
    int swap,newPivotI,i;
    int pivotValue = a[pivotI];
    //move the pivot to the end
    a[pivotI] = a[right];
    a[right] = pivotValue;
    newPivotI = left;//initialize the new pivot position to the leftmost index
    //find the new pivot position
    for(i=left;i<right;i++)
    {//if the value at the iterated index is less than the pivot value then swap that value
    // with the potential pivot position and increase the potential pivot position
        if(a[i] <= pivotValue)
        {
            swap = a[i];
            a[i] = a[newPivotI];
            a[newPivotI] = swap;
            newPivotI++; //increase the potential pivot index
        }
    }
    //in the end move the pivot to its final position
    a[right] = a[newPivotI];
    a[newPivotI] = pivotValue;
    return newPivotI;
}
 
void quicksort_inplace(int* a,int left,int right)
{
    int pivotI,newPivotI;
    if(left < right)//if the list has at least 2 items
    {
        //choose the pivot
        pivotI = left+(right-left)/2;
        //partition the list
        newPivotI = partition(a,left,right,pivotI);
        //recursively quick sort elements smaller than the pivot
        quicksort_inplace(a,left,newPivotI-1);
        //recursively quick sort elements greater than the pivot
        quicksort_inplace(a,newPivotI+1,right);
    }
}
 
void quicksort(int* a,int size)
{
    quicksort_inplace(a,0,size-1);
}

As can be seen from the code above this is a standard in-place quicksort implementation for integers. It involves a partition() function and a function that's called recursively on the created partitions to sort the array in place. It's easy to study the code and I believe it is well commented. If you want more information on the actual algorithm you can find it here.

The only change that needs to be done in the code that was presented before is swapping the call to stdlib's qsort() with the following:

quicksort(set,N);

The results as can be seen from the HackerRank data set details showed a very small improvement. So maybe for this data and this problem there is an improvement but whether it's worth the effort is up for discussion.

# 0 0.02s : Success

# 1 0.03s : Success

# 2 0.03s : Success

# 3 0.03s : Success

# 4 0.03s : Success

# 5 0.04s : Success

# 6 0.04s : Success

# 7 0.04s : Success

# 8 0.04s : Success

# 9 0.05s : Success

# 10 0.05s : Success

# 11 0.05s : Success

# 12 0.05s : Success

# 13 0.06s : Success

# 14 0.06s : Success



A final and perhaps not so important improvement would be to remember the last j index that was reached in the inner loop. This way we can compare and see if the difference of the value at i and at the last j is less than K. If it is, then the inner loop can actually start from the next item after the last j.

int diff;
int lastJ=1;
for(i=0;i<N;i++)
{
    j = i+1;
    if(set[lastJ]-set[i]<K)
        j = lastJ+1;
    for(;j<N;j++)
    {
        diff = set[j]-set[i];
        if(diff > K)
            break;
        lastJ = j;
        if( diff == K)
        {
            result++;
            break;
        }
    }
}
printf("%d",result);

Above we can see what I said implemented in code. There is an extra if check but depending on the data some improvement could be gained. But as far as HackerRank's data is concerned there was no visible improvement when I ran this code through it.

EDIT1:
Christian Sommer made a very useful comment below detailing a very clever approach to this problem! Taking advantage of the fact that all the values are guaranteed to be unique and that we simply want to count the number of the pairs and not the pairs themselves there is an even smarter approach.

If during reading a number M from the input stream we have already read another number satisfying either M+K or M-K then that means that we have found a pair. It's that simple! The specific code given by Christian is in C++ and utilizes std::set from the STL. It assumes that stream is an ifstream and that we have already read the first line that contains N and K.

int pairCount = 0;
int el;
std::set<int> s;
while(stream >> el) {
if (s.count(el+K) > 0 || s.count(el-K) > 0) ++pairCount;
s.insert(el);
}
return pairCount;

Well that's all I guess. I simply wanted to share my implementation of a solution to this particular problem. I hope it helped the readers and maybe even taught something new to some people. If a reader has a better and faster solution I encourage you to share it with me in the comments section. I am really looking forward to seeing if there are more awesome ways to approach this problem.

10 thoughts on “Counting pairs of numbers with a given difference out of a set”

  1. Your solution (sorting and linear search) looks fine for small K. If, however, K is large (say, N/2), and the input numbers are 1, 2, 3, … N, linear search would run through half the array for half the elements, hence it still is quadratic in nature.

    Note that difference exactly K means that if M is in the stream, either M+K or M-K are needed to yield a pair. Since the numbers are distinct, one could just build a search tree (a set). C++, assuming that stream is an ifstream(file) and we already read N (not needed) and K.

    int pairCount = 0;
    int el;
    std::set s;
    while(stream >> el) {
    if (s.count(el+K) > 0 || s.count(el-K) > 0) ++pairCount;
    s.insert(el);
    }
    return pairCount;

  2. You are right that for large K the solution is still quadratic in nature for the worst case scenario.

    Thank you for sharing another solution to the problem! It is really elegant. Of course given the fact that all elements are unique should point towards a Binary Search Tree. The key thinking is as you said “that if M is in the stream, either M+K or M-K are needed to yield a pair”.

    Great solution! I will edit the post and share it there at the end giving you full credit of course.

  3. The last mod you made to your code (tracking lastJ) prevents it from being quadratic, even in the scenario Christian describes. You can prove this to yourself be re-writing it as one loop rather than a nested loop, where if the difference is less than or equal to K, you increment j, otherwise you increment i. This solution has the advantage that the values don’t have to be distinct.

  4. Thanks for the observation Willus. I am a little busy now so I can’t devote any time to it right now but I will post another comment as soon as I get around to doing it.

  5. Hello ,
    In the statement – if (s.count(el+K) > 0 || s.count(el-K) > 0) ++pairCount;

    We are just increasing the pairCount by 1 while we may have more than one possible pairs.
    Should it not be this way :
    pairCount = pairCount + s.count(num+K) + s.count(num-K);

    Correct me if i am wrong .

  6. Did you try with the input :
    5 0
    1 1 1 1 1

    P.S. I did not run your code.

  7. hello, I will try your input when I come back from work and find some free time. Thanks for the feedback! If you want you can try the code yourself and tell me how it performs in the meantime

  8. @daemonoid:
    The input you provided is invalid. The problem statement clearly states:

    Given N numbers [N< =10^5], count the total pairs of numbers that have a difference of K. [K>0 and K<1e9]

    1st line contains N & K (integers). 2nd line contains N numbers of the set. All the N numbers are assured to be distinct.

    So neither of the solutions presented in this post is supposed to cater for input like that.

    Thanks for the feedback!

Leave a Reply

Your email address will not be published. Required fields are marked *