This is just a post detailing the solution to a small algorithmic challenge from HackerRank. The exercise is to find the pairs of numbers out of a set of numbers `N`

that have difference `K`

. The link to the actual HackerRank problem can be seen here. I am not sharing my solution because it’s unique or even good. There may be much better solutions out there and in fact if someone knows something better I encourage you to comment on it so that both me and the readers of this blog can learn something new!

All of the code mentioned in this tutorial can be cloned from Github

The actual text of the challenge is given below:

Given N numbers [N<=10^5], count the total pairs of numbers that have a difference of K. [K>0 and K<1e9] Input Format: 1st line contains N & K (integers). 2nd line contains N numbers of the set. All the N numbers are assured to be distinct. Output Format: One integer saying the no of pairs of numbers that have a diff K. Sample Input #00: 5 2 1 5 3 4 2 Sample Output #00: 3

Now with challenges like this, speed is what is important. We want our program to solve the problem for all possible input in the least amount of computational time. I will be using C to solve this problem. First of all we have to figure out a way to read in the input from the `stdin`

as is described in the problem definition.

1 2 3 4 5 6 7 8 9 10 | int N,K,*set,i=0,j; char* buff,* token; int result = 0; //read 1st input line scanf("%d %d \n",&N,&K); //allocate space for the set numbers string and the actual set buff = malloc(13*N);//13 here is for ULONG_MAX number of digits + space set = malloc(sizeof(int)*N); |

Above is the beginning of `main()`

for the problem. `N`

and `K`

are the numbers given in the problem definition. `set`

will be an array holding all of our numbers while `buff`

will be a buffer holding the string of the second line from which to read the numbers into `set`

. After reading both `N`

and `K`

with `scanf()`

we allocate the buffer to hold the string. It may be a bit sloppy doing it like that but here we allocate 13 bytes for each number. This is an approximation judging from the typical maximum value of `ULONG_MAX`

as shown here, plus the space separating each number. It should be safe since we are using ints and `ULONG_MAX`

is for unsigned ints. We also allocate the array of numbers.

//read second line of input gets(buff); //find the individual numbers token = strtok(buff," "); while(token != NULL) { sscanf(token,"%d",&set[i]); token = strtok(NULL," "); i++; } |

Further down in reading the input we can see from above that we read the second input line into the buffer. Then since we know from the problem definition that the numbers are space separated we can simply use libc's `strtok()`

function to obtain each number of the set. This is what is accomplished by the while loop in there. It tokenizes the string located inside `buff`

until all of the numbers are read into `set`

.

Now since we have read all of the input we have to decide on the strategy with which to determine how many pairs of numbers have difference of `K`

. The naive approach would be to simply go through the array `NxN`

times and count the difference of all numbers increasing a counter for those which match `K`

. Something like below

//naive approach for(i=0;i<N;i++) { for(j=0;j<N;j++) { if( abs(set[i] - set[j]) == K) result++; } } |

This approach is not very good and it will actually fail the test in the last 2 data sets in HackerRank. The complexity here is O(N^{2}) and is not good. We can do better than that! For reference let me mention the processing times that HackerRank returned for this approach:

Status: Wrong Answer # 0 0.06s : Success # 1 0.08s : Success # 2 0.14s : Success # 3 0.23s : Success # 4 0.36s : Success # 5 0.54s : Success # 6 0.74s : Success # 7 0.98s : Success # 8 1.26s : Success # 9 1.58s : Success # 10 1.93s : Success # 11 2.32s : Success # 12 2.74s : Success # 13 3.22s : Time limit exceeded # 14 3.22s : Time limit exceeded

So what could an alternate way to approach this problem be? Well let's see! We have an array of numbers and these numbers are also guaranteed to be positive. The relation between the numbers that we are trying to explore is their difference. What operation on the array could allow us to deal with the difference in a much better way? Yes you guessed it correctly! We can sort the array! By having a sorted array of numbers we can easily get the differences between numbers without wasting too much time.

//comparison function for qsort() int compare(const void* a, const void* b) { //a less than b if( *(int*)a < *(int*)b) return -1; //a greater than b return 1; //we don't take into account a == b since according to the problem definition it is impossible } |

//sort the array qsort(set,N,sizeof(int),compare); |

As we can see from the above code the C standard library includes an implementation of quicksort as long as you `#include <stdlib.h>`

. Since C has no templates you need to implement a comparison function using the signature `int compare(const void* a, const void* b)`

which will be returning -1 in case of `a`

being less than `b`

, 0 in case they are equal and 1 in case `a`

is greater than `b`

. But in our comparison implementation we omit one of those checks. The problem definition clearly states that the numbers in the array are unique and also that the difference `K`

can't be 0. So there is no need for `a`

and `b`

being equal comparison.

So afterwards we can call `qsort()`

on the array which will sort it in place. For more information on stdlib quicksort implementation take a look at the relevant documentation.

int diff; for(i=0;i<N;i++) { for(j=i+1;j<N;j++) { diff = set[j]-set[i]; if(diff > K) break; if( diff == K) { result++; break; } } } printf("%d",result); |

And here is the actual counting part of this approach. We check each number of the array against each other trying to find number pairs which satisfy the difference of `K`

. Thanks to the sorting of the array we also know that if we go to differences greater than `K`

we can safely proceed to the next number since all subsequent numbers would be greater and so would the difference. In addition, we don't need to be checking the previous numbers that were already checked and we know that since `K>0`

, the second `for()`

loop starts at `i+1`

.

This approach shows a very big difference in the results as can be seen by the details from the HackerRank data sets.

# 0 0.03s : Success # 1 0.03s : Success # 2 0.03s : Success # 3 0.04s : Success # 4 0.03s : Success # 5 0.04s : Success # 6 0.04s : Success # 7 0.05s : Success # 8 0.05s : Success # 9 0.05s : Success # 10 0.05s : Success # 11 0.06s : Success # 12 0.08s : Success # 13 0.08s : Success # 14 0.08s : Success

There is a small way to squeeze out a bit more performance for this algorithm though. It does not involve a change of strategy but rather a change of implementation. C stdlib quicksort has one disadvantage. In order to provide for any possible type of input it uses `void*`

and a function pointer for the comparison of the elements of the array that has to be defined by the user. C++ makes up for that with the use of templates but since we are using C there is another way. We can have a dedicated quicksort implementation only for integers. I actually had one lying around and so I inserted it into the code to compare.

int partition(int* a,int left,int right,int pivotI) { int swap,newPivotI,i; int pivotValue = a[pivotI]; //move the pivot to the end a[pivotI] = a[right]; a[right] = pivotValue; newPivotI = left;//initialize the new pivot position to the leftmost index //find the new pivot position for(i=left;i<right;i++) {//if the value at the iterated index is less than the pivot value then swap that value // with the potential pivot position and increase the potential pivot position if(a[i] <= pivotValue) { swap = a[i]; a[i] = a[newPivotI]; a[newPivotI] = swap; newPivotI++; //increase the potential pivot index } } //in the end move the pivot to its final position a[right] = a[newPivotI]; a[newPivotI] = pivotValue; return newPivotI; } void quicksort_inplace(int* a,int left,int right) { int pivotI,newPivotI; if(left < right)//if the list has at least 2 items { //choose the pivot pivotI = left+(right-left)/2; //partition the list newPivotI = partition(a,left,right,pivotI); //recursively quick sort elements smaller than the pivot quicksort_inplace(a,left,newPivotI-1); //recursively quick sort elements greater than the pivot quicksort_inplace(a,newPivotI+1,right); } } void quicksort(int* a,int size) { quicksort_inplace(a,0,size-1); } |

As can be seen from the code above this is a standard in-place quicksort implementation for integers. It involves a `partition()`

function and a function that's called recursively on the created partitions to sort the array in place. It's easy to study the code and I believe it is well commented. If you want more information on the actual algorithm you can find it here.

The only change that needs to be done in the code that was presented before is swapping the call to stdlib's `qsort()`

with the following:

quicksort(set,N); |

The results as can be seen from the HackerRank data set details showed a very small improvement. So maybe for this data and this problem there is an improvement but whether it's worth the effort is up for discussion.

# 0 0.02s : Success # 1 0.03s : Success # 2 0.03s : Success # 3 0.03s : Success # 4 0.03s : Success # 5 0.04s : Success # 6 0.04s : Success # 7 0.04s : Success # 8 0.04s : Success # 9 0.05s : Success # 10 0.05s : Success # 11 0.05s : Success # 12 0.05s : Success # 13 0.06s : Success # 14 0.06s : Success

A final and perhaps not so important improvement would be to remember the last `j`

index that was reached in the inner loop. This way we can compare and see if the difference of the value at `i`

and at the last `j`

is less than `K`

. If it is, then the inner loop can actually start from the next item after the last `j`

.

int diff; int lastJ=1; for(i=0;i<N;i++) { j = i+1; if(set[lastJ]-set[i]<K) j = lastJ+1; for(;j<N;j++) { diff = set[j]-set[i]; if(diff > K) break; lastJ = j; if( diff == K) { result++; break; } } } printf("%d",result); |

Above we can see what I said implemented in code. There is an extra if check but depending on the data some improvement could be gained. But as far as HackerRank's data is concerned there was no visible improvement when I ran this code through it.

**EDIT1:**

Christian Sommer made a very useful comment below detailing a very clever approach to this problem! Taking advantage of the fact that all the values are guaranteed to be unique and that we simply want to count the number of the pairs and not the pairs themselves there is an even smarter approach.

If during reading a number `M`

from the input stream we have already read another number satisfying either `M+K`

or `M-K`

then that means that we have found a pair. It's that simple! The specific code given by Christian is in C++ and utilizes `std::set`

from the STL. It assumes that `stream`

is an `ifstream`

and that we have already read the first line that contains `N`

and `K`

.

int pairCount = 0; int el; std::set<int> s; while(stream >> el) { if (s.count(el+K) > 0 || s.count(el-K) > 0) ++pairCount; s.insert(el); } return pairCount; |

Well that's all I guess. I simply wanted to share my implementation of a solution to this particular problem. I hope it helped the readers and maybe even taught something new to some people. If a reader has a better and faster solution I encourage you to share it with me in the comments section. I am really looking forward to seeing if there are more awesome ways to approach this problem.