Algorithm #5: Binary Search and its relatives

I think most of you already know about binary search. It is an algorithm that is used to search an element in an array. Its complexity is O(log n) as opposed to O(n) for linear search, so it’s a much better way to search an array. This time I’ll first revise binary search and then I’ll show how simple modifications in the basic binary search algorithm can produce interesting variants.

1:  BINARY SEARCH

Assume that we have got an array of n integers, and we have to run many queries on the array. In each query, we are given a number ( named targetnum ) and we have to tell if the number is present in the array or not. We’ll use binary search to solve this.

The first step of this algorithm is to sort the array. At first look, it may seem that sorting will consume a lot of time and will contribute to overhead. But, remember that we have to run MANY queries on the array. This means that sorting once will give advantage for each of the many queries. So, ultimately the overhead contributed by sorting is much less than the advantage it gives.

Now, when we start searching, targetnum can be present anywhere in the sorted array. We take a look at the integer that is present in the middle of the array. If this middle integer is targetnum itself, then our search terminates. If this middle integer is greater that targetnum, then we can say that targetnum is obviously present somewhere in the first half of the array (i.e. in [first,mid) ) (this is because the array is sorted). And if the middle integer is less than targetnum, then targetnum is present somewhere in the second half of the array (i.e. in (mid,last] ).

The potential space ( i.e. the space where targetnum can be present ) is, therefore, reduced to half in every step.

We repeat this process until the size of the potential space becomes 1. And if we still have not yet found the targetnum, we can conclude that it is not present in the array.

In form of C code, this algorithm will look something like:
binarysearch() is a function that takes the array, the size of the array and the integer to be searched as arguments.
If there is even a single instance of targetnum in the array then 1 will be returned, otherwise 0 will be returned.

int binarysearch(int *a,int n,int targetnum)
{
  int start=0, end=n-1, mid; 
  while(start<=end)
  {
    mid=(start+end)/2;
    if(a[mid]==targetnum) // if the middle element is the required number, then search terminates
    {
      break;
    }
    else if(a[mid]>targetnum)
    {
      end=mid-1; // restrict search to first half
    }
    else if(a[mid]<targetnum)
    {
      start=mid+1; // restrict search to second half
    }
  }
  if(start<=end)
    return 1;
  else
    return 0;
}

This algorithm halves the potential space in every iteration. So, it takes a maximum of log2(n) iterations to complete the search of one element, as opposed to n iterations per query for linear search.

2,3: LOWER BOUND AND UPPER BOUND

A variant of binary search can be used to find the lower bound and upper bound of an element (say, targetnum ) in a sorted array.

Lower Bound: The lower bound of targetnum is the first element in the sorted array that is not less than targetnum.

Upper Bound: The upper bound of targetnum is the first element in the sorted array that is greater than targetnum.

Explanation: It is possible that targetnum may have no instance, a single instance or multiple instances in the array.
If there is only one instance of targetnum present in the array, lower bound will be the position of that instance, and upper bound will be the position of that instance + 1.

If there are multiple instances of targetnum present in the array, lower bound will be the position of the first instance, and upper bound will be the position of the last instance + 1.

If there is no instance of targetnum in the array, lower bound and upper bound both will be the position of first integer greater than targetnum.

So, if the sorted array is {1,4,5,5,5,6,9,9,10}, then the lower bound and upperbound of each integer will be:

Number Lower Bound Upper Bound
0 0 0
1 0 1
2 1 1
3 1 1
4 1 2
5 2 5
6 5 6
7 6 6
8 6 6
9 6 8
10 8 9
11 9 9

(Note that in case of 11, the lower bound and upper bound are 9, i.e. last index of array + 1. This is so because 11 is greater than all the elements of array)
The code for finding lower bound:
lowerbound() is a function that takes the array, the size of the array and the element whose lower bound has to be found as arguments and returns the lower bound of the given element.

int lowerbound(int *a,int n,int targetnum)
{
    int first=0,last=n-1,mid;
    while(first<last)
    {
        mid=(first+last)/2;
        if(a[mid]==targetnum)
        {
            last=mid;
        }
        else if(a[mid]>targetnum)
        {
            last=mid-1;
        }
        else if(a[mid]<targetnum)
        {
            first=mid+1;
        }  
    }
    if(a[first]>=targetnum)
        return first;
    return first+1;
}

The main difference between binary search and this algorithm is when the condition a[mid]==targetsum is true. See line 9 of the code. In the binary search algorithm, if a[mid]==targetnum is true, the search ends (as the required number was found). But here, we are instead interested in the first instance of targetnum, so we continue our search.
If there is an instance of targetnum present at position mid, then it means that the first instance of targetnum must be in the interval [first,mid].

After the while loop ends, the required position is either the value in first or first+1.

The code for finding upper bound is:
upperbound() is a function that takes the array, the size of the array and the element whose upper bound has to be found as arguments and returns the upper bound of that element.

int upperbound(int *a,int n,int targetnum)
{
    int first=0,last=n-1,mid;
    while(first<last)
    {
        mid=(first+last)/2;
        if(a[mid]==targetnum)
        {
            first=mid+1;
        }
        else if(a[mid]>targetnum)
        {
            last=mid-1;
        }
        else if(a[mid]<targetnum)
        {
            first=mid+1;
        }
    }
    if(last>=0 && a[last]>targetnum)
        return last;
    return last+1;
}

The main difference between this algorithm and binary search is when the condition a[mid]==targetnum is true. Here, we are interested in the last occurrence of targetnum.
So, if one instance of targetnum is present at mid, then the last instance must be present somewhere in the interval [mid,last].

After the while loop ends, the required position is either the value in last or last+1.

APPLICATION:
One of the possible interesting applications of upperbound() and lowerbound() is:
If we are told to find the number of occurrences of a given integer (say, targetnum) in an array, we can use lowerbound() and upperbound() to find the answer.
The answer will be upperbound(a,n,targetnum)-lowerbound(a,n,targetnum).
Convince yourself that this query is correct by consulting the definitions of upper bound and lower bound given earlier in this post.
This query will have time complexity O(log n) as both the functions take about log n iterations each to find the answer.

DEEPER INSIGHT:
1. For the ones who observed line 9 of upperbound() closely: There are two reasons for which I used first= mid+1, instead of first = mid.
Firstly, using first=mid will lead to an infinite loop in some cases (eg. when first=1 and last=2).
Secondly, we are interested in (the position of last instance of targetnum) + 1, so its alright if targetnum itself is not present in the active range.

4:
There is another type of binary search; one that is not done on any array. There are situations where you need to optimize a certain parameter. You know the minimum and maximum value the parameter can take but there seems to be no other way than iterating over the possible values to find the optimal value. In certain cases, brute force iteration can be avoided and binary search can be used to solve the problem. One recent example of this type is the “PROBLEM A: ARROWS AND QUIVERS” question in ACM ICPC2013 Amritapuri site.
Discussing this topic will make this post complicated and much too lengthy. I’ll write about it in my future posts.
ACM ICPC 2013 Amritapuri questions
Solution to problem A

QUESTIONS ON BINARY SEARCH:

Am I A Fibonacci Number (Simple)
Codeforces:Eugeny and Play List (Easy)
Codeforces:Primes on Interval (Medium)

If you feel that some point wasn’t explained properly, please comment below and I’ll try to fix it.

Advertisements

4 comments

  1. Hey very nice tutorial
    Ithink there is small problem in binarysearch method

    binarysearch will fail on below input
    a = {18};
    p = 18 ;

    // n =0 , start =0 , end = 1-1=0; while(start<end) fails in 1st attempt , if(start<end) fails so return 0 ;

    expected output should be 1
    but it will return 0

    I think you can fix this error by changing condition in while and last if statement to "start<=end"

      1. welcome 🙂
        one more minor modification requires in upperbound()
        in last if condition it should be
        if(last>=0 && a[last]>targetnum) return last;
        else return last+1;

        because sometime last can have -1 value after while loop completed
        like a = {10,12} p = 8;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s