Theory

Algorithm Complexity

In this post I am going to discuss what ‘complexity’ of an algorithm is and a method by which it is determined. I am not going to bore you guys by posting too much about theoretical concepts instead of actual algorithms. From the next post onward I am going to talk only about new algorithms. But it is important that you understand this concept.

Algorithm complexity tells us about how fast or slow an algorithm performs. We use the Big O notation to classify algorithms on the basis of how they respond to input of different sizes. I am going to describe it briefly. The Big O notation basically tells us “what is the execution time of the algorithm directly proportional to”.  If I say that the complexity of an algorithm is O(n^2), it means that the execution time of that algorithm is proportional to the square of the input size. Consider the following code snippet:

for(i=0;i<n;i++)
{
  for(j=0;j<n;j++)
  {
    //some code here
  }
}

The code inside the two loops will execute n^2 times. Therefore, The complexity of the above code snippet is O(n^2), because the execution time is directly proportional to the square of ‘n’.

Another example, complexity of the following code is O(n^3):

for(i=0;i<n;i++)
{
  for(j=0;j<n;j++)
  {
     for(k=0;k<n;k++)
     {
       //some code here
     }
  }
}

Similarly, the following algorithm for printing all elements of an array of size ‘n’ is an O(n) algorithm. In other words, it has linear complexity.

for(i=0;i<n;i++)
{
   printf("%d ",array[i]);
}

If I have to write a program for adding two numbers, I’ll write it as follows:

scanf("%d %d",&number1,&number2);
int sum=number1+number2;
printf("%d",sum);

The execution time of this algorithm is not dependent on the input ( number1 and number2). In this case, we say that the complexity of this code is O(1) i.e. constant complexity.

One more example,

scanf("%d",&n);
for(i=1;i<n;i*=2)
{
  //some code here
}

The complexity of the above algorithm is O(logn) as the code inside the loop executes ([log(n)]  + 1) times. ( [num] means that num is rounded up to the closest integer). The base for log is 2.

E.g.,

for n=8, the values of i for different iterations are i =  1, 2, 4, 8. Therefore, 4 iterations.

For n=31, i = 1, 2, 4, 8, 16, 32. Therefore, 6 iterations.

For n=33, i = 1, 2, 4, 8, 16, 32, 64. Therefore, 7 iterations.

So, the number of times the for loop iterates is directly proportional to log(n).

If I insert a code with O(logn) complexity inside a code with O(n) complexity, I’ll get code with complexity O(n*logn). For example,

scanf("%d",&n);
for(j=0;j<n;j++)
{
  for(i=1;i<n;i*=2)
  {
    //some code here
  }
}

The execution time of the above code is proportional to n*log(n), therefore O(n*logn)

O(1) algorithms are the best (obviously) but a constant time solution is not possible for most problems. Algorithms of higher order (n^3, n^4,…) are not preferred. They are the main culprits for TLE (Time Limit Exceeded) verdict.

Common algorithm complexities, from better to worse are

O(1) < O(logn) < O(n) < O(n*logn) < O(n^2) < O((n^2)*logn) < O(n^3) < …

Counting sort, that we previously discussed, has a complexity O(n). In counting sort, an array of size ‘n’ is traversed 3 times. Therefore, the execution time is directly proportional to n.

Other basic sorting algorithms like Selection sort, Bubble sort , Quick sort, etc have complexity O(n^2). Efficient sorting algorithms like Merge sort and Heap sort, have complexity O(nlogn), and therefore are preferred over the O(n^2) sorting algorithms.

The Big O notation is  widely used to compare algorithms. While introducing new algorithms I too will  mention their complexity in this manner.

I came across a page that contains some interesting information.

http://bigocheatsheet.com/

“Why am I getting Time Limit Exceeded?”

“I have implemented everything perfectly. Then why am I getting TLE?” is the question that comes to most programmers who are new to competitive programming. Let me tell you guys once and for all that declaring variables inside loops or writing a lengthy code has nothing to do with it. Variable declaration takes negligible time and the number of lines in the code does not affect the execution time of a program. If you are getting TLE, it is because the logic that you have devised to solve the problem is taking up a lot of time when run on large inputs. Lets take a simple example.

Suppose that we have to write a program to find the sum of all natural numbers from a till b. ( a and b are natural numbers too )

One possible approach (in C/C++) to this simple problem is

int i,sum=0;
for(i=a;i<=b;i++)
   sum+=i;

Now, if examined in terms of correctness, this code has nothing wrong with it. It finds the correct answer. But the code is not efficient with respect to time. If the difference between ‘a’ and ‘b’ is very large (say, a=1 and b=10^15), this code would take forever to run because the for loop will have to iterate from 1 till 10^15.

There’s another way to find the solution. We know that the sum of the first n natural numbers is ( n * (n+1) ) /2. By using this formula we may write the following code:

int high= (b * ( b + 1 )) /2;
int low= ((a-1) * a) /2;
int sum=high-low;

The variable ‘high’ contains the sum of first b natural numbers. And the variable ‘low’ contains the sum of first (a-1) natural numbers. So, obviously the sum of natural numbers from a to b will be (high – low).

The second approach will always take constant time to run. The execution time will be the same no matter how far apart a and b are. And it gives the same answer as the first method.

Now you see, this is the reason the first solution was not correct even though it was giving the correct output. Remember that if you are getting TLE, it may be because of the following reasons:

1. The logic of your program is not fast enough (as demonstrated above).

2. Your program is going into an infinite loop somewhere.

3. Rarely, VERY rarely because you are not using an efficient way to take input.

The problem setters usually set the time limit such that a participant never gets a TLE due to the 3rd reason. A note for C++ users: if a problem is such that large amount of numbers have to taken as input, it is better to use scanf() instead of the cin stream. The cin stream is slower than scanf().

(See the documentation for your language to find the best way to take input.)

When comparing two algorithms, we use a method known as the ‘Big O Notation’. In a nutshell, the Big O notation tells us what the execution time of a program  is directly proportional to.

This notation is used very frequently to comment on the efficiency of the algorithms and its very important that you get familiar to it. I’ll write about it in my next post (which will be very soon!).