Algorithm #2: Primality Testing

One of the important algorithms that a competitive programmer must know are the primality testing algorithms. Solution to some programming problems involve checking if a number is prime or not and it is the algorithm you use to do this that determines if your solution gets accepted.

I’m sure that you all are aware that a number (say, n) is prime if there is no number (say, i ) such that i<n and n%i==0 and i!=1. We can use this fact to write a function that takes a positive number and returns 1 if the number is prime and 0 if the number is not prime.

int isprime(int n)
{
   if(n<2)
      return 0;
   int i=2;
   while(i<n)
   {
      if(n%i==0)
         return 0;
      i++;
   }
   return 1;
}

The complexity of this algorithm is O(n). This is the naive approach for primality testing. Naive approach means a simple approach that is not optimised at all and is not sufficient.

Let me draw your attention to a fact. If I want to find two numbers ( a and b ) for a number n, such that n=a*b, then one of the two numbers will be less than sqrt(n) and the other one will be greater than sqrt(n). Except when a=b=sqrt(n). So, if I don’t find any factor of n less than or equal to sqrt(n), then I can be sure that n is a prime number and there is no need to go all the way to n.

Therefore the condition in the while loop can be modified, and the code will look as follows:

int isprime(int n)
{
   if(n<2)
      return 0;
   int i=2;
   while(i*i<=n)
   {
      if(n%i==0)
         return 0;
      i++;
   }
   return 1;
}

Note that instead of finding sqrt(n), I simply used i*i<=n. This does exactly the same thing as i<=sqrt(n). The complexity of this algorithm is O(sqrt(n)).

But for some problems even this optimised method is not enough. There is another, very efficient way to find prime numbers. This algorithm is known as Sieve of Eratosthenes.

I will explain how the Sieve of Eratosthenes works for a number less than or equal to 100.

We start by writing numbers from 1 to 100 on paper. Cross the number 1 on the paper. Circle the number 2 and cross every multiple of 2. Circle 3 and cross every multiple of 3. We skip 4 as it is crossed (it is a multiple of 2). Circle 5 and cross every multiple of 5. At each step, we find the next smallest number that is not crossed. We circle that number and cross every multiple of that number. We continue this process until all numbers are either circled or crossed. All the numbers that are circled are prime numbers. Now, if I want to check the primality of any number less than or equal to 100, I can simply see if that number is circled or crossed. If it is circled, it is a prime number otherwise it is composite.

So, once we have completed the circling-and-crossing process, we can check the primality of a number in constant time i.e. O(1) time. In a way we ‘precalculate’ the primality of numbers and use this information later on.

In all programming problems the upper limit of numbers is given. Suppose that it is given that all numbers are <=10^7. In this case, I’ll simply run the Sieve of Eratosthenes till 10^7 in the start of the program and later in the program I’ll be able to check the primality of any number <=10^7 in O(1) time.

Following is an implementation in C/C++. I have declared an array ‘isprime’ of size (10^7 + 1) to find prime numbers upto 10^7. If isprime[i]==1, it simply means that the ith number is crossed.

int isprime[10000001]={0},i=2,j;
isprime[0]=1;
isprime[1]=1;
for(i=2;i*i<=10000000;i++)
{
   if(isprime[i]==1)//if the number is crossed, it is skipped
      continue;
   for(j=i*2;j<=10000000;j+=i)// NOTE that j is incremented by i everytime
   {
      isprime[j]=1; // crossing every multiple of i
   }
}

Now whenever I want to check if a number p is a prime number, I’ll observe the value of isprime[p]. If it is 0, then it is prime otherwise it is not.

Lets examine the execution time of this code. The outer loop iterates i from 2 till n(n=10000000 here). The inner loop will iterate only if ‘i’ is not crossed (i.e. If ‘i’ is prime).

For i=2, the inner loop will iterate n/2 times.

For i=3, the inner loop will iterate n/3 times.

For i=5, the inner loop will iterate n/5 times.

For i=7, the inner loop will iterate n/7 times… and so on.

For i= some composite number, the inner loop will iterate 0 times.

Therefore the total times the inner loop iterates is :

( n/2) + ( n/3 ) + ( n/5 ) +( n/7 ) + ( n/11 ) + ( n/13 ) + …

We can take n out from all the terms and we get,

n * summation (1/i), where i is a prime number less than or equal to n. Summation(n/i) is called the harmonic series of primes. Its summation till n approximately reaches log(log(n)). Therefore, the inner loop iterates approximately n*log(log(n)) times

The complexity of Sieve of Eratosthenes is as follows:

For Precalculation : O(n* log(log(n)))

For each query : O(1)

If you don’t understand how we arrived at this complexity, don’t fret. You’ll understand it eventually when you observe some more algorithms and develop a better understanding of complexity of algorithms.

All questions are welcome.

 

Sources to read from :

About the Sieve of Eratosthenes:

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

http://www.youtube.com/watch?v=9m2cdWorIq8

Complexity of this algorithm :

http://stackoverflow.com/questions/2582732/time-complexity-of-sieve-of-eratosthenes-algorithm

Online question:

Implement this algorithm first and see if it is generating prime numbers. And when you get familiar with the concept try to solve this question:

‘Count k primes’ on Codechef: 

http://www.codechef.com/problems/KPRIME

This question requires some modification to the actual Sieve algorithm. I’ll leave it to you to figure it out. If you face any problem, you may leave a  comment here.

Advertisements

7 comments

  1. I tried this part of code and it is showing “Segmentation fault (core dumped)” while running….Here “j” is declared as int so how can we increment “j” to 10^7 ?….please give me a solution for this

    1. The reason for segmentation fault is that you cannot declare an array of size 10^7 unless it is global. You can do one of two things:
      1. either make the array declaration global
      OR
      2. reduce the size of the array to 10^6. Also, change both the loop conditions to <=10^6.
      ——
      int can handle values from -2 X 10^9 to +2 X 10^9 (approx).

      1. Getting correct answer now…thanks for your help..
        Can you explain the logic behind declaring it global?

    2. An array this big cannot be declared locally. Local variables are stored on stack.

      You must either dynamically allocate them space in the program’s heap memory using new operator / malloc() or declare them as global.

      The size of the stack is fixed and cannot be increased.
      But if the heap is too small then the OS increases the size of the heap.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s