Programming Technique

Representation of Graphs

Graph is a very versatile data structure. It can be used to solve a large number of problems. You’d be surprised to see how easy the problems become once they are interpreted in terms of a graph.

Before I can start with graph algorithms like DFS, BFS, Disjoint Set Union, Segment trees, etc. it is important to know how a graph can be represented in memory.

I’m assuming that you have knowledge of elementary graph theory and the concepts of node, edge, tree etc.

Two things are essential to a graph: its nodes and edges. As you know, each edge connects two nodes.
For all future references lets take N as the number of nodes and M as the number of edges. Each edge is of the form (a,b) denoting that nodes a and b are connected (1<=a,b<=N ) .
Following are two ways to store a graph. I’ll show you how to implement them:

1. Adjacency Matrix: Use a 2-D array of size N x N. Let’s call this matrix adjmat[N][N]. adjmat[i][j]=1 iff there is an edge from i to j.
For every undirected edge (a,b) in the graph, mark adjmat[a][b]=1 and adjmat[b][a]=1. If you have a directed edge from a to b, then mark only adjmat[a][b]=1.

2. Adjacency List: Use a 1-D array for each node. For every node, store in that array the nodes that it is connected to. So we’ll have N 1-D arrays, one for each node.

Lets take the following graph as example:
graph
The adjacency matrix would be:

adjmat

and the adjacency list would be:
adjlist

The adjacency list is a more efficient way to store information about a graph. It consumes lesser memory and is more time efficient as compared to adjacency matrix. The reason for this is that the adjacency matrix has one element in adjmat[][] for each possible edge in the graph i.e. N^2 possible edges. What if we have a tree? In case of a tree we have only N-1 edges. We don’t need to use N^2 amount of space just to store info about N-1 edges.
In such cases adjacency list is useful. Adjacency list can be implemented using a linked list OR dynamic array. And it consumes only that much space that is required.

Using an adjacency list also leads to lesser number of iterations in later operation (eg. during DFS, BFS, etc.). This is because the number of entries in adjacency list is 2 X M . But in case of adjacency matrix it is always N^2.

Of course in a situation where the number of edges is close to N^2, both methods will perform equally well.

In competitive programming competitions, most of the time a graph can be given as input in two ways. (Assume undirected graph)

1. The problem will give you N, and N lines will follow. In the beginning of ith line will be Di (Degree of ith node) and a list of the nodes that the ith node is connected to.
For the example graph given above, the input would look like:
5
3 2 3 4
1 1
2 1 4
2 1 3
0

2. The problem will give you N and M, and each of the next M lines will contain two integers a and b denoting an edge between nodes a and b.
For the example graph the input would be like this:
5 4
1 2
1 3
1 4
3 4

Both can be handled with either Adjacency Matrix or Adjacency List.

Type-1 with Adjacency Matrix:

int main()
{
    int n,i,j,temp;
    scanf(" %d",&n);
    int adjmat[101][101]={0};//size of this matrix should be (N+1) X (N+1) because we are using 1 based numerbing for nodes
    int degree[101]={0};
    i=1;
    while(i<=n)
    {
        scanf(" %d",&degree[i]);
        j=0;
        while(j<degree[i])
        {
            scanf(" %d",&temp);
            adjmat[i][temp]=1;
            adjmat[temp][i]=1;
            j++;
        }
        i++;
    }
    return 0;
}

Type 1 with adjacency list:

int main()
{
    int n,i,j,temp;
    scanf(" %d",&n);
    int *adjlist[101];
    int degree[101];
    i=1;
    while(i<=n)
    {
        scanf(" %d",&degree[i]);
        adjlist[i]=(int *)malloc(sizeof(int)*degree[i]);
        j=0;
        while(j<degree[i])
        {
            scanf(" %d",&temp);
            adjlist[i][j]=temp;
            j++;
        }
        i++;
    }
    return 0;
}

Type 2 with adjacency matrix:

int main()
{
    int n,m,i,a,b;
    scanf(" %d %d",&n,&m);
    int adjmat[101][101]={0};
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjmat[a][b]=1;
        adjmat[b][a]=1;
        i++;
    }
    return 0;
}

Type2 with adjacency list:

int main()
{
    int n,m,i,a,b;
    scanf(" %d %d",&n,&m);
    int adjlist[101][101]={0};
    int degree[101]={0};
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjlist[a][degree[a]]=b;
        degree[a]++;
        adjlist[b][degree[b]]=a;
        degree[b]++;
        i++;
    }
    return 0;
}

The above implementation for Type 2 with Adjacency List doesn’t use dynamic arrays ( as it would involve a lot of realloc() and would get complicated ). So it still uses a lot of memory. But the advantage of lesser iterations for later operations, as mentioned before, is still there.

If you want to use dynamic arrays, study about C++ STL Vectors. Vectors are dynamic arrays. I recommend that you read the C++ STL documentation for vectors as vectors are incredibly useful and you don’t need to worry about malloc, realloc etc. as that is handled automatically by vectors.
Follow this link. (For now, study about push_back(), size(), pop_back(), begin() and end() only)

In the next post, I’ll write about how to implement DFS and BFS.

Advertisements

Algorithm #8: Dynamic Programming for Subset Sum problem

Uptil now I have posted about two methods that can be used to solve the subset sum problem, Bitmasking and Backtracking. Bitmasking was a brute force approach and backtracking was a somewhat improved brute force approach.

In some cases, we can solve the subset sum problem using Dynamic Programming. (Note that I said “in some cases”). This post is an introduction to Dynamic programming.

Okay, so first questions first… What is Dynamic Programming?
Dynamic programming is not an algorithm in itself, it is a programming strategy. The basic idea behind DP is that while computing for the answer, we store the results of the intermediate computations so that we don’t need to recompute them later. We first compute the answer of smaller versions of our problem and store the answers. These intermediate answers help us compute the actual answer and as they are already stored somewhere we don’t need to recalculate them everytime we need them. This saves up a lot of time. That’s DP in a nutshell. There is much more to the definition of Dynamic Programming that you can read on Wikipedia. But for now, this simple definition will do.
This discussion on quora might help.

Let’s look at Subset Sum problem again:

We are given a set of N numbers. Our objective is to find out if it is possible to select some numbers from this set such that their sum exactly equals M. Eg. If we have the numbers {3,5,3,1} and M=8, then the answer is “yes” because of the subset {3,5}. If M=10, then the answer is “no” because no subset has sum=10.

The DP solution looks like this:

#include
int main()
{
    int N,M;
    scanf(" %d %d",&N,&M);
    int nums[N+1],i,j,ispossible[N+1][M+1];
    for(i=1;i<=N;i++)
        scanf(" %d",&nums[i]);
    for(i=0;i<=N;i++)
        for(j=0;j<=M;j++)
            ispossible[i][j]=0;//initialising to 0
    for(i=0;i<=N;i++)
        ispossible[i][0]=1;
    for(i=1;i<=N;i++)
    {
        for(j=0;j<=M;j++) { if(ispossible[i-1][j]==1) ispossible[i][j]=1; if(j-nums[i]>=0 && ispossible[i-1][j-nums[i]]==1)
                ispossible[i][j]=1;
        }
    }
    if(ispossible[N][M]==1)
        printf("Yes");
    else
        printf("No");
    return 0;
}

I’ve taken the input in nums[1], nums[2], … , nums[n]
I’ve used a 2-D array named ispossible[ ][ ]. It has size(N+1) x (M+1).
ispossible[i][j] can be either 0 or 1. Its value will be 1 if it is possible to select some numbers from the set {nums[1],nums[2],nums[3],…,nums[i]} so that their sum equals to j; otherwise it will be 0.

The logic is really simple:

Our final goal is to find the value of ispossible[N][M], which will be 1 if it is possible to obtain a subset of {nums[1],nums[2],nums[3],…,nums[N]} with sum equal to M.
To find the value of ispossible[N][M] we need to find the value of each element in the two dimensional array ispossible[ ][ ].

In general, ispossible[i][j] = 1 iff one of the following two conditions is true:

1. ispossible[i-1][j] is 1. If ispossible[i-1][j] is 1 it means that it is possible to obtain a sum of j by selecting some numbers from {nums[1],nums[2],nums[3],…,nums[i-1]}, so obviously the same is also possible with the set {nums[1],nums[2],nums[3],….,nums[i-1],nums[i]}.

2. ispossible[i-1][j-nums[i]] is 1. If ispossible[i-1][j-nums[i]] is 1 it means that it is possible to obtain a sum of (jnums[i]) by selecting numbers from {nums[1], nums[2],…,nums[i-1]}. Now if we select nums[i] too, then we can obtain a sum of (jnums[i])+nums[i] = j. Therefore ispossible[i][j] is 1.

The above two points serve as the DP equation for calculating all values in ispossible[ ][ ].

Note that first it is important to set ispossible[i][0] = 1 (for 0<=i<=N). Because obviously its possible to obtain a sum = 0 from any value of i.
This algorithm has time complexity O(N*M) and space complexity O(N*M).
The drawback of this algorithm and the reason why I said “in some cases” before is that if N*M is too large, then an array of the required size cannot be declared. So, this method works well only in those cases where N*M is around 10^8.

As I said earlier, Dynamic Programming is not a single algorithm but it is a programming strategy. A programming strategy that can be mastered only by practice. Dynamic programming questions are always unique and new. Each question has its own equation. By practicing enough DP questions you’ll learn how to recognize DP questions and how to think of their equation and solution.

RELATED PROBLEM:
Paying up ( Codechef ): This is a really simple DP problem that is exactly as the problem described in this post.
Fence ( Codeforces )
More questions can be found on Codeforces here.

Revising Recursion

I was going to write about Backtracking this time. But then I thought that it would be better to post about recursion first as it is a very important weapon. Algorithms such as Backtracking, Depth First Search, Disjoint Set Union (also known as Union Find), Segment trees and many other graph algorithms are best implemented using recursion.

Recursion is when a problem is defined in terms of smaller instances of the same problem. A very well known example is the factorial of a number. We can define factorial of a number in two ways.
First is the iterative way,
factorial(n)= product(i=1 to n) ( i )

The second way is the recursive way,
factorial(n)=n*factorial(n-1) .

Of course, both the definitions lead to the same values. In the recursive definition, the factorial of n is shown in terms of factorial of (n-1) (i.e. in terms of a smaller factorial problem).

Programs implemented using recursion can be incredibly short and at the same time incredibly difficult to understand. So it’s essential that some properties of recursion are understood properly before starting the algorithms I mentioned in the start of my post.

Let’s look at the code for factorial of a positive number, n. The code is pretty straight forward.

long long factorial(long long n)
{
  if(n==0)
    return 1;
  long long a=factorial(n-1);
  return n*a;
}

A recursive function is one that can call itself.

When a function is called, we say that an instance of that function is created.

Here, factorial() is a recursive function. Note that, it has a call to itself in its body. factorial() is capable of calling itself, and the call at line 5 is called a recursive call.

When a function calls itself, a different instance of that function is created whose variables are completely different than the calling function. Eg. if I want to find 4!, I will call factorial(4). factorial(4) will in turn call factorial(3). The variables ‘n’ and ‘a’ are different for both these instances of factorial(), i.e. the variables have different locations in memory and store different values. For the first one n=4 and for the second one n=3.

All instances of factorial() will execute completely independent of each other. One instance of factorial doesn’t care about the value of ‘n’ or any other variable in other instances of factorial().

For an instance of a recursive function to terminate, all its child instances should first terminate. Eg. For factorial(4) to terminate, first its child instance i.e. factorial(3) should terminate.

recursion_factorial

Recursion depth:

An instance of a recursive function is said to be at depth d, if there are d instances of that function present above it.
In the previous example, factorial(4) is at depth 0, factorial(3) is at depth 1, factorial(2) is at depth 2, factorial(1) is at depth 3 and factorial(0) is at depth 4.

Consider the following code snippet:

void func(int depth)
{
  if(depth==10)
  {
    printf("reached depth of %d, now returning back",depth);
      return;
  }
  printf("at depth %d\n",depth);
  func(depth+1);
  printf("returning from depth of %d\n",depth);
}
int main()
{
  func(0);
  return 0;
}

In this code, we are limiting the maximum depth to which func() can be recursively called to 10.
Keep in mind that each instance of func() has different ‘int depth’ variable.
Notice how we are passing an incremented value of depth in the recursive function call func(depth+1).
This is a very common technique to increment values while calling recursively.

An infinite recursion is also possible which is just like an infinite loop. Suppose I forgot to increment the depth while passing it:

void func(int depth)
{
  if(depth==10)
  {
    printf("reached depth of %d, now returning back",depth);
      return;
  }
  printf("at depth %d\n",depth);
  func(depth);//notice the lack of +1 here
  printf("returning from depth of %d\n",depth);
}
int main()
{
  func(0);
  return 0;
}

This code will run until eternity as the condition that ends the recursion is never met.
Okay, so this covers the basic terminology and few techniques.

As an advanced example, lets write a program to find all length n permutations of the digits 1,2,3 using recursion.

int permutationcount=0;
void permute(int atdepth, int n, int selections[])
{
  if(atdepth==n)
  {
    int i;
    printf(“Permutation %d: “,permutationcount);
    for(i=0;i<n;i++)
      printf(“%d “,selections[i]);
    printf(“\n”);
    permutationcount++;
    return;
  }
  selections[atdepth]=1;
  permute(atdepth+1,n,selections); // select 1 for current position
  selections[atdepth]=2;
  permute(atdepth+1,n,selections); // select 2 for current position
  selections[atdepth]=3;
  permute(atdepth+1,n,selections); // select 3 for current position
  return;
}
int main()
{
  int a[1000],n;
  scanf(" %d",&n);
  permute(0,n,a);
  return 0;
}

See the following image to understand the order of function calls:
recursion_permute

A few important observations:
1. An instance of the recursive function permute() at depth d sets the value of selections[d], i.e. the (d+1)th position in the final permutation. And then calls permute() again to compute the rest of the permutation.
2. Each instance of permute() contains three recursive calls within it. Each one represents a different selection of number for the current position.
3. When a depth of n is reached, n numbers have been set in the selections[] array. And the array is printed.
4. A recursion depth of n is reached 3^n times. Because there are 3^n permutations possible.
5. Global variables are preserved among all the recursive instances of permute(). The variable permutationcount is used to keep track of the number of permutations.

For a particular instance of permute(), the call at line 17 will be executed only when the call at line 15 (along with all the subsequent calls that arise from inside it) complete execution.
And the call at line 19 will be executed only when the calls at lines 15 and 17 (along with all their respective subsequent calls) complete execution.

Recursion may seem overwhelming at first and it may take time for you to get used to it. But once you get the hang of it, there’s nothing more wonderfully magic in programming than recursive functions. Don’t give up on them just because they seem difficult at first sight.