graph

Algorithm #9 : Depth- and Breadth- First Search

This post is about the graph traversal algorithms, Depth First Search (DFS) and Breadth First Search (BFS).
BFS and DFS are one of the algorithms for graph exploration. Graph exploration means discovering the nodes of a graph by following the edges. We start at one node and then follow edges to discover all nodes in a graph. The choice of first node may be arbitrary or problem specific.

The difference between BFS and DFS is the order in which the nodes of a graph are explored.
If you are not already familiar with BFS and DFS in theory, I recommend that you read about them. Because I’m going to focus more on their implementation here.

In a nutshell, DFS continues on one path and explores it completely before going down another path.
But in BFS we progress equally in all possible paths.

The following gifs will give you a good general idea about the two.
Here, the nodes are numbered according to the order in which they are explored.

Depth First Search
DFS

Breadth First Search
BFS

[Both the images were taken from commons.wikimedia.org]

IMPLEMENTATION:
We can use any of the four graph representation methods that I introduced in my post Representation of Graphs. In this post we’ll use Adjacency list and assume that the input is the edges in form of pairs of positive integers (i.e. Type 2, if you refer to my post). The nodes are numbered from 1 to n.

For DFS:
DFS has to be implemented using a stack data structure. As recursion uses the internal stack, we can use recursion as follows:

int adjlist[101][101]={0};
int degree[101]={0};
int done[101]={0};//this array marks if a node has already been explored
void dfs(int at)
{
	if(done[at]==1)//if the node has already been explored, then return
		return;
	printf("At node %d\n",at);
	done[at]=1;
	int i=0;
	while(i<degree[at])//for each of the edges on this node
	{
		dfs(adjlist[at][i]);
		i++;
	}
	return;
}
int main()
{
    int n,m,i,a,b;
    scanf(" %d %d",&n,&m);
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjlist[a][degree[a]]=b;
        degree[a]++;
        adjlist[b][degree[b]]=a;
        degree[b]++;
        i++;
    }
    dfs(1);//start with any node. node 1 is the first node here
    return 0;
}

For BFS:
BFS needs a Queue data structure for its implementation. Here I use an array queue[] and integers front and rear to implement Queue.


int main()
{
    int n,m,i,a,b;
    int adjlist[101][101]={0};
    int degree[101]={0};
    scanf(" %d %d",&n,&m);
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjlist[a][degree[a]]=b;
        degree[a]++;
        adjlist[b][degree[b]]=a;
        degree[b]++;
        i++;
    }
    int queue[101],front=0,rear=0;
    int done[101]={0};//this array marks if a node has already been explored
    int at;
    queue[rear]=1;//start with any node. node 1 is the first node heres
    rear++;
    done[1]=1;
    while(front!=rear)
    {
        at=queue[front];
        printf("At node %d\n",at);
        front++;
        for(i=0;i<degree[at];i++)
        {
            if(done[adjlist[at][i]]!=1)
            {
                queue[rear]=adjlist[at][i];
                rear++;
                done[adjlist[at][i]]=1;
            }
        }
    }
    return 0;
}

The array done[] is used to mark the nodes that have already been visited. This has to be done to stop the code from re-discovering already visited nodes and running forever.

Above is a bare-bones implementation of the two algorithms. They do nothing more than exploring the graph. Apart from only exploring the graph, DFS and BFS can also be used to compute other information too.
For example, if we have a tree as input, we can modify the above DFS code to compute the depth of each node in the tree, and also the size of the sub-tree rooted at each node.

int adjlist[101][101]={0};
int degree[101]={0};
int depth[101]={0};
int sizeofsubtree[101]={0};
int done[101]={0};//this array marks which node has already been explored
int dfs(int at,int currentdepth)
{
	if(done[at]==1)//if the node has already been explored, then return
		return 0;
	depth[at]=currentdepth;
	printf("Node %d at depth %d\n",at,depth[at]);
	done[at]=1;
	int i=0,size=1;//initialised to 1 as current node is also part of the sub-tree rooted at current node
	while(i<degree[at])//for each of the edges on this node
	{
		size+=dfs(adjlist[at][i],currentdepth+1);
		i++;
	}
	sizeofsubtree[at]=size;
	return sizeofsubtree[at];
}
int main()
{
    int n,m,i,a,b;
    scanf(" %d %d",&n,&m);
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjlist[a][degree[a]]=b;
        degree[a]++;
        adjlist[b][degree[b]]=a;
        degree[b]++;
        i++;
    }
    dfs(1,0);//start with root node. assuming that node 1 is the root node here
    i=1;
    printf("Depth of subtrees:\n");
    while(i<=n)
    {
        printf("Rooted at %d: %d\n",i,sizeofsubtree[i]);
        i++;
    }
    return 0;
}

A second variable currentdepth is passed to each dfs() instance that represents the depth of the current node. Notice that at line 16, currentdepth+1 has been passed to dfs(); because child of the current node has depth one more than the parent node.

Each recursive instance of dfs() returns the size of the sub-tree rooted at a node.
At every node, we sum up the values returned by dfs() for each child node ( This is done at line 16 ). The size of a sub-tree rooted at a node is the summation of sizes of sub-trees rooted at its children + 1. This is how the size of all sub-trees is computed.

COMPARISION BETWEEN DFS AND BFS:
If all the nodes of a graph have to be discovered, then BFS and DFS both take equal amount of time. But, if we want to search for a specific node, both algorithms may differ in execution time.

DFS is more risky compared to BFS. If a node has more than one edge leading from it, the choice of which edge to follow first is arbitrary.
As we don’t have any intelligently way of choosing which edge to follow first, it may be possible that the required node is present down the first edge that we choose and it is also possible that the required node is present down the last edge that we choose from that node. In the former case, DFS will find the node very quickly, but in the latter case DFS will take a lot of time. If we take a wrong path at some node, DFS will have to completely traverse the whole path before it can go down another path. That’s why DFS is more risky than BFS.

In BFS, all paths are explored equally. So, in some cases the search may be a little slower than DFS but the advantage of BFS is that it doesn’t arbitrary favor one path over the other.

BFS is very useful in problems where you have the find the shortest path. This is because BFS explores closer nodes first. So, when we find the node the first time, we can be sure that this is the shortest path to it. Whereas in DFS, we’ll have to find all possible paths and then select the shortest path.
BFS can also be used in checking if the graph is bipartite.

DFS is useful in problems where we have to check connectivity of graph and in topological sorting.

Suppose we have an infinite graph. If we use DFS to find a specific node, the search will never end if the node is not in the first path that the algorithm chooses. But, given sufficient time, BFS will be able to find it.

COMPLEXITY OF BFS AND DFS:
The complexity of DFS and BFS is O(E), where E is the number of edges.
Of course, the choice of graph representation also matters. If an adjacency matrix is used, they will take O(N^2) time (N^2 is the maximum number of edges that can be present). If an adjacency list is used, DFS/BFS will take O(E) time.

Related Problems:

Codechef : Fire Escape Routes
Hackerearth : Quidditch Practice Problem
Hackerearth : Close call

Advertisements

Representation of Graphs

Graph is a very versatile data structure. It can be used to solve a large number of problems. You’d be surprised to see how easy the problems become once they are interpreted in terms of a graph.

Before I can start with graph algorithms like DFS, BFS, Disjoint Set Union, Segment trees, etc. it is important to know how a graph can be represented in memory.

I’m assuming that you have knowledge of elementary graph theory and the concepts of node, edge, tree etc.

Two things are essential to a graph: its nodes and edges. As you know, each edge connects two nodes.
For all future references lets take N as the number of nodes and M as the number of edges. Each edge is of the form (a,b) denoting that nodes a and b are connected (1<=a,b<=N ) .
Following are two ways to store a graph. I’ll show you how to implement them:

1. Adjacency Matrix: Use a 2-D array of size N x N. Let’s call this matrix adjmat[N][N]. adjmat[i][j]=1 iff there is an edge from i to j.
For every undirected edge (a,b) in the graph, mark adjmat[a][b]=1 and adjmat[b][a]=1. If you have a directed edge from a to b, then mark only adjmat[a][b]=1.

2. Adjacency List: Use a 1-D array for each node. For every node, store in that array the nodes that it is connected to. So we’ll have N 1-D arrays, one for each node.

Lets take the following graph as example:
graph
The adjacency matrix would be:

adjmat

and the adjacency list would be:
adjlist

The adjacency list is a more efficient way to store information about a graph. It consumes lesser memory and is more time efficient as compared to adjacency matrix. The reason for this is that the adjacency matrix has one element in adjmat[][] for each possible edge in the graph i.e. N^2 possible edges. What if we have a tree? In case of a tree we have only N-1 edges. We don’t need to use N^2 amount of space just to store info about N-1 edges.
In such cases adjacency list is useful. Adjacency list can be implemented using a linked list OR dynamic array. And it consumes only that much space that is required.

Using an adjacency list also leads to lesser number of iterations in later operation (eg. during DFS, BFS, etc.). This is because the number of entries in adjacency list is 2 X M . But in case of adjacency matrix it is always N^2.

Of course in a situation where the number of edges is close to N^2, both methods will perform equally well.

In competitive programming competitions, most of the time a graph can be given as input in two ways. (Assume undirected graph)

1. The problem will give you N, and N lines will follow. In the beginning of ith line will be Di (Degree of ith node) and a list of the nodes that the ith node is connected to.
For the example graph given above, the input would look like:
5
3 2 3 4
1 1
2 1 4
2 1 3
0

2. The problem will give you N and M, and each of the next M lines will contain two integers a and b denoting an edge between nodes a and b.
For the example graph the input would be like this:
5 4
1 2
1 3
1 4
3 4

Both can be handled with either Adjacency Matrix or Adjacency List.

Type-1 with Adjacency Matrix:

int main()
{
    int n,i,j,temp;
    scanf(" %d",&n);
    int adjmat[101][101]={0};//size of this matrix should be (N+1) X (N+1) because we are using 1 based numerbing for nodes
    int degree[101]={0};
    i=1;
    while(i<=n)
    {
        scanf(" %d",&degree[i]);
        j=0;
        while(j<degree[i])
        {
            scanf(" %d",&temp);
            adjmat[i][temp]=1;
            adjmat[temp][i]=1;
            j++;
        }
        i++;
    }
    return 0;
}

Type 1 with adjacency list:

int main()
{
    int n,i,j,temp;
    scanf(" %d",&n);
    int *adjlist[101];
    int degree[101];
    i=1;
    while(i<=n)
    {
        scanf(" %d",&degree[i]);
        adjlist[i]=(int *)malloc(sizeof(int)*degree[i]);
        j=0;
        while(j<degree[i])
        {
            scanf(" %d",&temp);
            adjlist[i][j]=temp;
            j++;
        }
        i++;
    }
    return 0;
}

Type 2 with adjacency matrix:

int main()
{
    int n,m,i,a,b;
    scanf(" %d %d",&n,&m);
    int adjmat[101][101]={0};
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjmat[a][b]=1;
        adjmat[b][a]=1;
        i++;
    }
    return 0;
}

Type2 with adjacency list:

int main()
{
    int n,m,i,a,b;
    scanf(" %d %d",&n,&m);
    int adjlist[101][101]={0};
    int degree[101]={0};
    i=0;
    while(i<m)
    {
        scanf(" %d %d",&a,&b);
        adjlist[a][degree[a]]=b;
        degree[a]++;
        adjlist[b][degree[b]]=a;
        degree[b]++;
        i++;
    }
    return 0;
}

The above implementation for Type 2 with Adjacency List doesn’t use dynamic arrays ( as it would involve a lot of realloc() and would get complicated ). So it still uses a lot of memory. But the advantage of lesser iterations for later operations, as mentioned before, is still there.

If you want to use dynamic arrays, study about C++ STL Vectors. Vectors are dynamic arrays. I recommend that you read the C++ STL documentation for vectors as vectors are incredibly useful and you don’t need to worry about malloc, realloc etc. as that is handled automatically by vectors.
Follow this link. (For now, study about push_back(), size(), pop_back(), begin() and end() only)

In the next post, I’ll write about how to implement DFS and BFS.