First of all, I’d like to apologize for not having posted anything at all since so many weeks. I’ll try to be fairly regular from now.
The topic that I am going to discuss now is called Bit Masking. This technique is useful when you have a set and have to select a subset from it that satisfies a particular condition. The following would be a nice example. It is called the ‘Subset Sum Problem’.
You have n coins with you; each coin has a particular value. You want to select some coins such that the sum of value of the selected coins is m.
So, each coin can have one of two fates: either it is selected or it is not. Lets solve this problem for n=4, m=10 and the value of coins as 2,3,5,7. There are a total of 2^4=16 subsets possible. In other words, there are 16 ways of selecting coins.
Let me draw your attention to an interesting fact. When you count from 0 to 15 in binary, you obtain all the possible length 4 permutations of 0s and 1s. We shall use this fact to generate all the subsets of {2,3,5,7}. Each of the numbers from 0 to 15 will represent a different subset of the set {2,3,5,7}. For instance, the number 14 (1110 in binary) represents the subset {2,3,5}, the number 10 (or 1010 in binary) represents the subset {2,5}, and 15 (1111 in binary) represents {2,3,5,7} and 0 (or 0000 ) represents {}. So, if the ith bit in the binary representation is 1, then the ith number in the set is selected otherwise it is rejected.
Just for clarity, I’m listing below the subset that each number represents:
Initial set= {2,3,5,7}
Number | Its Binary | Subset it represents |
0 | 0000 | {} |
1 | 0001 | {7} |
2 | 0010 | {5} |
3 | 0011 | {5,7} |
4 | 0100 | {3} |
5 | 0101 | {3,7} |
6 | 0110 | {3,5} |
7 | 0111 | {3,5,7} |
8 | 1000 | {2} |
9 | 1001 | {2,7} |
10 | 1010 | {2,5} |
11 | 1011 | {2,5,7} |
12 | 1100 | {2,3} |
13 | 1101 | {2,3,7} |
14 | 1110 | {2,3,5} |
15 | 1111 | {2,3,5,7} |
So, to find the answer for the initial problem, all we have to do is iterate a variable (say, i) from 0 to 15; during each iteration find the subset that i represents and then check if it satisfies the condition.
The code will look something like
#include<stdio.h> int main() { int n,coins[10000],j; long long m; scanf(" %d %lld",&n,&m); long long last=(1<<n)-1; // generalizing, we have iterate from 0 to 2^n - 1 long long i,temp,currentsum; i=0; while(i<n) { scanf(" %d",&coins[i]); i++; } for(i=0;i<=last;i++) { temp=i; //stored i in a different variable to save it from modification currentsum=0; // to keep track of the sum of values of selected coins j=0; /*-----SUBSET GENERATION PROCESS START-----*/ printf("Checking subset : { "); while(j<n) // for each bit of temp { if(temp%2==1) //examining the last bit (LSB) of temp. If it is 1, we take the coin { printf("%d ",coins[j]); currentsum+=(coins[j]); } temp>>=1; //right shifting temp, so that we may analyze the next least significant bit in the next iteration of while loop j++; } /*-----SUBSET GENERATION PROCESS OVER-----*/ printf("} Sum= %lld",currentsum); /*-----CONDITION EVALUATION FOR THE CURRENT SUBSET-----*/ if(currentsum==m) { printf("Found solution subset"); //break; } /*-----CONDITION EVALUATION OVER-----*/ printf("\n"); } return 0; }
The body of the outer ‘for’ loop can be changed in accordance to what the problem is. In this case, we just needed the sum of numbers of each subset, so I did a running total of the value of coins selected. And when we obtain the sum of that subset, we check if it is equal to m. I’ve added printfs in the code at various places to make things clearer. Try it out by running it on your machine.
There is an obvious drawback to this method. As we are examining all possible subsets, this is a brute-force approach. If n becomes greater than 25 this method will be very slow. And, of course, we also have the limitation that we cannot have an integer larger than 64 bits in C/C++, so this method will not work for n greater than 64 ( though, as mentioned earlier, the method already becomes impractical at around n=25) .
There exists no efficient algorithm for Subset Sum problem. We may optimize our algorithm in many different ways, but a perfect solution has not yet been discovered. All algorithms that exist currently will become slower and, therefore, inefficient very quickly as n increases. Bit Masking is one of the many solutions that exist currently. There are many other, more efficient solutions (comparatively more efficient) that are quite complex to be discussed currently.
In a nutshell, this method is very easy to code and is very good for programming competitions if n is small enough. There is an alternate method very popular in competitive programming called ‘Backtracking’ which I’ll discuss in future.
If you want to practice bit masking then follow these links:
http://www.codechef.com/problems/MARCHA1 (Simple)
http://www.codechef.com/problems/LEALCO (Medium difficulty)
NOTE:
1. For those who are comfortable with the notation of complexity, the computational complexity of this algorithm is O(n*2^n)
2. Lines 26 and 27 in the code may just as well been
printf("%d ",coins[n-j-1]); currentsum+=(coins[n-j-1]);
because it doesn’t matter if you start counting the bits from the left or from the right
temp>>=1;
this appears at line 29. Is that correct? I think it should be >> only.
sorry! Does that mean, temp=temp>>1! but do you need to assign explicitly?
It is shorthand for temp=temp>>1 . And yes, we do need to assign temp>>1 to temp explicitly
And at line 22, if you want to examine all the bits, the condition in the while loop should be something different, like temp!=0 or j<noOfBitsInAnInteger, shouldn't it?. Great blog though!
Thanks 🙂
The number of bits in temp can be at most n, that’s why I used j<n.
temp!=0 will work just as well.
Thanks a lot!
In line 29, why is the bit right shifted when the next LSB will be present to the left of the last LSB?
also could you explain line 7??
thanks a lot.
(1<<n)-1 is equal to (2^n) -1.
As you know, for n numbers there are 2^n subsets. Each of these subsets can be represented by an integer from 0 to (2^n)-1.
The first subset will be represented by 0 ( it will be the empty subset) and the last subset will be (2^n)-1 ( it will be the subset with all numbers in it ).
Right shifting is done so that the concerned bit is always present at the Least Significant Bit ( rightmost bit ) position.
By performing temp>>=1, the current LSB is removed and the next LSB is shifted to the rightmost position, thus enabling us to extract it in the next iteration by temp%2.
Great, just the explanation I needed 🙂 cheers.
Great effort.. Keep it up.