notepad of ttan: A Notes to Hacker's Delight

Hacker's delight is a interesting book. The only problem is it skiped many steps and hard to follow. For example, one of the topic is how to cout the number of binary 1s for a unsigned interger?

1. Easy answer start from here
unsigned int CountBitOne(unsigned int value)
{
unsigned int count = 0;
while(value)
{
count += value & 1;
value >>= 1;
}
return count;
}
Note: don't use for loop! That's not efficient as while loop. Think about the sparse 1 case.

2. A well know trick.
unsigned int CountBitOne(unsigned int x)
{
unsigned int count = 0;
while(x)
{
++count;
x &= (x -1);
}
return count;
}
It is easy to prove x&(x-1) reset the least significiant bit of 1 of value. Suppose we have 8bits interger value
x b'XXXX1000 (Capital X means either 1 or 0)
x-1 b'XXXX0111
x&(x-1) b'XXXX0000

3. Can you do more optimization? Look up table is a good, and probbaly the fast solution, just make sure your lookup table is always in cache.

4. Method 1 and 2 count one bit at one time. Can we do more bits at the same time?
Divide and Conquer Algorithm. Suppose there is a 8bit integer 213(11010101 in binary), the algorithm works like this(each time merge two neighbor blocks):
+-------------------------------+
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | <- x
| 1 0 | 0 1 | 0 1 | 0 1 | <- first time merge
| 0 0 1 1 | 0 0 1 0 | <- second time merge
| 0 0 0 0 0 1 0 1 | <- third time ( answer = 00000101 = 5)
+-------------------------------+

uint32_t CountBitOne(uint32_t x)
{
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >> 16)& 0x0000FFFF);
return x;
}
I changed the parameter's type to uint32_t since the introduction of magic binary numbers.

5. Divide and conqure method provided in Hacker's delight
https://books.google.com/books?id=iBNKMspIlqEC&pg=PA66&hl=en#v=onepage&q&f=false

uint32_t pop(uint32_t x)
{
x = x - ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x + (x >> 4)) & 0x0F0F0F0F;
x = x + (x >> 8);
x = x + (x >> 16);
return x & 0x0000003F;
}

The first line of this algorith has the same effect as method 4, proved by truth table
---------------------------------------------
| v | (v >> 1) & 0b0101 | v - x |
---------------------------------------------
0b00 0b00 0b00
0b01 0b00 0b01
0b10 0b01 0b01
0b11 0b01 0b10

6. Last one is the so called "fastest way—without using lookup tables and popcount.", copied from
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
It counts the set bits with just 12 operations.
uint32_t popcount(uint32_t v) {
v = v - ((v >> 1) & 0x55555555); // put count of each 2 bits into those 2 bits
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // put count of each 4 bits into those 4 bits
return ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
The trick is to multiply the result by 0b10101010 which has an interesting property. If our number has four bytes, A B C D, it will result in a new number with these bytes A+B+C+D B+C+D C+D D. A 4 byte number can have maximum of 32 bits set, which can be represented as 0b00100000.

All we need now is the first byte which has the sum of all set bits in all the bytes, and we get it by >> 24.

7. Intel x86 CPU has a POPCOUNT in SSE instruction set to count the number of 1s. On the GNU compiler you can just use:
int __builtin_popcount (unsigned int x);
In the worst case the compiler will generate a call to a function. In the best case the compiler will emit a cpu instruction to do the same job faster. Unfortunately there is no equivelant on ARM yet.

8. Why I have to count 2 bits by 2 bits? A very interesting method has been developed at the MIT in the 1970's:
int bitcount(unsigned int n)
{
register unsigned int tmp;

tmp = n - ((n >> 1) & 033333333333)
- ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}
MIT HAKMEM Count is funky. Consider a 3 bit number as being 4a+2b+c. If we shift it right 1 bit, we have 2a+b. Subtracting this from the original gives 2a+b+c. If we right-shift the original 3-bit number by two bits, we get a, and so with another subtraction we have a+b+c, which is the number of bits in the original number. How is this insight employed? The first assignment statement in the routine computes tmp. Consider the octal representation of tmp. Each digit in the octal representation is simply the number of 1's in the corresponding three bit positions in n. The last return statement sums these octal digits to produce the final answer. The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulus 63. This is accomplished by right-shifting tmp by three bits, adding it to tmp itself and ANDing with a suitable mask. This yields a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1's among those six positions in n. This number modulo 63 yields the final answer. For 64-bit numbers, we would have to add triples of octal digits and use modulus 1023. This is HACKMEM 169, as used in X11 sources. Source: MIT AI Lab memo, late 1970's.

notepad of ttan

prettify

Oct 15, 2015

A Notes to Hacker's Delight

No comments:

Post a Comment