KEMBAR78
Bit Hacks: 6.172 Performance Engineering of Software Systems | PDF | Computer Architecture | Numbers
0% found this document useful (0 votes)
104 views70 pages

Bit Hacks: 6.172 Performance Engineering of Software Systems

Bithacking Doc

Uploaded by

sbatati
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
104 views70 pages

Bit Hacks: 6.172 Performance Engineering of Software Systems

Bithacking Doc

Uploaded by

sbatati
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 70

!

6.172
Performance !"##$*
Engineering %&'&(
of Software
Systems

"#)*+)$#)*+,*-./01*

LECTURE 3
Bit Hacks
Julian Shun

1
© 2008–2018 by the MIT 6.172 Lecturers
Binary Representation
Let x = ⟨xw–1xw–2…xO⟩ be a w-bit computer word. The
unsigned integer value stored in x is
The prefix Ob
Y਷
designates a
Z  ం ZM  Y
M
Boolean constant.
Mį
For example, the 8-bit word O b 1 O O 1 O 1 1 O represents
the unsigned value 15O = 2 + 4 + 16 + 128.

The signed integer (two’s complement) value stored


in x is
sign bit
Y਷
Z  ৃం ZM M ৄ ਷ ZY਷ Y਷ Y
Mį
For example, the 8-bit word O b 1 O O 1 O 1 1 O
represents the signed value –1O6 = 2 + 4 + 16 – 128.
2
© 2008–2018 by the MIT 6.172 Lecturers
Two’s Complement

We have ObOO…O = O.

What is the value of x = Ob11...1?


Y਷
Z  ৃం ZM M ৄ ਷ ZY਷ Y਷
Mį
Y਷
 ৃం M ৄ ਷ Y਷
Mį

 Y਷ ਷ 
਷ Y਷
 ਷ Y

3
© 2008–2018 by the MIT 6.172 Lecturers
Complementary Relationship

Important identity
Since we have x + ~x = -1, it follows that
-x = ~x + 1.

Example
x = ObO11O11OOO
~x = Ob1OO1OO111
–x = Ob1OO1O1OOO

4
© 2008–2018 by the MIT 6.172 Lecturers
Binary and Hexadecimal
Decimal Hex Binary Decimal Hex Binary
! ! !!!! " " #!!!
# # !!!# $ $ #!!#
% % !!#! #! & #!#!
' ' !!## ## ( #!##
) ) !#!! #% * ##!!
The prefix !1 + !#!# #' , ##!#
designates a - !##! #) . ###!
hex constant. / !### #+ 0 ####

To translat from hex to binary, translate each hex


digit to its binary equivalent, and concatenate the bits.
Example: 0xDEC1DE2CODE4F00D is
įįįįįįįįįįįįįįįįįįįįįįįįįįįįįįįį
।।।।।।।।।।।।।।।।
& ' %  & '  % į & '  ( į į &
5
© 2008–2018 by the MIT 6.172 Lecturers
C Bitwise Operators
Operator Description
& AND
| OR
^ XOR (exclusive OR)
~ NOT (one’s complement)
<< shift left
>> shift right

Examples (8-bit word)


A = Ob1O11OO11
B = ObO11O1OO1
A&B = O b O O 1 O O O O 1 ~A = O b O 1 O O 1 1 O O
A|B = O b 1 1 1 1 1 O 1 1 A >> 3 = O b O O O 1 O 1 1 O
A^B = O b 1 1 O 1 1 O 1 O A << 2 = O b 1 1 O O 1 1 O O
6
© 2008–2018 by the MIT 6.172 Lecturers
Set the kth Bit
Problem
Set !th bit in a word "%to #.

Idea
Shift and OR.
,%*%"%'%(#%&&%!)-%

Example
!%*%+%
"% #$####$#$
%%# # $ # # $ # %
#%&&%!% $$$$$$$$#
%%$ $ $ $ $ $ $ %
"%'%(#%&&%!)% #$####$##
%%# # $ # # $ # %

7
© 2008–2018 by the MIT 6.172 Lecturers
Clear the kth Bit
Problem
Clear the !th bit in a word ".

Idea
Shift, complement, and AND.
-%+%"%*%'(#%&&%!).%

Example
!%+%,%
"% #$####$##
%%# # $ # # $ # %
#%&&%!% $$$$$$$$#
%%$ $ $ $ $ $ $ %
'(#%&&%!)% ########$
%%# # # # # # # %
"%*%'(#%&&%!)% #$####$#$
%%# # $ # # $ # %

8
© 2008–2018 by the MIT 6.172 Lecturers
Toggle the kth Bit
Problem
Flip the !th bit in a word ".

Idea
Shift and XOR.
,%*%"%'%(#%&&%!)-%

Example ($%! #)
!%*%+%
"% #$####$#$
%%# # $ # # $ # %
#%&&%!% $$$$$$$$#
%%$ $ $ $ $ $ $ %
"%'%(#%&&%!)% #$####$##
%%# # $ # # $ # %

9
© 2008–2018 by the MIT 6.172 Lecturers
Toggle the kth Bit
Problem
Flip the !th bit in a word ".

Idea
Shift and XOR.
*%+%"%'%(#%&&%!),%

Example (#%! $)
!%+%-%
"% #$####$##
%%# # $ # # $ # %
#%&&%!% $$$$$$$$#
%%$ $ $ $ $ $ $ %
"%'%(#%&&%!)% #$####$#$
%%# # $ # # $ # %

10
© 2008–2018 by the MIT 6.172 Lecturers
Extract a Bit Field
Problem
Extract a bit field from a word !.

Idea
Mask and shift.
1!()($%&'2(**(&+,-.3(

Example
&+,-.(/(0(

!( "#""""
(#"#"
("#""#"(
$%&'( #####"
("""#
(######(
!()($%&'( #####"
(#"##
(######(
!()($%&'(**(&+,-.( ############(
"#"#(

11
© 2008–2018 by the MIT 6.172 Lecturers
Set a Bit Field
Problem
Set a bit field in a word !)to a value ".

Idea
Invert mask to clear, and OR the shifted value.
!),)-!)*)+%&'(.)/)-")00)'1234.5)

Example
'1234),)6)
!) #$####
)$#$#
)#$##$#)
") $$$$$$$$$$$$)
$$##)
%&'() $$$$$#
)###$
)$$$$$$)
!)*)+%&'() #$###$
)$$$#
)#$##$#)
!),)-!)*)+%&'(.)/)-")00)'1234.5) #$###$
)$###
)#$##$#)
12
© 2008–2018 by the MIT 6.172 Lecturers
Set a Bit Field
Problem
Set a bit field in a word !)to a val
For safety’s sake:
Idea --")00)'1234.)*)%&'(.)

Invert mask to clear, and OR the shi d value.


!),)-!)*)+%&'(.)/)-")00)'1234.5)

Example
'1234),)6)
!) #$####
)$#$#
)#$##$#)
") $$$$$$$$$$$$)
$$##)
%&'() $$$$$#
)###$
)$$$$$$)
!)*)+%&'() #$###$
)$$$#
)#$##$#)
!),)-!)*)+%&'(.)/)-")00)'1234.5) #$###$
)$###
)#$##$#)
13
© 2008–2018 by the MIT 6.172 Lecturers
Ordinary Swap
Problem
Swap two integers # and %.
! " #$
# " %$
% " !$

14
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap
Problem
Swap ! and $ without using a temporary.
! " ! # $%
$ " ! # $%
! " ! # $%

Example
! &'&&&&'&
$ ''&'&&&'

15
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap
Problem
Swap !"and %"without using a temporary.
!"#"!"$"%&"
%"#"!"$"%&"
!"#"!"$"%&"

Example
!" '(''''('" '(('((''"
%" (('('''(" (('('''("

16
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap
Problem
Swap !&and "&without using a temporary.
!&#&!&$&"%&
"&#&!&$&"%&
!&#&!&$&"%&

Example
!& '(''''('& '(('((''& '(('((''&
"& (('('''(& (('('''(& '(''''('&

17
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap
Problem
Swap !(and $(without using a temporary.
!(%(!(&($'(
$(%(!(&($'(
!(%(!(&($'(

Example
!( "#""""#"( "##"##""( "##"##""( ##"#"""#(
$( ##"#"""#( ##"#"""#( "#""""#"( "#""""#"(

18
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap
Problem
Swap ! and $ without using a temporary.
! " ! # $%
$ " ! # $%
! " ! # $%

Example
! &'&&&&'& &''&''&& &''&''&& ''&'&&&'
$ ''&'&&&' ''&'&&&' &'&&&&'& &'&&&&'&

Why it works !# "# !#$#"# %!#$#"&#$#"#

XOR is its own inverse: ' ' ' '


' & & '
(! # $) # $ ! !
& ' & &
& & ' &
19
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap (Instant Replay)
Problem
Swap ! and $ without using a temporary.
! " ! # $%
$ " ! # $%
! " ! # $%

Example
! &'&&&&'&
$ ''&'&&&'

Why it works !# "# !#$#"# %!#$#"&#$#"#

XOR is its own inverse: ' ' ' '


' & & '
(! # $) # $ ! !
& ' & &
& & ' &
20
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap (Instant Replay)
Problem
Swap !"and %"without using a temporary.
Mask with 1’s !"#"!"$"%&"
where bits differ. %"#"!"$"%&"
!"#"!"$"%&"

Example
!" '(''''('" '(('((''"
%" (('('''(" (('('''("

Why it works !# "# !#$#"# %!#$#"&#$#"#

XOR is its own inverse: ( ( (" ("


( ' '" ("
)!"$"%*"$"%"! !"
' ( '" '"
' ' (" '"
21
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap (Instant Replay)
Problem
Swap !&and $&without using a temporary.
!&"&!&#&$%&
Flip bits in $&that $&"&!&#&$%&
differ from !. !&"&!&#&$%&

Example
!& )*))))*)& )**)**))& )**)**))&
$& **)*)))*& **)*)))*& )*))))*)&

Why it works !# "# !#$#"# %!#$#"&#$#"#

XOR is its own inverse: * * *& *&


* ) )& *&
'!&#&$(&#&$&! !&
) * )& )&
) ) *& )&
22
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap (Instant Replay)
Problem
Swap !&and $&without using a temporary.
!&"&!&#&$%&
Flip bits in !&that $&"&!&#&$%&
differ from $. !&"&!&#&$%&

Example
!& '(''''('& '(('((''& '(('((''& (('('''(&
$& (('('''(& (('('''(& '(''''('& '(''''('&

Why it works !# "# !#$#"# %!#$#"&#$#"#

XOR is its own inverse: ( ( (& (&


( ' '& (&
)!&#&$*&#&$&! !&
' ( '& '&
' ' (& '&
23
© 2008–2018 by the MIT 6.172 Lecturers
No-Temp Swap (Instant Replay)
Problem
Swap " and $ without using a temporary.
" & " # $'
$ & " # $'
" & " # $'

Example
" ()(((()( ())())(( ())())(( ))()((()
$ ))()((() ))()((() ()(((()( ()(((()(

Why it works
XOR is its own inverse: !" # $% # $ ! "
Performance !
Poor at exploiting instruction-level parallelism (ILP).
24
© 2008–2018 by the MIT 6.172 Lecturers
Minimum of Two Integers
Problem
Find the minimum ! of two integers " and #.

$% &" ' #(
! ) "*
+,-+ or ! ) &" ' #( . " / #*
! ) #*

Performance
A mispredicted branch empties the processor pipeline.
Caveat
The compiler is usually smart enough to optimize away
the unpredictable branch, but maybe not.
25
© 2008–2018 by the MIT 6.172 Lecturers
No-Branch Minimum
Problem
Find the minimum ! of two integers " and # without
using a branch.
! $ # % &&" % #' ( )&" * #''+

Why it works
! The C language represents the Booleans TRUE and
FALSE with the integers , and -, respectively.
! If " * #, then )&" * #' ! .,, which is all ,’s in
two’s complement representation. Therefore, we
have # % &" % #' ! ".
! If " " #, then )&" * #' ! -. Therefore, we have
# % - ! #.
26
© 2008–2018 by the MIT 6.172 Lecturers
Merging Two Sorted Arrays
!"#"$% &'$( )*+,*-.'/, 0 11+*!"+$%" 23
.'/, 0 11+*!"+$%" 43
.'/, 0 11+*!"+$%" 53
!$6*1" /#3
!$6*1" /78 9
:;$.* -/# < = >> /7 < =8 9
$? -04 @A 058 9
02BB A 04BBC /#DDC
E *.!* 9
02BB A 05BBC /7DDC
E
E
:;$.* -/# < =8 9
02BB A 04BBC
/#DDC
E
:;$.* -/7 < =8 9
02BB A 05BBC J FI FG HK
/7DDC
E
E
H FH IF IJ
27
© 2008–2018 by the MIT 6.172 Lecturers
Branching
!"#"$% &'$( )*+,*-.'/, 0 11+*!"+$%" 23
.'/, 0 11+*!"+$%" 43
.'/, 0 11+*!"+$%" 53
!$6*1" /#3
4 !$6*1" /78 9
:;$.* -/# < = >> /7 < =8 9
3 $? -04 @A 058 9
02BB A 04BBC /#DDC
E *.!* 9
Branch Predictable?
E
02BB A 05BBC /7DDC
1 Yes
?
E
2 :;$.* -/# < =8 9 2 Yes
?
02BB A 04BBC
/#DDC 3 No
?
E
1 :;$.* -/7 < =8 9 4 Yes
?
02BB A 05BBC
/7DDC
E
E

28
© 2008–2018 by the MIT 6.172 Lecturers
Branchless
!"#"$%H&'$(H)*+,*-.'/,H0H11+*!"+$%"H23H
.'/,H0H11+*!"+$%"H43H
.'/,H0H11+*!"+$%"H53H
!$6*1"H/#3H
!$6*1"H/78H9H
:;$.*H-/#H<H=H>>H/7H<H=8H9H
.'/,H%)?H@H-04HA@H058BH
.'/,H)$/H@H05HCH--05HCH048H>H-D%)?88BH
02EEH@H)$/BH
4HE@H %)?BH/#HD@H %)?BH
5HE@HF%)?BH/7HD@HF%)?BH
GH This optimization works well on
:;$.*H-/#H<H=8H9H some machines, but on modern
02EEH@H04EEBH
/#DDBH machines using %.#/,HD=I, the
GH branchless version is usually
:;$.*H-/7H<H=8H 9H
02EEH@H05EEBH slower than the branching
/7DDBH version. ! Modern compilers
GH
GH can perform this optimization
better than you can!
29
© 2008–2018 by the MIT 6.172 Lecturers
Why Learn Bit Hacks?

Why learn bit hacks if they don’t even work?


• Because the compiler does them, and it will help to
understand what the compiler is doing when you
look at the assembly code.
• Because sometimes the compiler doesn’t optimize,
and you have to do it yourself by hand.
• Because many bit hacks for words extend naturally
to bit and word hacks for vectors.
• Because these tricks arise in other domains, and so
it pays to be educated about them.
• Because they’re fun!

30
© 2008–2018 by the MIT 6.172 Lecturers
Modular Addition
Problem
Compute (!%+ ") mod #, assuming that $%! !%< #%
and $%! "%< #.
Division is expensive, unless
&%'%(!%)%"*%+%#,%
by a power of 2.

-%'%!%)%",% Unpredictable
&%'%(-%.%#*%/%-%0%-1#,% branch is expensive.

-%'%!%)%",% Same trick as


&%'%-%1%(#%3%1(-%4'%#**,% minimum.

31
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute 2⌈lg n⌉. Notation
lg n = log2n

32
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$ #( Example
!
22-222222-2-2222
))#(
# *+ # ,, -(
# *+ # ,, .(
# *+ # ,, &(
# *+ # ,, /(
# *+ # ,, -%(
# *+ # ,, 0.(
11#(

33
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$ #( Example
!
22-222222-2-2222
))#(
# *+ # ,, -( 22-222222-22----
# *+ # ,, .(
# *+ # ,, &(
# *+ # ,, /(
# *+ # ,, -%(
# *+ # ,, 0.(
11#(

34
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute /"lg ##.
!"#$%&'$*#(* Example
!
33.333333.3.3333*
))#(*
#*+,*#*--*.(* 33.333333.33....*
#*+,*#*--*/(* 33..33333..3....*
#*+,*#*--*&(*
#*+,*#*--*0(*
#*+,*#*--*.%(*
#*+,*#*--*1/(*
22#(*

35
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute /"lg ##.
!"#$%&'$.#(. Example
!
33-333333-3-3333.
))#(.
#.*+.#.,,.-(. 33-333333-33----.
#.*+.#.,,./(. 33--33333--3----.
#.*+.#.,,.&(. 33----333-------.
#.*+.#.,,.0(.
#.*+.#.,,.-%(.
#.*+.#.,,.1/(.
22#(.

36
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$/#(/ Example
!
33-333333-3-3333/
))#(/
#/*+/#/,,/-(/ 33-333333-33----/
#/*+/#/,,/.(/ 33--33333--3----/
#/*+/#/,,/&(/ 33----333-------/
#/*+/#/,,/0(/
#/*+/#/,,/-%(/ 33--------------/
#/*+/#/,,/1.(/
22#(/

37
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$/#(/ Example
!
33-333333-3-3333/
))#(/
#/*+/#/,,/-(/ 33-333333-33----/
#/*+/#/,,/.(/ 33--33333--3----/
#/*+/#/,,/&(/ 33----333-------/
#/*+/#/,,/0(/
#/*+/#/,,/-%(/ 33--------------/
#/*+/#/,,/1.(/
22#(/

38
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$0#(0 Example
!
33-333333-3-33330
))#(0
#0*+0#0,,0-(0 33-333333-33----0
#0*+0#0,,0.(0 33--33333--3----0
#0*+0#0,,0&(0 33----333-------0
#0*+0#0,,0/(0
#0*+0#0,,0-%(0 33--------------0
#0*+0#0,,01.(0
22#(0

39
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$0#(0 Example
!
33-333333-3-33330
))#(0
#0*+0#0,,0-(0 33-333333-33----0
#0*+0#0,,0.(0 33--33333--3----0
#0*+0#0,,0&(0 33----333-------0
#0*+0#0,,0/(0
#0*+0#0,,0-%(0 33--------------0
#0*+0#0,,01.(0
22#(0

40
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$ #( Example
!
22-222222-2-2222
))#(
# *+ # ,, -( 22-222222-22----
# *+ # ,, .( 22--22222--2----
# *+ # ,, &( 22----222-------
# *+ # ,, /(
# *+ # ,, -%( 22--------------
# *+ # ,, 0.( 2-22222222222222
11#(

41
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Bit !lg "" – . must be set
Compute !!lg "".
#$"%&'(% ") Example
#
22.222222.2.2222
**")
" +, " -- .) 22.222222.22....
" +, " -- !) 22..22222..2....
" +, " -- ') 22....222.......
" +, " -- /)
" +, " -- .&) 22..............
" +, " -- 0!) 2.22222222222222
11")
Populate all bits
Set bit !lg ""
to the right with .
42
© 2008–2018 by the MIT 6.172 Lecturers
Round up to a Power of 2
Problem
Compute ."lg ##.
!"#$%&'$ #( Example
!
22-222222-2-2222
))#(
# *+ # ,, -( 22-222222-22----
# *+ # ,, .( 22--22222--2----
# *+ # ,, &( 22----222-------
# *+ # ,, /(
# *+ # ,, -%( 22--------------
# *+ # ,, 0.( 2-22222222222222
11#(

Why decrement?
To handle the boundary case when # is a power of ..
43
© 2008–2018 by the MIT 6.172 Lecturers
Least-Significant 1
Problem
Compute the mask of the least-significant ! in word ".
# $ " % &'"()

Example
" **!******!*!****
'" !!*!!!!!!*!!****
" % &'"( * * * * * * * * * * * ! * * * *

Why it works
The binary representation of '" is (+")+!.
Question
How do you find the index of the bit, i.e., lg #?
44
© 2008–2018 by the MIT 6.172 Lecturers
Log Base 2 of a Power of 2
Problem
Compute lg 2, where 2 is a power of 3.
!"#$% &'#%()*% +,-.&'/# 0 121334++(5!!6758(+9
!"#$% '#% !"#:,.%;()< 0 =
1> ?> 3> 75> 5> @> 7)> 3@>
)> 58> )?> 8> 5)> 77> )8> 38>
(3> 7> 56> )(> ))> )3> 33> 6>
3)> 57> 76> 7(> )6> ?8> 36> ??>
(5> 73> (> 3(> 5@> )1> 55> )@>
(?> )7> )5> 3?> 35> 78> ?@> ?1>
7?> 37> 5(> 53> (1> 31> 7@> ?(>
71> 5?> ?6> ?7> 51> ?)> ?5> ?3
A9
. 0 !"#:,.%;B2 C +,-.&'/#D EE 78<9
45
© 2008–2018 by the MIT 6.172 Lecturers
Mathemagic Trick

5 volunteers who can follow directions

Introducing Jess Ray,


“The Golden Raytio”

46
© 2008–2018 by the MIT 6.172 Lecturers
Log Base 2 of a Power of 2
Why it works Example: k=3
A deBruijn sequence s of OOO111O1
length 2k is a cyclic O-1 O OOO
sequence such that each of 1 OO1
the 2k O-1 strings of length 2 O11
k occurs exactly once as a 3 111
substring of s. 4 11O
O b O O O 1 1 1 O 1 *24 ⇒ O b 1 1 O 1 O O O O 5 1O1
O b 1 1 O 1 O O O O >> 5 ⇒ 6 6 O1O
convert[6] ⇒ 4 7 1OO
start with
Performance all O’s
Limited by multiply and const int convert[8]
table look-up. = {O,1,6,2,7,5,4,3};
47
© 2008–2018 by the MIT 6.172 Lecturers
Queens Problem
Problem
Place ! queens on an ! ! ! chessboard so that no
queen attacks another, i.e., no two queens in any row,
column, or diagonal. Count the number of possible
solutions.

48
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

49
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

50
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

51
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

52
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

53
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

Backtrack!
54
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

55
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

Backtrack!
56
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

Backtrack!
57
© 2008–2018 by the MIT 6.172 Lecturers
Backtracking Search
Strategy
Try placing queens row by row. If you can’t place a
queen in a row, backtrack.

58
© 2008–2018 by the MIT 6.172 Lecturers
Board Representation
The backtrack search can be implemented as a simple
recursive procedure, but how should the board be
represented to facilitate queen placement?
• array of n2 bytes?
• array of n2 bits?
• array of n bytes?
• 3 bitvectors of size n, 2n-1, and 2n-1.

59
© 2008–2018 by the MIT 6.172 Lecturers
Bitvector Representation

Placing a queen in column


'*is not safe if
#$%&*(*)"*++*',-*
is nonzero.

" " ! " " " ! "*


#$%&*

60
© 2008–2018 by the MIT 6.172 Lecturers
Bitvector Representation

Placing a queen in row


"+ '+and column (+is not
"+ safe if
!+ #$%&+)+*!+,,+*'-(..+
is nonzero.
!+
"+
!
!+
!+
" ! " " ! " " "+
#$%&+

61
© 2008–2018 by the MIT 6.172 Lecturers
Bitvector Representation

Placing a queen in row #+and


!+ column (+is not safe if
!+ #$%&'+)+*"+,,+*-.".#/(00+
"+ is nonzero.
!+
"+
"+
!+
! ! ! ! " " ! "+
#$%&'+

62
© 2008–2018 by the MIT 6.172 Lecturers
Population Count I
Problem
Count the number of ! bits in a word ".
#$% &%'() "*'() ++%, Repeatedly eliminate
" -' " . !) the least-significant 1.
Example
" ((!(!!(!!!(!((((
" . ! ((!(!!(!!!((!!!!
" - &" . !,) ((!(!!(!!!((((((
Issue
Fast if the popcount is small, but in the worst case, the
running time is proportional to the number of bits in
the word.
63
© 2008–2018 by the MIT 6.172 Lecturers
Population Count II
Table look-up
!"#"$%4%&'!"4$'"4%&('")*+,-4.4
/4014214214*14214*14*14314214!1454674

8&94:$'"494.4074;4<.4074;4==.45>4
94?.4%&('");4@40;AA-74
Performance depends on the size of ;. The cost of
memory operations is a major bottleneck. Typical
costs:
! register: 24cycle,
! L1-cache: B4cycles,
! L2-cache: 204cycles,
! L3-cache: +04cycles, per ,B-byte cache line
! DRAM: 2+04cycles.
64
© 2008–2018 by the MIT 6.172 Lecturers
Population Count III
Parallel divide-and-conquer
!!"#$%&'%"(&)*)"
+,"-".//012"33"4526"""!!"745145" Notation:
+8"-"+,"9"/+,"33"1:26"!!"/71:11:25" E*"= E E ! E"
+4"-"+8"9"/+8"33";26" !!"/7;1;28"
*"times
+5"-"+4"9"/+4"33"826" !!"/78182;"
+1"-"+5"9"/+5"33"526" !!"/751521:"
+7"-"+1"9"/+1"33"126""!!"/71245"
!!"#<(=>'%"=<=?<>@'"
A"-"//A"BB"12"C"+72"D"/A"C"+726"
A"-"//A"BB"52"C"+12"D"/A"C"+126"
A"-"//A"BB"82"D"A2"C"+56"
A"-"//A"BB";2"D"A2"C"+46"
A"-"//A"BB"1:2"D"A2"C"+86"
A"-"//A"BB"452"D"A2"C"+,6"
65
© 2008–2018 by the MIT 6.172 Lecturers
Population Count III
11000010010110111111010001111000
1 0 0 0 1 1 0 1 1 1 1 0 1 1 0 0 x&MO
+ 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 0 (x>>1)&MO
00 01 01 10 10 00 10 00 x&M1
+ 10 00 01 01 10 01 01 0 1 (x>>2)&M1
0001 0011 0001 0001 x&M2
+ 0010 0010 0100 0 0 1 1 (x>>4)&M2
01011011 00000100 x&M3
+ 11000010 0 0 0 0 0 1 0 1 (x>>8)&M3
0000000000001001 x&M4
+ 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 (x>>16)&M4
00000000000000000000000000010001

17

66
© 2008–2018 by the MIT 6.172 Lecturers
Population Count III
Parallel divide-and-conquer
!!"#$%&'%"(&)*)" Performance
+,"-".//012"33"4526"""!!"745145"
!(lg w) time,
+8"-"+,"9"/+,"33"1:26"!!"/71:11:25"
where w =
+4"-"+8"9"/+8"33";26" !!"/7;1;28" word length.
+5"-"+4"9"/+4"33"826" !!"/78182;"
+1"-"+5"9"/+5"33"526" !!"/751521:"
+<"-"+1"9"/+1"33"126""!!"/71245" Avoid
!!"#=(>?'%">=>@=?A'" overflow
B"-"//B"CC"12"D"+<2"E"/B"D"+<26"
B"-"//B"CC"52"D"+12"E"/B"D"+126"
B"-"//B"CC"82"E"B2"D"+56"
No worry
B"-"//B"CC";2"E"B2"D"+46"
about
B"-"//B"CC"1:2"E"B2"D"+86"
overflow.
B"-"//B"CC"452"E"B2"D"+,6"
67
© 2008–2018 by the MIT 6.172 Lecturers
Popcount Instructions
Most modern machines provide popcount
instructions, which operate much faster than
anything you can code yourself. You can access
them via compiler intrinsics, e.g., in GCC:
int __builtin_popcount (unsigned int x);
Warning: You may need to enable certain
compiler switches to access built-in functions,
and your code may be less portable.

Exercise
Compute the log base 2 of a power of 2 quickly
using a popcount instruction.

68
© 2008–2018 by the MIT 6.172 Lecturers
Further Reading
Sean Eron Anderson, “Bit twiddling hacks,”
http://graphics.stanford.edu/~seander/bithacks.h
tml, 2009.
Donald E. Knuth,The Art of Computer Programming ,
Volume 4A,Combinatorial Algorithms, Part 1 ,
Addison-Wesley, 2011, Section 7.1.3.

Henry S. Warren,Hacker’s Delight , Addison-Wesley,


2003.

Happy Bit-Hacking!
69
© 2008–2018 by the MIT 6.172 Lecturers
MIT OpenCourseWare
https://ocw.mit.edu

6.172 Performance Engineering of Software Systems


Fall 2018

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.

70

You might also like