Lecture 4

The document discusses the A* search algorithm, emphasizing its optimality with admissible and consistent heuristics, and the importance of heuristic design. It also covers local search algorithms, including hill climbing and simulated annealing, which are used for optimization problems, and introduces genetic algorithms as a method for solving configuration problems. Additionally, it highlights the challenges of local maxima and the strategies to escape them, such as random restarts and sideways moves.


Recap

• A* expands the fringe node with lowest f value, where
  • f(n) = g(n) + h(n)
  • g(n) is the cost to reach n
  • h(n) is an admissible estimate of the least cost from n to a goal node: 0 ≤ h(n) ≤ h*(n)
• A* tree search is optimal
• Its performance depends heavily on the heuristic h
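As a concrete reference point, here is a minimal Python sketch of this f = g + h ordering (not from the lecture; the graph-as-dict representation and the names astar, graph, h are illustrative assumptions):

```python
import heapq

def astar(start, goal, graph, h):
    """graph: dict mapping state -> list of (successor, step_cost) pairs."""
    frontier = [(h(start), 0, start, [start])]   # entries are (f, g, state, path)
    best_g = {start: 0}                          # cheapest g found per state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)  # pop lowest f = g + h
        if state == goal:
            return path, g
        for succ, cost in graph[state]:
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")
```

Because the check compares g-values rather than mere membership in an expanded set, a cheaper path to an already-seen state is re-expanded (the "fancy check" discussed later in this lecture).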
Creating Heuristics
Creating Admissible Heuristics

• Often, admissible heuristics are solutions to relaxed problems, where new actions are available


• Problem P2 is a relaxed version of P1 if A2(s) ⊇ A1(s) for every s
• Theorem: h2*(s) ≤ h1*(s) for every s, so h2*(s) is admissible for P1
Example: 8 Puzzle

[Figure: 8-puzzle start state, actions, and goal state.]

• What are the states?
• How many states?
• What are the actions?
• What are the step costs?
8 Puzzle I
• Heuristic: number of tiles misplaced
• Why is it admissible?
• h(start) = 8

[Figure: start state and goal state.]

Average nodes expanded when the optimal path has…

          …4 steps   …8 steps   …12 steps
UCS       112        6,300      3.6 × 10^6
A*TILES   13         39         227

Statistics from Andrew Moore

8 Puzzle II

• What if we had an easier 8-puzzle where any tile could slide in any direction at any time, ignoring other tiles?
• Heuristic: total Manhattan distance
• Why is it admissible?
• h(start) = 3 + 1 + 2 + … = 18

[Figure: start state and goal state.]

Average nodes expanded when the optimal path has…

              …4 steps   …8 steps   …12 steps
A*TILES       13         39         227
A*MANHATTAN   12         25         73
Combining heuristics

• Dominance: h1 ≥ h2 if ∀n: h1(n) ≥ h2(n)
• Roughly speaking, larger is better as long as both are admissible
• The zero heuristic is pretty bad (what does A* do with h = 0?)
• The exact heuristic is pretty good, but usually too expensive!
• What if we have two heuristics and neither dominates the other?
  • Form a new heuristic by taking the max of both: h(n) = max(h1(n), h2(n))
• Max of admissible heuristics is admissible and dominates both! (Sketch below.)
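To make the two 8-puzzle heuristics and the max trick concrete, here is a short sketch (the state encoding is an assumption: a tuple of 9 entries in row-major order, with 0 for the blank, and an assumed goal layout):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)      # assumed goal layout, 0 = blank

def h_tiles(state):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, GOAL) if s != 0 and s != g)

def h_manhattan(state):
    """Total Manhattan distance of each tile from its goal square."""
    total = 0
    for i, tile in enumerate(state):
        if tile:                          # skip the blank
            j = GOAL.index(tile)
            total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

def h_max(state):
    """Max of admissible heuristics: admissible, and dominates both."""
    return max(h_tiles(state), h_manhattan(state))
```

Here h_manhattan happens to dominate h_tiles, so the max adds nothing; taking the max pays off when neither heuristic dominates the other, as in the knight's-moves example below.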
Example: Knight’s moves

• Minimum number of knight's moves to get from A to B?
  • h1 = (Manhattan distance)/3
  • h1' = h1 rounded up to correct parity (even if A, B the same color, odd otherwise)
  • h2 = (Euclidean distance)/√5 (rounded up to correct parity)
  • h3 = (max x or y shift)/2 (rounded up to correct parity)
• h(n) = max(h1'(n), h2(n), h3(n)) is admissible!
Optimality of A* Graph Search
A* Graph Search Gone Wrong?
[Figure: a state space graph and its search tree. Arcs: S→A (cost 1), S→B (cost 1), A→C (cost 1), B→C (cost 2), C→G (cost 3); heuristic values h(S)=2, h(A)=4, h(B)=1, h(C)=1, h(G)=0. Tree nodes labeled f = g + h: S (0+2); A (1+4), B (1+1); C (2+1) under A, C (3+1) under B; G (5+0), G (6+0). B (f=2) and then C via B (f=4) are expanded before A (f=5), so C is first reached suboptimally and the optimal path S→A→C→G is missed.]

• Simple check against the expanded set blocks the cheaper C found later via A
• Fancy check allows the new C if it is cheaper than the old one, but requires recalculating C's descendants
Consistency of Heuristics
• Main idea: estimated heuristic costs ≤ actual costs
  • Admissibility: heuristic cost ≤ actual cost to goal
    h(A) ≤ h*(A)
  • Consistency: heuristic "arc" cost ≤ actual cost for each arc
    h(A) − h(C) ≤ c(A,C)
    or h(A) ≤ c(A,C) + h(C) (triangle inequality)
  • Note: h* necessarily satisfies the triangle inequality
• Consequences of consistency:
  • The f value along a path never decreases:
    h(A) ≤ c(A,C) + h(C) ⟹ g(A) + h(A) ≤ g(A) + c(A,C) + h(C) = g(C) + h(C), i.e., f(A) ≤ f(C)
  • A* graph search is optimal
Optimality of A* Graph Search

 Sketch: consider what A* does with a


consistent heuristic:
… f1
 Fact 1: In tree search, A* expands nodes in
f2
increasing total f value (f-contours)
f3
 Fact 2: For every state s, nodes that reach
s optimally are expanded before nodes
that reach s suboptimally

 Result: A* graph search is optimal


Optimality
• Tree search:
  • A* is optimal if the heuristic is admissible
• Graph search:
  • A* is optimal if the heuristic is consistent
• Consistency implies admissibility
• Most natural admissible heuristics tend to be consistent, especially if derived from relaxed problems
But…
• A* keeps the entire explored region in memory
  • ⟹ it will run out of space before you get bored waiting for the answer
• There are variants that use less memory (Section 3.5.5):
  • IDA* works like iterative deepening, except it uses an f-limit instead of a depth limit
    • On each iteration, remember the smallest f-value that exceeds the current limit; use it as the new limit
    • Very inefficient when f is real-valued and each node has a unique value
  • RBFS is a recursive depth-first search that uses an f-limit equal to the f-value of the best alternative path available from any ancestor of the current node
    • When the limit is exceeded, the recursion unwinds but remembers the best reachable f-value on that branch
  • SMA* uses all available memory for the queue, minimizing thrashing
    • When memory is full, drop the worst node on the queue but remember its value in the parent
A*: Summary
• A* orders nodes in the queue by f(n) = g(n) + h(n)
• A* is optimal for trees/graphs with admissible/consistent heuristics
• Heuristic design is key: often use relaxed problems
Local search
Local search algorithms
• In many optimization problems, the path is irrelevant; the goal state itself is the solution
• Then the state space = set of "complete" configurations; find a configuration satisfying constraints, e.g., the n-queens problem, or find an optimal configuration, e.g., the travelling salesperson problem
• In such cases, we can use iterative improvement algorithms: keep a single "current" state and try to improve it
  • Constant space, suitable for online as well as offline search
  • More or less unavoidable if the "state" is yourself (i.e., learning)
Hill Climbing
• Simple, general idea:
  • Start wherever
  • Repeat: move to the best neighboring state
  • If no neighbors are better than current, quit
Heuristic for n-queens problem
• Goal: n queens on the board with no conflicts, i.e., no queen attacking another
• States: n queens on the board, one per column
• Actions: move a queen within its column
• Heuristic value function: number of conflicts
Hill-climbing algorithm
function HILL-CLIMBING(problem) returns a state
  current ← make-node(problem.initial-state)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.value ≤ current.value then
      return current.state
    current ← neighbor
“Like climbing Everest in thick fog with amnesia”
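A runnable Python sketch of this loop on the n-queens formulation above (one assumption: since the heuristic counts conflicts, "best neighbor" here means fewest conflicts, so we minimize rather than maximize):

```python
import random

def conflicts(state):
    """Number of attacking pairs; 0 means solved."""
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def neighbors(state):
    """All states reachable by moving one queen within its column."""
    n = len(state)
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]

def hill_climb(n=8):
    current = tuple(random.randrange(n) for _ in range(n))
    while True:
        best = min(neighbors(current), key=conflicts)
        if conflicts(best) >= conflicts(current):
            return current      # no better neighbor: solution or local optimum
        current = best
```

Random restarts (next slide) then amount to calling hill_climb repeatedly until conflicts(result) == 0.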


Global and local maxima
• Random restarts
  • find the global optimum
  • duh
• Random sideways moves
  • Escape from shoulders
  • Loop forever on flat local maxima
Hill-climbing on the 8-queens problem
• No sideways moves:
  • Succeeds w/ prob. 0.14
  • Average number of moves per trial: 4 when it succeeds, 3 when it gets stuck
  • Expected total number of moves needed: 3(1−p)/p + 4 ≈ 22 moves
• Allowing 100 sideways moves:
  • Succeeds w/ prob. 0.94
  • Average number of moves per trial: 21 when succeeding, 65 when getting stuck
  • Expected total number of moves needed: 65(1−p)/p + 21 ≈ 25 moves
• Moral: algorithms with knobs to twiddle are irritating
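The arithmetic behind those estimates, spelled out (with success probability p, we expect (1 − p)/p failed trials before a success):

```python
p = 0.14                          # no sideways moves
print(3 * (1 - p) / p + 4)        # 3 moves per failed trial + 4 to finish ≈ 22.4
p = 0.94                          # up to 100 sideways moves allowed
print(65 * (1 - p) / p + 21)      # 65 per failed trial + 21 to finish ≈ 25.1
```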
Simulated annealing
• Resembles the annealing process used to cool metals slowly to reach an ordered (low-energy) state
• Basic idea:
  • Allow "bad" moves occasionally, depending on "temperature"
  • High temperature ⟹ more bad moves allowed; shake the system out of its local minimum
  • Gradually reduce temperature according to some schedule
  • Sounds pretty flaky, doesn't it?
Simulated annealing algorithm
function SIMULATED-ANNEALING(problem, schedule) returns a state
  current ← problem.initial-state
  for t = 1 to ∞ do
    T ← schedule(t)
    if T = 0 then return current
    next ← a randomly selected successor of current
    ∆E ← next.value − current.value
    if ∆E > 0 then current ← next
    else current ← next only with probability e^(∆E/T)
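A direct Python transcription of the pseudocode (a sketch: the value and successors functions are problem-supplied, higher value = better, and the exponential cooling schedule is just one assumed choice of schedule):

```python
import math
import random

def simulated_annealing(start, value, successors,
                        T0=1.0, cooling=0.995, T_min=1e-4):
    current, T = start, T0
    while T > T_min:                   # T reaching T_min plays the role of T = 0
        nxt = random.choice(successors(current))
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt              # bad moves accepted with prob e^(dE/T)
        T *= cooling                   # gradually reduce the temperature
    return current
```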
Simulated Annealing
• Theoretical guarantee:
  • Stationary distribution (Boltzmann): P(x) ∝ e^(E(x)/T)
  • If T is decreased slowly enough, SA will converge to the optimal state!
• Proof sketch:
  • Consider two adjacent states x, y with E(y) > E(x) [high is good]
  • Assume the transitions x→y and y→x are possible, with outdegrees D(x) = D(y) = D
  • Let P(x), P(y) be the equilibrium occupancy probabilities at temperature T
  • Let P(x→y) be the probability that state x transitions to state y
[Figure: occupation probability P(x) as a function of T, plotted against E(x); as T decreases, probability mass concentrates on the highest-value states.]
Simulated Annealing

 Is this convergence an interesting guarantee?

 Sounds like magic, but reality is reality:


 The more downhill steps you need to escape a local optimum,
the less likely you are to ever make them all in a row
 “Slowly enough” may mean exponentially slowly
 Random restart hillclimbing also converges to optimal state…

 Simulated annealing and its relatives are a key


workhorse in VLSI layout and other optimal
configuration problems
Local beam search
• Basic idea:
  • K copies of a local search algorithm, initialized randomly (or with K states chosen randomly with a bias towards good ones)
  • For each iteration:
    • Generate ALL successors from the K current states
    • Choose the best K of these to be the new current states
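A sketch of this loop (assumptions: value scores states with higher better; successors and random_state are problem-supplied):

```python
import random

def beam_search(K, steps, random_state, successors, value):
    """Keep the K best states among ALL successors at each iteration."""
    current = [random_state() for _ in range(K)]
    for _ in range(steps):
        pool = [s for state in current for s in successors(state)]
        # Unlike K independent searches, all successors compete in one pool,
        # so effort migrates toward the most promising regions.
        current = sorted(pool, key=value, reverse=True)[:K]
    return max(current, key=value)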
Beam search example (K=4)

[Figure: four iterations of beam search with K = 4. At each level all successors of the current states are scored; the best four survive and the lower-scoring ones (crossed out) are discarded.]
Local beam search
• Why is this different from K local searches run in parallel?
  • The searches communicate! "Come over here, the grass is greener!"
• What other well-known algorithm does this remind you of?
  • Evolution!
Genetic algorithms

• Genetic algorithms use a natural selection metaphor
  • Resample K individuals at each step (selection), weighted by a fitness function
  • Combine by pairwise crossover operators, plus mutation to give variety
Example: N-Queens

• Does crossover make sense here?
• What would mutation be?
• What would a good fitness function be? (One plausible set of answers appears in the sketch below.)
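A sketch, not the slides' implementation: crossover splices two column-to-row tuples at a random cut, mutation moves one queen, and fitness counts non-attacking pairs (all design choices are assumptions):

```python
import random

def fitness(state):
    """Non-attacking pairs of queens; n*(n-1)/2 means solved."""
    n = len(state)
    attacks = sum(1 for i in range(n) for j in range(i + 1, n)
                  if state[i] == state[j] or abs(state[i] - state[j]) == j - i)
    return n * (n - 1) // 2 - attacks

def crossover(a, b):
    cut = random.randrange(1, len(a))       # splice parents at a random point
    return a[:cut] + b[cut:]

def mutate(state, rate=0.1):
    if random.random() < rate:              # occasionally move one queen
        col = random.randrange(len(state))
        state = state[:col] + (random.randrange(len(state)),) + state[col + 1:]
    return state

def genetic_algorithm(n=8, K=20, generations=1000):
    pop = [tuple(random.randrange(n) for _ in range(n)) for _ in range(K)]
    for _ in range(generations):
        if any(fitness(s) == n * (n - 1) // 2 for s in pop):
            break
        # Selection: resample parents weighted by fitness (+1 avoids all-zero weights)
        weights = [fitness(s) + 1 for s in pop]
        parents = random.choices(pop, weights=weights, k=2 * K)
        pop = [mutate(crossover(parents[2 * i], parents[2 * i + 1]))
               for i in range(K)]
    return max(pop, key=fitness)
```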
Local search in continuous spaces
Example: Siting airports in Romania
Place 3 airports to minimize the sum of squared distances from each city to its nearest airport

• Airport locations: x = ((x1,y1), (x2,y2), (x3,y3))
• City locations: (xc, yc)
• Ca = the set of cities closest to airport a
• Objective: minimize f(x) = Σa Σc∈Ca [(xa − xc)² + (ya − yc)²]
Handling a continuous state/action space
1. Discretize it!
   • Define a grid with increment δ, then use any of the discrete algorithms
2. Choose random perturbations to the state
   a. First-choice hill climbing: keep trying until something improves the state (sketch below)
   b. Simulated annealing
3. Compute the gradient of f(x) analytically
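A sketch of option 2a (assumptions: x is a list of floats, f is being minimized, and the Gaussian step size and trial budget are arbitrary choices):

```python
import random

def first_choice_hill_climb(x, f, step=0.1, max_tries=100):
    """Minimize f by accepting the first random perturbation that improves it."""
    fx = f(x)
    while True:
        for _ in range(max_tries):
            candidate = [xi + random.gauss(0, step) for xi in x]
            if f(candidate) < fx:
                x, fx = candidate, f(candidate)
                break                 # first improvement found: take it
        else:
            return x                  # max_tries perturbations failed: stop
```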
Finding extrema in continuous space
• Gradient vector: ∇f(x) = (∂f/∂x1, ∂f/∂y1, ∂f/∂x2, …)ᵀ
• For the airports, f(x) = Σa Σc∈Ca [(xa − xc)² + (ya − yc)²]
  • ∂f/∂x1 = Σc∈C1 2(x1 − xc)
• At an extremum, ∇f(x) = 0
• Can sometimes solve in closed form: x1 = (Σc∈C1 xc)/|C1|
  • Is this a local or global minimum of f?
• Gradient descent: x ← x − α∇f(x)
• There is a huge range of algorithms for finding extrema using gradients
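A sketch of gradient descent on the airport objective (assumptions: plain coordinate tuples, a fixed step size alpha, and the nearest-airport sets Ca recomputed every iteration):

```python
def descend(airports, cities, alpha=0.01, iters=1000):
    """airports, cities: lists of (x, y) tuples; minimizes f by gradient steps."""
    for _ in range(iters):
        # C_a: the cities whose nearest airport is a
        clusters = [[] for _ in airports]
        for (xc, yc) in cities:
            a = min(range(len(airports)),
                    key=lambda i: (airports[i][0] - xc) ** 2
                                + (airports[i][1] - yc) ** 2)
            clusters[a].append((xc, yc))
        # x <- x - alpha * grad f(x), with df/dx_a = sum_c 2 (x_a - x_c)
        airports = [
            (xa - alpha * sum(2 * (xa - xc) for xc, _ in Ca),
             ya - alpha * sum(2 * (ya - yc) for _, yc in Ca))
            for (xa, ya), Ca in zip(airports, clusters)
        ]
    return airports
```

With the clusters held fixed, setting the gradient to zero gives the closed-form centroid x1 = (Σc∈C1 xc)/|C1| from above, so alternating assignment and centroid steps is, in effect, the k-means algorithm.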
Summary
• Many configuration and optimization problems can be formulated as local search
• General families of algorithms:
  • Hill climbing, continuous optimization
  • Simulated annealing (and other stochastic methods)
  • Local beam search: multiple interacting searches
  • Genetic algorithms: break apart and recombine states
• Many machine learning algorithms are local searches
