Artificial Intelligence
Lecture 6
Bicol University College of Science
1st Semester, 2021-2022
Adversarial Search
What are games, and why study them?
Games are a form of multi-agent environment
What do other agents do and how do they affect our success?
Cooperative vs. competitive multi-agent environments.
Competitive multi-agent environments give rise to adversarial
problems a.k.a. games
Why study games?
Fun; historically entertaining
Interesting subject of study because they are hard
Easy to represent, and agents are restricted to a small number of actions
Adversarial Search
Adversarial search problems == games
They occur in multiagent competitive environments
There is an opponent we can’t control planning
against us!
Game vs. search: the optimal solution is not a sequence
of actions but a strategy (policy): if the opponent does a,
the agent does b; else if the opponent does c, the agent does d;
etc.
Tedious and fragile if hard-coded (i.e., implemented
with rules)
Good news: Games are modeled as search
problems and use heuristic evaluation functions.
Games: hard topic
Games are a big deal in AI
Games are interesting to AI because they are too
hard to solve exactly
Chess has a branching factor of about 35, and a game
often runs to about 100 moves, so the game tree has
roughly 35^100 ≈ 10^154 nodes
We need to make some decision even when computing the
optimal decision is infeasible
Adversarial Search
Checkers:
Chinook ended 40-year-reign of human world
champion Marion Tinsley in 1994.
Used an endgame database defining perfect play
for all positions involving 8 or fewer pieces on the
board, a total of 443,748,401,247 positions.
Adversarial Search
Chess:
In 1949, Claude E. Shannon, in his paper “Programming a Computer for
Playing Chess”, suggested chess as an AI problem for the community.
Deep Blue defeated human world champion Garry Kasparov in a six-
game match in 1997.
In 2006, Vladimir Kramnik, the undisputed world champion, was
defeated 4-2 by Deep Fritz.
Adversarial Search
Go: b > 300! Google DeepMind's AlphaGo project. In 2016,
AlphaGo beat both Fan Hui, the European Go champion, and Lee
Sedol, one of the world's best players.
Othello: several computer Othello programs exist, and human
champions refuse to compete against computers, which are too good.
Relation of Games to Search
Search – no adversary
● Solution is a (heuristic) method for finding a goal
● Heuristic techniques can find the optimal solution
● Evaluation function: estimate of cost from start to goal through a given node
● Examples: path planning, scheduling activities
Games – adversary
● Solution is a strategy (a strategy specifies a move for every possible opponent reply)
● Time limits force an approximate solution
● Evaluation function: evaluates the “goodness” of a game position
● Examples: chess, checkers, Othello, backgammon
Types of Games
We are mostly interested in deterministic, fully observable,
zero-sum games where two agents act alternately.
Games with perfect information: no randomness is involved.
Games with imperfect information: random factors are part
of the game.
Zero-sum Games
Adversarial: Pure competition.
Agents place opposite values on the outcomes.
One agent maximizes one single value, while the
other minimizes it.
Each move by one of the players is called a “ply.”
One function: one agent maximizes it and one
minimizes it!
Embedded thinking...
One agent is trying to figure out what to do.
How to decide? He thinks about the consequences of the
possible actions.
He needs to think about his opponent as well...
The opponent is also thinking about what to do etc.
Each will imagine what would be the response from the
opponent to their actions.
This entails embedded thinking.
Game setup
Two players: MAX and MIN
MAX moves first and they take turns until the game
is over. The winner gets a reward, the loser a penalty.
Formulate game as search problem
MAX uses search tree to determine next move.
Searching in a two player game
Problem Formulation:
Initial state: board configurations and the player to
move.
Successor function: list of pairs (move, state)
specifying legal moves and their resulting states.
(moves + initial state = game tree)
A terminal test: decides whether the game has finished.
A utility function: produces a numerical value for
(only) the terminal states. Example: In chess, outcome
= win/loss/draw, with values +1, -1, 0 respectively.
Players need search tree to determine next move.
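A minimal sketch of this formulation in Python follows; the interface names (Game, actions, result, is_terminal, utility) are illustrative, not from any particular library.

    class Game:
        """Abstract two-player game: MAX moves first, then players alternate."""
        def initial_state(self):
            """Board configuration plus the player to move."""
            raise NotImplementedError
        def actions(self, state):
            """Legal moves available in this state."""
            raise NotImplementedError
        def result(self, state, action):
            """Successor function: the state reached by playing action."""
            raise NotImplementedError
        def is_terminal(self, state):
            """Terminal test: has the game finished?"""
            raise NotImplementedError
        def utility(self, state):
            """Numeric value of a terminal state, e.g. win/loss/draw = +1/-1/0."""
            raise NotImplementedError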
Partial Game Tree for Tic-Tac-Toe
Minimax
Find the optimal strategy for Max:
– Depth-first search of the game tree
– An optimal leaf node could appear at any depth of
the tree
– Minimax principle: compute the utility of being in a
state assuming both players play optimally from
there until the end of the game
– Propagate minimax values up the tree once
terminal nodes are discovered
Optimal strategies
Find the contingent strategy for MAX assuming an
infallible MIN opponent.
Assumption: Both players play optimally !!
Given a game tree, the optimal strategy can be
determined by using the minimax value of each node:
MINIMAX-VALUE(n) =
  UTILITY(n)                                  if n is a terminal node
  max_{s ∈ Successors(n)} MINIMAX-VALUE(s)    if n is a MAX node
  min_{s ∈ Successors(n)} MINIMAX-VALUE(s)    if n is a MIN node
Two-Ply Game Tree (figure sequence)
The minimax decision
Minimax maximizes the worst-case outcome for max.
Partial Game Tree for Tic-Tac-Toe (figure sequence)
What if MIN does not play optimally?
Definition of optimal play for MAX assumes
MIN plays optimally: maximizes worst-case
outcome for MAX.
But if MIN does not play optimally, MAX will
do even better.
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v
function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v
function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
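The same algorithm as runnable Python, written against the hypothetical Game interface sketched earlier; a minimal sketch under those assumptions, not a definitive implementation.

    import math

    def minimax_decision(game, state):
        # Pick the action whose successor has the highest minimax value.
        # MIN moves next, so successors are evaluated with min_value.
        return max(game.actions(state),
                   key=lambda a: min_value(game, game.result(state, a)))

    def max_value(game, state):
        if game.is_terminal(state):
            return game.utility(state)
        v = -math.inf                  # best value found so far for MAX
        for a in game.actions(state):
            v = max(v, min_value(game, game.result(state, a)))
        return v

    def min_value(game, state):
        if game.is_terminal(state):
            return game.utility(state)
        v = math.inf                   # best value found so far for MIN
        for a in game.actions(state):
            v = min(v, max_value(game, game.result(state, a)))
        return v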
Properties of Minimax
Criterion    Minimax
Complete?    Yes (for finite trees)
Time         O(b^m)
Space        O(bm)
Optimal?     Yes
(b: branching factor, m: maximum depth of the tree)
Minimax Algorithm
Perfect for deterministic, two-player games
One opponent tries to maximize score (Max)
One opponent tries to minimize score (Min)
Goal: move to position of highest minimax
value
Identify best achievable payoff against best
play
Multiplayer games
Some games allow more than two players
Scalar minimax values then become vectors, with one
utility per player (see the sketch below)
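A hedged sketch of the idea, assuming two extra methods on the earlier Game interface (utilities returning one payoff per player, and to_move returning the index of the player on move):

    def vector_value(game, state):
        # Returns a tuple of utilities, one entry per player.
        if game.is_terminal(state):
            return game.utilities(state)    # assumed, e.g. (u0, u1, u2)
        player = game.to_move(state)        # assumed: index of player on move
        # Each player maximizes their own component of the value vector.
        return max((vector_value(game, game.result(state, a))
                    for a in game.actions(state)),
                   key=lambda vec: vec[player])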
Problem of minimax search
The number of game states is exponential in the
number of moves.
Solution: do not examine every node
==> Alpha-beta pruning
● Alpha = value of the best choice found so far at any
choice point along the path for MAX
● Beta = value of the best choice found so far at any
choice point along the path for MIN
Alpha-beta Game Playing
Basic idea:
“If you have an idea that is surely bad, don't take the
time to see how truly awful it is.” -- Pat Winston
Some branches will never be played by rational players since
they include sub-optimal decisions (for either player).
(Figure) The root is a MAX node whose left MIN child evaluates
to min(2, 7) = 2, so the root's value is ≥ 2. The right MIN
child's first leaf is 1, so its value is ≤ 1. We don't need to
compute the value at this node: no matter what its remaining
leaf is, it can't affect the value of the root node.
Alpha-Beta Example
Do depth-first search until the first leaf, keeping at each
node the range of possible values, initially [−∞, +∞].
(Figure sequence) The first MIN child narrows from [−∞, 3] to
[3, 3] once all its leaves are seen, so the root's range becomes
[3, +∞]. The second MIN child's first leaf gives it the range
[−∞, 2]; this node is worse for MAX, so its remaining leaves are
pruned. The third MIN child narrows from [−∞, 14] (root range
[3, 14]) to [−∞, 5] (root range [3, 5]) to [2, 2]. The root's
final value is [3, 3].
Alpha-Beta Pruning Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v
function MAX-VALUE(n, alpha, beta) returns a utility value
  if n is a leaf node then return f(n)
  for each child n′ of n do
    alpha ← max(alpha, MIN-VALUE(n′, alpha, beta))
    if alpha ≥ beta then return beta /* pruning */
  end
  return alpha
function MIN-VALUE(n, alpha, beta) returns a utility value
  if n is a leaf node then return f(n)
  for each child n′ of n do
    beta ← min(beta, MAX-VALUE(n′, alpha, beta))
    if beta ≤ alpha then return alpha /* pruning */
  end
  return beta
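The same procedure as runnable Python against the hypothetical Game interface from earlier; one sketch among several reasonable ways to organize it.

    import math

    def alpha_beta_search(game, state):
        # The root is a MAX node; the best value seen so far serves as alpha.
        best_action, best_v = None, -math.inf
        for a in game.actions(state):
            v = ab_min(game, game.result(state, a), best_v, math.inf)
            if v > best_v:
                best_action, best_v = a, v
        return best_action

    def ab_max(game, state, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state)
        for a in game.actions(state):
            alpha = max(alpha, ab_min(game, game.result(state, a), alpha, beta))
            if alpha >= beta:
                return beta    # prune: MIN above will never let play get here
        return alpha

    def ab_min(game, state, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state)
        for a in game.actions(state):
            beta = min(beta, ab_max(game, game.result(state, a), alpha, beta))
            if beta <= alpha:
                return alpha   # prune: MAX above already has a better option
        return beta

On the tree from the earlier example (MIN children over leaves 3/12/8, 2/…, 14/5/2), this abandons the second child right after its first leaf, matching the interval trace.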
General alpha-beta pruning
Consider a node n somewhere in the tree.
If the player has a better choice at:
● the parent node of n, or
● any choice point further up,
then n will never be reached in actual play.
Hence when enough is known about n, it can be pruned.
Final Comments about Alpha-Beta Pruning
Pruning does not affect the final result.
Entire subtrees can be pruned.
Good move ordering improves the effectiveness of pruning.
With “perfect ordering,” time complexity is O(b^(m/2))
● Effective branching factor of sqrt(b)!
● Alpha-beta pruning can look ahead twice as far as minimax in the
same amount of time
Repeated states are again possible.
● Store them in memory = transposition table (see the sketch below)
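A minimal sketch of that idea: memoize minimax values in a dictionary keyed by a hashable encoding of the state. The game.key method is an assumption; any canonical, hashable encoding of the board works.

    def minimax_tt(game, state, is_max, table):
        key = (game.key(state), is_max)    # assumed hashable state encoding
        if key in table:
            return table[key]              # repeated state: reuse stored value
        if game.is_terminal(state):
            v = game.utility(state)
        elif is_max:
            v = max(minimax_tt(game, game.result(state, a), False, table)
                    for a in game.actions(state))
        else:
            v = min(minimax_tt(game, game.result(state, a), True, table)
                    for a in game.actions(state))
        table[key] = v                     # record for future transpositions
        return v

    # Usage: value = minimax_tt(game, game.initial_state(), True, {})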
Imperfect real-time decisions
Minimax and alpha-beta pruning require too many
leaf-node evaluations.
This may be impractical within a reasonable
amount of time.
SHANNON (1950):
● Cut off the search earlier (replace TERMINAL-TEST by CUTOFF-TEST)
● Apply a heuristic evaluation function EVAL (replacing the utility
function of alpha-beta)
Cutting off search
Change:
if TERMINAL-TEST(state) then return UTILITY(state)
into
if CUTOFF-TEST(state,depth) then return EVAL(state)
This introduces a fixed depth limit, depth.
It is selected so that the amount of time used will not exceed what
the rules of the game allow.
When the cutoff occurs, the evaluation is performed.
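In the Python sketch from earlier, the change amounts to one extra test per call. The depth limit and the game.eval method (the heuristic EVAL discussed next) are assumptions.

    import math

    def h_max_value(game, state, depth, limit):
        if game.is_terminal(state):
            return game.utility(state)   # true utility still used at terminals
        if depth >= limit:               # CUTOFF-TEST(state, depth)
            return game.eval(state)      # heuristic estimate replaces UTILITY
        v = -math.inf
        for a in game.actions(state):
            v = max(v, h_min_value(game, game.result(state, a), depth + 1, limit))
        return v

    def h_min_value(game, state, depth, limit):
        if game.is_terminal(state):
            return game.utility(state)
        if depth >= limit:
            return game.eval(state)
        v = math.inf
        for a in game.actions(state):
            v = min(v, h_max_value(game, game.result(state, a), depth + 1, limit))
        return v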
Evaluation Function
The evaluation function is a heuristic function, and
it is where the domain experts' knowledge resides.
It is applied at the search cutoff point.
It must score terminal/goal states the same way as the
utility function.
Tradeoff between accuracy and time → reasonable complexity
It must be accurate:
● Performance of a game-playing system depends on the
accuracy/goodness of its evaluation
● Evaluation of nonterminal states should be strongly correlated
with the actual chances of winning
Heuristic EVAL(uation) Function
Idea: produce an estimate of the expected utility of
the game from a given position.
Performance depends on quality of EVAL.
Requirements:
EVAL should order terminal nodes in the same way as UTILITY.
Computation must not take too long.
For non-terminal states, EVAL should be strongly correlated with the
actual chance of winning.
Only useful for quiescent states (no wild swings in value in the
near future).
Heuristic EVAL example
For Tic-Tac-Toe:
f(s) = [# of 3-lengths open for me] - [# of 3-lengths open for you]
where a 3-length is a complete row, column, or diagonal.
For chess, typically linear weighted sum of features
Eval(s) = w1f1(s)+w2f2(s)+ … +wnfn(s)
e.g., Alan Turing's function for chess:
f(s) = w(s)/b(s), where w(s) is the sum of the point values of
White's pieces and b(s) is the sum for Black
● Example features for chess are piece count, piece placement,
squares controlled, etc.
● Deep Blue had about 6,000 features in its evaluation function
Heuristic EVAL example
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
The weighted addition assumes the features are independent.
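As a concrete sketch of the weighted-sum form: the piece values, feature names, and state.pieces method below are illustrative assumptions, not a fixed API.

    PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}  # conventional points

    def material(state, color):
        # Sum of point values of color's pieces; state.pieces(color) is assumed
        # to yield piece letters like 'P', 'N', 'B', ...
        return sum(PIECE_VALUES[p] for p in state.pieces(color))

    def eval_weighted(state, weights, features):
        # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
        # The sum implicitly treats the features as independent.
        return sum(w * f(state) for w, f in zip(weights, features))

    def eval_turing(state):
        # Turing-style f(s) = w(s)/b(s): White's material over Black's.
        return material(state, 'white') / material(state, 'black')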
End