AI Unit-I Chapter-II Problem Solving by Search-II
AI Unit-I Chapter-II Problem Solving by Search-II
UNIT - I
Chapter-I: Problem Solving by Search –II: Problem-Solving Agents, Searching for Solutions,
Uninformed Search Strategies: Breadth-first search, Uniform cost search, Depth-first search,
Iterative deepening Depth-first search, Bidirectional search, Informed (Heuristic) Search
Strategies: Greedy best-first search, A* search, Heuristic Functions, Beyond Classical Search:
Hill-climbing search, Simulated annealing search, Local Search in Continuous Spaces, Searching
with Non-Deterministic Actions, Searching with Partial Observations, Online Search Agents and
Unknown Environment.
Based on the search problems the search algorithms can be classified into uninformed (Blind search)
search and informed search (Heuristic search) algorithms.
Informed Search
Informed search algorithms use domain knowledge.
In an informed search, problem information is available which can guide the search.
Informed search strategies can find a solution more efficiently than an uninformed search strategy.
Informed search is also called a Heuristic search.
A heuristic is a way which might not always be guaranteed for best solutions but guaranteed to find a
good solution in reasonable time.
Informed search can solve much complex problem which could not be solved in another way. An
example of informed search algorithms is a traveling salesman problem.
The following are the types of informed search algorithms:
1. Best First Search Algorithm (Greedy search)
2. A* Search Algorithm
1. Breadth-first Search:
Breadth-first search is the most common search strategy for traversing a tree or graph. This
algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the
current level before moving to nodes of next level.
The breadth-first search algorithm is an example of a general-graph search algorithm. Breadth-first
search implemented using FIFO queue data structure.
As breadth-first search algorithm do the traversal level by level, it will also called as level-order
traversal.
Advantages:
BFS will provide a solution if any solution exists.
If there are more than one solution for a given problem, then BFS will provide the minimal solution
which requires the least number of steps.
Disadvantages:
It requires lots of memory since each level of the tree must be saved into memory to expand the next
level.
BFS needs lots of time if the solution is far away from the root node.
Example:
In the below tree structure, the traversing of the tree is shown using BFS algorithm from the root
node S to goal node K.
BFS search algorithm traverse in layers, so it will follow the path which is shown by the arrow, and
the traversed path will be:
SABCDGHEFIK
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: mdafzal.mbnr@gmail.com
Time Complexity: Time Complexity of BFS algorithm can be obtained by the number of nodes
traversed in BFS until the shallowest Node.
Where the d= depth of shallowest solution and b is a node at every state.
T(b) = 1+b2+b3+.......+ bd= O(bd)
Space Complexity: Space complexity of BFS algorithm is given by the Memory size of frontier
which is O(bd).
Completeness: BFS is complete, which means if the shallowest goal node is at some finite depth,
then BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
2. Depth-first Search
Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
It is called the depth-first search because it starts from the root node and follows each path to its
greatest depth node before moving to the next path.
DFS uses a stack data structure (LIFO) for its implementation.
The process of the DFS algorithm is also like the BFS algorithm.
Advantage:
DFS requires very less memory as it only needs to store a stack of the nodes on the path from root
node to the current node.
It takes less time to reach to the goal node than BFS algorithm (if it traverses in the right path).
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: mdafzal.mbnr@gmail.com
Disadvantage:
There is the possibility that many states keep re-occurring, and there is no guarantee of finding the
solution.
DFS algorithm goes for deep down searching and sometimes it may go to the infinite loop.
Example:
In the below search tree, the flow of depth-first search is shown, and it will follow the order as:
Pre-order (RootLeftRight)
In-order (LeftRootRight)
Post-order (LeftRightRoot)
It will start searching from root node S, and traverse A, then B, then D and E, after traversing E, it will
backtrack the tree as E has no other successor and still goal node is not found.
After backtracking it will traverse node C and then G, and here it will terminate as it found goal node.
There are three different ways in which DFS can be performed pre-order, in-order and post-order.
Hence, the solution will be:
Pre-order (RootLeftRight): SABDECG
In-order (LeftRootRight): DBEACG
Post-order (LeftRightRoot): DEBG
Where, m= maximum depth of any node and this can be much larger than d (Shallowest solution
depth)
Space Complexity: DFS algorithm needs to store only single path from the root node, hence space
complexity of DFS is equivalent to the size of the fringe set, which is O(bm).
Optimal: DFS search algorithm is non-optimal, as it may generate many steps or high cost to reach
to the goal node.
o Cutoff failure value: It defines no solution for the problem within a given depth limit.
Advantages:
Disadvantages:
Depth-limited search also has a disadvantage of incompleteness.
It may not be optimal if the problem has more than one solution.
Completeness: DLS search algorithm is complete if the solution is above the depth-limit.
Time Complexity: Time complexity of DLS algorithm is O(bℓ).
Space Complexity: Space complexity of DLS algorithm is O(b×ℓ).
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not optimal
even if ℓ>d.
Advantages:
Uniform cost search is optimal because at every state the path with the least cost is chosen.
Disadvantages:
It does not care about the number of steps involve in searching and only concerned about path cost.
Due to which this algorithm may be stuck in an infinite loop.
Example:
Completeness: Uniform-cost search is complete, such as if there is a solution, UCS will find it.
Time Complexity: Let C* is Cost of the optimal solution, and ε is each step to get closer to the goal
node. Then the number of steps is = C*/ε+1. Here +1 is taken, as it starts from state 0 and end to C*/ε.
Hence, the worst-case time complexity of Uniform-cost search is O(b1 + [C*/ε]).
Space Complexity: The same logic is for space complexity so, the worst-case space complexity of
Uniform-cost search is O(b1 + [C*/ε]).
Optimal: Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
5. Iterative deepening depth-first Search:
The iterative deepening algorithm is a combination of DFS and BFS algorithms. This search algorithm
finds out the best depth limit and does it by gradually increasing the limit until a goal is found.
This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing the
depth limit after each iteration until the goal node is found.
1'st Iteration A
2'nd Iteration A, B, C
3'rd Iteration A, B, D, E, C, F, G
In the third iteration, the algorithm will find the goal node.
Advantages:
Bidirectional search is fast.
Bidirectional search requires less memory.
Disadvantages:
Implementation of the bidirectional search tree is difficult.
In bidirectional search, one should know the goal state in advance.
Example:
The uninformed search algorithms which looked through search space for all possible solutions of the
problem without having any additional knowledge about search space.
But informed search algorithm contains an array of knowledge such as how far we are from the goal, path
cost, how to reach to goal node, etc.
This knowledge help agents to explore less to the search space and find more efficiently the goal node.
The informed search algorithm is more useful for large search space. Informed search algorithm uses
the idea of heuristic, so it is also called Heuristic search.
Heuristic is a function which is used in Informed Search, and it finds the most promising path.
It takes the current state of the agent as its input and produces the estimation of how close agent is
from the goal.
The heuristic method, however, might not always give the best solution, but it guaranteed to find a
good solution in reasonable time.
The heuristic method is a technique designed to solve a problem quickly.
Heuristic function estimates how close a state is to the goal. It is represented by h(n), and it
calculates the cost of an optimal path between the pair of states. The value of the heuristic function is
always positive.
Through out AI we use heuristic function / heuristic value in many situations, hence a better
understanding of this is required.
In uninformed search algorithms we do blind search, i.e., we search all possible states for goal state.
Once reaching goal state the search will stop.
Solution:
The puzzle can be solved by moving the tiles one by one in the single empty space and thus
achieving the Goal state.
Rules of solving puzzle
Instead of moving the tiles in the empty space we can visualize moving the empty space in
place of the tile.
The empty space can only move in four directions (Movement of empty space).
1. Up
2. Down
3. Right or
4. Left
The empty space cannot move diagonally and can take only one step at a time.
All possible move of an Empty tile
Time complexity: In worst case time complexity in BFS is O(bd) know as order of b raise to power
d. In this case it is (320).
b – branch factor
d – depth factor
The problem in uninformed search algorithms is the time complexity is growing exponentially.
8 – Puggle Problem 15 – Puggle Problem 24 – Puggle Problem
(Time complexity worst case) (Time complexity worst case) (Time complexity worst case)
O(bd) O(bd) O(bd)
320 – Possible States 1013 – Possible States 1024 – Possible States
Let us say chess problem where the branch factor may go upto 32 and depth factor upto 85 to
90. And the possible states 3285 to 90 (it is a very huge number to imagine).
Tile# 1 2 3 4 5 6 7 8 Total
No of moves required to put the 0 0 0 1 1 0 0 1 3
tile in its right position
Here h(n) is heuristic cost, and h*(n) is the estimated cost. Hence heuristic cost should be less than or
equal to the estimated cost.
Advantages:
Best first search can switch between BFS and DFS by gaining the advantages of both the algorithms.
This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
It can behave as an unguided depth-first search in the worst-case scenario.
It can get stuck in a loop as DFS.
This algorithm is not optimal.
Example – 1:
Consider the below search problem, and we will traverse it using greedy best-first search.
At each iteration, each node is expanded using evaluation function f(n)=h(n), which is given in the
below table.
Algorithm of A* search:
Advantages:
A* search algorithm is the best algorithm than other search algorithms.
A* search algorithm is optimal and complete.
This algorithm can solve very complex problems.
Example – 1:
In this example, the given graph will be traversed using the A* algorithm.
The heuristic value of all states is given in the below table so, calculate the f(n) of each state using
the formula f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Here OPEN and CLOSED list will be used.
Solution:
Example - 2
Example – 3
Given an initial state of a 8-puzzle problem and final state to be reached-
The numbers written on edges represent the distance between the nodes.
The numbers written on nodes represent the heuristic value.
Find the most cost-effective path to reach from start state A to final state J using A* Algorithm.
Solution:
Step-01: We start with node A.
Node B and Node F can be reached from node A.
A* Algorithm calculates f(B) and f(F).
f(B) = 6 + 8 = 14
f(F) = 3 + 6 = 9
Since f(F) < f(B), so it decides to go to node F.
Path- A → F
Step-02: Node G and Node H can be reached from node F.
A* Algorithm calculates f(G) and f(H).
f(G) = (3+1) + 5 = 9
f(H) = (3+7) + 3 = 13
Since f(G) < f(H), so it decides to go to node G.
Path- A → F → G
Step-03: Node I can be reached from node G.
A* Algorithm calculates f(I).
f(I) = (3+1+3) + 1 = 8
It decides to go to node I.
Example – 5:
Consider the following graph-
The numbers written on edges represent the distance between the nodes.
The numbers written on nodes represent the heuristic value.
Find the most cost-effective path to reach from start state S to final state G using A* Algorithm.
Step-2:
Here in all leaf nodes f(B) is having the minimum fitness value, so we explore node B.
Step-3:
Here in all leaf nodes f(D), which is direct successor of initial node S is having the minimum fitness
value, so we explore node D (in that direction).
Step-4:
Here in all leaf nodes f(E) (in the direction SDE), is having the minimum fitness value, so we
explore node E (in that direction: SDE).
Here in all leaf nodes f(F) (in the direction SDEF), is having the minimum fitness value, so we
explore node F (in that direction: SDEF).
o Hill climbing algorithm is a local search algorithm which continuously moves in the direction of
increasing elevation/value to find the peak of the mountain or best solution to the problem. It
terminates when it reaches a peak value where no neighbor has a higher value.
o Hill climbing algorithm is a technique which is used for optimizing the mathematical problems. One
of the widely discussed examples of Hill climbing algorithm is Traveling-salesman Problem in
which we need to minimize the distance traveled by the salesman.
Local Maximum: Local maximum is a state which is better than its neighbor states, but there is also
another state which is higher than it.
Global Maximum: Global maximum is the best possible state of state space landscape. It has the
highest value of objective function.
Current state: It is a state in a landscape diagram where an agent is currently present.
Flat local maximum: It is a flat space in the landscape where all the neighbor states of current states
have the same value.
Shoulder: It is a plateau region which has an uphill edge.
Simple hill climbing is the simplest way to implement a hill climbing algorithm. It only evaluates
the neighbor node state at a time and selects the first one which optimizes current cost and set
it as a current state.
It only checks it's one successor state, and if it finds better than the current state, then move else be
in the same state.
This algorithm has the following features:
o Step 1: Evaluate the initial state, if it is goal state then return success and Stop.
o Step 2: Loop Until a solution is found or there is no new operator left to apply.
o Step 3: Select and apply an operator to the current state.
o Step 4: Check new state:
1. If it is goal state, then return success and quit.
2. Else if it is better than the current state then assign new state as a current state.
o Step 1: Evaluate the initial state, if it is goal state then return success and stop, else make current state as initial
state.
o Step 2: Loop until a solution is found or the current state does not change.
1. Let SUCC be a state such that any successor of the current state will be better than it.
2. For each operator that applies to the current state:
I. Apply the new operator and generate a new state.
II. Evaluate the new state.
III. If it is goal state, then return it and quit, else compare it to the SUCC.
IV. If it is better than SUCC, then set new state as SUCC.
V. If the SUCC is better than the current state, then set current state to SUCC.
o Step 3: Exit.
Stochastic hill climbing does not examine for all its neighbor before moving.
Rather, this search algorithm selects one neighbor node at random and decides whether to choose it as
a current state or examine another state.
1. Local Maximum: A local maximum is a peak state in the landscape, which is better than each of its
neighboring states, but there is another state also present which is higher than the local maximum.
2. Plateau: A plateau is the flat area of the search space in which all the neighbor states of the current
state contain the same value, because of this algorithm does not find any best direction to move. A
hill-climbing search might be lost in the plateau area.
Solution: The solution for the plateau is to take big steps or little steps while searching, to solve the
problem. Randomly select a state which is far away from the current state so it is possible that the
algorithm could find non-plateau region.
3. Ridges: A ridge is a special form of the local maximum. It has an area which is higher than its
surrounding areas, but itself has a slope, and cannot be reached in a single move.
Solution: With the use of bidirectional search, or by moving in different directions, we can improve
this problem.
A hill-climbing algorithm which never makes a move towards a lower value guaranteed to be
incomplete because it can get stuck on a local maximum. And if algorithm applies a random walk,
by moving a successor, then it may complete but not efficient. Simulated Annealing is an algorithm
which yields both efficiency and completeness.
In mechanical term Annealing is a process of hardening a metal or glass to a high temperature then
cooling gradually, so this allows the metal to reach a low-energy crystalline state.
The same process is used in simulated annealing in which the algorithm picks a random move,
instead of picking the best move. If the random move improves the state, then it follows the same
path. Otherwise, the algorithm follows the path which has a probability of less than 1 or it moves
downhill and chooses another path.
In every simulated annealing example, a random new point is generated. The distance between the
current point and the new point has a basis of the probability distribution on the scale of the
proportion of temperature.
There are a set of steps that are performed for simulated annealing in AI. These steps can be summarized as
follows:
Simulated annealing creates a trial point randomly. The algorithm selects the distance between the
current point and the trial point by a probability distribution. The scale of such distribution is
temperature. With the annealing function trial, point distribution distance is set. To keep the
boundaries intact, the trial point is shifted gradually.
Some of the conditions that are considered as the basis to stop the simulated-annealing are as follows:
The simulated-annealing performs until the value of the objective function goes lesser than the
tolerance function.
The default value of iterations in simulated-annealing is INF. This can be set to any positive integer
as well. When the algorithm exceeds the iteration value, it stops.
The annealing concludes when the maximum number of evaluations is achieved. The default value
of such evaluations is 3000 * number of variables.
The default value of maximum time is Inf, We can set a constant value here and when that is
reached, the algorithm stops.
When the best objective function value is lesser than the limit of the objective it concludes. The
default value of such an objective function is -Inf.
To understand how simulated-annealing works, one can take the example of a traveling salesman. The
solution can be created by applying any of the language selections.
Let us understand the problem and the solution with simulated-annealing applications.
At the onset, a city class needs to be created to specify several destinations the travelling salesman
would visit.
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: mdafzal.mbnr@gmail.com
After that, a class has to be created that keeps track of the cities.
Then a class is created that models the tour of the travelling salesman.
With all the different classes and the information in hand, a simulated-annealing algorithm is
created.
Thus with the types of optimization problems, a relatively simpler algorithm is created, and the
solution is sought.
There is a huge difference between hill-climbing and simulated-annealing considering the way they
are applied, and the results are achieved.
Simulated-annealing is believed to be a modification or an advanced version of hill-climbing
methods.
Hill climbing achieves optimum value by tracking the current state of the neighborhood.
Simulated-annealing achieves the objective by selecting the bad move once a while.
A global optimum solution is guaranteed with simulated-annealing, while such a guarantee is not
assured with hill climbing or descent.
A genetic algorithm (or GA) is a variant of stochastic beam search in which successor states are
generated by combining two parent states rather than by modifying a single state.
The analogy to natural selection is the same as in stochastic beam search, except that now we are
dealing with genetic rather than non-genetic reproduction. Like beam searches, GAs begin with a set
of k randomly generated states, called the population.
Each state, or individual, is represented as a string over a finite alphabet—most commonly, a string
of 0s and 1s. For example, an 8-queens state must specify the positions of 8 queens, each in a
column of 8 squares, and so requires 8 × log2 8 = 24 bits.
Alternatively, the state could be represented as 8 digits, each in the range from 1 to 8. (We
demonstrate later that the two encodings behave differently.) The following figure shows a
population of four 8-digit strings representing 8-queens states.
The production of the next generation of states is also shown in the figure(a). In (b), each state is
rated by the objective function, or (in GA terminology) the fitness function.
A fitness function should return higher values for better states, so, for the 8-queens problem we use
the number of non-attacking pairs of queens, which has a value of 28 for a solution.
The values of the four states are 24, 23, 20, and 11. In this particular variant of the genetic algorithm,
the probability of being chosen for reproducing is directly proportional to the fitness score, and the
percentages are shown next to the raw scores.
In (c), two pairs are selected at random for reproduction, in accordance with the probabilities in (b).
Notice that one individual is selected twice and one not at all. For each pair to be mated, a crossover
point is chosen randomly from the positions in the string. In Figure 4.6, the crossover points are after
the third digit in the first pair and after the fifth digit in the second pair.
In (d), the offspring themselves are created by crossing over the parent strings at the crossover point.
For example, the first child of the first pair gets the first three digits from the first parent and the
remaining digits from the second parent, whereas the second child gets the first three digits from the
second parent and the rest from the first parent. The 8-queens states involved in this reproduction
step are shown in the figure.
The example shows that when two parent states are quite different, the crossover operation can
produce a state that is a long way from either parent state. It is often the case that the population is
quite diverse early on in the process, so crossover (like simulated annealing) frequently takes large
Finally, in (e), each location is subject to random mutation with a small independent probability. One
digit was mutated in the first, third, and fourth offspring. In the 8-queens problem, this corresponds
to choosing a queen at random and moving it to a random square in its column.
In practice, genetic algorithms have had a widespread impact on optimization problems, such as
circuit layout and job-shop scheduling. At present, it is not clear whether the appeal of genetic
algorithms arises from their performance or from their esthetically pleasing origins in the theory of
evolution. Much work remains to be done to identify the conditions under which genetic algorithms
perform well.
What is optimization?
Optimization refers to the task of minimizing or maximizing an objective function f(x) w.r.t weights.
In the above figure at point c, the slope of the gradient will be negative and at point a, the slope of
the gradient will be positive.
When the slope is exactly zero, at that time we can say it is a minimum cost function. It is simply
that the derivative of the cost function will be zero at minimum point.
Gradient is nothing but the partial derivative of the cost function, simply find the first derivative of
the cost function to get the minimum and maximum points. And to calculate the new weight, simply
use the below equation.
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: mdafzal.mbnr@gmail.com
The –ve sign is used to find the new weight in case of gradient descent and the +ve sign is used to
the new weight in case of gradient ascent.
Gradient Descent
- Usually written as
- ∇f(x) (the gradient itself) represents the direction of the steepest slope
- |∇f(x)| (the magnitude of the gradient) tells you how big the steepest slope is
Suppose we want to find a local minimum of a function f(x). We use the gradient descent rule:
x←x – α∇f(x)
Suppose we want to find a local maximum of a function f(x). We use the gradient ascent rule:
x←x + α∇f(x)
If α is too large
Gradient descent overshoots the optimum point
If α is too small
Gradient descent requires too many steps and will take a very long time to converge
α is the learning rate, which is usually a small number like 0.05
Then
When the environment is fully observable and deterministic then the agent knows what the effects of
each action are. In such case the percepts provide no new information after each action. Percepts
become useful in partially observable or nondeterministic environments
In a partially observable environment, the percept helps narrow down the set of possible states and
make it easier to achieve the goal.
When the environment is nondeterministic, percepts tell the agent which of the possible outcomes of
its actions has actually occurred.
The solution to a problem is not a sequence but a strategy that specifies what to do depending on
what percepts are received.
In the above figure, the buying of a car may be broken down into smaller problems or tasks that can
be accomplished to achieve the main goal in the above figure, which is an example of a simple
AND-OR graph.
The other task is to either steal a car that will help us accomplish the main goal or use your own
money to purchase a car that will accomplish the main goal.
The AND symbol is used to indicate the AND part of the graphs, which refers to the need that all
sub problems containing the AND to be resolved before the preceding node or issue may be finished.
The start state and the target state are already known in the knowledge-based search strategy known
as the AO* algorithm, and the best path is identified by heuristics.
The informed search technique considerably reduces the algorithm’s time complexity.
Example:
Here in the above example below the Node which is given is the heuristic value i.e h(n). Edge length is
considered as 1.
Step – 1:
With help of f(n) = g(n) + h(n) evaluation function,
Start from node A,
f(A⇢B) = g(B) + h(B)
=1 + 5 ……here g(n)=1 is
taken by default for path cost
=6
f(A⇢C+D) = g(c) + h(c) + g(d) + h(d)
=1+2+1+4 ……here we have
added C & D because they are in AND
=8
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: mdafzal.mbnr@gmail.com
So, by calculation A⇢B path is chosen which is
the minimum path, i.e f(A⇢B)
Step – 2:
According to the answer of step 1, explore node B
Here the value of E & F are calculated as follows.
f(B⇢E) = g(e) + h(e)
f(B⇢E) = 1 + 7 = 8
f(B⇢f) = g(f) + h(f)
f(B⇢f) = 1 + 9 = 10
o So, by above calculation B⇢E path is chosen
which is minimum path, i.e f(B⇢E) because
B's heuristic value is different from its actual
value The heuristic is updated and the
minimum cost path is selected. The
minimum value in our situation is 8.
o Therefore, the heuristic for A must be
updated due to the change in B's heuristic.
o So we need to calculate it again.
f(A⇢B) = g(B) + updated h(B)
=1+8
=9
We have Updated all values in the above tree.
Step – 3:
As we can see that path f(A⇢C+D) is get solved and this tree has become a solved tree now.
In simple words, the main flow of this algorithm is that we have to find firstly level 1st heuristic
value and then level 2nd and after that update the values with going upward means towards the root
node.
In the above tree diagram, we have updated all the values.
Partial observability: The agent's percepts do not suffice to pin down the exact state
In such environment an action may lead to one of several possible outcomes even if the environment
is deterministic.
To solve partially observable problems the concept of the "belief state" is used.
Goal test: The agent wants a plan that is sure to work, which means that a belief state
satisfies the goal only if all the physical states in it satisfy GOAL-TESTP . The agent may
accidentally achieve the goal earlier, but it won’t know that it has done so.
Path cost: Same action can have different costs in different states of a belief state, an
assumption can make the formulation easy, i.e. the cost of an action is the same in all states
o Update: for each possible percept, the belief state that would result from the percept
bo= UPDATE(b, o)={s: o =PERCEPT(s) and s ∈ b} updated belief state bo can be no larger
than the predicted belief state b`;
The offline search algorithms compute a complete solution before setting foot in the real world and
then execute the solution
Online search takes an action, then it observes the environment and computes the next suitable for
dynamic or semi-dynamic domains where there is a penalty for sitting around and computing too
long
Also useful for
o Unknown environments
o Non-deterministic domains
Examples
o Web search
o Autonomous vehicle
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: mdafzal.mbnr@gmail.com
Online Search Agents And Unknown Environments
An agent is anything that can perceive its environment through sensors and acts upon that
environment through Actuators(effectors).
Offline Search Agents- The agents who compute a complete solution before setting foot in the real
world and then execute the solution is called Offline search Agents.
Online Search Agents- The online search works on the concept of interleave computation and
action.
In the real world, Online Search Agent first takes an action, then it observes the environment and
computes the next action.
Online search is a good idea in unknown environments, where the agent does not know what states
exist or what its actions do.
Online search problem – Assume deterministic (If the next state is completely determined by the current
action of the agent, then the environment is deterministic.), fully observable environment.
Agent only knows:
Actions(s) - list of actions allowed in state s.
Step-cost function c(s, a, s') - cannot be used until agent knows that s' is the outcome of doing a.
Goal-Test(s) - In particular, it doesn't know Result (s, a) except by actually being in s and doing a,
then observing s'.
s = state
a = action
s' = state after doing action a
c = Step cost function
In this Maze, the agent start with 'S' state i.e. (1, 1) and then leads to (1, 2), likewise use up leading it
then achieves goal state 'G’.
The going down will lead it to (1, 1) i.e. start state.
The degree of achieving goal state fast can be improved.
Competitive ratio = Online cost/Best cost.
Online path cost: total cost of the path that the agent actually travels.