Module 3 and 4
Best-First Search
Best-first search is a key informed search strategy that employs an evaluation function, f(n), to
prioritize which node to expand next. The node with the lowest evaluation is expanded first, making
the search process more efficient.
• Evaluation Function: The function f(n) serves as a cost estimate. Best-first search can be
implemented just like uniform-cost search, but the priority queue is ordered by f instead of g (the
cost to reach node n).
Heuristic Function
A critical component of f(n) is the heuristic function h(n):
• Heuristic Definition: h(n) represents the estimated cost of the cheapest path from the current
state at node n to a goal state. It only depends on the state at node n, not on the path taken to
get there.
• Example: In the context of a route-finding problem, such as determining the shortest path from
Arad to Bucharest, h(n) could be represented by the straight-line distance between these two
cities.
Characteristics of Heuristics
Heuristics play a vital role in informing the search algorithm by guiding it toward promising paths:
1. Nonnegative: Heuristic values are generally nonnegative, ensuring that they provide a
reasonable estimate of the remaining cost.
2. Goal State: If n is a goal node, then h(n) = 0.
Intelligent Systems and Machine Learning Algorithms (BEC515A) Prepared by: Chetan B V, Asst. Prof., ECE, GMIT, Davangere
These strategies can lead to more efficient searches by effectively narrowing down the most promising
paths based on the additional knowledge provided by heuristics.
How It Works
In route-finding problems, such as navigating from Arad to Bucharest in Romania, the straight-line
distance heuristic (denoted as hSLD) is employed. For instance, hSLD(Arad) = 366 kilometers indicates
the direct distance to Bucharest.
Node Expansion: The algorithm expands the node that is closest to the goal based on the heuristic.
For example, starting from Arad:
o The first node expanded is Sibiu, as it is closer to Bucharest than Zerind or Timisoara.
o Next, it expands Fagaras, which leads directly to Bucharest.
In this case, the greedy best-first search finds a solution efficiently without exploring unnecessary
nodes. However, the chosen path (via Sibiu and Fagaras) is not optimal; an alternative route through
Rimnicu Vilcea and Pitesti is actually shorter by 32 kilometers.
o Greedy best-first search can also fail to terminate: in getting from Iasi to Fagaras, the
optimal route requires going to Vaslui first, but the heuristic prefers Neamt (a dead end),
so the tree-search version would loop back to Iasi, leading to an infinite cycle.
• Complexity:
o The worst-case time and space complexity for the tree version of greedy best-first search
is O(b^m), where b is the branching factor and m is the maximum depth of the search
space.
o However, a good heuristic can significantly reduce complexity, depending on its
effectiveness and the specific problem being solved.
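The greedy expansion order just described can be sketched in Python. The adjacency-dict and heuristic-dict representations below are conventions chosen for the sketch, not from the text:

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Greedy best-first graph search: the frontier is ordered by f(n) = h(n).

    graph: dict mapping node -> list of (neighbor, step_cost) pairs
    h:     dict mapping node -> heuristic estimate of distance to the goal
    Returns the path found (a list of nodes), or None.
    """
    frontier = [(h[start], start, [start])]       # priority queue ordered by h
    explored = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for nbr, _cost in graph[node]:            # step costs are ignored
            if nbr not in explored:
                heapq.heappush(frontier, (h[nbr], nbr, path + [nbr]))
    return None
```

On a fragment of the Romania map with hSLD, this reproduces the greedy (suboptimal) route Arad → Sibiu → Fagaras → Bucharest described above.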
Fig. Stages in a greedy best-first tree search for Bucharest with the straight-line distance heuristic hSLD.
Nodes are labeled with their h-values.
Fig. Stages in an A∗ search for Bucharest. Nodes are labeled with f = g + h. The h values are the
straight-line distances to Bucharest.
A* search is a widely used informed search algorithm that combines the cost to reach a node (g(n))
and the estimated cost to the goal (h(n)) using:
f(n)=g(n)+h(n)
Key Features
1. Path Cost Estimation: A* prioritizes nodes with the lowest f(n), representing the estimated total
cost of the cheapest solution through that node.
2. Optimal and Complete: A* is complete (it finds a solution if one exists) and optimal (it finds the
least-cost solution) if the heuristic h(n) is admissible and consistent.
3. Comparison to Uniform-Cost Search: A* extends uniform-cost search by incorporating heuristic
information.
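As a sketch of the A* scheme above (graph and heuristic represented as plain dicts, an assumption of this example):

```python
import heapq

def a_star_search(graph, h, start, goal):
    """A* graph search: the frontier is ordered by f(n) = g(n) + h(n).

    graph: dict mapping node -> list of (neighbor, step_cost) pairs
    h:     dict mapping node -> admissible heuristic estimate
    Returns (path, cost) for the least-cost solution, or (None, inf).
    """
    frontier = [(h[start], 0, start, [start])]    # entries: (f, g, node, path)
    best_g = {}                                   # cheapest g found per node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g                        # with consistent h, optimal
        if node in best_g and best_g[node] <= g:
            continue                              # already reached more cheaply
        best_g[node] = g
        for nbr, cost in graph[node]:
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None, float('inf')
```

On the Romania road fragment with hSLD, this returns the optimal route through Rimnicu Vilcea and Pitesti at cost 418, versus 450 via Fagaras.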
Optimality of A*
A* search has two optimality properties:
1. Tree-Search Version: Optimal if the heuristic h(n) is admissible.
2. Graph-Search Version: Optimal if h(n) is consistent.
Key Concepts
• Nondecreasing f(n): If h(n) is consistent, then f(n) values along any path do not decrease. This
ensures that when A* selects a node for expansion, it has found the optimal path to that node.
• Expansion Order: A* expands nodes in nondecreasing order of f(n), meaning the first goal node
expanded is guaranteed to be optimal.
Search Behavior
• Nodes are organized in concentric "contours" based on f(n) values. A* expands all nodes with
f(n) < C* (the optimal path cost) and may also expand some on the "goal contour" where f(n) = C*.
Fig. Map of Romania showing contours at f = 380, f = 400, and f = 420, with Arad as the start state.
Nodes inside a given contour have f-costs less than or equal to the contour value.
Practical Limitations
• A* can be impractical for large-scale problems due to high memory usage; it often runs out of
space before time.
• Variants of A* can find suboptimal solutions faster, and more accurate but non-admissible
heuristics can be developed.
Despite its limitations, A* remains a powerful search algorithm, particularly when coupled with good
heuristics.
Iterative-Deepening A* (IDA*)
Key Features:
• Iterative Deepening: IDA* raises the f-cost cutoff after each iteration to the smallest f-cost
that exceeded the cutoff in the previous iteration.
• Memory Efficient: It doesn't require storing all nodes in memory, making it space-efficient
compared to A*.
• No Priority Queue: IDA* uses a depth-first search strategy without the need for a sorted open list.
Limitations:
• Real-Valued Costs: IDA* struggles with real-valued costs, leading to inefficient repeated searches
and slower performance.
• Redundant Work: The algorithm may revisit the same parts of the search space multiple times,
which can reduce its overall efficiency.
IDA* is useful for large state spaces with unit costs where memory is a concern but can be inefficient
with non-integer or real-valued costs.
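A minimal IDA* sketch under the same representation assumptions as before (adjacency dict, heuristic dict; names are illustrative):

```python
def ida_star(graph, h, start, goal):
    """Iterative-deepening A*: depth-first search under a growing f-cost cutoff.

    graph: dict mapping node -> list of (neighbor, step_cost) pairs
    h:     dict mapping node -> admissible heuristic estimate
    Returns (path, cost), or (None, inf) if no solution exists.
    Only the current path is stored, so memory use is linear in its length.
    """
    def dfs(path, g, bound):
        node = path[-1]
        f = g + h[node]
        if f > bound:
            return f, None                  # cutoff exceeded; report this f
        if node == goal:
            return g, path
        minimum = float('inf')              # smallest f that exceeded the bound
        for nbr, cost in graph[node]:
            if nbr not in path:             # avoid cycles along the current path
                t, found = dfs(path + [nbr], g + cost, bound)
                if found is not None:
                    return t, found
                minimum = min(minimum, t)
        return minimum, None

    bound = h[start]
    while True:
        t, found = dfs([start], 0, bound)
        if found is not None:
            return found, t
        if t == float('inf'):
            return None, float('inf')
        bound = t                           # next cutoff: smallest excess f
```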
Fig. Stages in an RBFS search for the shortest route to Bucharest. The f-limit value for each recursive
call is shown on top of each current node, and every node is labelled with its f-cost. (a) The path via
Rimnicu Vilcea is followed until the current best leaf (Pitesti) has a value that is worse than the best
alternative path (Fagaras). (b) The recursion unwinds and the best leaf value of the forgotten subtree
(417) is backed up to Rimnicu Vilcea; then Fagaras is expanded, revealing a best leaf value of 450. (c)
The recursion unwinds and the best leaf value of the forgotten subtree (450) is backed up to Fagaras;
then Rimnicu Vilcea is expanded. This time, because the best alternative path (through Timisoara)
costs at least 447, the expansion continues to Bucharest.
Simplified Memory-Bounded A*
SMA* proceeds just like A*, expanding the best leaf until memory is full; it then drops the
worst leaf node to make room. The discarded node's value is backed up to its parent, allowing
SMA* to retain useful information without storing the entire search tree.
Key points:
• Expansion and Deletion: Expands the best node, deletes the worst (highest f-value) node, and
backs up its value to its parent.
• Ties: If nodes have the same f-value, the newest best leaf is expanded, and the oldest worst leaf
is deleted.
• Completeness: SMA* is complete if the shallowest goal node’s depth is smaller than the memory
size.
• Optimality: SMA* returns the best solution within memory limits, but may not be optimal if the
entire path cannot fit in memory.
• Practical Use: Suitable for problems with limited memory, but can suffer from thrashing (repeated
regeneration of nodes), making it inefficient on harder problems.
SMA* offers a tradeoff between time and memory: it is less optimal than A*, but more feasible when
memory is the limiting factor.
Heuristic Functions
In the 8-puzzle, the goal is to arrange tiles by sliding them into the empty space. The average solution
cost is about 22 steps, with a branching factor of 3. An exhaustive search would explore around
3.1×10¹⁰ states, but using a graph search reduces this to 181,440 reachable states.
To solve the puzzle efficiently using A*, a good heuristic is needed. Two common admissible heuristics
are:
• h₁ = Misplaced Tiles: Counts the number of tiles out of place. For the start state in Figure, h₁ = 8.
It’s admissible because each misplaced tile must be moved at least once.
• h₂ = Manhattan Distance: Sums the horizontal and vertical distances of each tile from its goal
position. For the start state, h₂ = 18. It’s admissible because each move brings tiles closer to the
goal.
h₂ = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
Both heuristics never overestimate the true solution cost (26 for this instance), making them
effective for guiding the search algorithm.
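The two heuristics can be computed directly; a sketch, assuming states are 9-tuples read row by row with 0 for the blank:

```python
def h1_misplaced(state, goal):
    """h1: number of tiles out of place (the blank, 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2_manhattan(state, goal):
    """h2: sum of horizontal + vertical distances of each tile to its goal cell."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue                               # skip the blank
        goal_idx = goal.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total

# The textbook start and goal states (read row by row; 0 is the blank).
start = (7, 2, 4, 5, 0, 6, 8, 3, 1)
goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
```

For this start state the functions give h₁ = 8 and h₂ = 18, matching the values above.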
Fig. Comparison of the search costs and effective branching factors for the ITERATIVE-DEEPENING-
SEARCH and A∗ algorithms with h1, h2. Data are averaged over 100 instances of the 8-puzzle for each
of various solution lengths d.
The effective branching factor (b*) measures the quality of a heuristic by estimating the branching
factor of a uniform search tree that generates the same number of nodes as A* for a given problem. It
is calculated based on the total number of nodes (N) and solution depth (d).
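Since the defining equation has no closed-form solution, b* can be found numerically; a bisection sketch (the function name is illustrative):

```python
def effective_branching_factor(N, d, tol=1e-6):
    """Solve N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection.

    N is the total number of nodes generated, d the solution depth.
    The left-hand side grows monotonically in b*, so bisection converges.
    """
    def total(b):
        return sum(b ** i for i in range(d + 1))   # 1 + b + ... + b^d
    lo, hi = 1.0, float(N)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < N + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, if A* finds a depth-5 solution after generating 52 nodes, b* ≈ 1.92.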
Key Points:
• b* Formula: N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d
• Heuristic Comparison:
o h₁ (Misplaced Tiles): A simple heuristic.
o h₂ (Manhattan Distance): The sum of horizontal and vertical distances from the goal.
Results:
• h₂ is much more efficient than h₁, with A* using h₂ expanding significantly fewer nodes (e.g., for
d=12, A* with h₂ expands 73 nodes, while A* with h₁ expands 227).
• Effective Branching Factor: A* with h₂ has a lower b* (e.g., 1.24 vs. 1.42 for h₁) and outperforms
iterative deepening search by orders of magnitude.
Domination of h₂:
• h₂ dominates h₁ because it consistently produces higher heuristic values, reducing the number of
nodes expanded by A*, making it more efficient.
In summary, h₂ (Manhattan distance) improves A* search efficiency by expanding fewer nodes, with a
lower effective branching factor, making it a superior heuristic compared to h₁ (misplaced tiles).
Fig. A subproblem of the 8-puzzle instance. The task is to get tiles 1, 2, 3, and 4 into their correct
positions, without worrying about what happens to the other tiles.
Pattern databases (PDBs) generate admissible heuristics by storing exact solution costs for
subproblems of a larger problem, providing a lower bound for the full problem.
For example, in the 8-puzzle, a subproblem might involve solving only a subset of tiles (e.g., tiles 1, 2,
3, and 4). The cost of solving this subproblem is a lower bound for the entire puzzle, and using this
information can be more accurate than simpler heuristics like Manhattan distance.
PDB Construction: To create a PDB, perform a backward search from the goal for all possible
configurations of the relevant tiles, storing the solution cost for each configuration. This database is
then used to efficiently guide the search in future problem instances.
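The backward-search construction can be sketched for the 8-puzzle. This is an illustrative, non-disjoint variant: every move is counted, so the stored values are admissible on their own but cannot be summed across patterns:

```python
from collections import deque

def build_pattern_db(goal, pattern):
    """Backward breadth-first search from the goal over abstracted states.

    Tiles not in `pattern` are replaced by None, so states differing only in
    the "don't care" tiles collapse together. The stored move count is an
    admissible lower bound on the cost of solving the full puzzle.
    """
    def abstract(state):
        return tuple(t if t in pattern or t == 0 else None for t in state)

    def neighbors(state):
        i = state.index(0)                      # blank position
        r, c = divmod(i, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < 3 and 0 <= c2 < 3:
                j = r2 * 3 + c2
                s = list(state)
                s[i], s[j] = s[j], s[i]         # slide a tile into the blank
                yield tuple(s)

    start = abstract(goal)
    db = {start: 0}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for n in neighbors(s):
            if n not in db:
                db[n] = db[s] + 1
                frontier.append(n)
    return db
```

At search time, a state's heuristic value is looked up by abstracting it the same way and indexing the database.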
Combining PDBs: If two subproblems (e.g., tiles 1-2-3-4 and 5-6-7-8) don’t overlap, their heuristics can
be combined by summing the costs from their individual PDBs. This is the essence of disjoint pattern
databases (DPDBs), which are admissible because the subproblems are independent.
Benefits: DPDBs dramatically improve search efficiency. For instance, in the 15-puzzle, they can reduce
nodes explored by a factor of 1000, and for the 24-puzzle, by a factor of a million, compared to simpler
heuristics like Manhattan distance.
Limitations: DPDBs are highly effective for sliding-tile puzzles but less so for puzzles like Rubik's Cube,
where multiple pieces move simultaneously, making disjoint decomposition difficult.
Combining Features
The features x1(n) (the number of misplaced tiles) and x2(n) (the number of pairs of adjacent tiles
that are not adjacent in the goal state) can be combined linearly to predict h(n):
h(n) = c1x1(n) + c2x2(n)
where c1 and c2 are constants adjusted to fit the data. These constants are usually positive, reflecting
the fact that more misplaced tiles or incorrect adjacent pairs make the puzzle harder to solve.
Logical Agents
Knowledge-Based Agents
A knowledge-based agent (KBA) uses a knowledge base (KB) of sentences (assertions about the
world) to guide its actions. These sentences are written in a knowledge representation language and
can be axioms (taken as true without derivation).
Key Operations
• TELL: Adds knowledge to the KB, typically based on percepts (sensor input).
• ASK: Queries the KB to determine the best action based on current knowledge.
Both operations may involve inference, ensuring that answers logically follow from the KB without
fabricating information.
Fig. A generic knowledge-based agent. Given a percept, the agent adds the percept to its knowledge
base, asks the knowledge base for the best action, and tells the knowledge base that it has in fact taken
that action.
Agent Process
1. TELL: The agent adds its percept to the KB.
2. ASK: The agent queries the KB for the best action.
3. TELL: After acting, the agent updates the KB with the action taken.
This cycle repeats, with time t incremented each cycle.
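The TELL–ASK cycle can be sketched as a class; the KB and ASK below are bare placeholders (illustrative names, not a fixed API):

```python
class KnowledgeBasedAgent:
    """Sketch of the generic knowledge-based agent loop (TELL/ASK).

    The KB here is a plain list of sentences and ASK returns a stub action;
    a real agent would plug in a logical KB and an ASK that performs
    inference over it.
    """
    def __init__(self, kb=None):
        self.kb = kb if kb is not None else []
        self.t = 0                                   # time counter

    def tell(self, sentence):
        self.kb.append(sentence)

    def ask(self):
        # Placeholder: a real agent queries the KB for the best action.
        return "NoOp"

    def agent_program(self, percept):
        self.tell(("percept", percept, self.t))      # TELL the percept
        action = self.ask()                          # ASK for the best action
        self.tell(("action", action, self.t))        # TELL the action taken
        self.t += 1
        return action
```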
Fig. A typical wumpus world. The agent is in the bottom left corner, facing right.
PEAS description:
• Performance Measure:
o +1000 for exiting with gold.
o -1000 for dying (pit or Wumpus).
o -1 per action, -10 for shooting the arrow.
• Environment:
o 4x4 grid, agent starts at [1,1].
o Wumpus and gold placed randomly (except at [1,1]).
o Pits have 20% chance of being in each room (other than the start).
• Actuators:
o Move Forward, Turn Left/Right, Grab gold, Shoot arrow, Climb to exit.
• Sensors:
o Stench: The Wumpus is in the current square or an adjacent one.
o Breeze: A pit is in an adjacent square.
o Glitter: Gold in current room.
o Bump: Wall encountered.
o Scream: Wumpus killed.
Environment Characteristics:
• Discrete: The environment is made up of a finite number of states (rooms) and actions.
• Static: The environment does not change except through the agent’s actions (i.e., no external
events affecting the environment).
• Single-agent: The agent is the only one interacting with the environment.
• Sequential: The outcome depends on the sequence of actions the agent takes.
• Partially observable: The agent cannot directly observe the entire state (e.g., it doesn't know the
exact location of the Wumpus, pits, or gold initially).
Fig. The first step taken by the agent in the wumpus world. (a) The initial situation, after percept
[None, None, None, None, None]. (b) After one move, with percept [None, Breeze, None, None,
None].
Fig. Two later stages in the progress of the agent. (a) After the third move, with percept [Stench, None,
None, None, None]. (b) After the fifth move, with percept [Stench, Breeze, Glitter, None, None].
Agent’s Reasoning:
1. Initial knowledge: The agent starts knowing [1,1] is safe. Based on the percepts, it infers adjacent
rooms [1,2] and [2,1] are safe.
2. Move to [2,1]: The agent detects a Breeze and infers a pit nearby (in [2,2] or [3,1]).
3. Move to [1,2]: The agent detects a Stench (but no Breeze) and infers the Wumpus is in [1,3];
the absence of a breeze also means there is no pit in [2,2].
4. Exploration: The agent continues exploring, avoiding dangerous rooms, using logical reasoning to
deduce safe paths and eventually finds gold.
5. Goal: After grabbing the gold, the agent returns to [1,1] and climbs out of the cave.
Knowledge Representation:
The agent builds a knowledge base (KB) with logical rules (e.g., a Breeze means a pit is adjacent). It
uses reasoning to update its knowledge and decide actions that maximize survival and success.
The Wumpus World illustrates logical agents using reasoning to navigate uncertain environments,
combining percepts to form conclusions and make decisions.
Logic
This section outlines key concepts in logical representation and reasoning, essential for knowledge-
based agents.
1. Syntax and Semantics:
o Syntax: Defines the structure of well-formed sentences (e.g., "x + y = 4").
o Semantics: Assigns meaning to sentences, specifying when they are true or false in a
model (a possible world or assignment of variable values).
2. Models and Satisfaction:
o A model fixes the truth value of sentences.
o A model satisfies a sentence if the sentence is true in that model.
3. Logical Entailment:
o Entailment (denoted α⊨β) means that if α is true, then β must also be true in every model
where α holds.
o This is formalized as M(α) ⊆ M(β), where M(α) is the set of models where α is true.
4. Logical Inference:
o Inference is the process of deriving new conclusions from existing knowledge (KB).
o Model checking involves enumerating possible models to verify whether a sentence holds
in all models where KB is true.
Fig. Possible models for the presence of pits in squares [1,2], [2,2], and [3,1]. The KB corresponding to
the observations of nothing in [1,1] and a breeze in [2,1] is shown by the solid line. (a) Dotted line
shows models of α1 (no pit in [1,2]). (b) Dotted line shows models of α2 (no pit in [2,2]).
5. Soundness and Completeness:
o Soundness: An inference algorithm is sound if it only derives sentences entailed by the KB.
o Completeness: An algorithm is complete if it can derive all sentences entailed by the KB.
6. Wumpus World Example:
o In the Wumpus world, the agent uses logical reasoning to infer the presence of pits based
on percepts (e.g., breeze) and KB.
o For instance, detecting a breeze means a pit is nearby, and the agent can use entailment
to confirm where the pit is located.
Fig. Sentences are physical configurations of the agent, and reasoning is a process of constructing new
physical configurations from old ones. Logical reasoning should ensure that the new configurations
represent aspects of the world that actually follow from the aspects that the old configurations
represent.
7. Grounding:
o Grounding connects logical sentences in the KB to real-world sensory experiences. The
agent’s sensors generate percepts that are translated into sentences in the KB, which are
then used for reasoning.
8. Learning:
o Learning allows the agent to update its KB based on new experiences, improving its ability
to make accurate inferences over time. However, learning is fallible, and the KB may not
always be accurate.
Atomic Sentences
Atomic sentences consist of a single proposition symbol, which represents a statement that can be
true or false. Examples include:
• P1: "There is a pit in square 1."
• W1,3: "The Wumpus is in square [1,3]."
Two special symbols:
• True: Always true.
• False: Always false.
Complex Sentences
Complex sentences are formed by combining simpler sentences with logical connectives:
1. Negation (¬): Reverses the truth value, e.g., ¬W1,3 means "The Wumpus is not in [1,3]."
2. Conjunction (∧): Both must be true, e.g., W1,3 ∧ P3,1 means "The Wumpus is in [1,3] and
there is a pit in [3,1]."
3. Disjunction (∨): At least one must be true, e.g., (W1,3 ∧ P3,1) ∨ W2,2 means "Either the
Wumpus is in [1,3] and there is a pit in [3,1], or the Wumpus is in [2,2]."
4. Implication (⇒): If the premise is true, the conclusion must be true, e.g., (W1,3 ∧ P3,1) ⇒
¬W2,2 means "If the Wumpus is in [1,3] and there is a pit in [3,1], then the Wumpus is not in
[2,2]."
5. Biconditional (⇔): Both sides must have the same truth value, e.g., W1,3 ⇔ ¬W2,2 means "The
Wumpus is in [1,3] if and only if there is no Wumpus in [2,2]."
Key Points:
• Exclusive Or (XOR): A variation of disjunction that is true when exactly one of the two operands is true.
• Implication: True when P is false or Q is true, regardless of causality.
• Biconditional (⇔): True when both sides imply each other, e.g., B1,1 ⇔ (P1,2 ∨ P2,1).
These rules define how to compute the truth value of any sentence in propositional logic, allowing
logical inference.
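These evaluation rules can be written down directly; a sketch, assuming sentences are represented as nested tuples (a representation chosen here for brevity):

```python
def pl_true(sentence, model):
    """Evaluate a propositional sentence in a model (a dict symbol -> bool).

    Sentences are nested tuples: ('not', s), ('and', s1, s2), ('or', s1, s2),
    ('implies', s1, s2), ('iff', s1, s2), or a bare proposition symbol.
    """
    if isinstance(sentence, str):
        return model[sentence]
    op, *args = sentence
    if op == 'not':
        return not pl_true(args[0], model)
    if op == 'and':
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == 'or':
        return pl_true(args[0], model) or pl_true(args[1], model)
    if op == 'implies':                 # false only when premise true, conclusion false
        return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == 'iff':                     # true when both sides agree
        return pl_true(args[0], model) == pl_true(args[1], model)
    raise ValueError(f"unknown connective: {op}")
```

For example, the rule B1,1 ⇔ (P1,2 ∨ P2,1) becomes `('iff', 'B11', ('or', 'P12', 'P21'))`.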
• KB is true in a model if all its sentences are satisfied, and we check if α is also true in those
models.
For example, if P1,2 is false in all valid models, we can infer that there is no pit in [1,2].
Fig. A truth table constructed for the knowledge base given in the text. KB is true if R1 through R5 are
true, which occurs in just 3 of the 128 rows (the ones underlined in the right-hand column). In all 3
rows, P1,2 is false, so there is no pit in [1,2]. On the other hand, there might (or might not) be a pit in
[2,2].
Fig. A truth-table enumeration algorithm for deciding propositional entailment. (TT stands for truth
table.) PL-TRUE? returns true if a sentence holds within a model. The variable model represents a
partial model—an assignment to some of the symbols. The keyword “and” is used here as a logical
operation on its two arguments, returning true or false.
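A minimal version of this enumeration can be sketched as follows; to avoid writing a sentence parser, the KB and query are supplied as plain Python predicates over a model (an assumption of this sketch):

```python
from itertools import product

def tt_entails(kb, alpha, symbols):
    """Decide KB |= alpha by enumerating every model over `symbols`.

    A model is a dict mapping symbol -> bool; kb and alpha are predicates
    over a model, so any propositional sentence can be supplied directly.
    """
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False          # found a model of KB in which alpha fails
    return True                   # alpha holds in every model of KB
```

With the Wumpus knowledge (no pit in [1,1], B1,1 ⇔ (P1,2 ∨ P2,1), and no breeze perceived in [1,1]), this confirms the KB entails "no pit in [1,2]".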
Logical Equivalences:
Fig. Standard logical equivalences. The symbols α, β, and γ stand for arbitrary sentences of
propositional logic.
These equivalences are used to manipulate and simplify logical formulas during the inference process.
4. Satisfiability:
o A sentence is satisfiable if there exists at least one model where it is true. The SAT
problem (determining satisfiability) is NP-complete, central to many computational
problems.
5. Reductio ad Absurdum (Proof by Contradiction):
o This technique assumes the negation of what you want to prove and shows it leads to a
contradiction, proving the original statement.
o α ⊨ β is equivalent to (α ∧ ¬β) being unsatisfiable.
6. Key Logical Results:
o Validity and Satisfiability are connected: α is valid iff ¬α is unsatisfiable.
o Entailment: α ⊨ β iff (α ∧ ¬β) is unsatisfiable, reflecting proof by contradiction.
These concepts form the foundation of automated reasoning, enabling efficient entailment and
validity checking without enumerating all models.
2. Applying Inference: In the Wumpus World example, inference rules are used to prove ¬P1,2 (no
pit in [1,2]) by applying rules like biconditional elimination, And-Elimination, contraposition,
Modus Ponens, and De Morgan’s rule.
We start with the knowledge base containing R1 through R5 and show how to prove ¬P1,2, that
is, there is no pit in [1,2].
R2 : B1,1 ⇔ (P1,2 ∨ P2,1)
First, we apply biconditional elimination to R2 to obtain
R6: (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1).
Then we apply And-Elimination to R6 to obtain
R7: (P1,2 ∨ P2,1) ⇒ B1,1.
3. Proof Search:
o A proof problem is framed as a search problem: start with the knowledge base, apply
inference rules, and search for the desired conclusion. This is more efficient than model
checking as irrelevant propositions are ignored.
4. Monotonicity:
o Monotonicity ensures that adding new information to the knowledge base can only add
new conclusions, not invalidate existing ones. For example, adding facts about pits cannot
contradict previously inferred conclusions.
This method shows how inference rules and logical proof strategies can efficiently derive conclusions
without exhaustive model checking.
Proof by Resolution
Resolution is a complete inference rule used in propositional logic that combines two clauses with
complementary literals (one containing a literal p and the other containing ¬p) to derive a new clause
by removing these complementary literals.
Key Points:
1. Unit Resolution: When one clause is a unit clause (containing only one literal), resolution
simplifies to eliminating complementary literals. For example, resolving P1,1 ∨ P3,1 with ¬P1,1
results in P3,1.
2. Generalized Resolution: When both clauses contain multiple literals, resolution removes the
complementary literals and combines the remaining literals. For instance, resolving P1,1 ∨ P3,1
with ¬P1,1 ∨ ¬P2,2 results in P3,1 ∨ ¬P2,2.
3. Factoring: Multiple occurrences of a literal in a clause are eliminated to avoid redundancy,
ensuring only one copy of each literal in the result.
4. Soundness: The resolution rule is sound, meaning the derived clause is true if the original clauses
are true. The truth of the resulting clause follows logically from the premises.
5. Completeness: Resolution is complete, meaning it can derive a proof for any valid entailment in
propositional logic, ensuring that if α⊨β, resolution can find a proof that β is true given α.
6. Example from the Wumpus World: The agent uses resolution to deduce the locations of pits based
on its perceptions (like a stench) and prior knowledge, eventually concluding that the pit must be
in a specific location.
Resolution Algorithm:
The resolution algorithm proves entailment by contradiction: to show KB⊨α, we prove that KB ∧ ¬α
is unsatisfiable (leads to a contradiction). This is done through the following steps:
Algorithm Steps:
1. Convert to CNF: Convert KB ∧ ¬α into Conjunctive Normal Form (CNF).
2. Apply Resolution: Apply the resolution rule to pairs of clauses with complementary literals (one
clause containing P, the other ¬P).
3. Continue Resolving:
o If the empty clause is derived, a contradiction has been found, so KB entails α.
o If no new clauses can be added and no empty clause appears, then KB does not entail α.
PL-RESOLUTION Algorithm
The algorithm resolves pairs of clauses, generating new clauses, and checks for the empty clause:
Fig. A simple resolution algorithm for propositional logic. The function PL-RESOLVE returns the set of
all possible clauses obtained by resolving its two inputs.
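The PL-RESOLVE step can be sketched with clauses as frozensets of literals (a representation chosen here; a literal is a (symbol, sign) pair):

```python
def pl_resolve(ci, cj):
    """Return every clause obtained by resolving the two clauses ci and cj.

    A clause is a frozenset of literals; a literal is a (symbol, positive)
    pair, so ('P11', False) stands for the negative literal ¬P11. For each
    pair of complementary literals, both are removed and the remaining
    literals merged; factoring happens automatically because clauses are sets.
    """
    resolvents = []
    for (sym, pos) in ci:
        if (sym, not pos) in cj:
            merged = (ci - {(sym, pos)}) | (cj - {(sym, not pos)})
            resolvents.append(frozenset(merged))
    return resolvents
```

Resolving two complementary unit clauses yields the empty clause, the contradiction that PL-RESOLUTION searches for.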
Completeness of Resolution
The PL-RESOLUTION algorithm is complete, meaning it can always determine entailment in
propositional logic, as guaranteed by the Ground Resolution Theorem.
1. Resolution Closure (RC(S)): The set of all clauses derivable from a set S via resolution. It is finite
due to factoring.
2. Ground Resolution Theorem:
o If a set is unsatisfiable, RC(S) contains the empty clause (a contradiction).
o If no empty clause is found, the set is satisfiable, meaning a truth assignment exists that
satisfies all clauses.
3. Satisfiability Construction: To prove satisfiability, construct a model by assigning truth values to
literals in S based on the clauses in RC(S).
Key Points:
• Definite Clauses can be written as implications (e.g., (¬L1,1 ∨ ¬Breeze ∨ B1,1) becomes
(L1,1∧Breeze) ⇒ B1,1).
o The body is the conjunction of positive literals, and the head is the single positive literal.
o A fact is a clause with a single positive literal (e.g., L1,1).
• Inference with Horn Clauses:
▪ Forward-Chaining: Derives new facts from known ones.
▪ Backward-Chaining: Attempts to prove a goal from known facts.
• Deciding Entailment for Horn clauses is linear time, making it highly efficient.
Horn clauses enable efficient forward and backward-chaining inference and fast entailment checking.
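Forward chaining over definite clauses can be sketched as follows (the clause representation and function name are choices of this example, not a fixed API):

```python
from collections import deque

def pl_fc_entails(clauses, query):
    """Forward chaining over definite clauses: does the KB entail `query`?

    `clauses` is a list of (body, head) pairs: body is a list of premise
    symbols (empty for a fact) and head is a single symbol. Each symbol is
    processed at most once, so the run time is linear in the KB size.
    """
    count = {i: len(set(body)) for i, (body, _) in enumerate(clauses)}
    inferred = set()
    agenda = deque(head for body, head in clauses if not body)  # known facts
    while agenda:
        p = agenda.popleft()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i, (body, head) in enumerate(clauses):
            if p in body:
                count[i] -= 1                    # one premise of clause i met
                if count[i] == 0:
                    agenda.append(head)          # its head becomes a new fact
    return False
```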
Both techniques are efficient inference methods, with forward chaining being data-driven and
backward chaining goal-directed.
• Complex problems: For tasks such as speech recognition, hand-coded rules (e.g., rules for the
characteristic sounds of "two") don't scale well to diverse languages, noisy environments, or
large vocabularies. ML can learn from examples to handle this complexity.
• Fluctuating environments: ML systems can adapt to new data automatically, making them more
flexible than traditional approaches.
• Insights into data: ML algorithms can help identify hidden patterns or trends in large datasets,
offering valuable insights through data mining.
1. Supervision: Whether the system is trained with labeled data (supervised, unsupervised,
semi-supervised, or reinforcement learning).
2. Learning Method:
o Online Learning: Learns incrementally as new data arrives.
o Batch Learning: Learns from a fixed dataset, retraining when new data is available.
3. Learning Approach:
o Instance-based Learning: Compares new data to known data points.
o Model-based Learning: Builds a predictive model from training data.
These categories can overlap. For example, a spam filter using a deep neural network could be an
online, model-based, supervised learning system.
Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they
get during training. There are four major categories: supervised learning, unsupervised learning,
semi-supervised learning, and reinforcement learning.
Supervised learning
Supervised learning is a machine learning technique where the model is trained on labeled data,
meaning the input data comes with corresponding desired output (labels). The model learns the
relationship between inputs and labels to make predictions on new, unseen data.
Types of Tasks:
• Classification: Predicts a discrete label (e.g., spam vs. ham in email classification).
Fig. A labeled training set for supervised learning (e.g., spam classification)
• Regression: Predicts a continuous value (e.g., predicting car prices based on features like mileage
and age).
Fig. Regression
Common Algorithms:
• k-Nearest Neighbors (k-NN): Classifies based on nearest neighbors in feature space.
• Linear Regression: Models the relationship between input features and continuous output.
• Logistic Regression: A classification algorithm used to predict the probability of a class.
• Support Vector Machines (SVMs): Finds the hyperplane that best separates classes.
• Decision Trees/Random Forests: Decision Trees make predictions using learned rules, while
Random Forests combine multiple trees.
• Neural Networks: Learn complex patterns for both classification and regression tasks.
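As a small sketch of the two task types, the snippet below fits a logistic-regression classifier and a linear-regression model with Scikit-learn. All data and feature choices here are invented for illustration.

```python
# A minimal sketch of the two supervised task types using scikit-learn
# (toy data invented for illustration).
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: predict a discrete label (0 = ham, 1 = spam) from two
# toy features, e.g. (number of suspicious words, number of links).
X_clf = [[0, 0], [1, 0], [7, 3], [9, 5]]
y_clf = [0, 0, 1, 1]
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[8, 4]]))   # a heavily "spammy" email

# Regression: predict a continuous value (price) from mileage.
X_reg = [[10_000], [50_000], [90_000]]
y_reg = [20_000, 14_000, 8_000]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[70_000]]))
```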
Attributes vs Features:
• Attribute: A data characteristic (e.g., "Mileage").
• Feature: An attribute with a specific value (e.g., "Mileage = 15,000 miles").
Unsupervised Learning
In unsupervised learning, the training data is unlabeled, and the algorithm tries to learn patterns and
structure from the data without explicit guidance or labeled outputs. The goal is to uncover hidden
relationships, groupings, or patterns within the data.
• Clustering: Groups similar data points together without predefined labels (e.g., grouping blog
visitors with similar behavior).
Fig. Clustering
• Dimensionality Reduction: Reduces the number of features in the data (e.g., merging car mileage
and age into a single wear-and-tear feature) to speed up training and improve performance.
• Anomaly Detection: Identifies unusual instances, such as fraudulent credit card transactions or
manufacturing defects, by recognizing deviations from typical patterns.
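The clustering task can be sketched in a few lines with k-means. The 2-D points below are synthetic, chosen so that two clear groups exist for the algorithm to discover without labels.

```python
# A small sketch of unsupervised learning with scikit-learn: k-means finds
# two groups in unlabeled 2-D points (toy data invented for illustration).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],    # one cluster near (1, 1)
              [8.0, 8.2], [7.9, 8.1], [8.1, 7.8]])   # another near (8, 8)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the two discovered group centres
```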
Semi-supervised Learning
Semi-supervised learning combines both labeled and unlabeled data for training. Typically, there is a
small amount of labeled data and a large amount of unlabeled data. The algorithm leverages the
unlabeled data to discover patterns and the labeled data to refine its predictions.
Key Concepts:
• Combination of Supervised and Unsupervised Learning: Semi-supervised learning algorithms
often integrate unsupervised methods (like clustering) with supervised learning techniques to
improve performance.
Example:
• Photo Recognition: In services like Google Photos, the system may first use clustering to identify
groups of photos containing the same person. Once you label a few photos (e.g., identifying
Person A and Person B), the system can apply the learned pattern to automatically identify and
name people in all other photos, making photo searches easier.
Semi-supervised Algorithms:
• Deep Belief Networks (DBNs): These use unsupervised components (like restricted Boltzmann
machines, or RBMs) for pretraining, followed by supervised fine-tuning to enhance performance.
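The core idea, a few labels spreading through a mass of unlabeled data, can be sketched with Scikit-learn's LabelSpreading (a graph-based method, used here for brevity rather than a DBN). Only two of the six points below are labeled; -1 marks "unlabeled".

```python
# Toy sketch of semi-supervised learning with scikit-learn's LabelSpreading:
# two labeled points propagate their labels to nearby unlabeled ones.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
y = np.array([0,    -1,    -1,     1,    -1,    -1])   # -1 = unlabeled

model = LabelSpreading(kernel='knn', n_neighbors=2).fit(X, y)
print(model.transduction_)   # inferred labels for every point
```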
Reinforcement Learning
Reinforcement Learning (RL) involves an agent that interacts with an environment by taking actions
and receiving feedback in the form of rewards or penalties. The goal is for the agent to learn the best
strategy, or policy, that maximizes its long-term reward.
Key Concepts:
• Agent: The learning system that observes the environment, takes actions, and receives rewards or
penalties.
• Environment: The external context in which the agent operates and interacts.
• Policy: A strategy that defines the best action for the agent to take in any given situation, aimed at
maximizing rewards over time.
Example:
• Robot Learning to Walk: A robot might use RL to learn how to walk by trial and error. It takes
actions (e.g., moving its legs), receives rewards (if it walks successfully) or penalties (if it falls),
and refines its walking policy to improve over time.
• AlphaGo: DeepMind’s AlphaGo used RL to learn the game of Go. It analyzed millions of games and
played against itself to develop a policy that helped it beat the world champion, Ke Jie, in 2017.
During its match against the champion, AlphaGo was simply applying the learned policy, without
further learning.
Reinforcement Learning is distinct in that the agent learns from its own experience and feedback,
unlike other machine learning methods where the model is trained with labeled data.
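The agent/environment/policy loop above can be sketched with tabular Q-learning on a toy problem: an agent walks a 4-cell corridor and is rewarded only for reaching the rightmost cell. The environment and all hyperparameters (learning rate, discount, exploration rate) are illustrative choices, not prescriptions.

```python
# Minimal tabular Q-learning sketch: learn by trial and error which action
# (move left / move right) to take in each cell of a 4-cell corridor.
import random

N_STATES, ACTIONS = 4, (-1, +1)          # actions: move left, move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]

random.seed(0)
for _ in range(500):                     # episodes of trial and error
    s = 0
    while s != N_STATES - 1:
        if random.random() < epsilon:    # explore occasionally
            a = random.randrange(2)
        else:                            # otherwise act greedily
            a = max(range(2), key=lambda i: Q[s][i])
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0     # reward only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy)   # action 1 ("move right") should dominate in non-goal cells
```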
Online Learning
In online learning, the system is trained incrementally by feeding it data instances sequentially,
either individually or in small groups called mini-batches. Each learning step is fast and cheap, so
the system can learn from new data on the fly as it arrives.
• Challenges:
o Sensitivity to Bad Data: Poor-quality data can degrade performance over time (e.g., faulty
sensors or spam).
o Learning Rate: The learning rate controls adaptation speed. High rates lead to rapid
changes but may forget past data, while low rates ensure stability but slow adaptation.
Out-of-Core Learning
• A method similar to online learning, where data is processed in chunks that fit in memory, useful
for large datasets. This typically occurs offline but is conceptually similar to online learning as data
is processed incrementally.
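The online/out-of-core pattern can be sketched with Scikit-learn's SGDRegressor, whose partial_fit() method applies one incremental update per chunk, so the full dataset never has to fit in memory. The data here is synthetic (generated from y = 3x + 1), and the chunk size and learning rate are illustrative.

```python
# Sketch of out-of-core / online learning: update an SGDRegressor one
# mini-batch ("chunk") at a time with partial_fit().
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
model = SGDRegressor(learning_rate='constant', eta0=0.01, random_state=42)

for _ in range(200):                        # 200 chunks "streamed" in
    X_chunk = rng.uniform(0, 10, size=(32, 1))
    y_chunk = 3.0 * X_chunk.ravel() + 1.0   # true relation: y = 3x + 1
    model.partial_fit(X_chunk, y_chunk)     # one incremental update per chunk

print(model.coef_, model.intercept_)        # should approach 3 and 1
```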
Instance-Based Learning
Instance-based learning is a machine learning approach where the system "learns by heart" by storing
training examples and generalizing new instances through comparison with stored examples using a
similarity measure.
Working Principle:
• Learning Process: The system stores training data and classifies new instances by comparing them
to stored examples.
• Similarity Measure: For example, in a spam filter, similarity can be based on the number of
common words between emails. If a new email shares many words with spam emails, it is flagged
as spam.
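The word-count similarity measure described above can be sketched in a few lines. The stored emails and the nearest-match decision rule are invented purely for illustration; a real system would use a more careful measure.

```python
# Bare-bones instance-based spam flagging: count words an incoming email
# shares with stored spam vs. ham examples and pick the closer group.
def shared_words(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

stored_spam = ["win a free prize now", "free money click now"]
stored_ham  = ["meeting agenda for monday", "lunch plans this week"]

def classify(email):
    spam_score = max(shared_words(email, s) for s in stored_spam)
    ham_score  = max(shared_words(email, h) for h in stored_ham)
    return "spam" if spam_score > ham_score else "ham"

print(classify("claim your free prize now"))      # shares words with spam
print(classify("agenda for the monday meeting"))  # shares words with ham
```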
Example:
• A spam filter flags new emails based on their similarity to known spam emails.
Advantages:
• Simple and Intuitive: Easy to implement without needing a complex model.
• Flexible: Can handle complex, non-linear patterns in data.
Disadvantages:
• Memory Intensive: Requires storing all training examples, which can consume a lot of memory.
• Slow Predictions: Making predictions involves comparing new instances to all stored examples,
which can be computationally expensive.
Model-based learning
Model-based learning involves creating a model from a set of examples to make predictions.
Process Overview:
1. Data Collection: Data on life satisfaction (OECD) and GDP per capita (IMF) is merged for various
countries.
Fig. Plot of Life satisfaction vs GDP per capita across various countries
2. Model Selection: A linear model is chosen to relate life satisfaction to GDP per capita:
Life Satisfaction = θ0 + θ1 × GDP_per_capita
Fig. The linear model that fits the training data best
4. Prediction: After training, the model can predict life satisfaction for countries not in the original
dataset, such as Cyprus. Cyprus's GDP per capita is $22,587; applying the model gives
life satisfaction ≈ 4.85 + 22,587 × 4.91 × 10⁻⁵ ≈ 5.96.
6. Improvement: To enhance predictions, more attributes (e.g., employment rate, health) or a more
complex model (like Polynomial Regression) might be required.
Code Example:
The Python code demonstrates loading data, visualizing it, training a Linear Regression model using
Scikit-learn, and making predictions for Cyprus.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data (prepare_country_stats is a helper, defined elsewhere,
# that merges the two datasets into one DataFrame per country)
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(model.predict(X_new))  # outputs [[ 5.96242338]]
Poor-Quality Data
Poor-quality data, including errors, outliers, and noise, hampers machine learning model performance.
Data cleaning is crucial, with tasks like discarding or fixing outliers and handling missing features (e.g.,
filling in missing values or ignoring instances). Data scientists often spend significant time on this
process to improve model accuracy.
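The two cleaning tasks mentioned, discarding outliers and filling in missing values, can be sketched with pandas. The column names and numbers below are invented for illustration.

```python
# Small illustration of data cleaning with pandas: drop an obvious
# outlier row, then fill a missing value with the column median.
import pandas as pd

df = pd.DataFrame({
    "mileage": [15_000, 42_000, 9_999_999, 28_000],   # third row: sensor error
    "price":   [21_000, 15_500, 14_000, None],        # fourth row: missing value
})

df = df[df["mileage"] < 500_000].copy()                 # discard the outlier row
df["price"] = df["price"].fillna(df["price"].median())  # fill the missing value

print(df)
```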
Irrelevant Features
Effective machine learning requires relevant features, as irrelevant ones can hinder performance.
Feature engineering involves:
• Feature selection: Choosing the most useful features.
• Feature extraction: Combining features for greater utility.
• Creating new features: Gathering additional data.
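Feature selection can be sketched with Scikit-learn's SelectKBest, which scores each feature against the target and keeps only the most informative ones. The data below is synthetic: two useful features plus one pure-noise feature that should be dropped.

```python
# Sketch of automatic feature selection: keep the k features whose
# univariate regression score against the target is highest.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X_useful = rng.uniform(0, 1, size=(100, 2))
X_noise = rng.uniform(0, 1, size=(100, 1))
X = np.hstack([X_useful, X_noise])
y = 2.0 * X[:, 0] + 3.0 * X[:, 1]            # target ignores the noise feature

selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
print(selector.get_support())                 # mask of the kept features
```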
Stepping Back
Machine learning improves performance by learning from data, not explicit rules. ML systems vary
(supervised, unsupervised, batch, online, etc.), and in a project, you gather data, train an algorithm,
and tune the model to fit the data. The training set must be sufficient, representative, and clean, with
relevant features. Models should avoid underfitting (too simple) and overfitting (too complex). Finally,
evaluate and fine-tune the model to ensure it generalizes well to new cases.
Astronomical Classification:
Task: Classifying celestial objects.
Performance: Classification accuracy.
Experience: Astronomical image databases.
Backgammon:
Task: Playing at a competitive level.
Performance: Win rate.
Experience: Self-play games.
Representation Options
• Table Lookup: Stores values for each board state but is inefficient for large boards.
• Rules or Polynomials: Collections of rules can be too rigid, while high-degree polynomial functions of the board features can become overly complex.
• Neural Networks: Powerful but require large datasets.
Chosen Representation
The target function c(b) is represented as a linear combination of key board features:
• x1: Number of black pieces
• x2: Number of red pieces
• x3: Number of black kings
• x4: Number of red kings
• x5: Number of black pieces threatened by red
• x6: Number of red pieces threatened by black
Thus, the evaluation function is:
c(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
Where w0,w1,…,w6 are weights learned by the algorithm.
Summary of Design Choices
• Task (T): Playing checkers.
• Performance Measure (P): Percentage of games won.
• Training Experience (E): Games played against itself.
• Target Function (V): Maps board states to scores.
• Representation (c): A linear function of board features, with weights to be learned.
This reduces the problem to learning the weights w0 through w6 for the linear function.
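Computationally, the chosen representation is just a dot product between a weight vector and a board-feature vector, as the sketch below shows. The weight values are arbitrary illustrative numbers, not learned ones.

```python
# Evaluate c(b) = w0 + w1*x1 + ... + w6*x6 for a checkers board described
# by its six features.
def evaluate(board_features, weights):
    """Linear evaluation: weights[0] is w0, weights[1:] are w1..w6."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, board_features))

# x1..x6: black pieces, red pieces, black kings, red kings,
#         black pieces threatened by red, red pieces threatened by black
features = [12, 11, 1, 0, 2, 3]
weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # hypothetical values

print(evaluate(features, weights))
```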
Evaluation
While the linear function may not be sufficient to beat a world champion, similar approaches with
more sophisticated representations (e.g., neural networks) have proven successful. Tesauro’s
backgammon program, for instance, used a neural network to learn a complex evaluation function
and played at a high level after extensive training.
Alternative approaches like nearest neighbor or genetic algorithms could also be applied. This design
represents one approach to structuring machine learning tasks.
Hypothesis Representation
Each hypothesis is a conjunction of constraints on the attributes, represented by a 6-tuple. For each
attribute:
• "?": Any value is acceptable (wildcard).
• Specific value: A required value (e.g., "Warm").
• "0": No value is acceptable.
For example, (?, Cold, High, ?, ?, ?) means Aldo enjoys his sport on cold days with high humidity,
regardless of other attributes.
Hypothesis Space
• Most General Hypothesis: (?, ?, ?, ?, ?, ?) — every day is a positive example.
• Most Specific Hypothesis: (0, 0, 0, 0, 0, 0) — no day is a positive example.
Training Examples
The learner is given examples (days with attribute values and corresponding EnjoySport values). For
instance:
Sky AirTemp Humidity Wind Water Forecast EnjoySport
Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes
Learning Process
• Positive Example: The hypothesis generalizes to include new positive examples.
• Negative Example: The hypothesis specializes to exclude negative examples.
Goal
Learn a hypothesis that correctly predicts EnjoySport based on the attributes, starting from the most
specific hypothesis and generalizing or specializing it with each training example.
Notation
In concept learning, the task is to predict the target concept based on given attributes.
1. Instances (X): Set of all possible days, described by attributes: Sky, AirTemp, Humidity, Wind,
Water, and Forecast.
2. Target Concept (c): A boolean function c : X→ {0,1}, where c(x)=1 if EnjoySport = Yes, and c(x)=0 if
EnjoySport = No.
3. Hypotheses (H): Set of possible hypotheses, each representing a boolean-valued function over X,
defined by conjunctions of constraints on the attributes. Each constraint can be:
o "?": Any value is acceptable.
o "0": No value is acceptable.
o A specific value.
4. Training Examples (D): A set of pairs (x, c(x)), where x is an instance and c(x) is the target value
(Yes or No). Positive examples have c(x)=1, and negative examples have c(x)=0.
5. Goal: The learner's task is to find a hypothesis h from H such that h(x) = c(x) for all instances x ∈ X.
Algorithm Steps:
1. Initialize the hypothesis as the most specific possible.
2. Process each positive example:
o If the hypothesis doesn't cover the example, generalize it by replacing specific attribute
values with ? to match the example.
3. Ignore negative examples, as the hypothesis is already consistent with them.
4. Repeat until the hypothesis covers all positive examples.
Example:
For the EnjoySport task:
• Start with the most specific hypothesis h = <0, 0, 0, 0, 0, 0>.
• First positive example: Generalize to h = <Sunny, Warm, Normal, Strong, Warm, Same>.
• Second positive example: Generalize to h = <Sunny, Warm, ?, Strong, Warm, Same>.
• Third example (negative): No change.
• Fourth positive example: Further generalize to h = <Sunny, Warm, ?, Strong, ?, ?>.
Key Characteristics:
• Most specific hypothesis: The algorithm results in the most specific hypothesis that is consistent
with the positive examples.
• No backtracking: It does not reconsider prior generalizations.
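The algorithm steps above translate directly into code. The sketch below runs FIND-S on the four EnjoySport training examples from the table, using '0' for the "no value" constraint and '?' for the wildcard.

```python
# Direct implementation of FIND-S for conjunctive hypotheses over the
# six EnjoySport attributes.
def find_s(examples):
    h = ['0'] * 6                        # start with the most specific hypothesis
    for x, label in examples:
        if label != 'Yes':               # negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == '0':              # first positive example: copy its values
                h[i] = value
            elif h[i] != value:          # mismatch: generalize this attribute to '?'
                h[i] = '?'
    return h

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(data))   # -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```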
Limitations:
1. Convergence: FIND-S cannot confirm whether it has found the correct target concept, as multiple
hypotheses can be consistent with the positive examples.
2. Preference for specificity: It always favors the most specific hypothesis, though a more general
one might sometimes be better.
3. Inconsistent training data: The algorithm assumes error-free data and does not handle noise or
inconsistencies well.
4. Multiple consistent hypotheses: If there are multiple maximally specific hypotheses, FIND-S does
not explore alternatives and would need extensions to do so.
Key Concepts:
• Consistent Hypothesis: A hypothesis h is consistent with the training set D if it correctly classifies
all examples in D (i.e., h(x)=c(x) for each example (x, c(x))).
• Version Space: The version space, VS(H, D), is the subset of hypotheses from hypothesis space H
that are consistent with the training data D.
• Incremental Refinement: The version space is updated after each training example: hypotheses
are refined to only those consistent with the new example, either positive or negative.
Main Advantage:
The Candidate-Elimination algorithm provides a compact representation of all hypotheses consistent
with the training data, allowing for more flexibility than FIND-S, though both algorithms struggle with
noisy data.
Key Steps (List-Then-Eliminate Algorithm):
1. Initialize Version Space: Include all hypotheses in H.
2. Eliminate Inconsistent Hypotheses: Remove hypotheses that do not match the training examples.
3. Output Version Space: After processing all examples, output the remaining hypotheses.
The Candidate-Elimination algorithm improves this by using a more compact representation with
general and specific boundary sets:
• General Boundary G: Maximally general hypotheses consistent with the data.
• Specific Boundary S: Minimally general (maximally specific) hypotheses consistent with the data.
The version space is represented by these boundaries and hypotheses between them in the general-
to-specific ordering, avoiding the need to list all hypotheses explicitly.
Key Steps:
1. Initialize G and S:
o G starts with the most general hypothesis.
o S starts with the most specific hypothesis.
2. For each training example d:
o If d is a positive example:
▪ Remove inconsistent hypotheses from G.
▪ Generalize inconsistent hypotheses in S to be consistent with d.
▪ Remove any more general hypotheses in S.
o If d is a negative example:
▪ Remove inconsistent hypotheses from S.
▪ Specialize inconsistent hypotheses in G to be consistent with d.
▪ Remove any less general hypotheses in G.
3. Output: After processing all examples, the version space contains all hypotheses in G and S
consistent with the training data.
Duality:
• Positive examples: Make the specific boundary S more general and prune inconsistent hypotheses from G.
• Negative examples: Make the general boundary G more specific and prune inconsistent hypotheses from S.
The algorithm ensures that the version space includes only hypotheses consistent with the observed
data, using minimal generalizations and specializations. It can be applied to any learning task where
these operations are defined.
Instances that satisfy all S hypotheses are classified as positive, and those that satisfy none of the G
hypotheses are classified as negative. If further ambiguity exists, a majority vote can be used to classify
the instance, reflecting the most probable outcome based on the current version space.
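The classification rule just described can be sketched directly. The S and G boundary sets below are those that Candidate-Elimination produces for the four EnjoySport examples in Mitchell's textbook run.

```python
# Classify new instances using the version-space boundaries: positive if
# every S hypothesis matches, negative if no G hypothesis matches,
# otherwise ambiguous (the version space disagrees).
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
G = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]

def classify(x):
    if all(matches(h, x) for h in S):
        return 'positive'    # every hypothesis in the version space says yes
    if not any(matches(h, x) for h in G):
        return 'negative'    # every hypothesis in the version space says no
    return 'ambiguous'

print(classify(('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')))
print(classify(('Rainy', 'Cold', 'Normal', 'Light', 'Warm', 'Same')))
print(classify(('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')))
```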
Inductive Bias
1. A Biased Hypothesis Space
If the hypothesis space doesn’t include the target concept, the algorithm may fail to find it. For
example, with the EnjoySport task, restricting the hypothesis space to conjunctions of attributes
prevents representing disjunctive concepts like "Sky = Sunny or Cloudy." This can lead to no consistent
hypothesis being found, requiring a more expressive hypothesis space.
2. An Unbiased Learner
To ensure the target concept is in the hypothesis space, an unbiased learner would need a hypothesis
space representing all possible subsets of instances (the power set). However, while this allows any
concept to be expressed, it causes the learner to fail at generalizing beyond observed examples. For
instance, the S boundary would just be a disjunction of positive examples, and the G boundary would
exclude negative ones, meaning only training examples could be classified unambiguously. The learner
would need to see every instance to converge on the target concept, and even with partial learning, it
can't generalize to unobserved instances.