AI Notes Serialized
• John McCarthy, who coined the term Artificial Intelligence in 1956, defines it as
"the science and engineering of making intelligent machines", especially
intelligent computer programs.
• Artificial Intelligence (AI) is the intelligence of machines and the branch of
computer science that aims to create it.
• Intelligence is the computational part of the ability to achieve goals in the world.
Varying kinds and degrees of intelligence occur in people, many animals and
some machines.
• AI is the study of the mental faculties through the use of computational models.
• AI is the study of: How to make computers do things which, at the moment,
people do better.
• AI is the study and design of intelligent agents, where an intelligent agent is a
system that perceives its environment and takes actions that maximize its chances
of success.
Meaning of Intelligence:
Ability to:
✓ Acquire
✓ Understand
✓ Apply Knowledge
✓ Exercise thoughts
✓ Reason
Approaches to AI
Hard or Strong AI
• Generally, artificial intelligence research aims to create AI that can replicate
human intelligence completely.
• Strong AI refers to a machine that approaches or supersedes human intelligence,
◊ If it can do typically human tasks,
◊ If it can apply a wide range of background knowledge and
◊ If it has some degree of self-consciousness.
• Strong AI aims to build machines whose overall intellectual ability is
indistinguishable from that of a human being.
Soft or Weak AI
• Weak AI refers to the use of software to study or accomplish specific problem
solving or reasoning tasks that do not encompass the full range of human
cognitive abilities.
▪ Example: a chess program such as Deep Blue.
▪ Weak AI does not achieve self-awareness and does not demonstrate the wide
range of human-level cognitive abilities; it is merely an intelligent, specific problem-solver.
Goals of AI
• Systems that think like humans
• Systems that think rationally
• Systems that act like humans
• Systems that act rationally
General AI Goal
• Replicate human intelligence: still a distant goal.
• Solve knowledge intensive tasks.
• Make an intelligent connection between perception and action.
• Enhance human-human, human-computer and computer-computer
interaction / communication.
Engineering based AI Goal
• Develop concepts, theory and practice of building intelligent machines
• Emphasis is on system building.
Science based AI Goal
• Develop concepts, mechanisms and vocabulary to understand biological
intelligent behaviour.
• Emphasis is on understanding intelligent behaviour.
History of AI
1945 Isaac Asimov, a Columbia University alumnus, coined the term Robotics.
1985 Harold Cohen created and demonstrated the drawing program, Aaron.
1997 The Deep Blue chess program beats the then world chess champion, Garry Kasparov.
Goals of AI
• To Create Expert Systems − systems which exhibit intelligent behavior:
they learn, demonstrate, explain, and advise their users.
• Gaming − AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe,
etc., where the machine can think of a large number of possible positions based
on heuristic knowledge.
• Vision Systems − systems that understand, interpret, and comprehend visual input.
o Police use computer software that can recognize the face of a criminal
by matching it against the stored portrait made by a forensic artist.
• Intelligent Robots − Robots are able to perform the tasks given by a human.
They have sensors to detect physical data from the real world such as light,
heat, temperature, movement, sound, bump, and pressure. They have efficient
processors, multiple sensors and huge memory, to exhibit intelligence. In
addition, they are capable of learning from their mistakes and they can adapt
to the new environment.
Difference between Human and Machine Intelligence
• Humans perceive by patterns, whereas machines perceive by sets of rules
and data.
• Humans can figure out the complete object even if some part of it is missing or
distorted, whereas machines cannot do this correctly.
Problem-Solving Steps
To build a system to solve a particular problem, we need to:
• Define the problem precisely – specify the input situations as well as the final
situations that constitute an acceptable solution to the problem.
• Analyse the problem – find a few important features that may have an impact on
the appropriateness of various possible techniques for solving the problem.
• Isolate and represent the task knowledge necessary to solve the problem.
• Choose the best problem-solving technique(s) and apply it to the particular
problem.
Problem Description
A problem consists of the description of:
− current state of the world,
− actions that can transform one state of the world into another,
− desired state of the world.
◊ State space is defined explicitly or implicitly
A state space should describe everything that is needed to solve a
problem and nothing that is not needed to solve the problem.
◊ Initial state is start state
◊ Goal state is the conditions it has to fulfill
− A description of a desired state of the world;
− The description may be complete or partial.
◊ Operators are to change state
− Operators do actions that can transform one state into another.
− Operators consist of : Preconditions and Instructions;
Preconditions provide partial description of the state of the world
that must be true in order to perform the action,
Instructions tell on how to create next state.
− Operators should be as general as possible, to reduce their number.
◊ Elements of the domain have relevance to the problem
− Knowledge of the starting point.
◊ Problem solving is finding solution
− Finding an ordered sequence of operators that transform the current
(start) state into a goal state;
◊ Restrictions on solution quality: any solution, an optimal solution, or all solutions
− Finding the shortest sequence, or
− Finding the least expensive sequence defining cost, or
− Finding any sequence as quickly as possible.
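To make the state-space vocabulary concrete, here is a minimal Python sketch of the classic two-jug puzzle (the puzzle and all helper names are illustrative assumptions, not from these notes): states, operators with preconditions, and a goal test.

```python
# Minimal state-space formulation of the 4/3-gallon water-jug puzzle.
# A state is (x, y) = gallons currently in the 4-gallon and 3-gallon jugs.

def successors(state):
    """Apply every operator whose precondition holds; yield the next states."""
    x, y = state
    if x < 4: yield (4, y)                      # fill the 4-gallon jug
    if y < 3: yield (x, 3)                      # fill the 3-gallon jug
    if x > 0: yield (0, y)                      # empty the 4-gallon jug
    if y > 0: yield (x, 0)                      # empty the 3-gallon jug
    if x > 0 and y < 3:                         # pour 4-gallon into 3-gallon
        t = min(x, 3 - y); yield (x - t, y + t)
    if y > 0 and x < 4:                         # pour 3-gallon into 4-gallon
        t = min(y, 4 - x); yield (x + t, y - t)

def is_goal(state):
    return state[0] == 2                        # exactly 2 gallons in the big jug

# Problem solving = finding an ordered sequence of operators from the start
# state (0, 0) to a goal state; any search strategy can now be applied.
```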
Search and Control Strategies
Measures for evaluating a search strategy:
1. Completeness: Is the strategy guaranteed to find a solution when one exists?
2. Optimality: Does the solution have low cost or the minimal cost?
3. What is the search cost associated with the time and memory required to find
a solution?
a. Time complexity: Time taken (number of nodes expanded) (worst or
average case) to find a solution.
b. Space complexity: Space used by the algorithm, measured in terms of
the maximum size of the fringe.
Production System
• A set of rules, where the left side determines the applicability of a rule and the
right side describes the operation to be performed if the rule is applied.
• One or more knowledge bases/databases that contain whatever information is
appropriate for the particular task.
• A control strategy that specifies the order in which the rules will be compared
to the database, and a way of resolving the conflicts that arise when several rules
match at once.
• A rule applier.
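A toy sketch of this architecture in Python (the rules and the first-match conflict-resolution strategy are illustrative assumptions):

```python
# Toy production system: rules are (precondition, action) pairs over a
# working-memory "database" (a set of symbols); the control strategy here
# is simply "first matching rule wins" (one way of resolving conflicts).

rules = [
    (lambda db: "goal" in db, lambda db: db),              # halt when goal derived
    (lambda db: "b" in db,    lambda db: db | {"goal"}),   # IF b THEN add goal
    (lambda db: "a" in db,    lambda db: db | {"b"}),      # IF a THEN add b
]

def run(db):
    while True:
        for condition, action in rules:     # compare rules to the database
            if condition(db):
                new_db = action(db)
                if new_db == db:            # the halt rule fired: nothing changed
                    return db
                db = new_db
                break                       # conflict resolution: first match
        else:
            return db                       # no rule applicable

print(run({"a"}))   # {'a', 'b', 'goal'}
```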
Problem Characteristics
1. Is the problem decomposable into a set of independent smaller sub-problems?
2. Can solution steps be ignored or at least undone if they prove
unwise?
Three kinds of problems:
• Ignorable (e.g. theorem proving), in which solution steps can be
ignored.
• Recoverable (e.g. 8-puzzle), in which solution steps can be undone.
• Irrecoverable (e.g. chess), in which solution steps cannot be undone.
3. Is the problem’s universe predictable?
Certain-outcome problems (e.g. 8-puzzle): the result of an action can be predicted
perfectly.
• Planning can be used to generate a sequence of operators that is
guaranteed to lead to a solution.
Uncertain-outcome problems (e.g. bridge):
• Planning can at best generate a sequence of operators that has a good
probability of leading to a solution.
• The hardest problems to solve are irrecoverable, uncertain-outcome problems.
4. Is a good solution absolute or relative?
5. Is a large amount of knowledge absolutely required to solve the problem, or
is knowledge important only to constrain the search?
Chess game: little knowledge is required to play.
• Additional knowledge helps to constrain the search and speed up the execution
of the program.
• Newspaper story understanding: a lot of knowledge is required even to be
able to recognize a solution.
6. Can a computer that is simply given the problem return the solution, or will
the solution of the problem require interaction between the computer and a
person?
Two types of problems:
• Solitary, in which the computer is given the problem and returns the solution
with no intermediate interaction with a person (e.g. theorem proving).
• Conversational, in which there is intermediate communication between a
person and the computer (e.g. medical diagnosis).
Search Algorithms
Generate-and-Test: systematically generate candidate solutions and test each one
against the goal condition.
This procedure can lead to an eventual solution within a short period of time if
done systematically.
It is inefficient for problems with a large search space.
For problems much harder than this, even heuristic generate-and-test is not a very
effective technique.
Hill Climbing
• Searching for a goal state = Climbing to the top of a hill
• In other words, the heuristic tells us approximately how far the state is from
the goal state
Ways Out
• Backtrack to some earlier node and try going in a different direction.
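A minimal hill-climbing sketch (assuming a `neighbours` generator and a `value` heuristic supplied by the caller; both names are illustrative):

```python
# Simple hill climbing: always move to the best neighbour; stop when no
# neighbour looks better (which may be a local maximum, plateau or ridge).

def hill_climb(start, neighbours, value):
    current = start
    while True:
        best = max(neighbours(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current        # stuck: either the goal or a local maximum
        current = best
```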
f(N) = g(N) + h(N)
where g(N) is a measure of the cost of getting from the start node to the current node N,
and h(N) is a heuristic estimate of the cost of getting from N to a goal node.
Evaluation function
• The choice of evaluation function critically determines search results.
• Consider the evaluation function
f(X) = g(X) + h(X)
– h(X) = the number of tiles not in their goal position in a given state X
– g(X) = depth of node X in the search tree
• For Initial node
– f(initial_node) = 4
• Apply A* algorithm to solve it.
Start State and A* Search Tree (the blank tile is shown as _ ; labels in
parentheses are f = g + h):

Start, f = (0+4):
  3 7 6
  5 1 2
  4 _ 8

Expand the start node; moves of the blank: up (1+3), left (1+5), right (1+5)
  up:    3 7 6 / 5 _ 2 / 4 1 8
  left:  3 7 6 / 5 1 2 / _ 4 8
  right: 3 7 6 / 5 1 2 / 4 8 _

Expand the lowest-f node (up, f = 4): up (2+3), left (2+3), right (2+4)
  up:    3 _ 6 / 5 7 2 / 4 1 8
  left:  3 7 6 / _ 5 2 / 4 1 8
  right: 3 7 6 / 5 2 _ / 4 1 8

Expand 3 _ 6 / 5 7 2 / 4 1 8: left (3+2), right (3+4)
  left:  _ 3 6 / 5 7 2 / 4 1 8
  right: 3 6 _ / 5 7 2 / 4 1 8

Expand _ 3 6 / 5 7 2 / 4 1 8: down (4+1)
  down:  5 3 6 / _ 7 2 / 4 1 8
A final right move reaches the Goal State:
  5 3 6
  7 _ 2
  4 1 8
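The h used in this trace counts misplaced tiles. A minimal sketch (boards encoded as 9-tuples with 0 for the blank; the goal board is taken from the trace above):

```python
# h(X) = number of tiles out of place (the heuristic used in the trace above).

GOAL = (5, 3, 6,
        7, 0, 2,
        4, 1, 8)

def h(board):
    return sum(1 for tile, goal in zip(board, GOAL)
               if tile != 0 and tile != goal)   # don't count the blank

start = (3, 7, 6,
         5, 1, 2,
         4, 0, 8)
print(h(start))   # 4, matching f(initial_node) = 0 + 4
```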
Note: Breadth-First Search and Depth-First Search are covered in the class
lecture, along with all other problems done in lectures and tutorials.
References:
1. Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 2nd Edition, Pearson, 2012.
2. Saroj Kaushik, Artificial Intelligence, 1st Edition, Cengage Learning, 2011.
Intelligent Agents
Agent
An Agent is anything that can be viewed
as perceiving its environment through
sensors and acting upon that
environment through actuators.
• A human agent has eyes, ears, and
other organs for sensors and hands,
legs, mouth, and other body parts for
actuators.
• Rational Action: any action that maximizes the expected value of the performance
measure given the percept sequence.
• Agents can perform actions in order to modify future percepts so as to obtain useful
information (information gathering, exploration)
• An agent is autonomous if its behavior is determined by its own experience (with ability
to learn and adapt)
Task Environment
• To design a rational agent we must specify its task
environment.
• PEAS (Performance measure, Environment, Actuators, Sensors) description of
the task environment – Example: Automated Taxi Driving.
Properties of Task Environments
❑ Fully observable vs. Partially observable:
• An environment is partially observable when some parts of the state are missing
from the sensor data.
Example: A vacuum agent with only a local dirt sensor cannot tell whether the
other squares are dirty.
❑ Deterministic vs. Stochastic:
• If the next state of the environment is completely determined by the current
state and the action executed by the agent, the environment is deterministic;
otherwise it is stochastic.
Example: Taxi driving is stochastic, because one can never exactly predict the
behavior of the traffic.
❑ Episodic vs. Sequential:
• In Episodic environment, the agent's experience is divided into
atomic "episodes" (each episode consists of the agent perceiving
and then performing a single action), and the choice of action in
each episode depends only on the episode itself.
Example: agent spotting defective parts on the assembly line.
• In Sequential environments, current decision could affect future
decisions.
Example: playing chess
❑ Static vs. Dynamic:
• If the environment can change while the agent is choosing an
action, the environment is dynamic, otherwise static.
• If the environment itself doesn't change, but the agent's
performance score does, then it is semidynamic.
Example: Crossword – static; Autonomous driving – dynamic;
playing chess – semidynamic.
❑ Discrete vs. Continuous:
• Discrete environments involve a limited number of distinct,
clearly defined percepts and actions; otherwise the environment is continuous.
Example: Chess playing –Discrete
Autonomous Driving- Continuous
1. Tree Structures
A tree is a hierarchical data structure composed of nodes, where each node has a
parent-child relationship with others. Trees are widely used in databases, artificial
intelligence, computer networks, and more.
Types of Trees:
1. Binary Tree: Each node has at most two children (left and right).
2. Binary Search Tree (BST): A binary tree where the left child contains smaller
values and the right child contains larger values.
3. Balanced Trees (AVL, Red-Black Tree): Trees that maintain balance to
optimize search operations.
4. Heap: A complete binary tree used in priority queues (Min-Heap and Max-
Heap).
5. B-Trees and B+ Trees: Used in databases for efficient indexing and searching.
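A minimal binary search tree sketch illustrating the ordering invariant (smaller keys left, larger keys right):

```python
# Minimal binary search tree: smaller keys go left, larger keys go right.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                       # duplicate keys are ignored

def search(root, key):
    while root and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

root = None
for k in (8, 3, 10, 1, 6):
    root = insert(root, k)
print(search(root, 6), search(root, 7))   # True False
```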
2. Graph Structures
A graph is a data structure that consists of nodes (vertices) and edges (connections)
between them. Graphs are used in networking, social media, route planning, AI, and
many other applications.
Graph Algorithms:
• Graph Traversal:
o Depth-First Search (DFS): Explores as deep as possible before
backtracking.
o Breadth-First Search (BFS): Explores all neighbors before going deeper.
• Shortest Path Algorithms:
o Dijkstra’s Algorithm: Finds the shortest path in weighted graphs.
o Bellman-Ford Algorithm: Works for graphs with negative weights.
o Floyd-Warshall Algorithm: Finds shortest paths between all pairs of
nodes.
• Minimum Spanning Tree (MST):
o Kruskal’s Algorithm and Prim’s Algorithm for finding MST.
• Topological Sorting: Used for scheduling tasks with dependencies in DAGs.
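A minimal sketch of both traversals over an adjacency-list graph (the sample graph is illustrative):

```python
from collections import deque

# Graph as an adjacency list: node -> list of neighbours.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def bfs(start):
    """Visit all neighbours before going deeper (level by level)."""
    order, seen, frontier = [], {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order

def dfs(start, seen=None):
    """Explore as deep as possible before backtracking."""
    seen = seen if seen is not None else set()
    seen.add(start)
    order = [start]
    for nxt in graph[start]:
        if nxt not in seen:
            order += dfs(nxt, seen)
    return order

print(bfs("A"))   # ['A', 'B', 'C', 'D']
print(dfs("A"))   # ['A', 'B', 'D', 'C']
```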
Comparison between Tree and Graph structures
Search Algorithms in AI
Search algorithms are fundamental techniques in Artificial Intelligence (AI) used for
problem-solving and decision-making. They help AI agents find paths, explore states, and
reach optimal solutions in domains like robotics, game playing, and navigation.
2. Random Search
Definition
Random search is a basic uninformed search technique where the AI randomly selects
possible moves without any specific strategy until the goal is found.
How It Works
Advantages
• Simple to implement
• Works when no information is available
• Can be useful for stochastic environments
Disadvantages
• Inefficient—may take a long time to reach the goal.
• No guarantee of finding the optimal path.
• Repeats states, since it lacks memory.
Definitions
• Open List: A set of nodes that have been discovered but not yet fully explored.
• Closed List: A set of nodes that have already been explored and will not be revisited.
A* is an informed search algorithm that finds the shortest path using both g(n), the
cost so far from the start node to n, and h(n), a heuristic estimate of the remaining
cost to the goal, combined as f(n) = g(n) + h(n).
Steps:
1. Start with the Open List (contains only the start node).
2. Pick the node with the lowest f(n) value.
3. Move the node to the Closed List.
4. Expand its neighbors:
o If a neighbor is not in the Open List, add it.
o If a neighbor is already in the Open List but the new path to it is better, update it.
5. Repeat until the goal is found.
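A compact sketch of these steps (the heap-based open list, the `neighbours` callback yielding (node, cost) pairs, and the heuristic `h` are assumptions supplied by the caller):

```python
import heapq

def a_star(start, goal, neighbours, h):
    """A*: always expand the open-list node with the lowest f(n) = g(n) + h(n)."""
    open_list = [(h(start), 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {}                                   # closed list: node -> best g seen
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path          # first goal pop is optimal for an admissible h
        if node in best_g and best_g[node] <= g:
            continue             # already expanded via a path at least as good
        best_g[node] = g
        for nxt, cost in neighbours(node):
            g2 = g + cost
            heapq.heappush(open_list, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None                  # no path exists
```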
BFS is an uninformed search algorithm that explores all nodes level by level before
moving to the next depth.
Advantages of BFS
• Guaranteed to find the shortest path in an unweighted graph.
• Explores all possible solutions before moving deeper.
• Works well for small graphs.
Disadvantages of BFS
• Requires a large amount of memory, since every generated node must be stored.
• Can be slow if the solution lies deep in the graph.
DFS is an uninformed search algorithm that explores a path as deep as possible before
backtracking.
Advantages of DFS
• Requires much less memory than BFS (only the current path is stored).
• May find a solution quickly if the goal lies along an early path.
Disadvantages of DFS
• May get stuck in infinite loops (without a depth limit or cycle detection).
• Does not guarantee the shortest path.
• Can be inefficient if many paths need to be explored before reaching the goal.
Example: Chess
Types of Games in AI
Games in AI can be classified based on different criteria:
1. Minimax Algorithm
• Concept:
o One player is MAX (tries to maximize score).
o The other is MIN (tries to minimize MAX's score).
o The algorithm recursively evaluates future game states to choose the best
move.
• Steps:
1. Generate a game tree up to a certain depth.
2. Evaluate the game states using a heuristic function.
3. Use the minimax principle to backtrack and choose the optimal move.
• Example (Tic-Tac-Toe):
o MAX tries to get the highest possible score (win = +1).
o MIN tries to force MAX into the worst outcome (lose = -1).
Limitation: the game tree grows exponentially with depth, so plain minimax
becomes computationally expensive for games with deep trees.
2. Alpha-Beta Pruning
Alpha-beta pruning is an optimization of the minimax algorithm that reduces the number of
nodes evaluated.
• Key Idea:
o If a branch cannot improve the current best choice, it is ignored (pruned).
o Uses two parameters:
▪ Alpha (α): Best value found for MAX.
▪ Beta (β): Best value found for MIN.
• Benefits:
o Speeds up minimax search.
o More efficient in deep game trees.
• Example:
o In Chess, if one move is found to be worse than an already examined move,
further evaluation is skipped.
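A minimal sketch combining minimax with alpha-beta pruning (the `moves` successor generator and `evaluate` heuristic are assumed callbacks, not from the notes):

```python
import math

def alphabeta(state, depth, alpha, beta, is_max, moves, evaluate):
    """Minimax with alpha-beta pruning: branches that cannot improve the
    current best choice for MAX (alpha) or MIN (beta) are skipped."""
    successors = moves(state)
    if depth == 0 or not successors:
        return evaluate(state)            # heuristic value of the position
    if is_max:
        best = -math.inf
        for child in successors:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, moves, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:             # MIN will never allow this line
                break                     # prune the remaining children
        return best
    else:
        best = math.inf
        for child in successors:
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, moves, evaluate))
            beta = min(beta, best)
            if alpha >= beta:             # MAX already has something better
                break
        return best
```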
Knowledge Representation
◊ Logic ◊ Rules
◊ Frames ◊ Semantic Net
1. Introduction
• Knowledge Progression:
Data → Information → Knowledge → Wisdom
(data becomes information through organizing and analyzing; information becomes
knowledge through interpretation and evaluation; knowledge becomes wisdom
through understanding and principles)
• Knowledge Model (Bellinger 1980)
The model represents the transitions from data, to information, to knowledge, and finally to wisdom.
Fig. Knowledge Model: data, information, knowledge and wisdom are arranged
along increasing degree of connectedness and degree of understanding;
understanding relations turns data into information, understanding patterns turns
information into knowledge, and understanding principles turns knowledge into wisdom.
The distinctions between data, information, knowledge, and wisdom are not
very discrete. They are more like shades of gray, rather than black and
white (Shedroff, 2001).
"data" and "information" deal with the past; they are based on the
gathering of facts and adding context.
"knowledge" deals with the present that enable us to perform.
"wisdom" deals with the future, acquire vision for what will be, rather
than for what is or was.
• Knowledge Category
• Knowledge Type
1.1 Framework of Knowledge Representation (Poole 1998)
Fig. Framework: an informal problem is first given a formal representation;
computation over the representation produces an output, which is interpreted as
the solution to the problem (solve = represent + compute + interpret).
• Knowledge and Representation
In simple words, we:
− need to know about the things we want to represent, and
− need some means by which we can manipulate the things we know.
• Mapping between Facts and Representation
Fig. Mapping between facts and representations: facts stated in English are mapped
by English understanding into an internal representation, on which the reasoning
programs operate; English generation maps internal results back into English.
■ Forward and Backward Representation
Fig. Forward and backward representation mapping: initial facts are mapped by the
forward representation mapping into an internal representation, which is operated
on by the program; the backward representation mapping maps the result back to
the desired real facts, via an English representation. The dotted line on top
indicates the abstract reasoning process that the program is intended to model.
• KR System Requirements
A good KR system should exhibit representational adequacy, inferential adequacy,
inferential efficiency and acquisitional efficiency.
Note: To date, no single system optimizes all of the above properties.
1.2 Knowledge Representation Schemes
◊ Relational Knowledge
− provides a framework to compare two objects based on equivalent
attributes.
− any instance in which two different objects are compared is a
relational type of knowledge.
◊ Inheritable Knowledge
− is obtained from associated objects; new objects may inherit all or a
subset of the attributes of existing objects.
◊ Inferential Knowledge
− is inferred from objects through relations among objects.
− e.g., a word alone is simple syntax, but with the help of other
words in a phrase the reader may infer more from the word; this
inference within linguistics is called semantics.
◊ Declarative Knowledge
− a statement in which knowledge is specified, but the use to which
that knowledge is to be put is not given.
◊ Procedural Knowledge
− a representation in which the control information, to use the
knowledge, is embedded in the knowledge itself.
− e.g. computer programs, directions, and recipes; these indicate
specific use or implementation;
• Relational Knowledge: associates elements of one domain with elements of
another domain; the result is a mapping between elements of different domains.
• Inheritable Knowledge :
Here, the knowledge elements inherit attributes from their parents, but in many
cases not all attributes of the parent elements can be prescribed to the child
elements.
Inheritance is a powerful form of inference, but it is not adequate on its own;
the basic KR needs to be augmented with an inference mechanism.
[Fig. Inheritable KR: a hierarchy in which instances are linked to their classes by
instance and isa arcs and inherit attributes from them.]
• Inferential Knowledge :
New information is inferred from objects through the relations among objects.
This new information does not require further data gathering from the source,
but does require analysis of the given information to generate new knowledge.
Example :
− given a set of relations and values, one may infer other values or
relations.
− a predicate logic (a mathematical deduction) is used to infer from a set
of attributes.
− inference through predicate logic uses a set of logical operations to relate
individual data.
− the symbols used for the logic operations are :
" → " (implication), " ¬ " (not), " V " (or), " Λ " (and),
" ∀ " (for all), " ∃ " (there exists).
Note : If more information is made available about these objects and their
relations, then more knowledge can be inferred.
• Declarative/Procedural Knowledge
Declarative knowledge:
Here, the knowledge is based on declarative facts about axioms and domains.
− axioms are assumed to be true unless a counter example is found to
invalidate them.
− domains represent the physical world and the perceived functionality.
Procedural knowledge:
Here, the knowledge is a mapping process between domains that specify
“what to do when” and the representation is of “how to make it” rather
than “what it is”. The procedural knowledge :
− may have inferential efficiency, but no inferential adequacy and
acquisitional efficiency.
− are represented as small programs that know how to do specific things,
how to proceed.
Example: A parser in a natural language system has the knowledge that a noun
phrase may contain articles, adjectives and nouns. It accordingly calls
routines that know how to process articles, adjectives and nouns.
1.3 Issues in Knowledge Representation
The issues that arise while using KR techniques are many. Some of these are
explained below.
◊ Important Attributes:
Are there any attributes of objects so basic that they occur in almost every
problem domain?
◊ Choosing Granularity :
At what level of detail should the knowledge be represented ?
◊ Set of objects:
How should sets of objects be represented?
Note: These issues are briefly explained with reference to the previous example,
Fig. Inheritable KR. For details, readers may refer to the book Artificial
Intelligence by Elaine Rich & Kevin Knight, pages 115–126.
Expert System
An expert system is a computer system that emulates, or acts in all respects with,
the decision-making capabilities of a human expert. (Professor Edward Feigenbaum)
The types of related problems involve the following:
1. Diagnosis (for example, of a system fault, disease or student error);
2. Design (of a computer system, hotel, etc.); and
3. Interpretation (of, for example, geological data).
The appropriate problem-solving technique tends to depend more on the type of
the problem than on the domain.
There are many examples of ESs. Some of these are as follows:
1. MYCIN: One of the earliest expert systems, based on backward chaining. It could
identify various bacteria that can cause severe infections and could also recommend
drugs based on the weight of the person.
2. DENDRAL: An AI-based ES used for chemical analysis. It used spectrographic data
of a substance to predict its molecular structure.
3. R1/XCON: It could select specific software to generate a computer system as
wished by the user.
4. PXDES: It could easily determine the type and the degree of lung cancer in a
patient based on the data.
5. CaDet: A clinical support system that could identify cancer in its early stages in
patients.
6. DXplain: Also a clinical support system, which could suggest a variety of diseases
based on the findings of the doctor.
Figure 14.1 Architecture of a typical expert system for a particular problem domain.
The procedures that implement the control cycle are separate from the production
rules themselves. It is important to maintain this separation of the knowledge base
and inference engine for several reasons: the separation makes it possible to
represent knowledge in a more natural fashion. For example, IF-THEN rules are
closer to the way in which humans describe their own problem-solving skills.
14.1.2 Components of an ES
It is important to stress to students that expert systems are assistants to decision
makers and not substitutes for them. ESs do not possess human capabilities. They
use a knowledge base of a particular domain and bring that knowledge to bear on
the facts of the particular situation at hand. The knowledge base of an ES also
contains heuristic knowledge, i.e., rules of thumb used by human experts who work
in the domain.
The components of ES are represented in Fig. 14.2. It includes the following:
1. Knowiedge base
2. Inference engine
3. User interface
Figure 14.2 Components of an expert system: the user supplies facts through the
user interface and receives expertise in return; the knowledge base stores facts and
rules; the inference engine derives new knowledge from existing knowledge.
Backward Chaining
With this strategy, an ES finds out the answer to the question, "Why did this
happen?" On the basis of what has already happened, the inference engine tries to
find which conditions could have held in the past to cause this result. This strategy
is followed for finding a cause or reason. For example, diagnosis of blood cancer
in humans.
[Figure: backward chaining, with facts (Fact 1 ... Fact 4) combined through AND
and OR links into intermediate decisions (Decision 1, Decision 2) and a final
decision (Decision 4).]
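A tiny backward-chaining sketch (the rules and facts are hypothetical, loosely echoing the blood-cancer example above):

```python
# Tiny backward chainer: to prove a goal, find a rule whose THEN part is the
# goal and recursively prove every condition in its IF part.

rules = [
    (["fever", "low_wbc"], "blood_disorder"),        # IF fever AND low_wbc THEN ...
    (["blood_disorder", "marrow_anomaly"], "blood_cancer"),
]
facts = {"fever", "low_wbc", "marrow_anomaly"}

def prove(goal):
    if goal in facts:
        return True
    return any(all(prove(c) for c in conds)
               for conds, conclusion in rules if conclusion == goal)

print(prove("blood_cancer"))   # True: reasons backward from the conclusion
```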
User Interface
The user interface provides interaction between the user of the ES and the ES
itself. Generally, it employs natural language processing so that it can be used by a
user who is well-versed in the task domain. The user of the ES need not necessarily
be an expert in AI.
It explains how the ES has arrived at a particular recommendation. The explanation
might appear in the following forms:
1. Natural language displayed on the screen.
2. Verbal narrations in the natural language.
3. Listing of rule numbers displayed on the screen.
The user interface makes it easy to trace the credibility of the deductions.
Requirements of an Efficient ES User Interface
1. It should help users to achieve their goals in the shortest possible way.
2. It should be designed to work for the user's existing or desired work practices.
3. Its technology should be adaptable to the requirements of the user, not the
other way round.
4. It should make efficient use of the user input.
Knowledge-based ESs, or simply ESs, use human knowledge to solve problems that
would normally require human intelligence. These ESs represent the expert
knowledge as data or rules within the computer. These rules and data can be called
upon when needed to solve problems. Books and manuals hold a tremendous
amount of knowledge, but a human has to read and interpret that knowledge
before it can be used. Conventional computer programs execute tasks using
conventional decision-making logic, containing little knowledge other than the
basic algorithm for solving that specific problem and the necessary boundary
conditions. This program knowledge is often embedded as a part of the
programming code, so that when the knowledge changes, the program has to be
changed and then rebuilt. Knowledge-based systems collect the small fragments of
human know-how into a knowledge base, which is used to reason through a
problem using the knowledge that is appropriate. A different problem, within the
domain of the knowledge base, can be solved using the same program without
reprogramming.
The ability of these systems to explain the reasoning process through back-traces,
and to handle levels of confidence and uncertainty, provides an additional feature
that conventional programming doesn't handle. Most ESs are developed with
specialised software tools called shells. These shells come equipped with an
inference mechanism (backward chaining, forward chaining, or both), and require
knowledge to be entered according to a specified format.
Introduction to Probabilistic Reasoning,
Basic Probability, Reasoning under
Uncertainty, Bayes’ Rule, Bayesian
Networks
Probabilistic reasoning
◼ Probabilistic reasoning is a way of knowledge representation where
we apply the concept of probability to indicate the uncertainty in
knowledge. In probabilistic reasoning, we combine probability theory
with logic to handle the uncertainty.
◼ We use probability in probabilistic reasoning because it provides a way
to handle the uncertainty that is the result of someone's laziness and
ignorance.
◼ In the real world, there are lots of scenarios, where the certainty of
something is not confirmed, such as "It will rain today," "behavior of
someone for some situations," "A match between two teams or two
players." These are probable sentences for which we can assume that
it will happen but not sure about it, so here we use probabilistic
reasoning.
Uncertainty
● Most intelligent systems have some degree of
uncertainty associated with them.
● Uncertainty may occur in KBS because of the
problems with the data.
− Data might be missing or unavailable.
− Data might be present but unreliable or ambiguous due to
measurement errors, multiple conflicting measurements etc.
− The representation of the data may be imprecise or
inconsistent.
− Data may just be expert's best guess.
− Data may be based on defaults and the defaults may have
exceptions.
Cont…
● Given numerous sources of error, most KBSs require the
incorporation of some form of uncertainty management.
● For any form of uncertainty scheme, we must be
concerned with three issues.
− How to represent uncertain data?
− How to combine two or more pieces of uncertain data?
− How to draw inference using uncertain data?
● Probability is the oldest uncertainty-handling theory, with a strong
mathematical basis.
● Other methods for handling uncertainty are Bayesian
belief network, Certainty factor theory etc.
Probability Theory
● Probability is a way of turning opinion or expectation into
numbers.
● It is a number between 0 and 1 that reflects the likelihood of an
event.
● The chance that a particular event will occur = the
number of ways the event can occur divided by the total
number of all possible events.
Example: The probability of throwing two successive
heads with a fair coin is 0.25
− Total of four possible outcomes are :
HH, HT, TH & TT
− Since there is only one way of getting HH,
probability = ¼ = 0.25
Event
Event: Every subset A (of sample space S) is called an event.
− the null set ∅ is an impossible event.
− S is a sure event.
● P(A) is notation for the probability of an event A.
● P(∅) = 0 and P(S) = 1
● The probabilities of all events S = {A1, A2, …, An}
must sum up to certainty i.e. P(A1) + … + P(An) = 1
● Since events are sets, all set operations can be performed on the events.
● If A and B are events, then
− A ∪ B, A ∩ B and A′ are also events.
− A − B is the event "A but not B".
− Events A and B are mutually exclusive if A ∩ B = ∅.
Axioms of Probability
● Let S be a sample space, and A and B events.
− P(A) ≥ 0
− P(S) = 1
− P(A′) = 1 − P(A)
− P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
− If events A and B are mutually exclusive, then
P(A ∪ B) = P(A) + P(B)
● In general, for mutually exclusive events A1, …, An in S:
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
Joint Probability
● The joint probability of the occurrence of two independent events A and B is
written as P(A and B) and is defined by
P(A and B) = P(A ∩ B) = P(A) * P(B)
Example: We toss two fair coins separately.
Let P(A) = 0.5 , Probability of getting Head of first coin
P(B) = 0.5, Probability of getting Head of second coin
● Probability (Joint probability) of getting Heads on both the coins is
= P(A and B)
= P(A) * P(B) = 0.5 X 0.5 = 0.25
● The probability of getting Heads on one or on both of the coins, i.e.
the union of the probabilities P(A) and P(B), is expressed as
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A) * P(B)
= 0.5 + 0.5 − 0.25
= 0.75
Conditional Probability
● It relates the probability of one event to the occurrence of another
i.e. probability of the occurrence of an event H given that an
event E is known to have occurred.
● The probability of an event H (hypothesis), given the occurrence of an
event E (evidence), is denoted by P(H | E) and is defined as follows:
P(H | E) = P(H and E) / P(E)
Example
● What is the probability that a person chosen at random is male,
given that the person is 80 years old?
● The following probabilities are given
− the probability of any person chosen at random being male is about 0.50;
− the probability that a given person chosen at random is 80 years
old is 0.005;
− the probability that a given person chosen at random is both
male and 80 years old is 0.002.
● The probability that an 80 years old person chosen at
random is male is calculated as follows:
P(X is male | Age of X is 80)
= [P(X is male and the age of X is 80)] / [P(Age of X is 80)]
= 0.002 / 0.005 = 0.4
Conditional Probability with Multiple Evidences
Joint probabilities:
        A      A′
B     0.20   0.12
B′    0.65   0.03
● A joint probability distribution for n variables requires 2^n entries,
covering all possible combinations.
● The time and storage requirements for such
computations become impractical as n grows.
Contd..
● Inferring with such large numbers of probabilities does not seem to
model the human process of reasoning.
● Human tends to single out few propositions which are
known to be causally linked when reasoning with
uncertain beliefs.
● This leads to the concept of forming belief network
called a Bayesian belief network.
● It is a probabilistic graphical model that encodes
probabilistic relationships among set of variables
with their probabilistic dependencies.
● This belief network is an efficient structure for storing
joint probability distribution.
Definition of BBN
● It is an acyclic directed graph (a graph with no cycles) where the
nodes represent evidence or hypotheses, and an arc connecting two
nodes represents dependence between them.
● If there is an arc from node X to another node Y (i.e.,
X →Y), then X is called a parent of Y, and Y is a
child of X.
● The set of parent nodes of a node Xi is represented
by parent_nodes(Xi).
Joint Probability of n variables
● Joint probability for ‘n’ variables (dependent or
independent) is computed as follows.
● For the sake of simplicity we write P(X1 , … , Xn)
instead of P(X1 and … and Xn).
P(X1, …, Xn) = Π (i = 1 to n) P(Xi | parent_nodes(Xi))
● This expression is a reduction of the joint probability formula for n
variables, as some of the terms corresponding to independent
variables are not required.
Cont…
● If node Xi has no parents, its probability distribution
is said to be unconditional and it is written as P(Xi)
instead of P(Xi | parent_nodes(Xi)).
● Nodes having parents are called conditional.
● If the value of a node is observed, then the node is
said to be an evidence node.
● Nodes with no children are termed hypothesis nodes, and nodes
with no parents are called independent nodes.
Example
Structure: A → C, A → D, B → D (nodes A and B have no parents).
P(A) = 0.3
P(B) = 0.6
P(C|A) = 0.4
P(C|~A) = 0.2
P(D|A, B) = 0.7
P(D|A, ~B) = 0.4
P(D|~A, B) = 0.2
P(D|~A, ~B) = 0.01
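Using the chain rule over parents, the full joint for this network factorises as P(A, B, C, D) = P(A) P(B) P(C | A) P(D | A, B); a small sketch with the numbers above:

```python
# Joint probability for the network above, via the chain rule over parents:
# P(A, B, C, D) = P(A) * P(B) * P(C | A) * P(D | A, B)

P_A, P_B = 0.3, 0.6
P_C = {True: 0.4, False: 0.2}                        # P(C=true | A)
P_D = {(True, True): 0.7, (True, False): 0.4,
       (False, True): 0.2, (False, False): 0.01}     # P(D=true | A, B)

def joint(a, b, c, d):
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C[a] if c else 1 - P_C[a]
    pd = P_D[(a, b)] if d else 1 - P_D[(a, b)]
    return pa * pb * pc * pd

print(joint(True, True, True, True))   # 0.3 * 0.6 * 0.4 * 0.7 = 0.0504
```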
● These conditional probabilities can also be expressed as conditional
probability tables (CPTs), one table per node.
Suppose you must choose between two job offers, Job A and Job B. You have to
decide which job is better overall, not just based on money but on all factors combined.
Step 1: Decide What Matters to You
(Weights)
Conclusion: Which Job is Better?
• Job A utility = 83
• Job B utility = 88
Job B is the better choice, even though it pays less, because:
• You save commute time (energy, stress)
• You enjoy a better work-life balance
• The overall happiness (utility) is higher.
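A hypothetical weighted-utility calculation; the weights and 0-100 scores below are illustrative choices (not from the notes) that happen to reproduce the stated utilities of 83 and 88:

```python
# Multi-attribute utility: a weighted sum of attribute scores.

weights = {"salary": 0.4, "commute": 0.3, "work_life": 0.3}   # must sum to 1

job_a = {"salary": 95, "commute": 70, "work_life": 80}        # assumed scores
job_b = {"salary": 85, "commute": 90, "work_life": 90}

def utility(scores):
    return sum(weights[k] * scores[k] for k in weights)

print(utility(job_a), utility(job_b))   # 83.0 88.0 -> Job B wins overall
```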
Solving Markov Decision Processes (MDPs)
• Markov Decision Processes (MDPs) are mathematical models used for
decision-making in environments where outcomes are partly
random and partly under the control of an agent.
• To solve an MDP, the goal is to find an optimal policy — a mapping
from states to actions that maximizes the expected total reward over
time.
There are two classical and widely used dynamic programming
techniques to solve MDPs:
• Value Iteration
• Policy Iteration
1. Value Iteration: repeatedly apply the Bellman optimality update to the state
utilities until they converge, then extract the optimal policy greedily from the
converged utilities.
2. Policy Iteration: alternate between policy evaluation (computing state utilities
under the current policy) and policy improvement (updating the policy greedily
with respect to those utilities) until the policy is stable. A minimal sketch of
value iteration follows.
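A minimal value-iteration sketch (the dict-based MDP encoding, discount factor and tolerance are illustrative assumptions):

```python
# Value iteration over a toy MDP. Assumed encoding: T[s][a] is a list of
# (probability, next_state, reward) triples; actions(s) lists legal actions.

GAMMA, THETA = 0.9, 1e-6   # discount factor and convergence tolerance

def value_iteration(states, actions, T):
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [sum(p * (r + GAMMA * U[s2]) for p, s2, r in T[s][a])
                 for a in actions(s)]
            new_u = max(q) if q else 0.0          # Bellman optimality update
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < THETA:                         # utilities have converged
            break
    # Extract the optimal policy greedily from the converged utilities.
    policy = {s: max(actions(s),
                     key=lambda a: sum(p * (r + GAMMA * U[s2])
                                       for p, s2, r in T[s][a]),
                     default=None)
              for s in states}
    return U, policy
```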
Partially Observable Markov Decision Processes (POMDPs)
In a POMDP, the agent cannot observe the underlying state directly; its sensors provide only partial observations. Thus, the agent must rely on the partial information it receives to estimate the full state and make decisions accordingly.
Challenges of a POMDP
The main challenge in a POMDP is that the agent doesn't have direct access to the state of the system, so it
must infer the state from the observations it receives. This introduces complexity in decision-making, and the agent has
to rely on its belief state (an internal representation of what it believes the actual state to be).
This process involves:
Belief Update: When the agent receives a new observation, it updates its belief about the state. The agent's belief
evolves as new observations come in and actions are taken.
Partially Observable Decision Making: The agent must optimize its decisions based not on the state itself, but on the
belief about the state.
Solving POMDPs
Solving POMDPs is much harder than solving MDPs, due to the complexity of dealing with belief states. The typical
solution involves:
1. Belief Update:
Updating the belief state after each action and observation.
2. Value Iteration or Approximation:
Iterating over possible actions, observations, and beliefs to find the optimal policy, though solving POMDPs exactly is
often computationally intractable, so approximation methods are commonly used.
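A minimal belief-update sketch (a discrete Bayes filter; the nested-dict encodings of T and O below are illustrative assumptions):

```python
# Belief update for a POMDP: after taking action a and observing o, the new
# belief over states s2 is  b'(s2) ∝ O(o | s2, a) * sum_s T(s2 | s, a) * b(s).
# Assumed encoding: T[s][a][s2] = transition prob, O[s2][a][o] = observation prob.

def update_belief(b, a, o, T, O, states):
    new_b = {}
    for s2 in states:
        prior = sum(T[s][a][s2] * b[s] for s in states)   # prediction step
        new_b[s2] = O[s2][a][o] * prior                   # correction step
    z = sum(new_b.values())                               # normalising constant
    return {s2: v / z for s2, v in new_b.items()}
```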
Supervised vs Unsupervised Learning:
Supervised Machine Learning
• In Supervised learning, you train the machine using data which is
well "labeled." It means some data is already tagged with the correct
answer.
• It can be compared to learning which takes place in the presence of a
supervisor or a teacher.
• A supervised learning algorithm learns from labeled training data, helps
you to predict outcomes for unforeseen data.
• Successfully building, scaling, and deploying an accurate supervised machine
learning model takes time and technical expertise from a team of highly skilled
data scientists.
Unsupervised Learning
• Unsupervised learning is a machine learning technique, where you do not
need to supervise the model.
• Instead, you need to allow the model to work on its own to discover
information. It mainly deals with the unlabelled data.
• Unsupervised learning algorithms allow you to perform more complex
processing tasks compared to supervised learning, although unsupervised
learning can be more unpredictable than other methods such as deep learning
and reinforcement learning.
How Unsupervised Learning works?
Let's take the case of a baby and her family dog.
She knows and identifies this dog. A few weeks later a family friend brings along
a dog and tries to play with the baby.
The baby has not seen this dog earlier. But she recognizes many features (2 ears,
eyes, walking on 4 legs) that are like her pet dog. She identifies the new animal as
a dog. This is unsupervised learning, where you are not taught but you learn from
the data (in this case, data about a dog). Had this been supervised learning, the
family friend would have told the baby that it's a dog.
Types of Supervised Machine Learning Techniques
Regression:
Regression technique predicts a single output value using training data.
Example: You can use regression to predict the house price from training data.
The input variables will be locality, size of a house, etc.
Classification:
Classification means to group the output inside a class. If the algorithm tries to
label input into two distinct classes, it is called binary classification. Selecting
between more than two classes is referred to as multiclass classification.
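A minimal regression illustration (the data points are made up for this sketch, not from the notes):

```python
import numpy as np

# Regression: predict a single output value (here, a house price) from inputs.
size = np.array([50, 80, 110, 140])     # square metres (illustrative data)
price = np.array([150, 230, 310, 390])  # thousands

slope, intercept = np.polyfit(size, price, 1)   # fit price = slope*size + b
print(slope * 100 + intercept)                  # predicted price of a 100 m^2 house
```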
Clustering
Clustering is an unsupervised technique for grouping data points so that points in
the same group are more similar to each other than to points in other groups.
Association
Association rules allow you to establish associations amongst data objects inside
large databases. This unsupervised technique is about discovering interesting
relationships between variables in large databases. For example, people who buy
a new home are most likely to also buy new furniture.
Other Examples:
Comparison of supervised and unsupervised learning:
• Process: In a supervised learning model, input and output variables are given;
in an unsupervised learning model, only input data is given.
• Input Data: Supervised algorithms are trained using labeled data; unsupervised
algorithms are used against data which is not labeled.
• Algorithms Used: Supervised: support vector machines, neural networks, linear
and logistic regression, random forests, and classification trees. Unsupervised:
clustering algorithms such as K-means and hierarchical clustering.
• Use of Data: A supervised learning model uses training data to learn a link
between the input and the outputs; unsupervised learning does not use output data.
• Accuracy of Results: Supervised learning is a highly accurate and trustworthy
method; unsupervised learning is a less accurate and trustworthy method.
• Real-Time Learning: Supervised learning usually takes place offline;
unsupervised learning can take place in real time.
• Main Drawback: Classifying big data can be a real challenge in supervised
learning; in unsupervised learning, you cannot get precise information regarding
data sorting or the output, since the data used is not labeled and the classes
are not known.
🔍 What is Direct Utility Estimation?
Direct Utility Estimation is a simple approach used to learn the utility (value) of each state in a Markov Decision Process
(MDP), by observing rewards while following a fixed policy.
In this method:
We don’t build a model of the environment (like state transitions or reward functions).
We don’t change the policy.
We just observe episodes, note what rewards we get, and average those rewards for each state.
🧠 What is Utility?
In RL, the utility of a state means the expected total reward the agent accumulates
starting from that state while following the given policy.
👣 Step-by-step Process:
1. Initialize:
Create a table to store total rewards observed from each state.
Keep count of how many times each state is visited.
2. Run multiple episodes:
Each episode is a sequence of states and rewards (e.g., s1, s2, ..., sn).
3. Estimate the utility of each state as the average observed return:
U(s) = (1 / N(s)) * Σ (i = 1 to N(s)) Gi
where:
N(s) = number of times state s was visited
Gi = total reward observed from state s in the ith visit
Example: episodes follow the path S0 → S1 → S2 → Goal.
Episode   Path                     Return observed from S1
1         S0 → S1 → S2 → Goal      10
2         S0 → S1 → S2 → Goal      8
3         S0 → S1 → S2 → Goal      9
Then:
U(S1) = (10 + 8 + 9) / 3 = 9
✅ Key Characteristics:
Model-free (no transition or reward model is built), passive (the policy stays
fixed), and estimates utilities by averaging the returns actually observed from
each state.
⚠️ Disadvantages:
Inefficient for large or complex environments.
Takes a long time to converge if some states are visited rarely.
Only works with a fixed policy; it cannot improve the policy.
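A minimal sketch of the averaging described above (the episode encoding as (state, reward) pairs is an illustrative assumption):

```python
from collections import defaultdict

# Direct utility estimation: average the return observed from each state
# over many episodes generated by a fixed policy (every-visit averaging).

totals = defaultdict(float)   # sum of returns observed from each state
visits = defaultdict(int)     # N(s): number of times each state was visited

def record_episode(episode):
    """episode: list of (state, reward) pairs in the order visited."""
    rewards = [r for _, r in episode]
    for i, (state, _) in enumerate(episode):
        g = sum(rewards[i:])          # return from this state to episode end
        totals[state] += g
        visits[state] += 1

def utility(state):
    return totals[state] / visits[state]

# Three episodes along S0 -> S1 -> S2 -> Goal with returns 10, 8, 9 from S1:
record_episode([("S0", 0), ("S1", 4), ("S2", 3), ("Goal", 3)])
record_episode([("S0", 0), ("S1", 3), ("S2", 2), ("Goal", 3)])
record_episode([("S0", 0), ("S1", 4), ("S2", 2), ("Goal", 3)])
print(utility("S1"))   # (10 + 8 + 9) / 3 = 9.0
```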