
What is Artificial Intelligence?

• John McCarthy, who coined the term Artificial Intelligence in 1956, defines it as
"the science and engineering of making intelligent machines", especially
intelligent computer programs.
• Artificial Intelligence (AI) is the intelligence of machines and the branch of
computer science that aims to create it.
• Intelligence is the computational part of the ability to achieve goals in the world.
Varying kinds and degrees of intelligence occur in people, many animals and
some machines.
• AI is the study of the mental faculties through the use of computational models.
• AI is the study of: How to make computers do things which, at the moment,
people do better.
• AI is the study and design of intelligent agents, where an intelligent agent is a
system that perceives its environment and takes actions that maximize its chances
of success.

Meaning of Artificial:

• Made by human skill; produced by humans (as opposed to natural).
• Simulated; imitation: artificial vanilla flavouring.
• Lacking naturalness: an artificial smile.
• Manufactured to resemble something natural: jewellery made to resemble a natural gem in chemical composition or appearance.
Meaning of Intelligence:

Ability to:
✓ Acquire
✓ Understand
✓ Apply Knowledge
✓ Exercise thoughts
✓ Reason

Approaches to AI

Hard or Strong AI
• Generally, artificial intelligence research aims to create AI that can replicate human intelligence completely.
• Strong AI refers to a machine that approaches or surpasses human intelligence:
◊ if it can do typically human tasks,
◊ if it can apply a wide range of background knowledge, and
◊ if it has some degree of self-consciousness.
• Strong AI aims to build machines whose overall intellectual ability is indistinguishable from that of a human being.
Soft or Weak AI
• Weak AI refers to the use of software to study or accomplish specific problem-solving or reasoning tasks that do not encompass the full range of human cognitive abilities.
▪ Example: a chess program such as Deep Blue.
▪ Weak AI does not achieve self-awareness and does not demonstrate the full range of human-level cognitive abilities; it is merely an intelligent, specific problem-solver.

Goals of AI
• Systems that think like humans
• Systems that think rationally
• Systems that act like humans
• Systems that act rationally

              Human-like                          Rationally
Think         (1) Cognitive science approach      (2) Laws of thought approach
Act           (3) Turing Test approach            (4) Rational agent approach

General AI Goal
• Replicate human intelligence: still a distant goal.
• Solve knowledge-intensive tasks.
• Make an intelligent connection between perception and action.
• Enhance human-human, human-computer and computer-computer interaction / communication.
Engineering-based AI Goal
• Develop concepts, theory and practice of building intelligent machines.
• Emphasis is on system building.
Science-based AI Goal
• Develop concepts, mechanisms and vocabulary to understand biological intelligent behaviour.
• Emphasis is on understanding intelligent behaviour.

Turing Test: Act Human-like

Three rooms contain a person, a computer, and an interrogator.

• The interrogator can communicate with the other two by teletype (to prevent the machine from imitating the appearance or voice of the person).
• The interrogator tries to determine which is the person and which is the machine.
• The machine tries to fool the interrogator into believing that it is the human, and the person also tries to convince the interrogator that he or she is the human.
• If the machine succeeds in fooling the interrogator, then we conclude that the machine is intelligent.
▪ The goal is to develop systems that are human-like.

History of AI

Year    Milestone / Innovation

1923    Karel Čapek's play "Rossum's Universal Robots" (R.U.R.) opens in London; first use of the word "robot" in English.

1943    Foundations for neural networks laid.

1945    Isaac Asimov, a Columbia University alumnus, coined the term Robotics.

1950    Alan Turing introduced the Turing Test for the evaluation of intelligence and published "Computing Machinery and Intelligence". Claude Shannon published a detailed analysis of chess playing as search.

1956    John McCarthy coined the term Artificial Intelligence. Demonstration of the first running AI program at Carnegie Mellon University.

1958    John McCarthy invents the LISP programming language for AI.

1964    Danny Bobrow's dissertation at MIT showed that computers can understand natural language well enough to solve algebra word problems correctly.

1965    Joseph Weizenbaum at MIT built ELIZA, an interactive program that carries on a dialogue in English.

1969    Scientists at Stanford Research Institute developed Shakey, a robot equipped with locomotion, perception, and problem solving.

1973    The Assembly Robotics group at Edinburgh University built Freddy, the Famous Scottish Robot, capable of using vision to locate and assemble models.

1979    The first computer-controlled autonomous vehicle, the Stanford Cart, was built.

1985    Harold Cohen created and demonstrated the drawing program Aaron.

1990    Major advances in all areas of AI:
        • Significant demonstrations in machine learning
        • Case-based reasoning
        • Multi-agent planning
        • Scheduling
        • Data mining, Web Crawler
        • Natural language understanding and translation
        • Vision, Virtual Reality
        • Games

1997    The Deep Blue chess program beats the then world chess champion, Garry Kasparov.

2000    Interactive robot pets become commercially available. MIT displays Kismet, a robot with a face that expresses emotions. The robot Nomad explores remote regions of Antarctica and locates meteorites.

Goals of AI
• To Create Expert Systems − systems which exhibit intelligent behavior: they learn, demonstrate, explain, and advise their users.
• To Implement Human Intelligence in Machines − creating systems that understand, think, learn, and behave like humans.
Applications of AI
AI has been dominant in various fields such as −

• Gaming − AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe, etc., where the machine can think of a large number of possible positions based on heuristic knowledge.

• Natural Language Processing − It is possible to interact with a computer that understands natural language spoken by humans.

• Expert Systems − Some applications integrate machine, software, and special information to impart reasoning and advising. They provide explanation and advice to the users.

• Vision Systems − These systems understand, interpret, and comprehend visual input on the computer. For example,
  o A spying aeroplane takes photographs, which are used to figure out spatial information or a map of the area.
  o Doctors use a clinical expert system to diagnose the patient.
  o Police use computer software that can recognize the face of a criminal against the stored portrait made by a forensic artist.

• Speech Recognition − Some intelligent systems are capable of hearing and comprehending language in terms of sentences and their meanings while a human talks to them. They can handle different accents, slang words, noise in the background, changes in a human's voice due to cold, etc.

• Handwriting Recognition − Handwriting recognition software reads text written on paper by a pen or on screen by a stylus. It can recognize the shapes of the letters and convert them into editable text.

• Intelligent Robots − Robots are able to perform the tasks given by a human. They have sensors to detect physical data from the real world such as light, heat, temperature, movement, sound, bumps, and pressure. They have efficient processors, multiple sensors and huge memory to exhibit intelligence. In addition, they are capable of learning from their mistakes and can adapt to a new environment.
Difference between Human and Machine Intelligence
• Humans perceive by patterns, whereas machines perceive by sets of rules and data.

• Humans store and recall information by patterns; machines do it by search algorithms. For example, the number 40404040 is easy to remember, store, and recall because its pattern is simple.

• Humans can figure out the complete object even if some part of it is missing or distorted, whereas machines cannot do this correctly.

What can AI systems NOT do yet?


• Understand natural language robustly (e.g., read and understand articles in a
newspaper)
• Surf the web
• Interpret an arbitrary visual scene
• Learn a natural language
• Construct plans in dynamic real-time domains
• Exhibit true autonomy and intelligence

Heuristics

Heuristics are simple, efficient rules.

■ Heuristics are in common use as rules of thumb.
■ In computer science, a heuristic is an algorithm that typically finds a good solution quickly, but without a guarantee that the solution is optimal.
■ Heuristics are intended to gain computational performance or conceptual simplicity, potentially at the cost of accuracy or precision.
■ People use heuristics to make decisions, come to judgments, and solve problems when facing complex problems or incomplete information. These rules work well under most circumstances.
■ In AI programs, the heuristic functions are:
◊ used to measure how far a node is from the goal state;
◊ used to compare two nodes, to find if one is better than the other (see the sketch below).
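As an illustration, here is a minimal sketch of two common heuristic functions for the 8-puzzle (the 9-tuple state encoding, with 0 for the blank, and the goal ordering are assumptions chosen for this example):

    # 8-puzzle heuristics; a state is a 9-tuple read row by row, 0 = blank.
    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

    def misplaced_tiles(state, goal=GOAL):
        """h1: number of tiles not in their goal position (blank excluded)."""
        return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

    def manhattan_distance(state, goal=GOAL):
        """h2: sum of each tile's horizontal + vertical distance from its goal."""
        total = 0
        for idx, tile in enumerate(state):
            if tile == 0:
                continue
            goal_idx = goal.index(tile)
            total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
        return total

    # Comparing two nodes: the smaller heuristic value is estimated closer to goal.
    print(misplaced_tiles((1, 2, 3, 4, 5, 6, 7, 0, 8)))   # 1
    print(misplaced_tiles((8, 2, 3, 4, 5, 6, 7, 1, 0)))   # 2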

What are problem solving, search and control strategies?

• Problem solving is fundamental to many AI-based applications. There are two types of problems:
− Problems like computing the sine of an angle or the square root of a value. These can be solved through a deterministic procedure and success is guaranteed.
− In the real world, very few problems lend themselves to straightforward solutions. Most real-world problems can be solved only by searching for a solution. AI is concerned with this type of problem solving.
• Problem solving is a process of generating solutions from observed data.
− A problem is characterized by a set of goals,
− a set of objects, and
− a set of operations.
These could be ill-defined and may evolve during problem solving.
• Problem space is an abstract space.
− A problem space encompasses all valid states that can be generated by the application of any combination of operators on any combination of objects.
− The problem space may contain one or more solutions. A solution is a combination of operations and objects that achieves the goals.
• Search refers to the search for a solution in a problem space.
− Search proceeds with different types of search control strategies.
− Depth-first search and breadth-first search are the two most common search strategies.

To build a system to solve a particular problem, we need to
• Define the problem precisely − find input situations as well as final situations for an acceptable solution to the problem.
• Analyse the problem − find the few important features that may have an impact on the appropriateness of various possible techniques for solving the problem.
• Isolate and represent the task knowledge necessary to solve the problem.
• Choose the best problem-solving technique(s) and apply it to the particular problem.

Problem Description
A problem consists of the description of:
− the current state of the world,
− actions that can transform one state of the world into another,
− the desired state of the world.
◊ State space is defined explicitly or implicitly.
  A state space should describe everything that is needed to solve a problem and nothing that is not needed to solve the problem.
◊ Initial state is the start state.
◊ Goal state is the conditions it has to fulfil:
− a description of a desired state of the world;
− the description may be complete or partial.
◊ Operators change state.
− Operators perform actions that can transform one state into another.
− Operators consist of preconditions and instructions:
  − Preconditions provide a partial description of the state of the world that must be true in order to perform the action;
  − Instructions tell how to create the next state.
− Operators should be as general as possible, to reduce their number.
◊ Elements of the domain have relevance to the problem.
− Knowledge of the starting point.
◊ Problem solving is finding a solution:
− finding an ordered sequence of operators that transforms the current (start) state into a goal state.
◊ Restrictions on solution quality may be any, optimal, or all:
− finding the shortest sequence, or
− finding the least expensive sequence (defining cost), or
− finding any sequence as quickly as possible.
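As an illustration, a minimal sketch of such a problem description in code, using a hypothetical two-jug water problem (a 4-litre and a 3-litre jug; the goal of "2 litres in the 4-litre jug" and the operators with their preconditions are assumptions chosen for this example):

    # States are pairs (x, y): litres in the 4-litre and the 3-litre jug.
    START = (0, 0)

    def goal_test(state):
        return state[0] == 2          # desired state of the world

    # Operators pair a precondition with an instruction for the next state.
    OPERATORS = [
        ("fill 4L",  lambda s: s[0] < 4, lambda s: (4, s[1])),
        ("fill 3L",  lambda s: s[1] < 3, lambda s: (s[0], 3)),
        ("empty 4L", lambda s: s[0] > 0, lambda s: (0, s[1])),
        ("empty 3L", lambda s: s[1] > 0, lambda s: (s[0], 0)),
        ("pour 3L into 4L", lambda s: s[1] > 0 and s[0] < 4,
         lambda s: (min(4, s[0] + s[1]), s[1] - (min(4, s[0] + s[1]) - s[0]))),
        ("pour 4L into 3L", lambda s: s[0] > 0 and s[1] < 3,
         lambda s: (s[0] - (min(3, s[0] + s[1]) - s[1]), min(3, s[0] + s[1]))),
    ]

    def successors(state):
        """Apply every operator whose precondition holds in this state."""
        for name, precondition, instruction in OPERATORS:
            if precondition(state):
                yield name, instruction(state)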
Search and Control Strategies

The word "search" refers to the search for a solution in a problem space.
− Search proceeds with different types of "search control strategies".
− A strategy is defined by picking the order in which the nodes expand.
Search strategies are evaluated along the following dimensions: completeness, time complexity, space complexity, and optimality.

Evaluating Search Strategies

We will look at various search strategies and evaluate their problem-solving performance. What are the characteristics of the different search algorithms and what is their efficiency? We will look at the following three factors to measure this.
1. Completeness: Is the strategy guaranteed to find a solution if one exists?
2. Optimality: Does the solution have low cost or the minimal cost?
3. What is the search cost associated with the time and memory required to find a solution?
   a. Time complexity: time taken (number of nodes expanded, worst or average case) to find a solution.
   b. Space complexity: space used by the algorithm, measured in terms of the maximum size of the fringe.

Production System
A production system consists of:
• A set of rules, where the left side determines the applicability of the rule and the right side determines the operation to be performed if the rule is applied.
• One or more knowledge bases/databases that contain whatever information is appropriate for the particular task.
• A control strategy that specifies the order in which the rules will be compared to the database, and a way of resolving the conflicts that arise when several rules match at once.
• A rule applier.
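A minimal sketch of these four components in code (the facts and the two condition-action rules are invented examples, not taken from the notes; the control strategy here simply applies the first rule that matches):

    # Production system: rules, a database, a control strategy, a rule applier.
    database = {"monkey at door", "banana at centre"}      # hypothetical facts

    rules = [  # (left side: applicability) -> (right side: operation)
        (lambda db: "monkey at door" in db,
         lambda db: db - {"monkey at door"} | {"monkey at centre"}),
        (lambda db: {"monkey at centre", "banana at centre"} <= db,
         lambda db: db | {"has banana"}),
    ]

    def goal(db):
        return "has banana" in db

    while not goal(database):
        for condition, action in rules:    # control strategy: first match wins
            if condition(database):
                database = action(database)
                break
        else:
            break                          # no rule applies
    print(database)                        # includes 'has banana'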

Problem Characteristics

1. Is the problem decomposable into a set of (nearly) independent smaller or easier subproblems?
Decomposable problems:
• Recursive integration programs.
• Solving large problems by decomposition.
Non-decomposable problems:
• The Blocks World game.
• Here two subproblems are not independent: they interact, and those interactions must be considered in order to arrive at a solution for the entire problem.
[Figure: a decomposable problem]

2. Can solution steps be ignored or at least undone if they prove unwise?
Three kinds of problems:
• Ignorable (e.g., theorem proving), in which solution steps can be ignored.
• Recoverable (e.g., the 8-puzzle), in which solution steps can be undone.
• Irrecoverable (e.g., chess), in which solution steps cannot be undone.

3. Is the problem's universe predictable?
Certain-outcome problems (e.g., the 8-puzzle): the result of an action can be predicted perfectly.
• Planning can be used to generate a sequence of operators that is guaranteed to lead to a solution.
Uncertain-outcome problems (e.g., bridge):
• Planning can at best generate a sequence of operators that has a good probability of leading to a solution.
• The hardest type of problem to solve is irrecoverable with an uncertain outcome.

4. Is a good solution absolute or relative?

5. Is a large amount of knowledge absolutely required to solve the problem, or is knowledge important only to constrain the search?
• Chess: little knowledge is required; additional knowledge helps to constrain the search and speed up the execution of the program.
• Newspaper story understanding: a lot of knowledge is required even to be able to recognize a solution.

6. Can a computer that is simply given the problem return the solution, or will solving the problem require interaction between the computer and a person?
Two types of problems:
• Solitary: the computer is given a problem description and produces an answer, with no intermediate communication and no demand for an explanation of the reasoning process.
• Conversational: there is intermediate communication between a person and the computer.

7. Is the desired solution a state of the world or a path to a state?

Issues in the Design of Search Programs

Every search process can be viewed as a traversal of a tree structure in which each node represents a problem state and each arc represents a relationship between the states represented by the nodes it connects.
• The direction in which to conduct the search: forward versus backward reasoning.
• How to select the rules: matching.
• How to represent each node of the search process: the knowledge representation problem.

Search Algorithms

Many traditional search algorithms are used in AI applications. For complex problems, the traditional algorithms are unable to find a solution within practical time and space limits. Consequently, many special techniques have been developed using heuristic functions.
The algorithms that use heuristic functions are called heuristic algorithms.
− Heuristic algorithms are not really intelligent; they appear to be intelligent because they achieve better performance.
− Heuristic algorithms are more efficient because they take advantage of feedback from the data to direct the search path.
− Uninformed search algorithms, or brute-force algorithms, search through the search space all possible candidates for the solution, checking whether each candidate satisfies the problem's statement.
− Informed search algorithms use heuristic functions that are specific to the problem, applying them to guide the search through the search space to try to reduce the amount of time spent searching.
A good heuristic can make an informed search dramatically outperform any uninformed search. For example, in the Traveling Salesman Problem (TSP) the goal is to find a good solution instead of finding the best solution. In TSP-like problems, the search proceeds using current information about the problem to predict which path is closer to the goal and follows it, although this does not always guarantee finding the best possible solution. Such techniques help in finding a solution within reasonable time and space.
Some prominent intelligent search algorithms are stated below:
1. Generate and Test Search
2. Best-first Search
3. Greedy Search
4. A* Search
5. Constraint Search
6. Means-ends Analysis
There are more algorithms, either improvements or combinations of these.

Hierarchical Representation of Search Algorithms


A representation of most search algorithms begins with two types of search − uninformed and informed.
Uninformed search: also called blind, exhaustive or brute-force search; it uses no information about the problem to guide the search and therefore may not be very efficient.
Informed search: also called heuristic or intelligent search; it uses information about the problem to guide the search, usually by guessing the distance to a goal state, and is therefore efficient; but such a search may not always be possible.

Generate and Test


Algorithm

1. Generate a possible solution.
2. Test to see if this is actually a solution.
3. Quit if a solution has been found; otherwise, return to step 1.

It is known as the British Museum algorithm, in reference to a method of finding an object in the British Museum by wandering around.
This procedure can lead to a solution within a short period of time if done systematically.
− Inefficient for problems with a large space.
− Acceptable for simple problems: exhaustive generate-and-test is often a reasonable technique.
− For problems much harder than this, even heuristic generate-and-test is not a very effective technique.
− It is better combined with other techniques that restrict the space in which to search even further; then the technique can be very effective. A sketch follows.
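A minimal generate-and-test sketch (the generator enumerates 3-digit candidate codes and the goal test is a made-up predicate; both are assumptions for illustration):

    from itertools import product

    def generate():
        """Systematically generate candidate solutions (all 3-digit codes)."""
        yield from product(range(10), repeat=3)

    def test(candidate):
        """Hypothetical test: digits sum to 15 and are strictly increasing."""
        a, b, c = candidate
        return a + b + c == 15 and a < b < c

    def generate_and_test():
        for candidate in generate():       # 1. generate  2. test  3. quit or loop
            if test(candidate):
                return candidate
        return None

    print(generate_and_test())             # (0, 6, 9)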

Hill Climbing
• Searching for a goal state = climbing to the top of a hill.
• Generate-and-test + a direction to move.
• A heuristic function estimates how close a given state is to a goal state.
• The heuristic function is a way to inject task-specific knowledge into the control process.
• A heuristic is a function that, when applied to a state, returns a number that is an estimate of the merit of the state with respect to the goal.
• In other words, the heuristic tells us approximately how far the state is from the goal state.

Steepest-Ascent Hill Climbing (Gradient Search)

• Considers all the moves from the current state.
• Selects the best one as the next state (see the sketch below).
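A minimal steepest-ascent hill-climbing sketch (the numeric objective and the neighbour function are hypothetical; a real problem would supply its own heuristic and moves):

    def hill_climb(start, value, neighbours):
        """Steepest ascent: move to the best neighbour until none improves."""
        current = start
        while True:
            best = max(neighbours(current), key=value)
            if value(best) <= value(current):
                return current             # local maximum, plateau, or ridge
            current = best

    # Hypothetical 1-D example: maximize f(x) = -(x - 7)**2 over the integers.
    f = lambda x: -(x - 7) ** 2
    moves = lambda x: [x - 1, x + 1]
    print(hill_climb(0, f, moves))         # 7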


Hill Climbing: Disadvantages
Local maximum: a state that is better than all of its neighbours, but not better than some other states farther away.
Plateau: a flat area of the search space in which all neighbouring states have the same value.
Ridge: the orientation of the high region, compared to the set of available moves, makes it impossible to climb up; however, two moves executed serially may increase the height.

Ways Out
• Backtrack to some earlier node and try going in a different direction.
• Make a big jump to try to get into a new section.
• Move in several directions at once.


A* Algorithm
The evaluation function for a node N is defined as follows:

    f(N) = g(N) + h(N)

where g(N) is a measure of the cost of getting from the start node to the current node N, and h(N) is a heuristic estimate of the cost from N to a goal.

Evaluation function
• The choice of evaluation function critically determines search results.
• Consider the evaluation function
    f(X) = g(X) + h(X)
  − h(X) = the number of tiles not in their goal position in a given state X
  − g(X) = depth of node X in the search tree
• For the initial node, f(initial_node) = 0 + 4 = 4.
• Apply the A* algorithm to solve the 8-puzzle below.
Search tree (states shown as 3x3 grids, _ = blank; labels give f = g + h):

Start state (f = 0+4):
    3 7 6
    5 1 2
    4 _ 8

  up (1+3)         left (1+5)        right (1+5)
  3 7 6            3 7 6             3 7 6
  5 _ 2            5 1 2             5 1 2
  4 1 8            _ 4 8             4 8 _

Expanding the "up" child:
  up (2+3)         left (2+3)        right (2+4)
  3 _ 6            3 7 6             3 7 6
  5 7 2            _ 5 2             5 2 _
  4 1 8            4 1 8             4 1 8

Expanding the first (2+3) child:
  left (3+2)       right (3+4)
  _ 3 6            3 6 _
  5 7 2            5 7 2
  4 1 8            4 1 8

Expanding the (3+2) child:
  down (4+1)       then right gives the Goal state
  5 3 6            5 3 6
  _ 7 2            7 _ 2
  4 1 8            4 1 8
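A compact A* sketch for this 8-puzzle instance (the state encoding, start and goal states, and the misplaced-tiles heuristic follow the example above; the heap-based frontier is one standard implementation choice):

    import heapq

    START = (3, 7, 6, 5, 1, 2, 4, 0, 8)    # 0 marks the blank
    GOAL  = (5, 3, 6, 7, 0, 2, 4, 1, 8)

    def h(state):
        """Number of misplaced tiles (the blank is not counted)."""
        return sum(1 for s, g in zip(state, GOAL) if s != 0 and s != g)

    def children(state):
        """Slide the blank up/down/left/right where legal."""
        i = state.index(0)
        r, c = divmod(i, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 3 and 0 <= nc < 3:
                s = list(state)
                j = 3 * nr + nc
                s[i], s[j] = s[j], s[i]
                yield tuple(s)

    def astar(start):
        frontier = [(h(start), 0, start)]          # entries: (f, g, state)
        best_g = {start: 0}
        while frontier:
            f, g, state = heapq.heappop(frontier)  # expand lowest f = g + h
            if state == GOAL:
                return g
            for child in children(state):
                if g + 1 < best_g.get(child, float("inf")):
                    best_g[child] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(child), g + 1, child))

    print(astar(START))    # 5 moves, matching the trace above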
Note: Breadth-first search and depth-first search are covered in the class lecture, along with all other problems done in lectures and tutorials.

For details of all the topics, refer to the books:

Text Books:
1. Artificial Intelligence by Rich, Knight; Tata McGraw Hill, 3rd Edition (2009).

References:
1. Artificial Intelligence: A Modern Approach by Stuart Russell, Peter Norvig; Pearson, 2nd Edition (2012).
2. Artificial Intelligence by Saroj Kaushik; Cengage Learning, 1st Edition (2011).
Intelligent Agents
Agent
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
• A human agent has eyes, ears, and other organs for sensors, and hands, legs, mouth, and other body parts for actuators.
• A robotic agent might have cameras and infrared range finders for sensors, and various motors for actuators.
• A software agent receives keystrokes, file contents, and network packets as sensory inputs, and acts on the environment by displaying on the screen, writing files, and sending network packets.

Agents and Environment
Percept: the agent's perceptual inputs at any given instant.
Percept sequence: an agent's percept sequence is the complete history of everything the agent has ever perceived.
Agent function: an agent's behavior is described by the agent function that maps any given percept sequence P* to an action A:
    f : P* → A
Agent program: the agent function for an artificial agent is implemented by an agent program. It runs on the physical architecture to produce f.

    Agent = Architecture + Program

• The job of AI is to design an agent program that implements the agent function − the mapping from percepts to actions. It is important to keep these two ideas distinct.
• The agent function is an abstract mathematical description; the agent program is a concrete implementation.
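A minimal sketch of this distinction in code (the class skeleton and names are placeholders, not a prescribed design):

    class Agent:
        """Agent program: concrete code run once per percept; together with
        the stored percept sequence it realizes the abstract agent function."""
        def __init__(self):
            self.percepts = []              # percept sequence to date

        def program(self, percept):
            self.percepts.append(percept)
            return self.choose_action()

        def choose_action(self):
            raise NotImplementedError       # supplied by a concrete agent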
Example: Vacuum Cleaner World
• Consists of a robotic vacuum-cleaning agent in a world consisting of squares (locations) that can be either dirty or clean.
• The agent can move left or right and clean the location that it moves into.
• The vacuum agent perceives which square it is in and whether there is dirt in the square.
• Different versions of the vacuum cleaner world allow for different rules about what the agent can perceive and whether its actions always succeed.
• What is the right way to fill out the table?
The Concept of Rationality
• A rational agent is one that does the right thing − every entry in the table for the agent function is filled out correctly, and right actions will eventually lead to success.
• How do we quantify whether an agent is doing the right thing?
• Performance measure − an objective criterion for the agent's success.
• For example, performance measures for a vacuum cleaner could be the amount of dirt cleaned, the amount of time taken, economy, etc.
− A rational agent could maximize this performance measure by cleaning up the dirt, then dumping it all on the floor, then cleaning it up again, and so on.
− A more suitable performance measure would reward the agent for having a clean floor.
• Rational action: any action that maximizes the expected value of the performance measure given the percept sequence.
• As a general rule, it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave.

Being rational at any given time depends on four things:
• The performance measure that defines the criterion of success.
• The agent's prior knowledge of the environment.
• The actions that the agent can perform.
• The agent's percept sequence to date.

A rational agent can be defined as follows:
For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.

• Rationality is distinct from omniscience (all-knowing with infinite knowledge).
• Agents can perform actions in order to modify future percepts so as to obtain useful information (information gathering, exploration).
• An agent is autonomous if its behavior is determined by its own experience (with the ability to learn and adapt).
Task Environment
• To design a rational agent we must specify its task environment.
• PEAS (Performance, Environment, Actuators, Sensors) description of the task environment − Example: Automated Taxi Driving.
Properties of Task Environments

❑ Fully observable vs. Partially observable:
• An environment is fully observable when the sensors can detect all aspects that are relevant to the choice of action.
• An environment is partially observable when some parts of the state are missing from the sensor data.
Example: A vacuum agent with only a local dirt sensor cannot tell whether the other squares are dirty.

❑ Deterministic vs. Stochastic:
• If the next state of the environment is completely determined by the current state and the action executed by the agent, the environment is deterministic; otherwise it is stochastic. (If the environment is deterministic except for the actions of other agents, then the environment is strategic − like playing chess.)
Example: Taxi driving is stochastic, because one can never exactly predict the behavior of the traffic.

❑ Episodic vs. Sequential:
• In an episodic environment, the agent's experience is divided into atomic "episodes" (each episode consists of the agent perceiving and then performing a single action), and the choice of action in each episode depends only on the episode itself.
Example: an agent spotting defective parts on an assembly line.
• In sequential environments, the current decision could affect future decisions.
Example: playing chess.

❑ Static vs. Dynamic:
• If the environment can change while the agent is choosing an action, the environment is dynamic; otherwise it is static.
• If the environment itself doesn't change, but the agent's performance score does, then it is semidynamic.
Example: crossword − static; autonomous driving − dynamic; playing chess (with a clock) − semidynamic.

❑ Discrete vs. Continuous:
• A discrete environment involves a limited number of distinct, clearly defined percepts and actions; otherwise it is continuous.
Example: chess playing − discrete; autonomous driving − continuous.

❑ Single agent vs. Multiagent:
• An agent operating by itself in an environment is the single-agent case.
Example: an agent solving a crossword puzzle by itself.
• If the environment contains other agents who are also maximizing some performance measure that depends on the current agent's actions, then it is multiagent.
Examples: Task Environments
Structure of Agents
• All agents have the same skeleton: input (percepts), output (actions), and a program that works on the input to produce the output.
• Agent = Architecture + Program.
Example: Driverless Cars
Architecture: the physical components − sensors, actuators, onboard computer, etc.
Program: software used to recommend actions for the actuators.
❑ Take the current percept as input from the sensors and return an action.
❑ The agent program is different from the agent function (which takes the entire percept history).

TABLE-DRIVEN AGENT PROGRAM

function TABLE-DRIVEN-AGENT(percept) returns an action
  static: percepts, a sequence, initially empty
          table, a table of actions, indexed by percept sequences, initially fully specified
  append percept to the end of percepts
  action ← LOOKUP(percepts, table)
  return action

Let P be the set of possible percepts and T the lifetime of the agent (the total number of percepts it receives); the lookup table then needs one entry per possible percept sequence.
Consider playing chess with |P| = 10 and T = 150: this would require a table of at least 10^150 entries.

The key challenge of AI: find out how to write programs that, to the extent possible, produce rational behavior from a small amount of code rather than from a large number of table entries. (A runnable sketch follows.)
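A minimal table-driven agent sketch (the tiny one-step table for a two-location vacuum world is an invented example; its only purpose is to show why the approach cannot scale):

    class TableDrivenAgent:
        def __init__(self, table):
            self.percepts = []              # percept sequence, initially empty
            self.table = table              # actions indexed by percept sequences

        def __call__(self, percept):
            self.percepts.append(percept)
            return self.table.get(tuple(self.percepts), "NoOp")

    table = {
        (("A", "Dirty"),): "Suck",
        (("A", "Clean"),): "Right",
        (("B", "Dirty"),): "Suck",
        (("B", "Clean"),): "Left",
    }
    agent = TableDrivenAgent(table)
    print(agent(("A", "Dirty")))            # Suck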
Agent Types
There are four basic kinds of agent programs, plus learning agents:
❑ Simple reflex agents
❑ Model-based agents
❑ Goal-based agents
❑ Utility-based agents
❑ Learning agents

Simple Reflex Agents
• These agents select actions on the basis of the current percept and ignore the percept history.
Example: Vacuum agent − its decision is based on the current location and whether that location contains dirt (status).
• Implemented through rules − if dirty then suck dirt.
• They are rational only if a correct decision can be made on the basis of the current percept alone.
• Work in a fully observable environment.

function SIMPLE-REFLEX-AGENT(percept) returns an action
  static: rules, a set of condition-action rules
  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  return action

function REFLEX-VACUUM-AGENT([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left
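The same reflex vacuum agent as a runnable sketch (locations "A"/"B" and the percept format follow the pseudocode above):

    def reflex_vacuum_agent(percept):
        """Condition-action rules applied to the current percept only."""
        location, status = percept
        if status == "Dirty":
            return "Suck"
        elif location == "A":
            return "Right"
        elif location == "B":
            return "Left"

    print(reflex_vacuum_agent(("A", "Dirty")))   # Suck
    print(reflex_vacuum_agent(("B", "Clean")))   # Left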
Model-Based Reflex Agents
• They use a model of the world to choose their actions and maintain an internal state.
• Model: knowledge about "how things happen in the world"; hence the name model-based agent.
• Internal state: a representation of the present state, based on the percept history.
• Updating the state requires information about −
  − how the world evolves;
  − how the agent's actions affect the world.
• Work in a partially observable environment.

function REFLEX-AGENT-WITH-STATE(percept) returns an action
  persistent: state, the agent's current conception of the world state
              rules, a set of condition-action rules
  state ← UPDATE-STATE(state, action, percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION(rule)
  return action
Goal-Based Agents
• A goal-based agent is also a model-based agent.
• Knowing something about the current state of the environment is not always sufficient to decide what to do.
• They choose their actions in order to achieve goals (the expected outcome).
• Incorporates search and planning.
• It is more flexible because the knowledge that supports its decisions is represented explicitly and can be modified.

Utility-Based Agents
• Goals alone are not enough to generate high-quality behavior in most environments.
• They choose actions based on a preference (utility) for each state.
• A utility function maps a state onto a real number (which describes the degree of happiness).

Learning Agents
• A learning agent in AI is the type of agent that can learn from its past experiences, i.e., it has learning capabilities.
• It starts to act with basic knowledge and is then able to act and adapt automatically through learning.

END
Review of Tree and Graph Structures

1. Tree Structures
A tree is a hierarchical data structure composed of nodes, where each node has a
parent-child relationship with others. Trees are widely used in databases, artificial
intelligence, computer networks, and more.

Key Characteristics of Trees:

• Hierarchical Structure: Nodes are arranged in a hierarchical manner.


• One Root Node: A tree has a single root node from which all other nodes
descend.
• Parent-Child Relationship: Each node (except the root) has exactly one parent
but can have multiple children.
• No Cycles: Trees do not contain cycles (loops).

Types of Trees:

1. Binary Tree: Each node has at most two children (left and right).
2. Binary Search Tree (BST): A binary tree where the left child contains smaller
values and the right child contains larger values.
3. Balanced Trees (AVL, Red-Black Tree): Trees that maintain balance to
optimize search operations.
4. Heap: A complete binary tree used in priority queues (Min-Heap and Max-
Heap).
5. B-Trees and B+ Trees: Used in databases for efficient indexing and searching.

Common Operations in Trees:

• Insertion: Adding a new node to the tree.


• Deletion: Removing a node while maintaining tree properties.
• Traversal: Visiting nodes in a specific order (Preorder, Inorder, Postorder,
Level Order).
• Searching: Finding a specific node based on value.

2. Graph Structures
A graph is a data structure that consists of nodes (vertices) and edges (connections)
between them. Graphs are used in networking, social media, route planning, AI, and
many other applications.

Key Characteristics of Graphs:

• Consists of Vertices and Edges: A graph is defined as G = (V, E), where V represents vertices and E represents edges.
• Can be Directed or Undirected:
o Directed Graph (Digraph): Edges have a direction (e.g., one-way
streets).
o Undirected Graph: Edges do not have a direction (e.g., friendships on
social media).
• Can be Weighted or Unweighted:
o Weighted Graph: Edges have associated weights (e.g., road distances).
o Unweighted Graph: Edges have no weights (e.g., simple connections).
• Can Have Cycles or Be Acyclic:
o Cyclic Graph: Contains cycles.
o Acyclic Graph: Does not contain cycles.

Graph Representations and Types:

Representations:
1. Adjacency Matrix: uses a 2D matrix to store edges between vertices.
2. Adjacency List: uses lists to store neighboring nodes.

Types:
3. Connected Graph: every vertex is reachable from any other vertex.
4. Disconnected Graph: some vertices are not connected.
5. Complete Graph: every pair of vertices is connected by an edge.
6. Bipartite Graph: vertices can be divided into two sets with edges only between different sets.
7. DAG (Directed Acyclic Graph): used in scheduling problems and compiler optimizations.

Common Operations in Graphs:

• Graph Traversal:
o Depth-First Search (DFS): Explores as deep as possible before
backtracking.
o Breadth-First Search (BFS): Explores all neighbors before going deeper.
• Shortest Path Algorithms:
o Dijkstra’s Algorithm: Finds the shortest path in weighted graphs.
o Bellman-Ford Algorithm: Works for graphs with negative weights.
o Floyd-Warshall Algorithm: Finds shortest paths between all pairs of
nodes.
• Minimum Spanning Tree (MST):
o Kruskal’s Algorithm and Prim’s Algorithm for finding MST.
• Topological Sorting: Used for scheduling tasks with dependencies in DAGs.
Comparison between Tree and Graph Structures

Feature          Tree                                        Graph
Structure        Hierarchical, with one root node            Network of vertices and edges
Cycles           No cycles                                   May be cyclic or acyclic
Connections      Each node has exactly one parent            Any vertex may connect to any other
                 (except the root)
Traversal        Preorder, Inorder, Postorder, Level order   DFS, BFS
Search Algorithms in AI
Search algorithms are fundamental techniques in Artificial Intelligence (AI) used for
problem-solving and decision-making. They help AI agents find paths, explore states, and
reach optimal solutions in domains like robotics, game playing, and navigation.

1. Types of Search Algorithms


Search algorithms can be broadly classified into:

1. Uninformed Search (Blind Search)


o No additional information about goal location is used.
o Examples: Random Search, Breadth-First Search (BFS), Depth-First
Search (DFS).
2. Informed Search (Heuristic Search)
   o Uses heuristics (estimates) to find the best path.
   o Examples: A* Algorithm, Greedy Best-First Search.

2. Random Search
Definition

Random search is a basic uninformed search technique where the AI randomly selects
possible moves without any specific strategy until the goal is found.

How It Works

1. Start at an initial state.
2. Choose a random move to transition to a new state.
3. Repeat until the goal state is reached.

Example: Solving a Maze with Random Search

• The AI moves in random directions (up, down, left, right).
• It keeps moving randomly until it finds the exit.

Advantages

• Simple to implement.
• Works when no information is available.
• Can be useful for stochastic environments.

Disadvantages

• Inefficient − may take a long time to reach the goal.
• No guarantee of finding the optimal path.
• Repeats states since it lacks memory.
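A minimal random-search sketch over a toy state space (the move function, goal test, and step limit are invented placeholders):

    import random

    def random_search(start, goal_test, moves, max_steps=10_000):
        """Blind random walk: pick moves at random until the goal (or give up)."""
        state = start
        for _ in range(max_steps):
            if goal_test(state):
                return state
            state = random.choice(moves(state))
        return None                         # no guarantee of success

    # Toy example: random walk on the integers 0..20, goal is to reach 17.
    print(random_search(0, lambda s: s == 17,
                        lambda s: [max(0, s - 1), min(20, s + 1)]))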

3. Search with Closed and Open Lists


Some advanced search algorithms use Open List and Closed List to track visited states and
improve efficiency.

Definitions

• Open List: A set of nodes that have been discovered but not yet fully explored.
• Closed List: A set of nodes that have already been explored and will not be revisited.

A Search Algorithm (Using Open & Closed Lists)*

A* is an informed search algorithm that finds the shortest path using both:

1. g(n) → Cost from start node to current node.


2. h(n) → Heuristic estimate from current node to goal.
3. f(n) = g(n) + h(n) → Total estimated cost.

Steps:

1. Start with the Open List (contains only the start node).
2. Pick the node with the lowest f(n) value.
3. Move the node to the Closed List.
4. Expand its neighbors:
o If a neighbor is not in the Open List, add it.
o If a neighbor is already in Open List but has a better path, update it.
5. Repeat until the goal is found.
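A sketch of these steps on a small grid (the grid, start and goal cells, and the Manhattan-distance heuristic are assumptions chosen for the example; a heap holds the Open List ordered by f(n)):

    import heapq

    def astar_grid(grid, start, goal):
        """A* on a 4-connected grid; 0 = free cell, 1 = wall."""
        def h(cell):                        # Manhattan distance to the goal
            return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

        open_list = [(h(start), 0, start)]  # entries: (f, g, cell)
        closed = set()                      # Closed List: fully explored cells
        best_g = {start: 0}
        while open_list:
            f, g, cell = heapq.heappop(open_list)   # lowest f(n) first
            if cell == goal:
                return g                    # length of the shortest path
            if cell in closed:
                continue
            closed.add(cell)
            r, c = cell
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                        and grid[nb[0]][nb[1]] == 0 and nb not in closed
                        and g + 1 < best_g.get(nb, float("inf"))):
                    best_g[nb] = g + 1
                    heapq.heappush(open_list, (g + 1 + h(nb), g + 1, nb))
        return None

    maze = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(astar_grid(maze, (0, 0), (2, 0)))   # 6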

Example: Pathfinding in a Grid

Imagine an AI trying to find the shortest path in a maze:

• Open List: Nodes yet to be explored.


• Closed List: Nodes already explored.
• Heuristic (h(n)): The estimated distance from the node to the exit.

Advantages of Open & Closed Lists

• Prevents unnecessary re-exploration.


• Makes search more efficient.
• Helps in finding optimal paths.
Disadvantages

• Requires more memory.


• Processing overhead due to managing lists.

Comparison of Random Search vs. Open/Closed List Search

Feature Random Search Search with Open & Closed Lists


Efficiency Very low High
Memory Use Minimal More (stores explored nodes)
Path Optimality No guarantee Finds the best path
Complexity Very simple More advanced

1. Breadth-First Search (BFS)


Definition

BFS is an uninformed search algorithm that explores all nodes level by level before
moving to the next depth.

How BFS Works

1. Start at the root node (or any starting position).


2. Enqueue the root node into a queue (FIFO – First In, First Out).
3. Dequeue a node, explore its unvisited neighbors, and enqueue them.
4. Repeat until the goal node is found or all nodes are explored.

Example of BFS Traversal (Graph)


A
/ \
B C
/ \ \
D E F

BFS traversal order: A → B → C → D → E → F (level by level).
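A runnable BFS sketch reproducing this traversal (the adjacency list encodes the graph shown above):

    from collections import deque

    graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
             "D": [], "E": [], "F": []}

    def bfs(start):
        """Explore level by level using a FIFO queue."""
        visited, order = {start}, []
        queue = deque([start])
        while queue:
            node = queue.popleft()           # dequeue
            order.append(node)
            for nb in graph[node]:           # enqueue unvisited neighbours
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
        return order

    print(bfs("A"))    # ['A', 'B', 'C', 'D', 'E', 'F']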

Example: Solving a Maze

• Imagine BFS is used to find the shortest path in a maze.


• BFS explores all possible routes evenly.
• It guarantees the shortest path in an unweighted graph.

Advantages of BFS
• Guaranteed to find the shortest path in an unweighted graph.
• Explores all possible solutions before moving deeper.
• Works well for small graphs.

Disadvantages of BFS

• Consumes a lot of memory (stores many nodes in the queue).


• Slow for large graphs with deep structures.

Time & Space Complexity of BFS

• Time Complexity: O(V + E) (V = number of vertices, E = number of edges).


• Space Complexity: O(V) (due to storing nodes in the queue).

2. Depth-First Search (DFS)


Definition

DFS is an uninformed search algorithm that explores a path as deep as possible before
backtracking.

How DFS Works

1. Start at the root node (or any starting position).


2. Push the root node into a stack (LIFO – Last In, First Out).
3. Pop a node, explore its unvisited neighbors, and push them into the stack.
4. Repeat until the goal node is found or all nodes are explored.

Example of DFS Traversal (Graph)


A
/ \
B C
/ \ \
D E F

DFS traversal order: A → B → D → E → C → F (deep-first before backtracking).
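A matching DFS sketch (recursive form, which uses the call stack as the LIFO stack; same adjacency list as the BFS example):

    graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
             "D": [], "E": [], "F": []}

    def dfs(node, visited=None, order=None):
        """Go as deep as possible before backtracking."""
        if visited is None:
            visited, order = set(), []
        visited.add(node)
        order.append(node)
        for nb in graph[node]:
            if nb not in visited:
                dfs(nb, visited, order)
        return order

    print(dfs("A"))    # ['A', 'B', 'D', 'E', 'C', 'F']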

Example: Solving a Maze

• DFS follows one path until it reaches a dead end.


• It then backtracks and explores other paths.
• It does not guarantee the shortest path but is efficient for large spaces.

Advantages of DFS

• Consumes less memory than BFS (stores fewer nodes at a time).


• Good for large graphs where paths are very deep.
• Efficient when the solution is deep in the tree/graph.

Disadvantages of DFS

• May get stuck in infinite loops (without a depth limit or cycle detection).
• Does not guarantee the shortest path.
• Can be inefficient if many paths need to be explored before reaching the goal.

Time & Space Complexity of DFS

• Time Complexity: O(V + E) (similar to BFS).


• Space Complexity: O(V) (for recursive stack in worst case).

3. Comparison: DFS vs. BFS


Feature           BFS (Breadth-First Search)              DFS (Depth-First Search)
Data Structure    Queue (FIFO)                            Stack (LIFO)
Traversal Order   Level by level                          Deep first, then backtrack
Shortest Path?    Yes (if all edges have equal weight)    No (may find a longer path)
Memory Usage      High (stores all frontier nodes)        Low (stores only the current path)
Best for          Finding shortest paths                  Searching deep structures
Cycle Detection   Can detect cycles with a visited list   Needs additional checks to prevent infinite loops
Works well in     Shallow graphs or small spaces          Large, deep graphs
Game Search in Artificial Intelligence
Game search in AI refers to the methods used by computer programs to make decisions in
competitive environments, such as board games (e.g., Chess, Go) and video games. These
methods involve evaluating possible future moves, predicting opponent actions, and
selecting the best strategy.

Game search problems typically involve:

• Players: Competing entities (e.g., humans or AI agents).


• Moves: A set of possible actions for each player.
• Game Tree: A hierarchical representation of all possible moves.
• Payoff/Utility: A numerical value indicating the outcome of the game.

Example: Chess

In a chess game, the AI:

1. Analyzes possible future moves and opponent responses.


2. Evaluates different board positions to select the best move.
3. Predicts multiple turns ahead using search algorithms.

Types of Games in AI
Games in AI can be classified based on different criteria:

1. Deterministic vs. Stochastic Games


o Deterministic Games: No element of chance (e.g., Chess, Tic-Tac-Toe).
o Stochastic Games: Include randomness (e.g., Backgammon, Poker).
2. Perfect Information vs. Imperfect Information Games
o Perfect Information: All players have complete knowledge of the game state
(e.g., Chess, Checkers).
o Imperfect Information: Players have limited knowledge (e.g., Poker,
Battleship).
3. Zero-Sum vs. Non-Zero-Sum Games
o Zero-Sum Games: One player's gain is another player's loss (e.g., Chess, Tic-
Tac-Toe).
o Non-Zero-Sum Games: Multiple players can benefit simultaneously (e.g.,
Business negotiations).
Game Search Techniques
Game search involves exploring possible game states and selecting the optimal move. The
main techniques include:

1. Minimax Algorithm

The Minimax algorithm is a fundamental AI strategy for two-player games.

• Concept:
o One player is MAX (tries to maximize score).
o The other is MIN (tries to minimize MAX's score).
o The algorithm recursively evaluates future game states to choose the best
move.
• Steps:
1. Generate a game tree up to a certain depth.
2. Evaluate the game states using a heuristic function.
3. Use the minimax principle to backtrack and choose the optimal move.
• Example (Tic-Tac-Toe):
o MAX tries to get the highest possible score (win = +1).
o MIN tries to force MAX into the worst outcome (lose = -1).

Limitation:

• Computationally expensive for large game trees.
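A minimal minimax sketch over an explicit game tree (the depth-2 tree and its leaf utilities are invented for illustration; real games generate moves on the fly and cut off at a depth limit):

    def minimax(node, maximizing):
        """Best achievable value for the player to move."""
        if isinstance(node, int):              # leaf: utility value
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # MAX to move at the root, MIN at the next level; leaves are utilities.
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(minimax(tree, True))    # 3: MAX picks the branch whose MIN value is best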

2. Alpha-Beta Pruning

Alpha-beta pruning is an optimization of the minimax algorithm that reduces the number of
nodes evaluated.

• Key Idea:
o If a branch cannot improve the current best choice, it is ignored (pruned).
o Uses two parameters:
▪ Alpha (α): Best value found for MAX.
▪ Beta (β): Best value found for MIN.
• Benefits:
o Speeds up minimax search.
o More efficient in deep game trees.
• Example:
o In Chess, if one move is found to be worse than an already examined move,
further evaluation is skipped.
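The same search with alpha-beta pruning added (alpha is the best value MAX can already guarantee, beta the best for MIN; branches that cannot affect the result are skipped):

    def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
        if isinstance(node, int):
            return node
        if maximizing:
            value = float("-inf")
            for child in node:
                value = max(value, alphabeta(child, False, alpha, beta))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break                  # prune: MIN will never allow this line
            return value
        else:
            value = float("inf")
            for child in node:
                value = min(value, alphabeta(child, True, alpha, beta))
                beta = min(beta, value)
                if alpha >= beta:
                    break                  # prune: MAX already has a better move
            return value

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(alphabeta(tree, True))   # 3, with fewer leaves evaluated than plain minimax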
Knowledge Representation

Issues, Predicate Logic, Rules

How do we represent what we know?

• Knowledge is a general term.
An answer to the question "how to represent knowledge" requires an analysis to distinguish between knowledge "how" and knowledge "that".
■ Knowing "how to do something",
  e.g. "how to drive a car", is procedural knowledge.
■ Knowing "that something is true or false",
  e.g. knowing the speed limit for a car on a motorway, is declarative knowledge.

• Knowledge and Representation are two distinct entities. They play central but distinguishable roles in an intelligent system.
■ Knowledge is a description of the world.
  It determines a system's competence by what it knows.
■ Representation is the way knowledge is encoded.
  It defines a system's performance in doing something.
• Different types of knowledge require different kinds of representation. The knowledge representation models/mechanisms are often based on:
  ◊ Logic    ◊ Rules    ◊ Frames    ◊ Semantic Nets
• Different types of knowledge require different kinds of reasoning.


1. Introduction

Knowledge is a general term. Knowledge is a progression that starts with data, which is of limited utility.
− By organizing or analyzing the data, we understand what the data means, and this becomes information.
− The interpretation or evaluation of information yields knowledge.
− An understanding of the principles embodied within the knowledge is wisdom.

• Knowledge Progression

    Data −(organizing / analyzing)→ Information −(interpretation / evaluation)→ Knowledge −(understanding of principles)→ Wisdom

Fig 1. Knowledge Progression

■ Data is viewed as a collection of disconnected facts.
  Example: It is raining.
■ Information emerges when relationships among facts are established and understood; it provides answers to "who", "what", "where", and "when".
  Example: The temperature dropped 15 degrees and then it started raining.
■ Knowledge emerges when relationships among patterns are identified and understood; it provides answers to "how".
  Example: If the humidity is very high and the temperature drops substantially, the atmosphere is unlikely to hold the moisture, so it rains.
■ Wisdom is the pinnacle of understanding; it uncovers the principles of relationships that describe patterns, and provides answers to "why".
  Example: Encompasses an understanding of all the interactions that happen between raining, evaporation, air currents, temperature gradients and changes.

• Knowledge Model (Bellinger 1980)

A knowledge model tells that, as the degree of "connectedness" and "understanding" increases, we progress from data through information and knowledge to wisdom.

    Data −(understanding relations)→ Information −(understanding patterns)→ Knowledge −(understanding principles)→ Wisdom
    (plotted as degree of connectedness vs. degree of understanding)

Fig. Knowledge Model

The model represents transitions and understanding:
− the transitions are from data, to information, to knowledge, and finally to wisdom;
− the understanding supports the transitions from one stage to the next.

The distinctions between data, information, knowledge, and wisdom are not very discrete. They are more like shades of gray, rather than black and white (Shedroff, 2001).
− "Data" and "information" deal with the past; they are based on the gathering of facts and the adding of context.
− "Knowledge" deals with the present; it enables us to perform.
− "Wisdom" deals with the future: acquiring a vision for what will be, rather than for what is or was.

• Knowledge Category

Knowledge is categorized into two major types: Tacit and Explicit.
− The term "Tacit" corresponds to the "informal" or "implicit" type of knowledge.
− The term "Explicit" corresponds to the "formal" type of knowledge.

Tacit knowledge                                       | Explicit knowledge
◊ Exists within a human being; it is embodied.        | ◊ Exists outside a human being; it is embedded.
◊ Difficult to articulate formally.                   | ◊ Can be articulated formally.
◊ Difficult to communicate or share.                  | ◊ Can be shared, copied, processed and stored.
◊ Hard to steal or copy.                              | ◊ Easy to steal or copy.
◊ Drawn from experience, action, subjective insight.  | ◊ Drawn from an artifact of some type, such as a principle, procedure, process, or concepts.
• Knowledge Type

Cognitive psychologists sort knowledge into Declarative and Procedural categories, and some researchers have added Strategic as a third category.

‡ About procedural knowledge, there is some disparity in views.

− One view: it is close to tacit knowledge; it manifests itself in the doing of something, yet cannot be expressed in words; e.g., we read faces and moods.
− Another view: it is close to declarative knowledge; the difference is that a task or method is described instead of facts or things.
‡ All declarative knowledge is explicit knowledge; it is knowledge that can be and has been articulated.
‡ Strategic knowledge is thought of as a subset of declarative knowledge.
Procedural knowledge                                 | Declarative knowledge
◊ Knowledge about "how to do something"; e.g., to    | ◊ Knowledge about "that something is true or false";
  determine if Peter or Robert is older, first find  |   e.g., a car has four tyres; Peter is older than
  their ages.                                        |   Robert.
◊ Focuses on tasks that must be performed to reach   | ◊ Refers to representations of objects and events;
  a particular objective or goal.                    |   knowledge about facts and relationships.
◊ Examples: procedures, rules, strategies, agendas,  | ◊ Examples: concepts, objects, facts, propositions,
  models.                                            |   assertions, semantic nets, logic and descriptive models.
1.1 Framework of Knowledge Representation (Poole 1998)

A computer requires a well-defined problem description to process, and it provides a well-defined, acceptable solution.

To collect fragments of knowledge, we first formulate a description in our spoken language and then represent it in a formal language so that the computer can understand it. The computer can then use an algorithm to compute an answer. This process is illustrated below.

    Problem −(solve)→ Solution                     [informal]
      | represent                      ↑ interpret
      ↓                                |
    Representation −(compute)→ Output              [formal]

Fig. Knowledge Representation Framework


The steps are:
− The informal formalization of the problem takes place first.
− It is then represented formally, and the computer produces an output.
− This output can then be represented as an informally described solution that the user understands or checks for consistency.

Note: Problem solving requires
− formal knowledge representation, and
− conversion of informal knowledge to formal knowledge, that is, conversion of implicit knowledge to explicit knowledge.
• Knowledge and Representation

Problem solving requires a large amount of knowledge and some mechanism for manipulating that knowledge.

Knowledge and Representation are distinct entities that play central but distinguishable roles in an intelligent system.


− Knowledge is a description of the world;

it determines a system's competence by what it knows.


− Representation is the way knowledge is encoded;

it defines the system's performance in doing something.

In simple words, we :
− need to know about things we want to represent , and

− need some means by which things we can manipulate.

◊ Know things to represent:
  ‡ Objects − facts about objects in the domain.
  ‡ Events − actions that occur in the domain.
  ‡ Performance − knowledge about how to do things.
  ‡ Meta-knowledge − knowledge about what we know.
◊ Need means to manipulate:
  ‡ Requires some formalism for what we represent.

Thus, knowledge representation can be considered at two levels :


(a) knowledge level at which facts are described, and
(b) symbol level at which the representations of the objects, defined in
terms of symbols, can be manipulated in the programs.

Note: A good representation enables fast and accurate access to knowledge and understanding of the content.

• Mapping between Facts and Representation

Knowledge is a collection of "facts" from some domain. We need a representation of "facts" that can be manipulated by a program. Normal English is insufficient; it is currently too hard for a computer program to draw inferences in natural language. Thus some symbolic representation is necessary.

Therefore, we must be able to map "facts to symbols" and "symbols to facts" using forward and backward representation mappings.

Example: Consider an English sentence.

    [Figure: Facts map, via English understanding and English generation, to an
    English Representation; Facts also map to an Internal Representation that is
    operated on by reasoning programs.]

Facts and their Representations:

◊ Spot is a dog − a fact represented as an English sentence.
◊ dog(Spot) − using the forward mapping function, the above fact is represented in logic.
◊ ∀x : dog(x) → hastail(x) − a logical representation of the fact that "all dogs have tails".

Now, using a deductive mechanism, we can generate a new representation of the object:
◊ hastail(Spot) − a new object representation.
◊ Spot has a tail − using the backward mapping function to generate an English sentence [this is new knowledge].
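A tiny forward-chaining sketch of this mapping (the fact and rule mirror the dog/hastail example above; representing facts as (predicate, argument) pairs is an assumption of this sketch):

    # Forward mapping of "Spot is a dog", plus the rule dog(x) -> hastail(x).
    facts = {("dog", "Spot")}
    rules = [("dog", "hastail")]

    changed = True
    while changed:                          # forward chaining to a fixpoint
        changed = False
        for pre, post in rules:
            for pred, arg in list(facts):
                if pred == pre and (post, arg) not in facts:
                    facts.add((post, arg))  # derives hastail(Spot)
                    changed = True

    # Backward mapping would render ("hastail", "Spot") as "Spot has a tail".
    print(facts)    # {('dog', 'Spot'), ('hastail', 'Spot')}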
■ Forward and Backward Representation

The forward and backward representations are elaborated below:

    Initial Facts ··(desired real reasoning)··→ Final Facts
        | forward representation mapping             ↑ backward representation mapping
        ↓                                            |
    Internal Representation −(operated by program)→ English Representation

‡ The dotted line on top indicates the abstract reasoning process that a program is intended to model.
‡ The solid lines on the bottom indicate the concrete reasoning process that the program performs.
• KR System Requirements

A good knowledge representation enables fast and accurate access to knowledge and understanding of the content. A knowledge representation system should have the following properties.

◊ Representational Adequacy − the ability to represent all kinds of knowledge that are needed in that domain.
◊ Inferential Adequacy − the ability to manipulate the representational structures to derive new structures corresponding to new knowledge inferred from old.
◊ Inferential Efficiency − the ability to incorporate additional information into the knowledge structure that can be used to focus the attention of the inference mechanisms in the most promising direction.
◊ Acquisitional Efficiency − the ability to acquire new knowledge using automatic methods wherever possible, rather than relying on human intervention.

Note: To date, no single system optimizes all of the above properties.
1.2 Knowledge Representation Schemes

There are four types of knowledge representation: Relational, Inheritable, Inferential, and Declarative/Procedural.

◊ Relational Knowledge:

− provides a framework to compare two objects based on equivalent

attributes.
− any instance in which two different objects are compared is a

relational type of knowledge.


◊ Inheritable Knowledge
− is obtained from associated objects.

− it prescribes a structure in which new objects are created which may

inherit all or a subset of attributes from existing objects.

◊ Inferential Knowledge
− is inferred from objects through relations among objects.

− e.g., a word alone is a simple syntax, but with the help of other
  words in a phrase the reader may infer more from the word; this
  inference within linguistics is called semantics.

◊ Declarative Knowledge
− a statement in which knowledge is specified, but the use to which

that knowledge is to be put is not given.


− e.g. laws, people's names; these are facts which can stand alone, not
  dependent on other knowledge;

◊ Procedural Knowledge
− a representation in which the control information, to use the
  knowledge, is embedded in the knowledge itself.
− e.g. computer programs, directions, and recipes; these indicate
  specific use or implementation;

These KR schemes are detailed in the next few slides


KR - schemes

• Relational Knowledge :

This knowledge associates elements of one domain with another domain.

− Relational knowledge is made up of objects consisting of attributes and
  their corresponding associated values.

− The result of this knowledge type is a mapping of elements among
  different domains.

The table below shows a simple way to store facts.

− The facts about a set of objects are put systematically in columns.

− This representation provides little opportunity for inference.

Table - Simple Relational Knowledge

Player Height Weight Bats - Throws


Aaron 6-0 180 Right - Right
Mays 5-10 170 Right - Right
Ruth 6-2 215 Left - Left
Williams 6-3 205 Left - Right

‡ Given the facts it is not possible to answer a simple question such as :
  " Who is the heaviest player ? ".
  But if a procedure for finding the heaviest player is provided, then these
  facts will enable that procedure to compute an answer.
‡ We can ask things like who "bats – left" and "throws – right".
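
As a concrete illustration, the facts in the table can be stored as
attribute-value records and queried by such a procedure. This is a minimal
Python sketch (not part of the original notes; the field names are assumed):

```python
# Relational knowledge: facts stored as attribute-value records.
players = [
    {"name": "Aaron",    "height": "6-0",  "weight": 180, "bats_throws": "Right-Right"},
    {"name": "Mays",     "height": "5-10", "weight": 170, "bats_throws": "Right-Right"},
    {"name": "Ruth",     "height": "6-2",  "weight": 215, "bats_throws": "Left-Left"},
    {"name": "Williams", "height": "6-3",  "weight": 205, "bats_throws": "Left-Right"},
]

# Simple attribute lookup: who bats left and throws right?
left_right = [p["name"] for p in players if p["bats_throws"] == "Left-Right"]

# The "procedure for finding the heaviest player" mentioned above:
heaviest = max(players, key=lambda p: p["weight"])

print(left_right)         # ['Williams']
print(heaviest["name"])   # 'Ruth'
```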

KR - schemes

• Inheritable Knowledge :

Here the knowledge elements inherit attributes from their parents.

The knowledge is embodied in the design hierarchies found in the
functional, physical and process domains. Within the hierarchy, elements
inherit attributes from their parents, but in many cases not all attributes of
the parent elements are prescribed to the child elements.

Inheritance is a powerful form of inference, but not adequate on its own;
the basic KR needs to be augmented with an inference mechanism.

The KR in hierarchical structure, shown below, is called a “semantic network”
or a collection of “frames” or a “slot-and-filler structure”. The structure shows
property inheritance and a way for insertion of additional knowledge.

Property inheritance : The objects or elements of specific classes inherit
attributes and values from more general classes. The classes are
organized in a generalized hierarchy.
Baseball knowledge
− isa : shows class inclusion
− instance : shows class membership

[Figure: a semantic network of baseball knowledge. Person (handed : Right)
←isa− Adult Male (height : 5-10) ←isa− Baseball Player (height : 6-1,
bats : EQUAL handed, batting-average : 0.252) ←isa− Pitcher
(batting-average : 0.106) and Fielder (batting-average : 0.262);
Three-Finger-Brown is an instance of Pitcher (team : Chicago-Cubs), and
Pee-Wee-Reese is an instance of Fielder (team : Brooklyn-Dodger).]

Fig. Inheritable knowledge representation (KR)

‡ The directed arrows represent attributes (isa, instance, team); each
  originates at the object being described and terminates at the object or
  its value.

‡ The boxed nodes represent objects and values of the attributes.
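
Property inheritance can be realised with a small slot-and-filler lookup.
The sketch below (Python, illustrative only; the instance link is treated
like isa for simplicity) uses the frame names from the figure:

```python
# Slot-and-filler frames: each frame names its parent via the "isa" slot.
frames = {
    "Person":             {"isa": None,              "handed": "Right"},
    "Adult-Male":         {"isa": "Person",          "height": "5-10"},
    "Baseball-Player":    {"isa": "Adult-Male",      "height": "6-1", "batting-average": 0.252},
    "Pitcher":            {"isa": "Baseball-Player", "batting-average": 0.106},
    "Three-Finger-Brown": {"isa": "Pitcher",         "team": "Chicago-Cubs"},
}

def get_slot(frame, slot):
    """Return a slot value, inheriting from more general classes if needed."""
    while frame is not None:
        if slot in frames[frame]:
            return frames[frame][slot]
        frame = frames[frame]["isa"]   # climb the generalization hierarchy
    return None

print(get_slot("Three-Finger-Brown", "batting-average"))  # 0.106, from Pitcher
print(get_slot("Three-Finger-Brown", "handed"))           # 'Right', from Person
```

The lookup shows both behaviours from the figure: a more specific class
overrides an inherited value (batting-average), while unmentioned attributes
come from the most general ancestor that defines them.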
KR - schemes

• Inferential Knowledge :

This knowledge generates new information from the given information.

This new information does not require further data gathering from the
source, but does require analysis of the given information to generate new
knowledge.
Example :
− given a set of relations and values, one may infer other values or

relations.
− a predicate logic (a mathematical deduction) is used to infer from a set

of attributes.
− inference through predicate logic uses a set of logical operations to relate

individual data.
− the symbols used for the logic operations are :

" → " (implication), " ¬ " (not), " V " (or), " Λ " (and),
" ∀ " (for all), " ∃ " (there exists).

Examples of predicate logic statements :

1. "Wonder" is a name of a dog :              dog (wonder)

2. All dogs belong to the class of animals :  ∀ x : dog (x) → animal(x)

3. All animals either live on land or in water :
   ∀ x : animal(x) → live (x, land) V live (x, water)

From these three statements we can infer that :

" Wonder lives either on land or in water."

Note : If more information is made available about these objects and their
relations, then more knowledge can be inferred.
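
This derivation can be mechanised by repeatedly applying the rules to the
known facts. A minimal Python sketch (predicate and rule names are
illustrative, not from the original notes):

```python
# Facts and rules for the "Wonder" example above.
facts = {("dog", "wonder")}
rules = [
    ("dog", "animal"),                     # ∀x : dog(x) → animal(x)
    ("animal", "lives_on_land_or_water"),  # ∀x : animal(x) → live(x,land) V live(x,water)
]

# Apply rules until no new facts can be derived (a fixed point).
changed = True
while changed:
    changed = False
    for premise, consequent in rules:
        for pred, individual in list(facts):
            if pred == premise and (consequent, individual) not in facts:
                facts.add((consequent, individual))
                changed = True

print(sorted(facts))
# [('animal', 'wonder'), ('dog', 'wonder'), ('lives_on_land_or_water', 'wonder')]
```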
KR - schemes

• Declarative/Procedural Knowledge

The difference between declarative and procedural knowledge is not very clear.

Declarative knowledge :
Here, the knowledge is based on declarative facts about axioms and
domains.
− axioms are assumed to be true unless a counter example is found to

invalidate them.
− domains represent the physical world and the perceived functionality.

− axioms and domains thus simply exist and serve as declarative
  statements that can stand alone.

Procedural knowledge :
Here, the knowledge is a mapping process between domains that specifies
“what to do when”, and the representation is of “how to make it” rather
than “what it is”. Procedural knowledge :
− may have inferential efficiency, but no inferential adequacy and
  acquisitional efficiency.
− is represented as small programs that know how to do specific things,
  how to proceed.
Example : A parser in a natural language has the knowledge that a noun
phrase may contain articles, adjectives and nouns. It thus accordingly calls
routines that know how to process articles, adjectives and nouns.
KR - issues

1.3 Issues in Knowledge Representation

The fundamental goal of Knowledge Representation is to facilitate
inferencing (drawing conclusions) from knowledge.

The issues that arise while using KR techniques are many. Some of these
are explained below.

◊ Important Attributes :
Are there any attributes of objects so basic that they occur in almost
every problem domain ?

◊ Relationship among attributes :
Are there any important relationships that exist among object attributes ?

◊ Choosing Granularity :
At what level of detail should the knowledge be represented ?

◊ Set of objects :
How should sets of objects be represented ?

◊ Finding Right structure :
Given a large amount of knowledge stored, how can relevant parts be
accessed ?

Note : These issues are briefly explained here with reference to the previous
example, Fig. Inheritable KR. For details, readers may refer to the book on AI
by Elaine Rich & Kevin Knight, pages 115 – 126.
14 Expert System

LEARNING OBJECTIVES

After reading this chapter, students would be able to

• Learn the meaning of an expert system.
• Understand the problem domain and knowledge domain.
• Learn the advantages of an expert system.
• Understand the stages in the development of an expert system.
• Examine the general characteristics of an expert system.

An expert system is a computer system that emulates, or acts in all respects with, the
decision-making capabilities of a human expert.
Professor Edward Feigenbaum

14.1 Expert System


The most important applied area of artificial intelligence (AI) is the field of expert systems. An expert system
(ES) is a knowledge-based system that employs knowledge about its application domain and uses an infer-
encing (reasoning) procedure to solve problems that would otherwise need human competence or expertise.
The power of expert systems stems primarily from the specific knowledge about a narrow domain stored in
the knowledge base of the ES.
Expert systems solve real problems that would normally require a specialised human expert (such as
a doctor or a lawyer).
The knowledge characteristics include the following:
1. Expert systems are heuristic in nature, and work on the basis of useful "rules of thumb" rather than abso-
lute certainties.
2. Much of it is almost subconscious, or appears so obvious that experts don't even bother mentioning it.
The domains in which ESs are used include the following: medicine, mathematics, engineering, geology,
computer science, business, law, defence, education, etc. Within each domain, ESs have been used to solve
problems of different types.

The types of related problems involve the following:
1. Diagnosis (for example, of a system fault, disease or student error);
2. Design (of a computer system, hotel, etc.); and
3. Interpretation (of, for example, geological data).
The appropriate problem-solving technique tends to depend more on the type of the problem than
on the domain.
There are many examples of ES. Some of these are as follows:
1. MYCIN: One of the earliest expert systems, based on backward chaining. It could determine
various bacteria that can cause severe infections and could also recommend drugs based on the weight of
the person.
2. DENDRAL: It was an AI-based ES used for chemical analysis. It uses the spectrographic data of a
substance to predict its molecular structure.
3. R1/XCON: It could select specific software to generate a computer system wished by the user.
4. PXDES: It could easily determine the type and the degree of lung cancer in a patient based on the data.
5. CaDet: It is a clinical support system that could identify cancer in its early stages in patients.
6. DXplain: It was also a clinical support system that could suggest a variety of diseases based on the
findings of the doctor.

14.1.1 Architecture of Expert System


Figure 14.1 represents the modules that make up a typical ES. The user interacts with the system through a
user interface that simplifies communication and hides much of the complexity, such as the internal struc-
ture of the rule base. Expert system interfaces employ a variety of user styles, including question-and-answer,
menu-driven or graphics interfaces. The final decision on the interface type is a compromise between the
needs of the user and the requirements of the knowledge base and inferencing system. The heart of the ES is
the knowledge base, which contains the knowledge of a particular application domain. In a general rule-based
expert system, this knowledge is represented in the form of IF-THEN rules. The knowledge base has both
general knowledge as well as case-specific information. The inference engine applies the knowledge to the
solution of real problems. Essentially, it is an interpreter for the knowledge base. In the production system,
the inference engine performs the recognise-act control cycle.

[Figure: the user interface (which may employ question-and-answer, natural
language or menu-driven/graphics interface styles) connects the user to the
inference engine; the engine draws on the general knowledge base and
case-specific data, and a knowledge-base editor and an explanation
sub-system complete the architecture.]

Figure 14.1 Architecture of a typical expert system for a particular problem domain.

The procedures that implement the control cycle are separate from the production rules themselves. It is
important to maintain this separation of the knowledge base and inference engine for several reasons:
1. The separation makes it possible to represent knowledge in a more natural fashion. For example,
IF-THEN rules are closer to the way in which humans describe their problem-solving skills than is
lower-level computer code.
2. Because the knowledge base is separated from the lower-level control structure of the program, ES
builders can focus on capturing and organising problem-solving knowledge rather than on the details
of its computer implementation.
3. Ideally, the separation of knowledge and control allows changes to take place in one part of the knowl-
edge base without creating side effects in others.
4. The separation of the knowledge and control elements of the program allows the same control and
interface software to be used in a variety of systems. The expert system shell has all the components of
the figure above except that the knowledge base and case-specific data are empty and can be added for a
new application. The broken lines in Fig. 14.1 represent the shell modules.

14.1.2 Components of an ES
It is important to stress to students that expert systems are assistants to decision makers and not substitutes
for them. ESs do not possess human capabilities. They use a knowledge base of a particular domain and
bring that knowledge to bear on the facts of the particular situation at hand. The knowledge base of an ES
also contains heuristic knowledge, i.e., rules of thumb used by human experts who work in the domain.
The components of an ES are represented in Fig. 14.2. They include the following:
1. Knowledge base
2. Inference engine
3. User interface
[Figure: the user supplies facts through the user interface and receives
expertise in return; within the expert system, the knowledge base stores
knowledge (facts and rules), and the inference engine derives new knowledge
from existing knowledge.]

Figure 14.2 Components of an expert system.

Knowledge Base
The knowledge base contains domain-specific and high-quality knowledge. Knowledge is required to exhibit
intelligence. The success of any ES depends mainly on the collection of highly accurate and precise knowledge.
What Is Knowledge?
Data is defined as a collection of facts. Information is organised data and facts about the task domain.
Data, information, and past experience combined together are termed as knowledge.

Components of Knowledge Base

The knowledge base of an ES is a store of both factual and heuristic knowledge.
1. Factual Knowledge: It is the information widely accepted by the knowledge engineers and scholars in
the task domain.
2. Heuristic Knowledge: It is about practice, accurate judgement, one's ability of evaluation and
guessing.
Knowledge Representation
Knowledge representation is the method used to organise and formalise the knowledge in the knowledge
base. It is in the form of IF-THEN-ELSE rules.
Knowledge Acquisition
The success of any ES depends mainly on the quality, completeness and accuracy of the information stored
in the knowledge base. The knowledge base is formed by readings from various experts, scholars and also
the knowledge engineers. The knowledge engineer is a person with the qualities of empathy, quick learning
and case-analysing skills. He acquires information from the subject expert by various methods of recording,
interviewing and observing him at work, etc. He categorises and organises the information in a meaningful
way, in the form of IF-THEN-ELSE rules, to be used by the inference machine. The knowledge engineer also
monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the inference engine is essential in deducing a correct, flawless
solution. In the case of a knowledge-based ES, the inference engine acquires and manipulates the knowledge
from the knowledge base to arrive at a particular solution. In the case of a rule-based ES, it:
1. Applies rules repeatedly to the facts, which are obtained from earlier rule applications.
2. Adds new knowledge into the knowledge base if required.
3. Resolves rule conflicts when multiple rules are applicable to a particular case.

To recommend a solution, the inference engine uses the following strategies:
1. Forward chaining
2. Backward chaining
Forward Chaining
It is a strategy of an ES to answer the question, "What can happen next?" Here, the inference engine follows
the chain of conditions and derivations and finally gets the outcome. It considers all the facts and rules, and
sorts them before deriving a solution. This strategy is followed for working on a conclusion, result or effect.
For example, prediction of share market status as an effect of changes in interest rates.

[Figure: forward chaining - facts are combined through AND/OR conditions
into intermediate decisions, which combine into a final decision.]

Backward Chaining
With this strategy, an ES finds out the answer to the question, "Why did this happen?" On the basis of what
has already happened, the inference engine tries to find which conditions could have happened in the past
for this result. This strategy is followed for finding a cause or reason. For example, diagnosis of blood cancer
in humans.

[Figure: backward chaining - starting from a decision, the engine works back
through the AND/OR conditions to the facts that could have produced it.]
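
A backward chainer can be sketched in a few lines of Python: to establish a
goal, find a rule that concludes it and recursively establish that rule's
conditions, bottoming out at known facts. The medical facts and rules below
are invented for illustration:

```python
facts = {"fever", "fatigue", "abnormal blood count"}

# IF-THEN rules: (list of conditions, conclusion).
rules = [
    (["fever", "fatigue"], "infection suspected"),
    (["infection suspected", "abnormal blood count"], "further tests needed"),
]

def prove(goal):
    """True if the goal is a known fact or derivable through some rule."""
    if goal in facts:
        return True
    for conditions, conclusion in rules:
        if conclusion == goal and all(prove(c) for c in conditions):
            return True
    return False

print(prove("further tests needed"))  # True - both rules chain backwards
print(prove("blood cancer"))          # False - no rule concludes it
```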

User Interface
The user interface provides interaction between the user of the ES and the ES itself. Generally, it uses natural
language processing so as to be used by a user who is well-versed with the task domain. The user of the ES
need not necessarily be an expert in AI.
It explains how the ES has arrived at a particular recommendation. The explanation might appear in the
following forms:
1. Natural language displayed on the screen.
2. Verbal narrations in the natural language.
3. Listing of rule numbers displayed on the screen.
The user interface makes it easy to trace the credibility of the deductions.
Requirements of an Efficient ES User Interface
1. It should help users to achieve their goals in the shortest possible way.
2. It should be designed to work for the user's existing or desired work practices.
3. Its technology should be adaptable to the requirements of the user, not the other way round.
4. It should make efficient use of the user input.
Knowledge-based ESs, or simply ESs, use human knowledge to solve problems that would normally require
human intelligence. These ESs represent the expertise knowledge as data or rules within the computer. These
rules and data can be called upon when needed to solve problems. Books and manuals have a tremendous
amount of knowledge, but a human has to read and interpret the knowledge for it to be used. Conventional
computer programs execute tasks using conventional decision-making logic containing little knowledge
other than the basic algorithm for solving that specific problem and the necessary boundary conditions. This
program knowledge is often embedded as a part of the programming code, so that when the knowledge
changes, the program has to be changed and then rebuilt. Knowledge-based systems collect the small fragments
of human know-how into a knowledge base which is used to reason through a problem, using the knowledge
that is appropriate. A different problem, within the domain of the knowledge base, can be solved using the
same program without reprogramming. The ability of these systems to explain the reasoning process through
back traces and to handle levels of confidence and uncertainty provides an additional feature that conventional
programs don't handle. Most ESs are developed by specialised software tools called shells. These shells come
equipped with an inference mechanism (backward chaining, forward chaining or both), and require knowledge
to be entered according to a specified format.
Introduction to Probabilistic Reasoning, Basic Probability, Reasoning under
Uncertainty, Bayes’ Rule, Bayesian Networks
Probabilistic reasoning
◼ Probabilistic reasoning is a way of knowledge representation where
we apply the concept of probability to indicate the uncertainty in
knowledge. In probabilistic reasoning, we combine probability theory
with logic to handle the uncertainty.
◼ We use probability in probabilistic reasoning because it provides a way
to handle the uncertainty that is the result of someone's laziness and
ignorance.
◼ In the real world, there are lots of scenarios, where the certainty of
something is not confirmed, such as "It will rain today," "behavior of
someone for some situations," "A match between two teams or two
players." These are probable sentences for which we can assume that
it will happen but not sure about it, so here we use probabilistic
reasoning.
Uncertainty
● Most intelligent systems have some degree of
uncertainty associated with them.
● Uncertainty may occur in KBS because of the
problems with the data.
− Data might be missing or unavailable.
− Data might be present but unreliable or ambiguous due to
measurement errors, multiple conflicting measurements etc.
− The representation of the data may be imprecise or
inconsistent.
− Data may just be expert's best guess.
− Data may be based on defaults and the defaults may have
exceptions.
Cont…
● Given numerous sources of errors, most KBSs require
the incorporation of some form of uncertainty
management.
● For any form of uncertainty scheme, we must be
concerned with three issues.
− How to represent uncertain data?
− How to combine two or more pieces of uncertain data?
− How to draw inference using uncertain data?
● Probability is the oldest theory with strong
mathematical basis.
● Other methods for handling uncertainty are Bayesian
belief network, Certainty factor theory etc.
Probability Theory
● Probability is a way of turning opinion or expectation into
numbers.
● It is a number between 0 and 1 that reflects the likelihood of an
event.
● The chance that a particular event will occur = the
number of ways the event can occur divided by the total
number of all possible events.
Example: The probability of throwing two successive
heads with a fair coin is 0.25
− Total of four possible outcomes are :
HH, HT, TH & TT
− Since there is only one way of getting HH,
probability = ¼ = 0.25
Event
Event: Every non-empty subset A (of sample space S) is
called an event.
− the null set ∅ is an impossible event.
− S is a sure event
● P(A) is notation for the probability of an event A.
● P(∅) = 0 and P(S) = 1
● The probabilities of all events S = {A1, A2, …, An}
must sum up to certainty i.e. P(A1) + … + P(An) = 1
● Since events are sets, it is clear that all set
operations can be performed on the events.
● If A and B are events, then
− A ∪ B, A ∩ B and A' are also events.
− A − B is an event "A but not B".
− Events A and B are mutually exclusive, if A ∩ B = ∅
Axioms of Probability
● Let S be a sample space, A and B events.
− P(A) ≥ 0
− P(S) = 1
− P(A’) = 1 − P(A)
− P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
− If events A and B are mutually exclusive, then
P(A ∪ B) = P(A) + P(B)
● In general, for mutually exclusive events A1,…,An in S
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
Joint Probability
● The joint probability of the occurrence of two independent events
A and B is written as P(A and B) and is defined by
P(A and B) = P(A ∩ B) = P(A) * P(B)
Example: We toss two fair coins separately.
Let P(A) = 0.5, the probability of getting Head on the first coin
P(B) = 0.5, the probability of getting Head on the second coin
● The probability (joint probability) of getting Heads on both coins is
P(A and B) = P(A) * P(B) = 0.5 * 0.5 = 0.25
● The probability of getting Heads on one or on both of the coins, i.e.
the union of the probabilities P(A) and P(B), is expressed as
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A) * P(B)
= 0.5 + 0.5 − 0.25
= 0.75
Conditional Probability
● It relates the probability of one event to the occurrence of another,
i.e. the probability of the occurrence of an event H given that an
event E is known to have occurred.
● The probability of an event H (hypothesis), given the occurrence of an
event E (evidence), is denoted by P(H | E) and is defined as follows:

P(H | E) = (number of events favorable to H which are also favorable to E)
           / (number of events favorable to E)

         = P(H and E) / P(E)
Example
● What is the probability that a person chosen at random
is male, given that the person is 80 years old?
● The following probabilities are given
− Any person chosen at random being male is about 0.50
− The probability of a person chosen at random being
80 years old is 0.005
− The probability that a person chosen at random is both
male and 80 years old is 0.002
● The probability that an 80-year-old person chosen at
random is male is calculated as follows:
P(X is male | Age of X is 80)
= [P(X is male and the age of X is 80)] / [P(Age of X is 80)]
= 0.002 / 0.005 = 0.4
Conditional Probability with Multiple
Evidences

● If there are n evidences and one hypothesis, then
conditional probability is defined as follows:

P(H | E1 and … and En) = P(H and E1 and … and En) / P(E1 and … and En)
Bayes’ Theorem
• Bayes theorem provides a mathematical model for this
type of reasoning where prior beliefs are combined
with evidence to get estimates of uncertainty.
• This approach relies on the concept that one should
incorporate the prior probability of an event into the
interpretation of a situation.
• It relates the conditional probabilities of events.
• It allows us to express the probability P(H | E) in terms
of the probabilities of P(E | H), P(H) and P(E).
P(H|E) = [P(E|H) * P(H)] / P(E)
Proof of Bayes’ Theorem
● Bayes’ theorem is derived from conditional probability.
Proof: Using conditional probability
P(H|E) = P(H and E) / P(E)
⇒ P(H|E) * P(E) = P(H and E)        (1)
Also P(E|H) = P(E and H) / P(H)
⇒ P(E|H) * P(H) = P(E and H)        (2)

From Eqs (1) and (2), we get
P(H|E) * P(E) = P(E|H) * P(H)
Hence, we obtain
P(H|E) = [P(E|H) * P(H)] / P(E)
Extension of Bayes’ Theorem
● Consider one hypothesis H and two evidences E1 and
E2.
● The probability of H if both E1 and E2 are true is
calculated by using the following formula:

P(H|E1 and E2) = [P(E1|H) * P(E2|H) * P(H)] / P(E1 and E2)
Contd..
● Consider one hypothesis H and Multiple evidences
E1,…., En.
● The probability of H if E1,…, En are true is calculated
by using the following formula:

P(H|E1 and … and En) = [P(E1|H) * … * P(En|H) * P(H)] / P(E1 and … and En)
Example
● Find whether Bob has a cold (the hypothesis) given that
he sneezes (the evidence), i.e., calculate P(H | E).
● Suppose that we know / given the following.
P(H) = P (Bob has a cold) = 0.2
P(E | H)= P(Bob was observed sneezing
| Bob has a cold) = 0.75
P(E | ~H)= P(Bob was observed sneezing
| Bob does not have a cold) = 0.2
Now
P(H | E) = P(Bob has a cold | Bob was observed sneezing)
= [ P(E | H) * P(H) ] / P(E)
Cont…
● We can compute P(E) as follows:
P(E) = P( E and H) + P( E and ~H)
= P(E | H) * P(H) + P(E | ~H) * P(~H)
= (0.75)(0.2) + (0.2) (0.8) = 0.31
− Hence P(H | E) = [(0.75 * 0.2)] / 0.31 = 0.48387
− We can conclude that “Bob’s probability of having a cold given
that he sneezes” is about 0.5
● Further it can also determine what is his probability of
having a cold if he was not sneezing?
P(H | ~E) = [P(~E | H) * P(H)] / P(~E)
= [(1 – 0.75) * 0.2] / (1 – 0.31)
= 0.05 / 0.69 = 0.072
− Hence “Bob’s probability of having a cold if he was not
sneezing” is 0.072
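
The arithmetic of this example can be checked with a short Python sketch
(a restatement of the computation above, not part of the original notes):

```python
p_h = 0.2        # P(Bob has a cold)
p_e_h = 0.75     # P(sneezing | cold)
p_e_not_h = 0.2  # P(sneezing | no cold)

# Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|~H)P(~H)
p_e = p_e_h * p_h + p_e_not_h * (1 - p_h)        # 0.31

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
p_h_e = p_e_h * p_h / p_e                        # ~ 0.484

# With the evidence absent: P(H|~E) = P(~E|H) * P(H) / P(~E)
p_h_not_e = (1 - p_e_h) * p_h / (1 - p_e)        # ~ 0.072

print(round(p_e, 2), round(p_h_e, 3), round(p_h_not_e, 3))
```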
Advantages and Disadvantages of
Bayesian Approach
Advantages:
● They have sound theoretical foundation in probability
theory and thus are currently the most mature of all
certainty reasoning methods.
● Also they have well-defined semantics for decision
making.
Disadvantages:
● They require a significant amount of probability data
to construct a KB.
− For example, a diagnostic system having 50 detectable
conclusions (R) and 300 relevant and observable
characteristics (S) requires a minimum of 15,050 (R*S + R)
probability values assuming that all of the conclusions are
mutually exclusive.
Bayesian Belief Network
● The joint probability distribution of two variables A and B
is given in the following Table

Joint Probabilities    A       A'
B                      0.20    0.12
B'                     0.65    0.03

● A joint probability distribution for n variables requires 2^n
entries with all possible combinations.
● The time and storage requirements for such
computations become impractical as n grows.
Contd..
● Inferring with such large numbers of probabilities
does not seem to model human process of
reasoning.
● Human tends to single out few propositions which are
known to be causally linked when reasoning with
uncertain beliefs.
● This leads to the concept of forming belief network
called a Bayesian belief network.
● It is a probabilistic graphical model that encodes
probabilistic relationships among set of variables
with their probabilistic dependencies.
● This belief network is an efficient structure for storing
joint probability distribution.
Definition of BBN
● It is an acyclic (with no cycles) directed graph where
the nodes of the graph represent evidence or
hypotheses and an arc connecting two nodes represents
dependence between them.
● If there is an arc from node X to another node Y (i.e.,
X →Y), then X is called a parent of Y, and Y is a
child of X.
● The set of parent nodes of a node Xi is represented
by parent_nodes(Xi).
Joint Probability of n variables
● Joint probability for ‘n’ variables (dependent or
independent) is computed as follows.
● For the sake of simplicity we write P(X1 , … , Xn)
instead of P(X1 and … and Xn).

P(X1 , … ,Xn) = P(Xn | X1 ,…, Xn-1) * P(X1 , … , Xn-1)


Or
P(X1 , … , Xn) = P(Xn | X1 , … , Xn-1) * P(Xn-1 | X1 , … , Xn-2) * …. * P(X2 | X1) * P(X1)
Joint Probability of ‘n’ Variables using
B-Network
● In Bayesian Network, the joint probability distribution
can be written as the product of the local distributions
of each node and its parents such as:

P(X1, … , Xn) = ∏_{i=1..n} P(Xi | parent_nodes(Xi))

● This expression is a reduction of the joint probability
formula of ‘n’ variables, as some of the terms
corresponding to independent variables will not be
required.
Cont…
● If node Xi has no parents, its probability distribution
is said to be unconditional and it is written as P(Xi)
instead of P(Xi | parent_nodes(Xi)).
● Nodes having parents are called conditional.
● If the value of a node is observed, then the node is
said to be an evidence node.
● Nodes with no children are termed hypothesis nodes
and nodes with no parents are called independent
nodes.
Example

● The following graph is a Bayesian belief network.


− Here there are four nodes with {A, B} representing
evidences and {C, D} representing hypotheses.
− A and B are unconditional nodes and C and D are
conditional nodes.

[Figure: Bayesian belief network with arcs A → C, A → D and B → D.]


Cont...
● To describe the above Bayesian network, we should
specify the following probabilities.

P(A) = 0.3
P(B) = 0.6
P(C|A) = 0.4
P(C|~A) = 0.2
P(D|A, B) = 0.7
P(D|A, ~B) = 0.4
P(D|~A, B) = 0.2
P(D|~A, ~B) = 0.01
Cont…
● They can also be expressed as conditional probability
tables as follows:

Conditional Probability Tables

P(A) = 0.3      P(B) = 0.6

A    P(C)       A    B    P(D)
T    0.4        T    T    0.7
F    0.2        T    F    0.4
                F    T    0.2
                F    F    0.01
Cont…
● Using the Bayesian belief network on the previous slide,
only 8 probability values are required, in contrast to the
16 values needed in general for the 4 variables
{A, B, C, D} in a joint probability distribution.
● Joint probability using Bayesian Belief Network is
computed as follows:

P(A, B, C, D) = P(D|A, B) * P(C|A) * P(B) * P(A)


= 0.7 * 0.4 * 0.6 * 0.3 = 0.0504
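
The factored computation can be written directly from the conditional
probability tables. A minimal Python sketch using the CPT values above:

```python
p_a, p_b = 0.3, 0.6
p_c_given_a = {True: 0.4, False: 0.2}
p_d_given_ab = {(True, True): 0.7, (True, False): 0.4,
                (False, True): 0.2, (False, False): 0.01}

def joint(a, b, c, d):
    """P(A,B,C,D) = P(A) * P(B) * P(C|A) * P(D|A,B) for boolean values."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c_given_a[a] if c else 1 - p_c_given_a[a]
    pd = p_d_given_ab[(a, b)] if d else 1 - p_d_given_ab[(a, b)]
    return pa * pb * pc * pd

print(joint(True, True, True, True))   # 0.3 * 0.6 * 0.4 * 0.7 = 0.0504
```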
Example of Simple B-Network
● Suppose that there are three events, namely earthquake,
burglary and tornado, which could cause the ringing of
an alarm in a house.
● This situation can be modeled with Bayesian network
as follows.
● All four variables have two possible values T (for true)
and F (for false).
− Here the names of the variables have been abbreviated
to A = Alarm, E = Earthquake, and B = Burglary and T =
Tornado.
Cont…

[Figure: network with arcs Earthquake → Tornado, and Earthquake,
Tornado and Burglary all feeding into Alarm ringing.]

Earthquake and Burglary are independent variables;
Alarm ringing and Tornado are dependent variables.
Cont…
● The tables below contain the probability values
representing the complete Bayesian belief network.
● The prior probability of ‘earthquake’ is 0.4; if there is an
earthquake then the probability of ‘tornado’ is 0.8, and
if not then the probability of ‘tornado’ is 0.5.

Conditional Probability Tables

P(E) = 0.4      P(B) = 0.7

E    P(Tor)     E    B    Tor    P(A)
T    0.8        T    T    T      1.0
F    0.5        T    T    F      0.9
                T    F    T      0.95
                T    F    F      0.85
                F    T    T      0.89
                F    T    F      0.7
                F    F    T      0.87
                F    F    F      0.3
Cont…
● The joint probability is computed as follows:
P(E, B, T, A) = P(A|E, B, T) * P(T|E) * P(E) * P(B)
              = 1.0 * 0.8 * 0.4 * 0.7 = 0.224
● Using this model one can answer questions using the
conditional probability formula as follows:
− "What is the probability that it is earthquake, given the alarm
is ringing?" P(E|A)
− "What is the probability of burglary, given the alarm is
ringing?" P(B|A)
− "What is the probability of ringing alarm if both earthquake
and burglary happens?" P(A|E, B)
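
Such conditional queries can be answered by summing the joint distribution
over the hidden variables and normalizing. A Python sketch for P(E | A)
using the CPTs above:

```python
from itertools import product

p_e, p_b = 0.4, 0.7
p_tor = {True: 0.8, False: 0.5}   # P(Tor | E)
p_a = {(True, True, True): 1.0,   (True, True, False): 0.9,
       (True, False, True): 0.95, (True, False, False): 0.85,
       (False, True, True): 0.89, (False, True, False): 0.7,
       (False, False, True): 0.87, (False, False, False): 0.3}  # P(A | E,B,Tor)

def joint(e, b, tor, a):
    pe = p_e if e else 1 - p_e
    pb = p_b if b else 1 - p_b
    pt = p_tor[e] if tor else 1 - p_tor[e]
    pa = p_a[(e, b, tor)] if a else 1 - p_a[(e, b, tor)]
    return pe * pb * pt * pa

# P(E=T | A=T): sum out B and Tor in the numerator, everything in the denominator.
num = sum(joint(True, b, t, True) for b, t in product([True, False], repeat=2))
den = sum(joint(e, b, t, True) for e, b, t in product([True, False], repeat=3))
print(num / den)   # posterior probability of earthquake given a ringing alarm
```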
Advantages of Bayesian B Network

● It can easily handle situations where some data


entries are missing as this model encodes
dependencies among all variables.
● It is intuitively easier for a human to understand direct
dependencies than complete joint distribution.
● It can be used to learn causal relationships.
● It is an ideal representation for combining prior
knowledge (which often comes in causal form) and
data because the model has both causal and
probabilistic semantics.
Disadvantages of BB Network

● The probabilities are described as a single numeric


point value. This can be a distortion of the precision that
is actually available for supporting evidence.
● There is no way to differentiate between ignorance and
uncertainty. These are two distinct concepts and should
be treated as such.
● The quality and extent of the prior beliefs used in
Bayesian inference processing are major shortcomings.
● Reliability of Bayesian network depends on the
reliability of prior knowledge.
● Selecting the proper distribution model to describe the
data has a notable effect on the quality of the resulting
network. Therefore, selection of the statistical
distribution for modeling the data is very important.
Markov Decision Process (MDP)
• A Markov Decision Process (MDP) is a mathematical framework used
to model decision-making situations where outcomes are partly
under the control of a decision-maker (agent) and partly random. It
provides a formalism for sequential decision-making problems,
especially in areas like reinforcement learning, operations research,
robotics, economics, and artificial intelligence.
What does this mean, exactly?
• Imagine you’re playing a video game. At each moment (called a
state), you can choose an action (like jump, shoot, or move). Based on
your action:
• You move to a new state in the game.
• You get a reward (like points or coins).
• But you don’t always land in the same spot — sometimes there’s a
chance factor (like wind, or an enemy randomly appearing).
• That’s the core of an MDP:
• An MDP models an environment in which an agent interacts through
a series of states, actions, and rewards. The goal of the agent is to
maximize the cumulative reward over time.
• MDPs assume the Markov property, meaning the future state
depends only on the current state and action, not on the sequence of
events that preceded it
Scenario: A Robot Navigating a Room
• Imagine a robot vacuum cleaner in a small 3x3 room grid. It wants to
reach a charging station while avoiding obstacles and minimizing
battery usage.
What is Utility Theory?
• Utility Theory is a way to understand how people (or agents) make
choices when they have different options in front of them.
• Think of utility as a measure of happiness, satisfaction, or value you
get from something.
• People usually choose the option that gives them the highest utility
(most satisfaction).
• So, Utility Theory helps in decision making, especially when there is
uncertainty or trade-offs involved.
• Utility theory is a framework used to model and analyze how
individuals or agents make decisions under uncertainty and
preferences.
• It assumes that people make rational choices based on their
preferences, and that those preferences can be quantified using a
utility function.
In simple words:
• Utility theory helps us assign numbers to preferences, so we can
compare choices and select the best one.
Scenario: You Have Two Job Offers
🔹 Job A
• Salary: ₹1,00,000/month
• Travel Time: 60 minutes (long commute)
• Work-Life Balance: Moderate
🔹 Job B
• Salary: ₹80,000/month
• Travel Time: 15 minutes (short commute)
• Work-Life Balance: Excellent

Now, you have to decide which job is better overall — not just based on money, but
all factors combined.
Step 1: Decide What Matters to You
(Weights)
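
The weights-and-scores table from the original slide is not reproduced in
this text. The Python sketch below uses assumed weights and 0-100 scores,
chosen only so that the weighted sums match the utilities of 83 and 88
quoted in the conclusion that follows:

```python
# Hypothetical importance weights (must sum to 1) and 0-100 scores.
weights = {"salary": 0.5, "commute": 0.2, "work_life": 0.3}

scores = {
    "Job A": {"salary": 100, "commute": 60, "work_life": 70},
    "Job B": {"salary": 80,  "commute": 90, "work_life": 100},
}

for job, s in scores.items():
    utility = sum(weights[k] * s[k] for k in weights)   # U = sum of w_i * score_i
    print(job, utility)   # Job A 83.0, Job B 88.0
```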
Conclusion: Which Job is Better?

• Job A utility = 83
• Job B utility = 88
Job B is the better choice, even though it pays less, because:
• You save commute time (energy, stress)
• You enjoy a better work-life balance
• The overall happiness (utility) is higher.
Solving Markov Decision Processes (MDPs)
• Markov Decision Processes (MDPs) are mathematical models used for
decision-making in environments where outcomes are partly
random and partly under the control of an agent.
• To solve an MDP, the goal is to find an optimal policy — a mapping
from states to actions that maximizes the expected total reward over
time.
There are two classical and widely used dynamic programming
techniques to solve MDPs:
• Value Iteration
• Policy Iteration
1. Value Iteration
2. Policy Iteration

[The step-by-step details of both algorithms appear as figures in the
original slides. In outline: value iteration repeatedly applies the Bellman
update to the state values until they converge; policy iteration alternates
policy evaluation (Step 2) and policy improvement (Step 3), repeating Steps
2 and 3 until the policy is stable (no more changes).]
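
As an illustration of value iteration, here is a minimal Python sketch on a
toy two-state MDP; every number (states, transitions, rewards, discount) is
an invented assumption, not from the slides:

```python
states = ["s0", "s1"]
actions = ["stay", "go"]
gamma = 0.9   # discount factor

# T[(s, a)] = list of (next_state, probability); R[(s, a)] = immediate reward.
T = {("s0", "stay"): [("s0", 1.0)], ("s0", "go"): [("s1", 0.8), ("s0", 0.2)],
     ("s1", "stay"): [("s1", 1.0)], ("s1", "go"): [("s0", 1.0)]}
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}

V = {s: 0.0 for s in states}
for _ in range(100):
    # Bellman optimality update: V(s) = max_a [ R(s,a) + gamma * sum T(s'|s,a) V(s') ]
    V = {s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)])
                for a in actions)
         for s in states}

# Extract the greedy (optimal) policy from the converged values.
policy = {s: max(actions,
                 key=lambda a: R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)]))
          for s in states}
print(V, policy)
```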
Partially Observable Markov Decision Processes (POMDPs)
A Partially Observable Markov Decision Process (POMDP) is an extension of a Markov Decision Process (MDP) where
the agent does not have full visibility of the current state of the environment. In simpler terms, the agent can’t see
everything it needs to know about the situation and has to make decisions based on incomplete information.

MDP vs. POMDP


MDP: In a traditional MDP, the agent has complete visibility of the state, meaning it knows exactly where it is and all
the conditions affecting the environment.
POMDP: In a POMDP, the agent only has access to partial observations of the environment. It might know
something about the state, but not the entire state, making the decision-making process more challenging.

Why is it "Partially Observable"?


In real-world scenarios, agents often operate in situations where they cannot fully perceive or observe the environment.
For example:
A robot vacuum might not know the entire layout of a room and relies on sensors to make decisions about where to
clean.
In a self-driving car, sensors may provide partial information about the car's surroundings (such as other vehicles or
pedestrians) but might miss certain details (like an obstacle hiding behind a corner).

Thus, the agent must rely on the partial information it receives to estimate the full state and make decisions accordingly.

Key Elements of a POMDP


Just like an MDP, a POMDP has several components, but with some modifications to handle partial observability.
1. States (S):
These represent all possible configurations or situations the system can be in, just like in an MDP.
2. Actions (A):
These are the decisions or actions the agent can choose from at each state.
3. Transition Probabilities (T):
These represent the probability of transitioning from one state to another after taking a specific action. In other
words, how likely is it that taking a certain action will lead to a certain next state?
4. Observations (Z):
Instead of knowing the exact state, the agent can only observe some signal or measurement about the state. These
observations don’t tell the agent the full story; they only provide partial information.
5. Observation Function (O):
This defines the probability of receiving a particular observation given the current state and the action taken. In
simpler terms, it's a way to model how likely the agent is to observe certain things based on what it does.
6. Rewards (R):
Just like in an MDP, the agent receives feedback (reward or punishment) after taking an action, but this feedback is
also influenced by the partial state it perceives.
7. Policy (π):
This is a strategy that tells the agent what action to take, not based on the current state but on its belief about the
state. A belief is a probability distribution over all possible states that reflects what the agent thinks is the current
state based on the observations it has received.

Challenges of a POMDP
The main challenge in a POMDP is that the agent doesn't have direct access to the state of the system, so it
must infer the state from the observations it receives. This introduces complexity in decision-making, and the agent has
to rely on its belief state (an internal representation of what it believes the actual state to be).
This process involves:
Belief Update: When the agent receives a new observation, it updates its belief about the state. The agent's belief
evolves as new observations come in and actions are taken.
Partially Observable Decision Making: The agent must optimize its decisions based not on the state itself, but on the
belief about the state.

Formal Definition of a POMDP


In mathematical terms, a POMDP is defined by the following components:
1. S (States): A finite set of states.
2. A (Actions): A finite set of actions available to the agent.
3. T (Transition Model): The transition model T(s′ | s, a) describes the probability of transitioning from state s to state s′
after taking action a.
4. Z (Observations): A finite set of possible observations that the agent can perceive.
5. O (Observation Model): The observation model O(z | s′, a) defines the probability of receiving observation z when
the agent takes action a and ends up in state s′.
6. R (Rewards): A reward function R(s, a) that defines the immediate reward for taking action a in state s.
7. B (Belief State): The agent keeps track of a belief state, b(s), which is a probability distribution over the states,
reflecting what the agent believes the true state to be, given past actions and observations.
8. Policy (π): A policy π(b) that maps belief states to actions. Since the agent doesn’t know the true state, it must
choose the best action based on its belief.

How Does the Agent Make Decisions in a POMDP?


Because the agent cannot directly observe the true state, it works with a belief state that provides a probability
distribution over all possible states, given its observations and past actions. This belief state is updated over time based
on the observations received after each action.
To make optimal decisions, the agent uses a policy that specifies the best action to take based on the current belief state.
In some sense, POMDPs are much more complex than MDPs because:
The agent doesn’t know the true state, making planning and decision-making more challenging.
It must estimate or infer the state based on probabilistic reasoning.
Real-World Examples of POMDPs
1. Robots in Unknown Environments: A robot exploring a new environment might have incomplete sensor data,
leading to partial information about its surroundings. It has to decide where to go next, based on its current belief
about the layout of the environment.
2. Medical Diagnosis: In healthcare, a doctor might not be able to observe the full set of symptoms or test results for a
patient. The doctor must make decisions about treatment based on partial observations and their belief about the
patient's condition.
3. Autonomous Vehicles: Self-driving cars must deal with partial information, such as incomplete sensor readings,
occlusions, or uncertainty about the behavior of other drivers, while still making safe driving decisions.

Solving POMDPs
Solving POMDPs is much harder than solving MDPs, due to the complexity of dealing with belief states. The typical
solution involves:
1. Belief Update:
Updating the belief state after each action and observation.
2. Value Iteration or Approximation:
Iterating over possible actions, observations, and beliefs to find the optimal policy, though solving POMDPs exactly is
often computationally intractable, so approximation methods are commonly used.
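
The belief update itself is a Bayes-filter step: b′(s′) ∝ O(z | s′, a) * Σ_s T(s′ | s, a) * b(s).
A minimal Python sketch with invented numbers (a toy two-state machine
observed through a noisy sensor):

```python
states = ["good", "bad"]
T = {("good", "fix"): {"good": 0.9, "bad": 0.1},   # T(s' | s, a)
     ("bad",  "fix"): {"good": 0.6, "bad": 0.4}}
O = {"good": {"ok": 0.8, "alarm": 0.2},            # O(z | s', a="fix")
     "bad":  {"ok": 0.3, "alarm": 0.7}}

def update_belief(belief, action, observation):
    """One Bayes-filter step: predict through T, weight by O, normalize."""
    new_belief = {}
    for s2 in states:
        predicted = sum(T[(s, action)][s2] * belief[s] for s in states)
        new_belief[s2] = O[s2][observation] * predicted
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}

belief = {"good": 0.5, "bad": 0.5}
print(update_belief(belief, "fix", "alarm"))   # belief shifts toward "bad"
```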
Supervised vs Unsupervised Learning:
Supervised Machine Learning
• In Supervised learning, you train the machine using data which is
well "labeled." It means some data is already tagged with the correct
answer.
• It can be compared to learning which takes place in the presence of a
supervisor or a teacher.
• A supervised learning algorithm learns from labeled training data, helps
you to predict outcomes for unforeseen data.
• Successfully building, scaling, and deploying accurate supervised machine
learning Data science model takes time and technical expertise from a
team of highly skilled data scientists.

Unsupervised Learning
• Unsupervised learning is a machine learning technique, where you do not
need to supervise the model.
• Instead, you need to allow the model to work on its own to discover
information. It mainly deals with the unlabelled data.
• Unsupervised learning algorithms allow you to perform more complex
processing tasks compared to supervised learning, although unsupervised
learning can be more unpredictable than other methods such as deep
learning and reinforcement learning.
How Unsupervised Learning works?
Let's, take the case of a baby and her family dog.

She knows and identifies this dog. A few weeks later a family friend brings along
a dog and tries to play with the baby.

Baby has not seen this dog earlier. But it recognizes many features (2 ears, eyes,
walking on 4 legs) are like her pet dog. She identifies a new animal like a dog. This
is unsupervised learning, where you are not taught but you learn from the data
(in this case data about a dog.) Had this been supervised learning, the family
friend would have told the baby that it's a dog.
Types of Supervised Machine Learning Techniques

Regression:
Regression technique predicts a single output value using training data.

Example: You can use regression to predict the house price from training data.
The input variables will be locality, size of a house, etc.

Classification:
Classification means to group the output inside a class. If the algorithm tries to
label input into two distinct classes, it is called binary classification. Selecting
between more than two classes is referred to as multiclass classification.

Example: Determining whether or not someone will be a defaulter of the loan.


Types of Unsupervised Machine Learning
Techniques
Unsupervised learning problems further grouped into clustering and association
problems.

Clustering

Clustering is an important concept when it comes to unsupervised learning. It


mainly deals with finding a structure or pattern in a collection of uncategorized
data. Clustering algorithms will process your data and find natural
clusters(groups) if they exist in the data. You can also modify how many
clusters your algorithms should identify. It allows you to adjust the granularity of
these groups.

Association
Association rules allow you to establish associations amongst data objects inside
large databases. This unsupervised technique is about discovering exciting
relationships between variables in large databases. For example, people that buy
a new home most likely to buy new furniture.

Other Examples:

• A subgroup of cancer patients grouped by their gene expression


measurements
• Groups of shoppers based on their browsing and purchasing histories
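
A matching unsupervised sketch (again assuming scikit-learn; the points are
invented): no labels are supplied, and the algorithm discovers the groups
on its own.

```python
from sklearn.cluster import KMeans

# Two obvious blobs of 2-D points, but note: no labels are given.
X = [[1, 2], [1, 1], [2, 2], [9, 9], [9, 8], [8, 9]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # e.g. [0 0 0 1 1 1] - the discovered clusters
```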
Parameter: Process
− Supervised: In a supervised learning model, input and output variables will be given.
− Unsupervised: In an unsupervised learning model, only input data will be given.

Parameter: Input Data
− Supervised: Algorithms are trained using labeled data.
− Unsupervised: Algorithms are used against data which is not labeled.

Parameter: Algorithms Used
− Supervised: Support vector machine, neural network, linear and logistic regression, random forest, and classification trees.
− Unsupervised: Unsupervised algorithms can be divided into different categories: cluster algorithms, K-means, hierarchical clustering, etc.

Parameter: Computational Complexity
− Supervised: Supervised learning is a simpler method.
− Unsupervised: Unsupervised learning is computationally complex.

Parameter: Use of Data
− Supervised: A supervised learning model uses training data to learn a link between the input and the outputs.
− Unsupervised: Unsupervised learning does not use output data.

Parameter: Accuracy of Results
− Supervised: Highly accurate and trustworthy method.
− Unsupervised: Less accurate and trustworthy method.

Parameter: Real-Time Learning
− Supervised: Learning method takes place offline.
− Unsupervised: Learning method takes place in real time.

Parameter: Number of Classes
− Supervised: Number of classes is known.
− Unsupervised: Number of classes is not known.

Parameter: Main Drawback
− Supervised: Classifying big data can be a real challenge in supervised learning.
− Unsupervised: You cannot get precise information regarding data sorting and the output, as the data used in unsupervised learning is not labeled and not known.
🔍 What is Direct Utility Estimation?
Direct Utility Estimation is a simple approach used to learn the utility (value) of each state in a Markov Decision Process
(MDP), by observing rewards while following a fixed policy.
In this method:
We don’t build a model of the environment (like state transitions or reward functions).
We don’t change the policy.
We just observe episodes, note what rewards we get, and average those rewards for each state.

🧠 What is Utility?
In RL, the utility of a state means:

"How good is it to be in this state if I follow the current policy?"

Utility is also called:


Value function, written as U (s)
It tells us the expected total reward from state s till the end.

📘 Steps in Direct Utility Estimation:


Let’s assume:
You have a fixed policy π .
You run multiple episodes (simulations from start to finish).
Each episode gives you a sequence of states and the total reward collected.

👣 Step-by-step Process:
1. Initialize:
Create a table to store total rewards observed from each state.
Keep count of how many times each state is visited.
2. Run multiple episodes:
Each episode is a sequence of states and rewards (e.g., s1, s2, ..., sn).
For every state s in that sequence, record:
The total reward collected from that state till the end of the episode.
3. Update the utility estimate:
For each state, keep a running average of the total rewards collected.
Formula:

U(s) = (1 / N(s)) * Σ_{i=1}^{N(s)} G_i

where:
N(s) = number of times state s was visited
G_i = total reward observed from state s in the i-th visit

4. Repeat until utilities stabilize.

🔢 Example in Table Form


Let’s say your agent always follows a path like this:


Start → S1 → S2 → Goal

In 3 episodes, suppose the following rewards were observed from S1 onward:

Episode Path Total reward from S1

1 S0 → S1 → S2 → Goal 10

2 S0 → S1 → S2 → Goal 8

3 S0 → S1 → S2 → Goal 9

Then:

U(S1) = (10 + 8 + 9) / 3 = 9

So the utility of state S1 is estimated to be 9.
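
The whole averaging procedure fits in a few lines of Python. A sketch (the
per-step rewards below are invented so that the returns from S1 come out as
10, 8 and 9, matching the table):

```python
from collections import defaultdict

# Each episode: list of (state, reward collected on leaving that state).
episodes = [
    [("S0", 0), ("S1", 2), ("S2", 8)],   # return from S1 = 2 + 8 = 10
    [("S0", 0), ("S1", 1), ("S2", 7)],   # return from S1 = 8
    [("S0", 0), ("S1", 3), ("S2", 6)],   # return from S1 = 9
]

totals = defaultdict(float)  # sum of returns observed from each state
visits = defaultdict(int)    # N(s): number of times each state was visited

for episode in episodes:
    rewards = [r for _, r in episode]
    for i, (state, _) in enumerate(episode):
        G = sum(rewards[i:])         # total reward from this state to the end
        totals[state] += G
        visits[state] += 1

U = {s: totals[s] / visits[s] for s in totals}
print(U["S1"])   # (10 + 8 + 9) / 3 = 9.0
```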

✅ Key Characteristics:

Feature                     Description
Type                        Passive Reinforcement Learning
Policy                      Fixed (given beforehand)
Environment model needed?   ❌ No – only experience is used
Learning                    Utility (value) of states
Exploration                 ❌ No – follows the same policy
Accuracy                    Improves with more episodes
Speed                       Slower if state is rarely visited


📦 Advantages:
Very simple to implement.
Doesn’t require transition probabilities or rewards model.
Good for environments where policies are fixed or pre-determined.

⚠️ Disadvantages:
Inefficient for large or complex environments.
Takes a long time to converge if some states are visited rarely.
Only works with fixed policy – cannot improve policy.
