MODULE 1-2
MACHINE LEARNING CONCEPTS

References
Text Books:
1. Tom M. Mitchell, Machine Learning, India Edition 2013, McGraw Hill Education.
Reference Books:
1. Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning, 2nd edition, Springer Series in Statistics.
2. Ethem Alpaydin, Introduction to Machine Learning, second edition, MIT Press.

SYLLABUS
Introduction: Well-Posed Learning Problems, Designing a Learning System, Perspectives and Issues in Machine Learning.
Concept Learning and the General-to-Specific Ordering: Introduction, A Concept Learning Task, Concept Learning as Search, FIND-S: Finding a Maximally Specific Hypothesis, Version Spaces and the CANDIDATE-ELIMINATION Algorithm.
Decision Tree Learning: Introduction, Decision Tree Representation, Appropriate Problems for Decision Tree Learning, The Basic Decision Tree Learning Algorithm, Hypothesis Space Search in Decision Tree Learning, Inductive Bias in Decision Tree Learning, Issues in Decision Tree Learning.
Artificial Neural Networks: Introduction, Neural Network Representations, Appropriate Problems for Neural Network Learning, Perceptrons, Multilayer Networks and the BACKPROPAGATION Algorithm.
Bayesian Learning: Introduction, Bayes Theorem, Bayes Theorem and Concept Learning, Bayes Optimal Classifier, Naive Bayes Classifier, An Example: Learning to Classify Text.
Instance-Based Learning: Introduction, K-NEAREST NEIGHBOUR Learning, Distance-Weighted NEAREST NEIGHBOUR Algorithm.
Genetic Algorithms: Motivation, Genetic Algorithms, Hypothesis Space Search, Genetic Programming, Parallelizing Genetic Algorithms.
Learning Sets of Rules: Introduction, Sequential Covering Algorithm, Learning Rule Sets: Summary, Learning First-Order Rules, Learning Sets of First-Order Rules: FOIL, Induction as Inverted Deduction, Inverted Resolution.
Support Vector Machines: Maximum margin linear separator, Quadratic Programming solution to finding maximum margin separators, Kernels for learning non-linear functions.

MODULE 1

Introduction
Ever since computers were invented, we have wondered whether they might be
made to learn. If we could understand how to program them to learn-to improve
automatically with experience-the impact would be dramatic.
+ Imagine computers learning from medical records — which treatments
are most effective for new diseases
* Houses learning from experience to optimize energy costs based on the particular
usage patterns of their occupants.
* Personal software assistants learning the evolving interests of their users in order
to highlight especially relevant stories from the online morning newspaper.

Examples of Successful Applications of
Machine Learning
+ Learning to recognize spoken words
+ Learning to drive an autonomous vehicle
+ Learning to classify new astronomical structures
+ Learning to play world-class backgammon

Why is Machine Learning Important?
+ Some tasks cannot be defined well, except by examples (e.g., recognizing
people).
* Relationships and correlations can be hidden within large amounts of
data. Machine Learning/Data Mining may be able to find these
relationships.
* Human designers often produce machines that do not work as well as desired
in the environments in which they are used.
+ The amount of knowledge available about certain tasks might be too large
for explicit encoding by humans (e.g., medical diagnosis).
+ Environments change over time.
+ New knowledge about tasks is constantly being discovered by humans. It
may be difficult to continuously re-design systems "by hand".

Machine Learning: A Definition
+ Machine learning is an application of artificial intelligence that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed.
+ Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
+ A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.

Why "Learn"?
Learning is used when:
+ Human expertise does not exist (navigating on Mars)
+ Humans are unable to explain their expertise (speech recognition)
* Solution changes in time (routing on a computer network)
* Solution needs to be adapted to particular cases (user biometrics)

Types of Learning
Supervised Learning
Regression
Classification
Unsupervised Learning
Clustering
Association
Reinforcement learning

Types of Learning
Supervised Learning: Training these algorithms is like having a supervisor oversee the whole process. When training a supervised learning algorithm, the training data consists of inputs paired with the correct outputs. The objective of supervised learning is to predict the correct labels for newly presented input data. Regression and classification are the two types of supervised learning techniques.
In the case of regression, the output is a continuous variable. Example: a salary hike is decided based on employee performance. Here the salary hike is the output variable, which takes a continuous value. A short code sketch follows the list below.

Types of Learning
Some common regression algorithms include:
Linear Regression
Polynomial Regression
Support Vector Machine Regression
Decision Tree Regression
Random Forest Regression
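To make this concrete, here is a minimal regression sketch (an illustration, not part of the original slides); the performance scores and hike percentages are invented:

```python
# A minimal regression sketch: predicting a salary hike (a continuous
# output) from an employee performance score, on invented data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[2.0], [3.0], [4.0], [5.0]])  # hypothetical performance scores
y = np.array([4.0, 6.0, 8.0, 10.0])         # hypothetical hike percentages

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[4.5]])))     # -> about 9.0 (percent hike)
```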
Classification: Depending upon employee performance, a promotion will be provided.
The output is a binary variable: either the promotion is given to the employee or it is not.
Some common classification algorithms include:
Support Vector Machines
Decision Trees
Random Forests
Naive Bayes

Advantages of Supervised learning
Supervised learning allows collecting data and produces data
output from previous experiences.
Helps to optimize performance criteria with the help of experience.
Supervised machine learning helps to solve various types of
real-world computation problems.
It performs classification and regression tasks.
It allows estimating or mapping the result to a new sample.
We have complete control over choosing the number of classes we want in the training data.

Disadvantages of Supervised learning
Classifying big data can be challenging.
Training for supervised learning needs a lot of computation time, so it requires a lot of time.
Supervised learning cannot handle all complex tasks in Machine Learning.
Computation time is vast for supervised learning.
It requires a labelled data set.
It requires a training process.

Types of Learning
Supervised learning is when we teach or train the machine using data that is well-labelled, which means some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from labeled data.
• Supervised learning involves training a machine from labeled data.
• Labeled data consists of examples with the correct answer or classification.
• The machine learns the relationship between inputs (images) and outputs (labels).
• The trained machine can then make predictions on new, unlabeled data.
• Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".
• Classification: A classification problem is when the output variable is a category, such as "Red" or "Blue", "disease" or "no disease".

Unsupervised Learning
Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the data does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns and relationships in the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no training will be given to the machine. Therefore the machine is restricted to finding the hidden structure in unlabeled data by itself.

Unsupervised Learning
Key Points:
© Unsupervised learning allows the model to discover patterns and relationships in unlabeled
data
© Clustering algorithms group similar data points together based on their inherent
characteristics.
• Feature extraction captures essential information from the data, enabling the model to make meaningful distinctions.
• Label association assigns labels to the discovered groups based on the extracted patterns and characteristics.

Unsupervised Learning
Imagine you have a machine learning model trained on a large dataset of unlabeled images containing both dogs and cats. The model has never seen an image of a dog or cat before, and it has no pre-existing labels or categories for these animals. Your task is to use unsupervised learning to identify the dogs and cats in a new, unseen image.
For instance, suppose it is given an image having both dogs and cats which it has never seen.
Thus the machine has no idea about the features of dogs and cats, so we can't categorize it as "dogs and cats". But it can categorize them according to their similarities, patterns, and differences, i.e., we can easily categorize the picture into two parts. The first part may contain all pics having dogs in them, and the second part may contain all pics having cats in them. Here you didn't learn anything beforehand, which means there is no training data or examples.
It allows the model to work on its own to discover patterns and information that was previously undetected. It mainly deals with unlabelled data.

Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms
© Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as
grouping customers by purchasing behavior. Clustering is a type of unsupervised learning that is used to group
similar data points together. Clustering algorithms work by iteratively moving data points closer to their cluster
centers and further away from data points in other clusters.
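As a rough illustration of the "move points toward cluster centers" idea described above, a minimal pure-NumPy k-means (Lloyd's algorithm) sketch on invented 2-D data:

```python
# A minimal k-means (Lloyd's algorithm) sketch on invented 2-D data:
# points are repeatedly assigned to the nearest center, and each center
# is moved to the mean of the points assigned to it.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)),      # cluster around (0, 0)
               rng.normal(3, 0.5, (20, 2))])     # cluster around (3, 3)

k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers
for _ in range(10):
    # assign each point to its nearest center
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # move each center to the mean of the points assigned to it
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)  # approximately the two cluster means
```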
Clustering Types:
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition

Types of Unsupervised Learning
Association: An association rule learning problem is where you want to discover rules that describe
large portions of your data, such as people that buy X also tend to buy Y.
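A small sketch (not from the slides) of the support and confidence measures that association rule learners are built on; the transactions are invented:

```python
# A minimal sketch of support and confidence for a rule "X -> Y",
# computed over a small invented set of market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction also containing rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# "People that buy bread also tend to buy milk"
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # ~0.67
```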
Association rule learning is a type of unsupervised learning that is used to identify patterns in data.
Association rule learning algorithms work by finding relationships between different items in a dataset.
Some common association rule learning algorithms include:
• Apriori Algorithm

Advantages of Unsupervised learning
It does not require training data to be labeled.
Dimensionality reduction can be easily accomplished using unsupervised learning.
Capable of finding previously unknown patterns in data.
Unsupervised learning can help you gain insights from unlabeled data that you might not have been able to get otherwise.
Unsupervised learning is good at finding patterns and relationships in data without being told what to look for. This can help you learn new things about your data.

Application of Unsupervised learning
Unsupervised learning can be used to solve a wide variety of problems, including:
Anomaly detection: Unsupervised learning can identify unusual patterns or deviations from normal behavior in data, enabling the detection of fraud, intrusion, or system failures.
Scientific discovery: Unsupervised learning can uncover hidden relationships and patterns in scientific data, leading to new hypotheses and insights in various scientific fields.
Recommendation systems: Unsupervised learning can identify patterns and similarities in user behavior and preferences to recommend products, movies, or music that align with their interests.
Customer segmentation: Unsupervised learning can identify groups of customers with similar characteristics, allowing businesses to target marketing campaigns and improve customer service more effectively.
Image analysis: Unsupervised learning can group images based on their content, facilitating tasks such as image classification, object detection, and image retrieval.

Reinforcement Learning
Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an environment to obtain maximum reward. In RL, the data is accumulated from machine learning systems that use a trial-and-error method. Data is not part of the input that we would find in supervised or unsupervised machine learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next. After each action, the algorithm receives feedback that helps it determine whether the choice it made was correct, neutral or incorrect. It is a good technique to use for automated systems that have to make a lot of small decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error. It performs actions with the aim of maximizing rewards; in other words, it is learning by doing in order to achieve the best outcomes.

Main points in Reinforcement learning
• Input: The input should be an initial state from which the model will start.
• Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
• Training: The training is based upon the input. The model will return a state, and the user will decide to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward (a minimal code sketch of this loop follows below).
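As an illustration of this reward-driven loop, here is a minimal tabular Q-learning sketch; Q-learning is one standard RL algorithm (not named in the slides), and the five-state corridor environment is invented:

```python
# A minimal tabular Q-learning sketch on an invented 5-state corridor.
# The agent starts at state 0 and receives reward +1 only on reaching
# state 4; it learns by trial and error to prefer moving right.
import numpy as np

n_states, n_actions = 5, 2               # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != 4:                        # an episode ends at the goal state
        # epsilon-greedy choice: mostly exploit, occasionally explore
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0  # feedback received after each action
        # update: move Q(s, a) toward r + gamma * max_a' Q(s_next, a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # states 0-3 learn action 1 (move right)
```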
Types of Reinforcement

There are two types of Reinforcement:
1. Positive: Positive Reinforcement occurs when an event, occurring due to a particular behavior, increases the strength and the frequency of the behavior. In other words, it has a positive effect on behavior.
Advantages of positive reinforcement learning:
• Maximizes performance
• Sustains change for a long period of time
• Too much reinforcement can lead to an overload of states, which can diminish the results
2. Negative: Negative Reinforcement is defined as the strengthening of behavior because a negative condition is stopped or avoided.
Advantages of negative reinforcement learning:
• Increases behavior
• Provides defiance to a minimum standard of performance
• It only provides enough to meet the minimum behavior

Advantages of Reinforcement learning
1. Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional techniques.
2. The model can correct the errors that occurred during the training process.
3. In RL, training data is obtained via the direct interaction of the agent with the environment.
4. Reinforcement learning can handle environments that are non-deterministic, meaning that the outcomes of actions are not always predictable. This is useful in real-world applications where the environment may change over time or is uncertain.
5. Reinforcement learning can be used to solve a wide range of problems, including those that involve decision making, control, and optimization.
6. Reinforcement learning is a flexible approach that can be combined with other machine learning techniques, such as deep learning, to improve performance.

Disadvantages of Reinforcement learning
1. Reinforcement learning is not preferable for solving simple problems.
2. Reinforcement learning needs a lot of data and a lot of computation.
3. Reinforcement learning is highly dependent on the quality of the reward function. If the reward function is poorly designed, the agent may not learn the desired behavior.
4. Reinforcement learning can be difficult to debug and interpret. It is not always clear why the agent is behaving in a certain way, which can make it difficult to diagnose and fix problems.

Well-Posed Learning Problem
Definition: A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.
To have a well-defined learning problem, three features need to be identified:
1. The class of tasks
2. The measure of performance to be improved
3. The source of experience

Game Basics
Checkers is played by two players. Each player begins the game with 12 colored
discs. (One set of pieces is black and the other red.) Each player places his or her
pieces on the 12 dark squares closest to him or her. Black moves first. Players
then alternate moves.
The board consists of 64 squares, alternating between 32 dark and 32 light
squares.
It is positioned so that each player has a light square on the right side corner
closest to him or her.
A player wins the game when the opponent cannot make a move. In most cases,
this is because all of the opponent's pieces have been captured, but it could also
be because all of his pieces are blocked in.Rules of the Game
Moves are allowed only on the dark squares, so pieces always move diagonally.
Single pieces are always limited to forward moves (toward the opponent).
A piece making a non-capturing move (not involving a jump) may move only
one square.
A piece making a capturing move (a jump) leaps over one of the opponent's
pieces, landing in a straight diagonal line on the other side. Only one piece may
be captured in a single jump; however, multiple jumps are allowed during a
single turn.
When a piece is captured, it is removed from the board.
If a player is able to make a capture, there is no option; the jump must be made.
If more than one capture is available, the player is free to choose whichever he or she prefers.
+ When a piece reaches the furthest row from the player who controls that piece, it
is crowned and becomes a king. One of the pieces which had been captured is
placed on top of the king so that it is twice as high as a single piece.
Kings are limited to moving diagonally but may move both forward and
backward. (Remember that single pieces, i.e. non-kings, are always limited to
forward moves.)
Kings may combine jumps in several directions, forward and backward, on the
same turn. Single pieces may shift direction diagonally during a multiple capture
turn, but must always jump forward (toward the opponent).
Well-Defined Learning Problem
A checkers learning problem:
+ Task T: playing checkers
+ Performance measure P: percent of games won against opponents
Training experience E: playing practice games against itself
A handwriting recognition learning problem:
+ Task T: recognizing and classifying handwritten words within images
+ Performance measure P: percent of words correctly classified
+ Training experience E: a database of handwritten words with
given classifications

A robot driving learning problem:
+ Task T: driving on public four-lane highways using vision sensors
+ Performance measure P: average distance travelled before an error (as judged
by human overseer)
+ Training experience E: a sequence of images and steering commands
recorded while observing a human driver

Designing a Learning System
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
   1. Estimating training values
   2. Adjusting the weights
5. The Final Design

CHECKERS GAME
The basic design issues and approaches to machine learning are illustrated by considering designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament.

1. Choosing the Training Experience
* The first design choice is to choose the type of training experience from which
the system will learn.
+ The type of training experience available can have a significant impact on success or failure of the learner.
There are three attributes which impact the success or failure of the learner:
1. Whether the training experience provides direct or indirect feedback regarding
the choices made by the performance system.
2. The degree to which the learner controls the sequence of training examples
3. How well it represents the distribution of examples over which the final system
performance P must be measured.

1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system
For example, in checkers game:
In learning to play checkers, the system might learn from direct training examples consisting of individual
checkers board states and the correct move for each.
Indirect training examples consisting of the move sequences and final outcomes of various games played.
The information about the correctness of specific moves early in the game must be inferred indirectly from
the fact that the game was eventually won or lost.
Here the learner faces an additional problem of credit assignment, or determining the degree to which each
move in the sequence deserves credit or blame for the final outcome.
Credit assignment can be a particularly difficult problem because the game can be lost even when early
moves are optimal, if these are followed later by poor moves.
Hence, learning from direct training feedback is typically easier than learning from indirect feedback.

2. A second important attribute of the training experience is the degree to
which the learner controls the sequence of training examples
For example, in checkers game:
+ The learner might depend on the teacher to select informative board states and to provide the correct move
for each
+ Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the
teacher for the correct move.
+ The learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.
+ Notice in this last case the learner may choose between experimenting with novel board states that it has not
yet considered, or honing its skill by playing minor variations of lines of play it currently finds most
promising.

3. A third attribute of the training experience is how well it represents
the distribution of examples over which the final system performance P
must be measured.
Learning is most reliable when the training examples follow a distribution similar to that of future test
examples.
For example, in checkers game:
* In the checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament.
+ If its training experience E consists only of games played against itself, there is a danger that this training experience might not be fully representative of the distribution of situations over which it will later be tested. For example, the learner might never encounter certain crucial board states that are very likely to be played by the human checkers champion.
It is often necessary to learn from a distribution of examples that is somewhat different from those on which the final system will be evaluated. Such situations are problematic because mastery of one distribution of examples will not necessarily lead to strong performance over some other distribution.

2. Choosing the Target Function
The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.
* Let's begin with a checkers-playing program that can generate the legal moves from any board state.
* The program needs only to learn how to choose the best move from among these legal moves. This learning task is representative of a large class of tasks for which the legal moves that define some large search space are known a priori, but for which the best search strategy is not known.
Given this setting where we must learn to choose among the legal moves, the most
obvious choice for the type of information to be learned is a program, or function,
that chooses the best move for any given board state.
1. Let ChooseMove be the target function, with notation
ChooseMove : B → M
which indicates that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M (the best move).
ChooseMove is an obvious choice for the target function in the checkers example, but this function will turn out to be very difficult to learn given the kind of indirect training experience available to our system.

2. An alternative target function is an evaluation function that assigns a numerical
score to any given board state
Let the target function be V, with notation
V : B → R
which denotes that V maps any legal board state from the set B to some real value.
We intend for this target function V to assign higher scores to better board states. If
the system can successfully learn such a target function V, then it can easily use it to
select the best move from any current board position.

Let us define the target value V(b) for an arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
A learning algorithm can acquire only some approximation to this target function; for this reason the learning process is often called function approximation, and the learned approximation is denoted V̂.

3. Choosing a Representation for the Target Function
Let us choose a simple representation: for any given board state, the function V̂ will be calculated as a linear combination of the following board features:
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black

Thus, the learning program will represent V̂(b) as a linear function of the form
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
Where:
+ w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
+ Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board.
+ The weight w0 will provide an additive constant to the board value.
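A small illustration of this representation with invented weights and features; only the linear form itself comes from the text:

```python
# A small illustration of the linear board evaluation
# V_hat(b) = w0 + w1*x1 + ... + w6*x6, with invented weights and features.
import numpy as np

def v_hat(weights, features):
    """weights = [w0..w6]; features = [x1..x6] for one board state."""
    return weights[0] + np.dot(weights[1:], features)

w = np.array([0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5])  # hypothetical weights
x = np.array([7, 5, 1, 0, 2, 3])  # e.g. 7 black pieces, 5 red pieces, 1 black king...
print(v_hat(w, x))                # the estimated value of this board state
```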
Partial design of a checkers learning program

Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
Target function: V : Board → R
Target function representation:
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
The first three items above correspond to the specification of the learning task, whereas the final two items constitute design choices for the implementation of the learning program.

4. Choosing a Function Approximation Algorithm
* In order to learn the target function V̂ we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
+ Each training example is an ordered pair of the form (b, Vtrain(b)).
+ For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100:
((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)

Function Approximation Procedure
1. Derive training examples from the indirect training experience available to the learner.
2. Adjust the weights wi to best fit these training examples.

1. Estimating training values
A simple approach for estimating training values for intermediate board states is to assign the training value Vtrain(b) for any intermediate board state b to be V̂(Successor(b)), where:
V̂ is the learner's current approximation to V
Successor(b) denotes the next board state following b for which it is again the program's turn to move
Rule for estimating training values:
Vtrain(b) ← V̂(Successor(b))

2. Adjusting the weights
Specify the learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}.
A first step is to define what we mean by the best fit to the training data.
+ One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis:
E ≡ Σ over (b, Vtrain(b)) ∈ training examples of (Vtrain(b) − V̂(b))²
+ Several algorithms are known for finding weights of a linear function that minimize E.

In our case, we require an algorithm that will incrementally refine the weights as
new training examples become available and that will be robust to errors in these estimated training values.
One such algorithm is called the least mean squares, or LMS, training rule. For each observed training example it adjusts the weights a small amount in the direction that reduces the error on this training example.
LMS weight update rule: for each training example (b, Vtrain(b)):
  Use the current weights to calculate V̂(b)
  For each weight wi, update it as
    wi ← wi + η (Vtrain(b) − V̂(b)) xi

(Vtrain(b) − V̂(b)) = 0 → no need to change weights
(Vtrain(b) − V̂(b)) > 0 → increase weight values
(Vtrain(b) − V̂(b)) < 0 → decrease weight values

Here η is a small constant (e.g., 0.1) that moderates the size of the weight update.
Working of the weight update rule:
When the error (Vtrain(b) − V̂(b)) is zero, no weights are changed.
When (Vtrain(b) − V̂(b)) is positive (i.e., when V̂(b) is too low), then each weight is increased in proportion to the value of its corresponding feature. This will raise the value of V̂(b), reducing the error.
If the value of some feature xi is zero, then its weight is not altered regardless of the error, so that the only weights updated are those whose features actually occur on the training example board. (A minimal code sketch of this update follows.)
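A minimal code sketch of the LMS rule on a single invented training example; η = 0.1 follows the text, everything else is illustrative:

```python
# A minimal sketch of the LMS weight-update rule,
#   w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i,
# run repeatedly on one invented training example.
import numpy as np

def v_hat(w, x):
    """Linear evaluation function: w0 + w1*x1 + ... + w6*x6."""
    return w[0] + np.dot(w[1:], x)

def lms_update(w, x, v_train, eta=0.1):
    """One LMS step for a board with features x and training value v_train."""
    error = v_train - v_hat(w, x)
    w = w.copy()
    w[0] += eta * error          # the constant weight (its feature is always 1)
    w[1:] += eta * error * x     # each weight moves in proportion to its feature
    return w

w = np.zeros(7)                                       # start from zero weights
x, v_train = np.array([3., 0., 1., 0., 0., 0.]), 100  # invented (b, Vtrain(b))
for _ in range(50):                                   # repeated passes shrink the error
    w = lms_update(w, x, v_train)
print(round(v_hat(w, x), 3))                          # -> 100.0 (error driven to zero)
```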
5. The Final Design

The final design of the checkers learning system can be described by four distinct program modules that represent the central components in many learning systems:
[Figure: the four modules - Experiment Generator, Performance System, Critic, and Generalizer - linked in a cycle: new problem → solution trace → training examples → hypothesis]
1. The Performance System is the module that must solve the given performance task by using the learned target function(s).
It takes an instance of a new problem (new game) as input and produces a trace of its solution (game history) as output.
In the checkers game, the strategy used by the Performance System to select its next move at each step is determined by the learned V̂ evaluation function. Therefore, we expect its performance to improve as this evaluation function becomes increasingly accurate.
2. The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function. As shown in the diagram, each training example in this case corresponds to some game state in the trace, along with an estimate Vtrain of the target function value for this example.
3. The Generalizer takes as input the training examples and produces an output
hypothesis that is its estimate of the target function.
It generalizes from the specific training examples, hypothesizing a general function
that covers these examples and other cases beyond the training examples.
In our example, the Generalizer corresponds to the LMS algorithm, and the output hypothesis is the function V̂ described by the learned weights w0, ..., w6.
4. The Experiment Generator takes as input the current hypothesis and outputs a
new problem (i.e., an initial board state) for the Performance System to explore. Its
role is to pick new practice problems that will maximize the learning rate of the
overall system.
In our example, the Experiment Generator always proposes the same initial
game board to begin a new game.

The sequence of design choices made for the checkers program is summarized in the figure below.
[Figure: the sequence of design choices - determine the type of training experience, the target function, the representation of the learned function, and the learning algorithm - leading to the completed design]

Issues in Machine Learning
+ What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired
function, given sufficient training data? Which algorithms perform best for
which types of problems and representations?
How much training data is sufficient? What general bounds can be found to
relate the confidence in learned hypotheses to the amount of training experience
and the character of the learner's hypothesis space?
When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is
only approximately correct?
+ What is the best strategy for choosing a useful next training experience, and how
does the choice of this strategy alter the complexity of the learning problem?
+ What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should the
system attempt to learn? Can this process itself be automated?
+ How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?

MODULE 2

Concept Learning
+ Learning involves acquiring general concepts from specific training examples.
Example: People continually learn general concepts or categories such as "bird,"
"car," "situations in which I should study more in order to pass the exam," etc.
+ Each such concept can be viewed as describing some subset of objects or events
defined over a larger set
* Alternatively, each concept can be thought of as a Boolean-valued function
defined over this larger set. (Example: A function defined over all animals, whose
value is true for birds and false for other animals).
Concept learning - Inferring a Boolean-valued function from training examples of
its input and output.

A Concept Learning Task
Consider the example task of learning the target concept
"Days on which my friend Aldo enjoys his favorite water sport."
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

The attribute EnjoySport indicates whether or not a person enjoys
his favorite water sport on this day.
The task is to learn to predict the value of EnjoySport for an arbitrary day,
based on the values of its other attributes.

What hypothesis representation is provided to the learner?
Let’s consider a simple representation in which each hypothesis consists
of a conjunction of constraints on the instance attributes.
Let each hypothesis be a vector of six constraints, specifying the values of the six
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
For each attribute, the hypothesis will either:
+ Indicate by a "?" that any value is acceptable for this attribute,
+ Specify a single required value (e.g., Warm) for the attribute, or
+ Indicate by a "Ø" that no value is acceptable
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
The hypothesis that PERSON enjoys his favorite sport only on cold days with high humidity (independent of the values of the other attributes) is represented by the expression
(?, Cold, High, ?, ?, ?)
The most general hypothesis, that every day is a positive example, is represented by
(?, ?, ?, ?, ?, ?)
The most specific possible hypothesis, that no day is a positive example, is represented by
(Ø, Ø, Ø, Ø, Ø, Ø)
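One possible encoding of such hypotheses in Python (an illustration, not from the textbook), using '?' for the any-value constraint and None to stand in for Ø:

```python
# Hypotheses as 6-tuples over (Sky, AirTemp, Humidity, Wind, Water, Forecast):
# '?' accepts any value, a specific string requires that value, and None
# stands in for the Ø constraint that accepts no value at all.
def satisfies(h, x):
    """True iff instance x meets every constraint of hypothesis h."""
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(satisfies(('?', 'Cold', 'High', '?', '?', '?'), x))  # False: not a cold day
print(satisfies(('?',) * 6, x))    # True: the most general hypothesis covers all
print(satisfies((None,) * 6, x))   # False: the most specific hypothesis covers none
```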
Notation

The set of items over which the concept is defined is called the set of instances,
which we denote by X.
Example: X is the set of all possible days, each represented by the attributes:
Sky, AirTemp, Humidity, Wind, Water, and Forecast
The concept or function to be learned is called the target concept, which we denote by c.
In general, c can be any boolean-valued function defined over the instances X; that is, c : X → {0, 1}.
Example: The target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
Instances for which c(x) = 1 are called positive examples, or members of the
target concept.
Instances for which c(x) = 0 are called negative examples, or non-members of
the target concept.
The ordered pair (x, c(x)) describes the training example consisting of the instance x and its target concept value c(x).
D denotes the set of available training examples.
The symbol H denotes the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. Each hypothesis h in H represents a Boolean-valued function defined over X:
h : X → {0, 1}
The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

Given:
• Instances X: Possible days, each described by the attributes:
  • Sky (with possible values Sunny, Cloudy, and Rainy),
  • AirTemp (with values Warm and Cold),
  • Humidity (with values Normal and High),
  • Wind (with values Strong and Weak),
  • Water (with values Warm and Cool), and
  • Forecast (with values Same and Change).
• Hypotheses H: Each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be "?" (any value is acceptable), "Ø" (no value is acceptable), or a specific value.
• Target concept c: EnjoySport : X → {0, 1}
• Training examples D: Positive and negative examples of the target function (see Table 2.1).
Determine:
• A hypothesis h in H such that h(x) = c(x) for all x in X.

TABLE: The EnjoySport concept learning task.

The Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over a sufficiently
large set of training examples will also approximate the target function well
over other unobserved examples.

Concept Learning as Search
* Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
+ The goal of this search is to find the hypothesis that best fits the training examples.
Example: consider the instances X and hypotheses H in the EnjoySport learning task.
The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly
+ 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances
+ 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H.
Every hypothesis containing one or more "Ø" symbols represents the empty set of instances; that is, it classifies every instance as negative, so there are only
+ 1 + (4 · 3 · 3 · 3 · 3 · 3) = 973 semantically distinct hypotheses.
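These counts can be verified directly:

```python
# A quick check of the EnjoySport counting arguments above.
from math import prod

sizes = [3, 2, 2, 2, 2, 2]                 # domain sizes of the six attributes
instances = prod(sizes)                     # each day picks one value per attribute
syntactic = prod(s + 2 for s in sizes)      # each slot may also be '?' or 'Ø'
semantic = 1 + prod(s + 1 for s in sizes)   # all Ø-hypotheses collapse into one
print(instances, syntactic, semantic)       # 96 5120 973
```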
General-to-Specific Ordering of Hypotheses

+ Consider the two hypotheses:
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
+ Consider the sets of instances that are classified positive by h1 and by h2.
+ h2 imposes fewer constraints on the instance, so it classifies more instances as positive. Any instance classified positive by h1 will also be classified positive by h2. Therefore, h2 is more general than h1.
+ Given hypotheses h1 and h2, h1 is more-general-than-or-equal-to h2 if and only if any instance that satisfies h2 also satisfies h1.
Definition: Let h_j and h_k be Boolean-valued functions defined over X. Then h_j is more-general-than-or-equal-to h_k (written h_j ≥g h_k) if and only if
(∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)]
[Figure: the box on the left represents the set X of instances (e.g. x1 = <Sunny, Warm, High, Strong, Cool, Same>); the box on the right the set H of hypotheses (e.g. h1 = <Sunny, ?, ?, Strong, ?, ?>, h2 = <Sunny, ?, ?, ?, ?, ?>, h3 = <Sunny, ?, ?, ?, Cool, ?>), ordered from specific to general]
In the figure, the box on the left represents the set X of all instances, the box on the right the set H of all hypotheses.
Each hypothesis corresponds to some subset of X: the subset of instances that it classifies positive.
The arrows connecting hypotheses represent the more-general-than relation, with the arrow pointing toward the less general hypothesis.
Note the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.

FIND-S: Finding a Maximally Specific Hypothesis
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
       Then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
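A minimal Python sketch of FIND-S (an illustration, not the textbook's code), run on the four EnjoySport examples that are traced below; the all-Ø initial hypothesis is represented as None:

```python
# A minimal FIND-S sketch for conjunctive hypotheses: start from the most
# specific hypothesis (all Ø, here None) and minimally generalize it on
# every positive example; negative examples are ignored.
def find_s(examples):
    h = None                                # the all-Ø hypothesis
    for x, label in examples:
        if label != 'Yes':
            continue                        # FIND-S ignores negative examples
        if h is None:
            h = list(x)                     # first positive: copy it verbatim
        else:
            h = [hc if hc == xc else '?'    # keep matching constraints,
                 for hc, xc in zip(h, x)]   # generalize the rest to '?'
    return tuple(h)

data = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
        (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), 'Yes'),
        (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
        (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes')]
print(find_s(data))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```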
To illustrate this algorithm, assume the learner is given the sequence of training examples from the EnjoySport task:

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

The first step of FIND-S is to initialize h to the most specific hypothesis in H:
h ← (Ø, Ø, Ø, Ø, Ø, Ø)
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
Observing the first training example, it is clear that our hypothesis is too specific. In particular, none of the "Ø" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example:
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
This h is still very specific; it asserts that all instances are negative except for the single positive training example.
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example:
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
The fourth example leads to a further generalization of h:
h4 = <Sunny, Warm, ?, Strong, ?, ?>
[Figure: the hypothesis space search performed by FIND-S, from the most specific h0 = <Ø, Ø, Ø, Ø, Ø, Ø> through h1 and h2 = h3 to h4, moving from specific toward general as the positive examples x1, x2, x4 are observed]

The key properties of the FIND-S algorithm:
+ FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
+ FIND-S's final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H and the training examples are correct.

Example 1
Find-S Algorithm

Example | Citations | Size   | InLibrary | Price      | Editions | Buy
1       | Some      | Small  | No        | Affordable | Many     | No
2       | Many      | Big    | No        | Expensive  | One      | Yes
3       | Some      | Big    | Always    | Expensive  | Few      | No
4       | Many      | Medium | No        | Expensive  | Many     | Yes
5       | Many      | Small  | No        | Affordable | Many     | Yes

Q1. How many concepts are possible for this instance space?
Q2. How many hypotheses can be expressed by the hypothesis language?
Q3. Apply the Find-S algorithm on the training data given. Consider the examples in the specified order and write the hypothesis each time after observing an example.

Example 2
Find-S Algorithm

Example | Eyes   | Nose     | Head   | Fcolor | Hair | Smile
1       | Round  | Triangle | Round  | Purple | Yes  | Yes
2       | Square | Square   | Square | Green  | Yes  | No
3       | Square | Triangle | Round  | Yellow | Yes  | Yes
4       | Round  | Triangle | Round  | Green  | No   | No
5       | Square | Square   | Round  | Yellow | Yes  | Yes

Q1. How many concepts are possible for this instance space?
Q2. How many hypotheses can be expressed by the hypothesis language?
Q3. Apply the Find-S algorithm on the training data given. Consider the examples in the specified order and write the hypothesis each time after observing an example.

Unanswered by FIND-S
1. Has the learner converged to the correct target concept?
Although FIND-S will find a hypothesis consistent with the
training data, it has no way to determine whether it has found the
only hypothesis in H consistent with the data (i.e., the correct target
concept), or whether there are many other consistent hypotheses as
well. We would prefer a learning algorithm that could determine
whether it had converged
2. Why prefer the most specific hypothesis?
In case there are multiple hypotheses consistent with the training
examples, FIND-S will find the most specific. It is unclear whether
we should prefer this hypothesis over, say, the most general, or
some other hypothesis of intermediate generality.

Unanswered by FIND-S
3. Are the training examples consistent?
In most practical learning problems there is some chance that the training
examples will contain at least some errors or noise. Such inconsistent
sets of training examples can severely mislead FIND-S, given the fact
that it ignores negative examples.
4. What if there are several maximally specific consistent
hypotheses?
However, for other hypothesis spaces there can be several maximally
specific hypothesis consistent with the data. In this case, FIND-S must be
extended to allow it to backtrack on its choices of how to generalize the
hypothesis.

Version Space and CANDIDATE-ELIMINATION Algorithm
The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of
the set of all hypotheses consistent with the training examples
Representation
* Definition: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x, c(x)) in D.
Consistent(h, D) ≡ (∀(x, c(x)) ∈ D) h(x) = c(x)
Note the difference between the definitions of consistent and satisfies:
+ an example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept.
+ an example x is said to be consistent with hypothesis h iff h(x) = c(x)

Consistent hypothesis and version space
Example | Citations | Size  | InLibrary | Price      | Editions | Buy
1       | Some      | Small | No        | Affordable | One      | No
2       | Many      | Big   | No        | Expensive  | Many     | Yes

Assume we have two hypotheses h1 and h2. For x1:
h1 = <?, ?, No, ?, Many>: consistent
h2 = <?, ?, No, ?, ?>: not consistent (h2 classifies example 1 as positive, although Buy = No)
A representation of the set of all hypotheses which are consistent with D
Definition: The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
VS_{H,D} = {h ∈ H | Consistent(h, D)}

The LIST-THEN-ELIMINATE Algorithm
The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
Assume f1 and f2 are two attributes with domains (A, B) and (X, Y) respectively.
Number of distinct instances = 4
Syntactically distinct hypotheses = 16
Semantically distinct hypotheses = 10 (nine non-empty hypotheses plus the single empty hypothesis)
Assume the training instances:

F1 | F2 | Target
A  | X  | Yes
A  | Y  | Yes

Initial version space = (A,X), (A,Y), (A,?), (B,X), (B,Y), (B,?), (?,X), (?,Y), (?,?), (Ø,Ø)
After applying the LIST-THEN-ELIMINATE algorithm:
Version space (consistent hypotheses) = (A,?), (?,?)
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example (x, c(x)):
   remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
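A minimal sketch of LIST-THEN-ELIMINATE on the toy two-attribute space above; '?' and None (for Ø) follow the encoding used earlier:

```python
# A minimal LIST-THEN-ELIMINATE sketch on the toy two-attribute space:
# enumerate every hypothesis, then discard any that misclassifies an example.
from itertools import product

domains = [('A', 'B'), ('X', 'Y')]

def h_of(h, x):
    """1 if instance x satisfies conjunctive hypothesis h, else 0 (None = Ø)."""
    return int(all(hc == '?' or hc == xc for hc, xc in zip(h, x)))

# all 16 syntactically distinct hypotheses: each slot is '?', Ø (None), or a value
version_space = list(product(*[('?', None) + d for d in domains]))
examples = [(('A', 'X'), 1), (('A', 'Y'), 1)]

for x, c in examples:
    version_space = [h for h in version_space if h_of(h, x) == c]

print(version_space)  # [('?', '?'), ('A', '?')]
```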
LIST-THEN-ELIMINATE: Problems
* List-Then-Eliminate works in principle, so long as the version space is finite.
+ However, since it requires exhaustive enumeration of all hypotheses, in practice it is not feasible.

A More Compact Representation for Version Spaces
* The version space is represented by its most general and least general members.
* These members form general and specific boundary sets that delimit the
version space within the partially ordered hypothesis space.

[Figure: a version space with its general boundary G and specific boundary S. The version space includes all six hypotheses shown in the figure, but can be represented more simply by S and G. Arrows indicate instances of the more-general-than relation.]
S: { <Sunny, Warm, ?, Strong, ?, ?> }
G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }
This is the version space for the EnjoySport concept learning task and the training examples described in the table below.

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D:
G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}
Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D:
S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}

Version Space Representation Theorem
Theorem: Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {(x, c(x))}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}
To prove:
1. Every h satisfying the right-hand side of the above expression is in VS_{H,D}
2. Every member of VS_{H,D} satisfies the right-hand side of the expression

Sketch of proof:
1. Let g, h, s be arbitrary members of G, H, S respectively with g ≥g h ≥g s.
By the definition of S, s must be satisfied by all positive examples in D. Because h ≥g s, h must also be satisfied by all positive examples in D.
By the definition of G, g cannot be satisfied by any negative example in D, and because g ≥g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all positive examples in D and by no negative examples in D, h is consistent with D, and therefore h is a member of VS_{H,D}.
2. It can be proven by assuming some h in VS_{H,D} that does not satisfy the right-hand side of the expression, then showing that this leads to an inconsistency.

The CANDIDATE-ELIMINATION Learning Algorithm
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do:
+ If d is a positive example:
  + Remove from G any hypothesis inconsistent with d
  + For each hypothesis s in S that is not consistent with d:
    + Remove s from S
    + Add to S all minimal generalizations h of s such that
      + h is consistent with d, and some member of G is more general than h
    + Remove from S any hypothesis that is more general than another hypothesis in S
+ If d is a negative example:
  + Remove from S any hypothesis inconsistent with d
  + For each hypothesis g in G that is not consistent with d:
    + Remove g from G
    + Add to G all minimal specializations h of g such that
      + h is consistent with d, and some member of S is more specific than h
    + Remove from G any hypothesis that is less general than another hypothesis in G
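A compact Python sketch of the algorithm for conjunctive hypotheses (an illustration, not the textbook's code); it reproduces the S and G boundaries of the EnjoySport trace that follows. The final pruning step within each boundary is elided, since it never fires on this data:

```python
# A compact CANDIDATE-ELIMINATION sketch for conjunctive hypotheses
# ('?' = any value, None = the Ø constraint).
def satisfies(h, x):
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def more_general_or_equal(h1, h2):
    """True iff h1 >=g h2: every constraint of h1 is at least as permissive."""
    return all(a == '?' or (a == b and b is not None) or b is None
               for a, b in zip(h1, h2))

def min_generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(xc if sc is None else (sc if sc == xc else '?')
                 for sc, xc in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, gc in enumerate(g) if gc == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = [tuple([None] * len(domains))]   # most specific boundary
    G = [tuple(['?'] * len(domains))]    # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(g, x)]
            S = [min_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not satisfies(s, x)]
            new_G = []
            for g in G:
                if satisfies(g, x):      # g wrongly covers the negative example
                    new_G += [h for h in min_specializations(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)]
                else:
                    new_G.append(g)
            G = new_G
    return S, G

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
data = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
        (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
        (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
        (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(data, domains)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```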
An Illustrative Example

The boundary sets are first initialized to G0 and S0, the most general and most specific hypotheses in H:
S0 = { <Ø, Ø, Ø, Ø, Ø, Ø> }
G0 = { <?, ?, ?, ?, ?, ?> }

For training example d1:
<Sunny, Warm, Normal, Strong, Warm, Same>, +
S1 = { <Sunny, Warm, Normal, Strong, Warm, Same> }
G1 = { <?, ?, ?, ?, ?, ?> }

For training example d2:
<Sunny, Warm, High, Strong, Warm, Same>, +
S2 = { <Sunny, Warm, ?, Strong, Warm, Same> }
G2 = { <?, ?, ?, ?, ?, ?> }

For training example d3:
<Rainy, Cold, High, Strong, Warm, Change>, -
S3 = { <Sunny, Warm, ?, Strong, Warm, Same> }
G3 = { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }

For training example d4:
<Sunny, Warm, High, Strong, Cool, Change>, +
S4 = { <Sunny, Warm, ?, Strong, ?, ?> }
G4 = { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }
This is the final version space for the EnjoySport concept learning problem and the training examples described earlier.

Example 1:
Size  | Color | Shape    | Class
Big   | Red   | Circle   | No
Small | Red   | Triangle | No
Small | Red   | Circle   | Yes
Big   | Blue  | Circle   | No
Small | Blue  | Circle   | Yes

Compute the number of consistent hypotheses for the above data using candidate elimination.

Example 2:
Example | Citations | Size   | InLibrary | Price      | Editions | Buy
1       | Some      | Small  | No        | Affordable | One      | No
2       | Many      | Big    | No        | Expensive  | Many     | Yes
3       | Many      | Medium | No        | Expensive  | Few      | Yes
4       | Many      | Small  | No        | Affordable | Many     | Yes

Compute the number of consistent hypotheses for the above data using candidate elimination.

Example 3:
Example | Shape    | Size  | Color | Surface   | Thickness | Class
1       | Circular | Large | Light | Smooth    | Thick     | Malignant (+)
2       | Circular | Large | Light | Irregular | Thick     | Malignant (+)
3       | Oval     | Large | Dark  | Smooth    | Thin      | Benign (-)
4       | Oval     | Large | Light | Irregular | Thick     | Malignant (+)

Compute the number of consistent hypotheses for the above data using candidate elimination.

Example 4:
Example | Eyes   | Nose     | Head   | Fcolor | Hair | Smile
1       | Round  | Triangle | Round  | Purple | Yes  | Yes
2       | Square | Square   | Square | Green  | Yes  | No
3       | Square | Triangle | Round  | Yellow | Yes  | Yes
4       | Round  | Triangle | Round  | Green  | No   | No
5       | Square | Square   | Round  | Yellow | Yes  | Yes

Compute the number of consistent hypotheses for the above data using candidate elimination.

Example 5:
Origin | Manufacturer | Color | Decade | Type    | Class
Japan  | Honda        | Blue  | 1980   | Economy | Positive
Japan  | Toyota       | Green | 1970   | Sports  | Negative
Japan  | Toyota       | Blue  | 1990   | Economy | Positive
USA    | Chrysler     | Red   | 1980   | Economy | Negative
Japan  | Honda        | White | 1980   | Economy | Positive
Japan  | Toyota       | Green | 1980   | Economy | Positive
Japan  | Honda        | Red   | 1990   | Economy | Negative

Compute the number of consistent hypotheses for the above data using candidate elimination.

Example 6:
Weather | Temperature | WindSpeed | FlightTime | Delayed
lar ia oe Moran Ne
aim e201 ode benina ve
tar Het um orang Ne
Rainy ia Moderate ing Yee
Snomy ald 0h renoon Yee
ay Her Hen ring vee
Some Ma oats srenaen Yas
Compute the number of consistent hypotheses for the above data using candidate elimination.

Example 7:
Age | Income | Gender | Education | MaritalStatus | Purchased
25 50000 Mle HighSehoal Single Ne
m T5c00 Female Cotage Macs ve
45 60000 Mile Unisys ve
a Matias _
25 55000 Mle HighSehoot Single Ne
10000 Me Gotape singe Ne
9 72000 Fanaa Uniersty Mac vee
2950000 Mle HignSehoal Shale Ne
"2 90000 Female Comage Maes vee
Compute the number of consistent hypotheses for the above data using candidate elimination.

VERSION SPACE AND CANDIDATE ELIMINATION: REMARKS
1. Will the candidate elimination algorithm converge to the correct hypothesis?
2. How can partially learned concepts be used?
3. What training example should the learner request next?

1. Will the CANDIDATE-ELIMINATION Algorithm Converge to the Correct Hypothesis?
The version space learned by the CANDIDATE-ELIMINATION algorithm will converge toward the hypothesis that correctly describes the target concept, provided:
(1) there are no errors in the training examples, and
(2) there is some hypothesis in H that correctly describes the target concept.
If S and G converge to a single hypothesis, the concept is exactly learned.
If S and G do not converge to a single hypothesis, the concept is partially learned.
If S and G are empty, no hypothesis is consistent with the training examples.

2. How can partially learned concepts be used?
The learner is required to classify new instances that it has not yet observed.

[Figure: the final version space, with S = { <Sunny, Warm, ?, Strong, ?, ?> } and G = { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }]
Instance | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
A        | Sunny | Warm    | Normal   | Strong | Cool  | Change   | ?
B        | Rainy | Cold    | Normal   | Light  | Warm  | Same     | ?
C        | Sunny | Warm    | Normal   | Light  | Warm  | Same     | ?
D        | Sunny | Cold    | Normal   | Strong | Warm  | Same     | ?
3. What training example should the learner request next?
In this case the learner should choose an instance that would be classified as positive by some of these hypotheses, but negative by others.
One such instance is <Sunny, Warm, Normal, Light, Warm, Same>.

Inductive Bias
The fundamental questions for inductive inference
+ What if the target concept is not contained in the hypothesis space?
+ Can we avoid this difficulty by using a hypothesis space that includes every possible
hypothesis?
+ How does the size of this hypothesis space influence the ability of the algorithm to
generalize to unobserved instances?
+ How does the size of the hypothesis space influence the number of training examples
that must be observed?

Effect of an incomplete hypothesis space
Preceding algorithms work if the target function is in H.
They will generally not work if the target function is not in H.
Consider the following examples, which represent the target concept "Sky = Sunny or Sky = Cloudy":
<Sunny, Warm, Normal, Strong, Cool, Change>  +
<Cloudy, Warm, Normal, Strong, Cool, Change> +
<Rainy, Warm, Normal, Strong, Cool, Change>  -
If we apply the CANDIDATE-ELIMINATION algorithm as before, we end up with an empty version space.
After the first two training examples:
S = <?, Warm, Normal, Strong, Cool, Change>
This new hypothesis is overly general: it covers the third, negative training example! Our H does not include the appropriate c.

An Unbiased Learner
Incomplete hypothesis space:
+ If c is not in H, then consider generalizing the representation of H to contain c.
+ The size of the instance space X of days described by the six available attributes is 96. The number of distinct subsets that can be defined over a set X containing |X| elements (i.e., the size of the power set of X) is 2^|X|.
+ Recall that there are 96 instances in EnjoySport; hence there are 2^96 possible hypotheses in the full space H.
+ We can represent them by using the full propositional calculus with AND, OR, NOT.
+ Hence H defined only by conjunctions of attributes is biased (containing only 973 hypotheses).
+ Let us reformulate the EnjoySport learning task in an unbiased way by defining a
new hypothesis space H’ that can represent every subset of instances; that is, let H’
correspond to the power set of X.
+ One way to define such an H' is to allow arbitrary disjunctions, conjunctions, and
negations of our earlier hypotheses.
For instance, the target concept "Sky = Sunny or Sky = Cloudy" could then be described as
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>

Definition:
Consider a concept learning algorithm L for the set of instances X.
+ Let c be an arbitrary concept defined over X.
+ Let D_c = {(x, c(x))} be an arbitrary set of training examples of c.
+ Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c.
+ The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c:
(∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
[Figure: modelling inductive systems by equivalent deductive systems. An inductive system (the CANDIDATE-ELIMINATION algorithm using hypothesis space H) takes training examples and a new instance as input and outputs a classification of the new instance, or "don't know". The equivalent deductive system (a theorem prover) takes the same inputs plus the explicit assertion "H contains the target concept" and produces the same outputs, making the inductive bias explicit.]

The input-output behavior of the CANDIDATE-ELIMINATION algorithm using a hypothesis space H is identical to that of a deductive theorem prover utilizing the assertion "H contains the target concept." This assertion is therefore called the inductive bias of the CANDIDATE-ELIMINATION algorithm.
Characterizing inductive systems by their inductive bias allows modelling them by their equivalent deductive systems. This provides a way to compare inductive systems according to their policies for generalizing beyond the observed training data.