Lazy Learners and Other Classification Methods
M. Rajshree
M.Sc (IT)
Nadar Saraswathi College of Arts & Science
Lazy Learners
 Lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries.
 Lazy learners do less work when the training data is given and more work when a test tuple must be classified.
 The classification methods discussed so far in this chapter (decision tree induction, Bayesian classification, rule-based classification, classification by backpropagation, support vector machines, and classification based on association rule mining) are all examples of eager learners.
 A lazy learner simply stores the training data and starts generalizing only when it sees a test tuple, classifying the tuple based on its similarity to the stored training tuples.
 Building a model from a given set of training data
 Applying the model to a given set of testing data
 Eager learners such as Bayesian classification, rule-based classification, and support vector machines construct a classification model from the given set of training tuples before receiving any new tuples.
k-Nearest-Neighbor
Classifiers
 The k-nearest-neighbor method was first
described in the early 1950s.
 Nearest-neighbor classifiers are based on
learning by analogy, that is, by comparing a
given test tuple with training tuples that are
similar to it.
 The training tuples are described
by n attributes. Each tuple represents a point
in an n-dimensional space.
 In this way, all of the training tuples are stored in an n-dimensional pattern space. When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple.
 "Closeness" is defined by a distance metric. For two points or tuples, say X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), the Euclidean distance is dist(X1, X2) = sqrt((x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2).
 When given a test tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to it. These k training tuples are the k "nearest neighbors" of the test tuple, and the test tuple is assigned the most common class among them.
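The search-and-vote procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the toy training data and function names are hypothetical.

```python
import math
from collections import Counter

def euclidean(x1, x2):
    # Distance between two n-dimensional tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(training, test_tuple, k=3):
    # training: list of (attribute_tuple, class_label) pairs.
    # Find the k training tuples closest to the test tuple...
    neighbors = sorted(training, key=lambda pair: euclidean(pair[0], test_tuple))[:k]
    # ...and take a majority vote among their class labels.
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Toy 2-attribute training data (hypothetical)
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))  # A
```

Note that no model is built ahead of time: all work (distance computation and voting) happens at query time, which is exactly what makes the method "lazy".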
Case-Based Reasoning
 Case-based reasoning (CBR) is the process of solving new problems based on the solutions of similar past problems.
 CBR classifiers use a database of problem solutions to solve new problems.
 Unlike nearest-neighbor classifiers, which store training tuples as points in Euclidean space, CBR stores the tuples, or "cases," for problem solving as complex symbolic descriptions.
 Business applications of CBR include
problem resolution for customer service help
desks, where cases describe product-related
diagnostic problems.
 CBR has also been applied to areas such as
engineering and law, where cases are either
technical designs or legal rulings, respectively.
 Medical education is another area for CBR,
where patient case histories and treatments are
used to help diagnose and treat new patients.
 To propose a solution for a new case, the case-based reasoner tries to combine the solutions of the neighboring training cases; it may also employ background knowledge and problem-solving strategies to arrive at a feasible combined solution.
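A minimal retrieve-and-reuse cycle over symbolic case descriptions might look like the sketch below. The help-desk cases, attribute names, and similarity measure are all illustrative assumptions, not taken from the text.

```python
def case_similarity(case, query):
    # Fraction of shared attributes whose symbolic values match
    keys = set(case) & set(query)
    if not keys:
        return 0.0
    return sum(case[k] == query[k] for k in keys) / len(keys)

def propose_solution(case_base, query):
    # Retrieve the stored case most similar to the new problem
    # and reuse its solution (retrieve-and-reuse, no adaptation step)
    best = max(case_base, key=lambda c: case_similarity(c["problem"], query))
    return best["solution"]

# Hypothetical help-desk case base
cases = [
    {"problem": {"device": "printer", "symptom": "paper jam"},
     "solution": "clear tray 2 and reseat the paper guide"},
    {"problem": {"device": "printer", "symptom": "no power"},
     "solution": "check the power cable and fuse"},
]
query = {"device": "printer", "symptom": "paper jam"}
print(propose_solution(cases, query))  # clear tray 2 and reseat the paper guide
```

A full CBR system would go further, adapting the retrieved solution with background knowledge and storing the solved case back into the database, but the retrieval step above captures the core idea.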
Other Classification Methods
 Data mining involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression, and summarization. Classification is a major technique in data mining and is widely used in various fields.
 Classification is a technique in which we categorize data into a given number of classes.
 Binary classification: a classification task with two possible outcomes, e.g., gender classification (male/female).
 Multi-class classification: classification with more than two classes. In multi-class classification each sample is assigned to one and only one target label, e.g., an animal can be a cat or a dog but not both at the same time.
 Multi-label classification: a classification task where each sample is mapped to a set of target labels (more than one class), e.g., a news article can be about sports, a person, and a location at the same time.
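The difference between the three task types shows up in the shape of the target labels. The example labels below are hypothetical, chosen to mirror the bullet points above:

```python
# Hypothetical labels illustrating the three task types
binary = ["male", "female", "male"]                  # two possible outcomes
multiclass = ["cat", "dog", "bird"]                  # exactly one of several classes
multilabel = [{"sports", "person"}, {"location"}]    # a set of labels per sample

def is_multilabel(y):
    # A task is multi-label when some sample carries more than one target label
    return any(isinstance(labels, (set, list)) and len(labels) > 1 for labels in y)

print(is_multilabel(multiclass))  # False
print(is_multilabel(multilabel))  # True
```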
Naïve Bayes
 The naive Bayes algorithm is based on Bayes' theorem with the assumption of independence between every pair of features. Naive Bayes classifiers work well in many real-world situations such as document classification and spam filtering.
 This algorithm requires only a small amount of training data to estimate the necessary parameters, and naive Bayes classifiers are extremely fast compared to more sophisticated methods.
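The spam-filtering use case mentioned above can be sketched with a word-count naive Bayes classifier. The training documents are invented toy data, and Laplace smoothing is one common choice for handling unseen words, made here as an assumption:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    # docs: list of (word_list, label); count labels and per-label word frequencies
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return label_counts, word_counts, vocab

def classify_nb(model, words):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_logp = None, float("-inf")
    for label, n in label_counts.items():
        # log P(label) + sum of log P(word | label), with Laplace smoothing;
        # the independence assumption lets us just add the per-word terms
        logp = math.log(n / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            logp += math.log((word_counts[label][w] + 1) / denom)
        if logp > best_logp:
            best, best_logp = label, logp
    return best

# Toy training corpus (hypothetical)
docs = [(["win", "money", "now"], "spam"),
        (["cheap", "money", "offer"], "spam"),
        (["meeting", "tomorrow"], "ham"),
        (["project", "meeting", "notes"], "ham")]
model = train_nb(docs)
print(classify_nb(model, ["money", "offer"]))  # spam
```

Note how little the "training" step does: it only tallies counts, which is why naive Bayes needs so little data and runs so fast.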
Fuzzy Set Approaches
 Fuzzy set theory is also called possibility theory. It was proposed by Lotfi Zadeh in 1965 as an alternative to two-valued logic and probability theory.
 This theory allows us to work at a high level of abstraction and provides a means of dealing with imprecise measurements of data.
 In the fuzzy set approach, an important consideration is the treatment of data from a linguistic viewpoint. From this has developed an approach that uses linguistically quantified propositions to summarize the content of a database by providing a general characterization of the analyzed data.
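A fuzzy set assigns each element a degree of membership in [0, 1] rather than a crisp yes/no. The linguistic term "tall" below, with its 160–180 cm boundaries and the averaging used to score the summary, are illustrative assumptions:

```python
def membership_tall(height_cm):
    # Degree to which a height belongs to the fuzzy set "tall":
    # 0 below 160 cm, 1 above 180 cm, linear in between
    if height_cm <= 160:
        return 0.0
    if height_cm >= 180:
        return 1.0
    return (height_cm - 160) / 20

print(membership_tall(150))  # 0.0 -- clearly not tall
print(membership_tall(170))  # 0.5 -- partially tall
print(membership_tall(185))  # 1.0 -- fully tall

# A linguistically quantified summary such as "most people in the
# database are tall" can be scored by aggregating memberships; one
# simple choice is the average degree across all records.
heights = [150, 165, 172, 185, 190]
truth = sum(membership_tall(h) for h in heights) / len(heights)
```

The intermediate value for 170 cm is the point of the approach: instead of forcing an imprecise measurement into "tall" or "not tall", the fuzzy set records how strongly it belongs to each linguistic category.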
