Machine Learning-Classification

Classification is a data analysis technique that builds models, known as classifiers, to predict categorical labels based on historical data. The process involves two main steps: constructing a classification model using training data and then evaluating its accuracy with test data before applying it to new data. Various classification methods, such as decision trees and Bayesian classifiers, are discussed, along with their applications in fields like fraud detection and medical diagnosis.

Uploaded by

22b81a05y0.2

Classification: Basic Concepts

Classification is a form of data analysis that extracts models describing important data classes. Such models, called classifiers, predict categorical (discrete, unordered) class labels. For example, we can build a classification model to categorize bank loan applications as either safe or risky. Such analysis can help provide us with a better understanding of the data at large. Many classification methods have been proposed by researchers in machine learning, pattern recognition, and statistics. Most algorithms are memory resident, typically assuming a small data size. Recent data mining research has built on such work, developing scalable classification and prediction techniques capable of handling large amounts of disk-resident data. Classification has numerous applications, including fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis.

We start off by introducing the main ideas of classification in Section 8.1. In the rest of this chapter, you will learn the basic techniques for data classification, such as how to build decision tree classifiers (Section 8.2), Bayesian classifiers (Section 8.3), and rule-based classifiers (Section 8.4). Section 8.5 discusses how to evaluate and compare different classifiers. Various measures of accuracy are given, as well as techniques for obtaining reliable accuracy estimates. Methods for increasing classifier accuracy are presented in Section 8.6, including cases for when the data set is class imbalanced (i.e., where the main class of interest is rare).

8.1 Basic Concepts

We introduce the concept of classification in Section 8.1.1. Section 8.1.2 describes the general approach to classification as a two-step process. In the first step, we build a classification model based on previous data. In the second step, we determine if the model's accuracy is acceptable, and if so, we use the model to classify new data.

8.1.1 What Is Classification?
A bank loans officer needs analysis of her data to learn which loan applicants are “safe” and which are “risky” for the bank. A marketing manager at AllElectronics needs data analysis to help guess whether a customer with a given profile will buy a new computer. A medical researcher wants to analyze breast cancer data to predict which one of three specific treatments a patient should receive. In each of these examples, the data analysis task is classification, where a model or classifier is constructed to predict class (categorical) labels, such as “safe” or “risky” for the loan application data; “yes” or “no” for the marketing data; or “treatment A,” “treatment B,” or “treatment C” for the medical data. These categories can be represented by discrete values, where the ordering among values has no meaning. For example, the values 1, 2, and 3 may be used to represent treatments A, B, and C, where there is no ordering implied among this group of treatment regimes.

Suppose that the marketing manager wants to predict how much a given customer will spend during a sale at AllElectronics. This data analysis task is an example of numeric prediction, where the model constructed predicts a continuous-valued function, or ordered value, as opposed to a class label. This model is a predictor. Regression analysis is a statistical methodology that is most often used for numeric prediction; hence the two terms tend to be used synonymously, although other methods for numeric prediction exist. Classification and numeric prediction are the two major types of prediction problems. This chapter focuses on classification.

8.1.2 General Approach to Classification

“How does classification work?” Data classification is a two-step process, consisting of a learning step (where a classification model is constructed) and a classification step (where the model is used to predict class labels for given data).
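Figure 8.1 shows the output of the learning step as IF-THEN classification rules. Taking those three rules as given, the classification step can be sketched in Python: estimate accuracy on test tuples whose class labels are already known, and only if that estimate is acceptable, use the rules to label new data. This is a sketch under assumptions of our own: rules are tried in the order listed and the first rule whose conditions all hold fires; a tuple matched by no rule receives a default class (“safe”, the majority class among the seven training tuples of Figure 8.1), because the figure shows only part of the rule set; the 0.8 acceptance threshold is arbitrary; and the names `RULES`, `DEFAULT_CLASS`, `classify`, and `accuracy` are hypothetical.

```python
# Classification rules shown in Figure 8.1(a), as (conditions, class label) pairs.
RULES = [
    ({"age": "youth"}, "risky"),
    ({"income": "high"}, "safe"),
    ({"age": "middle_aged", "income": "low"}, "risky"),
]

# Assumption: the figure elides the rest of the rule set, so a tuple matching no
# rule gets a default class: "safe", the majority class of the training tuples.
DEFAULT_CLASS = "safe"

# Test tuples of Figure 8.1(b): (attribute values, known class label).
TEST_DATA = [
    ({"age": "senior", "income": "low"}, "safe"),
    ({"age": "middle_aged", "income": "low"}, "risky"),
    ({"age": "middle_aged", "income": "high"}, "safe"),
]


def classify(x):
    """Return the label of the first rule whose conditions all hold for tuple x."""
    for conditions, label in RULES:
        if all(x.get(attribute) == value for attribute, value in conditions.items()):
            return label
    return DEFAULT_CLASS


def accuracy(test_data):
    """Fraction of test tuples whose predicted label equals the known label."""
    correct = sum(classify(x) == label for x, label in test_data)
    return correct / len(test_data)


ACCEPTABLE_ACCURACY = 0.8  # hypothetical threshold; the text does not specify one

estimate = accuracy(TEST_DATA)
print(estimate)  # 1.0
if estimate >= ACCEPTABLE_ACCURACY:
    # Classify a new tuple with age = middle_aged and income = low.
    print(classify({"age": "middle_aged", "income": "low"}))  # risky
```

With only three test tuples, the 1.0 printed here is a weak estimate (the first test tuple is labeled correctly only through the default class); obtaining reliable accuracy estimates is the subject of Section 8.5.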
The process is shown for the loan application data of Figure 8.1. (The data are simplified for illustrative purposes. In reality, we may expect many more attributes to be considered.)

In the first step, a classifier is built describing a predetermined set of data classes or concepts. This is the learning step (or training phase), where a classification algorithm builds the classifier by analyzing or “learning from” a training set made up of database tuples and their associated class labels. A tuple, $\mathbf{X}$, is represented by an $n$-dimensional attribute vector, $\mathbf{X} = (x_1, x_2, \ldots, x_n)$, depicting $n$ measurements made on the tuple from $n$ database attributes, respectively, $A_1, A_2, \ldots, A_n$. Each attribute represents a “feature” of $\mathbf{X}$; hence, the pattern recognition literature uses the term feature vector rather than attribute vector. Each tuple, $\mathbf{X}$, is assumed to belong to a predefined class as determined by another database attribute called the class label attribute. The class label attribute is discrete-valued and unordered. It is categorical (or nominal) in that each value serves as a category or class. The individual tuples making up the training set are referred to as training tuples and are randomly sampled from the database under analysis. In the context of classification, data tuples can be referred to as samples, examples, instances, data points, or objects.
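As a sketch of how such tuples might be held in code (Python, under assumptions of our own), each `LabeledTuple` below pairs an attribute vector $\mathbf{X} = (x_1, x_2)$ over $A_1$ = age and $A_2$ = income with the value of the class label attribute loan_decision, and `random_split` draws the training tuples at random from a small stand-in database. The database contents are the ten labeled tuples of Figure 8.1; the applicant name is dropped because it identifies a tuple rather than measuring it; and the names `LabeledTuple`, `DATABASE`, and `random_split`, as well as the fixed seed, are hypothetical.

```python
import random
from typing import List, NamedTuple, Tuple


class LabeledTuple(NamedTuple):
    """One database tuple: attribute vector X = (x1, ..., xn) and its class label."""

    x: Tuple[str, ...]  # x1 = value of A1 (age), x2 = value of A2 (income)
    label: str  # value of the class label attribute, loan_decision


# Hypothetical stand-in for "the database under analysis": the ten tuples
# of Figure 8.1 (seven training and three test tuples), pooled together.
DATABASE: List[LabeledTuple] = [
    LabeledTuple(("youth", "low"), "risky"),
    LabeledTuple(("youth", "low"), "risky"),
    LabeledTuple(("middle_aged", "high"), "safe"),
    LabeledTuple(("middle_aged", "low"), "risky"),
    LabeledTuple(("senior", "low"), "safe"),
    LabeledTuple(("senior", "medium"), "safe"),
    LabeledTuple(("middle_aged", "high"), "safe"),
    LabeledTuple(("senior", "low"), "safe"),
    LabeledTuple(("middle_aged", "low"), "risky"),
    LabeledTuple(("middle_aged", "high"), "safe"),
]


def random_split(data, n_train, seed=0):
    """Randomly sample n_train training tuples; the remaining tuples are held out."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    shuffled = list(data)  # shuffle a copy, leaving the database order intact
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


train, test = random_split(DATABASE, n_train=7)
print(len(train), len(test))  # 7 3
```

The tuples that are not sampled for training are kept aside, so they can later serve as test data for estimating the classifier's accuracy, as in Figure 8.1(b).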
In our discussion, we use the term attribute vector, and in our notation, any variable representing a vector is shown in bold italic font; measurements depicting the vector are shown in italic font (e.g., $\mathbf{X} = (4, 5)$). In the machine learning literature, training tuples are commonly referred to as training samples. Throughout this text, we prefer to use the term tuples instead of samples.

Figure 8.1 The data classification process: (a) Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules. (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.

(a) Training data, analyzed by a classification algorithm:

name          age          income   loan_decision
Sandy Jones   youth        low      risky
Bill Lee      youth        low      risky
Caroline Fox  middle_aged  high     safe
Rick Field    middle_aged  low      risky
Susan Lake    senior       low      safe
Claire Phips  senior       medium   safe
Joe Smith     middle_aged  high     safe
...

Classification rules:

IF age = youth THEN loan_decision = risky
IF income = high THEN loan_decision = safe
IF age = middle_aged AND income = low THEN loan_decision = risky
...

(b) Test data, used to check the classification rules:

name          age          income   loan_decision
Juan Bello    senior       low      safe
Sylvia Crest  middle_aged  low      risky
Anne Yee      middle_aged  high     safe
...

New data: (John Henry, middle_aged, low) → loan_decision? risky

Because the class label of each training tuple is provided, this step is also known as supervised learning (i.e., the learning of the classifier is “supervised” in that it is told to which class each training tuple belongs). It contrasts with unsupervised learning (or
