UNIT-1
LECTURE 1
What is classification and prediction?
Data classification is a two step process. In the first step, a model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label attribute. In the context of classification, data tuples are also referred to as samples, examples, or objects. The data tuples analyzed to build the model collectively form the training data set. The individual tuples making up the training set are referred to as training samples and are randomly selected from the sample population. Since the class label of each training sample is provided, this step is also known as supervised learning (i.e., the learning of the model is "supervised" in that it is told to which class each training sample belongs). It contrasts with unsupervised learning (or clustering), in which the class labels of the training samples are not known, and the number or set of classes to be learned may not be known in advance.
Typically, the learned model is represented in the form of classification rules,
decision trees, or mathematical formulae. For example, given a database of customer credit
information, classification rules can be learned to identify customers as having either excellent or
fair credit ratings (see the figure below). The rules can be used to categorize future data samples, as well as provide a better understanding of the database contents.
In the second step, shown in the figure below, the model is used for classification. First, the predictive accuracy of the model is estimated. The holdout method is a simple technique which uses a test set of class-labeled samples. These samples are randomly selected and are independent of the training samples.
Classification Process (1): Model Construction
Training data (name, rank, years, tenured):
Mike, Assistant Prof, 3, no
Mary, Assistant Prof, 7, yes
Bill, Professor, 2, yes
Jim, Associate Prof, 7, yes
Dave, Assistant Prof, 6, no
Anne, Associate Prof, 3, no
The classification algorithm learns the classifier, here expressed as the rule:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction
Test data (name, rank, years, tenured):
Tom, Assistant Prof, 2, no
Merlisa, Associate Prof, 7, no
George, Professor, 5, yes
Joseph, Assistant Prof, 7, yes
Figure 7.1: The data classification process: a) Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is credit rating, and the learned model or classifier is represented in the form of classification rules. b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
The accuracy of a model on a given test set is the
percentage of test set samples that are correctly classified by the model. For each test sample, the
known class label is compared with the learned model's class prediction for that sample. Note that if the accuracy of the model were estimated based on the training data set, this estimate could be optimistic, since the learned model tends to overfit the data (that is, it may have incorporated some particular anomalies of the training data which are not present in the overall sample population). Therefore, a test set is used.
If the accuracy of the model is considered acceptable, the model can be used to classify future
data tuples or objects for which the class label is not known. (Such data are also referred to in the
machine learning literature as "unknown" or "previously unseen" data). For example, the classification rules learned in Figure 7.1a from the analysis of data from existing customers can
be used to predict the credit rating of new or future (i.e., previously unseen) customers.
“How is prediction different from classification?” Prediction can be viewed as the construction
and use of a model to assess the class of an unlabeled object, or to assess the value or value
ranges of an attribute that a given object is likely to have. In this view, classification and
regression are the two major types of prediction problems where classification is used to predict
discrete or nominal values, while regression is used to predict continuous or ordered values. In our view, however, we refer to the use of prediction to predict class labels as classification and the use of prediction to predict continuous values (e.g., using regression techniques) as prediction. This view is commonly accepted in data mining.
Lecture 2
Issues regarding classification and prediction
Preparing the data for classification and prediction: The following preprocessing steps may be
applied to the data in order to help improve the accuracy, efficiency, and scalability of the
classification or prediction process.
1. Data cleaning. This refers to the preprocessing of data in order to remove or reduce noise (by
applying smoothing techniques, for example), and the treatment of missing values (e.g., by
replacing a missing value with the most commonly occurring value for that attribute, or with the
most probable value based on statistics).
Although most classification algorithms have some mechanisms for handling noisy or missing
data, this step can help reduce confusion during learning.
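For illustration, a minimal sketch of this missing-value treatment (pandas, with made-up attribute values; the column names are hypothetical):

import pandas as pd

# Hypothetical training data with a missing value in the categorical 'credit_rating' attribute
data = pd.DataFrame({
    "age": [25, 40, 35, 50],
    "credit_rating": ["fair", "excellent", None, "excellent"],
})

# Replace the missing value with the most commonly occurring value (the mode) of that attribute
most_common = data["credit_rating"].mode()[0]
data["credit_rating"] = data["credit_rating"].fillna(most_common)
print(data)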
2. Relevance analysis. Many of the attributes in the data may be irrelevant to the classification
or prediction task. For example, data recording the day of the week on which a bank loan
application was filed is unlikely to be relevant to the success of the application. Furthermore,
other attributes may be redundant. Hence, relevance analysis may be performed on the data with
the aim of removing any irrelevant or redundant attributes from the learning process. In machine
learning, this step is known as feature selection. Including such attributes may otherwise slow
down, and possibly mislead, the learning step.
Ideally, the time spent on relevance analysis, when added to the time spent on learning from the
resulting “reduced” feature subset, should be less than the time that would have been spent on
learning from the original set of features. Hence, such analysis can help improve classification
efficiency and scalability.
3. Data transformation. The data can be generalized to higher-level concepts. Concept hierarchies may be used for this purpose. This is particularly useful for continuous-valued
attributes. For example, numeric values for the attribute income may be generalized to discrete
ranges such as low, medium, and high. Similarly, nominal-valued attributes, like street, can be
generalized to higher-level concepts, like city. Since generalization compresses the original
training data, fewer input/output operations may be involved during learning. The data may also
be normalized, particularly when neural networks or methods involving distance measurements
are used in the learning step. Normalization involves scaling all values for a given attribute so
that they fall within a small specified range, such as -1.0 to 1.0, or 0 to 1.0. In methods which use
distance measurements, for example, this would prevent attributes with initially large ranges
(like, say, income) from outweighing attributes with initially smaller ranges (such as binary
attributes).
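A minimal sketch of the min-max normalization described above, scaling a numeric attribute into the range 0.0 to 1.0 (plain Python; the income values are made up):

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    # Scale all values of an attribute into [new_min, new_max]
    old_min, old_max = min(values), max(values)
    return [(v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
            for v in values]

incomes = [12000, 35000, 58000, 98000]
print(min_max_normalize(incomes))   # [0.0, 0.267..., 0.534..., 1.0]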
Comparing classification methods. Classification and prediction methods can be compared and
evaluated according to the following criteria:
1. Predictive accuracy. This refers to the ability of the model to correctly predict the class label of new or previously unseen data.
2. Speed. This refers to the computation costs involved in generating and using the model.
3. Robustness. This is the ability of the model to make correct predictions given noisy data or data with missing values.
4. Scalability. This refers to the ability of the learned model to perform efficiently on large amounts of data.
5. Interpretability. This refers to the level of understanding and insight that is provided by the learned model.
Lecture 3
Classification by Decision Tree Induction:
A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class
distributions. The topmost node in a tree is the root node. A typical decision tree is shown in
Figure 7.2. It represents the concept buys computer, that is, it predicts whether or not a customer
at All Electronics is likely to purchase a computer. Internal nodes are denoted by rectangles, and
leaf nodes are denoted by ovals.
Algorithm for Decision Tree Induction:
Basic algorithm (a greedy algorithm)
1. Tree is constructed in a top-down recursive divide-and-conquer manner
2. At start, all the training examples are at the root
3. Attributes are categorical (if continuous-valued, they are discretized in advance)
4. Examples are partitioned recursively based on selected attributes
5. Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
1. All samples for a given node belong to the same class
2. There are no remaining attributes for further partitioning — majority voting is
employed for classifying the leaf
3. There are no samples left
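To see this greedy, top-down induction in action, the following sketch applies scikit-learn's DecisionTreeClassifier (not part of the lecture, used only as an illustration) to the tenure data of Figure 7.1, with rank encoded as an integer since the algorithm assumes categorical or discretized attributes:

from sklearn.tree import DecisionTreeClassifier, export_text

# Training samples: rank (0 = Assistant Prof, 1 = Associate Prof, 2 = Professor), years
X = [[0, 3], [0, 7], [2, 2], [1, 7], [0, 6], [1, 3]]
y = ["no", "yes", "yes", "yes", "no", "no"]

# Entropy-based splitting corresponds to the information gain measure described below
tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y)

print(export_text(tree, feature_names=["rank", "years"]))
print(tree.predict([[2, 1]]))   # a professor with 1 year of service -> 'yes'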
Attribute Selection Measure:
1. Information gain (ID3/C4.5)
All attributes are assumed to be categorical
Can be modified for continuous-valued attributes
Select the attribute with the highest information gain
Assume there are two classes, P and N. Let the set of examples S contain p elements of class P and n elements of class N.
The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as
I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}
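A small sketch of this expected-information computation in plain Python (the class counts in the example call are illustrative):

import math

def info(p, n):
    # Expected information I(p, n) needed to classify an arbitrary example as P or N
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # treat 0 * log2(0) as 0
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result

print(info(9, 5))   # about 0.940 bits for 9 positive and 5 negative examples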
2. Gini index (IBM Intelligent Miner)
All attributes are assumed continuous-valued
Assume there exist several possible split values for each attribute
May need other tools, such as clustering, to get the possible split values
Can be modified for categorical attributes
If a data set T contains examples from n classes, the gini index, gini(T), is defined as
gini(T) = 1 - \sum_{j=1}^{n} p_j^2
where p_j is the relative frequency of class j in T.
If a data set T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data, gini_split(T), is defined as
gini_{split}(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)
The attribute that provides the smallest gini_split(T) is chosen to split the node (all possible splitting points need to be enumerated for each attribute).
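A minimal sketch of these gini computations in plain Python (the class counts are made up for illustration):

def gini(class_counts):
    # gini(T) = 1 - sum of squared relative class frequencies
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

def gini_split(left_counts, right_counts):
    # Weighted gini index of splitting T into subsets T1 and T2
    n1, n2 = sum(left_counts), sum(right_counts)
    n = n1 + n2
    return (n1 / n) * gini(left_counts) + (n2 / n) * gini(right_counts)

# 10 samples (6 of class P, 4 of class N) split into (5 P, 1 N) and (1 P, 3 N)
print(gini([6, 4]))                 # 0.48
print(gini_split([5, 1], [1, 3]))   # about 0.317 -- lower, so this is a useful split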
Lecture 4
Bayesian Classification
“What are Bayesian classifiers?”
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the
probability that a given sample belongs to a particular class.
Bayesian classification is based on Bayes theorem, described below. Studies comparing classification
algorithms have found a simple Bayesian classifier known as the naive Bayesian classifier to be
comparable in performance with decision tree and neural network classifiers. Bayesian classifiers have
also exhibited high accuracy and speed when applied to large databases.
Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of
the values of the other attributes. This assumption is called class conditional independence. It is made to
simplify the computations involved, and in this sense, is considered “naive”. Bayesian belief networks are graphical models, which unlike naive Bayesian classifiers allow the representation of dependencies among subsets of attributes. Bayesian belief networks can also be used for classification.
Bayesian Theorem
Given training data D, the posterior probability of a hypothesis h, P(h|D), follows the Bayes theorem:
P(h|D) = \frac{P(D|h) P(h)}{P(D)}
MAP (maximum a posteriori) hypothesis:
h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} P(D|h) P(h)
Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost.
Naive Bayes Classifier
Simplified assumption: attributes are conditionally independent:
P(x_1, \ldots, x_k | C) = P(x_1|C) \times \cdots \times P(x_k|C)
This greatly reduces the computation cost: only the class distribution needs to be counted.
Given a training set, we can compute these probabilities. For the weather example with classes P and N:
Outlook: sunny P 2/9, N 3/5; overcast P 4/9, N 0; rain P 3/9, N 2/5
Temperature: hot P 2/9, N 2/5; mild P 4/9, N 2/5; cool P 3/9, N 1/5
Humidity: high P 3/9, N 4/5; normal P 6/9, N 1/5
Windy: true P 3/9, N 3/5; false P 6/9, N 2/5
Using a-posteriori probabilities:
P(C|X) = probability that the sample tuple X = <x_1, ..., x_k> is of class C
E.g. P(class = N | outlook = sunny, windy = true, ...)
Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
Bayes theorem:
P(C|X) = P(X|C) \cdot P(C) / P(X)
P(X) is constant for all classes
P(C) = relative frequency of class C samples
Choose C such that P(C|X) is maximum, i.e. C such that P(X|C) \cdot P(C) is maximum
Problem: computing P(X|C) is unfeasible
Naive Bayesian Classification:
Naive assumption: attribute independence
P(x_1, \ldots, x_k | C) = P(x_1|C) \cdot \ldots \cdot P(x_k|C)
If the i-th attribute is categorical:
P(x_i|C) is estimated as the relative frequency of samples having value x_i as the i-th attribute in class C
If the i-th attribute is continuous:
P(x_i|C) is estimated through a Gaussian density function
Computationally easy in both cases
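A compact sketch of naive Bayesian classification for categorical attributes, in the spirit of the weather example above (the training tuples below are made up; relative frequencies serve as the P(x_i | C) estimates):

from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    # Estimate P(C) and P(attribute value | C) as relative frequencies
    priors = Counter(labels)
    cond = defaultdict(Counter)          # (class, attribute index) -> value counts
    for sample, label in zip(samples, labels):
        for i, value in enumerate(sample):
            cond[(label, i)][value] += 1
    return priors, cond

def predict(sample, priors, cond):
    # Pick the class C maximizing P(C) * product over i of P(x_i | C)
    total = sum(priors.values())
    best_class, best_score = None, -1.0
    for c, prior_count in priors.items():
        score = prior_count / total
        for i, value in enumerate(sample):
            score *= cond[(c, i)][value] / prior_count
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Attributes: (outlook, windy); classes: P = play, N = don't play
samples = [("sunny", "true"), ("sunny", "false"), ("overcast", "false"), ("rain", "true")]
labels = ["N", "P", "P", "N"]
priors, cond = train_naive_bayes(samples, labels)
print(predict(("overcast", "false"), priors, cond))   # -> 'P'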
Lecture 5
Classification by Backpropagation
A multilayer feed-forward neural network
The backpropagation algorithm performs learning on a multilayer feed-forward neural network
An example of such a network is shown in the figure below. The inputs correspond to the attributes
measured for each training sample.
The inputs are fed simultaneously into a layer of units making up the input layer. The weighted
outputs of these units are, in turn, fed simultaneously to a second layer of “neuron-like” units,
known as a hidden layer. The hidden layer's weighted outputs can be input to another hidden
layer, and so on. The number of hidden layers is arbitrary, although in practice, usually only one
is used. The weighted outputs of the last hidden layer are input to units making up the output
layer, which emits the network's prediction for given samples. The units in the hidden layers and
output layer are sometimes referred to as neurodes, due to their symbolic biological basis, or as
output units. The multilayer neural network shown in Figure 7.9 has two layers of output units.
Therefore, we say that it is a two-layer neural network. Similarly, a network containing two
hidden layers is called a three-layer neural network, and so on. The network is feed-forward in
that none of the weights cycle back to an input unit or to an output unit of a previous layer. It is
fully connected in that each unit provides input to each unit in the next forward layer. Multilayer
feed-forward networks of linear threshold functions, given enough hidden units, can closely
approximate any function.
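A minimal sketch of a single forward pass through such a fully connected network with one hidden layer (NumPy; the weights and the sigmoid activation are illustrative choices, and backpropagation would subsequently adjust the weights from the output error):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Three input attributes, two hidden units, one output unit (weights are made up)
x = np.array([0.5, 0.1, 0.9])                 # attribute values of one training sample
W_hidden = np.array([[0.2, -0.4, 0.1],
                     [0.7, 0.3, -0.5]])       # hidden layer weights (2 x 3)
b_hidden = np.array([0.1, -0.2])
W_out = np.array([[0.6, -0.3]])               # output layer weights (1 x 2)
b_out = np.array([0.05])

hidden = sigmoid(W_hidden @ x + b_hidden)     # weighted sums, then activation
output = sigmoid(W_out @ hidden + b_out)      # the network's prediction for this sample
print(output)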
Neural Networks:
1. Advantages
prediction accuracy is generally high
robust, works when training examples contain errors,
output may be discrete, real-valued, or a vector of several discrete or real-valued
attributes
fast evaluation of the learned target function
2. Criticism
long training time
difficult to understand the learned function (weights)
not easy to incorporate domain knowledge
Neuron
[Figure: a single neuron computes a weighted sum of the input vector x and weight vector w, applies an activation function, and produces the output y]
Lecture 6
Classification Based on Concepts from Association Rule Mining
Several methods for association-based classification
ARCS: Quantitative association mining and clustering of association rules (Lent et
al’97)
It beats C4.5 in (mainly) scalability and also accuracy
Associative classification (Liu et al'98)
It mines high support and high confidence rules in the form of “cond_set =>
y”, where y is a class label
CAEP (Classification by aggregating emerging patterns) (Dong et al’99)
Emerging patterns (EPs): the item sets whose support increases significantly
from one class to another
Mine EPs based on minimum support and growth rate
Lecture 7
Other Classification Methods
1. k-nearest neighbor classifier
All instances correspond to points in the n-D space.
The nearest neighbor is defined in terms of Euclidean distance.
The target function could be discrete- or real-valued.
For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to x_q (see the sketch below).
Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples.
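A small sketch of the k-nearest neighbor rule for a discrete-valued target (plain Python, Euclidean distance, made-up training points):

import math
from collections import Counter

def knn_predict(query, training_points, labels, k=3):
    # Return the most common class among the k training points nearest to the query
    distances = [(math.dist(query, point), label)
                 for point, label in zip(training_points, labels)]
    distances.sort(key=lambda pair: pair[0])
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.5, 4.5)]
labels = ["A", "A", "B", "B"]
print(knn_predict((1.1, 1.0), points, labels, k=3))   # -> 'A'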
2. case-based reasoning
Also uses: lazy evaluation + analyze similar instances
Difference: Instances are not “points in a Euclidean space”
Example: Water faucet problem in CADET (Sycara et al’92)
Methodology
Instances represented by rich symbolic descriptions (e.g., function graphs)
Multiple retrieved cases may be combined
Tight coupling between case retrieval, knowledge-based reasoning, and problem
solving
Indexing based on a syntactic similarity measure and, when failure occurs, backtracking and adapting to additional cases
3. Genetic algorithm
GA: based on an analogy to biological evolution
Each rule is represented by a string of bits
An initial population is created consisting of randomly generated rules
e.g., IF A1 and Not A2 then C2 can be encoded as the bit string 100
Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring
The fitness of a rule is represented by its classification accuracy on a set of training examples
Offspring are generated by crossover and mutation.
4. Rough set approach:
Rough sets are used to approximately or “roughly” define equivalence classes
A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)
Finding the minimal subsets (reducts) of attributes (for feature reduction) is NP-hard but
a discernibility matrix is used to reduce the computation intensity.
5. Fuzzy set approaches:
Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership
(such as using fuzzy membership graph)
Attribute values are converted to fuzzy values, e.g., income is mapped into the discrete categories {low, medium, high} with
fuzzy values calculated
For a given new sample, more than one fuzzy value may apply
Each applicable rule contributes a vote for membership in the categories
Typically, the truth values for each predicted category are summed
Lecture 8
Prediction and Classifier Accuracy:
The prediction of continuous values can be modeled by statistical techniques of regression. For
example, we may like to develop a model to predict the salary of college graduates with 10 years
of work experience, or the potential sales of a new product given its price. Many problems can
be solved by linear regression, and even more can be tackled by applying transformations to the
variables so that a nonlinear problem can be converted to a linear one.
Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + βX
The two parameters, α and β, specify the line and are to be estimated by using the data at hand, applying the least squares criterion to the known values of Y1, Y2, ..., X1, X2, ....
Multiple regression: Y = b0 + b1 X1 + b2 X2
Many nonlinear functions can be transformed into the above.
Log-linear models:
The multi-way table of joint probabilities is approximated by a product of lower-order tables.
Probability: p(a, b, c, d) = α_{ab} β_{ac} χ_{ad} δ_{bcd}
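A minimal sketch of estimating the two parameters of Y = α + βX with the least squares criterion (plain Python; the data points are made up):

# Least squares estimates for the line Y = a + b*X
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope b and intercept a that minimize the sum of squared errors
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

print(f"Y = {a:.2f} + {b:.2f} X")   # roughly Y = 0.14 + 1.96 X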
[Figures: Prediction with numerical data; prediction with categorical data (sales by outlet type, e.g. mass marketer, sports chain, camping chain, independent outlet).]
Estimating Error Rates:
Partition: training-and-testing
use two independent data sets, e.g., training set (2/3) and test set (1/3)
used for data sets with a large number of samples
Cross-validation
divide the data set into k subsamples
use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation; see the sketch below)
for data sets of moderate size
Bootstrapping (leave-one-out)
for small data sets
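A minimal sketch of k-fold cross-validation as described above (plain Python; train_and_score is a hypothetical callback that trains a classifier on the training split and returns its accuracy on the test split):

def k_fold_indices(n_samples, k):
    # Split the sample indices into k folds by round-robin assignment
    indices = list(range(n_samples))
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train_and_score, k=10):
    # Average accuracy over k folds: train on k-1 folds, test on the held-out fold
    folds = k_fold_indices(len(samples), k)
    accuracies = []
    for test_fold in folds:
        test_set = set(test_fold)
        train_idx = [i for i in range(len(samples)) if i not in test_set]
        accuracies.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_fold], [labels[i] for i in test_fold]))
    return sum(accuracies) / len(accuracies)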