Data Mining Unit II Am

The document outlines the fundamentals of data mining, including task primitives, data mining query language (DMQL), and system architecture components. It discusses the advantages and disadvantages of data mining, as well as techniques for data generalization, summarization, and analytical characterization. Additionally, it highlights the importance of comparing data classes to understand their characteristics and differences.

Uploaded by

ashrathkhan54


Dr. A. MURUGANANDAM
Head cum Associate Professor, Data Mining And Warehousing
Karan Arts and Science College, TVM Department of Computer Science
Cell: 9842636119

UNIT – II

Data Mining Primitives – A data mining task can be specified in the form of a data mining query,
which is input to the data mining system. A data mining query is defined in terms of data mining
task primitives.

1. Task-relevant data:
o Specifies the portion of the database to be mined (e.g., attributes, tuples).
2. Kind of knowledge to be mined:
o Specifies the type of patterns to be discovered (e.g., classification, association,
clustering).

 Associations: Discovering relationships between items (e.g., "customers who buy bread
also tend to buy milk").
 Classifications: Categorizing data into predefined classes (e.g., classifying emails as
spam or not spam).
 Clusters: Grouping similar data points together (e.g., segmenting customers based on
purchasing behavior).
 Sequential patterns: Discovering patterns of events that occur in a specific order (e.g.,
identifying steps in a customer's online buying process).
 Predictions: Forecasting future values based on historical data (e.g., predicting sales for
the next quarter).

3. Background knowledge to be used in discovery process

o Includes domain knowledge (e.g., concept hierarchies, ontologies).
o It is the information about the domain to be mined
o Concept hierarchy: is a powerful form of background knowledge.
o Four Major Types of Concept Hierarchies – with Examples:

 Schema hierarchies:
o Defined in the database schema; e.g., City → State → Country.
 Set-grouping hierarchies:
o Formed by grouping values; e.g., {apple, banana} → fruit.

 Operation-derived hierarchies:
o Created by applying operations; e.g., age → age group (20-29, 30-39).
 Rule-based hierarchies:
o Defined using domain rules; e.g., if salary > 100K → high income.

4. Interestingness measures and pattern evaluation:

o Criteria for selecting interesting patterns (e.g., support, confidence, lift).
5. Presentation of discovered patterns:
o Specifies how results should be displayed (e.g., rules, charts, tables).
6. Data mining query language (DMQL):
o Used to define mining tasks using the primitives.
o Data mining language must be designed to facilitate flexible and effective knowledge
discovery.
o DMQL allows mining of different kinds of knowledge from relational databases and
data warehouses at multiple levels of abstraction.
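The four concept hierarchy types listed under background knowledge can be sketched in a few lines of Python. This is only an illustration: the function names, the extra "carrot" entry, and the "other" label are assumptions made for the example, while the hierarchy types and sample values come from the notes.

```python
# Illustrative sketch of the four concept hierarchy types.
# All names here are hypothetical; only the hierarchy types come from the notes.

# Schema hierarchy: an ordering of attributes taken from the database schema.
SCHEMA_HIERARCHY = ["city", "state", "country"]

# Set-grouping hierarchy: explicit groups of values.
SET_GROUPING = {"apple": "fruit", "banana": "fruit", "carrot": "vegetable"}


def generalize_item(item: str) -> str:
    """Map a value to its group; unknown values are returned unchanged."""
    return SET_GROUPING.get(item, item)


def age_group(age: int) -> str:
    """Operation-derived hierarchy: compute the ten-year bucket from the value."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def income_level(salary: float) -> str:
    """Rule-based hierarchy: a domain rule decides the higher-level concept."""
    return "high income" if salary > 100_000 else "other"


if __name__ == "__main__":
    print(" -> ".join(SCHEMA_HIERARCHY))
    print(generalize_item("apple"), age_group(25), income_level(120_000))
```

Keeping each hierarchy as a small mapping or function makes it easy to pass them to a mining routine as background knowledge.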

*****************

DATA MINING LANGUAGE AND SYSTEM ARCHITECTURE

Data Mining Architecture: Data mining architecture refers to the structural design and system
components that facilitate the process of extracting valuable insights and patterns from large
datasets.

1. Data Mining Query Language (DMQL):

Specialized language used to define data mining tasks.
Enables users to specify:
 Data to be mined
 Type of knowledge to discover
 Constraints and thresholds
Example (SQL-like syntax):
Mine association rules from transaction_data
where support ≥ 30% and confidence ≥ 70%
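To make the two thresholds in this query concrete, the sketch below computes support, confidence, and lift for one rule over a handful of made-up transactions. The data and function names are assumptions for illustration; the formulas are the standard definitions of these measures.

```python
# Support, confidence and lift for the rule {bread} -> {milk}.
# The transactions are made-up sample data.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B) / support(A); assumes A occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B); a value above 1 suggests positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
rule_passes = s >= 0.30 and c >= 0.70  # thresholds from the example query

if __name__ == "__main__":
    print(f"support={s:.2f} confidence={c:.2f} "
          f"lift={lift({'bread'}, {'milk'}, transactions):.2f} passes={rule_passes}")
```

On this sample the rule clears both thresholds, yet its lift is below 1, which is why lift is often checked alongside support and confidence.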

2. System Architecture Components:

1. Data Sources

 Databases, data warehouses, and external data repositories (flat files, web data, etc.).
 Provide raw data for mining.

2. Data Warehouse

 Centralized repository to store integrated data.

 Supports OLAP operations for multidimensional analysis.

3. Data Cleaning and Integration

 Cleaning: Removes noise and inconsistencies.

 Integration: Combines data from multiple sources.

4. Data Selection and Transformation

 Selection: Extracts relevant data for mining.

 Transformation: Converts data into suitable format (e.g., normalization).

5. Data Mining Engine

 Core component that applies mining algorithms to extract patterns.

 Supports tasks such as classification, clustering, and association mining.

6. Pattern Evaluation Module

 Identifies interesting patterns using measures like support and confidence.

 Filters and ranks discovered patterns based on interestingness.

7. Knowledge Base

 Stores background knowledge (e.g., concept hierarchies, user constraints).

8. User Interface

 Allows users to interact with the system.

 Supports query submission and result visualization.

9. Data Preprocessing Module

 Handles cleaning, integration, selection, and transformation of data; it groups the steps listed under components 3 and 4.

[Diagram: data mining system architecture]
This architecture supports efficient, flexible, and scalable data mining operations.
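The preprocessing steps named in the component list (cleaning, selection, transformation) can be sketched as three small functions. The rows, attribute names, and the choice of min-max normalization are assumptions for illustration, not a prescribed implementation.

```python
# Minimal preprocessing sketch: clean -> select -> transform.
# The rows and attribute names are hypothetical sample data.
raw_rows = [
    {"customer": "A", "age": 25, "income": 30000, "phone": "111"},
    {"customer": "B", "age": None, "income": 52000, "phone": "222"},  # missing age
    {"customer": "C", "age": 40, "income": 80000, "phone": "333"},
    {"customer": "C", "age": 40, "income": 80000, "phone": "333"},    # duplicate
]


def clean(rows):
    """Cleaning: drop rows with missing values and exact duplicates."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result


def select(rows, attributes):
    """Selection: keep only the task-relevant attributes."""
    return [{a: row[a] for a in attributes} for row in rows]


def min_max_normalize(rows, attribute):
    """Transformation: rescale one numeric attribute to the range [0, 1]."""
    values = [row[attribute] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero when all values are equal
    return [{**row, attribute: (row[attribute] - low) / span} for row in rows]


prepared = min_max_normalize(select(clean(raw_rows), ["age", "income"]), "income")

if __name__ == "__main__":
    print(prepared)
```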

Types of Data Mining Architecture:

1. No Coupling:
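o Uses no database or data warehouse functions; data is read from an external source such as a flat
file, which is inefficient for large data sets.
2. Loose Coupling:
o Fetches data through database or data warehouse queries and mines it in memory; suitable for
memory-based mining but scales poorly to large data sets.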
3. Semi-Tight Coupling:
o Uses database features like sorting and indexing for better performance.
4. Tight Coupling:
o Fully integrates with data warehouse for high performance and scalability.

Advantages of Data Mining:

1. Helps predict future trends from historical data.

2. Supports key decision-making.
3. Converts raw data into useful information.
4. Identifies new trends and patterns.
5. Analyzes large datasets efficiently.

6. Helps attract and retain customers.

7. Improves customer relationships.
8. Optimizes production and reduces costs.

Disadvantages of Data Mining:

1. Requires skilled teams and training.

2. Involves high investment costs.
3. May risk data security and privacy.
4. Inaccurate or poor-quality input data can produce misleading results.
5. Managing large databases is complex.

*************************

DATA MINING QUERY LANGUAGE:

Data Mining Query Language (DMQL) is a high-level language designed to define and
control data mining tasks. It enables users to express what patterns to mine and how to mine them,
using a declarative syntax.

Key Features:

1. Specification of data:
o Define the subset of data to be mined (e.g., table, attributes).
2. Type of knowledge to be mined:
o Supports mining tasks like association, classification, clustering, etc.
3. Pattern constraints:
o Set thresholds like minimum support, confidence, interestingness.
4. Background knowledge:
o Incorporate concept hierarchies or domain ontologies.
5. Presentation preferences:
o Specify output format (rules, tables, charts).

Example DMQL Syntax (illustrative, SQL-like):

use database sales_data;
mine association_rules
from transactions
where support ≥ 30% and confidence ≥ 70%
display as rule_table;
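One way to see how the five key features map onto such a query is to hold them in a small data structure and render the query text from it. The sketch below is hypothetical: the class name `MiningTask`, its fields, and `to_query` are invented for illustration and do not belong to any real DMQL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the primitives of one mining query."""
    database: str                                     # specification of data
    source: str                                       # table or relation to mine
    knowledge_type: str                               # type of knowledge to be mined
    thresholds: dict = field(default_factory=dict)    # pattern constraints
    hierarchies: list = field(default_factory=list)   # background knowledge (stored, not rendered)
    display_as: str = "rule_table"                    # presentation preference

    def to_query(self) -> str:
        """Render the task in the SQL-like form used in the example above."""
        lines = [
            f"use database {self.database};",
            f"mine {self.knowledge_type}",
            f"from {self.source}",
        ]
        if self.thresholds:
            conditions = " and ".join(
                f"{name} ≥ {value:.0%}" for name, value in self.thresholds.items()
            )
            lines.append(f"where {conditions}")
        lines.append(f"display as {self.display_as};")
        return "\n".join(lines)


task = MiningTask(
    database="sales_data",
    source="transactions",
    knowledge_type="association_rules",
    thresholds={"support": 0.30, "confidence": 0.70},
)

if __name__ == "__main__":
    print(task.to_query())
```

Separating the task description from its textual form keeps the declarative idea visible: the user states what to mine, and the system decides how.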

Advantages:

 User-friendly and high-level.

 Allows customization of mining tasks.
 Bridges the gap between users and mining algorithms.

*****************

CONCEPT DESCRIPTION IN DATA MINING:

Definition:
Concept description is a form of data generalization that provides a concise and high-level
summary of data, describing the characteristics of a target class or concept.

Types of Concept Description:

1. Characteristic Rule:
o Describes general features of a class (e.g., "Graduate students are mostly aged 22–30").
2. Discriminant Rule:
o Compares features between different classes (e.g., "Graduate students have more
research hours than undergraduates").

Diagrammatic Representation:

[Raw Data] → [Data Preprocessing (Cleaning, Integration, Transformation)]
→ [Data Generalization/Aggregation] → [Concept Characterization/Comparison]
→ [Summarized/Compared Concepts] → [Knowledge/Insights]

Techniques Used:

 Attribute-oriented induction
 Data summarization
 OLAP operations (like roll-up, drill-down)

Applications:

 Data summarization
 Pattern discovery
 Report generation
 Decision support systems

Example:
Describing "Senior Customers":

 Age: 60+
 Preferred Products: Health supplements, reading materials
 Purchase Frequency: Monthly

Goal:
To make large datasets understandable by summarizing key characteristics of data
classes.

***********

DATA GENERALIZATION AND SUMMARIZATION IN DATA MINING

 Data generalization and summarization are crucial processes in data mining that simplify
large, complex datasets into more manageable and understandable forms.
 They involve transforming detailed data into higher-level, abstract representations to reveal
broader patterns and trends, making it easier to extract meaningful insights and facilitate
decision-making.

1. Data Generalization:

Definition: Transforms detailed low-level data into higher-level abstract forms using concept
hierarchies.

Purpose:

 Reduces the complexity of data attributes.

 Summarizes detailed data as higher-level concepts.
 Works by replacing specific values with generalized concepts, typically through attribute-oriented
induction.

Benefits:
 Simplifies data, making it easier to understand and analyze.
 Reduces noise and redundancy, revealing hidden patterns.
 Enables the extraction of actionable insights.

Example:

 "New York" → "USA"

 "25" → "Young Adult"

 Attribute-Oriented Induction: Generalizing data by replacing low-level attribute values with
higher-level concepts, like replacing specific ages with age ranges (e.g., "young", "middle-aged",
"senior").
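A minimal sketch of attribute-oriented induction follows: each value is replaced by a higher-level concept, and tuples that become identical are merged with a count. The age cut-offs (40 and 60), the city mapping, and the sample customers are assumptions for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy for the city attribute.
CITY_TO_COUNTRY = {"New York": "USA", "Los Angeles": "USA", "Chennai": "India"}


def age_concept(age: int) -> str:
    """Generalize an exact age to a concept; the cut-offs are assumed."""
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


def attribute_oriented_induction(rows):
    """Generalize each row, then merge identical generalized tuples with a count."""
    return Counter(
        (CITY_TO_COUNTRY.get(row["city"], "other"), age_concept(row["age"]))
        for row in rows
    )


customers = [
    {"city": "New York", "age": 25},
    {"city": "Los Angeles", "age": 31},
    {"city": "New York", "age": 64},
    {"city": "Chennai", "age": 45},
]

generalized = attribute_oriented_induction(customers)

if __name__ == "__main__":
    for (country, age_label), count in generalized.items():
        print(country, age_label, count)
```

Four detailed rows collapse into three generalized tuples here; on real data the reduction is usually much larger.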

2. Data Summarization:

Definition:

 Produces compact, concise descriptions of data sets or data classes.


 Data summarization reduces a large dataset to a concise, representative summary, often using
statistical measures such as mean, median, mode, or quartiles.
 Summaries are usually presented in tabular or graphical form to highlight key trends and
patterns.

Purpose:

 To provide a high-level overview of the data.

 To highlight key characteristics and trends.
 To support data cube and OLAP operations.
 To provide summary reports, typically built from statistical measures (mean, count, max, min).

Example:

 Average age of employees = 35

 Total sales in Q1 = $1.2M

Types of Data Summarization in Data Mining are:

 Tabular Summarization: Presents the data in tables that convey patterns such as frequency
distribution and cumulative frequency at a glance.
 Data Visualization: Presents the data in a chosen graph style, such as a histogram, time-series
line graph, or column/bar graph.
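Both summarization types can be illustrated together: the sketch below builds a frequency table with cumulative frequencies and prints a simple text histogram next to it. The marks and the bucket width of 20 are made-up values.

```python
from collections import Counter
from itertools import accumulate

marks = [45, 67, 72, 58, 91, 67, 49, 83, 75, 62]  # hypothetical sample data


def bucket(mark: int, width: int = 20) -> str:
    """Assign a mark to a fixed-width class interval such as '60-79'."""
    low = (mark // width) * width
    return f"{low}-{low + width - 1}"


counts = Counter(bucket(m) for m in marks)
labels = sorted(counts, key=lambda label: int(label.split("-")[0]))
frequencies = [counts[label] for label in labels]
cumulative = list(accumulate(frequencies))  # running total of the frequencies

if __name__ == "__main__":
    print(f"{'class':>6} {'freq':>4} {'cum':>4}")
    for label, freq, cum in zip(labels, frequencies, cumulative):
        # The row of '#' characters acts as a crude histogram bar.
        print(f"{label:>6} {freq:>4} {cum:>4} {'#' * freq}")
```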

Techniques Used:

 Attribute-Oriented Induction:

 Example: Replace "New York", "Los Angeles" → "USA"

 Concept Hierarchies:

 Example: "30" (age) → "30-39" (age group) → "Adult"

 OLAP (Roll-up, Drill-down):

 Example: Roll-up from "City" → "Country", Drill-down from "Year" → "Month"

 Statistical Aggregation:

 Example: Average salary = $55,000; Total sales = $1M
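Roll-up and drill-down amount to re-aggregating the same facts at a coarser or finer level of a dimension. The sketch below shows this with a single helper; the sales records and the name `aggregate` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical fact records with two dimensions (location, time) and one measure.
sales = [
    {"country": "USA", "city": "New York", "year": 2024, "month": 1, "amount": 100},
    {"country": "USA", "city": "Los Angeles", "year": 2024, "month": 1, "amount": 150},
    {"country": "USA", "city": "New York", "year": 2024, "month": 2, "amount": 120},
    {"country": "India", "city": "Chennai", "year": 2024, "month": 1, "amount": 80},
]


def aggregate(rows, dimensions):
    """Sum the 'amount' measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row["amount"]
    return dict(totals)


by_city = aggregate(sales, ["country", "city"])
by_country = aggregate(sales, ["country"])       # roll-up: city -> country
by_year = aggregate(sales, ["year"])
by_month = aggregate(sales, ["year", "month"])   # drill-down: year -> month

if __name__ == "__main__":
    print(by_city)
    print(by_country)
    print(by_year)
    print(by_month)
```

A data cube precomputes many of these aggregates so that moving between levels does not require rescanning the detailed records.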

Simplified Methods for Data Generalization and Summarization:

1. Descriptive Statistics:
o Uses measures like mean and median to describe data.
2. Data Cubes:
o Summarizes data across multiple dimensions.
3. Clustering:
o Groups similar data points for easy analysis.
4. Sampling:
o Uses a subset of data to represent the whole dataset.

***********************

ANALYTICAL CHARACTERIZATION IN DATA MINING

Analytical characterization is the process of summarizing and comparing data characteristics
of target classes using statistical and data mining techniques.

It involves attribute relevance analysis and comparative analysis between different classes or
clusters.

 It uses descriptive statistics (mean, median, variance) to highlight key patterns.

 Enables comparison between classes (e.g., high-income vs. low-income groups).
 Often uses OLAP, data cube, or visualization tools.

Key Aspects:
 Attribute Relevance Analysis: This is a crucial part of analytical characterization. It
involves evaluating how strongly each attribute contributes to describing the target class or
concept.
 Data Generalization: Analytical characterization often involves generalizing the data to a
higher level of abstraction to reveal broader patterns and characteristics.
 Class Comparison: It can also be used to compare the characteristics of different classes or
groups of data objects, helping to identify what distinguishes them.
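Attribute relevance can be quantified in several ways; information gain is one common choice and is used here only as an example. The sketch scores two attributes against a class label on a tiny made-up data set, where `attendance` separates the classes perfectly and `hair` carries no information.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of the target after splitting on the attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical sample data.
students = [
    {"attendance": "high", "hair": "dark", "performance": "high"},
    {"attendance": "high", "hair": "light", "performance": "high"},
    {"attendance": "low", "hair": "dark", "performance": "low"},
    {"attendance": "low", "hair": "light", "performance": "low"},
]

gain_attendance = information_gain(students, "attendance", "performance")
gain_hair = information_gain(students, "hair", "performance")

if __name__ == "__main__":
    print(f"attendance: {gain_attendance:.2f}, hair: {gain_hair:.2f}")
```

Attributes with a gain near zero can be dropped before generalization, which keeps the resulting description short.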

Advantages:

 Simplifies complex data into understandable summaries.

 Identifies patterns and key differentiators between groups.

Benefits:

 Helps in decision-making and market segmentation.

 Supports business analysis and customer profiling.

Example of Analytical Characterization:

A university wants to understand the difference between high-performing and low-performing
students.

They analyze:

 Average attendance
o High performers: 90%
o Low performers: 60%
 Study hours per week
o High performers: 20 hours
o Low performers: 5 hours
 Participation in activities
o High performers: Active in 2+ clubs
o Low performers: Rarely participate

*******************

MINING CLASS COMPARISON

Mining class comparisons involve comparing two or more data classes to discover
similarities and differences between them. It helps in understanding contrasting characteristics of
different groups.

Key Aspects:

 Data Collection and Partitioning: The first step involves gathering the relevant data and dividing
it into a target class and one or more contrasting classes.
Example: Comparing Graduate and Undergraduate Students
 If the task is to compare graduate students (target class) with undergraduate students
(contrasting class):
 Data on both groups is collected (e.g., age, GPA, major).
 Only the most relevant dimensions (e.g., GPA and age) are kept for analysis.
 Both groups are summarized to the same level (e.g., average GPA by major).

 The results are displayed as a table, chart, or a set of rules (e.g., "Graduate students have
higher GPAs in Computer Science, while undergraduates have higher GPAs in
Humanities").

Dimension Relevance Analysis: This step focuses on identifying the most relevant attributes or
dimensions for comparison.
o For instance, in the student example, attributes like GPA, major, and research experience
might be relevant, while attributes like name or phone number might be less so.

Synchronous Generalization: Both the target and contrasting classes are generalized to the same
level of abstraction.
o For instance, you might generalize GPA to ranges (e.g., "high," "medium," "low") or
generalize research experience to the number of publications.

Presentation of Comparison Results: The final step involves presenting the comparison results in
a clear and informative manner.
o Common methods include tables, charts, and rules.
o These presentations highlight the differences between the classes, often using
contrasting measures like percentage differences or discriminant rules.
Class comparisons can be presented to users in various ways, similar to class characterizations.
These include:
 Generalized relations
 Crosstabs
 Bar charts
 Pie charts
 Curves
 Rules
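The comparison steps above can be sketched end to end: both classes are generalized with the same function (synchronous generalization), summarized as percentages, and contrasted as a small crosstab. The GPA values and the cut-offs 3.5 and 2.5 are made up for illustration.

```python
from collections import Counter

LEVELS = ("high", "medium", "low")


def gpa_level(gpa: float) -> str:
    """Generalize a GPA to a range label; the cut-offs are assumed."""
    if gpa >= 3.5:
        return "high"
    if gpa >= 2.5:
        return "medium"
    return "low"


def distribution(gpas):
    """Percentage of a class falling into each generalized level."""
    counts = Counter(gpa_level(g) for g in gpas)
    return {level: 100 * counts[level] / len(gpas) for level in LEVELS}


# Hypothetical sample data for the target and contrasting classes.
graduate = [3.9, 3.6, 3.2, 3.7]
undergraduate = [3.6, 2.8, 2.2, 3.0]

target = distribution(graduate)          # same generalization for both classes
contrast = distribution(undergraduate)
difference = {level: target[level] - contrast[level] for level in LEVELS}

if __name__ == "__main__":
    print(f"{'GPA':<8}{'grad %':>8}{'undergrad %':>13}{'diff':>8}")
    for level in LEVELS:
        print(f"{level:<8}{target[level]:>8.0f}{contrast[level]:>13.0f}{difference[level]:>8.0f}")
```

Because both classes pass through the same `gpa_level` function, the two columns are directly comparable, which is the point of generalizing them synchronously.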

Types of Class Comparisons:

1. Discriminant Analysis (Discriminative Comparison):

 Purpose: Identifies attributes that distinguish one class from another.

 Example:
Comparing high-income vs low-income customers:
o High-income: Spend more on luxury goods
o Low-income: Spend more on basic needs

2. Class Characterization (Descriptive Comparison):

 Purpose: Describes typical features of a class.

 Example:
Characterizing graduate students:
o Age: 22–30
o Study Hours: 20+ per week
o Common Field: Engineering

3. Cluster-Based Comparison:

 Purpose: Compares clusters of similar records to understand group behavior.

 Example:
o Cluster A: Young adults, tech-savvy, high online shopping
o Cluster B: Middle-aged, low online activity

4. Attribute Relevance Analysis:

 Purpose: Identifies which attributes are most useful in class differentiation.

 Example:
In a health dataset, attributes like exercise and diet may be more relevant than hair color in
predicting fitness level.

$$$$$$$$$$$$$$$$$$$$$$$$$$$

ELABORATE ON STATISTICAL MEASURES WITH EXAMPLE.

 Statistical measures in data mining are used to analyze and interpret data, extracting
meaningful insights from large datasets.
 These measures help summarize, describe, and understand data characteristics, enabling
informed decision-making and predictive analytics.

Descriptive Statistics:

Measures of Central Tendency: These describe the center or typical value of a dataset. Common
measures include:
 Mean: The average of all values.
 Median: The middle value when the data is sorted.
 Mode: The most frequent value.

Measures of Dispersion: These describe how spread out the data is. Examples include:
 Range: The difference between the highest and lowest values.
 Standard Deviation: Measures the variability of data around the mean.

Inferential Statistics:
 Hypothesis Testing: Used to make inferences about a population based on a
sample. Techniques include t-tests, chi-square tests, and ANOVA.
 Regression Analysis: Used to predict a dependent variable based on one or more
independent variables.
 Correlation Analysis: Measures the strength and direction of the relationship between
variables.

Related Data Mining Techniques:
 Clustering: Groups similar data points together.
 Classification: Assigns data points to predefined categories.
 Outlier Detection: Identifies unusual data points that deviate significantly from the norm.

Other Statistical Techniques:

 Time Series Analysis: Analyzes data points collected over time to identify trends and
patterns.
 Factor Analysis: Reduces the number of variables by identifying underlying factors.
 Discriminant Analysis: Predicts a categorical outcome based on predictor variables.
 Survival Analysis: Analyzes time-to-event data, such as customer churn or equipment
failure.
These statistical measures and techniques are essential tools for data mining, enabling the extraction
of valuable knowledge from data and supporting data-driven decision-making.

1. Statistical measures help summarize and describe key features of a dataset.

2. Common measures include mean, median, mode, variance, standard deviation, and
correlation.
3. Mean (average): Sum of values divided by count.
Example: Average salary of employees = ₹50,000.
4. Median: Middle value when data is sorted.
Example: Median age in a dataset = 30.
5. Mode: Most frequently occurring value.
Example: Mode of purchase category = "Electronics".
6. Standard Deviation: Measures data spread from the mean.
Example: Low std. dev. in marks = consistent performance.
7. Variance: Square of standard deviation, shows dispersion.
8. Correlation: Shows relationship between two variables.
Example: Strong positive correlation between study hours and grades.
9. These measures help in pattern discovery and data comparison.
10. Useful in preprocessing, data summarization, and decision making.
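The measures listed above can be computed with Python's standard `statistics` module. The sketch uses made-up study hours and grades, treats the data as a whole population (hence `pvariance` and `pstdev`), and computes Pearson correlation by hand so that it runs on older Python versions.

```python
import statistics
from math import sqrt

# Hypothetical sample data.
study_hours = [5, 8, 12, 15, 20]
grades = [55, 60, 70, 78, 90]
categories = ["Electronics", "Books", "Electronics"]

mean_grade = statistics.mean(grades)        # sum of values divided by count
median_grade = statistics.median(grades)    # middle value of the sorted data
mode_category = statistics.mode(categories) # most frequent value
variance = statistics.pvariance(grades)     # population variance
std_dev = statistics.pstdev(grades)         # square root of the variance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariance / spread


correlation = pearson(study_hours, grades)

if __name__ == "__main__":
    print(f"mean={mean_grade} median={median_grade} mode={mode_category}")
    print(f"variance={variance:.2f} std_dev={std_dev:.2f} correlation={correlation:.3f}")
```

With these values the correlation is close to 1, matching the notes' example of a strong positive relationship between study hours and grades.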

$$$$$$$$$$$$$$$$$$$$$$$
