KEMBAR78
Data Mining Methods | PDF | Data Mining | Cluster Analysis
0% found this document useful (0 votes)
40 views18 pages

Data Mining Methods

The document provides an introduction to data mining methods, covering topics such as frequent pattern analysis, classification, clustering, anomaly detection, and trend analysis. It describes the Apriori algorithm for frequent itemset mining and provides an example of how it works by generating candidate itemsets and pruning infrequent ones.

Uploaded by

pppchan23100
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views18 pages

Data Mining Methods

The document provides an introduction to data mining methods, covering topics such as frequent pattern analysis, classification, clustering, anomaly detection, and trend analysis. It describes the Apriori algorithm for frequent itemset mining and provides an example of how it works by generating candidate itemsets and pruning infrequent ones.

Uploaded by

pppchan23100
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Introduction to

Data Mining Methods


Data Mining:
Data Mining Methods
with Dr. Qin Lv
Learning objective: Identify the core
functionalities of data modeling in the
data mining pipeline. Apply the Apriori
algorithm for frequent itemset mining.
Data Mining: Four Views
Application

Knowledge Technique

Data
Application
Data Mining Pipeline
Knowledge
Pattern
evaluation

Data modeling

Data
warehousing
Technique
Data
preprocessing

Data
understanding

Data
Technique View
Ø Frequent pattern analysis
Ø Classification, prediction
Ø Clustering
Ø Anomaly detection
Ø Trend and evolution analysis
Frequent Pattern
Analysis
Ø Frequent itemset
Ø Frequent sequence
Ø Frequent structure
Ø Association rules
Ø Correlation analysis
Classification
Ø Pre-defined
classes
Ø Need training data
Ø Build model to
distinguish classes
Prediction
Ø Numerical prediction
(continuous value)
• E.g., weather
• E.g., stock price
• E.g., traffic
Clustering
Ø No predefined
classes
Ø Intra-cluster
similarity
Ø Inter-cluster
dissimilarity
Anomaly Detection
280

Ø Anomaly/outlier 260 Unusual Time


Series Snippets Level Shifting

• Differ from the “norm” 240

Kelvin
• E.g., error, noise 220

• E.g., fraud 200

• E.g., extreme events 180

160
199 200 200 200 200 200
8-0 0-0 1-0 2-1 4-0 5-0
9-1 1-3 6-1 0-2 3-1 7-2
8 1 5 8 1 4
Date (yyyy-mm-dd)
Trend and Evolution Analysis
Ø Changes over time
• Overall trend
• Periodical patterns
• Anomalies
• E.g.,
Data Mining Methods
Ø Frequent pattern analysis
Ø Classification
Ø Clustering
Ø Outlier analysis
Market Basket Analysis
Tid Items
Ø List of transactions
1 A, B, C, E
• Each Ti contains multiple items
2 A, D, E
Ø (Frequent) itemset
• X = {x1, x2, …, xk} 3 B, C, E

Ø (Minimum) support 4 B, C, D, E
• Probability of Ti containing X 5 B, D, E
Frequent Pattern Mining
Ø Brute force approach (e.g., 100 items)

Ø Closed pattern X: no super-pattern Y ⊃ X


w/ the same support
Ø Max-pattern X: no super-pattern Y ⊃ X
Closed & Max Pattern Example
Ø {<a1, ..., a100>, <a1, ..., a50>} min_sup = 0.5
Ø Frequent pattern? all item combinations
Ø Closed pattern?
• <a1, ..., a100>: 1; <a1, ..., a50>: 2
Ø Max-pattern?
• <a1, ..., a100>: 1
Apriori Algorithm
Ø Apriori pruning: if X is infrequent, then any
of its superset cannot be frequent
Ø Procedure
• Scan dataset to get freq. 1-itemsets
• Generate candidate (k+1)-itemsets from freq. k-itemsets
• Scan dataset to remove infreq. candidate (k+1)-itemsets
• Stop when no more freq. or candidate itemsets
Itemset #
Apriori Algorithm Example {B, C} 3
What about {B, D, E} {B, D} 2
Ø min_sup = 0.6 or {C, D, E}?
{B, E} 4
Tid Items Itemset #
{C, D} 1
1 A, B, C, E {A} 2
{C, E} 3
2 A, D, E {B} 4
{D, E} 3
3 B, C, E {C} 3
4 B, C, D, E {D} 3 Itemset #
5 B, D, E {E} 5 {B, C, E} 3
Important Details
Ø Self-joining of k-itemsets => (k+1)-itemsets
• Only join if their first (k-1) items are the same
Ø Pruning: remove if subset is not frequent
Ø Example: L3 = {abc, abd, acd, ace, bcd}
• abc and abd => abcd and bcd is in L3 => valid candidate
• acd and ace => acde but ade is not in L3 => pruned

You might also like