Lec.01 Introduction To DM

The document outlines a course on Data Mining and Knowledge Discovery, detailing the syllabus, assessment components, and the importance of data mining in various fields. It discusses the explosive growth of data, the necessity for automated analysis, and potential applications in commercial and scientific contexts. Key topics include data mining functionalities, types of data, and the knowledge discovery process.

Uploaded by

khanhndn2005

Course: 505043

Data Mining and Knowledge Discovery

Lecture 1. Introduction to Data Mining

Types of Data

Dr. Anh HOANG

Report
 QT1 (10%): attending classes and discussion
 QT2 (20%): Homework #1-2-3
 Midterm (20%): exam
 Final report (50%): group presentation, individual performance
 Requirements:
   Submit homework, report, … before the deadline
   Presentation:
     1) Understand the problem clearly
     2) Solution/algorithm
     3) Demo code
Contents
 Why data mining?
 What is data mining?
 What types of data can be mined?
 Data mining functionalities/ Tasks
 Interesting patterns
 Classification of data mining systems
 Major issues in data mining

3
Large-scale Data is Everywhere!
 There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
 New mantra
   Gather whatever data you can whenever and wherever possible.
 Expectations
   Gathered data will have value either for the purpose collected or for a purpose not envisioned.

[Figure panels: Cyber Security; E-Commerce; Social Networking: Twitter; Traffic Patterns; Sensor Networks; Computational Simulation]

Introduction to Data Mining, 2nd Edition
Tan, Steinbach, Karpatne, Kumar
Q1. Why Data Mining?
 The explosive growth of data: from terabytes to petabytes
 Data collection and data availability
   Automated data collection tools, database systems, Web, computerized society
 Major sources of abundant data
   Business: Web, e-commerce, transactions, stocks, …
   Science: remote sensing, bioinformatics, scientific simulation, …
   Society and everyone: news, digital cameras, …
 We are drowning in data but starving for knowledge!
 “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets

5
Why Data Mining? Commercial Viewpoint

 Lots of data is being collected and warehoused
   Web data
     • Google has petabytes of web data
     • Facebook has billions of active users
   Purchases at department/grocery stores, e-commerce
     • Amazon handles millions of visits/day
   Bank/credit card transactions
 Computers have become cheaper and more powerful
 Competitive pressure is strong
   Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Data Mining? Scientific Viewpoint
 Data collected and stored at enormous speeds
   Remote sensors on a satellite
     • NASA EOSDIS archives petabytes of earth science data per year
   Telescopes scanning the skies
     • Sky survey data
   High-throughput biological data
   Scientific simulations
     • Terabytes of data generated in a few hours
 Data mining helps scientists
   In automated analysis of massive datasets
   In hypothesis formation

[Figure panels: fMRI Data from Brain; Sky Survey Data; Gene Expression Data; Surface Temperature of Earth]
Great opportunities to improve productivity in all walks of life

Great Opportunities to Solve Society’s Major Problems
 Improving health care and reducing costs
 Predicting the impact of climate change
 Reducing hunger and poverty by increasing agriculture production
 Finding alternative/green energy sources
Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems

10
Why Data Mining?—Potential Applications

 Data analysis and decision support/making

 Market analysis and management

Target marketing, customer relationship management
(CRM), market basket analysis, market segmentation
 Risk analysis and management

Forecasting, customer retention, quality control,
competitive analysis
 Fraud detection and detection of unusual patterns (outliers)

11
Why Data Mining?—Potential Applications

 Other Applications
 Text mining (news group, email, documents) and Web
mining
 Stream data mining
 Bioinformatics and bio-data analysis

12
Market Analysis and Management
 Where does the data come from?
 Credit card transactions, discount coupons, customer
complaint calls

 Target marketing
 Find clusters of “model” customers who share the same
characteristics: interest, income level, spending habits, etc.
 Determine customer purchasing patterns over time

13
Market Analysis and Management
 Cross-market analysis
 Associations/co-relations between product sales, &
prediction based on such association
 Customer profiling
 What types of customers buy what products
 Customer requirement analysis
 Identifying the best products for different customers
 Predict what factors will attract new customers

14
Fraud Detection & Mining Unusual Patterns

 Approaches: Clustering & model construction for frauds, outlier analysis

 Applications: Health care, retail, credit card service, telecom.

 Medical insurance

Professional patients, and ring of doctors

Unnecessary or correlated screening tests
 Telecommunications:

Phone call model: destination of the call, duration, time of day or
week. Analyze patterns that deviate from an expected norm
 Retail industry

Analysts estimate that 38% of retail shrink is due to dishonest
employees

15
Other Applications

 Internet Web Surf-Aid

 IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.
 …

16
Q2. What Is Data Mining?

 Data mining (knowledge discovery from data)

 Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
 Alternative name
 Knowledge discovery in databases (KDD)
 Watch out: Is everything “data mining”?
 Query processing
 Expert systems
 Statistical programs
17
Data Mining: KDD Process
 Data mining: core of the knowledge discovery process

[Figure: KDD process flow. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Steps of a KDD Process

 Learning the application domain

 Relevant prior knowledge and goals of application
 Creating a target data set: data selection
 Data cleaning and preprocessing: (may take 60% - 80% of effort!)
 Data reduction and transformation
 Find useful features, dimensionality/variable reduction.
 Choosing functions of data mining
 Summarization, classification, regression, association, clustering.
 Choosing the mining algorithm(s)
 Data mining: search for patterns of interest
 Pattern evaluation and knowledge presentation
 Visualization, transformation, removing redundant patterns, etc.
 Use of discovered knowledge
 …
19
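The steps listed above can be sketched as a chain of small functions. This is only an illustrative toy, not a method taken from the lecture: the records, the attribute names, the 100.0 threshold and the minimum count of 2 are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, age, city, purchase_amount).
RAW_RECORDS = [
    (1, 34, "Hanoi", 120.0),
    (2, None, "Hue", 80.0),       # missing age
    (3, 29, "Hanoi", 150.0),
    (3, 29, "Hanoi", 150.0),      # exact duplicate of the previous record
    (4, 41, "Da Nang", None),     # missing purchase amount
    (5, 52, "Hue", 300.0),
]

def clean(records):
    """Data cleaning: drop exact duplicates and records with missing values."""
    seen = set()
    cleaned = []
    for record in records:
        if record in seen or any(value is None for value in record):
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

def select(records):
    """Data selection: keep only the task-relevant attributes (city, amount)."""
    return [(city, amount) for _id, _age, city, amount in records]

def transform(records, threshold=100.0):
    """Transformation: discretize the amount into 'high' or 'low' (assumed threshold)."""
    return [(city, "high" if amount >= threshold else "low") for city, amount in records]

def mine(records):
    """Data mining: count how often each (city, level) pattern occurs."""
    return Counter(records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep patterns seen at least min_count times."""
    return {pattern: n for pattern, n in patterns.items() if n >= min_count}

def kdd_pipeline(records):
    """Run the steps in the order given on the slide."""
    return evaluate(mine(transform(select(clean(records)))))

if __name__ == "__main__":
    print(kdd_pipeline(RAW_RECORDS))
```

Each stage only consumes the output of the previous one, which is why cleaning problems left unfixed tend to surface later as misleading patterns.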
Architecture: Typical Data Mining System

[Figure: layered architecture, top to bottom. Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning & data integration, Filtering; Databases and Data Warehouse. A Knowledge base sits alongside the layers.]
What is Data Mining?
 Many Definitions
 Non-trivial extraction of implicit, previously unknown and
potentially useful information from data
 Exploration & analysis, by automatic or semi-automatic
means, of large quantities of data in order to discover
meaningful patterns

Origins of Data Mining
 Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
 Traditional techniques may be unsuitable due to data that is
   Large-scale
   High dimensional
   Heterogeneous
   Complex
   Distributed
 A key component of the emerging field of data science and data-driven discovery
Q3. What types of data can be mined?
 Database data (RDBMS)
 Data warehouse
 Transactional data
 Other types of data:

Sequence data, data streams (cont.), spatial data (maps), engineering
design data, hypertext, multimedia, web data, etc.

 Advanced database and information repository

 Spatial and temporal data
 Time-series data
 Stream data
 Multimedia database
 Text databases & WWW
23
Database data (RDBMS): relational -> tables
 RDBMS
   Set of tables, each with rows (tuples) and columns (attributes)
   When mining databases, we can search for trends or data patterns
 Example:
   Analysing customer data to predict the credit risk of new customers (based on previous data)
   Analysing sales data (any deviations)

Data warehouse
 Collection of data integrated from different sources, supporting querying and decision making on the data
 In a data warehouse, data is stored in a multidimensional structure (data cube) where each dimension corresponds to an attribute

[Figure: Data Source-1, Data Source-2, Data Source-3 -> Data Warehouse -> Querying/Analysis -> Client-1, Client-2]
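To make the data cube idea concrete, here is a minimal sketch of aggregating a measure along chosen dimensions (a "roll-up"). The sales facts, the dimension names and the `roll_up` helper are hypothetical and only serve to illustrate the structure; a real warehouse would use an OLAP engine rather than a Python list.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, units_sold).
FACTS = [
    ("North", "Milk", "Q1", 10),
    ("North", "Milk", "Q2", 12),
    ("North", "Bread", "Q1", 7),
    ("South", "Milk", "Q1", 5),
    ("South", "Bread", "Q2", 9),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, keep):
    """Sum units_sold over every dimension that is not listed in `keep`."""
    indexes = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[3]  # units_sold is the measure
    return dict(totals)

if __name__ == "__main__":
    print(roll_up(FACTS, ("region",)))             # one cell per region
    print(roll_up(FACTS, ("product", "quarter")))  # a 2-D slice of the cube
    print(roll_up(FACTS, ()))                      # grand total
```

Each call corresponds to one view of the cube: keeping fewer dimensions gives a coarser summary of the same facts.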
Transactional data
 Each record is called a transaction
   sales,
   flight booking,
   user clicks on a web page
 A transaction has a transaction ID and a list of the items that make up the transaction
 From a transaction database, we can mine frequent patterns

 Other types of data:


Sequence data, data streams (cont.), spatial data (maps),
engineering design data, hypertext, multimedia, web data, etc.
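As a rough illustration of mining frequent patterns from a transaction database, the sketch below counts every itemset of a given size and keeps those that reach a minimum count. The transactions are invented for the example, and the brute-force enumeration is an assumption chosen for brevity; it is not Apriori or any other specific algorithm from the course.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction database: transaction ID -> set of items.
TRANSACTIONS = {
    "T1": {"bread", "milk"},
    "T2": {"bread", "diaper", "beer"},
    "T3": {"milk", "diaper", "beer"},
    "T4": {"bread", "milk", "diaper"},
}

def frequent_itemsets(transactions, size, min_count):
    """Return itemsets of `size` items that occur in at least `min_count` transactions."""
    counts = Counter()
    for items in transactions.values():
        # Sorting gives each itemset a single canonical tuple representation.
        for itemset in combinations(sorted(items), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    print(frequent_itemsets(TRANSACTIONS, size=1, min_count=3))
    print(frequent_itemsets(TRANSACTIONS, size=2, min_count=2))
```

Enumerating all combinations grows quickly with transaction length, which is the practical reason dedicated frequent-pattern algorithms prune the search.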
Q4. Data Mining Functionalities
 Class/concept description: data is always associated with classes or concepts
   Data characterisation:
     • Summarises the general features of a class/concept
     • Output -> general overview
   Data discrimination:
     • Compares the general features of different classes
     • Output -> bar charts, curves, etc.
 Mining frequent patterns, associations, and correlations
   Frequent patterns:
     • Things that occur most commonly in the data
     • Frequent itemsets (data items/data objects)
     • Frequent subsequences
     • Frequent substructures
   Association analysis (relationships):
     • A way of identifying the relationships between items
     • Example: used to determine which items are frequently purchased together
Q4. Data Mining Functionalities
 Correlation analysis:
   Mathematical technique
   Shows how strongly a pair of attributes is related
   Example: taller people tend to weigh more
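One common way to quantify how strongly two numeric attributes are related is the Pearson correlation coefficient; the slide does not name a specific measure, so this choice is an assumption. The height and weight values below are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return covariance / (spread_x * spread_y)

if __name__ == "__main__":
    heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
    weights_kg = [50, 58, 65, 77, 85]        # hypothetical data
    print(round(pearson(heights_cm, weights_kg), 3))
```

A value near +1 means the attributes rise together, near -1 means one falls as the other rises, and near 0 means no linear relationship.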

 Classification and regression for predictive analysis
   Classification:
     • Process of finding a model that distinguishes data items by class
     • A decision tree can be used for classification
   Regression:
     • Statistical methodology used for numeric prediction of missing or unavailable values, based on previous data
Q4. Data Mining Functionalities
 Cluster analysis (grouping)
   Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
   Maximizing intra-class similarity & minimizing inter-class similarity
 Outlier analysis
   Outlier: a data object that does not comply with the general behavior of the data
   Useful in fraud detection, rare events analysis
 Trend and evolution analysis
   Trend and deviation: regression analysis
   Sequential pattern mining, periodicity analysis
Data Mining Tasks …

[Figure: the data table below surrounded by four task labels: Clustering, Predictive Modeling, Anomaly Detection, Association Rules]

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes
Predictive Modeling: Classification
 Find a model for the class attribute as a function of the values of other attributes

Tid  Employed  Level of Education  # years at present address  Credit Worthy (class)
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …

Model for predicting credit worthiness (decision tree):
Employed
  No  -> No
  Yes -> Education
    Graduate -> Number of years: > 3 yr -> Yes, < 3 yr -> No
    { High school, Undergrad } -> Number of years: > 7 yrs -> Yes, < 7 yrs -> No
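The decision tree on this slide can be written out as nested conditions, which makes it easy to check against the training rows. This is a hand-coded reading of the figure, not a learned model. The slide does not say what happens at exactly 3 or 7 years, so sending those boundary cases to "No" is an assumption of this sketch.

```python
def credit_worthy(employed, education, years_at_address):
    """Follow the slide's tree: Employed -> Education -> Number of years."""
    if employed == "No":
        return "No"
    if education == "Graduate":
        # Assumption: exactly 3 years falls on the "No" side.
        return "Yes" if years_at_address > 3 else "No"
    # High School or Undergrad branch.
    # Assumption: exactly 7 years falls on the "No" side.
    return "Yes" if years_at_address > 7 else "No"

# Training rows from the slide: (employed, education, years, credit_worthy).
TRAINING = [
    ("Yes", "Graduate", 5, "Yes"),
    ("Yes", "High School", 2, "No"),
    ("No", "Undergrad", 1, "No"),
    ("Yes", "High School", 10, "Yes"),
]

if __name__ == "__main__":
    for employed, education, years, label in TRAINING:
        predicted = credit_worthy(employed, education, years)
        print(employed, education, years, "->", predicted, "(expected", label + ")")
```

On the four visible training rows the tree reproduces every class label, which is the minimum one would expect from a model fitted to them.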
Classification Example

Attribute types: Employed (categorical), Level of Education (categorical), # years at present address (quantitative), Credit Worthy (class)

Training set:
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …

Test set:
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Undergrad           7                           ?
2    No        Graduate            3                           ?
3    Yes       High School         2                           ?
…    …         …                   …                           …

[Figure: Training Set -> Learn Classifier -> Model; the Model is then applied to the Test Set]
Examples of Classification Task

 Classifying credit card transactions as legitimate or fraudulent
 Classifying land covers (water bodies, urban areas, forests, etc.) using satellite data
 Categorizing news stories as finance, weather, entertainment, sports, etc.
 Identifying intruders in cyberspace
 Predicting tumor cells as benign or malignant
 Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
Classification: Application 1
 Fraud Detection
   Goal: Predict fraudulent cases in credit card transactions.
   Approach:
     • Use credit card transactions and the information on the account holder as attributes.
       When does a customer buy, what does he buy, how often does he pay on time, etc.
     • Label past transactions as fraud or fair transactions. This forms the class attribute.
     • Learn a model for the class of the transactions.
     • Use this model to detect fraud by observing credit card transactions on an account.
Classification: Application 2
 Churn prediction for telephone customers
   Goal: To predict whether a customer is likely to be lost to a competitor.
   Approach:
     • Use detailed records of transactions with each of the past and present customers to find attributes.
       How often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
     • Label the customers as loyal or disloyal.
     • Find a model for loyalty.

From [Berry & Linoff] Data Mining Techniques, 1997
Classification: Application 3
 Sky Survey Cataloging
   Goal: To predict class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory).
     • 3000 images with 23,040 x 23,040 pixels per image.
   Approach:
     • Segment the image.
     • Measure image attributes (features): 40 of them per object.
     • Model the class based on these features.
     • Success story: could find 16 new high red-shift quasars, some of the farthest objects that are difficult to find!

From [Fayyad, et.al.] Advances in Knowledge Discovery and Data Mining, 1996
Classifying Galaxies
Courtesy: http://aps.umn.edu

Class:
• Stages of formation: Early, Intermediate, Late

Attributes:
• Image features,
• Characteristics of light waves received, etc.

Data Size:
• 72 million stars, 20 million galaxies
• Object Catalog: 9 GB
• Image Database: 150 GB
Regression
 Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
 Extensively studied in statistics, neural network fields.
 Examples:
   Predicting sales amounts of new product based on advertising expenditure.
   Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
   Time series prediction of stock market indices.
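For the linear case with a single predictor, the model can be fitted with ordinary least squares. The sketch below is one minimal way to do that; the advertising and sales figures are hypothetical and were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept

if __name__ == "__main__":
    advertising = [1, 2, 3, 4]   # hypothetical expenditure
    sales = [3, 5, 7, 9]         # hypothetical sales amounts
    model = fit_line(advertising, sales)
    print(model, predict(model, 5))
```

Real data will not fall exactly on a line, in which case the fitted line minimizes the sum of squared vertical errors.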

Clustering
 Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]
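K-means is one widely used algorithm that pursues this goal by repeatedly assigning points to the nearest centroid and recomputing the centroids. The sketch below is a minimal version under stated assumptions: the points are invented, the initial centroids are passed in explicitly so the run is deterministic, and squared Euclidean distance is used.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, initial_centroids, max_iterations=20):
    """Return (centroids, clusters) after the assignments stop changing."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # hypothetical points
    centers, groups = kmeans(data, initial_centroids=[(1, 1), (8, 8)])
    print(centers)
    print(groups)
```

The result depends on the initial centroids, so practical implementations usually try several starting points and keep the best clustering.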

Applications of Cluster Analysis
 Understanding
   Custom profiling for targeted marketing
   Group related documents for browsing
   Group genes and proteins that have similar functionality
   Group stocks with similar price fluctuations
 Summarization
   Reduce the size of large data sets

Courtesy: Michael Eisen

[Figure: Clusters for Raw SST and Raw NPP, plotted by longitude (-180 to 180) and latitude; legend: Land Cluster 1, Land Cluster 2, Sea Cluster 1, Sea Cluster 2, Ice or No NPP]
Use of K-means to partition Sea Surface Temperature (SST) and Net Primary Production (NPP) into clusters that reflect the Northern and Southern Hemispheres.
Clustering: Application 1
 Market Segmentation:
   Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
   Approach:
     • Collect different attributes of customers based on their geographical and lifestyle related information.
     • Find clusters of similar customers.
     • Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.
Clustering: Application 2
 Document Clustering:

Goal: To find groups of documents that are similar to each
other based on the important terms appearing in them.

Approach: To identify frequently occurring terms in each
document. Form a similarity measure based on the
frequencies of different terms. Use it to cluster.

Enron email dataset
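A similarity measure based on term frequencies can be sketched with cosine similarity, which is a common choice but not one the slide names explicitly. The three short documents are invented, and splitting on whitespace is a simplifying assumption; real pipelines usually also remove stop words and weight terms.

```python
from collections import Counter
from math import sqrt

def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(count * freq_b[term] for term, count in freq_a.items() if term in freq_b)
    norm_a = sqrt(sum(count * count for count in freq_a.values()))
    norm_b = sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    # Hypothetical documents.
    doc1 = term_frequencies("stock market prices fall")
    doc2 = term_frequencies("stock prices rise on market news")
    doc3 = term_frequencies("football match ends in draw")
    print(round(cosine_similarity(doc1, doc2), 3))
    print(round(cosine_similarity(doc1, doc3), 3))
```

Pairwise similarities like these can then be fed to a clustering algorithm so that documents sharing many terms end up in the same group.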

Association Rule Discovery: Definition
 Given a set of records, each of which contains some number of items from a given collection
   Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
  {Milk} --> {Coke}
  {Diaper, Milk} --> {Beer}
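The two rules on this slide can be checked against the five transactions by computing support and confidence, the usual objective measures for association rules. The function names below are illustrative choices, not taken from the lecture.

```python
# The five transactions from the slide's table.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing both sides of the rule."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also has the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count(antecedent | consequent, transactions) / base

if __name__ == "__main__":
    rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
    for lhs, rhs in rules:
        print(sorted(lhs), "-->", sorted(rhs),
              "support", support(lhs, rhs, TRANSACTIONS),
              "confidence", round(confidence(lhs, rhs, TRANSACTIONS), 3))
```

For {Milk} --> {Coke}, three of the five transactions contain both items and four contain Milk, so the support is 3/5 and the confidence is 3/4.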

Association Analysis: Applications
 Market-basket analysis
   Rules are used for sales promotion, shelf management, and inventory management
 Telecommunication alarm diagnosis
   Rules are used to find combinations of alarms that occur together frequently in the same time period
 Medical informatics
   Rules are used to find combinations of patient symptoms and test results associated with certain diseases
Association Analysis: Applications
 An example subspace differential co-expression pattern from lung cancer datasets
   Three lung cancer datasets: [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
   Enriched with the TNF/NF-κB signaling pathway, which is well known to be related to lung cancer
   P-value: 1.4 × 10^-5 (6/10 overlap with the pathway)

[Fang et al., PSB 2010]
Deviation/Anomaly/Change Detection
 Detect significant deviations from normal behavior
 Applications:
   Credit card fraud detection
   Network intrusion detection
   Identify anomalous behavior from sensor networks for monitoring and surveillance.
   Detecting changes in the global forest cover.
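A very simple way to flag deviations from normal behavior is to measure how many standard deviations each value lies from the mean (a z-score). This is only one of many possible techniques and is not prescribed by the slide; the readings and the threshold of 2.0 are assumptions made for the example.

```python
from math import sqrt

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]

if __name__ == "__main__":
    readings = [50, 52, 49, 51, 48, 50, 120]  # hypothetical sensor readings
    print(zscore_outliers(readings))
```

Because an extreme value also inflates the mean and standard deviation, this test can miss outliers in small samples; robust alternatives based on the median are often preferred in practice.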

Q5. Are All the “Discovered” Patterns Interesting?

 Data mining may generate thousands of patterns: Not all of them are
interesting
 Suggested approach: Human-centered, query-based, focused mining
 Interestingness measures
 A pattern is interesting if it is easily understood by humans, valid on new or test
data with some degree of certainty, potentially useful, novel, or validates some
hypothesis that a user seeks to confirm
 Objective vs. subjective interestingness measures
 Objective: based on statistics and structures of patterns, e.g., support,
confidence, etc.
 Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty.

47
Q6. Data Mining: Classification Schemes

 Different views, different classifications

 Kinds of data to be mined
 Kinds of knowledge to be discovered
 Kinds of techniques utilized
 Kinds of applications adapted

48
Multi-Dimensional View of Data Mining
 Data to be mined
 Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, WWW

 Knowledge to be mined
 Characterization, discrimination, association, classification,
clustering, trend/deviation, outlier analysis, etc.
 Multiple/integrated functions and mining at multiple levels

49
Multi-Dimensional View of Data Mining
 Techniques utilized
 Database-oriented, data warehouse (OLAP), machine
learning, statistics, visualization, etc.

 Applications adapted
 Retail, telecommunication, banking, fraud analysis, bio-data
mining, stock market analysis, Web mining, etc.

50
OLAP Mining: Integration of Data Mining and Data Warehousing

 Coupling of data mining systems, DBMS, and data warehouse systems
 On-line analytical mining of data
   Integration of mining and OLAP technologies
 Interactive mining of multi-level knowledge
   Necessity of mining knowledge and patterns at different levels of abstraction.
 Integration of multiple mining functions
   Characterized classification, first clustering and then association

51
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Database Systems, Statistics, Machine Learning, Visualization, Algorithm, and Other Disciplines]
Q7. Major Issues in Data Mining
 Mining methodology
 Mining different kinds of knowledge from diverse data
types, e.g., bio, stream, Web
 Performance: efficiency, effectiveness, and scalability
 Pattern evaluation: the interestingness problem
 Incorporation of background knowledge
 Handling noise and incomplete data
 Parallel, distributed and incremental mining methods
 Integration of the discovered knowledge with existing knowledge: knowledge fusion

53
Q7. Major Issues in Data Mining
 User interaction
 Data mining query languages and ad-hoc mining
 Expression and visualization of data mining results
 Interactive mining of knowledge at multiple levels of
abstraction

 Applications and social impacts

 Domain-specific data mining & invisible data mining
 Protection of data security, integrity, and privacy

54
Summary
 Data mining: discovering interesting patterns from large amounts of data
 A natural evolution of database technology, in great demand, with wide
applications
 A KDD process includes data cleaning, data integration, data selection,
transformation, data mining, pattern evaluation, and knowledge presentation
 Mining can be performed in a variety of information repositories
 Data mining functionalities: characterization, discrimination, association,
classification, clustering, outlier and trend analysis, etc.
 Data mining systems and architectures
 Major issues in data mining

55
Where to Find References?
 More conferences on data mining
 PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
 Data mining and KDD
 Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
 Journal: Data Mining and Knowledge Discovery, KDD Explorations
 Database systems
 Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
 Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
 AI & Machine Learning
 Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), etc.
 Journals: Machine Learning, Artificial Intelligence, etc.
 Statistics
 Conferences: Joint Stat. Meeting, etc.
 Journals: Annals of statistics, etc.
 Visualization
 Conference proceedings: CHI, ACM-SIGGraph, etc.
 Journals: IEEE Trans. visualization and computer graphics, etc.
