Data Mining Introduction

The document serves as an introduction to Data Mining, outlining its importance, methodologies, and applications. It covers the course structure, evaluation criteria, class rules, and recommended texts for students. Additionally, it discusses the evolution of database technology, the KDD process, and various functionalities of data mining such as classification, clustering, and outlier analysis.

Uploaded by

Obaid Amir

Data Mining
An Introduction

Instructor: Qurat-ul-Ain
quratulain.ssc@stmu.edu.pk

WELCOME TO THIS LOVELY AND JOYFUL SUBJECT

Recommended Text
 “Data Mining: Concepts and Techniques”, Second Edition and above, by Jiawei Han
 “Mining of Massive Datasets”, 3rd edition, by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
 “Data Science and Big Data Analytics”, by EMC Education Services
 Instructor’s Notes
 Lecture slides & Notes
Student’s Performance Evaluation
Credit hours: 3
Prerequisite: Probability and Statistics
Quizzes: 10%
Assignment: 15%
Mid-term: 20%
Class Participation: 5%
Final-term: 50%
Grading Policy
 No makeup for any of the evaluation activities.
 Regular project-related assignments.
 Strict submission deadlines.
 Late submissions lose 15% of the marks per late day. No submissions are accepted more than 3 days after the due date.
 Strict penalty for any copied/plagiarized material.
 An individual/group may be assigned a straightforward 0 if the submitted assessed work (lab work, assignment or quiz) is copied from another individual/group or from any other source (books, research papers, web sites).
 An individual/group may be penalized, by deducting marks from the assessed work, if a substantial amount of the submitted work falls under plagiarism.
Class Rules [1/2]
 No visitors are allowed
 Be Punctual
 Late comers are not allowed
 75% attendance is compulsory
 Be Attentive
 Be Prompt
 Ready to learn
 Class participation
 Surprise quizzes
 Be Polite
 Soft-spoken
Class Rules [2/2]
 Be Honest
 With yourself
 Credit others
 No cheating
 No wastage of time
 Be Responsible
 SWITCH OFF your phone
 Penalty: treat for the whole class

 Ask questions

Get Connected
 Contacts
 Quratulain.ssc@stmu.edu.pk
 Link for study resources
 Google Drive: https://drive.google.com/drive/u/1/folders/1R8MOSt6MBC7Ke1-3GVF7zGSI4iyfpha7
Why Data Mining?
 The Explosive Growth of Data: from terabytes to petabytes
 Data collection and data availability
 Automated data collection tools, database systems, Web, computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, …
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems
What Is Data Mining?
 Alternative name
 Knowledge discovery in databases (KDD)
 Watch out: Is everything “data mining”? These, on their own, are not:
 Query processing
 Expert systems or statistical programs
 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
What Is Data Mining?
Let’s start data mining with an interesting statement.

 The statement, given by Donald Rumsfeld, Defense Secretary of the USA, in an interview, is as follows.

 As we know, there are known knowns. There are things we know that we know, like your name and your parents’ names. We also know there are known unknowns.

 That is to say, we know that there are some things we do not know, like what someone is thinking about you, what you will eat six days from now, what the result of a lottery will be, and so on.

 But there are also unknown unknowns, the ones we don't know that we don't know. Is it beneficial to know them? Or is it harmful not to know them?
What Is Data Mining?
There are also unknown knowns: things we'd like to know but don't know, but we know someone who can doctor them and pass them off as known knowns. To associate Rumsfeld’s quotation above with data mining, we identify four core phrases:
1. Known knowns
2. Known unknowns
3. Unknown unknowns
4. Unknown knowns
 Items 1, 2, and 4 deal with “knowns”. Data mining has relevance to the third point.
 It is the art of digging out exactly what we don’t know that we must know in our business.
 The methodology is to first convert “unknown unknowns” into “known unknowns” and then finally into “known knowns”.
What is Data Mining?: Slightly Informal

Tell me something that I should know. When you don’t know what you should be knowing, how do you write SQL?

You can’t!

“Tell me something that I should know” means asking your DWH or data repository to tell you something that you don’t know, or should know. Since we don’t know what we actually don’t know, or what we must know, we can’t write SQL queries to get answers as we do in OLTP systems.

Data mining is an exploratory approach: browsing through data using data mining techniques may reveal something of interest to the user that was previously unknown. Hence, in data mining we don’t know the results in advance.
Why Data Mining?—Potential Applications
 Data analysis and decision support
 Market analysis and management
 Target marketing, customer relationship management (CRM),
market basket analysis, market segmentation
 Risk analysis and management
 Forecasting, customer retention, quality control, competitive
analysis
 Fraud detection and detection of unusual patterns (outliers)
 Other Applications
 Text mining (news group, email, documents) and Web mining
 Stream data mining
 Bioinformatics and bio-data analysis
Market Analysis and Management
 Where does the data come from?
 Credit card transactions, discount coupons, customer complaint calls
 Target marketing
 Find clusters of “model” customers who share the same
characteristics: interest, income level, spending habits, etc.
 Determine customer purchasing patterns over time
 Cross-market analysis
 Associations/correlations between product sales, and prediction based on such associations
 Customer profiling
 What types of customers buy what products

 Customer requirement analysis

 Identifying the best products for different customers

 Predict what factors will attract new customers

Fraud Detection & Mining Unusual Patterns
 Approaches: Clustering & model construction for frauds, outlier analysis
 Applications: Health care, retail, credit card service, telecomm.
 Medical insurance
 Professional patients, and ring of doctors
 Unnecessary or correlated screening tests
 Telecommunications:
 Phone call model: destination of the call, duration, time
of day or week. Analyze patterns that deviate from an
expected norm
 Retail industry
 Analysts estimate that 38% of retail shrink is due to
dishonest employees
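
The telecommunications bullet above describes flagging calls that deviate from an expected norm. A minimal sketch of that idea follows, assuming the only feature is call duration in minutes and that a simple z-score rule is good enough. The sample values, the function name `find_outliers`, and the threshold of 2.0 are illustrative assumptions, not part of the course material.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical call durations in minutes; one call is unusually long.
call_durations = [3, 4, 5, 4, 3, 5, 4, 6, 4, 95]
print(find_outliers(call_durations))
```

A real fraud model would combine several features (destination, time of day, day of week) rather than a single one; the z-score rule is only the simplest possible notion of "deviation from the norm".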
Other Applications
 Internet Web Surf-Aid
 IBM Surf-Aid applies data mining algorithms to Web
access logs for market-related pages to discover
customer preference and behavior pages, analyzing
effectiveness of Web marketing, improving Web site
organization, etc.
Data Mining: A KDD Process
 Data mining: the core of the knowledge discovery process
[Figure: KDD pipeline, from source to result: Databases, Data Cleaning and Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]
Steps of a KDD Process
 Learning the application domain
 Relevant prior knowledge and goals of application
 Creating a target data set: data selection
 Data cleaning and preprocessing: (may take 60% of effort!)
 Data reduction and transformation
 Find useful features, dimensionality/variable reduction.
 Choosing functions of data mining
 Summarization, classification, regression, association, clustering.
 Choosing the mining algorithm(s)
 Data mining: search for patterns of interest
 Pattern evaluation and knowledge presentation
 Visualization, transformation, removing redundant patterns, etc.
 Use of discovered knowledge
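
The steps listed above can be read as a chain of small functions, each feeding the next. The sketch below is a toy illustration under stated assumptions: the records, the function names, and the choice of simple counting as the "mining" step are all invented for this example and are not a prescribed implementation.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, product, amount).
# None marks a missing amount.
raw_records = [
    (1, "bread", 2.5),
    (1, "milk", 1.2),
    (2, "bread", None),   # noisy record: missing amount
    (2, "milk", 1.1),
    (3, "bread", 2.4),
    (3, "butter", 3.0),
]

def select(records, products):
    """Data selection: keep only the task-relevant products."""
    return [r for r in records if r[1] in products]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r[2] is not None]

def transform(records):
    """Data reduction/transformation: keep only the product feature."""
    return [product for _, product, _ in records]

def mine(products):
    """Data mining (here: summarization): count purchases per product."""
    return Counter(products)

def evaluate(counts, min_count):
    """Pattern evaluation: keep patterns seen at least min_count times."""
    return {p: c for p, c in counts.items() if c >= min_count}

selected = select(raw_records, {"bread", "milk"})
patterns = evaluate(mine(transform(clean(selected))), min_count=2)
print(patterns)
```

Keeping each step as its own function mirrors the slide: any stage (for example the cleaning rule) can be swapped without touching the others.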
Architecture: Typical Data Mining System
[Figure: layered architecture, top to bottom: Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning & data integration, Filtering; Databases and Data Warehouse. A Knowledge base component is also shown.]
Claude Shannon's Info. Theory
More Volume
 Data mining evolved as a mechanism to address the limitations of OLTP systems in dealing with massive data sets with high dimensionality, new data types, multiple heterogeneous data resources, etc.

 Conventional systems couldn’t keep pace with ever-changing and ever-growing data sets.

 Data mining algorithms are built to deal with high-dimensional data, new data types (images, video, etc.), complex associations among data items, distributed data sources, and associated issues (security, etc.)
How Is Data Mining Different?

 Traditional Database (Transactions): Querying data in well-defined processes. Reliable storage
Data Mining: On What Kinds of Data?

 Relational database
 Data warehouse
 Transactional database
 Advanced database and information repository
 Spatial and temporal data
 Time-series data
 Stream data
 Multimedia database
 Text databases & WWW
Data Mining Functionalities
 Concept description: Characterization and
discrimination
 Generalize, summarize, and contrast data characteristics

 Association (correlation and causality)

 Diaper → Beer [0.5%, 75%], i.e. [support, confidence]

 Classification and Prediction

 Construct models (functions) that describe and
distinguish classes or concepts for future prediction
 Presentation: decision-tree, classification rule, neural
network
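
The two numbers attached to the Diaper/Beer rule above are, in the usual textbook notation, the rule's support and confidence. A small sketch of how they could be computed follows; the baskets are invented for illustration (so the results differ from the 0.5% and 75% on the slide), and the function names are assumptions of this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]

# Rule: diaper -> beer
print(support(baskets, {"diaper", "beer"}))       # 3 of the 5 baskets
print(confidence(baskets, {"diaper"}, {"beer"}))  # 3 of the 4 diaper baskets
```

Support says how often the whole rule occurs in the data; confidence says how often the consequent appears when the antecedent does.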
Data Mining Functionalities
 Cluster analysis
 Class label is unknown: Group data to form new classes,
e.g., cluster houses to find distribution patterns
 Maximizing intra-class similarity & minimizing interclass
similarity
 Outlier analysis
 Outlier: a data object that does not comply with the
general behavior of the data
 Useful in fraud detection, rare events analysis
 Trend and evolution analysis
 Trend and deviation: regression analysis
 Sequential pattern mining, periodicity analysis
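
To make the cluster-analysis bullet concrete, here is a minimal k-means-style sketch on one-dimensional data. It assumes the number of clusters and the starting centroids are given; the house prices, the function name `kmeans_1d`, and the starting centroids are hypothetical, and k-means is only one of many clustering techniques.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd-style k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, clusters

# Hypothetical house prices (in thousands); starting centroids are arbitrary.
prices = [100, 110, 120, 300, 310, 320]
centroids, clusters = kmeans_1d(prices, centroids=[100, 320])
print(centroids)
print(clusters)
```

No class labels are supplied: the two groups emerge only from the distances between points, which is the sense in which the class label is unknown.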
Are All the “Discovered” Patterns Interesting?

 Data mining may generate thousands of patterns: not all of them are interesting
 Suggested approach: Human-centered, query-based, focused mining

 Interestingness measures
 A pattern is interesting if it is easily understood by humans, valid on
new or test data with some degree of certainty, potentially useful,
novel, or validates some hypothesis that a user seeks to confirm

 Objective vs. subjective interestingness measures

 Objective: based on statistics and structures of patterns, e.g.,
support, confidence, etc.
 Subjective: based on user’s belief in the data, e.g., unexpectedness,
novelty.
Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Systems, Statistics, Machine Learning, Visualization, Algorithm, and Other Disciplines]
Data Mining: Classification Schemes

 Different views, different classifications

 Kinds of data to be mined

 Kinds of knowledge to be discovered

 Kinds of techniques utilized

 Kinds of applications adapted

Multi-Dimensional View of Data Mining
 Data to be mined
 Relational, data warehouse, transactional, stream,
object-oriented/relational, active, spatial, time-series, text, multi-media,
heterogeneous, WWW

 Knowledge to be mined
 Characterization, discrimination, association, classification, clustering,
trend/deviation, outlier analysis, etc.
 Multiple/integrated functions and mining at multiple levels
Multi-Dimensional View of Data Mining
 Techniques utilized
 Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, etc.

 Applications adapted
 Retail, telecommunication, banking, fraud analysis, bio-data mining,
stock market analysis, Web mining, etc.
OLAP Mining: Integration of Data Mining and Data Warehousing

 Data mining systems, DBMS, Data warehouse systems coupling
 On-line analytical mining data
 Integration of mining and OLAP technologies

 Interactive mining multi-level knowledge

 Necessity of mining knowledge and patterns at different
levels of abstraction.

 Integration of multiple mining functions

 Characterized classification, first clustering and then
association
Data Mining is…
Data Mining
 A neural network is a series of algorithms that endeavors to
recognize underlying relationships in a set of data through a
process that mimics the way the human brain operates. In this
sense, neural networks refer to systems of neurons, either
organic or artificial in nature.

 Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data.
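
As a small illustration of extracting rules from observations, the sketch below follows the idea of the 1R ("one rule") approach: for each attribute, build one rule per attribute value that predicts the majority class, then keep the attribute whose rules make the fewest errors. The observations and the function name `one_r` are invented for this example, and 1R is only one simple rule-induction method.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Pick the attribute whose value -> majority-class rules err least."""
    best_attr, best_rules, best_errors = None, None, None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per value: predict the most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: rows that do not belong to the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = attr, rules, errors
    return best_attr, best_rules, best_errors

# Hypothetical observations, invented for illustration.
observations = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
]

attr, rules, errors = one_r(observations, target="play")
print(attr, rules, errors)
```

The induced rules read directly as "if outlook is sunny then play, if outlook is rainy then do not play", which is the kind of local pattern the paragraph above refers to.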
Major Issues in Data Mining
 Mining methodology
 Mining different kinds of knowledge from diverse data types, e.g., bio,
stream, Web
 Performance: efficiency, effectiveness, and scalability
 Pattern evaluation: the interestingness problem
 Incorporation of background knowledge
 Handling noise and incomplete data
 Parallel, distributed and incremental mining methods
 Integration of the discovered knowledge with existing one: knowledge
fusion
Major Issues in Data Mining
 User interaction
 Data mining query languages and ad-hoc mining
 Expression and visualization of data mining results
 Interactive mining of knowledge at multiple levels of abstraction
 Applications and social impacts
 Domain-specific data mining & invisible data mining
 Protection of data security, integrity, and privacy
Summary
 Data mining: Discovering interesting patterns from large amounts of data
 A natural evolution of database technology, in great demand, with
wide applications
 A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
 Mining can be performed in a variety of information repositories
 Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis,
etc.
 Data mining systems and architectures
 Major issues in data mining
Tools Used for Data Mining
 Data Mining Tools
 Weka, RapidMiner, Minitab, etc.
 Data Warehouses
 A subject-oriented, integrated, time-variant, and non-volatile collection of data
 Developed in support of management’s decision-making process
 Benefits of DWH [high returns on investment, substantial competitive advantage, increased productivity of corporate decision-makers]
 Python / R language
Where to Find References?
 More conferences on data mining
 PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.

 Data mining and KDD

 Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.

 Journal: Data Mining and Knowledge Discovery, KDD Explorations

 Database systems
 Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA

 Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.

 AI & Machine Learning

 Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), etc.

 Journals: Machine Learning, Artificial Intelligence, etc.

 Statistics
 Conferences: Joint Stat. Meeting, etc.

 Journals: Annals of statistics, etc.

 Visualization
 Conference proceedings: CHI, ACM-SIGGraph, etc.

 Journals: IEEE Trans. visualization and computer graphics, etc.

Topics to Be Covered

 Introduction to Data Mining

 Data Reduction
 Clustering
 Classification
 Association Analysis
 Link analysis
 Outlier mining
 Sequence mining
 Text Mining
 Web mining
 Recommender System
