Lecture 1 and 2 - Introduction and Background

Uploaded by

Ahmed kaleem

Data Mining

Lecture 1
Course Outline
1. Introduction to data mining
2. Data Pre-processing
3. Information Retrieval
4. Associations & Rule Generation
5. Classification and Prediction
6. ML Algorithms and Models
7. Clustering
8. Correlation analysis
Course Description
• Through this course students can learn:
• Basic principles, techniques, tools and applications of Data
Mining
• The concepts of data pre-processing, cluster analysis,
classification, prediction and frequent pattern mining
• Science of data mining as the automatic extraction of
patterns representing knowledge stored in large databases,
data warehouses, and other massive data repositories
Textbooks
• Text book:
• Data Mining: Concepts and Techniques (Latest Edition) by
Jiawei Han and Micheline Kamber

• Reference book:
• Elements of Statistical Learning by Hastie, Tibshirani and
Friedman
• Freely available online
What Is Data Mining?
• Data mining is the process of sorting through large
amounts of data and picking out relevant information
• The extraction of knowledge from data is called data
mining
• Data mining can also be defined as the exploration
and analysis of large quantities of data in order to
discover meaningful patterns and rules
• The ultimate goal of data mining is to discover
knowledge
What Is Data Mining?
• Alternative names
• Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
Data Rich, Information Poor
Motivation
• Lots of data is being collected and warehoused
  • Web data, e-commerce
  • Purchases at department/grocery stores
  • Bank/credit card transactions
• Computers have become cheaper and more powerful
• Data is collected and stored at enormous speeds (GB/hour)
Motivation
• Traditional techniques are infeasible for raw data
• Human analysts may take weeks to discover useful information
• We are drowning in data, but starving for knowledge!
• Data mining may help scientists classify and segment data
Why Is Data Mining Important?
• Rapid computerization of businesses produces huge amounts of data
• How can we make the best use of this data?
• A growing realization: knowledge discovered from data can be used
for competitive advantage
• Classification and future prediction
Why Is Data Mining Important?
• Data analysis and decision support
• Market analysis and management
• Risk analysis and management
• Fraud detection and detection of unusual
patterns (outliers)
• Other Applications
• Text mining (news group, email) and Web
mining
• Stream data mining

Why Is Data Mining Important?
• Ex. 1: Market Analysis and Management
• Target marketing
• Cross-market analysis
• Customer profiling
• Customer requirement analysis

• Ex. 2: Fraud Detection & Mining Unusual Patterns
• Auto insurance
• Money laundering
• Medical insurance
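As a rough illustration of detecting unusual patterns (outliers), the sketch below flags transaction amounts that lie far from the mean. The amounts, the function name `find_outliers`, and the threshold of 2 standard deviations are assumptions chosen for illustration, not part of the lecture; practical fraud detection relies on much richer models.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = find_outliers(amounts)
print(suspicious)
```

A single extreme value inflates the standard deviation itself, so a z-score rule like this one becomes less sensitive as outliers grow more frequent; it is only a starting point.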
Why Is Data Mining Important?
• Ex. 3: Biomedical Applications
• Approaches: Clustering & Classification
• Applications:
• Automated diagnosis
• Discovery of disease trends
• Prediction of epidemics
• Discovering causes for certain conditions
• Patient data retrieval
Data Mining: Combination of Multiple Disciplines
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithms, and Artificial Intelligence]
Knowledge Discovery (KDD) Process
• Data mining is the core of the knowledge discovery process
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
KDD Process: Several Key Steps
• Learning the application domain
  • Relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation
  • Finding useful features, dimensionality/variable reduction, invariant representation
• Choosing the functions of data mining
  • Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
  • Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
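The KDD steps listed above can be sketched end to end on a toy data set. This is only a minimal sketch under stated assumptions: the records, field names, age bands, and the `min_support` cutoff are all hypothetical, and counting attribute pairs stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 23, "city": "X", "bought": "laptop"},
    {"id": 2, "age": None, "city": "Y", "bought": "phone"},
    {"id": 3, "age": 41, "city": "X", "bought": "laptop"},
    {"id": 4, "age": 37, "city": "X", "bought": "phone"},
    {"id": 5, "age": 29, "city": "Y", "bought": "laptop"},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, fields):
    """Data selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: discretize age into two coarse bands."""
    return [{**r, "age": "under_30" if r["age"] < 30 else "30_plus"}
            for r in records]

def mine(records):
    """Data mining: count how often each (age band, product) pair occurs."""
    return Counter((r["age"], r["bought"]) for r in records)

def evaluate(counts, total, min_support=0.5):
    """Pattern evaluation: keep pairs whose relative frequency is high enough."""
    return {p: c / total for p, c in counts.items() if c / total >= min_support}

prepared = transform(select(clean(raw), ["age", "bought"]))
patterns = evaluate(mine(prepared), len(prepared))
print(patterns)
```

Keeping each step as its own function mirrors the slide: any stage can be swapped out (for example, a different cleaning policy) without touching the others.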
Data Mining and Business Intelligence
[Figure: pyramid of layers; the potential to support business decisions increases toward the top. Layers from top to bottom, with the typical role in parentheses]
• Decision Making (End User)
• Data Presentation: Visualization Techniques (Business Analyst)
• Data Mining: Information Discovery (Data Analyst)
• Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
Architecture: Typical Data Mining System
[Figure: layered architecture, top to bottom]
• Graphical User Interface
• Pattern Evaluation (consults the Knowledge-Base)
• Data Mining Engine (consults the Knowledge-Base)
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories
Evolution of Science
• Before 1600, Empirical Science
• 1600-1950s, Theoretical Science
• 1950s-1990s, Computational Science
• 1990-now, Data Science
  • The flood of data from new scientific instruments and simulations
  • The ability to economically store and manage petabytes of data online
  • The Internet and computing Grid that make all these archives universally accessible
Evolution of Database Technology
1960s:
• Data collection, database creation, IMS and network DBMS

1970s:
• Relational data model, relational DBMS implementation

1980s:
• RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)

1990s-2000s:
• Data mining and data warehousing, multimedia databases,
and Web databases
Evolution of Database Technology

Data Collection (1960s)
• Business question: "What was my total revenue in the last five years?"
• Enabling technologies: computers, tapes, disks
• Product providers: IBM
• Characteristics: static data delivery

Data Access (1980s)
• Business question: "What were unit sales last March?"
• Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
• Product providers: Oracle, Sybase, Informix, IBM, Microsoft
• Characteristics: dynamic data delivery at record level

Data Warehousing (1990s)
• Business question: "What were unit sales in New England last March? Drill down to Boston."
• Enabling technologies: multidimensional databases, data warehouses
• Product providers: Oracle, Pilot
• Characteristics: dynamic data delivery at multiple levels

Data Mining (Emerging Today)
• Business question: "What's likely to happen to Boston unit sales next month? Why?"
• Enabling technologies: advanced algorithms, massive databases
• Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
• Characteristics: prospective, proactive information delivery
Data Warehouse example
• Data warehousing is defined as a process of centralized data
management and retrieval
• A data warehouse is a repository of information collected from
multiple sources, stored under a unified schema, and usually
residing at a single site
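To make "multiple sources, stored under a unified schema" concrete, here is a small sketch that maps two differently shaped sources onto one record layout before querying them together. The source names, field names, and units are hypothetical and chosen only for illustration.

```python
# Two hypothetical sources that describe sales with different fields and units.
store_sales = [{"sku": "A1", "qty": 3, "price_usd": 2.5}]
web_orders = [{"product_code": "A1", "units": 1, "total_cents": 250}]

def from_store(row):
    """Map a store record onto the unified schema."""
    return {"product": row["sku"], "quantity": row["qty"],
            "revenue": row["qty"] * row["price_usd"], "source": "store"}

def from_web(row):
    """Map a web record onto the unified schema (cents converted to dollars)."""
    return {"product": row["product_code"], "quantity": row["units"],
            "revenue": row["total_cents"] / 100, "source": "web"}

# The "warehouse": every record now has the same fields, held in one place.
warehouse = ([from_store(r) for r in store_sales]
             + [from_web(r) for r in web_orders])

# A query across sources is straightforward once the schema is unified.
revenue_by_product = {}
for row in warehouse:
    key = row["product"]
    revenue_by_product[key] = revenue_by_product.get(key, 0.0) + row["revenue"]
print(revenue_by_product)
```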
The Process of Data Mining
There are 3 main steps in the data mining process:
• Preparation: data is selected from the warehouse and "cleansed"
• Processing: different algorithms are used to process the data
in order to make predictions
• Analysis: the output is evaluated
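The three steps can be traced in a few lines of code. The sketch below is an illustration under assumptions, not the lecture's method: the customer records are made up, a 1-nearest-neighbour rule stands in for "different algorithms", and accuracy on held-out records stands in for evaluating the output.

```python
# Hypothetical records: (monthly_spend, visits, label); None marks a missing value.
records = [
    (120.0, 8, "loyal"), (110.0, 9, "loyal"), (15.0, 1, "casual"),
    (20.0, 2, "casual"), (None, 7, "loyal"), (130.0, 10, "loyal"),
    (10.0, 1, "casual"),
]

# Preparation: "cleanse" by dropping incomplete records, then hold some back.
clean = [r for r in records if None not in r]
train, test = clean[:4], clean[4:]

# Processing: predict a label from the closest training record.
def predict(features, training):
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = min(training, key=lambda r: squared_distance(r[:2], features))
    return nearest[2]

predictions = [predict(r[:2], train) for r in test]

# Analysis: compare predictions with the known labels of the held-out records.
accuracy = sum(p == r[2] for p, r in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```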
Reasons for Growing Popularity
• Growing data volume: an enormous amount of existing and newly
appearing data requires processing
• Limitations of human analysis: humans can lack objectivity
when analyzing data
• Low cost of machine learning: the data mining process costs less
than hiring highly trained professionals to analyze data
Applications of Data Mining
Data mining is applied in the following areas:
• Prediction of the stock market: predicting future trends
• Bankruptcy prediction: prediction based on computer-generated
rules, using models
• Foreign exchange market: data mining is used to identify trading rules
• Fraud detection: construction of algorithms and models that
help recognize a variety of fraud patterns
Results of Data Mining Include:
• Forecasting what may happen in the future
• Classifying people or things into groups by recognizing patterns
• Clustering people or things into groups based on their attributes
• Associating what events are likely to occur together
• Sequencing what events are likely to lead to later events
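"Associating what events are likely to occur together" can be illustrated by counting co-occurring items in shopping baskets. The baskets below are hypothetical, and confidence (the share of baskets containing one item that also contain another) is a standard association measure introduced here for illustration rather than taken from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

print(confidence("butter", "bread"))  # every butter basket also has bread
print(confidence("bread", "milk"))
```

Pairs are stored in sorted order so that ("bread", "milk") and ("milk", "bread") are counted as the same pattern.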
Data Mining Functions
Two types of model:
• Predictive models predict unknown values based on known data
• Descriptive models identify patterns in data

Each type has several sub-categories, each of which has many algorithms
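The contrast between the two model types can be sketched as follows. Both examples are assumptions chosen for brevity: a least-squares line represents a predictive model, and a simple frequency summary represents a descriptive one; the data values are made up.

```python
from collections import Counter

# Predictive: fit y = slope * x + intercept to known data, then predict a new y.
def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

months = [1, 2, 3, 4]   # known inputs
sales = [2, 4, 6, 8]    # known outputs
slope, intercept = fit_line(months, sales)
predicted = slope * 5 + intercept  # estimate for the unseen month 5

# Descriptive: summarize a pattern already present in the data.
purchases = ["tea", "coffee", "tea", "tea", "juice"]
most_common_item, count = Counter(purchases).most_common(1)[0]

print(predicted, most_common_item, count)
```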