Data Mining

The document discusses data mining as a crucial process for extracting valuable patterns from large datasets, driven by the need for competitive advantage and the availability of quality data. It outlines various data mining tasks, methods, and applications across different industries, emphasizing the importance of data quality and the use of sophisticated tools. Additionally, it addresses common misconceptions and mistakes associated with data mining practices.

Uploaded by

yopafe2903
Data Mining for Business Intelligence
Data Mining Concepts and Definitions
Why Data Mining?
 More intense competition at the global scale
 Recognition of the value in data sources
 Availability of quality data on customers,
vendors, transactions, Web, etc.
 Consolidation and integration of data
repositories into data warehouses
 The exponential increase in data processing
and storage capabilities; and decrease in cost
 Movement toward conversion of information
resources into nonphysical form
Definition of Data Mining
 The nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns in data stored in
structured databases (Fayyad et al., 1996)
 Keywords in this definition: Process, nontrivial,
valid, novel, potentially useful, understandable
 Data mining: a misnomer?
 Other names: knowledge extraction, pattern
analysis, knowledge discovery, information
harvesting, pattern searching, data dredging
Data Mining at the Intersection of Many Disciplines
[Figure: data mining shown at the intersection of pattern recognition, machine learning, mathematical modeling, databases, and management science & information systems]
Data Mining Characteristics/Objectives
 Source of data for DM is often a consolidated
data warehouse (not always!).
 DM environment is usually a client-server or a
Web-based information systems architecture.
 Data is the most critical ingredient for DM
which may include soft/unstructured data.
 The miner is often an end user.
 Striking it rich requires creative thinking.
 Data mining tools’ capabilities and ease of use
are essential (Web, Parallel processing, etc.).
Data in Data Mining
 Data: a collection of facts usually obtained as the
result of experiences, observations, or experiments
 Data may consist of numbers, words, and images
 Data: lowest level of abstraction (from which
information and knowledge are derived)
 Taxonomy of data: categorical (nominal, ordinal) and numerical (interval, ratio)
 DM with different data types?
 Other data types?


What Does DM Do? How Does it Work?

 DM extracts patterns from data


 Pattern? A mathematical (numeric and/or symbolic)
relationship among data items
 Types of patterns
 Association: (beer & diapers in a market basket analysis)
 Prediction: Predicts future occurrences based on the past (Super
Bowl winner, temperature on a specific day)
 Cluster: (segmentation based on demographics or past purchase
behavior)
 Sequential (or time series) relationships: an existing bank
customer with a checking account will open a savings account within a
year
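As a rough illustration of the association pattern above, the sketch below scores the rule "diapers -> beer" on a handful of made-up baskets. The basket data and the function names `support` and `confidence` are assumptions for illustration, not part of the slides; support and confidence are the usual measures for such rules.

```python
# Hypothetical transactions; each set is one shopping basket.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Score the rule "diapers -> beer" on the toy data.
rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is usually considered interesting only when both numbers clear thresholds chosen by the analyst.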
A Taxonomy for Data Mining Tasks

Data Mining Task      Learning Method   Popular Algorithms
Prediction            Supervised        Classification and Regression Trees, ANN, SVM, Genetic Algorithms
  Classification      Supervised        Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
  Regression          Supervised        Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association           Unsupervised      Apriori, OneR, ZeroR, Eclat
  Link analysis       Unsupervised      Expectation Maximization, Apriori Algorithm, Graph-based Matching
  Sequence analysis   Unsupervised      Apriori Algorithm, FP-Growth technique
Clustering            Unsupervised      K-means, ANN/SOM
  Outlier analysis    Unsupervised      K-means, Expectation Maximization (EM)
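To make one of the unsupervised entries concrete, here is a minimal sketch of K-means (Lloyd's algorithm) on 2-D points. The points, the starting centroids, and the function name `kmeans` are illustrative assumptions. Real tools choose starting centroids more carefully and handle more dimensions.

```python
def kmeans(points, centroids, iterations=10):
    """Minimal K-means for 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                xs = [p[0] for p in cluster]
                ys = [p[1] for p in cluster]
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (9, 9)])
print(centroids)
print(clusters)
```

No class labels are supplied; the grouping comes only from distances between records, which is what makes the method unsupervised.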


Other Data Mining Tasks
 These are in addition to the primary DM
tasks (prediction, association, clustering)

 Time-series forecasting
 Part of sequence or link analysis?
 Visualization
 Another data mining task?

 Types of DM
 Hypothesis-driven data mining
 Discovery-driven data mining
Data Mining Applications
 Customer Relationship Management
 Maximize return on marketing campaigns
 Improve customer retention (churn analysis)
 Maximize customer value (cross- or up-selling)
 Identify and treat most valued customers

 Banking & Other Financial


 Automate the loan application process
 Detecting fraudulent transactions
 Maximize customer value (cross- and up-selling)
 Optimizing cash reserves with forecasting
Data Mining Applications (cont.)
 Retailing and Logistics
 Optimize inventory levels at different locations
 Improve the store layout and sales promotions
 Optimize logistics by predicting seasonal effects
 Minimize losses due to limited shelf life

 Manufacturing and Maintenance


 Predict/prevent machinery failures
 Identify anomalies in production systems to
optimize manufacturing capacity
 Discover novel patterns to improve product quality
Data Mining Applications (cont.)
 Brokerage and Securities Trading
 Predict changes on certain bond prices
 Forecast the direction of stock fluctuations
 Assess the effect of events on market movements
 Identify and prevent fraudulent activities in trading

 Insurance
 Forecast claim costs for better business planning
 Determine optimal rate plans
 Optimize marketing to specific customers
 Identify and prevent fraudulent claim activities
Data Mining Applications (cont.)
 Computer hardware and software
 Science and engineering
 Government and defense
 Homeland security and law enforcement
 Travel industry
 Healthcare
 Medicine
(Highly popular application areas for data mining)

 Entertainment industry
 Sports
 Etc.
Data Mining Methods: Classification
 Most frequently used DM method
 Part of the machine-learning family
 Employ supervised learning
 Learn from past data, classify new data
 The output variable is categorical
(nominal or ordinal) in nature
 Classification versus regression?
 Classification versus clustering?
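One way to see the classification versus regression question: both learn from labeled past data, but classification returns a category and regression returns a number. The sketch below uses k-nearest neighbors as a stand-in technique (it is not named in the slides), and the customer records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical past data: (age, income in $1000s), a categorical target
# (bought the product?) and a numeric target (yearly spend in $1000s).
past = [
    ((25, 30), "no", 1.2),
    ((30, 45), "no", 2.0),
    ((45, 80), "yes", 6.5),
    ((50, 95), "yes", 8.0),
    ((35, 60), "yes", 4.1),
]

def nearest(features, k=3):
    """Return the k past records closest to the given features."""
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], features))
    return sorted(past, key=squared_distance)[:k]

def classify(features, k=3):
    """Classification: majority vote over the neighbors' categorical labels."""
    labels = [record[1] for record in nearest(features, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(features, k=3):
    """Regression: average of the neighbors' numeric values."""
    values = [record[2] for record in nearest(features, k)]
    return sum(values) / len(values)

new_customer = (48, 90)
print(classify(new_customer))  # a category
print(regress(new_customer))   # a number
```

Clustering differs from both because it has no target column at all: it groups records without any labeled past outcomes to learn from.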
Classification Techniques
 Decision tree analysis
 Statistical analysis
 Neural networks
 Support vector machines
 Case-based reasoning
 Bayesian classifiers
 Genetic algorithms
 Rough sets
Decision Trees
 Employs the divide and conquer method
 Recursively divides a training set until each
division consists of examples from one class
A general algorithm for decision tree building:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split.
4. Repeat steps 2 and 3 for each and every leaf node until the stopping criterion is reached.
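The four steps above can be sketched as a short recursive function. The slides do not say how the "best" splitting attribute is chosen; this sketch assumes an entropy-based measure (lowest weighted entropy after the split, equivalent to highest information gain) and categorical attributes only. The loan data and all function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    """Impurity of the class labels in rows (0 means a single class)."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def best_attribute(rows, attributes, target):
    """Step 2: choose the attribute whose split leaves the lowest weighted entropy."""
    def weighted_entropy(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r)
        return sum(len(g) / len(rows) * entropy(g, target) for g in groups.values())
    return min(attributes, key=weighted_entropy)

def build_tree(rows, attributes, target):
    """Step 1 is the first call: the root node receives all training rows."""
    classes = {r[target] for r in rows}
    # Stopping criterion (assumed): one class left, or no attributes left.
    if len(classes) == 1 or not attributes:
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    node = {"split_on": attr, "branches": {}}
    # Step 3: one branch per value, each with a mutually exclusive subset.
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        # Step 4: repeat steps 2 and 3 on each new node.
        node["branches"][value] = build_tree(subset, remaining, target)
    return node

def classify(tree, row):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["split_on"]]]
    return tree

# Hypothetical loan applications.
rows = [
    {"income": "high", "owns_home": "yes", "approved": "yes"},
    {"income": "high", "owns_home": "no", "approved": "yes"},
    {"income": "high", "owns_home": "no", "approved": "yes"},
    {"income": "low", "owns_home": "yes", "approved": "yes"},
    {"income": "low", "owns_home": "no", "approved": "no"},
    {"income": "low", "owns_home": "no", "approved": "no"},
]
tree = build_tree(rows, ["income", "owns_home"], "approved")
print(tree)
print(classify(tree, {"income": "low", "owns_home": "no"}))
```

This sketch raises a KeyError for attribute values it never saw in training. Practical tools also prune the tree and handle numeric attributes.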
Data Mining Software
 Commercial
 IBM SPSS Modeler (formerly Clementine)
 SAS – Enterprise Miner
 IBM – Intelligent Miner
 StatSoft – Statistica Data Miner
 … many more
 Free and/or Open Source
 RapidMiner
 Weka
 … many more
[Figure: bar chart of data mining tools used, with bars for "Total (w/ others)" and "Alone" on a scale of 0 to 120. Tools listed: SPSS PASW Modeler (formerly Clementine); RapidMiner; SAS / SAS Enterprise Miner; Microsoft Excel; Your own code; Weka (now Pentaho); KXEN; MATLAB; Other commercial tools; KNIME; Microsoft SQL Server; Other free tools; Zementis; Oracle DM; Statsoft Statistica; Salford CART, Mars, other; Orange; Angoss; C4.5, C5.0, See5; Bayesia; Insightful Miner/S-Plus (now TIBCO); Megaputer; Viscovery; Clario Analytics; Miner3D; Thinkanalytics]
Source: KDNuggets.com, May 2009
Data Mining Myths
 Data mining …
 provides instant solutions/predictions.
 is not yet viable for business applications.
 requires a separate, dedicated database.
 can only be done by those with advanced
degrees.
 is only for large firms that have lots of
customer data.
 is another name for good-old statistics.
Common Data Mining Blunders
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data
mining is and what it really can/cannot do
3. Not leaving sufficient time for data
acquisition, selection and preparation
4. Looking only at aggregated results and not
at individual records/predictions
5. Being sloppy about keeping track of the data
mining procedure and results
Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings
and quickly moving on
7. Running mining algorithms repeatedly and
blindly, without thinking about the next stage
8. Naively believing everything you are told
about the data
9. Naively believing everything you are told
about your own data mining analysis
10. Measuring your results differently from the
way your sponsor measures them
