INTRODUCTION TO DATA MINING
TECHNIQUE
By – Pawneshwar Datt Rai
WHAT IS DATA MINING?
 Data mining is also called knowledge discovery in
databases (KDD).
 Data mining is
 the extraction of useful patterns from data sources, e.g.,
databases, text, the Web, and images.
 Patterns must be:
 valid, novel, potentially useful, understandable
This PPT presented By - Pawneshwar Datt Rai
EXAMPLE OF DISCOVERED PATTERNS
 Association rules:
“80% of customers who buy cheese and milk also buy
bread, and 5% of customers buy all of them together”
Cheese, Milk → Bread [sup = 5%, conf = 80%]
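The two numbers attached to the rule can be computed directly from transaction data. The following is a minimal sketch, assuming support is the fraction of transactions containing every item in the rule and confidence is that support divided by the support of the left-hand side. The transactions and the function names `support` and `confidence` are hypothetical and are not taken from the slides.

```python
# Hypothetical toy transactions (illustration only, not data from the slides).
transactions = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"cheese", "milk", "bread", "eggs"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs and rhs together) / support(lhs).

    Assumes `lhs` occurs in at least one transaction; otherwise the
    division below raises ZeroDivisionError.
    """
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


# Rule: cheese, milk -> bread
rule_support = support({"cheese", "milk", "bread"}, transactions)
rule_confidence = confidence({"cheese", "milk"}, {"bread"}, transactions)
print(f"sup = {rule_support:.0%}, conf = {rule_confidence:.0%}")
```

On this toy data the rule holds in 2 of 5 transactions, and in 2 of the 3 transactions that contain both cheese and milk.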
MAIN DATA MINING TASKS
 Classification:
mining patterns that can classify future data into known
classes.
 Association rule mining
mining any rule of the form X → Y, where X and Y are
sets of data items.
 Clustering
identifying a set of similarity groups in the data
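Classification and clustering differ mainly in whether class labels are known in advance. The sketch below illustrates the difference, using a nearest-neighbour classifier and a small k-means loop as stand-ins. The slides do not name any specific algorithm, so both choices, along with the data and the names `classify` and `kmeans`, are assumptions made for illustration.

```python
import math

# Hypothetical 2-D points; the labels are known only for the classification case.
labeled = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((5.0, 5.2), "high"),
    ((4.8, 5.1), "high"),
]


def classify(point, labeled):
    """Classification: assign the label of the closest labeled example (1-NN)."""
    _, label = min(labeled, key=lambda example: math.dist(example[0], point))
    return label


def kmeans(points, centroids, iterations=10):
    """Clustering: group unlabeled points around the nearest centroid (k-means)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


predicted = classify((4.5, 4.9), labeled)            # uses known classes
unlabeled = [point for point, _ in labeled]          # labels discarded
groups = kmeans(unlabeled, centroids=[(0.0, 0.0), (6.0, 6.0)])
print(predicted, groups)
```

The classifier needs labeled examples to work from, while the clustering loop only receives the points and a guess at the starting centroids.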
MAIN DATA MINING TASKS
 Sequential pattern mining:
A sequential rule A → B says that event A will be
immediately followed by event B with a certain confidence
 Deviation detection:
discovering the most significant changes in data
 Data visualization: using graphical methods to show
patterns in data.
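The first two tasks above can be sketched in a few lines. This is only an illustration under stated assumptions: the confidence of A → B is taken to be the share of occurrences of A that are immediately followed by B, and a "significant change" is taken to be a value more than two standard deviations from the mean. Neither definition, nor the sample data and function names, comes from the slides.

```python
import statistics


def sequential_confidence(a, b, sequences):
    """Share of occurrences of event `a` that are immediately followed by `b`."""
    occurrences = 0
    followed = 0
    for seq in sequences:
        for i, event in enumerate(seq):
            if event == a:
                occurrences += 1
                if i + 1 < len(seq) and seq[i + 1] == b:
                    followed += 1
    return followed / occurrences if occurrences else 0.0


def deviations(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical event logs and daily sales figures.
sessions = [
    ["login", "search", "buy"],
    ["login", "buy"],
    ["search", "login", "search"],
    ["login", "logout"],
]
daily_sales = [100, 102, 98, 101, 99, 100, 180, 101]

conf = sequential_confidence("login", "search", sessions)
outliers = deviations(daily_sales)
print(conf, outliers)
```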
WHY IS DATA MINING IMPORTANT?
 Rapid computerization of businesses produces huge
amounts of data.
 How can we make the best use of that data?
 A growing realization: knowledge discovered from
data can be used for competitive advantage.
WHY IS DATA MINING NECESSARY?
 Make use of your data assets
 There is a big gap between stored data and knowledge, and
the transition won’t occur automatically.
 Many interesting things you want to find cannot be found
using database queries alone, for example:
“Find me people likely to buy my products.”
“Who is likely to respond to my promotion?”
WHY DATA MINING NOW?
 The data is abundant.
 The data is being warehoused.
 The computing power is affordable.
 The competitive pressure is strong.
 Data mining tools have become available.
RELATED FIELDS
 Data mining is an emerging multi-disciplinary field:
Statistics
Machine learning
Databases
Information retrieval
Visualization
etc.
DATA MINING (KDD) PROCESS
 Understand the application domain
 Identify data sources and select target data
 Pre-process: cleaning, attribute selection
 Data mining to extract patterns or models
 Post-process: identifying interesting or useful patterns
 Incorporate patterns in real world tasks
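The middle steps of the process can be read as a small pipeline. The sketch below is one possible reading, not the author's method: pre-processing drops records with missing values and keeps selected attributes, mining counts how often each attribute combination occurs, and post-processing keeps combinations above a minimum relative frequency. The records, the function names, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical target data selected from a sales database.
raw_records = [
    {"customer": "c1", "region": "north", "product": "bread"},
    {"customer": "c2", "region": "north", "product": "milk"},
    {"customer": "c3", "region": None, "product": "bread"},
    {"customer": "c4", "region": "south", "product": "bread"},
    {"customer": "c5", "region": "north", "product": "bread"},
]


def preprocess(records, attributes):
    """Cleaning (drop records with missing values) and attribute selection."""
    return [
        tuple(r[a] for a in attributes)
        for r in records
        if all(r.get(a) is not None for a in attributes)
    ]


def mine(rows):
    """Mining step: count how often each attribute combination occurs."""
    return Counter(rows)


def postprocess(patterns, total, min_support):
    """Keep only patterns whose relative frequency reaches `min_support`."""
    return {p: n / total for p, n in patterns.items() if n / total >= min_support}


rows = preprocess(raw_records, ["region", "product"])
patterns = mine(rows)
interesting = postprocess(patterns, len(rows), min_support=0.5)
print(interesting)
```

Understanding the domain and putting the resulting patterns to use remain human steps and are not modeled here.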
DATA MINING APPLICATIONS
 Marketing, customer profiling and retention,
identifying potential customers, market
segmentation.
 Fraud detection
identifying credit card fraud, intrusion detection
 Scientific data analysis
 Text and web mining
 Any application that involves a large amount of data.
WEB DATA EXTRACTION
[Figure: a web page with two data regions (Data region 1 and Data region 2), each containing data records]
OPINION ANALYSIS
 Word-of-mouth on the Web
 The Web has dramatically changed the way that
consumers express their opinions.
 One can post reviews of products at merchant
sites, Web forums, discussion groups, and blogs.
 Techniques are being developed to exploit these
sources.
 Benefits of Review Analysis
 Potential Customer: No need to read many reviews
 Product manufacturer: market intelligence, product
benchmarking
FEATURE BASED ANALYSIS &
SUMMARIZATION
 Extracting product features (called Opinion Features) that
have been commented on by customers.
 Identifying opinion sentences in each review and
deciding whether each opinion sentence is positive or
negative.
 Summarizing and comparing results.
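The three steps above can be sketched with a simple word-list approach. This is a rough illustration only: it assumes the product features are already known and uses tiny hand-written positive and negative word lists, whereas the slides describe extracting features from the reviews themselves. The names `FEATURES`, `POSITIVE`, `NEGATIVE`, and `summarize`, and the sample reviews, are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical feature list and opinion word lists (assumptions for this sketch).
FEATURES = {"battery", "screen", "camera"}
POSITIVE = {"great", "good", "sharp", "excellent"}
NEGATIVE = {"poor", "bad", "dim", "terrible"}

reviews = [
    "The battery is great. The screen is dim.",
    "Excellent camera! The battery life is poor.",
    "Good screen.",
]


def summarize(reviews):
    """Count positive and negative opinion sentences per product feature."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            positives = len(words & POSITIVE)
            negatives = len(words & NEGATIVE)
            if positives == negatives:
                continue  # not an opinion sentence, or no clear orientation
            orientation = "positive" if positives > negatives else "negative"
            for feature in words & FEATURES:
                summary[feature][orientation] += 1
    return dict(summary)


result = summarize(reviews)
for feature in sorted(result):
    print(feature, result[feature])
```

A per-feature count like this is what lets a potential customer compare products without reading every review.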
A Happy and Prosperous day to all friends.
