Data Mining and Knowledge Discovery in Large Databases

„We are drowning in data, but we are starving for knowledge“

Outline
  Part 2: Clustering
          - Hierarchical Clustering
          - Divisive Clustering
          - Density based Clustering

Erik Kropat
University of the Bundeswehr
Munich, Germany
Why “Data Mining”?
• Companies are collecting massive amounts of data on customers,
  operations, and the competitive landscape.

        Firms can gain a competitive advantage from these data


• But there is far too much data
   − Online shops record purchase behaviours for millions of customers
     (sometimes with hundreds of features for each customer)
   − Phone companies keep info on hundreds of millions of accounts
     (each with thousands of transactions)
   − Databases can often be hundreds of terabytes in size
     (this will be peanuts in the future).
Why “Data Mining”?

     „We are drowning in data, but we are starving for knowledge“
                                                          (John Naisbitt)
Knowledge Discovery in Large Databases

      Process of finding valuable and useful patterns in datasets
Analysis of data sets from …
•   businesses & investments
•   finance & economics
•   science & technology
•   bioinformatics
• telecommunication



                               … or more complex data sets
                               • multimedia & sound
                               • images & video
                               • automatic news analysis
                               • social media analysis.
What are the data sources?
Consumer data
−   Credit card transaction data
−   Supermarket transaction data
−   Loyalty cards
−   Web server logs
−   Social media

                                    Variety of features
                                    − Name and address
                                    − History of shopping and purchases
                                    − Demographics
                                    − Credit rating
                                    − Quality & market share of products
Business Intelligence ‒
Customer Data Analytics & Market Analysis

  −   customer segmentation
  −   market basket analysis
  −   target marketing
  −   geo-marketing
  −   cross-selling / up-selling
  −   customer relationship management
Market Basket Analysis ‒ Cross Selling
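As a rough illustration of what market basket analysis computes, the sketch below counts support and confidence for item pairs. The baskets, item names, the `pair_rules` helper, and the 0.4 support threshold are all assumptions made up for this example; they are not data or code from the lecture.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (invented for illustration only).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
    {"bread", "butter", "chips"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (a, b, support, confidence) for every rule 'a -> b' over item pairs.

    support    = share of baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    item_count = Counter(item for basket in baskets for item in basket)
    pair_count = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        rules.append((a, b, support, count / item_count[a]))
        rules.append((b, a, support, count / item_count[b]))
    return rules


if __name__ == "__main__":
    for a, b, support, confidence in sorted(pair_rules(BASKETS)):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as "customers who buy butter also buy bread" in this toy data, is the kind of pattern a retailer might use for cross-selling. Real datasets need an algorithm that scales beyond pairs, which this sketch does not attempt.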
Key Tasks
  −   Decision Trees
  −   Association Rule Learning
  −   Neural Networks
  −   Digital Forensics
  −   Automatic Derivation of Ontologies
Retail
• Customer segmentation
   Identify purchase patterns of „typical“ customers
   Targeted advertising, customized pricing, cost-effective promotions

• Market basket analysis
   Identify the purchase behaviour of groups of customers

• Sales promotions
   Identify likely responders to sales promotions
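One possible way to make customer segmentation concrete is a small k-means run, sketched below. k-means is chosen here only as a familiar partitional clustering method; the customer records, the two features, the starting centroids, and the `kmeans` helper are hypothetical and not taken from the lecture.

```python
import math

# Hypothetical customers: (visits per month, average basket value in EUR).
CUSTOMERS = [(1, 20.0), (2, 25.0), (1, 22.0), (9, 80.0), (10, 95.0), (8, 85.0)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Starting centroids are picked by hand to keep the example deterministic.
    labels, centroids = kmeans(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[3]])
    print(labels)
    print(centroids)
```

In this toy data the two segments are occasional low-spend shoppers and frequent high-spend shoppers. With real features on different scales, the values would normally be standardized before distances are computed.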
Banking

• Credit rating
   Given a large number of names, which persons are likely
   to default on their credit cards?

• Fraud detection
   −   Credit card fraud detection
   −   Network intrusion detection
Telecommunications
Companies face escalating competition and are forced to
aggressively market special pricing programs aimed at retaining
existing customers and attracting new ones.

• Call detail record analysis
     Identify customer segments with similar use patterns.
     Offer attractive pricing and feature promotions.

• Customer loyalty / customer churn management
     Some customers repeatedly „churn“ (switch providers).
     Identify those who are likely to switch or who are likely to remain loyal.
     Companies can target their spending on customers who will produce the most profit.

• Set pricing strategies in a highly competitive market.
Big Data is Big Business
Companies are using their data sets to aim their services
and products with increasing precision.


Business Intelligence
  −   SAP AG is a German global software corporation
      that provides enterprise software applications.
  −   SAP AG is one of the largest enterprise software companies.

  −   In October 2007, SAP AG announced a $6.8 billion deal to acquire „Business Objects“.
  −   Since 2009, „Business Objects“ has been a division of SAP AG rather than a separate company.
Outline

  Part 1: Introduction
          - What is „Data Mining“ ?
          - Examples

  Part 2: Formal Concept Analysis
          - Contexts and Concepts
          - Concept Lattices

  Part 3: Clustering
          - Hierarchical Clustering
          - Partitional Clustering
          - Fuzzy Clustering
          - Graph Based Clustering

  Part 4: Classification
          - k-th Nearest Neighbors
          - Support Vector Machines

  Part 5: Spatial Data Mining
          - DBSCAN
          - Density & Connectivity

  Part 6: Regulatory Networks
          - Eco-Finance Networks
          - Gene-Environment Networks
Questions ?

              For more information after today
                Email me at   Erik.Kropat@unibw.de
