Data mining concepts
Group 7
What is Data Mining?

Mining and discovery of new information, in terms of patterns or rules, from vast amounts of data.

The process of discovering meaningful new correlations, patterns and trends by sifting
through large amounts of data stored in repositories, using pattern recognition
technologies as well as statistical and mathematical techniques.

Why do we mine Data?

  Commercial View Point:
  Lots of data is being collected and warehoused.
  Computers have become cheaper and more powerful.
  Competitive pressure is strong.

  Scientific View Point:
  Data is collected and stored at enormous speeds (GB/hour).
  Traditional techniques are infeasible for raw data.
  Data mining may help scientists.

On what kind of Data?

          •   Relational databases
          •   Data warehouses
          •   Transactional databases
          •   Advanced database systems:
                   Object-relational
                   Spatial and Temporal
                   Time-series
                   Multimedia, text
                   WWW

What are the goals of Data Mining?

    • Prediction, e.g. sales volume, earthquakes
    • Identification, e.g. existence of genes, system intrusions
    • Classification of different categories, e.g. discount-seeking shoppers or loyal regular shoppers in a supermarket
    • Optimization of limited resources such as time, space, money or materials, and maximization of outputs such as sales or profits

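As an illustration of the prediction goal, the sketch below forecasts a sales volume with simple least-squares linear regression. The technique is one assumed choice among many, and the monthly figures and the `fit_line` helper are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales volume for months 1..5 (illustrative values only).
months = [1, 2, 3, 4, 5]
sales = [110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales volume for month 6
print(f"slope={slope:.1f} intercept={intercept:.1f} forecast={forecast:.1f}")
```

Real sales data is rarely this linear, so a fit like this is only a starting point for comparison with richer models.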
What are the applications of Data Mining?

● Marketing
   Analysis of consumer behavior
   Advertising campaigns
   Targeted mailings
   Segmentation of customers, stores, or products

● Finance
   Creditworthiness of clients
   Performance analysis of finance investments
   Fraud detection

● Manufacturing
   Optimization of resources
   Optimization of manufacturing processes
   Product design based on customer requirements

● Health Care
   Discovering patterns in X-ray images
   Analyzing side effects of drugs
   Effectiveness of treatments

What are the present commercial tools for Data Mining?

    • Data to Knowledge
    • SAS
    • Oracle Data Miner
    • Intelligent Miner
    • Clementine

How to build a data mining model?

An important concept is that building a mining model is part of a larger process.

1. Defining the problem
   Clearly define the business problem.

2. Preparing data
   Consolidate and clean the data that was identified in the Defining the Problem step.

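A minimal sketch of what consolidating and cleaning might look like, assuming two hypothetical customer lists (`store_a`, `store_b`) held as Python dictionaries. The field names and the cleaning rules (drop incomplete rows, keep the first record per id, tidy the name) are assumptions for illustration, not a prescribed procedure.

```python
def consolidate_and_clean(*sources):
    """Merge record lists, dropping incomplete rows and duplicate ids."""
    seen_ids = set()
    cleaned = []
    for source in sources:
        for record in source:
            # Skip rows with any missing (None or empty-string) field.
            if any(value is None or value == "" for value in record.values()):
                continue
            # Keep only the first occurrence of each customer id.
            if record["id"] in seen_ids:
                continue
            seen_ids.add(record["id"])
            # Normalize the name field: trim whitespace, use title case.
            cleaned.append({**record, "name": record["name"].strip().title()})
    return cleaned


# Hypothetical raw data from two sources.
store_a = [{"id": 1, "name": " alice ", "spend": 120.0},
           {"id": 2, "name": "bob", "spend": None}]
store_b = [{"id": 1, "name": "Alice", "spend": 120.0},
           {"id": 3, "name": "carol", "spend": 75.5}]

result = consolidate_and_clean(store_a, store_b)
print(result)
```

Dropping incomplete rows is the simplest policy; filling in missing values is another option when data is scarce.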
3. Exploring data
   Explore the prepared data to understand it before building models.

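One common way to start exploring is with descriptive statistics. The sketch below uses Python's standard `statistics` module; the `monthly_spend` values and the `summarize` helper are hypothetical.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical prepared data (illustrative values only).
monthly_spend = [120.0, 75.5, 98.0, 310.0, 88.5]

summary = summarize(monthly_spend)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean well above the median, as here, hints at an outlier (the 310.0 value) that may deserve a closer look before modeling.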
4. Building models
   Before you build a model, you must randomly split the prepared data into
   separate training and testing datasets. You use the training dataset to build the
   model, and the testing dataset to test the accuracy of the model by creating
   prediction queries.

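The split described above can be sketched as follows. The "model" here is a deliberately trivial threshold classifier, chosen only to keep the example short; the data, the 70/30 split, the fixed seed and all function names are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Randomly split rows into (training, testing) lists."""
    shuffled = list(rows)                  # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(training):
    """Toy model: midpoint between the mean spend of the two classes."""
    loyal = [spend for spend, label in training if label == "loyal"]
    other = [spend for spend, label in training if label != "loyal"]
    return (sum(loyal) / len(loyal) + sum(other) / len(other)) / 2


def predict(threshold, spend):
    return "loyal" if spend >= threshold else "discount"


# Hypothetical labelled data: (monthly spend, shopper type).
rows = ([(spend, "loyal") for spend in range(200, 300, 10)]
        + [(spend, "discount") for spend in range(10, 60, 5)])

train, test = train_test_split(rows)
threshold = fit_threshold(train)

# "Prediction queries": ask the model about rows it has never seen.
correct = sum(predict(threshold, spend) == label for spend, label in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

The two classes in this toy data are widely separated, so the accuracy is unrealistically high; the point is only that accuracy is measured on rows the model was not built from.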
5. Exploring and validating models
   Explore the models that you have built and test their effectiveness.

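Effectiveness can be measured in several ways. As one assumed example, the sketch below compares a model's predictions against known labels and reports precision and recall for a hypothetical fraud-detection case; the labels and helper names are made up.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical test-set labels and model predictions.
actual = ["fraud", "ok", "ok", "fraud", "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "fraud", "ok", "ok"]

counts = confusion_counts(actual, predicted, positive="fraud")
precision, recall = precision_recall(counts)
print(dict(counts), f"precision={precision:.2f} recall={recall:.2f}")
```

Precision and recall are often more informative than plain accuracy when one class, such as fraud, is rare.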
6. Deploying and updating models
   Deploy the models that performed the best to a production environment.

What are the major issues in Data Mining?

    Mining different kinds of knowledge in databases
    Interactive mining of knowledge at multiple levels of abstraction
    Incorporation of background knowledge
    Data mining query languages and ad-hoc data mining
    Expression and visualization of data mining results
    Handling noise and incomplete data
    Pattern evaluation: the interestingness problem
    Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
    Protection of data security, integrity, and privacy

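To make the interestingness problem in the list above concrete, the sketch below scores one candidate rule with three widely used measures: support, confidence and lift. The slides do not name any measure, so this choice is an assumption, and the `baskets` data and `rule_metrics` helper are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent, consequent = set(antecedent), set(consequent)
    # Count transactions containing each side, and both sides together.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical supermarket baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than independence would predict, which is one rough signal that a rule may be worth a closer look.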
What is the future of Data Mining?

      ● Active research is ongoing in:
         Neural Networks
         Regression Analysis
         Genetic Algorithms
      ● Data mining is used in many areas today. We cannot even begin to imagine what the future holds!

Thank You !
