Introduction to Data Mining
Dr. Sushil Kulkarni, Jai Hind College (sushiltry@yahoo.co.in)
Road Map
- Introduction to databases
- A Problem and a Solution
- What Is Data Mining?
- Goal of Data Mining
- What Is (Not) Data Mining?
- Convergence of 3 Key Technologies
- Data Mining Functions
- Kinds of Data Mining Problems
What Is a Database?
A database is any organized collection of data.
Examples: co-workers, patient information, an airline reservation system.
Data vs. Information
What is data? Data is unprocessed information.
What is information? Information is data that have been organized and communicated in a coherent and meaningful manner.
Data is converted into information, and information is converted into knowledge. Knowledge is information that has been evaluated and organized so that it can be used purposefully.
Why Do We Need a Database?
- To keep records of our clients, staff, and volunteers
- To keep a record of activities and interventions
- To keep sales records
- To develop reports
- To perform research
Purpose of a Database System
The purpose is to transform: Data → Information → Knowledge → Action
Database
Database: a shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.
Database Management System: a software system that enables users to define, create, and maintain the database and that provides controlled access to this database.
Who Does It, and How?
A Database Management System (DBMS) does this job, using software tools such as Access, FileMaker, Lotus Notes, Oracle, or SQL Server. A DBMS includes tools to add, modify, or delete data in the database, to ask questions (queries) about the stored data, and to produce reports summarizing selected contents.
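The four DBMS jobs listed above (add/modify/delete, query, report) can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, not tied to any of the products named on the slide; the `clients` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

def run_demo():
    """Define a table, change its data, query it, and produce a summary report."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    # Define the structure of the data (hypothetical 'clients' table).
    cur.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, city TEXT, sales REAL)")
    # Add data.
    cur.executemany(
        "INSERT INTO clients (name, city, sales) VALUES (?, ?, ?)",
        [("Asha", "Mumbai", 1200.0), ("Ravi", "Pune", 800.0), ("Meera", "Mumbai", 450.0)],
    )
    # Modify data.
    cur.execute("UPDATE clients SET sales = sales + 100 WHERE name = ?", ("Ravi",))
    # Delete data.
    cur.execute("DELETE FROM clients WHERE name = ?", ("Meera",))
    # Ask a question (query) about the stored data.
    mumbai = cur.execute("SELECT name FROM clients WHERE city = ?", ("Mumbai",)).fetchall()
    # Produce a report summarizing selected contents.
    report = cur.execute(
        "SELECT city, COUNT(*), SUM(sales) FROM clients GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return mumbai, report

mumbai, report = run_demo()
print(mumbai)  # [('Asha',)]
print(report)  # [('Mumbai', 1, 1200.0), ('Pune', 1, 900.0)]
```

Every answer here is precise and is drawn directly from the stored rows, which is the kind of processing the deck later contrasts with data mining.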
Let's Jump to Data Mining
With this background, we will now see what data mining is.
A Problem …
You are a marketing manager of a brokerage company.
- Problem: churn is too high
  > Turnover is 40% (after the six-month introductory period ends)
- Customers receive incentives (average cost: ₹160) when an account is opened
- Giving new incentives to everyone who might leave is very expensive (as well as wasteful)
- Bringing back a customer after they leave is both difficult and costly
A Solution …
- One month before the introductory period ends, predict which customers will leave
- If you want to keep a customer who is predicted to churn, offer them something based on their predicted value
  > Customers who are not predicted to churn need no attention
- If you don't want to keep the customer, do nothing
- How can you predict future behavior?
  > Tarot cards
  > Magic 8 Ball
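Once some model has produced predictions, the retention policy on this slide is just a decision rule applied to each customer. The following is a minimal sketch under stated assumptions: the `Customer` fields, the 0.5 churn threshold, the ₹500 minimum value, and the 10% offer rate are all hypothetical, and the churn probabilities are assumed to come from a predictive model that is not shown.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    churn_probability: float  # assumed output of some predictive churn model
    predicted_value: float    # assumed predicted future value, in rupees

# Hypothetical policy parameters, chosen only for illustration.
CHURN_THRESHOLD = 0.5
MIN_VALUE = 500.0
OFFER_RATE = 0.10

def plan_action(c: Customer) -> str:
    """Apply the slide's retention policy to one customer."""
    if c.churn_probability < CHURN_THRESHOLD:
        return "no action"  # not predicted to churn: needs no attention
    if c.predicted_value < MIN_VALUE:
        return "no action"  # predicted to churn, but we do not want to keep them
    # Predicted to churn and worth keeping: size the offer by predicted value.
    return f"offer incentive worth ₹{round(OFFER_RATE * c.predicted_value)}"

customers = [
    Customer("A", churn_probability=0.2, predicted_value=5000.0),
    Customer("B", churn_probability=0.8, predicted_value=100.0),
    Customer("C", churn_probability=0.9, predicted_value=2000.0),
]
actions = {c.name: plan_action(c) for c in customers}
print(actions)
```

Only customer C receives an offer, which is the point of the slide: incentives go to customers who are both likely to leave and worth keeping, rather than to everyone.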
KDD Process
Knowledge discovery in databases (KDD) is a multistep process of finding useful information and patterns in data. Data mining is the use of algorithms to extract the information and patterns derived by the KDD process. Many texts treat KDD and data mining as the same process, but it is also possible to think of data mining as the discovery part of KDD.
Steps of KDD Process
1. Selection (data extraction): obtain data from heterogeneous data sources, such as databases, data warehouses, the World Wide Web, or other information repositories.
2. Preprocessing (data cleaning): clean incomplete, noisy, or inconsistent data. Missing data may be ignored or predicted; erroneous data may be deleted or corrected.
3. Transformation (data integration): combine data from multiple sources into a coherent store. Data can be encoded in common formats, normalized, and reduced.
4. Data mining: apply algorithms to the transformed data and extract patterns.
5. Pattern interpretation/evaluation:
   - Pattern evaluation: evaluate the interestingness of the resulting patterns, or apply interestingness measures to filter the discovered patterns.
   - Knowledge presentation: present the mined knowledge; visualization techniques can be used.
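The five KDD steps can be chained as plain functions. This is a toy sketch, assuming hypothetical records from two sources; min-max normalization stands in for the transformation step, and a simple "high spender" cutoff stands in for a real mining algorithm.

```python
from statistics import mean

# Hypothetical raw records from two heterogeneous "sources".
source_a = [{"id": 1, "age": 25, "spend": 200.0}, {"id": 2, "age": None, "spend": 50.0}]
source_b = [{"id": 3, "age": 40, "spend": 800.0}, {"id": 4, "age": 35, "spend": -1.0}]

def select(*sources):
    """1. Selection: gather the records from every source into one list."""
    return [record for source in sources for record in source]

def preprocess(records):
    """2. Preprocessing: ignore incomplete records and delete erroneous ones."""
    return [r for r in records if r["age"] is not None and r["spend"] >= 0]

def transform(records):
    """3. Transformation: min-max normalize 'spend' into the range [0, 1]."""
    lo = min(r["spend"] for r in records)
    hi = max(r["spend"] for r in records)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, "spend_norm": (r["spend"] - lo) / span} for r in records]

def mine(records, cutoff=0.5):
    """4. Data mining (toy): extract the pattern 'high spenders'."""
    return [r for r in records if r["spend_norm"] >= cutoff]

def evaluate(pattern, records):
    """5. Evaluation/presentation: summarize how much of the data the pattern covers."""
    return {
        "high_spenders": [r["id"] for r in pattern],
        "coverage": len(pattern) / len(records),
        "avg_age_high": mean(r["age"] for r in pattern) if pattern else None,
    }

data = transform(preprocess(select(source_a, source_b)))
result = evaluate(mine(data), data)
print(result)  # {'high_spenders': [3], 'coverage': 0.5, 'avg_age_high': 40}
```

Record 2 is dropped for its missing age and record 4 for its impossible negative spend, so only two clean records reach the mining step.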
What Is Data Mining? Some Definitions
- "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" (Piatetsky-Shapiro)
- "...the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the Web, ... or data streams." (Han, p. xxi)
- "...the process of discovering patterns in data. The process must be automatic or (more usually) semiautomatic. The patterns discovered must be meaningful..." (Witten, p. 5)
- "...finding hidden information in a database." (Dunham, p. 3)
- "...the process of employing one or more computer learning techniques to automatically analyse and extract knowledge from data contained within a database." (Roiger, p. 4)
Why Data Mining?
That all sounds ... complicated. Why should I learn about data mining? What's wrong with just a relational database? Why would I want to go through these extra (complicated) steps? Isn't it expensive? It sounds like it takes a lot of skill, programming, computational time, and storage space. Where's the benefit?
Data mining isn't just a cute academic exercise; it has very profitable real-world uses. Many large companies and governments perform data mining as part of their planning and analysis.
Goal of Data Mining
- Simplification and automation of the overall statistical process, from data source(s) to model application
- This has changed over the years:
  > Statisticians fit data to a model
  > Many different data mining algorithms/tools are available
  > Statistical expertise is required to build intelligence into the software
Data Mining is …
What Is (Not) Data Mining?
What is data mining?
- Discovering that certain names are more common in certain locations of Mumbai (Kulkarni, Shah, Iyer, …)
- Grouping together similar documents returned by a search engine according to their context (e.g., Amazon rainforest vs. Amazon.com)
What is not data mining?
- Looking up a phone number in a phone directory
- Querying a Web search engine for information about "Amazon"
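DB vs. DM Processing
- Query: database processing uses well-defined queries (SQL); data mining queries are poorly defined, with no precise query language.
- Data: database processing works on operational data; data mining works on non-operational data.
- Output: database output is precise, a subset of the database; data mining output is fuzzy and not a subset of the database.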
Convergence of 3 key Technologies
1. Increasing Computing Power
- Moore's law: computing power roughly doubles every 18 to 24 months
- Powerful workstations became common
- Cost-effective servers (symmetric multiprocessors, SMPs) provide parallel processing to the mass market
- Interesting tradeoff:
  > a small number of large analyses vs. a large number of small analyses
The Data Explosion
The rate of data creation is accelerating each year.
- In 2003, UC Berkeley estimated that the previous year generated 5 exabytes of data, of which 92% was stored on electronically accessible media.
- Mega < Giga < Tera < Peta < Exa ...
- All the data in all the books in the US Library of Congress is ~136 terabytes, so 2002 produced the equivalent of about 37,000 new Libraries of Congress.
- VLBI telescopes produce 16 gigabytes of data every second.
- Google searches 18 billion+ accessible web pages.
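The "37,000 Libraries of Congress" figure is a single division. A quick check, assuming decimal (SI) units for exabytes and terabytes:

```python
EXABYTE = 10**18   # bytes, decimal SI units (assumed)
TERABYTE = 10**12  # bytes

data_2002 = 5 * EXABYTE               # estimate quoted on the slide
library_of_congress = 136 * TERABYTE  # estimate quoted on the slide

libraries = data_2002 / library_of_congress
print(round(libraries))  # 36765, i.e. roughly 37,000
```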
The Data Explosion: Implications
As the amount of data increases, the proportion of information decreases. As more and more data is generated automatically, we need automatic solutions to turn the stored raw results into information. Companies need to turn stored data into profit; otherwise, why are they storing it?
2. Improved Data Collection and Management
- Data Collection → Access → Navigation → Mining
- The more data, the better (usually)
3. Statistical & Machine Learning Algorithms
- Techniques have often been waiting for computing technology to catch up
- Statisticians were already doing "manual data mining"
- Good machine learning is just the intelligent application of statistical processes
- A lot of data mining research has focused on tweaking existing techniques to get small percentage gains
Data/Information/Knowledge/Wisdom
For example, a data mining application may tell you that there is a correlation between buying music magazines and beer, but it doesn't tell you how to use that knowledge. Should you put the two close together to reinforce the tendency, or should you put them far apart, since people will buy them anyway and will thus stay in the store longer? Data mining can help managers plan strategies for a company, but it does not give them the strategies.
Data Mining Functions
All data mining functions can be thought of as attempting to find a model that fits the data. Each function needs criteria for preferring one model over another, and each function needs a technique to compare the data.
Two types of model:
- Predictive models predict unknown values based on known data
- Descriptive models identify patterns in data
Data mining Functions
Predictive Model
- A "black box" that makes predictions about the future based on information from the past and present
- A large number of inputs is usually available
Kinds of Data Mining Problems
Database query vs. data mining:
- Database: find all customers who have purchased milk. Data mining: find all items that are frequently purchased with milk (association rules).
- Database: find all credit applicants with Aditi as first name. Data mining: find all credit applicants who are poor credit risks (classification).
- Database: identify customers who have purchased more than ₹10,000 in the last month. Data mining: identify customers with similar buying habits (clustering).
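The first pair above can be contrasted in a few lines of Python. This is a minimal sketch on hypothetical transactions: the database question is a filter with a precise answer, while the mining question has to count co-occurrences across all baskets before "frequently" means anything.

```python
from collections import Counter

# Hypothetical transactions: customer name and the set of items bought.
transactions = [
    ("Aditi", {"milk", "bread"}),
    ("Rahul", {"milk", "bread", "eggs"}),
    ("Sana", {"tea", "sugar"}),
    ("Vikram", {"milk", "eggs"}),
]

# Database query: a well-defined question with a precise answer.
milk_buyers = sorted(name for name, items in transactions if "milk" in items)

# Data mining: which items are bought together with milk, and how often?
with_milk = Counter()
for _, items in transactions:
    if "milk" in items:
        with_milk.update(items - {"milk"})

print(milk_buyers)  # ['Aditi', 'Rahul', 'Vikram']
print(with_milk)
```

Deciding which of these counts is "frequent" still needs a threshold, which is where the support and confidence measures of association rules come in.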
Kinds of Data Mining Problems: Classification, Clustering, Association Rules
Classification
(Figure: classification model)
Definition of the Classification Problem
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and a set of classes $C = \{C_1, \ldots, C_m\}$, the Classification Problem is to define a mapping $f: D \to C$ where each $t_i$ is assigned to one class.
Example: Credit Card
(Figure: a training set is used to learn a classifier, producing a model that is then applied to a test set.)
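The training set → learn classifier → model → test set flow can be sketched with the simplest possible learner, a one-attribute threshold rule (a "decision stump"). The incomes, labels, and the choice of a stump are all hypothetical; real credit scoring would use many attributes and a stronger classifier.

```python
# Hypothetical training set: (annual income in lakh rupees, credit risk label).
training_set = [(2.0, "bad"), (3.5, "bad"), (4.0, "bad"),
                (6.0, "good"), (7.5, "good"), (9.0, "good")]
# Hypothetical test set, held back to check the learned model.
test_set = [(3.0, "bad"), (5.0, "bad"), (8.0, "good"), (6.5, "bad")]

def learn_threshold(data):
    """Learn: pick the income cutoff that misclassifies the fewest training examples."""
    def errors(t):
        # A prediction is wrong when (income >= t) disagrees with the label being "good".
        return sum((x >= t) != (label == "good") for x, label in data)
    return min(sorted(x for x, _ in data), key=errors)

def make_model(threshold):
    """The model: a mapping f from an applicant's income to a class."""
    return lambda income: "good" if income >= threshold else "bad"

threshold = learn_threshold(training_set)
model = make_model(threshold)
accuracy = sum(model(x) == y for x, y in test_set) / len(test_set)
print(threshold, accuracy)  # 6.0 0.75
```

The model fits the training set perfectly but misclassifies one test applicant, which is why a separate test set is used to estimate how well the classifier generalizes.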
Another Example ...
To which group does the target object belong?
Group 1: Delia
Group 2: Roses
Target object
(Experiment reported in Cognitive Science, 2002)
Oops!
Resemblance
People classify things by finding other items that are similar and have already been classified. For example: is a new species a bird? Does it have the same attributes as lots of other birds? If so, then it's probably a bird too.
Classification is a combination of rote memorization and the notion of "resembles". Although kiwis can't fly like most other birds, they resemble birds more than they resemble other types of animals. So the problem is to find which instances most closely resemble the instance to be classified.
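"Find the already-classified instance that most closely resembles the new one" is the idea behind nearest-neighbour classification. A minimal 1-nearest-neighbour sketch follows; the attribute sets are hypothetical, and Jaccard similarity (shared attributes over all attributes) is one assumed choice of resemblance measure among many.

```python
# Hypothetical labelled examples: attribute sets of already-classified animals.
known = [
    ({"feathers", "beak", "lays_eggs", "flies"}, "bird"),
    ({"feathers", "beak", "lays_eggs", "flies", "sings"}, "bird"),
    ({"fur", "four_legs", "live_young"}, "mammal"),
    ({"fur", "four_legs", "live_young", "tail"}, "mammal"),
]

def resemblance(a, b):
    """Jaccard similarity: shared attributes divided by all attributes."""
    return len(a & b) / len(a | b)

def classify(instance, examples):
    """1-nearest-neighbour: copy the class of the most similar known example."""
    _, label = max(examples, key=lambda ex: resemblance(instance, ex[0]))
    return label

kiwi = {"feathers", "beak", "lays_eggs"}  # cannot fly, but still resembles birds
print(classify(kiwi, known))  # bird
```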
A Few More Examples
- Loan companies can "give you results in minutes" by classifying you as a good or bad credit risk, based on your personal information and a large supply of previous, similar customers.
- Cell phone companies can classify customers into those likely to leave, who therefore need enticement, and those likely to stay regardless.
- The data generated by airplane engines can be used to determine when an engine needs to be serviced. By discovering the patterns that indicate problems, companies can service working engines less often (increasing profit) and discover faults before they materialise (increasing safety).
Clustering
Classification is supervised learning: the supervision comes from labeling the instances with their class. Clustering is unsupervised learning: there are no predefined class labels and no training set. So a clustering algorithm needs to assign a cluster to each instance such that objects in the same cluster are more similar to each other than to objects in other clusters.
Clustering
Clustering means finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups. The goal is to find the most 'natural' groupings of the instances.
- Within a cluster: maximize similarity between instances (intra-cluster distances are minimized).
- Between clusters: minimize similarity between instances (inter-cluster distances are maximized).
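The two goals above can be measured directly once a distance is chosen. This sketch assumes Euclidean distance and hypothetical 2-D points already split into two clusters; it computes the mean pairwise distance inside each cluster and between the clusters.

```python
from itertools import combinations
from math import dist
from statistics import mean

# Hypothetical 2-D points already grouped into two clusters.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)],
}

def intra_cluster_distance(points):
    """Mean distance between pairs of points in the same cluster (want small)."""
    return mean(dist(p, q) for p, q in combinations(points, 2))

def inter_cluster_distance(c1, c2):
    """Mean distance between points in different clusters (want large)."""
    return mean(dist(p, q) for p in c1 for q in c2)

intra = {name: intra_cluster_distance(pts) for name, pts in clusters.items()}
inter = inter_cluster_distance(clusters["A"], clusters["B"])
print(intra, inter)
```

A good clustering makes every intra-cluster value small relative to the inter-cluster value, as it is here.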
Clustering
For example, we might have the following data, where the axes are two dimensions and shape is a third, nominal attribute.
Clustering
A clustering algorithm might find three clusters, even though some squares and circles are mixed together.
Outliers
(Figure: Cluster 1, Cluster 2, and outliers.)
What Is a Natural Grouping Among These Objects?
The same objects can be grouped as School Employees vs. Tatkare's Family, or as Males vs. Females.
Clustering is subjective.
What Is Similarity?
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)
Similarity is hard to define, but "we know it when we see it". The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.
Clustering Problem
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples and an integer value $k$, the Clustering Problem is to define a mapping $f: D \to \{1, \ldots, k\}$ where each $t_i$ is assigned to one cluster $K_j$, $1 \le j \le k$. A cluster $K_j$ contains precisely those tuples mapped to it. Unlike in the classification problem, the clusters are not known a priori.
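One common way to build such a mapping $f$ is k-means; it is not the only clustering technique, and the slide does not name one. This minimal sketch assumes numeric tuples, Euclidean distance, hypothetical points, and caller-supplied starting centroids so the result is deterministic.

```python
from math import dist

def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: returns f as a list where f[i] is the cluster (1..k) of points[i]."""
    centroids = list(initial_centroids)
    k = len(centroids)
    f = []
    for _ in range(iterations):
        # Assignment step: map each tuple to its nearest centroid (clusters numbered 1..k).
        f = [min(range(k), key=lambda j: dist(p, centroids[j])) + 1 for p in points]
        # Update step: move each centroid to the mean of the tuples mapped to it.
        for j in range(k):
            members = [p for p, c in zip(points, f) if c == j + 1]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return f, centroids

points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
f, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(f)  # [1, 1, 1, 2, 2, 2]
```

Each cluster $K_j$ is then simply the set of points whose entry in `f` equals `j`.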
Applications Marketing:  Discover consumer groups based on their purchasing habits City Planning:  Identify groups of buildings by type, value, location
Applications
Image Processing: Identify clusters of similar images (e.g., horses)
Biological: Discover groups of plants/animals with similar properties
Applications
Given:
- A source of textual documents
- A similarity measure, e.g., how many words the documents have in common
Find:
- Several clusters of documents that are relevant to each other
(Figure: documents from the source are fed, with the similarity measure, into a clustering system.)
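Using the slide's similarity measure (the number of words two documents share), a very simple clustering links any two documents above a threshold and takes the connected groups as clusters. This is a sketch, assuming hypothetical documents and a hypothetical threshold of two shared words; real systems usually normalize and weight words first.

```python
from itertools import combinations

# Hypothetical document collection.
docs = {
    "d1": "amazon rainforest trees river",
    "d2": "rainforest river wildlife trees",
    "d3": "amazon com online shopping books",
    "d4": "online shopping books delivery",
}

def common_words(a, b):
    """Similarity measure from the slide: number of words two documents share."""
    return len(set(a.split()) & set(b.split()))

def cluster_documents(docs, min_shared=2):
    """Link documents sharing >= min_shared words; clusters are the connected groups."""
    parent = {d: d for d in docs}

    def find(d):
        while parent[d] != d:
            d = parent[d]
        return d

    for a, b in combinations(docs, 2):
        if common_words(docs[a], docs[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two groups
    groups = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())

print(cluster_documents(docs))  # [['d1', 'd2'], ['d3', 'd4']]
```

The rainforest documents and the shopping documents end up in separate clusters even though d1 and d3 both contain "amazon", echoing the earlier Amazon rainforest vs. Amazon.com example.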
Association Rules
A common application is market basket analysis, which finds:
(1) which items are frequently sold together at a supermarket
(2) how to arrange items on shelves, and which items should be promoted together
Association Rule Discovery
Association Rule Discovery
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
Rules discovered:
- {Milk} --> {Coke}
- {Diaper, Milk} --> {Beer}
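For a handful of items, rule discovery can be done by brute force: enumerate small itemsets, keep those that appear in enough records, and turn each into candidate rules. This is a sketch on hypothetical baskets built to reproduce the two rules above; the thresholds are assumptions, and practical systems use smarter algorithms (such as Apriori) to avoid testing every itemset.

```python
from itertools import combinations

# Hypothetical market-basket records.
baskets = [
    {"milk", "coke", "bread"},
    {"milk", "coke"},
    {"diaper", "milk", "beer", "coke"},
    {"diaper", "milk", "beer"},
    {"bread", "eggs"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(set(itemset) <= basket for basket in baskets) / len(baskets)

def discover_rules(min_support=0.4, min_confidence=0.6, max_size=3):
    """Brute force: test every rule body --> {head} built from small itemsets."""
    items = sorted(set().union(*baskets))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo)
            if s < min_support:
                continue  # too rare to be interesting
            for head in combo:
                body = [item for item in combo if item != head]
                confidence = s / support(body)
                if confidence >= min_confidence:
                    rules.append((body, head, s, confidence))
    return rules

rules = discover_rules()
for body, head, s, confidence in rules:
    lhs = ", ".join(body)
    print(f"{{{lhs}}} --> {{{head}}}  support={s:.0%}  confidence={confidence:.0%}")
```

With these thresholds the output includes {milk} --> {coke} (support 60%, confidence 75%) and {diaper, milk} --> {beer} (support 40%, confidence 100%), along with a few other rules the data also supports.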
Association Rule Discovery
Market basket rule form: "Body → Head [support, confidence]"
buys(X, "beer") → buys(X, "snacks") [1%, 60%]
(a) Of the customers X who purchased beer, 60% also purchased snacks (confidence).
(b) 1% of all transactions contain the items beer and snacks together (support).
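The two bracketed numbers can be written as formulas. In the usual notation (assumed here, not taken from the slide), $D$ is the set of transactions, $A$ is the rule body, and $B$ is the head; the second line works backwards from the slide's numbers.

```latex
\text{support}(A \Rightarrow B) = \frac{|\{T \in D : A \cup B \subseteq T\}|}{|D|}
\qquad
\text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}

\text{support}(\{\text{beer}\}) = \frac{\text{support}(\{\text{beer}, \text{snacks}\})}{\text{confidence}}
  = \frac{0.01}{0.60} \approx 0.0167
```

Under this reading, about 1.7% of all transactions contain beer, and 60% of those also contain snacks.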
The weka is a strong brown bird that is native to New Zealand and grows to about the same size as a chicken. It was once fairly common on both the North and South Islands of New Zealand, but over the years it has declined heavily on the North Island due to major damage to its habitat.
Three graphical user interfaces:
- "The Explorer" (exploratory data analysis)
- "The Experimenter" (experimental environment)
- "The KnowledgeFlow" (new process-model-inspired interface)
WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
References
- Witten, Ian, and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, 2005.
- Dunham, Margaret H. Data Mining: Introductory and Advanced Topics. Prentice Hall, 2003.
References: Yahoo Group
- 'dbmsnotes': http://tech.groups.yahoo.com/group/dbmsnotes/
THANKS!!
