Data Mining-Introduction


Unit - I

Data Mining

(Contents: Text book 2 - Chapter 1)

10/03/21
Chapter 1: Introduction

◻ Why Data Mining?

◻ What Is Data Mining?

◻ What Kind of Data Can Be Mined?

◻ What Kinds of Patterns Can Be Mined?

◻ Which Technologies Are Used?

◻ Which Kinds of Applications Are Targeted?

◻ Major Issues in Data Mining

◻ Summary

1.1 Why Data Mining?

◻ The Explosive Growth of Data: from terabytes to petabytes

🞑 Data collection and data availability

■ Automated data collection tools, database systems, Web, computerized society

🞑 Major sources of abundant data

■ Business: Web, e-commerce, transactions, stocks, …

■ Science: Remote sensing, bioinformatics, scientific simulation, …

■ Society and everyone: news, digital cameras, YouTube

◻ We are drowning in data, but starving for knowledge!


◻ “Necessity is the mother of invention”: data mining provides automated analysis of massive data sets

Evolution of Information Technology
Motivating question: “How can I analyze these data?”

◻ Data Collection and Database Creation (1960s and earlier)
🞑 Primitive file processing

◻ Database Management Systems (1970s to early 1980s)
🞑 Hierarchical and network database systems
🞑 Relational database systems
🞑 Data modeling: entity-relationship models, etc.
🞑 Indexing and accessing methods
🞑 Query languages: SQL, etc.
🞑 User interfaces, forms, and reports
🞑 Query processing and optimization
🞑 Transactions, concurrency control, and recovery
🞑 Online transaction processing (OLTP)

◻ Advanced Database Systems (mid-1980s to present)
🞑 Advanced data models: extended-relational, object relational, deductive, etc.
🞑 Managing complex data: spatial, temporal, multimedia, sequence and structured, scientific, engineering, moving objects, etc.
🞑 Data streams and cyber-physical data systems
🞑 Web-based databases (XML, semantic web)
🞑 Managing uncertain data and data cleaning
🞑 Integration of heterogeneous sources
🞑 Text database systems and integration with information retrieval
🞑 Extremely large data management
🞑 Database system tuning and adaptive systems
🞑 Advanced queries: ranking, skyline, etc.
🞑 Cloud computing and parallel data processing
🞑 Issues of data privacy and security

◻ Advanced Data Analysis (late-1980s to present)
🞑 Data warehouse and OLAP
🞑 Data mining and knowledge discovery: classification, clustering, outlier analysis, association and correlation, comparative summary, discrimination analysis, pattern discovery, trend and deviation analysis, etc.
🞑 Mining complex types of data: streams, sequence, text, spatial, temporal, multimedia, Web, networks, etc.
🞑 Data mining applications: business, society, retail, banking, telecommunications, science and engineering, blogs, daily life, etc.
🞑 Data mining and society: invisible data mining, privacy-preserving data mining, mining social and information networks, recommender systems, etc.

◻ Future Generation of Information Systems (Present to future)
1.2 What Is Data Mining?

◻ Data mining (knowledge discovery from data)
🞑 Discovery of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns and knowledge from large amounts of data
◻ Alternative names
🞑 Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
◻ Watch out: Is everything “data mining”? No, the following are not data mining:
🞑 Simple search and query processing
🞑 (Deductive) expert systems

Knowledge Discovery from Data (KDD) Process

◻ Data mining plays an essential role in the knowledge discovery process

◻ The KDD process

🞑 Data cleaning

🞑 Data integration

🞑 Data selection

🞑 Data transformation

🞑 Data mining

🞑 Pattern evaluation

🞑 Knowledge presentation
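
The seven steps can be pictured as a chain of small functions. The sketch below is only an illustration under stated assumptions: the two data sources, the field names (cust, item, amount), and the frequency threshold are all made up, and each step is reduced to the simplest operation that fits its name.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [
    {"cust": "C1", "item": "laptop", "amount": 900.0},
    {"cust": "C2", "item": None, "amount": 20.0},
]
store_b = [
    {"cust": "C3", "item": "laptop", "amount": 950.0},
    {"cust": "C4", "item": "printer", "amount": 120.0},
    {"cust": "C5", "item": "laptop", "amount": None},
]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine multiple sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: reduce each record to the item name."""
    return [r["item"] for r in records]

def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)

def evaluate(counts, min_count):
    """Pattern evaluation: keep items that meet a frequency threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}

def run_kdd(min_count=2):
    data = integrate(clean(store_a), clean(store_b))
    items = transform(select(data, ["item"]))
    return evaluate(mine(items), min_count)

if __name__ == "__main__":
    # Knowledge presentation: report the surviving patterns.
    for item, n in run_kdd().items():
        print(f"{item}: bought {n} times")
```

In practice each step is far richer than a single function, but the order of the calls mirrors the order of the list.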

Knowledge Discovery from Data (KDD) Process

◻ This is a view from typical database systems and data warehousing communities

[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection & Transformation → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]
1.3 What Kinds of Data Can Be Mined?

◻ Data mining can be applied to any kind of data as long as the data are meaningful for the target application.
◻ Database Data:
🞑 DBMS: a collection of interrelated data (the database) and a set of programs to manage and access the data
🞑 RDBMS: a collection of tables, each with a unique name
■ Table: a set of attributes (columns/fields) that stores tuples (records/rows)
■ Unique key: identifies each tuple; an ER model is often constructed to describe the database
🞑 RDBMS data are accessed by database queries (e.g., in SQL)
■ Query: expressed through relational operations such as join, selection, and projection
🞑 Mining relational databases: searching for trends or data patterns

What Kinds of Data Can Be Mined?

◻ Example: A relational database for AllElectronics. The company is described by the following relation tables: customer, item, employee, and branch.

customer (cust ID, name, address, age, occupation, annual income, credit information, category, . . . )
item (item ID, brand, category, type, price, place made, supplier, cost, . . . )
employee (empl ID, name, category, group, salary, commission, . . . )
branch (branch ID, name, address, . . . )
purchases (trans ID, cust ID, empl ID, date, time, method paid, amount)
items sold (trans ID, item ID, qty)
works at (empl ID, branch ID)
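
To make join, selection, and projection concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a cut-down version of the customer and purchases relations; the column subset, the underscore spelling of the identifiers, the sample rows, and the function name large_purchases are illustrative choices, not part of the AllElectronics example.

```python
import sqlite3

def large_purchases(min_amount):
    """Return (customer name, amount) for purchases at or above min_amount."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema and rows for illustration only.
    conn.executescript("""
        CREATE TABLE customer (cust_ID TEXT PRIMARY KEY, name TEXT, age INTEGER);
        CREATE TABLE purchases (trans_ID TEXT PRIMARY KEY, cust_ID TEXT, amount REAL);
        INSERT INTO customer VALUES ('C1', 'Smith', 31), ('C2', 'Lee', 45);
        INSERT INTO purchases VALUES
            ('T100', 'C1', 1200.0), ('T200', 'C2', 80.0), ('T300', 'C1', 300.0);
    """)
    rows = conn.execute(
        """
        SELECT c.name, p.amount              -- projection: keep two columns
        FROM customer AS c
        JOIN purchases AS p                  -- join: match rows on cust_ID
          ON c.cust_ID = p.cust_ID
        WHERE p.amount >= ?                  -- selection: filter rows
        ORDER BY p.amount DESC
        """,
        (min_amount,),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(large_purchases(250))
```

A query like this answers a question the user already knows how to ask; mining goes further by searching the same tables for trends or patterns that were not specified in advance.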

What Kinds of Data Can Be Mined?

◻ Data Warehouses
🞑 A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
🞑 Data warehouses are constructed via data cleaning, data integration, data transformation, data loading, and periodic data refreshing.

[Figure: Data sources in Chicago, New York, Toronto, and Vancouver → Clean, Integrate, Transform, Load, Refresh → Data Warehouse → Query and analysis tools → Clients]

What Kinds of Data Can Be Mined?

◻ Transactional Data:
🞑 A transactional database captures transactions, such as a customer’s purchase, a flight booking, or a user’s clicks on a web page.
🞑 A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction, such as the items purchased in the transaction.
🞑 A transactional database may have additional tables, which contain other information related to the transactions, such as item description, information about the salesperson or the branch, and so on.
◻ Example: A transactional database for AllElectronics. Transactions can be stored in a table, with one record per transaction.
◻ Nested relational structure: list_of_item_IDs is a set of items
◻ Query: Which items sold well together?

trans ID    list of item IDs
T100        I1, I3, I8, I16
T200        I2, I8
...         ...
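
One simple way to approach the question "Which items sold well together?" is to count how often each pair of items appears in the same transaction. The sketch below assumes that approach. T100 and T200 are taken from the AllElectronics example; T300 and T400 are made-up rows added so that some pairs repeat.

```python
from collections import Counter
from itertools import combinations

# T100 and T200 come from the example; T300 and T400 are hypothetical.
transactions = {
    "T100": ["I1", "I3", "I8", "I16"],
    "T200": ["I2", "I8"],
    "T300": ["I1", "I8"],
    "T400": ["I2", "I8", "I16"],
}

def pair_counts(transactions):
    """Count, for every pair of items, the transactions containing both."""
    counts = Counter()
    for items in transactions.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)

if __name__ == "__main__":
    for pair, n in counts.most_common(3):
        print(pair, n)
```

Enumerating every pair becomes expensive as transactions grow, which is one reason dedicated frequent-itemset algorithms exist.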

1.4 What Kinds of Patterns Can Be Mined?

◻ Concept/Class Description: Characterization and Discrimination


◻ Mining Frequent Patterns, Associations and Correlations

◻ Classification and Regression

◻ Cluster Analysis

◻ Outlier Analysis

1.4.1 Concept/Class Description: Characterization and Discrimination

◻ Data can be associated with classes or concepts
🞑 E.g., classes of items: computers, printers, …
🞑 E.g., concepts of customers: bigSpenders, budgetSpenders, …
🞑 Descriptions of a class or a concept are called class/concept descriptions
🞑 Descriptions can be derived via data characterization, data discrimination, or both
■ Data characterization: summarizing the general characteristics of a target class of data.
■ E.g., summarizing the characteristics of customers who spend more than $1,000 a year at AllElectronics.
■ The result can be a general profile of the customers, such as 40 to 50 years old, employed, with excellent credit ratings.
■ Data discrimination: comparing the target class with one or a set of comparative classes
■ E.g., compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by 30% during the same period
■ Or both of the above

1.4.2 Mining Frequent Patterns, Associations and Correlations

◻ Frequent patterns are patterns that occur frequently in data
◻ Types:
🞑 Frequent itemset: a set of items that frequently appear together in a transactional data set (e.g., charger cable and adapter)
🞑 Frequent subsequence (sequential pattern): an ordered pattern, e.g., customers tend to purchase product A, followed by a purchase of product B (Ex: Mobile --> Charger --> Earphones)
🞑 Frequent substructure: a substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets or subsequences.

Mining Frequent Patterns, Associations and Correlations

🞑 Association analysis: finding frequent patterns that describe which items occur together
■ E.g., at the AllElectronics store, find which items are frequently purchased together:
buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]
(If a customer buys a computer, there is a 50% chance that he/she will buy software. 1% of all of the transactions under analysis showed that computer and software are purchased together.)
■ This association rule involves a single attribute or predicate (i.e., buys) that repeats.
■ Association rules that contain a single predicate are referred to as single-dimensional association rules.
■ Dropping the predicate notation, the rule can be written simply as
computer ⇒ software [1%, 50%]
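
The two numbers attached to the rule can be computed directly from transaction data: support is the fraction of all transactions that contain both items, and confidence is the fraction of transactions containing the left-hand item that also contain the right-hand item. The sketch below assumes a hypothetical set of 200 transactions chosen so that the result matches the 1% and 50% in the rule; the function name is illustrative.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical data built to reproduce the figures in the rule:
# 200 transactions, 4 contain a computer, 2 of those also contain software.
transactions = (
    [{"computer", "software"}] * 2
    + [{"computer"}] * 2
    + [{"printer"}] * 196
)

support, confidence = support_confidence(transactions, "computer", "software")

if __name__ == "__main__":
    print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```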

Mining Frequent Patterns, Associations and Correlations

■ AllElectronics: Purchases
■ Ex: age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 2%, confidence = 60%]

◻ The rule indicates that 2% of the AllElectronics customers under study are 20 to 29 years old with an income of $40,000 to $49,000 and have purchased a laptop (computer) at AllElectronics.

◻ There is a 60% probability that a customer in this age and income group will purchase a laptop.
◻ This is an association involving more than one attribute or predicate (i.e., age, income, and buys): a multidimensional association rule.
◻ Typically, association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
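
The threshold test in the last bullet can be sketched for a multidimensional rule as follows. The ten customer records, the helper names rule_stats and is_strong, and the threshold values are assumptions made for illustration; only the attribute names and value ranges come from the rule above.

```python
# Hypothetical customer records; the attribute values mirror the rule above.
customers = (
    [{"age": "20..29", "income": "40K..49K", "buys": "laptop"}] * 3
    + [{"age": "20..29", "income": "40K..49K", "buys": "printer"}] * 2
    + [{"age": "30..39", "income": "50K..59K", "buys": "laptop"}] * 5
)

def rule_stats(records, conditions, outcome):
    """Support and confidence of (conditions) => (outcome) over records."""
    def matches(record, pattern):
        return all(record.get(k) == v for k, v in pattern.items())

    n_cond = sum(1 for r in records if matches(r, conditions))
    n_both = sum(
        1 for r in records if matches(r, conditions) and matches(r, outcome)
    )
    support = n_both / len(records)
    confidence = n_both / n_cond if n_cond else 0.0
    return support, confidence

def is_strong(records, conditions, outcome, min_support, min_confidence):
    """Keep a rule only if it meets both thresholds."""
    support, confidence = rule_stats(records, conditions, outcome)
    return support >= min_support and confidence >= min_confidence

conditions = {"age": "20..29", "income": "40K..49K"}
outcome = {"buys": "laptop"}

if __name__ == "__main__":
    print(rule_stats(customers, conditions, outcome))
    print(is_strong(customers, conditions, outcome, 0.02, 0.5))
```

With this toy data the rule holds for 3 of 10 customers and for 3 of the 5 customers who match the age and income conditions, so it passes a 50% confidence threshold but fails a 70% one.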

1.4.3 Classification and Regression for Predictive Analysis

◻ Classification: The process of finding a model that describes and distinguishes the data classes or concepts.

🞑 The derived model is based on the analysis of a set of training data (data objects whose class label is known).

🞑 The model can be represented as classification (IF-THEN) rules, decision trees, neural networks, etc.

IF-THEN rules:
age(X, “youth”) AND income(X, “high”) → class(X, “A”)
age(X, “youth”) AND income(X, “low”) → class(X, “B”)
age(X, “middle_aged”) → class(X, “C”)
age(X, “senior”) → class(X, “C”)

[Figure: Decision tree. The root tests "age?": the branch "youth" leads to the test "income?" (high → class A, low → class B); the branch "middle_aged, senior" leads to class C.]

[Figure: Neural network with inputs age (f1) and income (f2), hidden units f3, f4, f5, and output units f6 (class A), f7 (class B), f8 (class C).]
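
The IF-THEN rules on this slide translate almost line for line into code. The sketch below is a hand-written rule set, not a model learned from training data; the function name classify and the choice to return None when no rule fires are assumptions.

```python
def classify(age, income=None):
    """Apply the four IF-THEN rules from the slide; None if no rule fires."""
    if age == "youth" and income == "high":
        return "A"
    if age == "youth" and income == "low":
        return "B"
    if age in ("middle_aged", "senior"):
        return "C"
    return None

if __name__ == "__main__":
    for customer in [("youth", "high"), ("youth", "low"), ("senior", "high")]:
        print(customer, "->", classify(*customer))
```

The decision tree encodes the same logic: it tests age first and only asks about income on the "youth" branch.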

Classification and Regression for Predictive Analysis

◻ Regression: predicts missing or unavailable numerical data values rather than (discrete) class labels.

🞑 Classification predicts categorical (discrete, unordered) labels, whereas regression models continuous-valued functions.
◻ Prediction: refers to both numeric prediction and class label prediction
◻ Regression analysis is a statistical methodology that is most often used for numeric
prediction. Regression also encompasses the identification of distribution trends based
on the available data.
◻ Classification and regression may need to be preceded by relevance analysis, which
attempts to identify attributes that are significantly relevant to the classification and
regression process.
🞑 Such attributes will be selected for the classification and regression process. Other
attributes, which are irrelevant, can then be excluded from consideration.

Classification and Regression for Predictive Analysis

◻ Classification Example: AllElectronics - classify a large set of items in the store, based on three kinds of responses to a sales campaign: good response, mild response, and no response.
◻ Derive a model for each of these three classes based on the descriptive features of the items, such as price, brand, place made, type, and category.
🞑 The resulting classification should maximally distinguish each class from the others, presenting an organized picture of the data set.
◻ Regression Example: AllElectronics - predict the amount of revenue that each item will generate during an upcoming sale, based on the previous sales data.
🞑 This is an example of regression analysis because the regression model constructed will predict a continuous function (or ordered value).
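
As a small illustration of numeric prediction, the sketch below fits a straight line by ordinary least squares and uses it to estimate revenue. Simple linear regression is only one possible regression model, and the data (discount percentage versus revenue for one item) are made up so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past sales of one item: discount (%) vs. revenue (in $1000s).
discounts = [0, 5, 10, 15, 20]
revenues = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(discounts, revenues)
predicted = slope * 25 + intercept  # revenue estimate for a 25% discount

if __name__ == "__main__":
    print(f"revenue = {slope:.2f} * discount + {intercept:.2f}")
    print(f"predicted revenue at 25% discount: {predicted:.2f}")
```

The output is a number on a continuous scale rather than one of a fixed set of class labels, which is what separates this task from classification.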

1.4.4 Cluster Analysis

◻ Unlike classification and regression, which analyze class-labeled (training) data sets,
clustering analyzes data objects without consulting class labels.
◻ In many cases, class-labeled data may simply not exist at the beginning.
◻ Clustering can be used to generate class labels for a group of data.
◻ The objects are clustered or grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity: objects within a cluster are highly similar to one another but dissimilar to objects in other clusters.
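
The slides do not name a clustering algorithm; the sketch below uses k-means on one-dimensional data as a familiar example of grouping unlabeled objects so that members of a cluster are close to each other. The spending figures and the starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means on numbers; returns (centroids, cluster label per value)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, labels

# Hypothetical yearly spending per customer, with no class labels attached.
spending = [120, 150, 130, 900, 950, 1000]
centroids, labels = kmeans_1d(spending, centroids=[0, 500])

if __name__ == "__main__":
    print(centroids)
    print(labels)
```

The resulting cluster indices can then serve as generated class labels, for example low spenders versus high spenders.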

1.4.5 Outlier Analysis

◻ A data set may contain objects that do not comply with the general behavior or model of
the data. These data objects are outliers.

◻ Many data mining methods discard outliers as noise or exceptions. However, in some
applications (e.g., fraud detection) the rare events can be more interesting than the more
regularly occurring ones. The analysis of outlier data is referred to as outlier analysis
or anomaly mining.
◻ Example: detecting fraudulent credit card activity from unusual usage
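
One common statistical way to flag such objects is to measure how far each value lies from the median, scaled by the median absolute deviation (the "modified z-score"). This technique is swapped in here for illustration and is not prescribed by the slides; the charge amounts and the 3.5 cut-off are assumptions.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical: nothing stands out
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical credit card charges for one account.
amounts = [25.0, 40.0, 32.5, 18.0, 55.0, 47.0, 29.0, 2500.0, 36.0, 41.0]

if __name__ == "__main__":
    print(find_outliers(amounts))
```

Using the median instead of the mean keeps the single large charge from distorting the baseline it is compared against.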

1.4.6 Are All Patterns Interesting?

◻ A data mining system has the potential to generate thousands or even millions of
patterns, or rules.

◻ Are all of the patterns interesting? No

◻ A pattern is interesting if it is

🞑 easily understood by humans,

🞑 valid on new or test data with some degree of certainty,

🞑 potentially useful, and

🞑 novel

◻ A pattern is also interesting if it validates a hypothesis that the user sought to confirm

◻ An interesting pattern represents knowledge!

Are All Patterns Interesting?

◻ Objective measures
🞑 Based on the statistics and structure of patterns, e.g., support, confidence, etc. (Rules that do not satisfy a threshold are considered uninteresting.)

🞑 Accuracy: the percentage of data that are correctly classified by a rule.

🞑 Coverage: similar to support, in that it tells us the percentage of data to which a rule applies.

🞑 Although objective measures help identify interesting patterns, they are often
insufficient unless combined with subjective measures that reflect a particular user’s
needs and interests.

🞑 For example, patterns describing the characteristics of customers who shop frequently
at AllElectronics should be interesting to the marketing manager, but may be of little
interest to other analysts studying the same database for patterns on employee
performance.
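
For a classification rule, coverage and accuracy can be computed as below. The sketch assumes one common reading of the definitions: coverage is the share of all records that satisfy the rule's condition, and accuracy is the share of those covered records whose actual class matches the rule's prediction. The five labeled records are hypothetical.

```python
# Hypothetical labeled records: (age, income, actual class).
records = [
    ("youth", "high", "A"),
    ("youth", "high", "A"),
    ("youth", "high", "B"),
    ("youth", "low", "B"),
    ("senior", "high", "C"),
]

def coverage_and_accuracy(records, condition, predicted_class):
    """Coverage: share of records the rule applies to.
    Accuracy: share of those covered records that the rule labels correctly."""
    covered = [r for r in records if condition(r)]
    correct = [r for r in covered if r[2] == predicted_class]
    coverage = len(covered) / len(records)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# Rule: age = youth AND income = high => class A
coverage, accuracy = coverage_and_accuracy(
    records, lambda r: r[0] == "youth" and r[1] == "high", "A"
)

if __name__ == "__main__":
    print(f"coverage = {coverage:.0%}, accuracy = {accuracy:.0%}")
```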

Are All Patterns Interesting?

◻ Subjective measures
🞑 Reflect the needs and interests of a particular user.
■ E.g., a marketing manager is only interested in characteristics of customers who shop frequently.
🞑 Based on the user’s beliefs about the data.
■ E.g., patterns are interesting if they are unexpected, or can be used for strategic planning, etc.
◻ Objective and subjective measures need to be combined.
◻ Can a system find all the interesting patterns? (Completeness)
🞑 Generating all possible patterns is unrealistic and inefficient
🞑 User-provided constraints and interestingness measures should be used to focus the search
◻ Can a system search for only the interesting patterns? (An optimization problem)
🞑 Highly desirable
🞑 No need to search through the generated patterns to identify the truly interesting ones.
🞑 Measures can be used to rank the discovered patterns according to their interestingness

1.5 Which Technologies Are Used?

◻ As a highly application-driven domain, data mining has incorporated many techniques from other domains, such as:
🞑 statistics
🞑 machine learning
🞑 pattern recognition
🞑 database and data warehouse systems
🞑 information retrieval
🞑 visualization
🞑 algorithms
🞑 high-performance computing, and many application domains

[Figure: Data Mining at the center, surrounded by Machine Learning, Pattern Recognition, Statistics, Visualization, High-Performance Computing, Database Technology, Algorithms, and Applications]

◻ The interdisciplinary nature of data mining research and development contributes significantly to the success of data mining and its extensive applications.

1.5.1 Statistics

◻ Statistics studies the collection, analysis, interpretation or explanation, and presentation of data.

◻ A statistical model is a set of mathematical functions that describe the behaviour of the objects in a target class in terms of random variables and their associated probability distributions.
◻ Statistical models are widely used to model data and data classes.
◻ For example, in data mining tasks like data characterization and classification,
statistical models of target classes can be built. In other words, such statistical
models can be the outcome of a data mining task.
◻ Alternatively, data mining tasks can be built on top of statistical models.
◻ For example, we can use statistics to model noise and missing data values. Then,
when mining patterns in a large data set, the data mining process can use the model
to help identify and handle noisy or missing values in the data.
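
A minimal sketch of this idea: describe an attribute by its mean and standard deviation, estimated from the observed values, then use that model to fill in missing values and to flag values that look noisy. The normal-distribution assumption, the two-standard-deviation cut-off, and the sample ages are all illustrative choices.

```python
from statistics import mean, stdev

def fit_model(values):
    """Estimate mean and standard deviation from the observed values."""
    observed = [v for v in values if v is not None]
    return mean(observed), stdev(observed)

def fill_missing(values, mu):
    """Replace each missing value (None) with the model's mean."""
    return [mu if v is None else v for v in values]

def flag_noisy(values, mu, sigma, k=2.0):
    """Return observed values more than k standard deviations from the mean."""
    return [v for v in values if v is not None and abs(v - mu) > k * sigma]

# Hypothetical customer ages with two missing entries.
ages = [30, None, 40, 50, None]
mu, sigma = fit_model(ages)
completed = fill_missing(ages, mu)

if __name__ == "__main__":
    print(mu, sigma)
    print(completed)
    print(flag_noisy(ages + [120], mu, sigma))
```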

1.5.2 Machine Learning

◻ Machine learning investigates how computers can learn (or improve their performance) based on data.

◻ Classic problems in machine learning that are highly related to data mining:

🞑 Supervised learning: essentially classification; learns from labeled training examples

🞑 Unsupervised learning: essentially clustering; the input examples are not class labeled

🞑 Semi-supervised learning: uses both labeled and unlabeled examples

🞑 Active learning: lets users play an active role in the learning process

1.5.3 Database Systems and Data Warehouses

◻ Database systems

🞑 Focus on the creation, maintenance, and use of databases for organizations and end-users.

🞑 Follow well-established principles in data models, query languages, query processing and optimization methods, data storage, and indexing and accessing methods.

◻ Data warehouse

🞑 Integrates data originating from multiple sources and various timeframes.

🞑 Consolidates data in multidimensional space to form partially materialized data cubes.

🞑 Facilitates OLAP in multidimensional databases and also promotes multidimensional data mining.

1.5.4 Information Retrieval

◻ Information retrieval (IR) is the science of searching for documents or information in documents

◻ Differences from database systems:
🞑 (1) the data under search are unstructured; and
🞑 (2) the queries are formed mainly by keywords
◻ Types of probabilistic models used:
🞑 Language model: the probability density function that generates the bag of words in a document
■ The similarity between two documents can be measured by the similarity between their corresponding language models.
🞑 Topic model: a topic in a set of text documents can be modeled as a probability distribution over the vocabulary
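
The language-model idea can be sketched with the simplest possible model, a unigram model that assigns each word its relative frequency in the document. Comparing two models by cosine similarity is an assumption made here for brevity; other measures, such as divergences between distributions, are also used. The example sentences are made up.

```python
from collections import Counter
from math import sqrt

def language_model(text):
    """Unigram model: probability of each word in the document's bag of words."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def similarity(model_a, model_b):
    """Cosine similarity between two word-probability vectors."""
    dot = sum(p * model_b.get(w, 0.0) for w, p in model_a.items())
    norm_a = sqrt(sum(p * p for p in model_a.values()))
    norm_b = sqrt(sum(p * p for p in model_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = language_model("data mining finds patterns in data")
doc2 = language_model("mining patterns from large data sets")
doc3 = language_model("the weather is sunny today")

if __name__ == "__main__":
    print(similarity(doc1, doc2))  # shares several words with doc1
    print(similarity(doc1, doc3))  # shares no words with doc1
```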

1.6 Which Kinds of Applications Are Targeted?

“Where there are data, there are data mining applications”


◻ Many applications
◻ To demonstrate the importance of applications as a major dimension in data mining research and development, we discuss two highly successful and popular application examples of data mining:
🞑 business intelligence
🞑 search engines

1.6.1 Business Intelligence

◻ Business intelligence (BI) technologies provide historical, current, and predictive views of business operations.
◻ Examples include reporting, online analytical processing, business performance management, competitive intelligence, benchmarking, and predictive analytics.
◻ “How important is business intelligence?”
◻ Without data mining, many businesses may not be able to perform effective market analysis, compare customer feedback on similar products, discover the strengths and weaknesses of their competitors, retain highly valuable customers, and make smart business decisions.
◻ Clearly, data mining is the core of business intelligence.
🞑 OLAP tools in business intelligence rely on data warehousing and multidimensional data mining.
🞑 Classification and prediction techniques are the core of predictive analytics in analyzing markets, supplies, and sales.

Business Intelligence

🞑 Moreover, clustering plays a central role in customer relationship management, which groups customers based on their similarities.
🞑 Using characterization mining techniques, we can better understand the features of each customer group and develop customized customer reward programs.

1.6.2 Web Search Engines

◻ A Web search engine is a specialized computer server that searches for information on the Web.
◻ Various data mining techniques are used in all aspects of search engines:
🞑 crawling - deciding which pages should be crawled and the crawling frequencies
🞑 indexing - selecting pages to be indexed and deciding to what extent the index should be constructed
🞑 searching - deciding how pages should be ranked, which advertisements should be added, and how the search results can be personalized or made “context aware”
◻ Search engines pose grand challenges to data mining:
🞑 handling a huge and ever-growing amount of data
🞑 dealing with online data
🞑 responding to context-aware queries

1.7 Major Issues in Data Mining

◻ Major issues in data mining research:


🞑 mining methodology,
🞑 user interaction,
🞑 efficiency and scalability,
🞑 diversity of data types, and
🞑 data mining and society
◻ Many of these issues have been addressed in recent data mining research and
development to a certain extent and are now considered data mining requirements;
others are still at the research stage.

1.7.1 Mining Methodology

◻ Developing new data mining methodologies involves investigating:

🞑 Mining various and new kinds of knowledge: a wide spectrum of data analysis and knowledge discovery tasks
🞑 Mining knowledge in multidimensional space: e.g., exploring data cubes
🞑 Data mining as an interdisciplinary effort: integrating methods from other disciplines
🞑 Boosting the power of discovery in a networked environment: mining data that reside in a linked or interconnected environment
🞑 Handling uncertainty, noise, or incompleteness of data: data often contain noise, errors, exceptions, or uncertainty, or are incomplete
🞑 Pattern evaluation and pattern- or constraint-guided mining: focusing the search on interesting patterns

1.7.2 User Interaction

◻ Interesting areas of research include how to interact with a data mining system, how to incorporate a user’s background knowledge in mining, and how to visualize and comprehend data mining results.
◻ Interactive mining:
🞑 The data mining process should be highly interactive. Thus, it is important to build
flexible user interfaces and an exploratory mining environment, facilitating the user’s
interaction with the system.
◻ Incorporation of background knowledge:
🞑 Background knowledge, constraints, rules, and other information regarding the
domain under study should be incorporated into the knowledge discovery process
◻ Ad hoc data mining and data mining query languages:
🞑 high-level data mining query languages or other high-level flexible user interfaces will
give users the freedom to define ad hoc data mining tasks.
◻ Presentation and visualization of data mining results:
🞑 adopt expressive knowledge representations, user-friendly interfaces, and
visualization techniques.

1.7.3 Efficiency and Scalability

◻ Efficiency and scalability are always considered when comparing data mining
algorithms.
◻ As data amounts continue to multiply, these two factors are especially critical.
◻ Efficiency and scalability:
🞑 running time of a data mining algorithm must be predictable, short, and
acceptable by applications.
🞑 Efficiency, scalability, performance, optimization, and the ability to execute in real
time are key criteria that drive the development of many new data mining
algorithms.
◻ Parallel, distributed, and incremental mining algorithms:
🞑 First partition the data into “pieces.” Each piece is processed, in parallel, by
searching for patterns.
🞑 The parallel processes may interact with one another. The patterns from each
partition are eventually merged.
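
The partition, process, and merge idea can be sketched with Python's standard concurrent.futures module. Here the "pattern" being mined is simply how many transactions contain each item; the transactions and function names are hypothetical. A thread pool keeps the sketch short; for CPU-bound mining on real data, a process pool or a distributed framework would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Mine one piece: count the transactions that contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def parallel_item_counts(transactions, n_pieces=2):
    """Partition the data, process the pieces in parallel, merge the results."""
    pieces = [transactions[i::n_pieces] for i in range(n_pieces)]
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        partial_counts = list(pool.map(count_items, pieces))
    merged = Counter()
    for counts in partial_counts:
        merged.update(counts)  # adds the per-piece counts together
    return merged

# Hypothetical transactions.
transactions = [["I1", "I8"], ["I2", "I8"], ["I1", "I3", "I8"]]

if __name__ == "__main__":
    print(parallel_item_counts(transactions))
```

Counting is easy to merge because counts from different pieces simply add up; patterns whose pieces interact need more careful merging.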

1.7.4 Diversity of Database Types

◻ Handling complex types of data:


🞑 Diverse applications generate a wide spectrum of new data types
🞑 It is unrealistic to expect one data mining system to mine all kinds of data, given the diversity of data types and the different goals of data mining
◻ Mining dynamic, networked, and global data repositories:
🞑 Multiple sources of data are connected by the Internet and various kinds of
networks, forming gigantic, distributed, and heterogeneous global information
systems and networks

1.7.5 Data Mining and Society

◻ Social impacts of data mining:


🞑 With data mining penetrating our everyday lives, it is important to study the impact of
data mining on society.
🞑 How can we use data mining technology to benefit society? How can we guard against
its misuse?
■ The improper disclosure or use of data and the potential violation of individual
privacy and data protection rights are areas of concern that need to be
addressed.
◻ Privacy-preserving data mining:
🞑 Data mining will help scientific discovery, business management, economy
recovery, and security protection (e.g., the real-time discovery of intruders and
cyberattacks).
🞑 However, it poses the risk of disclosing an individual’s personal information.
🞑 Studies on privacy-preserving data publishing and data mining are ongoing.
🞑 The philosophy is to observe data sensitivity and preserve people’s privacy while
performing successful data mining.

1.7.5 Data Mining and Society

◻ Invisible data mining:


🞑 We cannot expect everyone in society to learn and master data mining
techniques.
🞑 More and more systems should have data mining functions built within
🞑 Intelligent search engines and Internet-based stores perform such invisible
data mining by incorporating data mining into their components to improve their
functionality and performance.
🞑 For example, when purchasing items online, users may be unaware that the store
is likely collecting data on the buying patterns of its customers, which may be
used to recommend other items for purchase in the future.

Summary

◻ Data mining: discovering interesting patterns and knowledge from massive amounts of data

◻ A natural evolution of database technology, in great demand, with wide applications

◻ A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

◻ Mining can be performed on a variety of data

◻ Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

◻ Data mining technologies

Dr. R. Elakkiya, AP-SoC, SASTRA Deemed University 10/03/21