Data Mining-Introduction

Unit - I

Data Mining

(Contents: Text book 2 - Chapter

10/03/21
Chapter 1: Introduction

◻ Why Data Mining?

◻ What Is Data Mining?

◻ What Kind of Data Can Be Mined?

◻ What Kinds of Patterns Can Be Mined?

◻ Which Technologies Are Used?

◻ Which Kinds of Applications Are Targeted?

◻ Major Issues in Data Mining

◻ Summary

1.1 Why Data Mining?

◻ The Explosive Growth of Data: from terabytes to petabytes

🞑 Data collection and data availability

■ Automated data collection tools, database systems, Web, computerized society

🞑 Major sources of abundant data

■ Business: Web, e-commerce, transactions, stocks, …

■ Science: Remote sensing, bioinformatics, scientific simulation, …

■ Society and everyone: news, digital cameras, YouTube

◻ We are drowning in data, but starving for knowledge!

◻ “Necessity is the mother of invention”—Data mining—Automated analysis of massive
data sets

Evolution of Information Technology

“How can I analyze these data?”

◻ Data Collection and Database Creation (1960s and earlier)
🞑 Primitive file processing

◻ Database Management Systems (1970s to early 1980s)
🞑 Hierarchical and network database systems
🞑 Relational database systems
🞑 Data modeling: entity-relationship models, etc.
🞑 Indexing and accessing methods
🞑 Query languages: SQL, etc.
🞑 User interfaces, forms, and reports
🞑 Query processing and optimization
🞑 Transactions, concurrency control, and recovery
🞑 Online transaction processing (OLTP)

◻ Advanced Database Systems (mid-1980s to present)
🞑 Advanced data models: extended-relational, object relational, deductive, etc.
🞑 Managing complex data: spatial, temporal, multimedia, sequence and structured, scientific, engineering, moving objects, etc.
🞑 Data streams and cyber-physical data systems
🞑 Web-based databases (XML, semantic web)
🞑 Managing uncertain data and data cleaning
🞑 Integration of heterogeneous sources
🞑 Text database systems and integration with information retrieval
🞑 Extremely large data management
🞑 Database system tuning and adaptive systems
🞑 Advanced queries: ranking, skyline, etc.
🞑 Cloud computing and parallel data processing
🞑 Issues of data privacy and security

◻ Advanced Data Analysis (late-1980s to present)
🞑 Data warehouse and OLAP
🞑 Data mining and knowledge discovery: classification, clustering, outlier analysis, association and correlation, comparative summary, discrimination analysis, pattern discovery, trend and deviation analysis, etc.
🞑 Mining complex types of data: streams, sequence, text, spatial, temporal, multimedia, Web, networks, etc.
🞑 Data mining applications: business, society, retail, banking, telecommunications, science and engineering, blogs, daily life, etc.
🞑 Data mining and society: invisible data mining, privacy-preserving data mining, mining social and information networks, recommender systems, etc.

◻ Future Generation of Information Systems (present to future)
1.2 What is Data Mining?

◻ Data mining (knowledge discovery from data)

🞑 The discovery of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns and knowledge from large amounts of data
◻ Alternative names
🞑 Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

◻ Watch out: Is everything “data mining”? No; the following are not:

🞑 Simple search and query processing

🞑 (Deductive) expert systems

Knowledge Discovery from Data (KDD) Process

◻ Data mining plays an essential role in the knowledge discovery process

◻ The KDD process

🞑 Data cleaning

🞑 Data integration

🞑 Data selection

🞑 Data transformation

🞑 Data mining

🞑 Pattern evaluation

🞑 Knowledge presentation
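The seven steps above can be sketched as a chain of small functions. This is only a toy illustration: the records, field names, function names, and the `min_total` threshold are all hypothetical, and the "mining" step is a simple ranking rather than a real mining algorithm.

```python
# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"cust": "c1", "item": "laptop", "amount": 900.0},
            {"cust": "c2", "item": "mouse", "amount": None}]
source_b = [{"cust": "c3", "item": "laptop", "amount": 1100.0},
            {"cust": "c1", "item": "software", "amount": 120.0}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine multiple sources into one collection."""
    return [r for src in sources for r in src]

def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: aggregate the amount per item."""
    totals = {}
    for r in records:
        totals[r["item"]] = totals.get(r["item"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining (toy stand-in): rank items by total revenue."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, min_total):
    """Pattern evaluation: keep patterns above an assumed threshold."""
    return [(item, total) for item, total in patterns if total >= min_total]

def present(patterns):
    """Knowledge presentation: format the surviving patterns for a user."""
    return [f"{item}: {total:.2f}" for item, total in patterns]

records = integrate(clean(source_a), clean(source_b))
relevant = select(records, ["item", "amount"])
report = present(evaluate(mine(transform(relevant)), min_total=500.0))
print(report)
```

In practice the steps are iterative rather than strictly linear, so a real pipeline would usually allow looping back to earlier stages.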

Knowledge Discovery from Data (KDD) Process

◻ This is a view from typical database systems and data warehousing communities

[Figure: KDD process flow, from bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection & Transformation → Task-relevant Data → Data Mining → Pattern Evaluation]
1.3 What Kinds of Data Can Be Mined?

◻ Data mining can be applied to any kind of data, as long as the data are meaningful for the target application.
◻ Database Data:
🞑 DBMS – a collection of interrelated data (the database) and a set of programs to manage and access the data
🞑 RDBMS – a collection of tables, each with a unique name
■ Table – a set of attributes (columns/fields) that stores a set of tuples (records/rows)
■ Each tuple is identified by a unique key; an ER model is often used to model the database
🞑 Relational data are accessed by database queries (e.g., SQL)
■ Query – translated into relational operations such as join, selection, and projection
🞑 Mining relational databases – searching for trends or data patterns

What Kinds of Data Can Be Mined?

◻ Example: A relational database for AllElectronics. The company is described by the following relation tables: customer, item, employee, and branch.

customer (cust ID, name, address, age, occupation, annual income, credit information, category, . . . )
item (item ID, brand, category, type, price, place made, supplier, cost, . . . )
employee (empl ID, name, category, group, salary, commission, . . . )
branch (branch ID, name, address, . . . )
purchases (trans ID, cust ID, empl ID, date, time, method paid, amount)
items sold (trans ID, item ID, qty)
works at (empl ID, branch ID)
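To make the join, selection, and projection operations concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The two tables are cut-down versions of the customer and purchases relations; the column names with underscores, the sample rows, and the 100-unit amount filter are assumptions made only for this illustration.

```python
import sqlite3

# In-memory database with simplified, hypothetical versions of two relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_ID TEXT PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE purchases (trans_ID TEXT PRIMARY KEY, cust_ID TEXT, amount REAL);
    INSERT INTO customer VALUES ('C1', 'Ann', 45), ('C2', 'Bob', 23);
    INSERT INTO purchases VALUES ('T100', 'C1', 1200.0),
                                 ('T200', 'C2', 80.0),
                                 ('T300', 'C1', 300.0);
""")

rows = conn.execute("""
    SELECT c.name, p.trans_ID, p.amount           -- projection: keep 3 columns
    FROM customer AS c
    JOIN purchases AS p ON p.cust_ID = c.cust_ID  -- join on the shared key
    WHERE p.amount > 100                          -- selection: filter rows
    ORDER BY p.trans_ID
""").fetchall()
conn.close()

for name, trans_id, amount in rows:
    print(name, trans_id, amount)
```

A query like this answers a specific question; mining goes further by searching the same tables for trends or patterns that were not asked for explicitly.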

What Kinds of Data Can Be Mined?

◻ Data Warehouses
🞑 A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
🞑 Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.

[Figure: Typical data warehouse framework. Data sources in Chicago, New York, Toronto, and Vancouver are cleaned, integrated, transformed, loaded, and refreshed into the data warehouse, which clients access through query and analysis tools.]

What Kinds of Data Can Be Mined?

◻ Transactional Data:
🞑 Each record in a transactional database captures a transaction, such as a customer’s purchase, a flight booking, or a user’s clicks on a web page.
🞑 A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction, such as the items purchased in the transaction.
🞑 A transactional database may have additional tables, which contain other information related to the transactions, such as item description, information about the salesperson or the branch, and so on.
◻ Example: A transactional database for AllElectronics. Transactions can be stored in a table, with one record per transaction.
◻ Nested relational structures: list_of_item_IDs consists of a set of items
◻ Query: Which items sold well together?

trans ID | list of item IDs
T100 | I1, I3, I8, I16
T200 | I2, I8
... | ...
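The question "Which items sold well together?" can be approached by counting how often each pair of items appears in the same transaction. The sketch below uses the two transactions from the table plus one hypothetical extra row (T300) so that a pair repeats; it is a brute-force illustration, not an efficient frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

transactions = {
    "T100": ["I1", "I3", "I8", "I16"],
    "T200": ["I2", "I8"],
    "T300": ["I1", "I8"],  # hypothetical row added for illustration
}

# Count every unordered pair of items that co-occurs in a transaction.
pair_counts = Counter()
for items in transactions.values():
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] += 1

# Pairs seen in at least two transactions (assumed cut-off).
frequent_pairs = [pair for pair, n in pair_counts.items() if n >= 2]
print(frequent_pairs)
```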
1.4 What Kinds of Patterns Can Be Mined?

◻ Concept/Class Description: Characterization and Discrimination

◻ Mining Frequent Patterns, Associations and Correlations

◻ Classification and Regression

◻ Cluster Analysis

◻ Outlier Analysis

1.4.1 Concept/Class Description: Characterization and Discrimination

◻ Data can be associated with classes or concepts (class/concept descriptions)
🞑 E.g. classes of items – computers, printers, …
🞑 E.g. concepts of customers – bigSpenders, budgetSpenders, …
🞑 Descriptions can be derived via data characterization, data discrimination, or both
■ Data characterization – summarizing the general characteristics of a target class of data.
■ E.g. summarizing the characteristics of customers who spend more than $1,000 a year at AllElectronics.
■ The result can be a general profile of the customers, such as 40 – 50 years old, employed, with excellent credit ratings.
■ Data discrimination – comparing the target class with one or a set of comparative (contrasting) classes.
■ E.g. comparing the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by 30% during the same period.
■ Or both of the above
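As a rough illustration of the two ideas, the sketch below summarizes a target class (customers spending more than $1,000) and then compares it with a contrasting class. The customer records, the chosen summary statistics, and the helper name `characterize` are hypothetical.

```python
from statistics import mean

# Hypothetical customer records.
customers = [
    {"age": 45, "spend": 1500, "employed": True},
    {"age": 48, "spend": 2200, "employed": True},
    {"age": 23, "spend": 300, "employed": False},
    {"age": 31, "spend": 450, "employed": True},
]

def characterize(rows):
    """Characterization: summarize general features of one class."""
    return {
        "count": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "employed_share": sum(r["employed"] for r in rows) / len(rows),
    }

target = [r for r in customers if r["spend"] > 1000]     # target class
contrast = [r for r in customers if r["spend"] <= 1000]  # contrasting class

target_profile = characterize(target)
contrast_profile = characterize(contrast)

# Discrimination: compare the same feature across the two classes.
age_gap = target_profile["mean_age"] - contrast_profile["mean_age"]
print(target_profile, contrast_profile, age_gap)
```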
1.4.2 Mining Frequent Patterns, Associations and Correlations

◻ Frequent patterns are patterns that occur frequently in data

◻ Types:
🞑 Frequent itemset: a set of items that frequently appear together in a transactional data set (e.g., charger cable and adapter)
🞑 Frequent subsequence (sequential pattern): a frequently occurring ordered sequence, such as customers tending to purchase product A, followed by a purchase of product B (e.g., Mobile → Charger → Earphones)
🞑 Frequent substructure: a substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets or subsequences.

Mining Frequent Patterns, Associations and Correlations

🞑 Association analysis: finds frequent patterns and expresses them as association rules

■ E.g. at the AllElectronics store, to determine which items are frequently purchased together:
buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]
(If a customer buys a computer, there is a 50% chance that he or she will buy software as well; 1% of all the transactions under analysis show that computer and software are purchased together.)
■ This association rule involves a single attribute or predicate (i.e., buys) that repeats.
■ Association rules that contain a single predicate are referred to as single-dimensional association rules.
■ Dropping the predicate notation, the rule can be written simply as
computer ⇒ software [1%, 50%]
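Support and confidence can be computed directly from a list of transactions: support is the fraction of transactions containing all items in the rule, and confidence is that support divided by the support of the left-hand side. The four transactions below are made up for illustration and do not reproduce the 1% and 50% figures of the slide.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software", "printer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"computer => software [{rule_support:.0%}, {rule_confidence:.0%}]")
```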

Mining Frequent Patterns, Associations and Correlations

■ AllElectronics: purchases
■ E.g. age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 2%, confidence = 60%].

◻ The rule indicates that 2% of the customers under study are 20 to 29 years old with an income of $40,000 to $49,000 and have purchased a laptop (computer) at AllElectronics.

◻ There is a 60% probability that a customer in this age and income group will purchase a laptop.
◻ This is an association involving more than one attribute or predicate (i.e., age, income, and buys): a multidimensional association rule.
◻ Typically, association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
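Threshold-based filtering can be sketched as follows. The two rules and their support and confidence values are the ones quoted in these slides; the threshold values (2% and 50%) are assumptions chosen only to show one rule being discarded.

```python
# Rules from the slides as (description, support, confidence).
rules = [
    ("computer => software", 0.01, 0.50),
    ("age 20..29 AND income 40K..49K => laptop", 0.02, 0.60),
]

MIN_SUPPORT = 0.02     # assumed threshold
MIN_CONFIDENCE = 0.50  # assumed threshold

# A rule is kept only if it satisfies BOTH thresholds.
interesting = [
    name for name, sup, conf in rules
    if sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE
]
print(interesting)
```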

1.4.3 Classification and Regression for Predictive Analysis

◻ Classification: The process of finding a model that describes and distinguishes the data classes or concepts.

🞑 The derived model is based on the analysis of a set of training data (data objects whose class label is known).

🞑 The model can be represented in classification (IF-THEN) rules, decision trees, neural networks, etc.

IF-THEN rules:
age(X, “youth”) AND income(X, “high”) → class(X, “A”)
age(X, “youth”) AND income(X, “low”) → class(X, “B”)
age(X, “middle_aged”) → class(X, “C”)
age(X, “senior”) → class(X, “C”)

Decision tree:
age?
  youth → income?
    high → class A
    low → class B
  middle_aged, senior → class C

[Figure: Neural network with input units f1 (age) and f2 (income), hidden units f3, f4, f5, and output units f6 (class A), f7 (class B), f8 (class C)]
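The IF-THEN rules and the decision tree above encode the same model, which can be written as a small function. The function name and the choice to return `None` for inputs not covered by any rule are assumptions of this sketch.

```python
def classify(age, income):
    """Apply the IF-THEN rules; returns 'A', 'B', 'C', or None if no rule fires."""
    if age == "youth" and income == "high":
        return "A"
    if age == "youth" and income == "low":
        return "B"
    if age in ("middle_aged", "senior"):
        return "C"  # income is not tested on this branch of the tree
    return None

print(classify("youth", "high"))   # A
print(classify("senior", "low"))   # C
```

Here the rules are written by hand; in classification proper, they would be learned from class-labeled training data.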

Classification and Regression for Predictive Analysis

◻ Regression: predicts missing or unavailable numerical data values rather than (discrete) class labels.

🞑 Classification predicts categorical (discrete, unordered) labels; regression models continuous-valued functions.
◻ Prediction: refers to both numeric prediction and class label prediction
◻ Regression analysis is a statistical methodology that is most often used for numeric prediction. Regression also encompasses the identification of distribution trends based on the available data.
◻ Classification and regression may need to be preceded by relevance analysis, which attempts to identify attributes that are significantly relevant to the classification and regression process.
🞑 Such attributes will be selected for the classification and regression process. Other attributes, which are irrelevant, can then be excluded from consideration.

Classification and Regression for Predictive Analysis

◻ Classification Example: AllElectronics - classify a large set of items in the store, based on three kinds of responses to a sales campaign: good response, mild response, and no response.
◻ Derive a model for each of these three classes based on the descriptive features of the items, such as price, brand, place made, type, and category.
🞑 The resulting classification should maximally distinguish each class from the others, presenting an organized picture of the data set.
◻ Regression Example: AllElectronics - predict the amount of revenue that each item will generate during an upcoming sale, based on the previous sales data.
🞑 This is an example of regression analysis because the regression model constructed will predict a continuous function (or ordered value).
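One simple regression model is a straight line fitted by ordinary least squares. The sketch below fits revenue against discount level; the data points, the choice of discount as the predictor, and the function name `fit_line` are all hypothetical, and real sales data would rarely be this perfectly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past sales: discount in percent vs. revenue per item.
past_discount = [0, 10, 20, 30]
past_revenue = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(past_discount, past_revenue)
predicted = intercept + slope * 40  # numeric prediction for a 40% discount
print(slope, intercept, predicted)
```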

1.4.4 Cluster Analysis

◻ Unlike classification and regression, which analyze class-labeled (training) data sets, clustering analyzes data objects without consulting class labels.
◻ In many cases, class-labeled data may simply not exist at the beginning.
◻ Clustering can be used to generate class labels for a group of data.
◻ The objects are clustered or grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.
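The slides do not name a specific clustering algorithm; as one common example, the sketch below runs k-means on one-dimensional values. The data, the starting centroids, and the fixed iteration count are assumptions made for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for 1-D data with given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # unlabeled data
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```

Values inside each resulting group are close to one another and far from the other group, which is the similarity principle stated above; the cluster index can then serve as a generated class label.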

1.4.5 Outlier Analysis

◻ A data set may contain objects that do not comply with the general behavior or model of the data. These data objects are outliers.

◻ Many data mining methods discard outliers as noise or exceptions. However, in some applications (e.g., fraud detection) the rare events can be more interesting than the more regularly occurring ones. The analysis of outlier data is referred to as outlier analysis or anomaly mining.
◻ Example: detecting fraudulent credit card usage
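A minimal sketch of outlier detection on purchase amounts, assuming a simple heuristic: flag any amount whose distance from the median exceeds a multiple of the median absolute deviation (MAD). The amounts, the cut-off factor of 5, and the function name are hypothetical; this is one possible technique, not the one prescribed by the slides.

```python
from statistics import median

def mad_outliers(values, factor=5.0):
    """Flag values far from the median, measured in units of the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [v for v in values if abs(v - med) > factor * mad]

# Hypothetical purchase amounts for one credit card account.
amounts = [20, 25, 22, 30, 24, 27, 2500]
suspicious = mad_outliers(amounts)
print(suspicious)
```

The median-based measure is used here because a single very large purchase would inflate a mean and standard deviation computed on so few points.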

1.4.6 Are All Patterns Interesting?

◻ A data mining system has the potential to generate thousands or even millions of patterns, or rules.

◻ Are all of the patterns interesting? No.

◻ A pattern is interesting if it

🞑 is easily understood by humans,

🞑 is valid on new or test data with some degree of certainty,

🞑 is potentially useful,

🞑 is novel, or

🞑 validates some hypothesis that a user seeks to confirm

◻ An interesting pattern represents knowledge.

Are All Patterns Interesting?

◻ Objective measures
🞑 Based on the statistics and structures of patterns, e.g., support, confidence, etc. (Rules that do not satisfy a threshold are considered uninteresting.)

🞑 Accuracy: the percentage of data that are correctly classified by a rule.
🞑 Coverage: similar to support, in that it tells us the percentage of data to which a rule applies.

🞑 Although objective measures help identify interesting patterns, they are often insufficient unless combined with subjective measures that reflect a particular user’s needs and interests.

🞑 For example, patterns describing the characteristics of customers who shop frequently at AllElectronics should be interesting to the marketing manager, but may be of little interest to other analysts studying the same database for patterns on employee performance.
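Coverage and accuracy of a single classification rule can be computed as below. The sketch assumes the common convention that coverage is the share of all tuples the rule applies to and accuracy is the share of those covered tuples that the rule classifies correctly; the rule and the labeled tuples are hypothetical.

```python
# Hypothetical labeled tuples: (age, income, actual_class).
data = [
    ("youth", "high", "A"),
    ("youth", "high", "A"),
    ("youth", "high", "B"),
    ("youth", "low", "B"),
    ("senior", "high", "C"),
]

# Rule under evaluation: IF age = youth AND income = high THEN class = A.
def rule_applies(age, income):
    return age == "youth" and income == "high"

PREDICTED_CLASS = "A"

covered = [row for row in data if rule_applies(row[0], row[1])]
correct = [row for row in covered if row[2] == PREDICTED_CLASS]

coverage = len(covered) / len(data)     # share of data the rule applies to
accuracy = len(correct) / len(covered)  # share of covered data classified correctly
print(f"coverage = {coverage:.0%}, accuracy = {accuracy:.0%}")
```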

Are All Patterns Interesting?

◻ Subjective measures
🞑 Reflect the needs and interests of a particular user.
■ E.g. a marketing manager is only interested in characteristics of customers who shop frequently.
🞑 Are based on the user’s beliefs about the data.
■ E.g. patterns are interesting if they are unexpected, or can be used for strategic planning, etc.
◻ Objective and subjective measures need to be combined.
◻ Can a system find all the interesting patterns? (completeness)
🞑 Generating all possible patterns is unrealistic and inefficient
🞑 User-provided constraints and interestingness measures should be used to focus the search
◻ Can a system search for only the interesting patterns? (an optimization problem)
🞑 Highly desirable
🞑 No need to search through the generated patterns to identify the truly interesting ones.
🞑 Measures can be used to rank the discovered patterns according to their interestingness
1.5 Which Technologies Are Used?

◻ As a highly application-driven domain, data mining has incorporated many techniques from other domains such as:
🞑 statistics
🞑 machine learning
🞑 pattern recognition
🞑 database and data warehouse systems
🞑 information retrieval
🞑 visualization
🞑 algorithms
🞑 high-performance computing, and many application domains

[Figure: Data Mining at the center, surrounded by Machine Learning, Pattern Recognition, Statistics, Applications, Visualization, Algorithms, Database Technology, and High-Performance Computing]

◻ The interdisciplinary nature of data mining research and development contributes significantly to the success of data mining and its extensive applications.

1.5.1 Statistics

◻ Statistics studies the collection, analysis, interpretation or explanation, and presentation of data.

◻ A statistical model is a set of mathematical functions that describe the behaviour of the objects in a target class in terms of random variables and their associated probability distributions.
◻ Statistical models are widely used to model data and data classes.
◻ For example, in data mining tasks like data characterization and classification, statistical models of target classes can be built. In other words, such statistical models can be the outcome of a data mining task.
◻ Alternatively, data mining tasks can be built on top of statistical models.
◻ For example, we can use statistics to model noise and missing data values. Then, when mining patterns in a large data set, the data mining process can use the model to help identify and handle noisy or missing values in the data.

1.5.2 Machine Learning

◻ Machine learning investigates how computers can learn (or improve their performance) based on data.

◻ Classic problems in machine learning that are highly related to data mining:

🞑 Supervised learning - essentially classification; learning is guided by labeled training examples

🞑 Unsupervised learning - essentially clustering; the input examples are not class labeled

🞑 Semi-supervised learning - uses both labeled and unlabeled examples

🞑 Active learning - lets users play an active role in the learning process

1.5.3 Database Systems and Data Warehouses

◻ Database systems

🞑 focus on the creation, maintenance, and use of databases for organizations and end-users.

🞑 follow highly recognized principles in data models, query languages, query processing and optimization methods, data storage, and indexing and accessing methods.

◻ Data warehouse

🞑 integrates data originating from multiple sources and various timeframes.

🞑 consolidates data in multidimensional space to form partially materialized data cubes.

🞑 not only facilitates OLAP in multidimensional databases but also promotes multidimensional data mining.

1.5.4 Information Retrieval

◻ Information retrieval (IR) is the science of searching for documents or for information in documents

◻ Differences from database systems:
🞑 (1) the data under search are unstructured; and
🞑 (2) the queries are formed mainly by keywords
◻ Types of probabilistic models:
🞑 Language Model - a probability density function that generates the bag of words in the document
■ The similarity between two documents can be measured by the similarity between their corresponding language models.
🞑 Topic Model - a topic in a set of text documents can be modeled as a probability distribution over the vocabulary
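A minimal sketch of the language-model idea, assuming the simplest case: each document is represented by its unigram (bag-of-words) probability distribution, and two documents are compared with the cosine similarity of those distributions. The example sentences, the whitespace tokenization, and the choice of cosine similarity are assumptions of this illustration; other similarity measures between distributions could equally be used.

```python
from collections import Counter
from math import sqrt

def language_model(text):
    """Unigram model: relative frequency of each word in the document."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def cosine(p, q):
    """Cosine similarity between two word-probability dictionaries."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

doc_a = language_model("data mining finds patterns in data")
doc_b = language_model("mining data reveals patterns")
doc_c = language_model("stock prices fell today")

print(cosine(doc_a, doc_b))  # shares several words with doc_a
print(cosine(doc_a, doc_c))  # shares no words with doc_a
```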

1.6 Which Kinds of Applications Are Targeted?
“Where there are data, there are data mining applications”

◻ Data mining has many applications
◻ To demonstrate the importance of applications as a major dimension in data mining research and development, we discuss two highly successful and popular application examples of data mining:
🞑 business intelligence
🞑 search engines

1.6.1 Business Intelligence

◻ Business intelligence (BI) technologies provide historical, current, and predictive views of business operations.
◻ Examples include reporting, online analytical processing, business performance management, competitive intelligence, benchmarking, and predictive analytics.
◻ “How important is business intelligence?”
◻ Without data mining, many businesses may not be able to perform effective market analysis, compare customer feedback on similar products, discover the strengths and weaknesses of their competitors, retain highly valuable customers, and make smart business decisions.
◻ Clearly, data mining is the core of business intelligence.
🞑 OLAP tools in business intelligence rely on data warehousing and multidimensional data mining.
🞑 Classification and prediction techniques are the core of predictive analytics in analyzing markets, supplies, and sales.

Business Intelligence

🞑 Moreover, clustering plays a central role in customer relationship management, which groups customers based on their similarities.
🞑 Using characterization mining techniques, we can better understand features of each customer group and develop customized customer reward programs.

1.6.2 Web Search Engines

◻ A Web search engine is a specialized computer server that searches for information on the Web.
◻ Various data mining techniques are used in all aspects of search engines:
🞑 crawling - deciding which pages should be crawled and the crawling frequencies
🞑 indexing - selecting pages to be indexed and deciding the extent to which the index should be constructed
🞑 searching - deciding how pages should be ranked, which advertisements should be added, and how the search results can be personalized or made “context aware”
◻ Search engines - grand challenges:
🞑 handling a huge and ever-growing amount of data
🞑 dealing with online data
🞑 responding to context-aware queries

1.7 Major Issues in Data Mining

◻ Major issues in data mining research:

🞑 mining methodology,
🞑 user interaction,
🞑 efficiency and scalability,
🞑 diversity of data types, and
🞑 data mining and society
◻ Many of these issues have been addressed in recent data mining research and
development to a certain extent and are now considered data mining requirements;
others are still at the research stage.

1.7.1 Mining Methodology

◻ Developing new data mining methodologies involves investigating:

🞑 Mining various and new kinds of knowledge: a wide spectrum of data analysis and knowledge discovery tasks
🞑 Mining knowledge in multidimensional space: data cubes
🞑 Data mining as an interdisciplinary effort: integrating methods from other disciplines
🞑 Boosting the power of discovery in a networked environment: data objects reside in a linked or interconnected environment
🞑 Handling uncertainty, noise, or incompleteness of data: data often contain noise, errors, exceptions, or uncertainty, or are incomplete
🞑 Pattern evaluation and pattern- or constraint-guided mining: focusing the search on interesting patterns

1.7.2 User Interaction

◻ Interesting areas of research include how to interact with a data mining system, how to
incorporate a user’s background knowledge in mining, and how to visualize and
comprehend data mining results.
◻ Interactive mining:
🞑 The data mining process should be highly interactive. Thus, it is important to build
flexible user interfaces and an exploratory mining environment, facilitating the user’s
interaction with the system.
◻ Incorporation of background knowledge:
🞑 Background knowledge, constraints, rules, and other information regarding the
domain under study should be incorporated into the knowledge discovery process
◻ Ad hoc data mining and data mining query languages:
🞑 high-level data mining query languages or other high-level flexible user interfaces will
give users the freedom to define ad hoc data mining tasks.
◻ Presentation and visualization of data mining results:
🞑 adopt expressive knowledge representations, user-friendly interfaces, and
visualization techniques.
1.7.3 Efficiency and Scalability

◻ Efficiency and scalability are always considered when comparing data mining
algorithms.
◻ As data amounts continue to multiply, these two factors are especially critical.
◻ Efficiency and scalability:
🞑 running time of a data mining algorithm must be predictable, short, and
acceptable by applications.
🞑 Efficiency, scalability, performance, optimization, and the ability to execute in real
time are key criteria that drive the development of many new data mining
algorithms.
◻ Parallel, distributed, and incremental mining algorithms:
🞑 First partition the data into “pieces.” Each piece is processed, in parallel, by
searching for patterns.
🞑 The parallel processes may interact with one another. The patterns from each
partition are eventually merged.
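The partition, process, and merge scheme can be sketched with item counting as the per-piece "mining" step. The transactions, function names, and number of partitions are hypothetical. A thread pool is used here only to show the structure; in CPython, CPU-bound work would typically need processes or a distributed framework to gain real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Mine one piece: count in how many transactions each item occurs."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def partitioned_counts(transactions, n_parts):
    # 1. Partition the data into pieces.
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    # 2. Process each piece in parallel.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_results = list(pool.map(count_items, parts))
    # 3. Merge the patterns found in each partition.
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)
    return merged

transactions = [["I1", "I3", "I8", "I16"], ["I2", "I8"], ["I1", "I8"], ["I3"]]
merged = partitioned_counts(transactions, n_parts=2)
print(merged.most_common(3))
```

Because counts are additive, merging is a simple sum here; for other kinds of patterns the merge step can be considerably more involved.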
1.7.4 Diversity of Database Types

◻ Handling complex types of data:

🞑 Diverse applications generate a wide spectrum of new data types
🞑 It is unrealistic to expect one data mining system to mine all kinds of data, given the diversity of data types and the different goals of data mining
◻ Mining dynamic, networked, and global data repositories:
🞑 Multiple sources of data are connected by the Internet and various kinds of networks, forming gigantic, distributed, and heterogeneous global information systems and networks

1.7.5 Data Mining and Society

◻ Social impacts of data mining:

🞑 With data mining penetrating our everyday lives, it is important to study the impact of
data mining on society.
🞑 How can we use data mining technology to benefit society? How can we guard against
its misuse?
■ The improper disclosure or use of data and the potential violation of individual
privacy and data protection rights are areas of concern that need to be
addressed.
◻ Privacy-preserving data mining:
🞑 Data mining will help scientific discovery, business management, economy
recovery, and security protection (e.g., the real-time discovery of intruders and
cyberattacks).
🞑 However, it poses the risk of disclosing an individual’s personal information.
🞑 Studies on privacy-preserving data publishing and data mining are ongoing.
🞑 The philosophy is to observe data sensitivity and preserve people’s privacy while
performing successful data mining.

Data Mining and Society

◻ Invisible data mining:

🞑 We cannot expect everyone in society to learn and master data mining
techniques.
🞑 More and more systems should have data mining functions built within
🞑 Intelligent search engines and Internet-based stores perform such invisible
data mining by incorporating data mining into their components to improve their
functionality and performance.
🞑 For example, when purchasing items online, users may be unaware that the store
is likely collecting data on the buying patterns of its customers, which may be
used to recommend other items for purchase in the future.

Summary

◻ Data mining: discovering interesting patterns and knowledge from massive amounts of data

◻ A natural evolution of database technology, in great demand, with wide applications

◻ A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

◻ Mining can be performed on a variety of data

◻ Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

◻ Data mining technologies
Dr. R. Elakkiya, AP-SoC, SASTRA Deemed University 10/03/21

2 pages
Text Analytics in Conceptual Modelling
No ratings yet
Text Analytics in Conceptual Modelling
87 pages
Improved KDD Algorithm Research
No ratings yet
Improved KDD Algorithm Research
3 pages
Clinical Research Informatics
100% (1)
Clinical Research Informatics
415 pages
DWDM Question Bank MCQ
No ratings yet
DWDM Question Bank MCQ
11 pages
Data Warehousing & Data Mining Syllabus Subject Code:56055 L:4 T/P/D:0 Credits:4 Int. Marks:25 Ext. Marks:75 Total Marks:100
No ratings yet
Data Warehousing & Data Mining Syllabus Subject Code:56055 L:4 T/P/D:0 Credits:4 Int. Marks:25 Ext. Marks:75 Total Marks:100
52 pages
Data Mining & Warehousing - S. Prabhu
No ratings yet
Data Mining & Warehousing - S. Prabhu
144 pages
(Robert J. Thierauf) Knowledge Management Systems PDF
100% (1)
(Robert J. Thierauf) Knowledge Management Systems PDF
376 pages
Formative Knowledge
No ratings yet
Formative Knowledge
14 pages
Knowledge Extraction - Kore Ai Docs
No ratings yet
Knowledge Extraction - Kore Ai Docs
9 pages
Data Mining Assignment
No ratings yet
Data Mining Assignment
4 pages
Analysis Knowledge
No ratings yet
Analysis Knowledge
32 pages
21AI643
No ratings yet
21AI643
2 pages
Data Mining Questions 1st Unit
No ratings yet
Data Mining Questions 1st Unit
6 pages
Data Mining: Ying Liu, Prof., PH.D
No ratings yet
Data Mining: Ying Liu, Prof., PH.D
57 pages
Enhancement of Newton Law of Cooling Method Based
No ratings yet
Enhancement of Newton Law of Cooling Method Based
6 pages
Unit II Data Mining
No ratings yet
Unit II Data Mining
8 pages
Process Knowledge Graph Modeling Techniques and AP
No ratings yet
Process Knowledge Graph Modeling Techniques and AP
15 pages
Graph Construction and Applicaiton
No ratings yet
Graph Construction and Applicaiton
7 pages
DATA MINING Chapter 1 and 2 Lect Slide
No ratings yet
DATA MINING Chapter 1 and 2 Lect Slide
47 pages
Pattern Recognition Algorithms For Data Mining Scalability Knowledge Discovery and Soft Granular Computing 1st Edition Sankar K. Pal Instant Download
100% (7)
Pattern Recognition Algorithms For Data Mining Scalability Knowledge Discovery and Soft Granular Computing 1st Edition Sankar K. Pal Instant Download
76 pages
Marketing Analytics New
No ratings yet
Marketing Analytics New
66 pages
Andromeda
No ratings yet
Andromeda
6 pages
Maintenance Analytics
No ratings yet
Maintenance Analytics
6 pages
A Review of Data Mining Literature
No ratings yet
A Review of Data Mining Literature
6 pages
Algorithmics Research On Knowledge Discovery and Data Mining
No ratings yet
Algorithmics Research On Knowledge Discovery and Data Mining
32 pages
TextMining PAKDD1999
No ratings yet
TextMining PAKDD1999
7 pages
Intro To Data Minning
No ratings yet
Intro To Data Minning
24 pages
Data Mining Tutorial - Javatpoint
No ratings yet
Data Mining Tutorial - Javatpoint
16 pages
Answer Midterm Exam Data Mining1 2021 - 2022
100% (2)
Answer Midterm Exam Data Mining1 2021 - 2022
4 pages
Financial Knowledge Graph Based Financial Report Query System
No ratings yet
Financial Knowledge Graph Based Financial Report Query System
18 pages