Week-1-Introduction To Data Mining

The document provides an introduction to data mining, defining it as the process of extracting useful information from large datasets and differentiating it from knowledge discovery. It discusses the importance of data mining in various fields, the techniques used such as classification, clustering, and association, and the stages involved in the data mining process. Additionally, it highlights the architecture of data mining systems and addresses major issues and challenges in the field.

Uploaded by

wubetayalew2

Introduction to Data Mining

Instructor: Melkamu A.
Date: 15/11/2022
What is Data Mining?
• Extracting or “mining” knowledge from large amounts of data.
(or)
• Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
(or)
• Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
• “Gold mining from rock or sand” is analogous to “knowledge mining from data”.
• Other terms for data mining:
o Knowledge Mining
o Knowledge Extraction
o Pattern Analysis
o Data Archaeology
• Data mining is not the same as KDD (Knowledge Discovery from Data).
• Data mining is a step in KDD.
Why Mine Data? Commercial Viewpoint
Why Mine Data? Scientific Viewpoint
What motivated data mining? Why is it important?
• There is often information “hidden” in the data that is not readily evident.
• Human analysts may take weeks to discover useful information.
• Much of the data is never analyzed at all.
• Huge volume of data
• Major sources of abundant data:
o Business: Web, e-commerce, transactions, stocks
o Science: remote sensing, bioinformatics, scientific simulation
o Society and everyone: news, digital cameras, YouTube
• Need for turning data into knowledge: we are drowning in data but starving for knowledge
• Applications that use data mining:
o Market analysis
o Fraud detection
o Customer retention
o Production control
o Scientific exploration
• A data-rich but information-poor situation
What is (not) Data Mining?
Related Terminologies
[Figure: Data Mining in relation to Statistics, Machine Learning / AI, Pattern Recognition, and Database Systems]
Data Warehouse (OLAP: Online Analytical Processing)

• Data is typically spread across several databases that are physically located at numerous sites.
• Data warehouse: a repository of multiple databases under a single schema, which resides at a single site.
• Mostly reads
• Queries are long and complex
• GB to TB of data
• Historical data
• Lots of scans
• Summarized, reconciled data
• Hundreds of users (e.g., decision-makers, analysts)
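As a rough illustration of the read-only, scan-heavy, summarizing queries listed above, the following sketch runs an aggregate query over a tiny in-memory table with Python's built-in sqlite3 module. The `sales` table, its columns, and the figures are assumptions made up for this example; a real warehouse would hold far more history.

```python
import sqlite3

def yearly_sales_by_region(rows):
    """Summarize (region, year, amount) rows the way an OLAP report might."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Read-only aggregate that scans the whole table and summarizes it
    cursor = conn.execute(
        "SELECT region, year, SUM(amount) FROM sales "
        "GROUP BY region, year ORDER BY region, year"
    )
    summary = cursor.fetchall()
    conn.close()
    return summary

# Hypothetical historical records
rows = [
    ("North", 2021, 100.0),
    ("North", 2021, 50.0),
    ("North", 2022, 70.0),
    ("South", 2021, 30.0),
]
for region, year, total in yearly_sales_by_region(rows):
    print(region, year, total)
```

The query only reads and aggregates, which is the typical access pattern of analysts and decision-makers working against a warehouse.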
Machine Learning
• Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn".
• Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
• Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers.
Machine Learning Definition
“Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.”
Relation to Statistics
• Statistics: “learning from data” or “turning data into information”.
• Data: crude information that does not make sense on its own; it is what we capture and store.
o e.g., customer data, store data, demographic data, geographical data
• Information: relates items of data and is relevant to the decision problem.
o e.g., X lives in Z; S is Y years old; X and S moved; W has money in Z
• Facts: information becomes fact when data can support it.
• Knowledge: what we know or infer; it relates items of information.
o e.g., a quantity Q of product A is used in region Z
Data Mining – Confluence of Multiple Disciplines
• Databases
• Data Warehousing
• Statistics
• Machine Learning
• Information Retrieval
• Image and Signal Processing
• Pattern Recognition
• Neural Networks
• Data Visualization
• Spatial / Temporal Data Analysis
Data Mining – On What Kinds of Data?
• Database-oriented data sets and applications
o Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
o Data streams and sensor data
o Time-series data, temporal data, sequence data (incl. bio-sequences)
o Structured data, graphs, social networks, and multi-linked data
o Object-relational databases
o Heterogeneous databases and legacy databases
o Spatial data and spatiotemporal data
o Multimedia databases
o Text databases
o The World-Wide Web
Data Mining Tasks

• Prediction methods: use some variables to predict unknown or future values of other variables.
• Description methods: find human-interpretable patterns that describe the data.
Data Mining Tasks
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
Data Mining Goals
• Data mining uncovers in-depth business intelligence by using advanced analytical and modelling techniques.
• With data mining, you can ask far more sophisticated questions of your data than you can with conventional querying methods.
Data Mining Goals

Data mining is simply the acquisition of information that is already present in your CRM (Customer Relationship Management) system and that is intended to be used for marketing, customer service, customer information services, and similar applications.
Data Mining Goals
• Data mining tools ease and automate the process of discovering this kind of information from large stores of data.
• Data mining can identify patterns in company data, for example in records of supermarket purchases.
• If customers buy product A and product B, which product C are they most likely to buy as well?
• Accurate answers to questions like these are invaluable aids to marketing strategies.
• Data mining can identify the characteristics of a known group of customers, for example those who have a proven record as poor credit risks.
DBMS
• Relational Databases:
o A DBMS consists of a database (inter-related data) and a set of software programs to manage and access the data.
o A relational database is a collection of tables.
o Each table has a set of attributes (columns / fields) and a large set of tuples (records or rows).
• Transactional Databases:
o Consist of a file with records, where each record is a transaction.
o Each transaction has a unique transaction ID and a list of the items that make up the transaction.
• Object-Relational Databases
• Temporal Databases, Sequence Databases, and Time-Series Databases
• Spatial Databases and Spatiotemporal Databases
Stages of Data Mining Process

KDD Process

Brief explanation of data mining stages
• Data Cleaning: remove noisy and inconsistent data.
• Data Integration: combine multiple data sources.
• Data Selection: retrieve the data relevant to the analysis.
• Data Transformation: transform the data into a form suitable for mining (summarized / aggregated).
• Data Mining: extract data patterns using intelligent methods.
• Pattern Evaluation: identify the interesting patterns.
• Knowledge Presentation: present the mined knowledge to the user through visualization and knowledge representation.
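To make the order of the stages concrete, here is a minimal sketch that chains one small function per stage. The record layout (`year`, `items`), the two hypothetical store sources, and the frequency threshold are assumptions made for illustration; real pipelines are far more involved at every step.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records with missing or empty item lists."""
    return [r for r in records if r.get("items")]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [record for source in sources for record in source]

def select(records, year):
    """Data selection: keep only records relevant to the analysis."""
    return [r for r in records if r["year"] == year]

def transform(records):
    """Data transformation: reduce each record to a set of lower-cased items."""
    return [frozenset(item.lower() for item in r["items"]) for r in records]

def mine(baskets):
    """Data mining: count how often each item occurs."""
    return Counter(item for basket in baskets for item in basket)

def evaluate(counts, min_count):
    """Pattern evaluation: keep items that meet a frequency threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: format the results for the user."""
    return [f"{item}: {n}" for item, n in sorted(patterns.items())]

# Two hypothetical data sources
store_a = [{"year": 2022, "items": ["Bread", "Milk"]}, {"year": 2022, "items": []}]
store_b = [{"year": 2022, "items": ["milk"]}, {"year": 2021, "items": ["Bread"]}]

records = integrate(clean(store_a), clean(store_b))
baskets = transform(select(records, year=2022))
report = present(evaluate(mine(baskets), min_count=2))
print(report)
```

Each function stands in for a whole stage, so the chain reads in the same order as the list above.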
Data Mining Techniques
Several major data mining techniques have been developed and used in data mining projects in recent years, including:
• association,
• classification,
• clustering,
• prediction,
• sequential patterns, and
• decision trees.
Data Mining Techniques (Association)
• Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction.
• That is why the association technique is also known as the relation technique. It is used in market basket analysis to identify a set of products that customers frequently purchase together.
• Retailers use the association technique to research customers’ buying habits. Based on historical sales data, retailers might find that customers always buy crisps when they buy beer; they can then put beer and crisps next to each other to save the customer time and increase sales.
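The beer-and-crisps relationship can be quantified with two standard measures, support and confidence. The slides do not name these measures, so treat the following as a minimal sketch under that assumption; the transactions are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical market baskets
transactions = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer", "bread"},
    {"milk", "bread"},
]
print(support(transactions, {"beer", "crisps"}))       # 2 of 4 baskets
print(confidence(transactions, {"beer"}, {"crisps"}))  # 2 of the 3 beer baskets
```

A rule such as "beer implies crisps" is usually considered interesting only when both its support and its confidence exceed thresholds chosen by the analyst.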
Classification
• Classification is a classic data mining technique based on machine learning. Classification is used to assign each item in a set of data to one of a predefined set of classes or groups.
• Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks, and statistics.
• In classification, we develop software that can learn how to classify data items into groups. For example, we can apply classification to the problem: “given all records of employees who left the company, predict who will probably leave the company in a future period.”
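As a small illustration of assigning items to predefined classes, the sketch below uses a nearest-neighbor rule. This is a technique swapped in for brevity, not one of those named above, and the two features (years at the company, weekly overtime hours) and all records are hypothetical.

```python
import math

def nearest_neighbor_classify(training, query):
    """Return the label of the training example closest to `query`."""
    _, label = min(training, key=lambda example: math.dist(example[0], query))
    return label

# Hypothetical labeled records: (years at company, overtime hours per week)
training = [
    ((1.0, 12.0), "left"),
    ((2.0, 10.0), "left"),
    ((8.0, 2.0), "stayed"),
    ((10.0, 1.0), "stayed"),
]
print(nearest_neighbor_classify(training, (1.5, 11.0)))
print(nearest_neighbor_classify(training, (9.0, 3.0)))
```

The classes ("left", "stayed") are fixed in advance by the labeled records, which is what distinguishes classification from clustering.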
Clustering
• Clustering is a data mining technique that automatically forms meaningful or useful clusters of objects that have similar characteristics.
• The clustering technique defines the classes and puts objects in each class, whereas in classification objects are assigned to predefined classes.
• To make the concept clearer, take book management in a library as an example. A library holds a wide range of books on various topics.
• The challenge is how to keep those books in a way that lets readers take several books on a particular topic without hassle.
• By using the clustering technique, we can keep books that have some kind of similarity in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on that topic only have to go to that shelf instead of searching the entire library.
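One common way to form such clusters automatically is k-means. The slides do not name an algorithm, so this is only a sketch under that assumption; the 2-D points stand in for hypothetical numeric features of books, and the initialization is deliberately naive.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: returns a cluster index for each point."""
    centroids = list(points[:k])  # naive deterministic initialization
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment

# Hypothetical feature vectors, e.g. two numeric properties per book
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1)]
labels = k_means(points, k=2)
print(labels)
```

No class labels are supplied; the groups emerge from the data, and naming each "shelf" is left to a person afterwards.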
Prediction

• Prediction, as its name implies, is a data mining technique that discovers the relationships among independent variables and between dependent and independent variables.
• For instance, prediction analysis can be used in sales to predict future profit: if we consider sales to be the independent variable, profit could be the dependent variable.
• Then based on the historical sale and
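A minimal sketch of the sales-to-profit idea, assuming a straight-line relationship fitted by ordinary least squares (the slides do not specify a fitting method). The historical figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales (independent) and profit (dependent)
sales = [100.0, 200.0, 300.0, 400.0]
profit = [15.0, 35.0, 55.0, 75.0]

slope, intercept = fit_line(sales, profit)
predicted = slope * 500.0 + intercept  # predicted profit for sales of 500
print(slope, intercept, predicted)
```

Here sales is the independent variable and profit the dependent one, matching the roles described above.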

Sequential Patterns

• Sequential pattern analysis is a data mining technique that seeks to discover or identify similar patterns, regular events, or trends in transaction data over a business period.
• In sales, with historical transaction data, businesses can identify a set of items that customers buy together at different times of the year.
• Businesses can then use this information to recommend those items to customers, with better deals based on their purchasing frequency.
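A very reduced form of sequential pattern discovery is to count ordered pairs of items, where one item is bought before another by the same customer. The sketch below does only that; the purchase histories and the support threshold are assumptions, and full sequential pattern mining algorithms handle longer patterns and time constraints.

```python
from collections import Counter

def frequent_ordered_pairs(sequences, min_support):
    """Count pairs (a, b) where a is bought before b by the same customer."""
    counts = Counter()
    for sequence in sequences:
        seen_pairs = set()
        for i, first in enumerate(sequence):
            for later in sequence[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)  # each customer counts once per pair
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical purchase histories, one list per customer, in time order
sequences = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "stove"],
    ["stove", "tent"],
]
print(frequent_ordered_pairs(sequences, min_support=2))
```

Unlike association within one transaction, order matters here: "tent then stove" and "stove then tent" are counted as different patterns.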
Decision trees
A decision tree is one of the most commonly used data mining techniques because its model is easy for users to understand.
In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers.
Each answer then leads to a further set of questions or conditions that help us narrow down the data so that we can make the final decision based on it.
For example, we use the following decision tree to determine whether or not to play tennis:
[Figure: decision tree for deciding whether to play tennis]
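A tree of this kind can be written directly as nested conditions. The original diagram is not reproduced here, so the attributes (outlook, humidity, wind) and the branch outcomes below follow the commonly used textbook version of the play-tennis example and are an assumption, not necessarily the tree from the slide.

```python
def play_tennis(outlook, humidity, wind):
    """Walk a small hand-written decision tree; True means play."""
    if outlook == "sunny":            # root question: what is the outlook?
        return humidity == "normal"   # follow-up question on humidity
    if outlook == "overcast":
        return True                   # leaf: always play
    if outlook == "rain":
        return wind == "weak"         # follow-up question on wind
    raise ValueError(f"unknown outlook: {outlook!r}")

print(play_tennis("sunny", "high", "weak"))
print(play_tennis("overcast", "high", "strong"))
print(play_tennis("rain", "normal", "strong"))
```

Each `if` is one question in the tree, and each `return` is either a leaf or a further question, which is why such models are easy to read.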
Knowledge Representation

Knowledge representation is the presentation of knowledge to the user for visualization in terms of trees, tables, rules, graphs, charts, matrices, etc.
For example: histograms
Histograms
• A histogram represents the distribution of values of a single attribute.
• It consists of a set of rectangles that reflect the counts or frequencies of the classes present in the given data.
Example: a histogram of the electricity bill generated for 4 months.
[Figure: histogram of an electricity bill over 4 months]
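The counting behind a histogram can be sketched in a few lines. The bill amounts and the bin width below are invented for illustration, and the text rendering merely stands in for the rectangles of a real chart.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = {}
    for value in values:
        bin_start = int(value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))

def render(counts, bin_width):
    """Draw one row of '#' marks per bin as a text-only histogram."""
    return [f"{start}-{start + bin_width}: {'#' * n}" for start, n in counts.items()]

bills = [42, 55, 58, 61, 47, 73, 66, 59]  # hypothetical bill amounts
counts = histogram(bills, bin_width=10)
print("\n".join(render(counts, 10)))
```

The length of each row corresponds to the height of a rectangle: the number of values that fall into that class.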
Data Visualization

Data visualization deals with the representation of data in a graphical or pictorial format.
Patterns in the data can be spotted easily by using data visualization techniques.
Pixel-oriented visualization technique
In pixel-based visualization techniques, each attribute has its own sub-window, and each attribute value is represented by one colored pixel.
Pixel-oriented visualization technique
• The color mapping of the pixels is decided on the basis of the data characteristics and the visualization tasks.
Geometric projection visualization technique
i. Scatter-plot matrices
A scatter-plot matrix consists of scatter plots of all possible pairs of variables in a dataset.
ii. Hyperslice
Hyperslice is an extension of scatter-plot matrices. It represents a multi-dimensional function as a matrix of orthogonal two-dimensional slices.
iii. Parallel coordinates
The axes are drawn as separate, parallel vertical lines.
A point in Cartesian coordinates corresponds to a polyline in parallel coordinates.
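Two of these ideas are easy to sketch without any plotting library: listing the variable pairs a scatter-plot matrix has to draw, and turning one data point into the vertices of its polyline. The variable names, value ranges, and the choice to scale each axis to [0, 1] are assumptions made for illustration.

```python
from itertools import combinations

def scatter_plot_pairs(variables):
    """List every distinct pair of variables a scatter-plot matrix would plot."""
    return list(combinations(variables, 2))

def to_polyline(point, minimums, maximums):
    """Map one n-dimensional point to the vertices of its polyline.

    Axis i is the vertical line x = i; the height on that axis is the
    point's i-th value scaled to the range [0, 1].
    """
    return [
        (i, (value - lo) / (hi - lo))
        for i, (value, lo, hi) in enumerate(zip(point, minimums, maximums))
    ]

print(scatter_plot_pairs(["age", "income", "spend"]))
print(to_polyline((30, 50_000, 200), (20, 0, 0), (60, 100_000, 400)))
```

Connecting the returned vertices from left to right gives the polyline for that point; drawing one polyline per record gives the full parallel-coordinates plot.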
Icon-based visualization techniques
Icon-based visualization techniques are also known as iconic display techniques.
Each multidimensional data item is mapped to an icon.
This technique allows the visualization of large amounts of data.
The most commonly used technique is Chernoff faces.
Chernoff faces
Each dimension is mapped to a facial feature, for example the face width, the length of the mouth, and the length of the nose.
[Figure: Chernoff faces]
Visualization techniques
Hierarchical visualization techniques
• Hierarchical visualization techniques partition all dimensions into subsets.
• These subsets are visualized in a hierarchical manner.
Some of the hierarchical visualization techniques are:
i. Dimensional stacking
In dimensional stacking, the n-dimensional attribute space is partitioned into 2-dimensional subspaces.
Attribute values are partitioned into various classes.
Each element is a two-dimensional space in the form of an xy plot.
The important attributes are identified and used on the outer levels.
ii. Mosaic plot
A mosaic plot gives a graphical representation of successive decompositions.
Rectangles are used to represent the count of categorical data and at every stage, rectangles are
Treemap visualization
• Treemaps are well suited for displaying large amounts of hierarchically structured data.
• The visualization space is divided into multiple rectangles that are sized and ordered according to a quantitative variable.
• The levels in the hierarchy are shown as rectangles containing other rectangles.
• Each set of rectangles on the same level in the hierarchy represents a category, a column, or an expression in a data set.
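The core step of a treemap, dividing a rectangle in proportion to a quantitative variable, can be sketched for a single level of the hierarchy. The simple side-by-side "slice" layout, the category names, and the counts are assumptions; practical treemaps apply a step like this recursively inside each rectangle and often use more elaborate layouts.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into side-by-side strips proportional to each value.

    `items` is a list of (label, value) pairs; returns (label, x, y, w, h).
    """
    total = sum(value for _, value in items)
    rectangles = []
    offset = x
    # Largest value first, so the strips are ordered by the quantitative variable
    for label, value in sorted(items, key=lambda item: item[1], reverse=True):
        strip_width = width * value / total
        rectangles.append((label, offset, y, strip_width, height))
        offset += strip_width
    return rectangles

# Hypothetical categories with a count for each
categories = [("fiction", 50), ("science", 30), ("history", 20)]
layout = slice_layout(categories, x=0, y=0, width=100, height=40)
for rect in layout:
    print(rect)
```

To show a second level, the same function could be called again with each returned rectangle as the new bounding box, which produces the "rectangles containing other rectangles" described above.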
Visualizing complex data and relations
• This technique is used to visualize non-numeric data, for example text, pictures, blog entries, and product reviews.
Expert systems

• Rely on domain experts for decision making, using their knowledge and intuition
o Time consuming, costly, error prone, biased
• So the solution is to use data mining tools, which:
o perform data analysis
o find data patterns
Architecture of Typical Data Mining Systems
Architecture of a Typical Data Mining System – Major Components:
Knowledge Base:
• Domain knowledge is used to guide the search and to evaluate the interestingness of patterns.
• Includes concept hierarchies, user beliefs, thresholds, and metadata.
Database / Data Warehouse Server:
• Responsible for fetching the relevant data based on the data mining request.
Data Mining Engine:
• Consists of modules for characterization, association, correlation analysis, classification, cluster analysis, prediction, outlier analysis, and evolution analysis.
Pattern Evaluation Module:
• Interacts with the data mining modules and focuses the search towards interesting patterns.
• May be integrated with the mining module to confine the search.
User Interface:
• Communicates between users and the data mining system.
• Lets the user specify a data mining query to focus the search.
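How these components hand work to one another can be sketched with a few small classes. Everything here is a simplified assumption: the class and method names are hypothetical, the knowledge base holds only one threshold, and the "engine" merely counts co-occurring items.

```python
from collections import Counter

class KnowledgeBase:
    """Holds domain knowledge; here only an interestingness threshold."""
    def __init__(self, min_count):
        self.min_count = min_count

class DatabaseServer:
    """Fetches the data relevant to a mining request."""
    def __init__(self, transactions):
        self.transactions = transactions

    def fetch(self, required_item):
        return [t for t in self.transactions if required_item in t]

class MiningEngine:
    """Stand-in for the mining modules: counts co-occurring items."""
    def mine(self, transactions, required_item):
        return Counter(i for t in transactions for i in t if i != required_item)

class PatternEvaluator:
    """Keeps only the patterns the knowledge base considers interesting."""
    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base

    def filter(self, counts):
        threshold = self.knowledge_base.min_count
        return {item: n for item, n in counts.items() if n >= threshold}

def run_query(server, engine, evaluator, required_item):
    """User-interface entry point: one mining query, end to end."""
    data = server.fetch(required_item)
    return evaluator.filter(engine.mine(data, required_item))

# Hypothetical transactions held by the database server
server = DatabaseServer([{"beer", "crisps"}, {"beer", "crisps", "bread"}, {"milk"}])
evaluator = PatternEvaluator(KnowledgeBase(min_count=2))
result = run_query(server, MiningEngine(), evaluator, "beer")
print(result)
```

The query flows from the user interface to the server, through the engine, and back through pattern evaluation, with the knowledge base supplying the threshold that decides what counts as interesting.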
Major Issues in Data Mining:
• Mining methodology issues:
o Mining different kinds of knowledge in databases.
o Incorporation of background knowledge.
o Handling noisy or incomplete data.
o Pattern evaluation: the interestingness problem.
• User interaction issues:
o Interactive mining of knowledge at multiple levels of abstraction.
o Data mining query languages and ad-hoc data mining.
o Presentation and visualization of data mining results.
• Performance issues:
o Efficiency and scalability of data mining algorithms.
o Parallel, distributed, and incremental mining algorithms.
• Issues related to the diversity of data types:
o Handling of relational and complex types of data.
o Mining information from heterogeneous databases and global information systems.
Review Questions

1. What motivated data mining? Why is it important?
2. What is data mining?
3. Explain the steps in the knowledge discovery process.
4. Describe the architecture of data mining systems with a suitable diagram.
5. Explain the various data mining functionalities.
6. Discuss the major issues in data mining.
