Data Preprocessing Personal

Data warehousing is a centralized storage system for historical data used for analysis and reporting, while data mining involves discovering patterns and insights from large datasets using statistical and AI techniques. Both processes are essential for effective data management, with data mining relying on the structured data provided by data warehouses. The Knowledge Discovery in Databases (KDD) process outlines the steps for extracting valuable information from data, including data selection, preprocessing, transformation, mining, evaluation, and presentation.

Uploaded by

raisahab2199

Data Warehouse vs. Data Mining: Explanation and Differences

Both data warehousing and data mining are essential components of data management
and analysis, but they serve different purposes.

1. Data Warehouse: Explanation

A data warehouse is a large, centralized storage system that collects and organizes data
from multiple sources for analysis and reporting. It is designed for querying, reporting, and
decision-making.
Key Features:
✅ Stores historical data for business intelligence (BI)
✅ Integrates data from multiple sources (databases, files, etc.)
✅ Optimized for fast querying and reporting
✅ Uses ETL (Extract, Transform, Load) processes to prepare data
Example:
A retail company stores sales data from different stores in a data warehouse. Analysts use
this data to generate reports on customer trends, sales performance, and inventory
management.
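
The retail example can be sketched as a tiny ETL pipeline. This is only an illustration under stated assumptions: the store records, the `sales` table, and the helper names `transform` and `load` are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Extract: hypothetical raw sales rows from two stores, all fields as text.
store_a = [("2024-01-05", "bread", "3", "2.50"), ("2024-01-05", "milk", "1", "1.20")]
store_b = [("2024-01-06", "bread", "2", "2.50"), ("2024-01-06", "eggs", "", "3.00")]

def transform(rows, store):
    """Convert text fields to numbers and drop rows with a missing quantity."""
    cleaned = []
    for sale_date, product, quantity, unit_price in rows:
        if not quantity:
            continue  # incomplete record, skipped in this sketch
        cleaned.append((store, sale_date, product, int(quantity), float(unit_price)))
    return cleaned

def load(conn, rows):
    """Append cleaned rows to the warehouse table."""
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, sale_date TEXT, product TEXT, "
    "quantity INTEGER, unit_price REAL)"
)
load(conn, transform(store_a, "A"))
load(conn, transform(store_b, "B"))

# Reporting query: units sold and revenue per product across all stores.
report = conn.execute(
    "SELECT product, SUM(quantity), SUM(quantity * unit_price) "
    "FROM sales GROUP BY product ORDER BY product"
).fetchall()

for product, units, revenue in report:
    print(f"{product}: {units} units, revenue {revenue:.2f}")
```

The point of the sketch is the separation of concerns: data is cleaned once on the way in, so every later report queries consistent, integrated records.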

2. Data Mining: Explanation

Data mining is the process of discovering patterns, trends, and useful information from
large datasets using statistical, AI, and machine learning techniques.
Key Features:
✅ Extracts hidden patterns and relationships in data
✅ Uses algorithms like clustering, classification, and association rules
✅ Helps in predictive analytics and decision-making
✅ Works on structured and unstructured data
Example:
A bank uses data mining to analyze customer transactions and detect fraudulent activities
by identifying unusual spending patterns.
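
One simple way to picture "unusual spending patterns" is a z-score check: flag any amount that lies far from the customer's average. The sketch below assumes a hypothetical spending history and a threshold of 2 standard deviations; real fraud systems use far richer models.

```python
from statistics import mean, stdev

def flag_unusual(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical card history: eight ordinary purchases and one very large one.
history = [42.0, 38.5, 51.0, 45.2, 40.0, 47.3, 39.9, 44.1, 980.0]
suspicious = flag_unusual(history)
print(suspicious)  # [980.0]
```

Note that a single extreme value also inflates the standard deviation, so on small samples a median-based measure may be a more robust choice.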

3. Differences Between Data Warehouse and Data Mining

Feature | Data Warehouse | Data Mining
Definition | Centralized repository for storing and managing data | Process of analyzing data to find hidden patterns
Purpose | Storage and retrieval of historical data for analysis | Extracting insights and making predictions
Process | ETL (Extract, Transform, Load) | Algorithms, statistical models, and AI
Technology | SQL, OLAP (Online Analytical Processing), Data Lakes | Machine learning, AI, clustering, classification
Users | Business analysts, data engineers | Data scientists, analysts
Output | Reports, dashboards, structured data | Patterns, trends, predictions

Summary:
• A data warehouse is a storage system for structured historical data, mainly used for
reporting.
• Data mining is the process of analyzing data (often stored in a data warehouse) to
uncover insights and patterns.
The two work together: data mining does not strictly require a data warehouse, but a
warehouse gives it clean, integrated, structured data to analyze. 🚀

What Kinds of Data Can Be Mined?

Data mining can be applied to database data, transactional data, data warehouse
data, and other kinds of data. Here’s a detailed explanation of each:

1. Database Data
A database is a structured collection of data stored in tables. Most databases
follow a relational model (RDBMS) where data is organized in rows and
columns with relationships between tables.
Data Mining on Database Data:
• Classification – Identifying customer segments based on demographics.
• Clustering – Grouping products with similar sales patterns.
• Association Rules – Finding relationships between purchased products
(e.g., people who buy bread also buy butter).
📌 Example:
A bank uses data mining on its customer database to identify high-risk loan
applicants based on their credit history.
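
As a rough illustration of classification on database data, the sketch below labels a loan applicant by the nearest class centroid. Everything here is an assumption made for the example: the two features (missed payments, debt-to-income ratio), the training rows, and the choice of a nearest-centroid classifier rather than whatever model a bank would actually use.

```python
from math import dist

# Hypothetical labeled history: (missed_payments, debt_to_income_ratio).
training = {
    "low_risk": [(0, 0.20), (1, 0.25), (0, 0.30)],
    "high_risk": [(4, 0.60), (5, 0.55), (3, 0.70)],
}

def centroid(points):
    """Average each feature over all points of one class."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

centroids = {label: centroid(points) for label, points in training.items()}

def classify(applicant):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(applicant, centroids[label]))

print(classify((4, 0.50)))  # high_risk
print(classify((0, 0.22)))  # low_risk
```

The features are left unscaled to keep the sketch short; in practice they would be normalized so that no single feature dominates the distance.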

2. Transactional Data
Transactional data refers to records of business transactions, typically stored
in databases or data warehouses. These transactions capture events like
purchases, deposits, withdrawals, and stock trades.
Data Mining on Transactional Data:
• Market Basket Analysis – Identifying items frequently bought together
in retail (e.g., diapers and baby wipes).
• Fraud Detection – Identifying suspicious financial transactions based on
unusual spending patterns.
• Time-Series Analysis – Predicting sales trends over time.
📌 Example:
An e-commerce platform mines transactional data to recommend products to
users based on their purchase history.
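
A minimal sketch of the e-commerce example: count how often pairs of items appear in the same order, then recommend the items most often bought alongside a given product. The orders and the `recommend` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each order is the set of items bought together.
orders = [
    {"diapers", "baby wipes", "milk"},
    {"diapers", "baby wipes"},
    {"milk", "bread"},
    {"diapers", "milk"},
]

# Count every unordered pair of items that shares an order.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def recommend(item, top_n=2):
    """Return up to top_n items most frequently co-purchased with `item`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]

print(recommend("bread"))  # ['milk']
```

Sorting each order before pairing keeps `("baby wipes", "diapers")` and `("diapers", "baby wipes")` from being counted as different pairs.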

3. Data Warehouse Data
A data warehouse is a large, centralized repository where data from multiple
sources is integrated, processed, and stored for analytical purposes. It
contains historical data and is optimized for reporting and data analysis
rather than transactional operations.
Data Mining on Data Warehouses:
• Trend Analysis – Identifying long-term sales growth patterns.
• Customer Segmentation – Finding high-value customers based on past
purchase history.
• Forecasting – Predicting future demand for products based on past
data.
📌 Example:
A telecom company mines its data warehouse to identify churn patterns and
prevent customers from leaving by offering personalized promotions.
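
Forecasting from historical warehouse data can be illustrated with the simplest possible model, a straight line fitted by least squares. The monthly demand figures are made up, and a linear trend is an assumption of this sketch, not a claim about how demand behaves in general.

```python
def fit_line(values):
    """Least-squares fit of y = intercept + slope * x, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical units sold in six consecutive months.
monthly_demand = [100, 110, 120, 130, 140, 150]
slope, intercept = fit_line(monthly_demand)

# Extrapolate one step ahead (month index 6).
forecast = intercept + slope * len(monthly_demand)
print(f"forecast for next month: {forecast:.1f}")  # 160.0
```

The function needs at least two data points; with fewer, the denominator is zero.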

4. Other Kinds of Data

Besides databases, transactions, and data warehouses, data mining can also
be applied to semi-structured and unstructured data, such as:
A. Web Data (Clickstream, Social Media Data)
• Used for web mining (analyzing user behavior on websites).
• Example: Tracking customer journeys to improve website experience.
B. Text Data (Emails, Customer Reviews, Chat Logs)
• Used for sentiment analysis (understanding customer opinions).
• Example: Analyzing social media posts to gauge public opinion on a
brand.
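
Sentiment analysis on text data can be sketched with a word-list approach: count positive and negative words and compare. The two word lists below are tiny, hypothetical stand-ins; production systems rely on much larger lexicons or trained models.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "terrible", "refund"}

def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Love the fast delivery, great service!"))  # positive
print(sentiment("Terrible app, slow and broken."))          # negative
print(sentiment("The parcel arrived on Tuesday."))          # neutral
```

This approach ignores negation and sarcasm ("not great" still counts as positive), which is one reason it is only a starting point.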

Knowledge Discovery in Databases (KDD) Process in Data Mining

The KDD process (Knowledge Discovery in Databases) is a systematic approach
to extracting useful patterns and knowledge from large datasets. It consists of
multiple steps, starting from raw data and ending with meaningful insights.
📌 Data mining is one step within KDD; the steps around it prepare the raw data
and turn the discovered patterns into usable knowledge.

🔹 Steps in the KDD Process

1. Data Selection (Choosing Relevant Data)
📌 What Happens?
• Identify and collect the relevant data from databases, data warehouses,
or other sources.
• Remove unnecessary data to focus on specific objectives (e.g., customer
behavior, fraud detection).
📌 Example:
A bank collects customer transaction data to identify fraudulent activities.

2. Data Preprocessing (Cleaning & Standardization)

📌 What Happens?
• Handle missing values, duplicate records, and inconsistent data.
• Normalize and standardize data (e.g., converting different date formats
into a uniform format).
• Reduce noise and handle outliers.
📌 Example:
If some customer records have missing age data, they can be filled with the
average age of similar customers.
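
The preprocessing tasks above can be sketched in a few lines: drop duplicate records, convert dates to one format, and fill a missing age with the average age of customers in the same segment. The records, the `segment` field used to define "similar customers", and the two accepted date formats are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical customer records with a duplicate, a missing age, and mixed dates.
records = [
    {"id": 1, "segment": "student", "age": 21, "joined": "2023-01-15"},
    {"id": 2, "segment": "student", "age": None, "joined": "15/02/2023"},
    {"id": 3, "segment": "retired", "age": 68, "joined": "2023-03-01"},
    {"id": 3, "segment": "retired", "age": 68, "joined": "2023-03-01"},
    {"id": 4, "segment": "student", "age": 23, "joined": "2023-04-10"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def normalize_date(text):
    """Try each known format and return the date as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def preprocess(rows):
    # 1. Remove duplicates (same id) and standardize the date format.
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row, joined=normalize_date(row["joined"])))
    # 2. Collect known ages per segment.
    known = {}
    for row in unique:
        if row["age"] is not None:
            known.setdefault(row["segment"], []).append(row["age"])
    # 3. Fill missing ages with the segment average
    #    (assumes each segment has at least one known age).
    for row in unique:
        if row["age"] is None:
            row["age"] = mean(known[row["segment"]])
    return unique

clean = preprocess(records)
for row in clean:
    print(row)
```

Mean imputation is the method the example in the text describes; other choices, such as the median or a model-based estimate, may suit skewed data better.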

3. Data Transformation (Feature Engineering & Reduction)

📌 What Happens?
• Convert raw data into a suitable format for analysis.
• Reduce data complexity (dimensionality reduction).
• Select key features that are most relevant for the mining process.
📌 Example:
In fraud detection, instead of using raw transaction logs, we extract features
like transaction frequency, average amount spent, and time of transaction.
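
The fraud-detection example can be sketched as a function that turns a raw transaction log into per-customer features. The log rows, the feature names, and the definition of "night" as before 06:00 are assumptions chosen for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw log: (customer, ISO timestamp, amount).
transactions = [
    ("cust_1", "2024-03-01T09:15:00", 20.0),
    ("cust_1", "2024-03-01T13:40:00", 35.0),
    ("cust_1", "2024-03-02T02:05:00", 500.0),
    ("cust_2", "2024-03-01T18:30:00", 60.0),
]

def extract_features(rows):
    """Summarize each customer's transactions as a small feature dictionary."""
    grouped = defaultdict(list)
    for customer, timestamp, amount in rows:
        grouped[customer].append((datetime.fromisoformat(timestamp), amount))

    features = {}
    for customer, items in grouped.items():
        amounts = [amount for _, amount in items]
        night = sum(1 for ts, _ in items if ts.hour < 6)  # assumed "odd hours"
        features[customer] = {
            "transaction_count": len(items),
            "average_amount": sum(amounts) / len(amounts),
            "night_share": night / len(items),
        }
    return features

features = extract_features(transactions)
for customer, values in features.items():
    print(customer, values)
```

Each customer is now described by three numbers instead of a variable-length log, which is the form most mining algorithms expect.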

4. Data Mining (Applying Algorithms to Extract Patterns)

📌 What Happens?
• Apply data mining techniques such as classification, clustering,
association rule mining, and anomaly detection.
• Use algorithms to find patterns, trends, and relationships in data.
📌 Example:
A supermarket applies association rule mining to find that "People who buy
bread also buy butter."
🔍 Common Data Mining Techniques:
✔️ Classification: Spam vs. Non-Spam emails
✔️ Clustering: Customer segmentation
✔️ Association Rule Mining: Market basket analysis
✔️ Anomaly Detection: Fraudulent transaction detection
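
The "bread and butter" rule can be made concrete with the three standard measures of an association rule: support, confidence, and lift. The baskets below are hypothetical, and the sketch evaluates one given rule rather than searching for rules the way a full algorithm such as Apriori would.

```python
# Hypothetical supermarket baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets with the antecedent, the fraction that also has the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"bread"}, {"butter"})
print(f"support:    {support(rule[0] | rule[1]):.2f}")  # 0.60
print(f"confidence: {confidence(*rule):.2f}")           # 0.75
print(f"lift:       {lift(*rule):.2f}")                 # 1.25
```

A lift above 1 suggests that buying bread makes butter more likely than it is on average; a lift near 1 would mean the two purchases are roughly independent.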

5. Pattern Evaluation & Interpretation

📌 What Happens?
• Evaluate and validate the discovered patterns.
• Check whether the patterns are useful and make business sense.
• Interpret the results to derive meaningful conclusions.
📌 Example:
A credit card company finds that fraudulent transactions usually happen at odd
hours and in unusual locations. They use this insight to improve fraud
detection systems.
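
One way to check whether such a pattern is actually useful is to test it as a rule against transactions whose outcome is already known, and measure precision and recall. The labeled rows and the exact rule (before 06:00 and away from the usual location) are assumptions for this sketch.

```python
# Hypothetical labeled history: (hour_of_day, usual_location, is_fraud).
labeled = [
    (3, False, True),
    (2, False, True),
    (14, True, False),
    (4, True, False),
    (5, False, False),
    (1, False, True),
    (11, False, True),
    (16, True, False),
]

def rule(hour, usual_location):
    """Flag transactions at odd hours made away from the usual location."""
    return hour < 6 and not usual_location

def evaluate(rows):
    """Return (precision, recall) of the rule on labeled rows."""
    tp = fp = fn = 0
    for hour, usual_location, is_fraud in rows:
        predicted = rule(hour, usual_location)
        if predicted and is_fraud:
            tp += 1  # correctly flagged
        elif predicted and not is_fraud:
            fp += 1  # false alarm
        elif not predicted and is_fraud:
            fn += 1  # missed fraud
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

precision, recall = evaluate(labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```

Precision answers "how many flagged transactions were really fraud", recall answers "how much of the fraud was caught"; a pattern that scores poorly on either is a candidate for revision before it reaches a production system.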

6. Knowledge Presentation (Visualization & Reporting)
📌 What Happens?
• Present the discovered knowledge in a readable format (graphs, charts,
reports).
• Decision-makers use these insights for strategic planning.
📌 Example:
A retail company visualizes customer purchasing patterns using heatmaps and
bar charts to optimize product placement in stores.

📌 Summary Table: KDD Steps

Step | What Happens? | Example
1. Data Selection | Choosing relevant data | Selecting transaction records for fraud detection
2. Data Preprocessing | Cleaning and preparing data | Handling missing values, removing duplicates
3. Data Transformation | Converting data into a usable format | Creating new features like "average transaction amount"
4. Data Mining | Applying algorithms to find patterns | Finding association rules between purchased products
5. Pattern Evaluation | Checking if patterns are useful | Validating fraud detection rules in a banking system
6. Knowledge Presentation | Presenting results via visualization | Showing customer buying trends in a dashboard

🔹 Conclusion
✅ The KDD process is a structured approach to discovering knowledge from
data.
✅ Data Mining is a key step within KDD, but the entire process involves data
selection, cleaning, transformation, pattern discovery, and interpretation.
✅ Businesses use KDD to make data-driven decisions in areas like marketing,
fraud detection, and customer segmentation. 🚀