Data Mining and Data Warehouse Study Material - Edited

The document provides an overview of data mining and data warehousing, explaining the processes involved in analyzing large datasets to identify patterns and support business decisions. It details various architectures of data warehouses, data cleaning and integration techniques, and the functionalities of data mining, including descriptive and predictive mining. Additionally, it covers concepts such as multidimensional models, decision trees, data transformation, and hierarchy generation in the context of data management.

Uploaded by

a.m.i.t.k.u.m.a.r.6.2.7.5.9.7.7

DATA MINING AND DATA WAREHOUSE STUDY MATERIAL

DATA MINING AND DATA WAREHOUSE

DATA MINING
Data mining is the process of searching and analyzing a large batch of raw
data in order to identify patterns and extract useful information.

Companies use data mining software to learn more about their customers.
It can help them to develop more effective marketing strategies, increase
sales, and decrease costs. Data mining relies on effective data
collection, warehousing, and computer processing.

OPERATIONAL DATABASE
An operational database management system is software that is
designed to allow users to easily define, modify, retrieve, and manage
data in real-time. While conventional databases rely on batch
processing, operational database systems are oriented toward real-time,
transactional operations.

DATA WAREHOUSE ARCHITECTURE

SAMEEKSHA PANDEY
B.TECH+M.TECH(CSE,PSIT KANPUR)
ASSISTANT PROFESSOR SANSKRITI UNIVERSITY,SOEIT(CSE) Page 1

There are three approaches to structuring a data warehouse:
single-tier, two-tier, and three-tier.

Single-tier architecture: A single-layer structure that aims to keep the
amount of stored data minimal. This structure is rarely used in practice.

Two-tier architecture: This structure separates the physically available
data sources from the data warehouse itself. Data is extracted from the
sources, transformed, and loaded into the warehouse in a format suited to
analysis. A data warehouse can be implemented in a number of different
ways, and it is important to pick the one that fits your business needs.
The most important thing to consider is scalability: a two-tier
warehouse is hard to expand and supports only a limited number of end
users.

Three-tier architecture: This structure consists of a bottom tier, a
middle tier, and a top tier.

1. The bottom tier is the data warehouse database server, typically a
relational database system. Back-end tools clean, transform, and load
data into this layer.
2. The middle tier is an OLAP server, implemented using either a ROLAP
(relational OLAP) or a MOLAP (multidimensional OLAP) model. It sits
between the end user and the warehouse database and presents an
abstracted view of the data to the user.
3. The top tier is the front-end client layer. It is the first point of
interaction with the data: this is where data is presented to the end
user and where decisions are made with it. This layer must be able to
process data quickly, and it must work with data in a format that its
tools can understand and use. Typically this is a relational database
format, but it could also be a file or a stream. The data must be
well-structured and validated, so that data profiling and analytics
are easier.
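
The three tiers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the bottom-tier warehouse, and the table name sales, the sample rows, and the helper total_by are hypothetical names invented for this example.

```python
import sqlite3

# Bottom tier: a relational database (in-memory SQLite stands in for the warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, item TEXT, amount REAL)")

# Back-end step: clean and transform hypothetical raw rows, then load them.
raw_rows = [(" north ", "pen", "10.5"), ("SOUTH", "pen", "4"), ("north", "ink", None)]
clean_rows = [
    (region.strip().lower(), item, float(amount))
    for region, item, amount in raw_rows
    if amount is not None  # drop incomplete records
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)


# Middle tier: hides the SQL behind a simple "aggregate by dimension" call.
def total_by(dimension):
    if dimension not in {"region", "item"}:  # only allow known column names
        raise ValueError(f"unknown dimension: {dimension}")
    query = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension}"
    return dict(conn.execute(query).fetchall())


# Top tier: the front end only presents the results to the user.
totals = total_by("region")
for region, total in sorted(totals.items()):
    print(f"{region}: {total:.2f}")
```

The front end never touches SQL directly; it only calls the middle-tier function, which is the abstraction the middle tier is meant to provide.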

CLEANING AND INTEGRATION

Data cleaning is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset.
When combining multiple data sources, there are many opportunities for
data to be duplicated or mislabeled. If data is incorrect, outcomes and
algorithms are unreliable, even though they may look correct. There is
no one absolute way to prescribe the exact steps in the data cleaning
process because the processes will vary from dataset to dataset. But it
is crucial to establish a template for your data cleaning process so you
know you are doing it the right way every time.
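
A cleaning template of this kind might look like the following sketch. The record fields (name, email) and the rules applied are assumptions chosen for illustration; a real template depends on the dataset.

```python
def clean(records):
    """Return cleaned copies of the records: trimmed, normalized, deduplicated."""
    seen_emails = set()
    cleaned = []
    for rec in records:
        # Fix incorrectly formatted values (stray spaces, inconsistent case).
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        # Remove incomplete or obviously incorrect records.
        if not name or "@" not in email:
            continue
        # Remove duplicates, using the normalized email as the key.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": "  alice smith ", "email": "Alice@Example.com"},
    {"name": "Alice Smith", "email": "alice@example.com "},  # duplicate
    {"name": "Bob", "email": None},                           # incomplete
    {"name": "", "email": "x@example.com"},                   # missing name
]
result = clean(raw)
print(result)
```

Keeping the steps in one function makes the same checks run in the same order every time.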

Data integration is the process of combining data from multiple
sources into a cohesive and consistent view. This process involves
identifying and accessing the different data sources, mapping the data
to a common format, and reconciling any inconsistencies or
discrepancies between the sources. The goal of data integration is to
make it easier to access and analyze data that is spread across
multiple systems or platforms, in order to gain a more complete and
accurate understanding of the data.
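
The mapping and reconciling steps can be sketched as follows. The two sources (a CRM export and a billing export), their field names, and the "first source wins" conflict rule are all assumptions made for this example.

```python
# Two hypothetical sources that describe the same customers differently.
crm_rows = [{"CustomerID": "7", "FullName": "Alice Smith", "City": "Pune"}]
billing_rows = [
    {"cust_id": 7, "name": "alice smith", "balance_paise": 125000},
    {"cust_id": 9, "name": "Bob Rao", "balance_paise": 5000},
]


# Step 1: map each source to a common format (same keys, types, and units).
def from_crm(row):
    return {"id": int(row["CustomerID"]), "name": row["FullName"].title(),
            "city": row["City"]}


def from_billing(row):
    return {"id": int(row["cust_id"]), "name": row["name"].title(),
            "balance": row["balance_paise"] / 100}


# Step 2: merge on the shared key and reconcile conflicts.
def integrate(*mapped_sources):
    unified = {}
    for source in mapped_sources:
        for rec in source:
            merged = unified.setdefault(rec["id"], {})
            for key, value in rec.items():
                merged.setdefault(key, value)  # the first source wins on conflicts
    return unified


view = integrate(map(from_crm, crm_rows), map(from_billing, billing_rows))
print(view)
```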

FUNCTIONALITIES OF DATA MINING
Data mining is often confused with data analysis. Data analysis is used
to test statistical models that fit the dataset, for example, in the
analysis of a marketing campaign. Data mining uses machine learning and
mathematical and statistical models to discover patterns hidden in the
data. Data mining functionalities specify the kinds of trends or
correlations to be found in data mining activities, which can be divided
into two categories:

o Descriptive Data Mining: It summarizes what is happening within the
data without any prior hypothesis. The common features of the data set
are highlighted, for example, through counts and averages.

o Predictive Data Mining: It predicts the values of attributes for
unlabeled data. Using previously available or historical data, data
mining can make predictions about critical business metrics based on
trends in that data. Examples include predicting the volume of business
next quarter from performance in the previous quarters over several
years, or judging from the findings of a patient's medical examinations
whether the patient is suffering from a particular disease.
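
The two categories can be contrasted on a small example. The quarterly figures below are hypothetical, and the straight-line least-squares fit is only one simple way to predict; it is used here as an assumption, not as the method the notes prescribe.

```python
from statistics import mean

# Hypothetical sales volume for four consecutive quarters.
quarterly_sales = [100.0, 110.0, 120.0, 130.0]

# Descriptive mining: summarize what is already in the data.
summary = {"count": len(quarterly_sales), "average": mean(quarterly_sales)}


# Predictive mining: fit a least-squares line to the history and extrapolate.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(quarterly_sales)
next_quarter = slope * len(quarterly_sales) + intercept

print(summary)
print(next_quarter)
```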

MULTIDIMENSIONAL MODEL

A multidimensional model views data in the form of a data cube. A data
cube enables data to be modeled and viewed in multiple dimensions. It
is defined by dimensions and facts.

The dimensions are the perspectives or entities concerning which an
organization keeps records. For example, a shop may create a sales
data warehouse to keep records of the store's sales for the dimensions
time, item, and location. These dimensions allow the store to keep track
of things, for example, monthly sales of items and the locations at which
the items were sold. Each dimension has a table related to it, called a
dimension table, which describes the dimension further. For example,
a dimension table for an item may contain the attributes item_name,
brand, and type.

A multidimensional data model is organized around a central theme, for
example, sales. This theme is represented by a fact table. Facts are
numerical measures. The fact table contains the names of the facts, or
measures, as well as keys to each of the related dimension tables.
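
A fact table with its dimension tables can be sketched with plain Python structures. The keys, the sample rows, and the helper roll_up are hypothetical; only the attributes item_name, brand, and type and the dimensions time, item, and location come from the text above.

```python
from collections import defaultdict

# Dimension tables: each key is described further by its attributes.
item_dim = {
    1: {"item_name": "pen", "brand": "Acme", "type": "stationery"},
    2: {"item_name": "ink", "brand": "Acme", "type": "stationery"},
}
location_dim = {10: {"city": "Delhi"}, 20: {"city": "Mumbai"}}

# Fact table rows: (time_key, item_key, location_key, units_sold).
# The first three columns are keys to dimensions; the last is the measure.
sales_fact = [
    ("2024-01", 1, 10, 5),
    ("2024-01", 2, 10, 3),
    ("2024-02", 1, 20, 7),
    ("2024-02", 1, 10, 2),
]


def roll_up(facts, *dims):
    """Sum units_sold over the chosen dimensions ('time', 'item', 'location')."""
    position = {"time": 0, "item": 1, "location": 2}
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[position[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


# View the same facts along different dimensions of the cube.
monthly = roll_up(sales_fact, "time")
by_item_city = {
    (item_dim[item]["item_name"], location_dim[loc]["city"]): units
    for (item, loc), units in roll_up(sales_fact, "item", "location").items()
}
print(monthly)
print(by_item_city)
```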

DATA WAREHOUSE
A data warehouse is a repository of data from an organization's
operational systems and other sources that supports analytics
applications to help drive business decision-making. Data warehousing
is a key part of an overall data management strategy: The data stored in
data warehouses is processed and organized for analysis by business
analysts, executives, data scientists and other users.


Typically, a data warehouse is a relational database or columnar
database housed on a computer system in an on-premises data center
or, increasingly, the cloud. Data from online transaction processing
(OLTP) applications and additional internal or external sources is
extracted and consolidated in the data warehouse for business
intelligence (BI) uses that include ad hoc querying, decision support and
enterprise reporting. Users access the data through BI software and
other types of analytics tools.

GRAPHICAL USER INTERFACE

A graphical user interface (GUI) is a digital interface in which a user
interacts with graphical components such as icons, buttons, and menus.
In a GUI, the visuals displayed in the user interface convey information
relevant to the user, as well as actions that they can take.

QUERY LANGUAGE
The Data Mining Query Language (DMQL) was proposed by Han, Fu,
Wang, et al. for the DBMiner data mining system. It is based on the
Structured Query Language (SQL) and is designed to support ad hoc and
interactive data mining. DMQL provides commands for specifying data
mining primitives, can be used to define data mining tasks, and works
with both databases and data warehouses.

DECISION TREE
A decision tree is a non-parametric supervised learning algorithm,
which is utilized for both classification and regression tasks. It has a
hierarchical, tree structure, which consists of a root node, branches,
internal nodes and leaf nodes.
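
The structure can be illustrated with a small hand-built tree. The features (outlook, humidity, windy) and the labels are hypothetical, and the tree is written by hand rather than learned from data, so this sketch shows only how a finished tree classifies a sample.

```python
# Root node tests "outlook"; each branch leads to an internal node or a leaf.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {"feature": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",  # leaf node
        "rainy": {"feature": "windy",
                  "branches": {True: "no", False: "yes"}},
    },
}


def classify(node, sample):
    """Follow branches from the root until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node


print(classify(tree, {"outlook": "sunny", "humidity": "high"}))
print(classify(tree, {"outlook": "rainy", "windy": False}))
```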

TRANSFORMATION AND REDUCTION

Data transformation in data mining refers to the process of converting
raw data into a format that is suitable for analysis and modeling. The
goal of data transformation is to prepare the data for data mining so
that it can be used to extract useful insights and knowledge.
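
One common transformation is min-max normalization, which rescales a numeric attribute to a fixed range. The notes do not name a specific technique, so this choice and the sample values are assumptions made for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    if lo == hi:  # all values equal: avoid dividing by zero
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


incomes = [20000, 35000, 50000]  # hypothetical raw attribute values
scaled = min_max_scale(incomes)
print(scaled)
```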

Data reduction is a capacity optimization technique in which data is
reduced to its simplest possible form to free up capacity on a storage
device. There are many ways to reduce data, but the idea is very
simple: fit as much data into physical storage as possible to
maximize capacity.
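
Lossless compression is one way to reduce data in this sense. The sketch below uses Python's standard zlib module on a hypothetical, highly repetitive byte string; real savings depend on how redundant the data is.

```python
import zlib

# Hypothetical log-like data with a lot of repetition.
original = b"2024-01-01,store-1,pen,5\n" * 1000

compressed = zlib.compress(original, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
```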

CONCEPT HIERARCHY GENERATION

A concept hierarchy represents a series of mappings from a set of
low-level concepts to higher-level, more general concepts. A concept
hierarchy organizes information or concepts in a hierarchical structure
or a specific partial order. It is used to express knowledge in concise,
high-level terms and makes it possible to mine knowledge at several
levels of abstraction.

A concept hierarchy consists of a set of nodes organized in a tree,
where the nodes represent values of an attribute, known as concepts. A
special node, "ANY", is reserved for the root of the tree. Each node in
a concept hierarchy is assigned a level number. The level of the root
node is one. The level of a non-root node is one more than the level
of its parent.
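
The level rule can be sketched directly. The location values in the tree below are hypothetical; only the root node "ANY" and the numbering rule come from the text.

```python
# Each node maps to its parent; the root "ANY" has no parent.
parent = {
    "ANY": None,
    "India": "ANY",
    "Canada": "ANY",
    "Delhi": "India",
    "Mumbai": "India",
    "Toronto": "Canada",
}


def level(node):
    """The root is level 1; any other node is one more than its parent's level."""
    depth = 1
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth


def generalize(node, target_level):
    """Climb the hierarchy until the node sits at target_level or above it."""
    while level(node) > target_level:
        node = parent[node]
    return node


print(level("Delhi"))
print(generalize("Delhi", 2))
```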

Because values are defined by nodes, the levels of nodes can also be
used to describe the levels of values. Concept hierarchy enables raw
information to be managed at a higher and more generalized level of
abstraction. There are several types of concept hierarchies which are as
follows −

Schema Hierarchy − A schema hierarchy represents a total or partial
order between attributes in the database. It can express existing
semantic relationships between attributes. In a database, more than one
schema hierarchy can be generated by using different orderings and
groupings of attributes.
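
A total order between attributes can be written as an ordered list. The attributes street, city, state, and country are a common textbook illustration and are an assumption here, as is the helper generalize_record.

```python
# A schema hierarchy as a total order, from the most specific attribute
# to the most general one.
schema_hierarchy = ["street", "city", "state", "country"]


def generalize_record(record, to_attribute):
    """Keep only the attributes at or above to_attribute in the hierarchy."""
    start = schema_hierarchy.index(to_attribute)
    return {attr: record[attr] for attr in schema_hierarchy[start:]}


address = {"street": "12 Mall Road", "city": "Kanpur",
           "state": "Uttar Pradesh", "country": "India"}
print(generalize_record(address, "state"))
```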


Set-Grouping Hierarchy − A set-grouping hierarchy organizes the values
of a given attribute or dimension into groups of constants or range
values. It is also known as an instance hierarchy, because the partial
order of the hierarchy is defined on the set of instances or values of
an attribute. These hierarchies carry more functional meaning and are
therefore often preferred over other hierarchies.
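
Grouping the values of an attribute into named ranges can be sketched as follows. The attribute (age), the group names, and the boundaries are hypothetical.

```python
# Each higher-level concept is defined as a set (here a range) of values.
age_groups = {
    "young": range(0, 30),
    "middle_aged": range(30, 60),
    "senior": range(60, 130),
}


def group_of(age):
    """Map a raw age to the group whose range contains it."""
    for label, ages in age_groups.items():
        if age in ages:
            return label
    return "unknown"


print([group_of(a) for a in (25, 30, 75)])
```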

Operation-Derived Hierarchy − An operation-derived hierarchy is
defined by a set of operations on the data. These operations are
specified by users, experts, or the data mining system. Such
hierarchies are usually defined for numerical attributes. The
operations can be as simple as a range value comparison or as complex
as a data clustering and data distribution analysis algorithm.
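
Equal-width binning is one simple operation from which such a hierarchy can be derived: the bins are computed from the data rather than listed by hand. The choice of this operation and the sample prices are assumptions for illustration.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width, derived from the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k

    def bin_index(v):
        if width == 0:  # all values equal: everything falls in bin 0
            return 0
        # The maximum value would land in bin k, so clamp it into the last bin.
        return min(int((v - lo) / width), k - 1)

    return [bin_index(v) for v in values]


prices = [10, 20, 35, 50, 70, 100]  # hypothetical numerical attribute
bins = equal_width_bins(prices, 3)
print(bins)
```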

Rule-based Hierarchy − In a rule-based hierarchy, either the whole
concept hierarchy or a portion of it is defined by a set of rules
and is computed dynamically based on the current data and rule
definitions. A lattice-like structure is used to represent this type of
hierarchy graphically, in which each child-parent path is associated
with a generalization rule.
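
A rule set that is evaluated on the current data can be sketched like this. The concepts, the price and cost attributes, and the thresholds are hypothetical values chosen for the example.

```python
# Each rule pairs a higher-level concept with a condition on the raw data.
rules = [
    ("low_profit_margin", lambda item: item["price"] - item["cost"] < 50),
    ("medium_profit_margin", lambda item: 50 <= item["price"] - item["cost"] <= 250),
    ("high_profit_margin", lambda item: item["price"] - item["cost"] > 250),
]


def generalize(item):
    """Return the first concept whose rule holds for the item."""
    for concept, applies in rules:
        if applies(item):
            return concept
    return "ANY"


print(generalize({"price": 120, "cost": 100}))
print(generalize({"price": 900, "cost": 400}))
```

Because the concept is computed when the rule is evaluated, changing a price or a threshold changes the generalization without rebuilding a stored hierarchy.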

Concept hierarchies can be generated statically or dynamically,
depending on the data set. Generating a concept hierarchy from a static
data set is known as static generation, and generating it from a dynamic
data set is known as dynamic generation.
