DATA MINING AND DATA WAREHOUSE STUDY MATERIAL

DATA MINING AND DATA WAREHOUSE


DATA MINING
Data mining is the process of searching and analyzing a large batch of raw
data in order to identify patterns and extract useful information.

Companies use data mining software to learn more about their customers.
It can help them to develop more effective marketing strategies, increase
sales, and decrease costs. Data mining relies on effective data
collection, warehousing, and computer processing.

OPERATIONAL DATABASE
An operational database management system is software designed to let
users easily define, modify, retrieve, and manage data in real time.
Whereas conventional databases rely on batch processing, operational
database systems are oriented toward real-time, transactional operations.

DATA WAREHOUSE ARCHITECTURE

SAMEEKSHA PANDEY
B.TECH+M.TECH(CSE,PSIT KANPUR)
ASSISTANT PROFESSOR SANSKRITI UNIVERSITY,SOEIT(CSE) Page 1
There are three approaches to structuring a data warehouse: single-tier,
two-tier, and three-tier architecture.

Single-tier architecture: A single-layer structure aimed at keeping the
amount of stored data minimal, for example by removing redundancy. This
structure is rarely used in practice.

Two-tier architecture: The data sources are kept physically separate
from the data warehouse, and data is extracted, transformed, and loaded
into the warehouse in a format suited to analysis. Data warehouses can
be implemented in a number of different ways, and it is important to
pick the one that fits the business's needs. The most important thing to
consider is scalability: a two-tier design connects clients directly to
the warehouse and can struggle as data volumes and the number of end
users grow.

Three-tier architecture: This is the most widely used data warehouse
architecture. It is organized into a bottom, a middle, and a top tier:

1. The bottom tier is the data warehouse server, which is typically a
relational database system. Back-end tools extract data from the
sources, then clean, transform, and load it into this layer.
2. The middle tier is an OLAP server, based on either relational OLAP
(ROLAP) or multidimensional OLAP (MOLAP). It abstracts the underlying
database from the end user, presenting the data as a multidimensional
view that users can query without interacting with the database directly.
3. The top tier is the front-end client layer. It is important because it
is the end user's first point of interaction with the data: results are
presented here, through query, reporting, analysis, and data mining
tools, and decisions are made based on them. This layer must be able to
process data quickly and must work with data in a format that it can
understand and use. Typically, the data reaching the top tier is in
relational (tabular) form, but it could also be a file or a stream.
Either way, it must be well structured and validated, and organized in a
way that makes data profiling and analytics easier.
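To make the division of labor between the three tiers concrete, here is a minimal Python sketch. It assumes an in-memory SQLite database as the relational bottom tier and a ROLAP-style middle tier that turns a "total by these dimensions" request into SQL. The table `sales` and the functions `build_bottom_tier`, `olap_total`, and `report` are hypothetical names chosen for illustration, not part of any OLAP product.

```python
import sqlite3

# Bottom tier (assumption: in-memory SQLite stands in for the relational
# warehouse server). A back-end "load" step writes already-cleaned rows.
def build_bottom_tier(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (year INTEGER, item TEXT, location TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn

# Middle tier (ROLAP-style sketch): translates a multidimensional request
# into SQL, so the end user never writes SQL against the database.
ALLOWED_DIMENSIONS = {"year", "item", "location"}

def olap_total(conn, dimensions):
    # Only known dimension names may be spliced into the query text.
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError(
            f"dimensions must be a non-empty subset of {sorted(ALLOWED_DIMENSIONS)}"
        )
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()

# Top tier: a front-end client that only presents results to the user.
def report(results):
    for row in results:
        *keys, total = row
        print(" / ".join(str(k) for k in keys), "->", total)

conn = build_bottom_tier([
    (2023, "laptop", "Kanpur", 1200.0),
    (2023, "phone", "Kanpur", 300.0),
    (2024, "laptop", "Delhi", 1500.0),
    (2024, "laptop", "Kanpur", 1100.0),
])
report(olap_total(conn, ["year"]))
```

Keeping SQL generation in the middle tier means the front end only ever asks for dimensions, which is the abstraction the middle tier is meant to provide; the whitelist check stops arbitrary text from being injected as a column name.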

CLEANING AND INTEGRATION


Data cleaning is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset.
When combining multiple data sources, there are many opportunities for
data to be duplicated or mislabeled. If data is incorrect, outcomes and
algorithms are unreliable, even though they may look correct. There is
no one absolute way to prescribe the exact steps in the data cleaning
process because the processes will vary from dataset to dataset. But it
is crucial to establish a template for your data cleaning process so you
know you are doing it the right way every time.
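Because the exact steps vary from dataset to dataset, the following is only one possible cleaning template, sketched in Python under the assumption that records arrive as dictionaries of strings. The field names, the sample records, and the `clean` function are hypothetical.

```python
# Hypothetical raw records combined from two sources: a duplicate,
# badly formatted values, an incorrect value, and an incomplete row.
raw = [
    {"id": "1", "name": " Asha ", "city": "kanpur", "age": "29"},
    {"id": "1", "name": "Asha", "city": "Kanpur", "age": "29"},
    {"id": "2", "name": "Ravi", "city": "DELHI", "age": "thirty"},
    {"id": "3", "name": "", "city": "Lucknow", "age": "41"},
    {"id": "4", "name": "Meena", "city": "Agra", "age": "35"},
]

REQUIRED = ("id", "name", "city", "age")

def clean(records):
    cleaned, seen_ids = [], set()
    for rec in records:
        # 1. Fix formatting: trim whitespace and normalize capitalization.
        rec = {k: v.strip() for k, v in rec.items()}
        rec["city"] = rec["city"].title()
        # 2. Remove incomplete records (any required field missing or empty).
        if any(not rec.get(field) for field in REQUIRED):
            continue
        # 3. Remove incorrect values: age must be a whole number.
        if not rec["age"].isdigit():
            continue
        rec["age"] = int(rec["age"])
        # 4. Remove duplicates, keeping the first valid record for each id.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append(rec)
    return cleaned

print(clean(raw))
```

Writing the steps as one reusable function is what makes it a template: the same checks run in the same order every time new data arrives.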

Data integration is the process of combining data from multiple
sources into a cohesive and consistent view. This process involves
identifying and accessing the different data sources, mapping the data
to a common format, and reconciling any inconsistencies or
discrepancies between the sources. The goal of data integration is to
make it easier to access and analyze data that is spread across
multiple systems or platforms, in order to gain a more complete and
accurate understanding of the data.
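The three integration steps named above (accessing the sources, mapping them to a common format, and reconciling discrepancies) can be sketched in Python as follows. Both sources, their field names, and the reconciliation rule are assumptions for illustration: each system is assumed to record different purchases, so spend is summed, while the first name seen is kept.

```python
# Source A (hypothetical CRM export): "cust_id", amounts in rupees.
source_a = [
    {"cust_id": 101, "full_name": "Asha Verma", "spend_inr": 5000},
    {"cust_id": 102, "full_name": "Ravi Kumar", "spend_inr": 1200},
]
# Source B (hypothetical billing system): string ids, amounts in paise.
source_b = [
    {"customer": "101", "name": "ASHA VERMA", "spend_paise": 150000},
    {"customer": "103", "name": "Meena Shah", "spend_paise": 80000},
]

def from_a(rec):
    # Map source A to the common format.
    return {"customer_id": rec["cust_id"], "name": rec["full_name"],
            "spend_inr": rec["spend_inr"]}

def from_b(rec):
    # Map source B to the common format: integer ids, title-case names, rupees.
    return {"customer_id": int(rec["customer"]), "name": rec["name"].title(),
            "spend_inr": rec["spend_paise"] / 100}

def integrate(*mapped_sources):
    unified = {}
    for records in mapped_sources:
        for rec in records:
            key = rec["customer_id"]
            if key not in unified:
                unified[key] = dict(rec)
            else:
                # Reconcile (assumed rule): keep the first name, sum the spend.
                unified[key]["spend_inr"] += rec["spend_inr"]
    return [unified[k] for k in sorted(unified)]

customers = integrate(map(from_a, source_a), map(from_b, source_b))
print(customers)
```

Mapping each source separately before merging keeps source-specific quirks (units, casing, id types) out of the reconciliation logic.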
FUNCTIONALITIES OF DATA MINING
Data mining and data analysis are often confused. Data mining functions
are used to identify the trends or correlations contained in the data.
Whereas data analysis is used to test statistical models that fit a
dataset (for example, analyzing a marketing campaign), data mining uses
machine learning and mathematical and statistical models to discover
patterns hidden in the data. Data mining activities can be divided into
two categories:

o Descriptive Data Mining: It aims to understand what is happening
within the data without any prior idea of what to look for, by
highlighting the common features of the dataset, for example, counts
and averages.


o Predictive Data Mining: It helps developers predict the values of
attributes that are not yet known (unlabeled) from the attributes that
are. Using previously available or historical data, data mining can make
predictions about critical business metrics, for example, by
extrapolating a trend. Examples include predicting the volume of
business next quarter from performance in previous quarters over several
years, or judging from the findings of a patient's medical examinations
whether the patient is suffering from a particular disease.
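The difference between the two categories can be illustrated with a small Python sketch over hypothetical quarterly business volumes. The descriptive part only summarizes the data (count and average); the predictive part fits a least-squares straight line to past quarters and extrapolates it one quarter ahead. A straight-line trend is an assumption made for simplicity, and `linear_forecast` is a hypothetical helper; real forecasting usually needs richer models.

```python
from statistics import mean

# Hypothetical business volume for the last eight quarters.
volumes = [100, 108, 115, 124, 130, 139, 146, 155]

# Descriptive mining: summarize what is already in the data.
summary = {"count": len(volumes), "average": mean(volumes)}

# Predictive mining: fit y = a + b*x by least squares (needs at least two
# points) and extrapolate the line steps_ahead quarters into the future.
def linear_forecast(ys, steps_ahead=1):
    xs = list(range(len(ys)))  # quarter index 0, 1, 2, ...
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a + b * (len(ys) - 1 + steps_ahead)

next_quarter = linear_forecast(volumes)
print(summary, round(next_quarter, 1))
```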

MULTIDIMENSIONAL MODEL

A multidimensional model views data in the form of a data cube. A data
cube enables data to be modeled and viewed in multiple dimensions. It
is defined by dimensions and facts.

The dimensions are the perspectives or entities with respect to which an
organization keeps records. For example, a shop may create a sales
data warehouse to keep records of the store's sales with respect to the
dimensions time, item, and location. These dimensions allow the store to
keep track of things such as the monthly sales of items and the
locations at which the items were sold. Each dimension has a table
related to it, called a dimension table, which describes the dimension
further. For example, a dimension table for item may contain the
attributes item_name, brand, and type.

A multidimensional data model is organized around a central theme, for
example, sales. This theme is represented by a fact table. Facts are
numerical measures. The fact table contains the names of the facts, or
measures, as well as keys to each of the related dimension tables.
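A minimal Python sketch of the sales example, assuming one item dimension table and a fact table whose rows hold the time, an item key, the location, and the numeric measure dollars_sold. All rows are hypothetical, and `cube_view` is an illustrative helper that computes one cuboid of the data cube by summing the measure over the chosen dimensions.

```python
from collections import defaultdict

# Dimension table for "item" (hypothetical rows), keyed by item key.
item_dim = {
    1: {"item_name": "L-100", "brand": "Brand A", "type": "laptop"},
    2: {"item_name": "P-20", "brand": "Brand B", "type": "phone"},
}

# Fact table: (time, item key, location, dollars_sold). Time and location
# are stored as plain values here to keep the sketch short.
sales_fact = [
    ("2024-Q1", 1, "Kanpur", 900.0),
    ("2024-Q1", 2, "Kanpur", 300.0),
    ("2024-Q1", 1, "Delhi", 850.0),
    ("2024-Q2", 2, "Delhi", 320.0),
]

def cube_view(facts, by):
    """Sum dollars_sold over the dimensions named in `by` (one cuboid)."""
    totals = defaultdict(float)
    for time, item_key, location, dollars in facts:
        # Look up the item's type in its dimension table via the key.
        coords = {"time": time, "location": location,
                  "item_type": item_dim[item_key]["type"]}
        totals[tuple(coords[d] for d in by)] += dollars
    return dict(totals)

print(cube_view(sales_fact, ["time"]))
print(cube_view(sales_fact, ["item_type", "location"]))
```

Here `cube_view(sales_fact, ["time"])` gives quarterly totals, the second call gives a two-dimensional view, and an empty list of dimensions yields the grand total (the apex cuboid).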

DATA WAREHOUSE
A data warehouse is a repository of data from an organization's
operational systems and other sources that supports analytics
applications to help drive business decision-making. Data warehousing
is a key part of an overall data management strategy: The data stored in
data warehouses is processed and organized for analysis by business
analysts, executives, data scientists and other users.


Typically, a data warehouse is a relational database or columnar
database housed on a computer system in an on-premises data center
or, increasingly, in the cloud. Data from online transaction processing
(OLTP) applications and additional internal or external sources is
extracted and consolidated in the data warehouse for business
intelligence (BI) uses that include ad hoc querying, decision support and
enterprise reporting. Users access the data through BI software and
other types of analytics tools.

GRAPHICAL USER INTERFACE


A graphical user interface (GUI) is a digital interface in which a user
interacts with graphical components such as icons, buttons, and menus.
In a GUI, the visuals displayed in the user interface convey information
relevant to the user, as well as actions that they can take.

QUERY LANGUAGE
The Data Mining Query Language (DMQL) was proposed by Han, Fu,
Wang, et al. for the DBMiner data mining system. DMQL is based on the
Structured Query Language (SQL). Data mining query languages can be
designed to support ad hoc and interactive data mining. DMQL provides
commands for specifying data mining primitives, works with both
databases and data warehouses, and can be used to define data mining
tasks.

DECISION TREE
A decision tree is a non-parametric supervised learning algorithm,
which is utilized for both classification and regression tasks. It has a
hierarchical, tree structure, which consists of a root node, branches,
internal nodes and leaf nodes.
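The structure described above (a root node, internal nodes that test attributes, branches for attribute values, and leaf nodes holding predictions) can be sketched in Python as a small data structure with a classification routine. The tree below is hand-built and hypothetical; a real decision tree algorithm would learn the splits from labeled training data, for example using information gain or the Gini index.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    label: str  # class predicted at this leaf node

@dataclass
class Node:
    attribute: str  # attribute tested at this root or internal node
    branches: Dict[str, Union["Node", Leaf]]  # one branch per attribute value

def classify(tree, record):
    # Follow the branch matching the record's value until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label

# Hypothetical tree for "will this customer buy a computer?"
tree = Node("age", {
    "youth": Node("student", {"yes": Leaf("buys"), "no": Leaf("does not buy")}),
    "middle_aged": Leaf("buys"),
    "senior": Node("credit_rating", {"fair": Leaf("buys"),
                                     "excellent": Leaf("does not buy")}),
})

print(classify(tree, {"age": "youth", "student": "yes"}))
```

Each path from the root to a leaf reads as an if-then rule, which is why decision trees are considered easy to interpret.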

TRANSFORMATION AND REDUCTION


Data transformation in data mining refers to the process of converting
raw data into a format that is suitable for analysis and modeling. The
goal of data transformation is to prepare the data for data mining so
that it can be used to extract useful insights and knowledge.
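One common transformation is min-max normalization, which linearly rescales a numeric attribute into a target range such as [0, 1] so that attributes measured on different scales become comparable. A minimal Python sketch, assuming the attribute values are given as a list of numbers (the income figures and the `min_max_normalize` name are hypothetical):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Map the smallest value to new_min, the largest to new_max, linearly."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

incomes = [12000, 73600, 98000]
print(min_max_normalize(incomes))
```

With these values, an income of 73,600 maps to roughly 0.716, the minimum to 0.0, and the maximum to 1.0.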

Data reduction is a capacity optimization technique in which data is
reduced to its simplest possible form to free up capacity on a storage
device. There are many ways to reduce data, but the idea is very
simple: squeeze as much data into physical storage as possible to
maximize capacity.
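Deduplication is one concrete way to squeeze more data into the same storage: identical blocks are stored once and referenced by a fingerprint. The Python sketch below assumes fixed-size blocks and uses SHA-256 from the standard library's hashlib as the fingerprint. The 4-byte block size is only for illustration (real systems use far larger blocks), the overhead of the recipe list is ignored, and `deduplicate`/`restore` are hypothetical names.

```python
import hashlib

def deduplicate(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and keep each distinct block once."""
    store = {}   # fingerprint -> block contents (each unique block stored once)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe

def restore(store, recipe):
    # Reassemble the original bytes by following the recipe in order.
    return b"".join(store[f] for f in recipe)

data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data)
stored_bytes = sum(len(b) for b in store.values())
print(len(data), "bytes reduced to", stored_bytes)
```

The reduction is lossless: `restore` rebuilds the exact original bytes, while only two distinct blocks (8 bytes) are actually stored instead of 16.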

CONCEPT HIERARCHY GENERATION

A concept hierarchy represents a series of mappings from a set of
low-level concepts to higher-level, more general concepts. It organizes
information or concepts in a hierarchical structure or a specific
partial order, which is useful for expressing knowledge concisely at a
high level and for mining knowledge at several levels of abstraction.

A concept hierarchy consists of a set of nodes organized in a tree,
where the nodes define values of an attribute, known as concepts. A
special node, “ANY”, is reserved for the root of the tree. Each node in
a concept hierarchy is assigned a level number. The level of the root
node is one, and the level of a non-root node is one more than the level
number of its parent.
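The level rule can be sketched in Python for a hypothetical location hierarchy (city, state, country, ANY), stored as a mapping from each concept to its parent. The hypothetical `level` function applies the rule that the root ANY is level 1 and every other node is one more than its parent; `generalize` (also hypothetical) climbs toward the root, which is how raw values are viewed at a more general level of abstraction.

```python
# Hypothetical location hierarchy: child concept -> parent concept.
parent = {
    "Kanpur": "Uttar Pradesh",
    "Lucknow": "Uttar Pradesh",
    "Pune": "Maharashtra",
    "Uttar Pradesh": "India",
    "Maharashtra": "India",
    "India": "ANY",
}

def level(concept):
    # The root "ANY" is level 1; any other node is one more than its parent.
    return 1 if concept == "ANY" else 1 + level(parent[concept])

def generalize(concept, target_level):
    # Climb toward the root until the concept is at (or above) target_level.
    while level(concept) > target_level:
        concept = parent[concept]
    return concept

print(level("Kanpur"), generalize("Kanpur", 2))
```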

Because values are defined by nodes, the levels of nodes can also be
used to describe the levels of values. A concept hierarchy enables raw
data to be handled at a higher, more generalized level of abstraction.
There are several types of concept hierarchies, as follows −

Schema Hierarchy − A schema hierarchy represents the total or partial
order between attributes in the database. It can express existing
semantic relationships between attributes, for example,
street < city < state < country. In a database, more than one schema
hierarchy can be generated by using different sequences and groupings
of attributes.


Set-Grouping Hierarchy − A set-grouping hierarchy organizes the values
of a given attribute or dimension into groups or ranges of values. It is
also known as an instance hierarchy because the partial order of the
hierarchy is defined on the set of instances or values of an attribute.
These hierarchies are more practical and are therefore more widely used
than other types of hierarchies.

Operation-Derived Hierarchy − An operation-derived hierarchy is
defined by a set of operations on the data. These operations are
specified by users, experts, or the data mining system, and such
hierarchies are usually defined for numerical attributes. The operations
can be as simple as a range-value comparison or as complex as a data
clustering or data distribution analysis algorithm.

Rule-based Hierarchy − In a rule-based hierarchy, either a whole
concept hierarchy or a portion of it is defined by a set of rules and is
computed dynamically based on the current data and rule definitions. A
lattice-like structure is used to depict this type of hierarchy
graphically, in which each child-parent path is associated with a
generalization rule.
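Two of these hierarchy types can be sketched in Python. The set-grouping part assigns each price to a user-defined range; the operation-derived part computes its groups from the data itself, using equal-width binning as the operation (a simple stand-in for the range-comparison end of the spectrum described above). The range boundaries, labels, and function names are hypothetical.

```python
# Set-grouping hierarchy: hypothetical price ranges chosen by a user.
PRICE_GROUPS = [
    (0, 100, "cheap"),
    (100, 500, "moderately priced"),
    (500, float("inf"), "expensive"),
]

def price_group(price):
    # Each range is half-open: low <= price < high.
    for low, high, label in PRICE_GROUPS:
        if low <= price < high:
            return label
    raise ValueError(f"no group for price {price}")

# Operation-derived hierarchy: groups computed from the data by an
# operation, here equal-width binning into k bins.
def equal_width_bins(values, k):
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    def bin_of(v):
        # The maximum value goes into the last bin instead of a new one;
        # if all values are equal (width 0), everything is in bin 0.
        return min(int((v - lo) / width), k - 1) if width else 0
    return [bin_of(v) for v in values]

print(price_group(250), equal_width_bins([0, 10, 20, 30, 40], 4))
```

The set-grouping boundaries stay fixed however the data changes, whereas the binning boundaries move with the minimum and maximum of the data, which is the practical difference between the two types.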

Concept hierarchies can be generated statically or dynamically,
depending on the data set. Generating a concept hierarchy from a static
data set is known as static generation of a concept hierarchy, and
generating it from a dynamic data set is known as dynamic generation.
