Data Mining Interface, Security,

Backup and Recovery


Data Warehousing - Security

• The objective of a data warehouse is to make large amounts of data easily accessible to users, allowing them to extract information about the business as a whole.
• However, security restrictions applied to the data can be an obstacle to accessing that information.
• If the analyst has a restricted view of data, then it is
impossible to capture a complete picture of the trends within
the business.
Security Requirements
• We should consider the following questions during the design phase.
– Will the new data sources require new security and/or audit restrictions to be implemented?
– Will the newly added users have restricted access to data that is already generally available?
• The following activities get affected by security measures −
– User access
– Data load
– Data movement
– Query generation
User Access
• We need to first classify the data and then classify the users
on the basis of the data they can access.
• Data Classification
• The following two approaches can be used to classify the data
– Data can be classified according to its sensitivity.
– Data can also be classified according to the job function.
• User classification
– The following approaches can be used to classify the users
– Users can be classified as per the hierarchy of users in an organization,
i.e., users can be classified by departments, sections, groups, and so
on.
– Users can also be classified according to their role, with people
grouped across departments based on their role.
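As a rough illustration of combining the two classifications above, here is a minimal Python sketch; the sensitivity levels, role names, and dataset names are assumptions made for the example, not something prescribed by these notes.

```python
# Minimal sketch: data classified by sensitivity, users classified by role.
# All names and levels below are illustrative assumptions.
DATA_SENSITIVITY = {              # dataset -> sensitivity classification
    "sales_summary": "public",
    "customer_pii": "restricted",
}

ROLE_CLEARANCE = {                # role -> sensitivity levels it may read
    "analyst": {"public"},
    "compliance_officer": {"public", "restricted"},
}

def can_access(role: str, dataset: str) -> bool:
    """True if the role's clearance covers the dataset's classification."""
    return DATA_SENSITIVITY.get(dataset) in ROLE_CLEARANCE.get(role, set())

print(can_access("analyst", "customer_pii"))             # False
print(can_access("compliance_officer", "customer_pii"))  # True
```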
Audit Requirements
• Auditing is a subset of security and a costly activity; it can cause heavy overheads on the system.
• Completing an audit in time may require extra hardware; therefore, it is recommended that, wherever possible, auditing should be switched off.
• Audit requirements can be categorized as follows −
– Connections
– Disconnections
– Data access
– Data change
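As a minimal sketch of how the four audit categories above could be recorded, assuming an in-memory log and invented field names (not part of the notes):

```python
# Minimal audit-trail sketch; the storage (a Python list) and record fields
# are illustrative assumptions.
from datetime import datetime, timezone

AUDIT_LOG = []

def audit(event_type, user, detail=""):
    """Record one event: 'connect', 'disconnect', 'data_access', or 'data_change'."""
    AUDIT_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        "user": user,
        "detail": detail,
    })

audit("connect", "analyst1")
audit("data_access", "analyst1", "SELECT on sales_fact")
audit("data_change", "etl_job", "bulk load into sales_fact")
audit("disconnect", "analyst1")
print(AUDIT_LOG)
```

In practice such records would be written to protected storage, which is one reason auditing adds overhead to the system.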
Network Requirements
• Network security is as important as other aspects of security; the network security requirement cannot be ignored.
• We need to consider the following issues −
– Is it necessary to encrypt data before transferring it to the data
warehouse?
– Are there restrictions on which network routes the data can take?
Data Movement
• Suppose we need to transfer some restricted data as a flat file
to be loaded.
• When the data is loaded into the data warehouse, the
following questions are raised −
– Where is the flat file stored?
– Who has access to that disk space?
– Do you backup encrypted or decrypted versions?
– Do these backups need to be made to special tapes that are stored
separately?
– Who has access to these tapes?
– Where is the temporary table to be held?
– How do you make such a table visible?
Documentation

• The audit and security requirements need to be properly documented.
• This documentation will be treated as part of the justification. It can contain all the information gathered from −
– Data classification
– User classification
– Network requirements
– Data movement and storage requirements
– All auditable actions
Impact of Security on Design

• Security affects the application code and the development timescales.
• Security affects the following areas −
– Application development
– Database design
– Testing
Hardware Backup
• It is important to decide which hardware to use for the
backup.
• The speed of processing the backup and restore depends on
the hardware being used, how the hardware is connected,
bandwidth of the network, backup software, and the speed of
server's I/O system.
• The hardware choices are −
– Disk backups
– Tape technology
Tape Technology
• The tape choice can be categorized as follows −
– Tape media
– Standalone tape drives
– Tape stackers
– Tape silos
Disk Backups
• Methods of disk backups are −
– Disk-to-disk backups
– Mirror breaking
Disk-to-Disk Backups
• Here the backup is taken on disk rather than on tape. Disk-to-disk backups are done for the following reasons −
– Speed of initial backups
– Speed of restore
Mirror Breaking
• The idea is to have disks mirrored for resilience during the
working day. When backup is required, one of the mirror sets
can be broken out. This technique is a variant of disk-to-disk
backups.
Software Backups
• There are software tools available that help in the backup process.
These software tools come as a package.
• These tools not only take backups; they can also effectively manage and control the backup strategy.

• The criteria for choosing the best software package are listed below
– How scalable is the product as tape drives are added?
– Does the package have client-server option, or must it run on the
database server itself?
– Will it work in cluster and MPP environments?
– What degree of parallelism is required?
– What platforms are supported by the package?
– Does the package support easy access to information about tape
contents?
– Is the package database aware?
– What tape drive and tape media are supported by the package?
Data Warehousing - OLAP
Introduction
An Online Analytical Processing (OLAP) server is based on the multidimensional data model.
It allows managers and analysts to gain insight into the information through fast, consistent, and interactive access to data.
Types of OLAP Servers
We have four types of OLAP servers −
1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)
4. Specialized SQL Servers
Relational OLAP

• ROLAP servers are placed between the relational back-end server and the client front-end tools.
• To store and manage warehouse data, ROLAP uses
relational or extended-relational DBMS.
• ROLAP includes the following −
– Implementation of aggregation navigation logic.
– Optimization for each DBMS back end.
– Additional tools and services.
Multidimensional OLAP

• MOLAP uses array-based multidimensional storage engines for multidimensional views of data.
• With multidimensional data stores, the storage utilization may be low if the data set is sparse.
• Therefore, many MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
Hybrid OLAP

• Hybrid OLAP is a combination of both ROLAP and MOLAP.
• It offers the higher scalability of ROLAP and the faster computation of MOLAP.
• HOLAP servers allow storing large volumes of detailed information.
• The aggregations are stored separately in a MOLAP store.


Specialized SQL Servers

• Specialized SQL servers provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.
OLAP Operations

• OLAP servers are based on a multidimensional view of data.
• Here is the list of OLAP operations −
1. Roll-up
2. Drill-down
3. Slice and dice
4. Pivot (rotate)
Roll-up
• Roll-up performs aggregation on a data cube in any of the
following ways −
– By climbing up a concept hierarchy for a dimension
– By dimension reduction
• Roll-up is performed by climbing up a concept hierarchy for the
dimension location.
• Initially the concept hierarchy was "street < city < province <
country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
• The data is then grouped by country rather than by city.
• When roll-up is performed, one or more dimensions from the
data cube are removed.
Drill-down
• Drill-down is the reverse operation of roll-up.
• It is performed in either of the following ways −
– By stepping down a concept hierarchy for a dimension
– By introducing a new dimension
• For example, drill-down is performed by stepping down a concept hierarchy for the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year."
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• When drill-down is performed, one or more dimensions are added to the data cube.
• It navigates from less detailed data to more detailed data.
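As a rough illustration of roll-up and drill-down, here is a minimal pandas sketch (pandas is assumed to be available; the sales figures, column names, and city-to-country hierarchy are invented for the example):

```python
# Minimal roll-up / drill-down sketch; data and column names are assumptions.
import pandas as pd

city_level = pd.DataFrame({
    "country": ["Canada", "Canada", "USA", "USA"],
    "city":    ["Vancouver", "Toronto", "New York", "Chicago"],
    "quarter": ["Q1", "Q1", "Q1", "Q1"],
    "amount":  [400, 600, 1000, 800],
})

# Roll-up: climb the location hierarchy (city -> country) and aggregate.
country_level = (city_level
                 .groupby(["country", "quarter"], as_index=False)["amount"]
                 .sum())

# Drill-down is the reverse: return from the coarse country-level view
# to the more detailed city-level data.
print(country_level)
print(city_level)
```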
Slice
• The slice operation selects one particular dimension from a
given cube and provides a new sub-cube.
– Here slice is performed for the dimension "time" using the criterion time = "Q1".
– It forms a new sub-cube by selecting a single value along that dimension.
Dice
• Dice selects two or more dimensions from a given cube and
provides a new sub-cube.
Pivot
• The pivot operation is also known as rotation.
• It rotates the data axes in view in order to provide an
alternative presentation of data.
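A similar assumed table can illustrate slice, dice, and pivot; again, the data and column names are invented for the example:

```python
# Minimal slice / dice / pivot sketch; data and column names are assumptions.
import pandas as pd

sales = pd.DataFrame({
    "city":    ["Vancouver", "Toronto", "Vancouver", "Toronto"],
    "item":    ["mobile", "mobile", "modem", "modem"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount":  [400, 600, 300, 200],
})

# Slice: fix one dimension (time = "Q1") to obtain a sub-cube.
slice_q1 = sales[sales["quarter"] == "Q1"]

# Dice: select on two or more dimensions at once.
dice = sales[sales["quarter"].isin(["Q1", "Q2"]) & (sales["city"] == "Toronto")]

# Pivot (rotate): swap the data axes for an alternative presentation.
pivoted = sales.pivot_table(index="item", columns="city",
                            values="amount", aggfunc="sum")
print(slice_q1, dice, pivoted, sep="\n\n")
```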
Data Warehousing – Tuning and
Testing
Data Warehousing - Tuning
• A data warehouse keeps evolving, and it is unpredictable what queries users will pose in the future. Therefore it becomes more difficult to tune a data warehouse system.
• Tuning a data warehouse is a difficult procedure due to
following reasons −
– Data warehouse is dynamic; it never remains constant.
– It is very difficult to predict what queries the users will pose in the future.
– Business requirements change with time.
– Users and their profiles keep changing.
– The user can switch from one group to another.
– The data load on the warehouse also changes with time.
Performance Assessment
• Here is a list of objective measures of performance −
– Average query response time
– Scan rates
– Time used per query
– Memory usage per process
– I/O throughput rates
• Following are the points to remember.
– It is necessary to specify the measures in service level agreement (SLA).
– It is of no use trying to tune response times if they are already better than those required.
– It is essential to have realistic expectations while making performance
assessment.
– It is also essential that the users have feasible expectations.
– To hide the complexity of the system from the user, aggregations and
views should be used.
– It is also possible that the user can write a query you had not tuned for.
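As a minimal sketch of computing one of these measures, average query response time, from a simple query log (the CSV format and file name are assumptions):

```python
# Minimal sketch: average query response time from an assumed CSV log
# with columns query_id,elapsed_seconds.
import csv
from statistics import mean

def average_response_time(log_path: str) -> float:
    with open(log_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return mean(float(row["elapsed_seconds"]) for row in rows)

# Example usage (the file name is hypothetical):
# print(average_response_time("query_log.csv"))
```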
Data Load Tuning
There are various approaches to tuning the data load; they are discussed below −
– The most common approach is to insert data using the SQL layer, with its normal checks and constraints.
– The second approach is to bypass these checks and constraints and place the data directly into preformatted blocks.
– The third approach is to maintain the indexes while loading data into a table that already contains data.
– The fourth approach is to drop the indexes before loading data into tables that already contain data, and recreate them when the data load is complete, as shown in the sketch below.
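A minimal sketch of the fourth approach, using sqlite3 only so the example is self-contained; the table and index names are assumptions, and a production warehouse would use its own DDL:

```python
# Drop the index, bulk load, then recreate the index once at the end.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, city TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_city ON sales (city)")

def bulk_load(conn, rows):
    conn.execute("DROP INDEX idx_sales_city")          # avoid per-row index maintenance
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_city ON sales (city)")  # rebuild once
    conn.commit()

bulk_load(conn, [(1, "Delhi", 120.0), (2, "Mumbai", 90.5), (3, "Delhi", 40.0)])
```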
Integrity Checks
• Integrity checking highly affects the performance of the load.
Following are the points to remember −
• Integrity checks need to be limited because they require
heavy processing power.
• Integrity checks should be applied on the source system to avoid degrading the performance of the data load.
Tuning Queries
• We have two kinds of queries in data warehouse −
– Fixed queries
– Ad hoc queries

• Fixed queries are well defined. Following are examples of fixed queries −
– Regular reports
– Canned queries
– Common aggregations
• Tuning the fixed queries in a data warehouse is the same as in a relational database system.
Ad hoc Queries
• To understand ad hoc queries, it is important to know the ad hoc
users of the data warehouse.
• For each user or group of users, you need to know the following −
– The number of users in the group
– Whether they use ad hoc queries at regular intervals of time
– Whether they use ad hoc queries frequently
– Whether they use ad hoc queries occasionally at unknown intervals.
– The maximum size of query they tend to run
– The average size of query they tend to run
– Whether they require drill-down access to the base data
– The elapsed login time per day
– The peak time of daily usage
– The number of queries they run per peak hour
Points to Note
– It is important to track the user's profiles and identify
the queries that are run on a regular basis.
– It is also important that the tuning performed does not adversely affect the performance of other queries.
– Identify similar and ad hoc queries that are frequently
run.
– If these queries are identified, then the database can be changed and new indexes can be added for those queries.
– If these queries are identified, then new aggregations
can be created specifically for those queries that
would result in their efficient execution.
Data Warehousing - Testing
• Testing is very important for data warehouse
systems to make them work correctly and
efficiently.
• There are three basic levels of testing
performed on a data warehouse −
– Unit testing
– Integration testing
– System testing
Unit Testing
• In unit testing, each component is separately
tested.
• Each module, i.e., each procedure, program, SQL script, or Unix shell script, is tested.
• This test is performed by the developer.
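A minimal unit-test sketch; the transformation function and its expected behaviour are invented purely to show one module being tested in isolation by its developer:

```python
# Unit test for a single (hypothetical) ETL helper function.
import unittest

def normalize_city(raw: str) -> str:
    """Trim whitespace and title-case a city name."""
    return raw.strip().title()

class TestNormalizeCity(unittest.TestCase):
    def test_strips_and_titles(self):
        self.assertEqual(normalize_city("  new york "), "New York")

    def test_clean_value_is_unchanged(self):
        self.assertEqual(normalize_city("Toronto"), "Toronto")

if __name__ == "__main__":
    unittest.main()
```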
Integration Testing
• In integration testing, the various modules of the application are brought together and then tested against a number of inputs.
• It is performed to test whether the various components work well together after integration.
System Testing
– In system testing, the whole data warehouse
application is tested together.
– The purpose of system testing is to check whether
the entire system works correctly together or not.
– System testing is performed by the testing team.
– Since the size of the whole data warehouse is very large, it is usually only possible to perform minimal system testing before the test plan is enacted.
Testing Backup Recovery
• Testing the backup recovery strategy is extremely
important. Here is the list of scenarios for which
this testing is needed −
– Media failure
– Loss or damage of table space or data file
– Loss or damage of redo log file
– Loss or damage of control file
– Instance failure
– Loss or damage of archive file
– Loss or damage of table
– Failure during data movement
Testing Operational Environment
• Security − A separate security document is required for security testing. This document contains a list of disallowed operations, and tests should be devised for each of them.
• Scheduler − Scheduling software is required to control the daily operations of a data warehouse, and it needs to be tested during system testing. The scheduling software requires an interface with the data warehouse, through which the scheduler can control overnight processing and the management of aggregations.
• Disk Configuration − Disk configuration also needs to be tested to identify I/O bottlenecks. The test should be performed multiple times with different settings.
• Management Tools − All the management tools need to be tested during system testing. Here is the list of tools that need to be tested.
– Event manager
– System manager
– Database manager
– Configuration manager
– Backup recovery manager
Testing the Database
• Testing the database manager and monitoring tools − To test the database manager and the monitoring tools, they should be used in the creation, running, and management of a test database.
• Testing database features − Here is the list of features that we
have to test −
– Querying in parallel
– Create index in parallel
– Data load in parallel
• Testing database performance − Query execution plays a very
important role in data warehouse performance measures.
There are sets of fixed queries that need to be run regularly
and they should be tested.
Testing the Application

• All the managers should be integrated correctly and work together in order to ensure that the end-to-end load, index, aggregation, and queries work as per expectations.
– Each function of each manager should work correctly.
– It is also necessary to test the application over a period of time.
– Week-end and month-end tasks should also be tested.
– Overnight processing
– Query performance
Logistics of the Test
• The aim of the system test is to test all of the following areas −

– Scheduling software
– Day-to-day operational procedures
– Backup recovery strategy
– Management and scheduling tools
– Overnight processing
– Query performance
Warehousing applications and
Recent Trends
Trends in Data Mining
• Data mining concepts are still evolving and here are the latest
trends that we get to see in this field −
– Application Exploration.
– Scalable and interactive data mining methods.
– Integration of data mining with database systems, data warehouse
systems and web database systems.
– Standardization of data mining query language.
– Visual data mining.
– New methods for mining complex types of data.
– Biological data mining.
– Data mining and software engineering.
– Web mining.
– Distributed data mining.
– Real time data mining.
– Multi database data mining.
– Privacy protection and information security in data mining.
Web Mining
• The Web is a collection of inter-related files on one or more Web servers.
• Web mining is the application of data mining techniques to
extract knowledge from Web data.
• Web data is :
– Web content – text, image, records, etc.
– Web structure – hyperlinks, tags, etc.
– Web usage – http logs, app server logs, etc.
Web Mining Taxonomy
Pre-processing Web Data
• Web Content:
– Extract “snippets” from a Web document that represent the Web document
• Web Structure:
– Identify interesting graph patterns, or preprocess the whole Web graph to come up with metrics such as PageRank (see the sketch below)
• Web Usage:
– User identification, session creation, robot detection and filtering, and extracting usage path patterns
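A minimal PageRank sketch via power iteration on a tiny invented link graph; the damping factor and iteration count are conventional defaults, not values taken from these notes:

```python
# Power-iteration PageRank on a toy graph; all values are assumptions.
links = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks) if outlinks else 0.0
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

print(pagerank(links))
```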
Web Content Mining
• Web Content Mining is the process of extracting useful
information from the contents of Web documents.
– Content data corresponds to the collection of facts a Web page was
designed to convey to the users.
– It may consist of text, images, audio, video, or structured records such
as lists and tables.
• Research activities in this field also involve using techniques
from other disciplines such as Information Retrieval (IR) and
natural language processing (NLP).
Pre-processing Content
• Preparation:
– Extract text from HTML.
– Perform Stemming.
– Remove Stop Words.
– Calculate collection-wide document frequencies (DF).
– Calculate per Document Term Frequencies (TF).
• Vector Creation:
– Common Information Retrieval Technique.
– Each document (HTML page) is represented by a sparse vector
of term weights.
– TF-IDF weighting is most common (a minimal sketch follows this list).
– Typically, additional weight is given to terms appearing as
keywords or in titles
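A minimal TF-IDF sketch over a toy corpus; the documents are invented, and the weighting uses the common tf * log(N/df) form:

```python
# Build sparse TF-IDF term-weight vectors for a few toy documents.
import math
from collections import Counter

docs = [
    "data warehouse stores integrated data",
    "web mining extracts knowledge from web data",
    "olap servers support multidimensional analysis",
]

tokenized = [d.split() for d in docs]       # stemming/stop-word removal omitted
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc_tokens):
    tf = Counter(doc_tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

for vector in map(tfidf, tokenized):
    print(vector)
```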
Common Mining Techniques
• The more basic and popular data mining techniques include:
– Classification
– Clustering
– Associations
• The other significant ideas:
– Topic identification, tracking and drift analysis
– Concept hierarchy creation
– Relevance of content
Web Structure Mining
• The structure of a typical Web graph consists of Web pages as nodes and hyperlinks as edges connecting two related pages.

• Web Structure Mining is the process of discovering structure information from the Web.
– This type of mining can be performed either at the (intra-page)
document level or at the (inter-page) hyperlink level
– The research at the hyperlink level is also called Hyperlink Analysis
Motivation to study Hyperlink
Structure
• Hyperlinks serve two main purposes.
– Pure navigation.
– Pointing to pages with authority on the same topic as the page containing the link.
• This can be used to retrieve useful information from the Web.
Web Structure Terminology
• Directed Path:
– A sequence of links, starting from p, that can be followed to reach q.
• Shortest Path:
– Of all the paths between nodes p and q, the one with the shortest length, i.e., the fewest links on it.
• Diameter:
– The maximum of the shortest-path lengths between p and q, over all pairs of nodes p and q in the Web graph.
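A minimal sketch of shortest paths and diameter on a tiny invented Web graph, using plain breadth-first search over unweighted links:

```python
# BFS shortest-path length and graph diameter on a toy directed Web graph.
from collections import deque

graph = {
    "p": ["a", "b"],
    "a": ["q"],
    "b": ["a"],
    "q": ["p"],
}

def shortest_path_length(graph, source, target):
    """Distance in number of links, or None if target is unreachable."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

def diameter(graph):
    """Maximum finite shortest-path length over all ordered node pairs."""
    lengths = [shortest_path_length(graph, s, t)
               for s in graph for t in graph if s != t]
    return max(l for l in lengths if l is not None)

print(shortest_path_length(graph, "p", "q"), diameter(graph))
```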
Web Usage Mining
Discovery of meaningful patterns from data generated by
client-server transactions on one or more Web localities

• Typical Sources of Data
– Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
– user profiles
– meta data: page attributes, content attributes, usage data
Issues in Usage Data
• Session identification (see the sessionization sketch below)
• Common Gateway Interface (CGI) data
• Caching
• Dynamic pages
• Robot detection and filtering
• Transaction identification
• Identifying unique users
• Identifying unique user transactions
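A minimal sessionization sketch: log hits are grouped per user into sessions using a 30-minute inactivity timeout; the log records, field order, and timeout value are assumptions made for illustration:

```python
# Split an assumed (user, timestamp, url) click log into per-user sessions.
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

log = [
    ("u1", datetime(2024, 1, 1, 10, 0), "/home"),
    ("u1", datetime(2024, 1, 1, 10, 10), "/products"),
    ("u1", datetime(2024, 1, 1, 11, 30), "/home"),      # gap > 30 min: new session
    ("u2", datetime(2024, 1, 1, 10, 5), "/home"),
]

def sessionize(log):
    sessions = {}        # user -> list of sessions, each a list of urls
    last_seen = {}       # user -> timestamp of the previous hit
    for user, ts, url in sorted(log, key=lambda r: (r[0], r[1])):
        if user not in sessions or ts - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])     # start a new session
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions

print(sessionize(log))
```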
Introduction to Spatial Data
Mining
What is Spatial Data?
• Spatial Data is also known as geospatial data or geographic
information.
• It is the data or information that identifies the geographic
location of features and boundaries on Earth, such as natural
or constructed features, oceans, and more.
• Spatial data is usually stored as coordinates and topology, and
is data that can be mapped.
• A common example of spatial data can be seen in a road
map.
• A road map is a two-dimensional object that contains points,
lines, and polygons that can represent cities, roads, and
political boundaries such as states or provinces.
Non-Spatial Data
• Non-spatial data consists of numbers, characters, or logical values.
• Such data is non-spatial in nature because it does not, directly or indirectly, refer to any location.
• As another example, consider a table containing population figures for specific locations, say cities, districts, or provinces.
What is a Spatial Pattern ?
What is not a pattern?
• Random, haphazard, chance, stray, accidental, unexpected
• Without definite direction, trend, rule, method, design,
aim, purpose
• Accidental - without design, outside regular course of
things
• Casual - absence of pre-arrangement, relatively
unimportant
• Fortuitous - What occurs without known cause
What is a Spatial Pattern ?
What is a Pattern?
• A frequent arrangement, configuration, composition,
regularity

• A rule, law, method, design, description

• A major direction, trend, prediction

• A significant surface irregularity or unevenness


What is Spatial Data Mining?
• Search for spatial patterns
• Non-trivial search - as “automated” as possible, to reduce human effort
• Interesting, useful and unexpected spatial
pattern
What is Spatial Data Mining?
Non-trivial Search
– Large (e.g., exponential) search space of plausible hypotheses
– Ex. Asiatic cholera: possible causes include water, food, air, insects, …; water delivery mechanisms include numerous pumps, rivers, ponds, wells, pipes, ...
Interesting
– Useful in a certain application domain
– Ex. Shutting off the identified water pump saved human lives
Unexpected
– The pattern is not common knowledge
– It may provide a new understanding of the world
– Ex. The water pump - cholera connection led to the “germ” theory
What is NOT Spatial Data Mining?
Simple Querying of Spatial Data
– Find the neighbors of Canada, given the names and boundaries of all countries
– Find the shortest path from Boston to Houston in a freeway map
– The search space is not large (not exponential)
Testing a hypothesis via a primary data analysis
– Ex. Female chimpanzee territories are smaller than male territories; the search space is not large!
– SDM: secondary data analysis to generate multiple plausible hypotheses
What is NOT Spatial Data Mining?
Uninteresting or obvious patterns in spatial data
– Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart
– Common knowledge: nearby places have similar rainfall
Mining of non-spatial data
– Diaper sales and beer sales are correlated in the evenings
– GPS product buyers are of 3 kinds: outdoors enthusiasts, farmers, and technology enthusiasts
Why Learn about Spatial Data Mining?
Two basic reasons for new work
– Consideration of use in certain application domains
– Provide fundamental new understanding
Application domains
• Scale up secondary spatial (statistical) analysis to very large
datasets
– Describe/explain locations of human settlements in last 5000
years
– Find cancer clusters to locate hazardous environments
– Prepare land-use maps from satellite imagery
– Predict habitat suitable for endangered species
Find new spatial patterns
– Find groups of co-located geographic features
Why Learn about Spatial Data Mining?
New understanding of geographic processes for
Critical questions
– Ex. How is the health of planet Earth?
– Ex. Characterize effects of human activity on
environment and ecology
– Ex. Predict effect of El Nino on weather, and economy
Traditional approach: manually generate and
test hypothesis
• But, spatial data is growing too fast to analyze
manually
– Satellite imagery, GPS tracks, sensors on highways, …
Why Learn about Spatial Data Mining?
The number of possible geographic hypotheses is too large to explore manually
– Large number of geographic features and locations
– The number of interacting subsets of features grows exponentially
– Ex. Find teleconnections between weather events across ocean and land areas
SDM may reduce the set of plausible hypotheses
– Identify hypotheses supported by the data
– For further exploration using traditional statistical methods
Spatial Data Mining
Spatial Patterns
– Hot spots, Clustering, trends, …
– Spatial outliers
– Location prediction
– Associations, co-locations
Primary Tasks
– Spatial Data Clustering Analysis
– Spatial Outlier Analysis
– Mining Spatial Association Rules
– Spatial Classification and Prediction
Example:
• Unusual warming of Pacific ocean (El Nino) affects
weather in USA…
Spatial Data Mining
• Spatial data mining follows the same functions as general data mining, with the end objective of finding patterns in geography, meteorology, etc.
• The main difference: spatial autocorrelation
– The neighbors of a spatial object may have an influence on it and therefore have to be considered as well (a minimal sketch follows this list)
• Spatial attributes
– Topological
• adjacency or inclusion information
– Geometric
• position (longitude/latitude), area, perimeter, boundary polygon
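A minimal sketch of a spatial-autocorrelation-style check: each location's attribute value is compared with the average over its neighbors (here, locations within an assumed radius). Coordinates, values, and the radius are invented for the example:

```python
# Compare each location's value with the mean of its spatial neighbors.
from math import dist

points = {            # location -> ((x, y), attribute value)
    "A": ((0.0, 0.0), 300),
    "B": ((1.0, 0.0), 320),
    "C": ((0.0, 1.0), 310),
    "D": ((5.0, 5.0), 150),
}

RADIUS = 2.0          # two locations are neighbors if within this distance

def neighbor_average(name):
    xy, _ = points[name]
    values = [v for other, (oxy, v) in points.items()
              if other != name and dist(xy, oxy) <= RADIUS]
    return sum(values) / len(values) if values else None

for name, (_, value) in points.items():
    print(name, value, "neighbor average:", neighbor_average(name))
```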
Temporal Data Mining
INTRODUCTION
Temporal Data Mining is a rapidly evolving area of
research that is at the intersection of several
disciplines including:
– Statistics (e.g., time series analysis)
– Temporal pattern recognition
– Temporal databases
– Optimisation
– Visualisation
– High-performance computing
– Parallel computing
DEFINITION OF TEMPORAL DATA
MINING
• Temporal Data Mining is a single step in the process
of Knowledge Discovery in Temporal Databases.
• A temporal database stores tuples associated with time attributes.
• It enumerates structures (temporal patterns or
models) over the temporal data.
• Any algorithm that enumerates temporal patterns
from, or fits models to, temporal data is a Temporal
Data Mining Algorithm.
Temporal Data Mining Tasks
Temporal data mining tasks include:
• Temporal data characterization and discrimination
• Temporal clustering analysis
• Temporal classification
• Temporal association rules
• Temporal pattern analysis
• Temporal prediction and trend analysis
Temporal Data Mining Tasks
A new temporal data model (supporting time
granularity and time-hierarchies) may need to
be developed based on:
– Temporal data structures
– Temporal semantics.
Temporal Data Mining Tasks
A new temporal data mining concept may need to be
developed based on the following issues:

– The task of temporal data mining can be seen as a problem of extracting an interesting part of the logical theory of a model
– The theory of a model may be formulated in a logical
formalism able to express q.
TEMPORAL DATA MINING TECHNIQUES

1. Classification in Temporal Data Mining


– The basic goal of temporal classification is to predict
temporally related fields in a temporal database based
on other fields.
– The problem in general is cast as determining the most likely value of the temporal variable being predicted, given the other fields, the training data (in which the target variable is given for each observation), and a set of assumptions representing one's prior knowledge of the problem.
– Temporal classification techniques are also related to
the difficult problem of density estimation.
TEMPORAL DATA MINING TECHNIQUES

2. Temporal Cluster Analysis:


– Temporal clustering according to similarity is a concept
which appears in many disciplines, so there are two basic
approaches to analyze it.
1. The measure-of-temporal-similarity approach (a minimal sketch appears below), and
2. The temporal optimal partition approach
If the number of clusters is given, then clustering
techniques can be divided into three classes:
1. Metric-distance based technique
2. Model-based technique
3. Partition-based technique.
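A minimal sketch of the measure-of-temporal-similarity approach: each short time series is assigned to the nearest of two seed centroids under Euclidean distance; the series and centroids are invented for the example:

```python
# Nearest-centroid clustering of toy time series by Euclidean similarity.
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

series = {
    "s1": [1.0, 1.1, 1.2, 1.1],
    "s2": [0.9, 1.0, 1.1, 1.0],
    "s3": [5.0, 5.2, 5.1, 5.3],
}
centroids = {"low": [1.0, 1.0, 1.0, 1.0], "high": [5.0, 5.0, 5.0, 5.0]}

assignments = {}
for name, values in series.items():
    nearest = min(centroids, key=lambda c: euclidean(values, centroids[c]))
    assignments[name] = nearest

print(assignments)   # expected: {'s1': 'low', 's2': 'low', 's3': 'high'}
```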
