KEMBAR78
CSE, 7thsem, cs703b, DM&W UNIT-2 | PDF | Information Technology | Computer Engineering
0% found this document useful (0 votes)
14 views22 pages

CSE, 7thsem, cs703b, DM&W UNIT-2

Uploaded by

Raj Thakur
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views22 pages

CSE, 7thsem, cs703b, DM&W UNIT-2

Uploaded by

Raj Thakur
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

CS-703B DM&W AIST SAGAR

UNIT- 2 OLAP SYSTEM

(LECTURE:1-3)

OLAP System:

Basic Concepts-
OLAP (Online Analytical Processing) is the technology support the multidimensional view of data for
many Business Intelligence (BI) applications. OLAP provides fast, steady and proficient access,
powerful technology for data discovery, including capabilities to handle complex queries, analytical
calculations, and predictive “what if” scenario planning.
OLAP is a category of software technology that enables analysts, managers and executives to gaininsight
into data through fast, consistent, interactive access in a wide variety of possible views of information
that has been transformed from raw data to reflect the real dimensionality of the enterprise as
understood by the user. OLAP enables end-users to perform ad hoc analysis of data in multiple
dimensions, thereby providing the insight and understanding they need for better decision making.

CHARACTERISTICS OF OLAP SYSTEM


The need for more intensive decision support prompted the introduction of a new generation of tools.
Generally used to analyze the information where huge amount of historical data is stored. Those new
tools, called online analytical processing (OLAP), create an advanced data analysis environment that
supports decision making, business modeling, and operations research.
Its four main characteristics are:
1. Multidimensional data analysis techniques
2. Advanced database support
3. Easy to use end user interfaces
4. Support for client/server architecture.
1.Multidimensional Data Analysis Techniques:
Multidimensional analysis is inherently representative of an actual business model. The most distinctive
characteristic of modern OLAP tools is their capacity for multidimensional analysis (for example actual
vs budget). In multidimensional analysis, data are processed and viewed as part of a multidimensional
structure. This type of data analysis is particularly attractive to business decision makers because they
tend to view business data as data that are related to other business data.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 1
CS-703B DM&W AIST SAGAR

2.Advanced Database Support:


 For efficient decision support, OLAP tools must have advanced data access features.
Access too many different kinds of DBMSs, flat files, and internal and external data
sources.
 Access to aggregated data warehouse data as well as to the detail data found in
operationaldatabases.
 Advanced data navigation features such as drill-down and roll-up.
 Rapid and consistent query response times.

3.Easy-to-Use End-User Interface:


Advanced OLAP features become more useful when access to them is kept simple. OLAP tools have
equipped their sophisticated data extraction and analysis tools with easy-to-use graphical interfaces.
Many of the interface features are “borrowed” from previous generations of data analysis tools that are
already familiar to end users. This familiarity makes OLAP easily acceptedand readily used.
4.Client/Server Architecture:
Conform the system to the principals of Client/server architecture to provide a framework within which
new systems can be designed, developed, and implemented. The client/server environment enables an
OLAP system to be divided into several components that define its architecture. Those components can
then be placed on the same computer, or they can be distributed among several computers. Thus, OLAP
is designed to meet ease-of-use requirements while keeping the system flexible.

MOTIVATION FOR USING OLAP

 Understanding and improving sales: For an enterprise that has many products and uses a number
of channels for selling the products, OLAP can assist in finding the most popular products and
the most popular channels. In some cases it may be possible to find the most profitable
customers.
 Understanding and reducing costs of doing business: Improving sales is one aspect of
improving a business, the other aspect is to analyze costs and to control them as much as
possiblewithout affecting sales. OLAP can assist in analyzing the costs associated with sales.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 2
CS-703B DM&W AIST SAGAR

GUIDELINES FOR OLAP IMPLEMENTATION

Following are a number of guidelines for successful implementation of OLAP. The guidelines are,
somewhat similar to those presented for data warehouse implementation.
 Vision: The OLAP team must, in consultation with the users, develop a clear vision for the
OLAP system. This vision including the business objectives should be clearly defined,
understood,and shared by the stakeholders.
 Senior management support: The OLAP project should be fully supported by the senior
managers and multidimensional view of data. Since a data warehouse may have been developed
already, this should not be difficult.
 Selecting an OLAP tool: The OLAP team should familiarize themselves with the ROLAP and
MOLAP tools available in the market. Since tools are quite different, careful planning may
be required in selecting a tool that is appropriate for the enterprise. In some situations, a
combinationof ROLAP and MOLAP may be most effective.
 Corporate strategy: The OLAP strategy should fit in with the enterprise strategy and business
objectives. A good fit will result in the OLAP tools being used more widely.
 Focus on the users: The OLAP project should be focused on the users. Users should, in
consultation with the technical professional, decide what tasks will be done first and what will
be done later. Attempts should be made to provide each user with a tool suitable for that person’s
skill level and information needs. A good GUI user interface should be provided to non-technical
users. The project can only be successful with the full support of the users.
 Joint management: The OLAP project must be managed by both the IT and business
professionals. Many other people should be involved in supplying ideas. An appropriate
committeestructure may be necessary to channel these ideas.
 Review and adapt: As noted in last chapter, organizations evolve and so must the OLAP systems.
Regular reviews of the project may be required to ensure that the project is meeting the current
needs of the enterprise.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 3
CS-703B DM&W AIST SAGAR

OLTP VS. OLAP

Table 1 : Difference between OLAP and OLTP

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 4
CS-703B DM&W AIST SAGAR

Figure 1: OLAP vs OLTP

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 5
CS-703B DM&W AIST SAGAR

UNIT 2 – (LECTURE:3)

OLAP QUERIES
OLAP queries are complex queries that:
Touch large amounts of data
Discover patterns and trends in the data
Typically expensive queries that take long time

Example-I:

Query Syntax: SELECT …GROUP BY ROLLUP ( grouping_Column_reference_list);


Example-II:

SELECT Time, Location, product, sum(revenue) AS Profit FROM sales GROUP BY ROLLUP (Time,
Location, product);

The Query calculates the standard aggregate values specified in the GROUP BY clause. Then, it creates
progressively higher-level subtotals, moving from right to left through the list of grouping columns.
Finally, it creates a grand total.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 6
CS-703B DM&W AIST SAGAR

UNIT 2 – (LECTURE:4)

OLAP SERVERS
Online Analytical Processing Server (OLAP) is based on the multidimensional data model. It allows
managers, and analysts to get an insight of the information through fast, consistent, and interactive
access to information.

Types of OLAP Servers


We have four types of OLAP servers −
 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers

Relational OLAP

ROLAP servers are placed between relational back-end server and client front-end tools. To store and
manage warehouse data, ROLAP uses relational or extended-relational DBMS.

ROLAP includes the following −


o Implementation of aggregation navigation logic
o Optimization for each DBMS back end
o Additional tools and services
o Can handle large amounts of data
o Performance can be slow
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data.
o Multidimensional data stores
o The storage utilization may be low if the data set is sparse.
o MOLAP server uses two levels of data storage representation to handle dense and
sparsedata sets.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 7
CS-703B DM&W AIST SAGAR

Hybrid OLAP
Hybrid OLAP technologies attempt to combine the advantages of MOLAP and ROLAP. It offers higher
scalability of ROLAP and faster computation of MOLAP. HOLAP servers allow storing the large data
volumes of detailed information. The aggregations are stored separately in MOLAPstore.

Specialized SQL Servers


Specialized SQL servers provide advanced query language and query processing support for SQL
queries over star and snowflake schemas in a read-only environment.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 8
CS-703B DM&W AIST SAGAR

UNIT 2 – (LECTURE:5-6)
OLAP OPERATIONS:

Four types of analytical operations in OLAP are:


1. Roll-up
2. Drill-down
3. Slice and dice
4. Pivot (rotate)
1.Roll-up:
Roll-up is also known as "consolidation" or "aggregation." The Roll-up operation can be performed in 2
ways:
1. Reducing dimensions
2. Climbing up concept hierarchy. Concept hierarchy is a system of grouping things based
ontheir order or level.

Figure2: Roll-up operation

 In this example, cities New jersey and Lost Angles and rolled up into country USA
 The sales figure of New Jersey and Los Angeles are 440 and 1560 respectively.
 Theybecome 2000 after roll-up
 In this aggregation process, data is location hierarchy moves up from city to the country.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 9
CS-703B DM&W AIST SAGAR

 In the roll-up process at least one or more dimensions need to be removed.


 In this example, Quarter dimension is removed.

2.Drill-down:
In drill-down data is fragmented into smaller parts. It is the opposite of the rollup process. It can be
done via
 Moving down the concept hierarchy
 Increasing a dimension

Figure 3: Drill-down operation

Consider the diagram above:


 Quarter Q1 is drilled down to months January, February, and March. Corresponding
salesare also registers.
 In this example, dimension months are added.
3.Slice:
Here, one dimension is selected, and a new sub-cube is created.
Following diagram explain how slice operation performed:

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 10
CS-703B DM&W AIST SAGAR

Figure 4 : Slice operation

4.Dice:
This operation is similar to a slice. The difference in dice is to select 2 or more dimensions that result
in the creation of a sub-cube.

Figure 5: Dice operation

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 11
CS-703B DM&W AIST SAGAR

5.Pivot:
In Pivot, rotate the data axes to provide a substitute presentation of data.
In the following example, the pivot is based on item types.

Figure 6: Pivot operation

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 12
CS-703B DM&W AIST SAGAR

UNIT 2 – (LECTURE:7-8)

HARDWARE AND OPERATIONAL DESIGN SERVER-


Two main hardware architectures
 Symmetric Multi-Processing (SMP)
 Massively Parallel Processing (MPP)
-SMP machine is a set of tightly coupled CPUs that share memory and disk.
-MPP machine is a set of loosely coupled CPUs, each of which has its own memory and disk.

Symmetric Multi-Processing (SMP)


An SMP machine is a set of CPU s that share memory and disk. This is sometimes called a shared-
everything environment the CPUs in an SMP machine are all equal a process can run on any CPU in the
machine, run on different CPUs at different times.

Figure7 : Symmetric Multi-Processing (SMP)

Scalability of SMP machines


Length of the communication bus connecting the CPUs is a natural limit. As the bus gets longer the
inter-process or communication speeds become slower, each extra CPU imposes an extra, band with
load on the bus, increases memory contention, and so on.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 13
CS-703B DM&W AIST SAGAR

Massively Parallel Processing (MPP)


Made up of many loosely coupled nodes linked together by a high-speed connection. Each node has its
own memory, and the disks are not shared. Most MPP systems allow a disk to be dual connected
between two nodes.

Figure 8 : Massively Parallel Processing (MPP)

Shared Nothing Environment


a) Nothing is shared directly. Many different models of MPP
 Nodes with two CPUs
-One CPU is dedicated to handling I/O
-Nodes that are genuine SMP machines
 Single CPU nodes
-Some will be dedicated to handling I/O and others to processing

VIRTUAL SHARED DISK (VSD)


 An extra layer of software to be added to allow disks to be shared across nodes
 System will suffer overheads if an I/O is issued and data has to be shipped from node
to node.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 14
CS-703B DM&W AIST SAGAR

DISTRIBUTED LOCK MANAGER


 MPP machines require this to maintain the integrity of the distributed resources
 Track which node's memory holds the current copy of each piece of information
 Getting the data rapidly from node to node

Impact to design
 User access via WAN – impacts the design of Query Manager
 Source system data transfer
 Data extractions
Example: Problem of getting the data from the source systems
It may not get the data to the warehouse system early enough to allow it to be loaded transformed,
processed and backed up within the overnight time window.
Guideline
 Ensure that the network architecture and bandwidth are capable of supporting the
datatransfer and any data extractions in an acceptable time.
 The transfer of data to be loaded must be complete quickly enough to allow the rest of
theovernight processing to complete.
CLIENT HARDWARE
 Those responsible for client machine management will need to know the requirements
forthat machine to access the data warehouse system.
 Details such as the network protocols supported on the server, and the server's
Internet address, will need to be supplied.
 If multiple access paths to the server system exist this information needs to be relayed
tothose responsible for the client systems.
CLIENT TOOLS
 The tool should not be allowed to affect the basic design of the warehouse itself.
 Multiple tools will be used against the data warehouse.
 Should be thoroughly tested and trialed to ensure that they are suitable for the users.
 Testing of the tools should ideally be performed in parallel with the data warehouse design:
 Usability issues to be exposed,
 Drive out any requirements that the tool will place on the data warehouse

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 15
CS-703B DM&W AIST SAGAR

Disk Technology:
RAID Technology
Redundant Array of Inexpensive Disks
 The purpose of RAID technology is to provide resilience against disk failure, so that
the loss of an individual disk does not mean loss of data.
 Striping is a technique in which data is spread across multiple disks.
 RAID levels 0, 1 and 5 are commercially viable and thus widely available

Table 2 : RAID Level with Descriptions

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 16
CS-703B DM&W AIST SAGAR

Figure 9: RAID Technology

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 17
CS-703B DM&W AIST SAGAR

UNIT 2 – (LECTURE:9-10)

WHAT IS DATA WAREHOUSE SECURITY?


Data warehouses pull data from many different sources, and warehouses have many moving parts.
Security issues arise every time data moves from one place to another. Data warehouse security entails
taking the necessary steps to protect information so that only authorized personnel can access it.
Data warehouse security should involve the following:
 Strict user access control so that users of the warehouse can only access data they need
to do their jobs.
 Taking steps to secure networks on which data is held.
 Carefully moving data and considering the security implications involved in all data
movement.
Physical Security Practices
1. Restricting and controlling physical access to data warehouses has been made easy thanks to
tech innovations like biometric readers, anti-tailgating systems and other physical access
control mechanisms. These might look excessive and expenditure overhead, but they play a
crucial role inensuring the integrity and safety of the precious enterprise data.
2. Imparting information about security protocols and ensuring all the personnel in the
proximity of the data warehouse religiously obey and adhere to these rules is one of the keys to
success. It’s understandable that an employee can be used by intruders to gain access, but if the
employee in question is ardently following the specified guidelines it makes a world of
difference.
Software-Based Security Measures Data
Encryption
Data encryption is one of the primary safeguards against data theft. All data should be encrypted using
algorithms like AES (advanced encryption standard) or FIPS 140-2 certified software for data
encryption, whether it’s in the transactional database or the data warehouse. Some proponents would
argue that data encryption adversely affects the performance and data access speed of data centers, but
considering the alternative, it is preferred.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 18
CS-703B DM&W AIST SAGAR

Data Segmenting and Partitioning


Data encryption although an added security measure can be quite cumbersome if applied without
segmenting and partitioning. Segmenting and partitioning entail classifying or splitting data into
sensitive and non-sensitive information.
After going through partitioning the data should be accordingly encrypted and put into separate tables
ready for consumption.
Securing On-The-Move Data
Securing data in a single place and transit are two different ball games. Here data in transit means the
one which is being relayed from transactional databases in real time to the data warehouse. These
transactional databases can be anywhere geographically, therefore using protective protocols, such as
SSL or TSL is highly recommended. Cloud-based data warehouses nowadays provide a secure and
impenetrable tunnel between the database and the cloud storage which shouldbe leveraged.
Trusted Witness Server
As mentioned earlier, hackers and intruders nowadays have become as skilled and sophisticated as the
security measures they are up against. Implementing a trusted witness server is akin to hiring a watchdog
that avidly and quite tenaciously keeps vigil on your data access points. It can detect an unwarranted and
suspicious attempt at accessing data and generate an alert immediately. This allows the people
responsible for the data warehouse security to stop the intruders dead in their tracks.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 19
CS-703B DM&W AIST SAGAR

UNIT 2 – (LECTURE:11-12)

BACKUP AND RECOVERY OF DATA WAREHOUSE


Backup and recovery are among the most important tasks for an administrator, and data warehouses are
no different. However, because of the sheer size of the database, data warehouses introduce new
challenges for an administrator in the backup and recovery area.
Data warehouses are unique in that the data can come from a myriad of resources and it is transformed
before finally being inserted into the database; but mostly because it can be very large. Managing the
recovery of a large data warehouse can be a daunting task and traditional OLTP backup and recovery
strategies may not meet the needs of a data warehouse.
Strategies and Best Practices for Backup and Recovery
Devising a backup and recovery strategy can be a daunting task. And when you have hundreds of
gigabytes of data that must be protected and recovered in the case of a failure, the strategy can be very
complex.
The following best practices can help you implement your warehouse's backup and recovery strategy:
 Best Practice A: Use ARCHIVELOG Mode
 Best Practice B: Use RMAN
 Best Practice C: Use Read-Only Table spaces
 Best Practice D: Plan for NOLOGGING Operations
 Best Practice E: Not All Table spaces are Equally Important

Best Practice-A: Use ARCHIVELOG Mode


Archived redo logs are crucial for recovery when no data can be lost, because they constitute arecord of
changes to the database. Oracle can be run in either of two modes:
 ARCHIVELOG -- Oracle archives the filled online redo log files before reusing them
inthe cycle.
 NOARCHIVELOG -- Oracle does not archive the filled online redo log files before
reusingthem in the cycle.
Running the database in “ARCHIVELOG” mode has the following benefits:
 The database can be completely recovered from both instance and media failure.
 The user can perform backups while the database is open and available for use.

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 20
CS-703B DM&W AIST SAGAR

 Oracle supports multiplexed archive logs to avoid any possible single point of failure
onthe archive logs
 The user has more recovery options, such as the ability to perform tablespace-point-in-
timerecovery (TSPITR).
 Archived redo logs can be transmitted and applied to the physical standby database,
whichis an exact replica of the primary database.
 The database can be completely recovered from both instance and media
Running the database in “NOARCHIVELOG” mode has the following consequences:
1.The user can only back up the database while it is completely closed after a clean shutdown.
2.Typically, the only media recovery option is to restore the whole database.
3.which causesthe loss of all transactions since the last backup.
Best Practice-B: Use RMAN
There are many reasons to adopt RMAN. Some of the reasons to integrate RMAN into your backup and
recovery strategy are that it offers:
 Extensive reporting
 Incremental backups
 Downtime free backups
 Backup and restore validation
 Backup and restore optimization
 Easily integrates with media managers
 Block media recovery
 Archive log validation and management
 Corrupt block

VIDEO REFERANCE:
https://youtu.be/50gD4GTdv7U?si=YjxFK4OtZuoBJ2gD
https://youtu.be/V5qPIgd5gDs?si=jEhCoIF2gVTWKhPa
https://youtu.be/kJ2TsdT56L4?si=ChFyD643HMlW27SH
https://youtu.be/XX6T4Eb3wpQ?si=z1tSiCp3FxtJA96w

------------------------------------------------***-------------------------------------------------------------

Ms.Sadhvi Biltharey, Asst. Professor, Department of Computer Science & Engineering Page 21

You might also like