Distributed Databases

The document discusses distributed database systems (DDBS), emphasizing their architecture, advantages, and complexities. It outlines the promises of DDBS technology, including transparent management, reliability, improved performance, and easier system expansion. Additionally, it addresses the challenges posed by distributed environments, such as data replication, synchronization, and autonomy among local systems.

DISTRIBUTED DATABASES

INTRODUCTION

Distributed database system (DDBS) technology is the union of two apparently opposed approaches to data processing: database system technology and computer network technology.

Distributed data processing (distributed processing or distributed computing)

•Distributed computing is a field of computer science that studies distributed systems.

•A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.

•The components interact with one another in order to achieve a common goal.

Distributed database system (DDBS)

What is a distributed database?

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. It is a database that is not limited to one system; it is spread over different sites (e.g., over multiple computers or over a network of computers).

A distributed DBMS is then the software system that manages a distributed database such that the distribution aspects are transparent to the users. DDBS is used to refer jointly to the distributed database and the distributed DBMS. A DDBS is concerned not only with the data themselves but also with the structure of the data.

A system in which, despite the existence of a network, the database resides at only one node of the network is not a DDBS: there the database is centrally managed by one computer system (say, site 2) and all requests are routed to that site, so the problems we face are no different from those of maintaining a centralized DBMS, except that we must additionally deal with transmission delays. The existence of a computer network, or of a collection of files, is thus not sufficient to form a DDBS. We therefore move to an environment where data are distributed among a number of sites.

Promises of DDBSs

The many advantages of DDBSs can be distilled to four fundamentals, which are called the promises of DDBS technology:

1. Transparent management of distributed and replicated data (data independence, network transparency, replication transparency, fragmentation transparency)

2. Reliability (reliable access to data) through distributed transactions

3. Improved performance

4. Easier system expansion

1. Transparent Management of Distributed and Replicated Data

Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation issues. (The DDBMS hides all the added complexities of distribution, allowing users to think that they are working with a single centralized system.)

As an example, consider an engineering firm that has offices in Boston, Waterloo, Paris and San Francisco. The firm runs projects and maintains a database about its employees and projects (e.g., project and employee data).
Let us assume that the database is relational and that employee and project data are stored in the following two relations:

EMP(ENO, ENAME, TITLE) and PROJ(PNO, PNAME, BUDGET)

A third relation stores salary information: SAL(TITLE, AMT).

A fourth relation, ASG, records which projects employees are assigned to, with what responsibility and for what duration: ASG(ENO, PNO, RESP, DUR).

Depending on the query, the system will search the different databases in Boston, Paris, etc.

To speed up query processing, we partition each of the relations and store each partition at a different site. This is known as fragmentation. Sometimes we also duplicate some of this data at other sites, for performance and reliability reasons. The result is a distributed database that is both fragmented and replicated.

Fully transparent access means that the users can still pose their queries as before, without paying any attention to the fragmentation, location, or replication of data.

i. Data Independence

Data independence is a fundamental form of transparency. It is the capacity to change the database schema (structure/description) at one level of a database system without affecting the schema at the next higher level. A database system follows a multilayer architecture to store such metadata. There are two types of data independence:

Logical data independence (LDI), associated with the logical schema: it concerns how the data are logically organized and managed inside the database.

Physical data independence, associated with the physical schema: it deals with hiding the details of the storage structure from user applications.

ii. Network Transparency / Distribution Transparency

Beyond the data, the user should also be protected from the operational details of the network: a user is allowed to access a resource (an application program or data) without needing to know whether the resource is located on the local machine or elsewhere.

iii. Replication Transparency

Replication transparency ensures that the replication of databases is hidden from the users. It enables users to query a table as if only a single copy of the table exists.

iv. Fragmentation Transparency

Fragmentation divides each database relation into smaller fragments and treats each fragment as a separate database object, for reasons of performance, availability, and reliability. Fragmentation transparency hides the fact that the table the user is querying is actually a fragment, or a union of fragments.

So, to provide easy and efficient access, the DBMS needs to offer full transparency.

2. Reliability (Reliable Access to Data) Through Distributed Transactions

Distributed DBMSs are designed to improve reliability: replicated components eliminate single points of failure, so the failure of one part does not disable the entire system. Proper care is taken so that, instead of the failed part, users may still be permitted to access other parts of the distributed database.
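As a concrete sketch of the fragmentation just described, the EMP relation can be split horizontally across sites and queried as one logical table. This is only an illustration: the site names, the tuples, and the use of in-memory SQLite databases to stand in for per-site DBMSs are all assumptions, not part of the firm's actual design.

```python
import sqlite3

# Hypothetical horizontal fragmentation of EMP(ENO, ENAME, TITLE):
# each site stores only the tuples of its own employees.
site_data = {
    "boston": [(1, "Smith", "Engineer"), (2, "Jones", "Analyst")],
    "paris":  [(3, "Dupont", "Engineer")],
}

fragments = {}
for site, rows in site_data.items():
    db = sqlite3.connect(":memory:")   # one database standing in for one site
    db.execute("CREATE TABLE EMP (ENO INT, ENAME TEXT, TITLE TEXT)")
    db.executemany("INSERT INTO EMP VALUES (?, ?, ?)", rows)
    fragments[site] = db

def query_all_emps():
    """Pose one logical query; union the per-site fragments so the
    caller never needs to know where each tuple lives."""
    result = []
    for db in fragments.values():
        result.extend(db.execute("SELECT ENO, ENAME FROM EMP").fetchall())
    return sorted(result)

print(query_all_emps())
```

A real DDBMS performs this union inside the system, so the user never sees the fragments; here the helper function plays that role.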
This kind of reliable access is supported by distributed transactions.

A transaction is a basic unit of consistent and reliable computing, consisting of a sequence of database operations executed as an atomic action.

Let us take an example of a transaction based on the engineering firm. Assume that there is an application that updates the salaries of all the employees by 10%.

If a system failure occurs in the middle of this transaction, we would like the DBMS to be able to determine, upon recovery, where it left off and continue with its operation (or start all over again).

Alternatively, if some other user runs a query calculating the average salary of the employees in this firm while the original update action is going on, the calculated result will be in error. Therefore we would like the system to be able to synchronize the concurrent execution of these two programs.

Distributed transactions execute at a number of sites, at each of which they access the local database. With this facility, transactions are protected against interruption.

3. Improved Performance

The performance of distributed DBMSs is improved as follows.

A distributed DBMS fragments the conceptual database, enabling data to be stored near where it is used. This is also called data localization. It has the following advantages:

-- Since each site handles only a portion of the database, contention for CPU and I/O services is not as severe.

-- Localization reduces remote access delays.

In addition, the inherent parallelism of distributed systems can be exploited: there is both inter-query and intra-query parallelism. Inter-query parallelism is the ability to execute multiple queries at the same time. Intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site, accessing a different part of the distributed database.

4. Easier System Expansion

In a distributed environment, it is much easier to accommodate increasing database sizes. In general, expansion can be handled by adding processing and storage power to the network; how far this goes depends on the overhead of distribution.

One aspect of easier system expansion is economics: it normally costs much less to put together a system of "smaller" computers than a single big machine of equivalent power.

Problem Areas

A distributed environment is more complex than a centralized database system, even though the underlying principles are the same. This complexity gives rise to new problems, influenced by three factors.

1. First, data may be replicated in a distributed environment. The distributed database is designed so that a portion of the database, or the entire database, resides at different sites of the computer network (it is not essential that every site on the network contain the database). The distributed DBMS is therefore responsible for choosing one of the stored copies of the requested data for access, and for making sure that the effect of an update is reflected on each and every copy of that data item.

2. Second, if some site fails because of software or hardware malfunction, or if some communication link fails while an update is being executed, the system must make sure that the effects will be reflected on the data residing at the failing or unreachable sites as soon as the system can recover from the failure.
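The 10% salary-update scenario can be sketched with a single-site transaction; SQLite stands in for one site's local DBMS, and the mid-transaction failure is simulated by raising an exception (an assumption made purely for the illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")   # stands in for one site's local DBMS
db.execute("CREATE TABLE SAL (TITLE TEXT, AMT REAL)")
db.executemany("INSERT INTO SAL VALUES (?, ?)",
               [("Engineer", 1000.0), ("Analyst", 800.0)])
db.commit()

def raise_salaries(fail_midway=False):
    """Update all salaries by 10% as one atomic action: either every
    row is updated and committed, or none is."""
    try:
        with db:                      # BEGIN ... COMMIT, or ROLLBACK on error
            db.execute("UPDATE SAL SET AMT = AMT * 1.1")
            if fail_midway:
                raise RuntimeError("simulated system failure")
    except RuntimeError:
        pass                          # the transaction was rolled back

raise_salaries(fail_midway=True)      # failure: no partial update survives
amounts = [amt for (amt,) in db.execute("SELECT AMT FROM SAL ORDER BY AMT")]
print(amounts)                        # unchanged: [800.0, 1000.0]

raise_salaries()                      # success: both rows updated atomically
```

In a distributed transaction the same all-or-nothing guarantee must additionally hold across all participating sites, which is what makes the distributed case harder.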
3. Third, since each site cannot have instantaneous information on the actions currently being carried out at the other sites, the synchronization of transactions executing at multiple sites has to be taken care of.

Distributed DBMS Architecture

The architecture of a system defines its structure:

The components of the system are identified.

The function of each component is specified.

The interrelationships and interactions among these components are defined.

1) ANSI/SPARC Architecture

In late 1972, the Computer and Information Processing Committee (X3) of the American National Standards Institute (ANSI) established a Study Group on DBMS under the Standards Planning and Requirements Committee (SPARC). The mission of the study group was to study the feasibility of setting up standards. The study group issued its report in 1975 and its final report in 1977.

The architectural framework proposed in these reports is known as the "ANSI/SPARC architecture," its full title being "ANSI/X3/SPARC DBMS Framework."

There are three views of data:

The external view, which is that of the end user, who might be a programmer.

The internal view, that of the system or machine.

The conceptual view, that of the enterprise.

The lowest level of the architecture is the internal view, which deals with the physical definition and organization of data. The location of data on different storage devices and the access mechanisms used to reach and manipulate data are the issues dealt with at this level.

The external view is concerned with how users view the database.

In between these two ends is the conceptual schema, which is an abstract definition of the database, used to represent the data and the relationships among the data. Its main advantage is that it supports data independence: the separation of the schemas leads to physical and logical data independence.

Architectural Models for Distributed DBMSs

There are various possible ways in which a distributed DBMS may be architected. We use a classification that characterizes the systems with respect to

(1) the autonomy of local systems,

(2) their distribution, and

(3) their heterogeneity.

Autonomy

Autonomy refers to the distribution of control: it indicates the degree to which individual DBMSs can operate independently. Autonomy is a function of a number of factors, such as

1) whether the component systems (individual DBMSs) exchange information,

2) whether they can independently execute transactions, and

3) whether one is allowed to modify them.

The requirements of an autonomous system have been specified as follows:

1. The local operations of the individual DBMSs should not be affected by their participation in the distributed system.

2. The manner in which the individual DBMSs process queries should not be affected by the execution of global queries that access multiple databases.
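The ANSI/SPARC separation of levels can be illustrated with a relational view standing in for an external schema over a conceptual-level table; the table, view, and column names are invented for the sketch:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Conceptual level: the enterprise-wide definition of the data.
db.execute("CREATE TABLE EMP (ENO INT, ENAME TEXT, TITLE TEXT)")
db.execute("INSERT INTO EMP VALUES (1, 'Smith', 'Engineer')")

# External level: one user's view of the database.
db.execute("CREATE VIEW ENGINEERS AS "
           "SELECT ENO, ENAME FROM EMP WHERE TITLE = 'Engineer'")

# Data independence: the conceptual schema changes (a new column is
# added) without affecting the external view or its users.
db.execute("ALTER TABLE EMP ADD COLUMN CITY TEXT")

rows = db.execute("SELECT * FROM ENGINEERS").fetchall()
print(rows)   # the view still works: [(1, 'Smith')]
```

The internal level (files, pages, indexes) is hidden below the table abstraction, which is exactly the point of the three-level separation.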
3. System consistency or operation should not be compromised when individual DBMSs join or leave the distributed system.

The dimensions of autonomy can be specified as follows:

1. Design autonomy: individual DBMSs are free to use the data models and transaction management techniques that they prefer.

2. Communication autonomy: each of the individual DBMSs is free to make its own decision as to what type of information it wants to provide to the other DBMSs.

3. Execution autonomy: each DBMS can execute the transactions that are submitted to it in any way that it wants to.

In terms of degree of autonomy, on the other hand, systems fall into three alternatives: tight integration, semiautonomy, and total isolation.

1. Tight integration: a single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases. In these tightly integrated systems, the data managers are implemented such that one of them is in control of the processing of each user request, even if that request is serviced by more than one data manager.

2. Semiautonomous: such systems are not fully autonomous, because the individual DBMSs need to be modified to enable them to exchange information with one another.

3. Total isolation: the individual systems are stand-alone DBMSs that know neither of the existence of other DBMSs nor how to communicate with them. In such systems, the processing of user transactions that access multiple databases is especially difficult.

Distribution

Distribution refers to the physical distribution of data over multiple sites. There are many ways in which DBMSs have been distributed. We abstract these alternatives into two classes:

1. Client/server distribution: the sites on a network are distinguished as "clients" and "servers". Communication duties are shared between the client machines and the servers.

2. Peer-to-peer distribution: there is no distinction between client machines and servers. Each machine has full DBMS functionality and can communicate with other machines to execute queries and transactions.

Heterogeneity

Heterogeneity refers to the uniformity or dissimilarity of the data models, system components and databases.

It occurs in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. It may also appear in data models, query languages and transaction management protocols. Representing data with different modeling tools creates heterogeneity; heterogeneity in query languages involves the use of completely different data access paradigms. Even though SQL is the standard relational query language, there are many different implementations of it.

Client/Server Systems

Client/server DBMSs entered the computing scene at the beginning of the 1990's. They divide functionality into two classes: 1) server functions and 2) client functions. This two-level architecture makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
The allocation of functionality between clients and servers differs in different types of distributed DBMSs.

In relational systems, the server does most of the data management work: all of query processing and optimization, transaction management and storage management is done at the server.

The client, in addition to running the application and the user interface, has a DBMS client module that is responsible for managing the data cached at the client and (sometimes) for managing transaction locks.

Operating system and communication software runs on both the client and the server. The communication between the clients and the server(s) is at the level of SQL statements: the client passes SQL queries to the server; the server does most of the work and returns the result relation to the client.

There are various types of client/server architectures. The simplest case is that of only one server accessed by multiple clients, called multiple client/single server. Here the database is stored on only one machine (the server), which also hosts the software to manage it.

A more sophisticated client/server architecture is one where there are multiple servers in the system, called the multiple client/multiple server approach. Here two alternative management strategies are possible:

1. Each client manages its own connections to the appropriate servers. This approach simplifies server code, but loads the client machines with additional responsibilities; such systems are called "heavy client" systems.

2. In the other approach, each client knows only its "home server," which then communicates with other servers as required; this concentrates the data management functionality at the servers. The transparency of data access is thus provided at the server interface, leading to "light clients."

Peer-to-Peer Systems

Modern peer-to-peer systems differ in two important respects from their earlier relatives.

The first is the massive distribution of current systems: while in the early days we focused on a few sites, current systems consider thousands of sites.

The second is the inherent heterogeneity of every aspect of the sites, and their autonomy.

Initially we focus on the meaning of peer-to-peer (the functionality of each site), since the principles and fundamental techniques of these systems are very similar to those of client/server systems.

Consider the description of the architecture from the data organizational view. The first observation is that the physical data organization on each machine may be different. This means that there needs to be an individual internal schema definition at each site, called the local internal schema (LIS).

The enterprise view of the data is described by the global conceptual schema (GCS), which is global because it describes the logical structure of the data at all the sites.

To handle data fragmentation and replication, the logical organization of the data at each site needs to be described as well. Therefore, there needs to be a third layer in the architecture, the local conceptual schema (LCS).

User applications and user access to the database are supported by external schemas (ESs).
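A minimal multiple client/single server exchange at the level of SQL statements can be sketched with a socket; the one-SQL-string-per-connection wire format and the JSON encoding of the result relation are simplifying assumptions, not how any particular DBMS actually talks:

```python
import json
import socket
import sqlite3
import threading

def run_server(ready):
    """Single server: owns the database and executes SQL shipped to it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE EMP (ENO INT, ENAME TEXT)")
    db.execute("INSERT INTO EMP VALUES (1, 'Smith')")
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))        # any free port
    srv.listen(1)
    ready["port"] = srv.getsockname()[1]
    ready["event"].set()
    conn, _ = srv.accept()            # serve a single client request
    sql = conn.recv(4096).decode()
    rows = db.execute(sql).fetchall() # the server does the query work
    conn.sendall(json.dumps(rows).encode())
    conn.close()
    srv.close()

ready = {"event": threading.Event()}
threading.Thread(target=run_server, args=(ready,), daemon=True).start()
ready["event"].wait()

# Client: ships an SQL string and receives the result relation.
cli = socket.socket()
cli.connect(("127.0.0.1", ready["port"]))
cli.sendall(b"SELECT ENO, ENAME FROM EMP")
data = b""
while True:
    chunk = cli.recv(4096)
    if not chunk:
        break
    data += chunk
cli.close()
result = json.loads(data.decode())
print(result)   # [[1, 'Smith']]
```

The client here is deliberately "light": it only ships the SQL and displays the result, while all data management happens at the server.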
Data independence is supported in this model, since it is an extension of ANSI/SPARC. Location and replication transparencies are supported by the local and global conceptual schemas. Network transparency, in turn, is supported by the global conceptual schema.

The detailed components of a distributed DBMS are shown in the preceding figure. One component handles the interaction with users, and another deals with the storage.

The user processor consists of 4 elements:

1. user interface handler

2. semantic data controller

3. global query optimizer and decomposer

4. distributed execution monitor

1. The user interface handler is responsible for interpreting user commands as they come in, and for formatting the result data as it is sent to the user.

2. The semantic data controller uses the integrity constraints and authorizations to check whether the user query can be processed; it is also responsible for authorization.

3. The global query optimizer and decomposer determines an execution strategy that minimizes a cost function. The global query optimizer is responsible for generating the best strategy to execute distributed operations.

4. The distributed execution monitor coordinates the distributed execution of the user request. The execution monitor is also called the distributed transaction manager. In executing queries in a distributed fashion, the execution monitors at various sites may, and usually do, communicate with one another.

The second component of a distributed DBMS is the data processor, which consists of 3 elements:

1. The local query optimizer, which acts as the access path selector, is responsible for choosing the best access path to access any data item.

2. The local recovery manager is responsible for making sure that the local database remains consistent even when failures occur.

3. The run-time support processor is the interface to the operating system; it contains the database buffer (or cache) manager, which is responsible for maintaining the main-memory buffers and managing the data accesses.

In peer-to-peer systems, both the user processor modules and the data processor modules reside on each machine.

Distributed Database Design

Alternative Design Strategies

Two major strategies have been identified for designing distributed databases:

1. Top-down approach

2. Bottom-up approach

The top-down approach is more suitable for tightly integrated, homogeneous distributed DBMSs; bottom-up design is more suitable for multidatabases.

The design activity begins with a requirements analysis that defines the environment of the system (the data and processing needs of all database users). The requirements study also specifies where the final system is expected to stand with respect to the objectives of a distributed DBMS. These objectives are performance, reliability and availability, economics, and expandability (flexibility).
The requirements document is input to two parallel activities: view design and conceptual design.

The view design activity deals with defining the interfaces for end users.

Conceptual design is the process by which the enterprise is examined to determine entity types and the relationships among these entities. From the conceptual design step comes the definition of the global conceptual schema.

The global conceptual schema (GCS) and the access pattern information collected as a result of view design are inputs to the distribution design step. The objective at this stage is to design the local conceptual schemas (LCSs) by distributing the entities over the sites of the distributed system.

The last step in the design process is physical design, which maps the local conceptual schemas to the physical storage devices available at the corresponding sites.

Design and development is an ongoing activity, requiring constant monitoring and periodic adjustment and tuning; for that reason the process includes observation and monitoring. The result is some form of feedback, which may require backing up to one of the earlier steps in the design.

Distribution Design Issues

In the relational model, the relations of a database schema are decomposed into smaller fragments. The distribution design activity thus consists of two steps: fragmentation and allocation. To structure this process, a set of interrelated questions covers the entire issue:

1. Why fragment at all?

2. How should we fragment?

3. How much should we fragment?

4. Is there any way to test the correctness of decomposition?

5. How should we allocate?

6. What is the necessary information for fragmentation and allocation?

1) Reasons for Fragmentation

The decomposition of a relation into fragments, each treated as a unit, permits a number of transactions to execute concurrently. Fragmentation also enables the parallel execution of a single query, by dividing it into a set of subqueries that operate on fragments. Fragmentation therefore typically increases the level of concurrency. This form of concurrency is referred to as query concurrency.

2) Fragmentation Alternatives

Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones. There are two alternatives: dividing the table horizontally or dividing it vertically.

3) Degree of Fragmentation

The extent to which the database should be fragmented is an important decision that affects the performance of query execution. The degree of fragmentation goes from one extreme, not fragmenting at all, to the other extreme, fragmenting to the level of individual tuples (in the case of horizontal fragmentation) or to the level of individual attributes (in the case of vertical fragmentation).
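Vertical fragmentation, the other alternative, can be sketched on the PROJ relation: each fragment keeps the key PNO so that the original relation can be reconstructed by a join (the particular split and the sample tuples are assumptions for the illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE PROJ (PNO INT PRIMARY KEY, PNAME TEXT, BUDGET REAL)")
db.executemany("INSERT INTO PROJ VALUES (?, ?, ?)",
               [(1, "Instrumentation", 150000.0), (2, "CAD/CAM", 135000.0)])

# Vertical fragments: each keeps the key PNO plus some of the attributes.
db.execute("CREATE TABLE PROJ1 AS SELECT PNO, PNAME FROM PROJ")
db.execute("CREATE TABLE PROJ2 AS SELECT PNO, BUDGET FROM PROJ")

# Reconstruction for vertical fragmentation: join the fragments on the key.
rebuilt = db.execute(
    "SELECT PROJ1.PNO, PNAME, BUDGET FROM PROJ1 "
    "JOIN PROJ2 ON PROJ1.PNO = PROJ2.PNO ORDER BY PROJ1.PNO").fetchall()
original = db.execute("SELECT * FROM PROJ ORDER BY PNO").fetchall()
print(rebuilt == original)   # True: no information was lost
```

Keeping the key in every vertical fragment is what makes lossless reconstruction possible.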
4) Correctness Rules of Fragmentation

There are three rules that we follow during fragmentation which, together, ensure that the database does not undergo semantic change during fragmentation:

a) Completeness b) Reconstruction c) Disjointness

5) Allocation Alternatives

After the database is fragmented, one has to decide on the allocation of the fragments to the various sites on the network. When data are allocated, they may either be replicated or maintained as a single copy. The reasons for replication are reliability and efficiency of read-only queries. Whether to replicate a data item depends on the ratio of read-only queries to update queries on it.

6) Information Requirements

The information needed for distribution design can be divided into four categories: database information, application information, communication network information, and computer system information.

Fragmentation

There are two fundamental fragmentation strategies, and in addition fragments can be nested in a hybrid fashion:

1. Horizontal fragmentation

2. Vertical fragmentation

3. Hybrid fragmentation

Hybrid Fragmentation

In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications. In this case a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree-structured partitioning. Since the two types of partitioning strategy are applied one after the other, this is called hybrid fragmentation (also mixed or nested fragmentation).

Allocation

The allocation of resources across the nodes of a computer network, that is, the placement of fragments or individual files at sites, is a major task.

Allocation Problem

The allocation problem is defined with respect to two measures:

1. Minimal cost.

2. Performance.

Performance: the allocation strategy is designed to maintain a performance metric. Two well-known metrics are minimizing the response time and maximizing the system throughput at each site.

DDB UNIT-2

QUERY PROCESSING & DECOMPOSITION

Query Processing Objectives

The objective of query processing in a distributed context is to transform a high-level query on a distributed database into an efficient execution strategy expressed in a low-level language on local databases. We assume that the high-level language is relational calculus, while the low-level language is an extension of relational algebra with communication operators.

An important aspect of query processing is query optimization: among the possible transformations of the high-level query, we want the one that optimizes (minimizes) resource consumption. A good measure of resource consumption is the total cost that will be incurred in processing the query.
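A sketch of such a total-cost measure, with unit costs and workload sizes invented purely for the illustration:

```python
# Total cost of a distributed execution strategy as the sum of CPU,
# I/O, and communication components (all unit costs invented).
CPU_PER_TUPLE = 0.0001   # seconds per tuple processed in memory
IO_PER_PAGE   = 0.01     # seconds per disk page access
MSG_COST      = 0.1      # fixed seconds per message
BYTE_COST     = 0.00001  # seconds per byte transmitted

def total_cost(tuples_processed, pages_read, messages, bytes_sent):
    cpu  = CPU_PER_TUPLE * tuples_processed
    io   = IO_PER_PAGE * pages_read
    comm = MSG_COST * messages + BYTE_COST * bytes_sent
    return cpu + io + comm

# Same local work, but shipping 100,000 bytes versus a reduced
# 10,000 bytes: over a wide area network the smaller transfer wins.
full    = total_cost(10000, 100, 1, 100000)
reduced = total_cost(10000, 100, 1, 10000)
print(full > reduced)   # True
```

With these numbers the communication term dominates, which is why distributed optimizers over wide area networks concentrate on reducing data transfer.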

Total cost is the sum of all times incurred in processing the operators of the query at the various sites and in intersite communication. Another measure is the response time of the query, which is the time elapsed in executing the query.

In a distributed database system, the total cost to be minimized includes

1. CPU cost,

2. I/O cost, and

3. communication cost.

1. The CPU cost is incurred when performing operators on data in main memory.

2. The I/O cost is the time necessary for disk accesses. It can be minimized by reducing the number of disk accesses, through fast access methods to the data and efficient use of main memory (buffer management).

3. The communication cost is the time needed for exchanging data between the sites participating in the execution of the query.

Characterization of Query Processors

It is quite difficult to evaluate and compare query processors, in both centralized and distributed systems, because they may differ in many aspects. Here are some important characteristics of query processors that can be used as a basis for comparison. The first four characteristics hold for both centralized and distributed query processors, while the last four are particular to distributed query processors.

1. Languages

2. Types of Optimization

3. Optimization Timing

4. Statistics

5. Decision Sites

6. Exploitation of the Network Topology

7. Exploitation of Replicated Fragments

8. Use of Semijoins

1) Languages

Query processing must perform an efficient mapping from the input language to the output language.

2) Types of Optimization

Query optimization aims at choosing the "best" point in the solution space of all possible execution strategies. An immediate method is to search the solution space, predict the cost of each strategy, and select the strategy with minimum cost. The problem is that the solution space can be large; that is, there may be many equivalent strategies, even with a small number of relations. Therefore, an "exhaustive" search approach is often used whereby (almost) all possible execution strategies are considered.

3) Optimization Timing

A query may be optimized at different times: optimization can be done statically (before executing the query) or dynamically (as the query is executed). The main advantage of dynamic over static query optimization is that the actual sizes of intermediate relations are available to the query processor, thereby minimizing the probability of a bad choice.
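The search-based style of optimization described under Types of Optimization can be sketched as follows; the cardinalities and the toy cost model are assumptions, not a real optimizer's estimates:

```python
from itertools import permutations

# Search-based optimization sketch: enumerate all join orders of three
# relations, predict the cost of each with a toy cost model, and pick
# the strategy with minimum predicted cost. Cardinalities are invented.
card = {"EMP": 400, "ASG": 10000, "PROJ": 200}

def predicted_cost(order):
    """Toy model: charge the estimated size of each intermediate
    result; the size estimate is a crude running minimum."""
    cost, size = 0, card[order[0]]
    for rel in order[1:]:
        size = min(size, card[rel])   # crude join-result size estimate
        cost += size
    return cost

strategies = {order: predicted_cost(order) for order in permutations(card)}
best = min(strategies, key=strategies.get)
print(best, strategies[best])
```

Even with three relations there are already six strategies to cost; real optimizers must prune this space because it grows factorially with the number of relations.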
fragments.
2)Statistics
We call this process localization because its main
The effectiveness of query optimization relies on function is to localize the data involved in the query.
statistics on the database.
1)Use of Semijoins
Dynamic query optimization requires statistics in
order to choose which operators should be done The semijoin operator has the important property of
first. Static query optimization is even more reducing the size of the operand relation.
demanding since the size of intermediate relations
must also be estimated based on statistical A semijoin is particularly useful for improving the
information processing of distributed join operators as it reduces
the size of data exchanged between sites
1)Decision Sites
The early distributed DBMSs, which has slow wide
When optimization is used, either a single site or area networks, make extensive use of semijoins.
several sites may participate in the selection of the
strategy to be applied for answering the query. Some later systems, which has faster networks and
do not employ semijoins.
In centralized decision approach a single site
generates the strategy even the decision process The input is a query on global data expressed in
could be distributed among various sites relational calculus.
participating in the elaboration of the best strategy
This query is posed on
Even it is simple but requires knowledge of the global(distributed)relations,meaning that data
entire distributed database Where as in distributed distribution is hidden.
approach it requires only local information.
Four main layers are involved in distributed query
Hybrid approaches where ones it makes the major processing. The first three layers map the input
decisions and other sites can make local decisions query in to an optimized distributed query execution
are also frequent plan. They perform the functions of query
decomposition,data localization, and global query
1)Exploitation of the Network Topology optimization.

The network topology is generally exploited by the The fourth layer performs distributed query
distributed query processorWith wide area execution by executing the plan and returns the
networks,the cost function to be minimized can be answer to the query.
restricted to the data communication cost
It is done by the local sites and the control site.
With local area networks,communication costs are
comparable to I/O costs 1)Query Decomposition

2)Exploitation of Replicated Fragments The first layer decomposes the calculus query in to
an algebraic query on global relations.
A distributed relation is usually divided into relation
fragments Distributed queries expressed on global The information needed for this transformation is
found in the global conceptual schema describing
the global relations
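The semijoin reduction described in the Use of Semijoins section can be sketched in a few lines. Everything here (the EMP/ASG relations, the join attribute ENO, and the two "sites") is invented for the illustration; it is not code from any real system:

```python
# A hypothetical sketch of a semijoin-reduced distributed join.
# Relations are modeled as lists of dicts; EMP, ASG, the join attribute
# ENO, and the two "sites" are assumptions made for the example.

def semijoin(r, s, attr):
    """R semijoin S on attr: the tuples of R that match some tuple of S."""
    s_keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in s_keys]

def distributed_join(r_site1, s_site2, attr):
    # Step 1: site 2 ships only the projection on the join attribute.
    shipped = [{attr: k} for k in {t[attr] for t in s_site2}]
    # Step 2: site 1 reduces its relation with a semijoin.
    reduced = semijoin(r_site1, shipped, attr)
    # Step 3: only the reduced relation travels to site 2 for the join.
    return [{**a, **b} for a in reduced for b in s_site2 if a[attr] == b[attr]]

emp = [{"ENO": "E1", "ENAME": "Smith"},
       {"ENO": "E2", "ENAME": "Ng"},
       {"ENO": "E3", "ENAME": "Lee"}]
asg = [{"ENO": "E1", "DUR": 12}, {"ENO": "E3", "DUR": 6}]

print(distributed_join(emp, asg, "ENO"))
```

Only the ENO values and the reduced relation cross the network, instead of the whole of EMP; this is exactly the size reduction the semijoin buys.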
Query decomposition can be done as four successive steps.

First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation.

Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible.

Third, the correct query (still expressed in relational calculus) is simplified, for example by eliminating redundant predicates.

Fourth, the calculus query is restructured as an algebraic query. The algebraic query generated by this layer avoids the worst executions.

2) Data Localization

The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query's data using the data distribution information in the fragment schema.

This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.

A global relation can be reconstructed by applying the fragmentation rules and then deriving a program, called a localization program, of relational algebra operators, which then acts on fragments.

3) Global Query Optimization

The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query that is close to optimal.

Query optimization consists of finding the ordering of operators that minimizes a cost function. The cost function is often defined in terms of time units; it is a weighted combination of I/O, CPU, and communication costs.

The output of the query optimization layer is an optimized algebraic query. It is represented and saved (for future executions) as a distributed query execution plan.

4) Distributed Query Execution

The last layer is performed by all the sites having fragments involved in the query. Each subquery executing at one site, called a local query, is then optimized using the local schema of the site and executed.

Query Decomposition

Query decomposition is the first phase of query processing; it transforms a relational calculus query into a relational algebra query.

The successive steps of query decomposition are (1) normalization, (2) analysis, (3) elimination of redundancy, and (4) rewriting.

Normalization

The goal of normalization is to transform the query into a normalized form that facilitates further processing. There are two normal forms: conjunctive normal form (a conjunction, ∧, of disjunctions) and disjunctive normal form (a disjunction, ∨, of conjunctions).

Analysis

Query analysis enables rejection of normalized queries for which further processing is either impossible or unnecessary. The main reasons for rejection are that the query is type incorrect or semantically incorrect.

When one of these cases is detected, the query is simply returned to the user with an explanation. Otherwise, query processing is continued.
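The normalization step can be illustrated with a toy predicate rewriter. This is a minimal sketch under assumed representations (predicates as nested tuples, with invented atom names p1, p2, p3), not a production algorithm:

```python
# A minimal sketch of query normalization. Predicates are assumed to be
# nested tuples: ("and", p, q), ("or", p, q), ("not", p), or a string atom.

def to_nnf(p):
    """Push negations down to the atoms (De Morgan's laws)."""
    if isinstance(p, str):
        return p
    op = p[0]
    if op == "not":
        q = p[1]
        if isinstance(q, str):
            return p
        if q[0] == "not":
            return to_nnf(q[1])
        if q[0] == "and":
            return ("or", to_nnf(("not", q[1])), to_nnf(("not", q[2])))
        if q[0] == "or":
            return ("and", to_nnf(("not", q[1])), to_nnf(("not", q[2])))
    return (op, to_nnf(p[1]), to_nnf(p[2]))

def to_cnf(p):
    """Distribute OR over AND to reach conjunctive normal form."""
    p = to_nnf(p)
    if isinstance(p, str) or p[0] == "not":
        return p
    a, b = to_cnf(p[1]), to_cnf(p[2])
    if p[0] == "and":
        return ("and", a, b)
    # p is an OR: distribute it over any AND child.
    if not isinstance(a, str) and a[0] == "and":
        return ("and", to_cnf(("or", a[1], b)), to_cnf(("or", a[2], b)))
    if not isinstance(b, str) and b[0] == "and":
        return ("and", to_cnf(("or", a, b[1])), to_cnf(("or", a, b[2])))
    return ("or", a, b)

print(to_cnf(("or", "p1", ("and", "p2", "p3"))))
```

Applied to p1 ∨ (p2 ∧ p3), it produces the conjunctive form (p1 ∨ p2) ∧ (p1 ∨ p3).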
Rewriting

The last step of query decomposition rewrites the query in relational algebra.

For the sake of clarity, it is customary to represent the relational algebra query graphically by an operator tree. A leaf node is a relation stored in the database, and a non-leaf node is an intermediate relation produced by a relational algebra operator. The sequence of operations is directed from the leaves to the root, which represents the answer to the query.

Localization of Distributed Data

The localization layer determines which fragments are involved in the query. It translates an algebraic query on global relations into an algebraic query expressed on physical fragments. To simplify this section, we do not consider the fact that data fragments may be replicated.

Localization can be viewed as replacing the leaves of the operator tree of the distributed query with subtrees corresponding to the localization programs. We call the query obtained this way the localized query.

DISTRIBUTED QUERY OPTIMIZATION

Query Optimization

The objective of the optimizer is to find a strategy close to optimal and to avoid bad strategies. The strategy produced by the optimizer is referred to as the optimal strategy (or optimal ordering).

The output of the optimizer is an optimized query execution plan consisting of the algebraic query specified on fragments and the communication operations needed to support the execution of the query over the fragment sites.

The selection of the optimal strategy requires the prediction of execution costs (total cost). The execution cost is expressed as a weighted combination of I/O, CPU, and communication costs.

Query optimization refers to the process of producing a query execution plan (QEP) which represents an execution strategy for the query. The QEP minimizes an objective cost function.

A query optimizer, the software module that performs query optimization, consists of three components: a search space, a cost model, and a search strategy.

The search space is the set of alternative execution plans that represent the input query. All plans provide the same result for the input query, but they differ in the execution order of operations and in the way these operations are implemented, and therefore in their performance.

The cost model predicts the cost of a given execution plan. To be accurate, the cost model must have good knowledge about the distributed execution environment.

The search strategy explores the search space and selects the best plan, using the cost model.

The details of the environment (centralized vs. distributed) are captured by the search space and the cost model.

Query execution plans are abstracted by means of operator trees, which define the order in which the operations are executed. Operator trees are enriched with additional information, such as the best algorithm chosen for each operation. For a given query, the search space can thus be defined as the set of equivalent operator trees.

Each of the join trees can be assigned a cost based on the estimated cost of each operator.
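The operator-tree idea above can be sketched with a few classes, evaluated from the leaves to the root. The relation contents and the tiny operator set (selection and join only) are assumptions made for the illustration:

```python
# A toy operator tree: leaves are stored relations (lists of dicts),
# non-leaf nodes produce intermediate relations, and evaluation runs
# bottom-up; the root yields the answer to the query.

class Relation:                       # leaf node: a stored relation
    def __init__(self, tuples):
        self.tuples = tuples
    def eval(self):
        return self.tuples

class Select:                         # non-leaf node: selection
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def eval(self):
        return [t for t in self.child.eval() if self.pred(t)]

class Join:                           # non-leaf node: join on a common attribute
    def __init__(self, left, right, attr):
        self.left, self.right, self.attr = left, right, attr
    def eval(self):
        rs = self.right.eval()
        return [{**a, **b} for a in self.left.eval() for b in rs
                if a[self.attr] == b[self.attr]]

emp = Relation([{"ENO": "E1", "TITLE": "Elect. Eng."},
                {"ENO": "E2", "TITLE": "Programmer"}])
asg = Relation([{"ENO": "E1", "DUR": 12}, {"ENO": "E2", "DUR": 24}])

# Root of the tree: select(DUR > 12) over (EMP join ASG on ENO).
root = Select(Join(emp, asg, "ENO"), lambda t: t["DUR"] > 12)
print(root.eval())
```

Swapping subtrees of such a tree (for example, pushing the selection below the join) produces the equivalent plans that make up the search space.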
A join tree that starts with a Cartesian product, such as join tree (c), may have a much higher cost than the other join trees.

If a query is large, restrictions (heuristics) are applied to keep the search space manageable.

The most common heuristic is to perform selection and projection when accessing base relations. Another common heuristic is to avoid Cartesian products that are not required by the query; with this heuristic, an operator tree such as (c) would not be part of the search space considered by the optimizer.

To characterize query optimizers, we concentrate on join trees, which are operator trees whose operators are join or Cartesian product, because permutations of the join order have the most important effect on the performance of relational queries.

Dynamic Query Optimization: the QEP is dynamically constructed by the query optimizer, which makes calls to the DBMS execution engine for executing the query's operations. Thus, there is no need for a cost model.

Static Query Optimization: with static query optimization, there is a clear separation between the generation of the QEP at compile time and its execution by the DBMS execution engine. Thus, an accurate cost model is key to predicting the costs of candidate QEPs.

The minimization of communication costs makes distributed query optimization more complex.

The optimization timing, which can be dynamic, static, or hybrid, is a good basis for classifying query optimization techniques.
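Enumerating join orders over such a search space can be sketched as follows. The relation names, cardinalities, joinable pairs, and the flat 1% join selectivity are all invented, and the "cost" is simply the sum of intermediate result sizes:

```python
# A toy enumeration of left-deep join orders with a crude cost model.
from itertools import permutations

card = {"EMP": 400, "ASG": 1000, "PROJ": 100}            # assumed sizes
joinable = {frozenset(("EMP", "ASG")), frozenset(("ASG", "PROJ"))}

def cost(order):
    """Cost of a left-deep join tree; None if it needs a Cartesian product."""
    seen, size, total = {order[0]}, card[order[0]], 0
    for rel in order[1:]:
        if not any(frozenset((rel, s)) in joinable for s in seen):
            return None                      # heuristic: prune such trees
        size = size * card[rel] // 100       # assumed 1% join selectivity
        total += size
        seen.add(rel)
    return total

plans = {o: cost(o) for o in permutations(card)}
best = min((o for o in plans if plans[o] is not None), key=lambda o: plans[o])
print(best, plans[best])
```

A real optimizer would use dynamic programming or a randomized search strategy rather than brute-force enumeration, since the number of permutations grows factorially with the number of relations.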

Cost functions

The cost of a distributed execution strategy can be expressed with respect to either the total time or the response time. The total time is the sum of all time (also referred to as cost) components, even those incurred in parallel, while the response time is the elapsed time from the initiation to the completion of the query.

Centralized Query Optimization

Centralized query optimization is a prerequisite to understanding distributed query optimization, for three reasons.

First, a distributed query is translated into local queries, each of which is processed in a centralized way.

Second, distributed query optimization techniques are often extensions of the techniques for centralized systems.

Finally, centralized query optimization is a simpler problem.

DDB UNIT-3

Transaction Management

DEFINITION

The concept of a transaction is used in database systems as a basic unit of consistent and reliable computing. The terms consistent and reliable play a major role here: a database is in a consistent state if it obeys all of the consistency (integrity) constraints defined over it, and reliability refers to both the resiliency of a system to various types of failures and its capability to recover from them.

A transaction is a unit of consistent and reliable computation. A transaction takes a database, performs an action on it, and generates a new version of the database, causing a state transition.

A transaction is considered to be made up of a sequence of read and write operations on the database, together with computation steps. A transaction may be thought of as a program with embedded database access queries.
Properties of Transactions

The consistency and reliability aspects of transactions are due to four properties: (1) atomicity, (2) consistency, (3) isolation, and (4) durability. Together, these are commonly referred to as the ACID properties of transactions.

Atomicity

Atomicity means that either all of a transaction's actions are completed, or none of them are. This is also known as the "all-or-nothing property."

Atomicity requires that if the execution of a transaction is interrupted by any sort of failure, the DBMS is responsible for determining what to do with the transaction upon recovery from the failure. There are two possible courses of action: the transaction can either be terminated by completing its remaining actions, or it can be terminated by undoing all the actions that have already been executed.

There are two types of failures. A transaction itself may fail due to input data errors, deadlocks, or other factors; maintaining transaction atomicity in the presence of this type of failure is commonly called transaction recovery. The second type of failure is caused by system crashes, such as media failures, processor failures, communication link breakages, power outages, etc.; ensuring transaction atomicity in the presence of system crashes is called crash recovery.

Consistency

The consistency of a transaction is its correctness. In other words, a transaction is a correct program that maps one consistent database state to another.

There is a classification of consistency that groups databases into four levels. Based on the concept of dirty data, the four levels are defined as follows:

"Degree 3: Transaction T sees degree 3 consistency if:
1. T does not overwrite dirty data of other transactions.
2. T does not commit any writes until it completes all its writes [i.e., until the end of transaction (EOT)].
3. T does not read dirty data from other transactions.
4. Other transactions do not dirty any data read by T before T completes.

Degree 2: Transaction T sees degree 2 consistency if:
1. T does not overwrite dirty data of other transactions.
2. T does not commit any writes before EOT.
3. T does not read dirty data from other transactions.

Degree 1: Transaction T sees degree 1 consistency if:
1. T does not overwrite dirty data of other transactions.
2. T does not commit any writes before EOT.

Degree 0: Transaction T sees degree 0 consistency if:
1. T does not overwrite dirty data of other transactions."

The point of defining multiple levels of consistency is to provide application programmers the flexibility to define transactions that operate at different levels. Consequently, while some transactions operate at the Degree 3 consistency level, others may operate at lower levels.

Isolation
This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the database state.

Durability

Durability is the property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database. The DBMS ensures that the results of a transaction will survive subsequent system failures.

Types of transactions

Transactions have been classified according to a number of criteria.

The first criterion is the duration of transactions: transactions may be classified as online or batch. These two classes are also called short-life and long-life transactions.

Online transactions are characterized by very short execution/response times (typically, on the order of a couple of seconds) and by access to a relatively small portion of the database.

Batch transactions take longer to execute (response time being measured in minutes, hours, or even days) and access a larger portion of the database. Examples are statistical applications, report generation, complex queries, and image processing.

Intermixing of read and write actions without any specific ordering characterizes general transactions. If the transactions are restricted so that all the read actions are performed before any write action, the transaction is called a two-step transaction. If the transaction is restricted so that a data item has to be read before it can be updated (written), the corresponding class is called a restricted (or read-before-write) transaction.

For example, the transaction

Read(x) Read(y) x←x+1 y←y+1 Write(x) Write(y) Commit

is a two-step transaction (all reads precede the writes), while

Read(x) x←x+1 Write(x) Read(y) y←y+1 Write(y) Commit

is a restricted transaction (each data item is read before it is written).

Transactions can also be classified according to their structure, distinguishing four broad categories in increasing complexity:

1. flat transactions
2. nested transactions (closed or open)
3. workflow models (which are combinations of various nested forms)

Flat transactions

Flat transactions have a single start point (Begin_transaction) and a single termination point (End_transaction).

Nested transactions

An alternative transaction model is to permit a transaction to include other transactions with their own begin and commit points. The transactions that are embedded in another one are usually called subtransactions.

We differentiate between closed and open nesting because of their termination characteristics.

Closed nested transactions commit in a bottom-up fashion through the root. Thus, a nested subtransaction begins after its parent and finishes before it, and the commitment of the subtransactions is conditional upon the commitment of the parent.

An open nested transaction allows its partial results to be observed outside the transaction.

Advantages of nested transactions
1. Nested transactions provide a higher level of concurrency among transactions. Since a transaction consists of a number of other transactions, more concurrency is possible within a single transaction.

2. It is possible to recover independently from failures of each subtransaction.

Finally, a transaction that is both two-step and restricted is called a restricted two-step transaction.

Workflows

Flat and nested transactions are not sufficient to model business activities, so workflows came into existence. A workflow is "a collection of tasks organized to accomplish some business process." Three types of workflows are identified.

Distributed concurrency control

The distributed concurrency control mechanism of a distributed DBMS ensures that the consistency of the database is maintained in a multiuser distributed environment.

In this chapter we make two major assumptions: the distributed system is fully reliable and does not experience any failures (of hardware or software), and the database is not replicated.

Serializability

When multiple transactions run concurrently, there is a possibility that the database may be left in an inconsistent state. Serializability is the concept that helps us check which schedules are serializable, i.e., always leave the database in a consistent state.

A history H (also called a schedule) is defined over a set of transactions T = {T1, T2, ..., Tn} and specifies an interleaved order of execution of these transactions' operations.

Two operations Oij(x) and Okl(x) (where i and k denote transactions and are not necessarily distinct) accessing the same database entity x are said to be in conflict if at least one of them is a write operation.

There are two things to note in this definition. First, read operations do not conflict with each other; we can therefore talk about two types of conflicts: read-write (or write-read) and write-write. Second, the two operations can belong to the same transaction or to two different transactions.

The existence of a conflict between two operations indicates that their order of execution is important; the ordering of two read operations is insignificant.

Concurrency control mechanisms & algorithms

There are a number of ways that the concurrency control approaches can be classified. One is to group the concurrency control mechanisms into two broad classes:

1. pessimistic concurrency control methods, and
2. optimistic concurrency control methods.

Pessimistic algorithms synchronize the concurrent execution of transactions early in their execution life cycle, whereas optimistic algorithms delay the synchronization of transactions until their termination.

The pessimistic group consists of locking-based algorithms, (transaction) ordering-based algorithms, and hybrid algorithms. The optimistic group can likewise be classified as locking-based or timestamp ordering-based.

In the locking-based approach, the synchronization of transactions is achieved by employing physical or logical locks on some portion or granule of the database.

The timestamp ordering (TO) class involves organizing the execution order of transactions so that they maintain transaction consistency.
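The conflict rules above can be checked mechanically. A toy sketch, with an invented history encoded as (transaction, operation, item) triples:

```python
# Two operations conflict when they access the same item and at least
# one is a write, so the only conflict types are read-write (write-read)
# and write-write. The history below is invented for the example.

def conflicts(op1, op2):
    (t1, kind1, item1), (t2, kind2, item2) = op1, op2
    return item1 == item2 and (kind1 == "W" or kind2 == "W")

history = [("T1", "R", "x"), ("T2", "W", "x"),
           ("T1", "W", "y"), ("T2", "R", "y"), ("T2", "R", "z")]

# Conflicting pairs between different transactions, in history order.
pairs = [(a, b) for i, a in enumerate(history)
         for b in history[i + 1:] if a[0] != b[0] and conflicts(a, b)]
print(pairs)
```

Note that the pair of reads on z and y produces no conflict, matching the rule that two read operations never conflict.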
This ordering is maintained by assigning timestamps both to the transactions and to the data items that are stored in the database.

In some locking-based algorithms, timestamps are also used to improve efficiency and the level of concurrency; these are called hybrid algorithms.

Locking-based approach

The main idea of locking-based concurrency control is to ensure that a data item that is shared by conflicting operations is accessed by one operation at a time. This is accomplished by associating a "lock" with each lock unit.

The problem with a naive locking algorithm (as illustrated by history H) is that it releases the locks held by a transaction (say, Ti) as soon as the associated database command (read or write) is executed and that lock unit (say, x) no longer needs to be accessed. Hence we go for two-phase locking (2PL), which states that no transaction should request a lock after it releases one of its locks.

Time-stamped & optimistic concurrency control algorithms

To establish timestamp ordering, the transaction manager assigns each transaction Ti a unique timestamp, ts(Ti), at its initiation. A timestamp is an identifier that serves to identify each transaction uniquely and is used for ordering.

Formally, the timestamp ordering (TO) rule can be specified as follows: given two conflicting operations Oij and Okl belonging to transactions Ti and Tk, respectively, Oij is executed before Okl if and only if ts(Ti) < ts(Tk).

Optimistic Concurrency Control

Pessimistic concurrency control algorithms assume that conflicts between transactions are quite frequent, and do not permit a transaction to access a data item if there is a conflicting transaction that accesses that data item. Thus the operation of a transaction follows the sequence of phases validation (V), read (R), computation (C), write (W).

Optimistic algorithms delay the validation phase until just before the write phase, so an operation submitted to an optimistic scheduler is never delayed. The read, compute, and write operations of each transaction are processed freely without updating the actual database; each transaction initially makes its updates on local copies of data items.

The validation phase consists of checking whether these updates would maintain the consistency of the database. If the answer is affirmative, the changes are made global (written into the actual database); otherwise, the transaction is aborted and has to restart.

Deadlock management

Any locking-based concurrency control algorithm may result in deadlocks, since locking enforces mutual exclusion (it prevents simultaneous access to a shared data item) and transactions may wait on locks. TO-based algorithms that require the waiting of transactions may also cause deadlocks.

A deadlock can occur because transactions wait for one another. A deadlock situation is a set of requests that can never be granted by the concurrency control mechanism.

Using the wait-for graph (WFG), it is easy to state the condition for the occurrence of a deadlock: a deadlock occurs when the WFG contains a cycle.

Deadlock Prevention

Deadlock prevention methods guarantee that deadlocks cannot occur in the first place.
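The cycle condition can be tested directly on a small WFG. A minimal sketch, with invented transactions (an edge Ti → Tj means Ti waits for a lock held by Tj):

```python
# Deadlock detection on a wait-for graph (WFG): a deadlock exists
# exactly when the directed graph contains a cycle.

def has_cycle(wfg):
    """Depth-first search for a cycle in a directed graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wfg}
    def visit(t):
        color[t] = GRAY
        for u in wfg.get(t, []):
            if color.get(u, WHITE) == GRAY:      # back edge -> cycle found
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False
    return any(color[t] == WHITE and visit(t) for t in wfg)

wfg = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}   # T1 -> T2 -> T3 -> T1
print(has_cycle(wfg))
no_wait = {"T1": ["T2"], "T2": ["T3"], "T3": []}
print(has_cycle(no_wait))
```

Deadlock detection methods run a check like this periodically (or on every wait) and abort one transaction on each cycle found, in contrast to the prevention methods described above.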
The transaction manager checks a transaction when it is first initiated and does not permit it to proceed if it may cause a deadlock. To perform this check, it is required that all of the data items that will be accessed by a transaction be predeclared. The transaction manager then permits a transaction to proceed if all the data items that it will access are available; otherwise, the transaction is not permitted to proceed.

DISTRIBUTED DATABASES UNIT-4

Distributed DBMS Reliability

A reliable distributed database management system is one that can continue to process user requests even when the underlying system is unreliable. Even when components of the distributed computing environment fail, a reliable distributed DBMS should be able to continue executing user requests without violating database consistency.

So we discuss the reliability features of a distributed DBMS.

Reliability concepts & measures

(i) System, State, and Failure

•Reliability refers to a system that consists of a set of components. The system has a state, which changes as the system operates.

•Any deviation of a system from the behavior described in its specification is considered a failure.

•For example, in a distributed transaction manager the specification may state that only serializable schedules for the execution of concurrent transactions should be generated. If the transaction manager generates a non-serializable schedule, we say that it has failed.

(ii) Reliability and Availability

•Reliability refers to the probability that the system under consideration does not experience any failures in a given time interval.

•Formally, the reliability of a system, R(t), is defined as the conditional probability that the system shows no failures in the interval [0, t], given that it was operational at time 0.

•If we assume that failures follow a Poisson distribution with a constant failure rate λ (which is usually the case for hardware), this formula reduces to R(t) = e^(−λt).

•Under the same assumptions, it is possible to derive the expected time to fail by integrating the reliability function, which gives 1/λ.

•Availability, A(t), refers to the probability that the system is operational according to its specification at a given point in time t.

•Reliability and availability of a system are considered to be contradictory objectives.

(iii) Mean Time between Failures (MTBF) / Mean Time to Repair (MTTR)

•MTBF is the expected time between subsequent failures in a system with repair.

•MTBF can be calculated either from empirical data or from the reliability function, as MTBF = ∫0∞ R(t) dt.

•MTTD: mean time to detect (a failure).

Failures in distributed DBMS

•Designing a reliable system that can recover from failures requires identifying the types of failures with which the system has to deal.

•In a distributed database system, we need to deal with four types of failures:

i. transaction failures (aborts)
ii. site (system) failures
iii. media (disk) failures
iv. communication line failures
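The reliability measures above can be made concrete with a small numeric sketch under the stated Poisson assumption; the failure rate and the MTTR value are invented for the example:

```python
# Under a Poisson failure process with constant rate lam:
#   R(t) = e^(-lam * t)  and  MTBF = integral of R(t) over [0, inf) = 1/lam.
import math

lam = 0.001                      # assumed: one failure per 1000 hours

def reliability(t):
    """R(t): probability of no failure in the interval [0, t]."""
    return math.exp(-lam * t)

mtbf = 1.0 / lam                 # follows from integrating R(t)

# A common steady-state availability figure combines MTBF and MTTR.
mttr = 10.0                      # assumed mean time to repair (hours)
availability = mtbf / (mtbf + mttr)

print(round(reliability(1000), 4), mtbf, round(availability, 4))
```

The last line combines MTBF and MTTR into the usual steady-state ratio MTBF/(MTBF + MTTR), which illustrates the tension between reliability and availability: a system that fails often but repairs quickly can still be highly available.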
(i) Transaction failures

•Transactions can fail for a number of reasons. Failure can be due to an error in the transaction caused by incorrect input data, or due to deadlocks.

•Some concurrency control algorithms do not permit a transaction to proceed, or even to wait, if the data that it attempts to access are currently being accessed by another transaction. This might also be considered a failure.

•The approach to take in cases of transaction failure is to abort the transaction, thus resetting the database to its state prior to the start of this transaction.

(ii) Site (system) failures

•The reasons for system failure can be traced back to a hardware or to a software failure.

•A system failure is always assumed to result in the loss of main memory contents. Therefore, any part of the database that was in main memory buffers is lost as a result of a system failure.

•In distributed systems, system failures are referred to as site failures, since they make the site unreachable.

•We differentiate between partial and total failures in a distributed system. Total failure refers to the simultaneous failure of all sites in the distributed system; partial failure indicates the failure of only some sites while the others remain operational.

(iii) Media failures

•Media failure refers to the failures of the secondary storage devices that store the database. These failures may be due to operating system errors, as well as to hardware faults.

•A media failure means that all or part of the database that is on the secondary storage is considered to be destroyed and inaccessible.

(iv) Communication failures

•The three types of failures described above are common to both centralized and distributed DBMSs. Communication failures, however, are unique to the distributed case, and there are a number of types of communication failures.

•Examples are errors in the messages, improperly ordered messages, lost (or undeliverable) messages, and communication line failures.

Local & distributed reliability protocols

Local reliability protocols

•We discuss the functions performed by the local recovery manager (LRM) that exists at each site. These functions maintain the atomicity and durability properties of local transactions.

•They relate to the execution of the commands that are passed to the LRM, which are begin_transaction, read, write, commit, and abort.

•When the LRM wants to read a page of data on behalf of a transaction, it issues a fetch command, indicating the page that it wants to read. The buffer manager checks to see if that page is already in the buffer and, if so, makes it available for that transaction; if not, it reads the page from the stable database into an empty database buffer.

•In addition to the above, there is a sixth interface command to the LRM: recover. The recover command is the interface that the operating system has to the LRM. It is used during recovery from system failures, when the operating system asks the DBMS to recover the database to the state that existed when the failure occurred.

Distributed reliability protocols

•The distributed versions also aim to maintain the atomicity and durability of distributed transactions that execute over a number of databases.

•The protocols address the distributed execution of the begin_transaction, read, write, abort, commit, and recover commands.

•All of these commands are executed in the same manner as in a centralized system.

•We assume that at the originating site of a transaction there is a coordinator process, and at each site where the transaction executes there are participant processes. Thus, the distributed reliability protocols are implemented between the coordinator and the participants.

•If, during the execution of a distributed transaction, one of the sites involved in the execution fails, we would like the other sites to terminate the transaction.

•Recovery protocols deal with the procedure that the process (coordinator or participant) at the failed site has to go through to recover its state once the site is restarted.
Parallel database system architectures Extensibility: In a parallel system,accommodating


increasing database sizes or increasing performance
•Many data-intensive applications like e- demands should be easier.
commerce,data warehousing,and datamining,
require support for very large databases. Functional Architecture

•Very large databases are accessed through high Session Manager: provide support for client
numbers of concurrent transactions (e.g., interactions with the server and also performs the
performing on-line orders on an electronic store)or connections and disconnections between the client
complex queries(e.g., decision-support queries). processes and the two other subsystems. Therefore,
it initiates and closes user sessions
Transaction Manager: It receives client transactions •There are three basic strategies for data
related to query compilation and execution. partitioning: round-robin,hash,and range
Depending on the transaction,it activates the various partitioning.
compilation phases, triggers query execution, and
returns the results as well as error codes to the Round-robin partitioning is the simplest strategy,it
client application ensures uniform data distribution.This strategy
Enables the sequential access to a relation to be
3.Data Manager : It provides all the low-level functions needed to run compiled queries in parallel: database operator execution, parallel transaction support, cache management, etc.

Parallel DBMS Architectures

•There are three basic parallel computer architectures, depending on how main memory and disk are shared: shared-memory, shared-disk and shared-nothing.

1.Shared-memory: In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect. All the processors are under the control of a single operating system.

2.Shared-disk: In this approach, any processor has access to any disk unit through the interconnect, but exclusive (non-shared) access to its main memory. Each processor-memory node is under the control of its own copy of the operating system.

3.Shared-nothing: In this approach, each processor has exclusive access to its main memory and disk unit(s). As with shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system; each node can then be viewed as a local site.

Parallel data placement

•Data placement in a parallel database system exhibits similarities with data fragmentation in distributed databases.

•We use the terms partitioning and partition instead of horizontal fragmentation and horizontal fragment. There are three basic partitioning strategies:

1.Round-robin partitioning assigns tuples to the nodes in turn; it ensures that sequential access to a relation can be done in parallel.

2.Hash partitioning applies a hash function to some attribute that yields the partition number. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node, and all other queries to be processed by all the nodes in parallel.

3.Range partitioning distributes tuples based on the value intervals (ranges) of some attribute.

Parallel query processing

•The objective of parallel query processing is to transform queries into execution plans that can be efficiently executed in parallel.

•It exploits both intra-operator parallelism (a single operator is distributed among multiple processors) and inter-operator parallelism (different operators of the same query run on different processors).

•A parallel query optimizer can be seen as three components: a search space, a cost model, and a search strategy.

Load balancing

•Good load balancing is crucial for the performance of a parallel system.

•The response time of a set of parallel operators is that of the longest one.

•Thus, minimizing the time of the longest one is important for minimizing response time.
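The response-time point above can be made concrete with a toy sketch. This is only an illustration; the per-node operator timings are invented for the example.

```python
# Toy illustration: with intra-operator parallelism, a partitioned operator
# finishes only when its slowest instance does, so the parallel response
# time is the maximum, not the average, of the per-node times.

def response_time(per_node_times):
    """Response time of one operator split across nodes."""
    return max(per_node_times)

balanced = [10, 10, 10, 10]   # well-balanced partitions (total work: 40)
skewed = [4, 4, 4, 28]        # same total work, badly balanced
```

With equal total work, the skewed placement roughly triples the response time, which is why balancing matters even when throughput is unchanged.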
•Balancing the load of different transactions and queries among different nodes is also essential to maximize throughput.

•Solutions to these problems can be obtained at the intra- and inter-operator levels.

Database clusters

•A cluster can have a shared-disk or shared-nothing architecture.

•Shared-disk requires a special interconnect that provides a shared disk space to all nodes, with provision for cache consistency.

•Shared-nothing can better support database autonomy without the additional cost of a special interconnect, and can scale up to very large configurations.

UNIT-5 DISTRIBUTED DATABASES

Fundamental object concepts and models

•An object DBMS is a system that uses the "object" as the fundamental modeling concept, in which information is represented in the form of objects.

•All object DBMSs are built around the fundamental concept of an object.

•An object represents a real entity in the system that is being modeled.

•It is represented as a triple (OID, state, interface):

•OID is the object identifier.

•State is some representation of the current state of the object.

•Interface defines the behavior of the object.

Object distributed design

•The two important aspects of distribution design are fragmentation and allocation.

•An object is defined by its state and its methods.

•We can fragment the state, the method definitions, and the method implementation.

•Furthermore, the objects in a class extent can also be fragmented and placed at different sites.

1.Horizontal Class Partitioning: Given a class C for partitioning, we create classes C1,...,Cn, each of which takes the instances of C that satisfy the particular partitioning predicate.

2.Vertical Class Partitioning: Vertical fragmentation is considerably more complicated. Given a class C, fragmenting it vertically into C1,...,Cm produces a number of classes, each of which contains some of the attributes and some of the methods. Thus, each of the fragments is less defined than the original class.

3.Path Partitioning: Path partitioning is a concept describing the clustering of all the objects forming a composite object into a partition. A path partition consists of grouping the objects of all the domain classes that correspond to all the instance variables in the subtree rooted at the composite object.

Object Architectural issues

•The preferred architectural model for object DBMSs has been client/server.

•The unit of communication between the clients and the server is an issue in object DBMSs.

•Since data are shared by many clients, the management of client cache buffers for data consistency becomes a serious concern.

•Since objects may be composite or complex, there may be possibilities for prefetching component objects when an object is requested.

Alternative Client/Server Architectures

•Two main types of client/server architectures have been proposed:

1.object servers

2.page servers
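The two architectures above differ in what a client request brings back, which can be sketched as follows. The store, page layout, and function names are hypothetical, invented only to contrast the two granularities.

```python
# Hypothetical sketch contrasting the two client/server granularities:
# an object server ships one object per request, while a page server
# ships the whole disk page containing the requested object.

store = {"o1": "a", "o2": "b", "o3": "c", "o4": "d"}  # oid -> object state
pages = [["o1", "o2"], ["o3", "o4"]]                  # toy page layout

def object_server_fetch(oid):
    """Object server: the unit of transfer is a single object."""
    return {oid: store[oid]}

def page_server_fetch(oid):
    """Page server: the unit of transfer is the page holding the object."""
    for page in pages:
        if oid in page:
            return {o: store[o] for o in page}
```

Note how the page server returns neighboring objects the client never asked for; that is the source of both its prefetching benefit and its cache-consistency burden.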
•The distinction is partly based on the granularity of data that are shipped between the clients and the servers, and partly on the functionality provided to the clients and servers.

Object management

•Object management includes tasks at the server such as object identifier management, pointer swizzling, object migration, and deletion of objects.

•Object identifier management: object identifiers (OIDs) are system-generated and used to uniquely identify every object.

•The implementation of object identifiers has two common solutions, based on either physical or logical identifiers:

•The physical identifier (POID) approach equates the OID with the physical address of the corresponding object. The address can be a disk page address.

•The logical identifier (LOID) approach consists of allocating a system-wide unique OID per object.

•Pointer swizzling: In object systems, one can navigate from one object to another using path expressions that involve attributes with object-based values. These are called pointers.

•Usually on disk, object identifiers (OIDs) are used to represent these pointers.

•In memory, it is desirable to use in-memory pointers for navigating from one object to another.

•The process of converting a disk version of the pointer to an in-memory version of a pointer is known as "pointer swizzling".

Distributed object storage

•Among the many issues related to object storage, two are particularly relevant in a distributed system: object clustering and distributed garbage collection.

•Objects are clustered on disk such that the I/O cost of retrieving them is reduced.

•Garbage collection is a problem that arises in object databases due to reference-based sharing.

•Indeed, in many object DBMSs, the only way to delete an object is to delete all references to it.

Object query processing

•Almost all object query processors and optimizers that have been proposed to date use techniques developed for relational systems.

•Consequently, it is possible to claim that distributed object query processing and optimization techniques require the extension of centralized object query processing and optimization with the distribution approaches (discussed earlier).

•Objects can (and usually do) have complex structures whereby the state of an object references another object. Accessing such complex objects involves path expressions.

Object Oriented Data Model

•The object oriented data model is based upon objects with different attributes.

•All these objects have multiple relationships between them.

Objects

•The real-world entities and situations are represented as objects in the object oriented database model.

Attributes and Methods

•Every object has certain characteristics. These are represented using attributes. The behavior of the objects is represented using methods.

Class

•Similar attributes and methods are grouped together using a class. An object can be called an instance of the class.

Inheritance

•A new class can be derived from the original class. The derived class contains attributes and methods of the original class as well as its own.
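The class, attribute, method, and inheritance notions above can be sketched in code; the Person and Student classes here are hypothetical examples invented for the illustration.

```python
# Minimal sketch of the OO model concepts: a class groups attributes and
# methods, an object is an instance, and a derived class inherits both
# while adding its own (Person/Student are made-up examples).

class Person:
    def __init__(self, name):
        self.name = name            # attribute: a characteristic of the object

    def greet(self):                # method: behavior of the object
        return "hi, " + self.name

class Student(Person):              # derived class: inherits from Person
    def __init__(self, name, school):
        super().__init__(name)
        self.school = school        # attribute of its own

s = Student("mina", "mit")          # an object: instance of the class
```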

COMPARISON OODBMS AND ORDBMS

•Storage: In the object oriented database, data is stored in the form of objects; in the relational database, data is stored in the form of tables, which contain rows and columns.

•Relationships: In OODBMS, relationships are represented by references via the object identifier (OID); in ORDBMS, connections between two relations are represented by foreign key attributes.

•Data complexity: OODBMS handles larger and more complex data than RDBMS; ORDBMS handles comparatively simpler data.

•Data language: In OODBMS, the data management language is typically incorporated into a programming language such as C++; in relational database systems there are data manipulation languages such as SQL.

•Data entries: In OODBMS, stored data entries are described as objects; in ORDBMS, stored data entries are described as tables.

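The pointer-swizzling process described under Object management earlier can also be illustrated with a toy sketch. The object table and string OIDs below are invented for the example; real object DBMSs do this at the page/buffer level rather than on Python dicts.

```python
# Toy illustration of pointer swizzling: on "disk", references are stored
# as OIDs; swizzling replaces each OID-valued attribute with a direct
# in-memory reference so navigation no longer needs a table lookup.

disk_objects = {
    "oid1": {"name": "dept", "head": "oid2"},   # "head" holds an OID on disk
    "oid2": {"name": "emp", "head": None},
}

def swizzle(oid, memory, disk):
    """Load the object for oid, converting OID-valued attributes in place."""
    if oid is None:
        return None
    if oid not in memory:
        memory[oid] = dict(disk[oid])           # bring a copy into memory
        obj = memory[oid]
        obj["head"] = swizzle(obj["head"], memory, disk)  # swizzle pointer
    return memory[oid]

memory = {}
dept = swizzle("oid1", memory, disk_objects)
# dept["head"] is now a direct in-memory reference, not the string "oid2"
```

Registering each object in `memory` before recursing also makes the sketch terminate on cyclic references, mirroring how a real swizzling pass must cope with object graphs.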