Distributed Database Design
(Chapter 5)
Top-Down Approach: The database system is
being designed from scratch.
Issues: fragmentation & allocation
Bottom-up Approach: Integration of existing
databases (Chapter 15)
Issues: Design of the export and global
schemas.
Design Consideration (1)
The organization of distributed systems can be investigated
along three dimensions:
Level of sharing
1.
No sharing: Each application and its data execute at one
site; no communication with other program or access to
any data file at other sites
2. Data sharing: Programs are replicated at all sites, but
data files are not
Data is moved to where the query is originated
3. Data + Program Sharing: Both data and programs may be
shared.
A program at one site can request for a service from
another program at another site
Design Consideration (2)
Assess Pattern
1.
Static: Access patterns do not change.
2. Dynamic: Access patterns change over time.
How dynamic the access pattern is?
Level of Knowledge
1.
No information: Designers do not have the
knowledge of the access pattern at all
2. Partial information: Access patterns may deviate
from the predictions.
3. Complete information: Access patterns can
reasonably be predicted and are not much
different from the predictions
TOP-DOWN DESIGN PROCESS
Requirements Analysis
Entity analysis
+ functional
analysis
System Requirements
(Objectives)
Conceptual
design
Global Conceptual
schema
View integration View Design
Access Pattern
information
Distribution Design
Maps the local
conceptual schemas
to the physical
storage devices.
Feedback
Defining the
interfaces for
end users
Local Conceptual Schemas
External Schemas
Involve
fragmentation &
allocation
Physical Design
Physical Schema
Observation and monitoring
Requirement Analysis
Environment of the system
Performance, reliability, availability,
expandability, and cost.
View Design
Deal with defining the end user interfaces.
Conceptual Design
Consist of
Entity Analysis
Determine entities and relationships among them.
Functional Analysis
Conceptual design can be seen as an integration
of the user views.
View integration is used to ensure that the
conceptual model support both existing and
future applications.
Functional Analysis
Process of understanding and documenting basic
business activities with which the organization is
concerned.
Function is triggered by events and can be defined as
tasks that must be carried out as a direct result of an
event.
Output contains
Frequency of use of the function
How each attributed is used in the function.
Requirement on response time, availability, how up-to-date
data should be.
Design Issues
Why fragment at all?
How should and how much should we
fragment?
A way to test correctness of the
fragmentation?
How to allocate fragments?
Necessary information for fragmentation
and allocation?
Why fragment at all?
Reasons:
Interquery concurrency
Several queries can be executed in parallel.
Intraquery concurrency
Allowing parallel execution of a single query.
Disadvantages:
Vertical fragmentation may incur overhead.
Attributes participating in a dependency may be
allocated to different sites.
Integrity checking is more costly.
If there exists non disjoint fragments more overhead
(M)
Fragmentation Alternatives
J
JNO
J1
J2
J3
J4
JNAME
Instrumental
Database Dev.
CAD/CAM
Maintenance
BUDGET
150,000
135,000
250,000
350,000
Vertical Partitioning
LOC
Montreal
New York
New York
Orlando
JNO
J1
J2
J3
J4
BUDGET
150,000
135,000
250,000
310,000
Horizontal Partitioning
JP1
JP2
JNO
J1
J2
JNAME
Instrumental
Database Dev.
BUDGET
150,000
135,000
LOC
Montreal
New York
JNO
J3
J4
JNAME
CAD/CAM
Maintenance.
BUDGET
250,000
310,000
LOC
New Yorkl
Orlando
JNO
J1
J2
J3
J4
JNAME
Instrumentation
Database Devl
CAD/CAM
Maintenance
LOC
Montreal
New York
New York
Orlando
Degree of Fragmentation
Application views are usually subsets of
relations. Hence, it is only natural to
consider subsets of relations as
distribution units.
No fragment every tuple is a fragment
No fragment every attribute is a fragment
The appropriate degree of fragmentation
is dependent on the applications.
Correctness Rules
Vertical Partitioning
Lossless decomposition
Dependency
preservation
Disjointness on the
nonprimary key
attributes
Horizontal Partitioning
Disjoint fragments
Allocation
Alternatives
Partitioning: No
replication
Partial Replication: Some
fragments are replicated.
Full Replication: Database
exists in its entirety at
each site.
Information Requirements for
Horizontal Fragmentation
Database Information
Global conceptual schema
The relationship between relations
Application Information
Database Information
Notations
S
Title SAL
L1
ENO ENAME TITLE
JNO JNAME BUDGET LOC
L2
L3
ENO JNO RESP DUR
:1-to-many relationship
Owner(L1) = S = Source relation
Member(L1) = E = Target relation
Direct Link means equi-join
Application Information
Qualitative Information
The fundamental qualificative information consists of the
predicates used in user queries.
Analyze user queries based on 80/20 rule: 20% of user
queries account for 80% of the total data access.
) One should investigate the more important queries.
Quantitative Information
Minterm Selectivity sel(mi): number of tuples that would be
accessed by a query specified according to a given minterm
predicate.
Access Frequency acc(mi): the access frequency of a given
minterm predicate in a given period.
Qualitative information guides the fragmentation activity.
Quantitative information guides the allocation activity.
Simple Predicates
Given a relation R(A1, A2, , An) where Ai has domain Di, a
simple predicate pj defined on R has the form
pj: Ai Value
where {=, <, , , >, } and Value
Di
Example:
JNO
J1
J2
J3
J4
JNAME
Instrumental
Database Dev.
CAD/CAM
Maintenance
Simple predicates:
BUDGET
150,000
135,000
250,000
350,000
LOC
Montreal
New York
New York
Orlando
p1: JNAME= Maintenance
P2: BUDGET<= 200,000
MINTERM PREDICATE
Given a set of simple predicates for relation R.
P={p1, p2, , pm}
TITLE
The set of minterm predicates
Elect. Eng.
M={m1, m2, , mn}
Syst. Analy.
is defined as
Mech. Eng.
*
M={mi | mi = PP P }
Programmer
where Pj* = p j or Pj* = p j
j
SAL
40,000
54,000
32,000
42,000
Minterm predicate is a conjunction of simple predicates
Possible simple predicates:
P1: TITLE=Elect. Eng.
P2: TITLE=Syst. Analy
P3: TITLE=Mech. Eng.
P4: TITLE=Programmer
P5: SAL<=35,000
P6: SAL > 35,000
Some corresponding
minterm predicates:
m1 : TITLE =" Elect.Eng." SAL 35,000
m 2 : TITLE " Elect.Eng" SAL > 35,000
Horizontal Fragmentation
Primary Horizontal Fragmentation
Derived Horizontal Fragmentation
Primary Horizontal Fragmentation
A primary horizontal fragmentation of a relation is defined
by a selection operation on the relation
ENO ENAME TITLE
JNO JNAME BUDGET LOC
L2
ENO JNO RESP DUR
L3
Owner(L3) = J
Member(L3)=G
A possible fragmentation of J is defined as follows:
JP1 = BUDGET 200 , 000 ( J )
JP 2 = BUDGET > 200 , 000 ( J )
Horizontal Fragments
Thus, a horizontal fragment Ri of relation R consists
of all the tuples of R that satisfy a minterm
predicate mi.
There are as many horizontal fragments (also called
minterm fragments) as there are minterm
predicates.
Which minterm predicates should we use?
We have to decide on the set of simple
predicates that are the basis for the minterm
predicates.
10
Desirable properties of the set
of simple predicates
The set should be complete and
minimal.
Informally, the set should include only
predicates with attributes and conditions
that are used in the applications
Completeness (1)
A set of simple predicate Pr is said to be complete if and only
if there is an equal probability of access by every application
to any tuple belonging to any minterm fragment that is
defined according to Pr.
JP1 = LOC = " MONTREAL" ( J )
JP 2 = LOC = " NewYork " ( J )
Case 1: The only application that accesses
J wants to access the tuples according to
the location (any location).
JP3 = LOC = " Orlando" ( J )
The set of simple predicates
LOC=Montreal
JP1
LOC=New York
JP2
LOC=Orlando
JP3
Pr= LOC=Montreal,
LOC=New York,
LOC=Orlando
is complete because each tuple of each
fragment has the same probability of
being accessed.
11
Example:
JP1
JP2
JP3
Completeness (2)
JNO
J1
JNAME
Instrumental
BUDGET
150,000
LOC
Montreal
JNO
J2
J3
JNAME
GUI
CAD/CAM
BUDGET LOC
135,000 New York
250,000 New York
JNO
J4
JNAME
Database Dev.
BUDGET LOC
310,000 Orlando
Note: Completeness is a
desirable property because a
complete set defines
fragments that are not only
logically uniform in that they
all satisfy the minterm
predicate, but statistically
homogeneous.
Case 2: There is a second application which accesses only those
project tuples where the budget is less than $200,000.
The previous Pr is not complete since some tuple in JPi has higher
access probability
For example, tuple J2 has higher access probability than
tuple J3 in JP2 (2 applications access J2, but one application
access J3)
To make the set complete, we need to add
(BUDGET<= 200,000, BUDGET>200,000) to Pr.
Example
The only application that accesses J wants to
access projects located in New York.
The set of simple predicates
Pr= {LOC=Montreal, LOC=New York, Loc!=New
York, LOC=Orlando}
Is this set Pr complete?
12
The only application that accesses J wants to access
project located in New York.
JP1
JP2
JP3
JNO
J1
JNAME
Instrumental
BUDGET
150,000
LOC
Montreal
JNO
J2
J3
JNAME
GUI
CAD/CAM
BUDGET LOC
135,000 New York
250,000 New York
JNO
J4
JNAME
Database Dev.
BUDGET LOC
310,000 Orlando
Is Loc=Orlando relevant? No
Is Loc=New York relevant? Yes
Is Loc=Montreal relevant? No
Minimality
Minimal
If all the predicates of a set Pr are relevant, Pr is
minimal.
That is, there should be at least one
application that accesses fragments fi and
fj differently.
i.e., The simple predicate pi should be
relevant in determining a fragmentation.
13
Relevant
Let mi and mj be two minterm predicates that are
identical in their definition, except that mi contains
the simple predicate pi in its natural-form while mj
contains p i .
Also, let fi and fj be two fragments defined according to mi and mj,
respectively. Then pi is relevant if and only if
acc(mi ) acc(mj )
card ( fi ) card ( fj )
Access frequency
Card(fi): Number of tuples in fi
A Complete and Minimal Example
Two applications:
1.
One application accesses the tuples according to the
location.
2. Another application accesses only those project tuples
where the budget is at most $200,000.
Case 1: Pr={Loc=Montreal, Loc=New York, Loc=Orlando,
BUDGET<=200,000,BUDGET>200,000}
is complete and minimal?
Case 2: If, however, we were to add the predicate JNAME=
Instrumentation to Pr, the resulting set would not be
minimal since the new predicate is not relevant with
respect to the applications.
14
(A)
Case 1: Pr={Loc=Montreal, Loc=New York,
Loc=Orlando, BUDGET<=200,000,BUDGET>200,000}
J
JNO
J1
J2
J3
J4
JNAME
Instrumental
Database Dev.
CAD/CAM
Maintenance
BUDGET
150,000
135,000
250,000
350,000
LOC
Montreal
New York
New York
Orlando
Pr is complete from the previous example. Is Pr
minimal?
Loc=Orlando
J11
Budget <=200,000
J1
Loc!=Orlando
J12
J
Budget >200,000
Case 2
Loc=Orlando is relevant.
BUDGET<=200,000
LOC=Montreal
J1
J2
LOC=New York
J2
LOC=Orlando
J3
JNAME = Instrument
J11
J121
J12
J122
BUDGET>200,000
JNAME Instrument
BUDGET<=200,000
J21
J22
BUDGET>200,000
[ JNAME = Instrument ]
is not relevant.
BUDGET<=200,000
J31
J32
BUDGET>200,000
Relevant
irrelevant
15
Determine the set of meaningful minterm predicates
Application: Take the salary and determine a raise accordingly.
Fragmentation: The employee records are managed in two places, one handling
the records of those with salary less than or equal to $30,000 and the other
handling the records of those who earn more than $30,000.
Pr={SAL<=30,000,SAL>30,000} is complete and minimal.
The minterm predicates:
m1 : ( SAL 30,000) ( SAL > 30,000)
m 2 : ( SAL 30,000) ( SAL > 30,000)
m3 : ( SAL 30,000) ( SAL > 30,000)
m 4 : ( SAL 30,000) ( SAL > 30,000)
i1 m1 is contradictory
i 2 m 4 is contradictory
Therefore, we are left with
M = {m2, m3}
Implications:
i1 : ( SAL 30,000) ( SAL > 30,000)
i 2 : ( SAL 30,000) (SAL > 30,000)
i 3 : (SAL > 30,000) ( SAL 30,000)
i 4 : ( SAL > 30,000) (SAL 30,000)
Invalid Implications
J
JNO
J1
J2
J3
J4
Simple predicates
p1: LOC=Montreal
p2: LOC=New York
p3: LOC=Orlando
p4: BUDGET<=200,000
p5: BUDGET>200,000
JNAME
Instrumental
Database Dev.
CAD/CAM
Maintenance
BUDGET
150,000
135,000
250,000
350,000
VALID Implications
i 1 : p1 p 2 p 3
i 2 : p 2 p1 p 3
i 3 : p 3 p1 p 2
i 4 : p 4 p 5
i 5 : p 5 p 4
i 6 : p 4 p 5
i 7 : p 5 p 4
LOC
Montreal
New York
New York
Orlando
INVALID Implications
i 8 : LOC =" Montreal" ( BUDGET > 200,000)
i 9 : LOC =" Orlando" ( BUDGET 200,000)
Implications should be
defined according to the
semantics of the database,
not according to the
current values.
16
Algorithms
Rule 1: fundamental rule of completeness and minimality, which states that a
relation or fragment is partitioned into at least two parts which are accessed
differently by at least one application.
fi of Pr: fragment fi defined according to a minterm predicate defined over
the simple predicates of Pr.
Derived Horizontal Fragmentation
Derived fragmentation is used to facilitate the join between
fragments.
In some cases, the horizontal fragmentation of a relation
cannot be based on a property of its own attributes, but is
derived from the horizontal fragmentation of another
relation.
17
Benefits of Derived Fragmentation
(M)
Primary Fragmentation:
PAY (TITLE, SAL)
PAY 1 = SAL 30 , 000( PAY )
EMP (ENO, ENAME, TITLE)
PAY 2 = ( SAL > 30 , 000)( PAY )
Using Derived Fragmentation:
EMP1 = EMP SJ PAY1
EMP2 = EMP SJ PAY2
Simple Join Graph between EMP and PAY
EMP1
PAY1
EMP2
PAY2
EMPi and PAYi can be allocated
to the same site.
Not using derived fragmentation: one can divide EMP into EMP1
and EMP2 based on TITLE and divide PAY into PAY1, PAY2, PAY3
based on SAL. To join EMP and PAY, we have the following
scenarios.
PAY1
PAY2
EMP1
EMP2
More communication overhead !
PAY3
(M)
Derived Fragmentation
EMP (ENO, ENAME, TITLE)
PROJ (PNO, PNAME, BUDGET)
EMP_PROJ (ENO, PNO, RESP, DUR)
How do we fragment EMP_PROJ ?
Semi-Join with EMP, or
Semi-Join with PROJ
Criterion: Support the more frequent join
operation.
18
(M)
Star Relationships
S (SNUM, )
P (PNUM, )
SPJ (SNUM, PNUM, JNUM, )
Design the primary
horizontal fragmentation
for SPJ.
Derive the derived
fragmentation designs for
S, P, and J accordingly.
J (JNUM, )
Si = S SJSNUM SPJi
Pi = P SJPNUM SPJi
Ji = J SJSNUM SPJi
Exception Case: Primary
fragmentation on member
relation.
How does the join graph looks like when joining all the relations?
Si
Pi
Ji SPJi
(M)
Chain Relationships
R1 (R1PK, )
R2 (R2PK, R2FK, )
R3 (R3PK, R3FK, )
...
Design the primary fragmentation
for R1 into M fragments.
Derive the derived fragmentation
for Rk as follows:
Rki = Rk SJRkFK=R(k-1)PK R(k-1)i
for 2 k N in that order
for all 1 i M.
RN(RNPK, RNFK,)
19