DDBS
Lecture 9 D-DBMS
Horizontal Fragmentation
Partitions a table along its tuples
is performed based on some Predicate/ Condition
(On the basis of some attribute)
Primary Horizontal Fragmentation (PHF)
Derived Horizontal Fragmentation (DHF)
Primary Horizontal Fragmentation (PHF)
Information Requirements types
Database Information: We may need to consult the
conceptual DB design
RDM being Semantically not that rich
Apart from tables, we need relationships, cardinality and
the owner and member tables
PAY title, sal owner = PAY
member = EMP
EMP eNo, Name, title jNo, jName, budget, loc PROJ
ASIGN eNo, jNo, resp, dur
Application Requirement
Information about the database on which the database
program will run and this information was received from
differnet user groups.
Major predicates used in the user quries.
Given a relation R(A1, A2, …., An)
Where Ai is attribute with domain Dj, then a simple predicate Pj has
the form
pj : Ai 𝜃 Value,
Where 𝜃 ∈ {=, <, >,} and
Where 𝜃 ∈ Di
e.g. ProjName=“Housing”, ProjAmount>200,000
Minterm predicates
given a set of simple predicates
Pr = {pr1, pr2, …, pm}for a relation R, the set
of minterm predicates M is defined as
M = {m1, m2, ……, mz}
Z is denoting number of minterm predicates in the
set.
M = {mj | mj = p*k}, 1 ≤ k ≤ m, 1 ≤ j ≤ z
mj denotes an individual minterm predicates
Conjunction or AND operator
p ∈ Pr
k
where p*k = pk or p*k = (pk)
PAY(title, sal)
p1: title = “El Eng”
p2: title = “Sys Ana”
p3: title = “Mech Eng”
p4: title = “Prog”
p5: SAL ≤ 20,000
p6: SAL > 20000
Define From user groups
Minterm Predicates
m1: title = “El Eng” SAL ≤ 20,000
m2: title = “Sys Ana” SAL > 20,000
m3: title = “Sys Ana” SAL !< 20,000
m4: title = “Sys Ana” SAL !> 20,000
PHF-Requirements
Minterm selectivities sel(mi): number of tuples of the
relation that would be accessed by a user query
involving mi.
Like, sel(m1) = 0
A tentative figure.
How much rows will be selected…… not need an
accurate figure.
By the user groups. Not by the designer.
Access frequencies acc(qi): frequency with which user
applications access data. If Q(q1, q2, …, qn} is set
of user queries, then acc(qi) determines the access
frequency of qi within a given period/time, like
acc(q2) = 24
A PHF is defined by a selection operation on the
relations of the database schema, given a relation
R, its PHF are given by Ri = Fi (R), 1 ≤ i
≤w
Where Fj is a selection formula, which is a minterm
predicate.
Horizontal Fragment
A horizontal fragment, Ri of relation R consists of all the
tuples of R which satisfy minterm predicate m
Given a minterm of predicates M, there are as many
horizontal fragments of relation R as there are minterm
predicates.
So the primary horizontal fragments are also called minterm
fragments.
tbl_PAY
Title Sal
Elect. Eng 40000
Sys Analyst 34000
Mech. Eng 27000
Programmer 24000
PAY1 = σ sal ≤ 30000 (PAY)
PAY2 = σ sal > 30000 (PAY)
Providing the basis of the fragmentation
PAY1 and PAY2 are different user groups demand.
Fragmenting the same table using whos attribute is
using in the condition. PAY table fragmentation.
PAY2
Title Sal
Elect. Eng 40000
Sys Analyst 34000
PAY1
Title Sal
Mech. Eng 27000
Programmer 24000
Things to note
◼ Inprevious example table, salary was integer
so fit in fragments easily
◼ Records in fragments should be balanced
according to the user requirements
Completeness
A set of simple predicates Pr is said to be complete if
the accesses to the tuples of the minterm fragments
defined on Pr , requires that two tuples of the same
minterm fragment have the same probability of
being accessed by the application.
PROJ
projNo projName projBudget projLoc
ProjA Instrumentation 3.5Million Lahore
ProjB Database Dev. 2.3Million Rawalpindi
ProjC CAD/CAM 1.9Million Rawalpindi
ProjD Maintenance 1.6Million Peshawar
Two applications says;
Find the budgets of projects at each
location. (1)
◼Condition on location
Find projects with budgets less than
2M. (2)
◼Condition on budget
According to (1),
Pr = {projLoc=“Lahore”, projLoc =“Rawalpindi”, projLoc
=“Peshawar”}
Which is not complete with respect to (2).
(No Lahore Records)
Now three fragments on basis of location. With first
application its fine but with second application is not
complete.
Modify
Pr = {projLoc =“Lahore”, projLoc =“Rawalpindi”,
projLoc =“Peshawar”,
projBudget<=200,000, projBudget >200,000}
Which is complete.
PHF- Minimality of Pr
Need to know about Relevant Predicate
Relevant Predicate
A relevant predicate is the one if it influences how
fragmentation is performed (fragments f into fi and
fj)
then there should be at least one application that
accesses fi and fj differently.
Now if all predicates in a set Pr are relevant then the
set is minimal
PHF-COM-MIN Algorithm
Given: a relation R and a set of simple
predicates Pr. (Related to R)
Output of Algorithm: a complete and minimal
set of simple predicates P'r for Pr.
P'r= Pr Prime (Excluding Irrelevant Predicates)
Rule For com-min algorithm
Rule 1: a relation or fragment is partitioned
into at least two parts which are accessed
differently by at least one application.
Steps of com-min algo
1-Initialization:
Find a pi ∈ Pr such that pi
partitions R according to Rule 1
Pr' ← pi
Pr'= Pr Prime (Excluding Irrelevant Predicates)
Pr ← Pr – pi
2- Iteratively add predicates to Pr' until it is complete, find a pj ∈
Pr such that pj partitions R according to Rule 1
set Pr' = Pr' U pi ;
Pr = Pr – pi ;
A predicate add to Pr' or not but its compulsory to come out of a
simple predicate from Pr. It will go to the Pr' on the condition
that if it is relevant. And if its not relevant then it will come out
of Pr and not included in Pr' . In this way irrelevant predicate
automatically discarded.
3- if pk in Pr' is non-relevant then
Pr' = Pr' – pk
Again a check for irrelevant predicate and picked out from Pr' .
Primary Horizontal Partitioning Algorithm
Makes use of COM_MIN to perform fragmentation
Input: a relation R and a set of simple predicates Pr
Output: a set of minterm predicates M according to
which relation R is to be fragmented
Pr' ← COM_MIN (R, Pr)
(by calling com-min algorithm and passing com-min,
R and Pr)
determine the set M of minterm predicates
Implications
determine the set I of implications among pi Pr
Implications:If you have one simple predicate then it
imply (coming out) certain things which is mathematically
proven not heuristic. So with these implications some
minterm predicates will be conflicting/contradicting now;
eliminate the contradictory minterms from M
The remaning minterms predicate will be the basis of
Primary Horizontal Fragmentation.
1 Find the name and budget of projects given
their no. issued at three sites
projA : projLoc = "Lahore”
projB : projLoc = “Rawalpindi"
projC : projLoc = "Peshawar“
2 Access project information according to
budget one site accesses ≤ 200,000 other
accesses >200,000
projD : projBudget ≤ 200000
projE : projBudget > 200000
Pr = Pr' = {p1 ,p2 ,p3 ,p4 ,p5 }
Passing through com-min we find all included and secondly all are
relevant. That each simple predicate is that on its basis any of the
application is accessing the record.
Implications
p1 p2 p3 ( implies)
p4 p5
If the budget of some project is less than 2million then it will not
greater then 2million therefore in a minterm predicate p4 and p5
can not be together.
From predicates p1 to p6 in Pr', there may be
so many minterm predicates, like,
p1 ^ p2 ^ p3 ^ p4 ^ p5
Excluding the contradicting minterm predicates
We got;
m1 : (projLoc = "Lahore") ^ (projBudget ≤ 2M)
m2 : (projLoc = "Lahore") ^ (projBudget > 2M)
m3 : (projLoc = " Rawalpindi ") ^ (projBudget ≤ 2M)
m4 : (projLoc = "Rawalpindi") ^ (projBudget > 2M)
m5 : (projLoc = "Peshawar") ^ (projBudget ≤ 2M)
m6 : (projLoc = "Peshawar") ^ (projBudget > 2M)
Implications must be based on the database
semantics not on a particular extension of the
database
Fragments must not be discarded on the basis that it
does not have a record.
PROJ
projNo projName projBudget projLoc
ProjA Instrumentation 3.5Million Lahore
ProjB Database Dev. 2.3Million Rawalpindi
ProjC CAD/CAM 1.9Million Rawalpindi
ProjD Maintenance 1.6Million Peshawar
Fragments are;
projNo projName projBudget projLoc
ProjA Instrumentation 3.5Million Lahore
ProjB Database Dev. 2.3Million Rawalpindi
ProjC CAD/CAM 1.9Million Rawalpindi
ProjD Maintenance 1.6Million Peshawar
Derived Horizontal Fragmentation (DHF)
Defined on a member relation of a link according to a
selection operation specified on its owner
Extended Relational Diagram
(Relational Data model and Conceptual Database
Design)
Owner Member Concept (no many to many cardinality)
Fragment is on Member but Predicate will be defined
on Owner.
Two Main Requirements
Each link is an equi-join.
◼ The two tables involved have the established link through
primary – foreign key
◼ And the owner table’s primary key is using in member table as
foreign key and the tuples of both tables linked where the
value of that common attribute is same.
◼ Equijoin can be implemented by means of semi-joins
semi-joins: the value of common attribute is same but it
takes projection on one of the tables means natural join
and projection on one table.
So we are interested in defining the partitions of
member based on fragmentation of its owner, but
want to see attributes only from member
Theoretically defined fragmentation of Owner then
take the natural join of fragments with members but
taking only the attributes of the members (Semi join
with members)
R is member and S is owner
Take a semi join of R with Si
Ri = R ⋉ Si, 1≤ i ≤ w
where w is the maximum number of fragments that will
be defined on R and
Si = Fi (S), where Fi is formula for PHF on S
PAY title, sal
EMP eNo, Name, title jNo, jName, budget, loc PROJ
ASIGN eNo, jNo, resp, dur
Considering the link L1 above:
owner (L1) = PAY member (L1) = EMP
We want to group employees on the basis of
their salaries one with salary less than or equal
to 30,000/- and other more than 30,000/-
EMP
eNo eName title
E1 Tasswar Elec Eng
E2 Waqas Sys Ana
E3 Rehan Mech Eng
E4 Kashif Programme
E5 Faisal Sys Ana
E6 Aslam Elec Eng
E7 Saleem Mech Eng
E8 Munawwar Sys Ana
PAY
Title Sal
Elect. Eng 40000
Sys Analyst 34000
Mech. Eng 27000
Programmer 24000
PAY1 = sal<=30 000 (PAY)
PAY2 = sal<30 000 (PAY)
EMP1 = EMP ⋉ PAY1
EMP2 = EMP ⋉ PAY2
Two Primary horizontal fragmentation of the pay table based
on the condition on the salary attribute
Now we take the semi join of EMP with PAY1 and PAY2 then on
the basis of title (common attribute) fragment will be created
PAY1
eNo eName title
E1 Tassawar Elec Eng
E2 Waqas Sys Ana
E5 Faisal Sys Ana
E6 Aslam Elec Eng
E8 Munawwar Sys Ana
E3 Rehan Mech Eng
E4 Kashif Programme
E7 Saleem Mech Eng
PAY2
E3 Rehan Mech Eng
E4 Kashif Programme
E7 Saleem Mech Eng