CENG 352 Database Management Systems
MIDTERM EXAM PRACTICE QUESTIONS
Q1. Consider the relation R(A, B, C, D, E, F, G) with functional dependencies
F={AB → C, B → F, E → G, A → DE }.
(a) Find all keys of this relation. Do not report a superkey that is not a (minimal) key.
AB+ = ABCDEFG
AB is the only key.
(b) Is R in 3NF? Justify your answer.
No it is not in 3NF.
B->F, E->G, A->DE all violate 3NF. Because B, E, A are not (super)keys and F,G,DE are not parts of a key.
Decompose R into a collection of BCNF relations and state whether the decomposition is lossless-join and
dependency preserving.
1st decomposition: R==> R1(B,F) with {B->F} and R2(A,B,C,D,E,G) with { AB → C, E → G, A → DE}
2nd decomposition R2 ==> R21(E,G) with {E->G} and R22(A,B,C,D,E) with { AB → C, A → DE}
3rd decomposition R22==> R221(A,D,E) with {A->DE} and R222(A,B,C) with { AB → C}
The resulting decomposition R1(B,F), R21(E,G), R221(A,D,E) and R222(A,B,C) is a lossless and
dependency preserving decomposition.
Q2. Consider the following relational table:
Enroll(student_id, course_no, status)
a) You want to ensure that the number of courses that a student is enrolled should not be more than 7. Define
an assertion to achieve this.
CREATE ASSERTION Q2A
CHECK (NOT EXISTS
(SELECT student_id
FROM Enroll
GROUP BY student_id
HAVING COUNT(*) > 7 ))
1
b) Define a trigger to achieve the same constraint in part a).
CREATE TRIGGER Q2B
AFTER INSERT ON Enroll
REFERENCING NEW ROW AS new
FOR EACH ROW
WHEN ( SELECT COUNT(*)
FROM Enroll
WHERE student_id = new.student_id) > 7 )
ROLLBACK;
c) Suppose there is a functional dependency such that student_id → status. Write an assertion to enforce this
functional dependency.
CREATE ASSERTION Q2C
CHECK (NOT EXISTS
(SELECT student_id
FROM Enroll
GROUP BY student_id
HAVING COUNT(DISTINCT status) > 1 ))
2
Q3. Consider the following database schema and the query:
Schema: Viewers(Id,age)
Movie(title, director)
HasSeen(viewerId, title)
Query:
SELECT M.title
FROM Movie M, HasSeen H, Viewers V
WHERE M.title= H.title AND H.viewerId=V.Id AND
V.age > 25 AND V.age < 30 AND M.director = ‘JohnDoe’;
Assume the following statistics and indices:
Viewers: 30,000 tuples, 10 tuples/page.
The viewers’ age is in range 20 to 60 years old.
Movie: 1,000 tuples, 5 tuples/page.
About 100 directors.
HasSeen: 500,000 tuples, 50 tuples/page.
Compute the selectivity (i.e. reduction factor) for each individual term in the WHERE clause.
M.title= H.title :
1/1000 (Assuming title is a foreign key in HasSeen, each HasSeen will join with exactly one Movie record;
thus T(Movie)xT(HasSeen)/1000 records will be in the result)
H.viewerId=V.Id :
1/30000 (same reasoning)
V.age > 25 AND V.age < 30 :
(30-25-1)/(60-20+1) = 4/41=0.097
M.director = ‘JohnDoe’:
1/100 (there are 100 distinct directors)
3
Q4. For each of the following questions, write your answer in the given space.
(1) Consider relations R(a, b) and S(a, c), for the following query. Would a B+ tree index on S.a help in query
processing? Briefly explain.
SELECT a FROM R, S WHERE b<10 and R.a = S.a and c>20
Answer: A B+ tree index on S.a may help the optimizer consider INL-join. If b<10 eliminates many R tuples and
V(S,a) is small then the index will be very useful.
(2) For the above SQL query, is there a unique way to write it in relational algebra? If so, give
such an expression. If not, explain why.
Answer: There is no unique way to write it in relational algebra. There are several RA expressions equivalent to
the given query. For example selection conditions can be done first and join can be done next. Projection
operation can be done early as well.
(3) For the following table, show a decomposition (into two tables) that is lossless.
A B C D
1 3 2 2
2 3 2 4
3 1 3 6
3 1 1 6
A ->B holds in this instance. We can decompose it using this functional dependency:
R1(A, B) and R2(A,C,D)
R1 R2 = A is a key for R1. So the decomposition is lossless.
4
Q5. Consider two relations R(a, b) and S(b, c) with the following statistics:
T(R) = 10,000, B(R) = 1,000 (each block contains 10 tuples),
V(R, b) = 200 (number of distinct values of attribute b in R),
T(S) = 6,000, B(S) = 1,500 (each block contains 4 tuples),
V(S, b) = 100 (number of distinct values of attribute b in S),
V(S, c) = 20 (number of distinct values of attribute c in S) and for all c values, c > 100.
Also, we assume the number of available memory blocks is M = 102.
Answer the following questions:
(i) Estimate the number of tuples in c =150(S)
6000/20 = 300 tuples
(ii) Estimate the number of tuples in R (c > 25(S)).
6000 x 10000 x 1/200
(iii) Suppose we have a clustered B+tree index available for attribute b of S. The tree has 3 levels
including the leaf level, and each leaf node contains 4 records. Assume each node of the index
occupies one block. Estimate the cost of executing the following query: (the cost is measured by disk
I/Os for accessing the index and the records).
SELECT *
FROM S
WHERE b = 100 OR b = 1000
Selectivity of each condition is 1500/100 = 15 pages
3 level Clustered index means the actual records are in the leaf level, therefore 2 internal index
levels. Traversal starting from the root of the tree requires the retrieval of 2 index pages and then 15
pages of data records. There are two selection conditions of the same cost. We add the total cost:
2 + 15 + 2 + 15 = 34 I/O s
5
Q6. Consider a relation R with attributes {A, B, C, D, E, F, G, H} and the set of functional dependencies
F = {G FD, E D, GD CE, BD A }
a) Find the minimal cover of the given set of functional dependencies F.
b) Decompose the relation into a collection of 3NF relations using the minimal cover found in part a).
c) Is your decomposition also BCNF, lossless-join and dependency preserving as well? Explain.
d) In designing a relational database schema why might we choose a non BCNF but 3NF design?
a) Minimal cover:
G->F, E->D,G->C,G->E,BD->A
b) Decomposition:
R1(G,C,E,F)
R2(E,D)
R3(A,B,D)
R0(B,G,H) (BGH is key for R)
c) 3NF decomposition results in a lossless and dependency preserving decomposition. The resulting
relations are also in BCNF.
d) It is possible that with a BCNF schema we ma not preserve some functional dependencies, in which case
we need to write triggers to check those FDs. Those triggers will be costly since they will require joining
two or more tables.
Q7. Consider the relation R(A,B,C,D,E) with multi-valued dependencies MD = { A B, AB C} and
functional dependencies FD = {A D, AB E}.
a) Find key(s) for the relation?
b) Decompose the relation into a collection of relation schemas in 4NF.
a) ABC is the key for the relation (we consider only FDs in finding key)
b) R1(A,D), R2(A,B),R3(A,C,E)