Solution 02

The document outlines Exercise 2 for a Database Systems course, focusing on index tuning and implementation. It includes specific queries to optimize database performance through index creation, cost analysis of index usage, and the implementation of a dense index for string queries. The exercise emphasizes understanding various indexing strategies and their impact on query efficiency.

Database Systems WS 2024/25

Prof. Dr.-Ing. Sebastian Michel

M.Sc. Angjela Davitkova
Exercise 2: Handout 28.10.2024, Due 04.11.2024 12:00 CET https://dbis.cs.uni-kl.de
Lecture content: Videos up to #008

For the following questions, we consider the uni db schema imported in the previous sheet.

Question 1: Index tuning (1 P.)

All relevant primary indices and foreign keys are already created (e.g.: assistants.boss is a foreign key
referencing professors.PID).
As the system seems to be very unresponsive lately, the president of the university tasked you with
improving the performance of the system. An analysis of the query load showed that the following values
are queried regularly:

• Q1: A list of all participants (matrnr) of a specific exam, filtered by semester (e.g.: WHERE semester
< 4).
• Q2: A histogram, showing how many assistants are working in which field, given a specific boss.
• Q3: The same histogram as before for the whole university.

• Q4: All exams of a given professor with a specific grade.

• Q5: An overview of all grades of a given student.
• Q6: The names of all lectures without prerequisites.

Which indices will you create to improve the performance of the given queries? For each index discuss
why you created it. Hint: Maybe some indices can be used for multiple queries. Which of your created
indices could benefit from being a hash index, instead of a B+ tree?

Solution

• Q1: We create a B+ tree index on students.semester, since the filter can be a range predicate. The join
between exam and students does not require a further index.
• Q2: A B+ tree index on (assistants.boss, assistants.field). All entries of a given boss are adjacent
and ordered by field, which enables very fast creation of the histogram.
• Q3: The previous index does not help here, because field is not its leading column. We build a
new B+ tree index on assistants.field.
• Q4: We create an index on (exam.PID, exam.grade). This index can be a B+ tree or a hash index, as
both columns are compared against one specific value.
• Q5: We do not have to create an index, as the primary key of exam can already be used. Alternatively,
creating an additional B+ tree or hash index on exam.matrnr would also be possible.
• Q6: We have to create either a B+ tree or a hash index on prerequisites.lecture, as the primary key
is ordered with the required lecture first.
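
The index choices above could be tried out as follows. This is a minimal sketch using Python's built-in sqlite3 module. The table layouts, the primary keys and the column names persnr and required are assumptions, since the sheet only names the indexed columns. SQLite only offers B-tree-based indexes, so the hash index variants mentioned for Q4 to Q6 cannot be expressed here.

```python
import sqlite3

# Hypothetical minimal schema: only the columns needed by Q1-Q6 are modelled.
SCHEMA = """
CREATE TABLE students      (matrnr INTEGER PRIMARY KEY, semester INTEGER);
CREATE TABLE assistants    (persnr INTEGER PRIMARY KEY, field TEXT, boss INTEGER);
CREATE TABLE exam          (matrnr INTEGER, LID INTEGER, PID INTEGER, grade REAL,
                            PRIMARY KEY (matrnr, LID));
CREATE TABLE prerequisites (required INTEGER, lecture INTEGER,
                            PRIMARY KEY (required, lecture));
"""

# One statement per index from the solution (Q5 reuses the primary key of exam).
INDEXES = {
    "idx_students_semester":     "CREATE INDEX idx_students_semester ON students (semester)",
    "idx_assistants_boss_field": "CREATE INDEX idx_assistants_boss_field ON assistants (boss, field)",
    "idx_assistants_field":      "CREATE INDEX idx_assistants_field ON assistants (field)",
    "idx_exam_pid_grade":        "CREATE INDEX idx_exam_pid_grade ON exam (PID, grade)",
    "idx_prerequisites_lecture": "CREATE INDEX idx_prerequisites_lecture ON prerequisites (lecture)",
}


def build_database():
    """Create an in-memory database with the assumed schema and all indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for ddl in INDEXES.values():
        conn.execute(ddl)
    return conn


def index_names(conn):
    """Return the names of the explicitly created indexes, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"
    )
    return sorted(row[0] for row in rows)
```

Running EXPLAIN QUERY PLAN on the six queries against this database is one way to check which index the optimizer actually picks.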

Question 2: Index Costs (1 P.)

Given the following values for the lectures table: Let's assume there are N = 350 000 lectures stored in
this relation. Each is numbered sequentially between 1 and N . Each page on the disk can store exactly
10 lecture tuples. A B+ tree was created as primary index. It has a height of h and the leaves contain 20
entries, together with a pointer to a data page.

a) How many pages have to be accessed to answer the query SELECT * FROM lecture WHERE LID
BETWEEN 30 000 AND 40 000? (Hint: Keep the classification of indices from the lecture in mind)

Solution
A clustered sparse primary index has one entry per page. The rows which are physically stored on
the disk follow the same order as the index.
First, we search the key 30 000 within the B+ tree. This requires h page accesses. From there on we
need to consider the next 10 001 tuples.
These tuples are distributed over ⌈10 001/10⌉ = 1 001 pages.
Since the index is clustered, the total page accesses will be h + 1 001.
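
The arithmetic above can be checked with a short sketch. The function name and the sample height h = 3 are assumptions chosen for illustration; the sheet leaves h symbolic.

```python
import math


def clustered_range_cost(h, low, high, tuples_per_page):
    """Page accesses for a range query over a clustered, sparse primary index.

    Returns (data_pages, total), where total = h + data_pages.
    """
    matching_tuples = high - low + 1          # BETWEEN is inclusive on both ends
    data_pages = math.ceil(matching_tuples / tuples_per_page)
    return data_pages, h + data_pages


# Example with an assumed tree height of 3
pages, total = clustered_range_cost(h=3, low=30_000, high=40_000, tuples_per_page=10)
print(pages, total)
```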

b) After analyzing the query load, the database administrator notices that many queries select lectures
held by the same professor. He decides to cluster the table lectures by the heldby column. An
additional index is created on the primary key LID. How would the lecture classify this index? How
many page accesses would the query from question a) require now?

Solution
The index is now a dense secondary index, as the file is clustered by another attribute. We still have
to find the key 30 000, which once again requires h page accesses. From there we again iterate over
the leaves to retrieve the next 10 001 tuples. But now each tuple has its own index entry. Therefore,
we have to read ⌈10 001/20⌉ = 501 index pages, which point to 10 001 data pages. In total we access
h + 501 + 10 001 pages.
Note: Not every lecture tuple is necessarily stored in a different page. In reality, we may access
fewer than 10 001 data pages. But we cannot guarantee that multiple result tuples are stored in the
same page, or that this page is still contained in the DB buffer when we require it again. For the
purpose of this lecture we calculate the worst-case costs.
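
As a cross-check of the worst-case count, a minimal sketch follows. The function name and the sample height h = 3 are assumptions; the formula is the one used in the solution (one data page access per result tuple).

```python
import math


def unclustered_range_cost(h, low, high, entries_per_leaf):
    """Worst-case page accesses for a range query over a dense secondary index.

    Returns (leaf_pages, data_pages, total).
    """
    matching_tuples = high - low + 1
    leaf_pages = math.ceil(matching_tuples / entries_per_leaf)
    data_pages = matching_tuples              # worst case: every tuple on its own page
    return leaf_pages, data_pages, h + leaf_pages + data_pages


# Example with an assumed tree height of 3
print(unclustered_range_cost(h=3, low=30_000, high=40_000, entries_per_leaf=20))
```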

Question 3: Composite Index (1 P.)

The lectures table contains 1 000 lecture tuples, of which exactly 4 tuples fit into one page. Each lecture
has a (uniformly distributed) SWS value between 1 and 20. There are 50 professors, each holding the
same amount of lectures. The lectures table has two secondary composite-key indices on (sws,heldby)
and (heldby,sws). Both indices are B+ trees with a height of h and each leaf can store 5 references to
pages.

• Which index requires fewer page accesses for the query SELECT * FROM lectures WHERE heldby =
5 AND sws <= 10? Please calculate the number of expected page accesses for both indices.

Solution
The selectivities of both predicates are f(σ_{heldby=5}) = 1/50 and f(σ_{sws≤10}) = 10/20 = 1/2. The index
accesses for both composite indices are sketched in the following figure.

[Figure: index access for (sws, heldby): h index levels, 100 visited leaves, max. 10 visited data pages. Index access for (heldby, sws): h index levels, 2 visited leaves, max. 10 visited data pages.]
Index (sws,heldby) requires h page accesses to navigate to the leaf node containing the first
matching entry. From there on we have to iterate over all following leaves up to the last matching entry, as
the result tuples are distributed over a large number of leaves. We have to iterate over (1 000 · 10/20 · 1/5 ≈
100) − 1 = 99 further leaves to reach the last matching tuple. As the index allows us to fetch only pages
containing relevant tuples, we require at most 1 000 · 10/20 · 1/50 ≈ 10 data page accesses, if no
result tuple shares a page. In total, this index requires at most h + 99 + 10 page accesses.

Index (heldby,sws) also requires h page accesses to navigate to the leaf node containing the first
matching entry (heldby = 5, sws ≤ 10). But this time the index stores the matching entries right next to
each other. Hence, we only have to read (1 000 · 1/50 · 1/2 · 1/5 ≈ 2) − 1 = 1 further leaf node. As before, the
result tuples can be distributed over up to 10 pages. In total, this index requires at most h + 1 + 10
page accesses.
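
The two estimates can be reproduced with a short sketch. The function and constant names, as well as the sample height h = 3, are assumptions for illustration; the numbers are those given in the question.

```python
from fractions import Fraction

N_TUPLES = 1000
ENTRIES_PER_LEAF = 5
SEL_HELDBY = Fraction(1, 50)    # heldby = 5, 50 professors with equal load
SEL_SWS = Fraction(10, 20)      # sws <= 10, uniform over 1..20


def composite_index_costs(h):
    """Return (cost via (sws, heldby), cost via (heldby, sws)) in page accesses."""
    result_tuples = N_TUPLES * SEL_HELDBY * SEL_SWS        # 10, worst case one page each

    # (sws, heldby): every entry with sws <= 10 lies in the scanned leaf range.
    leaves_sws_first = N_TUPLES * SEL_SWS / ENTRIES_PER_LEAF
    # (heldby, sws): only the matching entries lie in the scanned leaf range.
    leaves_heldby_first = result_tuples / ENTRIES_PER_LEAF

    # The first leaf is already counted in the h accesses of the tree descent.
    cost_sws_first = h + (leaves_sws_first - 1) + result_tuples
    cost_heldby_first = h + (leaves_heldby_first - 1) + result_tuples
    return int(cost_sws_first), int(cost_heldby_first)


# Example with an assumed tree height of 3
print(composite_index_costs(3))
```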

Alternative solution for (sws,heldby):

If the system estimates that only a few pages have to be accessed and the overhead of random
reads is still less than that of sequentially reading a range of index leaves, we could find all data entries by
repeatedly traversing the index from the root to the data page as shown below. This would require
10 · h + 10 random accesses, which, depending on the height of the tree and the access times, could
be slower than the previous solution.
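
To compare the two strategies by page count alone, a small sketch follows. The function names are assumptions. The comparison ignores that the repeated traversals are random accesses while the leaf scan is sequential, which is exactly the caveat made above.

```python
def leaf_scan_cost(h):
    """One descent, 99 further leaves, at most 10 data pages."""
    return h + 99 + 10


def repeated_probe_cost(h):
    """Ten separate root-to-leaf descents, at most 10 data pages."""
    return 10 * h + 10


def break_even_height(max_height=100):
    """Smallest h at which repeated probing no longer needs fewer page accesses."""
    for h in range(1, max_height + 1):
        if repeated_probe_cost(h) >= leaf_scan_cost(h):
            return h
    return None


print(break_even_height())
```

In pure page counts, repeated probing needs fewer accesses for any height below this break-even value; whether it is faster in practice depends on the cost of a random access relative to a sequential one.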

[Figure: repeated index traversal for (sws, heldby): 10 · h index page accesses, 10 visited leaves, max. 10 visited data pages.]
Question 4: Index Implementation (1 P.)

This question requires you to implement code in any programming language you want. The code has to
compile and return the correct result. In OLAT, we provide a Java template with most of the boilerplate
code already in place. This template checks your result for correctness and times the execution. Do not
change anything else than the specified parts of the code.
Submit the code as a separate file (or archive, if you have more than one source file). If you use a different
language than Java, provide instructions on how to compile and run your code. Also, your code should
then include the same checks as the template Java main method.
Given is a dataset of N tuples, each consisting of a unique ID and a string value drawn from a set S with |S| = 10 distinct strings.
Initially the dataset is ordered by the ID and each tuple is assigned a random string from S like the
following:

(1, ABC), (2, BCD), (3, ABC), (4, XYZ), (5, XYX), (6, BCD), (7, XYZ)

The system should be able to answer the following queries:

• Return all tuples for which the string is equal to a given query string q.
• Return all tuples for which the string is lexicographically greater than or equal to a given query string q.

Your task is to:

a) Implement a default access method, which iterates over all tuples and returns the correct results.
(Already given, if one uses the Java template code)
b) Implement a dense index as introduced in the lecture.

If it is not possible for your index implementation to perform one of these operations, then use the default
execution method and state why it is not possible.
Execute your code a few times with different query strings (for the Java template, just execute it
multiple times). Describe how the index creation time and query times for both operations differ in your
implementations. You may change the values of N and S to check the effects on your implementation.

Solution
One possible solution for the dense index is to use a hash index.
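Pseudo code for dense index creation:

index = {}
for t in tuples:
    # create the list of IDs on first use, then append the tuple ID
    index.setdefault(t.string, []).append(t.id)

Pseudo code for the equality lookup:

return index.get(query_string, [])  # empty result for unknown strings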

With the hash index, no “greater than” operation can be efficiently implemented. One could iterate all
keys, check if they are greater than the given value and add the corresponding tuple(s) to the list. But
this can be slower than iterating all tuples.
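
If the range query should also be served by the index, one option is to keep the keys sorted instead of hashing them. The following is a minimal sketch of that swapped-in technique, not the template's or the lecture's implementation; the class and method names are assumptions, and tuples are modelled as (id, string) pairs.

```python
import bisect
from collections import defaultdict


class SortedDenseIndex:
    """Dense index: one entry per tuple, grouped by string, keys kept sorted."""

    def __init__(self, tuples):
        postings = defaultdict(list)
        for tuple_id, value in tuples:
            postings[value].append(tuple_id)
        self._postings = dict(postings)
        self._keys = sorted(self._postings)   # enables binary search on the keys

    def equal(self, q):
        """All tuples whose string equals q."""
        return [(tuple_id, q) for tuple_id in self._postings.get(q, [])]

    def greater_equal(self, q):
        """All tuples whose string is lexicographically >= q."""
        start = bisect.bisect_left(self._keys, q)   # first key >= q
        return [
            (tuple_id, key)
            for key in self._keys[start:]
            for tuple_id in self._postings[key]
        ]


def scan_greater_equal(tuples, q):
    """Default access method: iterate over all tuples."""
    return [t for t in tuples if t[1] >= q]


data = [(1, "ABC"), (2, "BCD"), (3, "ABC"), (4, "XYZ"),
        (5, "XYX"), (6, "BCD"), (7, "XYZ")]
index = SortedDenseIndex(data)
print(index.equal("BCD"))
print(index.greater_equal("XYX"))
```

The equality lookup costs one dictionary access, and the range lookup costs one binary search over the distinct keys plus the size of the result. Results of the range lookup come back grouped by key rather than in ID order.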
