Margarita Makashi Group SE1
EXERCISES – L1
Please complete the following exercises based on the first Lecture and
Chapters 1 and 2 of the Database System Concepts textbook. Ensure that
you answer all questions in your own words and provide clear
justifications for your responses. This assignment encourages you to
apply critical thinking and analysis.
Exercise 1: Compare a file-processing system and a DBMS. What are the
key differences in how they handle data consistency, security, and
redundancy? How would the choice between using a file-processing
system versus a DBMS affect the scalability of an enterprise system as
the business grows? Provide real-world examples.
Here are two distinct methods for data storage and management:
1. **Data Consistency**: In a file-processing system (FPS), data is stored in
individual files, which complicates the maintenance of consistency when
multiple applications access the same data. This can lead to multiple
versions of the same information, as there is no mechanism to enforce
consistency. In contrast, a database management system (DBMS)
guarantees consistency through ACID properties, ensuring that changes in
one location are automatically reflected throughout the system. It also
employs constraints to maintain accuracy. For example, in a shipping
company utilizing an FPS, customer information may be kept in one file and
order details in another. If a customer’s address changes in the customer
file, it may not be updated in the order file.
2. **Data Security**: FPS typically offers limited security, depending solely
on basic operating system permissions without advanced control measures.
Conversely, DBMS provides enhanced security features such as
authentication, authorization, encryption, audit trails, and role-based access
control. For instance, sensitive data handled by a manager in an FPS might
be accessible to unauthorized employees, while a DBMS allows for more
precise access management.
3. **Redundancy**: Redundancy is prevalent in FPS, where the same data is
often duplicated across multiple files, leading to inconsistencies. DBMS
reduces redundancy through normalization, ensuring that each data element
is stored only once and referenced appropriately. For example, in an FPS, a
customer’s address could appear in both the customer and order files,
causing duplication. In a DBMS, the address would be stored only once and
linked to both files.
4. **Scalability**: FPS faces challenges with scalability; as data volume
increases, management and maintenance become more difficult. The
growing number of files complicates the tracking of relationships. In contrast,
DBMS is designed to handle large data volumes and can easily scale as the
organization expands.
Exercise 2: Identify three common challenges in designing a database
schema and propose solutions for each. How might improper schema
design affect the performance and maintainability of a large-scale
application, such as an e-commerce platform? Illustrate with examples.
1. Normalization is essential for reducing redundancy and organizing data
into smaller, logical tables, but it can result in complicated queries that
involve multiple JOIN operations. Conversely, denormalization combines
tables to enhance read performance. Striking a balance between the two can
be challenging, often accomplished through a hybrid strategy that integrates
both approaches. Additionally, implementing indexes and caching can
accelerate query performance without heavily depending on
denormalization.
2. As needs change, database design must be flexible. This flexibility can be
achieved through a versatile schema, such as the entity-attribute-value
model, which can adapt to new requirements.
3. Managing data relationships necessitates careful integration of foreign
keys to ensure effective coordination between tables. However, it is crucial
to use foreign keys judiciously to uphold data integrity while avoiding
unnecessary complexity in the design.
Exercise 3: What are two disadvantages of using database systems in an
organization? Discuss whether these disadvantages could outweigh the
benefits in certain scenarios, such as in small businesses or highly
specialized applications.
**Cost and Complexity**: Implementing and maintaining a database
management system (DBMS) can be expensive and typically requires
specialized personnel for setup. For smaller businesses with minimal data
needs, adopting a full DBMS might be unnecessary; simpler solutions like
spreadsheets could suffice.
**Overhead**: A DBMS can create considerable overhead by utilizing server
resources, taking up disk space, and introducing latency. For organizations
with limited data, the performance benefits of a DBMS may not be apparent,
and the additional overhead could potentially hinder system performance
rather than enhance it.
Exercise 4: Explain why physical data independence is important for
database systems. Can you think of a scenario where not having physical
data independence could create challenges for developers or database
administrators? How would you mitigate these challenges?
Physical data independence is essential because it allows for changes
in how or where data is physically stored without necessitating changes to
the application code. This flexibility ensures that the system design remains
unaffected by the physical storage details. As a business expands or new
technologies are introduced, the system can be upgraded seamlessly
without altering the code.
For instance, in an e-commerce platform, the database schema might be
designed to reside on a hard drive using a particular storage format. If an
upgrade is required, such as transitioning to a new storage system,
considerable time could be wasted rewriting parts of the database and
recreating queries. By utilizing a DBMS that offers robust physical data
independence, upgrades can be carried out without interrupting the
application.
Exercise 5: Compare the advantages and disadvantages of using a two-
tier versus a three-tier architecture in web applications. Which
architecture would you recommend for a cloud-based application that
requires scalability and high security? Justify your answer.
**Two-tier architecture**: This approach is characterized by its simplicity,
quicker performance, and reduced latency, as there are fewer layers
between the client and server. However, it does have drawbacks, including
limited scalability, tightly integrated components, and potential security
vulnerabilities.
**Three-tier architecture**: This model offers enhanced scalability,
improved security, better reusability, and a distinct separation of the
presentation, logic, and data layers. On the flip side, it can be more complex,
slower in terms of performance, and more expensive to implement.
For a cloud-based application, I recommend using the three-tier architecture.
It provides superior scalability and security, and the clear separation of
concerns is particularly beneficial in cloud environments, facilitating
flexibility and easier maintenance as the system expands and requirements
evolve.
Exercise 6: Given the following two relations:
• Instructor (ID, Name, Dept_Name, Salary) ;
• Teaches (ID, Course_ID, Section, Semester, Year)
1
Write the relational algebra expression for the query.
Find the names of all instructors who taught a course in the "Computer
Science" department during the "Fall 2020" semester.
Can you transform this query into a more efficient version by applying
selection before the join operation? Explain why the transformation is
more efficient.
We begin by selecting only the records from the Teaches relation for the Fall
semester of 2020:
σSemester='Fall' AND Year=2020(Teaches).
Next, we filter for instructors:
σDept_Name='ComputerScience'(Instructor).
σDept_Name='ComputerScience'(Instructor) ⋈ σSemester='Fall' AND
We then combine these filtered results:
Year=2020(Teaches).
πName(σDept_Name='ComputerScience'(Instructor) ⋈ σSemester='Fall' AND
Finally, we extract the names of the instructors:
Year=2020(Teaches)).
By performing the selection prior to the join, we minimize the number of
records processed, which helps reduce both memory consumption and query
execution time.
Exercise 7: Consider the schema:
• Instructor (ID, Name, Dept_Name, Salary)
• Department (Dept_Name, Building, Budget)
Suppose there is a foreign key constraint on
Instructor.Dept_Name referencing Department.Dept_Name.
What would happen if you attempted to:
• Insert an instructor with a Dept_Name that doesn't exist in the
Department table? • Delete a department that still has instructors
associated with it?
Propose strategies for preventing such violations in a database system.
If we try to add an instructor to the Instructor table with a department
name that does not exist in the Department table, it will violate the foreign
key constraint, leading the database to reject the insertion. To prevent this
issue, we should verify that the department is present in the Department
table before adding the instructor by implementing a foreign key validation
check in the application layer.
Similarly, if we attempt to delete a department from the Department table
that still has linked instructors in the Instructor table, the foreign key
constraint will be violated, and the database will block the deletion to avoid
creating orphaned instructor records (where the department name no longer
points to a valid department). To address this, we can restrict deletions
when foreign key dependencies are present, ensuring that a department
cannot be deleted until any associated instructors are properly managed or
reassigned.
Exercise 8: Given the following relations:
• Employee (Emp_ID, Name, Dept_ID)
• Project (Proj_ID, Emp_ID, Project_Name)
Write a relational algebra expression that lists all employees along with
the projects they are working on. Ensure that employees who are not
working on any project are still listed in the result. Explain a real-world
scenario where this type of join would be useful.
The expression for this query would be:
Employee LEFT OUTER JOIN Project ON Employee.Emp_ID =
Project.Emp_ID
In this case, all employees are included in the result, even if they don't have
a corresponding entry in the Project table.
A real-world scenario for this type of join could be a company HR system
tracking employee involvement in projects. The system would display all
employees along with the projects they are working on, while also showing
employees who are not currently assigned to any projects. This ensures that
no employee is left out of the report, regardless of their current project
status.
Exercise 9: Given the relations:
• Takes (Student_ID, Course_ID)
• Courses (Course_ID, Dept_Name)
Write a relational algebra expression using the division operator (÷) to find
the IDs of all students who have taken all "Mathematics" courses. Can you
rewrite this query without using the division operator? Explain your process
and why the division operator is useful in this context.
We begin by selecting the Course IDs for all Mathematics courses:
πCourse_ID(σDept_Name=′Mathematics′(Courses))
Using the division operator, we can express the query to find all students
who have taken every course in this set:
πStudent_ID(Takes)÷πCourse_ID(σDept_Name=′Mathematics′(Courses))
To rewrite the query without using the division operator:
1. Find all Mathematics course IDs: M=πCourse_ID(σDept_Name=
′Mathematics′(Courses))
2. Find the students who took any Mathematics course:
T=σCourse_ID∈M(Takes)
3. Count how many Mathematics courses each student took:
G=γStudent_ID,COUNT(Course_ID)(T)
4. Find the total number of Mathematics courses: N=∣πCourse_ID
(σDept_Name=′Mathematics′(Courses))∣
5. Finally, select the students who took all Mathematics courses:
σCOUNT(Course_ID)=N(G)
The division operator simplifies queries where we need to find entities (like
students) that are related to all instances of another entity (like courses in a
department). In this case, it allows us to express the requirement that a
student has taken every "Mathematics" course directly, without needing to
compare sets or perform aggregation manually.
Exercise 10: Consider the following relational algebra expression:
σyear=2021 (Instructor⋈Instructor.ID=Teaches.IDTeaches) This query
retrieves information about instructors
who taught courses in 2021. Rewrite the query by applying the selection
operation before the join. Which version of the query would be more
efficient and why?
Think of other ways to optimize complex queries involving multiple
operations.
Instructor⋈Instructor.ID=Teaches.IDσyear=2021(Teaches)
This approach joins the Instructor and Teaches tables after filtering the
Teaches table for the year 2021. By doing so, we significantly reduce the
number of rows involved in the join, which enhances query execution
efficiency.
Other strategies to further improve query performance may include:
Indexing: Creating indexes on frequently queried columns can speed
up data retrieval.
Query Rewriting: Modifying the query to optimize its structure can
lead to better execution plans.
Avoiding Redundant Joins: Eliminating unnecessary joins can
simplify the query and reduce processing time.
Cost-Based Query Optimization: Utilizing the database's query
optimizer to choose the most efficient execution plan based on
estimated costs can also enhance performance.