1.
Purpose of a Database System
The purpose of a database system is to:
      Store, retrieve, and manage data efficiently.
      Maintain data integrity and security.
      Support concurrent access by multiple users.
      Minimize data redundancy and inconsistency.
      Provide mechanisms for backup and recovery.
      Enable data sharing among multiple users or systems.
2. Characteristics of the Database Approach
      Self-Describing Nature: The database system contains not only the data but also
       metadata (data about data) to define its structure.
      Data Abstraction: Hides the complexity of data representation and storage through
       various abstraction levels.
      Support for Multiple Views: Users can access customized views of the data according
       to their requirements.
      Data Sharing and Multi-user Transaction Processing: Supports concurrency, ensuring
       consistency and isolation for multiple transactions.
      Minimized Data Redundancy: Centralized management reduces duplication and
       inconsistencies.
3. Data Models
A data model defines the logical structure of a database, including the relationships
between different data elements. Common types include:
      Hierarchical Model: Data is organized in a tree-like structure.
      Network Model: Data is represented as records connected by links.
      Relational Model: Data is organized into tables (relations) with rows and columns.
      Entity-Relationship Model (ER): Focuses on entities, attributes, and relationships.
      Object-Oriented Model: Data is represented as objects, like in object-oriented
       programming.
4. Database Management System (DBMS)
A DBMS is software that:
      Provides an interface for users and applications to interact with a database.
      Handles tasks like data storage, retrieval, security, and concurrency.
      Examples: MySQL, Oracle, Microsoft SQL Server, PostgreSQL.
5. Database System Architecture
      Centralized Architecture: A single central database for all users.
      Client-Server Architecture: The database is hosted on a server, and clients access it
       over a network.
      Distributed Architecture: Data is stored across multiple physical locations.
6. Three-Schema Architecture
      External Schema: Represents user views tailored to specific needs.
      Conceptual Schema: Represents the logical structure of the entire database, hiding
       physical details.
      Internal Schema: Represents the physical storage structure and access paths.
The goal of this architecture is to achieve data abstraction and data independence.
7. Components of a DBMS
   1. Storage Manager: Manages disk space and data structures.
   2. Query Processor: Translates and executes database queries.
   3. Transaction Manager: Ensures data consistency and handles concurrency.
   4. Metadata Manager: Maintains information about the data.
   5. Database Engine: Core service for data processing and management.
   6. User Interfaces: Includes APIs and tools for interaction.
8. Data Independence
      Logical Data Independence: Ability to change the logical schema without affecting
       applications.
      Physical Data Independence: Ability to change the physical storage without altering
       the logical schema.
9. File System Approach vs. Database System Approach
Feature               File System Approach               Database System Approach
                      High redundancy due to lack of Minimized redundancy with centralized
Data Redundancy
                      integration.                   management.
Data Integrity        Difficult to maintain.             Enforced through constraints and rules.
                                                         Strong concurrency control
Concurrency           Limited or absent.
                                                         mechanisms.
                                                         Advanced security features like user
Data Security         Limited.
                                                         roles.
Data Sharing          Limited sharing.                   Easy and efficient sharing across users.
Backup and
                      Manual and cumbersome.             Automated and efficient.
Recovery
Scalability           Difficult to scale.                Scalable with distributed databases.
Entity-Relationship (ER) Modeling
ER Modeling is a conceptual approach to data modeling that defines the structure and
relationships of data within a domain. It provides a high-level view of the system's data
requirements and relationships in the form of an Entity-Relationship Diagram (ERD).
1. Conceptual Data Modeling: Motivation
      Purpose:
              o   To create a blueprint for the database system that captures the essential
                  structure of the data and its relationships.
              o   To facilitate communication between stakeholders (business users,
                  developers, and database designers).
              o   To ensure the data model reflects business rules and minimizes redundancy.
      Benefits:
              o   Provides a clear, visual representation of data.
              o   Helps identify data integrity constraints.
              o   Serves as a basis for database design.
2. Key Concepts in ER Modeling
a. Entities
       Definition: Entities represent "things" or "objects" in the real world that have distinct
        identities.
       Examples: Employee, Student, Car, Product.
b. Entity Types
       Definition: A collection of entities with the same attributes.
       Examples:
              o   Student is an entity type that includes individual entities like John or Maria.
              o   Represented as a rectangle in ER diagrams.
c. Attributes
       Definition: Properties or characteristics of an entity or relationship.
       Types of Attributes:
              o   Simple: Cannot be divided further (e.g., Name, Age).
              o   Composite: Can be broken down into smaller parts (e.g., Full Name → First
                  Name + Last Name).
              o   Derived: Computed from other attributes (e.g., Age from Date of Birth).
              o   Multivalued: Can have multiple values (e.g., Phone Numbers).
       Represented as ovals in ER diagrams.
d. Relationships
       Definition: Associations between entities.
       Examples:
              o   Student enrolls in Course.
              o   Represented as a diamond in ER diagrams.
e. Relationship Types
       Definition: A set of similar relationships.
       Examples:
             o   "Enrolled-In" is a relationship type between Students and Courses.
3. Constraints on Relationships
a. Cardinality
Specifies the number of instances of one entity related to the other:
         One-to-One (1:1): One entity instance is related to one instance of another entity.
             o   Example: Each Employee has one Desk.
         One-to-Many (1:N): One entity instance is related to many instances of another
          entity.
             o   Example: A Customer places many Orders.
         Many-to-Many (M:N): Many instances of one entity are related to many instances of
          another entity.
             o   Example: Students enroll in many Courses, and each Course has many
                 Students.
b. Participation
Specifies whether all entities participate in the relationship:
         Total Participation: Every instance of an entity must participate in at least one
          relationship instance.
             o   Represented by a double line.
         Partial Participation: Some instances of an entity may not participate.
             o   Represented by a single line.
c. Keys
         Primary Key: A unique attribute (or combination) that identifies an entity instance.
             o   Example: Student ID for a Student.
4. Entity-Relationship Diagram (ERD) Notation
ER diagrams use specific symbols to represent entities, relationships, and attributes:
Component            Symbol                      Description
Entity               Rectangle                   Represents an entity type.
Weak Entity          Double Rectangle            Depends on a strong entity for identification.
Component            Symbol                    Description
                                               Represents a property of an entity or
Attribute            Oval
                                               relationship.
Key Attribute        Oval (underlined)         An attribute uniquely identifying an entity.
Derived Attribute Dashed Oval                  Computed from other attributes.
Relationship         Diamond                   Represents a relationship type between entities.
                     1, N, M written near
Cardinality                                    Specifies the number of related instances.
                     lines
Participation        Single or double lines    Indicates partial or total participation.
Example ERD
Scenario: A Student enrolls in Courses taught by Professors.
      Entities: Student, Course, Professor.
      Attributes:
            o   Student: Student_ID (PK), Name, Age.
            o   Course: Course_ID (PK), Title, Credits.
            o   Professor: Professor_ID (PK), Name, Department.
      Relationships:
            o   Enrolls: Between Student and Course (Many-to-Many).
            o   Teaches: Between Professor and Course (One-to-Many).
ERD Notation:
      Student → Rectangle with attributes (Student_ID, Name, Age).
      Course → Rectangle with attributes (Course_ID, Title, Credits).
      Professor → Rectangle with attributes (Professor_ID, Name, Department).
      Enrolls → Diamond connecting Student and Course with M:N cardinality.
      Teaches → Diamond connecting Professor and Course with 1:N cardinality.
Relational Data Model
The Relational Data Model organizes data into relations (tables) consisting of rows (tuples)
and columns (attributes). It provides a formal foundation for structuring, querying, and
manipulating data.
1. Update Anomalies
Anomalies arise when updating data in poorly designed tables, often due to redundancy:
      Insertion Anomaly:
          o   Occurs when certain data cannot be added to the database without adding
              unrelated data.
          o   Example: You can't add a new course to a student-course table unless a
              student is enrolled in that course.
      Deletion Anomaly:
          o   Occurs when deleting certain data unintentionally removes other related
              data.
          o   Example: Deleting the last student enrolled in a course removes the course
              details.
      Update Anomaly:
          o   Occurs when changes in data require multiple updates, leading to
              inconsistencies.
          o   Example: If a professor's department changes, it must be updated in all rows
              where the professor appears.
These anomalies are resolved by normalizing the database.
2. Concept of Relations
      Relation:
          o   A table with rows and columns where each row is a tuple, and each column is
              an attribute.
          o   Relations must satisfy certain properties:
                      No duplicate tuples.
                      Attributes have unique names.
                      Order of rows and columns does not matter.
      Relation Schema:
          o   The structure of a relation, including its name, attributes, and their domains.
          o   Example: Student(Student_ID, Name, Age).
      Relation Instance:
             o   A specific set of tuples for a given schema at a particular moment.
3. Schema-Instance Distinction
         Schema:
             o   The overall structure and design of the database (metadata).
             o   Example: Table definitions, attribute names, and data types.
             o   Static in nature.
         Instance:
             o   The actual data stored in the database at a specific time.
             o   Dynamic and changes frequently as the data is updated.
4. Keys
Keys uniquely identify tuples within a relation.
         Superkey:
             o   A set of attributes that uniquely identify tuples in a relation.
             o   Example: {Student_ID}, {Student_ID, Name}.
         Candidate Key:
             o   A minimal superkey, meaning no subset can also uniquely identify tuples.
             o   Example: {Student_ID}.
         Primary Key:
             o   A chosen candidate key used to uniquely identify tuples.
             o   Example: Student_ID.
         Alternate Key:
             o   A candidate key not chosen as the primary key.
             o   Example: {Email} if {Email} is unique.
         Composite Key:
             o   A primary key made of multiple attributes.
             o   Example: {Student_ID, Course_ID} in an enrollment table.
5. Relational Integrity Constraints
Constraints ensure data accuracy and consistency.
      Domain Constraints:
           o   Attribute values must belong to a specific domain.
           o   Example: Age must be an integer between 0 and 120.
      Entity Integrity:
           o   Primary key attributes cannot be null.
           o   Example: Every Student must have a unique Student_ID.
      Referential Integrity:
           o   Ensures foreign keys reference valid primary key values in another table.
           o   Example: Course_ID in the Enrollment table must exist in the Courses table.
6. Referential Integrity and Foreign Keys
      Foreign Key:
           o   An attribute in one relation that references the primary key of another
               relation.
           o   Maintains referential integrity by ensuring the referenced data exists.
Example:
      Courses(Course_ID, Title) → Primary Key: Course_ID.
      Enrollments(Student_ID, Course_ID) → Foreign Key: Course_ID references
       Courses(Course_ID).
7. Relational Algebra Operators and Queries
Relational algebra is a procedural query language that provides operations to manipulate
relations.
a. Unary Operators
      Selection (σ):
           o   Filters rows based on a condition.
           o   Example: σ(Age > 18)(Student) → Selects students older than 18.
      Projection (π):
              o   Selects specific columns.
              o   Example: π(Name, Age)(Student) → Displays only Name and Age.
      Rename (ρ):
              o   Renames a relation or its attributes.
              o   Example: ρ(NewStudent, Student) → Renames Student to NewStudent.
b. Binary Operators
      Union (∪):
              o   Combines tuples from two relations, removing duplicates.
              o   Example: R1 ∪ R2.
      Set Difference (-):
              o   Finds tuples in one relation but not in another.
              o   Example: R1 - R2.
      Intersection (∩):
              o   Finds tuples common to both relations.
              o   Example: R1 ∩ R2.
      Cartesian Product (×):
              o   Combines every tuple of one relation with every tuple of another.
              o   Example: R1 × R2.
c. Relational Join
      Theta Join (⨝θ):
              o   Combines tuples based on a condition θ.
              o   Example: Student ⨝Student.Student_ID = Enrollment.Student_ID Enrollment.
      Natural Join (⨝):
              o   Automatically joins relations on attributes with the same name.
d. Division
      Finds tuples in one relation that match all tuples in another relation.
      Example: R1 ÷ R2.
8. Example Relational Algebra Query
Retrieve the names of students enrolled in the "Database" course:
   1. Select the Course_ID for "Database" from the Courses relation:
   2. σ(Title = "Database")(Courses)
   3. Join the result with the Enrollments table to find relevant students:
   4. Courses ⨝ Courses.Course_ID = Enrollments.Course_ID Enrollments
   5. Join with the Student table to retrieve names:
   6. (Courses ⨝ Enrollments) ⨝ Student
   7. Project the Name attribute:
   8. π(Name)((Courses ⨝ Enrollments) ⨝ Student)
Data Storage and Indexes
Efficient storage and retrieval of data are critical in database systems. Indexing and file
organization are key components that improve performance.
1. Need for File Indexes
File indexes act like a "table of contents" for a database, speeding up data retrieval.
      Why Indexes Are Needed:
           o   Fast Search: Quickly locate records without scanning the entire table.
           o   Efficient Updates: Reduce the cost of finding and modifying data.
           o   Sorting: Facilitate ordered retrieval of data.
           o   Optimization: Improve performance of queries with WHERE, JOIN, or ORDER
               BY.
      Example: Searching for a specific student in a table of 1 million rows without an
       index would require a full table scan. An index reduces this to a logarithmic search
       time.
2. File Organizations
The arrangement of data on storage determines access speed and efficiency.
Types of File Organizations:
      Heap (Unordered Files):
           o   Records are stored as they arrive.
           o   Efficient for bulk inserts but slow for search.
      Sequential:
           o   Records are stored in a sorted order.
           o   Efficient for range queries but costly for inserts and updates.
      Clustered:
           o   Related rows are stored physically close together.
           o   Optimized for queries involving joins or related data.
      Hashed:
           o   Records are stored based on a hash function.
           o   Provides fast equality search but poor range query performance.
3. Index Structures
Indexes are auxiliary data structures that speed up searches.
Types of Indexes:
      Primary Index:
           o   Built on a table's primary key.
           o   Ensures that each key corresponds to exactly one record.
      Secondary Index:
           o   Built on non-primary attributes.
           o   Useful for optimizing queries on frequently searched columns.
      Unique Index:
           o   Enforces uniqueness on indexed columns.
Index Data Structures:
      Single-Level Indexing:
           o   A single layer of indexes pointing to records.
           o   Example: Dense Index (one index entry per record) and Sparse Index (index
               entry for each block).
      Multi-Level Indexing:
           o   Indexes on indexes for scalability with large datasets.
           o   Example: Two-level or three-level indexing.
4. Concurrent Execution of Transactions
Multiple transactions run simultaneously in a database system to improve performance and
resource utilization.
Challenges:
       Data Consistency:
             o   Simultaneous transactions can lead to conflicts (e.g., dirty reads, lost
                 updates).
       Deadlocks:
             o   Two or more transactions waiting for each other to release locks.
Solutions:
       Locking Protocols:
             o   Shared and exclusive locks ensure data integrity.
       Timestamp Ordering:
             o   Ensures transaction execution follows a logical order based on timestamps.
5. ACID Properties
ACID ensures reliability and integrity of transactions in databases.
A: Atomicity:
       A transaction is all-or-nothing; either all its operations succeed, or none are applied.
       Example: Transferring money between accounts must debit and credit both accounts
        or none.
C: Consistency:
       A transaction leaves the database in a valid state, adhering to all constraints.
       Example: Ensuring no two rows have the same primary key.
I: Isolation:
       Concurrent transactions do not interfere with each other.
       Example: One transaction updating a record cannot see changes made by another
        transaction until committed.
D: Durability:
       Once a transaction is committed, its changes are permanent, even in case of system
        failure.
      Achieved through logs and backup mechanisms.
Example Scenario
Use Case: Retrieving student records sorted by age.
   1. File Organization:
           o   Use sequential file organization to store records sorted by age.
   2. Indexing:
           o   Create a secondary index on the Age column for faster queries.
   3. Concurrent Transactions:
           o   If two users query and update the Age column simultaneously, locking
               ensures isolation.
   4. ACID Compliance:
           o   Atomicity guarantees both read and update queries complete or fail together.
           o   Durability ensures updates persist even after a system crash.
Advantages of Indexing
      Faster data retrieval.
      Improved query performance for large datasets.
      Reduced I/O overhead for accessing data.