DBMS
Data vs Information
• Data is a collection of facts, while information puts those facts into context.
• While data is raw and unorganized, information is organized.
What is a Database?
A database is an electronic place/system where data is stored in a way that it can be easily accessed,
managed, and updated.
What is DBMS?
• A database-management system (DBMS) is a collection of interrelated data and a set of programs to
access those data. The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
• A DBMS is the database itself, along with all the software and functionality. It is used to perform
different operations, like addition, access, updating, and deletion of the data.
File-processing systems have major disadvantages:
• Data Redundancy and inconsistency
• Difficulty in accessing data
• Data isolation
• Integrity problems
• Atomicity problems
• Concurrent-access anomalies
• Security problems
DBMS Architecture
Three Schema Architecture
The major purpose of DBMS is to provide users with an abstract view of the data. That is, the system hides
certain details of how the data is stored and maintained.
We have three levels of schemas: physical, logical, and several view schemas (also called subschemas).
• Physical level / Internal level
The lowest level of abstraction describes how the data are stored.
• Logical level / Conceptual level:
The conceptual schema describes the design of a database at the conceptual level, describes what
data are stored in DB, and what relationships exist among those data
• View level / External level:
Highest level of abstraction aims to simplify users’ interaction with the system by providing different
view to different end-user.
Each view schema describes the database part that a particular user group is interested in and hides
the remaining database from that user group.
Views also provide a security mechanism to prevent users from accessing certain parts of the DB.
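As a small sketch of how a view hides data from a user group, the following uses Python's built-in sqlite3 module (the table, column, and view names are illustrative, not from the notes):

```python
import sqlite3

# Illustrative schema: an employee table whose salary column is sensitive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 90000), (2, "Ravi", 85000)])

# External-level view: exposes only id and name, hiding salary from this user group.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

rows = conn.execute("SELECT * FROM employee_public").fetchall()
print(rows)  # salary never appears in the view's result
```

A user granted access only to `employee_public` can query names but never sees the salary column of the underlying table.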
Instances and Schemas
• The collection of information stored in the DB at a particular moment is called an instance of DB.
• The overall design of the DB is called the DB schema.
Data Models
• Provides a way to describe the design of a DB at logical level.
• A collection of conceptual tools for describing data, data relationships, data semantics & consistency
constraints.
Database Languages
• Data definition language (DDL) to specify the database schema.
• Data manipulation language (DML) to express database queries and updates.
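A minimal illustration of the DDL/DML split, using Python's built-in sqlite3 module (the student table is a made-up example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the database schema (CREATE TABLE is a data-definition statement).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# DML: express updates and queries (INSERT and SELECT are data-manipulation statements).
conn.execute("INSERT INTO student VALUES (1, 'Meera')")
name = conn.execute("SELECT name FROM student WHERE roll_no = 1").fetchone()[0]
print(name)
```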
DBMS Application Architectures
T1 Architecture
• The client, server & DB are all present on the same machine.
T2 Architecture
• App is partitioned into 2-components.
• Client machine, which invokes DB system functionality at server end through query language
statements.
• API standards like ODBC & JDBC are used to interact between client and server.
Disadvantages: scalability, security
Advantages: easier maintenance
T3 Architecture
• App is partitioned into 3 logical components.
• Client machine is just a frontend and doesn’t contain any direct DB calls.
• Client machine communicates with the App server, and the App server communicates with the DB system to
access data.
Disadvantages: maintenance
Advantages:
Scalability: due to distributed application servers.
Security: the client can't directly access the DB, hence it is more secure.
Data integrity: the App server acts as a middle layer between client and DB, which minimises the
chances of data corruption.
Entity-Relationship Model
Data Model: Collection of conceptual tools for describing data, data relationships, data semantics,
and consistency constraints.
ER Model: It is a high-level data model based on a perception of the real world that consists of a
collection of basic objects, called entities, and of relationships among these objects.
Entity: An Entity is a “thing” or “object” in the real world that is distinguishable from all other
objects.
Attributes: An entity is represented by a set of attributes
E.g., Student Entity has the following attributes
Student_ID, Name, Standard, Course, Batch, Contact number, Address
Types of Attributes
1. Simple
o Attributes that can’t be divided further.
o E.g., Customer’s account number in a bank, Student’s Roll number, etc.
2. Composite
• Can be divided into subparts (that is, other attributes).
• E.g., the name of a person can be divided into first name, middle name, and last name. An address
can also be divided into street, city, state, and PIN code.
3. Single-valued
• Attribute holding only a single value. e.g., Student ID, loan number for a loan.
4. Multi-valued
• Attribute having more than one value. e.g., phone number, nominee name on some
insurance, dependent name, etc.
5. Derived
• The value of this type of attribute can be derived from the value of other related attributes
• e.g., Age, loan age, membership period, etc
6. NULL Value
• An attribute takes a null value when an entity does not have a value for it. e.g., if the salary
attribute value of an employee is null, the salary is not yet known.
Relationships
• Association among two or more entities.
• e.g., Person has a vehicle, Parent has a Child, Customer borrows a loan, etc
Degree of Relationship
• Unary, only one entity participates. e.g., Employee manages employee.
• Binary, two entities participate. e.g., Student takes Course.
• Ternary, three entities participate. e.g., an Employee works on a Job at a Branch.
Mapping Cardinality / Cardinality Ratio
One to one
• An entity in A is associated with at most one entity in B, and an entity in B is associated with at
most one entity in A (A & B are entity sets).
• E.g., Citizen has Aadhar Card.
One to many
• An entity in A is associated with any number (N) of entities in B, while an entity in B is associated with at most one entity in A.
• e.g., Citizen has Vehicle.
Many to one
• An entity in A is associated with at most one entity in B, while an entity in B can be associated with any number (N) of entities
in A.
• e.g., Course taken by Professor.
Many to many
• An entity in A is associated with any number (N) of entities in B, and an entity in B is also associated with any number of entities in A.
• E.g., Customer buys product, Student attends course.
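A many-to-many relationship is typically implemented with a junction table whose primary key combines the two foreign keys. A sketch of the "Student attends Course" example, using Python's built-in sqlite3 module (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT);
-- Junction table: one row per (student, course) pair resolves the M:N relationship.
CREATE TABLE attends (
    student_id INTEGER REFERENCES student(id),
    course_id  INTEGER REFERENCES course(id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course  VALUES (10, 'DBMS'), (20, 'OS');
INSERT INTO attends VALUES (1, 10), (1, 20), (2, 10);
""")

# Each student may attend many courses, and each course may have many students.
dbms_students = conn.execute(
    "SELECT s.name FROM student s JOIN attends a ON s.id = a.student_id "
    "WHERE a.course_id = 10 ORDER BY s.name").fetchall()
print(dbms_students)
```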
Relational Model
The Relational Model (RM) organises the data in the form of relations (tables).
Relation Schema: defines the design and structure of the relation, contains the name of the relation
and all the columns/attributes.
Degree of table: number of attributes/columns in a given table/relation
Important Properties of a Table in Relational Model
• The name of the relation is distinct from all other relations.
• The values have to be atomic. Can’t be broken down further.
• The name of each attribute/column must be unique.
• Each tuple must be unique in a table.
• The sequence of rows and columns has no significance.
• Tables must follow integrity constraints - it helps to maintain data consistency across the tables
Relational Model Keys
Super Key (SK): any combination of attributes present in a table that can uniquely identify each tuple.
Candidate Key (CK): a minimal super key, i.e., one that uniquely identifies each tuple and
contains no redundant attribute. A CK value shouldn't be NULL.
Primary Key (PK): Selected out of CK set, has the least no. of attributes.
Alternate Key (AK): All CK except PK
Foreign Key (FK): It creates a relation between two tables. A relation, say r1, may include among its
attributes the PK of another relation, say r2. This attribute is called FK from r1 referencing r2
Composite Key: PK formed using at least 2 attributes
Compound Key: PK which is formed using 2 FK
Surrogate Key: Synthetic PK. Generated automatically by DB, usually an integer value
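A small sketch of a primary key and a foreign key in action, using Python's built-in sqlite3 module (note that SQLite enforces foreign keys only when the `foreign_keys` pragma is enabled; the department/employee schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                    -- surrogate PK
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id)  -- FK referencing department
);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (100, 'Asha', 1);
""")

# Inserting an employee whose dept_id has no matching PK in department fails.
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Ravi', 99)")
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True
print(fk_violation)
```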
Normalisation
Why Normalisation?
To avoid redundancy in the DB, i.e., to not store redundant data.
What is Normalisation?
1. Normalisation is used to minimize the redundancy of a relation. It is also used to eliminate
undesirable characteristics like Insertion, Update, and Deletion Anomalies.
2. Normalisation divides composite attributes into individual attributes, OR divides a larger table into smaller
tables and links them using relationships.
What happens if we have redundant data? Insertion, deletion, and updation anomalies arise.
Anomalies
Anomalies means abnormalities; there are three types of anomalies introduced by data redundancy.
1. Insertion anomaly
When certain data (attribute) cannot be inserted into the DB without the presence of other data.
2. Deletion anomaly
The delete anomaly refers to the situation where the deletion of data results in the unintended loss of
some other important data.
3. Updation anomaly (or modification anomaly)
1. The update anomaly is when an update of a single data value requires multiple rows of data to be
updated.
2. Because the update must be made in many places, data inconsistency can arise if one forgets to
update the data at all the intended places.
Due to these anomalies, the DB size increases and DB performance becomes very slow. To rectify these
anomalies and their effects on the DB, we use a database optimisation technique called
NORMALISATION.
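As a concrete sketch of how decomposition removes the update anomaly, the following splits a redundant table into two, using Python's built-in sqlite3 module (the course/instructor example is illustrative, not from the notes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalised: the instructor is repeated for every student on the same course,
# so changing an instructor would require updating many rows (update anomaly).
conn.execute("CREATE TABLE enrolment_flat (student TEXT, course TEXT, instructor TEXT)")
conn.executemany("INSERT INTO enrolment_flat VALUES (?, ?, ?)",
                 [("Asha", "DBMS", "Dr. Rao"), ("Ravi", "DBMS", "Dr. Rao")])

# Normalised: the course->instructor fact is stored exactly once;
# enrolment links students to courses via the course key.
conn.executescript("""
CREATE TABLE course (course TEXT PRIMARY KEY, instructor TEXT);
CREATE TABLE enrolment (student TEXT, course TEXT REFERENCES course(course));
INSERT INTO course SELECT DISTINCT course, instructor FROM enrolment_flat;
INSERT INTO enrolment SELECT student, course FROM enrolment_flat;
""")

# Now a change of instructor touches one row instead of many:
conn.execute("UPDATE course SET instructor = 'Dr. Iyer' WHERE course = 'DBMS'")
instructor = conn.execute(
    "SELECT instructor FROM course WHERE course = 'DBMS'").fetchone()[0]
print(instructor)
```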
Types of Normal forms
1NF, 2NF, 3NF, BCNF (Boyce-Codd normal form)
Advantages of Normalisation
1. Normalisation helps to minimise data redundancy.
2. Greater overall database organisation.
3. Data consistency is maintained in DB
Transaction
A unit of work done against the DB in a logical sequence
ACID Properties
To ensure integrity of the data, we require that the DB system maintain the following properties of the
transaction.
1. Atomicity - Either all operations of transaction are reflected properly in the DB, or none.
2. Consistency - Integrity constraints must be maintained before and after transaction.
3. Isolation - Even though multiple transactions may execute concurrently, the system guarantees that,
for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti
started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions
executing concurrently in the system. Multiple transactions can happen in the system in isolation,
without interfering with each other.
4. Durability - After transaction completes successfully, the changes it has made to the database persist,
even if there are system failures.
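A small sketch of atomicity, using Python's built-in sqlite3 module: a money transfer either commits both updates or rolls both back (the account table and the simulated failure are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(amount, crash=False):
    """Debit account 1 and credit account 2 as one atomic transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
        if crash:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 2", (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # atomicity: undo the partial debit

transfer(30, crash=True)  # fails mid-way; rollback restores both balances
balances = [r[0] for r in conn.execute("SELECT balance FROM account ORDER BY id")]
print(balances)
```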
How to implement Atomicity and Durability in Transactions
Shadow-copy scheme
Based on making copies of DB (aka, shadow copies). A pointer called db-pointer is maintained on the
disk, which at any instant points to current copy of DB. All further updates are done on new DB copy
leaving the original copy (the shadow copy) untouched. If at any point the transaction has to be aborted, the system
deletes the new copy, and the old copy is not affected. On a successful commit, db-pointer is atomically updated
to point to the new copy, which then becomes the current copy.
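A toy sketch of the shadow-copy idea in Python, assuming the whole DB fits in one JSON file (the file names and `commit_update` helper are made up for illustration). The pointer file names the current copy; an update writes a complete new copy, and the commit point is one atomic rename of the pointer:

```python
import json
import os
import tempfile

workdir = tempfile.mkdtemp()
POINTER = os.path.join(workdir, "db-pointer")

def read_db():
    """Follow db-pointer to the current copy and load it."""
    with open(POINTER) as p:
        current = p.read()
    with open(current) as f:
        return json.load(f)

def commit_update(new_data):
    new_copy = os.path.join(workdir, "copy-new.db")
    with open(new_copy, "w") as f:
        json.dump(new_data, f)    # write the complete new copy first
    tmp_ptr = POINTER + ".tmp"
    with open(tmp_ptr, "w") as f:
        f.write(new_copy)         # pointer text now names the new copy
    os.replace(tmp_ptr, POINTER)  # atomic rename: this is the commit point

# Initialise the original (shadow) copy and the pointer, then update.
original = os.path.join(workdir, "copy-0.db")
with open(original, "w") as f:
    json.dump({"x": 1}, f)
with open(POINTER, "w") as f:
    f.write(original)

commit_update({"x": 2})
print(read_db())
```

If a crash happens before the rename, the pointer still names the old copy, so the half-written update is simply invisible; that is exactly the atomicity property.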
Log-based recovery methods
The log is a sequence of records. Log of each transaction is maintained in some stable storage so that if
any failure occurs, then it can be recovered from there. If any operation is performed on the database,
then it will be recorded in the log. But the process of storing the logs should be done before the actual
transaction is applied in the database.
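The log-before-data rule above can be sketched in a few lines of Python. This is an illustrative toy (an in-memory list stands in for the stable-storage log file), not a real recovery algorithm:

```python
# Minimal write-ahead-log sketch. Rule: append the log record BEFORE applying
# the change to the database, so a crash can be recovered by replaying the log.

log = []   # stable-storage log (a list stands in for a log file)
db = {}    # the database state

def write(key, value):
    log.append(("SET", key, value))  # log first...
    db[key] = value                  # ...then apply to the DB

def recover():
    """Rebuild the DB state by replaying every logged operation in order."""
    rebuilt = {}
    for op, key, value in log:
        if op == "SET":
            rebuilt[key] = value
    return rebuilt

write("a", 1)
write("b", 2)
write("a", 3)
db.clear()      # simulate losing the in-memory DB state in a crash
db = recover()  # replay the log to restore the state
print(db)
```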
Indexing
Indexing is used to optimise the performance of a database by minimising the number of disk accesses
required when a query is processed.
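A quick demonstration of the effect of an index, using Python's built-in sqlite3 module and SQLite's `EXPLAIN QUERY PLAN` (the student table and index name are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

# Without an index, a lookup on name must scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = 'name500'").fetchone()[3]

# With an index, the DBMS can locate matching rows with far fewer accesses.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = 'name500'").fetchone()[3]

print(plan_before)  # the plan mentions a full SCAN of the table
print(plan_after)   # the plan now uses idx_student_name
```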