Database Systems
CSC104
Nazeef Ul Haq
Spring 2025
Problem Based Learning
• Lets start with a problem…
Example of a Traditional Database
Application
Suppose we are building a system to store the information about:
• students
• courses
• professors
• who takes what, who teaches what
4
Can we do it with the usage of files?
Sure we can! Start by storing the data in files:
students.txt courses.txt professors.txt
Now write C or Java programs to implement specific tasks
5
Doing it with a file system
• Enroll “Ali” in “CS262”:
Write a C program to do the following:
Read
Read‘students.txt’
‘students.txt’
Read
Read‘courses.txt’
‘courses.txt’
Find&update
Find&updatethetherecord
record“Ali”
“Ali”
Find&update
Find&updatethetherecord
record“CS262”
“CS262”
Write
Write“students.txt”
“students.txt”
Write
Write“courses.txt”
“courses.txt”
6
Problems with file system...
• System crashes:
Read
Read‘students.txt’
‘students.txt’
Read
Read‘courses.txt’
‘courses.txt’
Find&update
Find&updatethetherecord
record“Ali”
“Ali”
CRASH !
Find&update the record “CS262”
Find&update the record “CS262”
What is the problem ? Write
Write“students.txt”
“students.txt”
Write
Write“courses.txt”
“courses.txt”
• Large data sets (say 50GB)
• What is the problem ?
• Simultaneous access by many users?
• What happens when you mistakenly add Ali two times? How will you
manage duplicate insertions?
• Attach a course to a student which does not exist. What will happen then?
• If location of the item is unknown, what to do?
• Read, write, open and close operations etc.
• Security?
7
Problems
• Data inconsistency/anomalies
• Data redundancy
• Durability
• Concurrency
• File size etc.
A program that
makes it easy for
you to
manipulate large
amounts of data.
Opening Door of DBMS
DBMS
• Create databases
• Query
• Storage
• Durability
• Concurrency
Examples? Oracle, mySQL, postgreSQL, MS SQL Server, and Microsoft
Access etc.
DBMS
• Single Box – Components
• Double Boxes – In-memory data structures
• Solid line – Control and Data flow
• Dashed line – Data flow
• Users: Ask for data or modify data (DML and DDL)
• DB Administrator or DBA: Responsible for the structure or schema of
the database
DBMS continued..
• Query Processing
• Query
• Query Compiler (Parse and optimize query)
• Execution Engine (Executes query)
• Resource Manager (Manages locations of files and uses indexes to execute the query or find elements)
• Buffer Manager (brings data in RAM)
• Storage Manager (Gets data from disk)
• Transaction Manager (ACID)
• Atomic
• Isolation
• 1 query = 1 transaction
• 1 unit = 1 transaction = 1 or more queries
• Two parts: Concurrency Control Manager (maintains A and I of ACID) and Logging and Recovery
Manager (maintains D of ACID)
Storage and Buffer Manager
• Storage Manager
• Controls placement of data on disk and its movement between disk and main
memory
• Tracks location of files
• Obtains blocks of file on request
• Buffer Manager
• Partitions available space of main memory in buffers where disk blocks can
reside
• DBMS components will interact with buffers for information (data, metadata,
logs, stats and indexes)
Transaction Processing
• Logging
• Recovery manage will check logs and restore in case of crash.
• Communicates with buffer manager and saves logs on disk
• Concurrency
• Concurrency Control manager must assure that the individual actions of
multiple transactions are executed in such an order that the net effect is the
same as if the transactions had in fact executed in their entirety, one-at a
time.
• Locks – stored in main memory – lock table
• Deadlock Resolution
• Can stuck in locks while fighting for resources
Query Processor
• Effects performance of DBMS
• Query Compiler
• Generates query plan
• Parser (builds tree)
• Preprocessor (Relation checking and generating query plan)
• Optimizer (best solution)
• Execution Engine
• Executes the query plan as it is!!!
• Interacts with buffers
• Interacts with schedulers to avoid accessing locked data
• Interacts with log manager to log data
Expected from databases
• Speed
• Scale
• Stability
• Evolution
• Reliability
• Cost Effectiveness
ACID properties
• Atomicity (Work as a unit. Yes or No – Nothing else)
• Consistency
• Isolation (No interaction among users)
• Durability
Comparison with File System?
• Can file system provide all the tasks required by DBMS?
Collection of
Files
60’s
Hierarchical (tree-based) Network (graph)
70's
Choice for most new
80's Relational applications
90’s
Object Bases
now Knowledge Bases
Relational Model Attributes
Name Street City gpa
Student-id
• Example of tabular data in the relational model
192-83-7465 Johnson
Alma Palo Alto 3.6
019-28-3746 Smith
North Rye 2.7
192-83-7465 Johnson
Alma Palo Alto 3.2
321-12-3123 Jones
Main Harrison 4.0
019-28-3746 Smith
North Rye 3.45
Object Oriented Model
• Class Person{
public:
Person();
~Person();
float GetSalary();
float PutSalary(float&);
string Name;
int SSN;
date BirthDate;
private:
float salary;
}
Types of Data
• Structured
• Relational
• Semi-structured
• CSV
• JSON
XML etc.
• Unstructured
• Emails
• Documents
• PDFs etc.
• Binary Data
• Images
• Videos etc.
Flash Names
• Database/DB
• Database Management System (DBMS)
• Data scientist
• Data Anomalies
• Data Inconsistency
• Data Redundancy
Till now, we have
used files to store
Business Application Architecture data
Application Layer
(Web, Business Logic Data Access Layer Data
Desktop,Console)
Lets dive into some
examples
UnPack YouTube Database
Result list
Rea
Lorem
Video & description,
ipsum
d # Views, Likes
Example
Unpack Lorem
Youtu
ipsum
Youtube DB be DB
congue
Lorem
Modify
Lorem
Learn
ipsum ipsum
Related videos Upload
Relevant ads Like, Review
Every minute
on the
Internet
as per 2021
Example
Self Driving
Front panel metrics
Rea
Lorem Speed, distance
ipsum
d ETA
Example
Unpack Cars Lorem
Cars
ipsum
DB DB
congue
Lorem
Modify
Lorem
Learn
ipsum ipsum
Collisions Road models
Traffic signals Drive models
Example
Unpack
ATM DB:
Transaction
Read Balance Read Balance
Give money vs Update Balance
Update Balance Give money
Transfer $3k from a10 to a20:
1 Debit $3k from a10
2 Credit $3k to a20
Transactions
Example
Scenarios
1. Crash before 1?
2. After 1 but before 2? [Bad!! a10: 17,000, a20:
15,000]
3. After 2?
DBs are often optimized for key use cases
L
L L
o
o o
r
r r
e
e e
m
m m
Goals of L L L L
L L
o o o o
o o i
i i r r
r r r r p
p p e e
e e e e s
s s m DB m
m DB m m DB m u
u u
Special i
p
m
i
p
i
p
m i
p
i
p
m
i
p
Databases s s s s
s s
u u u u
u u
m m m m
m m
Store current data Optimize historical Run batch
(e.g., lot of reads) data (e.g., logs) Workloads
(e.g. training)
> 100 viable data engines on market
(MySql, Postgres, Oracle, IBM/SAP to
data clouds on AWS/Azure, GCP, to Spark, Cockroach/Spanner, Mongo, ….)
For
Whom?
For
What?
How?
How?
Example
Mobile Game
Report &
Share
Business/Produc
Real-Time
Game App User Events
DB
DBMS t Analysis
DB v0 Q1: 1000 users/sec? Q7: How to model/evolve game Q4: Which user
Q2: Offline? data? cohorts?
Q3: Support v1, v1’ Q8: How to scale to millions of Q5: Next features to
versions? users? build? Experiments to
Q9: When machines die, restore run?
game state gracefully? Q6: Predict ads
demand?
App designer Systems Product/Biz
designer designer
How?
Example Mobile Game Data
Exploration
Cloud Datalab
Report &
Share
Game App DB Business/Produc
Real-Time 4 t Analysis
User Events Data Analytics
1 Processing Engine
E-T-L, Dataflow BigQuery
2 3
Data system
“v1” on Cloud
1 Log user actions 3 Run queries in a
peta scale analytics
2 Store in DB, after system
Extract-Transform-
Load 4 Visualize query
How?
Example
Game App Mobile Game Data
Exploration Report &
Cloud Datalab Share
DB Business/Produc
4 t Analysis
Data Sync Data
Analytics
Processing
Engine
1 MySQL,
BigQuery
Dataflow
2 3
Data system 0 Real-Time
User Events
“v2” Cloud +
Local Local DB
0 Log user actions 3 Run queries in a
In local DB petabyte scale analytics
system
1 Data sync to cloud
4 Visualize query results
2 Store in DB, after
ETL
DBs - General + Optimized
Summary Data System - Connect DBs
to solve a problem
Data bases
Data systems
Data lakes Data Lake - Set of Data
Systems for different data
(e.g., Netflix has HD movies
(1GB?) and user logs)
Data Lake and Warehouse
Data Lake:
Repository of structured, unstructured, semi-structured or binary
data.
Data warehouse:
Repository of processed or structured data.
Data Lake - Set of Data
Systems for different data
(e.g., Netflix has HD movies
(1GB?) and user logs)