Drawbacks of using file systems to store data:
Data redundancy and inconsistency - multiple file formats, duplicated information in different files.
Difficulty in accessing data - a new program must be written for each new task.
Data isolation - data scattered across multiple files and formats.
Integrity problems - constraints become "buried" in program code rather than being stated explicitly; hard to add new constraints or change existing ones.
Atomicity of updates - failures may leave the database in an inconsistent state with partial updates carried out.
Centralized - runs on a single computer system and does not interact with other computer systems. General-purpose computer system: one to a few CPUs and a number of device controllers connected through a common bus that provides access to shared memory. Single-user system: a desktop unit in which a single user has one CPU and one or two hard disks; the OS supports only one user. Multi-user system: serves a large number of users connected to the system via terminals; such systems are called server systems.
Client-server - a server system satisfies requests generated at m client systems. Database functionality can be divided into: Back-end: manages access structures, query evaluation and optimization, concurrency control, and recovery. Front-end: consists of tools such as forms, report writers, and graphical user interface facilities.
Parallel - multiple processors and multiple disks connected by a fast interconnection network. A coarse-grain parallel machine has a small number of powerful processors; a fine-grain parallel machine has thousands of smaller processors. Throughput - the number of tasks completed in a given time interval (e.g., 100 tasks completed in 10 seconds gives a throughput of 10 tasks/second). Response time - the time to complete a single task from the moment it is submitted.
Distributed - data is spread over multiple machines (also referred to as sites or nodes); a network interconnects the machines, and the data is shared by users on multiple machines. 1. Homogeneous distributed databases: the same software/schema on all sites, with data possibly partitioned among sites; goal: provide a view of a single database, hiding the details of distribution. 2. Heterogeneous distributed databases: different software/schema on different sites; goal: integrate existing databases to provide useful functionality. Difference between local and global transactions: a local transaction accesses data only at the site where the transaction was initiated, while a global transaction accesses data at a site different from the one where it was initiated.
The logical schema defines the structure of the data and the relationships between the various attributes, tables, and entries. It focuses on how the data is perceived and organized conceptually.
The physical schema defines how data is stored and managed on the physical disks of the devices the database runs on. It is concerned with the actual data structures used to implement the logical schema of the database, such as B-trees or hashing.
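A minimal sketch of the distinction, assuming a hypothetical account table and MySQL-style index syntax: the CREATE TABLE statement describes the logical schema, while the index definition is a physical-schema choice.

-- Logical schema: what data exists and how it is related.
CREATE TABLE account (
    account_number CHAR(10) PRIMARY KEY,
    branch_name    VARCHAR(30),
    balance        DECIMAL(12, 2)
);

-- Physical schema: how the data is laid out and accessed on disk.
-- USING BTREE is a MySQL-style hint choosing the index structure
-- (a hash index would be an alternative physical choice).
CREATE INDEX idx_branch ON account (branch_name) USING BTREE;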
1. Physical Data Independence: allows modification of the physical schema without affecting the logical schema. Applications depend on the logical schema, so changes to the physical schema should not impact the way applications interact with the data. Well-defined interfaces between the different levels and components ensure that changes in one part do not significantly affect others.
2. Logical Data Independence: allows changes to the conceptual schema without changing external views or external APIs. Changes at the conceptual level should be absorbed by the mapping between the external and conceptual levels. Achieving logical data independence is more challenging than achieving physical data independence.
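One common way to provide this insulation is through views; a hedged sketch, reusing the hypothetical account table from above. Applications query the view, so the base table can be reorganized as long as the view definition is updated to match.

-- External view: applications see only this relation.
CREATE VIEW branch_balances AS
SELECT branch_name, SUM(balance) AS total_balance
FROM account
GROUP BY branch_name;

-- If account is later restructured, only the view definition changes;
-- queries written against branch_balances keep working unchanged.
SELECT * FROM branch_balances WHERE total_balance > 100000;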
The advantages of using data models in database management systems are: 1. Accurate Representation: ensures that data is represented accurately. 2. Well-Defined Relationships: makes the data easier to understand and manage. 3. Minimized Data Redundancy: minimizes data redundancy and makes it easier to identify missing data. 4. Enhanced Security: helps maintain the security of the data, ensuring it is not compromised.
Database Users: Types: Naive Users, Application Programmers, Sophisticated Users, Database Administrators. Each type of user has a different level of interaction with and access to the database system.
Query Processing: Steps: parsing and translation, optimization, evaluation. It is the process of translating user queries into efficient execution plans and evaluating those plans to retrieve the required data. Storage Management: The storage manager provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. Its tasks include interaction with the file manager and efficient storing, retrieving, and updating of data. Transaction Management: A transaction is a collection of operations that performs a single logical function in a database application. Components: Transaction Manager, Concurrency-Control Manager. The transaction manager ensures that the database remains in a consistent state despite system failures, while the concurrency-control manager controls the interaction among concurrent transactions to ensure the consistency of the database.
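A minimal sketch of a transaction, reusing the hypothetical account table (MySQL-style START TRANSACTION; some systems use BEGIN): both updates commit together or, on failure, roll back together, preserving atomicity.

START TRANSACTION;

-- Transfer 500 from account A-101 to account A-215.
UPDATE account SET balance = balance - 500 WHERE account_number = 'A-101';
UPDATE account SET balance = balance + 500 WHERE account_number = 'A-215';

COMMIT;   -- make both updates permanent
-- On an error before COMMIT, issue ROLLBACK; to undo the partial update.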
ER diagram
A weak entity set does not have a primary key of its own; its existence depends on the existence of an identifying (strong) entity set. The relationship between the weak entity set and the identifying entity set is shown using a double diamond, and the discriminator of the weak entity set is underlined with a dashed line. The primary key of the weak entity set is formed by the primary key of the strong entity set on which it depends, plus the weak entity set's discriminator.
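A hedged sketch of how this maps to tables, using the classic loan/payment example (hypothetical names): payment is weak, so its primary key combines loan_number (the strong entity set's key) with payment_number (the discriminator).

CREATE TABLE loan (
    loan_number CHAR(10) PRIMARY KEY,
    amount      DECIMAL(12, 2)
);

-- Weak entity set: payment_number alone is not unique; it only
-- discriminates among payments of the same loan.
CREATE TABLE payment (
    loan_number    CHAR(10),
    payment_number INT,
    payment_amount DECIMAL(12, 2),
    PRIMARY KEY (loan_number, payment_number),
    FOREIGN KEY (loan_number) REFERENCES loan(loan_number)
        ON DELETE CASCADE   -- existence depends on the identifying loan
);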
Types of attributes:
Simple: cannot be divided further.
Composite: composed of sub-attributes (hierarchical).
Complex: formed by nesting composite and multivalued attributes.
Derived: attribute whose value can be calculated from other attributes.
Key: uniquely identifies an entity.
Foreign Key: a reference to the primary key of another table.
Null-valued: the attribute may not have a value for every entity instance.
Multivalued: can have multiple values for a single entity instance.
Keys
Super key: a set of one or more columns that uniquely identifies an entity.
Candidate key: a minimal super key - no proper subset of it is itself a super key; each candidate key's values are unique.
Primary Key: the candidate key chosen as the unique identifier of the entity/table.
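A short sketch, assuming a hypothetical student table: both student_id and email are candidate keys (each uniquely identifies a row); student_id is chosen as the primary key, and the remaining candidate key is declared UNIQUE.

CREATE TABLE student (
    student_id INT PRIMARY KEY,      -- candidate key chosen as primary key
    email      VARCHAR(100) UNIQUE,  -- remaining candidate key
    name       VARCHAR(50)
);
-- (student_id, name) is a super key but not a candidate key:
-- it contains student_id, which is already unique on its own.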
Extended ER Features:
1. Specialization: a top-down design process in which subgroupings within an entity set are designated, creating lower-level entity sets with their own distinctive attributes or relationships. It is shown by a triangle component labeled ISA. Attribute inheritance applies: a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked (one common table mapping is sketched after this list).
2. Generalization: a bottom-up design process that combines multiple entity sets sharing the same features into a higher-level entity set. It is the inverse of specialization and is represented in an E-R diagram in the same way; the terms specialization and generalization are often used interchangeably.
3. Aggregation: an abstraction in which a relationship set is treated as a higher-level (abstract) entity set. This eliminates redundancy by allowing other relationships to involve the aggregated relationship itself, representing overlapping information more efficiently.
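A hedged sketch of one common way to map a specialization hierarchy to tables (hypothetical person/student example): the lower-level table stores only its distinctive attributes plus the higher-level primary key, mirroring attribute inheritance.

CREATE TABLE person (
    person_id INT PRIMARY KEY,
    name      VARCHAR(50)
);

-- Lower-level entity set: inherits person's key, adds its own attributes.
CREATE TABLE student (
    person_id INT PRIMARY KEY,
    gpa       DECIMAL(3, 2),
    FOREIGN KEY (person_id) REFERENCES person(person_id)
);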
● AND: SELECT column1, column2, ... FROM table_name WHERE condition1 AND condition2 AND condition3 ...;
● OR: SELECT column1, column2, ... FROM table_name WHERE condition1 OR condition2 OR condition3 ...;
● NOT: SELECT column1, column2, ... FROM table_name WHERE NOT condition;
● NULL: SELECT column_names FROM table_name WHERE column_name IS NULL;
● NOT NULL: SELECT column_names FROM table_name WHERE column_name IS NOT NULL;
● UPDATE: UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition;
● DELETE: DELETE FROM table_name WHERE condition; (without a WHERE clause, DELETE FROM table_name; removes all rows)
● MIN / MAX / AVG / SUM (all follow the same pattern): SELECT MIN(column_name) FROM table_name WHERE condition;
● BETWEEN: SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2;
● INNER JOIN: SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
● LEFT/RIGHT OUTER JOIN: SELECT column_name(s) FROM table1 LEFT JOIN table2 ON table1.column_name = table2.column_name; (RIGHT JOIN is analogous)
● UNION: SELECT column_name(s) FROM table1 UNION SELECT column_name(s) FROM table2;
● GROUP BY: SELECT column_name(s) FROM table_name WHERE condition GROUP BY column_name(s) ORDER BY column_name(s);
● HAVING: SELECT column_name(s) FROM table_name WHERE condition GROUP BY column_name(s) HAVING condition ORDER BY column_name(s);
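A concrete worked example combining several of these clauses, assuming a hypothetical orders table with columns customer_id, amount, and status:

-- Total spent per customer on completed orders,
-- keeping only customers who spent more than 1000.
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
HAVING SUM(amount) > 1000
ORDER BY total_spent DESC;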
Query processing involves translating high-level queries into low-level expressions usable at the file system level, optimizing the query, and executing it to obtain results. The process comprises four steps: scanning and parsing to ensure syntactic correctness, query optimization to find an efficient execution strategy, code generation for execution, and finally execution in the runtime database processor.
The optimizer considers various query plans, aiming to minimize elapsed time by
estimating costs such as disk accesses and CPU time. Once the plan is determined,
the code generator produces access routines, which are either interpreted or
compiled for execution. The runtime database processor then executes these
routines, returning results to the application.
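Most SQL systems expose the plan the optimizer chose through an EXPLAIN statement (syntax varies by system; PostgreSQL-style shown, reusing the hypothetical orders table from above):

-- Show the chosen plan; with ANALYZE, actually run the query
-- and report measured times alongside the cost estimates.
EXPLAIN ANALYZE
SELECT customer_id, SUM(amount)
FROM orders
WHERE status = 'completed'
GROUP BY customer_id;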
Scanning and Parsing
• When a query is first submitted (via an application program), it must be scanned and parsed to determine if it consists of appropriate syntax.
• Scanning is the process of converting the query text into a tokenized representation.
• The tokenized representation is more compact and is suitable for processing by the parser. This representation may be in a tree form.
• The parser checks the tokenized representation for correct syntax. In this stage, checks are made to determine whether the columns and tables identified in the query exist in the database and whether the query has been formed correctly with the appropriate keywords and structure.
• If the query passes the parsing checks, it is passed on to the Query Optimizer.