Module - 1
Module - 1
L:T:P(Hours/Week)
Credits:4 Total Contact Hours: 75 Max Marks: 100
3:0:2
Course Objective:
The course is intended to impart knowledge on database fundamentals, develop skills in
designing databases and apply SQL for database manipulation.
Module I 22 Hours
Foundations of DBMS: File System versus Database approach – Database applications
– View of Data – Database Languages (DDL, DCL, DML, TCL) – Database Design – Data
storage and querying – Architecture –Database Users and Administrators.
Relational Model: Terminology – Structure of Relational Database – Keys – Integrity
Constraints – Schema Diagrams – Relational operations.ER Modeling: Design Process –
Entity Types – Relationship Types – Attributes – Structural Constraints – Reduction to
Relational Schemas – Design Issues.
SQL Data Manipulation: Overview of Query Language – Data Types – Data Definition –
SQL Queries – Aggregate functions – Nested Queries – Joins – Views – Integrity
Constraints – Authorization.
Advanced SQL: SQL Programming Language – Functions and procedures – Cursors –
Triggers – Accessing SQL from a Programming Language – SQL vs NoSQL.
Module II 23 Hours
Normalization: Purpose – Data Redundancy and Update Anomalies – Functional
Dependencies – Normalization Process – 1NF, 2NF, 3NF, BCNF.
Data Storage: Storage Media – RAID – Database Buffer – Indexing and Hashing.
Query Processing: Query Decomposition – Cost Estimation – Query Optimization.
Transaction and Concurrency Control: Transaction properties – Locking methods –
Deadlock – Timestamp Methods – Validation Protocols – Consistency – Granularity.
Recovery System: Failure Classification – Recovery facilities – Recovery Techniques.
Introduction to Advanced Database concepts: Document database – Graph QL –
Database Optimization.
7
Course Outcomes Cognitive
At the end of this course, students will be able to: Level
CO1: Design ER models using various constructs to simulate the real world Apply
databases.
CO2: Formulate structured and optimized queries to manipulate databases. Apply
CO3: Investigate the dependencies in a database and normalize to
Analyze
appropriate level.
CO4: Compare and contrast the various locking facilities to perform
Evaluate
concurrent transactions on databases.
CO5: Build fault tolerant optimized databases using SQL by analyzing the
various database functionalities as an individual or team for real Create
world applications.
Text Book(s):
T1. A Silberschatz, H Korth, S Sudarshan, “Database System Concepts”, 7th Edition,
McGraw- Hill, 2019.
T2. Thomas Connolly, Carolyn Begg, “Database Systems: A Practical Approach to
Design, Implementation and Management”, 6th Edition, Pearson Education, 2015.
Reference Book(s):
R1. Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems”,
7th Edition, Pearson Education, 2017.
R2. C.J. Date, A. Kannan and S. Swamynathan, “An Introduction to “Database Systems”,
8th Edition, Pearson Education, 2006.
Web References:
1. Introduction to Database Systems:
http://www.inf.unibz.it/~nutt/IDBs1011/idbs-slides.html
2. NPTEL lecture videos and notes:
https://onlinecourses.nptel.ac.in/noc23_cs79/
3. SQL practice exercises with solutions:
https://www.w3resource.com/sql-exercises/
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 - - 3 - - - - - - - - - - -
CO2 3 - - - - - 2 - - - - - - -
CO3 - - - 3 - - - - - - - - - -
CO4 - 3 - - - - - - - - - - - -
CO5 - - - - 3 1 - - 3 3 - - 3 -
High-3; Medium-2; Low-1
8
Dr.Mahalingam College of Engineering and Technology,Pollachi
Department of Computer Science and Engineering
23CSI301-Database Systems
Module 1
Foundations of DBMS
File System versus Database approach:
File Management System Database Management System
File System is a general, easy-to-use system Database management system is used when
to store general files which require less security constraints are high.
security and constraints.
Data Redundancy is more in file Data Redundancy is less in database
management system. management system
Data Inconsistency is more in file system Data Inconsistency is less in database
management system
Centralisation is hard to get when it comes Centralisation is achieved in Database
to File Management System Management System
User locates the physical address of the files In Database Management System, user is
to access data in File Management System. unaware of physical address where data is
stored.
Security is low in File Management System. Security is high in Database Management System
Database applications:
The earliest database systems arose in the 1960s in response to the computerized
management of commercial data.
Database systems are used to manage collections of data that:
• are highly valuable,
• are relatively large, and
• are accessed by multiple users and applications, often at the same time.
The first database applications had only simple, precisely formatted, structureddata.
Today, database applications may include data with complex relationships and a more variable
structure. As an example of an application with structured data, consider a university’s records
regarding courses, students, and course registration. The university keeps the same type of
information about each course: course-identifier, title, department, course number, etc., and
similarly for students: student-identifier, name, address, phone, etc. Course registration is a
collection of pairs: one course identifier and one student identifier. Information of this sort has a
standard, repeating structure and is representative of the type of database applications that go
back to the 1960s.
Managing complexity is challenging, not only in the management of data but in any domain. Key
to the management of complexity is the concept of abstraction. Abstraction allows a person to use
a complex device or system without having to know thedetails of how that device or system is
constructed.
Here are some representative applications:
Enterprise Information
Sales: For customer, product, and purchase information
Accounting: For payments, receipts, account balances, assets, and other
accountinginformation.
Human resources: For information about employees, salaries, payroll taxes, andbenefits,
and for generation of paychecks.
Manufacturing: For management of the supply chain and for tracking productionof items
in factories, inventories of items in warehouses and stores, and orders foritems.s
Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation ofmonthly
statements.
Finance: For storing information about holdings, sales, and purchases of
financialinstruments such as stocks and bonds; also for storing real-time marketdata to
enable online trading by customers and automated trading by the firm.
Universities:
For student information, course registrations, and grades (in additionto standard enterprise
information such as human resources and accounting).
Airlines:
For reservations and schedule information. Airlines were among the firstto use databases in a
geographically distributed manner.
Telecommunication:
For keeping records of calls, texts, and data usage, generatingmonthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
View of Data
• A database system is a collection of interrelated data and
• A set of programs that allow users to access and modify these data.
• A major purpose of a database system is to provide users with an abstract view of the data.
• That is, the system hides certain details of how the data are stored and maintained.
Three views of Data
• Data Abstraction
• Instances and Schemas
• Data Models
1.Data Abstraction:
The main purpose of data abstraction is to hide irrelevant data and provide an abstract
view of the data. With the help of data abstraction, developers hide irrelevant data from the user
and provide them the relevant data.
Levels of Abstraction:
• Physical Level
• Logical Level
• View Level
Physical Level:
The lowest level of abstraction describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
Logical Level:The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data.
View Level:
• The highest level of abstraction describes only part of the entire database.
• Even though the logical level uses simpler structures, complexity remains because of the
variety of information stored in a large database.
• Many users of the database system do not need all this information; instead, they need to
access only a part of the database.
• The view level of abstraction exists to simplify their interaction with the system.
Data-Definition Language:
The data values stored in the database must satisfy certain consistency constraints.For example,
suppose the university requires that the account balance of a departmentmust never be negative.
The DDL provides facilities to specify such constraints.
Domain Constraints:A domain of possible values must be associated with every
attribute (for example, integer types, character types, date/time types).Declaring an attribute to
be of a particular domain acts as a constraint on the values that itcan take. Domain constraints
are themost elementary form of integrity constraint. They are tested easily by the system
whenever a new data item is entered into thedatabase.
Referential Integrity:There are cases where we wish to ensure that a value that appears
in one relation for a given set of attributes also appears in a certain setof attributes in another
relation (referential integrity).
Authorization:We may want to differentiate among the users as far as the type ofaccess
they are permitted on various data values in the database. These differentiations are expressed in
terms of authorization, the most common being: read authorization, which allows reading, but
not modification, of data; insert authorization,which allows insertion of new data, but not
modification of existing data; update authorization, which allows modification, but not deletion,
of data; and delete authorization,which allows deletion of data
Data-Manipulation Language:
Data-manipulation language (DML) is a language that enables users to access or
manipulatedata as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database.
• Insertion of new information into the database.
• Deletion of information from the database.
• Modification of information stored in the database.
Database Design:
• Database design mainly involves the design of the database schema
• A high-level data model provides the database designer with a conceptual framework in
which to specify the data requirements of the database users and how the database will be
structured to fulfill these requirements
• The initial phase of database design, then, is to characterize fully the data needs of the
prospective database users.
• The database designer needs to interact extensively with domain experts and users to
carry out this task. The outcome of this phase is a specification of user requirements.
• Next, the designer chooses a data model, and by applying the concepts of the chosen data
model, translates these requirements into a conceptual schema of the database.
• The schema developed at this conceptual-design phase provides a detailed overview of the
enterprise.
• The designer reviews the schema to confirm that all data requirements are indeed
satisfied and are not in conflict with one another. The designer can also examine the
design to remove any redundant features.
Two types of designer
logical database designers.
physical database designers.
logical database designers:
Identifying the data (that is, the entities and attributes), the relationships between the
data, and the constraints on the data that is to be stored in the database.(What data)
Student(Stud_Rno,Name,DoB)
Staff(Staff_Rno,Stud_Rno,Course_Taught)
physical database designers:
• Decides how the logical database design is to be physically realized.(how data)
• Selecting specific storage structures and access methods for the data to achieve good
performance; Eg:Rno-10 digits
• Designing any security measures required on the data.
Transaction Management:
several operations on the database form a single logical unit of work.
An example is a funds transfer in which one account A is debited and another account B is
credited. Clearly, it is essential that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must happen in its entirety or not at all. This all-or-none requirement
is called atomicity. In addition, it is essential that the execution of the funds transfer preserves the
consistency of the database. That is, the value of the sum of the balances of A and B must be
preserved. This correctness requirement is called consistency. Finally, after the successful
execution of a funds transfer, the new values of the balances of accounts A and B must persist,
despite the possibility of system failure. This persistence requirement is called durability.
A transaction is a collection of operations that performs a single logical function in a database
application. Each transaction is a unit of both atomicity and consistency.
Ensuring the atomicity and durability properties is the responsibility of the database system
itself—specifically, of the recovery manager. In the absence of failures, all transactions complete
successfully, and atomicity is achieved easily.
when several transactions update the database concurrently, the consistency of data may no
longer be preserved, even though each individual transaction is correct. It is the responsibility of
the concurrency-control manager to control the interaction among the concurrent
transactions, to ensure the consistency of the database.
Architecture:
Computer system on which the database system runs. Database systems can be centralized, or
client-server, where one server machine executes work on behalf of multiple client
machines.
Database systems can also be designed to exploit parallel _x0002_ computer architectures.
Distributed databases span multiple geographically separated machines.
Earlier-generation database applications used a two-tier architecture, wherethe application
resides at the client machine, and invokes database system functionality at the server machine
through query language statements. In contrast, modern database applications use a three-tier
architecture, where the client machine acts asmerely a front end and does not contain any direct
database calls; web browsers and mobile applications are the most commonly used application
clients today. The front end communicates with an application server. The application server, in
turn, communicates with a database system to access data. The business logic of the application,
which says what actions to carry out under what conditions, is embedded in the application
server, instead of being distributed across multiple clients.
Na¨ıve users are unsophisticated users who interact with the system by using predefined
user interfaces, such as web ormobile applications
Application programmers are computer professionals who write application
programs.Application programmers can choose from many tools to develop user interface
Sophisticated users interact with the system without writing programs. Instead, they
form their requests either using a database query language or by using toolssuch as data
analysis software. Analysts who submit queries to explore data in thedatabase fall in this
category.
Database Administrator:
One of the main reasons for using DBMSs is to have central control of both the data
and the programs that access those data. A person who has such central control over
the system is called a database administrator (DBA).
The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing aset of
data definition statements in the DDL.
• Storage structure and access-method definition. The DBA may specify some
parameters pertaining to the physical organization of the data and the indices to be
created.
• Schema and physical-organization modification. The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization,or to alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the database
varioususers can access. The authorization information is kept in a special system
structurethat the database system consults whenever a user tries to access the
data inthe system.
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
• Periodically backing up the database onto remote servers, to prevent loss ofdata in
case of disasters such as flooding.
• Ensuring that enough free disk space is available for normal operations,
andupgrading disk space as required.
• Monitoring jobs running on the database and ensuring that performance is
notdegraded by very expensive tasks submitted by some users.
Relational Model
Terminology:
• Relational Model represents how data is stored in Relational Databases.
• It is the second generation of DBMS and is based on the relational data model.
• In the relational model, all data is logically structured withinrelations(tables)
• Each relation has a name and is made up of named attributes (columns) of data.
• Each tuple (row) contains one value per attribute.
• Popular examples of standard relational db include Microsoft SQL Server, Oracle Database,
MySQL and IBM DB2.
Keys:
• A superkey is a set of one or more attributes that, taken collectively, allow us to identify
uniquely a tuple in the relation.For example, the ID attribute of the relation instructor is
sufficient to distinguish one instructor tuple from another. Thus, ID is a superkey. The
name attribute of instructor, on the other hand, is not a superkey, because several
instructors might have the same name.
• A superkey may contain extraneous attributes. For example, the combination of ID and
name is a superkey for the relation instructor. If K is a superkey, then so is anysuperset of
K. We are often interested in superkeys for which no proper subset is a superkey. Such
minimal superkeys are called candidate keys.
• The term primary key to denote a candidate key that is chosen by the database designer
as the principal means of identifying tuples within a relation. A key (whether primary,
candidate, or super) is a property of the entire relation, rather than of the individual
tuples. Any two individual tuples in the relation are prohibited from having the same value
on the key attributes at the same time. The designation of a key represents a constraint in
the real-world enterprise being modeled. Thus, primary keys are also referred to as
primary key constraints.
• A foreign-key constraint from attribute(s) A of relation r1 to the primary-key B of
relation r2 states that on any database instance, the value of A for each tuple in r1 must
also be the value of B for some tuple in r2. Attribute set A is called a foreign key from r1
referencing r2. The relation r1 is also called the referencing relation of the foreign-key
constraint, and r2 is called the referenced relation.
Integrity Constraints:
Integrity constraints ensure that changes made to the database by authorized users do
not result in a loss of data consistency. Thus, integrity constraints guard against accidental
damage to the database. This is in contrast to security constraints, which guard against access to
the database by unauthorized users.
Unique Constraint:
SQL also supports an integrity constraint:
unique (Aj1, Aj2,…, Ajm)
The unique specification says that attributes Aj1, Aj2,…,Ajm form a superkey; that is, no
two tuples in the relation can be equal on all the listed attributes. However, attributes declared as
unique are permitted to be null unless they have explicitly been declared to be not null. Recall
that a null value does not equal any other value.
Here, we use the check clause to simulate an enumerated type by specifying that semester
must be one of 'Fall', 'Winter', 'Spring', or 'Summer'. Thus, the check clause permits attribute
domains to be restricted in powerful ways that most programminglanguage type systems do not
permit. Null values present an interesting special case in the evaluation of a check clause. A check
clause is satisfied if it is not false, so clauses that evaluate to unknown are not violations. If null
values are not desired, a separate not null constraint must be specified.
Referential Integrity:
Ensure that a value that appears in one relation (the referencing relation) for a given set of
attributes also appears for a certain set of attributes in another relation (the referenced
relation)such conditionsare called referential integrity constraints, and foreign keys are a form
of a referential integrity constraint where the referenced attributes form a primary key of the
referenced relation. Foreign keys can be specified as part of the SQL create table statement by
using the foreign key clause,
“foreign key (dept name) references department”.
Schema Diagrams:
A database schema, along with primary key and foreign-key constraints, can be depicted
by schema diagrams.The schema diagram for our universityorganization. Each relation appears
as a box, with the relation name at the top in blueand the attributes listed inside the box.Primary-
key attributes are shown underlined. Foreign-key constraints appear asarrows from the foreign-
key attributes of the referencing relation to the primary key ofthe referenced relation. We use a
two-headed arrow, instead of a single-headed arrow,to indicate a referential integrity constraint
that is not a foreign-key constraint. The line with a two-headed arrow from time slot id in the
section relation totime slot id in the time slot relation represents the referential integrity
constraint fromsection.time slot id to time slot.Many database systems provide design tools with
a graphical user interface forcreating schema diagrams.2 We shall discuss a different
diagrammatic representationof schemas, called the entity-relationship diagram, at
length.althoughthere are some similarities in appearance, these two notations are quite different,
and should not be confused for one another.
Relational operations:
The relational algebra consists of a set of operations that take one or two relations asinput and
produce a new relation as their result. Some of these operations, such as the select, project, and
rename operations, are called unary operations because they operate on one relation. The other
operations,
such as union, Cartesian product, and set difference, operate on pairs of relations and are,
therefore, called binary operations. Although the relational algebra operations form the basis for
the widely used SQL query language, database systems do not allow users to write queries in
relational algebra.
In general, we allow comparisons using =, ≠, <, ≤, >, and ≥ in the selection predicate. Furthermore,
we can combine several predicates into a larger predicate by using the connectives and (∧), or
(∨), and not (¬). Thus, to find the instructors in Physics with a salary greater than $90,000, we
write:
Instead, the predicate in the where clause is used to restrict the combinations created by the
Cartesian product to those that are meaningful for the desired answer.
We would likely want a query involving instructor and teaches to combine a particular tuplet in
instructor with only those tuples in teaches that refer to the same instructor to which t refers.
That is, we wish only to match teaches tuples with instructor tuples that have the same ID value.
The following SQL query ensures this condition and outputs the instructor name and course
identifiers from such matching tuples.
select name, course id
from instructor, teaches
where instructor.ID= teaches.ID;
The Rename Operation:
Consider again the query that we used earlier:
select name, course id
from instructor, teaches
where instructor.ID= teaches.ID;
The result of this query is a relation with the following attributes:
name, course id
The names of the attributes in the result are derived from the names of the attributes in the
relations in the from clause.
Set Operations:
The SQL operations union, intersect, and except operate on relations and correspond to the
mathematical set operations ∪, ∩, and −.We shall now construct queries involving the union,
intersect, and except operations over two sets.
• The set of all courses taught in the Fall 2017 semester:
select course id
from section
where semester = 'Fall' and year= 2017;
• The set of all courses taught in the Spring 2018 semester:
select course id
from section
where semester = 'Spring' and year= 2018;
(select course idfrom section where semester = 'Fall' and year= 2017)union
(select course id from section where semester = 'Spring' and year= 2018);
The union operation automatically eliminates duplicates, unlike the select clause.
The Intersect Operation:
To find the set of all courses taught in both the Fall 2017 and Spring 2018, we write:
(select course id from section where semester = 'Fall' and year= 2017)intersect
(select course id from section where semester = 'Spring' and year= 2018);
Set difference:
The Set difference operation defines a relation consisting of the tuples that are in relation R, but
not in S. R and S must be union-compatible
Aggregate Functions:
Aggregate functions are functions that take a collection (a set or multiset) of values as
input and return a single value. SQL offers five standard built-in aggregate functions:9
Basic Aggregation:
Consider the query “Find the average salary of instructors in the Computer Science
department.” We write this query as follows:
selectavg (salary)
from instructor
wheredept name = 'Comp. Sci.';
The result of this query is a relation with a single attribute containing a single tuple with a
numerical value corresponding to the average salary of instructors in the Computer Science
department
TheHavingClause:
At times, it is useful to state a condition that applies to groups rather than to tuples. For
example, we might be interested in only those departments where the average salary of the
instructors is more than $42,000. This condition does not apply to a single tuple; rather, it applies
to each group constructed by the group by clause. To express such a query, we use the having
clause of SQL. SQL applies predicates in the having clause after groups have been formed, so
aggregate functions may be used in the having clause.
We express this query in SQL as follows:
selectdept name, avg (salary) as avg salary
from instructor
group by dept name
havingavg(salary)>42000;
Join Expressions:
• The Join operation, which combines two relations to form a new relation, is one of the
essential operations in the relational algebra.
• There are various forms of the Join operation
Theta join
Equijoin (a particular type of Theta join)
Natural join
Outer join
Semijoin
Outer join:
• Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
• The (left) Outer join is a join in which tuples from R that do not have matching values in
the common attributes of S are also included in the result relation.
• Missing values in the second relation are set to null.
Semi join:
The Semijoin operation defines a relation that contains the tuples of R .R that participate in the
join of R with S satisfying the predicate F.
Division Operation:
• The Division operation defines a relation over the attributes C that consists of the set of
tuples from R that match the combination of every tuple in S.
ER Modeling:
ER modeling is a top-down approach to database design that begins by identifying the important
data called entities and relationships between the data that must be represented in the model.
We then add more details, such as the information we want to hold about the entities and
relationships called attributes and any constraints on the entities, relationships, and attributes.
Entity:
Any thing that has an independent existence and about which we collect data. It is also known
as entity type. In ER modeling, notation for entity is given below.
Entity instance:
Entity instance is a particular member of the entity type.
Example for entity instance : A particular employee Regular Entity
Weak entity:
An entity which depends on other entity for its existence and doesn't have any key
attribute of its own is a weakentity.
Example for a weak entity : In a parent/child relationship, a parent is considered as a
strong entity and the child is a weak entity.
In ER modeling, notation for weak entity is given below.
Attributes:
Properties/characteristics which describe entities are called attributes. In ER modeling, notation
for attribute is given below.
Domain of Attributes:
The set of possible values that an attribute can take is called the domain of the attribute. For
example, the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence
this set can be termed as the domain of the attribute day.
Key attribute
The attribute (or combination of attributes) which is unique for every entity instance is
called key attribute.
E.g the employee_id of an employee, pan_card_number of a person etc.If the key attribute
consists of two or more attributes in combination, it is called a composite key.
In ER modeling, notation for key attribute is given below.
Simple attribute:
If an attribute cannot be divided into simpler components, it is a simple attribute.
Example for simple attribute :employee_id of an employee.
Composite attribute:
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute : Name of the employee which can be split into
First_name, Middle_name, and Last_name.
Multi-valued Attributes:
If an attribute can take more than one value for each entity instance, it is a multi-valued
attribute.
Multi-valuedexample for multi valued attribute : telephone number of an employee, a
particular employee may have multiple telephone numbers.
In ER modeling, notation for multi-valued attribute is given below.
Derived Attribute:
An attribute which can be calculated or derived based on other attributes is a derived
attribute.Example for derived attribute : age of employee which can be calculated from date of
birth and current date. In ER modeling, notation for derived attribute is given below.
Relationships:
Associations between entities are called relationships
Example : An employee works for an organization. Here "works for" is a relation between
the entities employee and organization.
In ER modeling, notation for relationship is given below.
However in ER Modeling, To connect a weak Entity with others, you should use a weak
relationship notation as given below.
Degree of a Relationship:
Degree of a relationship is the number of entity types involved. The n-ary relationship
is the general form for degree n. Special cases are unary, binary, and ternary ,where the
degree is 1, 2, and 3, respectively.
Example for unary relationship : An employee ia a manager of another employee
Example for binary relationship : An employee works-for department.
Example for ternary relationship : customer purchase item from a shop keeper
Cardinality of a Relationship:
Relationship cardinalities specify how many of each entity type is allowed.
Relationships can have four possible connectivities as given below.
1. One to one (1:1) relationship
2. One to many (1:N) relationship
3. Many to one (M:1) relationship
4. Many to many (M:N) relationship
The minimum and maximum values of this connectivity is called the cardinality of the
relationship
One employee is assigned with only one parking space and one parking space is assigned to
only one employee. Hence it is a 1:1 relationship and cardinality is One-To-One (1:1).In ER
modeling, this can be mentioned using notations as given below
One organization can have many employees , but one employee works in only one organization.
Hence it is a 1:N relationship and cardinality is One-To-Many (1:N).In ER modeling, this can be
mentioned using notations as given below
Example for Cardinality – Many-to-One (M :1)
It is the reverse of the One to Many relationship. employee works in organization.
One employee works in only one organization But one organization can have many employees.
Hence it is a M:1 relationship and cardinality is Many-to-One (M :1)
In ER modeling, this can be mentioned using notations as given below.
Relationship Participation:
1. Total
In total participation, every entity instance will be connected through the relationship to
another instance of the other participating entity types
2. Partial
Example for relationship participation
Consider the relationship - Employee is head of the department.
Here all employees will not be the head of the department. Only one employee will be the head of
the department. In other words, only few instances of employee entity participate in the above
relationship. So employee entity's participation is partial in the said relationship. However each
department will be headed by some employee. So department entity's participation is total in the
said relationship.
Extended ER Features:
Specialization
Top-down design process; we designate sub-groupings within an entity set that are
distinctive from other entities in the set.
These sub-groupings become lower-level entity sets that have attributes or participate in
relationships that do not apply to the higher-level entity set.
Depicted by a trianglecomponent labeled ISA (e.g., instructor“is a” person).
Attribute inheritance–a lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.
Generalization:
A bottom-up design process–combine a number of entity sets that share the same
features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are
represented in an E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.
Design Process:
Initial phase -- characterize fully the data needs of the prospective database users.
Second phase -- choosing a data model
Applying the concepts of the chosen data model
Translating these requirements into a conceptual schema of the database.
A fully developed conceptual schema indicates the functional requirements of
the enterprise.
Describe the kinds of operations (or transactions) that will be performed on the data.
Final Phase -- Moving from an abstract data model to the implementation of the database
• Logical Design – Decidingon the database schema.
Database design requires that we find a “good” collection of relation
schemas.
Business decision – What attributes should we record in the database?
Computer Science decision – What relation schemas should we have and
how should the attributes be distributed among the various relation
schemas?
• Physical Design – Deciding on the physical layout of the database
In designing a database schema, we must ensure that we avoid two major pitfalls:
• Redundancy: a bad design may result in repeat information.
Redundant representation of information may lead to data inconsistency
among the various copies of information
• Incompleteness: a bad design may make certain aspects of the enterprise difficult
or impossible to model.
Avoiding bad designs is not enough. There may be a large number of good designs from
which we must choose.
Structural Constraints:
The constraints should reflect the restrictions on the relationships as perceived in the “real
world.” Examples of such constraints include the requirements that a property for rent must have
an owner and each branch must have staff. The main type of constraint on relationships is called
multiplicity.
Multiplicity- The number (or range) of possible occurrences of an entity type that may relate to
a single occurrence of an associated entity type through a particular relationship.
the most common degree for relationships is binary. Binary relationships are generally referred
to as being one-to-one (1:1), one-tomany (1:*), or many-to-many (*:*).
We examine these three types of relationships using the following integrity constraints:
• a member of staff manages a branch (1:1);
• a member of staff oversees properties for rent (1:*);
• newspapers advertise properties for rent (*:*).
Multiplicity(complexrelationship):
The number (or range) of possible occurrences of an entity type in an n-ary relationship when
the other (n–1) values are fixed.
Cardinality: Describes the maximum number of possible relationship occurrences for an entity
participating in a given relationship type.
Composite attributes are flattened out by creating a separate attribute for each
component attribute
Example: given entity set instructorwith composite attribute namewith component
attributes first_nameand last_namethe schema corresponding to the entity set has two
attributes name_first_nameand name_last_name
Prefix omitted if there is no ambiguity (name_first_namecould be first_name
Ignoring multivalued attributes, extended instructor schema is instructor(ID, first_name,
middle_initial, last_name,street_number, street_name, apt_number, city, state, zip_code,
date_of_birth)
Example:
Representation of Entity Sets with Multivalued Attributes:
A multivalued attribute Mof an entity Eis represented by a separate schema EM
Schema EMhas attributes corresponding to the primary key of Eand an attribute
corresponding to multivalued attribute M
Example: Multivalued attribute phone_numberof instructoris represented by a
schema:inst_phone= (ID, phone_number)
Each value of the multivalued attribute maps to a separate tuple of the relation on schema
EM
For example, an instructorentity with primary key 22222 and phone numbers 456-7890
and 123-4567 maps to two tuples: (22222, 456-7890) and (22222, 123-4567)
Redundancy of Schemas
Many-to-one and one-to-many relationship sets that are total on the many-side can be
represented by adding an extra attribute to the “many” side, containing the primary key of
the “one” side
Example: Instead of creating a schema for relationship set inst_dept, add an attribute
dept_nameto the schema arising from entity set instructor
For one-to-one relationship sets, either side can be chosen to act as the “many” side
That is, an extra attribute can be added to either of the tables corresponding to the two
entity sets
If participation is partial on the “many” side, replacing a schema by an extra attribute in
the schema corresponding to the “many” side could result in null values
The schema corresponding to a relationship set linking a weak entity set to its identifying
strong entity set is redundant.
Example: The section schema already contains the attributes that would appear in
sec_course schema
Design Issues:
Common Mistakes in E-R Diagram:
Entities vs. Attributes:
Use of entity sets vs. attributes
Use of phone as an entity allows extra information about phone numbers (plus multiple
phone numbers)
Entities Vs Relationships:
Use of entity sets vs. relationship sets:Possible guideline is to designate a relationship set
to describe an action that occurs between entities
Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later
standards and special proprietary features. Not all examples here may work on your particular
system.
SQL Parts:
DML --provides the ability to query information from the database and to insert tuples
into, delete tuples from, and modify tuples in the database.
integrity –the DDL includes commands for specifying integrity constraints.
View definition --The DDL includes commands for defining views.
Transaction control –includes commands for specifying the beginning and ending of
transactions.
Embedded SQL and dynamic SQL --define how SQL statements can be embedded within
general-purpose programming languages.
Authorization –includes commands for specifying access rights to relations and views.
Ai represents an attribute Rirepresents a relation Pis a predicate. The result of an SQL query is a
relation.
The select Clause:
o The selectclause lists the attributes desired in the result of a query
corresponds to the projection operation of the relational algebra
Example: find the names of all instructors:
select namefrom instructor
o NOTE: SQL names are case insensitive (i.e., you may use upper-or lower-case letters.) E.g.,
Name≡ NAME≡ name
Some people use upper case wherever we use bold font.
o SQL allows duplicates in relations as well as in query results.
o To force the elimination of duplicates, insert the keyword distinctafter select.
o Find the department names of all instructors, and remove duplicates
select distinct dept_namefrom instructor
o The keyword all specifies that duplicates should not be removed.
select alldept_namefrom instructor
o An asterisk in the select clause denotes “all attributes”
select *from instructor
o An attribute can be a literal with no from clause
select '437'
o Results is a table with one column and a single row with value “437”
o Can give the column a name using:
select '437' as FOO
o An attribute can be a literal with from clause
select 'A'from instructor
o Result is a table with one column and Nrows (number of tuples in the instructorstable),
each row with value “A”
Examples:
String Operations:
SQL includes a string-matching operator for comparisons on character strings. The
operator likeuses patterns that are described using two special characters:
o percent ( % ). The % character matches any substring.
o underscore ( _ ). The _ character matches any character.
Find the names of all instructors whose name includes the substring “dar”.
o select namefrom instructorwherename like '%dar%'
Match the string “100%”
o like '100 \%' escape '\'
in that above we use backslash (\) as the escape character.
o Patterns are case sensitive.
o Pattern matching examples:
'Intro%' matches any string beginning with “Intro”.
'%Comp%' matches any string containing “Comp” as a substring.
'_ _ _' matches any string of exactly three characters.
'_ _ _ %' matches any string of at least three characters.
Set Operations:
Find courses that ran in Fall 2017 or in Spring 2018
(selectcourse_idfrom section where sem= 'Fall' and year
=2017)union(selectcourse_idfrom section where sem= 'Spring' and year = 2018)
Find courses that ran in Fall 2017 but not in Spring 2018
(selectcourse_idfrom section where sem= 'Fall' and year =
2017)except(selectcourse_idfrom section where sem= 'Spring' and year = 2018)
Null Values:
It is possible for tuples to have a null value, denoted by null, for some of their attributes
nullsignifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving nullis null
o Example: 5 + nullreturns null
The predicate is nullcan be used to check for null values.
Example: Find all instructors whose salary is null.
selectnamefrominstructorwhere salary is null
The predicate is not null succeeds if the value on which it is applied is not null.
SQL treats as unknownthe result of any comparison involving a null value (other than
predicates is null and is not null).Example: 5 < nullornull<> nullornull= null
The predicate in a whereclause can involve Boolean operations (and, or, not); thus the
definitions of the Boolean operations need to be extended to deal with the value unknown.
o and :(trueand unknown) = unknown, (falseand unknown) = false,(unknown
andunknown) = unknown
o or: (unknownortrue) = true,(unknownorfalse) = unknown(unknown orunknown) =
unknown
Result of where clause predicate is treated as false if it evaluates to unknown
Aggregate Functions:
These functions operate on the multiset of values of a column of a relation, and return a value
o avg: average value
o min: minimum value
o max: maximum value
o sum: sum of values
o count: number of values
Find the average salary of instructors in the Computer Science department select
avg(salary)from instructorwhere dept_name= 'Comp. Sci.';
Find the total number of instructors who teach a course in the Spring 2018 semesterselect count
(distinct ID)from teacheswhere semester = 'Spring' and year = 2018;
Find the number of tuples in the course relationselect count (*)from course;
Group By:
Find the average salary of instructors in each departmentselect dept_name, avg(salary)
asavg_salaryfrom instructorgroup by dept_name;
Nested Subqueries:
o SQL provides a mechanism for the nesting of subqueries. A subqueryis a select-from-
whereexpression that is nested within another query.
o The nesting can be done in the following SQL query
select A1, A2, ..., An f
rom r1, r2, ..., rm
where P as follows:
From clause: rican be replaced by any valid subquery
Where clause: Pcan be replaced with an expression of the form:
B<operation> (subquery)
B is an attribute and <operation> to be defined later.
Select clause:
Ai can be replaced be a subquery that generates a single value.
Joined Relations:
o Join operationstake two relations and return as a result another relation.
o A join operation is a Cartesian product which requires that tuples in the two relations
match (under some condition). It also specifies the attributes that are present in the result
of the join
o The join operations are typically used as subquery expressions in the from clause
o Three types of joins:
Natural join
Inner join
Outer join
Unique Constraints :
unique( A1, A2, …, Am)The unique specification states that the attributes A1, A2, …, Am
form a candidate key.
Candidate keys are permitted to be null (in contrast to primary keys).
Referential Integrity:
Ensures that a value that appears in one relation for a given set of attributes also appears
for a certain set of attributes in another relation.
Example: If “Biology” is a department name appearing in one of the tuples in the
instructorrelation, then there exists a tuple in the departmentrelation for “Biology”.
Let A be a set of attributes. Let R and S be two relations that contain attributes A and
where A is the primary key of S. A is said to be a foreign keyof R if for any values of A
appearing in R these values also appear in S.
Foreign keys can be specified as part of the SQL createtable statement
foreign key (dept_name) references department
By default, a foreign key references the primary-key attributes of the referenced table.
SQL allows a list of attributes of the referenced relation to be specified explicitly.
foreign key (dept_name) references department (dept_name)
Authorization:
Privileges in SQL:
select: allows read access to relation, or the ability to query using the view
Example: grant users U1, U2, and U3selectauthorization on theinstructorrelation:
grant select on instructor to U1, U2, U3
insert: the ability to insert tuples
update: the ability to update using the SQL update statement
delete: the ability to delete tuples.
all privileges: used as a short form for all the allowable privileges
Advanced SQL
Accessing SQL from a Programming Language:
Not all queries can be expressed in SQL, since SQL does not provide the full expressive
power of a general-purpose language.
Non-declarative actions --such as printing a report, interacting with a user, or sending the
results of a query to a graphical user interface --cannot be done from within SQL.
A database programmer must have access to a general-purpose programming language for
at least two reasons
general-purpose program --can connect to and communicate with a database server using
a collection of functions
Embedded SQL --provides a means by which a program can interact with a database
server.
The SQL statements are translated at compile time into function calls.
At runtime, these function calls connect to the database using an API that provides
dynamic SQL facilities.
There are two approaches to accessing SQL from a general-purpose programming
language
Table Functions:
The SQL standard supports functions that can return tables as results; such functions are
called table functions
Example: Return all instructors in a given department
createfunctioninstructor_of(dept_namechar(20))
returnstable (
ID varchar(5),namevarchar(20),dept_namevarchar(20),salarynumeric(8,2))
returntable(selectID, name, dept_name,
salaryfrominstructorwhereinstructor.dept_name= instructor_of.dept_name)
Usage
select *from table (instructor_of('Music'))
For loop
Permits iteration over all results of a query
Example: Find the budget of all departmentsdeclare n integer default 0;for r asselect
budget from department where dept_name= 'Music' doset n = n + r.budget end for
Triggers:
A triggeris a statement that is executed automatically by the system as a side effect of a
modification to the database.
To design a trigger mechanism, we must:Specify the conditions under which the trigger is
to be executed.
Specify the actions to be taken when the trigger executes.
Triggers introduced to SQL standard in SQL:1999, but supported even earlier using non-
standard syntax by most databases.Syntax illustrated here may not work exactly on your
database system; check the system manuals
SQL vs NoSQL:
SQL NOSQL
Databases are categorized as Relational Database NoSQL databases are categorized as Non-
Management System (RDBMS). relational or distributed database system.
SQL databases have fixed or static or predefined NoSQL databases have dynamic schema.
schema.
SQL databases display data in form of tables so it NoSQL databases display data as collection of
is known as table-based database. key-value pair, documents, graph databases
or wide-column stores.
SQL databases are vertically scalable. NoSQL databases are horizontally scalable.
SQL databases use a powerful language In NoSQL databases, collection of documents
"Structured Query Language" to define and are used to query the data. It is also called
manipulate the data. unstructured query language. It varies from
database to database.
SQL databases are best suited for complex queries. NoSQL databases are not so good for complex
queries because these are not as powerful as
SQL queries.
SQL databases are not best suited for hierarchical NoSQL databases are best suited for
data storage. hierarchical data storage.
MySQL, Oracle, Sqlite, PostgreSQL and MS-SQL etc. MongoDB, BigTable, Redis, RavenDB,
are the example of SQL database. Cassandra, Hbase, Neo4j, CouchDB etc. are the
example of nosql database