KEMBAR78
Module - 1 | PDF | Databases | Relational Model
0% found this document useful (0 votes)
31 views54 pages

Module - 1

The Database Systems course (23CSI301) aims to teach students database fundamentals, design, and SQL for manipulation. It covers topics such as DBMS foundations, relational models, normalization, query processing, and advanced database concepts, along with practical experiments. By the end of the course, students will be able to design ER models, formulate optimized queries, and build fault-tolerant databases.

Uploaded by

hariharanstu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views54 pages

Module - 1

The Database Systems course (23CSI301) aims to teach students database fundamentals, design, and SQL for manipulation. It covers topics such as DBMS foundations, relational models, normalization, query processing, and advanced database concepts, along with practical experiments. By the end of the course, students will be able to design ER models, formulate optimized queries, and build fault-tolerant databases.

Uploaded by

hariharanstu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 54

Course Title: Database Systems

Course Code: 23CSI301


(Common to CS & AD)

Course Category: Professional Core Course Level: Practice

L:T:P(Hours/Week)
Credits:4 Total Contact Hours: 75 Max Marks: 100
3:0:2
Course Objective:
The course is intended to impart knowledge on database fundamentals, develop skills in
designing databases and apply SQL for database manipulation.

Module I 22 Hours
Foundations of DBMS: File System versus Database approach – Database applications
– View of Data – Database Languages (DDL, DCL, DML, TCL) – Database Design – Data
storage and querying – Architecture –Database Users and Administrators.
Relational Model: Terminology – Structure of Relational Database – Keys – Integrity
Constraints – Schema Diagrams – Relational operations.ER Modeling: Design Process –
Entity Types – Relationship Types – Attributes – Structural Constraints – Reduction to
Relational Schemas – Design Issues.
SQL Data Manipulation: Overview of Query Language – Data Types – Data Definition –
SQL Queries – Aggregate functions – Nested Queries – Joins – Views – Integrity
Constraints – Authorization.
Advanced SQL: SQL Programming Language – Functions and procedures – Cursors –
Triggers – Accessing SQL from a Programming Language – SQL vs NoSQL.

Module II 23 Hours
Normalization: Purpose – Data Redundancy and Update Anomalies – Functional
Dependencies – Normalization Process – 1NF, 2NF, 3NF, BCNF.
Data Storage: Storage Media – RAID – Database Buffer – Indexing and Hashing.
Query Processing: Query Decomposition – Cost Estimation – Query Optimization.
Transaction and Concurrency Control: Transaction properties – Locking methods –
Deadlock – Timestamp Methods – Validation Protocols – Consistency – Granularity.
Recovery System: Failure Classification – Recovery facilities – Recovery Techniques.
Introduction to Advanced Database concepts: Document database – Graph QL –
Database Optimization.

List of Experiments: 30 Hours

1. Design databases using ER modeling.


2. Create and modify database tables using DDL commands and
manipulate table data using DML commands.
3. Implement Joins and nesting concept for complex queries.
4. Implement Functions and procedures using advanced SQL.
5. Create Cursors and Triggers using SQL programming.
6. Access database through JDBC connectivity

7
Course Outcomes Cognitive
At the end of this course, students will be able to: Level
CO1: Design ER models using various constructs to simulate the real world Apply
databases.
CO2: Formulate structured and optimized queries to manipulate databases. Apply
CO3: Investigate the dependencies in a database and normalize to
Analyze
appropriate level.
CO4: Compare and contrast the various locking facilities to perform
Evaluate
concurrent transactions on databases.
CO5: Build fault tolerant optimized databases using SQL by analyzing the
various database functionalities as an individual or team for real Create
world applications.

Text Book(s):
T1. A Silberschatz, H Korth, S Sudarshan, “Database System Concepts”, 7th Edition,
McGraw- Hill, 2019.
T2. Thomas Connolly, Carolyn Begg, “Database Systems: A Practical Approach to
Design, Implementation and Management”, 6th Edition, Pearson Education, 2015.
Reference Book(s):
R1. Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems”,
7th Edition, Pearson Education, 2017.
R2. C.J. Date, A. Kannan and S. Swamynathan, “An Introduction to “Database Systems”,
8th Edition, Pearson Education, 2006.
Web References:
1. Introduction to Database Systems:
http://www.inf.unibz.it/~nutt/IDBs1011/idbs-slides.html
2. NPTEL lecture videos and notes:
https://onlinecourses.nptel.ac.in/noc23_cs79/
3. SQL practice exercises with solutions:
https://www.w3resource.com/sql-exercises/

Course Articulation Matrix

CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2

CO1 - - 3 - - - - - - - - - - -

CO2 3 - - - - - 2 - - - - - - -

CO3 - - - 3 - - - - - - - - - -

CO4 - 3 - - - - - - - - - - - -

CO5 - - - - 3 1 - - 3 3 - - 3 -
High-3; Medium-2; Low-1

8
Dr.Mahalingam College of Engineering and Technology,Pollachi
Department of Computer Science and Engineering
23CSI301-Database Systems

Module 1
Foundations of DBMS
File System versus Database approach:
File Management System Database Management System
File System is a general, easy-to-use system Database management system is used when
to store general files which require less security constraints are high.
security and constraints.
Data Redundancy is more in file Data Redundancy is less in database
management system. management system
Data Inconsistency is more in file system Data Inconsistency is less in database
management system
Centralisation is hard to get when it comes Centralisation is achieved in Database
to File Management System Management System
User locates the physical address of the files In Database Management System, user is
to access data in File Management System. unaware of physical address where data is
stored.

Security is low in File Management System. Security is high in Database Management System

File Management System stores Database Management System stores structured


unstructured data as isolated data data which have well defined constraints and
files/entities. interrelation.

Database applications:
The earliest database systems arose in the 1960s in response to the computerized
management of commercial data.
Database systems are used to manage collections of data that:
• are highly valuable,
• are relatively large, and
• are accessed by multiple users and applications, often at the same time.

The first database applications had only simple, precisely formatted, structureddata.
Today, database applications may include data with complex relationships and a more variable
structure. As an example of an application with structured data, consider a university’s records
regarding courses, students, and course registration. The university keeps the same type of
information about each course: course-identifier, title, department, course number, etc., and
similarly for students: student-identifier, name, address, phone, etc. Course registration is a
collection of pairs: one course identifier and one student identifier. Information of this sort has a
standard, repeating structure and is representative of the type of database applications that go
back to the 1960s.
Managing complexity is challenging, not only in the management of data but in any domain. Key
to the management of complexity is the concept of abstraction. Abstraction allows a person to use
a complex device or system without having to know thedetails of how that device or system is
constructed.
Here are some representative applications:
Enterprise Information
 Sales: For customer, product, and purchase information
 Accounting: For payments, receipts, account balances, assets, and other
accountinginformation.
 Human resources: For information about employees, salaries, payroll taxes, andbenefits,
and for generation of paychecks.
 Manufacturing: For management of the supply chain and for tracking productionof items
in factories, inventories of items in warehouses and stores, and orders foritems.s
Banking and Finance
 Banking: For customer information, accounts, loans, and banking transactions.
 Credit card transactions: For purchases on credit cards and generation ofmonthly
statements.
 Finance: For storing information about holdings, sales, and purchases of
financialinstruments such as stocks and bonds; also for storing real-time marketdata to
enable online trading by customers and automated trading by the firm.
Universities:
For student information, course registrations, and grades (in additionto standard enterprise
information such as human resources and accounting).
Airlines:
For reservations and schedule information. Airlines were among the firstto use databases in a
geographically distributed manner.
Telecommunication:
For keeping records of calls, texts, and data usage, generatingmonthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.

View of Data
• A database system is a collection of interrelated data and
• A set of programs that allow users to access and modify these data.
• A major purpose of a database system is to provide users with an abstract view of the data.
• That is, the system hides certain details of how the data are stored and maintained.
Three views of Data
• Data Abstraction
• Instances and Schemas
• Data Models
1.Data Abstraction:
The main purpose of data abstraction is to hide irrelevant data and provide an abstract
view of the data. With the help of data abstraction, developers hide irrelevant data from the user
and provide them the relevant data.
Levels of Abstraction:
• Physical Level
• Logical Level
• View Level
Physical Level:
 The lowest level of abstraction describes how the data are actually stored.
 The physical level describes complex low-level data structures in detail.
Logical Level:The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data.
View Level:
• The highest level of abstraction describes only part of the entire database.
• Even though the logical level uses simpler structures, complexity remains because of the
variety of information stored in a large database.
• Many users of the database system do not need all this information; instead, they need to
access only a part of the database.
• The view level of abstraction exists to simplify their interaction with the system.

Instances and Schemas:


In DBMS, the data is stored for a particular amount of time and is called an instance of the
database. The database schema defines the attributes of the database in the particular DBMS. The
value of the particular attribute at a particular moment in time is known as an instance of the
DBMS.
A database schema is a structure that represents the logical storage of the data in a
database. It represents the organization of data and provides information about the relationships
between the tables in a given database.
Data Models:
An integrated collection of concepts for describing and manipulating data, relationships
between data, and constraints on the data in an organization.
Data model comprising three components:
• A structural part -consisting of a set of rules according to which databases can be
constructed.
• A manipulative part -defining the types of operation that are allowed on the data .
• A set of integrity constraints- verifies the data accuracy.

Categories of Data Models:


• Object-oriented data model-Both the data and the relationship are contained into a
single structure that’s known as an object in this model
• Physical data model-database-specific model that represents relational data objects (for
example, tables, columns, primary and foreign keys) and their relationships.
• Relational data model-Relational model can represent as a table with columns and rows.
Each row is known as a tuple. Each table of the column has a name or attribute.
• Network data model-The main difference between this model and the hierarchical model
is that any record can have several parents in the network model. It uses a graph instead
of a hierarchical tree
• Hierarchical data model- The hierarchy begins at the root, which contains root data, and
then grows into a tree as child nodes are added to the parent node

Database Languages (DDL, DCL, DML, TCL):

A database system provides a data-definition language (DDL) to specify the database


schema and a data-manipulation language (DML) to express database queries and updates.

Data-Definition Language:
The data values stored in the database must satisfy certain consistency constraints.For example,
suppose the university requires that the account balance of a departmentmust never be negative.
The DDL provides facilities to specify such constraints.
Domain Constraints:A domain of possible values must be associated with every
attribute (for example, integer types, character types, date/time types).Declaring an attribute to
be of a particular domain acts as a constraint on the values that itcan take. Domain constraints
are themost elementary form of integrity constraint. They are tested easily by the system
whenever a new data item is entered into thedatabase.
Referential Integrity:There are cases where we wish to ensure that a value that appears
in one relation for a given set of attributes also appears in a certain setof attributes in another
relation (referential integrity).
Authorization:We may want to differentiate among the users as far as the type ofaccess
they are permitted on various data values in the database. These differentiations are expressed in
terms of authorization, the most common being: read authorization, which allows reading, but
not modification, of data; insert authorization,which allows insertion of new data, but not
modification of existing data; update authorization, which allows modification, but not deletion,
of data; and delete authorization,which allows deletion of data

Data-Manipulation Language:
Data-manipulation language (DML) is a language that enables users to access or
manipulatedata as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database.
• Insertion of new information into the database.
• Deletion of information from the database.
• Modification of information stored in the database.

There are basically two types of data-manipulation language:


• Procedural DMLs require a user to specify what data are needed and how to get
those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify
what data are needed without specifying how to get those data.

Transaction Control Language:


TCL commands can only use with DML commands like INSERT, DELETE and UPDATE only.
Here are some commands that come under TCL:
• COMMIT- Commit command is used to save all the transactions to the database.
• ROLLBACK- Rollback command is used to undo transactions that have not already been
saved to the database.
• SAVEPOINT- It is used to roll the transaction back to a certain point without rolling back
the entire transaction

Data Control Language:


DCL commands are used to grant and take back authority from any database user.

• Grant- It is used to give user access privileges to a database.


• Revoke- It is used to take back permissions from the user.

Database Design:
• Database design mainly involves the design of the database schema
• A high-level data model provides the database designer with a conceptual framework in
which to specify the data requirements of the database users and how the database will be
structured to fulfill these requirements
• The initial phase of database design, then, is to characterize fully the data needs of the
prospective database users.
• The database designer needs to interact extensively with domain experts and users to
carry out this task. The outcome of this phase is a specification of user requirements.
• Next, the designer chooses a data model, and by applying the concepts of the chosen data
model, translates these requirements into a conceptual schema of the database.
• The schema developed at this conceptual-design phase provides a detailed overview of the
enterprise.
• The designer reviews the schema to confirm that all data requirements are indeed
satisfied and are not in conflict with one another. The designer can also examine the
design to remove any redundant features.
Two types of designer
 logical database designers.
 physical database designers.
logical database designers:
 Identifying the data (that is, the entities and attributes), the relationships between the
data, and the constraints on the data that is to be stored in the database.(What data)
 Student(Stud_Rno,Name,DoB)
 Staff(Staff_Rno,Stud_Rno,Course_Taught)
physical database designers:
• Decides how the logical database design is to be physically realized.(how data)
• Selecting specific storage structures and access methods for the data to achieve good
performance; Eg:Rno-10 digits
• Designing any security measures required on the data.

Data storage and querying:


Storage Manager:
The storage manager is the component of a database system that provides the interfacebetween
the low-level data stored in the database and the application programs andqueries submitted to
the system. The storage manager is responsible for the interactionwith the file manager. The raw
data are stored on the disk using the file system providedby the operating system. The storage
manager translates the various DML statementinto low-level file-system commands.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicts.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. Like the index in this textbook, a database
index provides pointers to those data items that hold a particular value. For example, we could
use an index to find the instructor record with a particular ID, or all instructor records with a
particular name.

The Query Processor:


The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query-evaluation engine understands. A query
can usually be translated into any of a number of alternative evaluation plans that all give the
same result. The DML compiler also performs query optimization; that is, it picks the lowest cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.

Transaction Management:
several operations on the database form a single logical unit of work.
An example is a funds transfer in which one account A is debited and another account B is
credited. Clearly, it is essential that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must happen in its entirety or not at all. This all-or-none requirement
is called atomicity. In addition, it is essential that the execution of the funds transfer preserves the
consistency of the database. That is, the value of the sum of the balances of A and B must be
preserved. This correctness requirement is called consistency. Finally, after the successful
execution of a funds transfer, the new values of the balances of accounts A and B must persist,
despite the possibility of system failure. This persistence requirement is called durability.
A transaction is a collection of operations that performs a single logical function in a database
application. Each transaction is a unit of both atomicity and consistency.
Ensuring the atomicity and durability properties is the responsibility of the database system
itself—specifically, of the recovery manager. In the absence of failures, all transactions complete
successfully, and atomicity is achieved easily.
when several transactions update the database concurrently, the consistency of data may no
longer be preserved, even though each individual transaction is correct. It is the responsibility of
the concurrency-control manager to control the interaction among the concurrent
transactions, to ensure the consistency of the database.

Architecture:
Computer system on which the database system runs. Database systems can be centralized, or
client-server, where one server machine executes work on behalf of multiple client
machines.
Database systems can also be designed to exploit parallel _x0002_ computer architectures.
Distributed databases span multiple geographically separated machines.
Earlier-generation database applications used a two-tier architecture, wherethe application
resides at the client machine, and invokes database system functionality at the server machine
through query language statements. In contrast, modern database applications use a three-tier
architecture, where the client machine acts asmerely a front end and does not contain any direct
database calls; web browsers and mobile applications are the most commonly used application
clients today. The front end communicates with an application server. The application server, in
turn, communicates with a database system to access data. The business logic of the application,
which says what actions to carry out under what conditions, is embedded in the application
server, instead of being distributed across multiple clients.

Database Users and Administrators:

Database Users and User Interfaces:

 Na¨ıve users are unsophisticated users who interact with the system by using predefined
user interfaces, such as web ormobile applications
 Application programmers are computer professionals who write application
programs.Application programmers can choose from many tools to develop user interface
 Sophisticated users interact with the system without writing programs. Instead, they
form their requests either using a database query language or by using toolssuch as data
analysis software. Analysts who submit queries to explore data in thedatabase fall in this
category.
Database Administrator:
One of the main reasons for using DBMSs is to have central control of both the data
and the programs that access those data. A person who has such central control over
the system is called a database administrator (DBA).
The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing aset of
data definition statements in the DDL.
• Storage structure and access-method definition. The DBA may specify some
parameters pertaining to the physical organization of the data and the indices to be
created.
• Schema and physical-organization modification. The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization,or to alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the database
varioususers can access. The authorization information is kept in a special system
structurethat the database system consults whenever a user tries to access the
data inthe system.
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
• Periodically backing up the database onto remote servers, to prevent loss ofdata in
case of disasters such as flooding.
• Ensuring that enough free disk space is available for normal operations,
andupgrading disk space as required.
• Monitoring jobs running on the database and ensuring that performance is
notdegraded by very expensive tasks submitted by some users.

Relational Model
Terminology:
• Relational Model represents how data is stored in Relational Databases.
• It is the second generation of DBMS and is based on the relational data model.
• In the relational model, all data is logically structured withinrelations(tables)
• Each relation has a name and is made up of named attributes (columns) of data.
• Each tuple (row) contains one value per attribute.
• Popular examples of standard relational db include Microsoft SQL Server, Oracle Database,
MySQL and IBM DB2.

Relation: A relation is a table with columns and rows.


Attribute :An attribute is a named column of a relation.
Domain :A domain is the set of allowable values for one or more attributes.It can be considered
as data type
Tuple :A tuple is a row of a relation
Degree :The degree of a relation is the number of attributes it contains.
Cardinality: The cardinality of a relation is the number of tuples it contains.
Relational database: A collection of normalized relations with distinct relation names.

Structure of Relational Database:


• A relational database consists of a collection of tables, each of which is assigned a unique
name.
• A row in a table represents a relationship among a set of values
• In the relational model, all data is logically structured within relations (tables)
• Each relation has a name and is made up of named attributes (columns) of data
• Each tuple (row) contains one value per attribute.
• The term relation instance to refer to a specific instance of a relation, that
is, containing a specific set of rows.
• For each attribute of a relation, there is a set of permitted values, called the domain
of that attribute.
• A domain is atomic if elements of the domain are considered to be indivisible units.
• The null value is a special value that signifies that the value is unknown or does notexist.

Keys:
• A superkey is a set of one or more attributes that, taken collectively, allow us to identify
uniquely a tuple in the relation.For example, the ID attribute of the relation instructor is
sufficient to distinguish one instructor tuple from another. Thus, ID is a superkey. The
name attribute of instructor, on the other hand, is not a superkey, because several
instructors might have the same name.
• A superkey may contain extraneous attributes. For example, the combination of ID and
name is a superkey for the relation instructor. If K is a superkey, then so is anysuperset of
K. We are often interested in superkeys for which no proper subset is a superkey. Such
minimal superkeys are called candidate keys.
• The term primary key to denote a candidate key that is chosen by the database designer
as the principal means of identifying tuples within a relation. A key (whether primary,
candidate, or super) is a property of the entire relation, rather than of the individual
tuples. Any two individual tuples in the relation are prohibited from having the same value
on the key attributes at the same time. The designation of a key represents a constraint in
the real-world enterprise being modeled. Thus, primary keys are also referred to as
primary key constraints.
• A foreign-key constraint from attribute(s) A of relation r1 to the primary-key B of
relation r2 states that on any database instance, the value of A for each tuple in r1 must
also be the value of B for some tuple in r2. Attribute set A is called a foreign key from r1
referencing r2. The relation r1 is also called the referencing relation of the foreign-key
constraint, and r2 is called the referenced relation.

Integrity Constraints:

Integrity constraints ensure that changes made to the database by authorized users do
not result in a loss of data consistency. Thus, integrity constraints guard against accidental
damage to the database. This is in contrast to security constraints, which guard against access to
the database by unauthorized users.

Examples of integrity constraints are:

• An instructor name cannot be null.


• No two instructors can have the same instructor ID.
• Every department name in the course relation must have a matching department name in the
department relation.
• The budget of a department must be greater than $0.00

Constraints on a Single Relation:


The create table command may also include integrity-constraint statements. In addition
tothe primary-key constraint, there are a number of other ones that can be included in the create
table command. The allowed integrity constraints include
• not null
• unique
• check(<predicate>)

Not Null Constraint:


The null value is a member of all domains, and as a result it is a legal value for every
attribute in SQL by default. For certain attributes, however, null values may be inappropriate.
Consider a tuple in the student relation where name is null. Such a tuple gives student
information for an unknown student; thus, it does notcontain useful information. Similarly, we
would not want the department budget to be null. In cases such as this, we wish to forbid null
values, and we can do so by restricting the domain of the attributes name and budget to exclude
null values, by declaring it as follows:
name varchar(20) not null
budget numeric(12,2) not null
The not null constraint prohibits the insertion of a null value for the attribute, and is an example
of a domain constraint. Any database modification that would cause a null to be inserted in an
attribute declared to be not null generates an error diagnostic.

Unique Constraint:
SQL also supports an integrity constraint:
unique (Aj1, Aj2,…, Ajm)
The unique specification says that attributes Aj1, Aj2,…,Ajm form a superkey; that is, no
two tuples in the relation can be equal on all the listed attributes. However, attributes declared as
unique are permitted to be null unless they have explicitly been declared to be not null. Recall
that a null value does not equal any other value.

The Check Clause:


When applied to a relation declaration, the clause check(P) specifies a predicate P that
must be satisfied by every tuple in a relation. A common use of the check clause is to ensure that
attribute values satisfy specified conditions, in effect creating a powerful type system. For
instance, a clause check (budget > 0) in the create table command for relation department would
ensure that thevalue of budget is nonnegative.
As another example, consider the following:
create table section
(course id varchar (8),
sec id varchar (8),
semester varchar (6),
year numeric (4,0),
building varchar (15),
room number varchar (7),
time slot id varchar (4),
primary key (course id, sec id, semester, year),
check (semester in ('Fall', 'Winter', 'Spring', 'Summer')));

Here, we use the check clause to simulate an enumerated type by specifying that semester
must be one of 'Fall', 'Winter', 'Spring', or 'Summer'. Thus, the check clause permits attribute
domains to be restricted in powerful ways that most programminglanguage type systems do not
permit. Null values present an interesting special case in the evaluation of a check clause. A check
clause is satisfied if it is not false, so clauses that evaluate to unknown are not violations. If null
values are not desired, a separate not null constraint must be specified.

Referential Integrity:
Ensure that a value that appears in one relation (the referencing relation) for a given set of
attributes also appears for a certain set of attributes in another relation (the referenced
relation)such conditionsare called referential integrity constraints, and foreign keys are a form
of a referential integrity constraint where the referenced attributes form a primary key of the
referenced relation. Foreign keys can be specified as part of the SQL create table statement by
using the foreign key clause,
“foreign key (dept name) references department”.

Schema Diagrams:
A database schema, along with primary key and foreign-key constraints, can be depicted
by schema diagrams.The schema diagram for our universityorganization. Each relation appears
as a box, with the relation name at the top in blueand the attributes listed inside the box.Primary-
key attributes are shown underlined. Foreign-key constraints appear asarrows from the foreign-
key attributes of the referencing relation to the primary key ofthe referenced relation. We use a
two-headed arrow, instead of a single-headed arrow,to indicate a referential integrity constraint
that is not a foreign-key constraint. The line with a two-headed arrow from time slot id in the
section relation totime slot id in the time slot relation represents the referential integrity
constraint fromsection.time slot id to time slot.Many database systems provide design tools with
a graphical user interface forcreating schema diagrams.2 We shall discuss a different
diagrammatic representationof schemas, called the entity-relationship diagram, at
length.althoughthere are some similarities in appearance, these two notations are quite different,
and should not be confused for one another.
Relational operations:
The relational algebra consists of a set of operations that take one or two relations asinput and
produce a new relation as their result. Some of these operations, such as the select, project, and
rename operations, are called unary operations because they operate on one relation. The other
operations,
such as union, Cartesian product, and set difference, operate on pairs of relations and are,
therefore, called binary operations. Although the relational algebra operations form the basis for
the widely used SQL query language, database systems do not allow users to write queries in
relational algebra.

The Select Operation:


The select operation selects tuples that satisfy a given predicate. We use the lowercase Greek
letter sigma (σ) to denote selection. The predicate appears as a subscript to σ. The argument
relation is in parentheses after the σ. Thus, to select those tuples of the instructor relation where
the instructor is in the “Physics” department, we write:

In general, we allow comparisons using =, ≠, <, ≤, >, and ≥ in the selection predicate. Furthermore,
we can combine several predicates into a larger predicate by using the connectives and (∧), or
(∨), and not (¬). Thus, to find the instructors in Physics with a salary greater than $90,000, we
write:

The Project Operation:


Suppose we want to list all instructors’ ID, name, and salary, but we do not care about the
dept name. The project operation allows us to produce this relation. The project operation is a
unary operation that returns its argument relation, with certain attributes left out. Since a
relation is a set, any duplicate rows are eliminated. Projection is denoted by the uppercase Greek
letter pi(Π). We list those attributes that we wish to appear in the result as a subscript to Π. The
argument relation follows in parentheses. We write the query to produce such a list as:

Composition of Relational Operations:


The fact that the result of a relational operation is itself a relation is important. Consider
the more complicated query “Find the names of all instructors in the Physics department.” We
write:

The Cartesian-Product Operation:


The Cartesian-product operation, denoted by a cross (×), allows us to combine
informationfrom any two relations. We write the Cartesian product of relations r1 and r2 as r1 ×
r2.

Instead, the predicate in the where clause is used to restrict the combinations created by the
Cartesian product to those that are meaningful for the desired answer.
We would likely want a query involving instructor and teaches to combine a particular tuplet in
instructor with only those tuples in teaches that refer to the same instructor to which t refers.
That is, we wish only to match teaches tuples with instructor tuples that have the same ID value.
The following SQL query ensures this condition and outputs the instructor name and course
identifiers from such matching tuples.
select name, course id
from instructor, teaches
where instructor.ID= teaches.ID;
The Rename Operation:
Consider again the query that we used earlier:
select name, course id
from instructor, teaches
where instructor.ID= teaches.ID;
The result of this query is a relation with the following attributes:
name, course id
The names of the attributes in the result are derived from the names of the attributes in the
relations in the from clause.

Set Operations:
The SQL operations union, intersect, and except operate on relations and correspond to the
mathematical set operations ∪, ∩, and −.We shall now construct queries involving the union,
intersect, and except operations over two sets.
• The set of all courses taught in the Fall 2017 semester:
select course id
from section
where semester = 'Fall' and year= 2017;
• The set of all courses taught in the Spring 2018 semester:
select course id
from section
where semester = 'Spring' and year= 2018;

The Union Operation:


To find the set of all courses taught either in Fall 2017 or in Spring 2018, or both, wewrite the
following query. Note that the parentheses we include around each selectfrom-where statement
below are optional but useful for ease of reading; some databasesdo not allow the use of the
parentheses, in which case they may be dropped.

(select course idfrom section where semester = 'Fall' and year= 2017)union
(select course id from section where semester = 'Spring' and year= 2018);

The union operation automatically eliminates duplicates, unlike the select clause.
The Intersect Operation:
To find the set of all courses taught in both the Fall 2017 and Spring 2018, we write:

(select course id from section where semester = 'Fall' and year= 2017)intersect
(select course id from section where semester = 'Spring' and year= 2018);

Set difference:
The Set difference operation defines a relation consisting of the tuples that are in relation R, but
not in S. R and S must be union-compatible
Aggregate Functions:
Aggregate functions are functions that take a collection (a set or multiset) of values as
input and return a single value. SQL offers five standard built-in aggregate functions:9

• Average: avg-returns the average of the values in the associated attribute.


• Minimum: min-returns the smallest value in the associated attribute.
• Maximum: max-returns the largest value in the associated attribute.
• Total: sum-returns the sum of the values in the associated attribute.
• Count: count-returns the number of values in the associated attribute.

Basic Aggregation:
Consider the query “Find the average salary of instructors in the Computer Science
department.” We write this query as follows:
selectavg (salary)
from instructor
wheredept name = 'Comp. Sci.';
The result of this query is a relation with a single attribute containing a single tuple with a
numerical value corresponding to the average salary of instructors in the Computer Science
department

Aggregation with Grouping:


There are circumstances where we would like to apply the aggregate function not only to a
single set of tuples, but also to a group of sets of tuples; we specify this in SQL using the group by
clause. The attribute or attributes given in the group by clause are used to form groups. Tuples
with the same value on all attributes in the group by clause are placed in one group.
“Find the average salary in each department.”
We write this query as follows:
selectdept name, avg (salary) as avg salary
from instructor
group by dept name;

TheHavingClause:
At times, it is useful to state a condition that applies to groups rather than to tuples. For
example, we might be interested in only those departments where the average salary of the
instructors is more than $42,000. This condition does not apply to a single tuple; rather, it applies
to each group constructed by the group by clause. To express such a query, we use the having
clause of SQL. SQL applies predicates in the having clause after groups have been formed, so
aggregate functions may be used in the having clause.
We express this query in SQL as follows:
selectdept name, avg (salary) as avg salary
from instructor
group by dept name
havingavg(salary)>42000;

Join Expressions:
• The Join operation, which combines two relations to form a new relation, is one of the
essential operations in the relational algebra.
• There are various forms of the Join operation
 Theta join
 Equijoin (a particular type of Theta join)
 Natural join
 Outer join
 Semijoin

Theta join or equii join:


The Theta join operation defines a relation that contains tuples satisfying the predicate F from
the Cartesian product of R and S.
• The predicate F is of the form R.ai u S.bi, where u may be one of the comparison operators
(<,>,<=,>=,=,=!).
• We can rewrite the Theta join in terms of the basic Selection and Cartesian product
operations:
Natural join
• Natural join does not use any comparison operator. It does not concatenate the way a
Cartesian product does.
• We can perform a Natural Join only if there is at least one common attribute that exists
between two relations.
• In addition, the attributes must have the same name and domain.

Outer join:
• Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
• The (left) Outer join is a join in which tuples from R that do not have matching values in
the common attributes of S are also included in the result relation.
• Missing values in the second relation are set to null.

Semi join:
 The Semijoin operation defines a relation that contains the tuples of R .R that participate in the
join of R with S satisfying the predicate F.
Division Operation:
• The Division operation defines a relation over the attributes C that consists of the set of
tuples from R that match the combination of every tuple in S.

ER Modeling:
ER modeling is a top-down approach to database design that begins by identifying the important
data called entities and relationships between the data that must be represented in the model.
We then add more details, such as the information we want to hold about the entities and
relationships called attributes and any constraints on the entities, relationships, and attributes.

Entity:
Any thing that has an independent existence and about which we collect data. It is also known
as entity type. In ER modeling, notation for entity is given below.

Entity instance:
 Entity instance is a particular member of the entity type.
 Example for entity instance : A particular employee Regular Entity

Weak entity:
 An entity which depends on other entity for its existence and doesn't have any key
attribute of its own is a weakentity.
 Example for a weak entity : In a parent/child relationship, a parent is considered as a
strong entity and the child is a weak entity.
In ER modeling, notation for weak entity is given below.

Attributes:
Properties/characteristics which describe entities are called attributes. In ER modeling, notation
for attribute is given below.

Domain of Attributes:
The set of possible values that an attribute can take is called the domain of the attribute. For
example, the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence
this set can be termed as the domain of the attribute day.
Key attribute
 The attribute (or combination of attributes) which is unique for every entity instance is
called key attribute.
 E.g the employee_id of an employee, pan_card_number of a person etc.If the key attribute
consists of two or more attributes in combination, it is called a composite key.
In ER modeling, notation for key attribute is given below.

Simple attribute:
 If an attribute cannot be divided into simpler components, it is a simple attribute.
Example for simple attribute :employee_id of an employee.
Composite attribute:
 If an attribute can be split into components, it is called a composite attribute.
 Example for composite attribute : Name of the employee which can be split into
First_name, Middle_name, and Last_name.

Single valued Attributes:


 If an attribute can take only a single value for each entity instance, it is a single valued
attribute. example for single valued attribute : age of a student. It can take only one value
for a particular student.

Multi-valued Attributes:
 If an attribute can take more than one value for each entity instance, it is a multi-valued
attribute.
 Multi-valuedexample for multi valued attribute : telephone number of an employee, a
particular employee may have multiple telephone numbers.
In ER modeling, notation for multi-valued attribute is given below.

Derived Attribute:
An attribute which can be calculated or derived based on other attributes is a derived
attribute.Example for derived attribute : age of employee which can be calculated from date of
birth and current date. In ER modeling, notation for derived attribute is given below.

Relationships:
 Associations between entities are called relationships
 Example : An employee works for an organization. Here "works for" is a relation between
the entities employee and organization.
In ER modeling, notation for relationship is given below.

However in ER Modeling, To connect a weak Entity with others, you should use a weak
relationship notation as given below.

Degree of a Relationship:
 Degree of a relationship is the number of entity types involved. The n-ary relationship
is the general form for degree n. Special cases are unary, binary, and ternary ,where the
degree is 1, 2, and 3, respectively.
 Example for unary relationship : An employee ia a manager of another employee
 Example for binary relationship : An employee works-for department.
 Example for ternary relationship : customer purchase item from a shop keeper

Cardinality of a Relationship:
Relationship cardinalities specify how many of each entity type is allowed.
Relationships can have four possible connectivities as given below.
1. One to one (1:1) relationship
2. One to many (1:N) relationship
3. Many to one (M:1) relationship
4. Many to many (M:N) relationship
The minimum and maximum values of this connectivity is called the cardinality of the
relationship

Example for Cardinality – One-to-One (1:1)-Employee is assigned with a parking space.

One employee is assigned with only one parking space and one parking space is assigned to
only one employee. Hence it is a 1:1 relationship and cardinality is One-To-One (1:1).In ER
modeling, this can be mentioned using notations as given below

Example for Cardinality – One-to-Many (1:N)- Organization has employees

One organization can have many employees , but one employee works in only one organization.
Hence it is a 1:N relationship and cardinality is One-To-Many (1:N).In ER modeling, this can be
mentioned using notations as given below
Example for Cardinality – Many-to-One (M :1)
It is the reverse of the One to Many relationship. employee works in organization.

One employee works in only one organization But one organization can have many employees.
Hence it is a M:1 relationship and cardinality is Many-to-One (M :1)
In ER modeling, this can be mentioned using notations as given below.

Cardinality – Many-to-Many (M:N)-Students enrolls for courses:


One student can enroll for many courses and one course can be enrolled by many students.
Hence it is a M:N relationship and cardinality is Many-to-Many (M:N)

In ER modeling, this can be mentioned using notations as given below

Relationship Participation:
1. Total
In total participation, every entity instance will be connected through the relationship to
another instance of the other participating entity types
2. Partial
Example for relationship participation
Consider the relationship - Employee is head of the department.
Here all employees will not be the head of the department. Only one employee will be the head of
the department. In other words, only few instances of employee entity participate in the above
relationship. So employee entity's participation is partial in the said relationship. However each
department will be headed by some employee. So department entity's participation is total in the
said relationship.

Extended ER Features:
Specialization
 Top-down design process; we designate sub-groupings within an entity set that are
distinctive from other entities in the set.
 These sub-groupings become lower-level entity sets that have attributes or participate in
relationships that do not apply to the higher-level entity set.
 Depicted by a trianglecomponent labeled ISA (e.g., instructor“is a” person).
 Attribute inheritance–a lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.

Generalization:
 A bottom-up design process–combine a number of entity sets that share the same
features into a higher-level entity set.
 Specialization and generalization are simple inversions of each other; they are
represented in an E-R diagram in the same way.
 The terms specialization and generalization are used interchangeably.

Advantages and Disadvantages of ER Modeling


Advantages:
1. ER Modeling is simple and easily understandable. It is represented in business users language
and it can be understood by non-technical specialist.
2. Intuitive and helps in Physical Database creation.
3. Can be generalized and specialized based on needs.
4. Can help in database design.
5. Gives a higher level description of the system.
Disadvantages:
1. Physical design derived from E-R Model may have some amount of ambiguities or
inconsistency.
2. Sometime diagrams may lead to misinterpretations
Symbols of ER Notation:

Design Process:
 Initial phase -- characterize fully the data needs of the prospective database users.
 Second phase -- choosing a data model
 Applying the concepts of the chosen data model
 Translating these requirements into a conceptual schema of the database.
 A fully developed conceptual schema indicates the functional requirements of
the enterprise.
Describe the kinds of operations (or transactions) that will be performed on the data.
 Final Phase -- Moving from an abstract data model to the implementation of the database
• Logical Design – Decidingon the database schema.
 Database design requires that we find a “good” collection of relation
schemas.
 Business decision – What attributes should we record in the database?
 Computer Science decision – What relation schemas should we have and
how should the attributes be distributed among the various relation
schemas?
• Physical Design – Deciding on the physical layout of the database

 In designing a database schema, we must ensure that we avoid two major pitfalls:
• Redundancy: a bad design may result in repeat information.
 Redundant representation of information may lead to data inconsistency
among the various copies of information
• Incompleteness: a bad design may make certain aspects of the enterprise difficult
or impossible to model.
 Avoiding bad designs is not enough. There may be a large number of good designs from
which we must choose.

Structural Constraints:
The constraints should reflect the restrictions on the relationships as perceived in the “real
world.” Examples of such constraints include the requirements that a property for rent must have
an owner and each branch must have staff. The main type of constraint on relationships is called
multiplicity.

Multiplicity- The number (or range) of possible occurrences of an entity type that may relate to
a single occurrence of an associated entity type through a particular relationship.
the most common degree for relationships is binary. Binary relationships are generally referred
to as being one-to-one (1:1), one-tomany (1:*), or many-to-many (*:*).
We examine these three types of relationships using the following integrity constraints:
• a member of staff manages a branch (1:1);
• a member of staff oversees properties for rent (1:*);
• newspapers advertise properties for rent (*:*).

Multiplicity(complexrelationship):
The number (or range) of possible occurrences of an entity type in an n-ary relationship when
the other (n–1) values are fixed.

Cardinality: Describes the maximum number of possible relationship occurrences for an entity
participating in a given relationship type.

Participation: Determines whether all or only some entity occurrences participate in a


relationship.

Reducing E-R Diagrams to Relational Schemas:


 Entity sets and relationship sets can be expressed uniformly as relation schemas that
represent the contents of the database.
 A database which conforms to an E-R diagram can be represented by a collection of
schemas.
 For each entity set and relationship set there is a unique schema that is assigned the name
of the corresponding entity set or relationship set.
 Each schema has a number of columns (generally corresponding to attributes), which
have unique names.
Representing Entity Sets:
 A strong entity set reduces to a schema with the same attributes student(ID, name,
tot_cred)
 A weak entity set becomes a table that includes a column for the primary key of the
identifying strong entity set section ( course_id, sec_id, sem, year)

Representing of Entity Sets with composite Attribute:

 Composite attributes are flattened out by creating a separate attribute for each
component attribute
 Example: given entity set instructorwith composite attribute namewith component
attributes first_nameand last_namethe schema corresponding to the entity set has two
attributes name_first_nameand name_last_name
 Prefix omitted if there is no ambiguity (name_first_namecould be first_name
 Ignoring multivalued attributes, extended instructor schema is instructor(ID, first_name,
middle_initial, last_name,street_number, street_name, apt_number, city, state, zip_code,
date_of_birth)
Example:
Representation of Entity Sets with Multivalued Attributes:
 A multivalued attribute Mof an entity Eis represented by a separate schema EM
 Schema EMhas attributes corresponding to the primary key of Eand an attribute
corresponding to multivalued attribute M
 Example: Multivalued attribute phone_numberof instructoris represented by a
schema:inst_phone= (ID, phone_number)
 Each value of the multivalued attribute maps to a separate tuple of the relation on schema
EM
 For example, an instructorentity with primary key 22222 and phone numbers 456-7890
and 123-4567 maps to two tuples: (22222, 456-7890) and (22222, 123-4567)

Representing Relationship Sets


 A many-to-many relationship set is represented as a schema with attributes for the
primary keys of the two participating entity sets, and any descriptive attributes of the
relationship set.
 Example: schema for relationship set advisor

advisor = (s_id, i_id)

Redundancy of Schemas
 Many-to-one and one-to-many relationship sets that are total on the many-side can be
represented by adding an extra attribute to the “many” side, containing the primary key of
the “one” side
 Example: Instead of creating a schema for relationship set inst_dept, add an attribute
dept_nameto the schema arising from entity set instructor

 For one-to-one relationship sets, either side can be chosen to act as the “many” side
 That is, an extra attribute can be added to either of the tables corresponding to the two
entity sets
 If participation is partial on the “many” side, replacing a schema by an extra attribute in
the schema corresponding to the “many” side could result in null values
 The schema corresponding to a relationship set linking a weak entity set to its identifying
strong entity set is redundant.
 Example: The section schema already contains the attributes that would appear in
sec_course schema

Design Issues:
Common Mistakes in E-R Diagram:
Entities vs. Attributes:
 Use of entity sets vs. attributes

 Use of phone as an entity allows extra information about phone numbers (plus multiple
phone numbers)

Entities Vs Relationships:
 Use of entity sets vs. relationship sets:Possible guideline is to designate a relationship set
to describe an action that occurs between entities

 Placement of relationship attributes:For example, attribute date as attribute of advisor or


as attribute of student.
E-R Design Decisions:
 The use of an attribute or entity set to represent an object.
 Whether a real-world concept is best expressed by an entity set or a relationship set.
 The use of a ternary relationship versus a pair of binary relationships.
 The use of a strong or weak entity set.
 The use of specialization/generalization –contributes to modularity in the design.
 The use of aggregation –can treat the aggregate entity set as a single unit without concern
for the details of its internal structure.

SQL Data Manipulation


History:
 IBM Sequel language developed as part of System R project at the IBM San Jose Research
Laboratory
 Renamed Structured Query Language (SQL)
 ANSI and ISO standard SQL:SQL-86
 SQL-89
 SQL-92
 SQL:1999 (language name became Y2K compliant!)
 SQL:2003

Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later
standards and special proprietary features. Not all examples here may work on your particular
system.
SQL Parts:
 DML --provides the ability to query information from the database and to insert tuples
into, delete tuples from, and modify tuples in the database.
 integrity –the DDL includes commands for specifying integrity constraints.
 View definition --The DDL includes commands for defining views.
 Transaction control –includes commands for specifying the beginning and ending of
transactions.
 Embedded SQL and dynamic SQL --define how SQL statements can be embedded within
general-purpose programming languages.
 Authorization –includes commands for specifying access rights to relations and views.

Data Definition Language:


The SQL data-definition language (DDL) allows the specification of information about relations,
including:
 The schema for each relation.
 The type of values associated with each attribute.
 The Integrity constraints
 The set of indices to be maintained for each relation.
 Security and authorization information for each relation.
 The physical storage structure of each relation on disk.
Domain Types in SQL:
 char(n).Fixed length character string, with user-specified length n.
 varchar(n). Variable length character strings, with user-specified maximum length n.
 int.Integer (a finite subset of the integers that is machine-dependent).
 smallint.Small integer (a machine-dependent subset of the integer domain type).
 numeric(p,d).Fixed point number, with user-specified precision of pdigits, with ddigits to
the right of decimal point. (ex., numeric(3,1), allows 44.5 to be stores exactly, but not
444.5 or 0.32)
 real, double precision.Floating point and double-precision floating point numbers, with
machine-dependent precision.
 float(n).Floating point number, with user-specified precision of at least ndigits.

Create Table Construct:


An SQL relation is defined using thecreate table command:

ris the name of the relation


each Aiis an attribute name in the schema of relation r
Diis the data type of values in the domain of attribute Ai
Example:

Integrity Constraints in Create Table:


 Types of integrity constraints
o primary key(A1, ..., An )
o foreign key (Am, ..., An ) references r not null
o SQL prevents any update to the database that violates an integrity constraint.
Example:
Updates to tables:
 Insert insert into instructor values ('10211', 'Smith', 'Biology', 66000);
 Delete:Remove all tuples from the studentrelation
 delete from student
o Drop Table
 drop table r
o Alter
 alter table r add A D
o where Ais the name of the attribute to be added to relation r and Dis the domain of
A.
o All exiting tuples in the relation are assigned nullas the value for the new attribute.
 alter table r dropA
o where Ais the name of an attribute of relation r
o Dropping of attributes not supported by many databases.

Basic Query Structure :


A typical SQL query has the form:
select A1, A2, ..., An
fromr1, r2, ..., rm
where P

Ai represents an attribute Rirepresents a relation Pis a predicate. The result of an SQL query is a
relation.
The select Clause:
o The selectclause lists the attributes desired in the result of a query
 corresponds to the projection operation of the relational algebra
Example: find the names of all instructors:
select namefrom instructor
o NOTE: SQL names are case insensitive (i.e., you may use upper-or lower-case letters.) E.g.,
Name≡ NAME≡ name
Some people use upper case wherever we use bold font.
o SQL allows duplicates in relations as well as in query results.
o To force the elimination of duplicates, insert the keyword distinctafter select.
o Find the department names of all instructors, and remove duplicates
select distinct dept_namefrom instructor
o The keyword all specifies that duplicates should not be removed.
select alldept_namefrom instructor
o An asterisk in the select clause denotes “all attributes”
select *from instructor
o An attribute can be a literal with no from clause
select '437'
o Results is a table with one column and a single row with value “437”
o Can give the column a name using:
select '437' as FOO
o An attribute can be a literal with from clause
select 'A'from instructor
o Result is a table with one column and Nrows (number of tuples in the instructorstable),
each row with value “A”

The where Clause:


o The whereclause specifies conditions that the result must satisfyCorresponds to the
selection predicate of the relational algebra.
o To find all instructors in Comp. Sci. dept
select namefrom instructorwhere dept_name ='Comp. Sci.'
o SQL allows the use of the logical connectives and, or, and not
o The operands of the logical connectives can be expressions involving the comparison
operators <, <=, >, >=, =, and <>.
o Comparisons can be applied to results of arithmetic expressions
o To find all instructors in Comp. Sci. dept with salary > 80000
select namefrom instructorwhere dept_name ='Comp. Sci.'and salary > 80000

The from Clause:


o The fromclause lists the relations involved in the queryCorresponds to the Cartesian
product operation of the relational algebra.
o Find the Cartesian product instructor X teaches
select ∗from instructor, teaches
 generates every possible instructor –teaches pair, with all attributes from both
relations.
 For common attributes (e.g., ID), the attributes in the resulting table are renamed
using the relation name (e.g., instructor.ID)
o Cartesian product not very useful directly, but useful combined with where-clause
condition (selection operation in relational algebra).

Examples:

The Rename Operation:


 The SQL allows renaming relations and attributes using the as clause:
old-name asnew-name
 Find the names of all instructors who have a higher salary than some instructor in 'Comp.

Sci'.select distinct T.namefrom instructor as T, instructor as Swhere T.salary>


S.salaryand S.dept_name= 'Comp. Sci.’

Keyword asis optional and may be omittedinstructor as T ≡ instructorT

String Operations:
 SQL includes a string-matching operator for comparisons on character strings. The
operator likeuses patterns that are described using two special characters:
o percent ( % ). The % character matches any substring.
o underscore ( _ ). The _ character matches any character.
Find the names of all instructors whose name includes the substring “dar”.
o select namefrom instructorwherename like '%dar%'
Match the string “100%”
o like '100 \%' escape '\'
in that above we use backslash (\) as the escape character.
o Patterns are case sensitive.
o Pattern matching examples:
 'Intro%' matches any string beginning with “Intro”.
 '%Comp%' matches any string containing “Comp” as a substring.
 '_ _ _' matches any string of exactly three characters.
 '_ _ _ %' matches any string of at least three characters.

SQL supports a variety of string operations such as


 concatenation (using “||”)
 converting from upper to lower case (and vice versa)
 finding string length, extracting substrings, etc.

Ordering the Display of Tuples:


 List in alphabetic order the names of all instructors
select distinct namefrom instructororder by name
 We may specify desc for descending order or ascfor ascending order, for each attribute;
ascending order is the default.
o Example: order by name desc
 Can sort on multiple attributesExample: order by dept_name, name

Set Operations:
Find courses that ran in Fall 2017 or in Spring 2018
(selectcourse_idfrom section where sem= 'Fall' and year
=2017)union(selectcourse_idfrom section where sem= 'Spring' and year = 2018)

Find courses that ran in Fall 2017 and in Spring 2018


(selectcourse_idfrom section where sem= 'Fall' and year =
2017)intersect(selectcourse_idfrom section where sem= 'Spring' and year = 2018)

Find courses that ran in Fall 2017 but not in Spring 2018
(selectcourse_idfrom section where sem= 'Fall' and year =
2017)except(selectcourse_idfrom section where sem= 'Spring' and year = 2018)

Null Values:
 It is possible for tuples to have a null value, denoted by null, for some of their attributes
 nullsignifies an unknown value or that a value does not exist.
 The result of any arithmetic expression involving nullis null
o Example: 5 + nullreturns null
 The predicate is nullcan be used to check for null values.
Example: Find all instructors whose salary is null.
selectnamefrominstructorwhere salary is null
 The predicate is not null succeeds if the value on which it is applied is not null.
 SQL treats as unknownthe result of any comparison involving a null value (other than
predicates is null and is not null).Example: 5 < nullornull<> nullornull= null
 The predicate in a whereclause can involve Boolean operations (and, or, not); thus the
definitions of the Boolean operations need to be extended to deal with the value unknown.
o and :(trueand unknown) = unknown, (falseand unknown) = false,(unknown
andunknown) = unknown
o or: (unknownortrue) = true,(unknownorfalse) = unknown(unknown orunknown) =
unknown
 Result of where clause predicate is treated as false if it evaluates to unknown

Aggregate Functions:
These functions operate on the multiset of values of a column of a relation, and return a value
o avg: average value
o min: minimum value
o max: maximum value
o sum: sum of values
o count: number of values

Find the average salary of instructors in the Computer Science department select
avg(salary)from instructorwhere dept_name= 'Comp. Sci.';

Find the total number of instructors who teach a course in the Spring 2018 semesterselect count
(distinct ID)from teacheswhere semester = 'Spring' and year = 2018;

Find the number of tuples in the course relationselect count (*)from course;

Group By:
Find the average salary of instructors in each departmentselect dept_name, avg(salary)
asavg_salaryfrom instructorgroup by dept_name;

Aggregate Functions –Having Clause:


o Find the names and average salaries of all departments whose average salary is greater
than 42000
select dept_name, avg(salary) asavg_salary
from instructor
group by dept_name
having avg(salary) > 42000;
o Note: predicates in the havingclause are applied after the formation of groups whereas
predicates in the whereclause are applied before forming groups

Nested Subqueries:
o SQL provides a mechanism for the nesting of subqueries. A subqueryis a select-from-
whereexpression that is nested within another query.
o The nesting can be done in the following SQL query
select A1, A2, ..., An f
rom r1, r2, ..., rm
where P as follows:
From clause: rican be replaced by any valid subquery
Where clause: Pcan be replaced with an expression of the form:
B<operation> (subquery)
B is an attribute and <operation> to be defined later.
Select clause:
Ai can be replaced be a subquery that generates a single value.

Joined Relations:
o Join operationstake two relations and return as a result another relation.
o A join operation is a Cartesian product which requires that tuples in the two relations
match (under some condition). It also specifies the attributes that are present in the result
of the join
o The join operations are typically used as subquery expressions in the from clause
o Three types of joins:
 Natural join
 Inner join
 Outer join

Natural Join in SQL


 Natural join matches tuples with the same values for all common attributes, and retains
only one copy of each common column.
 List the names of instructors along with the course ID of the courses that they taughtselect
name, course_idfrom students, takeswhere student.ID = takes.ID;
 Same query in SQL with “natural join” constructselect name,course_idfrom student natural
join takes;
 The fromclause in can have multiple relations combined using natural join:
Outer Join:
 An extension of the join operation that avoids loss of information.
 Computes the join and then adds tuples form one relation that does not match tuples in
the other relation to the result of the join.
 Uses nullvalues.
 Three forms of outer join:
 left outer join
 right outer join
 full outer join
Joined Types and Conditions:
 Join operationstake two relations and return as a result another relation.These additional
operations are typically used as subquery expressions in the from clause
 Join condition–defines which tuples in the two relations match, and what attributes are
present in the result of the join.
 Join type–defines how tuples in each relation that do not match any tuple in the other
relation (based on the join condition) are treated.
Views
 In some cases, it is not desirable for all users to see the entire logical model (that is, all the
actual relations stored in the database.)
 Consider a person who needs to know an instructors name and department, but not the
salary. This person should see a relation described, in SQL, by
select ID, name, dept_namefrom instructor
 A viewprovides a mechanism to hide certain data from the view of certain users.
 Any relation that is not of the conceptual model but is made visible to a user as a “virtual
relation” is called a view.
View Definition
 A view is defined using the create view statement which has the form
 create view vas < query expression > where <query expression> is any legal SQL
expression.
 The view name is represented by v.
 Once a view is defined, the view name can be used to refer to the virtual relation that the
view generates.
 View definition is not the same as creating a new relation by evaluating the query
expressionRather, a view definition causes the saving of an expression; the expression is
substituted into queries using the view.
Views Defined Using Other Views:
 One view may be used in the expression defining another view
 A view relation v1is said to depend directly on a view relation v2if v2is used in the
expression defining v1
 A view relation v1is said to depend onview relation v2if either v1 depends directly to v2 or
there is a path of dependencies from v1to v2
 A view relation vis said to be recursiveif it depends on itself.
Materialized Views
 Certain database systems allow view relations to be physically stored.Physical copy
created when the view is defined.Such views are called Materialized view:
 If relations used in the query are updated, the materialized view result becomes out of
dateNeed to maintainthe view, by updating the view whenever the underlying relations
are updated.

View Updates in SQL :


 Most SQL implementations allow updates only on simple viewsThe from clause has only
one database relation.
 The select clause contains only attribute names of the relation, and does not have any
expressions, aggregates, or distinct specification.
 Any attribute not listed in the select clause can be set to null
 The query does not have a group by or having clause.
Integrity Constraints
 Integrity constraints guard against accidental damage to the database, by ensuring that
authorized changes to the database do not result in a loss of data consistency. A checking
account must have a balance greater than $10,000.00
 A salary of a bank employee must be at least $4.00 an hour
 A customer must have a (non-null) phone number
Constraints on a Single Relation:
 not null
 primary key
 unique
 check (P), where P is a predicate

Not Null Constraints:


not nullDeclare nameand budgetto be not null

name varchar(20) not nullbudget numeric(12,2) not null

Unique Constraints :
 unique( A1, A2, …, Am)The unique specification states that the attributes A1, A2, …, Am
form a candidate key.
 Candidate keys are permitted to be null (in contrast to primary keys).

The check clause:


 The check (P) clause specifies a predicate P that must be satisfied by every tuple in a
relation.
 Example: ensure that semester is one of fall, winter, spring or summer
create table section
(course_idvarchar (8),
sec_idvarchar (8),
semester varchar (6),
year numeric (4,0),
building varchar (15),
room_numbervarchar (7),
time slot id varchar (4),
primary key (course_id, sec_id, semester, year),
check(semester in ('Fall', 'Winter', 'Spring', 'Summer')))

Referential Integrity:
 Ensures that a value that appears in one relation for a given set of attributes also appears
for a certain set of attributes in another relation.
 Example: If “Biology” is a department name appearing in one of the tuples in the
instructorrelation, then there exists a tuple in the departmentrelation for “Biology”.
 Let A be a set of attributes. Let R and S be two relations that contain attributes A and
where A is the primary key of S. A is said to be a foreign keyof R if for any values of A
appearing in R these values also appear in S.
 Foreign keys can be specified as part of the SQL createtable statement
foreign key (dept_name) references department
 By default, a foreign key references the primary-key attributes of the referenced table.
 SQL allows a list of attributes of the referenced relation to be specified explicitly.
foreign key (dept_name) references department (dept_name)
Authorization:

 We may assign a user several forms of authorizations on parts of the database.


 Read-allows reading, but not modification of data.
 Insert-allows insertion of new data, but not modification of existing data.
 Update-allows modification, but not deletion of data.
 Delete-allows deletion of data.
 Each of these types of authorizations is called a privilege. We may authorize the user all,
none, or a combination of these types of privileges on specified parts of a database, such as
a relation or a view.
 Forms of authorization to modify the database schema
 Index-allows creation and deletion of indices.
 Resources-allows creation of new relations.
 Alteration-allows addition or deletion of attributes in a relation.
 Drop-allows deletion of relations.

Privileges in SQL:
 select: allows read access to relation, or the ability to query using the view
 Example: grant users U1, U2, and U3selectauthorization on theinstructorrelation:
 grant select on instructor to U1, U2, U3
 insert: the ability to insert tuples
 update: the ability to update using the SQL update statement
 delete: the ability to delete tuples.
 all privileges: used as a short form for all the allowable privileges

Revoking Authorization in SQL:


 The revokestatement is used to revoke authorization.revoke <privilege list>on <relation
or view> from <user list>
 Example:revoke select on student from U1, U2, U3
 <privilege-list> may be all to revoke all privileges the revokeemay hold.
 If <revokee-list> includes public, all users lose the privilege except those granted it
explicitly.
 If the same privilege was granted twice to the same user by different grantees, the user
may retain the privilege after the revocation.
 All privileges that depend on the privilege being revoked are also revoked.

Advanced SQL
Accessing SQL from a Programming Language:
 Not all queries can be expressed in SQL, since SQL does not provide the full expressive
power of a general-purpose language.
 Non-declarative actions --such as printing a report, interacting with a user, or sending the
results of a query to a graphical user interface --cannot be done from within SQL.
 A database programmer must have access to a general-purpose programming language for
at least two reasons
 general-purpose program --can connect to and communicate with a database server using
a collection of functions
 Embedded SQL --provides a means by which a program can interact with a database
server.
 The SQL statements are translated at compile time into function calls.
 At runtime, these function calls connect to the database using an API that provides
dynamic SQL facilities.
 There are two approaches to accessing SQL from a general-purpose programming
language

Functions and Procedures:


 Functions and procedures allow “business logic” to be stored in the database and executed
from SQL statements.
 These can be defined either by the procedural component of SQL or by an external
programming language such as Java, C, or C++.
 The syntax we present here is defined by the SQL standard.
 Most databases implement nonstandard versions of this syntax.

Declaring SQL Functions:


 Define a function that, given the name of a department, returns the count of the
number of instructors in that department.
o create function dept_count(dept_namevarchar(20))returns integerbegindeclare
d_countinteger;select count (* ) into d_countfrom instructorwhere
instructor.dept_name= dept_namereturn d_count;end
 The function dept_countcan be used to find the department names and budget of all
departments with more that 12 instructors.
o select dept_name, budgetfromdepartmentwhere dept_count(dept_name) > 12

Table Functions:
 The SQL standard supports functions that can return tables as results; such functions are
called table functions
 Example: Return all instructors in a given department
createfunctioninstructor_of(dept_namechar(20))
returnstable (
ID varchar(5),namevarchar(20),dept_namevarchar(20),salarynumeric(8,2))
returntable(selectID, name, dept_name,
salaryfrominstructorwhereinstructor.dept_name= instructor_of.dept_name)
 Usage
select *from table (instructor_of('Music'))
 For loop
 Permits iteration over all results of a query
 Example: Find the budget of all departmentsdeclare n integer default 0;for r asselect
budget from department where dept_name= 'Music' doset n = n + r.budget end for

External Language Routines:


 SQL allows us to define functions in a programming language such as Java, C#, C or C++.
 Can be more efficient than functions defined in SQL, and computations that cannot be
carried out in SQL\can be executed by these functions.
 Declaring external language procedures and functions
create procedure dept_count_proc(indept_namevarchar(20),out count
integer)language Cexternal name '/usr/avi/bin/dept_count_proc'create function
dept_count(dept_namevarchar(20))returns integerlanguage Cexternal name
'/usr/avi/bin/dept_count'

Triggers:
 A triggeris a statement that is executed automatically by the system as a side effect of a
modification to the database.
 To design a trigger mechanism, we must:Specify the conditions under which the trigger is
to be executed.
 Specify the actions to be taken when the trigger executes.
 Triggers introduced to SQL standard in SQL:1999, but supported even earlier using non-
standard syntax by most databases.Syntax illustrated here may not work exactly on your
database system; check the system manuals

Statement Level Triggers


 Instead of executing a separate action for each affected row, a single action can be
executed for all rows affected by a transaction
 Use for each statement instead of for each row
 Use referencing old table or referencing new table to refer to temporary tables (called
transition tables) containing the affected rows
 Can be more efficient when dealing with SQL statements that update a large number of
rows
When Not To Use Triggers:
o Triggers were used earlier for tasks such as Maintaining summary data (e.g., total
salary of each department)
o Replicating databases by recording changes to special relations (called change or
delt a relations) and having a separate process that applies the changes over to a
replica
o There are better ways of doing these now: Databases today provide built in
materialized view facilities to maintain summary data
o Databases provide built-in support for replication
o Encapsulation facilities can be used instead of triggers in many cases Define
methods to update fields
o Carry out actions as part of the update methods instead of through a trigger
o Risk of unintended execution of triggers, for example, when Loading data from a
backup copy
o Replicating updates at a remote site
o Trigger execution can be disabled before such actions.
o Other risks with triggers:Error leading to failure of critical transactions that set off
the trigger Cascading execution

SQL vs NoSQL:
SQL NOSQL
Databases are categorized as Relational Database NoSQL databases are categorized as Non-
Management System (RDBMS). relational or distributed database system.
SQL databases have fixed or static or predefined NoSQL databases have dynamic schema.
schema.
SQL databases display data in form of tables so it NoSQL databases display data as collection of
is known as table-based database. key-value pair, documents, graph databases
or wide-column stores.
SQL databases are vertically scalable. NoSQL databases are horizontally scalable.
SQL databases use a powerful language In NoSQL databases, collection of documents
"Structured Query Language" to define and are used to query the data. It is also called
manipulate the data. unstructured query language. It varies from
database to database.
SQL databases are best suited for complex queries. NoSQL databases are not so good for complex
queries because these are not as powerful as
SQL queries.
SQL databases are not best suited for hierarchical NoSQL databases are best suited for
data storage. hierarchical data storage.
MySQL, Oracle, Sqlite, PostgreSQL and MS-SQL etc. MongoDB, BigTable, Redis, RavenDB,
are the example of SQL database. Cassandra, Hbase, Neo4j, CouchDB etc. are the
example of nosql database

You might also like