09.
Programming Databases
Stored Procedures
Two terms here “stored” and “procedure” ? Or “Stored Functions”
“procedure”: is a generic term used for “function” in a program. It has historic importance as many
programming languages had function and procedures as separate constructs. With the time
procedure, as a separate construct, has been phased out, and are not supported in most modern
programming languages. A “cohesive” procedure basically turns out to be a function only.
“stored”: procedures are stored in “database instance”. They run in database instance context under
the control of DBMS.
Why do we need stored Procedure? Here is an motivating example-
Consider above scenario of DA-IICT inviting applications for B.Tech. admissions. Let us
say we have a relation Applications as following-
Applications (AppNo, Name, Marks_Phy, Marks_Chem, Marks_Math, DA_Rank)
Given this relation, if we want to allocate DA_Rank to each applicant, we cannot compute this
merely by writing SQL queries. Below is some pseudo code typically performing required
computation-
rank = 0
for each row in SELECT * FROM Applicants ORDER BY marks DESC
rank = rank + 1
UPDATE applicants set DA_RANK = rank
WHERE application_no = row.application_no
end for
Above pseudo code can be implemented either as stored procedure or in a host programming
language like java using appropriate access API.
It is important to note here that a client program runs on the client, and require bringing resultant
rows of a query, to the client. This adds transportation cost, and lowers the program performance.
Following version of algorithm should help you to understand the movement of the data. Statements
in red are executed on DBMS server. Their results are brought to client. We use variables (in blue)
to store the results, and iteratively update related tuples.
1 | Programming Databases
//Let A be list of application numbers in the descending order of their marks
A = { SELECT applicant_no FROM Applicants ORDER BY marks DESC }
rank = 0
for 𝑎 ∈ 𝐴
rank = rank + 1
UPDATE applicants set DA_RANK = rank WHERE application_no = a
end for
In certain applications like this, bringing of the data not for any use of the client, functionality can
be better implemented in the form of “Stored Procedures” or “Stored Functions”.
Stored Procedures are stored “program” as part of database instance, and run on the same computer
where data are, avoiding data transportation and thus program performance.
Exercise: To make it more complex, let us say, we need to allocate Category Rank as well. Try
figuring out appropriate solution for this.
Stored Procedures – applications and benefits
1. Programs runs in immediate vicinity of data, and does not require any data transportation over
networks, and this speeds up data processing
2. Functionality that is “batch” in nature, that is, does not require any user intervention; need not to
bring any data to the client. Examples
a. Example above for allocating ranks to a large number of applicants is such a
requirement. Once started, no intervention from user is needed.
b. Computation of SPI and CPI at the end of semester; done once and as “batch” task.
c. Run month-end database updates for an online Transaction Processing system
d. Build various materialized views.
e. Data import/export from other database sources
3. Helps in enabling various operational abstractions over data?
4. Stored procedure stores “application functionality” as part of database; this makes the
functionality to be more shareable and reusable.
5. Database triggers are implemented as stored procedures
6. Complex constraints are also enforced using stored procedures.
Language features for Stored Procedures
SQL is not enough?
Stored procedures are written for requirements that cannot be expressed in expressive queries
like SQL, and require procedural constructs like branching and iterations.
Note that newer releases of SQL do support some branching expressive power in SQL queries
(IF, SWITCH or so)
2 | Programming Databases
ANSI added as SQL/PSM (SQL/Persistent Stored Module) as part of SQL extension with
SQL-1999.
RDBMS like Oracle had PL/SQL even before this standardization.
Most RDBMS, today, provide their own procedural languages for creating stored procedures.
For example: PL/SQL (Oracle), PL/PgSQL (postgresql), SQL/PL (DB2), TSQL(MS SQL)
Stored Procedures in PostgreSQL
PostgreSQL provides an open architecture for creating stored procedures.
It allows you creating stored procedures in many languages like C, PL/PgSQL, PL/perl,
PL/Python, PL/Tcl, and PL/Java.
PostgreSQL may require you to load the language before programming in that language.
It also allows you to write simple stored procedures in SQL.
PostgreSQL calls stored procedures as “stored functions”.
For simplicity, we may be using term “function” for “stored functions” in further discussion.
Learning Stored Procedure Language (SPL)
To repeat: Stored Procedure Language (SPL) is “procedural extension” to SQL.
Let us see what those “extensions”? On the way you should also be checking that how does a stored
procedure programming language defer from language like C?
Stored Procedure Language (SPL) = SQL + What ?
Should allow to create and use variables
Should allow mixing variables with attribute-names in SQL.
Allow to submit a SQL statement to the database, and collect the responses into variables so that
can be manipulated further
Support for various procedure constructs like if .. then .. else, loops, exception handling, etc.
3 | Programming Databases
Example Stored Procedure - Partial code for ComputeSPI in PL/pgSQL
More constraints using stored procedures
Some constraints are complex to be defined in SQL-DDL, and sometimes it is not possible to be
defined in standard DDL. Code below should be self-explanatory. It adds a constraint that an
employee can work project controlled by its home department-
4 | Programming Databases
CREATE ASSERTION command
CREATE ASSERTION is another command that is used for creating complex constraints.
Following is an example.
5 | Programming Databases
Triggers
[abstracts from postgresql documentation]
What is a trigger?
A trigger is a special type of “stored procedure” that the database automatically execute on
some database update events.
Triggers can be defined to execute either before or after any INSERT, UPDATE, or DELETE
operation.
If a trigger event occurs, the trigger function is called.
Types of Triggers
Trigger could be row-level-trigger or statement-level-trigger
Row-level-trigger. Note that an update statement on database can affect 0 or n rows. A row
level triggered is invoked once for each row that is affected by the statement that fired the
trigger.
In contrast, statement-level-trigger is invoked only once when an appropriate statement is
executed, regardless of the number of rows affected by that statement. A statement-level-trigger
is executed even if it affects zero rows.
Triggers are also classified as before triggers and after triggers.
Statement-level before triggers naturally fire before the statement starts to do anything, while
statement-level after triggers fire at the very end of the statement.
Row-level before triggers fire immediately before a particular row is operated on, while row-
level after triggers fire after working on the row (but before any statement-level after triggers).
Trigger Execution Cycle
Trigger is invoked on event of performing an update (insert/update/delete) operation on a
relation (or on a view).
Before triggers are fired before executing the operation. Also returns true/false, that is used to
decide if the operation is to be carried out or not.
If the trigger is row level it is executed for every affected row.
After triggers are fired after executing the operation on affected row (if it is row level trigger),
or after finishing the operation on all rows, if it is a statement level trigger.
6 | Programming Databases
Application examples of triggers
Row Level Before trigger can typically be used to enforce complex data validation rules before
adding/modifying/deleting a tuple (row). For example a student cannot take more than 22
credits in a semester, or selection of courses etc; it can be used to implement complex
constraints like this.
Another example is “save old values before making any change in a row”, or for logging events
that are happening on every row of database.
Row Level After: Can be used to make additional updates, for example computing some
derived attribute, or updating other attributes for example updating stock on sales
Statement Level Before: Can be to ensure “assertions”, do some initialization
Statement Level After: Some updates may not be done at row level, instead done after
statement for example posting into account ledger on sale or purchase; sending some alerts, etc.
For some cleaning, for example deletion after copying or so.
Note that many of these works can also be done in host programming languages, but rules that
are found to be associated with the databases only, irrespective of application; in that case
triggers are better option.
Creating Triggers
A trigger description contains three parts: Event, Condition, and Action
Event: A database event that activates the trigger; for example “UPDATing of relation
EMPLOYEE”
When (one of following):
o Before Statement,
o Before Row
o After Statement
o After Row
Action: A code that is to be executed when event occurs
For creating triggers in PostgreSQL, you can refer:
https://www.postgresql.org/docs/current/plpgsql-trigger.html
Also found a blog: https://estuary.dev/blog/postgresql-triggers/
7 | Programming Databases
INSTEAD OF trigger
Instead of trigger is invoked “instead of performing the operation”. That is requested operation
is never performed; instead, the action defined in the trigger is executed.
INSTEAD OF trigger specifies that action is to be executed instead of activating event.
This trigger, normally defined to make the view updatable; when an attempt is made to
INSERT/UPDATE/DELETE onto a view, alternate action is taken instead of actually
performing the event on the view
Example: https://neon.tech/postgresql/postgresql-triggers/postgresql-instead-of-triggers
8 | Programming Databases
Different facets of accessing databases in a program
In any real application we write more than one type of programs. Considering following scenario of
DA-IICT inviting applications for admission, you should be able to identify what functionality
is implemented using what type of program or so-
Student Applies: showing up blank application form - will require getting various inputs from
databases.
Applicant enters his/her details. On submission, various validation happens, may be by looking
into databases.
Finally, new tuples (for new applicant) are inserted into relevant relations.
DA-IICT prepares its own merit list; a process updates few attributes of applications relation.
DA-IICT allocates seats as per available sheet matrix (program wise, category wise, etc);
updates again some fields of some relations.
Following are different types of programs that manipulate databases. Each type of programs has
their own place and suitability.
SQL queries
o SQL queries are written as “declarative relation manipulation expressions” in SQL
o Submitted from the client (a console, GUI client, or a program)
o Executed on DBMS
o SQL is not enough to perform complex data manipulations. Therefore, we often do some
programming on top of SQL.
o Following are such options.
Stored Procedures
o A Procedure (or function) sitting as database element
o Has access to all database elements like tables, views, etc.
o Run on DBMS server therefore no movement of data is required from database to the
program, typically over network.
o More coming ahead!
General programming language programs accessing databases through API
o Most popular way of manipulating database.
o Most application are written in Programming languages like Java, Python, C++, C#,
PHP, or so on.
o Programs written in these languages access (read/write) databases through standard
programming interfaces JDBC, Python DB-API, MS-ADO, ODBC, etc.
Embedded SQL: SQL is permitted in programming
o Oracle’s Pro*C, PostgreSQL’s ECPG, and SQLJ are such environments
9 | Programming Databases
Program accessing databases through Native Programming Interfaces:
o Each DBMS typically provide a lower level access to databases – typically at physical
level schema. This makes most efficient access of databases.
o Programs written using standard API may typically have to be translated to this level
calls by things like JDBC driver.
o Oracle’s OCI (Oracle Call Interface), PostgreSQL’s (libpq) are such examples.
10 | Programming Databases
API based access to databases
Another approach for accessing remote databases in host program is API based access. API based
access are more canonical, powerful and popular way of manipulating remote databases.
Over the period of time, various API standards have evolved in most popular programming
environments. Open Database Connectivity (ODBC) was first initiative by Microsoft towards
standardization. Current popular standards are JDBC, Python DB-API, MS ADO.Net.
Like in case of embedded SQL, in this approach too most of the time we perform one or more of
following operations using API-
1. Connect to remote Database
2. Submit SQL statements
3. If executed query returns a row-set, iterate through the result-set and do some processing for
each row.
Example: Python
11 | Programming Databases
Example: Java
12 | Programming Databases
Embedded SQL
• We allow SQL statements in a host program.
• Host program run on “client”?
• SQL run on DBMS server and response is brought to the client – same way your
PgAdmin-IV does?
• Below is a sample embedded SQL in ECPG (a C pre-processor for PostgreSQL)
almost compatible to oracle’s Pro*C
• In embedded SQL, there is notion of “host variables”, these are variables declared
within SQL DECLARE SECTION.
• Host variables can be mixed in SQL statements by pre-fixing colon. Often used to
supply value into SQL statements or are used to collect values returned by executed
SQL statement. For example, above see use of host variable c.
13 | Programming Databases
Prepared Dynamic Query
• A dynamic Query is the one that uses some variables, and complete query expression
is known only at run-time. For example
query = "SELECT * FROM EMPLOYEE WHERE dno = " + dno
Query is stored in a variable query, and becomes fully known at run-time?
• Sometime such a query is repeated run on DBMS with changed value of some
parameter, for example dno in above case.
• When run repeatedly, we can ask dbms to create its “query execution plan” once and
run it for following calls.
• Such queries are called prepared queries.
Benefits of Prepared Queries
Faster Execution – since query plan can be created once and saved, this makes
execution of dynamic query faster (gain is significant, when a dynamic query appears
in a loop or so)
Avoids Run time exceptions
A popular technique to deal with SQL injections
Example: SQL Injection
14 | Programming Databases