CONCEPTS IN DATABASE SYSTEMS - II
Monday, November 4, 2024 Felix Mutua, Ph.D.
Outline for Today’s Lecture
2
SQL
Spatial databases
Implementation
What is SQL?
3
SQL is the standard language for dealing with Relational Databases.
SQL can be used to insert, search, update, and delete database
records.
SQL can do lots of other operations, including optimizing and
maintenance of databases.
SQL stands for Structured Query language, pronounced as "S-Q-L" or
sometimes as "See-Quel"... Relational databases like MySQL, Postgres
Database, Oracle, MS SQL Server, Sybase, etc. use ANSI SQL.
Felix Mutua, PhD
How to Use SQL
4
SQL Code Example:
SELECT * FROM Students WHERE Age > 21
SQL syntaxes used in different databases are almost similar, though few RDBMS use a few different commands and
even proprietary SQL syntaxes.
Felix Mutua, PhD
What is SQL used for?
5
Here are important reasons for using SQL
It helps users to access data in the RDBMS system.
It helps you to describe the data.
It allows you to define the data in a database and manipulate that specific
data.
With the help of SQL, you can create and drop databases and tables.
SQL offers you to use the function in a database, create a view, and stored
procedure.
You can set permissions on tables, procedures, and views.
Felix Mutua, PhD
A Brief History of SQL
6
1970 - Dr. Edgar F. "Ted" Codd described a relational model for
databases.
1974 - Structured Query Language appeared.
1978 - IBM released a product called System/R.
1986 - IBM developed the prototype of a relational database, which is
standardized by ANSI.
1989 - First ever version launched of SQL
1999 - SQL 3 launched with features like triggers, object-orientation, etc.
SQL 2003 - window functions, XML-related features, etc.
SQL 2006 - Support for XML Query Language
SQL 2011 -improved support for temporal databases
Felix Mutua, PhD
Types of SQL Statements
7
Five types of widely used SQL queries.
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Transaction Control Language (TCL)
Data Query Language (DQL)
Felix Mutua, PhD
Types of SQL Statements
8
Data Definition Language (DDL)
Create/alter/delete tables and their attributes
Following lectures...
Data Manipulation Language (DML)
Query one or more tables – discussed next !
Insert/delete/modify tuples in tables
Transact-SQL
TCL commands deal with the transaction within the database. E.g coomit, Rollback, etc
Data Control Language (DCL)
commands such as GRANT and REVOKE which mainly deal with the rights, permissions
and other controls of the database system.
Data Query Language (DQL)
used for performing queries on the data within schema objects. E.g SELECT
List of SQL Commands
9
Some of the most commonly used SQL commands:
CREATE - defines the database structure schema
INSERT - inserts data into the row of a table
UPDATE - updates data in a database
DELETE - removes one or more rows from a table
SELECT - selects the attribute based on the condition described by the
WHERE clause
DROP - removes tables and databases
Felix Mutua, PhD
SQL Language elements
10
Keywords: Each SQL statement contains single or multiple keywords.
Identifiers: Identifiers are names of objects in the database, like user IDs, tables, and columns.
Strings: Strings can be either literal strings or expressions with VARCHAR or CHAR data types.
Expressions: Expressions are formed from several elements, like constants, SQL operators, column
names, and subqueries.
Search Conditions: Conditions are used to select a subset of the rows from a table or used to control
statements like an IF statement to determine control of flow.
Special Values: Special values should be used in expressions and as column defaults when building
tables.
Variables: Sybase IQ supports local variables, global variables, and connection-level variables.
Comments: Comment is another SQL element which is used to attach explanatory text to SQL
statements or blocks of statements. The database server does not execute any comment.
NULL Value: Use NULL, which helps you to specify a value that is unknown, missing, or not applicable.
Felix Mutua, PhD
What is NoSQL?
11
NoSQL is an upcoming category of Database Management Systems. Its
main characteristic is its non-adherence to Relational Database Concepts.
NoSQL means "Not only SQL".
The concept of NoSQL databases grew with internet giants such as Google,
Facebook, Amazon etc. who deal with gigantic volumes of data.
NoSQL database is non-relational databases that scale-out better than
relational databases and are designed with web applications in mind. They
do not use SQL to query the data and do not follow strict schemas like
relational models. With NoSQL, ACID (Atomicity, Consistency, Isolation,
Durability) features are not guaranteed always.
Felix Mutua, PhD
12 Using SQL
Felix Mutua, PhD
Data Types in SQL
13
Characters:
CHAR(20) -- fixed length
VARCHAR(40) -- variable length
Numbers:
BIGINT, INT, SMALLINT, TINYINT
REAL, FLOAT -- differ in precision
MONEY
Times and dates:
DATE
DATETIME -- SQL Server
Others... All are simple
Tables in SQL
14
Product Attribute names
Table name
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Tuples or rows
Tables Explained
15
A tuple = a record
Restriction: all attributes are of atomic type
A table = a set of tuples
Like a list…
…but it is unordered: no first(), no next(), no last().
Tables Explained
16
The schema of a table is the table name and its attributes:
Product(PName, Price, Category, Manfacturer)
A key is an attribute whose values are unique;
we underline a key
Product(PName, Price, Category, Manfacturer)
SQL Query
17
Basic form: (plus many many more bells and whistles)
SELECT attributes
FROM relations (possibly multiple)
WHERE conditions (selections)
Simple SQL Query
18
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT *
FROM Product
WHERE category=‘Gadgets’
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
“selection” Powergizmo $29.99 Gadgets GizmoWorks
Simple SQL Query
19
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
PName Price Manufacturer
“selection” and SingleTouch $149.99 Canon
“projection” MultiTouch $203.99 Hitachi
A Notation for SQL Queries
20
Input Schema
Product(PName, Price, Category, Manfacturer)
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
Answer(PName, Price, Manfacturer)
Output Schema
Selections
21
What goes in the WHERE clause:
x = y, x < y, x <= y, etc
For number, they have the usual meanings
For CHAR and VARCHAR: lexicographic ordering
Expected conversion between CHAR and VARCHAR
For dates and times, what you expect...
Pattern matching on strings...
The LIKE operator
22
s LIKE p: pattern matching on strings
p may contain two special symbols:
% = any sequence of characters
_ = any single character
Product(PName, Price, Category, Manufacturer)
Find all products whose name mentions ‘gizmo’:
SELECT *
FROM Products
WHERE PName LIKE ‘%gizmo%’
Eliminating Duplicates
23
Category
SELECT DISTINCT category Gadgets
FROM Product
Photography
Household
Compare to:
Category
Gadgets
SELECT category
Gadgets
FROM Product
Photography
Household
Ordering the Results
24
SELECT pname, price, manufacturer
FROM Product
WHERE category=‘gizmo’ AND price > 50
ORDER BY price, pname
Ordering is ascending, unless you specify the DESC keyword.
Ties are broken by the second attribute on the ORDER BY list, etc.
Ordering the Results
25
SELECT category
FROM Product
ORDER BY pname
PName Price Category Manufacturer
?
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Ordering the Results
26
Category
SELECT DISTINCT category Gadgets
FROM Product
Household
ORDER BY category
Photography
Compare to:
?
SELECT DISTINCT category
FROM Product
ORDER BY pname
Joins in SQL
27
Connect two or more tables:
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Company Cname StockPrice Country
What is GizmoWorks 25 USA
the connection Canon 65 Japan
between
them ? Hitachi 15 Japan
Joins
28
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all products under $200 manufactured in Japan;
return their names and prices.
Join
between Product
and Company
SELECT pname, price
FROM Product, Company
WHERE manufacturer=cname AND country=‘Japan’
AND price <= 200
Joins in SQL
29
Product Company
PName Price Category Manufacturer Cname StockPrice Country
Gizmo $19.99 Gadgets GizmoWorks GizmoWorks 25 USA
Powergizmo $29.99 Gadgets GizmoWorks Canon 65 Japan
SingleTouch $149.99 Photography Canon Hitachi 15 Japan
MultiTouch $203.99 Household Hitachi
SELECT pname, price
FROM Product, Company
WHERE manufacturer=cname AND country=‘Japan’
AND price <= 200
PName Price
SingleTouch $149.99
Joins
30
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all countries that manufacture some product in the ‘Gadgets’ category.
SELECT country
FROM Product, Company
WHERE manufacturer=cname AND category=‘Gadgets’
Joins in SQL
31
Product Company
Name Price Category Manufacturer Cname StockPrice Country
Gizmo $19.99 Gadgets GizmoWorks GizmoWorks 25 USA
Powergizmo $29.99 Gadgets GizmoWorks Canon 65 Japan
SingleTouch $149.99 Photography Canon Hitachi 15 Japan
MultiTouch $203.99 Household Hitachi
SELECT country
FROM Product, Company
WHERE manufacturer=cname AND category=‘Gadgets’
Country
What is ??
the problem ? ??
What’s the
solution ?
Joins
32
Product (pname, price, category, manufacturer)
Purchase (buyer, seller, store, product)
Person(persname, phoneNumber, city)
Find names of people living in Seattle that bought some product in the
‘Gadgets’ category, and the names of the stores they bought such product
from
SELECT DISTINCT persname, store
FROM Person, Purchase, Product
WHERE persname=buyer AND product = pname AND
city=‘Seattle’ AND category=‘Gadgets’
When are two tables related?
33
Foreign keys are a method for schema designers to tell you so
A foreign key states that a column is a reference to the key of another table
ex: Product.manufacturer is foreign key of Company
Gives information and enforces constraint
Disambiguating Attributes
34
Sometimes two relations have the same attr:
Person(pname, address, worksfor)
Company(cname, address)
Which
SELECT DISTINCT pname, address address ?
FROM Person, Company
WHERE worksfor = cname
SELECT DISTINCT Person.pname, Company.address
FROM Person, Company
WHERE Person.worksfor = Company.cname
Tuple Variables
35
Product (pname, price, category, manufacturer)
Purchase (buyer, seller, store, product)
Person(persname, phoneNumber, city)
Find all stores that sold at least one product that the store
‘BestBuy’ also sold:
SELECT DISTINCT x.store
FROM Purchase AS x, Purchase AS y
WHERE x.product = y.product AND y.store = ‘BestBuy’
Answer (store)
Tuple Variables
36
General rule:
tuple variables introduced automatically by the system:
Product (name, price, category, manufacturer) SELECT name
FROM Product
WHERE price > 100
Becomes: SELECT Product.name
FROM Product AS Product
WHERE Product.price > 100
Doesn’t work when Product occurs more than once:
In that case the user needs to define variables explicitly.
37 RDBMS software
Felix Mutua, PhD
Sampled List of clients
38
Commercial Open Source (Free)
Oracle MySQL (partially free)
Microsoft SQL Server PostgreSQL (free software)
PostgreSQL (free software) SQLite (free software)
IBM Db2 MariaDB (free software)
Microsoft Access
Hive
Teradata
Microsoft Azure SQL Database
Felix Mutua, PhD
MySQL
39
MySQL is an open-source relational database management system (RDBMS).
Its name is a combination of "My", the name of co-founder Michael Widenius's
daughter, and "SQL", the abbreviation for Structured Query Language.
MySQL is free and open-source software under the terms of the GNU General
Public License, and is also available under a variety of proprietary licenses.
MySQL has stand-alone clients that allow users to interact directly with a MySQL
database using SQL, but more often, MySQL is used with other programs to
implement applications that need relational database capability.
MySQL is used by many database-driven web applications, including Drupal,
Joomla, phpBB, and WordPress.
Felix Mutua, PhD
PostgreSQL
40
PostgreSQL is a powerful, open source object-relational database system
with over 30 years of active development that has earned it a strong
reputation for reliability, feature robustness, and performance.
PostgreSQL comes with many features aimed to help developers build
applications, administrators to protect data integrity and build fault-
tolerant environments, and help you manage your data no matter how big
or small the dataset.
In addition to being free and open source, PostgreSQL is highly extensible.
For example, you can define your own data types, build out custom
functions, even write code from different programming languages without
recompiling your database!
Felix Mutua, PhD
Features found in PostgreSQL
41
Data Types
Primitives: Integer, Numeric, String, Boolean
Structured: Date/Time, Array, Range, UUID
Document: JSON/JSONB, XML, Key-value (Hstore)
Geometry: Point, Line, Circle, Polygon
Customizations: Composite, Custom Types
Data Integrity
UNIQUE, NOT NULL
Primary Keys
Foreign Keys
Exclusion Constraints
Explicit Locks, Advisory Locks
Felix Mutua, PhD
Features found in PostgreSQL
42
Extensibility
Stored functions and procedures
Procedural Languages: PL/PGSQL, Perl, Python (and many more)
SQL/JSON path expressions
Foreign data wrappers: connect to other databases or streams with a
standard SQL interface
Customizable storage interface for tables
Many extensions that provide additional functionality, including PostGIS
Felix Mutua, PhD
PostGIS
43
PostGIS is a spatial database extender for PostgreSQL object-
relational database. It adds support for geographic objects allowing
location queries to be run in SQL.
In addition to basic location awareness, PostGIS offers many features
rarely found in other competing spatial databases such as Oracle
Locator/Spatial and SQL Server.
Felix Mutua, PhD
PostGIS Features
44
Processing and analytic functions for both vector and raster data for
splicing, dicing, morphing, reclassifying, and collecting/unioning with the
power of SQL
raster map algebra for fine-grained raster processing
Spatial reprojection SQL callable functions for both vector and raster data
Support for importing / exporting ESRI shapefile vector data via both
commandline and GUI packaged tools and support for more formats via
other 3rd-party Open Source tools
Packaged command-line for importing raster data from many standard
formats: GeoTiff, NetCDF, PNG, JPG to name a few
Felix Mutua, PhD
PostGIS Features contd..
45
Rendering and importing vector data support functions for standard textual
formats such as KML,GML, GeoJSON,GeoHash and WKT using SQL
Rendering raster data in various standard formats GeoTIFF, PNG, JPG,
NetCDF, to name a few using SQL
Seamless raster/vector SQL callable functions for extrusion of pixel values
by geometric region, running stats by region, clipping rasters by a
geometry, and vectorizing rasters
3D object support, spatial index, and functions
Network Topology support
Packaged Tiger Loader / Geocoder/ Reverse Geocoder / utilizing US
Census Tiger data
Felix Mutua, PhD
46 Introduction to Spatial Databases
Felix Mutua, PhD
Definition
47
A spatial database is a database that is optimized for storing and
querying data that represents objects defined in a geometric space.
Most spatial databases allow the representation of simple geometric
objects such as points, lines and polygons.
Some spatial databases handle more complex structures such as 3D objects,
topological coverages, linear networks, and TINs (triangulated irregular
network).
While typical databases have developed to manage various numeric and
character types of data, such databases require additional functionality to
process spatial data types efficiently, and developers have often added
geometry or feature data types.
Felix Mutua, PhD
Characteristics of Spatial Databases
48
Database systems use indexes to quickly look up values; however, this
way of indexing data is not optimal for spatial queries.
Instead, spatial databases use a spatial index to speed up database
operations.
In addition to typical SQL queries such as SELECT statements, spatial
databases can perform a wide variety of spatial operations
Felix Mutua, PhD
Characteristics of Spatial Databases
49
The following operations and many more are specified by the Open Geospatial
Consortium standard:
Spatial Measurements: Computes line length, polygon area, the distance between
geometries, etc.
Spatial Functions: Modify existing features to create new ones, for example by providing
a buffer around them, intersecting features, etc.
Spatial Predicates: Allows true/false queries about spatial relationships between
geometries. Examples include "do two polygons overlap" or 'is there a residence located
within a mile of the area we are planning to build the landfill?' (see DE-9IM)
Geometry Constructors: Creates new geometries, usually by specifying the vertices (points
or nodes) which define the shape.
Observer Functions: Queries which return specific information about a feature such as the
location of the center of a circle
Felix Mutua, PhD
Spatial Database Management System
Spatial Database Management System (SDBMS) provides the
capabilities of a traditional database management system (DBMS)
while allowing special storage and handling of spatial data.
SDBMS:
Works with an underlying DBMS
Allows spatial data models and types
Supports querying language specific to spatial data types
Provides handling of spatial data and operations
SDBMS Three-layer Structure
SDBMS works with a spatial
application at the front end and
Core Spatial
a DBMS at the back end Functionality
Interface to spatial application
SDBMS has three layers: Taxonomy
Spatial application
Interface to DBMS
Data types
Interface to spatial application
DBMS
Operations
Query language
Core spatial functionality Algorithms
Access methods
Interface to DBMS
Spatial Query Language
Number of specialized adaptations of SQL
Spatialquery language
Temporal query language (TSQL2)
Object query language (OQL)
Object oriented structured query language (O2SQL)
Spatial query language provides tools and structures specifically for
working with spatial data
SQL3 provides 2D geospatial types and functions
Spatial Query Language Operations
Three types of queries:
Basicoperations on all data types (e.g. IsEmpty, Envelope, Boundary)
Topological/set operators (e.g. Disjoint, Touch, Contains)
Spatial analysis (e.g. Distance, Intersection, SymmDiff)
Spatial Data Entity Creation
Form an entity to hold county names, states, populations, and geographies
CREATE TABLE County(
Name varchar(30),
State varchar(30),
Pop Integer,
Shape Polygon);
Form an entity to hold river names, sources, lengths, and geographies
CREATE TABLE River(
Name varchar(30),
Source varchar(30),
Distance Integer,
Shape LineString);
Example Spatial Query
Find all the counties that border on Contra Costa county
SELECT C1.Name
FROM County C1, County C2
WHERE Touch(C1.Shape, C2.Shape) = 1 AND C2.Name = ‘Contra Costa’;
Find all the counties through which the Merced river runs
SELECT C.Name, R.Name
FROM County C, River R
WHERE Intersect(C.Shape, R.Shape) = 1 AND R.Name = ‘Merced’;
CREATE TABLE County( CREATE TABLE River(
Name varchar(30), Name varchar(30),
State varchar(30), Source varchar(30),
Pop Integer, Distance Integer,
Shape Polygon); Shape LineString);
Examples
A database:
Relation counties(sname: string, area: region, spop: int)
Relation cities(cname: string, center: point; ext: region)
Relation rivers(rname: string, route:line)
SELECT * FROM rivers WHERE route intersects R
SELECT cname, sname FROM cities, counties WHERE center inside area
SELECT rname, length(intersection(route, Kisumu)) FROM rivers WHERE route intersects
Kisumu
Spatial Queries
Selection queries: “Find all objects inside query q”,
inside-> intersects, north
Nearest Neighbor-queries: “Find the closest object
to a query point q”, k-closest objects
Spatial join queries: Two spatial relations S1 and S2, find
all pairs: {x in S1, y in S2, and x rel y= true}, rel= intersect,
inside, etc
Assignment 04
58
Each student will choose a system for a database design (based on the
previous assignment). You are required to:
1. Design
a) Prepare a revised ER diagram using pgAdmin 4 tool
b) Forward engineer the diagram to SQL
c) Execute the SQL
The revised diagram, the SQL text and an empty database backup(sql)
should be zipped together and uploaded to
https://forms.gle/EL3CRSWhjFuMfeTb7
Deadline next week.
Felix Mutua, PhD