Topic 5: Data and Databases
Data & Information
What is Data?
Data is a raw and unorganized fact that required to be processed to make it
meaningful. Data can be simple at the same time unorganized unless it is organized.
Generally, data comprises facts, observations, perceptions numbers, characters,
symbols, image, etc.
Data is always interpreted, by a human or machine, to derive meaning. So, data is
meaningless. Data contains numbers, statements, and characters in a raw form.
What is Information?
Information is a set of data which is processed in a meaningful way according to the
given requirement. Information is processed, structured, or presented in a given
context to make it meaningful and useful.
It is processed data which includes data that possess context, relevance, and
purpose. It also involves manipulation of raw data.
Information assigns meaning and improves the reliability of the data.
Structured and Unstructured Data
The term structured data refers to data that resides in a fixed field within a file or
record. Structured data is typically stored in a relational database (RDBMS). It can
consist of numbers and text, and sourcing can happen automatically or manually, as
long as it's within an RDBMS structure.
Unstructured data is more or less all the data that is not structured. Even though
unstructured data may have a native, internal structure, it's not structured in a
predefined way. There is no data model; the data is stored in its native format.
1. Structured data is clearly defined and searchable types of data, while
unstructured data is usually stored in its native format.
2. Structured data is quantitative, while unstructured data is qualitative.
1
3. Structured data is often stored in data warehouses, while unstructured data is
stored in data lakes.
4. Structured data is easy to search and analyze, while unstructured data
requires more work to process and understand.
5. Structured data exists in predefined formats, while unstructured data is in a
variety of formats.
Structured Data
A database is a collection of data that is organized so that it can be easily accessed,
managed and updated.
The database management system (DBMS) is the software that interacts with end
users, applications, and the database itself to capture and analyze the data.
The relational model was introduced by E.F. Codd in 1970 as a way to make database
management systems more independent of any particular application.
Tables
In a relational database, all data is held in tables, which are made up
of rows and columns.
Each table has one or more columns, and each column is assigned a
specific datatype, such as an integer number, a sequence of characters (for text), or
a date. Each row in the table has a value for each column.
The relational model represents the database as a collection of relations. A relation
is nothing but a table of values.
2
Relational Integrity constraints is referred to conditions which must be present for a
valid relation.
Ex: 1
Primary key
The “id” column is what’s known as a primary key: a unique and non-null number
that refers to each record. Every record/row in a relational table has a primary key.
Linking tables with Foreign Keys
When designing a database, the key decisions are what data do you want to store
and what relationship exists between them. Right now we just have a customers
table, but let’s say we also want to store orders because we’re building an e-
commerce website.
This means we need a separate table for Orders. In order to associate a specific
order with a specific customer, we can use a foreign key relationship through
the customer_id field to link the two tables.
3
Ex: 2
Relationship Types (Cardinality)
Cardinality defines the number of entities in one entity set, which can be
associated with the number of entities of other set via relationship set.
One-to-one − One entity from entity set A can be associated with at most one
entity of entity set B and vice versa.
One-to-many (or Many to One)− One entity from entity set A can be associated
with more than one entities of entity set B however an entity from entity set B,
can be associated with at most one entity.
Many-to-many − One entity from A can be associated with more than one entity
from B and vice versa.
4
Structured Query Language (SQL)
SQL lets you access and manipulate databases. Most of the actions you need to
perform on a database are done with SQL statements.
Four basic update operations performed on relational database model are:
Insert, update, delete and select
Insert is used to insert data into the relation
Update allows you to change the values of some attributes in existing tuples.
Delete is used to delete tuples from the table.
Select allows you to choose a specific range of data.
5
Data Definition Language (DDL)
DDL is a part of SQL used to create or alter tables.
To create a database:
CREATE DATABASE SNIC;
To create a table:
CREATE TABLE Students (
StudID int,
LastName varchar(25),
FirstName varchar(20),
Address varchar(60),
Age int
);
CREATE TABLE Students (
StudID int NOT NULL PRIMARY KEY,
LastName varchar(25) NOT NULL,
FirstName varchar(20),
Address varchar(60),
Age int
)
CREATE TABLE Exam (
IndexNo int NOT NULL PRIMARY KEY,
Marks1 int,
Marks2 int,
Marks3 int,
StudID int FOREIGN KEY REFERENCES Students(StudID)
);
Data Manipulation Language (DML)
DML is used for adding (inserting), deleting, and modifying (updating) data in
a database.
Inserting Data
Table Name: Customer
1st Method (without field names)
6
Insert into Customer (1,”Saman”, 0112234556,” Negombo”,45);
Insert into Customer (2,”Nihal”, 0716782350,” Chilaw”,90);
Insert into Customer (3,”Silva”, 0778907678,” Katana”,89);
2nd Method (with field names)
Insert into Customer (CustomerNo,CustomerName,Telephone,City,Bill_No) values (4,”Namal”,
0117899980,” Negombo”,91);
Customer
CustomerNo CustomerName Telephone City Bill_No
1 Saman 0112234556 Negombo 45
2 Nihal 0716782350 Chilaw 90
3 Silva 0778907678 Katana 89
4 Namal 0117899980 Negombo 91
Querying Data
SELECT CustomerName,City FROM Customer;
CustomerName City
Saman Negombo
Nihal Chilaw
Silva Katana
Namal Negombo
SELECT CustomerName,City FROM Customer where City=’Chilaw’;
CustomerName City
Nihal Chilaw
SELECT * FROM Customer where City=’Chilaw’;
CustomerNo CustomerName Telephone City Bill_No
2 Nihal 0716782350 Chilaw 90
SELECT CustomerNo FROM Customer where City=’Chilaw’ and ‘Negombo’;
CustomerNo
1
2
4
SELECT CustomerNo,City FROM Customer where CustomerNo > 2;
CustomerNo City
3 Katana
4 Negombo
7
Link Tables
Joins and Unions can be used to combine data from one or more tables. The
difference lies in how the data is combined.
Joins combine data into new columns.
If two tables are joined together, then the data from the first table is shown in one set
of column alongside the second table’s column in the same row.
Union combine data into new rows.
If two tables are “unioned” together, then the data from the first table is in one set of
rows, and the data from the second table in another set. The rows are in the same
result.
SELECT 23 AS StNo
UNION
SELECT 45 AS StNo;
StNo
23
45
SELECT * FROM
8
(SELECT 23 AS StNo) AS StNo1
JOIN
(SELECT 45 AS StNo) AS StNo2
ON (33=33);
StNo1 StNo2
23 45
Wildcard Characters
Symbol Description Example
Represents zero or more bl% finds bl, black, blue, and blob
% characters
Represents a single h_t finds hot, hat, and hit
_ character
Represents any single h[oa]t finds hot and hat, but not hit
[] character within the brackets
Represents any character h[^oa]t finds hit, but not hot and hat
^ not in the brackets
Represents a range of c[a-b]t finds cat and cbt
- characters
Practice SQL coding related to Wildcard Characters.
Grouping, Ordering and Counting Data
Consider the following “Customer” table.
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
9
Answer:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
Answer:
10