Database management systems (DBMS)
File based approach for storage and retrieval of large volume of data
Flat files
1. Flat files contains one table at a time generally
2. Flat files contains values at each row and separated with a special symbol so to reach the
data you have to parse each row and obtaining an array of values and then you can query the
data.
3. to control the data in a file, you have to read it line by line and parse it and because of it
they are slow
4. there is no control mechanism in files
Placing data in a flat file database has the following advantages
• All records. are stored in one place
• Easy to set up using a number of standard office applications
• Easy to understand
• Simple sorting of records can be carried out
• Record can be viewed or extracted on the basis of simple criteria
Everyday things like business contacts, customer lists and so on can be stored and used in a flat file
database.
Flat File Structure
All individual property data is in one file. The Tax No.is required to search the database.
1
Disadvantages of using flat fife(file-based) approach to store data
1. Potential duplication. With perhaps thousands of records in a file, it can be a very
tedious process to spot duplicated records. Especially if more than one person is maintaining
the table.
2. Non-unique records. Notice that Mr & Mrs Jones have identical ID's '2'. There is
nothing in a flat file system to stop this. But it is a very poor idea to have identical ID's in a
database.
3. Harder to update. Suppose this table now needs to also store their work details - this
will result in multiple records for each person. Again, this is fine - but suppose Sandra Jones
now wanted to be known as 'Sandra Thompson'? Now multiple records need to be updated.
4. Inherently inefficient. What if an email field needs to be added?. If there are tens of
thousands of records, there may be many people having no email address, but every record in
a flat file database has to have the same fields, whether they are used or not.
5. Harder to change data format. Suppose the telephone numbers now have to have a
dash between the area code and the rest of the number, like this 0223-44033. Adding that
extra dash over tens of thousands of records would be a significant task in a flat file database.
6. Poor at complex queries.Flat files are excellent for simple filtering. For example,
show all records where tile field 'City' contains Hull. But for anything a bit more complicated,
a flat file becomes very difficult to use. For example, find all records whose post code
contains '23'.
7. Almost no security.You can protect a flat file database using a password for opening
it. But once it is open, that person can usually see all fields. This is often not a good thing, for
example there may be a confidential field containing their salary that only some people should
be able to see.
For all these reasons, a relational database may be a better option..
Relational database
To overcome the limitations of a simple flat file database that has only a single table, another type
of database has been developed called a 'relational database'.
A relational database holds its data over a number of tables instead of one. Records within the
tables are linked (related) to records held in other tables.
The picture below shows two tables. The main one is called 'customers'. This contains almost the
same fields as we have seen in the flat file database. But there is one key difference - the city is
now held in a separate table called 'city'. The line between them shows there is a link (relationship)
between a record in the city table and records in the main table.
The line between the fields has a ‘1’ on one side and the infinity sign n the other.
This is called a one-to-many relationship.
After creating the relationship, if you open the table city, it will look like similar to following
screen shot. If you click the ‘+’ symbol, it will show the customers in that particular city.
Features of a relational database
Splitting data into a number of related tables brings many advantages over a flat file database.
These include:
1. Data is only stored once. In the previous example, the city data was gathered into one table so
now there is only one record per city. The advantages of this are
No multiple record changes needed
More efficient storage
Simple to delete or modify details.
All records in other tables having a link to that entry will show the change.
2. Complex queries can be carried out. A language called SQL has been developed to allow
programmers to 'Insert', 'Update', 'Delete', 'Create', 'Drop' table records. These actions are further
refined by a 'Where' clause. For example
SELECT * FROM Customer WHERE ID = 2
This SQL statement will extract record number 2 from the Customer table. Far more complicated
queries can be written that can extract data from many tables at once.
3. Better security. By splitting data into tables, certain tables can be made confidential. When a
person logs on with their username and password, the system can then limit access only to those
tables whose records they are authorised to view. For example, a receptionist would be able to view
employee location and contact details but not their salary. A salesman may see his team's sales
performance but not competing teams.
4. Cater for future requirements. By having data held in separate tables, it is simple to add
records that are not yet needed but may be in the future. For example, the city table could be
expanded to include every city and town in the country, even though no other records are using
them all as yet. A flat file database cannot do this.
Summary - advantages of a relational database over flat file
Avoids data duplication
Avoids inconsistent records
Easier to change data
Easier to change data format
Data can be added and removed easily
Easier to maintain security.
Data dictionary
In database management systems, a file that defines the basic organization of a database. A
data dictionary contains a list of all files in the database, the number of records in each file,
and the names and types of each field. Most database management systems keep the data
dictionary hidden from users to prevent them from accidentally destroying its contents.
Data dictionaries do not contain any actual data from the database, only book keeping
information for managing it. Without a data dictionary, however, a database management
system cannot access data from the database.
A 'data dictionary' describes the structure and attributes of data 'items' to be used within a software
application (usually a database).
A 'data dictionary' includes the names and descriptions of the tables and the fields contained in each
table.It also documents information about the data type, field length and other things such as
validation.
The main purpose of the data dictionary is to provide metadata or information about data.
Technically,it is a database about a database.
Relational database
Each database is a collection of related tables; hence the name "relational database". Each table is
a physical representation of an entity or object that is in a tabular format consisting of columns
and rows. Columns are the fields of a record or the attributes of an entity. The rows contain the
values or data instances; these are also called records or tuples.
Primary key
Primary key uniquely specifies within a table.
Foreign key
A foreign key is a field in a relational table that matches the primary key column of another table.
The foreign key can be used to cross-reference tables. Foreign keys need not have unique values in
the referencing relation.
Candidate key
A candidate key is a column, or set of columns, in a table that can uniquely identify any database
record without referring to any other data. Each table may have one or more candidate keys, but one
candidate key is special, and it is called the primary key. This is usually the best among the
candidate keys. When a key is composed of more than one column, it is known as a composite key.
Secondary key
A secondary key is made on a field that you would like to be indexed for faster searches. A
table can have more than one secondary key .When database become large, possibly with
hundreds of thousands of records, they can take a while to search. A database can be set up so
that it can be searched more efficiently.
Referential integrity
Referential integrity is a property of data which, when satisfied, requires every value of one
attribute (column) of a relation (table) to exist as a value of another attribute in a different (or
the same) relation (table).For referential integrity to hold in a relational database, any field in
a table that is declared a foreign key can contain either a null value, or only values from a
parent table's primary key or a candidate key. In other words, when a foreign key value is used
it must reference a valid, existing primary key in the parent table.
Normalization
Database normalization is the process of organizing the fields and tables of a relational database to
minimize redundancy. Normalization usually involves dividing large tables into smaller (and less
redundant) tables and defining relationships between them. The objective is to isolate data so that
additions, deletions, and modifications of a field can be made in just one table and then
propagated through the rest of the database using the defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization
and what we now know as the First Normal Form ( 1NF) in 1970. Codd went on to define
the Second Normal Form (2NF) and Third Normal Form
(3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal
Form (BCNF) in 1974. Informally, a relational database table is often described as
"normalized" if it is in the Third Normal Form.
1st Normal Form
A database is in first normal form if it satisfies the conditions:
• Contains no atomic values
• There are no repeating groups
An atomic value is a value that cannot be divided. For example, in the table shown below, the
values in the [Color] column in the first row can be divided into "red"and "green", hence
[TABLE_PRODUCT] is not in 1NF.
A repeating group means that a table contains two or more columns that are closely related. For
example, a table that records data on a book and its author(s) with the following columns: [Book
ID], [Author 1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and [Author
3] are all repeating the same attribute.
1st Normal Form Example
How do we bring an un-normalized table into first normal form? Consider the following
example:
TABLE_PRODUCT
Product ID Color Price
1 red, green 15.99
2 yellow 23.99
3 green 17.50
4 yellow, blue 9.99
5 red 29.99
To bring this table to first normal form, we split the table into two tables and now we
have the resulting tables:
TABLE_PRODUCT_PRICE TABLE PRODUCT_COLOR
Product ID Price Product ID Color
1 15.99 1 red
2 23.99 1 green
3 17.50 2 yellow
4 9.99 3 green
5 29.99 4 yellow
4 blue
5 red
Now first normal form is satisfied, as the columns on are all hold just one
value.
2nd Normal Form
A database is in second normal form if it satisfies the following conditions:
It is in first normal form
All non-key attributes are fully functional dependent on the primary key
In a table, if attribute B is functionally dependent on A, but is not functionally
dependent on a proper subset of A, then B is considered fully functional dependent on
A. Hence, in a 2NF table, all non-key attributes cannot be dependent on a subset of the
primary key. Note that if the primary key is not a composite key, all non-key attributes
are always fully functional dependent on the primary key. A table that is in 1st normal
form and contains only a single key as the primary key is automatically in 2nd normal
form
2nd Normal Form Example
Consider the following example:
TABLE_PUR CHASE_DETAIL
Customer ID Store ID Purchase Location
1 1 Los Angeles
1 3 8an Francisco
2 1 Los Angeles
3 2 New York
4 3 8an Francisco
This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only
part of the primary key. Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break the table into two tables, and
now we have the following:
TABLE_PURCHASE TABLE_STORE
Customer Store ID Store ID Purchase Location
ID 1 Los Angeles
1 1
2 New York
1 3
3 San Francisco
2 1
3 2
4 3
What we have done is to remove the partial functional dependency that we initially had. Now, in
the table [TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key
of that table, which is [Store ID].
3rd Normal Form
A database is in third normal form if it satisfies the following conditions:
It is in second normal form
There is no transitive functional dependency
By transitive functional dependency, we mean we have the following relationships in the table: A
is functionally dependent on B, and B is functionally dependent on C.
In this case, C is transitively dependent on A via B.
3rd Normal Form Example
Consider the following example:
TABLE_B OOK _DETAIL
Book ID Genre ID Genre Type Price
1 1 Gardening 25.99
2 2 Sports 14.99
3 1 Gardening 10.00
4 3 Travel 12.99
5 2 Sports 17.99
In the table able, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive functional
dependency, and this structure does not satisfy third normal form.
To bring this table to third normal form, we split the table into two as follows:
TABLE_BOOK TABLE_GENRE
Book 10 Genre ID Price Genre ID Genre Type
1 1 25.99 1 Gardening
2 2 14.99 2 Sports
3 1 10.00 3 Travel
4 3 12.99
5 2 17.99