Introduction to RDBMS
What is DBMS (Database Management System)?
As the name suggests, the Database Management System consists of two parts. They are: 1. Database 2. Management System
What is a Database?
To find out what a database is, we have to start from data, which is the basic building block of any DBMS.
Data: Facts, figures, statistics, etc. that have no particular meaning on their own (e.g. 1, Sachin, 35).
Record: A collection of related data items. In the above example, the three data items had no meaning individually. But if we organize them in the following way, they collectively represent meaningful information.
Rollno    Name      Age
1         Sachin    35
Table or Relation: Collection of related records.
Rollno    Name      Age
1         Sachin    35
2         Saurav    32
3         Rahul     34
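The columns of this relation are called Fields or Attributes. They are also loosely called Domains, although strictly speaking a domain is the set of values that an attribute may take. The rows are called Tuples or Records.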
Database: Collection of related relations. Consider the following collection of tables:
Table-1
Rollno    Name      Age
1         Sachin    35
2         Saurav    32
3         Rahul     34
Table-2
Rollno    Address
1         Mumbai
2         Kolkata
3         Bengaluru
Table-3
Rollno    Year
1         III
2         I
3         II
Table-4
Year      Hostel
I         H1
II        H2
III       H3
We now have a collection of 4 tables. They can be called a related collection because selected pairs of tables share common attributes: Rollno links Table-1, Table-2 and Table-3, and Year links Table-3 and Table-4. Because of these common attributes we can combine the data of two or more tables to find the complete details of a student. A question like "Which hostel does the youngest student live in?" can now be answered, although the Age and Hostel attributes are in different tables.
In a database, data is organized strictly in row and column format. The rows are called Tuples or Records. The data items within one row may belong to different data types. The columns, on the other hand, are called Attributes (or, loosely, Domains). All the data items within a single attribute are of the same data type.
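The question about the youngest student's hostel can be worked through as a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names student, address, study_year and hostel are assumptions chosen for this illustration (the notes only call them Table-1 to Table-4).

```python
import sqlite3

# In-memory database holding the four tables from the text.
# Table names are illustrative; the notes call them Table-1 .. Table-4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student    (rollno INTEGER, name TEXT, age INTEGER);
    CREATE TABLE address    (rollno INTEGER, address TEXT);
    CREATE TABLE study_year (rollno INTEGER, year TEXT);
    CREATE TABLE hostel     (year TEXT, hostel TEXT);

    INSERT INTO student    VALUES (1, 'Sachin', 35), (2, 'Saurav', 32), (3, 'Rahul', 34);
    INSERT INTO address    VALUES (1, 'Mumbai'), (2, 'Kolkata'), (3, 'Bengaluru');
    INSERT INTO study_year VALUES (1, 'III'), (2, 'I'), (3, 'II');
    INSERT INTO hostel     VALUES ('I', 'H1'), ('II', 'H2'), ('III', 'H3');
""")

# Rollno links student to study_year; Year links study_year to hostel.
youngest = conn.execute("""
    SELECT s.name, h.hostel
    FROM student AS s
    JOIN study_year AS y ON y.rollno = s.rollno
    JOIN hostel     AS h ON h.year = y.year
    ORDER BY s.age ASC
    LIMIT 1
""").fetchone()

print(youngest)  # ('Saurav', 'H1')
```

The Age value comes from one table and the Hostel value from another; the two common attributes are what make the combination possible.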
What is Management System?
A management system is a set of rules and procedures which help us to create, organize and manipulate the database. It also helps us to add, modify and delete data items in the database. The management system can be either manual or computerized. The management system is important because without some kind of rules and regulations it is not possible to maintain the database. We have to decide which attributes should be included in a particular table, which common attributes create the relationship between two tables, which tables have to be handled when a record is inserted or deleted, and so on. These issues must be resolved by having rules to follow in order to maintain the integrity of the database.
What is Relational Database Management System (RDBMS)?
A DBMS that is based on the relational model is called an RDBMS. The relational model is the most successful of the three classical data models (hierarchical, network and relational). Designed by E.F. Codd, the relational model is based on the mathematical theory of sets and relations. The relational model represents data in the form of a table. A table is a two-dimensional array containing rows and columns. Each row contains data related to an entity such as a student. Each column contains the data related to a single attribute of the entity, such as student name. One of the reasons behind the success of the relational model is its simplicity: the data is easy to understand and easy to manipulate. Almost all database systems sold in the market nowadays have either a complete or a partial implementation of the relational model.
Tuple / Row:
A single row in the table is called a tuple. Each row represents the data of a single entity.
Attribute / Column:
A column stores an attribute of the entity. For example, if details of students are stored then student name is an attribute; course is another attribute and so on.
Column Name:
Each column in the table is given a name. This name is used to refer to the values in the column.
Table Name:
Each table is given a name. This is used to refer to the table. The name depicts the content of the table.
Primary Key:
A table contains data related to entities. For example, the STUDENTS table contains data related to students. For each student there will be one row in the table. Each student's data in the table must be uniquely identified. In order to identify each entity uniquely in the table, we use a column in the table. That column, which is used to uniquely identify entities (students) in the table, is called the primary key. In the case of the STUDENTS table (see figure 1) we can use ROLLNO as the primary key, as it is not duplicated. So a primary key can be defined as a set of columns used to uniquely identify the rows of a table. Some other examples of primary keys are the account number in a bank, the product code of a product, and the employee number of an employee.
Composite Primary Key:
In some tables a single column cannot be used to uniquely identify entities (rows). In that case we have to use two or more columns to uniquely identify the rows of the table. When a primary key contains two or more columns it is called a composite primary key. In figure 2, we have the PAYMENTS table, which contains the details of payments made by the students. Each row in the table contains the roll number of the student, the payment date and the amount paid. None of the columns can uniquely identify rows on its own. So we have to combine ROLLNO and DP (the payment date) to uniquely identify rows in the table. As the primary key consists of two columns, it is called a composite primary key.
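The difference between a single-column and a composite primary key can be sketched with Python's built-in sqlite3 module. The figures are not reproduced here, so the column types, dates and amounts below are assumptions made up for illustration; only the names STUDENTS, PAYMENTS, ROLLNO and DP come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        rollno INTEGER PRIMARY KEY,      -- single-column primary key
        name   TEXT NOT NULL
    );
    CREATE TABLE payments (
        rollno INTEGER NOT NULL,
        dp     TEXT    NOT NULL,         -- payment date
        amount INTEGER NOT NULL,
        PRIMARY KEY (rollno, dp)         -- composite primary key
    );
""")

conn.execute("INSERT INTO students VALUES (1, 'Sachin')")
conn.execute("INSERT INTO payments VALUES (1, '2024-01-10', 500)")
# Same student, different date: the (rollno, dp) pair is still unique.
conn.execute("INSERT INTO payments VALUES (1, '2024-02-10', 500)")


def try_insert(sql):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second student with roll number 1 violates the primary key.
dup_student = try_insert("INSERT INTO students VALUES (1, 'Saurav')")
# A second payment with the same roll number AND date violates the composite key.
dup_payment = try_insert("INSERT INTO payments VALUES (1, '2024-01-10', 700)")

print(dup_student, dup_payment)  # False False
```

Note that ROLLNO alone repeats in PAYMENTS, and so could DP; only the combination is required to be unique.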
Foreign Key:
In the relational model, we often store data in different tables and put them together to get complete information. For example, in the PAYMENTS table we have only the ROLLNO of the student. To get the remaining information about the student we have to use the STUDENTS table. The relationship between the entities student and payment is one-to-many: one student may make many payments. As we already have the ROLLNO column in the PAYMENTS table, it is possible to join it with the STUDENTS table and get information about the parent entity (student). The ROLLNO column of the PAYMENTS table is called a foreign key, as it is used to join the PAYMENTS table with the STUDENTS table. So, the foreign key is the key on the "many" side of the relationship.
The ROLLNO column of the PAYMENTS table must derive its values from the ROLLNO column of the STUDENTS table. When a child table contains a row that doesn't refer to a corresponding parent key, that row is called an orphan record. We must not have orphan records, as they are the result of a lack of data integrity.
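As an illustration, the sketch below joins PAYMENTS with STUDENTS on ROLLNO and then looks for orphan records. It uses Python's built-in sqlite3 module; the sample rows (including the payment for a non-existent roll number 9) are invented for the example, and no foreign key constraint is declared so that the orphan can be inserted and then detected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (rollno INTEGER, dp TEXT, amount INTEGER);

    INSERT INTO students VALUES (1, 'Sachin'), (2, 'Saurav');
    INSERT INTO payments VALUES
        (1, '2024-01-10', 500),
        (1, '2024-02-10', 500),
        (2, '2024-01-15', 800),
        (9, '2024-03-01', 300);   -- there is no student 9: an orphan record
""")

# Join child rows to their parent through the foreign key column.
joined = conn.execute("""
    SELECT s.name, p.dp, p.amount
    FROM payments AS p
    JOIN students AS s ON s.rollno = p.rollno
    ORDER BY s.rollno, p.dp
""").fetchall()

# Orphans are child rows for which no parent row exists.
orphans = conn.execute("""
    SELECT p.rollno, p.dp
    FROM payments AS p
    LEFT JOIN students AS s ON s.rollno = p.rollno
    WHERE s.rollno IS NULL
""").fetchall()

print(joined)   # one student (Sachin) appears twice: one-to-many
print(orphans)  # [(9, '2024-03-01')]
```

The orphan payment silently drops out of the joined result, which is one reason orphan records are treated as an integrity problem.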
Integrity Rules:
Data integrity is to be maintained at any cost. If data loses integrity it becomes garbage. So every effort is to be made to ensure data integrity is maintained. The following are the main integrity rules that are to be followed.
 Domain integrity:
Data is said to contain domain integrity when the value of a column is derived from the domain. Domain is the collection of potential values. For example, column date of joining must be a valid date. All valid dates form one domain. If the value of date of joining is an invalid date, then it is said to violate domain integrity.
 Entity integrity:
This specifies that all values in primary key must be not null and unique. Each entity that is stored in the table must be uniquely identified. Every table must contain a primary key and primary key must be not null and unique.
 Referential Integrity:
This specifies that a foreign key must be either null or must have a value that is derived from corresponding parent key. For example, if we have a table called BATCHES, then ROLLNO column of the table will be referencing ROLLNO column of STUDENTS table. All the values of ROLLNO column of BATCHES table must be derived from ROLLNO column of STUDENTS table. This is because of the fact that no student who is not part of STUDENTS table can join a batch.
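The three integrity rules can be tried out in a short sketch using Python's built-in sqlite3 module. Several details are assumptions made for the example: the year column with the domain I, II, III stands in for a domain check, the batch_code column of BATCHES is hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked to
conn.executescript("""
    CREATE TABLE students (
        rollno INTEGER PRIMARY KEY,                                 -- entity integrity
        name   TEXT NOT NULL,
        year   TEXT NOT NULL CHECK (year IN ('I', 'II', 'III'))     -- domain integrity
    );
    CREATE TABLE batches (
        batch_code TEXT    NOT NULL,                                -- hypothetical column
        rollno     INTEGER NOT NULL REFERENCES students (rollno),   -- referential integrity
        PRIMARY KEY (batch_code, rollno)
    );
    INSERT INTO students VALUES (1, 'Sachin', 'III');
    INSERT INTO batches  VALUES ('B1', 1);
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # 'IX' is outside the domain of valid years.
    "domain": violates("INSERT INTO students VALUES (2, 'Saurav', 'IX')"),
    # Roll number 1 already exists.
    "entity (duplicate key)": violates("INSERT INTO students VALUES (1, 'Rahul', 'II')"),
    # Part of the primary key is null.
    "entity (null key)": violates("INSERT INTO batches VALUES (NULL, 1)"),
    # Roll number 9 is not present in the parent table.
    "referential": violates("INSERT INTO batches VALUES ('B1', 9)"),
}

for rule, rejected in results.items():
    print(rule, "->", "rejected" if rejected else "accepted")
```

Each of the four statements is rejected, so the tables still contain only the two valid rows inserted at the start.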
Transaction and its ACID properties:
A transaction consists of a set of operations that perform a single logical unit of work in a database environment. It may be an entire program, a part of a program or a single SQL command, and it may involve any number of operations on the database. If the transaction is completed successfully, then the database moves from one consistent state to another. When a transaction processing system creates a transaction, it ensures that the transaction has certain characteristics. These characteristics are known as the ACID properties. ACID is an acronym for atomicity, consistency, isolation, and durability.
Atomicity: The atomicity property states that a transaction is atomic. An atomic transaction is either fully completed or not begun at all. Any updates that a transaction makes to a system are completed in their entirety. If for any reason an error occurs and the transaction is unable to complete all of its steps, then the system is returned to the state it was in before the transaction started. An example of an atomic transaction is an account transfer. The money is removed from account A and then placed into account B. If the system fails after removing the money from account A, then the transaction processing system will put the money back into account A, thus returning the system to its original state.
Consistency: The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules. The transaction should leave the database in a consistent state, whether or not it completed successfully.
Isolation: Data modifications made by one transaction must be isolated from the data modifications made by all other transactions. A transaction sees data either in the state it was in before another concurrent transaction modified it, or in the state after that other transaction has completed, but it never sees an intermediate state.
Durability: It means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements has been executed and committed, the results need to be stored permanently. If the database crashes immediately thereafter, it should be possible to restore the database to the state after the last committed transaction.
Example:
Suppose a transaction has to transfer Rs. 50 from account A to account B. It will be executed in the following sequential steps:
1. Read A
2. A = A - 50
3. Write A
4. Read B
5. B = B + 50
6. Write B
Atomicity Requirement: If the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, else an inconsistency will result.
Consistency Requirement: The sum of A and B is unchanged by the execution of the transaction.
Isolation Requirement: If, between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum of A and B will be less than it should be). Isolation can be ensured trivially by running transactions serially, that is, one after the other.
Durability Requirement: Once the user has been notified that the transaction has been completed (i.e. the transfer of Rs. 50 has taken place), the updates to the database by the transaction must persist despite failures.
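The atomicity and consistency requirements of this transfer can be sketched with Python's built-in sqlite3 module. The accounts table, the starting balances (100 and 20) and the fail_midway switch that simulates a crash after step 3 are all assumptions introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 20)")  # made-up balances
conn.commit()


def balances():
    """Return the current balances as a dict such as {'A': 100, 'B': 20}."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(amount, fail_midway=False):
    """Move `amount` from A to B as one transaction; undo everything on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,)
            )  # steps 1-3
            if fail_midway:
                raise RuntimeError("simulated failure after step 3")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,)
            )  # steps 4-6
        return True
    except (RuntimeError, sqlite3.Error):
        return False


before = balances()
failed = transfer(50, fail_midway=True)   # fails between step 3 and step 6
after_failure = balances()                # the debit from A has been rolled back
succeeded = transfer(50)
after_success = balances()

print(before, after_failure, after_success)
```

After the failed attempt the balances are exactly what they were before, and after the successful one the sum of A and B is unchanged, which is the consistency requirement stated above.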
Structured Query Language (SQL):
Almost all relational database management systems use SQL (Structured Query Language) for data manipulation and retrieval. SQL is the standard language for relational database systems. SQL is a non-procedural language: you concentrate on what you want, not on how to get it. SQL commands are divided into the following categories, depending upon what they do.
DDL (Data Definition Language): DDL commands are used to define, modify and delete the structure of the database and its objects. Example: CREATE, ALTER, DROP etc.
DML (Data Manipulation Language): DML commands are used to insert, update and delete data in an existing table. Example: INSERT, UPDATE, DELETE
DCL (Data Control Language): DCL commands are used to grant privileges to and revoke privileges from users. Example: GRANT, REVOKE
TCL (Transaction Control Language): TCL commands are used to complete or undo a transaction. Example: COMMIT, ROLLBACK
DQL/DRL (Data Query/Retrieving Language): The DQL/DRL command is used to retrieve data from the database. Example: SELECT
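The categories above can be seen side by side in a minimal sketch using Python's built-in sqlite3 module. Two caveats apply: SQLite has no user accounts and does not support GRANT or REVOKE, so DCL is not shown, and COMMIT and ROLLBACK are issued through the module's conn.commit() and conn.rollback() methods. The table and its rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, then change it.
conn.execute("CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE students ADD COLUMN age INTEGER")

# DML: insert and update data.
conn.execute("INSERT INTO students VALUES (1, 'Sachin', 35)")
conn.execute("INSERT INTO students VALUES (2, 'Saurav', 32)")
conn.execute("UPDATE students SET age = 36 WHERE rollno = 1")

# TCL: COMMIT makes the changes so far permanent.
conn.commit()

# DML followed by TCL: the delete is undone by ROLLBACK.
conn.execute("DELETE FROM students WHERE rollno = 2")
conn.rollback()

# DQL: retrieve the data.
rows = conn.execute("SELECT rollno, name, age FROM students ORDER BY rollno").fetchall()
print(rows)  # [(1, 'Sachin', 36), (2, 'Saurav', 32)]

# DDL: remove the table and its structure.
conn.execute("DROP TABLE students")
```

The committed update survives, while the deleted row reappears because its transaction was rolled back rather than committed.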
Advantages of Database Management System:
We must evaluate whether there is any gain in using a DBMS over a situation where we do not use it. Let us summarize the advantages.
1. Reduction of Redundancy: This is perhaps the most significant advantage of using a DBMS. Redundancy is the problem of storing the same data item in more than one place. Redundancy creates several problems, such as requiring extra storage space, entering the same data more than once during insertion, and deleting data from more than one place during deletion.
2. Sharing of Data: In paper-based record keeping, data cannot easily be shared among many users. But in a computerized DBMS, many users can share the same database if they are connected via a network.
3. Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules and restrictions about what kind of data may be entered or manipulated within the database. This increases the reliability of the database, as data that violates the constraints cannot exist within the database at any point of time.
4. Data Security: We can restrict certain people from accessing the database, or allow them to see only a certain portion of the database while blocking sensitive information. This is not easily possible in paper-based record keeping.
5. Data Independence: Data independence means that "the application is independent of the storage structure and access strategy of data". In other words, a modification of the schema definition at one level should not affect the schema definition at the next higher level. There are two types of Data Independence:
  Physical Data Independence: Modification at the physical level should not affect the logical level.
  Logical Data Independence: Modification at the logical level should not affect the view level.
6. Data Abstraction: It means hiding implementation details (i.e. low-level details) from the end user. For example, when data is stored in a database, the user can only access the database; implementation details such as how the data is physically stored on the disk are hidden from the user.
There are three levels of abstraction:
  Physical level: The lowest level of abstraction; describes how the data are stored.
  Logical level: The next higher level of abstraction; describes what data are stored in the database and what relationships exist among those data.
  View level: The highest level of abstraction; describes only a part of the entire database.
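The view level, and with it data security and logical data independence, can be illustrated with a small sketch using Python's built-in sqlite3 module. The view name student_directory, the choice of which columns to hide and the later addition of a hostel column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full table, including columns treated here as sensitive.
    CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT);
    INSERT INTO students VALUES (1, 'Sachin', 35, 'Mumbai'), (2, 'Saurav', 32, 'Kolkata');

    -- View level: exposes only a part of the database.
    CREATE VIEW student_directory AS
        SELECT rollno, name FROM students;
""")

query = "SELECT * FROM student_directory ORDER BY rollno"
cursor = conn.execute(query)
view_columns = [column[0] for column in cursor.description]
rows_before = cursor.fetchall()

# A change at the logical level: the base table gains a new column.
conn.execute("ALTER TABLE students ADD COLUMN hostel TEXT")

# The same query against the view still returns the same result.
rows_after = conn.execute(query).fetchall()

print(view_columns)               # ['rollno', 'name']
print(rows_before == rows_after)  # True
```

A user who is given only the view never sees age or address, and an application written against the view keeps working after the table underneath it has changed.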