CHAPTER I: PRESENTATION OF DATABASES
1. INTRODUCTION
Enterprise management systems have undergone constant evolution, moving from manual methods
to sophisticated IT solutions (see Table 1). Technological advancements have enabled companies
to manage and exploit their data more efficiently, thus contributing to decision-making and
business growth.
Period Evolution of Enterprise Management Systems
Before IT Use of physical files, binders, and cabinets for document storage.
1950s-1960s Introduction of the first computers, storage on magnetic tapes and hard drives.
Development of file management systems (FMS) for improved organization.
1960s-1970s
Emergence of hierarchical and network models.
1970s and
Emergence of relational databases using interconnected tables.
beyond
1990s and
Widespread adoption of advanced Database Management Systems (DBMS).
beyond
2000s and
Migration to cloud storage (AWS, Azure, Google Cloud) for global accessibility.
beyond
2010s and
Big Data processing with Hadoop and Spark, use of AI and machine learning.
beyond
Table 1. Evolution of Enterprise Management Systems
This table highlights the evolution of enterprise management systems, from the use of manual
methods to the era of Big Data and artificial intelligence. Each period brought significant
advancements in the storage, organization, and utilization of data to meet the growing needs of
businesses.
2. FILE CONCEPTS (ADVANTAGES AND LIMITATIONS)
A file in computing refers to a storage unit that allows for the grouping, organizing, and storing of
data in either a structured or unstructured manner. Files are used to record information on a storage
medium such as a hard drive, USB flash drive, or any other electronic storage device. Each file has
a unique name and can be identified by a path in the file system.
Files can contain various types of data, such as text, images, videos, computer programs, etc. Each
type of file may have a specific format that defines how the data is organized and interpreted by
the associated applications.
File management, often used before the widespread adoption of databases, has both advantages
and limitations. Let us examine these aspects:
2.1. Advantages of File Management:
• Simplicity: File management is often simple to implement. Files are easy to create, read, write,
and delete, which makes initial development faster.
• Flexibility: Files offer flexibility in structuring data. Developers can choose the format that best
suits their specific needs without being limited by a rigid database schema.
• Independence: Each file can be managed independently, meaning an application can work with
its own data files without needing to share a central database.
• Initial Cost: File management can be less costly to set up, as it does not require the complexity
and resources associated with implementing a database management system.
2.2. Limitations of File Management:
• Data Redundancy: Files can lead to data redundancy, as the same information may be stored in
multiple files. This can result in inconsistencies and errors.
• Difficulty in Updating: Updating data in multiple files can be complex and prone to errors.
Changes must be applied consistently across all affected files.
• Search Challenges: Searching for specific information across many files can be inefficient. File
systems are often not optimized for complex search operations.
• Data Integrity: File management can make it difficult to implement data integrity constraints,
such as foreign keys, leading to an increased risk of data inconsistencies.
• Security Issues: Files can be more vulnerable to security issues, such as unauthorized access,
compared to a database, which typically offers more sophisticated security mechanisms.
• Limited Scalability: As the need for larger volumes of data and greater complexity increases,
file management may become inefficient and harder to maintain.
In conclusion, while file management may be suitable for small, simple applications, it has
significant limitations in terms of maintainability, data consistency, and scalability. This has led to
the development and widespread adoption of databases in many contexts.
3. DEFINITION OF DATABASE
According to Connolly [1], "A database is a set of interdependent and related data stored in an
organized manner. It is designed to allow efficient access and management of this data by
providing mechanisms for storage, retrieval, modification, and deletion, while ensuring the
integrity, consistency, and security of the stored information."
The history of databases dates back to the early days of computing. Here is a brief history of the
key developments in database evolution:
1. 1960s:
The hierarchical data model is introduced, where data is organized as a tree structure.
The network model is also developed, allowing for more complex relationships between
records.
2. 1970s:
The relational model is proposed by Edgar Codd in 1970, laying the foundation for
modern relational databases. This model introduces concepts like tables, primary keys,
and foreign keys.
IBM launches the first relational database management system (RDBMS), called System
R.
3. 1980s:
The adoption of relational systems becomes widespread with the emergence of
commercial RDBMS like Oracle, IBM DB2, and Microsoft SQL Server.
Structured Query Language (SQL) becomes a standard for interacting with relational
databases.
Relational databases become a central tool in enterprise computing.
4. 1990s:
The object-relational model combines object-oriented programming concepts with
relational databases, offering greater flexibility in data modeling.
Distributed databases and parallel database management systems are developed to meet
the growing needs for performance and scalability.
5. 2000s to Present:
The emergence of NoSQL databases addresses specific needs, such as the storage and
management of unstructured data, big data processing, and horizontal scalability.
In-memory databases gain popularity to improve performance.
The rise of cloud databases facilitates global data storage and access.
Artificial intelligence and advanced data analytics are integrated into databases to offer
more powerful features.
Today, databases play a crucial role in nearly all areas of computing. They continue to evolve to
address the ever-growing challenges of managing, analyzing, and securing large-scale data.
4. DIFFERENCES BETWEEN FILES AND DATABASES
Files and databases are two different approaches for storing, organizing, and managing data.
Here are some key differences between a file and a database:
4.1. Structure:
Files: Files are typically simple and standalone data structures. They can be flat (text or
binary) or semi-structured (CSV, XML). The structure is often determined by the
application using the file.
Databases: Databases have a more formal structure. They are based on a data model
(relational, hierarchical, object-oriented, etc.) that defines how data is organized and
interconnected.
4.2. Data Redundancy:
Files: Files can lead to data redundancy, where the same information is repeated in
multiple files, potentially causing consistency issues.
Databases: Relational databases are designed to minimize redundancy through data
normalization, which helps maintain data integrity and consistency.
4.3. Security:
Files: Managing file security can be more complex. Access rights are often determined by
the operating system, which may be limited.
Databases: Database management systems (DBMS) offer sophisticated access control
mechanisms. Administrators can set permissions at the table, view, and even row level.
4.4. Performance:
Files: Operations on files can be less efficient, particularly with large data volumes, as
files may require sequential reads or writes.
Databases: Databases are optimized for fast search and retrieval operations, using
indexes and optimized query algorithms.
4.5. Transaction Management:
Files: Managing transactions (e.g., commit, rollback operations) manually can be
complex.
Databases: DBMSs handle transaction management natively, ensuring data consistency
and integrity, even in case of failures.
4.6. Scalability:
Files: File scalability can be limited, and managing large-scale data can become
challenging.
Databases: Databases are designed to be scalable. They can efficiently handle large
volumes of data and complex queries.
In summary, while files may be suitable for simple and specific needs, databases provide a more
robust structure, better security, optimized performance, transaction management, and scalability,
making them more suitable for managing data effectively in complex and evolving environments.
5. DATABASE MODELS
A database model is a conceptual framework that defines how data is stored, organized, and
manipulated within a database. It serves as a theoretical framework for designing, implementing,
and interacting with a database. Database models provide a standardized language for describing
data, the relationships between data, and the operations that can be performed on that data.
Some of the most common types of database models include:
5.1. Hierarchical Model: Data is organized in a tree-like structure with parent-child
relationships. Each node can have one parent but multiple children.
5.2. Network Model: Data is organized in the form of graphs, where each record can have
multiple relationships with other records, forming a complex network.
5.3. Relational Model:
In this model, data is organized in tables (also known as relations), which are composed of rows
and columns. The relationships between the tables are established using keys, such as primary
keys and foreign keys. This model is widely used in traditional databases, such as SQL
databases, and provides a powerful and flexible way to organize and manipulate data.
5.4. Entity-Relationship (ER) Model:
The ER model is used to model real-world concepts and the relationships between them. It
represents entities (objects) and their relationships in a diagrammatic form. This model is useful
for conceptualizing how different entities relate to one another and is often used in the design
phase of database development.
5.5. Object-Oriented Model:
In this model, data is modeled as objects, which contain both attributes (data) and methods
(functions that operate on the data). It is commonly used in object-oriented programming (OOP)
environments, where data structures (objects) directly correspond to real-world entities. This
model supports features like inheritance, encapsulation, and polymorphism, which makes it
particularly suitable for complex applications with dynamic data relationships.
5.6. NoSQL Model:
NoSQL databases are designed to handle non-relational data types. They are particularly useful
when dealing with large volumes of unstructured or semi-structured data. NoSQL includes
several subtypes, such as document-based databases, key-value stores, column-family stores, and
graph databases. These models are highly scalable and flexible, often allowing for distributed
systems and real-time data processing.
This comparative table highlights the main characteristics of network, hierarchical, and relational
databases. It provides an overview of the differences in data modeling, structure, query
complexity, flexibility, and other key aspects of these types of databases.
Characteristic Network Databases Hierarchical Databases Relational Databases
Data Model Network (Graphs) Hierarchical (Trees) Relational (Tables)
Records linked by Records organized in a Tables with rows and
Data Structure
pointers hierarchy columns
Relationship Complex Parent-child Associations via foreign
Between Data connections relationships keys
Flexibility Medium Limited High
Complex, requiring Complex, but less than Simpler, standard SQL
Query Complexity
specific skills network databases language
Characteristic Network Databases Hierarchical Databases Relational Databases
Managed through
Data Integrity Managed manually Partially automated constraints and foreign
keys
Scalability Limited Limited High, easy to extend
MySQL, PostgreSQL,
System Examples IDMS, CODASYL IMS
Oracle
6. NOTION OF DBMS (Database Management System)
A Database Management System (DBMS) is software that allows the creation, manipulation, and
management of databases. Its role is to provide an interface between the user and the database,
enabling efficient and secure definition, storage, querying, updating, and administration of data.
A DBMS acts as an intermediary between users and physical files.
A DBMS facilitates:
Data management, with a simple intuitive representation, such as tables.
Data manipulation. Data and structures can be inserted or modified without altering the
programs that interact with the database.
6.1. KEY FUNCTIONS OF A DBMS
1. Data Model: DBMS can support different data models, including the relational model
(the most common), the object model, the hierarchical model, etc. Most modern DBMSs
use the relational model.
2. Query Language: DBMSs allow users to interact with the database using a query
language. SQL (Structured Query Language) is the standard query language for relational
databases. This language includes three categories of commands:
o DCL (Data Control Language):
1. DCL commands are used to manage access permissions to data within a
database.
2. Key DCL commands:
GRANT: Grants access privileges to users or roles.
REVOKE: Revokes previously granted privileges.
o DDL (Data Definition Language):
1. DDL commands are used to define the structure of the database and its
objects.
2. Key DDL commands:
CREATE: Creates database objects like tables, indexes, views,
etc.
ALTER: Modifies the structure of an existing database object.
DROP: Deletes database objects.
o DML (Data Manipulation Language):
1. DML commands are used to manipulate the data stored within the
database.
2. Key DML commands:
SELECT: Retrieves data from one or more tables.
INSERT: Adds new rows of data to a table.
UPDATE: Modifies existing data within a table.
DELETE: Deletes rows of data from a table.
In summary, DCL is used to manage access permissions, DDL defines the structure of the
database, and DML manipulates the data stored in the database. These categories reflect the
different responsibilities and functionalities of SQL in the context of relational database
management.
3. Data Integrity: DBMSs ensure data integrity by implementing constraints such as
primary keys, foreign keys, and validation rules.
4. Transactions: DBMSs support transactions, which are sequences of operations that are
treated as an indivisible unit. This ensures data consistency, even in the case of system
failures.
5. Concurrency Control: DBMSs manage concurrent access to data by multiple users or
applications, ensuring transaction consistency and isolation.
6. Security: DBMSs offer security mechanisms to control data access, manage user
permissions, and protect sensitive information.
7. Query Optimization: DBMSs optimize queries to improve performance by choosing the
most efficient execution plans.
8. Data Maintenance: DBMSs facilitate data backup, restoration, and maintenance,
ensuring continuous data availability.
Popular DBMS examples include MySQL, PostgreSQL, Oracle Database, Microsoft SQL
Server, and SQLite. Databases managed by a DBMS are used in a wide range of applications,
from dynamic websites to complex enterprise information systems.
6.2. ANSI/SPARC Architecture (Three-Level Model)
1. External Level (or User Level):
o Objective: This level is focused on end-users and specific applications.
o Characteristics: It represents how users perceive the data, with specific views
tailored to their needs. Each user or group of users may have a particular external
view of the database, defined by external schemas.
2. Conceptual Level (or Logical Level):
o Objective: It represents the global and integrated view of the entire database,
independent of individual applications.
o Characteristics: The conceptual level is defined by the conceptual schema, which
represents the overall structure of the database. It provides a logical representation
of the data, independent of implementation details. The Entity-Relationship (ER)
model is often used to describe this conceptual view.
3. Internal Level (or Storage Level):
o Objective: It represents how data is physically stored and processed at the lowest
level.
o Characteristics: The internal level is defined by the internal schema, which
describes how data is stored on disk and the indexing structures used. It focuses
on implementation details and is typically invisible to users and applications.
This three-level architecture helps achieve data independence, where changes in one level (such
as storage structures or user views) do not directly impact the other levels. It provides flexibility
in how data is represented and accessed while ensuring consistent and efficient data
management.