Lesson 1 - Introduction

A database-management system (DBMS) is designed to efficiently store and retrieve large collections of interrelated data, ensuring safety and consistency despite potential system failures. Modern applications of database systems span various sectors, including enterprise information, banking, and social media, leveraging complex data structures while providing users with an abstract view of the data. The document outlines the purpose of DBMS, the challenges of file-processing systems, data models, and the importance of data abstraction and integrity constraints.

Uploaded by

jonathanmaithya9

1 Introduction

A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those
data. The collection of data, usually referred to as the database, contains information relevant to an enterprise.
The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient
and efficient.

Database systems are designed to manage large bodies of information. Management of data involves both
defining structures for storage of information and providing mechanisms for the manipulation of information. In
addition, the database system must ensure the safety of the information stored, despite system crashes or
attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible
anomalous results.

Database systems are used to manage collections of data that:

• are highly valuable,
• are relatively large, and
• are accessed by multiple users and applications, often at the same time.

1.1 Database-System Applications


The earliest database systems arose in the 1960s in response to the computerized management of commercial
data. Those earlier applications were relatively simple compared to modern database applications. Modern
applications include highly sophisticated, worldwide enterprises.

Contrast a simple university database application with a social-networking site. Users of a social-networking site
post varying types of information about themselves ranging from simple items such as name or date of birth, to
complex posts consisting of text, images, videos, and links to other users. There is only a limited amount of
common structure among these data. Both of these applications, however, share the basic features of a database.

Modern database systems exploit commonalities in the structure of data to gain efficiency but also allow for
weakly structured data and for data whose formats are highly variable. As a result, a database system is a large,
complex software system whose task is to manage a large, complex collection of data. Managing complexity is
challenging, not only in the management of data but in any domain. Key to the management of complexity is the
concept of abstraction.

Abstraction allows a person to use a complex device or system without having to know the details of how that
device or system is constructed. Similarly, for a large, complex collection of data, a database system provides a
simpler, abstract view of the information so that users and application programmers do not need to be aware of
the underlying details of how data are stored and organized. By providing a high level of abstraction, a database
system makes it possible for an enterprise to combine data of various types into a unified repository of the
information needed to run the enterprise.

Here are some representative applications:

1) Enterprise Information
• Sales: For customer, product, and purchase information.
• Accounting: For payments, receipts, account balances, assets, and other accounting information.
• Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
2) Manufacturing: For management of the supply chain and for tracking production of items in factories,
inventories of items in warehouses and stores, and orders for items.
3) Banking and Finance
• Banking: For customer information, accounts, loans, and banking transactions.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Finance: For storing information about holdings, sales, and purchases of financial instruments such as
stocks and bonds; also, for storing real-time market data to enable online trading by customers and
automated trading by the firm.
4) Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).
5) Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
6) Telecommunication: For keeping records of calls, texts, and data usage, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
7) Document databases: For maintaining collections of news articles, patents, published research papers, etc.
8) Navigation systems: For maintaining the locations of various places of interest along with the exact routes of
roads, train systems, buses, etc.
9) Web-based services
• Social-media: For keeping records of users, connections between users (such as friend/follows
information), posts made by users, rating/like information about posts, etc.
• Online retailers: For keeping records of sales data and orders as for any retailer, but also for tracking a
user’s product views, search terms, etc., for the purpose of identifying the best items to recommend to
that user.
• Online advertisements: For keeping records of click history to enable targeted advertisements, product
suggestions, news articles, etc.

People access such databases every time they do a web search, make an online purchase, or access a
social-networking site. Although user interfaces hide details of access to a database, and most people are not even
aware they are dealing with a database, accessing databases forms an essential part of almost everyone’s life today.

Broadly speaking, there are two modes in which databases are used.

a) The first mode is to support online transaction processing, where a large number of users use the
database, with each user retrieving relatively small amounts of data, and performing small updates. This
is the primary mode of use for the vast majority of users of database applications such as those that we
outlined earlier.
b) The second mode is to support data analytics, that is, the processing of data to draw conclusions, and
infer rules or decision procedures, which are then used to drive business decisions.

Data analytics examples:

1) Banks need to decide whether to give a loan to a loan applicant, and online advertisers need to decide which
advertisement to show to a particular user.
These tasks are addressed in two steps.

• First, data-analysis techniques attempt to automatically discover rules and patterns from data and
create predictive models.
• Second, these models take as input attributes (“features”) of individuals and output predictions, such as
the likelihood of paying back a loan, which are then used to make the business decision.
2) Manufacturers and retailers need to make decisions on what items to manufacture or order in what
quantities; these decisions are driven significantly by techniques for analyzing past data, and predicting trends.
The cost of making wrong decisions can be very high, and organizations are therefore willing to invest a lot of
money to gather or purchase required data, and build systems that can use the data to make accurate
predictions.

The field of data mining combines knowledge-discovery techniques invented by artificial intelligence researchers
and statistical analysts with efficient implementation techniques that enable them to be used on extremely large
databases.

1.2 Purpose of Database Systems


A file-processing system is supported by a conventional operating system. The system stores permanent records in
various files, and it needs different application programs to extract records from, and add records to, the
appropriate files. Keeping organizational information in a file-processing system has a number of major
disadvantages:

1) Data redundancy and inconsistency. Since different programmers create the files and application programs
over a long period, the various files are likely to have different structures, and the programs may be written
in several programming languages. Moreover, the same information may be duplicated in several places
(files). This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency;
that is, the various copies of the same data may no longer agree.
2) Difficulty in accessing data. Conventional file-processing environments do not allow needed data to be retrieved
in a convenient and efficient manner, for example when a request needs only a portion of the data.
3) Data isolation. Because data are scattered in various files, and files may be in different formats, writing new
application programs to retrieve the appropriate data is difficult.
4) Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. In a file processing system, the problem is compounded when constraints involve several data
items from different files.
5) Atomicity problems. A computer system, like any other device, is subject to failure. In many applications, it is
crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure.

Consider a banking system with a program to transfer $500 from account A to account B. If a system failure
occurs during the execution of the program, it is possible that the $500 was removed from the balance of
account A but was not credited to the balance of account B, resulting in an inconsistent database state.
The funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity
in a conventional file-processing system.
6) Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many
systems allow multiple users to update the data simultaneously.
Consider account A, with a balance of $10,000. If two bank clerks debit the account balance (by say $500 and
$100, respectively) of account A at almost exactly the same time, the result of the concurrent executions may
leave the account balance in an incorrect (or inconsistent) state. If the two programs run concurrently, they may
both read the value $10,000, and write back $9500 and $9900, respectively. Depending on which one writes the
value last, the balance of account A may contain either $9500 or $9900, rather than the correct value of $9400.

7) Security problems. Not every user of the database system should be able to access all the data. For example,
in a university, payroll personnel need to see only that part of the database that has financial information.
They do not need access to information about academic records.
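
The concurrent-access anomaly in item 6 can be reproduced with a small simulation. This is only a sketch: no real database or threads are involved, the dictionary called storage is a hypothetical stand-in for a file, and the function names are invented for the illustration.

```python
# Toy simulation of the lost-update anomaly from the account A example.
# "storage" is a hypothetical stand-in for a file; nothing here is a real DBMS.

def run_interleaved(storage, debits):
    """Every clerk reads the balance before any clerk writes it back."""
    reads = [storage["A"] for _ in debits]        # both clerks read 10000
    for balance_read, amount in zip(reads, debits):
        storage["A"] = balance_read - amount      # a later write overwrites an earlier one
    return storage["A"]

def run_serial(storage, debits):
    """Each clerk finishes reading and writing before the next one starts."""
    for amount in debits:
        storage["A"] = storage["A"] - amount
    return storage["A"]

# The final balance depends on which clerk writes last.
interleaved_100_last = run_interleaved({"A": 10000}, [500, 100])
interleaved_500_last = run_interleaved({"A": 10000}, [100, 500])
serial = run_serial({"A": 10000}, [500, 100])

print(interleaved_100_last, interleaved_500_last, serial)
```

Only the serial run ends with $9400; in each interleaved run one of the two debits is silently lost.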

These difficulties, among others, prompted both the initial development of database systems and the transition
of file-based applications to database systems, back in the 1960s and 1970s.

In what follows, we shall see the concepts and algorithms that enable database systems to solve the problems
with file-processing systems.

1.3 View of Data


A database system is a collection of interrelated data and a set of programs that allow users to access and modify
these data. A major purpose of a database system is to provide users with an abstract view of the data. That is,
the system hides certain details of how the data are stored and maintained.

1.3.1 Data Models


Underlying the structure of a database is the data model: a collection of conceptual tools for describing data,
data relationships, data semantics, and consistency constraints.

There are a number of different data models that we shall cover in the text. The data models can be classified
into four different categories:

a) Relational Model. The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name. Tables are also
known as relations. The relational model is an example of a record-based model. Record-based models are so
named because the database is structured in fixed-format records of several types. Each table contains
records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of
the table correspond to the attributes of the record type. The relational data model is the most widely used
data model, and a vast majority of current database systems are based on the relational model.
b) Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic objects, called
entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model is widely used in database design.
c) Semi-structured Data Model. The semi-structured data model permits the specification of data where
individual data items of the same type may have different sets of attributes. This is in contrast to the data
models mentioned earlier, where every data item of a particular type must have the same set of attributes.
JSON and Extensible Markup Language (XML) are widely used semi-structured data representations.
d) Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has become the
dominant software-development methodology. This led initially to the development of a distinct object-
oriented data model, but today the concept of objects is well integrated into relational databases. Standards
exist to store objects in relational tables. Database systems allow procedures to be stored in the database
system and executed by the database system. This can be seen as extending the relational model with notions
of encapsulation, methods, and object identity.
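
To make the semi-structured model in item c) more concrete, here is a minimal sketch using JSON in Python. The two instructor items and their attribute names are assumptions made up for the illustration; the point is only that items of the same type may carry different sets of attributes.

```python
import json

# Two hypothetical "instructor" items of the same type with different attributes.
raw = """
[
  {"id": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci."},
  {"id": "12121", "name": "Wu", "phones": ["555-0100", "555-0101"]}
]
"""
instructors = json.loads(raw)

# A relational table would fix one set of columns for every row;
# here each item simply carries its own keys.
attribute_sets = [sorted(item.keys()) for item in instructors]
common_attributes = set(instructors[0]) & set(instructors[1])

print(attribute_sets)
print(sorted(common_attributes))
```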

1.3.2 Relational Data Model


In the relational model, data are represented in the form of tables. Each table has multiple columns, and each
column has a unique name. Each row of the table represents one piece of information.
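
As a small illustration, the sketch below builds one relational table with Python's built-in sqlite3 module. The instructor table, its column names, and the sample rows are assumptions chosen for the example, not part of the lesson.

```python
import sqlite3

# An in-memory SQLite database; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000)],
)

cursor = conn.execute("SELECT * FROM instructor ORDER BY id")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)   # the uniquely named columns of the table
for row in rows:      # each row is one record of the instructor type
    print(row)
```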

Data Abstraction

For the system to be usable, it must retrieve data efficiently. The need for efficiency has led database system
developers to use complex data structures to represent data in the database. Since many database-system users
are not computer trained, developers hide the complexity from users through several levels of data abstraction,
to simplify users’ interactions with the system:

• Physical level. The lowest level of abstraction describes how the data are actually stored. The physical
level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. The logical level thus describes the entire database in terms
of a small number of relatively simple structures. Although implementation of the simple structures at the
logical level may involve complex physical-level structures, the user of the logical level does not need to
be aware of this complexity. This is referred to as physical data independence. Database administrators,
who must decide what information to keep in the database, use the logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Even though the
logical level uses simpler structures, complexity remains because of the variety of information stored in a
large database. Many users of the database system do not need all this information; instead, they need
to access only a part of the database. The view level of abstraction exists to simplify their interaction with
the system. The system may provide many views for the same database.

Figure 1: The three levels of data abstraction
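
The view level can be sketched with an SQL view. The example below uses Python's built-in sqlite3 module; the table contents, the view name instructor_public, and the decision to hide the salary column are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full instructor table (hypothetical schema and data).
conn.execute(
    "CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

# View level: one of possibly many views; this one leaves out the salary column.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT id, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(view_rows)
```

How the table and the view are laid out on disk (the physical level) never appears in this code, which is the point of the abstraction.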


1.3.3 Instances and Schemas

Databases change over time as information is inserted and deleted. The collection of information stored in the
database at a particular moment is called an instance of the database. The overall design of the database is called
the database schema.

The concept of database schemas and instances can be understood by analogy to a program written in a
programming language. A database schema corresponds to the variable declarations (along with associated type
definitions) in a program. Each variable has a particular value at a given instant. The values of the variables in a
program at a point in time correspond to an instance of a database schema.
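
The analogy can be written down directly in a program. In this sketch the Instructor class and its two fields are hypothetical; the class declaration plays the role of the schema, and the list of objects held at a given moment plays the role of an instance.

```python
from dataclasses import dataclass, fields

@dataclass
class Instructor:
    # The declaration corresponds to the schema: it rarely changes.
    id: str
    name: str

# The values held at a point in time correspond to an instance.
instance_at_t0 = [Instructor("10101", "Srinivasan")]
instance_at_t1 = instance_at_t0 + [Instructor("12121", "Wu")]  # after one insertion

schema = [field.name for field in fields(Instructor)]
print(schema, len(instance_at_t0), len(instance_at_t1))
```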

Database systems have several schemas, partitioned according to the levels of abstraction.

The physical schema describes the database design at the physical level, while the logical schema describes the
database design at the logical level. A database may also have several schemas at the view level, sometimes called
subschemas, that describe different views of the database.

Of these, the logical schema is by far the most important in terms of its effect on application programs, since
programmers construct applications by using the logical schema.

The physical schema is hidden beneath the logical schema and can usually be changed easily without affecting
application programs. Application programs are said to exhibit physical data independence if they do not depend
on the physical schema and thus need not be rewritten if the physical schema changes.

1.4 Database Languages


A database system provides a data-definition language (DDL) to specify the database schema and a
data-manipulation language (DML) to express database queries and updates. In practice, the data-definition and
data-manipulation languages are not two separate languages; instead they simply form parts of a single database
language, such as the Structured Query Language (SQL). Almost all relational database systems employ SQL.

1.4.1 Data-Definition Language


We use DDL to specify a database schema and additional properties of the data.

We specify the storage structure and access methods used by the database system by a set of statements in a
special type of DDL called a data storage and definition language. These statements define the implementation
details of the database schemas, which are usually hidden from the users.

The data values stored in the database must satisfy certain consistency constraints. For example, suppose the
university requires that the account balance of a department must never be negative. The DDL provides facilities
to specify such constraints. The database system checks these constraints every time the database is updated. In
general, a constraint can be an arbitrary predicate pertaining to the database. However, arbitrary predicates may
be costly to test. Thus, database systems implement only those integrity constraints that can be tested with
minimal overhead:

a) Domain Constraints. A domain of possible values must be associated with every attribute (for example,
integer types, character types, date/time types). Declaring an attribute to be of a particular domain acts as a
constraint on the values that it can take. Domain constraints are the most elementary form of integrity
constraint. They are tested easily by the system whenever a new data item is entered into the database.
b) Referential Integrity. There are cases where we wish to ensure that a value that appears in one relation for a
given set of attributes also appears in a certain set of attributes in another relation (referential integrity). For
example, the department listed for each course must be one that actually exists in the university. More
precisely, the dept_name value in a course record must appear in the dept_name attribute of some record of
the department relation. Database modifications can cause violations of referential integrity. When a
referential-integrity constraint is violated, the normal procedure is to reject the action that caused the
violation.
c) Authorization. We may want to differentiate among the users as far as the type of access they are permitted
on various data values in the database. These differentiations are expressed in terms of authorization, the
most common being: read authorization, which allows reading, but not modification, of data; insert
authorization, which allows insertion of new data, but not modification of existing data; update
authorization, which allows modification, but not deletion, of data; and delete authorization, which allows
deletion of data. We may assign the user all, none, or a combination of these types of authorization.
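
The first two kinds of constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under several assumptions: the table and column names are invented, a CHECK clause is used to model the rule that a balance must never be negative, and SQLite enforces foreign keys only after the PRAGMA shown below. SQLite is also lenient about declared column types, so it is not a faithful model of domain checking in every system. Authorization is not shown because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical schema: a value constraint on balance, and a referential-integrity
# constraint from course.dept_name to department.dept_name.
conn.execute(
    "CREATE TABLE department ("
    " dept_name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE course ("
    " course_id TEXT PRIMARY KEY,"
    " dept_name TEXT REFERENCES department(dept_name))"
)
conn.execute("INSERT INTO department VALUES ('Physics', 70000)")

def try_execute(sql):
    """Run one statement and report whether the system accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

valid_course = try_execute("INSERT INTO course VALUES ('PHY-101', 'Physics')")
unknown_dept = try_execute("INSERT INTO course VALUES ('HIS-351', 'History')")
negative_balance = try_execute(
    "UPDATE department SET balance = -1 WHERE dept_name = 'Physics'"
)

print(valid_course, unknown_dept, negative_balance)
```

In both failing cases the system follows the normal procedure described above: it rejects the action that caused the violation.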

The processing of DDL statements, just like those of any other programming language, generates some output.
The output of the DDL is placed in the data dictionary, which contains metadata—that is, data about data. The
data dictionary is considered to be a special type of table that can be accessed and updated only by the database
system itself (not a regular user). The database system consults the data dictionary before reading or modifying
actual data.
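
Most systems let you look at the data dictionary even though only the system maintains it. As a sketch, SQLite keeps its catalog in a table named sqlite_master, which Python's built-in sqlite3 module can query; the department table below is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Processing this DDL statement makes SQLite record the definition in its catalog.
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)")

# sqlite_master holds metadata (data about data), not the department rows themselves.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

for entry_type, name, definition in catalog:
    print(entry_type, name, definition)
```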

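Examples of DDL statements include CREATE, ALTER, DROP, TRUNCATE, and COMMENT. Some systems also offer
commands such as SHOW and DESCRIBE for inspecting schema definitions; the exact set varies from one database
system to another.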
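
Three of these statements can be tried with Python's built-in sqlite3 module, as sketched below. The course table is a hypothetical example. TRUNCATE, COMMENT, SHOW, and DESCRIBE are left out because SQLite does not provide them; the PRAGMA table_info call used here to list columns is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names():
    """List the tables currently recorded in SQLite's catalog."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in conn.execute(query)]

# CREATE defines a new table.
conn.execute("CREATE TABLE course (course_id TEXT, title TEXT)")
after_create = table_names()

# ALTER changes the definition of an existing table.
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

# DROP removes the table together with its definition.
conn.execute("DROP TABLE course")
after_drop = table_names()

print(after_create, columns, after_drop)
```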

1.4.2 Data-Manipulation Language


A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized
by the appropriate data model.

The types of access are:

• Retrieval of information stored in the database.
• Insertion of new information into the database.
• Deletion of information from the database.
• Modification of information stored in the database.

There are basically two types of data-manipulation language:

• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data.

A query is a statement requesting the retrieval of information.
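
The difference between the two styles can be sketched in a few lines. The procedural half below is ordinary Python standing in for a procedural DML, and the declarative half is an SQL query run through the built-in sqlite3 module; the instructor rows and the salary threshold are assumptions made for the example.

```python
import sqlite3

rows = [("10101", "Srinivasan", 65000),
        ("12121", "Wu", 90000),
        ("15151", "Mozart", 40000)]

# Procedural style: the program spells out how to find the data,
# record by record, and how to order the result.
procedural = []
for instructor_id, name, salary in rows:
    if salary > 60000:
        procedural.append(name)
procedural.sort()

# Declarative style: the query states what is wanted;
# the database system decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id TEXT, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", rows)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor WHERE salary > 60000 ORDER BY name"
    )
]

print(procedural, declarative)
```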


1.5 Database Engine

A database system is partitioned into modules that deal with each of the responsibilities of the overall system.

The functional components of a database system can be broadly divided into:

• storage manager
• query processor
• transaction manager.

The storage manager is important because databases typically require a large amount of storage space. Corporate
databases commonly range in size from hundreds of gigabytes to terabytes of data.

The query processor is important because it helps the database system to simplify and facilitate access to data. It
is the job of the database system to translate updates and queries written in a nonprocedural language, at the
logical level, into an efficient sequence of operations at the physical level.

The transaction manager is important because it allows application developers to treat a sequence of database
accesses as if they were a single unit that either happens in its entirety or not at all. This permits application
developers to think at a higher level of abstraction about the application without needing to be concerned with
the lower-level details of managing the effects of concurrent access to the data and of system failures.

While database engines were traditionally centralized computer systems, today parallel processing is key for
handling very large amounts of data efficiently. Modern database engines pay a lot of attention to parallel data
storage and parallel query processing.

1.5.1 Storage Manager


The storage manager is the component of a database system that provides the interface between the low-level
data stored in the database and the application programs and queries submitted to the system. The storage
manager is responsible for the interaction with the file manager. The raw data are stored on the disk using the
file system provided by the operating system. The storage manager translates the various DML statements into
low-level file-system commands. Thus, the storage manager is responsible for storing, retrieving, and updating
data in the database.

The storage manager components include:

• Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks
the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicts.
• File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding
what data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.

The storage manager implements several data structures as part of the physical system implementation:

• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the schema of
the database.
• Indices, which can provide fast access to data items. Like the index in this textbook, a database index
provides pointers to those data items that hold a particular value. For example, we could use an index to
find the instructor record with a particular ID, or all instructor records with a particular name.
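
The idea of an index as a set of pointers can be sketched with a dictionary. Real database indices are disk-based structures and considerably more involved; the records, the helper names build_index and lookup, and the use of list positions as "pointers" are all assumptions made for the illustration.

```python
from collections import defaultdict

# Hypothetical instructor records; a record's position in the list
# stands in for its location on disk.
records = [
    {"id": "10101", "name": "Srinivasan"},
    {"id": "12121", "name": "Wu"},
    {"id": "98345", "name": "Wu"},
]

def build_index(records, attribute):
    """Map each value of the attribute to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[attribute]].append(position)
    return index

def lookup(index, value):
    """Follow the pointers for one value without scanning every record."""
    return [records[position] for position in index.get(value, [])]

id_index = build_index(records, "id")
name_index = build_index(records, "name")

by_id = lookup(id_index, "10101")        # the instructor record with a particular ID
by_name = lookup(name_index, "Wu")       # all instructor records with a particular name
print(by_id, by_name)
```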

1.5.2 The Query Processor


The query processor components include:

• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan consisting
of low-level instructions that the query-evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give
the same result. The DML compiler also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML compiler.

1.5.3 Transaction Management


Often, several operations on the database form a single logical unit of work. An example is a funds transfer in
which one account A is debited and another account B is credited. Clearly, it is essential that either both the credit
and debit occur, or that neither occur. That is, the funds transfer must happen in its entirety or not at all. This all-
or-none requirement is called atomicity. In addition, it is essential that the execution of the funds transfer
preserves the consistency of the database. That is, the value of the sum of the balances of A and B must be
preserved. This correctness requirement is called consistency. Finally, after the successful execution of a funds
transfer, the new values of the balances of accounts A and B must persist, despite the possibility of system failure.
This persistence requirement is called durability.

A transaction is a collection of operations that performs a single logical function in a database application. Each
transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate any
database-consistency constraints. That is, if the database was consistent when a transaction started, the database
must be consistent when the transaction successfully terminates. However, during the execution of a transaction,
it may be necessary temporarily to allow inconsistency, since either the debit of A or the credit of B must be done
before the other. This temporary inconsistency, although necessary, may lead to difficulty if a failure occurs.
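
The funds transfer can be sketched as a transaction with Python's built-in sqlite3 module. The account table, the starting balances, and the fail_midway flag are assumptions made for the example, and the failure is simulated with an exception rather than a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, source, target, amount, fail_midway=False):
    """Debit source and credit target as one unit: both happen or neither does."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        if fail_midway:
            # Simulated failure while the database is temporarily inconsistent.
            raise RuntimeError("failure between debit and credit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        conn.commit()      # make both changes permanent together
        return True
    except RuntimeError:
        conn.rollback()    # undo the partial debit
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)
succeeded = transfer(conn, "A", "B", 500)
after_success = balances(conn)

print(failed, after_failure)
print(succeeded, after_success)
```

In both outcomes the sum of the two balances is unchanged, which is the consistency requirement for this transfer.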

Ensuring the atomicity and durability properties is the responsibility of the database system itself—specifically, of
the recovery manager. In the absence of failures, all transactions complete successfully, and atomicity is achieved
easily. However, because of various types of failure, a transaction may not always complete its execution
successfully. If we are to ensure the atomicity property, a failed transaction must have no effect on the state of
the database. Thus, the database must be restored to the state in which it was before the transaction in question
started executing. The database system must therefore perform failure recovery, that is, it must detect system
failures and restore the database to the state that existed prior to the occurrence of the failure.

Finally, when several transactions update the database concurrently, the consistency of data may no longer be
preserved, even though each individual transaction is correct. It is the responsibility of the concurrency-control
manager to control the interaction among the concurrent transactions, to ensure the consistency of the database.

The transaction manager consists of the concurrency-control manager and the recovery manager.

1.6 Database and Application Architecture

Figure 2: System structure


Figure 2 shows the architecture of a database system that runs on a centralized server machine. The figure
summarizes how different types of users interact with a database, and how the different components of a
database engine are connected to each other.

The centralized architecture shown in Figure 2 is applicable to shared-memory server architectures, which have
multiple CPUs and exploit parallel processing, but all the CPUs access a common shared memory. To scale up to
even larger data volumes and even higher processing speeds, parallel databases are designed to run on a cluster
consisting of multiple machines. Further, distributed databases allow data storage and query processing across
multiple geographically separated machines.

We now consider the architecture of applications that use databases as their back-end. Database applications can
be partitioned into two or three parts, as shown in Figure 3. Earlier-generation database applications used a two-
tier architecture, where the application resides at the client machine, and invokes database system functionality
at the server machine through query language statements.

In contrast, modern database applications use a three-tier architecture, where the client machine acts as merely
a front-end and does not contain any direct database calls; web browsers and mobile applications are the most
commonly used application clients today. The front-end communicates with an application server. The application
server, in turn, communicates with a database system to access data. The business logic of the application, which
says what actions to carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications provide better security as well as better performance
than two-tier applications.
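
The difference between the two architectures can be sketched in a few lines of Python with the built-in sqlite3 module. Everything named here (the account table, the ApplicationServer class, and the overdraft rule) is an assumption made for illustration. In a real deployment the tiers would run on separate machines and communicate over a network; here they are plain function calls in one process.

```python
import sqlite3
from typing import Optional


def make_database() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()
    return conn


def two_tier_client_withdraw(conn: sqlite3.Connection, account_id: int, amount: float) -> None:
    """Two-tier: the client itself sends query-language statements to the database."""
    conn.execute(
        "UPDATE account SET balance = balance - ? WHERE id = ?",
        (amount, account_id),
    )
    conn.commit()


class ApplicationServer:
    """Three-tier: the business logic lives here, not in the clients."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def balance(self, account_id: int) -> Optional[float]:
        row = self._conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        return None if row is None else row[0]

    def withdraw(self, account_id: int, amount: float) -> bool:
        """Assumed business rule: amounts are positive and may not overdraw."""
        current = self.balance(account_id)
        if amount <= 0 or current is None or current < amount:
            return False
        self._conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        self._conn.commit()
        return True


def three_tier_client_withdraw(server: ApplicationServer, account_id: int, amount: float) -> bool:
    """Three-tier front end: no direct database calls, only a request to the server."""
    return server.withdraw(account_id, amount)
```

In this sketch the two-tier client can overdraw an account because nothing stands between it and the database, whereas the three-tier client can only do what the application server permits. That is one way to read the security benefit mentioned above.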

Figure 3: Two-tier and three-tier architectures


1.7 Database Users and Administrators
People who work with a database can be categorized as database users or database administrators.

1.7.1 Database Users and User Interfaces


This lesson describes three types of database-system users, differentiated by the way they expect to interact with
the system.

Different types of user interfaces have been designed for the different types of users.

• Naïve users are unsophisticated users who interact with the system by using predefined user interfaces,
such as web or mobile applications. The typical user interface for naïve users is a forms interface, where
the user can fill in appropriate fields of the form. Naïve users may also read reports generated from
the database.
• Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces.
• Sophisticated users interact with the system without writing programs. Instead, they form their requests
either using a database query language or by using tools such as data analysis software. Analysts who
submit queries to explore data in the database fall in this category.
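
The three kinds of interaction listed above can be contrasted with a small Python sketch using the built-in sqlite3 module. The instructor table, its rows, and the lookup_by_department function are hypothetical and serve only as an illustration. The function stands in for the code an application programmer writes behind a form, while the final query is the kind of ad hoc request a sophisticated user might type directly.

```python
import sqlite3

# Hypothetical data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Comp. Sci.", 90000.0),
        (2, "Ben", "Comp. Sci.", 70000.0),
        (3, "Cy", "Physics", 80000.0),
    ],
)
conn.commit()


def lookup_by_department(form_fields: dict) -> list:
    """Written once by an application programmer.

    A naive user only fills in the 'dept_name' field of a form;
    the predefined, parameterized query is hidden from them.
    """
    rows = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name",
        (form_fields["dept_name"],),
    ).fetchall()
    return [name for (name,) in rows]


# Naive user: supplies form input, never sees the query language.
naive_result = lookup_by_department({"dept_name": "Comp. Sci."})

# Sophisticated user: writes an ad hoc query directly, with no program around it.
adhoc_result = conn.execute(
    "SELECT dept_name, AVG(salary) FROM instructor"
    " GROUP BY dept_name ORDER BY dept_name"
).fetchall()
```

The parameterized query (the ? placeholder) keeps the form input separate from the query text, which is the usual way to pass user-supplied values to a database.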

1.7.2 Database Administrator


One of the main reasons for using DBMSs is to have central control of both the data and the programs that access
those data. A person who has such central control over the system is called a database administrator (DBA).

The functions of a DBA include:

• Schema definition. The DBA creates the original database schema by executing a set of data definition
statements in the DDL.
• Storage structure and access-method definition. The DBA may specify some parameters pertaining to the
physical organization of the data and the indices to be created.
• Schema and physical-organization modification. The DBA carries out changes to the schema and physical
organization to reflect the changing needs of the organization, or to alter the physical organization to improve
performance.
• Granting of authorization for data access. By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access. The authorization
information is kept in a special system structure that the database system consults whenever a user tries to
access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities are:
o Periodically backing up the database onto remote servers, to prevent loss of data in case of disasters such
as flooding.
o Ensuring that enough free disk space is available for normal operations, and upgrading disk space as
required.
o Monitoring jobs running on the database and ensuring that performance is not degraded by very
expensive tasks submitted by some users.
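
Several of the DBA functions listed above can be sketched with Python's built-in sqlite3 module. The department table, the index name, and the added column are hypothetical, and the backup target is a second in-memory database rather than a remote server. Authorization is omitted because SQLite has no GRANT statement; server-based database systems typically provide one.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Schema definition: execute data-definition (DDL) statements.
db.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget REAL)")

# Storage structure and access-method definition: choose an index to create.
db.execute("CREATE INDEX idx_department_budget ON department (budget)")

# Schema modification: adapt the schema to a new requirement.
db.execute("ALTER TABLE department ADD COLUMN building TEXT")

db.execute("INSERT INTO department VALUES ('Physics', 70000.0, 'Watson')")
db.commit()

# Routine maintenance: copy the whole database to a backup connection.
backup = sqlite3.connect(":memory:")
db.backup(backup)

# Inspect the results through SQLite's catalog pragmas.
columns = [row[1] for row in db.execute("PRAGMA table_info(department)")]
indexes = [row[1] for row in db.execute("PRAGMA index_list(department)")]
backed_up = backup.execute("SELECT dept_name, building FROM department").fetchall()
```

After these steps, columns lists the altered schema and backed_up holds the row copied into the backup database.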
1.8 Summary
• A database-management system (DBMS) consists of a collection of interrelated data and a collection of
programs to access those data. The data describe one particular enterprise.
• The primary goal of a DBMS is to provide an environment that is both convenient and efficient for people to
use in retrieving and storing information.
• Database systems are ubiquitous today, and most people interact, either directly or indirectly, with databases
many times every day.
• Database systems are designed to store large bodies of information. The management of data involves both
the definition of structures for the storage of information and the provision of mechanisms for the
manipulation of information. In addition, the database system must provide for the safety of the information
stored in the face of system crashes or attempts at unauthorized access. If data are to be shared among several
users, the system must avoid possible anomalous results.
• A major purpose of a database system is to provide users with an abstract view of the data. That is, the system
hides certain details of how the data are stored and maintained.
• Underlying the structure of a database is the data model: a collection of conceptual tools for describing data,
data relationships, data semantics, and data constraints.
• The relational data model is the most widely deployed model for storing data in databases. Other data models
are the object-oriented model, the object-relational model, and semi-structured data models.
• A data-manipulation language (DML) is a language that enables users to access or manipulate data.
Nonprocedural DMLs, which require a user to specify only what data are needed, without specifying exactly
how to get those data, are widely used today.
• A data-definition language (DDL) is a language for specifying the database schema and other properties of the
data.
• Database design mainly involves the design of the database schema. The entity-relationship (E-R) data model
is a widely used model for database design. It provides a convenient graphical representation to view data,
relationships, and constraints.
• A database system has several subsystems.
o The storage manager subsystem provides the interface between the low-level data stored in the
database and the application programs and queries submitted to the system.
o The query processor subsystem compiles and executes DDL and DML statements.
• Transaction management ensures that the database remains in a consistent (correct) state despite system
failures. The transaction manager ensures that concurrent transaction executions proceed without conflicts.
• The architecture of a database system is greatly influenced by the underlying computer system on which the
database system runs. Database systems can be centralized, or parallel, involving multiple machines.
Distributed databases span multiple geographically separated machines.
• Database applications are typically broken up into a front-end part that runs at client machines and a part that
runs at the backend. In two-tier architectures, the front end directly communicates with a database running
at the back end. In three-tier architectures, the back-end part is itself broken up into an application server and
a database server.
• This lesson describes three types of database-system users, differentiated by the way they expect to interact with
the system. Different types of user interfaces have been designed for the different types of users.
• Data-analysis techniques attempt to automatically discover rules and patterns from data. The field of data
mining combines knowledge-discovery techniques invented by artificial intelligence researchers and statistical
analysts with efficient implementation techniques that enable them to be used on extremely large databases.
