DBMS SYSTEM
ARCHITECTURE
Dr. E. Sivasankar, NIT, Tiruchirappalli-15
Centralized Systems
Centralized database systems are those that run on a single computer
system and do not interact with other computer systems.
Such database systems span a range from single-user database systems
running on personal computers to high-performance database systems
running on high-end server systems.
Centralized Systems
A modern, general-purpose computer system consists of one to a few processors and
a number of device controllers that are connected through a common bus that
provides access to shared memory (Figure 17.1).
The processors have local cache memories that store local copies of parts of the
memory, to speed up access to data.
Each processor may have several independent cores, each of which can execute a
separate instruction stream.
Each device controller is in charge of a specific type of device (for example, a disk
drive, an audio device, or a video display).
Centralized Systems
A typical single-user system is a desktop unit used by a single person, usually
with only one processor and one or two hard disks, and usually only one
person using the machine at a time.
A typical multiuser system, on the other hand, has more disks and more
memory and may have multiple processors. It serves a large number of
users who are connected to the system remotely.
Database systems designed for use by single users usually do not provide
many of the facilities that a multiuser database provides.
In particular, they may not support concurrency control, which is not
required when only a single user can generate updates. Provisions for crash
recovery in such systems are either absent or primitive.
Centralized Systems
Most general-purpose computer systems in use today have multiple processors;
however, they have coarse-granularity parallelism, with only a few processors
(typically two to four), all sharing the main memory.
Databases running on such machines usually do not attempt to partition a
single query among the processors; instead, they run each query on a single
processor, allowing multiple queries to run concurrently. Thus, such systems
support a higher throughput; that is, they allow a greater number of transactions to
run per second, although individual transactions do not run any faster.
In contrast, machines with fine-granularity parallelism have a large number of
processors, and database systems running on such machines attempt to
parallelize single tasks (queries, for example) submitted by users.
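The coarse-granularity approach above, often called inter-query parallelism, can be sketched as follows: each query runs in its entirety on one worker, but several queries run at the same time. The queries and the `run_query` stand-in are hypothetical, not a real DBMS API.

```python
# Sketch of inter-query parallelism: each query runs whole on one worker,
# but several queries execute concurrently, raising throughput without
# making any individual query faster.
from concurrent.futures import ThreadPoolExecutor

def run_query(query):
    # Stand-in for executing a full query on a single processor.
    return f"result of {query}"

queries = ["SELECT 1", "SELECT 2", "SELECT 3", "SELECT 4"]

# A small pool mirrors a machine with a few processors sharing main memory.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_query, queries))
```

Note that the pool never splits one query across workers; a fine-granularity parallel system would instead partition a single query among many processors.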
Client–Server Systems
As personal computers became faster, more powerful, and cheaper, they
supplanted terminals connected to centralized systems, and there was a
shift away from the centralized system architecture.
Server systems satisfy requests generated at m client systems, whose general
structure is shown below:
Client–Server Systems
Functionality provided by database systems can be broadly divided into two parts:
the front end and the back end.
The back end manages access structures, query evaluation and optimization,
concurrency control, and recovery.
The front end of a database system consists of tools such as the SQL user
interface, forms interfaces, report generation tools, and data mining and
analysis tools.
The interface between the front end and the back end is through SQL, or
through an application program.
Client–Server Systems
Server systems can be broadly categorized as transaction servers and data servers.
Transaction-server systems, also called query-server systems, provide an
interface to which clients can send requests to perform an action, in response to
which they execute the action and send back results to the client.
Usually, client machines ship transactions to the server systems, where those
transactions are executed, and results are shipped back to clients that are in
charge of displaying the data.
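The transaction-server interaction above can be sketched as a simple request/response exchange: the client ships a request naming an action, and the server executes it and ships the result back. The action names, the key/value store, and the dispatch logic are all illustrative assumptions, not a real DBMS interface.

```python
# Minimal sketch of a transaction (query) server: clients send requests,
# the server executes them against its data and returns results.

def make_server(store):
    def handle(request):
        # A request is a tuple: (action, key, optional new value).
        action, key, *rest = request
        if action == "read":
            return store.get(key)
        if action == "write":
            store[key] = rest[0]
            return "ok"
        return "error: unknown action"
    return handle

server = make_server({"x": 1})
reply = server(("read", "x"))   # client ships a request, gets the result back
```

A real system would carry these requests over a network connection and batch them into transactions; the sketch keeps only the request/execute/reply shape.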
Data-server systems allow clients to interact with the servers by making
requests to read or update data, in units such as files or pages. For example, file
servers provide a file-system interface where clients can create, update, read, and
delete files.
Transaction Servers
A typical transaction server consists of multiple processes accessing data in shared
memory.
Server processes:
These receive user queries (transactions), execute them and send results back.
Processes may be multithreaded, allowing a single process to execute several user
queries concurrently
Typically multiple multithreaded server processes
Lock manager process: This process implements lock manager functionality,
which includes lock grant, lock release, and deadlock detection.
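One lock-manager duty named above, deadlock detection, is commonly done by looking for a cycle in a waits-for graph (an edge from a transaction to each transaction it waits on). The sketch below is a generic depth-first cycle check under that assumption, not the lock manager of any particular system.

```python
# Hedged sketch of deadlock detection: a cycle in the waits-for graph
# (transaction -> set of transactions it waits on) signals a deadlock.

def has_deadlock(waits_for):
    """waits_for: dict mapping a transaction id to the set it waits on."""
    visited, in_stack = set(), set()

    def dfs(t):
        visited.add(t)
        in_stack.add(t)
        for u in waits_for.get(t, ()):
            # Back edge to a transaction on the current path => cycle.
            if u in in_stack or (u not in visited and dfs(u)):
                return True
        in_stack.discard(t)
        return False

    return any(dfs(t) for t in waits_for if t not in visited)
```

On detecting a cycle, a real lock manager would pick a victim transaction and abort it to break the deadlock.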
Transaction Servers
Database writer process: There are one or more processes that output modified
buffer blocks back to disk on a continuous basis.
Log writer process: This process outputs log records from the log record buffer
to stable storage.
Checkpoint process: Performs periodic checkpoints
Process monitor process: Monitors other processes, and takes recovery actions if
any of the other processes fail
E.g., aborting any transactions being executed by a server process and restarting it
Transaction Servers
Shared memory contains shared data
Buffer pool
Lock table
Log buffer
Cached query plans (reused if same query submitted again)
All database processes can access shared memory
To ensure that no two processes are accessing the same data structure at the same time,
database systems implement mutual exclusion using either
Operating system semaphores
Atomic instructions such as test-and-set
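The test-and-set idea can be sketched as follows: the instruction atomically reads the old value of a flag and sets it, so exactly one process observes the old value "free" and wins the lock. Python has no raw test-and-set instruction, so this class only simulates the semantics; on real hardware the read-and-set step is a single atomic instruction.

```python
# Sketch of mutual exclusion via test-and-set (simulated semantics).

class TestAndSetLock:
    def __init__(self):
        self.flag = False            # False = free, True = held

    def test_and_set(self):
        # Atomically return the old value and set the flag.
        # (Atomic on real hardware; simulated here.)
        old, self.flag = self.flag, True
        return old

    def acquire(self):
        # Spin until the old value we saw was False (lock was free).
        while self.test_and_set():
            pass

    def release(self):
        self.flag = False
```

A process that finds the old value `True` keeps spinning; only the process that flipped the flag from `False` to `True` proceeds into the shared data structure.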
To avoid overhead of interprocess communication for lock request/grant, each database
process operates directly on the lock table
instead of sending requests to lock manager process
Lock manager process still used for deadlock detection
Data Server
Data Server systems are used in local-area networks, where there is a high-speed
connection between the clients and the server, the client machines are comparable
in processing power to the server machine, and the tasks to be executed are
computation intensive.
In such an environment, it makes sense to ship data to client machines, to perform all
processing at the client machine and then to ship the data back to the server
machine.
Note that this architecture requires full back-end functionality at the clients.
Data-server architectures have been particularly popular in object-oriented database
systems
Data Server
Page shipping versus item shipping - The unit of communication for data can be of
coarse granularity, such as a page, or fine granularity, such as a tuple. We use the term
item to refer to both tuples and objects.
Page shipping can be considered a form of prefetching if multiple items reside on a page,
since all the items in the page are shipped when a process desires to access a single item in
the page.
Data Server
Adaptive lock granularity - Locks are usually granted by the server for the data items that
it ships to the client machines.
A disadvantage of page shipping is that client machines may be granted locks of too coarse
a granularity—a lock on a page implicitly locks all items contained in the page.
Techniques for lock de-escalation have been proposed where the server can request its
clients to transfer back locks on prefetched items
Data Server
Data caching- Data that are shipped to a client on behalf of a transaction can be cached at
the client, even after the transaction completes, if sufficient storage space is available.
However, cache coherency is an issue: Even if a transaction finds cached data, it must
make sure that those data are up to date, since they may have been updated by a different
client after they were cached.
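One common way to keep such a cache coherent is a version check: before using a cached item, the client compares its cached version number with the server's current version and refetches on mismatch. The version scheme and data below are illustrative assumptions, not a specific system's protocol.

```python
# Sketch of a cache-coherency check for client-side data caching:
# reuse cached data only if its version matches the server's.

server_data = {"r1": ("v-old", 1)}   # key -> (value, version)
client_cache = {}

def coherent_read(key):
    value, version = client_cache.get(key, (None, -1))
    current_version = server_data[key][1]
    if version != current_version:
        # Stale or absent: refetch the up-to-date copy from the server.
        value, version = server_data[key]
        client_cache[key] = (value, version)
    return value
```

If another client updates the item (bumping the server's version), the next cached read detects the mismatch and refetches rather than returning stale data.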
Cloud-Based Servers
Another model for using third-party servers is cloud computing, in which the service
provider runs its own software, but runs it on computers provided by another
company.
Under this model, the third party does not provide any of the application
software; it provides only a collection of machines.
These machines are not “real” machines, but rather simulated by software that
allows a single real computer to simulate several independent computers. Such
simulated machines are called virtual machines.
The service provider runs its software (possibly including a database system) on
these virtual machines.
A major advantage of cloud computing is that the service provider can add
machines as needed to meet demand and release them at times of light load.
This can prove to be highly cost-effective in terms of both money and energy.
Parallel Database
Architectures
Shared memory- All the processors share a common memory.
Shared disk - All the processors share a common set of disks. Shared-disk
systems are sometimes called clusters.
Shared nothing - The processors share neither a common memory nor
common disk.
Hierarchical - This model is a hybrid of the preceding three architectures.
Shared Memory
In a shared-memory architecture, the processors and disks have access to
a common memory, typically via a bus or through an interconnection
network.
The benefit of shared memory is extremely efficient communication between
processors.
Data in shared memory can be accessed by any processor without being
moved with software. A processor can send messages to other processors
much faster by using memory writes (which usually take less than a
microsecond) than by sending a message through a communication mechanism.
The downside of shared-memory machines is that the architecture is not
scalable beyond 32 or 64 processors because the bus or the interconnection
network becomes a bottleneck
Parallel Database
Architectures
Shared Disk
In the shared-disk model, all processors can access all disks directly via an
interconnection network, but the processors have private memories.
There are two advantages of this architecture over a shared-memory
architecture.
First, since each processor has its own memory, the memory bus is not a
bottleneck. Second, it offers a cheap way to provide a degree of fault
tolerance: If a processor (or its memory) fails, the other processors can
take over its tasks, since the database is resident on disks that are accessible
from all processors
Problem - the interconnection to the disk subsystem is now a bottleneck; it is
particularly so in a situation where the database makes a large number of
accesses to disks
Parallel Database
Architectures
Shared Nothing
In a shared-nothing system, each node of the machine consists of a processor,
memory, and one or more disks. The processors at one node may
communicate with processors at other nodes over a high-speed
interconnection network.
A node functions as the server for the data on the disk or disks that the node
owns. Here local disk references are serviced by local disks at each processor.
The main drawbacks of shared-nothing systems are the costs of
communication and of nonlocal disk access, which are higher than in a
shared-memory or shared-disk architecture since sending data involves
software interaction at both ends.
Parallel Database
Architectures
Hierarchical
The hierarchical architecture combines the characteristics of shared-
memory, shared-disk, and shared-nothing architectures. At the top level,
the system consists of nodes that are connected by an interconnection network
and do not share disks or memory with one another. Thus, the top level is a
shared-nothing architecture.
A system could be built as a hierarchy, with shared-memory architecture
with a few processors at the base, and a shared-nothing architecture at the
top, with possibly a shared-disk architecture in the middle.
Distributed Systems
In a distributed database system, the database is stored on several
computers. The computers in a distributed system communicate with one
another through various communication media, such as high-speed private
networks or the Internet.
They do not share main memory or disks. The computers in a distributed system
may vary in size and function, ranging from workstations up to mainframe
systems.
The computers in a distributed system are referred to by a number of
different names, such as sites or nodes, depending on the context in which
they are mentioned
Distributed Systems
The main differences between shared-nothing parallel databases and distributed
databases are that distributed databases are typically geographically
separated, are separately administered, and have a slower interconnection.
Another major difference is that, in a distributed database system, we
differentiate between local and global transactions. A local transaction is
one that accesses data only from sites where the transaction was initiated.
A global transaction, on the other hand, is one that either accesses data in a
site different from the one at which the transaction was initiated, or accesses
data in several different sites
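The local/global distinction above reduces to a simple test: a transaction is local exactly when every site it accesses is the site where it was initiated. The site names below are made up for illustration.

```python
# Sketch of classifying a distributed transaction as local or global.

def classify(origin_site, accessed_sites):
    # Local: accesses data only at the initiating site; otherwise global.
    return "local" if set(accessed_sites) == {origin_site} else "global"
```

Global transactions are what make distributed databases hard: they require coordination (e.g., commit protocols) across separately administered sites.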