SQL Server
Introduction to SQL Server
What is SQL Server RDBMS?
As you most likely know, SQL Server is primarily thought of as a Relational Database
Management System (RDBMS). It is certainly that, but it is also much more.
SQL Server can be more accurately described as an Enterprise Data Platform. It offers many
new features in every new version i.e. even more enhanced or improved features from
previous versions of the product. In addition to traditional RDBMS duty, SQL Server also
provides rich reporting capabilities, powerful data analysis, and data mining, as well as
features that support asynchronous data applications, data-driven event notification, and
more.
The components of SQL Server are
Database Engine
The Database Engine is the primary component of SQL Server. It is the Online Transaction
Processing (OLTP) engine for SQL Server. The Database Engine is a high-performance
component responsible for the efficient storage, retrieval, and manipulation of relational and
Extensible Markup Language (XML) formatted data.
SQL Server’s Database Engine is highly optimized for transaction processing, but offers
exceptional performance in complex data retrieval operations. The Database Engine is also
responsible for the controlled access and modification of data through its security
subsystem.
Information Email: kdssginfo@gmail.com
SQL Server
Analysis Services
Analysis Services delivers Online Analytical Processing (OLAP) and Data Mining functionality
for business intelligence applications. As its name suggests, Analysis Services provides a
very robust environment for the detailed analysis of data. It does this through user-created,
multidimensional data structures that contain de-normalized and aggregated data from
diverse data sources (such as relational databases, spreadsheets, flat files, and even other
multidimensional sources).
Reporting Services
Reporting Services is a Web service–based solution for designing, deploying, and managing
flexible, dynamic Web-based reports, as well as traditional paper reports. These reports can
contain information from virtually any data source. Because Reporting Services is
implemented as a Web service, it must be installed on a server with Internet Information
Services (IIS). However, IIS does not have to be installed on a SQL Server. The Reporting
Services databases are hosted on SQL Server, but the Web service itself can be configured
on a separate server.
Integration Services
SQL Server Integration Services (SSIS) is Microsoft’s new enterprise class data Extract,
Transform, and Load (ETL) tool. SSIS is a completely new product built from the ashes of
SQL Server 2000’s Data Transformation Services (DTS). SSIS offers a much richer feature
set and the ability to create much more powerful and flexible data transformations than its
predecessor. This huge improvement, however, is not without a cost. SSIS is a fairly
complex tool and offers a completely different design paradigm than DTS. Database
administrators adept at the former tool are very often intimidated and frustrated by the new
SSIS. Their biggest mistake is in thinking that Integration Services would just be an
upgrade of Data Transformation Services.
Information Email: kdssginfo@gmail.com
SQL Server
SQL Server Architecture
Components of the SQL Server Engine
Figure 1-1 shows the general architecture of SQL Server, which has four major components
(three of whose subcomponents are listed): protocols, the relational engine (also called the
Query Processor), the storage engine, and the SQLOS. Every batch submitted to SQL Server
for execution, from any client application, must interact with these four components. (For
simplicity, I’ve made some minor omissions and simplifications and ignored certain “helper”
modules among the subcomponents.)
Figure 1-1: The major components of the SQL Server database engine
The protocol layer receives the request and translates it into a form that the relational
engine can work with, and it also takes the final results of any queries, status messages, or
error messages and translates them into a form the client can understand before sending
them back to the client. The relational engine layer accepts SQL batches and determines
what to do with them. For Transact-SQL queries and programming constructs, it parses,
compiles, and optimizes the request and oversees the process of executing the batch. As
the batch is executed, if data is needed, a request for that data is passed to the storage
Information Email: kdssginfo@gmail.com
SQL Server
engine. The storage engine manages all data access, both through transaction-based
commands and bulk operations such as backup, bulk insert, and certain DBCC (Database
Consistency Checker) commands. The SQLOS layer handles activities that are normally
considered to be operating system responsibilities, such as thread management
(scheduling), synchronization primitives, deadlock detection, and memory management,
including the buffer pool.
Protocols
When an application communicates with the SQL Server Database Engine, the application
programming interfaces (APIs) exposed by the protocol layer formats the communication
using a Microsoft-defined format called a tabular data stream (TDS) packet. There are Net-
Libraries on both the server and client computers that encapsulate the TDS packet inside a
standard communication protocol, such as TCP/IP or Named Pipes. On the server side of the
communication, the Net-Libraries are part of the Database Engine, and that protocol layer is
illustrated in Figure 1-1. On the client side, the Net-Libraries are part of the SQL Native
Client. The configuration of the client and the instance of SQL Server determine which
protocol is used.
SQL Server can be configured to support multiple protocols simultaneously, coming from
different clients. Each client connects to SQL Server with a single protocol. If the client
program does not know which protocols SQL Server is listening on, you can configure the
client to attempt multiple protocols sequentially. The following protocols are available:
Shared Memory The simplest protocol to use, with no configurable settings.
Clients using the Shared Memory protocol can connect only to a SQL Server instance
running on the same computer, so this protocol is not useful for most database
activity. Use this protocol for troubleshooting when you suspect that the other
protocols are configured incorrectly. Clients using MDAC 2.8 or earlier cannot use the
Shared Memory protocol. If such a connection is attempted, the client is switched to
the Named Pipes protocol.
Named Pipes A protocol developed for local area networks (LANs). A portion of
memory is used by one process to pass information to another process, so that the
output of one is the input of the other. The second process can be local (on the same
computer as the first) or remote (on a networked computer).
TCP/IP The most widely used protocol over the Internet. TCP/IP can communicate
across interconnected networks of computers with diverse hardware architectures
and operating systems. It includes standards for routing network traffic and offers
advanced security features. Enabling SQL Server to use TCP/IP requires the most
configuration effort, but most networked computers are already properly configured.
Virtual Interface Adapter (VIA) A protocol that works with VIA hardware. This is
a specialized protocol; configuration details are available from your hardware vendor.
Tabular Data Stream Endpoints
SQL Server 2005 also introduces a new concept for defining SQL Server connections: the
connection is represented on the server end by a TDS endpoint. During setup, SQL Server
creates an endpoint for each of the four Net-Library protocols supported by SQL Server, and
if the protocol is enabled, all users have access to it. For disabled protocols, the endpoint
still exists but cannot be used. An additional endpoint is created for the dedicated
Information Email: kdssginfo@gmail.com
SQL Server
administrator connection (DAC), which can be used only by members of the sysadmin fixed
server role. (I’ll discuss the DAC in more detail in configuration chapter.)
The Relational Engine
As mentioned earlier, the relational engine is also called the query processor. It includes the
components of SQL Server that determine exactly what your query needs to do and the best
way to do it. By far the most complex component of the query processor, and maybe even
of the entire SQL Server product, is the query optimizer, which determines the best
execution plan for the queries in the batch.
The relational engine also manages the execution of queries as it requests data from the
storage engine and processes the results returned. Communication between the relational
engine and the storage engine is generally in terms of OLE DB row sets. (Row set is the OLE
DB term for a result set.) The storage engine comprises the components needed to actually
access and modify data on disk.
The Command Parser
The command parser handles Transact-SQL language events sent to SQL Server. It checks
for proper syntax and translates Transact-SQL commands into an internal format that can
be operated on. This internal format is known as a query tree. If the parser doesn’t
recognize the syntax, a syntax error is immediately raised that identifies where the error
occurred. However, non-syntax error messages cannot be explicit about the exact source
line that caused the error. Because only the command parser can access the source of the
statement, the statement is no longer available in source format when the command is
actually executed.
The Query Optimizer
The query optimizer takes the query tree from the command parser and prepares it for
execution. Statements that can’t be optimized, such as flow-of-control and DDL commands,
are compiled into an internal form. The statements that are optimizable are marked as such
and then passed to the optimizer. The optimizer is mainly concerned with the DML
statement SELECT, INSERT, UPDATE, and DELETE, which can be processed in more than
one way, and it is the optimizer’s job to determine which of the many possible ways is the
best. It compiles an entire command batch, optimizes queries that are optimizable, and
checks security. The query optimization and compilation result in an execution plan.
The first step in producing such a plan is to normalize each query, which potentially breaks
down a single query into multiple, fine-grained queries. After the optimizer normalizes a
query, it optimizes it, which means it determines a plan for executing that query. Query
optimization is cost based; the optimizer chooses the plan that it determines would cost the
least based on internal metrics that include estimated memory requirements, CPU
utilization, and number of required I/Os. The optimizer considers the type of statement
requested, checks the amount of data in the various tables affected, looks at the indexes
available for each table, and then looks at a sampling of the data values kept for each index
or column referenced in the query. The sampling of the data values is called distribution
statistics. Based on the available information, the optimizer considers the various access
Information Email: kdssginfo@gmail.com
SQL Server
methods and processing strategies it could use to resolve a query and chooses the most
cost-effective plan.
The SQL Manager
The SQL manager is responsible for everything related to managing stored procedures and
their plans. It determines when a stored procedure needs recompilation, and it manages the
caching of procedure plans so that other processes can reuse them.
The SQL manager also handles auto parameterization of queries. In SQL Server 2008,
certain kinds of ad hoc queries are treated as if they were parameterized stored procedures,
and query plans are generated and saved for them. SQL Server can save and reuse plans in
several other ways, but in some situations using a saved plan might not be a good idea.
The Database Manager
The database manager handles access to the metadata needed for query compilation and
optimization, making it clear that none of these separate modules can be run completely
separately from the others. The metadata is stored as data and is managed by the storage
engine, but metadata elements such as the data types of columns and the available indexes
on a table must be available during the query compilation and optimization phase, before
actual query execution starts.
The Query Executor
The query executor runs the execution plan that the optimizer produced, acting as a
dispatcher for all the commands in the execution plan. This module steps through each
command of the execution plan until the batch is complete. Most of the commands require
interaction with the storage engine to modify or retrieve data and to manage transactions
and locking.
The Storage Engine
The SQL Server storage engine has traditionally been considered to include all the
components involved with the actual processing of data in your database. SQL Server 2005
separates out some of these components into a module called the SQLOS. In fact, the SQL
Server storage engine team at Microsoft actually encompasses three areas: access
methods, transaction management, and the SQLOS.
Transaction Services
A core feature of SQL Server is its ability to ensure that transactions are atomic–that is, all
or nothing. In addition, transactions must be durable, which means that if a transaction has
been committed, it must be recoverable by SQL Server no matter what–even if a total
system failure occurs 1 millisecond after the commit was acknowledged. There are actually
four properties that transactions must adhere to, called the ACID properties: atomicity,
consistency, isolation, and durability.
Locking Operations Locking is a crucial function of a multi-user database system such as
SQL Server, even if you are operating primarily in the snapshot isolation level with
Information Email: kdssginfo@gmail.com
SQL Server
optimistic concurrency. SQL Server lets you manage multiple users simultaneously and
ensures that the transactions observe the properties of the chosen isolation level. Even
though readers will not block writers and writers will not block readers in snapshot isolation,
writers do acquire locks and can still block other writers, and if two writers try to change the
same data concurrently, a conflict will occur that must be resolved. The locking code
acquires and releases various types of locks, such as share locks for reading, exclusive locks
for writing, intent locks taken at a higher granularity to signal a potential “plan” to perform
some operation, and extent locks for space allocation. It manages compatibility between the
lock types, resolves deadlocks, and escalates locks if needed. The locking code controls
table, page, and row locks as well as system data locks.
The SQLOS
Whether the components of the SQLOS layer are actually part of the storage engine
depends on whom you ask. In addition, trying to figure out exactly which components are in
the SQLOS layer can be rather like herding cats. I have seen several technical presentations
on the topic at conferences and have exchanged e-mail and even spoken face to face with
members of the product team, but the answers vary. The manager who said he was
responsible for the SQLOS layer defined the SQLOS as everything he was responsible for,
which is a rather circular definition. Earlier versions of SQL Server have a thin layer of
interfaces between the storage engine and the actual operating system through which SQL
Server makes calls to the OS for memory allocation, scheduler resources, thread and worker
management, and synchronization objects. However, the services in SQL Server that
needed to access these interfaces can be in any part of the engine. SQL Server
requirements for managing memory, schedulers, synchronization objects, and so forth have
become more complex. Rather than each part of the engine growing to support the
increased functionality, all services in SQL Server that need this OS access have been
grouped together into a single functional unit called the SQLOS. In general, the SQLOS is
like an operating system inside SQL Server. It provides memory management, scheduling,
IO management, a framework for locking and transaction management, deadlock detection,
and general utilities for dumping, exception handling, and so on.
Another member of the product team described the SQLOS to me as a set of data structures
and APIs that could potentially be needed by operations running at any layer of the engine.
For example, consider various operations that require use of memory. SQL Server doesn’t
just need memory when it reads in data pages through the storage engine; it also needs
memory to hold query plans developed in the query processor layer. Figure 1-1 (shown
earlier) depicts the SQLOS layer in several parts, but this is just a way of showing that
many SQL Server components use SQLOS functionality.
Information Email: kdssginfo@gmail.com