Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003
Microsoft Corporation

Published: September 1, 2003

Abstract
Frequently asked questions on a wide variety of subjects related to server clusters.
Information in this document, including URL and other Internet Web site references, is
subject to change without notice. Unless otherwise noted, the example companies,
organizations, products, domain names, e-mail addresses, logos, people, places, and
events depicted herein are fictitious, and no association with any real company,
organization, product, domain name, e-mail address, logo, person, place, or event is
intended or should be inferred. Complying with all applicable copyright laws is the
responsibility of the user. Without limiting the rights under copyright, no part of this
document may be reproduced, stored in or introduced into a retrieval system, or
transmitted in any form or by any means (electronic, mechanical, photocopying,
recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as expressly
provided in any written license agreement from Microsoft, the furnishing of this document
does not give you any license to these patents, trademarks, copyrights, or other
intellectual property.

© 2005 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Windows, Windows NT, and Windows Server are either
registered trademarks or trademarks of Microsoft Corporation in the United States and/or
other countries.

All other trademarks are property of their respective owners.


Contents

Abstract
Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003
How is this Document Organized?
What are Most Frequently Asked Questions?
Introduction to Server Clusters
Availability
Manageability and Deployment
Scalability (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
Server cluster Concepts (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
Server Requirements
Interconnects
Storage (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
   General Storage Questions
Storage Area Networks (SAN) questions
Network Attached Storage (NAS) questions
Highly Available File Servers
Highly Available Print Servers
Removing all Single Points of Failure
Active Directory, DNS and Domain Controllers
Security Considerations (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
In-the-box HA Services
Geographically Dispersed Clusters
Quorum
Cluster-Aware Applications
Miscellaneous Topics

Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003
Published: January 1, 2003

In This White Paper

How is this Document Organized?


What are Most Frequently Asked Questions?

Introduction to Server Clusters

Availability

Manageability and Deployment

Scalability (Server Clusters: Frequently Asked Questions for Windows 2000 and
Windows Server 2003)

Server cluster Concepts (Server Clusters: Frequently Asked Questions for Windows 2000
and Windows Server 2003)

Server Requirements

Interconnects

Storage (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows
Server 2003)

Storage Area Networks (SAN) questions

Network Attached Storage (NAS) questions

Highly Available File Servers

Highly Available Print Servers

Removing all Single Points of Failure

Active Directory, DNS and Domain Controllers

Security Considerations (Server Clusters: Frequently Asked Questions for Windows 2000
and Windows Server 2003)

In-the-box HA Services

Geographically Dispersed Clusters

Quorum

Cluster-Aware Applications

Miscellaneous Topics

How is this Document Organized?


Q. How is this Document Organized?

A. The various sections cover different aspects of server clusters from concepts to
deployment to operational procedures:

• Introduction to Server clusters

• Availability

• Manageability and deployment

• Scalability

• Server cluster concepts

• Server requirements

• Interconnects

• Storage

• Highly available file servers

• Highly available print servers

• Removing single points of failure

• Active Directory and domain controllers

• Security Considerations

• In-the-box HA services

• Geographically dispersed clusters

• Quorum

• Cluster-aware applications

• Miscellaneous topics

What are Most Frequently Asked Questions?
Q. What are Most Frequently Asked Questions?

A.

• How do I know if my Server cluster is a supported configuration?

• How many servers can there be in a single cluster?

• How do I benefit from more than two nodes in a cluster?


• Are dynamic disks supported in a cluster?

• Can a cluster disk be extended without rebooting?

• How can I replace a disk that has gone bad in a cluster?

• Why is zoning needed to isolate a Server cluster in a SAN?

• Can a cluster server boot from a SAN?

• Is Kerberos authentication possible for services hosted on a cluster?

• Is IIS cluster-aware?

• Can Server clusters span multiple sites?

• Will Microsoft provide a cluster file system?

Introduction to Server Clusters


Q. What is a Server Cluster?

A. A Server cluster is a collection of independent servers that together provide a single,
highly available platform for hosting applications.

Q. What are the benefits of Server clusters?

A. There are three primary benefits to server clustering: availability, manageability, and
scalability:

• Availability: Server clusters provide a highly available platform for deploying
applications. A Server cluster ensures that applications continue to run in the event of
planned downtime due to maintenance or unplanned downtime due to failures. Server
clusters protect against hardware failures and failures of the Windows operating system,
device drivers, or application software. Server clusters allow operating system and
application software to be upgraded across the cluster without having to take down the
application.

• Manageability: Server clusters allow administrators to quickly inspect the status of all
cluster resources and move workloads around onto different servers within the cluster.
This is useful for manual load balancing, and to perform "rolling updates" on the servers
without taking important data and applications offline.

• Scalability: Applications that can be partitioned can be spread across the servers of a
cluster, allowing additional CPU and memory to be applied to a problem. As the problem
size increases, additional servers can be added to the cluster. A partitioned application is
one where the data (or function) can be split up into independent units. For example, a
customer database could be split into two units, one covering customers with names
beginning A through L and the other for customers with names beginning M through Z.
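The customer-name split described above can be sketched in a few lines of Python. This is only an illustration of the partitioning idea, not something Server clusters provide; the function name and the partition labels are assumptions made for the example.

```python
def partition_for(customer_name: str) -> str:
    """Return the partition that owns a customer, keyed on the first letter.

    Hypothetical helper: names beginning A through L go to one unit,
    names beginning M through Z go to the other.
    """
    first = customer_name.strip()[:1].upper()
    if not (first.isascii() and first.isalpha()):
        raise ValueError(f"cannot partition name: {customer_name!r}")
    return "A-L" if first <= "L" else "M-Z"
```

Because each name maps to exactly one unit, the two units can be hosted on different servers and worked on independently.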

Q. What are Server clusters used for?

A. Server clusters provide a highly available platform for mission-critical, line-of-business
applications. These typically include database servers (such as Microsoft SQL Server),
mail and collaboration servers (such as Exchange Server), and infrastructure servers
such as file servers and print servers.

Q. Is Server cluster the same as Microsoft Cluster Server (MSCS) and/or Wolfpack?

A. The code name for the original Server cluster project was Wolfpack. When the product
was released as part of Microsoft Windows NT 4.0 Enterprise Edition it was named
Microsoft Cluster Server or MSCS. In Microsoft Windows 2000, the official name of the
product became Server cluster.

Q. How do Server clusters fit into the larger Windows high availability story?

A. Server clusters are part of a much larger story for highly available applications running
on Microsoft products. Windows has two cluster technologies, server clusters and
Network Load Balancing (NLB). Server clusters are used to ensure that stateful
applications such as a database (e.g., SQL Server) can be kept highly available by failing
over the application from one server to another in the event of a failure. NLB is used to
distribute client requests across a set of identical servers. NLB is particularly useful for
ensuring that stateless applications such as a web server (e.g., IIS) are highly available
and can be scaled out by adding servers as the load increases.

In addition, Microsoft Application Center 2000 includes Component Load Balancing
(CLB). CLB is a technology used to ensure COM+ applications are highly available and
can scale out.

The Windows platform also uses other technologies to ensure high availability and
scalability. For example, Active Directory is made highly available through replication of
the directory contents across a set of domain controllers.

Q. What is Scale-up versus Scale-out?

A. Scale-up is a term used to describe the scalability provided by large symmetric
multiprocessing (SMP) systems. As the load increases, more processors and more memory
can be added to a single server, increasing the processing power and memory available
for applications. Windows 2000 Datacenter Server and
Windows Server 2003 Datacenter Edition provide the support for scaling up to large
computers with many CPUs and large physical memory. In a scale-up environment,
Server clusters are deployed to ensure that the failure of the server does not cause the
applications or services to fail. Scale-up is typically done either to host large
applications that cannot be partitioned or to consolidate a set of applications onto a single
server (or a small number of servers in the case of a server cluster).

Scale-out is a term used to describe the idea that an application can be scaled by
partitioning the workload and spreading it across a set of cooperating servers. If the load
increases, additional servers can be added to the set to provide additional processing
power and memory.

Server clusters are particularly useful for enhancing the availability of applications that
can be scaled out across a set of nodes by allowing the individual pieces or partitions of a
scale-out solution to be made highly available via failover.

Q. Is Server cluster a standard feature of all Windows operating system products?

A. Server clusters are not available in all Windows operating system products. Server
clusters are available in:

• Windows NT Server, Enterprise Edition

• Windows 2000 Advanced Server and Datacenter Server

• Windows Server 2003 Enterprise Edition and Datacenter Edition

• Windows Server 2003, 64-bit Enterprise Edition and Datacenter Edition

Server clusters are also available as part of the Server Appliance Kit, available to OEMs
for building embedded solutions based on the Windows operating system.

Q. How do I know if my Server cluster is a supported configuration?

A. All server clusters must be qualified to be supported by Microsoft. A qualified
configuration has undergone extensive testing using a hardware compatibility test
provided by Microsoft. All qualified solutions appear on the Microsoft Hardware
Compatibility List (HCL) (http://go.microsoft.com/fwlink/?LinkID=67738). Only cluster
solutions listed on the HCL are supported by Microsoft.

The complete cluster solution must be listed on the Cluster HCL. The complete
solution includes the servers, the storage adapters, the interconnect type, and the storage
controller firmware and driver versions. All of the components must match exactly,
including any software, driver, or firmware versions, for the solution to be qualified.

The HCL also contains a set of qualified cluster components. Building a solution from
qualified components does NOT imply that the solution is qualified.

The cluster component lists have been a source of confusion in the past, and we
will be removing the cluster component lists (such as Cluster/RAID) from the HCL
for Windows Server 2003.
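The difference between a qualified solution and a set of individually qualified components can be illustrated with a small Python sketch. Everything here is hypothetical: the field names, the sample entry, and the idea of holding the list in memory are assumptions for the example, and the real HCL is a published list, not an API.

```python
from typing import NamedTuple


class ClusterSolution(NamedTuple):
    """One complete configuration (field names are illustrative only)."""
    server_model: str
    storage_adapter: str
    adapter_driver: str
    adapter_firmware: str
    storage_controller: str
    interconnect: str


# Hypothetical stand-in for the complete solutions listed on the Cluster HCL.
QUALIFIED_SOLUTIONS = {
    ClusterSolution("ServerX", "HBA-1", "5.1", "1.20", "Array-A",
                    "fibre channel switched fabric"),
}


def is_supported(candidate: ClusterSolution,
                 qualified=QUALIFIED_SOLUTIONS) -> bool:
    """A solution is supported only if the whole tuple matches an entry exactly."""
    return candidate in qualified
```

Changing a single field, such as the driver version, yields a configuration that is no longer on the list, even though every individual part may appear in some other listed solution.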

Q. Where are supported configurations listed?

A. All qualified solutions appear on the Microsoft Hardware Compatibility list (HCL)
(http://go.microsoft.com/fwlink/?LinkID=67738). Only cluster solutions listed on the HCL
are supported by Microsoft.

Q. Where can I find more documentation on Server clusters?

A. Server clusters are extensively covered in the online help in Windows 2000 and
Windows Server 2003. Additional information can be found on the Microsoft web site at:

• http://go.microsoft.com/fwlink/?LinkID=67742

• http://go.microsoft.com/fwlink/?LinkID=67740

• http://go.microsoft.com/fwlink/?LinkID=67741

Availability
Q. Can Server clusters provide zero downtime for applications?

A. No. Server clusters can dramatically reduce planned and unplanned downtime.
However, even with Server clusters, a server could still experience downtime from
the following events:

• Failover time: If a Server cluster recovers from a server or application failure, or if
it is used to move applications from one server to another, the application(s) will be
unavailable for a non-zero period of time (typically under a minute).

• Failures from which Server clusters cannot recover: There are types of failure
that Server clusters do not protect against, such as loss of a disk not protected by
RAID, loss of power when a UPS is not used, or loss of a site when there is no
fast-recovery disaster recovery plan. Most of these can be survived with minimal
downtime if precautions are taken in advance.

• Server maintenance that requires downtime: Server clusters can keep
applications and data online through many types of server maintenance, but not all
(for example, installing a new version of an application which has a new on-disk data
format that requires reformatting preexisting data).

Microsoft recommends that clusters be used as one element in customers' overall
programs to provide high integrity and high availability for their mission-critical
server-based data and applications.
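To see why failover alone cannot give zero downtime, the arithmetic can be sketched in Python. The function and parameter names are assumptions for illustration, and the inputs are whatever a site measures for itself; nothing here is a figure from Microsoft.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(failovers_per_year: float,
                 minutes_per_failover: float,
                 other_downtime_minutes: float = 0.0) -> float:
    """Fraction of the year an application is available.

    Downtime is the time spent failing over plus any downtime the cluster
    cannot mask (for example, maintenance that needs the application offline).
    """
    downtime = failovers_per_year * minutes_per_failover + other_downtime_minutes
    return 1.0 - downtime / MINUTES_PER_YEAR
```

Any non-zero failover time or unmasked outage pushes the result below 1.0, which is the point of the answer above.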

Q. Which types of applications and services benefit from Server clustering?

A. There are three types of server applications that will benefit from Server clusters:

1. "In the box" services provided by the Windows platform: For example, file
shares, print queues, Microsoft Message Queue Server (MSMQ) services, and
Component Services (formerly known as Transaction Server) services.

2. Generic applications: Server clusters include a point-and-click wizard for setting
up any well-behaved server application for basic error detection, automatic recovery,
and operator-initiated management (e.g., move from one server to the other). A
"well-behaved" server application is one which keeps a recoverable state on cluster
disk(s), and whose clients can gracefully handle a pause in service as the application
is automatically restarted.

3. Cluster-aware applications: Software vendors test and support their application
products on Server clusters.

Manageability and Deployment


Q. Do Server clusters provide a single image view for administration?

A. Yes, Server clusters provide administrators with a graphical interface (Cluster
Administrator) and a command line tool (Cluster.exe, available on Windows 2000 and
Windows Server 2003) from which they can monitor and manage all of the resources in a
cluster as if it were a single system. The administrator can:

• monitor the status of all servers and applications in the cluster

• set up new applications, file shares, print queues, etc. for high availability

• administer the recovery policies for applications and resources

• take applications offline, bring them back online, and move them from one server
to another

The ability to graphically move workload from one server to another with only a
momentary pause in service (typically less than a minute) means administrators can
easily unload servers for planned maintenance without taking important data and
applications offline for long periods of time.

Q. Can Server clusters be managed remotely?

A. Yes, an authorized cluster administrator can run Cluster Administrator from any
Windows NT or Windows 2000 Workstation or Server, any Windows XP machine or any
Windows Server 2003 on the network. The cluster.exe command line tool can be run
remotely from any Windows 2000 Workstation or Server, any Windows XP machine or
any Windows Server 2003 on the network.

Q. Can Server clusters be set up and configured remotely?

A. Yes, with Windows Server 2003, Cluster Administrator and the cluster.exe command
line tool can also be used to remotely set up and configure a Server cluster (e.g., create a
new cluster, add a new server to an existing cluster, or remove servers from a cluster).

Q. Is the server cluster software installed by default?

A. On Windows NT and Windows 2000, the Server cluster software is not installed by
default. On Windows NT you must install the Enterprise Edition CD. With Windows 2000,
you must install the Server cluster software using the optional component management
tool.

With Windows Server 2003, the Server cluster software is installed by default on
Enterprise Edition and Datacenter Edition. The cluster software is NOT configured by
default and will not start until the server is either used to create a new cluster or the
server is added to an existing cluster. In Windows Server 2003, you must use Cluster
Administrator or the Cluster.exe command line tool to configure a cluster.

Q. Can a Server cluster be installed and configured using unattended installs?

A. Yes, to perform an unattended installation for a Windows 2000 Cluster using an
Unattend.txt file, please consult the Windows 2000 Deployment Guide located on the
Windows 2000 CD-ROM in the following location:
%CDROM%\SUPPORT\TOOLS\DEPLOY.CAB\Unattend.doc

Additional information can be found in article 176507 in the Microsoft Knowledge
Base (http://go.microsoft.com/fwlink/?LinkID=67745).

Q. Do applications have to be installed separately on all the cluster servers?

A. It depends on the application. Each application that can be failed over between
servers in a cluster must be available on all servers in the cluster. Historically, application
setup has not been cluster-aware and therefore the application must be installed
separately on each server.

With more recent versions of some applications, for example SQL Server 2000, the
application setup is aware of Server clusters. In the case of Microsoft SQL Server 2000
setup, the appropriate files are copied to all servers and registry settings and other
SQL Server configuration is done just once.

Q. How can I take advantage of extensibility features of ISA Server?

A. A number of third-party vendors offer solutions such as virus detection, content
filtering, site categorization, reporting, and administration. Customers and developers
also have the ability to create their own extensions to ISA Server. ISA Server includes a
comprehensive software development kit for developing tools that build on ISA Server
firewall, caching, and management features.

Q. Are cluster management operations scriptable?

A. Yes, there are a number of options for scripting management operations:

• The Cluster.exe command line tool allows command files to be built that can
change the state of cluster resources, move resources around, etc. For more
information about the command line, see the online help or type cluster at the
command line on a server that has Server clusters installed or has the Windows
server admin pack installed.

• In Windows 2000 and Windows Server 2003, there is a scriptable COM interface
to the Server cluster APIs. This allows VBScript or any other scripting language
supported by Windows to call the cluster APIs. These APIs provide a way to change
the state of the cluster as well as return data about the resources or applications in a
cluster. For more information about the cluster server APIs, see the Platform SDK, which
has comprehensive documentation for the cluster APIs and the COM (automation
server) APIs.

• In Windows Server 2003, there is a WMI provider for Server clusters that allows
WMI scripts to manage the cluster. For more information about the cluster server
WMI schema, see the Platform SDK.
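As a rough sketch of the first option, a script can assemble a Cluster.exe command line and hand it to the operating system. The Python helper below only builds the argument list; the helper name is hypothetical, and the /cluster: and /moveto: switches are written from memory of the tool's help text, so treat the exact syntax as an assumption and confirm it by typing cluster /? on a cluster server.

```python
def move_group_command(cluster_name: str, group_name: str, target_node: str) -> list:
    """Build (but do not run) a Cluster.exe command that moves a group.

    Assumed syntax: cluster /cluster:<name> group <group> /moveto:<node>.
    Verify against the tool's own help before using it.
    """
    return [
        "cluster",
        f"/cluster:{cluster_name}",
        "group",
        group_name,
        f"/moveto:{target_node}",
    ]
```

The resulting list could be passed to subprocess.run on a machine where Cluster.exe is installed; keeping the arguments as a list avoids quoting problems with group names that contain spaces.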

Q. Can command line tools be used to manage a cluster?

A. Yes, Server clusters have a command line tool cluster.exe that can be used to manage
the cluster. With Windows Server 2003, this command line tool can also be used to
create new clusters, add servers to existing clusters and remove servers from clusters.
Almost all of the functionality provided by Cluster Administrator is available through
cluster.exe in Windows Server 2003.

Q. Is WMI supported for managing a cluster?

A. Yes, in Windows Server 2003, there is a WMI provider that allows a Server cluster to
be managed. In addition, all the Server cluster events (such as server up and server
down, resource online, offline, failed etc.) are available through WMI events.

Q. Do Server clusters support WMI eventing?

A. Yes, in Windows Server 2003, all of the Server cluster events (such as server up and
server down, resource online, offline, failed etc.) are available through WMI events.

Q. Can Group Policy be used on cluster servers?

A. You can apply Group Policy to cluster servers. There are a few things to remember.
Applications fail over from one server to another in a cluster and will typically expect the
same policies to be in effect no matter where they are hosted. You should ensure that all
cluster servers have the same set of group policies applied to them.

With Windows Server 2003 and Windows 2000 SP3 onwards, virtual servers can be
published in Active Directory. You should NOT apply group policies to a virtual server
computer object. The policy will only be applied to the server that is currently hosting the
virtual server, and all policy settings may be lost if the application fails over.

Q. Is Systems Management Server (SMS) supported for deploying applications in a
Server cluster?

A. No, at this time, SMS is not supported for deploying applications across a cluster.

Q. Is Application Center 2000 supported on a Server cluster?

A. No, Application Center 2000 provides a set of features that are intended for managing
servers hosting web server front-end and middle-tier business logic. Server clusters are
typically deployed to ensure high availability of the back-end database engine.

Q. Are MOM management packs available for monitoring Server clusters?

A. No, at this time, there are no cluster-aware MOM management packs.

Q. How do server clusters help administrators perform rolling upgrades?

A. With Server clusters, server administrators no longer have to do all of their
maintenance within those rare windows of opportunity when no users are online. Instead,
they can simply wait until a convenient off-peak time when one of the servers in the
cluster can be removed for maintenance and the workload distributed across the other
cluster servers. By pointing-and-clicking in Cluster Administrator (or using scripts) to
move all of the workload onto other servers, one of the servers in the cluster can be
upgraded. Once the upgrade is complete and tested, that server can be rebooted, after
which it automatically rejoins the cluster, ready for work. When convenient, the administrator
repeats the process to perform maintenance on the other servers in the cluster. This
ability to keep applications and data online while performing server maintenance is often
referred to as doing "rolling upgrades" to your servers.

Server clusters allow rolling upgrades between a new release and the previous version.
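The rolling upgrade procedure can be modeled as a simple loop: drain one server, upgrade it, let it rejoin, then repeat. The Python sketch below is a simulation under stated assumptions only. The data structures and function name are hypothetical, workload is spread round-robin for simplicity, and no real cluster API is called.

```python
def rolling_upgrade(placement: dict, versions: dict, new_version: str) -> list:
    """Simulate upgrading every node while keeping all groups hosted.

    placement maps node name -> list of application groups it hosts;
    versions maps node name -> operating system version.
    Returns the order in which nodes were upgraded.
    """
    upgraded = []
    for node in list(placement):
        others = [n for n in placement if n != node]
        if not others:
            raise ValueError("a rolling upgrade needs at least two nodes")
        # Move the node's workload onto the remaining servers.
        for i, group in enumerate(placement[node]):
            placement[others[i % len(others)]].append(group)
        placement[node] = []
        # The drained node is upgraded, rebooted, and rejoins the cluster.
        versions[node] = new_version
        upgraded.append(node)
    return upgraded
```

At every step each group is hosted by some node, which is what keeps applications online during the maintenance.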

Q. What combination of Windows versions supports rolling upgrades?

A. Server clusters allow rolling upgrades for a new release and the previous version. The
following rolling upgrades are supported:

• Windows NT Enterprise Edition to Windows 2000

• Windows 2000 to Windows Server 2003

Q. Can I manage different versions of Server cluster from a single Cluster Administrator
tool?

A. Yes, the Server cluster tools allow a mixed version cluster to be managed from a
single point. During a rolling upgrade, a cluster may contain different versions of the
operating system. As an example Windows 2000 and Windows Server 2003 can co-exist
in the same cluster. The administration tools provided with Windows Server 2003 allow
such a mixed-version cluster to be managed.

Q. Are there other manageability benefits with Server clusters?

A. Server clusters allow administrators to quickly inspect the status of all cluster
resources, and move workload around onto different servers within the cluster. This is
useful for manual load balancing, and to perform "rolling updates" on the servers without
taking important data and applications offline.

Scalability (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
Q. Do Server clusters enhance server scalability?

A. The primary goal of Server clusters is to provide a highly available platform for
applications.

There are some types of applications which can be scaled out to take advantage of a set
of machines. More machines can be added to the set to increase the CPU or memory
available to the application. Applications of this type are typically scaled by partitioning
the data set. For example, a SQL Server database may be scaled out by partitioning the
database into pieces and then using database views to give the client applications the
illusion of a single database.

Server clusters do not provide facilities to partition or scale out an application; however,
Server clusters allow such applications to be scaled out in a highly available
environment. Each of the partitions can be failed over independently within the set of
machines (or to additional spare nodes) so that in the event of failure, a partition of the
application remains available.
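Independent failover of partitions can be illustrated with a short Python sketch. It is a model only: the function name and the mapping of partitions to nodes are assumptions for the example, and the round-robin choice of target is a placeholder rule, not the failover policy the cluster service applies.

```python
def fail_over(owners: dict, failed_node: str, surviving_nodes: list) -> dict:
    """Return a new partition -> node map after one node fails.

    Only partitions hosted on the failed node move; every other
    partition stays where it is.
    """
    if not surviving_nodes:
        raise ValueError("no surviving node to host the partitions")
    result = dict(owners)
    moved = 0
    for partition, node in owners.items():
        if node == failed_node:
            result[partition] = surviving_nodes[moved % len(surviving_nodes)]
            moved += 1
    return result
```

For instance, if the node hosting one half of a partitioned database fails, only that half is restarted elsewhere while the other half keeps running undisturbed.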

Q. Do Server clusters provide load balancing of applications?

A. Server clusters provide a manual load balancing mechanism. Applications can be
moved around the servers of a cluster to distribute the load.

There are no automated tools provided with the operating system to automatically load
balance applications and no enhanced failover policies that use load to determine the
best placement for an application.

Server cluster Concepts (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
Q. What hardware do you need to build a Server cluster?

A. The most important criterion for Server cluster hardware is that it be included in a
validated Cluster configuration on the Microsoft Hardware Compatibility List (HCL),
indicating it has passed the Microsoft Cluster Hardware Compatibility Test. All qualified
solutions appear on the Microsoft HCL (http://go.microsoft.com/fwlink/?linkid=67738).
Only cluster solutions listed on the HCL are supported by Microsoft.

In general, the criteria for building a server cluster include the following:

• Servers: Two or more PCI-based machines running one of the operating system
releases that support Server clusters (see below). Server clusters can run on all
hardware architectures supported by the base Windows operating system, however,
you cannot mix 32-bit and 64-bit architectures in the same cluster.

• Storage: Each server needs to be attached to one or more shared, external storage
buses that are separate from the bus containing the system disk, the startup disk or the
pagefile disk. Applications and data are stored on one or more disks attached to these
buses. There must be enough storage capacity on the shared cluster buses for all of
the applications running in the cluster environment. This shared storage configuration
allows applications to fail over between servers in the cluster.

Microsoft recommends hardware Redundant Array of Inexpensive Disks (RAID) for
all cluster disks to eliminate disk drives as a potential single point of failure. This
means using, for example, a RAID storage unit or a host-based RAID adapter that
implements RAID across disks.

SCSI is supported for 2-node cluster configurations only. Fibre channel arbitrated
loop is supported for 2-node clusters only. Microsoft recommends using fibre channel
switched fabrics for clusters of more than two nodes.

• Network: Each server needs at least two network cards. Typically, one is the
public network and the other is a private network between the two nodes. A static IP
address is needed for each group of applications that move as a unit between nodes.
Server clusters can project the identity of multiple servers from a single cluster by
using multiple IP addresses and computer names: this is known as a virtual server.

Q. What is a cluster resource?


A. A cluster resource is the lowest level unit of management in a Server cluster. A
resource represents a physical object or an instance of running code. For example, a
physical disk, an IP address, an MSMQ queue and a COM object are all
considered to be resources. From a management perspective, resources can be
independently started and stopped and each is monitored to ensure that it is healthy.

Server clusters can monitor any arbitrary resource type. This is possible because Server
clusters define a resource plug-in model. Each resource type has an associated resource
plug-in, or resource DLL, that is used to start, stop and provide health information that is
specific to the resource type. For example, starting and stopping SQL Server is different
from starting and stopping a physical disk. The resource DLL takes care of the differences.
Application developers and system administrators can build new resource DLLs for their
applications that can be registered with the cluster service.

Server clusters provide some generic plug-ins, known as Generic Service and Generic
Application, that can be used to make existing applications cluster-aware very quickly.
With Windows Server 2003, a Generic Script resource plug-in was added
that allows the resource DLL to be written in any scripting language supported by the
Windows operating system.

Q. What is a resource dependency?

A. A complete application actually consists of multiple pieces or multiple resources; some
pieces are code and others are physical resources required by the application. The
resources are related in different ways; for example, an application that writes to a disk
cannot come online until the disk is online. If the disk fails, then, by definition, the
application cannot continue to run since it writes to the disk. Resource dependencies can
be defined by the application developer or system administrator to capture these
relationships. Resource dependencies define the order in which resources are brought online
and control how failures are propagated to the various pieces of the application.
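
The effect of dependencies on start order and on failure propagation can be illustrated with a small model. This is a sketch only, not cluster service code: the resource names and the helpers `online_order` and `affected_by_failure` are hypothetical, and the ordering is computed with Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each key depends on the resources in its set.
dependencies = {
    "Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Disk", "Network Name"},
}


def online_order(deps):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def affected_by_failure(deps, failed):
    """Return the resources that depend, directly or indirectly, on `failed`."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for resource, needs in deps.items():
            if resource not in affected and (failed in needs or needs & affected):
                affected.add(resource)
                changed = True
    return affected


order = online_order(dependencies)
print(order)
print(affected_by_failure(dependencies, "IP Address"))
```

In this model the file share is always brought online last, and a failure of the IP address reaches both the network name and the file share.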

Q. What is a resource group?

A. A resource group is a collection of one or more resources that are managed and
monitored as a single unit. A resource group can be started or stopped. If a resource
group is started, each resource in the group is started (taking into account any start order
defined by the dependencies between resources in the group). If a resource group is
stopped, all of the resources in the group are stopped. Dependencies between resources
cannot span a group. In other words, the set of resources within a group is an
autonomous unit that can be started and stopped independently from any other group. A
group is a single, indivisible unit that is hosted on one server in a Server cluster at any
point in time and it is the unit of failover.

Q. Can I have dependencies between resources in different groups?


A. No, resource dependencies are confined to a single group.

Q. What is a virtual server?

A. A virtual server is a resource group that contains an IP address resource and a
network name resource. When an application is hosted in a virtual server, the application
can be accessed by clients using the IP address or network name in that resource group.
As the resource group fails over across the cluster, the IP address and network name
remain the same, so the client is unaware of the physical location of the
application and will continue to work in the event of a failure of one of the servers in the
cluster.

Q. How can I take advantage of extensibility features of ISA Server?


A. A number of third-party vendors offer solutions such as virus detection, content
filtering, site categorization, reporting, and administration. Customers and developers
also have the ability to create their own extensions to ISA Server. ISA Server includes a
comprehensive software development kit for developing tools that build on ISA Server
firewall, caching, and management features.

Q. What is failover?

A. Server clusters monitor the health of the nodes in the cluster and the resources in the
cluster. In the event of a server failure, the cluster software re-starts the failed server's
workload on one or more of the remaining servers. If an individual resource or application
fails (but the server does not), Server clusters will typically try to re-start the application
on the same server; if that fails, it moves the application's resources and re-starts it on
the other server. The process of detecting failures and restarting the application on
another server in the cluster is known as failover.

The cluster administrator can set various recovery policies such as whether or not to re-
start an application on the same server, and whether or not to automatically "failback" (re-
balance) workloads when a failed server comes back online.
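
The recovery sequence described above (try the same server first, then move to another server) can be sketched as follows. The function `recover` and its parameters are hypothetical names chosen for illustration; they do not correspond to any cluster API, and the sketch ignores failback.

```python
def recover(current_node, live_nodes, can_start, restart_locally=True):
    """Return the node that hosts the application after a failure, or None.

    current_node    -- node that was hosting the application
    live_nodes      -- ordered list of nodes that are still up
    can_start       -- callable(node) -> bool, True if a restart on that node succeeds
    restart_locally -- recovery policy: try the same server before failing over
    """
    # Resource failure on a healthy server: attempt a local restart first.
    if restart_locally and current_node in live_nodes and can_start(current_node):
        return current_node
    # Server failure, or the local restart failed: fail over to another server.
    for node in live_nodes:
        if node != current_node and can_start(node):
            return node
    return None


# The application fails on Node1 and cannot be restarted there.
print(recover("Node1", ["Node1", "Node2"], lambda node: node == "Node2"))
```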

Q. Is failover transparent to users?

A. Server clusters do not require any special software on client computers, so the user
experience during failover depends on the nature of the client side of their client-server
application. Client reconnection can be made transparent, because the Server clusters
software has restarted the applications, file shares, etc. at exactly the same IP address.

If a client is using "state-less" connections such as a standard browser connection, then
the client would be unaware of a failover if it occurred between server requests. If a
failure occurs while a client is connected to the failed resource, then the client will receive
whatever standard notification is provided by the client side of their application when the
server side becomes unavailable. This might be, for example, the standard "Abort, Retry,
or Cancel?" prompt you get when using the Windows Explorer to download a file at the
time a server or network goes down. In this case, client reconnection is not automatic
(the user must choose "Retry"), but the user is fully informed of what is happening and
has a simple, well-understood method of re-establishing contact with the server. Of
course, in the meantime, the cluster service is busily re-starting the service or application
so that, when the user chooses "Retry", it re-appears as if it never went away.

Q. What is failback?

A. In the event of the failure of a server in a cluster, the applications and resources are
failed over to another node in the cluster. When the failed node rejoins the cluster (after
reboot for example), that node now is free to be used by applications. A cluster
administrator can set policies on resources and resource groups that allow an application
to automatically move back to a node if it becomes available, thus automatically taking
advantage of a node rejoining the cluster. These policies are known as failback policies.
You should take care when defining automatic failback policies since, depending on the
application, automatically moving the application (which was working just fine) may have
undesirable consequences on the clients using the applications.

Q. When an application restarts after failover, does it restore the application state at the
time of failure?

A. No, Server clusters provide a fast crash restart mechanism. When an application is
failed over and restarted, the application is restarted from scratch. Any persistent data
written out to a database or to files is available to the application, but any in-memory
state that the application had before the failover is lost.

Q. At what level does failover exist?

A. At the resource group level.

Q. What is a Quorum Resource and how does it help Server clusters provide high
availability?

A. Server clusters require a quorum resource to function. The quorum resource, like any
other resource, is a resource which can only be owned by one server at a time, and for
which servers can negotiate for ownership. Negotiating for the quorum resource allows
Server clusters to avoid "split-brain" situations where the servers are active and think the
other servers are down. This can happen when, for example, the cluster interconnect is
lost and network response time is problematic. The quorum resource is used to store the
definitive copy of the cluster configuration so that regardless of any sequence of failures,
the cluster configuration will always remain consistent.
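
How ownership of a single quorum resource prevents split-brain can be shown with a deliberately simplified model. Everything here is an assumption made for illustration: the class `QuorumResource`, the function `resolve_partition`, the rule that the first node of each partition arbitrates, and the rule that the losing side stops. The actual arbitration mechanism is not described in this FAQ and is not modelled.

```python
class QuorumResource:
    """A resource that at most one node can own at a time."""

    def __init__(self):
        self.owner = None

    def arbitrate(self, node):
        """Claim ownership if it is free; report whether `node` is the owner."""
        if self.owner is None:
            self.owner = node
        return self.owner == node


def resolve_partition(partitions, quorum):
    """Return the one partition that may keep running after the interconnect is lost.

    partitions -- list of lists of node names; nodes in different lists cannot
                  reach each other, so each side believes the other is down.
    """
    survivors = []
    for nodes in partitions:
        challenger = nodes[0]  # assumption: the first node of each side arbitrates
        if quorum.arbitrate(challenger):
            survivors = nodes
    return survivors


quorum = QuorumResource()
survivors = resolve_partition([["Node1", "Node2"], ["Node3", "Node4"]], quorum)
print(survivors, quorum.owner)
```

Because only one challenger can win ownership, only one side of the partition continues, which is the property the quorum resource is there to guarantee.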

Q. What is active/active versus active/passive?


A. Active/Active and Active/Passive are terms used to describe how applications are
deployed in a cluster. Unfortunately, they mean different things to different people and so
the terms tend to cause confusion.

From the perspective of a single application or database:

• Active/Active means that the same application or pieces of the same service can
be run concurrently on different nodes in the cluster. For example, SQL Server 2000
can be configured such that the database is partitioned and each node can be
running a single instance of the database. SQL Server provides the notion of views to
provide a single image of the entire database.

• Active/Passive means that only one node in the cluster can be hosting the given
application. For example, a single file share is active/passive. Any given file share
can only be hosted on one node at a time.

From the perspective of a set of instances of an application or service:

• Active/Active means that different instances of the same application can be
running concurrently on different cluster nodes. For example, each node in a cluster
can be running SQL Server against a different database. A single cluster can support
many file shares that are hosted on the nodes in a cluster concurrently.

• Active/Passive means that only one instance of a service can be running
anywhere in the cluster. For example, there must only be a single instance of the
DHCP service running in the cluster at any point in time.

From the perspective of the cluster:

• Active/Active means that all nodes in the cluster are running applications. These
may be multiple instances of the same application or different applications (for
example, in a 2-node cluster, WINS may be running on one node and DHCP may be
running on the other node).

• Active/Passive means that one of the cluster nodes is spare and not being used
to host applications.

Server clusters support all of these different combinations; the terms are really about how
specific applications or sets of applications are deployed.

With the advent of more than two servers in a cluster, starting with Windows 2000
Datacenter, the term active/active is confusing because there may be four servers. When
there are multiple servers, the set of options available for deployment becomes more
flexible, allowing different configurations such as N+I.

Q. How do I benefit from more than two nodes in a cluster?


A. Failover is the mechanism that single-instance applications and the individual partitions of a
partitioned application typically employ for high availability (the term Pack has been
coined to describe a highly available, single instance application or partition).

In a 2-node cluster, defining failover policies is trivial. If one node fails, the only option is
to failover to the remaining node. As the size of a cluster increases, different failover
policies are possible and each one has different characteristics.

Failover Pairs

In a large cluster, failover policies can be defined such that each application is set to
failover between two nodes. The simple example below shows two applications App1 and
App2 in a 4-node cluster.

Figure 1: Failover pairs

Configuration has pros and cons:


Pro   Good for clusters that are supporting heavy-weight[1] applications such as
      databases. This configuration ensures that in the event of failure, two
      applications will not be hosted on the same node.

Pro   Very easy to plan capacity. Each node is sized based on the application that
      it will need to host (just like a 2-node cluster hosting one application).

Pro   Effect of a node failure on availability and performance of the system is
      very easy to determine.

Pro   Get the flexibility of a larger cluster. In the event that a node is taken out
      for maintenance, the buddy for a given application can be changed dynamically
      (may end up with standby policy below).

Con   In simple configurations such as the one above, only 50% of the capacity of
      the cluster is in use.

Con   Administrator intervention may be required in the event of multiple failures.

[1] A heavy-weight application is one that consumes a significant number of system
resources such as CPU, memory or IO bandwidth.

Failover pairs are supported by server clusters on all versions of Windows by limiting the
possible owner list for each resource to a given pair of nodes.

Hot-Standby Server

To reduce the overhead of failover pairs, the spare node for each pair may be
consolidated into a single node, providing a hot standby server that is capable of picking
up the work in the event of a failure.

Figure 2: Standby Server

Configuration has pros and cons:

Pro   Good for clusters that are supporting heavy-weight applications such as
      databases. This configuration ensures that in the event of a single failure,
      two applications will not be hosted on the same node.

Pro   Very easy to plan capacity. Each node is sized based on the application that
      it will need to host; the spare is sized to be the maximum of the other nodes.

Pro   Effect of a node failure on availability and performance of the system is
      very easy to determine.

Con   Configuration is targeted towards a single point of failure.

Con   Does not really handle multiple failures well. This may be an issue during
      scheduled maintenance where the spare may be in use.

Server clusters support standby servers today using a combination of the possible
owners list and the preferred owners list. The preferred node should be set to the node
that the application will run on by default and the possible owners for a given resource
should be set to the preferred node and the spare node.
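
The owner lists for a standby configuration can be derived mechanically from the layout. The sketch below assumes a hypothetical 4-node cluster with one spare. The helper names `standby_owner_lists` and `spare_size` and the memory figures are illustrative, not cluster APIs or sizing guidance.

```python
def standby_owner_lists(app_by_node, spare):
    """Build preferred and possible owner lists for a hot-standby layout.

    app_by_node -- {node: application that runs on it by default}
    spare       -- name of the single standby node
    """
    return {
        app: {"preferred": [node], "possible": [node, spare]}
        for node, app in app_by_node.items()
    }


def spare_size(node_sizes):
    """The spare must be as large as the largest node it may have to replace."""
    return max(node_sizes.values())


owners = standby_owner_lists(
    {"Node1": "App1", "Node2": "App2", "Node3": "App3"}, spare="Node4"
)
print(owners["App2"])
print(spare_size({"Node1": 16, "Node2": 32, "Node3": 8}))  # sizes in GB, hypothetical
```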

N+I

Standby server works well for 4-node clusters in some configurations; however, its ability
to handle multiple failures is limited. N+I configurations are an extension of the standby
server concept where there are N nodes hosting applications and I nodes spare.

Figure 3: N+I Spare node configuration

Configuration has pros and cons:


Pro   Good for clusters that are supporting heavy-weight applications such as
      databases or Exchange. This configuration ensures that in the event of a
      failure, an application instance will failover to a spare node, not one that
      is already in use.

Pro   Very easy to plan capacity. Each node is sized based on the application that
      it will need to host.

Pro   Effect of a node failure on availability and performance of the system is
      very easy to determine.

Pro   Configuration works well for multiple failures.

Con   Does not really handle multiple applications running in the same cluster
      well. This policy is best suited to applications running on a dedicated
      cluster.

Server clusters support N+I scenarios in the Windows Server 2003 release using a
cluster group public property AntiAffinityClassNames. This property can contain an
arbitrary string of characters. In the event of a failover, if a group being failed over has a
non-empty string in the AntiAffinityClassNames property, the failover manager will
check all other nodes. If there are any nodes in the possible owners list for the resource
that are NOT hosting a group with the same value in AntiAffinityClassNames, then
those nodes are considered a good target for failover. If all nodes in the cluster are
hosting groups that contain the same value in the AntiAffinityClassNames property,
then the preferred node list is used to select a failover target.
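
The selection rule described above can be sketched as a small function. This models the description in this answer only and is not the failover manager's implementation. The function name, its parameters, the tie-break among several good targets, and the behaviour for an empty property value are assumptions.

```python
def choose_failover_target(class_name, possible_owners, classes_on_node, preferred_owners):
    """Pick a failover target for a group.

    class_name       -- the group's AntiAffinityClassNames value ("" if unset)
    possible_owners  -- live nodes the group is allowed to run on
    classes_on_node  -- {node: set of AntiAffinityClassNames values hosted there}
    preferred_owners -- the group's preferred node list
    """
    if class_name:
        # Good targets are nodes NOT hosting a group with the same value.
        good = [
            node for node in possible_owners
            if class_name not in classes_on_node.get(node, set())
        ]
        if good:
            return good[0]  # assumption: the source does not say how ties are broken
    # Every candidate hosts the same class (or no class is set): use the preferred list.
    for node in preferred_owners:
        if node in possible_owners:
            return node
    return possible_owners[0] if possible_owners else None


# N+I example: Node2 and Node3 already host an "Exchange" group, Node4 is the spare.
hosted = {"Node2": {"Exchange"}, "Node3": {"Exchange"}, "Node4": set()}
print(choose_failover_target("Exchange", ["Node2", "Node3", "Node4"], hosted, ["Node3"]))
```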

Failover Ring

Failover rings allow each node in the cluster to run an application instance. In the event
of a failure, the application on the failed node is moved to the next node in sequence.

Figure 4: Failover Ring

Configuration has pros and cons:

Pro   Good for clusters that are supporting several small application instances
      where the capacity of any node is large enough to support several at the
      same time.

Pro   Effect on performance of a node failure is easy to predict.

Pro   Easy to plan capacity for a single failure.

Con   Configuration does not work well for all cases of multiple failures. If
      Node 1 fails, Node 2 will host two application instances and Nodes 3 and 4
      will host one application instance. If Node 2 then fails, Node 3 will be
      hosting three application instances and Node 4 will be hosting one instance.

Con   Not well suited to heavy-weight applications since multiple instances may
      end up being hosted on the same node even if there are lightly-loaded nodes.

Failover rings are supported by server clusters on the Windows Server 2003 release.
This is done by defining the order of failover for a given group using the preferred owner
list. A node order should be chosen and then the preferred node list should be set up with
each group starting at a different node.
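
A failover ring and its behaviour under cascaded failures can be simulated in a few lines. The sketch below is a model under stated assumptions, not cluster code: the group names, the helpers `ring_preferred_owners` and `fail_node`, and the rule of skipping nodes that are already down are illustrative.

```python
from collections import Counter


def ring_preferred_owners(nodes):
    """Give each group a preferred owner list that starts at a different node."""
    return {f"App{i + 1}": nodes[i:] + nodes[:i] for i in range(len(nodes))}


def fail_node(failed, placement, preferred, live):
    """Move every group hosted on `failed` to the next live node in its list."""
    live.discard(failed)
    for group, node in list(placement.items()):
        if node != failed:
            continue
        order = preferred[group]
        start = order.index(failed)
        for step in range(1, len(order)):
            candidate = order[(start + step) % len(order)]
            if candidate in live:
                placement[group] = candidate
                break


nodes = ["Node1", "Node2", "Node3", "Node4"]
preferred = ring_preferred_owners(nodes)
placement = {group: owners[0] for group, owners in preferred.items()}
live = set(nodes)

fail_node("Node1", placement, preferred, live)
after_first = Counter(placement.values())
fail_node("Node2", placement, preferred, live)
after_second = Counter(placement.values())
print(after_first)
print(after_second)
```

After the second failure the model leaves three instances on Node3 and one on Node4, the uneven outcome that makes this policy a poor fit for some multiple-failure cases.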

Random

In large clusters or even 4-node clusters that are running several applications, defining
specific failover targets or policies for each application instance can be extremely
cumbersome and error prone. The best policy in some cases is to allow the target to be
chosen at random, with a statistical probability that this will spread the load around the
cluster in the event of a failure.

Configuration has pros and cons:

Pro   Good for clusters that are supporting several small application instances
      where the capacity of any node is large enough to support several at the
      same time.

Pro   Does not require an administrator to decide where any given application
      should failover to.

Pro   Provided that there are sufficient applications or the applications are
      partitioned finely enough, this provides a good mechanism to statistically
      load balance the applications across the cluster in the event of a failure.

Pro   Configuration works well for multiple failures.

Pro   Very well tuned to handling multiple applications or many instances of the
      same application running in the same cluster.

Con   Can be difficult to plan capacity. There is no real guarantee that the load
      will be balanced across the cluster.

Con   Effect on performance of a node failure is not easy to predict.

Con   Not well suited to heavy-weight applications since multiple instances may
      end up being hosted on the same node even if there are lightly-loaded nodes.

The Windows Server 2003 release of server clusters randomizes the failover target in the
event of node failure. Each resource group that has an empty preferred owners list will be
failed over to a random node in the cluster in the event that the node currently hosting it
fails.

Customized control

There are some cases where specific nodes may be preferred for a given application
instance.

Configuration has pros and cons:

Pro   Administrator has full control over what happens when a failure occurs.

Pro   Capacity planning is easy, since failure scenarios are predictable.

Con   With many applications running in a cluster, defining a good policy for
      failures can be extremely complex.

Con   Very hard to plan for multiple cascaded failures.

Server clusters provide full control over the order of failover using the preferred node list
feature. The full semantics of the preferred node list can be defined as:

Preferred Node List         Move group to best possible     Failover due to node or
                            initiated via administrator     group failure

Contains all nodes in       Group is moved to highest       Group is moved to the
cluster                     node in preferred node list     next node on the
                            that is up and running in       preferred node list.
                            the cluster.

Contains a subset of the    Group is moved to highest       Group is moved to the
nodes in the cluster        node in preferred node list     next node on the
                            that is up and running in       preferred node list. If the
                            the cluster. If no nodes in     node that was hosting the
                            the preferred node list are     group is the last on the
                            up and running, the group       list or was not in the
                            is moved to a random node.      preferred node list, the
                                                            group is moved to a
                                                            random node.

Empty                       Group is moved to a random      Group is moved to a random
                            node.                           node.
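
The semantics in the table can be written as two small functions, one per column. This is an illustrative model only: the function names are hypothetical, and the table does not say what happens when the next node on the list is down or when the last node of a full list fails, so the sketch skips down nodes and falls back to a random live node in those cases.

```python
import random


def move_to_best_possible(preferred, live_nodes, rng=random):
    """Administrator-initiated move: highest live node on the preferred list."""
    for node in preferred:
        if node in live_nodes:
            return node
    # Empty list, or no preferred node is up: pick a random live node.
    return rng.choice(sorted(live_nodes)) if live_nodes else None


def failover_target(preferred, failed_node, live_nodes, rng=random):
    """Failover after a node or group failure: next node on the preferred list."""
    if failed_node in preferred:
        for node in preferred[preferred.index(failed_node) + 1:]:
            if node in live_nodes:  # assumption: nodes that are down are skipped
                return node
    # Failed node was last on the list, not on the list, or the list is empty.
    return rng.choice(sorted(live_nodes)) if live_nodes else None


print(move_to_best_possible(["Node1", "Node2"], {"Node2", "Node3"}))
print(failover_target(["Node1", "Node2", "Node3"], "Node1", {"Node2", "Node3"}))
```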

Q. How many resources can be hosted in a cluster?

A. The theoretical limit for the number of resources in a cluster is 1,674; however, you
should be aware that the cluster service periodically polls the resources to ensure they
are alive. As the number of resources increases, the overhead of this polling also
increases.

Server Requirements
Q. How many servers can be clustered?

A. The number of servers in a Server cluster is dependent on the Windows product and
the Windows release. The following table lists the server sizes:

Windows Operating System                    Maximum number of servers

Windows NT Enterprise Edition               2
Windows 2000 Advanced Server                2
Windows 2000 Datacenter Server              4
Windows Server 2003 Enterprise Edition      8
Windows Server 2003 Datacenter Edition      8

Q. Is it necessary that the servers in a Server cluster be identical?

A. The cluster hardware compatibility test does not require that all the servers in a
qualified cluster be identical. As cluster sizes increase and therefore the investment in
hardware increases, it is likely that different types of servers will appear in a single
cluster.

The qualification process and the listing process are being improved to allow
heterogeneous solutions to be more easily defined and qualified. This is
particularly important to OEMs where server families change relatively often and
therefore the number of additional qualification passes required increases dramatically
with the current process. The server itself has never been an issue during qualification;
it is typically the HBA or other piece of the storage subsystem, and therefore there
is no real reason to mandate exact servers in a given qualified solution.

Q. Can 32-bit servers and 64-bit servers be mixed in the same Server cluster?

A. No, a single Server cluster must contain all 32-bit servers or all 64-bit servers.

Q. Can I use any of my servers to make a Server cluster?

A. All qualified solutions appear on the Microsoft Hardware Compatibility list (HCL)
(http://go.microsoft.com/fwlink/?linkid=67738). Only cluster solutions listed on the HCL
are supported by Microsoft. You can use any servers that are listed as part of a
configuration to build a complete solution.

Interconnects
Q. What types of networks are required for cluster interconnects?

A. The network interface controllers along with any other components used in certified
cluster configurations must have the Windows logo and appear on the Microsoft
Hardware Compatibility List. The interconnect itself must support TCP/IP and UDP traffic
and must appear as a single, non-routed LAN segment or subnet.

Q. Do Server clusters support high-bandwidth, low latency interconnects through
WinSock direct?

A. The Server cluster code itself does not use WinSock direct path for intra-cluster
communication. All communication between cluster nodes is over TCP/IP or UDP. Server
clusters can use any high-bandwidth, low latency connection between nodes that looks
like a standard NDIS network adapter, but may not take advantage of it.

Q. How many network adapters should a cluster server have?

A. The cluster nodes must be connected by two or more independent networks in order
to avoid a single point of failure. The use of two local area networks (LANs) is required.
Cluster configurations that have only one network are not supported. At least 2 networks
must be configured for cluster communications. Typically, one network is a private
network, configured for cluster communications only and the other is a public network,
configured for both client access and cluster communications.

Q. Can a cluster support multiple adapters configured as public?

A. Yes, however, there are a couple of caveats:

• Each network adapter must be on a different subnet.

• You can create different IP address resources that can be associated with
different adapters; however, there is no way to make a resource or application
dependent on the two networks so that if one fails, the application will continue to be
available. In Server clusters, the dependencies must all be online for the dependent
application to be online.

For future versions of the cluster technology, we are working on a feature to allow more
flexible dependencies such as: a resource can depend on A OR B for it to be online.

Q. Can a cluster support multiple private networks?

A. Yes.

Q. Why do I see heartbeat packets on the networks marked as client-only?


A. In Windows 2000 and beyond, the Server cluster software heartbeats through the
public network as well as the private network regardless of the configuration of the public
network. This is to ensure that the cluster service can detect failures of the public network
adapters and can failover an application if the node currently hosting it cannot
communicate with the outside world.

Q. How should the different networks be configured?

A. The cluster nodes must be connected by two or more independent networks in order
to avoid a single point of failure. The use of two local area networks (LANs) is typical. The
configuration of a cluster with nodes connected by only one network is not supported.

You should configure the private network as Internal Cluster Communication Only and the
public network as All Communications.
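
The network rules stated in this section can be expressed as a simple check. This is a sketch with hypothetical names: the role strings 'internal', 'all' and 'client' stand in for the Internal Cluster Communication Only, All Communications and client-only settings, and `validate_networks` is not a cluster tool.

```python
CLUSTER_ROLES = {"internal", "all"}  # roles that carry intra-cluster traffic


def validate_networks(networks):
    """Return a list of problems for a {network name: role} mapping."""
    problems = []
    if len(networks) < 2:
        problems.append("fewer than two networks connect the nodes")
    carrying = [name for name, role in networks.items() if role in CLUSTER_ROLES]
    if len(carrying) < 2:
        problems.append("fewer than two networks are enabled for cluster communications")
    return problems


# Recommended layout: a private network plus a public network for all traffic.
print(validate_networks({"Private": "internal", "Public": "all"}))
# A client-only public network leaves a single path for cluster traffic.
print(validate_networks({"Private": "internal", "Public": "client"}))
```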

Q. What type of information is carried over the cluster network?

A. The cluster service uses the cluster network for the following intra-cluster traffic:

• Server cluster query/management and control information

• Heartbeats for failure detection

• Intra-cluster communication to ensure tight consistency of cluster configuration

• Cluster join requests when a node reboots or recovers from a failure

Q. What types of protocol are supported for applications in a Server cluster?

A. Cluster-aware applications must use protocols based on top of TCP/IP. The cluster
software only supports the TCP/IP protocols for failover out of the box.

Q. Will failover occur if the public network on one server fails?

A. Yes, in Windows 2000 and above, additional heartbeats are sent across the public
networks to detect public network and/or network adapter failures and to cause
applications to failover if the node they are currently hosted on cannot communicate with
the other nodes.

Q. Is network adapter teaming supported in a Server cluster?

A. Yes, however there are caveats. The use of network adapter teaming on all cluster
networks concurrently is not supported. At least one of the cluster networks that are
enabled for internal communication between cluster nodes must not be teamed. Typically,
the un-teamed network is a private interconnect dedicated to this type of communication.
The use of network adapter teaming on other cluster networks is acceptable; however, if
communication problems occur on a teamed network, Microsoft Product Support
Services may require that teaming be disabled. If this action resolves the problem or
issue, then you must seek further assistance from the manufacturer of the teaming
solution.

Q. Is DHCP supported for Server cluster virtual servers?

A. No, virtual servers must have static IP addresses.

Q. Is DHCP supported for Server cluster nodes?

A. Yes, addresses may be assigned to the physical nodes dynamically by DHCP, but
manual configuration with static addresses is recommended.

Q. Is NetBIOS required to run Server clusters?

A. In Windows NT and Windows 2000, NetBIOS is required for Server clusters to function
correctly.

In Windows Server 2003, the cluster service does not require NetBIOS. A basic principle
of server security is to disable unneeded services. To determine whether to disable
NetBIOS, consider the following:

• Some services and applications other than the cluster service use NetBIOS.
Review all the ways a clustered server functions before disabling NetBIOS on it.

• With NetBIOS disabled, you will not be able to use the Browse function in Cluster
Administrator when opening a connection to a cluster. Cluster Administrator uses
NetBIOS to enumerate all clusters in a domain.

• With NetBIOS disabled, Cluster Administrator does not work if a cluster name is
specified.

You can control the NetBIOS setting for the cluster through the Cluster IP Address
resource property sheet (parameters).

Q. Is IPSec supported in a Server cluster?

A. Although it is possible to use Internet Protocol Security (IPSec) for applications that
can failover in a server cluster, IPSec was not designed for failover situations and we
recommend that you do NOT use IPSec for applications in a server cluster.

Q. How do Server clusters update router tables when doing IP failover?

A. As part of its automatic recovery procedures, the cluster service will issue IETF
standard ARP "flush" commands to routers to flush the machine addresses related to IP
addresses that are being moved to a different server.

Q. How does the Address Resolution Protocol (ARP) cause systems on a LAN to update
their tables that translate IP addresses to physical machine (MAC) addresses?
A. The ARP specification states that all systems receiving an ARP request must update
their physical address mapping for the source of the request. The source IP address and
physical network address are contained in the request. As part of the IP address
registration process, the Windows TCP/IP driver broadcasts an ARP request on the
appropriate LAN several times. This request asks the owner of the specified IP address
to respond with its physical network address. By issuing a request for the IP address
being registered, Windows can detect IP address conflicts; if a response is received, the
address cannot be safely used. When it issues this request, though, Windows specifies
the IP address being registered as the source of the request. Thus, all systems on the
network will update their ARP cache entries for the specified address, and the registering
system becomes the new owner of the address.

Note that if an address conflict does occur, the responding system can send out another
ARP request for the same address, forcing the other systems on the subnet to update
their caches again. Windows does this when it detects a conflict with an address that it
has successfully registered.
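
The cache update described in this answer can be modelled in a few lines. The sketch below is a toy simulation, not the Windows TCP/IP driver: the `Host` class, the `register_address` function, and the IP and MAC addresses are made up for illustration, and a single request stands in for the several broadcasts mentioned above.

```python
class Host:
    """A system on the LAN with an ARP cache (IP address -> MAC address)."""

    def __init__(self, name, mac, ips=()):
        self.name = name
        self.mac = mac
        self.ips = set(ips)
        self.arp_cache = {}

    def receive_request(self, source_ip, source_mac, target_ip):
        # Every receiver records the mapping for the source of the request.
        self.arp_cache[source_ip] = source_mac
        # Only the current owner of the target address answers.
        return self.mac if target_ip in self.ips else None


def register_address(node, ip, lan):
    """Broadcast an ARP request for `ip`, naming `ip` itself as the source.

    Returns True if nobody answered (no conflict) and the address was taken.
    """
    replies = [h.receive_request(ip, node.mac, ip) for h in lan if h is not node]
    if any(reply is not None for reply in replies):
        return False  # another system owns the address
    node.ips.add(ip)
    return True


# Hypothetical failover: Node A held 10.0.0.50, then failed and left the LAN.
node_b = Host("NodeB", "00:00:00:00:00:0b")
client = Host("Client", "00:00:00:00:00:0c")
client.arp_cache["10.0.0.50"] = "00:00:00:00:00:0a"  # stale entry for Node A

registered = register_address(node_b, "10.0.0.50", [node_b, client])
print(registered, client.arp_cache["10.0.0.50"])
```

In the model, the client's stale entry is overwritten as a side effect of the request itself, so no separate notification to clients is needed.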

Q. Server clusters use ARP broadcasts to re-set MAC addresses, but ARP broadcasts
don't pass routers. So what about clients behind the routers?

A. If the clients were behind routers, they would be using the router(s) to access the
subnet where the cluster servers were located. Accordingly, the clients would use their
router (gateway) to pass the packets to the routers through whatever route (OSPF, RIP,
etc) is designated. The end result is that their packet is forwarded to a router on the same
subnet as the cluster. This router's ARP cache is consistent with the MAC address(es)
that have been modified during a failover. Packets thereby get to the correct Virtual
server, without the remote clients ever having seen the original ARP broadcast.

Storage (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
Storage questions are grouped into three categories: general questions, questions about
deploying Server clusters on a storage area network (SAN), and questions about
network attached storage (NAS).

General Storage Questions


Q. What storage interconnects are supported by Server clusters?

A. Cluster Server does not limit the type of storage interconnects supported; however,
there are some requirements of the storage subsystem that limit the types from a
practical perspective. For example, all of the cluster nodes should be able to access the
storage device. This typically impacts the interconnect since only interconnects that allow
multiple initiators (i.e. nodes) can be used. The interconnects that are currently part
of qualified configurations on the HCL include SCSI (of various flavors), fibre channel
arbitrated loop, and fibre channel switched fabrics.

Remember that only clusters where the full configuration appears on the Cluster HCL are
supported by Microsoft.

Q. How do I configure the nodes and the storage on a SCSI cluster?

A. You must make sure that all of the devices on the SCSI bus have different SCSI IDs.
By default, SCSI adapters tend to have ID 7. You should make sure that the adapters in
each node have different IDs. Likewise, the disks should be given unique SCSI IDs on the
bus.

For a SCSI bus to work correctly it must be terminated. There are many ways to
terminate the bus, both internally (at the host adapter) and externally (using Y cables). To
ensure that the cluster can survive different types of failures (specifically being able to
power down one of the nodes), the SCSI bus should be terminated using passive
components such as a Y cable. Internal termination, which requires the adapter to be
powered up, is not recommended.

Note
Microsoft only allows 2-node clusters to be built using SCSI storage
interconnects.
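
The unique-ID requirement above can be checked mechanically before a shared bus is brought up. The following is a minimal sketch; the device names and ID assignments are hypothetical, and it only inspects a hand-written inventory rather than querying real hardware.

```python
# Check that every device on a shared SCSI bus has a unique SCSI ID.
# The device names and ID assignments below are hypothetical.
from collections import defaultdict


def find_id_conflicts(bus):
    """Return {scsi_id: [device names]} for IDs used by more than one device."""
    by_id = defaultdict(list)
    for device, scsi_id in bus.items():
        by_id[scsi_id].append(device)
    return {i: names for i, names in by_id.items() if len(names) > 1}


# Both adapters left at the common default of 7: a conflict.
bad_bus = {"node1-adapter": 7, "node2-adapter": 7, "disk0": 0, "disk1": 1}
# One adapter moved to ID 6: every device is unique.
good_bus = {"node1-adapter": 7, "node2-adapter": 6, "disk0": 0, "disk1": 1}

print(find_id_conflicts(bad_bus))
print(find_id_conflicts(good_bus))
```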

Q. Does Server clustering support fibre channel arbitrated loop (FC-AL)?

A. Yes; however, Microsoft only allows 2-node clusters to be built using FC-AL storage
interconnects. Multiple clusters on a single fibre channel loop are NOT supported.

Q. Can multiple clusters be connected to the same storage controller?

A. Yes, there is a special device qualification test for storage controllers that ensures they
respond correctly if multiple clusters are attached to the same controller. For multiple
clusters to be connected to the same controller the storage controller MUST appear on
the Multi-cluster device Hardware Compatibility List (HCL) AND each of the individual
end-to-end cluster solutions must appear on the Cluster Hardware Compatibility List. For
example: EMC Symmetrix 5.0 is on the multi-cluster device HCL list. Multiple clusters
(say a Dell PowerEdge cluster and a Compaq Proliant cluster) can be connected to the
EMC Symmetrix storage controller as long as Dell PowerEdge + EMC Symmetrix AND
Compaq Proliant + EMC Symmetrix are both on the cluster HCL.

Q. Will failover occur if the storage cable is pulled from the host bus adapter (HBA)?

A. If the storage cable is pulled from the host bus adapter (HBA), there may be a pause
before the adapter reacts to losing the connection. Once the HBA has detected
the communication failure, however, the disk resources within the cluster that use that HBA
will fail. This triggers a failover, and the resources are brought back online
on another node in the cluster.

If the storage cable is reconnected, the Windows operating system may not rescan for
new hardware automatically (this depends on the driver used for the adapter). You may
need to manually initiate a rescan for new devices. Once the rescan is done, the node
can host any of the physical disk resources. If failback policies are set, any resources that
failed over when the cable was removed will fail back to the node when the cable is
replaced.

Note
An HBA is the storage interface that is deployed in the server. Typically this is a
PCI card that connects the server to the storage fabric.

Q. Will Server clusters protect my disks from hardware failures?

A. No, Cluster server protects against server failure, operating system or application
failure and planned downtime due to maintenance. Microsoft strongly suggests that
application and user data is protected against disk failure using redundancy techniques
such as mirroring, RAID or replication either in hardware or in software.

Q. Do Server clusters support RAID or mirrored disks?

A. Yes, Microsoft strongly suggests that application and user data is protected against
disk failure using redundancy techniques such as mirroring, RAID or replication either in
hardware or in software.

Q. Are dynamic disks supported in a cluster?

A. Windows server products shipped from Microsoft do not provide support for dynamic
disks in a server cluster environment. The Volume Manager for Windows 2000 add-on
product from Veritas can be used to add the dynamic disk features to a server cluster.
When the Veritas Volume Manager product is installed on a cluster, Veritas should be the
first point of support for cluster issues.

Q. Can a cluster disk be extended without rebooting?

A. Yes, cluster disks can be extended without rebooting if the storage controller supports
dynamic expansion of the underlying physical disk. Many new storage controllers
virtualize the Logical Units (LUNs) presented to the operating system and these
controllers allow LUNs to be grown online from the storage controller management
console. Microsoft provides a tool called DiskPart that allows volumes or partitions to be
grown online to take advantage of the newly created space on the disk without disruption
to applications or users using the disk. There are separate versions of DiskPart for
Windows 2000 and Windows Server 2003. The Windows 2000 version is available as a
free download on the web and the Windows Server 2003 version is shipped on the
distribution media.

Note
A LUN equates to a disk device visible in Disk Administrator.

Q. Can additional disks be added to a cluster without rebooting?

A. Yes, you can insert a new disk or create a new LUN and make that visible to the
cluster nodes. You should only make the disk visible to one node in the cluster and then
create a cluster resource to protect that disk. Once the disk is protected, you can make
the LUN visible to other nodes in the cluster. In some cases you may need to do a rescan
in device manager to find the new disk. In other cases (especially with fibre channel) the
disk may be automatically detected.

Q. Can disks be removed from the cluster without rebooting?

A. Yes.

Q. What types of disks can be used for cluster disks?

A. Microsoft recommends that all partitions on the cluster disks be formatted with the
NTFS file system. This is for two reasons. Firstly, NTFS provides access controls that can
be used to secure the data on the disk. Secondly, NTFS can recover from volumes that
have been forcibly dismounted; other file systems may become corrupt if they are forcibly
dismounted.

Server clusters only support Master Boot Record (MBR) format disks. Cluster disks
cannot be GPT format.

Q. Can tape devices or other non-disk devices be attached to the same storage bus as
the Cluster Disks?

A. It depends on the storage interconnect. Server clusters use SCSI reserve and reset to
arbitrate for cluster disks. In Windows NT and Windows 2000, cluster server performs an
untargeted bus reset. In Windows Server 2003, it is possible to target the reset; however,
it may fall back to an untargeted reset. If a tape device receives a reset, this will typically
trigger the tape to rewind.

Server cluster does not provide any arbitration mechanism for tape devices that are
visible from multiple servers, therefore tape devices are not protected against concurrent
access from multiple servers.

Microsoft does NOT support attaching a tape device to the SCSI bus containing the
cluster disks in the case of a SCSI cluster, or to the loop in a fibre channel arbitrated loop.

Tape devices can be attached to a switched fabric as long as they are not visible through
the same adapters as the cluster disks. This can be achieved by either putting the tape
drive in a different zone to the cluster disks or by LUN masking techniques.

Q. Are software fault tolerant disks (software RAID or mirroring) supported in a Server
cluster?

A. Windows server products shipped from Microsoft do not provide support for software
RAID or mirroring; however, there are third-party products that provide this functionality in
a clustered environment.

Q. Is the Volume Shadow Copy Service (VSS) supported in a Server cluster?

A. Yes, the Volume Shadow Copy Service is new with Windows Server 2003 and provides
basic snapshot capabilities that are used by backup applications to create consistent,
single point in time backups. The cluster service has a VSS provider that allows the
cluster service configuration to be snapshotted and stored as part of the system state by
these backup applications.

Q. Do Timewarp snapshots work in a Server cluster?

A. No, Timewarp is a new feature in Windows Server 2003 that allows persistent
snapshots to be created and exposed to clients. Timewarp makes use of features that are
not cluster-aware and it is not supported in a cluster at this time.

Q. Are hardware snapshots or business recovery volumes supported in a Server cluster?

A. Yes, you can use facilities in the latest storage controllers to create snapshots of
existing volumes. Note, however, when you create a snapshot of a disk you should NOT
expose the snapshot back to the same cluster as the original disk. The cluster service
uses the disk signature to uniquely identify a disk. With a snapshot, the disk and the
snapshot have the same disk signature.

If you create a hardware snapshot or a business recovery volume of a cluster disk you
should expose the snapshot to another server or cluster (typically a dedicated backup
server).
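
The identification problem described above can be shown with a toy lookup table. This is a minimal sketch under stated assumptions: the table structure, disk names, and signature value are hypothetical, and the only fact modeled is that a disk and its snapshot carry the same disk signature.

```python
# Toy illustration of why a snapshot must not be exposed to the same cluster.
# The table structure and names are hypothetical; the only fact modeled is
# that a disk and its snapshot carry the same disk signature.

def register_disks(disks):
    """Index disks by signature and collect any signature seen more than once."""
    by_signature = {}
    collisions = []
    for name, signature in disks:
        if signature in by_signature:
            collisions.append((by_signature[signature], name, signature))
        else:
            by_signature[signature] = name
    return by_signature, collisions


original = ("data-disk", 0x1A2B3C4D)
snapshot = ("data-disk-snapshot", 0x1A2B3C4D)  # hardware snapshot: same signature

_, collisions = register_disks([original, snapshot])
print(collisions)
```

Because anything keyed on the signature cannot tell the two devices apart, presenting the snapshot to a separate server or cluster avoids the ambiguity.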

Q. What other considerations are there when creating clustered disks?

A. Modern storage controllers provide a virtual view of the storage itself. A physical RAID
set can be carved into multiple logical units that are exposed to the operating system as
individual disks or LUNs. If you intend to carve up physical disks in this way and expose
them as independent LUNs to the hosts, you should think carefully about the IO
characteristics and the failure characteristics. Remember that underneath, there is only a finite
bandwidth to each spindle.

Microsoft recommends that you do not create a LUN for use as the quorum disk from the
same underlying physical disks that you will be using for applications. The availability of
the cluster is directly related to the availability of the quorum disk. If I/Os to the quorum
disk take too long, the cluster server will assume that the quorum disk has failed and
initiate a failover of the quorum device. At that point, all other cluster related activity is
suspended until the quorum device is brought back online.

Q. How can I replace a disk that has gone bad in a cluster?

A. The answer depends on the Windows release:

• Windows NT Enterprise Edition

• FTEdit tool along with some manipulation of the registry. This is covered in
article 243195 in the Microsoft Knowledge Base (http://go.microsoft.com/fwlink/?linkid=67761).

• Windows 2000

• DumpCfg.

• ClusterRecovery Reskit tool provided on Windows Server 2003 Reskit

• Windows Server 2003

• Automated system recovery.

• ConfDisk. This is covered in article 280425 in the Microsoft Knowledge Base
(http://go.microsoft.com/fwlink/?linkid=67762).

• ClusterRecovery Reskit tool provided on Windows Server 2003 Reskit

Storage Area Networks (SAN) questions


Q. What is a Storage Area Network (SAN)?

A. A storage area network (SAN) is defined as a set of interconnected devices (e.g. disks
and tapes) and servers that are connected to a common communication and data
transfer infrastructure such as fibre channel. The common communication and data
transfer mechanism for a given deployment is commonly known as the storage fabric.
The purpose of the SAN is to allow multiple servers access to a pool of storage in which
any server can potentially access any storage unit. Clearly in this environment,
management plays a large role providing security guarantees (who is authorized to
access which devices) and sequencing or serialization guarantees (who can access
which devices at what point in time).

Q. Why use a SAN?

A. Storage area networks provide a broad range of advantages over locally connected
devices. They allow compute units to be detached from storage units, thereby allowing
flexible deployment and re-purposing of both servers and storage dynamically to suit the
current business needs without having to be concerned about buying the right devices for
a given server or without re-cabling a datacenter to attach storage to a given server.

Q. Can a cluster be connected to a SAN?

A. Yes, Microsoft fully supports storage area networks both as part of the base Windows
platform and as part of a complete Windows Clustering high availability solution.

Q. Can a cluster co-exist on a SAN with other servers?


A. One or more Server clusters can be deployed in a single SAN environment along with
standalone Windows servers and/or with other non-Windows platforms.

Q. What additional SAN configuration is required to put a cluster on a shared SAN?

A.

• The cluster disks for each cluster on a SAN MUST be deployed in their own
zone. All host bus adapters (HBAs) in a single cluster must be the same type and at
the same firmware revision level. Many storage and switch vendors require that ALL
HBAs on the same zone, and in some cases the same fabric, are the same type and
have the same firmware revision number.

• All storage device drivers and HBA device drivers in a cluster must be at the
same software version.

• When adding a new server to a SAN, ensure that the HBA is appropriate for the
topology. In some configurations, adding an arbitrated loop HBA to a switched fibre
fabric can result in widespread failures of the storage fabric. There have been real-
world examples of this causing serious downtime.

Note
An HBA is the storage interface that is deployed in the server. Typically this is a
PCI card that connects the server to the storage fabric.
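
The consistency requirements listed above lend themselves to a simple inventory check. The following is a minimal sketch; the field names and sample values are hypothetical, and it compares a hand-written inventory rather than querying real adapters.

```python
# Check that all HBAs in one cluster share a model, firmware revision, and
# driver version. Field names and sample values are hypothetical.

def inconsistent_fields(hbas):
    """Return the field names whose values differ across the given HBAs."""
    fields = ("model", "firmware", "driver")
    return [f for f in fields if len({hba[f] for hba in hbas}) > 1]


cluster_hbas = [
    {"node": "node1", "model": "X100", "firmware": "3.81", "driver": "5.2.1"},
    {"node": "node2", "model": "X100", "firmware": "3.90", "driver": "5.2.1"},
]

print(inconsistent_fields(cluster_hbas))
```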

Q. What is LUN masking or selective presentation?

A. LUN masking (also implemented as selective presentation) allows users to express
at the controller level a specific relationship between a LUN and a host. Only the hosts
that are configured to access the LUN should be able to see it.

Q. What is hardware zoning versus software zoning?

A. Zoning can be implemented in hardware/firmware on controllers or in software on the
hosts. Microsoft recommends that controller based (or hardware) zoning be used since
this allows for uniform implementation of access policy that cannot be interrupted or
compromised by node disruption or failure of the software component.

Q. Why is zoning needed to isolate a Server cluster in a SAN?

A. The cluster uses mechanisms to protect access to the disks that can have an adverse
effect on other clusters that are in the same zone. By using zoning to separate the cluster
traffic from other cluster or non-cluster traffic, there is no chance of interference.

The diagram shows two clusters sharing a single storage controller. Each cluster is in its
own zone. The LUNs presented by the storage controller must be allocated to individual
clusters using fine-grained security provided by the storage controller itself. LUNs must
be set up so that every LUN for a specific cluster is visible and accessible from all nodes
of the cluster. A LUN should only be visible to one cluster at a time. The cluster software
itself takes care of ensuring that although LUNs are visible to all cluster nodes, only one
node in the cluster accesses and mounts the disk at any point in time.
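
The two visibility rules in the paragraph above can be expressed as a small validation routine. This is a minimal sketch: the cluster, node, and LUN names are hypothetical, and the plan is a hand-written mapping rather than data read from a storage controller.

```python
# Validate a LUN presentation plan against the two rules described above:
#   1. every LUN for a cluster is visible to all nodes of that cluster;
#   2. a LUN is visible to only one cluster.
# Cluster, node, and LUN names are hypothetical.

def check_lun_plan(clusters, lun_visibility):
    """clusters: {cluster: set(nodes)}; lun_visibility: {lun: set(nodes)}."""
    node_to_cluster = {n: c for c, nodes in clusters.items() for n in nodes}
    problems = []
    for lun, nodes in sorted(lun_visibility.items()):
        owners = {node_to_cluster.get(n) for n in nodes}
        if len(owners) != 1 or None in owners:
            problems.append(f"{lun}: visible outside a single cluster")
            continue
        cluster = owners.pop()
        missing = clusters[cluster] - nodes
        if missing:
            problems.append(f"{lun}: not visible to {sorted(missing)}")
    return problems


clusters = {"cluster1": {"c1n1", "c1n2"}, "cluster2": {"c2n1", "c2n2"}}
plan = {
    "lun0": {"c1n1", "c1n2"},          # correct
    "lun1": {"c2n1"},                  # missing c2n2
    "lun2": {"c1n1", "c1n2", "c2n1"},  # leaks into cluster2
}

for problem in check_lun_plan(clusters, plan):
    print(problem)
```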

The multi-cluster device test used to qualify storage configurations for the multi-cluster
HCL list tests the isolation guarantees when multiple clusters are connected to a single
storage controller in this way.

Q. Can a cluster server boot from a SAN?

A. Yes, however, there is a set of configuration restrictions around how Windows boots
from a storage area network. For more information, see article 305547 in the Microsoft
Knowledge Base (http://go.microsoft.com/fwlink/?linkid=67837).

Server clusters require that the startup disk, page file disk and system disk be on a
different storage bus from the cluster disks. To boot from a SAN, you must use a
separate HBA for the boot, system and pagefile disks from the one used for the cluster disks.
You MUST ensure that the cluster disks are isolated from the boot, system and pagefile disks by
zoning the cluster disks into their own zone.

Q. Can I use multiple paths to SAN storage for high availability?


A. Microsoft does not provide a generic driver that allows multiple paths to the storage
infrastructure for high availability; however, several vendors have built their own
proprietary drivers that allow multiple HBAs and SAN fabrics to be used as a highly
available storage infrastructure. For a Server cluster that has multi-path drivers to be
considered supported, the multipath driver MUST appear as part of the complete cluster
solution on the Cluster Hardware Compatibility List (HCL). NOTE: The driver version is
VERY important and it MUST match the qualified version on the HCL.

Q. Can the startup disk, pagefile disks and the cluster disks be on the same SAN fabric?

A. Not by default. In Windows Server 2003, there is a registry key that allows the startup disk,
pagefile disks, and cluster disks to be on the same bus. Requiring a registry change
helps ensure that the feature is not accidentally enabled by customers who do
not understand the implications of this configuration. It is intended for OEMs to ship
qualified and tested configurations and not for a typical end-user or administrator to set
up in an ad hoc manner.

In the original release of Windows Server 2003, the registry key is:

HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters\ManageDisksOnSystemBuses 0x01

In Windows Server 2003 with Service Pack 1 (SP1), the registry key is:

HKLM\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\ManageDisksOnSystemBuses 0x01

In Windows Server 2003 SP1, the key path was changed to use “Clusdisk” as a subkey
instead of “ClusSvc.” This change was made to avoid issues during setup. However, the
change is backward compatible, and systems that use the old key locations do not need
to be modified.
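
The difference between the two key locations can be captured in a small helper. This is a minimal sketch that only assembles the path strings quoted above; it does not read or write the registry. The function name and the integer service-pack argument are hypothetical, and treating service packs later than SP1 the same as SP1 is an assumption.

```python
# Build the registry key path quoted above for a given service pack level.
# This only assembles strings; it does not read or write the registry.
# The function name and the integer service-pack argument are hypothetical.

BASE = r"HKLM\SYSTEM\CurrentControlSet\Services"
VALUE_NAME = "ManageDisksOnSystemBuses"


def manage_disks_key(service_pack):
    """Return the key path for Windows Server 2003 at the given SP level."""
    # Original release uses ClusSvc; SP1 uses ClusDisk.
    # Assumption: service packs after SP1 keep the SP1 location.
    service = "ClusSvc" if service_pack == 0 else "ClusDisk"
    return "\\".join([BASE, service, "Parameters", VALUE_NAME])


print(manage_disks_key(0))
print(manage_disks_key(1))
```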

Q. Can serverless backup be performed against cluster disks?

A. No, the cluster disk arbitration mechanism uses SCSI reserve and release operations.
Once a server arbitrates for a cluster disk, that disk cannot be accessed by any other
server on the storage network.

Network Attached Storage (NAS) questions

Q. What is Network Attached Storage (NAS)?

A. Network attached storage (NAS) is an alternative way to connect storage to servers
that is built using standard network components such as Ethernet or other LAN
technologies. The application servers access storage using file system functions such as
open file, read file, write file, close file, etc. These higher-level functions are
encapsulated in protocols such as CIFS, NFS or AppleShare and run across standard
IP-based connections.

Q. Can Server clusters use NAS for the shared storage?

A. Yes, provided that the applications can store data on file shares and the file shares are
accessible to the applications as they fail over across the cluster, there is no reason why
NAS cannot be used as the storage solution in a cluster.

There is currently no support in Windows to use NAS as the quorum resource.

In Windows Server 2003, we are providing a new quorum resource, Majority Node Set,
that can be used to remove the need for a shared disk for the quorum resource. If you
combine NAS storage with Majority Node Set quorum, you can build a failover cluster
that does not require shared disks in the traditional sense of SCSI or SAN.

Highly Available File Servers


Q. Can I have an active/active file server?

A. The Windows 2000 resource kit contains a tool, ClusTool, that can be used to migrate
file share settings from a single node to a cluster environment.

Q. Can FRS be used to replicate clustered file shares?

A. No, the file replication service (FRS) provided by Windows cannot be used to replicate
a clustered file share. This means that clustered file shares cannot be the source or
target for redundant links in a DFS tree. See the online documentation for DFS for more
details.

Q. What file system types are supported in a cluster?

A. All partitions on clustered disks should be formatted with NTFS.

Q. Does client-side caching (offline folders) work with Server clusters?

A. Yes, in Windows Server 2003, you can select client-side caching (also known as offline
folders) for clustered file shares.

Q. Is the Encrypting File System (EFS) supported on cluster disks?

A. With Windows Server 2003, the encrypting file system (EFS) is supported on clustered
file shares. To enable EFS on a clustered file share, you must perform a number of tasks
to configure the environment correctly:

• EFS can only be enabled on file shares when the virtual server has Kerberos
enabled. By default, Kerberos is not enabled on a virtual server. To enable Kerberos
you must check the Enable Kerberos Authentication check box on the network name
resource that will be used to connect to the clustered file share. NOTE: Enabling
Kerberos on a network name has a number of implications that you should ensure
you fully understand before checking the box.

• All cluster node computer accounts, as well as the virtual server computer
account, must be trusted for delegation. See online help for how to do this.

• To ensure that the users' private keys are available to all nodes in the cluster, you
must enable roaming profiles for users who want to store data using EFS. See online
help for how to enable roaming profiles.

Once the cluster file shares have been created and the configuration steps above carried
out, users' data can be stored in encrypted files for added security.

Q. How many file shares can be hosted on a cluster?

A. The number of file shares in a cluster depends on the number of nodes in the cluster
and the failure scenarios that you are trying to protect against. A single server has a limit
for the number of file shares it can support so you need to take that into account when
planning your cluster.

In a 2-node cluster, if one node fails, the remaining node must pick up all of the file
shares. Therefore, to ensure the highest availability, the cluster should host the maximum
number of shares that can be hosted by a single node.

Note
2-node Server clusters are focused on high availability, not scale-out; therefore,
you should not expect to hold more shares on a 2-node cluster than on a single
node.

In a 4-node cluster, you have other options that may be more appropriate, depending on
the failure scenarios that you wish to protect against. For example, if you wish to survive
one node failing at any point in time, you can configure the shares so that if one node
fails, its work is spread across the remaining three nodes. This means that each node
could be loaded to 75% of the maximum number of shares and still be within the
maximum limit of a single node in the event of a single failure. In this case, the cluster
can host three times the number of shares that a single server can host. If you wish to
survive two nodes failing, then a 4-node cluster can hold twice as many shares (since if
two nodes fail, the remaining two nodes need to pick up the load from the two failed
servers) and so on.
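
The capacity reasoning above can be worked through numerically. This is a minimal sketch that assumes shares are spread evenly across nodes and redistributed evenly after a failure; the function name and the per-server limit of 1000 shares are hypothetical.

```python
# Worked arithmetic for share capacity under node failures.
# Assumes shares are spread evenly and redistributed evenly after a failure;
# `single_node_limit` is a hypothetical per-server maximum.
from fractions import Fraction


def cluster_share_capacity(nodes, failures_to_survive, single_node_limit):
    """Return (total shares, per-node load fraction) that still fits after failures."""
    survivors = nodes - failures_to_survive
    if survivors < 1:
        raise ValueError("at least one node must survive")
    total = survivors * single_node_limit
    load_fraction = Fraction(survivors, nodes)
    return total, load_fraction


for nodes, failures in [(2, 1), (4, 1), (4, 2)]:
    total, load = cluster_share_capacity(nodes, failures, single_node_limit=1000)
    print(nodes, failures, total, f"{float(load):.0%}")
```

Under these assumptions, a 4-node cluster that must survive one failure can carry three times the single-node limit, with each node loaded to three quarters of that limit before the failure.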

In general, as the number of nodes in a cluster increases, the more options you have and
the more you can use server clusters to scale-out a highly available infrastructure.

Q. What is the maximum capacity of a cluster disk?

A. Server cluster does not impose any restrictions on the size of a volume supported.

Q. How many disks can Server cluster support?

A. In Windows 2000, each clustered disk had to have a drive letter assigned to it,
therefore the maximum number of clustered disks in a single cluster was limited to 23
volumes (26 letters of the alphabet minus A and B [Floppy drives] and C [system/boot
drive]).

In Windows Server 2003, there is no longer a requirement for a clustered disk to have a
drive letter assigned; therefore, the number of disks is limited by the number that can be
physically attached and the number supported by the underlying operating system.

Note
Applications can access disks with no drive letters in one of two ways: (a) directly,
using the object name associated with the disk, or, more likely, (b) by using mount
points to link multiple disks together that can be accessed using a single drive
letter.
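
The Windows 2000 limit of 23 drive letters follows directly from the reserved letters listed above. A minimal sketch of the count:

```python
# Count the drive letters left for clustered disks in Windows 2000,
# following the reasoning above: A and B (floppy drives) and C (system/boot).
import string

reserved = {"A", "B", "C"}
available = [letter for letter in string.ascii_uppercase if letter not in reserved]

print(len(available))
```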

Q. How can a cluster support more disks than there are drive letters?

A. Using file system mount points. For more information about using mount points with
clustered disks see the online help for Windows Server 2003.

Q. Why can I browse shares owned by different virtual servers?

A. File shares are not scoped by the virtual server name that is hosting them. If you use a
browsing tool (e.g. the NET VIEW command) you will see all the shares that are currently
hosted on the physical node.

Highly Available Print Servers


Q. How do I cluster a printer?

A. Printers can be clustered using the Print Spooler cluster resource. The Windows 2000
and the Windows Server 2003 online help both give specific examples and the steps
necessary to create a highly available print server.

Q. Can I have active/active print servers?

A. Yes, it is possible to host multiple print spoolers on a single Server cluster. The
spoolers can be failed over independently and can run concurrently on multiple nodes in
the cluster.

Q. How do I migrate printer settings from a single server to a cluster?

A. Microsoft provides a tool (Print Migrator) as part of the ResKit that can be used to
migrate printer settings from one node to another or from one node to a Server cluster.

Q. How many printer resources can be hosted on a cluster?

A. The number of printers is limited by the number of resources the cluster can support;
however, as the number of printers increases, so will the time to fail over.

Removing all Single Points of Failure


Q. What other services does the server cluster rely on?

A. The cluster service itself relies on being able to authenticate and sign communications
traffic between the cluster nodes. It uses the domain infrastructure to authenticate using
the cluster service account. In an environment with Server clusters installed, you must
ensure that the domain infrastructure is highly available; any disruption to the
infrastructure can result in the clusters becoming unavailable.

Q. What other services do I need to think about?

A. In order for applications to remain highly available in a clustered environment, any
services that the application requires external to the cluster must also be highly available.
Many of these services have mechanisms such as replication or being made cluster-
aware themselves to protect against failures. Examples of services that you should think
about include WINS, DNS, DHCP, the domain infrastructure, firewalls, etc.

Q. What other single points of failure should I protect against?

A. Server clusters are a mechanism that protects applications against hardware,
operating system and application failures. There are some types of hardware failure that
you should think about:

• Disk failures: you should use RAID or mirroring to protect against disk failures

• Hardware failures: multiple hot swap fans in the server, redundant power supplies
etc.

• Network failures: redundant networks that do not have any shared components

• Site failures: disaster recovery plans

Active Directory, DNS and Domain Controllers

Q. Is Kerberos authentication possible for services hosted on a cluster?

A. Yes, in Windows 2000 SP3 and above and Windows Server 2003, the cluster service
publishes a computer object in Active Directory. This provides the infrastructure with
sufficient state to allow Kerberos authentication against applications and services hosted
in a virtual server.

For more information about Kerberos and how it works, see the TechNet web site
(http://go.microsoft.com/fwlink/?linkid=67842).

Q. Can cluster servers also be domain controllers?

A. Yes, however, there are several caveats that you should fully understand before taking
this approach. We recommend that Server cluster nodes are not domain controllers and
that you co-locate a domain controller on the same subnet as the Server cluster public
network.

If you must make the cluster nodes into domain controllers, consider the following
important points:

• If one cluster node in a 2-node cluster is a domain controller, all nodes must be
domain controllers. It is recommended that you configure at least two of the nodes in
a 4-node Datacenter cluster as domain controllers.

• There is overhead that is associated with the running of a domain controller. A
domain controller that is idle can use between 130 and 140 megabytes (MB)
of RAM, which includes the running of Windows Clustering. There is also replication
traffic if these domain controllers have to replicate with other domain controllers
within the domain and across domains. Most corporate deployments of clusters
include nodes with gigabytes (GB) of memory so this is not generally an issue.

• If the cluster nodes are the only domain controllers, they each have to be DNS
servers as well, and they should point to each other for primary DNS resolution, and
to themselves for secondary DNS resolution. You must also ensure that the private
interface is not registered in DNS, especially if it is connected by way
of a crossover cable (2-node only). For information about how to configure the
heartbeat interface refer to article 258750 in the Microsoft Knowledge Base
(http://go.microsoft.com/fwlink/?LinkID=46549). However, before you can accomplish
step 12 in KB article 258750, you must first modify other configuration settings, which
are outlined in article 275554 (http://go.microsoft.com/fwlink/?LinkID=67844).

• If the cluster nodes are the only domain controllers, they must each be Global
Catalog servers, or you must implement domainlets.

• The first domain controller in the forest takes on all flexible single master
operation roles (refer to article 197132 at http://go.microsoft.com/fwlink/?LinkID=67847).
You can redistribute these roles to each node. However, if a node
fails over, the flexible single master operation roles that the node has taken on are no
longer available. You can use Ntdsutil to forcibly take away the roles and assign them
to the node that is still running (refer to article 223787 at
http://go.microsoft.com/fwlink/?LinkID=67851). Review article 223346
(http://go.microsoft.com/fwlink/?LinkID=19807) for information about placement of
flexible single master operation roles throughout the domain.

• If a domain controller is so busy that the Cluster service is unable to gain access
to the Quorum drive as needed, the Cluster service may interpret this as a resource
failure and cause the cluster group to fail over to the other node. If the Quorum drive
is in another group (although it should not be), and it is configured to affect the group,
a failure may move all group resources to the other node, which may not be
desirable. For more information regarding Quorum configuration, please refer to
article 280345 listed in the "Reference" section
(http://go.microsoft.com/fwlink/?LinkID=67855).

• Clustering other programs such as SQL Server or Exchange Server in a scenario
where the nodes are also domain controllers may not result in optimal performance
due to resource constraints. You should thoroughly test this configuration in a lab
environment prior to deployment.

• You may want to consider making cluster nodes domain controllers (refer to KB
article 171390 at http://go.microsoft.com/fwlink/?LinkID=67857 for more information),
but if a domain controller is already local, or there is reliable high-speed
connectivity to a domain controller available, Microsoft does not recommend
implementing them on cluster nodes.

Note
You must promote a cluster node to a domain controller by using the
Dcpromo tool prior to installing Windows Clustering.

• You must be extremely careful when demoting a domain controller that is also a
cluster node. When a node is demoted from a domain controller, the security settings
and the user accounts are radically changed (for example, user accounts are demoted
to local accounts).

Q. Are virtual servers published in Active Directory?

A. Yes, in Windows 2000 SP3 and above and in Windows Server 2003, each virtual
server has the option of being published in Active Directory.

Although the network name server cluster resource publishes a computer object in Active
Directory, that computer object should NOT be used for administration tasks such as
applying Group Policy. The ONLY role for the virtual server computer object in
Windows 2000 and Windows Server 2003 is to allow Kerberos authentication and
delegation, and to allow cluster-aware, Active Directory-aware services (such as MSMQ) to
publish service provider information.

Q. Is the cluster configuration stored in Active Directory?

A. No, at this time there is no cluster information other than the computer objects for
virtual servers published in Active Directory.

Q. Do Server clusters make domain controllers highly available?

A. No, domain controllers use replication across a set of servers to achieve high
availability.

Q. How should my DNS server be configured to work with Server clusters?

A. The cluster service account needs to be able to publish records. In a secure, DNS
backed zone, the DNS administrator can choose to restrict the access rights for users. The
cluster service account must be granted permission to create records or, alternatively, the
records can be pre-created. If the records are pre-created, you should not set the zone to
dynamic update.

Security Considerations (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)
Q. How do I update the cluster service account password?

A. The cluster service account on ALL nodes in the cluster must match to ensure that the
intra-cluster communication can be successfully authenticated. The cluster service itself
sends messages between cluster nodes under a variety of conditions and if any of those
communications fail, the cluster node will be removed from the cluster (i.e. the cluster
service will be stopped). It is not possible to determine when the cluster service will
establish communication and therefore there is no clear window that allows the cluster
service account to be changed in a reliable way while ensuring that the cluster remains
running.

Windows 2000

On Windows 2000, the cluster account password can only be reliably changed using the
following steps:

1. Stop the cluster service on ALL nodes in the cluster

2. Change the password of the cluster service account at the domain controller

3. Update the service control manager password on ALL cluster nodes

4. Re-start the cluster service on all the cluster nodes

Windows Server 2003

The cluster.exe command on Windows Server 2003 has the ability to change the cluster
account password dynamically without shutting down the cluster service on any of the
nodes. The cluster.exe command changes the domain account password and updates
the service control manager account information on all nodes in the cluster.

Cluster /cluster:cluster_name1[,cluster_name2,] /changepassword[:new_password[,old_password]] [/skipdc] [/force] [/options]

For more information refer to the online help for Windows Server 2003.
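
As a rough illustration of how the pieces of the syntax above fit together, the following Python sketch assembles the argument list for a scripted password change. The function name and its parameters are hypothetical and are not part of cluster.exe; only the switches shown in the syntax line are used, and the sketch builds the command without running it.

```python
def build_changepassword_command(cluster_names, new_password=None,
                                 old_password=None, skip_dc=False, force=False):
    """Assemble a cluster.exe /changepassword argument list (illustrative helper)."""
    if not cluster_names:
        raise ValueError("at least one cluster name is required")
    if old_password is not None and new_password is None:
        raise ValueError("old_password can only follow new_password")

    # /cluster: takes one or more comma-separated cluster names.
    args = ["cluster", "/cluster:" + ",".join(cluster_names)]

    # /changepassword[:new_password[,old_password]]
    change = "/changepassword"
    if new_password is not None:
        change += ":" + new_password
        if old_password is not None:
            change += "," + old_password
    args.append(change)

    # Optional switches from the syntax line.
    if skip_dc:
        args.append("/skipdc")
    if force:
        args.append("/force")
    return args


if __name__ == "__main__":
    # "CLUSTER1" and the passwords are placeholder values.
    print(" ".join(build_changepassword_command(["CLUSTER1"], "newpw", "oldpw")))
```

Building the list separately from running it makes it easy to review the exact command before it is executed on a production cluster.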

Q. What other security considerations and best practices do I need to worry about for
Server clusters?

A. For security best practices, see the online help for Windows Server 2003.

In-the-box HA Services
Q. What highly available operating system services are provided on the Windows
release?

A. Server clusters provide the following highly available services by default in the
Windows operating system:

• IP Address and Network Name: Highly available network configuration that allows
clients to be location independent and failover unaware

• DHCP: Highly available DHCP server

• MSDTC: Highly available distributed transaction coordinator

• IIS: Highly available web server and FTP server*

• File Share: Highly available file share service and DFS

• Message Queue: Highly available MSMQ service

• MSMQ triggers: Highly available MSMQ trigger service (new for Windows Server 2003)

• Print Spooler: Highly available printer service

• WINS: Highly available WINS service

*In Windows Server 2003, IIS is made cluster-aware using the generic script resource
and the scripts provided. There is no specific IIS resource type.

Q. Are MSMQ triggers supported in a Server cluster?

A. Yes, in Windows Server 2003, the MSMQ trigger service can be made highly available
using Server clusters.

Q. Is IIS cluster-aware?

A. Yes, in Windows 2000, IIS web sites and FTP services can be made highly available
using the IIS Server Instance resource type. In Windows Server 2003, the IIS Server
Instance resource type was replaced with a set of generic scripts provided in the
Windows Server 2003 release (see the online help for more information about converting
IIS web servers and FTP servers from Windows 2000 to Windows Server 2003).

Although IIS web servers can be made highly available by failover using Server
clustering, Microsoft recommends that you use a load balancing cluster such as that provided
by Network Load Balancing (NLB), another cluster mechanism provided by the Windows
operating system, to make IIS highly available and to scale out a web service or web
farm.

Depending on the access characteristics, you may choose either Server clusters or
Network Load Balancing clusters to provide highly available FTP servers. Server
clustering is good for FTP sites with high update rates or where you want to have a single
copy of the FTP content. Network Load Balancing is good for mainly read-only FTP
servers.

Q. How is the IIS metabase kept consistent across the cluster?

A. The Windows operating system comes with a tool (IISSync) that allows the IIS
metabase to be kept in sync across the nodes in the cluster. For more details see the
online help.

Geographically Dispersed Clusters


Q. Can Server clusters span multiple sites?

A. Yes, Server clusters support a single cluster spanning multiple sites. This is known as
a geographically dispersed cluster. All qualified geographically dispersed cluster solutions
appear on the Microsoft Hardware Compatibility list (HCL)
(http://go.microsoft.com/fwlink/?LinkID=67738). Only cluster solutions listed on the HCL
are supported by Microsoft.

Q. How is a geographically dispersed cluster defined?

A. A geographically dispersed cluster is a Server cluster that has the following attributes:

1. Has multiple storage arrays, at least one deployed at each site. This ensures that
in the event of failure of any one site, the other site(s) will have local copies of the
data that they can use to continue to provide the services and applications.

2. Nodes are connected to storage in such a way that in the event of a failure of
either a site or the communication links between sites, the nodes on a given site can
access the storage on that site. In other words, in a two-site configuration, the nodes
in site A are connected to the storage in site A directly and the nodes in site B are
connected to the storage in site B directly. The nodes in site A can continue without
accessing the storage on site B and vice-versa.

3. The storage fabric or host-based software provides a way to mirror or replicate
data between the sites so that each site has a copy of the data. Different levels of
consistency are available.

The following diagram shows a simple two-site cluster configuration.

Q. Will geographically dispersed clusters give me disaster tolerance, or disaster
recovery?

A. The goal of multi-site Server cluster configurations is to ensure that loss of one site in
the solution does not cause a loss of the complete application for business continuance
and disaster recovery purposes. Sites are typically up to a few hundred miles apart so
that they have completely different power, different communications infrastructure
providers, and are placed so that natural disasters (e.g., earthquakes) are extremely
unlikely to take out more than one site.

Geographically dispersed clusters do not provide disaster tolerance, since, in some
cases, manual intervention is required to restart the applications.

Q. Can a geographically dispersed cluster use asynchronous replication between sites?

A. Yes; however, there are several caveats:


• The quorum data must be synchronously replicated between the sites. To ensure
that the Server cluster guarantees of consistency are met, the cluster database must
be kept consistent across all nodes. If the quorum disk is replicated across the sites,
it MUST be replicated synchronously.

In Windows Server 2003, a new quorum resource, Majority Node Set, can be used in
these configurations as an alternative to replicating the quorum disk.

• If data is replicated asynchronously, in the event of a site failure, the secondary
site will be consistent, but out of date. You should check that the applications can
handle going back in time and that the client experience makes sense for the
business. Applications such as SQL Server can be hosted on asynchronous
replication; however, there are a number of restrictions and warnings that you should be
aware of (see the SQL Server high availability documentation for rules around multi-site
replication of SQL Server data).

• If data is replicated at the block level, the replication must preserve the order of
writes to the secondary site. Failure to ensure this will lead to data corruption.
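
The write-ordering caveat can be illustrated with a small Python sketch. It models a disk as a dictionary of blocks and is purely a toy: the block numbers, data strings, and function name are made up for illustration and do not correspond to any replication product.

```python
def apply_writes(writes):
    """Apply (block_number, data) writes in the given order; return the disk image."""
    image = {}
    for block, data in writes:
        image[block] = data
    return image


# Writes issued at the primary site, in order. Block 7 is written twice.
primary_writes = [(7, "balance=100"), (3, "log: debit 40"), (7, "balance=60")]
primary = apply_writes(primary_writes)

# A replica that applies the same writes in the same order matches the primary.
in_order_replica = apply_writes(primary_writes)

# A replica that has only received a prefix of the writes is out of date,
# but it is a state the primary actually passed through ("consistent").
lagging_replica = apply_writes(primary_writes[:2])

# A replica that applies the writes in a different order ends with stale
# data in block 7, a state the primary was never left in.
reordered_writes = [primary_writes[2], primary_writes[1], primary_writes[0]]
out_of_order_replica = apply_writes(reordered_writes)

if __name__ == "__main__":
    print("in order matches primary:    ", in_order_replica == primary)
    print("out of order matches primary:", out_of_order_replica == primary)
```

The lagging replica corresponds to the "consistent, but out of date" case described above; the out-of-order replica corresponds to the corruption case.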

Q. Does Microsoft provide a complete end-to-end geographically dispersed cluster
solution?

A. No, Microsoft does not provide a software mechanism for replicating application data
from one site to another in a geographically dispersed cluster. Microsoft works with
hardware and software vendors to provide a complete solution. All qualified
geographically dispersed cluster solutions appear on the Microsoft Hardware
Compatibility list (HCL) (http://go.microsoft.com/fwlink/?LinkID=67738). Only cluster
solutions listed on the HCL are supported by Microsoft.

Q. What additional requirements are there on a Server cluster to support multiple sites?

A. The Microsoft server clustering software itself is unaware of the extended nature of
geographically dispersed clusters. There are no special features in Server cluster in
Windows 2000 or Windows Server 2003 that are specific to these kinds of configuration.
The network and storage architectures used to build geographically dispersed clusters
must preserve the semantics that the server cluster technology expects. Fundamentally,
the network and storage architecture of geographically dispersed server clusters must
meet the following requirements:

1. The private and public network connections between cluster nodes must appear
as a single, non-routed LAN (e.g., using technologies such as VLANs to ensure that
all nodes in the cluster appear on the same IP subnets).

2. The network connections must be able to provide a maximum guaranteed round
trip latency between nodes of no more than 500 milliseconds. The cluster uses
heartbeats to detect whether a node is alive or not responding. These heartbeats are
sent out on a periodic basis. If a node takes too long to respond to heartbeat packets,
the cluster service starts a heavy-weight protocol to figure out which nodes are really
still alive and which ones are dead; this is known as a cluster re-group. The heartbeat
interval is not a configurable parameter for the cluster service (there are many
reasons for this, but the bottom line is that changing this parameter can have a
significant impact on the stability of the cluster and the failover time). A 500 ms
round-trip latency is significantly below any threshold, which ensures that artificial
re-group operations are not triggered.

3. Windows 2000 requires that a cluster have a single shared disk known as the
quorum disk. The storage infrastructure can provide mirroring across the sites to
make a set of disks appear to the cluster service like a single disk; however, it must
preserve the fundamental semantics that are required by the physical disk resource:
• Cluster service uses SCSI reserve commands and bus reset to arbitrate for
and protect the shared disks. The semantics of these commands must be
preserved across the sites even in the face of complete communication failures
between sites. If a node on site A reserves a disk, nodes on site B should not be
able to access the contents of the disk. These semantics are essential to avoid
data corruption of cluster data and application data.

• The quorum disk must be replicated in real-time, synchronous mode across
all sites. The different members of a mirrored quorum disk MUST contain the
same data.

In Windows Server 2003, you can use either a mirrored/replicated quorum disk or the new
Majority Node Set resource for a multi-site cluster.
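
As a minimal sketch of how the round-trip latency bound in requirement 2 above might be checked against measurements, the following Python helpers compare a list of measured round-trip times with the 500 millisecond limit. The function names and the sample values are hypothetical, and a set of samples can only reveal violations; it cannot prove that the network guarantees the bound.

```python
MAX_ROUND_TRIP_MS = 500  # bound stated in requirement 2 above


def latency_violations(samples_ms, limit_ms=MAX_ROUND_TRIP_MS):
    """Return the measured round-trip times that exceed the limit."""
    return [sample for sample in samples_ms if sample > limit_ms]


def meets_latency_requirement(samples_ms, limit_ms=MAX_ROUND_TRIP_MS):
    """True if no measured round trip exceeded the limit."""
    if not samples_ms:
        raise ValueError("no latency samples supplied")
    return not latency_violations(samples_ms, limit_ms)


if __name__ == "__main__":
    # Made-up inter-site round-trip times in milliseconds.
    measured = [12.4, 15.1, 230.0, 14.8, 612.5]
    print("violations:", latency_violations(measured))
    print("within bound:", meets_latency_requirement(measured))
```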

Q. What additional requirements are there on applications?

A. As with the Server cluster itself, applications are unaware of the extended nature of
geographically dispersed clusters. There is no topology or configuration information
provided to applications to make them aware of the different sites.

Typically, no changes are required to ensure that an application runs, as expected, on a
geographically dispersed cluster. However, you should check with the application
vendors. In some cases different failure timeout periods may be required since disk
accesses and failover times may be longer due to the extended distance between
clusters and the need to provide mirroring or replication of data between sites.

Quorum
Q. What is the quorum used for?

A. Server clusters require a quorum resource to function. The quorum resource, like any
other resource, can be owned by only one server at a time, and servers can negotiate
for its ownership. Negotiating for the quorum resource allows Server clusters to avoid
"split-brain" situations, in which the servers are active and each thinks the
other servers are down. This can happen when, for example, the cluster interconnect is
lost and network response time is problematic. The quorum resource is also used to store the
definitive copy of the cluster configuration so that regardless of any sequence of failures,
the cluster configuration will always remain consistent.

Q. What options do I have for my quorum resource?

A. In Windows 2000, there are two quorum-capable resources:

• Physical Disk Resource


This allows a disk on the shared cluster storage bus to be used as the quorum
resource. The cluster code uses SCSI commands to arbitrate for the quorum disk
ensuring only one node owns the quorum disk at any point in time. This is the
standard quorum resource that Microsoft recommends for all production
Windows 2000 clusters.

Note
Some storage vendors provide their own quorum capable resources for
specific hardware solutions (e.g., IBM Shark Storage) or software solutions
(such as Veritas Volume Manager). You should use these if they are required
for your environment.

• Local Quorum

This quorum-capable resource allows a single-node cluster to be set up without
having a second storage bus. This type of cluster is useful for developing cluster-aware
software without having to have a multi-node cluster. Local Quorum can be
used in a production environment if you want to take advantage of the resource
health monitoring and local restart facilities provided by Server cluster on a single
node.

In Windows Server 2003, we have introduced another quorum-capable resource type:

• Majority Node Set

A majority node set is a single quorum resource from a Server cluster perspective;
however, all of the quorum data is actually stored on multiple disks across the cluster.
The majority node set resource takes care to ensure that the cluster configuration data
stored on the majority node set is kept consistent across the different disks.

See What is Majority Node Set? for more details.


Q. Can other applications share the quorum disk?

A. Microsoft recommends that you do NOT use the quorum disk for other applications in
the cluster and that the quorum disk is restricted to use by the cluster service itself. If you
use the quorum disk for other applications you should be aware of the following:

• The quorum disk health determines the health of the entire cluster. If the quorum
disk fails, the cluster service will become unavailable on all cluster nodes. The cluster
service checks the health of the quorum disk and arbitrates for exclusive access to
the physical drive using standard I/O operations. These operations are queued to the
device along with any other I/Os to that device. If the cluster service I/O operations
are delayed by extremely heavy traffic, the cluster service will declare the quorum
disk as failed and force a regroup to bring the quorum back online somewhere else in
the cluster. To protect against malicious applications flooding the quorum disk with
I/Os, the quorum disk should be protected. Access to the quorum disk should be
restricted to the local administrator group and the cluster service account.

• If the quorum disk fills up, the cluster service may be unable to log required data.
In this case, the cluster service will fail, potentially on all cluster nodes. To protect
against malicious applications filling up the quorum disk, access should be restricted
to the local administrator group and the cluster service account.

• The cluster service itself will always try to bring the quorum disk back online. In
doing so, it may violate the failover and failback policies assigned to applications in
the same group.

Q. Can a NAS device be used as the shared quorum disk?

A. No, out-of-the-box, the cluster service supports physical disks on the shared cluster
bus or, in Windows Server 2003, Majority Node Set quorum resources.

Q. What is Majority Node Set?

A. A majority node set is a single quorum resource from a Server cluster perspective;
however, all of the quorum data is actually stored on multiple disks across the cluster.
The majority node set resource takes care to ensure that the cluster configuration data
stored on the majority node set is kept consistent across the different disks. This allows
cluster topologies as follows:

The disks that make up the majority node set could, in principle, be local disks physically
attached to the nodes themselves or disks on a shared storage fabric. In the majority
node set implementation that is provided as part of Server clusters in
Windows Server 2003, every node in the cluster uses a directory on its own local system
disk to store the quorum data. If the configuration of the cluster changes, that change is
reflected across the different disks. The change is only considered to have been
committed (i.e., made persistent) if that change is made to the following number of nodes:

(<Number of nodes configured in the cluster>/2) + 1

This ensures that a majority of the nodes have an up-to-date copy of the data. The cluster
service itself will only start up, and therefore bring resources online, if a majority of the
nodes configured as part of the cluster are up and running the cluster service. If there are
fewer nodes, the cluster is said not to have quorum and therefore the cluster service
waits (trying to restart) until more nodes try to join. Only when a majority, or quorum, of
nodes is available will the cluster service start up and the resources be brought online.
This way, since the up-to-date configuration is written to a majority of the nodes,
regardless of node failures, the cluster will always guarantee that it starts up with the
latest and most up-to-date configuration.
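
The majority calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the cluster service's implementation; the function names are hypothetical, and the sketch assumes that the division in the formula is integer division (the remainder is discarded), which the formula itself does not spell out.

```python
def majority(configured_nodes):
    """Nodes that must hold a change for it to count as committed:
    (<number of nodes configured in the cluster> / 2) + 1, using integer division."""
    if configured_nodes < 1:
        raise ValueError("a cluster has at least one configured node")
    return configured_nodes // 2 + 1


def has_quorum(configured_nodes, running_nodes):
    """True if enough nodes are running the cluster service to form a majority."""
    return running_nodes >= majority(configured_nodes)


def tolerated_failures(configured_nodes):
    """How many nodes can be down while the cluster still has quorum."""
    return configured_nodes - majority(configured_nodes)


if __name__ == "__main__":
    for nodes in range(1, 9):
        print(nodes, "nodes: majority =", majority(nodes),
              "tolerated failures =", tolerated_failures(nodes))
```

Under this reading, a two-node majority node set cannot tolerate the loss of either node, whereas a three-node set can tolerate one failure.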

Cluster-Aware Applications
Q. What is a cluster-aware application?

A. A cluster-aware application is an application that calls the cluster APIs to determine the
context under which it is running (such as the virtual server name) and can fail over
between nodes for high availability.

Q. Can applications that were not written for a cluster be made highly available?

A. Yes, Server clusters provide a plug-in environment that allows resource DLLs to provide the
necessary control and health monitoring functions to make existing applications highly
available.

Server clusters provide a set of generic resource types that can be used to make existing
applications fail over in a cluster. In Windows 2000 there are two generic resource types:

• Generic Application: allows any application to be started, stopped, and monitored by the cluster service

• Generic Service: allows an existing Windows service to be started, stopped, and monitored

These generic resource types provide very rudimentary health monitoring (for example, is the
process that was started still a valid process on the system). They do not check that the
application is servicing requests since this requires specific knowledge of the application.
The generic resources can be used to make applications fail over relatively quickly;
however, to provide a more appropriate health check, Microsoft recommends that you
build an application-specific resource DLL.
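
The difference between the two kinds of health check can be sketched in Python. This is not how a resource DLL is written (those use the cluster APIs documented in the Platform SDK); it is only a toy contrast between "the process still exists" and "the application answers requests". Both function names are hypothetical, and a TCP connection attempt stands in for whatever application-specific probe a real resource would use.

```python
import socket
import subprocess
import sys


def process_is_running(proc):
    """Rudimentary check: is the process that was started still a live process?"""
    return proc.poll() is None


def accepts_connections(host, port, timeout=1.0):
    """Application-level check (stand-in): does the service accept a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # A child process that stays alive but never listens on any port,
    # standing in for a hung application.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        print("process running:   ", process_is_running(child))
        # Port 1 is assumed to have no listener on this machine.
        print("servicing requests:", accepts_connections("127.0.0.1", 1))
    finally:
        child.kill()
        child.wait()
```

A check of the first kind reports the hung application as healthy; only a probe that understands the application can tell that it has stopped servicing requests.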

In Windows Server 2003, we have provided an additional resource type (Generic Script)
that allows the start/stop and health monitoring functions to be implemented as scripts
rather than using C or C++. This makes the job of building application-specific resource
plug-ins much more manageable and easier.

Q. How do I build a cluster-aware application?

A. Server clusters provide a rich API set that allows applications to recognize and utilize
the cluster environment. These APIs are fully documented in the Platform SDK.

Q. Should I use the Generic Service or Generic Application resource to make my
application highly available?

A. The generic resource types provide very rudimentary health monitoring (for example, is the
process that was started still a valid process on the system). They do not check that the
application is servicing requests since this requires specific knowledge of the application.
The generic resources can be used to make applications fail over relatively quickly;
however, to provide a more appropriate health check, Microsoft recommends that you
build an application-specific resource DLL.

Q. Does Microsoft validate or logo software products that work with Server clusters?

A. Yes, Server clustering is an optional component of the Windows Advanced Server logo
program. Applications can be logoed as working on a Server cluster.

Q. What Microsoft Applications are cluster-aware?

A. The following services shipped as part of the Windows operating system are cluster-
aware:

• DHCP: Highly available DHCP server

• MSDTC: Highly available distributed transaction coordinator

• IIS: Highly available web server and FTP server*

• File Share: Highly available file share service and DFS

• Message Queue: Highly available MSMQ service

• MSMQ triggers: Highly available MSMQ trigger service (new for Windows Server 2003)

• Print Spooler: Highly available printer service

• WINS: Highly available WINS service

The following additional Microsoft Products are cluster-aware:

• SQL Server 6.5, 7.0, 2000 and upwards

• Exchange Server 5.5 and upwards

• Services for Unix 3.0 and upwards

Q. Does Exchange 2000 support active/active clusters?

A. Yes, but there are some caveats to supporting Exchange 2000 in an active/active
configuration.

Q. Does SQL Server support active/active clusters?

A. Yes, SQL Server allows the following:

• Multiple nodes in a cluster hosting different databases. Each database can be
failed over independently.

• Multiple nodes in a cluster hosting partitions of a single database using database
views to tie the different instances into a single logical database from a client
perspective.

Q. Where do I find information about writing cluster-aware applications?

A. The cluster concepts and APIs are fully documented in the Platform SDK. In addition,
there are several examples in the Platform SDK that can be used to demonstrate Server
cluster integration.

Q. Is Services for Macintosh (SFM) supported in a Server cluster?


A. No, Services for Macintosh is not supported in a Server cluster.

Q. Is Services for Unix (SFU) supported in a Server cluster?

A. Yes, Services for Unix supports highly available NFS shares in SFU 3.0.

Miscellaneous Topics
Q. Can Server clusters and Network Load Balancing be used on the same set of
servers?

A. No, Microsoft Server clusters (MSCS) and Network Load Balancing (NLB) are not
supported on the same set of nodes. Both Server clusters and Network Load Balancing
clusters control and configure network adapters. Since they are not aware of each other,
configuring one can interfere with the other.

Q. Can I use antivirus software with a Server cluster?

A. Yes, but you should make sure that the vendor has tested their solution in a Server cluster
environment. Antivirus software typically layers into the storage stack as a disk driver.
This can have an impact on the cluster's ability to fail over a disk if the driver does not
support the required features.

Q. Does Microsoft provide a Distributed Lock Manager (DLM)?

A. No, at this time, Microsoft has no plans to release a distributed lock manager;
however, that may change if there is sufficient customer demand for a DLM service.

Note
Do not confuse a distributed lock manager with a cluster file system. A cluster file
system can be built in a number of ways; using a lock manager is one of them.
However, just providing a lock manager does not solve the cluster file system
problem.

Q. Will Microsoft provide a shared disk cluster file system?

A. Microsoft is continually looking at ways to improve the services it provides on the
Windows operating system. A shared disk cluster file system is a way to provide a
number of attributes in a cluster:

• A single file system namespace visible to all applications on all nodes in the
cluster

• High speed access to disks from any node in the cluster

Q. How do Server clusters and Fault tolerant servers relate?


A. Server clusters address a number of availability issues:

• Hardware failures

• Operating system failures

• Application failures

• Site failures

• Operating system and application upgrades

Fault tolerant servers provide an extremely reliable server platform that addresses
hardware failures by providing redundancy at the hardware level. In some cases, fault
tolerant servers can be used to address site failures.
In and of themselves, fault tolerant servers do not address issues relating to operating
system and application monitoring and failure recovery, nor do they address upgrading
the operating system or applications without taking the service down.

In summary, Server clusters are about providing high availability, whereas fault tolerant servers
provide high reliability. By combining highly reliable, fault tolerant servers with Server
clusters, you get the best of both worlds: a set of reliable servers that can provide high
availability in the face of operating system and application failures and upgrades.

Q. How do I find Server cluster KB Articles?

A. All cluster KB articles can be found at http://go.microsoft.com/fwlink/?LinkID=538

• All Windows NT 4.0 related articles have keyword "MSCS"

• All Windows 2000 articles have keyword "W2000MSCS"

This should allow easy selection of Server cluster related articles.
