RHEL Multipathing Basics

This document discusses two basic types of algorithms used for NIC Teaming - switch-dependent and switch-independent. Switch-dependent algorithms require participation of the switch and all NICs to be connected to the same switch. Switch-independent algorithms do not require switch participation and NICs can be connected to different switches. Common switch-dependent algorithms are Generic/Static Teaming and Dynamic Link Aggregation Control Protocol (LACP) teaming.

Uploaded by

iftikhar ahmed

There are two basic sets of algorithms used for NIC Teaming:

a) Algorithms that require the switch to participate in the teaming, also known as
switch-dependent modes. These algorithms usually require all the network adapters of the team
to be connected to the same switch.

b) Algorithms that do not require the switch to participate in the teaming, also referred to
as switch-independent modes. Because the switch does not know that the network adapter is part
of a team, the team network adapters can be connected to different switches.
Switch-independent modes do not require that the team members connect to different switches;
they merely make it possible.

Switch-dependent modes:

There are two common choices for switch-dependent modes of NIC Teaming:

1. Generic or Static Teaming - This mode requires configuration on both the switch and the
computer to identify which links form the team. Because this is a statically configured
solution, no additional protocol helps the switch and the computer identify incorrectly
plugged cables or other errors that could cause the team to fail.

2. Dynamic teaming (LACP) - This mode uses the Link Aggregation Control Protocol (LACP) to
dynamically identify links between the computer and a specific switch. This enables the
automatic creation of a team and, in theory, the expansion and reduction of a team simply by
the transmission or receipt of LACP packets from the peer network adapter.

RHEL: Multipathing basics


Tested on RHEL 5 & 6

# DM-Multipath is a feature of Red Hat from RHEL 5 on and can be used to provide:
#
# Redundancy: DM-Multipath can provide failover in an active/passive configuration. In an
#   active/passive configuration, only half the paths are used at any time for I/O. If any
#   element of an I/O path (the cable, switch, or controller) fails, DM-Multipath switches
#   to an alternate path.
#   In an active/active configuration all the paths are used in a round-robin fashion.
#
# Improved Performance: DM-Multipath can be configured in active/active mode, where I/O is
#   spread over the paths in a round-robin fashion. In some configurations, DM-Multipath
#   can detect loading on the I/O paths and dynamically re-balance the load.

# By default, DM-Multipath includes support for the most common storage arrays that support
# multipathing. The supported devices can be found in the multipath.conf.defaults file. If
# your storage array supports DM-Multipath and is not configured by default in this file,
# you may need to add it to the config file.

# DM-Multipathing components
# ------------------------------------------------------------------------------------------

# - dm-multipath kernel module: Reroutes I/O and supports failover for paths and path
#   groups.
# - multipathd daemon: Monitors paths; as paths fail and come back, it may initiate path
#   group switches. Provides for interactive changes to multipath devices. This must be
#   restarted for any changes to the /etc/multipath.conf file to take effect.
# - multipath command: Lists and configures multipath devices. Normally started up with
#   /etc/rc.sysinit, it can also be started up by a udev program whenever a block device
#   is added, or it can be run by the initramfs file system.
# - kpartx command: Creates device mapper devices for the partitions on a device. It is
#   necessary to use this command for DOS-based partitions with DM-MP. kpartx is
#   provided in its own package, but the device-mapper-multipath package depends on it.

# DM-Multipathing config files
# ------------------------------------------------------------------------------------------

# - /etc/multipath.conf: Main configuration file.
# - /usr/share/doc/device-mapper-multipath-X.X.X/multipath.conf.defaults: Lists supported
#   storage arrays. If your array is not listed, it may still be possible to configure it
#   in the multipath.conf file.
# - /var/lib/multipath/bindings: This file is automatically maintained by the multipath
#   program. It relates user-friendly names and device WWIDs.

# Each multipath device has a World Wide Identifier (WWID), which is guaranteed to be
# unique and unchanging. By default the name of a multipath device is set to its WWID, but
# there is an option in /etc/multipath.conf, "user_friendly_names", which sets the alias
# to a node-unique name of the form mpathX:

   ## Use user friendly names, instead of using WWIDs as names.


   defaults {
      user_friendly_names yes
      bindings_file /etc/multipath_bindings
   }
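
# The bindings file relates each user-friendly name to a WWID. A minimal Python sketch of
# reading such a file, assuming the usual layout of one "alias wwid" pair per line with "#"
# comment lines. The function name parse_bindings and the sample text are illustrative only
# and are not part of the multipath tools.

```python
def parse_bindings(text):
    """Return {alias: wwid} from bindings-file text (assumed 'alias wwid' per line)."""
    bindings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        alias, wwid = line.split(None, 1)
        bindings[alias] = wwid.strip()
    return bindings


# Illustrative sample content, not copied from a real system.
SAMPLE = """\
# Multipath bindings (sample)
mpath0 20017580006c00034
mpath1 20017580006c00035
"""

bindings = parse_bindings(SAMPLE)
for alias, wwid in sorted(bindings.items()):
    print(alias, "->", wwid)
```

# On a real host the text would come from the file named by bindings_file.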

# DM-Multipathing devices
# ------------------------------------------------------------------------------------------

# multipath provides three different ways to access the device:

# /dev/mapper/mpathX: These are created early in the boot sequence, so they are ideal
#    for logical volumes and boot devices.
# /dev/mpathX: Provided as a convenience so that all multipathed devices can be seen
#    in one directory. These devices are created by the udev device manager and may not
#    be available during startup.
# /dev/dm-X: These are for internal use only and should not be used directly.

# There is a fourth option that consists of setting an alias in the /etc/multipath.conf file.

# DM-Multipath setup
# ------------------------------------------------------------------------------------------

# - Install the device-mapper-multipath rpm
# - Edit the /etc/multipath.conf configuration file:
#      - comment out the default blacklist or create your own blacklist
#      - change any of the defaults (if required)
# - Start the multipathd daemon
# - Create the multipath devices with the multipath command

# Basic multipath.conf file
# ------------------------------------------------------------------------------------------

# We can create the initial configuration file by running the following command
# (mpathconf is provided on RHEL 6):

mpathconf --enable

#   - the defaults section configures multipath to use friendly names; there are a
#      number of other options that can be used.
#   - the blacklist section excludes specific disks from being multipathed; notice the
#      exclusion of all WWIDs
#   - the blacklist_exceptions section lists the devices with a specific WWID that are to
#      be included despite the blacklist
#   - the multipaths section creates aliases that match a specific disk to an alias using
#      the WWID

# multipath.conf (basic)

   defaults {
      user_friendly_names yes
      path_grouping_policy failover
   }

   blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
      devnode "^(hd|xvd|vd)[a-z]*"
      wwid "*"
   }

   # Make sure our multipath devices are enabled.

   blacklist_exceptions {
      wwid "20017580006c00034"
      wwid "20017580006c00035"
      wwid "20017580006c00036"
      wwid "20017580006c00037"
   }

   multipaths {
     multipath {
         wwid "20017580006c00034"
         alias mpath0
     }
     multipath {
         wwid "20017580006c00035"
         alias mpath1
     }
     multipath {
         wwid "20017580006c00036"
         alias mpath2
     }
     multipath {
         wwid "20017580006c00037"
         alias mpath3
     }
   }
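
# With many LUNs, the blacklist_exceptions and multipaths stanzas above become repetitive.
# The following Python sketch renders both stanzas from a WWID-to-alias mapping. The helper
# name render_multipaths is hypothetical; it only produces text in the layout shown above
# and does not validate it against multipath itself.

```python
def render_multipaths(aliases):
    """Render blacklist_exceptions and multipaths stanzas for a {wwid: alias} mapping."""
    lines = ["blacklist_exceptions {"]
    for wwid in aliases:
        lines.append(f'   wwid "{wwid}"')
    lines.append("}")
    lines.append("")
    lines.append("multipaths {")
    for wwid, alias in aliases.items():
        lines.append("   multipath {")
        lines.append(f'      wwid "{wwid}"')
        lines.append(f"      alias {alias}")
        lines.append("   }")
    lines.append("}")
    return "\n".join(lines)


# WWIDs taken from the example configuration above.
ALIASES = {
    "20017580006c00034": "mpath0",
    "20017580006c00035": "mpath1",
}

print(render_multipaths(ALIASES))
```

# Review the generated text before pasting it into /etc/multipath.conf.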

# Once multipath.conf is configured, perform the following steps to start multipathd:

modprobe dm-multipath
service multipathd start

multipath -d
# This performs a dry run to make sure everything is OK. Fix anything that
# appears as a problem.

multipath -v2
# Commits the configuration

multipath -ll
# Lists the resulting multipath topology

chkconfig multipathd on
# Makes sure the devices are configured after a reboot

# Now we should see something similar to the output below; each device is active and ready.

multipath -ll |grep mpath


   mpath2 (20017580006c00036) dm-7 IBM,2810XIV
   mpath1 (20017580006c00035) dm-6 IBM,2810XIV
   mpath0 (20017580006c00034) dm-5 IBM,2810XIV
   mpath3 (20017580006c00037) dm-8 IBM,2810XIV

# The following example shows connections to an HP array (reported as HP,OPEN-V)

multipath -ll
   mpath2 (360060e80057110000000711000005405) dm-8 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:1:0 sdc 8:32  [active][ready]
    \_ 3:0:2:0 sdn 8:208 [active][ready]
   mpath1 (360060e8005711000000071100000810a) dm-7 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:0:1 sdb 8:16  [active][ready]
    \_ 3:0:0:1 sdl 8:176 [active][ready]
   mpath0 (360060e80057110000000711000002206) dm-6 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:0:0 sda 8:0   [active][ready]
    \_ 3:0:0:0 sdk 8:160 [active][ready]
   mpath9 (360060e80057110000000711000005306) dm-15 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:7:0 sdj 8:144 [active][ready]
    \_ 3:0:4:0 sdp 8:240 [active][ready]
   mpath8 (360060e80057110000000711000008305) dm-14 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:6:1 sdi 8:128 [active][ready]
    \_ 3:0:5:1 sdr 65:16 [active][ready]
   mpath7 (360060e80057110000000711000002506) dm-13 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:6:0 sdh 8:112 [active][ready]
    \_ 3:0:5:0 sdq 65:0  [active][ready]
   mpath6 (360060e80057110000000711000007408) dm-12 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:5:0 sdg 8:96  [active][ready]
    \_ 3:0:6:0 sds 65:32 [active][ready]
   mpath5 (360060e80057110000000711000002305) dm-11 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:4:0 sdf 8:80  [active][ready]
    \_ 3:0:7:0 sdt 65:48 [active][ready]
   mpath4 (360060e80057110000000711000006207) dm-10 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:3:0 sde 8:64  [active][ready]
    \_ 3:0:3:0 sdo 8:224 [active][ready]
   mpath3 (360060e80057110000000711000000409) dm-9 HP,OPEN-V
   [size=408G][features=1 queue_if_no_path][hwhandler=0][rw]
   \_ round-robin 0 [prio=2][active]
    \_ 2:0:2:0 sdd 8:48  [active][ready]
    \_ 3:0:1:0 sdm 8:192 [active][ready]

# If you have made a mistake in the multipath.conf file, use the following steps to correct it:
vi /etc/multipath.conf
service multipathd reload
multipath -F
multipath -d
multipath -v2

# It may be that the array we have is not in the multipath.conf.defaults file. We can add a
# device section (check the manufacturer's documentation). Below is an example for an
# HP OPEN-V series array.

    device {
        vendor "HP"
        product "OPEN-.*"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        hardware_handler "0"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        failback immediate
        rr_weight uniform
        no_path_retry 12
        rr_min_io 1000
        path_checker tur
    }

# Advanced multipath.conf file
# ------------------------------------------------------------------------------------------

# The configuration file is divided into the following sections:

#   - defaults: general setup parameters
#   - blacklist: lists specific devices to exclude from multipathing
#   - blacklist_exceptions: lists devices that would otherwise be excluded
#   - multipaths: settings for the characteristics of individual multipath devices
#   - devices: settings for non-default storage arrays

# We can blacklist any device, but we need to tell multipath what to exclude. Some examples:

# wwid
# ---------------------------------

# Specific wwid
   blacklist {
      wwid "20017580006c00034"
   }

# All wwid
   blacklist {
      wwid "*"
   }

# device name
# ---------------------------------

# All sd devices from "a" to "z"

   blacklist {
      devnode "^sd[a-z]"
   }

# A more advanced example


   blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
      devnode "^(hd|xvd|vd)[a-z]*"
   }

# device type
# ---------------------------------

# Blacklist HP devices
   blacklist {
      device {
         vendor "HP"
         product "*"
      }
   }

# To exempt devices from the blacklist, we create an exception list (blacklist_exceptions)

# wwid
# ---------------------------------
# Exempt a specific wwid
   blacklist_exceptions {
      wwid "20017580006c00034"
   }

# Exempt all wwid

   blacklist_exceptions {
      wwid "*"
   }

# device name
# ---------------------------------

# All sd devices x through z


   blacklist_exceptions {
      devnode "^sd[x-z]"
   }

# device type
# ---------------------------------

# Exempt HP devices
   blacklist_exceptions {
      device {
         vendor "HP"
         product "*"
      }
   }
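
# The devnode entries above are regular expressions matched against the device name. The
# Python sketch below shows which names the example patterns catch. It is a simplified
# model that only looks at devnode patterns and lets any exception override the blacklist;
# the real multipath tools also evaluate wwid and device entries and have their own
# precedence rules. The function name is_blacklisted is illustrative.

```python
import re

# Patterns copied from the blacklist examples above.
SD_BLACKLIST = [r"^sd[a-z]"]
SD_EXCEPTIONS = [r"^sd[x-z]"]
ADVANCED_BLACKLIST = [
    r"^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*",
    r"^(hd|xvd|vd)[a-z]*",
]


def is_blacklisted(devnode, blacklist, exceptions=()):
    """Simplified check: an exception wins, otherwise any blacklist match excludes."""
    if any(re.search(pattern, devnode) for pattern in exceptions):
        return False
    return any(re.search(pattern, devnode) for pattern in blacklist)


for name in ("sda", "sdy"):
    print(name, is_blacklisted(name, SD_BLACKLIST, SD_EXCEPTIONS))
for name in ("loop0", "dm-3", "hda", "sdb"):
    print(name, is_blacklisted(name, ADVANCED_BLACKLIST))
```

# In this model sda is excluded while sdy is exempted, and the advanced pattern excludes
# loop0, dm-3 and hda but leaves sdb available for multipathing.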

# The defaults section has a number of parameters which can be changed

# Parameter                  Default Value              Description
# ----------                 --------------             --------------------------------------------------
#
# udev_dir                   /udev                      Specifies the directory where udev device nodes are
#                                                       created.
#
# verbosity                  2                          (RHEL 5.3 and later) Specifies the verbosity level of
#                                                       the command. It can be overridden by the -v command
#                                                       line option.
#
# polling_interval           5                          Specifies the interval between two path checks in
#                                                       seconds.
#
# selector                   round-robin 0              Specifies the default algorithm to use in determining
#                                                       what path to use for the next I/O operation.
#
# path_grouping_policy       failover                   Specifies the default path grouping policy to apply to
#                                                       unspecified multipaths. Possible values include:
#                                                         failover = 1 path per priority group
#                                                         multibus = all valid paths in 1 priority group
#                                                         group_by_serial = 1 priority group per detected
#                                                           serial number
#                                                         group_by_prio = 1 priority group per path priority
#                                                           value
#                                                         group_by_node_name = 1 priority group per target
#                                                           node name
#
# getuid_callout             /sbin/scsi_id -g -u -s     Specifies the default program and arguments to call
#                                                       out to obtain a unique path identifier. An absolute
#                                                       path is required.
#
# prio_callout                                          Specifies the default program and arguments to call
#                                                       out to obtain a path weight. Weights are summed for
#                                                       each path group to determine the next path group to
#                                                       use in case of failure. "none" is a valid value.
#
# path_checker               readsector0                Specifies the default method used to determine the
#                                                       state of the paths. Possible values include:
#                                                       readsector0, rdac, tur, cciss_tur, hp_tur (RHEL 5.5
#                                                       and later), emc_clariion, hp_sw, and directio.
#
# features                                              The extra features of multipath devices. The only
#                                                       existing feature is queue_if_no_path, which is the
#                                                       same as setting no_path_retry to queue.
#
# rr_min_io                  1000                       Specifies the number of I/O requests to route to a
#                                                       path before switching to the next path in the current
#                                                       path group.
#
# max_fds                                               (RHEL 5.2 and later) Sets the maximum number of open
#                                                       file descriptors for the multipathd process. In RHEL
#                                                       5.3, this option allows a value of max, which sets the
#                                                       number of open file descriptors to the system maximum.
#
# rr_weight                  uniform                    If set to priorities, then instead of sending
#                                                       rr_min_io requests to a path before calling selector
#                                                       to choose the next path, the number of requests to
#                                                       send is determined by rr_min_io times the path's
#                                                       priority, as determined by the prio_callout program.
#                                                       Currently, there are priority callouts only for
#                                                       devices that use the group_by_prio path grouping
#                                                       policy, which means that all the paths in a path
#                                                       group will always have the same priority.
#                                                       If set to uniform, all path weights are equal.
#
# failback                   manual                     Specifies path group failback. A value of 0 or
#                                                       immediate specifies that as soon as there is a path
#                                                       group with a higher priority than the current path
#                                                       group the system switches to that path group. A
#                                                       numeric value greater than zero specifies deferred
#                                                       failback, expressed in seconds. A value of manual
#                                                       specifies that failback can happen only with operator
#                                                       intervention.
#
# no_path_retry              null                       A numeric value for this attribute specifies the
#                                                       number of times the system should attempt to use a
#                                                       failed path before disabling queueing.
#                                                       A value of fail indicates immediate failure, without
#                                                       queueing.
#                                                       A value of queue indicates that queueing should not
#                                                       stop until the path is fixed.
#
# flush_on_last_del          no                         (RHEL 5.3 and later) If set to yes, the multipathd
#                                                       daemon will disable queueing when the last path to a
#                                                       device has been deleted.
#
# queue_without_daemon       yes                        (RHEL 5.3 and later) If set to no, the multipathd
#                                                       daemon will disable queueing for all devices when it
#                                                       is shut down.
#
# user_friendly_names        no                         If set to yes, specifies that the system should use
#                                                       the bindings file to assign a persistent and unique
#                                                       alias to the multipath, in the form of mpathn. The
#                                                       default location of the bindings file is
#                                                       /var/lib/multipath/bindings, but this can be changed
#                                                       with the bindings_file option. If set to no,
#                                                       specifies that the system should use the WWID as the
#                                                       alias for the multipath. In either case, what is
#                                                       specified here will be overridden by any
#                                                       device-specific aliases you specify in the multipaths
#                                                       section of the configuration file.
#
# bindings_file              /var/lib/multipath/bindings
#                                                       (RHEL 5.2 and later) The location of the bindings
#                                                       file that is used with the user_friendly_names option.
#
# mode                       determined by the process  (RHEL 5.3 and later) The mode to use for the multipath
#                                                       device nodes, in octal.
#
# uid                        determined by the process  (RHEL 5.3 and later) The user ID to use for the
#                                                       multipath device nodes. You must use the numeric user
#                                                       ID.
#
# gid                        determined by the process  (RHEL 5.3 and later) The group ID to use for the
#                                                       multipath device nodes. You must use the numeric
#                                                       group ID.
#
# checker_timeout            (see description)          (RHEL 5.5 and later) The timeout value to use for
#                                                       path checkers that issue SCSI commands with an
#                                                       explicit timeout, in seconds. The default value is
#                                                       taken from /sys/block/sdx/device/timeout.
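
# The rr_weight and failback rows in the table above each describe a small rule. The Python
# sketch below restates those two rules as functions. The function names are illustrative,
# and the code only mirrors the wording of the table; it does not read or apply a real
# configuration.

```python
def requests_before_switch(rr_min_io, path_priority, rr_weight="uniform"):
    """I/O requests routed to a path before the selector picks the next path."""
    if rr_weight == "priorities":
        # rr_min_io times the path's priority
        return rr_min_io * path_priority
    # uniform: all path weights are equal
    return rr_min_io


def failback_delay(value):
    """Seconds to wait before failing back: 0 means immediate, None means manual."""
    if value == "manual":
        return None
    if value == "immediate":
        return 0
    return int(value)


print(requests_before_switch(1000, 3, "priorities"))
print(requests_before_switch(1000, 3))
print(failback_delay("immediate"), failback_delay("30"), failback_delay("manual"))
```

# With rr_min_io 1000 and a path priority of 3, the priorities setting sends 3000 requests
# to the path before switching, while uniform sends 1000.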

# The multipaths section parameters are as follows
#
# Parameter                  Description
# ----------                 ---------------------------------------
#
# wwid                       Specifies the WWID of the multipath device to which the multipath
#                            attributes apply.
#
# alias                      Specifies the symbolic name for the multipath device to which the
#                            multipath attributes apply.
#
# The following parameters have the same meaning as in the defaults table:
#
# path_grouping_policy
# prio_callout
# path_selector
# failback
# rr_weight
# no_path_retry
# flush_on_last_del
# rr_min_io
# mode
# uid
# gid

# The devices section parameters are as follows
#
# Parameter                  Description
# ----------                 ---------------------------------------
#
# vendor                     Specifies the vendor name of the storage device to which the device
#                            attributes apply, for example COMPAQ.
#
# product                    Specifies the product name of the storage device to which the device
#                            attributes apply, for example HSV110 (C)COMPAQ.
#
# path_checker               Specifies the default method used to determine the state of the
#                            paths. Possible values include readsector0, rdac, tur, cciss_tur,
#                            hp_tur, emc_clariion, hp_sw, and directio.
#
# features                   The extra features of multipath devices. The only existing feature
#                            is queue_if_no_path, which is the same as setting no_path_retry to
#                            queue.
#
# hardware_handler           Specifies a module that will be used to perform hardware-specific
#                            actions when switching path groups or handling I/O errors. Possible
#                            values include 0, 1 emc, and 1 rdac. The default value is 0.
#
# product_blacklist          Specifies a regular expression used to blacklist devices by product.
#
# The following parameters have the same meaning as in the defaults table:
#
# path_grouping_policy
# getuid_callout
# prio_callout
# path_selector
# failback
# rr_weight
# no_path_retry
# flush_on_last_del
# rr_min_io

1. Overview

The connection from the server through the HBA to the storage controller is referred to as
a path. When multiple paths exist to a storage device (LUN) on a storage subsystem, it
is referred to as multipath connectivity. It is an enterprise-level storage capability. The
main purpose of multipath connectivity is to provide redundant access to the storage devices,
i.e., to retain access to the storage device when one or more of the components in a path
fail. Another advantage of multipathing is increased throughput by way of load
balancing.

  Note: Multipathing protects against the failure of path(s) and not the failure of
a specific storage device.

A common example of multipath is a SAN-connected storage device. Usually one or more
Fibre Channel HBAs from the host will be connected to the fabric switch, and the storage
controllers will be connected to the same switch.
A simple example of multipath could be: 2 HBAs connected to a switch to which the
storage controllers are connected. In this case the storage controller can be accessed
from either of the HBAs and hence we have multipath connectivity.

In the following diagram each host has 2 HBAs and each storage has 2 controllers. With
the given configuration setup each host will have 4 paths to each of the LUNs in the
storage.

In Linux, a SCSI device is configured for a LUN seen on each path, i.e., if a LUN has 4
paths, then one will see four SCSI devices getting configured for the same
device. Doing I/O to a LUN in such an environment is unmanageable, for the following reasons:

- applications and administrators do not know which SCSI device to use
- all applications need to consistently use the same device
- in case of a path failure, something needs the knowledge to retry the I/O on a different path
- the storage-device-specific preferred path should always be used
- I/O should be spread between multiple valid paths

1.1. Device Mapper

Device mapper is a block subsystem that provides a layering mechanism for block
devices. One can write a device mapper target to provide a specific functionality on top of a
block device.

Currently the following functional layers are available:

- concatenation
- mirror
- striping
- encryption
- flaky
- delay
- multipath

Multiple device mapper modules can be stacked to get the combined functionality.


1.2. Device Mapper Multipathing

The objective of this document is to provide details on device mapper multipathing (DM-MP).

DM-MP resolves all the issues that arise in accessing a multipathed device in Linux. It
also provides a consistent user interface for storage devices provided by multiple
vendors. There is only one block device (/dev/mapper/XXX) for a LUN. This is the
device created by device mapper.

Paths are grouped into priority groups, and one of the priority groups will be used for I/O,
and is called active. A path selector selects a path in the priority group to be used for an
I/O based on some load balancing algorithm (for example round-robin).

When an I/O fails on a path, that path gets disabled and the I/O is retried on a different
path in the same priority group. If all paths in a priority group fail, a different priority
group which is enabled will be selected to send I/O.
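
The selection and retry behaviour described above can be sketched as a small Python model. This is only an illustration under simplifying assumptions: groups are tried in list order rather than by computed priority, failed paths are never reinstated, and the names PriorityGroup and send_io are hypothetical and not part of DM-MP.

```python
class PriorityGroup:
    """A priority group with round-robin selection over its non-failed paths."""

    def __init__(self, name, paths):
        self.name = name
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def pick(self):
        """Return the next usable path in round-robin order, or None if all failed."""
        usable = [p for p in self.paths if p not in self.failed]
        if not usable:
            return None
        path = usable[self._next % len(usable)]
        self._next += 1
        return path


def send_io(groups, path_ok):
    """Send one I/O: retry inside the group first, then move to the next group."""
    for group in groups:
        while True:
            path = group.pick()
            if path is None:
                break  # every path in this group has failed: try the next group
            if path_ok(path):
                return group.name, path
            group.failed.add(path)  # I/O failed: disable the path and retry
    raise IOError("no usable path")


broken = set()
groups = [PriorityGroup("group1", ["sdf", "sdl"]),
          PriorityGroup("group2", ["sdb", "sdq"])]


def path_ok(path):
    return path not in broken


first = send_io(groups, path_ok)      # all paths healthy
broken.update({"sdf", "sdl"})         # both paths of group1 fail
after_failure = send_io(groups, path_ok)
print(first, after_failure)
```

In this model the first I/O goes out on a path of group1, and once both of its paths fail the I/O is served by group2.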

DM-MP consists of 4 components:

1. DM-MP kernel module - Kernel module that is responsible for making the
multipathing decisions in normal and failure situations.
2. multipath command - User space tool that provides the user with initial
configuration, listing and deletion of multipathed devices.
3. multipathd daemon - User space daemon that constantly monitors the paths. It
marks a path as failed when it finds the path faulty, and if all the paths in a priority
group are faulty then it switches to the next enabled priority group. It keeps
checking the failed path; once the failed path comes alive, based on the failback
policy, it can activate the path. It provides a CLI to monitor/manage individual
paths. It automatically creates device mapper entries when new devices come
into existence.
4. kpartx - User space command that creates device mapper entries for all the
partitions in a multipathed disk/LUN. When the multipath command is invoked,
this command automatically gets invoked. For DOS-based partitions this
command needs to be run manually.

2. Terminology, Concepts and Usage

2.1. Output of multipath command

Standard output of multipath command

# multipath -ll
mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815 FAStT
[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 29:0:0:1 sdf 8:80  [active][ready]
 \_ 28:0:1:1 sdl 8:176 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16  [active][ghost]
 \_ 29:0:1:1 sdq 65:0  [active][ghost]

Annotated output of multipath command

mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815 FAStT
------ ----------------------------------- ---- --- ----------
   |                    |                    |   |       |----> Product
   |                    |                    |   |------------> Vendor
   |                    |                    |----------------> sysfs name
   |                    |-------------------------------------> WWID of the device
   |----------------------------------------------------------> User defined Alias name

[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
 ---------  ---------------------------  ----------------
     |                   |                      |-----------> Hardware Handler, if any
     |                   |----------------------------------> Features supported
     |------------------------------------------------------> Size of the DM device

Path Group 1:
\_ round-robin 0 [prio=6][active]
-- -------------  ------  ------
|        |          |       |------> Path group state
|        |          |--------------> Path group priority
|        |-------------------------> Path selector and repeat count
|----------------------------------> Path group level

First path on Path Group 1:
\_ 29:0:0:1 sdf 8:80 [active][ready]
   -------- --- ----  ------  -----
      |      |   |      |       |-----> Physical Path state
      |      |   |      |-------------> DM Path state
      |      |   |--------------------> Major, minor numbers
      |      |------------------------> Linux device name
      |-------------------------------> SCSI information: host, channel, scsi_id and lun

Second path on Path Group 1:
\_ 28:0:1:1 sdl 8:176 [active][ready]

Path Group 2:
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16 [active][ghost]
 \_ 29:0:1:1 sdq 65:0 [active][ghost]
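
The fields annotated above can also be pulled apart programmatically. The following Python sketch parses the older bracketed `multipath -ll` layout shown in this section into nested dictionaries. It assumes a map line of the form `alias (wwid) dm-N vendor,product`; newer multipath-tools releases print a different layout, and the regular expressions and the function name parse_multipath_ll are illustrative, not part of any multipath library.

```python
import re

# Sample in the bracketed layout shown above.
SAMPLE = r"""mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815 FAStT
[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 29:0:0:1 sdf 8:80  [active][ready]
 \_ 28:0:1:1 sdl 8:176 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16  [active][ghost]
 \_ 29:0:1:1 sdq 65:0  [active][ghost]
"""

MAP_RE = re.compile(r"^(\S+) \((\w+)\)\s+(dm-\d+)\s+([^,]+),(.+)$")
GROUP_RE = re.compile(r"^\s*\\_ (\S+ \d+) \[prio=(\d+)\]\[(\w+)\]")
PATH_RE = re.compile(
    r"^\s*\\_ (\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+\[(\w+)\]\[(\w+)\]")


def parse_multipath_ll(text):
    """Parse bracketed 'multipath -ll' text into a list of map dictionaries."""
    maps = []
    for line in text.splitlines():
        m = MAP_RE.match(line)
        if m:
            maps.append({"alias": m.group(1), "wwid": m.group(2),
                         "sysfs": m.group(3), "vendor": m.group(4),
                         "product": m.group(5).strip(), "groups": []})
            continue
        m = GROUP_RE.match(line)
        if m and maps:
            maps[-1]["groups"].append({"selector": m.group(1),
                                       "prio": int(m.group(2)),
                                       "state": m.group(3), "paths": []})
            continue
        m = PATH_RE.match(line)
        if m and maps and maps[-1]["groups"]:
            maps[-1]["groups"][-1]["paths"].append(
                {"hctl": m.group(1), "dev": m.group(2),
                 "major_minor": m.group(3), "dm_state": m.group(4),
                 "path_state": m.group(5)})
    return maps


parsed = parse_multipath_ll(SAMPLE)
for group in parsed[0]["groups"]:
    print(group["prio"], group["state"], [p["dev"] for p in group["paths"]])
```

Group and path lines are told apart by their content rather than by indentation, so the parser also copes with output whose leading spaces were lost.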

2.2. Terminology

Path
Connection from the server through a HBA to a specific LUN. Without DM-MP, each
path would appear as a separate device.
Path Group

Paths are grouped into a path groups. At any point of time only path group will be
active. Path selector decides which path in the path group gets to send the next
I/O. I/O will be sent only to the active path.
Path Priority

Each path has a specific priority. A priority callout program provides the priority
for a given path. The user space commands use this priority value to choose an
active path. In the group_by_prio path grouping policy, path priority is used to
group the paths together and change their relative weight with the round-robin
path selector.
Path Group Priority
Sum of priorities of all non-faulty paths in a path group. By default, the multipathd
daemon tries to keep the path group with the highest priority active.
Path Grouping Policy
Determines how the path group(s) are formed using the available paths. There are five
different policies:

1. multibus: One path group is formed with all paths to a LUN. Suitable for
devices that are in Active/Active mode.
2. failover: Each path group will have only one path.
3. group_by_serial: One path group per storage controller(serial). All paths
that connect to the LUN through a controller are assigned to a path group.
Suitable for devices that are in Active/Passive mode.
4. group_by_prio: Paths with same priority will be assigned to a path group.
5. group_by_node_name: Paths with the same target node name will be assigned to a
path group.

Note: Setting multibus as the path grouping policy for a storage device
in Active/Passive mode will reduce the I/O performance.
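
The grouping policies and the path group priority rule described above can be illustrated with a small Python sketch. It is not the multipath-tools implementation. The path records are hypothetical, and the per-path priority of 3 is an assumption chosen so that the group priorities match the prio=6 and prio=0 values in the sample output of the previous section.

```python
from collections import defaultdict

# Hypothetical path records: (device name, path priority, path state).
paths = [
    ("sdf", 3, "ready"),
    ("sdl", 3, "ready"),
    ("sdb", 0, "ghost"),
    ("sdq", 0, "ghost"),
]


def multibus(paths):
    """multibus: one path group containing every path to the LUN."""
    return [list(paths)]


def failover(paths):
    """failover: one path per path group."""
    return [[p] for p in paths]


def group_by_prio(paths):
    """group_by_prio: paths with the same priority share a path group."""
    groups = defaultdict(list)
    for path in paths:
        groups[path[1]].append(path)
    # Highest priority group first.
    return [groups[prio] for prio in sorted(groups, reverse=True)]


def group_priority(group):
    """Path group priority: sum of the priorities of all non-faulty paths."""
    return sum(prio for _, prio, state in group if state != "faulty")


if __name__ == "__main__":
    for group in group_by_prio(paths):
        print(group_priority(group), [name for name, _, _ in group])
```

With these sample records, group_by_prio produces two groups (sdf and sdl, then sdb and sdq), while multibus would put all four paths, including the passive ones, into a single group.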
Path Selector
A kernel multipath component that determines which path will be chosen for the next I/O.
The path selector can have an appropriate load balancing algorithm. Currently only one path
selector exists, which is round-robin.
Path Checker
Functionality in the user space that is used to check the availability of a path. This is
implemented as a library function that is used by both multipath command and the
multipathd daemon. Currently, there are 3 path checkers:

1. readsector0: sends a read command to sector 0 at regular time intervals.
Produces a lot of error messages in Active/Passive mode. Hence, suitable
only for Active/Active mode.
2. tur: sends a test unit ready command at regular intervals.
3. rdac: specific to the lsi-rdac device. Sends an inquiry command and sets the
status of the path appropriately.

Path States

This refers to the physical state of a path. A path can be in one of the following
states:
1. ready: Path is up and can handle I/O requests.
2. faulty: Path is down and cannot handle I/O requests.
3. ghost: Path is a passive path. This state is shown in the passive path
in Active/Passive mode.
4. shaky: Path is up, but temporarily not available for I/O requests.

DM Path States
This refers to the DM (kernel) module's view of the path's state. It can be in one of
two states:
1. active: Last I/O sent to this path successfully completed. Analogous
to ready path state.
2. failed: Last I/O to this path failed. Analogous to faulty path state.
Path Group State
Path Groups can be in one of the following three states:

1. active: I/O sent to the multipath device will be sent to this path group. Only one
path group will be in this state.
2. enabled: If none of the paths in the active path group is in the ready state, I/O will
be sent to these path groups. There can be one or more path groups in this state.
3. disabled: If none of the paths in the active path group and the enabled path groups is in the
ready state, I/O will be sent to these path groups. There can be one or more path groups
in this state. This state is available only for certain storage devices.

UID Callout (or) WWID Callout


A standalone program that returns a globally unique identifier for a path.
multipath/multipathd invokes this callout and uses the ID returned to coalesce multiple
paths to a single multipath device.
Priority Callout
A standalone program that returns the priority for a path. multipath/multipathd invokes
this callout and uses the priority value of the paths to determine the active path group.
Hardware Handler
Kernel personality module for storage devices that need special handling. This module
is responsible for enabling a path (at the device level) during initialization, failover and
failback. It is also responsible for handling device specific sense error codes.
Failover

When all the paths in a path group are in the faulty state, one of the enabled path
groups (the path group with the highest priority) with any paths in the ready state will be
made active. If there are no paths in the ready state in any of the enabled path groups,
then one of the disabled path groups (the path group with the highest priority) will be made
active. Making a new path group active is also referred to as switching of the path
group. The original active path group's state will be changed to enabled.
Failback

A failed path can become available again at any point in time. multipathd keeps checking
the path. Once it finds that the path is working again, it will change the state of the path
to ready. If this action makes the priority of one of the enabled path groups higher
than that of the current active path group, multipathd may choose to fail back to the
highest priority path group.
Failback Policy

Under failback situations multipathd can do one of the following three things:


1. immediate: Immediately failback to the highest priority path group.
2. # of seconds: Wait for the specified number of seconds, for I/O to stabilize, then
failback to the highest priority path group.
3. do nothing: Do nothing, user explicitly fails back to the highest priority path group.

This policy selection can be set by the user through /etc/multipath.conf.
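
The failover and failback rules above can be sketched in Python as follows. This is only an illustration that follows the wording of this section literally, not the logic of multipathd itself. All class and function names are hypothetical, and the policy labels "immediate" and "manual" plus an integer number of seconds are chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    priority: int
    state: str  # "ready", "faulty", "ghost" or "shaky"


@dataclass
class PathGroup:
    paths: list
    state: str  # "active", "enabled" or "disabled"

    @property
    def priority(self) -> int:
        # Sum of the priorities of all non-faulty paths.
        return sum(p.priority for p in self.paths if p.state != "faulty")

    def has_ready_path(self) -> bool:
        return any(p.state == "ready" for p in self.paths)


def failover_target(groups):
    """Pick the group to activate once the active group has only faulty paths."""
    enabled = [g for g in groups if g.state == "enabled" and g.has_ready_path()]
    if enabled:
        return max(enabled, key=lambda g: g.priority)
    disabled = [g for g in groups if g.state == "disabled"]
    return max(disabled, key=lambda g: g.priority) if disabled else None


def should_failback(groups, policy, seconds_since_recovery=0):
    """policy is "immediate", "manual", or a number of seconds to wait."""
    active = next(g for g in groups if g.state == "active")
    best = max(groups, key=lambda g: g.priority)
    if best is active or best.priority <= active.priority:
        return False  # nothing better than the current active group
    if policy == "manual":
        return False  # the user fails back explicitly
    if policy == "immediate":
        return True
    return seconds_since_recovery >= policy
```

For example, with an active group whose paths are all faulty and two enabled groups with ready paths, failover_target returns the enabled group with the higher priority. Once a path in the original group is ready again and that group's priority exceeds the active group's, should_failback returns True for "immediate" and False for "manual".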


Active/Active
Storage devices with two controllers can be configured in this mode. Active/Active means
that both the controllers can process I/Os.
Active/Passive
Storage devices with two controllers can be configured in this mode. Active/Passive means
that one of the controllers (active) can process I/Os, and the other one (passive) is in a
standby mode. I/Os to the passive controller will fail.
Alias

A user-friendly and/or user-defined name for a DM device. By default, the WWID is
used for the DM device. This is the name that is listed in the /dev/disk/by-name
directory. When the user_friendly_names configuration option is set, the
alias of a DM device will have the form mpath<n>. The user also has the option of
setting a unique alias for each multipath device.

2.3. Configuration File (/etc/multipath.conf)

DM-Multipath allows many of its features to be user configurable through the configuration
file /etc/multipath.conf. The multipath command and multipathd use the configuration
information from this file. This file is consulted only during the configuration of multipath
devices. In other words, if the user makes any changes to this file, then
the multipath command needs to be rerun to reconfigure the multipath devices (i.e. the user
has to run multipath -F followed by multipath).

Support for many devices (as listed below) is built into the user space component
of DM-Multipath. The user needs to modify this file only if support for a specific
storage device is not built in or if the user wants to override some of the values.

This file has 5 sections:

1. System level defaults ("defaults"): Where the user can specify system level
default override.
2. Black listed devices ("blacklist"): User can specify the list of devices they do not
want to be under the control of DM-Multipath. These devices will be excluded.
3. Black list exceptions ("blacklist_exceptions"): Specific devices to be treated as
multipath candidates even if they exist in the blacklist.
4. Storage controller specific settings ("devices"): User specified configuration
settings will be applied to devices with specified "Vendor" and "Product"
information.
5. Device specific settings ("multipaths"): User can fine tune configuration settings
for individual LUNs.

User can specify the values for the attributes in this file using regular expression syntax.

For a detailed explanation of the different attributes and their allowed values,
please refer to the multipath.conf.annotated file.

o In Mainline, this file is located in the root directory of multipath-tools.
o In Red Hat, this file is located in the directory
/usr/share/doc/device-mapper-multipath-X.Y.Z/.
o In SuSE, this file is located in the directory /usr/share/doc/packages/multipath-tools/.

2.3.1. Attribute value overrides

Attribute values are set at multiple levels (internally in multipath tools and through
multipath.conf file). Following is the order in which the attribute values will be
overwritten.

1. Global internal defaults, as specified in the man page of multipath.conf.
2. Device specific internal defaults, as defined in libmultipath/hwtable.c.
3. Items described in the defaults section of /etc/multipath.conf.
4. Items defined in the devices section of /etc/multipath.conf.
o Note that this will completely overwrite the configuration information defined
in (2) above. So, even if you want to change or add only one attribute, you
have to provide the whole list of attributes for the device.
5. Items defined in the multipaths section of /etc/multipath.conf.
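
The override order can be sketched as a series of dictionary merges. The Python below is a simplified model of the five levels listed above and does not reflect how multipath-tools stores its configuration. The function name and the sample attribute values are assumptions made for illustration.

```python
def effective_settings(global_defaults, hwtable_entry, conf_defaults,
                       conf_device=None, conf_multipath=None):
    """Merge the five configuration levels; later levels win."""
    settings = dict(global_defaults)          # level 1
    if conf_device is None:
        settings.update(hwtable_entry)        # level 2, built-in device entry
    settings.update(conf_defaults)            # level 3, "defaults" section
    if conf_device is not None:
        # Level 4 replaces the built-in entry as a whole: attributes that
        # were set only in level 2 are lost, not merged.
        settings.update(conf_device)
    if conf_multipath:
        settings.update(conf_multipath)       # level 5, "multipaths" section
    return settings


if __name__ == "__main__":
    # Illustrative values only.
    global_defaults = {"path_grouping_policy": "failover", "path_checker": "readsector0"}
    hwtable_entry = {"path_grouping_policy": "group_by_prio", "path_checker": "rdac"}

    # No device section: the built-in entry applies.
    print(effective_settings(global_defaults, hwtable_entry, {}))

    # A device section that sets only path_checker drops the built-in
    # path_grouping_policy, which falls back to the global default.
    print(effective_settings(global_defaults, hwtable_entry, {},
                             conf_device={"path_checker": "tur"}))
```

The second call shows why the whole attribute list has to be repeated in a device section: changing one attribute silently resets the others to the lower-level defaults.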

2.4. multipath, multipathd command usage

The man pages of multipath and multipathd provide good details on the usage of the tools.

multipathd has an interactive mode which can be used for querying and managing
the paths and also for checking the configuration details that will be used.

While multipathd is running, invoke it with the command
line multipathd -k. multipathd will enter a command line mode where the user can
invoke different commands. Check the man page for the available commands.

3. Supported Storage Devices

This is the list of devices that have configuration information built-in in the multipath
tools. Not being in this list does not mean that the specific device is not supported, it just
means that there is no built-in configuration in the multipath tools.

Some of the devices need a hardware handler, which needs to be compiled into the kernel.
A device being in this list does not necessarily mean that the hardware handler is present
in the kernel. It is also possible that the hardware handler is present in the kernel but the
device is not in the list of supported built-in devices.

6. Tips and Tricks

1. Using alias: By default, the multipathed devices are named with the uid of the device,
which one accesses through /dev/mapper/${uid_name}. When one uses
user_friendly_names, devices will be named mpath0, mpath1 etc., which may meet
one's needs. The user also has the option to define an alias in multipath.conf for each
device.

Syntax is:

multipaths {
    multipath {
        wwid 3600a0b800011a2be00001dfa46cf0620
        alias mydev1
    }
}

1. Persistent device names: The names (uid_names, mpath names or alias names) that
appear in /dev/mapper are persistent across boots, whereas the names dm-0, dm-1 etc. can
change between reboots. So, it is advisable to use the device names that appear
under /dev/mapper and avoid using the dm-? names.
2. Restart of tools after changing the multipath.conf file: Once the multipath.conf file is
changed, the multipath tools need to be rerun for the new configuration values to take
effect. One has to kill multipathd, run multipath -F and then restart multipathd
and multipath.

3. Devices with partitions: Create device partitions before running multipath, as
kpartx is configured to run to create multipathed partitions that way. Partitions on
device mpath0 appear as /dev/mapper/mpath0p1, /dev/mapper/mpath0p2, etc.

4. Using binding file in clustered environment: Bindings file holds the bindings
between the device mapper names and the uid of the underlying device. By
default the file is /var/lib/multipath/bindings, this can be changed by the multipath
command line option -b. In a clustered environment, this file can be created in
one node and can be transferred to another to get the same names.
Note that the same effect can also be achieved by using aliases and having the
same multipath.conf file on all the nodes of the cluster.
5. Getting the multipath device name corresponding to a SCSI device: If one knows
the name of a SCSI device and wants to get the device mapper name associated
with it, one can use multipath -l /dev/sda, where sda is the SCSI device. On
the other hand, if one knows the device mapper name and wants to know the
underlying device names, one can use the same command with the device
mapper name, i.e. multipath -l mpath0, where mpath0 is the device mapper
name.

6. When using LVM on dm-multipath devices, it is better to turn lvm scanning off on
the underlying SCSI devices. This can be done by changing the filter parameter
in /etc/lvm/lvm.conf to be filter = [ "a/dev/mapper/.*/", "r/dev/sd.*/" ].
If your root device is also a multipathed lvm device, then make the above change
before you create a new initrd image.
7. To find out whether your device (vendor/product) is supported by the tools by default, do
the following.

o In RHEL:
Make sure that multipathd is running. Then run

# multipathd -k
multipathd> show config

This command will list all the devices that are built into the tools.

o In SLES:

# multipath -t

This command will list all the devices that are built into the tools.

8. If you have more than 1024 paths, you need to set the configuration
parameter max_fds to a number equal to or greater than the number of paths +
32. Otherwise, the multipathd daemon might die with an error (in
/var/log/messages) saying that there are too many files open.
9. When multipath/multipathd starts, you might see messages like

device-mapper: table: 253:0: multipath: error getting device
device-mapper: ioctl: error adding target to table

on the console or in /var/log/messages. This is due to dm-multipath trying to create multipath
devices for your root device and/or other devices that are already mounted or opened.
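
To see what the LVM filter from the LVM tip above does, the following Python sketch models the filter rules under some stated assumptions: each entry starts with a (accept) or r (reject), the next character is the delimiter, the last character must be the same delimiter, the first matching pattern decides, and a device that matches no pattern is accepted. Python's re module stands in for LVM's own regular expression engine, and the function names are hypothetical. Real LVM also considers every alias of a device, which this sketch ignores.

```python
import re


def parse_filter(entries):
    """Turn entries such as "a/dev/mapper/.*/" into (accept, regex) rules."""
    rules = []
    for entry in entries:
        action, delimiter, body = entry[0], entry[1], entry[2:]
        if action not in ("a", "r") or not body.endswith(delimiter):
            raise ValueError(f"malformed filter entry: {entry!r}")
        # Drop the trailing delimiter; the rest is the (unanchored) pattern.
        rules.append((action == "a", re.compile(body[:-1])))
    return rules


def is_accepted(device, rules):
    """The first matching rule decides; devices matching no rule are accepted."""
    for accept, regex in rules:
        if regex.search(device):
            return accept
    return True


if __name__ == "__main__":
    rules = parse_filter(["a/dev/mapper/.*/", "r/dev/sd.*/"])
    for device in ("/dev/mapper/mpath0", "/dev/sda"):
        print(device, "accepted" if is_accepted(device, rules) else "rejected")
```

Under these assumptions the multipath device under /dev/mapper is scanned, while the underlying SCSI device /dev/sda is skipped.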
