IBM ESS 5000-Quick Deployment Guide
6.1.1
IBM
SC28-3301-01
Note
Before using this information and the product it supports, read the information in “Notices” on page
95.
This edition applies to version 6 release 1 modification 1 of the following product and to all subsequent releases and
modifications until otherwise indicated in new editions:
• IBM Spectrum® Scale Data Management Edition for IBM® ESS (product number 5765-DME)
• IBM Spectrum Scale Data Access Edition for IBM ESS (product number 5765-DAE)
IBM welcomes your comments; see the topic “How to submit your comments” on page xii. When you send information
to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without
incurring any obligation to you.
© Copyright International Business Machines Corporation 2020, 2021.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents

Figures
Tables
Appendix B. Configuring call home in ESS 5000, ESS 3000, ESS 3200, and ESS Legacy
   Disk call home for ESS 5000, ESS 3000, ESS 3200, and ESS Legacy
   Installing the IBM Electronic Service Agent
      Login, activation, and configuration of ESA
      Configuring only hardware call home and skipping software call home configuration
      ESS call home logs and location
      Overview of a problem report
   Uninstalling and reinstalling the IBM Electronic Service Agent
   Test call home
   Post setup activities
   essinstallcheck enhancement of software and hardware call home
Appendix F. Upgrading the POWER9 firmware
Appendix H. ESS protocol node deployment by using the IBM Spectrum Scale installation toolkit
Appendix I. Sample scenario: ESS 3000 and ESS 5000 mixed cluster and file system
Notices
   Trademarks
Glossary
Index

Figures
7. Problem Description

Tables
1. Conventions
About this information
IBM Elastic Storage System (ESS) 5000 documentation consists of the following information units.
Related information
For information about:
• IBM Spectrum Scale, see:
http://www.ibm.com/support/knowledgecenter/STXKQY/ibmspectrumscale_welcome.html
• mmvdisk command, see mmvdisk documentation.
• Mellanox OFED (MLNX_OFED v4.9-2.2.4.0) Release Notes, go to https://docs.mellanox.com/display/MLNXOFEDv492240/Release%20Notes
• IBM Spectrum Scale call home, see Understanding call home.
• Installing IBM Spectrum Scale and CES protocols with the installation toolkit, see Installing IBM
Spectrum Scale on Linux® nodes with the installation toolkit.
• Detailed information about the IBM Spectrum Scale installation toolkit, see Using the installation toolkit
to perform installation tasks: Explanations and examples.
• CES HDFS, see Adding CES HDFS nodes into the centralized file system.
• Installation toolkit ESS support, see ESS awareness with the installation toolkit.
Table 1. Conventions

bold
   Bold words or characters represent system elements that you must use literally, such as commands, flags, values, and selected menu options. Depending on the context, bold typeface sometimes represents path names, directories, or file names.

bold underlined
   Bold underlined keywords are defaults. These take effect if you do not specify a different keyword.

constant width
   Examples and information that the system displays appear in constant-width typeface. Depending on the context, constant-width typeface sometimes represents path names, directories, or file names.

italic
   Italic words or characters represent variable values that you must supply. Italics are also used for information unit titles, for the first use of a glossary term, and for general emphasis in text.

<key>
   Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.

\
   In command examples, a backslash indicates that the command or coding example continues on the next line.

{item}
   Braces enclose a list from which you must choose an item in format and syntax descriptions.

[item]
   Brackets enclose optional items in format and syntax descriptions.

<Ctrl-x>
   The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.

item...
   Ellipses indicate that you can repeat the preceding item one or more times.

|
   In synopsis statements, vertical lines separate a list of choices. In other words, a vertical line means Or. In the left margin of the document, vertical lines indicate technical changes to the information.
Prerequisites
• This document (ESS Software Quick Deployment Guide)
• SSR completes physical hardware installation and code 20.
There should be one node class per ESS Legacy building-block. If the command output does not
show mmvdisk for your ESS Legacy nodes, convert to mmvdisk before running the ESS Legacy
6.1.0.x container.
2. Convert to mmvdisk by running the following command from one of the POWER8 IO nodes or from
the POWER8 EMS node.
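A minimal sketch of the conversion, assuming a node class named ESS01 and recovery groups named rg_essio1 and rg_essio2 (verify the actual names first with mmvdisk nodeclass list and mmvdisk recoverygroup list); the names on your system will differ:

mmvdisk recoverygroup convert --recovery-group rg_essio1,rg_essio2 --node-class ESS01

After the conversion completes, rerun mmvdisk nodeclass list and confirm that the ESS Legacy building block now appears as an mmvdisk node class.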
You can check if an ESS or IBM Spectrum Scale RPM is signed by IBM as follows.
1. Import the PGP key.
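For example, a sketch in which the key file path and the package name are placeholders for the files shipped with your release:

rpm --import /path/to/IBM-SpectrumScale-public-key
rpm -K gpfs.base-*.rpm

A correctly signed package reports its digests and signature as OK in the rpm -K output.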
Code version
There are three different releases in ESS 6.1.1.x, each with two editions: Data Management Edition and
Data Access Edition. Example package names are as follows:
// Legacy
ess_legacy_6.1.1.0_0524-19_dme_ppc64le.tgz
ess_legacy_6.1.1.0_0524-19_dae_ppc64le.tgz
// ESS 5000
ess5000_6.1.1.0_0524-18_dme_ppc64le.tgz
ess5000_6.1.1.0_0524-18_dae_ppc64le.tgz
// ESS 3000
ess3000_6.1.0.0_0526-02_dme_ppc64le.tgz
ess3000_6.1.0.0_0526-02_dae_ppc64le.tgz
Note: The versions shown here might not be the GA versions available on IBM Fix Central. It is recommended to go to IBM Fix Central and download the latest code.
POWER8 considerations
If you are moving from an xCAT-based release (5.3.x) to a container-based release (6.1.x.x), the following considerations apply:
• You must add an additional management network connection to C10-T2.
• A public or additional management connection is mandatory in C10-T3.
• You must stop and uninstall xCAT before installing the container (see the sketch after this list):
a. Stop the xCAT service.
b. Uninstall xCAT.
c. Remove dependencies.
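A minimal sketch of this cleanup on the POWER8 EMS, assuming xCAT was installed as the xCAT meta-package through yum; verify the package names actually installed on your system before removing anything:

systemctl stop xcatd
systemctl disable xcatd
yum remove xCAT
yum autoremove

Review the package list that yum autoremove proposes before confirming, so that only leftover xCAT dependencies are removed.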
Other notes
• The following tasks must be complete before starting a new installation (tasks done by manufacturing
and the SSR):
– SSR has ensured all hardware is clean, and IP addresses are set and pinging over the proper
networks (through the code 20 operation).
– /etc/hosts is blank
– The ESS tgz file (for the correct edition) is in the /home/deploy directory. If upgrade is needed,
download from Fix Central and replace.
– Network bridges are cleared.
– Images and containers are removed.
– SSH keys are cleaned up and regenerated.
– All code levels are at the latest at time of manufacturing ship.
• Customer must make sure the high-speed connections are cabled and the switch is ready before
starting.
If the hostid on any node is not unique, you must fix it by running genhostid. These steps must be done when creating a recovery group in a stretch cluster.
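For example, to check the host ID on a node and regenerate it if it is not unique:

hostid
genhostid
hostid

The second hostid call confirms that a new value was generated.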
• Consider placing your protocol nodes in file system maintenance mode before upgrades. This is not a
requirement but you should strongly consider doing it. For more information, see File system
maintenance mode.
• Do not try to update the EMS node while you are logged in over the high-speed network. Update the
EMS node only through the management or the campus connection.
• After adding an I/O node to the cluster, run the gnrhealthcheck command to ensure that there are no issues, such as duplicate host IDs, before creating vdisk sets. Duplicate host IDs cause issues in the ESS environment.
Support matrix

ESS 3200 6.1.1.0
   OS: Red Hat Enterprise Linux 8.2 (PPC64LE)
   Runs on: POWER9 EMS
   Can upgrade or deploy: ESS 3200 nodes, POWER9 EMS, POWER9 protocol nodes

ESS 3000 6.1.1.0
   OS: Red Hat Enterprise Linux 7.9 (PPC64LE), Red Hat Enterprise Linux 8.2 (x86_64)
   Runs on: POWER8 EMS, POWER9 EMS
   Can upgrade or deploy: ESS 3000 nodes, POWER8 EMS, POWER9 EMS, POWER9 protocol nodes

ESS 5000 6.1.1.0
   OS: Red Hat Enterprise Linux 7.9 (PPC64LE)
   Runs on: POWER9 EMS
   Can upgrade or deploy: ESS 5000 nodes, POWER9 EMS, POWER9 protocol nodes

ESS Legacy 6.1.1.0
   OS: Red Hat Enterprise Linux 8.2 (PPC64LE), Red Hat Enterprise Linux 7.9 (PPC64LE)
   Runs on: POWER8 EMS, POWER9 EMS
   Can upgrade or deploy: ESS POWER8 I/O nodes (PPC64LE), ESS POWER8 protocol nodes (PPC64LE), POWER8 EMS, POWER9 EMS
Note: You must convert to mmvdisk before using ESS Legacy 6.1.1.x.
The following common instructions need to be run for a new installation or an upgrade of an ESS system.
These instructions are based on steps required for a POWER9 EMS. Important POWER8 notes are
outlined where needed. The following build is used for example purposes.
ess5000_6.1.1.0_0524-18_dme_ppc64le.tgz
Note: If you have protocol nodes, add them to the commands provided in these instructions. The
default /etc/hosts file has host names prt1 and prt2 for protocol nodes. You might have more than
two protocol nodes.
1. Log in to the EMS node by using the management IP (set up by SSR by using the provided worksheet).
The default password is ibmesscluster.
2. Set up a campus or a public connection (interface enP1p8s0f2). Connect an Ethernet cable from C11-T3 on the EMS node to your lab network. This connection serves as a way to access the GUI or the ESA agent (call home) from outside of the management network. The container creates a bridge to the management network, so having a campus connection is highly advised.
Note: It is recommended but not mandatory to set up a campus or public connection. If you do not set
up a campus or a public connection, you will temporarily lose your connection when the container
bridge is created in a later step.
This method is for configuring the campus network, not any other network in the EMS node. Do not
modify T1, T2, or T4 connections in the system after they are set by SSR, and use the SSR method only
to configure T1 and T2 (if changing is mandatory after SSR is finished). That includes renaming the
interface, setting IP, or any other interaction with those interfaces.
You can use the nmtui command to set the IP address of the campus interface. For more information,
see Configuring IP networking with nmtui.
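As an alternative to nmtui, a sketch using nmcli, assuming the campus connection is also named enP1p8s0f2 and using placeholder addresses; substitute the connection name, IP address, gateway, and DNS server for your environment:

nmcli connection modify enP1p8s0f2 ipv4.method manual ipv4.addresses 192.0.2.10/24 ipv4.gateway 192.0.2.1 ipv4.dns 192.0.2.1
nmcli connection up enP1p8s0f2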
3. Complete the /etc/hosts file on the EMS node. This file must contain the low-speed (management) and high-speed (cluster) IP addresses, FQDNs, and short names. Each high-speed name must be the corresponding low-speed name with a suffix appended (for example, essio1-hs (high-speed name) for essio1 (low-speed name)). This file must also contain the container host name and IP address.
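A minimal sketch of such an /etc/hosts layout, using the example host names from this document (ems1, essio1, essio2, and the container cems0) and placeholder addresses; substitute the addresses, domain, and names for your environment, and add protocol nodes (prt1, prt2) following the same pattern:

# Management (low-speed) network
192.168.45.20   ems1.example.com       ems1
192.168.45.21   essio1.example.com     essio1
192.168.45.22   essio2.example.com     essio2
# Container
192.168.45.80   cems0.example.com      cems0
# High-speed (cluster) network
10.0.11.1       essio1-hs.example.com  essio1-hs
10.0.11.2       essio2-hs.example.com  essio2-hs
10.0.11.10      ems1-hs.example.com    ems1-hs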
Note:
i) Check for any existing containers, images, and network connections:

podman ps -a
podman images
nmcli c

ii) Clean up any existing bridges before the new container is set up. The bridge names must be mgmt_bridge and fsp_bridge.
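For example, if bridges are left over from a previous container setup, a sketch of the cleanup with nmcli (the connection names are the bridge names given above):

nmcli connection show
nmcli connection delete mgmt_bridge
nmcli connection delete fsp_bridge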
• Stop the GUI temporarily until upgrade or conversion from xCAT deployment to container is
complete.
cd /home/deploy
ess5000_6.1.1.0_0524-18_dme_ppc64le.sh
ess5000_6.1.1.0_0524-18_dme_ppc64le.sh.sha256
./ess5000_6.1.1.0_0524-18_dme_ppc64le.sh --start-container
During this step, you are first prompted to accept the license agreement. Press 1 to accept. You are
then prompted to input answers to 3 questions before the installation starts (2 questions for ESS
3000).
• Confirm or set EMS FQDN.
• Provide the container short name.
• Provide a free IP address on the FSP subnet for the container FSP connection. (Not applicable to ESS
3000)
Example of contents of the extracted installation package:
ess5000_6.1.1.0_0524-18_dme_ppc64le.dir/
ess5000_6.1.1.0_0524-18_dme.dir/ess5000_6.1.1.0_0524-18_dme_ppc64le.tar
ess5000_6.1.1.0_0524-18_dme.dir/ess5000_6.1.1.0_0524-18_dme_ppc64le_binaries.iso
ess5000_6.1.1.0_0524-18_dme.dir/rhel-8.2-server-ppc64le.iso
ess5000_6.1.1.0_0524-18_dme.dir/podman_rh7.tgz
ess5000_6.1.1.0_0524-18_dme.dir/podman_rh8.tgz
ess5000_6.1.1.0_0524-18_dme.dir/Release_note.ess5000_6.1.1.0_0524-18_dme_ppc64le.txt
ess5000_6.1.1.0_0524-18_dme.dir/python3-site-packages_rh7.tgz
ess5000_6.1.1.0_0524-18_dme.dir/python3_rh7.tgz
ess5000_6.1.1.0_0524-18_dme.dir/essmkyml
ess5000_6.1.1.0_0524-18_dme.dir/essmgr
ess5000_6.1.1.0_0524-18_dme.dir/essmgr_p8.yml
ess5000_6.1.1.0_0524-18_dme.dir/essmgr_p9.yml
ess5000_6.1.1.0_0524-18_dme.dir/data/
ess5000_6.1.1.0_0524-18_dme.dir/classes/
ess5000_6.1.1.0_0524-18_dme.dir/logs/
ess5000_6.1.1.0_0524-18_dme.dir/essmgr.yml
Please type the desired and resolvable short hostname [ess5k-cems0]: cems0
Note: The values in brackets ([ ]) are just examples or the last entered values.
If all of the checks pass, the essmgr.yml file is written and you can proceed to bridge creation, if
applicable, and running the container.
Note: If you are deploying ESS 3000, you are not prompted to answer the FSP IP question.
At this point, if all checks are successful, the image is loaded and container is started. Example:
8. Run the essrun config load command. This command determines the node information based on VPD and also exchanges the SSH keys.
Note:
• Always include the EMS in this command along with all nodes of the same type in the building-
blocks.
• Use the low-speed management host names. Specify the root password with -p.
• The password (-p) is the root password of the node. By default, it is ibmesscluster. Consider
changing the root password after deployment is complete.
After this command is run, you can use -G for future essrun steps (For example, -G ess_ppc64le).
There are different node group names for ESS 3000 and ESS Legacy.
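For example, a sketch of the command form as run from inside the container, assuming the example host names used in this document and the default root password; substitute your own node names and password:

essrun -N ems1,essio1,essio2 config load -p ibmesscluster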
9. Run the essrun config check command. This command performs a check of the various nodes, looking for potential issues before upgrade. Review the output carefully and make changes as needed before proceeding.
Note: The password (-p) is the root password of the node. By default, it is ibmesscluster. Consider
changing the root password after deployment is complete.
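For example, a sketch using the node group created by the config load step, assuming an ESS 5000 (PPC64LE) environment and the default root password:

essrun -G ess_ppc64le config check -p ibmesscluster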
Before starting with these steps, you must complete the steps in Chapter 3, “ESS Common installation
instructions,” on page 11.
The following steps are covered in this topic:
• Upgrading the EMS and IO nodes, if required.
• Creating network bonds.
• Creating the cluster.
• Adding the EMS node to the cluster.
• Creating the file system.
• Configuring performance monitoring and starting the GUI.
• Setting up call home.
• Setting up time server.
• Final health checks.
Note: You can update by using the management node names or, after the config load is run, by using a group of nodes. The groups are as follows:
• PPC64LE - ESS 5000 and ESS Legacy: ess_ppc64le
• x86_64 - ESS 3000: ess_x86_64
When the group is referenced in these instructions, ess_ppc64le is used as an example. If you are in an
ESS 3000 environment, use ess_x86_64.
For the EMS node, you can use the group ems.
At this point, the user has already determined if an upgrade is required. If the version initially found in /home/deploy on the EMS node is earlier than the latest available on IBM Fix Central, the latest version should be already downloaded and deployed according to Chapter 3, “ESS Common installation instructions,” on page 11.
1. If an upgrade is required, upgrade the EMS node.
Please enter 'accept' indicating that you want to update the following list of nodes: ems1
>>> accept
Note:
• If the kernel is changed, you are prompted to leave the container, reboot the EMS node, restart the
container, and run this command again.
For example:
Navigate back to ESS 6.1.0.x extracted directory and run the following commands:
./essmgr -r
essrun -N ems1 update --offline
• You cannot upgrade a POWER8 EMS currently running ESS Legacy code (5.3.x with xCAT control) from an ESS 3000 container. If xCAT is installed on the host, you must first uninstall it and clean up any dependencies before attempting an EMS upgrade from the container. Do not remove xCAT unless a container-based ESS Legacy deployment is needed, which is typically only the case if you are moving to the ESS Legacy 6.1.1.x container. If you are still using an ESS Legacy deployment (5.3.x), update the EMS by using the upgrade instructions outlined in the ESS 5.3.x Quick Deployment Guide.
ssh essio1
ESSENV=TEST essnettest -N essio1,essio2 --suffix=-hs
This command performs the test, with an optional RDMA test afterward if InfiniBand is present. Ensure that there are no errors in the output indicating that dropped packets have exceeded thresholds. When the test is completed, type exit to return to the container.
5. Create the cluster.
Note:
• By default, this command attempts to use all the available space. If you need to create multiple file systems or a CES shared root file system for protocol nodes, consider using less space.
• This step creates combined metadata + data vdisk sets by using a default RAID code and block size.
You can use additional flags to customize or use the mmvdisk command directly for advanced
configurations.
• If you are updating ESS 3000, the default set-size is 80% and it must not be increased. For
additional options, see essrun command. The default block size for PPC64LE is 16M whereas for ESS
3000 it is 4M.
• If you are deploying protocol nodes, make sure that you leave space for CES shared root file system.
Adjust the set-size slightly lower when you are creating this required file system for protocol nodes.
2. From the EMS node (outside of the container), configure and start the performance monitoring
sensors.
The recommended period is 86400 so that the collection is done once per day.
b. To restrict GPFS Fileset Quota to run on the management server node only, run the following
command.
Here the EMSNodeName must be the name shown in the mmlscluster output.
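The exact commands are not reproduced here; a sketch of the corresponding mmperfmon settings, assuming the EMS node is named ems1 in the mmlscluster output and using the sensor names referenced in this section:

mmperfmon config update GPFSDiskCap.period=86400
mmperfmon config update GPFSDiskCap.restrict=ems1
mmperfmon config update GPFSFilesetQuota.restrict=ems1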
Note: To enable quota, file system quota checking must be enabled. Refer to the mmchfs -Q and mmcheckquota commands in IBM Spectrum Scale: Command and Programming Reference.
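For example, a sketch assuming a file system named fs1:

mmchfs fs1 -Q yes
mmcheckquota fs1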
4. Verify that the values are set correctly in the performance monitoring configuration by running the
mmperfmon config show command on the EMS node. Ensure that GPFSDiskCap.period is
properly set, and GPFSFilesetQuota and GPFSDiskCap are both restricted to the EMS only.
Note: If you are moving from manual configuration to auto configuration then all sensors are set to
default. Make the necessary changes using the mmperfmon command to customize your environment
accordingly. For information on how to configure various sensors using mmperfmon, see Manually
installing IBM Spectrum Scale GUI.
5. Start the performance collector on the EMS node.
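For example, a sketch assuming the collector runs as the pmcollector systemd service on the EMS node:

systemctl enable pmcollector
systemctl start pmcollector
systemctl status pmcollector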
b. In a web browser, enter the public or campus IP address with https and walk through the System
Setup wizard instructions.
7. Log in to each node and run the following command.
essinstallcheck -N localhost
Doing this step verifies that all software and cluster versions are up-to-date.
8. From the EMS node, outside of the container, run the following final health check commands to verify
your system health.
gnrhealthcheck
mmhealth node show -a
Warning: You must have a clean and healthy system before starting any ESS upgrade (online or offline). At a minimum, the following commands must run free of errors when run on any node outside of the container:
gnrhealthcheck
mmhealth node show -a
You can also run the essrun healthcheck command instead, from inside the container.
mmgetstate -N NodeClass
Where NodeClass is your ESS 3000, ESS 5000, or ESS Legacy node class. For more information, see
mmlsnodeclass command.
Offline upgrade assumptions (EMS or protocol nodes only):
• You assume the risks of potential quorum loss.
• The GPFS GUI and collector must be down.
Note: Before upgrading the protocol nodes, consult the IBM Spectrum Scale toolkit documentation. You
might need to shut down services on a given protocol node before the upgrade can start.
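For example, a sketch of suspending a protocol node and stopping its CES services before the upgrade, assuming a protocol node named prt1 with NFS and SMB enabled; adjust to the services that are actually enabled in your cluster:

mmces node suspend -N prt1
mmces service stop NFS -N prt1
mmces service stop SMB -N prt1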
1. Complete the steps in Chapter 3, “ESS Common installation instructions,” on page 11. Make sure that
you add the protocol nodes to the configuration load if you are planning to upgrade protocol nodes.
2. Update the EMS node first.
After the reboot and restarting the container, run the EMS node update again.
Note: You cannot upgrade a POWER8 EMS currently running ESS Legacy code (5.3.x with xCAT control) from an ESS 3000 container. If xCAT is installed on the host, you must first uninstall it and clean up any dependencies before attempting an EMS upgrade from the container. Do not remove xCAT unless a container-based ESS Legacy deployment is needed, which is typically only the case if you are moving to the ESS Legacy 6.1.0.x container. If you are still using an ESS Legacy deployment (5.3.x), update the EMS by using the upgrade instructions outlined in the ESS 5.3.x Quick Deployment Guide.
3. Update the protocol nodes.
4. Run installation check on each node type by logging in to EMS node and protocol nodes.
essinstallcheck
These command examples show ESS 5000 node and node classes, but you can use these commands
with ESS 3000 and ESS Legacy nodes and node classes as well.
After offline update is done, proceed to starting GPFS on the nodes.
6. Run installation check on each node from outside the container.
essinstallcheck
Note: If any protocol nodes are updated, ensure that you restart CES services on those nodes.
essinstallcheck
3. Change the autoload parameter to enable GPFS to automatically start on all nodes.
mmchconfig autoload=yes
gnrhealthcheck
mmhealth node show -a
# vi /opt/ibm/esa/runtime/conf/javaHome.sh
# cat /opt/ibm/esa/runtime/conf/javaHome.sh
JAVA_HOME=/opt/ibm/java-ppc64le-80/jre
# /opt/ibm/esa/bin/activator -C -p 5024 -w -Y
Issue: During a rolling upgrade (updating one ESS I/O node at a time while maintaining quorum), mmhealth might show the error local_exported_fs_unavail even though the file system is still mounted.
Resolution: The message is erroneous and can be ignored.
Applies to: ESS 3000, ESS 5000

Issue: When running the essrun config load command, you might see a failure such as this:
   stderr: |-
   rc=2 code=186
   Failed to obtain the enclosure device
Resolution: This failure means that the pems module is not running on the canister. To fix this, log in to the failed canister and run the following commands:
   cd /install/ess/otherpkgs/rhels8/x86_64/gpfs
   yum reinstall gpfs.ess.platform.ess3k*
   After the reinstall, the loaded module list should show entries similar to:
   pemsmod 188416 0
   scsi_transport_sas 45056 1 pemsmod
Applies to: ESS 3000

Issue: After a reboot of an ESS 5000 node, systemd could be loaded incorrectly. Users might see the following error when trying to start GPFS:
   Failed to activate service 'org.freedesktop.systemd1': timed out
Resolution: Power off the system and then power it on again.
   1. Run the following command from the container:
      rpower NodeName off
   2. Wait for at least 30 seconds and run the following command to verify that the system is off:
      rpower NodeName status
   3. Power the system back on:
      rpower NodeName on
Applies to: ESS 5000

Issue: In the ESS 5000 SLx series, after pulling a hard drive out for a long time, wherein the drive has finished draining, the drive cannot be recovered when you re-insert it.
Resolution: Run the following command from the EMS or an I/O node to revive the drive:
   mmvdisk pdisk change --rg RGName --pdisk PdiskName --revive
   Where RGName is the recovery group that the drive belongs to and PdiskName is the drive's pdisk name.
Applies to: ESS 5000

Issue: After the deployment is complete, if the firmware on the enclosure, drive, or HBA adapter does not match the expected level, running essinstallcheck reports an error about mmvdisk settings.
Resolution: The error about mmvdisk settings can be ignored. The resolution is to update the mismatched firmware levels on the enclosure, drive, or HBA adapters to the correct levels. You can run the mmvdisk configuration check command to confirm. List the mmvdisk node classes: mmvdisk nc list
Applies to: ESS 3000, ESS 5000

Issue: When running essinstallcheck, you might see an error message about the firmware level.
Resolution: Rerun essinstallcheck, which should properly query the firmware level.
Applies to: ESS 5000

Issue: When running the essrun -N Node healthcheck command, the essinstallcheck script might fail due to incorrect error verification, which might give the impression that there is a problem where there is none.
Resolution: This health check command (essrun -N Node healthcheck) is removed from the ESS documentation and it is advised to use the manual commands to verify system health after deployment. Run the following commands for a health check:
   gnrhealthcheck
   mmhealth node show -a
   essinstallcheck -N localhost (this command needs to be run on each node)
Applies to: ESS 3000, ESS 5000

Issue: During command-less disk replacement, there is a limit on how many disks can be replaced at one time.
Resolution: For command-less disk replacement, replace only up to 2 disks at a time. If command-less disk replacement is enabled and more than 2 disks are replaceable, replace the first 2 disks, and then use the commands to replace the 3rd and subsequent disks.
Applies to: ESS 3000, ESS 5000

Issue: Issue reported with command-less disk replacement warning LEDs.
Resolution: The replaceable disk has the amber LED on, but not blinking. Disk replacement should still succeed.
Applies to: ESS 5000

Issue: After upgrading an ESS 3000 node to version 6.1.0.x, the pmsensors service needs to be manually started.
Resolution: After the ESS 3000 upgrade is complete, the pmsensors service does not automatically start. You must manually start the service on each ESS 3000 canister (for example, with systemctl start pmsensors) for performance monitoring to be restored.
Applies to: ESS 3000

Issue: ESS commands such as essstoragequickcheck and essinstallcheck must be run by using -N localhost. If a host name is used, such as -N ess3k1a, an error occurs.
Resolution: There is currently an issue with running the ESS deployment commands by using the host name of a node. The workaround is to run the checks locally on each node by using localhost. For example, instead of using essstoragequickcheck -N ess3k1a, use the following command:
   essstoragequickcheck -N localhost
Applies to: ESS 3000, ESS 5000

Before change:
set default_kernelopts="root=UUID=9a4a93b8-2e6b-4ba6-bda4-a7f8c3cb908f ro nvme.sgl_threshold=0 sshd=1 pcie_ports=native nohup resume=UUID=c939121b-526a-4d44-8d33-693f2fb7f018 rd.md.uuid=f6dbf6f2:8ac82ed6:875ca663:0094ac11 rd.md.uuid=06c2d5b0:c6603a1e:5df4b4d3:98fd5adc rhgb quiet crashkernel=4096M"

After change:
set default_kernelopts="root=UUID=9a4a93b8-2e6b-4ba6-bda4-a7f8c3cb908f ro nvme.sgl_threshold=0 sshd=1 pcie_ports=native nosmt resume=UUID=c939121b-526a-4d44-8d33-693f2fb7f018 rd.md.uuid=f6dbf6f2:8ac82ed6:875ca663:0094ac11 rd.md.uuid=06c2d5b0:c6603a1e:5df4b4d3:98fd5adc rhgb quiet crashkernel=4096M"

(The only difference between the two settings is that nohup is replaced with nosmt.)

Issue: The GUI Hardware page displays the FN1 enclosure state as failed, for an unidentified reason.
Resolution: There is currently no workaround for this issue.
Applies to: ESS Legacy

Issue: Hardware details: all endpoints are not visible.
Resolution: There is currently no workaround for this issue.
Applies to: ESS Legacy

Issue: [ERROR] Network adapter MT4115 firmware: found 12.28.2006 expected 12.27.2008
Resolution: Ignore this message. The correct version is 12.28.2006.
Applies to: ESS Legacy

Issue: In an existing cluster with quorum nodes not exceeding 7 nodes, the addition of any new nodes fails irrespective of the firmware level.
Resolution: This is not considered a problem, thus no workaround is needed.
Applies to: ESS 3000, ESS 5000, ESS Legacy

Issue: When you run mmlsenclosure all on I/O nodes, the following message gets displayed only for
Resolution: There is currently no workaround for this issue.
Applies to: ESS Legacy

Issue: When enabling security, the following error occurs:
   ERROR! We were unable to read either as JSON nor YAML, these are the errors we got from each:
   JSON: Expecting value: line 1 column 1 (char 0)
Resolution: To enable the security tool, change line 130 in the /opt/ibm/ess/deploy/ansible/roles/security/tasks/securitydisable.yml file as follows. Replace
   when: not portmapperdown.stat.exists
   with
   when: portmapperdown.stat.exists != True
   Note: There are two spaces before when:
Applies to: ESS 3000, ESS 5000, ESS Legacy
Disk call home for ESS 5000, ESS 3000, ESS 3200, and ESS Legacy
The IBM Spectrum Scale RAID pdisk is an abstraction of a physical disk. A pdisk corresponds to exactly
one physical disk, and belongs to exactly one de-clustered array within exactly one recovery group.
[Figure: ESS call home to IBM Electronic Service Agent (ESA). Disk replace events detected by GNR core on the EMS node and on the I/O server nodes or canisters (ESS 5000, ESS 3000, or ESS 3200), all running RHEL, are forwarded by callback scripts to the ESA on the EMS node.]
esagent.pLinux-4.5.5-1.noarch.rpm
The RPM should be installed during manufacturing. If it is not installed, issue the following commands:
cd /install/ess/otherpkgs/rhels8/ppc64le/ess/
yum install esagent.pLinux-4.5.5-1.noarch.rpm
https://<EMS or ip>:5024/esa
For example:
https://192.168.45.20:5024/esa
ESA uses port 5024 by default. It can be changed by using the ESA CLI if needed. For more information on
ESA, see IBM Electronic Service Agent. On the Welcome page, log in to the IBM Electronic Service Agent
GUI. If an untrusted site certificate warning is received, accept the certificate or click Yes to proceed to
the IBM Electronic Service Agent GUI. You can get context-sensitive help by selecting the Help option located in the upper right corner.
The All Systems menu option shows the node where ESA is installed. For example, ems1. The node
where ESA is installed is shown as PrimarySystem in the System Info. The ESA status is shown as
Online only on the PrimarySystem node in the System Info tab.
Note: ESA is not activated by default. In case it is not activated, you will get a message similar to the
following message:
The esscallhomeconf command has a new switch called --esa-config. Earlier, users could activate ESA by using the /opt/ibm/esa/bin/activator -C command and defer the ESA configuration to the ESA GUI (the method of activating ESA before 6.x). However, with the introduction of --esa-config and its supporting switches, the activation and configuration of ESA can also be done by using the esscallhomeconf command.
This switch can be used to activate and configure ESA by using the CLI with the required customer information such as the customer name, email ID, server location, and so on. Earlier, the user provided this information when activating ESA by using the ESA GUI.
The usage information of esscallhomeconf is as follows.
[-b CUSTOMER_INFO_B] [-s CUSTOMER_INFO_S]
[-i CUSTOMER_INFO_I] [-p CUSTOMER_INFO_P]
[-w] [-Y]
optional arguments:
-h, --help show this help message and exit
-E ESA-AGENT Provide nodename for esa agent node
--prefix PREFIX Provide hostname prefix. Use = between --prefix and
value if the value starts with -.
--suffix SUFFIX Provide hostname suffix. Use = between --suffix and
value if the value starts with -.
--verbose Provide verbose output
--esa-hostname-fqdn ESA_HOSTNAME_FQDN
Fully qualified domain name of ESA server for
certificate validation.
--stop-auto-event-report
Stop report of automatic event to ESA in case of any
hardware call home event reported to system.
-N NODE-LIST Provide a list of nodes to configure.
--show Show call home configuration details.
--register {node,all}
Register endpoints(nodes, enclosure or all) with ESA.
--no-swcallhome Do not configure software callhome while configuring
hardware callhome
--icn ICN Provide IBM Customer Number for Software callhome.
--serial SOLN-SERIAL Provide ESS solution serial number.
--model SOLN-MODEL Provide ESS model. Applicable only for BE (ppc64)
models.
--proxy-ip PROXY-HOSTNAME
Provides the IP address or the hostname for the proxy configuration.
--proxy-port PROXY-PORT
Provides the port number for the proxy configuration.
--proxy-userid PROXY-USERNAME
Provides the user ID for the proxy configuration.
--proxy-password PROXY-PASSWORD
Provides the password for the proxy configuration.
There are several switches which start with ESA_CONFIG that can be used with the --esa-config
switch of the esscallhomeconf command to activate ESA by using the CLI instead of using the ESA GUI
and activating it.
Entities or systems that can generate events are called endpoints. The EMS, I/O server nodes, and attached enclosures can be endpoints in ESS. Server and enclosure endpoints can generate events. Servers can generate hardware events related to, for example, the CPU, DIMMs, or OS disk. Typically, these events are also logged in the OPAL log. Enclosure-generated call home occurs mostly during disk replacement events.
In ESS, ESA is only installed on the EMS node, and it automatically discovers the EMS as
PrimarySystem. The EMS node and I/O server nodes must be registered to ESA as endpoints.
After running this single command, ESA is activated and configured, and the nodes are registered along
with the enclosures. Software call home is also set up with the same command.
Configuring only hardware call home and skipping software call home
configuration
The software call home feature collects files, logs, traces, and details of certain system health events from
different nodes and services in an IBM Spectrum Scale cluster.
These details are shared with the IBM support center for monitoring and problem determination. For
more information on call home, see Installing call home and Understanding call home.
You can configure hardware call home only by using the esscallhomeconf command. Use the --no-swcallhome option to set up just the hardware call home and skip the software call home setup.
-y Location -a State -z PostalCode -b Building
-f UserName -j username@example.com -k ContactNum -p 5024 -w -Y -i username@example.com --esa-
config
Attention: For ESS 3000 or ESS 3200 systems, do not use the --no-swcallhome switch
otherwise the ESS 3000 or the ESS 3200 hardware call home will also not function.
Note: For ESS 5000 or ESS Legacy systems, you can skip the software call home functionality by using the
--no-swcallhome switch if you do not want to use the software call home function.
Note: If a software-only call home configuration was done earlier by using the mmcallhome command or esscallhomeconf, a subsequent run of esscallhomeconf with --no-swcallhome does not clear that earlier software call home configuration. An existing software call home configuration on an ESS cluster can be overwritten only if the esscallhomeconf command is run on the node without the --no-swcallhome switch. If you use --no-swcallhome with the esscallhomeconf command and keep an older software call home configuration, it might cause issues or affect the ESS 3000 or the ESS 3200 hardware call home functionality. Therefore, in any case it is advisable to clean up or overwrite an earlier software call home configuration before configuring the hardware and software call home function on ESS nodes by using the esscallhomeconf command.
Note: Rerunning esscallhomeconf checks whether ESA was already activated and configured. If it was, the ESA configuration must be cleared by using /opt/ibm/esa/bin/unconfig.sh before rerunning esscallhomeconf.
Attention: ESS server node registration entitlement in 6.1.x.x: With the introduction of the ESS 5000, ESS 3000, and ESS 3200, the node registration entitlement of ESS nodes has changed. For earlier versions of ESS, the ESS building blocks were registered with a solution MTM; usually, the MTM of the EMS node was used as the solution MTM of the entire building block. However, in the case of ESS 5000, ESS 3000, and ESS 3200, the system is registered based on its actual server serial number and MTM instead of the solution MTM. Therefore, it is very important to check with IBM that the nodes are registered at the entitlement server before configuring hardware call home. ESS Legacy continues to use the solution-based MTM registration and entitlement even in ESS 6.1.x.x.
Attention: The esscallhomeconf command also configures the IBM Spectrum Scale call home
setup. The IBM Spectrum Scale call home feature collects files, logs, traces, and details of certain
system health events from the I/O and EMS nodes and services running on those nodes. These
details are shared with the IBM support center for monitoring and problem determination. For
more information on IBM Spectrum Scale call home, see IBM Spectrum Scale documentation in
IBM Documentation.
Note: The ESS 3000 and ESS 3200 hardware call home is backed by software call home. In other words,
software call home must be configured by using the esscallhomeconf command, without the --no-
swcallhome switch in the ESS 3000 or the ESS 3200 environment. Otherwise, the ESS 3000 or the ESS
3200 hardware failure events are not reported to ESA and a PMR does not get opened.
The endpoints are visible in the ESA portal after registration, as shown in the following figure:
Name
Shows the name of the endpoints that are discovered or registered.
SystemHealth
Shows the health of the discovered endpoints. A green icon (√) indicates that the discovered system is
working fine. The red (X) icon indicates that the discovered endpoint has some problem.
ESAStatus
Shows that the endpoint is reachable. It is updated whenever there is a communication between the
ESA and the endpoint.
SystemType
Shows the type of system being used. Following are the various ESS device types that the ESA
supports.
Figure 4. List of icons showing various ESS device types
Detailed information about the node can be obtained by selecting System Information. Here is an
example of the system information:
When an endpoint is successfully registered, the ESA assigns a unique system identification (system ID) to the endpoint. The system ID can be viewed by using the --show option.
For example:
When an event is generated by an endpoint, the node associated with the endpoint must provide the system ID of the endpoint as part of the event. The ESA then assigns a unique event ID to the event. The system IDs of the endpoints are stored in a file named esaepinfo01.json in the /vpd directory of the EMS and I/O servers that are registered. The following example displays a typical esaepinfo01.json file:
# cat /vpd/esaepinfo01.json
{
  "encl": {
    "78ZA006": "32eb1da04b60c8dbc1aaaa9b0bd74976"
  },
  "esaagent": "ems4",
  "node": {
    "ems4-ce": "6304ce01ebe6dfb956627e90ae2cb912",
    "essio41-ce": "a575bdce45efcfdd49aa0b9702b22ab9",
    "essio42-ce": "5ad0ba8d31795a4fb5b327fd92ad860c"
  }
}
The endpoints are visible in the ESA portal after registration. For more information, see IBM Spectrum
Scale call home documentation.
Another example is a POWER9 or POWER8 hardware failure, which could be a hardware failure on an ESS 5000 or ESS Legacy I/O node, EMS node, or protocol node. A hardware failure event could be a DIMM failure, power supply failure, or fan failure in POWER9 or POWER8 nodes. Failure events reported by the POWER nodes are recorded in the OPAL log, and the ESS hardware call home function reads the OPAL events that need service and reports them as call home events.
ESS 5000, ESS 3000, ESS 3200, and ESS Legacy hardware failure events raised in the OPAL log, reported by event type:
• nodeEvent - Any POWER9 node hardware failure that is reported in the OPAL log is a call home event that is reported with this event type.
Similarly, ESS 3000 and ESS 3200 x86 nodes can also report hardware failure events through mmhealth. The ESS 3000 call home depends on software call home and mmhealth. In other words, software call home must be configured and mmhealth must detect the hardware issue for it to be reported to ESA and a PMR to be opened.
When running the esscallhomeconf command, make sure not to use the --no-swcallhome switch in the ESS 3000 or the ESS 3200 environment. Otherwise, the ESS 3000 or the ESS 3200 hardware failure events are not reported to ESA and a PMR does not get opened.
ESS 3000 and ESS 3200 supported hardware events (Need software call home to be configured):
• bootDrvFail - When boot drive failed at canister.
• canFailed - When canister failed.
• bootDrvMissing - When boot drive missing.
• bootDrvSmtFailed - When boot drive smart failed.
• canFanFailed - When canister FAN stopped working.
• fanFailed - When FAN failed.
• psFailed - Power supply failure.
• psFanFailed - Power supply FAN failure.
Name
It is the serial number of the enclosure containing the drive to be replaced.
Description
It is a short description of the problem. It shows ESS version or generation, service task name and
location code. This field is used in the synopsis of the problem (PMR) report.
Figure 8. Example of a problem summary
If an event is successfully reported to the ESA, and an event ID is received from the ESA, the node
reporting the event uploads additional support data to the ESA that are attached to the problem (PMR) for
further analysis by the IBM support team.
The callback script logs information in the /var/log/messages file during the problem reporting episode. The following examples display the messages logged in the /var/log/messages file generated by the essio11 node:
• The ESA responds by returning a unique event ID for the system ID in the json format.
Call home monitoring of ESS 5000, ESS 3000, ESS 3200, and ESS Legacy
systems and their disk enclosures
A callback is a one-time event; it is triggered when the disk state changes to replace. If ESA misses the event, for example if the EMS node is down for maintenance, the call home event is not generated by ESA.
To mitigate this situation, the callhomemon.sh script is provided in the /opt/ibm/gss/tools/
samples directory of the EMS node. This script checks for pdisks that are in the replace state, and
sends an event to ESA to generate a call home event if there is no open PMR for the corresponding
physical drive. This script can be run on a periodic interval. For example, every 30 minutes.
In the EMS node, create a cronjob as follows:
1. Open the crontab editor by using the following command:
# crontab -e
2. Add the following entry to run the script every 30 minutes, and save the file:
*/30 * * * * /opt/ibm/gss/tools/samples/callhomemon.sh
3. Verify the crontab entry:
# crontab -l
*/30 * * * * /opt/ibm/gss/tools/samples/callhomemon.sh
The call home monitoring protects against missing a call home due to ESA missing a callback event. If a
problem report is not already created, the call home monitoring ensures that a problem report is created.
Note: When the call home problem report is generated by the monitoring script, as opposed to being
triggered by the callback, the problem support data is not automatically uploaded. In this scenario, the
IBM support can request support data from the customer.
Note: When a PMR is created because of the periodic checking of the drive replace state (for example, when the callback event is missed), additional support data is not provided to the ESA agent.
Attention:
• For ESS 5000 or ESS Legacy systems, if a hardware event reported by OPAL, or a disk error event from a disk enclosure attached to the POWER I/O nodes, was missed because ESA was down or because of some other issue on the EMS node, the event can be triggered manually by invoking the /opt/ibm/gss/tools/samples/callhomemon.sh script. The callhomemon.sh script reports any missed or new hardware event from any POWER9 or POWER8 node that is part of the ESS 5000 or ESS Legacy cluster (such as the POWER EMS node and POWER protocol nodes), and disk failure events from ESS 3000 or ESS 3200 disk enclosures only if they are part of the ESS cluster.
• For ESS 3000 or ESS 3200 systems, if a hardware event reported by mmhealth was missed because ESA was down or because of some other issue on the EMS node, the event can be re-triggered by using mmhealth node eventlog --clear followed by mmsysmoncontrol restart. If a disk enclosure disk event was missed on the ESS 3000 or the ESS 3200 system because ESA was down or because of some other issue on the EMS node, the event can be triggered manually by invoking the /opt/ibm/gss/tools/samples/callhomemon.sh script. For ESS 3000 or ESS 3200, the callhomemon.sh script only re-sends the missed or old disk enclosure disk error events and any missed or new hardware event reported by any POWER9 node that is part of an ESS 3000 or ESS 3200 cluster (such as the POWER EMS node and POWER protocol nodes).
Upload data
The following support data is uploaded when the ESS system disks in enclosures display a drive replace
notification on an ESS 5000, ESS 3000, ESS 3200, or ESS Legacy system.
• The output of mmlspdisk command for the pdisk that is in replace state.
• Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
– Last 10000 lines of mmfs.log.latest from the node which generates the event.
– Last 24 hours of the kernel messages (from journal) from the node which generates the event.
The following support data is uploaded when the system displays any hardware issue in an ESS 5000 or
an ESS Legacy system.
• The output of the opal_elog_parse command for the serviceable event that caused failure.
• Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
– Last 10000 lines of mmfs.log.latest from the node which generates the event.
– Last 24 hours of the kernel messages (from journal) from the node which generates the event.
The following support data is uploaded when the system displays any hardware issue in an ESS 3000 or
an ESS 3200 system.
• The output of the mmhealth command and the actual component that caused failure.
• Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
– Last 10000 lines of mmfs.log.latest from the node which generates the event.
– Last 24 hours of the kernel messages (from journal) from the node which generates the event.
/opt/ibm/esa/bin/verifyConnectivity -t
• ESA test call home - Test call home from the ESA portal. Go to All Systems > System Health for the endpoint from which you would like to generate a test call home. Click Send Test Problem from the newly opened Problems tab.
• ESS call home script setup to ensure that the callback script is set up correctly.
Verify that the periodic monitoring is set up.
crontab -l
[root@ems1 deploy]# crontab -l
*/30 * * * * /opt/ibm/ess/tools/samples/callhomemon.sh
Figure 10. Sending a Test Problem
# essinstallcheck -N localhost
Start of install check
nodelist: localhost
Getting package information.
[WARN] Package check cannot be performed other than on EMS node. Checking nodes.
================== Summary of node: localhost =============================
[INFO] Getting system firmware level. May take a long time...
[INFO] Getting system profile setting.
[INFO] Spectrum Scale RAID is not active, cannot get gpfs Installed
version:
[OK] Linux kernel installed: 3.10.0-1160.11.1.el7.ppc64le
[ERROR] Systemd not at min recommended level: 219-78.el7_9.2.ppc64le
[ERROR] Networkmgr not at min recommended level: 1.18.8-2.el7_9.ppc64le
[OK] Mellanox OFED level: MLNX_OFED_LINUX-4.9-2.2.5.1
[OK] IPR SAS FW: 19512B00
[OK] ipraid RAID level: 10
[ERROR] ipraid RAID Status: found Degraded expected Optimized
[OK] IPR SAS queue depth: 64
[ERROR] System Firmware : found FW860.81 (SV860_215) expected min
FW860.90 (SV860_226)
[OK] System profile setting: scale
[OK] System profile verification PASSED.
[INFO] Cluster not yet created skipping rsyslog check
[OK] Host adapter driver: 34.00.00.00
Performing Spectrum Scale RAID configuration check.
[OK] New disk prep script: /usr/lpp/mmfs/bin/tspreparenewpdiskforuse
[OK] Network adapter MT4099 firmware: 16.27.2008, net adapter count: 3
[OK] Network adapter firmware
[INFO] Storage firmware check is not required as GPFS cluster does not exist.
[OK] Node is not reserving KVM memory.
[OK] IBM Electronic Service Agent (ESA) is activated for Callhome service.
[OK] Software callhome check skipped as cluster not configured.
End of install check
[PASS] essinstallcheck passed successfully
You can view two more lines in the essinstallcheck output which mention that ESA is activated (ESA activation indicates that the hardware call home is also configured for this ESS) and that software call home has been configured for this node. This is a very important check because it enables customers to configure hardware and software call home after the cluster creation and the file system creation are done.
Remember: Enable the hardware and the software call home at the end of the ESS system deployment
when the file system is active, nodes are ready to serve the file system, and none of the configuration is
pending.
• Enable security on the I/O server nodes by running the security sub-command with the enable
option.
• Disable security on the EMS node by running the security sub-command with the disable option.
Protocol node consideration: You can also use these steps to enable security on protocol nodes.
# firewall-cmd --state
running
You can verify the open firewall ports by running firewall sub-command with the verify option.
When the command completes, the required ports in firewall are verified.
• Enable firewall on I/O server nodes by running the firewall sub-command with the enable option.
# firewall-cmd --state
running
You can verify the open firewall ports by running the firewall sub-command with the verify
option. When the command completes, the required ports in firewall are verified.
• Disable firewall on the EMS node by running the firewall sub-command with the disable option.
• Disable firewall on I/O server nodes by running the firewall sub-command with the disable
option.
Note: Make sure that you reboot the node when the selinux sub-command completes.
b) Reboot the node.
# systemctl reboot
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
c) Rerun the selinux sub-command with the enable option to enforce SELinux.
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
After SELinux is enabled, kernel logs any activity in the /var/log/audit/audit.log file.
• Enable SELinux on I/O server nodes as follows.
a) Run the selinux sub-command on the I/O server nodes.
Note: Make sure that you reboot the node when the selinux sub-command completes.
b) Reboot the I/O server nodes.
# systemctl reboot
Reboot the node after the command completes. When the node comes up after the reboot, SELinux is disabled.
You can check the status as follows.
# sestatus
SELinux status: disabled
• To disable SELinux on the I/O server nodes, use the following command.
Reboot the node after the command completes. When the node comes up after the reboot, SELinux is disabled. Any I/O server node name can also be used instead of the group name.
Additional information: Any mentioned security item is an optional feature and you can enable it on
demand for an ESS cluster. Security commands can be run using the essrun command after
deployment of the node is done and before creating the GPFS cluster. In upgrade cases, any such
security commands must be run after stopping the GPFS cluster. Do not attempt to run any security
command while GPFS cluster is up and running.
This command creates the gpfsadmin Linux user and gpfs Linux group on the node and performs all
necessary sudoers set up. For detailed information, see the /etc/sudoers.d/ess_sudoers file.
Users can now log in to the node by using the gpfsadmin user and perform GPFS administration tasks.
Make sure that the sudo sub-command is run on all GPFS nodes (EMS node, I/O server nodes, and any
client nodes) as part of the cluster to be completely compliant with the sudo requirement. Change the
node name in the sudo sub-command accordingly. Enabling sudo also allows the gpfsadmin user to
administer xCAT and the GPFS GUI on the EMS node.
Disabling sudo reverts the xCAT policy table to its previous state, deletes /etc/sudoers.d/
ess_sudoers file, and deletes the gpfsadmin user from the Linux node. Make sure that you have
disabled sudo user configuration on all GPFS nodes (EMS node, I/O server nodes, and any client nodes) as
part of the cluster to be completely compliant with the sudo requirement. Change the node name in the
sudo sub-command accordingly.
Important: Do not disable the sudo user until the GPFS cluster is configured not to use the sudo wrapper and sudo user. Failing to do so might result in cluster corruption.
# mmlscluster
You can configure the cluster to use sudo by issuing the following command.
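The command itself is not reproduced in this excerpt; a sketch of one way to do it with base IBM Spectrum Scale commands, assuming the gpfsadmin user created by the sudo sub-command:

mmchcluster --use-sudo-wrapper
mmchconfig sudoUser=gpfsadmin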
# mmlscluster
In the preceding mmlscluster command output, remote shell and remote copy commands are changed
to use sudo wrapper (sshwrap and scpwrap).
The sudoUser mmlsconfig parameter is now set to gpfsadmin.
# mmlsconfig sudoUser
sudoUser gpfsadmin
Important:
• The sudo sub-command must not be used for nodes other than the EMS node.
• The IBM Spectrum Scale GUI services must be restarted by using systemctl restart gpfsgui
after enabling or disabling sudo in a GPFS cluster.
• The sudo user password must be set to a new password before using it.
# mmlscluster
You can configure the cluster not to use sudo by issuing the following command.
# mmlscluster
In the preceding mmlscluster command output, remote shell and remote copy commands are changed
to use ssh and scp instead of sudo wrapper (sshwrap and scpwrap).
The sudoUser mmlsconfig parameter is now set to undefined.
# mmlsconfig sudoUser
sudoUser (undefined)
Important:
• The sudo sub-command must not be used for nodes other than the EMS node.
• The IBM Spectrum Scale GUI services must be restarted by using systemctl restart gpfsgui
after enabling or disabling sudo in a GPFS cluster.
Important: The sudo sub-command must not be used for nodes other than the EMS node.
positional arguments:
{enable,disable,use_sudo_wrapper,no_sudo_wrapper}
optional arguments:
-h, --help show this help message and exit
--user SUDO_USER Provide sudo user name
--group SUDO_GROUP Provide group name
Note: After running this command, any future deployments of new nodes have the adminMode attribute set to central by default. For existing nodes in the cluster, you must update the xCAT security context by running the following command.
2. Update the xCAT security context using the updatenode Node -k script.
# updatenode gss_ppc64,ces_ppc64 -V -k
...
Password: <Type EMS node root Password here>
...
...
Note:
• If you do not run the updatenode Node -k command, the central administration mode gets
enabled for any new nodes deployed using the current EMS node. However, existing nodes can still
do passwordless SSH between each other.
• In case of an upgrade, if you want to enable the central administration mode then run the same
commands.
• Make sure that you do not run updatenode admin_node -V -k on the EMS node which is the
admin node.
• Running the admincentral sub-command against non-container nodes is not allowed. In other
words, with the -N option the container node name must be specified as an argument.
The admincentral sub-command can be run after the deployment of the EMS node, I/O server nodes,
or protocol nodes is completed.
Note: After you run this command, any future deployment of new nodes has the central administration
mode disabled. For existing nodes in the cluster, you must update the xCAT security context by
running the following command.
2. Update the xCAT security context using the updatenode Node -k script.
# updatenode gss_ppc64,ces_ppc64 -V -k
...
Password: <Type EMS node root Password here>
...
...
Note:
• If you do not run the updatenode Node -k command, the central administration mode gets
disabled for any new nodes deployed using the current EMS node. However, existing nodes cannot
do passwordless SSH between each other.
• In case of an upgrade, if you want to disable the central administration mode then run the same
commands.
• Make sure that you do not run updatenode admin_node -V -k on the EMS node which is the
admin node.
• Running the admincentral sub-command against non-container nodes is not allowed. In other words,
with the -N option, the container node name must be specified as an argument.
positional arguments:
{enable, disable}
optional arguments:
-h, --help show this help message and exit
allow Internet_IP_Clients_comma_separated
cd /install/ess/otherpkgs/rhels8/ppc64le/firmware/
sftp EMSNode
mput 01VL950_072_045.img
update_flash -v -f 01VL950_072_045.img
update_flash -f 01VL950_072_045.img
The system restarts and the firmware is upgraded. This process might take 30 - 45 minutes.
Note: If you plan to upgrade the POWER8 EMS firmware, you can retrieve the code from the following
location inside the container (the image file will be different):
/install/ess/otherpkgs/rhels7/ppc64le/firmware/
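As an optional, illustrative check, you can record the active firmware level before flashing and compare it after the system comes back up. The lsmcode command is part of the standard powerpc-utils package on Power servers running Linux.
lsmcode    # displays the currently active system firmware level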
Note: The following time server setup documentation is for general reference. You can configure the time
server as suitable for your environment. In the simplest example, the EMS host is used as the time server
and the I/O nodes (or protocol nodes) are used as clients. Customers might want to have all nodes point
to an external time server. Use online references for more detailed instructions for setting up Chrony.
Chrony is the preferred method of setting up a time server. NTP is considered deprecated. Chrony uses
the NTP protocol.
For the following example steps, it is assumed that the EMS node is the chronyd server and that there is
no public internet synchronization. A minimal sample configuration is sketched after these steps.
• Do the following steps on the EMS node, outside of the container.
a) Set the time zone and the date locally.
b) Edit the contents of the /etc/chrony.conf file.
Note: Replace the server and the allow range with the network settings specific to your setup.
chronyc makestep
• Do the following steps on the client nodes (canister nodes or ESS nodes).
a) Edit the contents of the /etc/chrony.conf file.
Note: Replace the server and the allow range with the network settings specific to your setup.
chronyc makestep
chronyc ntpdata
timedatectl
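For reference, a minimal sketch of the two /etc/chrony.conf files might look like the following. It assumes that the EMS node serves time to a 192.168.45.0/24 management network and that its host name is ems1; both values are placeholders that you must replace with your own settings.
On the EMS node (server side):
local stratum 8                  # serve time even without an upstream source
allow 192.168.45.0/24            # permit clients on the management network
driftfile /var/lib/chrony/drift
makestep 1.0 3
On the I/O or protocol nodes (client side):
server ems1 iburst               # use the EMS node as the time source
driftfile /var/lib/chrony/drift
makestep 1.0 3
After editing the file on a node, restart chronyd (systemctl restart chronyd) before running the chronyc makestep and verification commands shown in the preceding steps.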
Prerequisites
• During file system creation, adequate space is available for CES shared root file system. For more
information, see “During file system creation, adequate space is available for CES shared root file
system” on page 63
• ESS container has the protocol node management IP addresses defined. For more information, see
“ESS container has the protocol node management IP addresses defined” on page 63.
• ESS container has the CES IP addresses defined. For more information, see “ESS container has the CES
IP addresses defined” on page 64.
During file system creation, adequate space is available for CES shared root file
system
In a default ESS setup, you can use the Ansible-based file system task to create the recovery groups, vdisk
sets, and file system. By default, this task attempts to consume 100% of the available space. If you plan
to include protocol nodes in your setup, you must leave enough free space for the required CES shared
root file system. Use the --size flag to adjust the space that is consumed accordingly.
For example: essrun -G ess_ppc64le filesystem --suffix=-hs --size 80%
Running this command leaves approximately 20% space available for the CES shared root file system or
additional vdisks. If you are in a mixed storage environment, you might not use the essrun filesystem
task due to more complex storage pool requirements. In that case, when using mmvdisk, make sure that
you leave adequate space for the CES shared root file system. The CES shared root file system requires
around 20 GB of space for operation.
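To confirm how much declustered array capacity remains after the file system task, one option is to list the vdisk sets, which reports per-declustered-array capacity and free space. The exact output columns depend on your mmvdisk level, so treat this as a sketch.
mmvdisk vdiskset list --recovery-group all    # shows declustered array capacity, free space, and defined vdisk sets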
ping IPAddress1,...IPAddressN
Each protocol node must respond to the ping test, indicating that it has an IP address set and that the
address is on the same subnet as the container.
2. Run the config load task.
If you have more than one node, you can specify them in a comma-separated list. Make sure that you
add all ESS nodes in this config load command before continuing.
3. Create network bonds.
Note: Make sure that the nodes are connected to the high-speed switch before doing this step.
5. Install IBM Spectrum Scale by using the installation toolkit and set up CES.
Use the IBM Spectrum Scale documentation for installing IBM Spectrum Scale by using the installation
toolkit and for enabling the required services for the customer environment. For more information, see:
• Using the installation toolkit to perform installation tasks: Explanations and examples.
• Adding CES HDFS nodes into the centralized file system.
• ESS awareness with the installation toolkit.
Prerequisites
• SSR has completed code 20 on both the ESS 3000 and ESS 5000 nodes (including EMS)
SSR works on Power® nodes and the EMS node first, then the ESS 3000 system.
• Public connection setup on C11-T3 (f3 connection on EMS)
• ESS 3000 and ESS 5000 nodes have been added to /etc/hosts
– Low-speed names: FQDNs, short names, and IP addresses
– High-speed names: FQDNs, short names, and IP addresses (add a suffix to the low-speed names)
• Host name and domain set on EMS
• Latest code for ESS 3000 and ESS 5000 stored in /home/deploy on EMS
• For information on how to deploy the ESS system, see ESS 3000 Quick Deployment Guide.
• For information on using the mmvdisk command, see mmvdisk in ESS documentation.
Note: This command creates a combined data and metadata vdisk in the system pool. The file system
name must be fs3k.
Type exit and press Enter to exit the container. Proceed with the instructions on how to set up the
collector and sensors, and run the GUI wizard.
The status of the current ESS 3000 container should be 'exited'. To confirm, use the podman ps -a
command. For example:
If the ESS 3000 container is not in the stopped state, use the podman stop ContainerName command.
Note: If you plan to add protocol nodes in the cluster, include them in the list of nodes that you are
specifying in this command.
2. Update the nodes.
ssh ESS5000Node1
Note:
• Use the high-speed names.
• If there is an error, you might need to log in to each ESS 5000 node and start GPFS.
mmbuildgpl
mmstartup
Type exit and press Enter to exit the container. Running these commands takes you to the ESS
5000 node.
5. Create mmvdisk artifacts.
a. Create the node class.
d. Define vdiskset.
mmvdisk vs define --vs vs_fs5k_1 --rg ess5k_rg1,ess5k_rg2 --code 8+2p --bs 16M --ss 80% --nsd-usage dataOnly --sp data
e. Create vdiskset.
Note: You need to understand the implications of this rule before applying it in your system.
When the capacity of the ESS 3000 system pool reaches 75%, the rule migrates files (larger ones
first) out of the system pool to the data pool until the capacity drops to 25%. (A sample policy
rule is sketched after this procedure.)
h. On the EMS node, run the following command.
At this point, add the ESS 5000 nodes to the pmsensors list and use the Edit rack components option in
the GUI to slot the new nodes into the frame.
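As a reference for the rule discussed in the preceding note, a threshold-based migration rule in the IBM Spectrum Scale policy language might look like the sketch below. The rule name and temporary file path are placeholders, and the -I test option only validates the policy without installing it. Threshold-driven migration also requires the appropriate low-space callback to be in place, so treat this strictly as an illustration.
cat > /tmp/migrate_3k_to_5k.pol <<'EOF'
/* Move the largest files first from the system pool to the data pool when the
   system pool is 75% full, until it is down to 25% full. */
RULE 'migrate3kto5k' MIGRATE FROM POOL 'system' THRESHOLD(75,25) WEIGHT(KB_ALLOCATED) TO POOL 'data'
EOF
mmchpolicy fs3k /tmp/migrate_3k_to_5k.pol -I test    # validate only; rerun without -I test to install the policy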
If you want to add protocol nodes, see Appendix H, “ESS protocol node deployment by using the IBM
Spectrum Scale installation toolkit,” on page 63.
# /bin/mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success
4. Convert the P1 port of the device listed in the preceding command output from InfiniBand to Ethernet.
5. Reboot the node and query the port type of all attached devices again.
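For reference, on Mellanox ConnectX adapters this type of port conversion is commonly done with mlxconfig, as sketched below. The device path is an example taken from an mst status listing; substitute the device reported on your node. A LINK_TYPE value of 2 selects Ethernet and 1 selects InfiniBand.
mlxconfig -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2    # set port 1 to Ethernet; confirm with "mlxconfig -d <device> query" after the reboot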
You can run the following to see configuration parameter settings without setting them:
/usr/lpp/mmfs/samples/gss/gssClientConfig.sh -D
After running this script, restart GPFS on the affected nodes for the optimized configuration settings to
take effect.
Important: Do not run gssClientConfig.sh unless you fully understand the impact of each setting on
the customer environment. Use the -D option to review the settings and decide whether all or only some
of them should be applied. Then, update each client node's settings individually as required.
Supported paths
SL models (5U92)
• SL1 -> SL2
• SL2 -> SL3
• SL2 -> SL4
• SL3 -> SL4
• SL3 -> SL5
• SL4 -> SL5
• SL4 -> SL6
• SL5 -> SL6
• SL5 -> SL7
• SL6 -> SL7
SC models (4U106)
• SC1 -> SC2
• SC2 -> SC3
• SC2 -> SC4
• SC3 -> SC4
• SC3 -> SC5
• SC4 -> SC5
• SC4 -> SC6
• SC5 -> SC6
• SC5 -> SC7
• SC6 -> SC7
• SC6 -> SC8
• SC7 -> SC8
• SC7 -> SC9
• SC8 -> SC9
Prerequisites
1. All new or existing building blocks must be at ESS 6.1.0.0 or later. If there are protocol nodes in the
setup, they must also be upgraded to the matching ESS version.
2. If space needs to be made in the frame, for example to move the EMS, this must be planned for accordingly.
3. LBS must wear an ESD wrist band when physically working on the hardware (like plugging in SAS
cables).
SSR tasks
SSR is responsible for the following tasks.
1. Code 20 of the new enclosures - replacing parts as needed.
2. Running or labeling the new SAS cable connections.
3. Potentially making space in the frame - Moving the EMS.
Unlike in a rackless or rackful solution, the SSR is not responsible for checking system health by using essutils.
LBS tasks
LBS is responsible for the following tasks.
1. Upgrade to ESS 6.1.0.0, prior to the capacity upgrade engagement.
2. Post capacity upgrade health checks.
3. Plugging the SAS cables into the adapters and enclosures.
4. Performing capacity upgrade software functions such as conversion and resizing.
5. New storage management functions such as adding new space to existing file system and creating a
new file system.
6. Restriping the file system.
7. Replacing any bad parts such as disks or cables.
8. Pre- and post-engagement operations.
Flow
The TDA process ensures that the customer is prepared for the capacity upgrade. Considerations such as
whether there is enough room in the rack and how the file system space is used are planned out.
LBS
1. LBS performs a normal ESS software upgrade. The customer must be at ESS 6.1.0.0 for the capacity
upgrade. This upgrade is treated as a separate engagement from the future capacity upgrade
operation.
Running mmvdisk recoverygroup list should show both RGs actively managed by
essio2-hs.
6. Plug in the SAS cables for essio1 on the server and enclosure ends. Shut down GPFS, only on the
server just modified, and then reboot the I/O node. Wait for 5 minutes for the node to reboot and
paths to be rediscovered. Run the following commands to ensure that essio1 has discovered the
new enclosures.
Note: Before shutting down GPFS, make sure that autoload is turned off (mmchconfig
autoload=no).
a. gssstoragequickcheck -N localhost
b. gssfindmissingdisks -N localhost
Both commands should return with no issues and recognize the new enclosure and disk counts.
The paths should also be without error. After this is complete, start IBM Spectrum Scale on the
node in question by using mmstartup. After determining that IBM Spectrum Scale is active by
using mmgetstate, proceed to the next step.
7. Move the recovery group ownership to essio1-hs. Use the same commands as used in this step
but make sure to use the correct node name (essio1-hs).
After the preceding steps are complete and the new enclosures have been successfully cabled to both
servers, proceed with the following final steps.
8. Rebalance both recovery groups by running the following commands from any node in the storage cluster.
a. mmvdisk rg list
b. mmvdisk recoverygroup change --recovery-group rg1 --active essio1-hs
c. mmvdisk recoverygroup change --recovery-group rg2 --active essio2-hs
d. Check that the ownership has changed using the mmvdisk recoverygroup list
command.
9. Perform the system verification steps again before proceeding.
10. Update enclosure and drive firmware. If there are any issues, you should stop and replace any
disks or enclosures that could not be updated for some reason.
CurrentIoServer implies running the command from either of the I/O server nodes in the building
block.
Note: It might take up to an hour for the firmware upgrade to complete. You might notice that the
fan starts to run at high speed. This is a known issue.
a. CurrentIoServer$ mmchfirmware --type storage-enclosure
b. CurrentIoServer$ mmchfirmware --type drive
c. mmhealth node show -N all --verbose - This command shows any system health
related issues to address. (Run from any node in the storage cluster.)
d. gnrhealthcheck - This command determines if there are any issues in various areas of ESS.
Any problems that show up must be addressed before capacity upgrade starts.
11. Add new storage into recovery groups.
mmvdisk rg resize --rg rg_essio1-hs,rg_essio2-hs -v no
12. Verify that the new storage is available and the DA is rebalancing.
13. Start up the GUI and use Edit rack components to have the GUI discover the new topologies and
make changes to the frame accordingly, such as modifying the ESS model to consume more U
space, moving the EMS, and so on.
14. Reconfigure call home.
At this point, discuss with the customer what to do with the free space.
1. Add to the existing file system?
a. See the add building block flow in ESS 5.3.x Quick Deployment Guide for tips on creating new
NSDs and adding to an existing file system.
b. Consider file system restripe at the end which might take time. (mmrestripefs FileSystem -b)
2. Create a new file system?
• See the installation section on how to use essrun to create a new file system from inside the
container. You can also use mmvdisk commands directly to perform this operation, as sketched below.
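A minimal sketch of the direct mmvdisk path follows. The vdisk set name, file system name, recovery group names, and sizing values are placeholders and must be adapted to the customer environment.
mmvdisk vdiskset define --vdisk-set vs_new --recovery-group rg_essio1-hs,rg_essio2-hs --code 8+2p --block-size 16M --set-size 80%
mmvdisk vdiskset create --vdisk-set vs_new
mmvdisk filesystem create --file-system fs_new --vdisk-set vs_new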
[Figure: Ethernet ports 1 - 12 for ESS 3200, with EMS management connections SMN/C11/T1 and SMN/C11/T2]
The system displays the 11S serial number similar to the following:
01FT690YA50YD7BGABX
4. Change the default password to the 11S password by using the following command:
cumulus@accton-1gb-mgmt:~$ passwd
5. Log in through SSH or the console with the new 11S password to validate the change.
Note: The password must be set to the 11S serial number (01FT690YA50YD7BGABX in this example). If it
has not been set, the password is CumulusLinux!.
[Figure: switch ports - Mgmt0, console port, and USB port]
sudo su -
9. Copy the contents of the interfaces file to /etc/network/interfaces and save the file.
Note: You can use vi or another editor to modify this file.
10. Reload the interfaces by using the following command:
root@cumulus:/etc/network# ifreload -a
root@cumulus:/etc/network# ifquery -a
12. If required, configure the switch management network. It is recommended to set a static IP address so
that you can log in to the switch remotely. For example, on the 192.168.44.0/24 network, use switch IP
address 192.168.44.20 and gateway 192.168.44.1.
• net add interface eth0 ip address 192.168.44.20/24
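After adding the address, the staged change is typically reviewed and applied with the NCLU commands below; the default route through the example gateway is an assumption about the intended setup.
net add routing route 0.0.0.0/0 192.168.44.1    # assumed default route through the example gateway
net pending                                     # review the staged changes
net commit                                      # apply them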
# Set tag
/bin/ipmitool lan set 1 vlan id 101
# Confirm tag
/bin/ipmitool lan print 1 | grep -i 'VLAN ID'
auto swp10
iface swp10
bridge-pvid 102
bridge-vids 101
Any ports that you designate as IBM Elastic Storage System ports need to have this configuration.
Consult the default IBM Elastic Storage System interfaces file for more information.
4. Copy the new interfaces file to the switch.
5. Reload and verify the interfaces.
6. Set the VLAN tags on the IBM Elastic Storage System canisters.
# Bridge setup
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports glob swp1-48
bridge-pvid 101
bridge-pvid 102
bridge-stp off
Goal
The goal is to enable the customer or SLS to swap out all the POWER8 ESS nodes in the cluster with the
new POWER9 (5105-22E) nodes without taking the cluster or the file system down.
High-level flow
Example environment:
• 1 x POWER8 EMS
• 2 x POWER8 ESS GLxC building-blocks
– Each building block is in its own failure group
– Metadata replication between failure groups
Accessibility features for the system
Accessibility features help users who have a disability, such as restricted mobility or limited vision, to use
information technology products successfully.
Accessibility features
The following list includes the major accessibility features in IBM Spectrum Scale RAID:
• Keyboard-only operation
• Interfaces that are commonly used by screen readers
• Keys that are discernible by touch but do not activate just by touching them
• Industry-standard devices for ports and connectors
• The attachment of alternative input and output devices
IBM Documentation, and its related publications, are accessibility-enabled.
Keyboard navigation
This product uses standard Microsoft Windows navigation keys.
IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 19-21,
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain
transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of
the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Corporation
Dept. 30ZA/Building 707
Mail Station P300
2455 South Road,
Poughkeepsie, NY 12601-5400
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment or a fee.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
The registered trademark Linux is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Red Hat and Ansible are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the
United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
B
building block
A pair of servers with shared disk enclosures attached.
BOOTP
See Bootstrap Protocol (BOOTP).
Bootstrap Protocol (BOOTP)
A computer networking protocol that is used in IP networks to automatically assign an IP address to
network devices from a configuration server.
C
CEC
See central processor complex (CPC).
central electronic complex (CEC)
See central processor complex (CPC).
central processor complex (CPC)
A physical collection of hardware that consists of channels, timers, main storage, and one or more
central processors.
cluster
A loosely-coupled collection of independent systems, or nodes, organized into a network for the
purpose of sharing resources and communicating with each other. See also GPFS cluster.
cluster manager
The node that monitors node status using disk leases, detects failures, drives recovery, and selects
file system managers. The cluster manager is the node with the lowest node number among the
quorum nodes that are operating at a particular time.
compute node
A node with a mounted GPFS file system that is used specifically to run a customer job. ESS disks are
not directly visible from and are not managed by this type of node.
CPC
See central processor complex (CPC).
D
DA
See declustered array (DA).
datagram
A basic transfer unit associated with a packet-switched network.
DCM
See drawer control module (DCM).
E
Elastic Storage System (ESS)
A high-performance, GPFS NSD solution made up of one or more building blocks. The ESS software
runs on ESS nodes - management server nodes and I/O server nodes.
ESS Management Server (EMS)
An xCAT server is required to discover the I/O server nodes (working with the HMC), provision the
operating system (OS) on the I/O server nodes, and deploy the ESS software on the management
node and I/O server nodes. One management server is required for each ESS system composed of
one or more building blocks.
encryption key
A mathematical value that allows components to verify that they are in communication with the
expected server. Encryption keys are based on a public or private key pair that is created during the
installation process. See also file encryption key (FEK), master encryption key (MEK).
ESS
See Elastic Storage System (ESS).
environmental service module (ESM)
Essentially, a SAS expander that attaches to the storage enclosure drives. In the case of multiple
drawers in a storage enclosure, the ESM attaches to drawer control modules.
ESM
See environmental service module (ESM).
Extreme Cluster/Cloud Administration Toolkit (xCAT)
Scalable, open-source cluster management software. The management infrastructure of ESS is
deployed by xCAT.
F
failback
Cluster recovery from failover following repair. See also failover.
failover
(1) The assumption of file system duties by another node when a node fails. (2) The process of
transferring all control of the ESS to a single cluster in the ESS when the other clusters in the ESS fail.
See also cluster. (3) The routing of all transactions to a second controller when the first controller fails.
See also cluster.
G
GPFS cluster
A cluster of nodes defined as being available for use by GPFS file systems.
GPFS portability layer
The interface module that each installation must build for its specific hardware platform and Linux
distribution.
GPFS Storage Server (GSS)
A high-performance, GPFS NSD solution made up of one or more building blocks that runs on System
x servers.
GSS
See GPFS Storage Server (GSS).
H
Hardware Management Console (HMC)
Standard interface for configuring and operating partitioned (LPAR) and SMP systems.
HMC
See Hardware Management Console (HMC).
I
IBM Security Key Lifecycle Manager (ISKLM)
For GPFS encryption, the ISKLM is used as an RKM server to store MEKs.
independent fileset
A fileset that has its own inode space.
indirect block
A block that contains pointers to other blocks.
inode
The internal structure that describes the individual files in the file system. There is one inode for each
file.
inode space
A collection of inode number ranges reserved for an independent fileset, which enables more efficient
per-fileset functions.
Internet Protocol (IP)
The primary communication protocol for relaying datagrams across network boundaries. Its routing
function enables internetworking and essentially establishes the Internet.
I/O server node
An ESS node that is attached to the ESS storage enclosures. It is the NSD server for the GPFS cluster.
IP
See Internet Protocol (IP).
IP over InfiniBand (IPoIB)
Provides an IP network emulation layer on top of InfiniBand RDMA networks, which allows existing
applications to run over InfiniBand networks unmodified.
IPoIB
See IP over InfiniBand (IPoIB).
ISKLM
See IBM Security Key Lifecycle Manager (ISKLM).
J
JBOD array
The total collection of disks and enclosures over which a recovery group pair is defined.
K
kernel
The part of an operating system that contains programs for such tasks as input/output, management
and control of hardware, and the scheduling of user tasks.
L
LACP
See Link Aggregation Control Protocol (LACP).
Link Aggregation Control Protocol (LACP)
Provides a way to control the bundling of several physical ports together to form a single logical
channel.
logical partition (LPAR)
A subset of a server's hardware resources virtualized as a separate computer, each with its own
operating system. See also node.
M
management network
A network that is primarily responsible for booting and installing the designated server and compute
nodes from the management server.
management server (MS)
An ESS node that hosts the ESS GUI and xCAT and is not connected to storage. It must be part of a
GPFS cluster. From a system management perspective, it is the central coordinator of the cluster. It
also serves as a client node in an ESS building block.
master encryption key (MEK)
A key that is used to encrypt other keys. See also encryption key.
maximum transmission unit (MTU)
The largest packet or frame, specified in octets (eight-bit bytes), that can be sent in a packet- or
frame-based network, such as the Internet. The TCP uses the MTU to determine the maximum size of
each packet in any transmission.
MEK
See master encryption key (MEK).
metadata
A data structure that contains access information about file data. Such structures include inodes,
indirect blocks, and directories. These data structures are not accessible to user applications.
MS
See management server (MS).
MTU
See maximum transmission unit (MTU).
N
Network File System (NFS)
A protocol (developed by Sun Microsystems, Incorporated) that allows any host in a network to gain
access to another host or netgroup and their file directories.
Network Shared Disk (NSD)
A component for cluster-wide disk naming and access.
NSD volume ID
A unique 16-digit hexadecimal number that is used to identify and access all NSDs.
node
An individual operating-system image within a cluster. Depending on the way in which the computer
system is partitioned, it can contain one or more nodes. In a Power Systems environment,
synonymous with logical partition.
node descriptor
A definition that indicates how ESS uses a node. Possible functions include: manager node, client
node, quorum node, and non-quorum node.
node number
A number that is generated and maintained by ESS as the cluster is created, and as nodes are added
to or deleted from the cluster.
node quorum
The minimum number of nodes that must be running in order for the daemon to start.
node quorum with tiebreaker disks
A form of quorum that allows ESS to run with as little as one quorum node available, as long as there
is access to a majority of the quorum disks.
non-quorum node
A node in a cluster that is not counted for the purposes of quorum determination.
O
OFED
See OpenFabrics Enterprise Distribution (OFED).
OpenFabrics Enterprise Distribution (OFED)
An open-source software stack that includes software drivers, core kernel code, middleware, and user-
level interfaces.
P
pdisk
A physical disk.
PortFast
A Cisco network function that can be configured to resolve any problems that could be caused by the
amount of time STP takes to transition ports to the Forwarding state.
R
RAID
See redundant array of independent disks (RAID).
RDMA
See remote direct memory access (RDMA).
redundant array of independent disks (RAID)
A collection of two or more disk physical drives that present to the host an image of one or more
logical disk drives. In the event of a single physical device failure, the data can be read or regenerated
from the other disk drives in the array due to data redundancy.
recovery
The process of restoring access to file system data when a failure has occurred. Recovery can involve
reconstructing data or providing alternative routing through a different server.
recovery group (RG)
A collection of disks that is set up by ESS, in which each disk is connected physically to two servers: a
primary server and a backup server.
remote direct memory access (RDMA)
A direct memory access from the memory of one computer into that of another without involving
either one's operating system. This permits high-throughput, low-latency networking, which is
especially useful in massively-parallel computer clusters.
RGD
See recovery group data (RGD).
remote key management server (RKM server)
A server that is used to store master encryption keys.
RG
See recovery group (RG).
recovery group data (RGD)
Data that is associated with a recovery group.
RKM server
See remote key management server (RKM server).
S
SAS
See Serial Attached SCSI (SAS).
T
TCP
See Transmission Control Protocol (TCP).
Transmission Control Protocol (TCP)
A core protocol of the Internet Protocol Suite that provides reliable, ordered, and error-checked
delivery of a stream of octets between applications running on hosts communicating over an IP
network.
V
VCD
See vdisk configuration data (VCD).
vdisk
A virtual disk.
vdisk configuration data (VCD)
Configuration data that is associated with a virtual disk.
X
xCAT
See Extreme Cluster/Cloud Administration Toolkit.
Index
A
accessibility features 93
audience ix
C
call home
   5146 system 29
   5148 System 29
   background 29
   overview 29
   problem report 37
   problem report details 38
Call home
   monitoring 41
   Post setup activities 45
   test 43
   upload data 42
comments xii
D
documentation
   on web x
E
Electronic Service Agent
   activation 30
   configuration 34
   Installing 30
   login 30
   Reinstalling 42
   Uninstalling 42
I
IBM Spectrum Scale
   call home
      monitoring 41
      Post setup activities 45
      test 43
      upload data 42
   Electronic Service Agent 30, 42
   ESA
      activation 30
      configuration 34
      create problem report 37, 38
      login 30
      problem details 38
information overview ix
L
license inquiries 95
N
notices 95
O
overview
   of information ix
P
patent information 95
preface ix
R
resources
   on web x
S
submitting xii
T
trademarks 96
troubleshooting
   call home 29, 30, 42
   call home data upload 42
   call home monitoring 41
   Electronic Service Agent
      problem details 38
      problem report creation 37
   ESA 30, 34, 42
   Post setup activities for call home 45
   testing call home 43
W
web
   documentation x
   resources x
IBM®
SC28-3301-01