RMA Process for HPE Servers

The document provides guidance on the RMA process for HPE servers and appliances. It outlines how to determine if a component part replacement or full appliance replacement is needed. It details the steps engineers should take to request an RMA, including obtaining proof of failure from the customer and filling out an RMA template. The document also provides examples for customers to show evidence of a hardware failure using the appliance's web GUI, CLI, iLO access, or physical evidence like photos.

Uploaded by

Jason Gomez

RMA Process

Overview
4/19/2019
Component vs. Full Appliance
• We can ship a number of field-replaceable units on their own.
• These include hard drives, power supply units, memory DIMMs, battery capacitor packs, SMART storage batteries, and fans.
• Note: Battery capacitor packs are eligible only for G8 servers.
• Note: SMART storage batteries are eligible only for G9 servers.
• Any failure that is outside the scope of the parts listed above must result in the replacement of the entire server.
• If a full appliance replacement is required, the engineer should send an email to the appropriate backline team to review and approve the RMA. This should be done only after the customer has provided all evidence of the failure.
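The eligibility rules above can be sketched as a small lookup. This is a minimal illustration, not an official tool: the function name `rma_type` and the part labels are hypothetical, and treating an ineligible part/generation combination as a full-appliance case is an assumption.

```python
# Field-replaceable parts from the list above, mapped to the server
# generations they are eligible for. None means no restriction is stated.
# The key names are illustrative labels, not official part identifiers.
FIELD_REPLACEABLE = {
    "hard drive": None,
    "power supply unit": None,
    "memory dimm": None,
    "fan": None,
    "battery capacitor pack": {"G8"},
    "smart storage battery": {"G9"},
}


def rma_type(failed_part: str, generation: str) -> str:
    """Return 'component' if the part can ship on its own, else 'appliance'."""
    key = failed_part.strip().lower()
    if key not in FIELD_REPLACEABLE:
        # Anything outside the listed parts means a full server replacement.
        return "appliance"
    allowed = FIELD_REPLACEABLE[key]
    if allowed is None or generation.strip().upper() in allowed:
        return "component"
    # Assumption: an ineligible part/generation pair is escalated as an appliance case.
    return "appliance"
```

For example, `rma_type("Fan", "G9")` yields a component replacement, while a part that is not on the list, such as a system board, yields a full appliance replacement that still needs backline approval.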
Customer Self Repair Services Media Library
For component part replacements, we may provide the customer with the HPE link below. The site hosts a library of replacement videos that can assist the customer with the replacement.
https://sml-csr.ext.hpe.com/
Once at the main page, the customer needs to navigate to the correct product.
G7 servers: Servers > ProLiant DL Servers > HPE ProLiant DL360 G7 Server > Remove/Replace videos
G8 servers: Servers > ProLiant DL Servers > HPE ProLiant DL360p Gen8 Server > Remove/Replace videos
G9 servers: Servers > ProLiant DL Servers > HPE ProLiant DL360 G9 Server > Remove/Replace videos
Once they click “Remove/Replace videos”, a new screen opens where they can select the correct video.
Detailed View
• Customer contacts support either through the portal or phone.
• Engineer begins troubleshooting steps to determine the issue.
• If/when the issue is determined to be hardware related, the engineer must ask the customer to provide evidence that demonstrates the failure (examples later).
• Once the customer provides the necessary proof of the failure, the engineer must send the RMA template to the customer to complete.
• Note: We need the serial number of the appliance itself, not of the specific failed part. ArcSight appliance serial numbers always begin with “SGH”.
• If a component part is needed, the engineer should send the completed template with the technical reason to arst-rma@microfocus.com.
• If a full appliance is needed, the engineer must receive approval from backline first. Once approved, send the completed template with the technical reason to arst-rma@microfocus.com.
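The routing described in these steps can be sketched as follows. This is a hedged illustration only: the function names and the returned messages are hypothetical, and the only facts taken from the steps above are the “SGH” prefix, the backline approval requirement, and the mailbox address.

```python
RMA_MAILBOX = "arst-rma@microfocus.com"  # address given in the steps above


def serial_looks_valid(serial: str) -> bool:
    """ArcSight appliance serial numbers always begin with 'SGH'."""
    return serial.strip().startswith("SGH")


def next_step(serial: str, rma_type: str, backline_approved: bool = False) -> str:
    """Return the engineer's next action (hypothetical wording).

    rma_type is 'component' or 'appliance'.
    """
    if not serial_looks_valid(serial):
        # A serial that does not start with SGH is probably the failed
        # part's serial, not the appliance's.
        return "ask customer for the appliance serial number"
    if rma_type == "component":
        return f"send completed template to {RMA_MAILBOX}"
    if rma_type == "appliance":
        if not backline_approved:
            return "request backline approval"
        return f"send completed template to {RMA_MAILBOX}"
    raise ValueError(f"unknown RMA type: {rma_type!r}")
```

A serial such as `SGH000EXAMPLE` (a made-up value) passes the prefix check. The check only catches the common mistake of submitting a part serial and does not verify entitlement.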
RMA Template
• Please ensure all the information is provided when sending the RMA request email. Missing information will delay the process.
• Along with the technical reason, attach all evidence the customer provided to the Outlook email.
Detailed View Continued
• Once the template is sent to the ARST RMA PDL, the RMA Team takes over.
• The first step is verifying the entitlement using the appliance serial number. Some customers open cases using incorrect contract information, which is why we verify before continuing.
• If support is active, the RMA Team will begin the order process.
• If support is inactive, the RMA Team will email the appropriate renewal team and ask for verification.
• The RMA Team will duplicate the engineer's ticket to create the new RMA Tracking Ticket. This ticket exists to provide the end customer with updates regarding the actual shipment of the RMA. No additional troubleshooting should occur there.
• Once the part or appliance is ordered, the RMA Team will send the customer tracking information and/or an ETA.
• If an appliance is being replaced, the RMA Team will request a new license from the appropriate license team and ensure the customer's contract is updated with the new serial number.
• Engineers should keep their technical support case open until the customer has received the replacement and successfully brought the appliance back into production. Do not close the case until the customer has confirmed they are back online.
• Additional support may still be needed after the customer receives the replacement. That support should come from the original engineer, not the RMA Team.
Evidence of the Failure
• There are multiple ways a customer can provide evidence of a failure. Please test all of the procedures on a lab appliance to familiarize yourself with the results, then choose the method that best suits your customer's skills and access level.
1. ArcSight Web GUI Interface
2. Command Line Interface (CLI)
3. iLO Remote Access
4. Physical Evidence (Photographs)
5. Logs or Snapshots
ArcSight Web GUI Interface
• Banner Across Web Display
• When the appliance is accessed through the web GUI, the customer will notice a banner across the top of the screen. This banner can indicate that a hard drive has failed. The customer can simply capture a screenshot of the window to use for the RMA process.
• System Admin > Storage > RAID Controller
• The customer can navigate to the System Admin tab in the web GUI and select RAID Controller under Storage. This shows more detailed information on the status of the RAID controller as well as the individual drives.
System Admin Storage RAID
Banner Across Web Display Controller
Full output not shown
RAID Controller Configuration Logical Drive #1
General Controller Information
Size: 1.4 TB
Bus Interface: PCI Fault Tolerance: 5
Slot: 0 Heads: 255
Serial Number: PDNLH0BRH123RP Sectors Per Track: 63
Cache Serial Number: PDNLH0BRH123RP Cylinders: 65535
RAID 6 (ADG) Status: Enabled Strip Size: 64 KB
Controller Status: OK Full Stripe Size: 192 KB
Hardware Revision: B Status: OK
Firmware Version: 4.52 MultiDomain Status: OK
Rebuild Priority: High Caching: Enabled
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle Parity Initialization Status: Initialization Completed
Queue Depth: Automatic
Unique Identifier: 600508B1001C1A0946A1298526E04461
Monitor and Performance Delay: 60 min
Disk Name: /dev/sda Mount Points: /boot 512 MB Partition Number 3, / 350.0 GB
Elevator Sort: Enabled
Partition Number 4, /opt/arcsight/userdata 1.0 TB Partition Number 6
Degraded Performance Optimization: Disabled
OS Status: LOCKED
Inconsistency Repair Policy: Disabled
Logical Drive Label: 00E0B758PDNLH0BRH123RPF084
Wait for Cache Room: Disabled
Drive Type: Data
Surface Analysis Inconsistency Notification: Disabled
LD Acceleration Method: Controller Cache
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 2.0 GB
Total Cache Memory Available: 1.8 GB
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 51
Cache Module Temperature (C): 43
Command Line Interface (CLI)
• By using SSH to access the appliance's Linux interface, customers can use the Command Line Interface (CLI). Customers will need root access to their appliance. The outputs of the commands below overlap, so some of the information will look the same.
• hpacucli ctrl slot=0 pd all show
• hpacucli controller slot=0 show config
• hpacucli controller slot=0 show config detail
hpacucli ctrl slot=0 pd all show
[root@n15-214-136-h42 ~]# hpacucli ctrl slot=0 pd all show
Smart Array P440ar in Slot 0 (Embedded)
array A
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 6001.1 GB, OK)
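When a customer pastes this output into a ticket, the drive statuses can be checked mechanically. The following is a minimal sketch that assumes each drive line has the shape shown above, with the status as the last comma-separated field inside the parentheses. The function name `unhealthy_drives` is hypothetical.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 6001.1 GB, OK)
PD_LINE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)")


def unhealthy_drives(output: str) -> list[tuple[str, str]]:
    """Return (drive_id, status) for every physical drive whose status is not 'OK'."""
    problems = []
    for line in output.splitlines():
        match = PD_LINE.search(line)
        if not match:
            continue  # controller and array header lines are skipped
        drive_id, fields = match.groups()
        # Assumption: the status is the last comma-separated field.
        status = fields.split(",")[-1].strip()
        if status != "OK":
            problems.append((drive_id, status))
    return problems
```

Run against the sample output above, the function returns an empty list because all four drives report OK. Any drive reporting another status is returned with its bay identifier, which is the evidence needed for a hard drive RMA.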
hpacucli controller slot=0 show config
[root@n15-214-136-h42 ~]# hpacucli controller slot=0 show config

Smart Array P440ar in Slot 0 (Embedded) (sn: PDNLH0BRH123RP)


Internal Drive Cage at Port 1I, Box 1, OK
array A (SAS, Unused Space: 0 MB)
logicaldrive 1 (1.4 TB, RAID 5, OK)
logicaldrive 2 (15.0 TB, RAID 5, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 6001.1 GB, OK)
hpacucli controller slot=0 show config detail
• This output is very detailed and contains the same information as the web GUI RAID Controller output.
• This is helpful when trying to determine whether the appliance has a bad battery capacitor pack.
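Because the detailed output is a long list of "Name: value" lines, the cache and battery fields are easy to miss. The sketch below pulls out the relevant fields, assuming the field names match those shown in the web GUI RAID Controller output (Controller Status, Cache Status, Battery/Capacitor Status). The function names are hypothetical.

```python
# Field names taken from the RAID Controller output shown earlier.
HEALTH_FIELDS = ("Controller Status", "Cache Status", "Battery/Capacitor Status")


def parse_detail(output: str) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict, keeping the first occurrence of each name."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key not in fields:
            fields[key] = value.strip()
    return fields


def failing_fields(output: str) -> dict[str, str]:
    """Return the health fields that are present and report anything other than 'OK'."""
    fields = parse_detail(output)
    return {
        name: fields[name]
        for name in HEALTH_FIELDS
        if name in fields and fields[name] != "OK"
    }
```

An empty result means the controller, cache, and battery/capacitor all report OK. A non-OK Battery/Capacitor Status is the line to quote as the technical reason in the RMA template.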
iLO Remote Access
• There are multiple places in the iLO that can reveal the cause of a hardware failure. The summary page usually isn’t enough to verify which piece of hardware has failed.
• Information > System Information > Storage Information
• Information > System Information > Device Inventory
• Information > System Information > Power
• Information > System Information > Fans
• Information > Active Health System Log
• Some of these should be used in conjunction with one another.
• Example: On a Gen9, if the cache module on the Storage Information tab is showing as failed, check the status of the SMART storage battery on the Device Inventory tab.
Information System Information
Storage Information
• Good for verifying hard drive
failures and showing the
status of the Cache Module.
• Cache Module could show as
failed either due to a bad
battery capacitor pack (Gen8)
or bad SMART storage battery
(Gen9). If neither of those are
bad, chances are the module
itself is bad. This would result
in full appliance RMA.
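The cache module rule above can be written as a short decision function. This is a sketch under stated assumptions: the function name, parameter names, and returned strings are hypothetical, and the caller is assumed to have already read the two statuses from the iLO tabs.

```python
# Which backup power part sits behind the cache module on each generation,
# as described above.
BACKUP_POWER_PART = {
    "Gen8": "battery capacitor pack",
    "Gen9": "SMART storage battery",
}


def diagnose_cache_module(generation: str,
                          cache_module_failed: bool,
                          backup_power_failed: bool) -> str:
    """Suggest an RMA type from the Storage Information and Device Inventory statuses."""
    if not cache_module_failed:
        return "no cache-related RMA"
    part = BACKUP_POWER_PART.get(generation)
    if part is None:
        raise ValueError(f"no cache module rule stated for {generation!r}")
    if backup_power_failed:
        # The battery or capacitor pack is a field-replaceable part.
        return f"component RMA: {part}"
    # Neither backup power part is bad, so the module itself is suspected.
    return "full appliance RMA"
```

For a Gen9 with a failed cache module and a failed SMART storage battery, the sketch suggests a component RMA. With a healthy battery it suggests a full appliance RMA, which still requires backline approval.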
Information System Information
Device Inventory
• Good for showing the status of the SMART storage battery.
Information System Information
Power
• Good for showing the status of the power supply units.
Information System Information
Fans
• Good for showing the status of the fans.
Information > Active Health System Log
• Customers can navigate to the Active Health System Log under Information in the iLO web interface. They can enter support information and save the data to a file, which TSEs can use to view the appliance's health history. The file is saved in .ahs format and can only be viewed using the AHS log viewer provided by HPE. The viewer can be accessed at the following URL: https://ahsv.support.hpe.com/hpesc/public/home/signin
[Screenshot: example of Active Health System Viewer]
Physical Evidence (Photographs)
• Some customers have restrictions on sharing logs, but we still need them to provide evidence of the failure. They can do so by taking pictures of the physical equipment.
• Front Panel
• System Insight Display
• Rear Panel
Front Panel
Mainly used to verify hard drive failures.
Can also show the below:
System Insight Display
• Can be used to show the status of the power supply units, memory DIMMs, and fans.
Rear Panel
• Used to show the status of the power supply units.
Logs or Snapshots
• Customers can use the web interface to retrieve logs or snapshots from their appliances and upload them to their ticket for the engineer to review. Once the logs are downloaded and unzipped, we can view the log files that contain specific information related to RAID and HDD health.
• /tmp/ Cli.txt
• /tmp/ sel_list.txt
/tmp/ Cli.txt
Smart Array P440ar in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: PDNLH0BRH1572N
Cache Serial Number: PDNLH0BRH1572N
RAID 6 (ADG) Status: Enabled
Controller Status: OK

<Details Removed>

Physical Drives
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 6001.1 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 6001.1 GB, OK)
/tmp/ sel_list.txt
b | 05/12/2017 | 09:15:16 | Battery #0x47 | Failed | Asserted
c | 05/12/2017 | 09:15:18 | Battery #0x47 | Failed | Asserted
d | 05/12/2017 | 09:15:42 | Battery #0x47 | Failed | Asserted
e | 05/12/2017 | 09:16:08 | Battery #0x47 | Failed | Asserted
f | 05/12/2017 | 09:16:30 | Battery #0x47 | Failed | Asserted
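Entries like these can be filtered programmatically when the log is long. The sketch below assumes each line has six pipe-separated fields as in the sample above. The column names (`record_id`, `sensor`, and so on) are labels chosen for illustration, not names taken from the log itself.

```python
from typing import NamedTuple


class SelEntry(NamedTuple):
    record_id: str
    date: str
    time: str
    sensor: str
    event: str
    state: str


def parse_sel(text: str) -> list[SelEntry]:
    """Parse pipe-separated event log lines, ignoring lines with a different shape."""
    entries = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 6:
            entries.append(SelEntry(*parts))
    return entries


def asserted_failures(text: str) -> list[SelEntry]:
    """Return entries whose event is 'Failed' and whose state is 'Asserted'."""
    return [
        entry for entry in parse_sel(text)
        if entry.event == "Failed" and entry.state == "Asserted"
    ]
```

Applied to the five sample lines above, `asserted_failures` returns all five entries, each pointing at sensor `Battery #0x47`. Repeated assertions against the same battery sensor are the kind of evidence to attach to the RMA request.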
Resources
• https://irock.jiveon.com/groups/support-team-logger/blog/2017/10/02/rma-guidelines-for-hardware-replacement
• https://irock.jiveon.com/groups/ilo/blog/2017/11/18/hpe-ilo-documentation-license-firmware-updates-support
• HPE ProLiant DL360 Gen9 Server Maintenance and Service Guide
• http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7252836&docLocale=en_US&docId=emr_na-c04441985
• HPE ProLiant Troubleshooting Guide DL360 Gen9: Volume 1
• http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7252836&docLocale=en_US&docId=emr_na-c04444029
• HPE ProLiant Troubleshooting Guide DL360 Gen9: Volume 2
• http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7252836&docLocale=en_US&docId=emr_na-c04443553
• HPE ProLiant DL360 Gen8 Server Maintenance and Service Guide
• http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=5177953&docLocale=en_US&docId=emr_na-c03242811
• HPE ProLiant DL360 Gen8 Server Troubleshooting Guide: Volume 1
• http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=5177953&docLocale=en_US&docId=emr_na-c03257154
• HPE ProLiant Troubleshooting Guide DL360 Gen8: Volume 2
• http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=5177953&docLocale=en_US&docId=emr_na-c03230516
