
PowerSizer

Cloud Computing Test
VDI_TEST AI

Company Name: VDI

Sizer ID: ai_25633

Date: 26/07/2024

Internal Use - Confidential


R760XA

Server
Servers: 138
RU per Server: 2

GPU
GPU Model: L40S
GPU Distribution: 4 GPU x 137 servers, 2 GPU x 1 server
VRAM per GPU: 48 GB
Total VRAM: 26400 GB

Performance
Estimated Throughput: 440K inferences per second

Solution Highlights
● Up to two 4th Generation Intel® Xeon® Scalable processors with up to 56 cores per processor
● 2U mainstream-designed, air-cooled
● Standard (1070 mm) rack capable
● 32 DDR5 DIMMs
● 4800 MT/s
● Up to 4 x DW or 12 x SW PCIe Gen5 GPUs powered by NVIDIA, Intel and AMD
● NVLink, XGMI, and XeLink bridging support enables scaling of memory and performance to enhance GPU-focused applications
● Multi-Instance GPU (MIG) enabled for multi-tenancy
● Up to 4 x16 PCIe Gen5 slots
● OCP 3.0 for network cards
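The GPU and VRAM totals follow directly from the distribution above; a minimal sketch (illustrative only, not PowerSizer code) that reproduces them:

```python
# Sanity check of the GPU and VRAM totals on this slide. The inputs
# (137 + 1 servers, 4 + 2 GPUs, 48 GB per L40S) come from the sizer output.
gpus = 4 * 137 + 2 * 1               # GPU distribution across 138 servers
total_vram_gb = gpus * 48            # 48 GB VRAM per L40S

print(gpus)           # 550 GPUs
print(total_vram_gb)  # 26400 GB, matching Total VRAM above
```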

R760XA
Back-end Fabric

Switch Type: Z9664F
No. of Leaf Switches: 5
No. of Spine Switches: 3
Leaf Switch Port Mode: 4x100GbE
No. of Leaf Breakout Ports: 552

NIC Type: Nvidia ConnectX-6 DX
Number of NICs: 275
NIC Distribution: 2 NIC x 137 servers, 1 NIC x 1 server
GPU to NIC Ratio: 2:1
NIC Port Mode: 2x100GbE

CPU
Processor: 6448Y (32 cores, 2.10 GHz)
No. of CPUs per Server: 2

Memory
Memory Population: OPTIMIZED_BALANCED
DIMM Size | Speed: 32 GB | 4800 MT/s
No. of DIMMs: 16
Total Configured Memory: 512 GiB
Max. Configurable Memory: 8192 GiB
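The configured capacity is simply DIMM count times DIMM size; the maximum assumes all 32 slots populated with 256 GB DIMMs (a DIMM size inferred from 8192 GiB / 32, not a quoted spec):

```python
# Memory capacity check for this configuration (values from the table above).
dimm_size_gib = 32
dimms_populated = 16
print(dimms_populated * dimm_size_gib)   # 512 GiB configured

dimm_slots = 32                          # per the Solution Highlights
max_dimm_size_gib = 256                  # inferred: 8192 GiB / 32 slots
print(dimm_slots * max_dimm_size_gib)    # 8192 GiB maximum configurable
```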

Total Server Power
Sustained Load (TDP): 427662 W
Sustained Load (TDP) per Server: 3099 W
Peak Load: 660468 W
Peak Load per Server: 4786 W
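The per-server figures are the totals divided across the 138 servers; a quick check:

```python
# Per-server power derived from the totals above (138 servers).
servers = 138
print(427_662 // servers)   # 3099 W sustained (TDP) per server
print(660_468 // servers)   # 4786 W peak per server
```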

PowerSizer

Sizing Input

Sizing Input
Workload Sizing Input
Workload: General AI GPU Inferencing
Use Case: Object Detection
Sizing Profile: High Performance
Concurrent Inferences: 550
System Model Type: Data Center/Core
NIC Type: Nvidia ConnectX
GPU Vendor: NVIDIA
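Under the rule stated in the Sizing Explanations (MIG/slicing is not considered), each concurrent inference occupies a whole GPU, which is how 550 concurrent inferences becomes 550 L40S GPUs and 138 R760XA servers; a sketch of that step, assuming 4 GPUs per server:

```python
import math

# Sketch of the sizing step: without MIG, one GPU per concurrency slot.
concurrent_inferences = 550
gpus_per_server = 4                    # R760XA holds up to 4 DW GPUs

gpus_needed = concurrent_inferences
servers = math.ceil(gpus_needed / gpus_per_server)
print(servers)   # 138: 137 servers with 4 GPUs + 1 server with 2 GPUs
```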

PowerSizer

Other Solution Options

Other Solution Options

System | GPU | Estimated Throughput (Inferences per Sec)
138 x R760XA | 550 x L40S | 440K
275 x R7625 | 550 x L40S | 440K
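Both options carry the same 550 GPUs, so the throughput estimate is identical; only the packing differs (4 GPUs per R760XA vs. 2 per R7625). The ~800 inferences/s per-GPU rate below is inferred from this report (440K / 550), not a published figure:

```python
import math

per_gpu_rate = 440_000 / 550           # ~800 inferences/s per L40S (inferred)

for system, gpus_per_server in [("R760XA", 4), ("R7625", 2)]:
    servers = math.ceil(550 / gpus_per_server)
    print(f"{servers} x {system}: {550 * per_gpu_rate:.0f} inferences/s")
```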

PowerSizer

Sizing Explanations & Notes

Sizing Explanations and Notes
• Estimated inferences per second are derived from MLPerf Datacenter-Inference data for the given GPU.
• Precision is FP16. TFLOPS assume sparsity.
• Sizing Profile of "Value" consists of servers with only L4 GPUs.
• Sizing Profile of "Balanced" consists of servers with only L40 GPUs.
• Sizing Profile of "High Performance" consists of servers with only L40S GPUs.
• General inferencing performance is per GPU: adding GPUs does not improve per-GPU performance but adds concurrency slots (see the sketch after this list).
• At this time, Multi-Instance GPU (MIG) and GPU slicing are not considered in the calculations for the number of GPUs recommended, the number of systems, or the time to complete fine-tuning.
• Time estimates assume no network bottlenecks or storage impediments.
• Time estimates may not account for additional network overhead needed for communication across multiple clusters.
• Additional optimization through both model-specific tuning and framework/middleware changes may result in higher performance or lower memory usage.
• Power estimates are based on peak load and should be interpreted as maximum power guidelines (worst-case scenario).
• For AI inference, performance is determined by network latency rather than bit rate, so 100 Gb/s ports are recommended here.
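A minimal sketch of the throughput model these notes describe, assuming a fixed MLPerf-derived per-GPU rate and no MIG (the 800 inferences/s figure is inferred from this report, 440K / 550):

```python
PER_GPU_RATE = 800   # inferences/s per L40S at FP16 (inferred from 440K / 550)

def concurrency_slots(num_gpus: int) -> int:
    """Without MIG or slicing, each GPU hosts one concurrent inference."""
    return num_gpus

def estimated_throughput(num_gpus: int) -> int:
    """Aggregate throughput scales with GPU count; the per-GPU rate is fixed."""
    return concurrency_slots(num_gpus) * PER_GPU_RATE

print(estimated_throughput(550))   # 440000, reported as "440K"
```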

Spine-Leaf (Z9664F – 100 GbE Leaf Ports)

[Topology diagram: up to 32 Z9664F spine switches connected to up to 64 Z9664F leaf switches over n x 400GbE (QSFP56-DD) links; servers attach to the leaves over 100 GbE QSFP28.]

● Full bisection bandwidth
● Max 8192 GPUs; max 8192 x 100GbE leaf ports (64 leaf switches x 128 ports)
● Server/leaf connections: 100 GbE, QSFP28
● 1:2 NIC/GPU ratio (2 x 100GbE NIC setup)
● Leaf/spine links: 400GbE
● Per Z9664F leaf: max 128 x 100GbE leaf ports and max 32 x 400GbE leaf/spine ports
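The leaf and spine counts on the Back-end Fabric slide can be reproduced from these port budgets; a sketch, assuming full bisection (32 x 400GbE uplinks per leaf, 64 x 400GbE ports per spine) and four 100GbE ports counted per server (552 = 138 x 4):

```python
import math

# Fabric sizing sketch under the port budgets in the diagram above.
servers = 138
ports_per_server = 4                                # 2 NICs x 2 x 100GbE ports
leaf_ports_needed = servers * ports_per_server      # 552 breakout ports

leaf_switches = math.ceil(leaf_ports_needed / 128)  # 128 x 100GbE per leaf -> 5
uplinks = leaf_switches * 32                        # 400GbE leaf-to-spine links
spine_switches = math.ceil(uplinks / 64)            # 64 x 400GbE per spine -> 3

print(leaf_ports_needed, leaf_switches, spine_switches)   # 552 5 3
```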
