NVIDIA T4 FOR VIRTUALIZATION
Nicola Sessions, February 2019
AGENDA
• Why Choose NVIDIA T4 for Virtualization?
• NVIDIA T4 Performance for Virtualization Workloads
• Selecting the Right GPU for Your Virtualization Workload
ANNOUNCING NVIDIA T4 FOR VIRTUALIZATION
The New Generation of Computer Graphics on a Quadro Virtual Data Center Workstation
• Virtual Quadro Workstation for the Professional Designer & Data Scientist:
  • Up to 2X graphics performance versus M60
  • 5 Giga Rays per second for real-time, interactive rendering
  • NGC support; run deep learning inferencing workloads 25x faster than CPU on a virtual machine
• Virtual PCs for the Knowledge Worker:
  • Up to 33% improved performance versus CPU-only VMs
  • Support for VP9 decode and H.265 encode and decode for improved CPU offload
DRIVING NEW WORKFLOWS
Empowering the Modern Digital Workplace
• Digital Workplace: Windows 10 & Productivity Apps
• Photorealistic Rendering: Increasingly Complex Designs
• Data Science: Increase in AI/DL & Inference
RTX PERFORMANCE IN A
QUADRO VIRTUAL WORKSTATION
Support for up to 5 Giga Rays/Sec
• Media & Entertainment: Real-time Rendering
• Manufacturing: Simulation, modeling, design
• Architecture: Rendering, design
NVIDIA T4 KEY SPECIFICATIONS
GPU Architecture NVIDIA Turing
NVIDIA CUDA® Cores 2,560
NVIDIA Turing™ Tensor Cores 320
RT Cores 40
Giga Rays/second 5
Memory Size 16 GB GDDR6
Memory BW Up to 320 GB/s
vGPU Profiles 1 GB, 2 GB, 4 GB, 8 GB, 16 GB
Form Factor PCIe 3.0 single slot (half height & length)
Power 70W
Thermal Passive
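The vGPU profile sizes listed above divide the T4's 16 GB of memory evenly. As a rough sketch, assuming every vGPU on a board uses the same profile size and that the full 16 GB is available to guests, the upper bound on vGPUs per T4 is the memory size divided by the profile size. The function name `max_vgpus_per_board` is hypothetical and is not part of any NVIDIA tool or API.

```python
# Illustrative sizing sketch, not an NVIDIA API.
# Assumption: all vGPUs on one board share the same profile size.
T4_MEMORY_GB = 16
T4_PROFILES_GB = [1, 2, 4, 8, 16]  # profile sizes from the specification list


def max_vgpus_per_board(profile_gb, memory_gb=T4_MEMORY_GB):
    """Return how many equal-size vGPU profiles fit in memory_gb."""
    if profile_gb <= 0:
        raise ValueError("profile size must be positive")
    return int(memory_gb // profile_gb)


for profile in T4_PROFILES_GB:
    count = max_vgpus_per_board(profile)
    print(f"{profile:>2} GB profile: up to {count} vGPUs per T4")
```

Actual user density also depends on licensing, workload, and host resources, so this should be read as a ceiling rather than a sizing recommendation.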
LATEST GENERATION
QUADRO VIRTUAL WORKSTATION
Work Faster with Larger Models
• Continued performance increases with latest generation GPUs
• Added AI support and ray tracing support with Tensor and RT cores
[Chart: Quadro Virtual Workstations, SPECviewperf13 relative performance for M60, P4, and T4]
3D Graphics: 1.5x performance
SPECviewperf 13 results tested on a server with Intel Xeon Gold 6154 (18C, 3.0 GHz), Quadro vDWS with T4-16Q, VMware ESXi 6.7, host/guest driver 410.87/412.10. VM config: Windows 10, 8 vCPU, 16 GB memory.
HIGHEST GRAPHICS PERFORMANCE
ON A VIRTUAL WORKSTATION
Work Faster with Larger Models
• Up to 2X performance compared to M60
• 2X framebuffer compared to P4 to support larger models
• Professional Performance: Healthcare, Oil & Gas, Media & Entertainment, Manufacturing
[Chart: SPECviewperf13 relative performance for M60, P4, and T4. Categories: Geomean; Healthcare (Medical); Oil & Gas (Energy); Media & Ent (3ds Max, Maya); Manufacturing/Product Design (CATIA, Creo, Siemens NX, SOLIDWORKS)]
SPECviewperf 13 results tested on a server with Intel Xeon Gold 6154 (18C, 3.0 GHz), Quadro vDWS with T4-16Q, VMware ESXi 6.7, host/guest driver 410.87/412.10. VM config: Windows 10, 8 vCPU, 16 GB memory.
RUN RTX APPLICATIONS
ON A VIRTUAL WORKSTATION
Quadro vDWS with RTX-Capable NVIDIA T4
• Run applications built on the RTX platform, the most powerful rendering platform, on any device, anywhere
• Real-time ray tracing performance of up to 5 Giga Rays per second
• Accelerate batch rendering for faster time-to-market
• AI-enhanced denoising speeds creative workflows
• Photorealistic design with accurate shadows, reflections & refractions
NVIDIA T4 WITH QUADRO vDWS
Real-Time Inference Performance
• Quadro Virtual Workstation for deep learning inferencing workloads
• Support for NVIDIA GPU Cloud (NGC)
• Ideal for deep learning labs and classrooms
[Chart: Video Inference, speedup vs. CPU server, CPU VM vs. T4 & Quadro vDWS; ResNet-50 (7ms latency limit)]
Speedup: 25x faster
Tested on a server with Intel Xeon Gold 6154 (18C, 3.0 GHz), Quadro vDWS with T4-16Q, VMware ESXi 6.7, host/guest driver 410.87/412.10. VM config: Ubuntu 16.04, 8 vCPU, 32 GB memory. 25X performance improvement over CPU VM.
NVIDIA T4 FOR VIRTUAL PCs
Optimize Data Center Utilization with Mixed Workloads
• T4 vs. CPU only: Adding NVIDIA GPUs results in 33% better user experience versus CPU-only VMs**
• T4 vs. M10: provides same user density with lower power consumption*
• Same user experience & performance**
• Support for VP9 decode
• Support for H.265 (HEVC) 4:4:4 encode and decode
• Support for >1TB system memory
[Chart: Virtual PCs, relative user experience (UX based on Remoted Frames), CPU only VM vs. T4]
UX: 1.3x better
* Two NVIDIA T4 GPUs support the same user density as a single M10 and fit in the same 2 slot PCIe form factor.
** NVIDIA internal benchmark running Microsoft PowerPoint, Word, Excel, Chrome, PDF viewing and video playback.
NVIDIA DATA CENTER GPUs
Recommended for Virtualization
GPU | V100 | P40 | T4 | M10 | P6
GPUs / Board | 1 | 1 | 1 | 4 | 1
Architecture | Volta | Pascal | Turing | Maxwell | Pascal
CUDA Cores | 5,120 | 3,840 | 2,560 | 2,560 (640 per GPU) | 2,048
Tensor Cores | 640 | --- | 320 | --- | ---
RT Cores | --- | --- | 40 | --- | ---
Memory Size | 32 GB/16 GB HBM2 | 24 GB GDDR5 | 16 GB GDDR6 | 32 GB GDDR5 (8 GB per GPU) | 16 GB GDDR5
vGPU Profiles | 1 GB, 2 GB, 4 GB, 8 GB, 16 GB, 32 GB | 1 GB, 2 GB, 3 GB, 4 GB, 6 GB, 8 GB, 12 GB, 24 GB | 1 GB, 2 GB, 4 GB, 8 GB, 16 GB | 0.5 GB, 1 GB, 2 GB, 4 GB, 8 GB | 1 GB, 2 GB, 4 GB, 8 GB, 16 GB
Form Factor | PCIe 3.0 Dual Slot & SXM2 (rack servers) | PCIe 3.0 Dual Slot (rack servers) | PCIe 3.0 Single Slot (rack servers) | PCIe 3.0 Dual Slot (rack servers) | MXM (blade servers)
Power | 250W/300W | 250W | 70W | 225W | 90W
Thermal | passive | passive | passive | passive | bare board
Categories: Performance Optimized, Density Optimized, Blade Optimized
SELECTING THE RIGHT GPU
NVIDIA Quadro Virtual Data Center Workstation
NVIDIA T4 (Smaller Profiles, More Users)
• Use Case: Entry to Midrange Quadro Workstations
• Workloads: CAD, CAE, Digital Content Creation, Rendering, Inferencing, Training
My end users work with larger models or applications:
NVIDIA P40
• Use Case: High-end Quadro Workstations
• Workloads: Large, Complex CAD models, Seismic Exploration, Complex Digital Content Creation, Effects, 3D Medical Imaging
My end users use CAE applications, or are experimenting with DL/AI:
NVIDIA V100 (Larger Profiles, Fewer Users)
• Use Case: Ultra High-end Quadro Workstations
• Workloads: Largest CAD models, CAE, Seismic Exploration, GPGPU compute, Deep Learning, Immersive Visualization
From T4 to V100: decreasing user density per server, increasing workflow/model complexity.
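The selection flow on this slide can be read as two yes/no questions. The sketch below is one possible encoding, not an official NVIDIA sizing tool: the function name `recommend_vdws_gpu` and its parameters are hypothetical, and checking the more demanding question first is an assumption about how the two branches combine.

```python
# Illustrative encoding of the slide's decision flow; names are hypothetical.
def recommend_vdws_gpu(larger_models=False, cae_or_dl=False):
    """Suggest a GPU tier for Quadro vDWS from the slide's two questions.

    larger_models: end users work with larger models or applications.
    cae_or_dl: end users use CAE applications or experiment with DL/AI.
    """
    # Assumption: the most demanding requirement wins when both apply.
    if cae_or_dl:
        return "NVIDIA V100"
    if larger_models:
        return "NVIDIA P40"
    # Default tier: entry to midrange workstations, highest user density.
    return "NVIDIA T4"


print(recommend_vdws_gpu())
print(recommend_vdws_gpu(larger_models=True))
print(recommend_vdws_gpu(larger_models=True, cae_or_dl=True))
```

A real sizing exercise would also weigh profile size, user count per server, and power budget, which this sketch deliberately leaves out.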
SELECTING THE RIGHT GPU
NVIDIA GRID vPC/vApps
Configuration | 2 x NVIDIA T4 | 1 x NVIDIA M10
Density | 32 users | 32 users
Form Factor | PCIe 3.0 single slot | PCIe 3.0 dual slot
Power | 140W (70W per GPU) | 225W
Cores Available | CUDA, Tensor, RT | CUDA
CODECs | VP9, H.265 | H.264
System Memory Support | > 1TB | < 1TB
Use Case | Universal GPU for virtual workstations, knowledge workers, rendering, inferencing, training | Lowest TCO for knowledge workers
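The density and power figures in this comparison can be turned into a simple per-user number. The sketch below uses only the values stated above (32 users, 140W, 225W); the per-user split is an illustrative calculation that treats rated board power as the only cost and ignores host, cooling, and idle power.

```python
# Figures from the comparison above; the per-user metric is illustrative.
configs = {
    "2 x NVIDIA T4": {"users": 32, "power_w": 2 * 70},
    "1 x NVIDIA M10": {"users": 32, "power_w": 225},
}


def watts_per_user(config):
    """Rated board power divided by the number of supported users."""
    return config["power_w"] / config["users"]


for name, config in configs.items():
    print(f"{name}: {watts_per_user(config):.2f} W per user")

t4_power = configs["2 x NVIDIA T4"]["power_w"]
m10_power = configs["1 x NVIDIA M10"]["power_w"]
savings = 1 - t4_power / m10_power
print(f"Rated board power reduction with 2 x T4: {savings:.0%}")
```

With these inputs, the two T4 boards come to roughly 4.4 W per user against roughly 7.0 W for the M10, which is about 38% less rated board power at the same 32-user density.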
NVIDIA T4 FOR VIRTUALIZATION
Powerful, Versatile Platform for VDI
• Powerful virtual workstation for the engineer, professional designer, and data scientist
• Deep learning inferencing for virtual labs and classrooms
• High density virtual desktops for the best user experience for Windows 10