Introduction to HPC & Supercomputing in AI
A Modern Way to Look at HPC Workloads
Monday | 25th May 2020
LIVE WEBINAR
Presented by
Agenda
• About Netweb
• HPC
o TCM
o Storage
• Systems Offerings
• Interconnect
• GPU Optimized AI | ML Solutions
• Converged Mixed Workloads (Future of HPC)
• Tyrone Kubits™
• Q&A Session
Tyrone Systems at a Glance
A State-of-the-Art Facility at Netweb HQ
• Tyrone 100 Gbps interconnect-based R&D cluster with PFS
• HPC experts part of the Netweb team
• In-house supercomputing lab
• 300+ HPC nodes test facility
• HPC observatory lab
• HPC burn-in test lab
• Netweb HQ @ Faridabad
Solutions that span the entire Data Center
SERVER
• HPC Servers
• Mission Critical X86
• Storage Servers
• High-Density Servers
• GPU Servers
Cloud Solutions | Big Data/AI | HPC Solutions
• Cloud
• Big Data
• Virtualization
• AI / Deep Learning
Product Portfolio
WORKSTATIONS
• GPU Workstations
• Tower | Rack
• Liquid Cooling
STORAGE
• Unified Storage
• Storage Array
• Archival
• JBOD
• Ceph Storage
NETWORKING
• InfiniBand
• Omni-Path Architecture
Tyrone Kubernetes Platform
HPC Cluster
GPU Optimised Supercomputer
HPC On Cloud
SMP Solutions
Management Tools
Analytics
Data Insights
HPC Cluster Parallel File Systems
Inferencing
Hyper-converged
Virtual SAN
Mixed Workloads
GPU Systems
Understanding HPC
HPC Architecture
[Diagram] Master Node-1 and Master Node-2, a Control Node and compute Nodes 1–4, connected over an IB/OPA switch and a 10GbE switch, with I/O nodes serving the parallel file system storage.
Our HPC Services
The parallel application stack:
• Applications – end users, ISVs
• Job Control – batch queue, schedulers, cluster monitoring, cluster control
• Middleware – communication libraries (PVM, MPI, etc.); open-source / paid HPC cluster community
• Interconnect – networking components; inter-node communications hardware/software
• Compute Node – processor base, physical format, H/W management, etc.
• OS – the operating system (Linux etc.); OSVs
• Master Node – interfacing between the entire cluster and the user environment
Services offered:
• Proposing hardware
• Designing HPC architecture
• Application-level support
• Post-cluster installation services
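At the middleware layer, parallel applications typically communicate through MPI. Below is a minimal hedged sketch, assuming the mpi4py bindings are installed on the cluster: rank 0 gathers the hostnames of all ranks, which is also a quick way to verify that the job-control layer spread the job across nodes.

```python
# Minimal MPI sketch (assumes mpi4py is installed on the cluster).
# Each rank reports its hostname; rank 0 gathers and prints them.
from mpi4py import MPI
import socket

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's ID within the job
size = comm.Get_size()          # total number of MPI ranks

hosts = comm.gather(socket.gethostname(), root=0)   # collective gather

if rank == 0:
    print(f"{size} ranks ran on nodes: {sorted(set(hosts))}")
```

Such a script is launched through the job-control layer, for example with mpirun -np 4 python check_mpi.py or via the cluster scheduler.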
Tyrone Cluster Manager (TCM)
TCM – Architectural Overview
A single dashboard represents:
• The number of nodes along with all node names, groups, users and user groups.
• Configuration processes such as Ganglia, Torque, Slurm and PBS.
• Graphs of CPU utilization, free disk, shared memory and network details.
• A list of all processes pushed to the backend for processing through Celery, along with their results.
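The Celery-backed process list above follows the standard task-queue pattern. A hedged sketch of that pattern is shown below; the broker URL, task name and arguments are hypothetical illustrations, not TCM's actual internals.

```python
# Hypothetical sketch: long-running configuration work is pushed to a
# Celery worker, and the dashboard later reads back the task's result.
from celery import Celery

app = Celery("tcm_tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def install_module(module, nodes):
    # ...installation logic would run here on the worker...
    return {"module": module, "nodes": nodes, "status": "installed"}

# Pushed from the web interface; the task ID is what a dashboard
# would poll to list "processes and their results".
result = install_module.delay("ganglia", ["node1", "node2"])
print(result.id)
```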
TCM – Configuration Processes
Modules that can be configured through TCM:
• Create TCM repo
• Ganglia
• Torque
• Slurm
• PBS
• Driver installation (Mellanox and OPA)
• Shared home – AutoFS
For each module, both the master and the client components can be installed, and uninstalled, as required.
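Once a scheduler such as Slurm has been configured (for example through TCM), jobs are submitted with the standard CLI. A minimal Python sketch using sbatch follows; the partition name is an assumption for illustration.

```python
# Hedged sketch: submit a command to Slurm via sbatch and return the job ID.
# The partition name "compute" is an assumption; adjust to your cluster.
import subprocess

def submit(command, ntasks=4, partition="compute"):
    result = subprocess.run(
        ["sbatch", "--parsable",
         f"--ntasks={ntasks}",
         f"--partition={partition}",
         f"--wrap={command}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()     # --parsable prints just the job ID

print("Submitted job", submit("hostname", ntasks=2))
```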
TCM – GCC Applications
The latest version of TCM includes the option to compile GCC applications through the web interface. The applications that can be installed are:
• Abinit
• BGW
• Cp2k
• Cpmd
• FFTW
• Grads
• Gromacs
• Hdf5
• Mvapich2
• Namd
• NCView
• Nwchem
• Openbabel
• Openblas
• Openmpi
• QE 6.2
• ShengBTE
• Siesta
• TranSiesta
• Yambo
Storage Solutions
HIGH PERFORMANCE PARALLEL FILE STORAGE
• Supports Large Datasets & High IOPS Requirements
• Large, fast distributed scratch file system
• A centralised storage for clusters
• Simple building block architecture delivers predictable scaling to specific requirements
• Industry leading storage density with High Availability design
• Total Solution including Systems, Software and Services
• No single point of failure
• Reduce storage costs by up to 90%
KEY FEATURES: 10 GB/s | EDR: 100 Gb/s | HDR: 200 Gb/s
Unified Storage Solution: Opslag FS2 (SAN | NAS | VTL)
Key Features
• 4 GB/s bandwidth
• 576 TB in a single 4U enclosure
• 5+ PB scalability
• Native InfiniBand support (SRP / NFS over RDMA)
• All-in-one solution (NAS/FC/iSCSI/SRP/VTL)
File Access Protocols: CIFS/SMB, AFP, FTP, NFS, NFS over RDMA
Block Access Protocols: iSCSI Target, FC Target, SRP Target
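As a hedged illustration of the NFS-over-RDMA path listed above: the server name and export path below are hypothetical, and an RDMA-capable fabric plus the NFS/RDMA client modules are assumed on the client side.

```python
# Hypothetical mount of an NFS export over RDMA (standard NFS/RDMA port
# 20049, per nfs(5)); requires root and an RDMA-capable interconnect.
import subprocess

subprocess.run(
    ["mount", "-t", "nfs",
     "-o", "proto=rdma,port=20049",
     "fs2-storage:/export/scratch",   # hypothetical server:export
     "/mnt/scratch"],
    check=True,
)
```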
Key Benefits
• Scalability: easily scale from a few terabytes to 100+ petabytes
• Unified storage: SAN/NAS & VTL combine file, block & object storage in a single storage box
• Unified management: single interface for centralized management
• Unified protection: single solution for local and remote application protection
• Dual controllers provide an extra level of data protection
• Connects to any network: Gigabit Ethernet, Fibre Channel, FDR/EDR InfiniBand, Intel Omni-Path
• Supports SSD caching for customers looking for extremely high IOPS; our SSD caching, deduplication & compression manage the actual required capacity & cost
High capacity | High performance: FS2 can be set up as All-Flash or Hybrid Flash storage
Our System Offerings
Storage | GPU
4U 8-Node | 2U 4-Node | All-Flash NVMe
Interconnect
InfiniBand RDMA Interconnect
PFS Design – Architectural Overview
• 19" rack-mountable 1U chassis with dual redundant slots
• 40 QSFP56 non-blocking ports with aggregate data throughput of up to 16 Tb/s (HDR: 40 ports × 200 Gb/s × 2 directions)
• Management ports – 100/1000 RJ45 Ethernet port
• Connectors & cabling: QSFP56 connectors; passive copper or active fiber cables
GPU optimized for AI/ML Solutions
Delivers 4X faster training than other GPU-based systems
Your Personal AI Supercomputer
• Power on to deep learning in minutes
• Pre-installed with powerful deep learning software
• Extend workloads from your desk to the cloud in minutes
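A quick sanity check after power-on, assuming the pre-installed deep learning stack includes PyTorch with CUDA support (any CUDA-enabled framework would do):

```python
# List the GPUs visible to the pre-installed framework (PyTorch assumed).
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
else:
    print("No CUDA-capable GPU visible")
```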
GPU Systems Optimized For Deep Learning
GPUS: 1 | 2 | 3 | 4 | 6 | 8 | 10 | 16 | 20
MODELS: SS400TR-54R | SS400TG-16T | DS400TG-14R | DS400TG-48R | DS400TG-12RT | DS400TG-12RT | DS400TGH-28R | DS400TQV-416RT | DS400TOG-424R | DS400TOG-424RT | DS400NG16-1016RT | DS400TG-424RT
FORM FACTORS: 5U | 1U | 1U | 4U | 1U | 1U | 1U | 4U | 4U | 4U | 10U | 4U
COMPUTE PERFORMANCE (single precision unless noted):
• 1 x Tesla V100 32GB: 14+ TFLOPS
• 2 x Tesla V100 32GB: 28+ TFLOPS
• 3 x Tesla V100 32GB: 42+ TFLOPS
• 4 x Tesla V100 32GB: 56+ TFLOPS
• 6 x Tesla V100 32GB: 84+ TFLOPS
• 8 x Tesla V100 32GB: 100+ to 125+ TFLOPS
• 8 x RTX 2080 Ti: 100+ TFLOPS
• 10 x RTX 2080 Ti: 130+ TFLOPS
• 10 x Tesla V100 32GB: 140+ TFLOPS
• 16 x Tesla V100 32GB: 250+ TFLOPS
• 20 x T4: 160+ TFLOPS single precision; 1300+ TFLOPS FP16/FP32 mixed precision
[Chart] Tyrone KUBITS access – compute performance vs. number of GPUs: faster AI innovation & insight.
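The FP16/FP32 mixed-precision figure in the last entry is reached by running training in mixed precision on the GPUs' tensor cores. A minimal hedged sketch with PyTorch automatic mixed precision follows; the model, data and optimizer are illustrative placeholders, not part of the product.

```python
# Minimal FP16/FP32 mixed-precision training step with PyTorch AMP.
# Model, data and optimizer are placeholders for illustration only.
import torch

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()       # keeps FP16 gradients stable

inputs = torch.randn(64, 1024, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():             # ops run in FP16 where safe
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()               # scaled backward pass
scaler.step(optimizer)
scaler.update()
```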
Mixed Workloads: Convergence of AI | HPC | Cloud | Containers
The Era of Mixed Workloads
• Flexible – Is the usage going to be constant?
• Optimization – Is optimal utilization required?
• Resilience – Do we need the application to run all the time?
• Ease – Is 'ease of maintenance' key?
• Scalability & Speed – Do we have one size that fits all?
Connectivity and usage
[Diagram] Laptops and virtual desktops connect through the Tyrone Cloud Manager.
Expand Cloud
Tyrone Cloud Suite (TCS) stack on Tyrone hardware and OS:
• TCS: service management, log management, monitoring
• OpenStack shared services: Keystone, Horizon, Ceilometer, Percona XtraDB, RabbitMQ, Memcache, MongoDB, Kubernetes
• Compute: Nova with KVM/LXD; Docker containers
• Network: Neutron
• Storage: Ceph (image, block, object / Swift)
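A hedged sketch of exercising the stack above through the OpenStack SDK: Keystone handles authentication, Nova boots the instance and Neutron attaches the network. The cloud, image, flavor and network names are hypothetical.

```python
# Hypothetical example against the TCS OpenStack services using the
# OpenStack SDK (Keystone auth, Nova compute, Neutron networking).
import openstack

conn = openstack.connect(cloud="tyrone-tcs")   # clouds.yaml entry (hypothetical)

image = conn.compute.find_image("ubuntu-20.04")
flavor = conn.compute.find_flavor("gpu.large")
network = conn.network.find_network("tenant-net")

server = conn.compute.create_server(
    name="dl-worker-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)
```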
GPU containerized application stack (inside each container):
• Deep learning applications
• Deep learning frameworks
• Deep learning libraries
• CUDA toolkit
• Mapped NVIDIA drivers
• Container OS
Host stack:
• Containerization tool: Docker Engine with the NVIDIA Container Runtime for Docker
• NVIDIA driver
• Host OS
• Server infrastructure
Docker layering: host operating system → Docker Engine → containers, each bundling an app with its own bins/libs.
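A hedged sketch of the layering above in action: the Docker Engine launches a GPU-enabled container through the NVIDIA runtime. The --gpus all flag requires the NVIDIA Container Toolkit on the host, and the image tag is only an example from NGC.

```python
# Launch a GPU-enabled container via Docker with the NVIDIA runtime.
# The NGC image tag below is an example; any CUDA-enabled image works.
import subprocess

subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvcr.io/nvidia/pytorch:20.03-py3",
     "python", "-c", "import torch; print(torch.cuda.device_count())"],
    check=True,
)
```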
Run multiple applications simultaneously
Tyrone KUBITS™ Cloud
Flow architecture revolutionizing the deep learning CPU-GPU environment
• Works with the Tyrone KUBITS™ client on KUBITS™ compatible workstations
• KUBITS has a repository of 50 containerized applications and 100s of containers
• Speed: 10X to 70X
Tyrone KUBITS: Revolutionizing the Deep Learning CPU-GPU Environment
• Run different applications simultaneously
• Check for Tyrone KUBITS compatible workstations
• Get access to 100+ containers on Tyrone KUBITS Cloud
• High scalability
• Affordable price
• Both GPU- and CPU-optimized containers
• Design a simple workstation or large clusters with KUBITS technology
• Talk to our experts & build the right workstation within your budget
KUBITS Cloud | KUBITS Compatible
Cloud Deployment in 5 countries
INDIA
SINGAPORE
UK
USA
AUSTRALIA
Product Configurator
• An easy-to-use tool on the Tyrone website
• Allows customers to select from over 400 products and provides a customized solution
• Easy to view all SKUs needed to quote a particular end user
• Easy access to product technical specifications
http://tyronesystems.com/servers/servers_workstation.html
Q&A Session: Contact our team if you have any further questions after this webinar
Talk to our AI Experts: ai@netwebtech.com
Navin
navin@netwebindia.com
Tushar
tushar@netwebindia.com
Anurag
Anurag.thakare@netwebindia.com
Anjani
anjani.pandey@netwebindia.com