Introduction to HPC & Supercomputing in AI
A Modern Way to Look at HPC Workloads
Monday | 25th May 2020
LIVE WEBINAR
Presented by
Agenda
• About Netweb
• HPC
o TCM
o Storage
• Systems Offerings
• Interconnect
• GPU Optimized AI | ML Solutions
• Converged Mixed Workloads (Future of HPC)
• Tyrone Kubits™
• Q&A Session
Tyrone Systems at a Glance
A State-of-the-Art Facility at Netweb HQ
• Tyrone 100 Gbps interconnect-based R&D cluster with PFS
• HPC experts part of the Netweb team
• In-house supercomputing lab
• 300+ HPC nodes test facility
• HPC observatory lab
• HPC burn-in test lab
• Netweb HQ @ Faridabad
Solutions that span the entire Data Center
SERVER
• HPC Servers
• Mission Critical X86
• Storage Servers
• High-Density Servers
• GPU Servers
Cloud Solutions | Big Data/AI | HPC Solutions
• Cloud
• Big Data
• Virtualization
• AI / Deep Learning
Product Portfolio
WORKSTATIONS
• GPU Workstations
• Tower | Rack
• Liquid Cooling
STORAGE
• Unified Storage
• Storage Array
• Archival
• JBOD
• Ceph Storage
NETWORKING
• InfiniBand
• Omni-Path Architecture
Tyrone Kubernetes Platform
HPC Cluster
GPU Optimised Supercomputer
HPC On Cloud
SMP Solutions
Management Tools
Analytics
Data Insights
HPC Cluster Parallel File Systems
Inferencing
Hyper-converged
Virtual SAN
Mixed Workloads
GPU Systems
Understanding HPC
HPC Architecture
[Diagram] Master Node-1 and Master Node-2, a Control Node and compute Nodes 1–4, connected over an IB/OPA switch and a 10GbE switch, with I/O nodes serving the parallel file system storage.
Our HPC Services
The parallel application stack:
• Applications – end users, ISVs
• Job Control – batch queue, schedulers, cluster monitoring, cluster control
• Middleware – communication libraries (PVM, MPI, etc.); open-source / paid HPC cluster community
• Interconnect – networking components; inter-node communications hardware/software
• Compute Node – processor base, physical format, H/W management, etc.
• OS – the operating system (Linux etc.); OSVs
• Master Node – interfacing between the entire cluster and the user environment
Services offered:
• Proposing hardware
• Designing HPC architecture
• Application-level support
• Post-cluster installation services
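At the middleware layer, parallel applications typically communicate through MPI. Below is a minimal hedged sketch, assuming the mpi4py bindings are installed on the cluster: rank 0 gathers the hostnames of all ranks, which is also a quick way to verify that the job-control layer spread the job across nodes.

```python
# Minimal MPI sketch (assumes mpi4py is installed on the cluster).
# Each rank reports its hostname; rank 0 gathers and prints them.
from mpi4py import MPI
import socket

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's ID within the job
size = comm.Get_size()          # total number of MPI ranks

hosts = comm.gather(socket.gethostname(), root=0)   # collective gather

if rank == 0:
    print(f"{size} ranks ran on nodes: {sorted(set(hosts))}")
```

Such a script is launched through the job-control layer, for example with mpirun -np 4 python check_mpi.py or via the cluster scheduler.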
Tyrone Cluster Manager (TCM)
TCM – Architectural Overview
A single dashboard represents:
• The number of nodes along with all node names, groups, users and user groups.
• Configuration processes such as Ganglia, Torque, Slurm and PBS.
• Graphs of CPU utilization, free disk, shared memory and network details.
• A list of all processes pushed to the backend for processing through Celery, along with their results.
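The Celery-backed process list above follows the standard task-queue pattern. A hedged sketch of that pattern is shown below; the broker URL, task name and arguments are hypothetical illustrations, not TCM's actual internals.

```python
# Hypothetical sketch: long-running configuration work is pushed to a
# Celery worker, and the dashboard later reads back the task's result.
from celery import Celery

app = Celery("tcm_tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def install_module(module, nodes):
    # ...installation logic would run here on the worker...
    return {"module": module, "nodes": nodes, "status": "installed"}

# Pushed from the web interface; the task ID is what a dashboard
# would poll to list "processes and their results".
result = install_module.delay("ganglia", ["node1", "node2"])
print(result.id)
```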
TCM – Configuration Processes
Modules that can be configured through TCM:
• Create TCM repo
• Ganglia
• Torque
• Slurm
• PBS
• Driver installation (Mellanox and OPA)
• Shared home – AutoFS
For each module, both the master and the client components can be installed, and uninstalled, as required.
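Once a scheduler such as Slurm has been configured (for example through TCM), jobs are submitted with the standard CLI. A minimal Python sketch using sbatch follows; the partition name is an assumption for illustration.

```python
# Hedged sketch: submit a command to Slurm via sbatch and return the job ID.
# The partition name "compute" is an assumption; adjust to your cluster.
import subprocess

def submit(command, ntasks=4, partition="compute"):
    result = subprocess.run(
        ["sbatch", "--parsable",
         f"--ntasks={ntasks}",
         f"--partition={partition}",
         f"--wrap={command}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()     # --parsable prints just the job ID

print("Submitted job", submit("hostname", ntasks=2))
```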
TCM – GCC Applications
The latest version of TCM includes the option to compile GCC applications through the web interface. The applications that can be installed are:
• Abinit
• BGW
• Cp2k
• Cpmd
• FFTW
• Grads
• Gromacs
• Hdf5
• Mvapich2
• Namd
• NCView
• Nwchem
• Openbabel
• Openblas
• Openmpi
• QE 6.2
• ShengBTE
• Siesta
• TranSiesta
• Yambo
Storage Solutions
HIGH PERFORMANCE PARALLEL FILE STORAGE
• Supports Large Datasets & High IOPS Requirements
• Large, fast distributed scratch file system
• A centralised storage for clusters
• Simple building block architecture delivers predictable scaling to specific requirements
• Industry leading storage density with High Availability design
• Total Solution including Systems, Software and Services
• No single point of failure
• Reduce storage costs by up to 90%
KEY FEATURES: 10 GB/s | EDR: 100 Gb/s | HDR: 200 Gb/s
Unified Storage Solution: Opslag FS2 (SAN | NAS | VTL)
Key Features
• 4 GB/s bandwidth
• 576 TB in a single 4U enclosure
• 5+ PB scalability
• Native InfiniBand support (SRP / NFS over RDMA)
• All-in-one solution (NAS/FC/iSCSI/SRP/VTL)
File Access Protocols: CIFS/SMB, AFP, FTP, NFS, NFS over RDMA
Block Access Protocols: iSCSI Target, FC Target, SRP Target
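As a hedged illustration of the NFS-over-RDMA path listed above: the server name and export path below are hypothetical, and an RDMA-capable fabric plus the NFS/RDMA client modules are assumed on the client side.

```python
# Hypothetical mount of an NFS export over RDMA (standard NFS/RDMA port
# 20049, per nfs(5)); requires root and an RDMA-capable interconnect.
import subprocess

subprocess.run(
    ["mount", "-t", "nfs",
     "-o", "proto=rdma,port=20049",
     "fs2-storage:/export/scratch",   # hypothetical server:export
     "/mnt/scratch"],
    check=True,
)
```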
Key Benefits
• Scalability: easily scale from a few terabytes to 100+ petabytes
• Unified storage: SAN/NAS & VTL combine file, block & object storage in a single storage box
• Unified management: single interface for centralized management
• Unified protection: single solution for local and remote application protection
• Dual controllers provide an extra level of data protection
• Connects to any network: Gigabit Ethernet, Fibre Channel, FDR/EDR InfiniBand, Intel Omni-Path
• Supports SSD caching for customers looking for extremely high IOPS; our SSD caching, deduplication & compression manage the actual required capacity & cost
High capacity | High performance: FS2 can be set up as All-Flash or Hybrid Flash storage
Our System Offerings
Storage | GPU
4U 8-Node | 2U 4-Node | All-Flash NVMe
Interconnect
InfiniBand RDMA Interconnect
PFS Design – Architectural Overview
• 19" rack-mountable 1U chassis with dual redundant slots
• 40 QSFP56 non-blocking ports with aggregate data throughput of up to 16 Tb/s (HDR: 40 ports × 200 Gb/s × 2 directions)
• Management ports – 100/1000 RJ45 Ethernet port
• Connectors & cabling: QSFP56 connectors; passive copper or active fiber cables
GPU optimized for AI/ML Solutions
Delivers 4X faster training than other GPU-based systems
Your Personal AI Supercomputer
• Power on to deep learning in minutes
• Pre-installed with powerful deep learning software
• Extend workloads from your desk to the cloud in minutes
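A quick sanity check after power-on, assuming the pre-installed deep learning stack includes PyTorch with CUDA support (any CUDA-enabled framework would do):

```python
# List the GPUs visible to the pre-installed framework (PyTorch assumed).
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
else:
    print("No CUDA-capable GPU visible")
```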
GPU Systems Optimized For Deep Learning
GPUS: 1 | 2 | 3 | 4 | 6 | 8 | 10 | 16 | 20
MODELS: SS400TR-54R | SS400TG-16T | DS400TG-14R | DS400TG-48R | DS400TG-12RT | DS400TG-12RT | DS400TGH-28R | DS400TQV-416RT | DS400TOG-424R | DS400TOG-424RT | DS400NG16-1016RT | DS400TG-424RT
FORM FACTORS: 5U | 1U | 1U | 4U | 1U | 1U | 1U | 4U | 4U | 4U | 10U | 4U
COMPUTE PERFORMANCE (single precision unless noted):
• 1 x Tesla V100 32GB: 14+ TFLOPS
• 2 x Tesla V100 32GB: 28+ TFLOPS
• 3 x Tesla V100 32GB: 42+ TFLOPS
• 4 x Tesla V100 32GB: 56+ TFLOPS
• 6 x Tesla V100 32GB: 84+ TFLOPS
• 8 x Tesla V100 32GB: 100+ to 125+ TFLOPS
• 8 x RTX 2080 Ti: 100+ TFLOPS
• 10 x RTX 2080 Ti: 130+ TFLOPS
• 10 x Tesla V100 32GB: 140+ TFLOPS
• 16 x Tesla V100 32GB: 250+ TFLOPS
• 20 x T4: 160+ TFLOPS single precision; 1300+ TFLOPS FP16/FP32 mixed precision
[Chart] Tyrone KUBITS access – compute performance vs. number of GPUs: faster AI innovation & insight.
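The FP16/FP32 mixed-precision figure in the last entry is reached by running training in mixed precision on the GPUs' tensor cores. A minimal hedged sketch with PyTorch automatic mixed precision follows; the model, data and optimizer are illustrative placeholders, not part of the product.

```python
# Minimal FP16/FP32 mixed-precision training step with PyTorch AMP.
# Model, data and optimizer are placeholders for illustration only.
import torch

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()       # keeps FP16 gradients stable

inputs = torch.randn(64, 1024, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():             # ops run in FP16 where safe
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()               # scaled backward pass
scaler.step(optimizer)
scaler.update()
```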
Mixed Workloads: Convergence of AI | HPC | Cloud | Containers
The Era of Mixed Workloads
• Flexible – Is the usage going to be constant?
• Optimization – Is optimal utilization required?
• Resilience – Do we need the application to run all the time?
• Ease – Is 'ease of maintenance' key?
• Scalability & Speed – Do we have one size that fits all?
Connectivity and usage
[Diagram] Laptops and virtual desktops connect through the Tyrone Cloud Manager.
Expand Cloud
Tyrone Cloud Suite (TCS) stack on Tyrone hardware and OS:
• TCS: service management, log management, monitoring
• OpenStack shared services: Keystone, Horizon, Ceilometer, Percona XtraDB, RabbitMQ, Memcache, MongoDB, Kubernetes
• Compute: Nova with KVM/LXD; Docker containers
• Network: Neutron
• Storage: Ceph (image, block, object / Swift)
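A hedged sketch of exercising the stack above through the OpenStack SDK: Keystone handles authentication, Nova boots the instance and Neutron attaches the network. The cloud, image, flavor and network names are hypothetical.

```python
# Hypothetical example against the TCS OpenStack services using the
# OpenStack SDK (Keystone auth, Nova compute, Neutron networking).
import openstack

conn = openstack.connect(cloud="tyrone-tcs")   # clouds.yaml entry (hypothetical)

image = conn.compute.find_image("ubuntu-20.04")
flavor = conn.compute.find_flavor("gpu.large")
network = conn.network.find_network("tenant-net")

server = conn.compute.create_server(
    name="dl-worker-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)
```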
GPU containerized application stack (inside each container):
• Deep learning applications
• Deep learning frameworks
• Deep learning libraries
• CUDA toolkit
• Mapped NVIDIA drivers
• Container OS
Host stack:
• Containerization tool: Docker Engine with the NVIDIA Container Runtime for Docker
• NVIDIA driver
• Host OS
• Server infrastructure
Docker layering: host operating system → Docker Engine → containers, each bundling an app with its own bins/libs.
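A hedged sketch of the layering above in action: the Docker Engine launches a GPU-enabled container through the NVIDIA runtime. The --gpus all flag requires the NVIDIA Container Toolkit on the host, and the image tag is only an example from NGC.

```python
# Launch a GPU-enabled container via Docker with the NVIDIA runtime.
# The NGC image tag below is an example; any CUDA-enabled image works.
import subprocess

subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvcr.io/nvidia/pytorch:20.03-py3",
     "python", "-c", "import torch; print(torch.cuda.device_count())"],
    check=True,
)
```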
Run multiple applications simultaneously
Tyrone KUBITS™ Cloud
Flow architecture revolutionizing the deep learning CPU-GPU environment
• Works with the Tyrone KUBITS™ client on KUBITS™ compatible workstations
• KUBITS has a repository of 50 containerized applications and 100s of containers
• Speed: 10X to 70X
Tyrone KUBITS: Revolutionizing the Deep Learning CPU-GPU Environment
• Run different applications simultaneously
• Check for Tyrone KUBITS compatible workstations
• Get access to 100+ containers on Tyrone KUBITS Cloud
• High scalability
• Affordable price
• Both GPU- and CPU-optimized containers
• Design a simple workstation or large clusters with KUBITS technology
• Talk to our experts & build the right workstation within your budget
KUBITS Cloud | KUBITS Compatible
Cloud Deployment in 5 countries
INDIA
SINGAPORE
UK
USA
AUSTRALIA
Product Configurator
• An easy-to-use tool on the Tyrone website
• Allows customers to select from over 400 products and provides a customized solution
• Easy to view all SKUs needed to quote a particular end user
• Easy access to product technical specifications
http://tyronesystems.com/servers/servers_workstation.html
Q&A Session: Contact our team if you have any further questions after this webinar
Talk to our AI Experts: ai@netwebtech.com
Navin
navin@netwebindia.com
Tushar
tushar@netwebindia.com
Anurag
Anurag.thakare@netwebindia.com
Anjani
anjani.pandey@netwebindia.com