SRI RAMAKRISHNA ENGINEERING COLLEGE
[Educational Service: SNR Sons Charitable Trust]
[Autonomous Institution, Reaccredited by NAAC with ‘A+’ Grade]
[Approved by AICTE and Permanently Affiliated to Anna University, Chennai]
[ISO 9001:2015 Certified and all eligible programmes Accredited by NBA]
VATTAMALAIPALAYAM, N.G.G.O. COLONY POST, COIMBATORE – 641 022.
Department of Computer Science & Engineering
Internal Test – II
Date: 25.04.2024          Department: CSE
Semester: VI              Class/Section: III B.E. CSE & III M.Tech CSE
Duration: 2:00 Hours      Maximum Marks: 50
Course Code & Title: 20CS257 & High Performance Computing - Answer Key
Course Outcomes Addressed:
On successful completion of the course, the students will be able to
CO3: Apply parallelism to extract maximum performance in multicore and shared memory processors.
Questions    Cognitive Level/CO
PART – A (Answer All Questions) (10*1 = 10 Marks)
1. In MPI, which function is commonly used to send a message from one process to another? R/CO3
a) MPI_Receive    b) MPI_Send    c) MPI_Comm_rank    d) MPI_Bcast
2. The purpose of the MPI_Comm_size function in MPI is R/CO3
a) To initialize MPI communication    b) To get the rank of the process    c) To determine the size of the communicator    d) To finalize MPI communication
3. In MPI, what does the term “rank” refer to? R/CO3
a) The size of the communicator    b) The process identifier    c) The message size    d) The type of data being sent
4. Data parallelism is performed as ____, while task parallelism is performed as _____. R/CO3
a) Synchronous, Asynchronous Computation    b) Synchronous, Synchronous Computation    c) Asynchronous, Synchronous Computation    d) Asynchronous, Asynchronous Computation
5. The style of parallelism supported on GPUs is best described as R/CO3
a) SISD - Single Instruction Single Data    b) MISD - Multiple Instruction Single Data    c) SIMT - Single Instruction Multiple Thread    d) SIMD - Single Instruction Multiple Data
6. Identify the limitations of a CUDA kernel. R/CO3
a) recursion, call stack, static variable declaration    b) no recursion, no call stack, no static variable declarations    c) recursion, no call stack, static variable declarations    d) no recursion, call stack, no static variable declaration
7. Which of the following correctly describes a GPU kernel? R/CO3
(A) All thread blocks involved in the same computation use the same kernel
(B) A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host
(C) A kernel may contain a mix of host and GPU code
a) A    b) B    c) C    d) A, B, C
8. _________________ MPI function is used for blocking point-to-point communication to receive a message. R/CO3
(Answer: MPI_Recv)
9. A CUDA program is comprised of two primary components: a host and a ________. R/CO3
(Answer: GPU kernel)
10. ___________ is a technique which allows optimal usage of the global memory bandwidth. R/CO3
(Answer: Memory coalescing)
PART – B (Answer All Questions) (5*2 = 10 Marks)    Cognitive Level/CO
11. Develop a “Hello World” MPI program in C. Ap/CO3
(Program: 2 Marks)
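The key records only the mark split for this question; a minimal sketch of the expected answer, assuming the standard MPI C API, is given below. Such a program is typically compiled with mpicc and launched with mpirun -np <number of processes>.
#include<stdio.h>
#include<mpi.h>
int main(int argc, char *argv[])
{
int rank, size;
MPI_Init(&argc, &argv); /* initialize the MPI environment */
MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank (identifier) of this process */
MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes in the communicator */
printf("Hello World from process %d of %d\n", rank, size);
MPI_Finalize(); /* shut down the MPI environment */
return 0;
}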
12. Summarize collective communications. U/CO3
(Summary: 2 Marks)
Collective communication involves one or more senders and one or more receivers. Examples include broadcast of a single data item from one process to all other processes, broadcast of unique items from one process to all other processes, and the inverse operation: gathering data from a group of processes.
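As a hedged illustration of the broadcast case described above (not part of the original key), the sketch below has rank 0 send a single integer to every other process with MPI_Bcast:
#include<stdio.h>
#include<mpi.h>
int main(int argc, char *argv[])
{
int rank, value = 0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if(rank == 0)
value = 100; /* only the root (rank 0) holds the data before the broadcast */
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD); /* broadcast one int from rank 0 to all ranks */
printf("Process %d received value %d\n", rank, value);
MPI_Finalize();
return 0;
}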
13. Illustrate the modern GPU architecture. U/CO3
(Architecture Diagram: 2 Marks)
14. Differentiate between Task Parallelism and Data Parallelism. U/CO3
(Difference: 2 Marks)
15. Outline the compilation process of a CUDA program. U/CO3
(Compilation Process: 2 Marks)
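The key records only the mark split here. As an illustrative sketch (not part of the original key), nvcc splits a .cu file into device code, which it compiles to PTX/machine code for the GPU, and host code, which it passes to the host C/C++ compiler, and then links both into one executable. For the vector-addition file from Question 16, the compile-and-run steps would look like:
nvcc matrix1DADD.cu -o vecadd    # nvcc compiles device and host code separately, then links them
./vecadd                         # run on a machine with a CUDA-capable GPU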
PART – C (3*10 = 30 Marks)
16. Compulsory Question: Ap/CO3
Develop a CUDA program to perform vector addition and vector multiplication with
Blocks and Threads.
(Vector Addition Program: 5 Marks, Vector Multiplication Program: 5 Marks)
Vector Addition
%%writefile matrix1DADD.cu
#include<stdio.h>
#include<cuda.h>
__global__ void arradd(int *x,int *y, int *z) //kernel definition
{
int id=blockIdx.x;
/* blockIdx.x gives the respective block id which starts from 0 */
z[id]=x[id]+y[id];
}
int main()
{
int a[6];
int b[6];
int c[6];
int *d,*e,*f;
int i;
printf("\n Enter six elements of first array vector\n");
for(i=0;i<6;i++)
{
scanf("%d",&a[i]);
}
printf("\n Enter six elements of second array vector\n");
for(i=0;i<6;i++)
{
scanf("%d",&b[i]);
}
/* cudaMalloc() allocates memory from Global memory on GPU */
cudaMalloc((void **)&d,6*sizeof(int));
cudaMalloc((void **)&e,6*sizeof(int));
cudaMalloc((void **)&f,6*sizeof(int));
/* cudaMemcpy() copies the contents from the source to the destination. Here the destination is
the GPU (d, e) and the source is the CPU (a, b) */
cudaMemcpy(d,a,6*sizeof(int),cudaMemcpyHostToDevice);
cudaMemcpy(e,b,6*sizeof(int),cudaMemcpyHostToDevice);
/* call to kernel. Here 6 is number of blocks, 1 is the number of threads per block and d,e,f
are the arguments */
arradd<<<6,1>>>(d,e,f);
/* Here we are copying content from GPU(Device) to CPU(Host) */
cudaMemcpy(c,f,6*sizeof(int),cudaMemcpyDeviceToHost);
printf("\nSum of two arrays:\n ");
for(i=0;i<6;i++)
{
printf("%d\t",c[i]);
}
/* Free the memory allocated to pointers d,e,f */
cudaFree(d);
cudaFree(e);
cudaFree(f);
return 0;
}
Vector Multiplication
%%writefile Matrixmul.cu
#include<stdio.h>
#include<cuda.h>
__global__ void VecMul(float* A, float* B, float* C, int N)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if(i < N)
C[i] = A[i]*B[i];
}
int main()
{
int i;
const int N = 10; // N is a compile-time constant so the host arrays below have a fixed size
size_t size = N * sizeof(float);
// Allocating host and initializing
float A[N],B[N],C[N];
for(i=0;i<N;i++) {
A[i] = B[i] = i;
}
// Allocating device and copying to device
float *d_A, *d_B, *d_C;
cudaMalloc((void **)&d_A, size);
cudaMalloc((void **)&d_B, size);
cudaMalloc((void **)&d_C, size);
cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);
// Invoking kernel
int threadsPerBlock = 8;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
VecMul<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
// Copy result from device to host
cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);
for(i=0;i<N;i++)
printf("%f\n", C[i]);
// Free the device memory allocated to d_A, d_B, d_C
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);
return 0;
}
Answer Any Two Questions
17. Consider indexing an array containing one element per thread (8 threads per block). Ap/CO3
Report the thread which will handle the shaded element in the following array.
(Formula with explanation: 5 Marks, Calculation: 5 Marks)
0 1 2 3 4 5 6 7 | 8 9 10 11 12 13 14 15 | 16 17 18 [19] 20 21 22
int index = threadIdx.x + blockIdx.x * M, where M = blockDim.x = 8 threads per block
          = 3 + 2 * 8
          = 19
So the shaded element (index 19) is handled by thread 3 (threadIdx.x = 3) of block 2 (blockIdx.x = 2).
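A minimal CUDA sketch (not part of the original key) that verifies this mapping by launching 3 blocks of 8 threads and printing which thread owns the assumed shaded index 19:
#include<stdio.h>
#include<cuda.h>
__global__ void findOwner(int target)
{
int index = threadIdx.x + blockIdx.x * blockDim.x; // same formula as above, with M = blockDim.x
if(index == target)
printf("Element %d is handled by thread %d of block %d\n", index, threadIdx.x, blockIdx.x);
}
int main()
{
findOwner<<<3,8>>>(19); // 23 elements with 8 threads per block need ceil(23/8) = 3 blocks
cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
return 0;
}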
18. Categorize the Memory Visibility in CUDA. Ap/ CO3
(Category with explanation: 10 Marks)
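No written answer is included in the key for this question. As a hedged sketch, the program below labels the usual CUDA memory spaces by their visibility: registers/local variables are private to a thread, __shared__ memory is visible to all threads of one block, and __device__ global and __constant__ memory are visible to every thread in the grid:
#include<stdio.h>
#include<cuda.h>
__device__ int globalValue = 5; // global memory: visible to every thread in every block
__constant__ float scale = 2.0f; // constant memory: read-only, visible to every thread
__global__ void visibilityDemo(float *out)
{
__shared__ float tile[8]; // shared memory: visible only to threads of the same block
float localVal = threadIdx.x * scale; // register/local variable: private to this thread
tile[threadIdx.x] = localVal;
__syncthreads(); // make the shared tile visible to all threads in the block
int idx = blockIdx.x * blockDim.x + threadIdx.x;
out[idx] = tile[threadIdx.x] + globalValue;
}
int main()
{
float h_out[8], *d_out;
cudaMalloc((void **)&d_out, 8*sizeof(float));
visibilityDemo<<<1,8>>>(d_out);
cudaMemcpy(h_out, d_out, 8*sizeof(float), cudaMemcpyDeviceToHost);
for(int i=0;i<8;i++)
printf("%f\n", h_out[i]);
cudaFree(d_out);
return 0;
}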
19. Examine messages and point-to-point communication in MPI. Ap/CO3
(Answer: 10 Marks)
Standard MPI data types for C