
SWE2017 - Parallel Programming Lab Assignment – 6

return 0;
}

Output:

Sequence={23,45,12,67,89,34,78,10,99,88,76,55,44,32,21,18}
a. Implement a Bitonic Sequence Sorting Program: Write a program that sorts
this 16-element sequence using the Bitonic Sort algorithm. The program
should first arrange the sequence into bitonic sub-sequences and then apply
the bitonic merge operation to obtain a fully sorted sequence.
Code at the end of the question with output
b. Construct a Bitonic Merge Network: Draw the bitonic merge network for
this 16-element sequence. Illustrate the sequence of comparisons and
exchanges at each stage of the network, showing how elements are
compared and swapped to achieve sorting.
Step: 1 - 23, 45, 12, 67, 89, 34, 78, 10, 99, 88, 76, 55, 44, 32, 21, 18
Step: 2 - 23, 12, 45, 67, 34, 89, 10, 78, 88, 99, 55, 76, 32, 44, 21, 18
Step: 3 - 12, 23, 34, 45, 10, 78, 67, 89, 55, 88, 76, 99, 21, 32, 18, 44
Step: 4 - 12, 23, 34, 45, 10, 78, 67, 89, 55, 88, 76, 99, 18, 21, 32, 44
Step: 5 - 10, 12, 23, 34, 45, 67, 78, 89, 21, 32, 44, 55, 18, 88, 76, 99
Step: 6 (Sorted) - 10, 12, 18, 21, 23, 32, 34, 44, 45, 55, 67, 76, 78, 88, 89, 99
c. Explanation: Explain how the bitonic sequence is divided and merged
through each stage of the network. Also, provide a brief description of how
the Bitonic Sort algorithm operates in parallel and discuss its advantages for
sorting large datasets in parallel systems.

1. Bitonic Sequence Formation: Divide the array into bitonic subsequences,
where each half is sorted in the opposite direction (one ascending, one
descending).

2. Bitonic Merging: Recursively apply the Bitonic Merge operation, which
compares and swaps elements to build progressively larger sorted
subsequences.

3. Recursive Division: At each level, subsequences are halved until
single-element comparisons are reached.
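
For illustration (not the submitted program), a minimal sequential C sketch of
this recursive structure, assuming the input length is a power of two and that
dir = 1 means ascending:

#include <stdio.h>

// Compare-and-swap: keep a[i] <= a[j] when dir == 1 (ascending),
// a[i] >= a[j] when dir == 0 (descending).
void compare_swap(int a[], int i, int j, int dir) {
    if ((dir == 1 && a[i] > a[j]) || (dir == 0 && a[i] < a[j])) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

// Merge a bitonic sequence of length n starting at low into sorted order.
void bitonic_merge(int a[], int low, int n, int dir) {
    if (n > 1) {
        int k = n / 2;
        for (int i = low; i < low + k; i++)
            compare_swap(a, i, i + k, dir);
        bitonic_merge(a, low, k, dir);
        bitonic_merge(a, low + k, k, dir);
    }
}

// Sort by first building a bitonic sequence (ascending half + descending half),
// then merging it in the requested direction.
void bitonic_sort(int a[], int low, int n, int dir) {
    if (n > 1) {
        int k = n / 2;
        bitonic_sort(a, low, k, 1);        // first half ascending
        bitonic_sort(a, low + k, k, 0);    // second half descending
        bitonic_merge(a, low, n, dir);
    }
}

int main(void) {
    int seq[16] = {23, 45, 12, 67, 89, 34, 78, 10,
                   99, 88, 76, 55, 44, 32, 21, 18};
    bitonic_sort(seq, 0, 16, 1);
    for (int i = 0; i < 16; i++)
        printf("%d ", seq[i]);
    printf("\n");
    return 0;
}

Run on the given sequence, this prints the sorted order shown in Step 6 above.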

Parallel Execution of Bitonic Sort

Bitonic Sort is highly parallelizable:

• Parallel Merge Operations: Each stage of the Bitonic Merge can run in
parallel on different processors, with each processor handling the
comparisons and swaps within a specific segment.

• Low Communication Overhead: Processors only exchange data during merge
phases, making it efficient for parallel systems.
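
To make this stage-level parallelism concrete, here is a sketch (not part of
the original answer) of the iterative form of Bitonic Sort with OpenMP: every
comparator in a stage is independent, so the innermost loop can run as a
parallel loop. It assumes n is a power of two.

#include <omp.h>

// Iterative bitonic sort over n = 2^m elements. Each (k, j) pair is one
// network stage; all comparators within a stage touch disjoint element
// pairs, so the loop over i is safe to parallelize.
void bitonic_sort_parallel(int a[], int n) {
    for (int k = 2; k <= n; k *= 2) {          // size of the bitonic blocks
        for (int j = k / 2; j > 0; j /= 2) {   // comparator distance
            #pragma omp parallel for
            for (int i = 0; i < n; i++) {
                int partner = i ^ j;
                if (partner > i) {
                    int ascending = ((i & k) == 0);
                    if ((ascending && a[i] > a[partner]) ||
                        (!ascending && a[i] < a[partner])) {
                        int t = a[i]; a[i] = a[partner]; a[partner] = t;
                    }
                }
            }
        }
    }
}

With n elements there are O(log^2 n) such stages, and each stage performs n/2
independent compare-exchanges, which is where the parallel speedup comes from.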

Advantages in Parallel Systems

• Scalability: Bitonic Sort's fixed pattern of operations allows it to scale
well on multiple processors.

• Efficiency on Large Data: The predictable, structured nature of Bitonic Sort
makes it ideal for sorting large datasets in systems where many processors
work simultaneously.

18.
To parallelize the Hough Transform for lane boundary detection:
1. Image Segmentation: Divide the 1024x1024 image into smaller sub-images or "tiles"
(e.g., 256x256 blocks) that can be processed independently. Each tile will undergo
edge detection (using, for instance, a Sobel filter) to find points that may contribute
to lines.
2. Distributed Hough Transform: Each processor applies the Hough Transform to its
assigned tile. This involves mapping edge points in the image space to potential lines
in Hough space.
3. Local Accumulator Array: Each processor maintains a local accumulator array to
record detected line parameters (the distance rho and angle theta of each
candidate line).
4. Global Accumulation: After processing, the local accumulator arrays are combined
into a global accumulator array. This can be done in two stages: local voting on
each processor, followed by merging.

a. Implementation Using OpenMP or MPI: Write pseudocode for implementing the
parallel Hough Transform using either OpenMP or MPI. Ensure that your code
accounts for accumulating results in a shared accumulator array for line
detection.

#include <omp.h>
#include <stdio.h>
#include <math.h>

#define IMAGE_SIZE 1024
#define THETA_RES 180   // resolution of the angle axis (degrees)
#define RHO_RES 512     // resolution of the distance axis (bins)
#define TILE_SIZE 256
#define NUM_THREADS 4   // fixed thread count, one local accumulator per thread

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

int image[IMAGE_SIZE][IMAGE_SIZE];       // input image
int accumulator[THETA_RES][RHO_RES];     // shared accumulator array

// Edge detection on one tile (e.g., a Sobel operator); left as a stub here.
void detect_edges(int tile_x, int tile_y, int edges[TILE_SIZE][TILE_SIZE]) {
    // Fill edges[][] from image[tile_x .. tile_x+TILE_SIZE-1][tile_y .. tile_y+TILE_SIZE-1]
}

// Vote every edge pixel of a tile into a thread-local accumulator.
// tile_x and tile_y are the tile origin, so rho is computed from global coordinates.
void hough_transform(int tile_x, int tile_y,
                     int edges[TILE_SIZE][TILE_SIZE],
                     int local_acc[THETA_RES][RHO_RES]) {
    for (int x = 0; x < TILE_SIZE; x++) {
        for (int y = 0; y < TILE_SIZE; y++) {
            if (edges[x][y]) {   // edge pixel found
                for (int theta = 0; theta < THETA_RES; theta++) {
                    double rad = theta * M_PI / 180.0;
                    int rho = (int)((tile_x + x) * cos(rad) + (tile_y + y) * sin(rad))
                              + RHO_RES / 2;
                    // A full implementation would scale rho into [0, RHO_RES);
                    // here out-of-range votes are simply discarded.
                    if (rho >= 0 && rho < RHO_RES)
                        local_acc[theta][rho]++;   // thread-local, so no atomic needed
                }
            }
        }
    }
}

int main(void) {
    static int local_accumulators[NUM_THREADS][THETA_RES][RHO_RES];  // one per thread

    omp_set_num_threads(NUM_THREADS);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        // Each thread processes every NUM_THREADS-th row of tiles.
        for (int tile_x = tid * TILE_SIZE; tile_x < IMAGE_SIZE;
             tile_x += TILE_SIZE * NUM_THREADS) {
            for (int tile_y = 0; tile_y < IMAGE_SIZE; tile_y += TILE_SIZE) {
                int edges[TILE_SIZE][TILE_SIZE] = {0};
                detect_edges(tile_x, tile_y, edges);
                hough_transform(tile_x, tile_y, edges, local_accumulators[tid]);
            }
        }

        // Merge the thread-local votes into the shared accumulator.
        #pragma omp critical
        {
            for (int theta = 0; theta < THETA_RES; theta++)
                for (int rho = 0; rho < RHO_RES; rho++)
                    accumulator[theta][rho] += local_accumulators[tid][theta][rho];
        }
    }

    // Peaks in accumulator[][] now give the detected line parameters (theta, rho).
    return 0;
}

b. Data Synchronization and Communication: Discuss how synchronization and
data communication are managed among processors, especially in updating the
shared accumulator array. How would you handle the accumulation of detected
lines to avoid race conditions when multiple threads or processes update the
same data?
To avoid race conditions:

• Local Accumulators: Each thread has its own local accumulator array
to avoid conflicts.
• Critical Section: After processing, threads add their local accumulator
values to the global accumulator array within a critical section.
• Atomic Operations: If multiple threads must update the same array
location directly, atomic operations can be used to prevent race
conditions.
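
As a sketch of that last option, the direct-update variant would guard each
increment on the shared array (this replaces the per-thread copies used in the
code above rather than adding to them):

// Direct-update alternative: every thread votes straight into the shared
// accumulator, so each increment must be atomic.
void vote_shared(int accumulator[THETA_RES][RHO_RES], int theta, int rho) {
    #pragma omp atomic
    accumulator[theta][rho]++;
}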
In MPI, each process would work on its subarray and send results to a master
process for merging.
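
A minimal MPI sketch of that merge step, under the assumption that each rank
has already filled local_acc by running edge detection and Hough voting on its
own tiles:

#include <mpi.h>

#define THETA_RES 180
#define RHO_RES 512

static int local_acc[THETA_RES][RHO_RES];    // filled by this rank's Hough voting
static int global_acc[THETA_RES][RHO_RES];   // valid on rank 0 after the reduction

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // ... each rank runs edge detection and Hough voting on its assigned tiles,
    //     accumulating votes into local_acc ...

    // Element-wise sum of all local accumulators into global_acc on rank 0.
    MPI_Reduce(local_acc, global_acc, THETA_RES * RHO_RES,
               MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        // Rank 0 scans global_acc for peaks, i.e. the detected lines.
    }

    MPI_Finalize();
    return 0;
}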
c. Performance and Scalability: Explain how parallelizing the Hough Transform
improves processing speed. How would this approach scale with image
resolution, and what factors might limit scalability in a real-time system?
Parallelizing the Hough Transform significantly reduces computation time by
allowing simultaneous edge and line detection in separate image segments.
• Scalability with Resolution: Higher resolutions increase the workload
but can also benefit more from parallelization, as each processor
works on a smaller portion of the image.
• Communication Overhead: Merging local accumulators may limit
scalability if too many processors are used, as this increases the
communication cost for synchronizing results.

d. Applying the Method on a Test Image: Given a sample 1024x1024 image with a
few distinct edges, illustrate how the algorithm would divide the image,
process each segment, and then combine results to detect the final edges.
