
SWE2017 - Parallel Programming Lab Assignment – 6

return 0;
}

Output:

Sequence={23,45,12,67,89,34,78,10,99,88,76,55,44,32,21,18}
a. Implement a Bitonic Sequence Sorting Program: Write a program that sorts
this 16-element sequence using the Bitonic Sort algorithm. The program
should first arrange the sequence into bitonic sub-sequences and then apply
the bitonic merge operation to obtain a fully sorted sequence.
Code at the end of the question with output
b. Construct a Bitonic Merge Network: Draw the bitonic merge network for
this 16-element sequence. Illustrate the sequence of comparisons and
exchanges at each stage of the network, showing how elements are
compared and swapped to achieve sorting.
Step: 1 - 23, 45, 12, 67, 89, 34, 78, 10, 99, 88, 76, 55, 44, 32, 21, 18
Step: 2 - 23, 12, 45, 67, 34, 89, 10, 78, 88, 99, 55, 76, 32, 44, 21, 18
Step: 3 - 12, 23, 34, 45, 10, 78, 67, 89, 55, 88, 76, 99, 21, 32, 18, 44
Step: 4 - 12, 23, 34, 45, 10, 78, 67, 89, 55, 88, 76, 99, 18, 21, 32, 44
Step: 5 - 10, 12, 23, 34, 45, 67, 78, 89, 21, 32, 44, 55, 18, 88, 76, 99
Step: 6 (Sorted) - 10, 12, 18, 21, 23, 32, 34, 44, 45, 55, 67, 76, 78, 88, 89, 99
c. Explanation: Explain how the bitonic sequence is divided and merged
through each stage of the network. Also, provide a brief description of how
the Bitonic Sort algorithm operates in parallel and discuss its advantages for
sorting large datasets in parallel systems.

1. Bitonic Sequence Formation: Divide the array into bitonic subsequences,
where each half is sorted in the opposite direction (one ascending, one
descending).

2. Bitonic Merging: Recursively apply the Bitonic Merge operation, which
compares and swaps elements to build progressively larger sorted
subsequences.

3. Recursive Division: At each level, subsequences are halved until
single-element comparisons are reached.
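
For illustration (not the submitted program), a minimal sequential C sketch of
this recursive structure, assuming the input length is a power of two and that
dir = 1 means ascending:

#include <stdio.h>

// Compare-and-swap: keep a[i] <= a[j] when dir == 1 (ascending),
// a[i] >= a[j] when dir == 0 (descending).
void compare_swap(int a[], int i, int j, int dir) {
    if ((dir == 1 && a[i] > a[j]) || (dir == 0 && a[i] < a[j])) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

// Merge a bitonic sequence of length n starting at low into sorted order.
void bitonic_merge(int a[], int low, int n, int dir) {
    if (n > 1) {
        int k = n / 2;
        for (int i = low; i < low + k; i++)
            compare_swap(a, i, i + k, dir);
        bitonic_merge(a, low, k, dir);
        bitonic_merge(a, low + k, k, dir);
    }
}

// Sort by first building a bitonic sequence (ascending half + descending half),
// then merging it in the requested direction.
void bitonic_sort(int a[], int low, int n, int dir) {
    if (n > 1) {
        int k = n / 2;
        bitonic_sort(a, low, k, 1);        // first half ascending
        bitonic_sort(a, low + k, k, 0);    // second half descending
        bitonic_merge(a, low, n, dir);
    }
}

int main(void) {
    int seq[16] = {23, 45, 12, 67, 89, 34, 78, 10,
                   99, 88, 76, 55, 44, 32, 21, 18};
    bitonic_sort(seq, 0, 16, 1);
    for (int i = 0; i < 16; i++)
        printf("%d ", seq[i]);
    printf("\n");
    return 0;
}

Run on the given sequence, this prints the sorted order shown in Step 6 above.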

Parallel Execution of Bitonic Sort

Bitonic Sort is highly parallelizable:

• Parallel Merge Operations: Each stage of the Bitonic Merge can run in
parallel on different processors, with each processor handling the
comparisons and swaps within a specific segment.

• Low Communication Overhead: Processors only exchange data during merge
phases, making it efficient for parallel systems.
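
To make this stage-level parallelism concrete, here is a sketch (not part of
the original answer) of the iterative form of Bitonic Sort with OpenMP: every
comparator in a stage is independent, so the innermost loop can run as a
parallel loop. It assumes n is a power of two.

#include <omp.h>

// Iterative bitonic sort over n = 2^m elements. Each (k, j) pair is one
// network stage; all comparators within a stage touch disjoint element
// pairs, so the loop over i is safe to parallelize.
void bitonic_sort_parallel(int a[], int n) {
    for (int k = 2; k <= n; k *= 2) {          // size of the bitonic blocks
        for (int j = k / 2; j > 0; j /= 2) {   // comparator distance
            #pragma omp parallel for
            for (int i = 0; i < n; i++) {
                int partner = i ^ j;
                if (partner > i) {
                    int ascending = ((i & k) == 0);
                    if ((ascending && a[i] > a[partner]) ||
                        (!ascending && a[i] < a[partner])) {
                        int t = a[i]; a[i] = a[partner]; a[partner] = t;
                    }
                }
            }
        }
    }
}

With n elements there are O(log^2 n) such stages, and each stage performs n/2
independent compare-exchanges, which is where the parallel speedup comes from.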

Advantages in Parallel Systems

• Scalability: Bitonic Sort's fixed pattern of operations allows it to scale
well on multiple processors.

• Efficiency on Large Data: The predictable, structured nature of Bitonic Sort
makes it ideal for sorting large datasets in systems where many processors
work simultaneously.

18.
To parallelize the Hough Transform for lane boundary detection:
1. Image Segmentation: Divide the 1024x1024 image into smaller sub-images or "tiles"
(e.g., 256x256 blocks) that can be processed independently. Each tile will undergo
edge detection (using, for instance, a Sobel filter) to find points that may contribute
to lines.
2. Distributed Hough Transform: Each processor applies the Hough Transform to its
assigned tile. This involves mapping edge points in the image space to potential lines
in Hough space.
3. Local Accumulator Array: Each processor maintains a local accumulator array to
record detected line parameters (the distance rho and angle theta of each
candidate line).
4. Global Accumulation: After processing, the local accumulator arrays are combined
into a global accumulator array. This can be done in two stages: local voting on
each processor, followed by merging.

a. Implementation Using OpenMP or MPI: Write pseudocode for implementing the
parallel Hough Transform using either OpenMP or MPI. Ensure that your code
accounts for accumulating results in a shared accumulator array for line
detection.

#include <omp.h>
#include <stdio.h>
#include <math.h>

#define IMAGE_SIZE 1024
#define THETA_RES 180   // resolution of the angle axis (degrees)
#define RHO_RES 512     // resolution of the distance axis (bins)
#define TILE_SIZE 256
#define NUM_THREADS 4   // fixed thread count, one local accumulator per thread

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

int image[IMAGE_SIZE][IMAGE_SIZE];       // input image
int accumulator[THETA_RES][RHO_RES];     // shared accumulator array

// Edge detection on one tile (e.g., a Sobel operator); left as a stub here.
void detect_edges(int tile_x, int tile_y, int edges[TILE_SIZE][TILE_SIZE]) {
    // Fill edges[][] from image[tile_x .. tile_x+TILE_SIZE-1][tile_y .. tile_y+TILE_SIZE-1]
}

// Vote every edge pixel of a tile into a thread-local accumulator.
// tile_x and tile_y are the tile origin, so rho is computed from global coordinates.
void hough_transform(int tile_x, int tile_y,
                     int edges[TILE_SIZE][TILE_SIZE],
                     int local_acc[THETA_RES][RHO_RES]) {
    for (int x = 0; x < TILE_SIZE; x++) {
        for (int y = 0; y < TILE_SIZE; y++) {
            if (edges[x][y]) {   // edge pixel found
                for (int theta = 0; theta < THETA_RES; theta++) {
                    double rad = theta * M_PI / 180.0;
                    int rho = (int)((tile_x + x) * cos(rad) + (tile_y + y) * sin(rad))
                              + RHO_RES / 2;
                    // A full implementation would scale rho into [0, RHO_RES);
                    // here out-of-range votes are simply discarded.
                    if (rho >= 0 && rho < RHO_RES)
                        local_acc[theta][rho]++;   // thread-local, so no atomic needed
                }
            }
        }
    }
}

int main(void) {
    static int local_accumulators[NUM_THREADS][THETA_RES][RHO_RES];  // one per thread

    omp_set_num_threads(NUM_THREADS);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        // Each thread processes every NUM_THREADS-th row of tiles.
        for (int tile_x = tid * TILE_SIZE; tile_x < IMAGE_SIZE;
             tile_x += TILE_SIZE * NUM_THREADS) {
            for (int tile_y = 0; tile_y < IMAGE_SIZE; tile_y += TILE_SIZE) {
                int edges[TILE_SIZE][TILE_SIZE] = {0};
                detect_edges(tile_x, tile_y, edges);
                hough_transform(tile_x, tile_y, edges, local_accumulators[tid]);
            }
        }

        // Merge the thread-local votes into the shared accumulator.
        #pragma omp critical
        {
            for (int theta = 0; theta < THETA_RES; theta++)
                for (int rho = 0; rho < RHO_RES; rho++)
                    accumulator[theta][rho] += local_accumulators[tid][theta][rho];
        }
    }

    // Peaks in accumulator[][] now give the detected line parameters (theta, rho).
    return 0;
}

b. Data Synchronization and Communication: Discuss how synchronization and
data communication are managed among processors, especially in updating the
shared accumulator array. How would you handle the accumulation of detected
lines to avoid race conditions when multiple threads or processes update the
same data?
To avoid race conditions:

• Local Accumulators: Each thread has its own local accumulator array
to avoid conflicts.
• Critical Section: After processing, threads add their local accumulator
values to the global accumulator array within a critical section.
• Atomic Operations: If multiple threads must update the same array
location directly, atomic operations can be used to prevent race
conditions.
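
As a sketch of that last option, the direct-update variant would guard each
increment on the shared array (this replaces the per-thread copies used in the
code above rather than adding to them):

// Direct-update alternative: every thread votes straight into the shared
// accumulator, so each increment must be atomic.
void vote_shared(int accumulator[THETA_RES][RHO_RES], int theta, int rho) {
    #pragma omp atomic
    accumulator[theta][rho]++;
}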
In MPI, each process would work on its subarray and send results to a master
process for merging.
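
A minimal MPI sketch of that merge step, under the assumption that each rank
has already filled local_acc by running edge detection and Hough voting on its
own tiles:

#include <mpi.h>

#define THETA_RES 180
#define RHO_RES 512

static int local_acc[THETA_RES][RHO_RES];    // filled by this rank's Hough voting
static int global_acc[THETA_RES][RHO_RES];   // valid on rank 0 after the reduction

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // ... each rank runs edge detection and Hough voting on its assigned tiles,
    //     accumulating votes into local_acc ...

    // Element-wise sum of all local accumulators into global_acc on rank 0.
    MPI_Reduce(local_acc, global_acc, THETA_RES * RHO_RES,
               MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        // Rank 0 scans global_acc for peaks, i.e. the detected lines.
    }

    MPI_Finalize();
    return 0;
}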
c. Performance and Scalability: Explain how parallelizing the Hough Transform
improves processing speed. How would this approach scale with image
resolution, and what factors might limit scalability in a real-time system?
Parallelizing the Hough Transform significantly reduces computation time by
allowing simultaneous edge and line detection in separate image segments.
• Scalability with Resolution: Higher resolutions increase the workload
but can also benefit more from parallelization, as each processor
works on a smaller portion of the image.
• Communication Overhead: Merging local accumulators may limit
scalability if too many processors are used, as this increases the
communication cost for synchronizing results.

d. Applying the Method on a Test Image: Given a sample 1024x1024 image with a
few distinct edges, illustrate how the algorithm would divide the image,
process each segment, and then combine results to detect the final edges.
