SWE2017 - Lab Assignment 1pages-7
SWE2017 - Lab Assignment 1pages-7
return 0;
}
Output:
Sequence={23,45,12,67,89,34,78,10,99,88,76,55,44,32,21,18}
a. Implement a Bitonic Sequence Sorting Program: Write a program that sorts
this 16element sequence using the Bitonic Sort algorithm. The program
should construct the sequence into bitonic sub-sequences and then apply
the bitonic merge operation to achieve a fully sorted sequence.
Code at the end of the question with output
b. Construct a Bitonic Merge Network: Draw the bitonic merge network for
this 16element sequence. Illustrate the sequence of comparisons and
exchanges at each stage of the network, showing how elements are
compared and swapped to achieve sorting.
Step: 1 - 23, 45, 12, 67, 89, 34, 78, 10, 99, 88, 76, 55, 44, 32, 21, 18
Step: 2 - 23, 12, 45, 67, 34, 89, 10, 78, 88, 99, 55, 76, 32, 44, 21, 18
Step: 3 - 12, 23, 34, 45, 10, 78, 67, 89, 55, 88, 76, 99, 21, 32, 18, 44
Step: 4 - 12, 23, 34, 45, 10, 78, 67, 89, 55, 88, 76, 99, 18, 21, 32, 44
Step: 5 - 10, 12, 23, 34, 45, 67, 78, 89, 21, 32, 44, 55, 18, 55, 76, 99
Step: 6(Sorted) - 10, 12, 18, 21, 23, 32, 34, 44, 45, 55, 55, 67, 76, 78, 89, 99
c. Explanation: Explain how the bitonic sequence is divided and merged
through each stage of the network. Also, provide a brief description of how
the Bitonic Sort algorithm operates in parallel and discuss its advantages for
sorting large datasets in parallel systems.
18.
To parallelize the Hough-Transform for lane boundary detection:
1. Image Segmentation: Divide the 1024x1024 image into smaller sub-images or "tiles"
(e.g., 256x256 blocks) that can be processed independently. Each tile will undergo
edge detection (using, for instance, a Sobel filter) to find points that may contribute
to lines.
2. Distributed Hough Transform: Each processor applies the Hough Transform to its
assigned tile. This involves mapping edge points in the image space to potential lines
in Hough space.
3. Local Accumulator Array: Each processor maintains a local accumulator array to
record detected line parameters (e.g., distance and angle values).
4. Global Accumulation: After processing, the local accumulator arrays are combined
into a global accumulator array. This can be done in two stages—local voting on
each processor, followed by merging.
#include <omp.h>
#include <stdio.h>
#include <math.h>
SWE2017 - Parallel Programming Lab Assignment – 6
local_acc[theta][rho]++;
int main() {
SWE2017 - Parallel Programming Lab Assignment – 6
int local_accumulators[OMP_NUM_THREADS][THETA_RES][RHO_RES] =
{0};
hough_transform(edges, local_accumulators[tid]);
accumulator[theta][rho] +=
local_accumulators[tid][theta][rho];
return 0;
• Local Accumulators: Each thread has its own local accumulator array
to avoid conflicts.
• Critical Section: After processing, threads add their local accumulator
values to the global accumulator array within a critical section.
• Atomic Operations: If multiple threads must update the same array
location directly, atomic operations can be used to prevent race
conditions.
In MPI, each process would work on its subarray and send results to a master
process for merging.
c. Performance and Scalability: Explain how parallelizing the Hough Transform
improves processing speed. How would this approach scale with image
resolution, and what factors might limit scalability in a real-time system?
Parallelizing the Hough Transform significantly reduces computation time by
allowing simultaneous edge and line detection in separate image segments.
• Scalability with Resolution: Higher resolutions increase the workload
but can also benefit more from parallelization, as each processor
works on a smaller portion of the image.
• Communication Overhead: Merging local accumulators may limit
scalability if too many processors are used, as this increases the
communication cost for synchronizing results.