SP24 DS&A Week01 Introduction and Basics
SP24 DS&A Week01 Introduction and Basics
Week01 (14/15/16-Feb-2024)
M Ateeq,
Department of Data Science, The Islamia University of Bahawalpur.
A data structure is a way of organizing and storing data to perform operations efficiently. It defines
the relationship between the data and the operations that can be performed on that data. Data
structures are essential for managing and manipulating information effectively in computer
programs.
In computer systems, data is represented using binary digits, which are 0s and 1s. This binary
representation is the fundamental language that computers understand. Data structures, on the
other hand, provide a higher-level abstraction for organizing and managing this binary data. They
define how data is arranged, accessed, and manipulated in a way that aligns with the underlying
binary representation.
For example, an array is a common data structure that can be used to store a collection of
elements. Each element in the array corresponds to a memory location, and internally, these
memory locations are represented in binary (0s and 1s). The data structure, in this case, organizes
and abstracts the underlying binary representation, providing a more accessible and efficient way to
work with collections of data.
While data types and data structures are related concepts, they are not the same.
Data Types: Data types define the type of data that a variable can hold. They specify the
nature of the data and the operations that can be performed on it. Examples of data types
include integers, floating-point numbers, characters, etc.
Example:
int age = 25; // 'age' is a variable with the data type integer.
Data Structures: Data structures, on the other hand, organize and store multiple variables (of
various data types) to facilitate efficient operations on a collection of data.
Example:
struct Person {
string name;
int age;
};
Here, Person is a data structure that combines variables of different data types ( name of
type string and age of type integer) to represent information about a person.
In short, data types define the nature of individual pieces of data, while data structures organize and
store multiple pieces of data to enable efficient operations on collections of information. The
relationship lies in how data structures leverage the underlying binary representation used by
What: Well-designed data structures enable quick access and retrieval of data.
Why: Fast access is crucial for efficient algorithm execution and responsive systems.
3. Optimized Space Utilization:
What: Algorithms often rely on specific data structures for efficient implementation.
Why: The choice of appropriate data structures influences algorithmic efficiency and
execution speed.
1. Organization of Data:
How: Data structures define the arrangement and organization of data elements.
Example: In an array, elements are stored in a contiguous memory location.
2. Operations on Data:
How: Data structures provide methods for performing operations on stored data.
Example: A linked list allows insertion and deletion of elements with ease.
3. Memory Management:
How: Data structures determine how quickly data can be searched and retrieved.
Example: Binary search in a sorted array for faster retrieval.
What: Piling books one on top of another follows the Last-In-First-Out (LIFO) principle,
similar to a stack.
Why: Easily retrieve the last placed book without disturbing the entire pile.
3. Queue in Waiting Lines:
Reflection MCQs:
a) Randomizing data
b) Organizing and storing data efficiently
c) Ignoring data
d) None of the above
Click to reveal the answer
2. Why is efficient data management important?
a) To complicate algorithms
b) To optimize algorithmic performance and system resources
c) To avoid data organization
d) None of the above
Click to reveal the answer
a) Color of data
b) Organization of data elements
c) Ignoring data arrangement
d) None of the above
Click to reveal the answer
4. Which everyday example represents a queue?
a) Grocery list
b) Waiting in line
c) Piling books
d) None of the above
Click to reveal the answer
Data structures play a fundamental role in programming, serving as the foundation for efficient
algorithm design and implementation. They act as organizational tools for data, enabling
programmers to manage and manipulate information in a way that optimizes computational
resources.
What: Data structures provide efficient mechanisms for storing and organizing data.
Why: Efficient storage is crucial for minimizing memory usage and improving overall
program performance.
2. Fast Search and Retrieval:
What: Well-designed data structures facilitate quick search and retrieval of data.
Why: Speedy access is essential for executing algorithms efficiently, leading to faster
program execution.
3. Effective Memory Management:
What: The choice of appropriate data structures significantly impacts algorithmic efficiency.
Why: Efficient data structures lead to algorithms with faster execution times, reducing
computational overhead.
Consider the example of searching for an element in a collection using different data structures.
1. Array-based Search:
if (arr[mid] == target) {
return mid; // Element found at index mid
} else if (arr[mid] < target) {
low = mid + 1; // Search in the right half
} else {
high = mid - 1; // Search in the left half
}
}
int main() {
const int size = 7;
int sortedArray[size] = {2, 5, 8, 12, 16, 23, 38};
if (result != -1) {
std::cout << "Key found at index: " << result << std::endl;
} else {
std::cout << "Key not found in the array." << std::endl;
}
return 0;
}
binarySearch Function:
int binarySearch(int arr[], int low, int high, int key) {
int binarySearch(int arr[], int low, int high, int key) { : This line defines the
binarySearch function, which takes four parameters:
arr[] : An array of integers to be searched.
low : The lower bound of the search range.
high : The upper bound of the search range.
key : The key to be searched for in the array.
while (low <= high) { : This line starts a while loop that continues as long as the lower
bound low is less than or equal to the upper bound high .
int mid = low + (high - low) / 2; : This line calculates the middle index mid of the
current search range using the formula (low + high) / 2 . It avoids potential overflow issues
by using (high - low) / 2 instead of (low + high) / 2 .
if (arr[mid] == key)
return mid;
if (arr[mid] == key) return mid; : This line checks if the element at the middle index
mid is equal to the key . If it is, it means the key has been found, and the function returns the
index mid .
else if (arr[mid] < key) low = mid + 1; : If the element at the middle index is less
than the key, it means the key might be in the right half of the current range. Therefore, the
lower bound low is updated to mid + 1 .
else
high = mid - 1;
}
else high = mid - 1; : If the element at the middle index is greater than the key, it means
the key might be in the left half of the current range. Therefore, the upper bound high is
updated to mid - 1 .
The while loop continues until the key is found or the search range is exhausted.
return -1; : If the while loop exits without finding the key, the function returns -1 to
indicate that the key was not found in the array.
main Function:
#include <iostream>
#include <iostream> : This line includes the standard input-output stream library, which is
necessary for using std::cout and related functionality.
int main() {
int main() { : This line defines the main function, which is the entry point of the program.
const int size = 7; : This line declares a constant integer size and initializes it with the
value 7 , representing the size of the array.
int sortedArray[size] = {2, 5, 8, 12, 16, 23, 38}; : This line declares an array
named sortedArray and initializes it with sorted values.
if (result != -1) {
if (result != -1) { : This line checks if the result returned by the binarySearch function
is not equal to -1 , indicating that the key was found.
std::cout << "Key found at index: " << result << std::endl;
std::cout << "Key found at index: " << result << std::endl; : If the key is found,
this line prints a message indicating the index where the key was found.
} else {
} else { : This line is the beginning of the else block, which is executed if the key is not
found.
std::cout << "Key not found in the array." << std::endl; : This line prints a
message indicating that the key was not found in the array.
return 0;
}
return 0; : This line indicates successful program termination, and the main function returns
0 to the operating system, signifying that the program ran successfully.
Reflection MCQs:
a) Irrelevant
b) Foundational for efficient algorithm design and implementation
c) Obstructive to programming
d) None of the above
Click to reveal the answer
2. Why is optimized data storage important in programming?
a) To complicate algorithms
b) To minimize memory usage and improve program performance
c) To avoid data storage
d) None of the above
Click to reveal the answer
3. How do efficient data structures contribute to enhanced algorithmic efficiency?
a) Linear search
b) Binary search in a sorted array
c) Recursive search
d) None of the above
Click to reveal the answer
Purpose of Pseudocode:
What is Pseudocode?
1. Step-by-Step Representation:
2. Algorithm Visualization:
How: Pseudocode allows for the visualization of the algorithm's flow and structure.
Example:
If the weather is sunny
Go for a walk
Else
Stay indoors and read a book
End If
3. Logic Clarification:
4. Iterative Processes:
Reflection MCQs:
a) To confuse programmers
b) To emphasize syntax details
c) To allow designers to focus on essential steps and flow
d) None of the above
Click to reveal the answer
3. How does pseudocode contribute to algorithm planning?
The purpose of walking through simple algorithms in pseudocode is to provide a clear and step-by-
step representation of how various algorithms can be expressed in a language-independent and
readable manner. It serves as an educational tool for understanding the logic and structure of
algorithms before transitioning to actual code.
Examples:
Explanation:
The algorithm iterates through each element in the list.
If the current element equals the target, it returns the index.
If no match is found, it returns -1.
Explanation:
The algorithm calculates the factorial of a given number n .
It initializes the result to 1 and multiplies it by each integer from 1 to n .
Translate the idea of calculating the sum of elements in an array into pseudocode.
Procedure CalculateSum(arr)
sum = 0
for each element in arr do
sum = sum + element
end for
return sum
End Procedure
Translate the idea of finding the maximum element in an array into pseudocode.
Procedure FindMaxElement(arr)
max = arr[0]
for i from 1 to length of arr - 1 do
if arr[i] > max then
max = arr[i]
end if
end for
return max
End Procedure
Reflection MCQs:
3. In the pseudocode for calculating the factorial, what is the purpose of the loop?
a) To subtract numbers
b) To calculate the sum of elements
c) To multiply each integer from 1 to n with the result
d) None of the above
Click to reveal the answer
In Lecture 1, we delved into the foundational concepts of data structures and algorithms,
understanding their significance in programming and problem-solving. Let's recap the key points
covered:
What: Data structures are specialized formats for organizing and storing data.
Why: They facilitate efficient data management, improve access and retrieval, and optimize
memory usage.
How: Examples include arrays, linked lists, stacks, and queues.
2. Importance of Data Structures:
Grading Strucutre:
Week3: Assessment 3%
Week5: Assessment 7%
Midterm: 30%
Week11: Assessment 3%
Week13: Assessment 7%
Project: 15%
Final: 35%
Definition of Algorithms:
What are Algorithms?
Algorithms are step-by-step procedures or formulas for solving problems. They are a fundamental
concept in computer science and play a pivotal role in computing. An algorithm is a precisely
defined set of instructions that, when executed, accomplishes a particular task or solves a specific
problem.
1. Problem Solving:
Algorithms take input, process it through a series of steps, and produce an output.
Input and output are well-defined, and the process is deterministic.
2. Sequential Execution:
2. Driving Directions:
Reflection MCQs:
a) Sequential Execution
b) Efficient Problem-Solving
c) Termination
d) None of the above
Click to reveal the answer
1. Correctness:
What: An algorithm produces the correct output for all possible inputs.
Why: Ensures the algorithm accurately solves the intended problem.
How: Rigorous testing and verification processes to guarantee correctness.
2. Efficiency:
Extra Material
Example: Efficient Sorting Algorithm - Merge Sort (C++):
#include <iostream>
#include <vector>
int main() {
std::vector<int> arr = {12, 11, 13, 5, 6, 7};
int size = arr.size();
return 0;
}
Merge Sort is a popular sorting algorithm that follows the divide-and-conquer paradigm. The key
idea behind Merge Sort is to break down the unsorted list into smaller subproblems, solve them
independently, and then combine the solutions to produce a sorted result.
1. Divide:
The unsorted list is divided into two halves recursively until each sublist contains only one
element. This process continues until we have sublists that are inherently sorted due to
containing only one element.
2. Conquer:
The conquer phase involves merging the sorted sublists back together to create larger
sorted sublists. This is achieved by comparing elements from the two sublists and merging
them in ascending order.
3. Combine:
The merging process continues until the entire list is reconstructed in a sorted manner. At
each step, two smaller sorted sublists are combined to create a larger sorted sublist.
Intuitive Explanation:
Imagine sorting a stack of numbered playing cards. Merge Sort would work as follows:
1. Divide:
Now, start combining the smaller stacks into larger, sorted stacks.
Compare the cards in each pair of stacks, placing them in sorted order as you combine
them. Continue this process until you have larger, sorted stacks.
3. Combine:
Keep combining the larger sorted stacks until you have a fully sorted stack representing
the entire deck of cards.
The divide-and-conquer approach ensures that the problem is broken down into smaller,
more manageable subproblems.
2. Predictable Performance:
Merge Sort has a consistent, predictable performance of O(n log n), making it efficient for
large datasets. The logarithmic term comes from the recursive halving of the dataset during
the divide phase.
3. Stability:
Merge Sort is a stable sorting algorithm, meaning that the relative order of equal elements
is preserved. This is important in situations where the original order of equal elements
matters.
4. Parallelization:
Merge Sort is inherently parallelizable, making it well-suited for parallel processing. Multiple
processors or threads can efficiently merge the sorted sublists concurrently.
5. No Worst-Case Scenario:
Unlike some other sorting algorithms, Merge Sort does not have a worst-case scenario. Its
time complexity remains O(n log n) regardless of the initial order of the elements.
6. Memory Efficiency:
Merge Sort is memory-efficient as it uses additional space for merging, but the space
complexity is still O(n), making it feasible for systems with limited memory.
In short, Merge Sort's efficiency stems from its ability to divide the problem into smaller parts,
achieve sorted results within those parts, and then efficiently combine those results to produce a
fully sorted dataset. Its predictable performance, stability, parallelization potential, and memory
efficiency contribute to its popularity and usefulness in various scenarios.
Divide:
merge function is responsible for merging two sorted halves of the array. The parameters
left , middle , and right represent the indices that define the two halves to be
merged.
Divide:
n1 and n2 represent the sizes of the two halves to be merged.
L and R are temporary arrays to store the elements of the left and right halves.
Divide:
Copy the elements of the left and right halves into the temporary arrays L and R .
int i = 0, j = 0, k = left;
while (i < n1 && j < n2) {
if (L[i] <= R[j]) {
Combine:
If there are remaining elements in either the left or right half, they are copied back into the
original array.
Divide:
The mergeSort function implements the divide phase by recursively calling itself on the
left and right halves of the array.
Conquer and Combine:
The merge function is called to merge the sorted left and right halves, implementing the
conquer and combine phases.
This code implements the Merge Sort algorithm, and the functions mergeSort and merge work
together to achieve the divide-and-conquer strategy explained earlier. The recursive nature of
mergeSort ensures that the array is continually divided until individual elements are reached, and
the merge function efficiently combines and sorts these elements during the merging process.
Trace Tree:
[38, 27, 43, 3, 9, 82, 10]
|
-----------------------------
| |
[38, 27, 43, 3] [9, 82, 10]
| |
-------------------- ---------------
| | | |
[38, 27] [43, 3] [9] [82, 10]
| | |
-------- --------- --------
| | | | | |
[38] [27] [43] [3] [82] [10]
Explanation:
The array is recursively divided into two halves until subarrays with only one element are
reached.
The merging process then combines and sorts these individual elements back into larger,
sorted subarrays.
This process continues until the entire array is merged into a single, sorted array.
Efficiency:
The efficiency of Merge Sort is evident in its divide-and-conquer strategy, where the array is
efficiently divided into smaller subproblems.
The sorting of these smaller subproblems is efficient because each subproblem involves sorting
relatively small lists, which is inherently faster.
The merging process efficiently combines these sorted subarrays back into larger, sorted
subarrays.
The time complexity of Merge Sort is O(n log n), making it efficient for large datasets, as the log
n factor accounts for the recursive division of the array.
In the example trace tree, notice how the array is divided into smaller and sorted subarrays, and
then these sorted subarrays are efficiently merged to create larger and sorted subarrays. This
hierarchical structure allows Merge Sort to achieve a consistent and predictable performance,
making it suitable for various sorting applications.
Reflection MCQs:
Linear search is a straightforward algorithm used to find the position of a target value within a list. It
sequentially checks each element of the list until a match is found or the entire list has been
traversed.
1. Initialization:
What: The algorithm starts with the first element in the list.
Why: Initializes the process and prepares for sequential traversal.
2. Sequential Comparison:
Step-by-Step Walkthrough:
#include <iostream>
#include <vector>
int main() {
std::vector<int> myList = {10, 4, 7, 2, 5, 8};
int targetValue = 5;
if (result != -1) {
std::cout << "Target found at index: " << result << std::endl;
} else {
std::cout << "Target not found in the list." << std::endl;
}
return 0;
}
Linear Search Code Explained:
#include <iostream>
#include <vector>
#include <iostream> : This line includes the standard input-output stream library, which
provides functionalities for input and output operations.
#include <vector> : This line includes the standard vector library, which provides a dynamic
array-like data structure called vector.
for (int i = 0; i < list.size(); ++i) { : This line starts a for loop that iterates over
each element of the vector list . It initializes an integer variable i to 0, and the loop
continues as long as i is less than the size of the vector list . After each iteration, i is
incremented by 1 ( ++i ).
if (list[i] == target) {
if (list[i] == target) { : This line checks if the current element of the vector list at
index i is equal to the target value. If they are equal, it means the target value has been
found in the list.
return i; : If the target value is found in the list, this line returns the index i where the
target value was found.
If the target value is not found in the list after iterating through all elements, the loop exits, and
the following line is executed.
return -1; : This line returns -1 to indicate that the target value was not found in the list
after searching through all elements.
int main() {
int main() { : This line defines the main function, which is the entry point of the program.
if (result != -1) {
if (result != -1) { : This line checks if the result returned by the linearSearch
function is not equal to -1 , which means the target value was found in the list.
std::cout << "Target found at index: " << result << std::endl;
std::cout << "Target found at index: " << result << std::endl; : If the target
value is found, this line prints a message indicating the index where the target value was found.
} else {
} else { : This line is the beginning of the else block, which is executed if the target value
is not found in the list.
std::cout << "Target not found in the list." << std::endl; : This line prints a
message indicating that the target value was not found in the list.
return 0;
}
return 0; : This line indicates successful program termination and is the end of the main
function. It returns 0 to the operating system, indicating that the program ran successfully.
Reflection MCQs:
a) 0
b) -1
c) 1
d) None of the above
Click to reveal the answer
3. In the step-by-step walkthrough, what does the algorithm return when the target is
found?
a) -1
b) The index of the matched element
c) The target value
d) None of the above
Click to reveal the answer
4. When is linear search not the most efficient choice for searching?
Short Questions:
1. Data Structures Overview:
6. Binary Search:
Bubble Sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent
elements, and swaps them if they are in the wrong order. The pass through the list is repeated until
the entire list is sorted. The algorithm gets its name because smaller elements "bubble" to the top of
the list during each pass.
// Implement the Bubble Sort algorithm to
// sort an array of integers in ascending order.
#include <iostream>
int main() {
const int size = 5;
int array[size] = {10, 4, 7, 2, 5};
bubbleSort(array, size);
return 0;
}
Line-by-Line Explanation:
This is the outer loop that iterates through the array from the first element to the second-to-
last element.
i represents the pass number.
3. for (int j = 0; j < size - i - 1; ++j) {
This is the inner loop that compares and swaps adjacent elements within each pass.
The range of the inner loop decreases with each pass ( size - i - 1 ), as the largest
element gets placed in its final position.
4. if (arr[j] > arr[j + 1]) {
The efficiency of the Bubble Sort algorithm is often characterized by its simplicity, but it's important
to note that its efficiency is generally lower compared to more advanced sorting algorithms for larger
datasets. Let's delve into an intuitive explanation of the efficiency of Bubble Sort:
Bubble Sort has a time complexity of O(n^2) in the worst and average cases. This implies that
the number of operations grows quadratically with the size of the input array.
In each pass through the array, Bubble Sort compares and swaps adjacent elements. For an
array of size n, it performs n-1 comparisons and potentially n-1 swaps in a single pass.
3. Bubble-Up Behavior:
The name "Bubble Sort" is derived from the way smaller elements gradually "bubble up" to their
correct positions at the beginning of the array after multiple passes. Larger elements,
conversely, sink towards the end of the array.
5. Adaptive Nature:
Bubble Sort exhibits adaptability in the sense that if the array is nearly sorted, it requires fewer
passes to reach the sorted state. In the best-case scenario (already sorted array), the time
complexity is O(n), making it adaptive in such situations.
6. Easy to Implement:
While not the most efficient, Bubble Sort is straightforward to understand and implement. It is
often used for educational purposes or on small datasets where its simplicity can be
advantageous.
7. Space Complexity:
Bubble Sort has a space complexity of O(1), meaning it uses a constant amount of extra
memory regardless of the input size. This is an advantage in terms of space utilization.
In Summary: Bubble Sort's efficiency lies in its simplicity, ease of implementation, and adaptability
to partially sorted data. However, its quadratic time complexity makes it less suitable for large
datasets in practical applications, where more efficient algorithms like Merge Sort or QuickSort are
often preferred.
Code Exercises:
Problem Statement: You are tasked with implementing the Selection Sort algorithm to sort an array
of integers in ascending order. Selection Sort works by repeatedly finding the minimum element
from the unsorted part of the array and putting it at the beginning. Your task is to complete the
provided C++ code for the Selection Sort algorithm.
#include <iostream>
int main() {
const int size = 6;
int array[size] = {64, 25, 12, 22, 11, 1};
selectionSort(array, size);
return 0;
}
Problem Statement: You are given a sorted array of integers, and your task is to implement a
recursive Binary Search algorithm to find the index of a specific target element. If the target is
present in the array, your function should return its index; otherwise, return -1. Complete the
provided C++ code for the recursive Binary Search algorithm.
#include <iostream>
int binarySearch(const int arr[], int low, int high, int key) {
// TODO: Implement the recursive binary search here
}
Instructions:
For Exercise 1, complete the code for the swap operation inside the Selection Sort algorithm.
For Exercise 2, implement the recursive Binary Search algorithm following the principles of
binary search by dividing the array in half during each recursive call.
Please feel free to share your problems to ensure that your weekly tasks are completed and
you are not staying behind
Grading Strucutre:
Week3: Assessment 4%
Week5: Assessment 6%
Midterm: 30%
Week11: Assessment 4%
Week13: Assessment 6%
Project: 15%
Final: 35%
In [ ]: