Distributed System Merged

The document consists of a series of assignments and solutions related to distributed systems, covering key concepts such as message passing, transparency, fault tolerance, and the use of global snapshots. It includes multiple-choice questions with correct answers and explanations for each concept. The content emphasizes the characteristics, protocols, and challenges associated with distributed systems.


Week 0: Assignment 0

Your last recorded submission was on 2025-07-13, 12:04 IST


1 point
What is a distributed system?
A single computer system with multiple processors.
A collection of independent computers that appears to its users as a single coherent
system.
A system where all components are located on the same physical machine.
A system that does not require networking capabilities.
Yes, the answer is correct.
Score: 1
Accepted Answers:
A collection of independent computers that appears to its users as a single coherent
system.
1 point
Which of the following is NOT a type of transparency in a distributed system?
Access transparency
Failure transparency

Time transparency
Location transparency
Yes, the answer is correct.
Score: 1
Accepted Answers:
Time transparency
1 point
What is the primary goal of a distributed system?
To increase the complexity of the system.
To appear as a single coherent system to the user.
To reduce the number of computers needed.
To operate independently of other systems.
Yes, the answer is correct.
Score: 1
Accepted Answers:
To appear as a single coherent system to the user.
1 point
What is message passing in the context of distributed systems?
The process of executing a program on a single computer.
The method of exchanging messages between processes to communicate and synchronize
actions.
The method of storing messages in a database.
The process of sending messages to a printer.
Yes, the answer is correct.
Score: 1
Accepted Answers:
The method of exchanging messages between processes to communicate and synchronize
actions.
1 point
Which term describes when a message is sent to a specific process?
Unicast
Broadcast

Multicast
Anycast
Yes, the answer is correct.
Score: 1
Accepted Answers:
Unicast
1 point
Which protocol is commonly used for reliable message passing in distributed systems?
HTTP
FTP
TCP
UDP
Yes, the answer is correct.
Score: 1
Accepted Answers:
TCP
1 point
Which of the following is an example of a message-passing interface standard?
SQL
MPI (Message Passing Interface)
HTTP
REST
Yes, the answer is correct.
Score: 1
Accepted Answers:
MPI (Message Passing Interface)
1 point
In a synchronous communication model, what must happen for a message exchange to
complete?
Only the sender needs to be active.
Both sender and receiver must be active and ready to communicate.
The sender sends the message and continues its work.
The receiver processes the message at its convenience.
Yes, the answer is correct.
Score: 1
Accepted Answers:
Both sender and receiver must be active and ready to communicate.
1 point
What does "latency" refer to in message passing?
The time taken to encode a message.
The time taken to decode a message.
The time taken for a message to travel from sender to receiver.

The time taken for a message to be stored in the queue.


Yes, the answer is correct.
Score: 1
Accepted Answers:
The time taken for a message to travel from sender to receiver.

1 point
What is a major challenge associated with message passing in distributed systems?
Ensuring that messages are sent only once.
Maintaining a global clock.
Achieving message ordering and delivery guarantees.
Centralizing control.
Yes, the answer is correct.
Score: 1
Accepted Answers:
Achieving message ordering and delivery guarantees.
Your score is: 10/10
Week-1 Assignment
Solution
_____________________________________________
1. Which of the following is a characteristic of a distributed system?
A) Scalability
B) Centralized control
C) Single point of failure
D) Homogeneity of hardware and software
Ans: (A) Scalability
Scalability is the characteristic of a distributed system.
Centralized control, single point of failure, and homogeneity of hardware and software are
characteristics of centralized systems, which are the opposite of distributed systems.
A distributed system is designed to handle increasing workloads by adding more resources
(nodes), making it scalable.

2. Which term describes the transparency that hides the fact that a resource might be
replicated in a distributed system?
A) Location transparency
B) Replication transparency
C) Concurrency transparency
D) Failure transparency
Ans: (B) Replication transparency
Replication transparency ensures that clients interact with a set of replicated resources as if
they were a single resource. It hides the presence of replicas and manages consistency
among them. This means that users or applications are unaware of the multiple copies of
the data and can interact with it as if there's only one version.

3. In a distributed system, what is the term used for ensuring that a system continues to
operate, even if some of its components fail?
A) Load balancing
B) Fault tolerance
C) Scalability
D) Concurrency
Ans: (B) Fault tolerance
Fault tolerance is the ability of a system to continue operating even if some of its
components fail. It involves techniques like redundancy, error detection, and recovery
mechanisms to ensure uninterrupted service.

4. What is middleware in the context of a distributed system?
A) The hardware that connects different systems.
B) A layer of software that provides a programming abstraction and masks the
heterogeneity of the underlying network, hardware, and operating systems.
C) The operating system of the distributed system.
D) The application software running on a distributed system.
Ans: (B) A layer of software that provides a programming abstraction and masks the
heterogeneity of the underlying network, hardware, and operating systems.
Middleware acts as a bridge between applications and the underlying infrastructure,
simplifying interactions and handling complexities like different hardware, operating
systems, and network protocols.

5. The distributed system uses a ___________ architecture to break down the complexity
of system design. The _______________ is the distributed software that drives the
distributed system while providing transparency of heterogeneity at the platform level.
A) Message-passing, CORBA
B) Tree, RPC
C) Loosely coupled, MPI
D) Layered, Middleware
Ans: (D) Layered, Middleware
Layered architecture is commonly used in distributed systems to break down complexity into
manageable layers. This approach promotes modularity and abstraction.
Middleware is the software layer that sits between the application and the underlying system,
providing essential services and hiding the complexities of the distributed environment. It ensures
transparency, allowing applications to interact without being concerned about the underlying
heterogeneity.

6. Three computers, A, B, and C, communicate using a protocol that implements Lamport
logical clocks (including their clock timestamp in messages). All three computers begin
with their logical clock set to zero at the beginning of time. Later, the following sequence
of events occurs:
● A sends message M1 to B: "hi."
● After receiving M1, B sends message M2 to C: "A told me hi."
● After receiving M2, C sends message M3 to A: "B is boring."
What is the time included with the message Send (M2, _) as they are sent at each step?
A) Send(M2,1)
B) Send(M2,2)
C) Send(M2,3)
D) Send(M2,4)
Ans: (C) Send(M2,3)
A sends message M1 to B: "hi."
• A increments its logical clock before sending M1.
• Logical clock of A: 0 -> 1
• A sends M1 with timestamp 1.

B receives M1 from A: "hi."
• B receives M1 with timestamp 1.
• B updates its logical clock to max(current clock, received timestamp) + 1.
• Logical clock of B: max(0, 1) + 1 = 2
• B increments its logical clock before sending M2.
• Logical clock of B: 2 -> 3
• B sends M2 to C with timestamp 3.

C receives M2 from B: "A told me hi."
• C receives M2 with timestamp 3.
• C updates its logical clock to max(current clock, received timestamp) + 1.
• Logical clock of C: max(0, 3) + 1 = 4
• C increments its logical clock before sending M3.
• Logical clock of C: 4 -> 5
• C sends M3 to A with timestamp 5.

A receives M3 from C: "B is boring."
• A receives M3 with timestamp 5.
• A updates its logical clock to max(current clock, received timestamp) + 1.
• Logical clock of A: max(1, 5) + 1 = 6

So the timestamp with each message is:
M1: Sent by A to B with timestamp 1.
M2: Sent by B to C with timestamp 3.
M3: Sent by C to A with timestamp 5.
So, the time included with the message M2 as it is sent is 3.
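The walkthrough above can be reproduced with a minimal sketch of Lamport clocks. It follows the solution's convention (increment before a send; on receipt, take max with the received timestamp and then add 1); process names are the ones from the question.

```python
# Minimal Lamport logical clock sketch, using the convention from the
# walkthrough: increment before send; on receive, max-merge then add 1.

class Process:
    def __init__(self, name):
        self.name = name
        self.clock = 0  # all clocks start at zero

    def send(self):
        # increment before sending; the message carries the new timestamp
        self.clock += 1
        return self.clock

    def receive(self, timestamp):
        # merge the sender's timestamp, then step past it
        self.clock = max(self.clock, timestamp) + 1

a, b, c = Process("A"), Process("B"), Process("C")

t1 = a.send()   # A sends M1 with timestamp 1
b.receive(t1)   # B: max(0, 1) + 1 = 2
t2 = b.send()   # B sends M2 with timestamp 3
c.receive(t2)   # C: max(0, 3) + 1 = 4
t3 = c.send()   # C sends M3 with timestamp 5
a.receive(t3)   # A: max(1, 5) + 1 = 6

print(t1, t2, t3, a.clock)  # 1 3 5 6
```

Running it confirms the answer: M2 carries timestamp 3.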

7. Which of the following best describes "message ordering"?
A) Ensuring messages are delivered in the sequence they were sent.
B) Organizing messages based on their priority.
C) Delivering messages to the fastest receiver first.
D) Storing messages in alphabetical order.
Ans: (A) Ensuring messages are delivered in the sequence they were sent.
This accurately describes message ordering in a distributed system. It guarantees that
messages arrive at their destination in the same order they were sent, preserving the
intended sequence of operations.

8. Consider the following statements regarding the paradigms of transparency:
Statement 1: Access transparency hides differences in data representation on different
systems and provides uniform operations to access system resources.
Statement 2: Relocation transparency is the ability to relocate the resources as they are
being accessed.
A) Both statements are false
B) Only statement 1 is true
C) Only statement 2 is true
D) Both statements are true
Ans: (D) Both statements are true
Access transparency indeed hides differences in data representation across systems,
providing a unified interface for accessing resources.
Relocation transparency refers to the ability to move resources without affecting ongoing
operations, which is correct.

9. In the synchronous system model consensus is solvable, whereas in the asynchronous
system model consensus is impossible to solve.
A) True
B) False
Ans: (A) True
• Synchronous system: In a synchronous system, there are upper bounds on message
delivery time, process execution speed, and clock drift. These guarantees make it
possible to design algorithms that achieve consensus.
• Asynchronous system: In an asynchronous system, there are no such guarantees.
Messages can be arbitrarily delayed, processes can execute at arbitrary speeds, and
clocks can drift unpredictably. The famous FLP impossibility result proves that
consensus cannot be achieved deterministically in an asynchronous system with even
a single process failure.
Therefore, consensus is solvable in synchronous systems but impossible in asynchronous
systems.

10. Event A has a Lamport timestamp of 4.
Event B has a Lamport timestamp of 8.
What can we tell about events A and B?
A) Events A and B are causally related.
B) Events A and B are concurrent.
C) Event A happened before event B.
D) If events A and B are causally related, then event A happened before event B.
Ans: (D) If events A and B are causally related, then event A happened before event B.
• Lamport timestamps provide a partial ordering of events.
• If event A has a smaller timestamp than event B, it means that either A happened before B, or
they are concurrent.
• We cannot definitively say that A happened before B based solely on the timestamps.
• However, if we know that A and B are causally related (meaning A directly influenced B), then
the smaller timestamp of A confirms that it happened before B.
Therefore, option D is the most accurate statement.


Week-2 Assignment
Solution
_____________________________________________
1. List out the uses of the global snapshot of the system.
A) Checkpointing
B) Garbage collection of objects
C) Deadlock detection
D) All of the above
Ans: (D) All of the above
A global snapshot captures the state of all processes and messages in transit in a distributed system at
a specific point in time. It has several critical applications:
• Checkpointing: It can be used to create consistent checkpoints of the system's state, enabling
recovery from failures by restoring the system to a previously saved consistent state.
• Garbage collection: By examining the snapshot, it's possible to identify objects that are no
longer referenced, allowing for efficient garbage collection.
• Deadlock detection: Analyzing the snapshot can help detect cyclic dependencies among
processes, indicating a potential deadlock situation.
Therefore, all the options (A, B, and C) are valid uses of a global snapshot.

2. (True/False)? The lack of globally shared memory, global clock, and unpredictable message delays
in a distributed system make this problem non-trivial.
A) True
B) False
Ans: (A) True
The absence of globally shared memory, a global clock, and predictable message delays presents
significant challenges in distributed systems.
• No globally shared memory: Processes cannot directly access and modify the same data,
requiring complex communication and coordination mechanisms.
• No global clock: It's impossible to establish a single, synchronized time reference across the
system, making it difficult to order events and reason about causality.
• Unpredictable message delays: Messages can experience varying and unpredictable delays,
hindering reliable communication and synchronization.
These factors together make achieving consistency, fault tolerance, and efficient coordination in
distributed systems non-trivial.

3. Which of the following is true regarding the size of a vector clock in a distributed system?
A) The size of a vector clock is always equal to the number of processes in the system.
B) The size of a vector clock depends on the number of events in the system
C) The size of a vector clock depends on the number of timestamps used in the system
D) The size of a vector clock is always equal to the number of messages exchanged in the
system
Ans: (A) The size of a vector clock is always equal to the number of processes in the system.
A vector clock is a data structure used to determine the partial ordering of events in a distributed
system. It consists of a vector of integers, where each element corresponds to a process in the system.
Therefore, the size of the vector clock is directly tied to the number of processes involved.

4. The Chandy-Lamport algorithm can be initiated by any process by executing the __________ by
which it records its local state and sends a marker on each outgoing channel.
A) Marker receiving rule
B) Marker sending rule
C) Marker termination rule
D) None of the above
Ans: (B) Marker sending rule
The Chandy-Lamport algorithm is initiated when a process executes the marker sending rule. This
involves recording its local state and sending a marker message on each of its outgoing channels.
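The marker sending rule can be sketched as three steps: record the local state, push a marker onto every outgoing channel before any further messages, and begin recording arrivals on every incoming channel. The process and channel names below are illustrative; a real implementation would send markers over network channels.

```python
# Sketch of the Chandy-Lamport "marker sending rule" (initiation step only).
# Channel names and the send callback are illustrative stand-ins.

class SnapshotProcess:
    def __init__(self, pid, outgoing, incoming):
        self.pid = pid
        self.outgoing = outgoing      # names of outgoing channels
        self.incoming = incoming      # names of incoming channels
        self.recorded_state = None
        self.channel_state = {}       # in-transit messages per incoming channel

    def marker_sending_rule(self, local_state, send):
        # 1. record the local state
        self.recorded_state = local_state
        # 2. send a marker on every outgoing channel, before any other message
        for ch in self.outgoing:
            send(ch, "MARKER")
        # 3. start recording messages arriving on every incoming channel
        for ch in self.incoming:
            self.channel_state[ch] = []

sent = []
p1 = SnapshotProcess("P1", outgoing=["P1->P2", "P1->P3"], incoming=["P2->P1"])
p1.marker_sending_rule({"balance": 100}, lambda ch, m: sent.append((ch, m)))
print(sent)  # [('P1->P2', 'MARKER'), ('P1->P3', 'MARKER')]
```

The marker receiving rule (not shown) would stop recording on the channel the marker arrived on.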

5. In quorum-based mutual exclusion algorithms, what is a quorum?
A) A single process that coordinates access to the critical section.
B) A token that is passed around the system.
C) A subset of processes that must agree to grant access to the critical section.
D) A centralized database holding access permissions.
Ans: (C) A subset of processes that must agree to grant access to the critical section.
In quorum-based mutual exclusion algorithms, a quorum is a group of processes (a subset of the total
processes in the system) that must give permission to a process before it can enter the critical section.
The key property of quorums is that any two quorums must have at least one process in common,
ensuring mutual exclusion.

6. The Chandy-Lamport algorithm of global snapshot recording uses the following kind of channel:
A) Non-FIFO
B) Causal Order
C) FIFO
D) Both Non-FIFO and Causal Order
Ans: (C) FIFO
The Chandy-Lamport algorithm crucially relies on FIFO (First-In-First-Out) channels. This means that
messages sent along a channel are received in the same order they were sent. This property is essential
for correctly capturing the state of the system at a consistent cut. If channels were not FIFO, it would
be difficult to determine the order of events and messages accurately.

7. At 10:27:540 (hr, min, 1/100 sec.), server B requests time from the time-server A. At 10:27:610,
server B receives a reply from time-server A with the timestamp of 10:27:375. Find out the drift of
B's clock with respect to the time-server A's clock (assume there is no processing time at the time-
server for time service).
A) -1.5 sec.
B) 2 sec.
C) 3 sec.
D) -2 sec.
Ans: (D) -2 sec.
RTT: Reply – Request = 610 - 540 = 70 (1/100 sec).
Adjusted local time: Server + RTT/2 = 375 + 35 = 410 (1/100 sec).
Drift: Adjusted local time – local time = 410 – 610 = -200 (1/100 sec) = -2 sec.
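The same arithmetic, restated in code (Cristian-style clock reading, all times in 1/100-second units):

```python
# Drift calculation for question 7, times in 1/100 s units.
request_sent = 540        # 10:27:540 on B's local clock
reply_received = 610      # 10:27:610 on B's local clock
server_time = 375         # timestamp 10:27:375 in A's reply

rtt = reply_received - request_sent             # 610 - 540 = 70
estimated_server_now = server_time + rtt // 2   # 375 + 35 = 410
drift = estimated_server_now - reply_received   # 410 - 610 = -200

print(drift / 100, "sec")  # -2.0 sec
```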

8. Recording the global state of a distributed system is an important paradigm when one is interested
in
A) Analyzing
B) Monitoring
C) Testing
D) All of the above
Ans: (D) All of the above
Recording the global state of a distributed system is a versatile technique with applications in:
• Analyzing system behavior: By capturing snapshots at different points in time, you can study
trends, patterns, and anomalies.
• Monitoring system performance: You can track resource utilization, response times, and other
metrics to identify bottlenecks and optimize performance.
• Testing system correctness: You can use global snapshots to verify system behavior against
expected outcomes and detect errors or inconsistencies.
Therefore, recording the global state is valuable for all three aspects.

9. Consider the following statements related to Process Failure Models:
i.) Fail-stop: In this model, a properly functioning process may fail by stopping execution from
some instant thenceforth. Additionally, other processes can learn that the process has failed.
ii.) Crash: In this model, a properly functioning process may fail by stopping to function from
any instant thenceforth. Unlike the fail-stop model, other processes do not learn of this crash.
iii.) Receive omission: A properly functioning process may fail by intermittently receiving only
some of the messages sent to it or by crashing.
iv.) Send omission: A properly functioning process may fail by intermittently sending only
some of the messages it is supposed to send or by crashing.
A) Only (i) and (ii) are true
B) Only (i), (ii) and (iii) are true
C) Only (i) and (iv) are true
D) All are true
Ans: (D) All are true
• Fail-stop: This is a classic failure model where a process abruptly stops and other processes
are notified.
• Crash: Similar to fail-stop, but without the notification to other processes.
• Receive omission: This is a more nuanced failure where a process may intermittently miss
messages or crash entirely.
• Send omission: This is another nuanced failure where a process may intermittently fail to send
messages or crash.
All four descriptions accurately represent common process failure models in distributed systems.

10. How many messages are required for N processes during a single critical section execution in
Lamport's algorithm?
A) 2(N − 1) messages
B) 3(N − 1) messages
C) 4(N − 1) messages
D) 5(N − 1) messages
Ans: (B) 3(N − 1) messages
In Lamport's algorithm for mutual exclusion in a distributed system, 3(N − 1) messages are required
for N processes during a single critical section execution.

Request Messages (N − 1): The process requesting entry to the critical section sends a timestamped
REQUEST message to every other process.

Reply Messages (N − 1): Each of the other processes sends a REPLY message back to the requesting
process, acknowledging its request.

Release Messages (N − 1): After exiting the critical section, the requesting process sends a RELEASE
message to every other process to inform them that it is no longer in the critical section.

Therefore, the total number of messages is:

(N − 1) REQUEST messages + (N − 1) REPLY messages + (N − 1) RELEASE messages = 3(N − 1) messages
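The count per critical-section entry can be tallied in a few lines, assuming the standard request/reply/release exchange of Lamport's mutual exclusion algorithm:

```python
# Message count for one critical-section entry in Lamport's mutual
# exclusion algorithm: REQUEST, REPLY, and RELEASE each cost N - 1.

def lamport_message_count(n):
    requests = n - 1   # requester -> every other process
    replies = n - 1    # every other process -> requester
    releases = n - 1   # requester -> every other process, on exit
    return requests + replies + releases

for n in (2, 5, 10):
    assert lamport_message_count(n) == 3 * (n - 1)
print(lamport_message_count(5))  # 12
```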
Week-3 Assignment 3
Solution

1. What is the primary goal of consensus algorithms in distributed systems?


a. To ensure all nodes agree on a single data value or decision.

b. To distribute load evenly across nodes.

c. To detect and recover from node failures.

d. To synchronize clocks across all nodes.

Ans: (a) To ensure all nodes agree on a single data value or decision.

The primary goal of consensus algorithms in distributed systems is- To ensure all nodes agree on a
single data value or decision.

Here's why this is the key function of consensus algorithms:

Distributed systems consist of multiple independent nodes (computers) working together.

These nodes may have different information or experiences due to network delays or partial views of
the system.

Consensus algorithms provide a mechanism for these nodes to communicate and reach an agreement
on a single value or decision. This ensures consistency across the system and prevents conflicting
actions.

While the other options are important aspects of distributed systems:

b. Distributing load evenly is achieved by load-balancing techniques, not necessarily through
consensus algorithms.

c. Detecting and recovering from node failures can be facilitated by consensus algorithms
(e.g., electing a new leader node), but it's not the primary goal.

d. Synchronizing clocks is a related issue, but consensus algorithms specifically focus on
agreeing on data or decisions, not just time.

2. What does "agreement" refer to in the context of distributed systems?

a. Consensus among all processes in the system

b. Uniformity of data representation across nodes

c. Speed of message delivery

d. Synchronization of clocks

Ans: (a) Consensus among all processes in the system


In the context of distributed systems, "agreement" refers to: Consensus among all processes in the
system

Here's why this definition captures the essence of agreement:

• Distributed Nature: Distributed systems involve multiple processes running on separate


computers across a network. These processes may have access to different information or
experience events in a slightly different order.
• Common Goal: For the system to function correctly, there might be situations where
processes need to agree on a common value or state. This agreement ensures consistency
and prevents conflicting actions

3. True or False? In token-based algorithms, a unique token is shared among the sites. A site is
allowed to enter its critical section if it possesses the token.

a. True

b. False

Ans: (a) True

In token-based algorithms for mutual exclusion in distributed systems, a unique token is indeed shared
among all the sites. A site can only enter its critical section if it holds the token. This ensures that only
one site can be in its critical section at any given time, preventing conflicts and maintaining data
consistency.

4. In the context of rollback recovery, what is the "domino effect"?

a. Efficient message passing between nodes

b. Redundant checkpointing strategies

c. Automatic recovery without human intervention

d. Cascading failures caused by one node crashing

Ans: (d) Cascading failures caused by one node crashing

In the context of rollback recovery, the domino effect refers to: Cascading failures caused by one node
crashing

Here's why the domino effect can occur in rollback recovery:

• Dependencies Between Nodes: In distributed systems, processes often communicate and


exchange data. This creates dependencies between them.

• Rollback Triggers: When a failure occurs, and rollback recovery is initiated, the system might
need to roll back the state of a process to a previous checkpoint.

• Cascading Rollbacks: If the rolled-back process had sent messages or interacted with other
processes before the failure, those interactions might now be inconsistent with the rolled-back
state. This can trigger rollbacks in those dependent processes as well, creating a cascading
effect.
5. The Suzuki-Kasami broadcast algorithm is designed to handle:

a. Synchronous systems only

b. Asynchronous systems only

c. Both synchronous and asynchronous systems

d. Real-time systems exclusively

Ans: (c) Both synchronous and asynchronous systems

The Suzuki-Kasami broadcast algorithm's design, which does not rely on timing assumptions and
operates based on message exchanges and sequence numbers, makes it suitable for both synchronous
and asynchronous systems. It ensures mutual exclusion in a distributed environment regardless of the
underlying system's timing characteristics.
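The timing-independence claim rests on the algorithm's bookkeeping: each site tracks RN (the highest request number heard from every site), while the token carries LN (the last served request number per site) and a FIFO queue of waiting sites. The sketch below models only this bookkeeping, with broadcast simulated as direct array updates rather than real messages; site count and the initial token holder are illustrative.

```python
# Compact sketch of Suzuki-Kasami data structures: per-site RN arrays,
# plus a token carrying LN and a queue. Message transport is simulated.

N = 3
RN = [[0] * N for _ in range(N)]    # RN[i][j]: highest request number of j seen at site i
token = {"LN": [0] * N, "queue": [], "holder": 0}

def request_cs(i):
    RN[i][i] += 1                   # new request number; broadcast REQUEST(i, RN[i][i])
    for j in range(N):              # every site updates its RN on receipt
        RN[j][i] = max(RN[j][i], RN[i][i])

def release_cs(i):
    token["LN"][i] = RN[i][i]       # mark site i's latest request as served
    for j in range(N):              # enqueue sites with outstanding requests
        if RN[i][j] == token["LN"][j] + 1 and j not in token["queue"]:
            token["queue"].append(j)
    if token["queue"]:
        token["holder"] = token["queue"].pop(0)   # pass the token onward

request_cs(1)           # site 1 asks for the critical section
release_cs(0)           # site 0 (initial holder) exits and passes the token
print(token["holder"])  # 1
```

Note that nothing here depends on bounded delays or synchronized clocks, which is why the algorithm works in both synchronous and asynchronous settings.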

6. In Raymond's algorithm, the token is passed:

a. In a circular fashion

b. Along the edges of a logical tree

c. Randomly between processes

d. Only to the highest priority process

Ans: (b) Along the edges of a logical tree

Raymond's algorithm employs a tree-based structure to manage the token-passing process. The
processes in the distributed system are organized into a logical tree, and the token moves along
the edges of this tree, hop by hop, toward the process that has requested it. This ensures that
every request eventually reaches the token holder, preventing deadlock and starvation.

Key points:

• Logical tree: The processes are arranged in a hierarchical structure.

• Token passing: The token moves from parent to child nodes in the tree.

• Distributed nature: The algorithm operates without a central coordinator.

By using a tree-based approach, Raymond's algorithm effectively manages the token and ensures fair
access to the critical section for all processes in the distributed system.

7. Which of the following techniques can help mitigate the livelock problem in checkpointing?

a. Increasing checkpoint frequency

b. Using optimistic concurrency control

c. Implementing coordinated checkpointing

d. Reducing system load

Ans: (c) Implementing coordinated checkpointing


• Livelock in checkpointing occurs when processes repeatedly roll back to the same checkpoint
without making progress. This can happen due to inconsistencies between checkpoints of
different processes.

• Coordinated checkpointing ensures that all participating processes take checkpoints


simultaneously or in a well-defined order. This helps to maintain consistency among
checkpoints and reduce the likelihood of livelock.

Other options:

• Increasing checkpoint frequency: While this might seem counterintuitive, it can actually
exacerbate the livelock problem as more frequent checkpoints can lead to more frequent
rollbacks.

• Using optimistic concurrency control: This technique is primarily used for concurrency
control, not specifically for addressing livelock in checkpointing.

• Reducing system load: While reducing system load can improve overall system performance,
it doesn't directly address the root cause of livelock.

By implementing coordinated checkpointing, the system can effectively prevent the cascading rollback
scenario that leads to livelock.

8. What does the Fischer-Lynch-Paterson (FLP) impossibility result state?

a. It is impossible to design a distributed algorithm for consensus in a synchronous system

b. It is impossible to reach consensus in an asynchronous system, even if a single process has


a crash failure

c. It is impossible to achieve consistency in a distributed database

d. It is impossible to design a distributed algorithm for leader election

Ans: (b) It is impossible to reach consensus in an asynchronous system, even if a single process has a
crash failure

The Fischer-Lynch-Paterson (FLP) impossibility result is a fundamental theorem in distributed


computing. It states that it is impossible to design a deterministic algorithm for reaching consensus in
an asynchronous system where even a single process might fail by crashing.

9. Cascaded rollback which causes the system to roll back too far into the computation (even to the
beginning), in spite of all the checkpoints, is known as:

a. Phantom Effect

b. Domino Effect

c. Rollback

d. Livelock

Ans: (b) Domino Effect


The domino effect accurately describes the cascading rollback that can occur in a distributed system,
where a single failure triggers a chain reaction of rollbacks, potentially leading to an excessive and
unnecessary rollback to a very early state.

10. What does each node maintain in the context of the HOLDER variables?

a. The number of privileges it holds

b. The status of other nodes in the network

c. The identity of a node that has the privilege or leads to the node having the privilege

d. The number of neighbors it has

Ans: (c) The identity of a node that has the privilege or leads to the node having the privilege

Each node in Raymond's Tree-based algorithm maintains a HOLDER variable which stores the identity
of another node. This node is believed to either possess the privilege or is on the path to obtaining the
privilege. This information helps nodes to forward requests for the privilege towards the node that
currently holds it or is in the process of acquiring it.
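The HOLDER pointers can be sketched directly: each node stores the neighbor that leads toward the privilege, and following the chain from any node reaches the current holder (a node that points to itself). The tree below is illustrative.

```python
# Sketch of HOLDER variables in Raymond's tree-based algorithm: each node
# records which neighbor leads toward the privileged node. A node pointing
# to itself is the current holder. The tree shape is illustrative.

HOLDER = {
    "A": "A",   # A holds the privilege
    "B": "A",
    "C": "A",
    "D": "B",
    "E": "B",
}

def find_holder(node):
    # follow HOLDER pointers up the tree until a node points to itself
    while HOLDER[node] != node:
        node = HOLDER[node]
    return node

print(find_holder("E"))  # A
```

In the real algorithm a request message follows exactly this pointer chain, and the pointers are flipped as the token travels back down to the requester.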
Week-4 Assignment 4
Solution
1. What advantage does Distributed Shared Memory (DSM) offer to programmers in a distributed
system compared to the message-passing model?

a. Elimination of data access across the network

b. Simplification of synchronization and consistency concerns

c. Direct access to physical memory modules

d. Exclusively using send and receive communication primitives

Ans: (b) Simplification of synchronization and consistency concerns

The primary advantage of DSM lies in simplifying synchronization and consistency concerns.
Programmers can treat the distributed memory as a single shared space, and the underlying DSM
system manages the complexities of data replication, consistency protocols, and synchronization
between processes accessing the same data. This reduces the burden on programmers and allows
them to focus on the application logic rather than low-level communication details.

2. In Distributed Shared Memory (DSM) systems, the Sequential Consistency model ensures that

a. All nodes see writes to a memory location in the same order

b. Writes are immediately visible to all nodes

c. Nodes can read stale data

d. Memory is always synchronized with the disk

Ans: (a) All nodes see writes to a memory location in the same order

3. In the GHS algorithm for Minimum Spanning Tree (MST), an edge adjacent to the fragment with
the smallest weight and that does not create a cycle is called:

a. Weight outgoing edge (WOE)

b. Least weight outgoing edge (LWOE)

c. Link outgoing weight edge (LOWE)

d. Minimum weight outgoing edge (MWOE)

Ans: (d) Minimum weight outgoing edge (MWOE)


In the GHS algorithm for Minimum Spanning Tree (MST), the correct term for an edge adjacent to the
fragment with the smallest weight and that does not create a cycle is: Minimum weight outgoing edge
(MWOE)

Weight outgoing edge (WOE): This term doesn't explicitly specify that the weight is the minimum
among outgoing edges.

Least weight outgoing edge (LWOE): While similar to MWOE, "least" is sometimes used
interchangeably with "minimum" but might be less precise.

Link outgoing weight edge (LOWE): This term doesn't accurately convey the concept of the edge being
the minimum weight or outgoing.

Minimum weight outgoing edge (MWOE) clearly and concisely captures the essence of this concept:
it's the edge leaving a fragment that has the smallest weight among all outgoing edges and doesn't
introduce a cycle when added. This term aligns with the terminology used in the description of the
GHS algorithm.
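The MWOE selection can be sketched in a few lines of Python. This is a simplified, centralized view of what each fragment computes in the distributed GHS algorithm; the example graph is invented for illustration:

```python
# Find the minimum weight outgoing edge (MWOE) of a fragment: the
# lightest edge with exactly one endpoint inside the fragment.
# (Having one endpoint outside guarantees adding it creates no cycle.)

def mwoe(edges, fragment):
    """edges: list of (u, v, weight); fragment: set of node ids."""
    outgoing = [e for e in edges
                if (e[0] in fragment) != (e[1] in fragment)]
    return min(outgoing, key=lambda e: e[2], default=None)


edges = [("a", "b", 4), ("b", "c", 1), ("a", "c", 3), ("c", "d", 2)]
# Fragment {a, b}: outgoing edges are b-c (weight 1) and a-c (weight 3).
best = mwoe(edges, {"a", "b"})    # -> ("b", "c", 1)
```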

4. The worst-case complexity of Mitchell and Merritt’s algorithm of Distributed Deadlock Detection
is:

a. s transmit steps, where s is the number of processes in the cycle

b. (s - 1) transmit steps, where s is the number of processes in the cycle

c. s (s - 1) transmit steps, where s is the number of processes in the cycle

d. s (s - 1)/2 transmit steps, where s is the number of processes in the cycle

Ans: (d) s (s - 1)/2 transmit steps, where s is the number of processes in the cycle

The worst-case complexity of Mitchell and Merritt's algorithm for Distributed Deadlock Detection is:

s(s - 1)/2 transmit steps, where s is the number of processes in the cycle

Here's why:

• Label Propagation: The algorithm propagates labels (probes) along the edges of the wait-for
graph; a deadlock is declared when a process sees its own label return to it.

• Worst Case: In the worst case, the s processes in the cycle block one at a time. When the i-th
process blocks, its label can be propagated along the chain of up to (i - 1) processes that are
already blocked.

Summing these propagation steps over all s processes gives the total number of transmit steps in
the worst case:

1 + 2 + … + (s - 1) = s(s - 1)/2


5. Which deadlock handling strategy is most suitable for systems with high availability
requirements?

a. Deadlock prevention

b. Deadlock avoidance

c. Deadlock detection and recovery

d. Deadlock ignorance

Ans: (c) Deadlock detection and recovery

Deadlock detection and recovery allows the system to continue operating even if deadlocks occur,
which is essential for high availability systems. Prevention and avoidance can be too restrictive, while
ignorance is risky.

6. Which of the following is a potential drawback of deadlock detection and recovery?

a. Potential for data inconsistency

b. Risk of cascading failures

c. High overhead

d. All of the above

Ans: (d) All of the above

Deadlock detection and recovery involves periodic checks for deadlocks, which can be computationally
expensive. If recovery involves process rollback, it can lead to data inconsistency. Additionally,
recovering from a deadlock might trigger a chain of events that lead to further failures.

7. Which of the following is a popular distributed MST algorithm?

a. Prim's algorithm

b. Kruskal's algorithm

c. Gallager-Humblet-Spira (GHS) algorithm

d. Dijkstra's algorithm

Ans: (c) Gallager-Humblet-Spira (GHS) algorithm

While Prim's and Kruskal's are centralized algorithms, GHS is specifically designed for distributed
environments and is widely used for constructing DMSTs.

8. A key challenge in constructing an MST in the message-passing model is:

a. Handling dynamic network topologies

b. Ensuring correctness and termination

c. Dealing with asynchronous communication


d. All of the above.

Ans: (d) All of the above.

Distributed MST algorithms must be resilient to network changes, guarantee the correctness of the
final result, and handle unpredictable communication delays.

9. What is the difference between a wait-for graph and a resource allocation graph?

a. There is no difference between the two.

b. A wait-for graph focuses on process dependencies, while a resource allocation graph focuses
on resource allocation.

c. A wait-for graph is used for deadlock detection, while a resource allocation graph is used for
deadlock prevention.

d. A wait-for graph is a more detailed representation of a system than a resource allocation
graph.

Ans: (b) A wait-for graph focuses on process dependencies, while a resource allocation graph focuses
on resource allocation.

Wait-for graph: This graph represents the dependencies between processes in a system. Nodes in the
graph represent processes, and edges represent the "wait-for" relationship between processes. If
process A is waiting for process B to release a resource, there will be an edge from A to B in the wait-
for graph. A cycle in the wait-for graph indicates a deadlock.
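Cycle detection on a wait-for graph can be sketched as a depth-first search. This is a minimal, centralized illustration; the process names are invented:

```python
# A cycle in the wait-for graph indicates deadlock. Minimal sketch:
# wfg maps each process to the set of processes it waits for.

def has_deadlock(wfg):
    """Detect a cycle in a wait-for graph via depth-first search."""
    visiting, done = set(), set()

    def dfs(p):
        visiting.add(p)
        for q in wfg.get(p, ()):
            if q in visiting:               # back edge => cycle => deadlock
                return True
            if q not in done and dfs(q):
                return True
        visiting.discard(p)
        done.add(p)
        return False

    return any(dfs(p) for p in wfg if p not in done)


# P1 waits for P2, P2 waits for P3, P3 waits for P1: a deadlock cycle.
deadlocked = has_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}})
```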

Resource allocation graph: This graph represents the allocation of resources to processes. Nodes in
the graph represent both processes and resources. Edges represent the allocation of resources to
processes or the request for resources by processes. While it can be used to detect deadlocks, its
primary focus is on resource allocation.

10. What are some limitations of using WFG for deadlock detection and prevention?

a. It requires global knowledge of the system state.

b. It might not detect all types of deadlocks.

c. It can be computationally expensive for large systems.

d. All of the above.

Ans: (d) All of the above.

Limitations of Using a Wait-For Graph (WFG) for Deadlock Detection and Prevention:

• Requires Global Knowledge of the System State:

A WFG needs up-to-date information about all process dependencies across the
system, which can be challenging to maintain, especially in large or distributed
systems.

• Might Not Detect All Types of Deadlocks:


The WFG is effective for detecting circular wait conditions but might miss deadlocks
that involve more complex resource allocation scenarios, such as those involving
multiple resources.

• Can Be Computationally Expensive for Large Systems:

In large systems with many processes and resources, constructing and analyzing the
WFG can be computationally intensive, potentially leading to performance
bottlenecks.
Week-5 Assignment 5
Solution
1. What is termination detection in distributed systems?

a. Ensuring all processes finish at the same time

b. Determining when all processes have completed their tasks and there are no messages in
transit

c. Synchronizing clocks across all nodes

d. Detecting deadlock situations

Ans: (b) Determining when all processes have completed their tasks and there are no messages in
transit

Termination detection in distributed systems focuses on: Determining when all processes have
completed their tasks and there are no messages in transit

2. In spanning tree-based termination detection, what role does the spanning tree play?

a. It optimizes message routing

b. It maintains network connectivity

c. It facilitates the propagation of termination signals

d. It ensures fault tolerance

Ans: (c) It facilitates the propagation of termination signals

In spanning tree-based termination detection for distributed systems, the spanning tree plays a crucial
role in: It facilitates the propagation of termination signals

Here's why the spanning tree is critical for this purpose:

• Distributed Communication: Processes in a distributed system communicate by sending
messages. Termination detection needs a mechanism to determine when all processes have
finished their tasks.

• Efficient Message Propagation: The spanning tree provides a structured and efficient way to
propagate termination signals from processes that have finished to those that haven't.

The spanning tree structure ensures:

• No Cycles: There are no cycles in a spanning tree, preventing redundant messages and infinite
loops in the termination detection process.
• Reachability: All processes are reachable from the root of the tree through a unique path,
allowing termination signals to reach all processes efficiently.

Let's explore why the other options are not the primary roles of the spanning tree in this context:

• It ensures fault tolerance: While a spanning tree can improve the resilience of the system to
some failures by providing alternate communication paths, fault tolerance is not the sole focus
of the tree in termination detection.

• It optimizes message routing: While the structure of the spanning tree can influence message
routing to some extent, its primary purpose in termination detection is to facilitate the
propagation of termination signals, not necessarily optimize general message routing for the
entire system.

• It maintains network connectivity: A spanning tree itself doesn't guarantee overall network
connectivity in a distributed system. However, it can help maintain connectivity within the
context of the termination detection process by ensuring all processes can be reached through
the tree structure.
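A minimal sketch of how a spanning tree aggregates termination reports: a node counts as terminated only when it is idle and every child subtree has terminated. The tree shape and node names are invented, and this sketch deliberately ignores in-transit messages, which a real termination detection algorithm must also account for:

```python
# Sketch: termination signals propagate up a spanning tree. A node
# reports "terminated" only when it is idle and all of its children
# (and their subtrees) have reported terminated.

def subtree_terminated(tree, idle, node):
    """tree: parent -> list of children; idle: node -> bool."""
    return idle[node] and all(
        subtree_terminated(tree, idle, child)
        for child in tree.get(node, []))


tree = {"root": ["a", "b"], "a": ["c"]}
idle = {"root": True, "a": True, "b": True, "c": True}
done = subtree_terminated(tree, idle, "root")   # True only if all idle
```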

3. What aspect of Dijkstra's self-stabilizing token ring system ensures that it can recover from any
transient faults?

a. Redundant token passing

b. Asynchronous message delivery

c. Periodic checkpointing

d. Self-correcting mechanism

Ans: (d) Self-correcting mechanism

The key aspect of Dijkstra's self-stabilizing token ring system that enables recovery from transient faults
is: Self-correcting mechanism

Here's why:

• Self-Stabilization Property: The core principle of the system is self-stabilization. This means
that regardless of the initial state (even if it's an invalid or inconsistent state due to a transient
fault), the system can automatically converge to a legitimate state where exactly one process
holds the token.

• Self-Correction in Action: The system achieves this through a set of rules and state transitions
defined for each process. These rules involve checking neighboring processes' states and
taking corrective actions based on the information received. For example, if a process doesn't
receive the token within a certain time frame, it might initiate a token regeneration process to
rectify the situation.

Let's explore why the other options are not the primary mechanisms for fault recovery:

• Redundant token passing: While some token ring systems might use redundant tokens for
fault tolerance, Dijkstra's algorithm itself doesn't rely on this approach. It focuses on correcting
the state within the existing token passing mechanism.
• Asynchronous message delivery: The system can operate even with asynchronous message
delivery (messages might arrive with varying delays). The self-correcting mechanism takes into
account potential delays and adjusts its behavior accordingly.

• Periodic checkpointing: Checkpointing is a technique for saving system state periodically,
which is more commonly used for rollback recovery. Dijkstra's algorithm focuses on on-the-fly
correction of the token ring state itself, not on saving and restoring snapshots.
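The self-correcting behaviour can be sketched with Dijkstra's classic K-state algorithm. The simulation below uses a serial daemon (one enabled machine moves per step) and an invented "corrupted" initial state; from any such state the ring converges to exactly one privileged machine:

```python
# Sketch of Dijkstra's K-state self-stabilizing token ring.
# A machine holds the privilege (token) when its rule is enabled; from
# ANY initial state the ring converges to exactly one privilege.

def privileged(states):
    """Indices of machines whose move rule is currently enabled."""
    n = len(states)
    enabled = [0] if states[0] == states[n - 1] else []     # bottom rule
    enabled += [i for i in range(1, n) if states[i] != states[i - 1]]
    return enabled

def step(states, k):
    """Let the first privileged machine move (k must exceed ring size)."""
    states = list(states)
    i = privileged(states)[0]   # at least one machine is always enabled
    states[i] = (states[0] + 1) % k if i == 0 else states[i - 1]
    return states


states, k = [2, 0, 1, 2], 5     # arbitrary, possibly corrupted state
for _ in range(30):             # a few serial steps suffice here
    states = step(states, k)
num_privileged = len(privileged(states))   # legitimate state: exactly 1
```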

4. Why are Read operations excluded from participating in total order broadcasts under the principle
of Sequential Consistency?

a. Read operations are executed atomically.

b. Read operations are independent between processors.

c. Read operations involve pipelining

d. Read operations are suitable for Read-intensive programs.

Ans: (b) Read operations are independent between processors.

In Sequential Consistency, the goal is to ensure that all processes in a distributed system observe the
same execution order for operations. This includes write operations, which modify the shared state
and need to be seen in a consistent order by all processes.

However, read operations are different. They typically access a snapshot of the shared state at a
specific point in time and don't modify it. Since reads are independent and don't affect the outcome
of other operations, their specific order in the total broadcast doesn't impact the overall consistency
of the system.

Here's a breakdown of why the other options are not the primary reason:

• a. Read operations are executed atomically: While reads might be atomic (completed as a
single indivisible unit), atomicity doesn't automatically exclude them from total order. The key
factor is that reads don't affect the outcome of other operations, so their precise order is less
critical.

• c. Read operations involve pipelining: Pipelining is an optimization technique that can be used
with both read and write operations. It doesn't inherently affect the need for total ordering.

• d. Read operations are suitable for Read-intensive programs: While this might be true for
certain applications, it doesn't explain why reads are excluded from total order broadcasts.
Read-intensive programs might still benefit from some level of consistency, but total order for
reads might not be strictly necessary.

By excluding reads from total order broadcasts, Sequential Consistency can achieve a balance between
consistency and efficiency. Reads can be optimized for faster access and potentially executed in a more
relaxed order, as long as the overall state remains consistent for write operations.

5. What is the difference between total order and causal order in group communication?
a. Total order ensures that all processes agree on the same order of messages, while causal
order only ensures that messages are delivered in the same order as they were sent.

b. Causal order is stronger than total order as it guarantees that messages are delivered in the
same order as they were sent, while total order only ensures that all processes agree on the
same order.

c. There is no difference between total order and causal order.

d. Total order is achieved through vector clocks, while causal order is achieved through
Lamport timestamps.

Ans: (a) Total order ensures that all processes agree on the same order of messages, while causal order
only ensures that messages are delivered in the same order as they were sent.

Explanation: Total order guarantees a global ordering of messages, while causal order ensures that
messages are delivered in the order that respects the causal relationship between events.
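One common way to enforce causal order is a vector-clock delivery test. The sketch below (process indices and clock values are invented for illustration) checks whether a message may be delivered without violating causality:

```python
# Causal delivery test with vector clocks: message m with vector
# timestamp vt from sender s is deliverable at a receiver with local
# vector clock lc when (1) it is the next message from s, and (2) the
# receiver has already delivered everything m causally depends on.

def causally_deliverable(vt, sender, lc):
    if vt[sender] != lc[sender] + 1:        # next-in-line from sender?
        return False
    return all(vt[i] <= lc[i]               # all dependencies seen?
               for i in range(len(vt)) if i != sender)


lc = [1, 0, 0]            # receiver has seen one message from process 0
ok = causally_deliverable([2, 0, 0], 0, lc)       # next from 0: deliver
blocked = causally_deliverable([2, 1, 0], 1, lc)  # depends on an unseen
                                                  # message from process 0
```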

6. Which message ordering paradigm is most suitable for applications that require strict
synchronization among processes?

a. FIFO

b. Causal

c. Total

d. None of the above

Ans: (c) Total

Total order guarantees that all processes agree on the same order of messages. This means
that every process delivers messages in the same sequence, ensuring strict synchronization.

FIFO (First-In-First-Out) only guarantees that messages from the same sender are delivered in
the order they were sent, which is not sufficient for strict synchronization among multiple
processes.

Causal ordering ensures that messages are delivered in an order that preserves causality, but
it doesn't guarantee a global total order, making it unsuitable for applications requiring strict
synchronization.

Therefore, total order is the most suitable message-ordering paradigm for applications that
necessitate strict synchronization among processes.

7. What are some potential optimizations to reduce the message overhead of the weight throwing
algorithm?

a. Combining multiple weight updates into a single message.

b. Using a hierarchical weight structure.

c. Employing probabilistic weight updates.


d. All of the above.

Ans: (d) All of the above.

All of the options listed are potential optimizations to reduce the message overhead of the weight-
throwing algorithm

Combining multiple weight updates into a single message: This reduces the number of messages
exchanged by batching updates together, thus lowering communication overhead.

Using a hierarchical weight structure: This organizes processes into a hierarchy, reducing the number
of messages needed as weight updates can be aggregated and passed through fewer levels.

Employing probabilistic weight updates: This technique can reduce the frequency or necessity of
weight updates by using probabilistic methods to determine when updates should be sent, thereby
reducing message traffic.
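The basic weight-throwing scheme that these optimizations target can itself be sketched as follows. This is a toy, single-controller illustration with invented class and method names; exact fractions are used because real implementations avoid floating-point rounding:

```python
# Sketch of weight-throwing termination detection: the controller
# starts with weight 1; senders carry away part of the weight when a
# computation is spawned, and idle processes return their weight.
# Termination is declared when the controller's weight is back to 1.

from fractions import Fraction

class Controller:
    def __init__(self):
        self.weight = Fraction(1)

    def give(self, amount):
        self.weight -= amount       # weight lent to a new computation
        return amount

    def take_back(self, amount):
        self.weight += amount       # idle process returns its weight

    def terminated(self):
        return self.weight == 1


ctl = Controller()
w1 = ctl.give(Fraction(1, 2))       # process P1 becomes active
w2 = ctl.give(Fraction(1, 4))       # process P2 becomes active
ctl.take_back(w1)
before = ctl.terminated()           # P2 still holds weight: not done
ctl.take_back(w2)
after = ctl.terminated()            # all weight returned: terminated
```

Batching, hierarchy, and probabilistic updates all aim to reduce how often the take_back traffic above crosses the network.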

8. Which of the following is a key challenge in designing self-stabilizing algorithms?

a. Ensuring that the algorithm works in a centralized environment.

b. Handling distributed deadlocks effectively.

c. Minimizing the convergence time to reach a legitimate state.

d. Guaranteeing infinite loops to keep the system active.

Ans: (c) Minimizing the convergence time to reach a legitimate state.

One of the major challenges in self-stabilizing algorithm design is to ensure that the system can quickly
converge from an arbitrary state to a legitimate state. Faster convergence is crucial for practical
applications, where prolonged periods of instability can be unacceptable.

9. In the context of self-stabilization, what is a "legitimate state"?

a. A state from which the system cannot progress further.

b. A state that the system never reaches.

c. A state where the system satisfies its intended specification.

d. A state that the system avoids to prevent faults.

Ans: (c) A state where the system satisfies its intended specification.

A legitimate state in self-stabilizing systems is one that satisfies the system's specifications or
correctness conditions. Once the system reaches a legitimate state, it should maintain correct behavior
as long as no new faults occur.

10. Why is termination detection challenging in distributed systems?


a. Lack of centralized control.

b. Network latency.

c. Process failures.

d. All of the above.

Ans: (d) All of the above.

Distributed systems lack a central coordinator, network delays can hinder communication, and process
failures can disrupt the computation, making it difficult to determine termination.
Week-6 Assignment 6
Solution
1. In Peer-to-peer (P2P) Networks, the ongoing entry and exit of various nodes, as well as dynamic
insertion and deletion of objects, is termed as _______________
a. Peer
b. Chunk
c. Objects
d. Churn

Ans: (d) Churn


In P2P networks, the ongoing entry and exit of various nodes, as well as dynamic insertion and
deletion of objects, is termed as: Churn

2. What is the primary function of a Distributed Hash Table (DHT)?

a. Centralized data storage

b. Routing messages between peers

c. Storing and retrieving key-value pairs

d. Providing high computational power

Ans: (c) Storing and retrieving key-value pairs

DHTs operate like a hash table, but the data is distributed across multiple nodes in a network. Each
node is responsible for storing a specific portion of the data based on a hashing function. To retrieve
data, a client calculates the hash of the key and contacts the node responsible for that hash range. This
allows for efficient data lookup and retrieval in a distributed environment.
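A DHT-style lookup can be sketched with consistent-hashing successor routing. This is a toy 8-bit identifier ring with invented node ids and key; real DHTs such as Chord use far larger id spaces and distributed routing tables rather than a scan:

```python
# Sketch: hash the key onto a ring, then route to the first node whose
# id is >= the key's hash, wrapping around at the end of the ring.

import hashlib

RING = 256                                    # tiny 8-bit id space

def ring_hash(key):
    return hashlib.sha1(key.encode()).digest()[0] % RING

def responsible_node(node_ids, key):
    """node_ids: sorted node ids; returns the key's successor node."""
    k = ring_hash(key)
    for nid in node_ids:
        if nid >= k:
            return nid
    return node_ids[0]                        # wrap around the ring


nodes = [32, 96, 160, 224]
store = {nid: {} for nid in nodes}
# Store and retrieve via the same hash-based routing rule.
store[responsible_node(nodes, "song.mp3")]["song.mp3"] = b"data"
found = store[responsible_node(nodes, "song.mp3")].get("song.mp3")
```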

3. In GFS (Google File System), files are divided into ____________ chunks.

a. Both Fixed Size and Variable Size chunks

b. Fixed Size chunks

c. Variable Size chunks

d. None of these

Ans: (b) Fixed Size chunks

In GFS, files are divided into fixed-size chunks. This design choice has several advantages:

• Efficient data management: Fixed-size chunks simplify data management and allocation.
• Reduced metadata overhead: Each chunk is identified by a unique 64-bit chunk handle,
reducing the amount of metadata required.

• Optimized network I/O: Large chunk sizes minimize the number of network round trips
required for file operations.

While variable-size chunks might seem more flexible, they can introduce complexities in terms of data
management, metadata overhead, and network I/O. GFS's choice of fixed-size chunks is well-suited to
its specific use case of handling large-scale data storage and retrieval.
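Fixed-size chunking itself is simple to sketch. GFS uses 64 MB chunks identified by 64-bit handles assigned by the master; the tiny sizes and sequential handles below are only for illustration:

```python
# Sketch: split a byte string into fixed-size chunks, each paired with
# a chunk handle (here just a sequence number for illustration).

def split_into_chunks(data, chunk_size):
    """Return a list of (chunk_handle, chunk_bytes) pairs."""
    return [(handle, data[off:off + chunk_size])
            for handle, off in enumerate(range(0, len(data), chunk_size))]


chunks = split_into_chunks(b"abcdefghij", 4)
# -> [(0, b"abcd"), (1, b"efgh"), (2, b"ij")]
```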

4. What is the primary benefit of using randomized algorithms in distributed systems?

a. Increased complexity

b. Decreased reliability

c. Simplified design and improved performance

d. Higher cost

Ans: (c) Simplified design and improved performance

The primary benefit of using randomized algorithms is:- Simplified design and improved
performance.

Here's a deeper look at why randomization can be advantageous:

• Breaking Symmetry: Deterministic algorithms can struggle with situations where multiple
processes in a distributed system behave identically. This symmetry can lead to inefficiencies
like collisions or deadlocks. Randomized algorithms, by introducing some level of
unpredictability, can help break this symmetry and encourage more balanced or spread-out
behavior, ultimately improving overall system performance.

• Load Balancing: Randomly assigning tasks or choosing communication targets can help
distribute the workload more evenly across the system. This prevents bottlenecks from
forming on specific nodes and improves overall efficiency.

• Avoiding Worst-Case Scenarios: Deterministic algorithms can sometimes be susceptible to
specific inputs or scenarios that lead to significant performance degradation. By introducing
randomness, the algorithm can escape these worst-case conditions and achieve better average
performance in practice.

While randomization might seem to introduce some uncertainty, it can often simplify the design of
distributed algorithms and lead to more robust and efficient systems.

In the Google File System, files are divided into fixed-size chunks. Each chunk is identified by an
immutable and globally unique 64-bit chunk handle assigned by the master at the time of chunk
creation. Chunkservers store chunks on local disks as Linux files and read or write chunk data
specified by a chunk handle and byte range.

5. Which of these is an example of Unstructured peer-to-peer (P2P) overlay network?

a. CAN
b. CHORD

c. Kademlia

d. Gnutella

Ans: (d) Gnutella

Unstructured P2P Overlay Networks: In unstructured peer-to-peer (P2P) networks, peers are
connected randomly, and there is no strict organization or structure that dictates how data is
distributed or how peers are connected. These networks rely on flooding-based techniques to search
for data, which can be inefficient but are more resilient to peer churn (peers joining and leaving the
network).

• Gnutella is an example of an unstructured P2P network where peers are randomly connected,
and data is searched using a flooding mechanism.

Structured P2P Overlay Networks: In contrast, structured P2P networks use specific algorithms to
organize the network and data. Examples include:

• CAN (Content Addressable Network): A structured P2P network that uses a distributed hash
table (DHT) for efficient data lookup.

• Chord: Another structured P2P network that also uses a DHT for organizing data and routing
queries efficiently.

• Kademlia: A structured P2P network using a DHT that optimizes the search and retrieval of
data by maintaining certain metrics about node proximity.

6. True (or) False? “The approach used to devise a randomized leader election algorithm is
to use randomization to create asymmetry by having processors choose random pseudo-identifiers,
drawn from some range, and then execute a deterministic leader election algorithm.”

a. True

b. False

Ans: (a) True

The approach described involves using randomization to create asymmetry among the processors by
assigning them random pseudo-identifiers. These identifiers are drawn from a predefined range,
ensuring that processors are likely to have unique identifiers. Once the identifiers are assigned, a
deterministic leader election algorithm can then be executed, with the processor having the highest
(or lowest) identifier typically being elected as the leader. This method leverages randomness to avoid
symmetry, which could otherwise lead to conflicts or ties in the leader election process.
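A sketch of this idea is below. The id range, tie-handling, and seeded random generator are invented for illustration, and a real distributed algorithm would run the deterministic election by message passing rather than inside one process:

```python
# Sketch: each processor draws a random pseudo-identifier, then a
# deterministic rule (highest id wins) elects the leader; a tie means
# symmetry was not broken, so fresh identifiers are drawn.

import random

def elect_leader(processors, id_range, rng):
    while True:
        ids = {p: rng.randrange(id_range) for p in processors}
        top = max(ids.values())
        winners = [p for p, i in ids.items() if i == top]
        if len(winners) == 1:       # unique maximum: leader found
            return winners[0]
        # tie: redraw random identifiers and repeat


leader = elect_leader(["p1", "p2", "p3"], 1 << 16, random.Random(7))
```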

7. What is the default replication factor for chunks in Google File System (GFS)?

a. 1

b. 2

c. 3

d. 4

Ans: (c) 3
The default replication factor for chunks in GFS is 3. This means that each chunk of data is replicated
on three different ChunkServers to ensure data durability and availability. If one ChunkServer fails, the
other two can still serve the data.

8. How does Google File System (GFS) handle write operations to ensure consistency?

a. The Master performs the write operation itself

b. The client writes data directly to the ChunkServer holding the chunk

c. The client sends the write request to all ChunkServers simultaneously

d. The client writes data to a primary replica, which then propagates the change to secondary
replicas

Ans: (d) The client writes data to a primary replica, which then propagates the change to secondary
replicas

This is how GFS ensures consistency in write operations:

Client sends write request: The client sends a write request to the Master node, specifying the file and
the desired chunk to modify.

Master assigns primary replica: The Master node assigns the primary replica for that chunk to the
client. Client writes to primary: The client writes the data to the primary replica.

Primary propagates to secondaries: The primary replica then propagates the write to the secondary
replicas.

Master updates metadata: Once all replicas have acknowledged the write, the Master updates its
metadata to reflect the new data.

This approach ensures that all replicas eventually have the same data, maintaining consistency.
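The write path above can be sketched as follows. This is a toy, in-memory illustration: the replica names and acknowledgement mechanism are invented and greatly simplified relative to real GFS (which separates data flow from control flow and handles failures):

```python
# Sketch of a GFS-style write path: the client writes to the primary
# replica, which propagates the mutation to the secondaries so that
# all replicas end up with identical data.

class Replica:
    def __init__(self, name):
        self.name, self.data = name, b""

    def apply(self, payload):
        self.data += payload

def client_write(primary, secondaries, payload):
    primary.apply(payload)              # 1. client writes to primary
    acks = []
    for sec in secondaries:             # 2. primary propagates change
        sec.apply(payload)
        acks.append(sec.name)
    return acks                         # 3. done once all have applied


primary = Replica("cs1")
secondaries = [Replica("cs2"), Replica("cs3")]
client_write(primary, secondaries, b"record-1")
consistent = all(r.data == primary.data for r in secondaries)
```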

9. A common application of synchronous one-shot algorithms is:

a. Distributed database

b. Leader election

c. Consensus

d. Distributed file system

Ans: (b) Leader election

Synchronous one-shot algorithms are often used in leader election processes within distributed
systems. These algorithms help select a unique leader among multiple processes in a distributed
environment, typically in a single round (hence "one-shot"). They are designed to be completed quickly
and efficiently, making them well-suited for leader election scenarios.

10. In a synchronous iterated leader election algorithm, why is randomization used?

a. To ensure that the leader is chosen fairly

b. To prevent predictable outcomes.

c. To break ties in case multiple processes have the same highest value.
d. All of the above.

Ans: (d) All of the above.

Randomization is used in a synchronous iterated leader election algorithm for multiple reasons:

• To ensure that the leader is chosen fairly: Randomization helps in providing each process an
equal chance of being elected as the leader.

• To prevent predictable outcomes: If the process were deterministic, it could lead to
predictable and potentially unfair outcomes. Randomization introduces uncertainty, making
the process more robust.

• To break ties in case multiple processes have the same highest value: Randomization can help
resolve conflicts when multiple processes end up with the same highest value, ensuring that
only one leader is chosen.

Thus, randomization serves to make the election process fair, unpredictable, and effective in handling
ties.
Week-7 Assignment 7
Solution

1. In HDFS (Hadoop Distributed File System), application data are stored on other servers
called ________. All servers are ________ and communicate with each other
using ________ protocols.

a. DataNodes, fully connected, TCP-based

b. NameNodes, fully connected, TCP-based

c. NameNodes, loosely connected, UDP-based

d. DataNodes, loosely connected, UDP-based

Ans: (a) DataNodes, fully connected, TCP-based

HDFS (Hadoop Distributed File System): This distributed file system stores large datasets
across multiple machines in a cluster.

DataNodes: These are the worker nodes responsible for storing actual application data. They
hold the data blocks that make up files in HDFS.

Fully connected (not necessarily true): While HDFS aims for high availability, it's not always
practical for every DataNode to be directly connected to every other one in a large cluster.
However, there's a level of communication and coordination between DataNodes, often
facilitated by the NameNode.

TCP-based (not necessarily the only option): TCP (Transmission Control Protocol) is a reliable,
connection-oriented protocol commonly used for data transfer in HDFS. It ensures data
integrity and retransmission in case of errors. However, some implementations might explore
alternative protocols depending on specific needs.

2. What is a typical data structure used for the output of the Map function in MapReduce?

a. Linked list

b. Tree

c. Key-value pairs

d. Array

Ans: (c) Key-value pairs

In MapReduce, the typical data structure used for the output of the Map function is: Key-value pairs
key-value pairs provide a balance of efficiency, flexibility, and scalability, making them the typical and
preferred data structure for the output of the Map function in MapReduce.
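A canonical example is the word-count Map function, which emits one (word, 1) pair per word. This is a standard textbook sketch, not tied to any particular MapReduce implementation:

```python
# Sketch: a word-count Map function emits key-value pairs, the
# canonical shape of Map output in MapReduce.

def map_word_count(document):
    """Emit (word, 1) for every word in the input split."""
    return [(word, 1) for word in document.split()]


pairs = map_word_count("to be or not to be")
# -> [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
```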

3. What is the primary abstraction in Apache Spark for processing large datasets?

a. DataFrame

b. DataSet

c. RDD (Resilient Distributed Dataset)

d. Table

Ans: (c) RDD (Resilient Distributed Dataset)

While DataFrames and Datasets are important parts of Spark's ecosystem, the primary abstraction for
processing large datasets in Apache Spark is: RDD (Resilient Distributed Dataset)

4. In the Map function of MapReduce, what role do combiners play?

a. They perform preliminary data aggregation locally on each node

b. They distribute data across the network

c. They synchronize Map tasks across nodes

d. They finalize the output before the Reduce phase

Ans: (a) They perform preliminary data aggregation locally on each node

In the Map function of MapReduce, combiners play the role of: They perform preliminary data
aggregation locally on each node

Here's why combiners are used for local aggregation:

• Reducing Network Traffic: By performing some aggregation on the Map tasks themselves,
combiners can significantly reduce the amount of intermediate data that needs to be shuffled
and transferred across the network before the Reduce phase. This improves overall
performance by minimizing network bandwidth usage.

• Similar to Reduce Function (but Local): Combiners share similarities with the Reduce function
in that they aggregate intermediate key-value pairs with the same key. However, combiners
operate locally on each node during the Map phase, while the Reduce function aggregates
data globally across all Map tasks.

Let's explore why the other options are not the primary roles of combiners:

• They distribute data across the network: Distribution is handled by the MapReduce
framework itself. Combiners focus on local aggregation within the Map task on a single node.

• They synchronize Map tasks across nodes: Synchronization between Map tasks is not a
primary function of combiners. They operate independently on each node.
• They finalize the output before the Reduce phase: While combiners might reduce the volume
of data, they don't finalize the output. The Reduce function still performs the final aggregation
and produces the final results.
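A combiner for a word-count job can be sketched as a local pre-aggregation step. This is illustrative only; real frameworks invoke the combiner automatically on each Map task's output before the shuffle:

```python
# Sketch: a combiner locally sums a Map task's (word, 1) pairs,
# shrinking what must be shuffled to the reducers.

from collections import Counter

def combine(pairs):
    """Locally aggregate counts per key before the shuffle phase."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


local = [("to", 1), ("be", 1), ("to", 1), ("be", 1), ("or", 1)]
combined = combine(local)    # 5 pairs shrink to 3 before shuffling
```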

5. What is the primary difference between proactive and reactive routing protocols?

a. Proactive protocols send updates only when there is a change in the network topology, while
reactive protocols send updates periodically.

b. Proactive protocols maintain a complete picture of the network topology, while reactive
protocols only build routes as needed.

c. Proactive protocols are more scalable than reactive protocols.

d. Reactive protocols are more efficient than proactive protocols.

Ans: (b) Proactive protocols maintain a complete picture of the network topology, while reactive
protocols only build routes as needed.

Proactive Routing Protocols (Table-driven): These protocols continuously maintain up-to-date routing
information to all nodes in the network. They do this by periodically exchanging routing tables
throughout the network, allowing every node to have a complete picture of the network topology at
all times. Examples include OLSR (Optimized Link State Routing) and DSDV (Destination-Sequenced
Distance-Vector).

Reactive Routing Protocols (On-demand): These protocols create routes only when required. When a
route to a destination is needed, the protocol will discover or build the route. Since routes are only
maintained when necessary, these protocols can save bandwidth and energy. Examples include AODV
(Ad-hoc On-demand Distance Vector) and DSR (Dynamic Source Routing).

Thus, the primary difference is that proactive protocols maintain a complete and updated view of the
network, whereas reactive protocols do not maintain such information until a route is requested.
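The contrast can be illustrated on a toy topology: the proactive style precomputes a full routing table in advance, while the reactive style runs discovery only when a route is requested. The graph and node names below are hypothetical:

```python
from collections import deque

# Toy ad-hoc network as an adjacency list (hypothetical topology)
topology = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def bfs_route(src, dst):
    # Breadth-first search for a shortest-hop path between two nodes
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in topology[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Proactive (table-driven): routes to every destination kept up to date
proactive_table = {(s, d): bfs_route(s, d)
                   for s in topology for d in topology if s != d}

# Reactive (on-demand): nothing is stored; discovery runs when asked
def reactive_route(src, dst):
    return bfs_route(src, dst)

print(proactive_table[("A", "D")])   # answer is already in the table
print(reactive_route("A", "D"))      # same route, discovered on demand
```

Real protocols such as OLSR or AODV add sequence numbers, route maintenance, and flooding control; the sketch only captures the when-routes-are-computed difference.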

6. Which of the following is a common application of sensor networks?

a. Smart homes

b. Internet of Things (IoT)

c. Military surveillance

d. All of the above

Ans: (d) All of the above

Sensor networks have a wide range of applications, including:

• Smart homes: Sensor networks are used in smart homes to monitor and control various
aspects of the home environment, such as temperature, lighting, security, and appliances.

• Internet of Things (IoT): Sensor networks are a fundamental component of IoT systems, where
they collect data from various sources and enable communication between devices,
facilitating automation and smart decision-making.
• Military surveillance: Sensor networks are used in military applications for surveillance,
monitoring enemy movements, detecting intrusions, and gathering intelligence in real-time.

7. Spark also allows programmers to create two restricted types of shared variables to support two
simple but common usage patterns such as ___________ and ____________.

a. Broadcast, Accumulators

b. Map, Reduce

c. Flatmap, Filter

d. Sort, Shuffle

Ans: (a) Broadcast, Accumulators

Broadcast Variables:

These are used to efficiently distribute large read-only data to all worker nodes in a
cluster. Instead of shipping a copy of the data with each task, Spark sends the data to
each worker node only once, which can be used by all tasks running on that node. This
reduces the communication overhead and memory usage.

Accumulators:

These are used to perform write-only operations, such as counters or sums, that can
be added up across tasks. Accumulators are particularly useful for implementing
global counters, sums, or other aggregations that need to be performed across many
tasks.

Both broadcast variables and accumulators help optimize and manage data efficiently in distributed
processing.
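A minimal pure-Python simulation of the two patterns follows. It is not the actual Spark API; in real PySpark code these would be `SparkContext.broadcast(...)` and `SparkContext.accumulator(0)`, and the lookup data and partitions below are illustrative:

```python
# Simulated broadcast variable: large read-only data shipped to each
# worker node once, then shared by all tasks on that node.
broadcast_value = {"a": 1, "b": 2, "c": 3}

# Simulated accumulator: tasks only add to it; the driver reads the total.
error_count = 0

def task(records):
    # Each task reads the broadcast value and adds to the accumulator
    global error_count
    out = []
    for r in records:
        if r in broadcast_value:
            out.append(broadcast_value[r])
        else:
            error_count += 1    # acc.add(1) in real Spark
    return out

partitions = [["a", "b"], ["c", "x"], ["y", "a"]]
results = [task(p) for p in partitions]
print(results, error_count)   # [[1, 2], [3], [1]] and 2 unknown records
```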

8. What is the primary purpose of incorporating locality considerations in the MapReduce framework?

a. To eliminate the need for data replication in a distributed file system.

b. To ensure that all map tasks are scheduled on the master node for optimal performance

c. To conserve network bandwidth by minimizing data transfer over the network

d. To increase the number of task retries in case of failures during execution

Ans: (c) To conserve network bandwidth by minimizing data transfer over the network

In the MapReduce framework, locality considerations are primarily used to minimize the amount of
data that needs to be transferred over the network. By scheduling map tasks on the nodes where the
data is already stored (data locality), the framework reduces the need for network communication,
which conserves bandwidth and enhances overall performance. This is particularly important in large-
scale distributed systems where data transfer can become a significant bottleneck.
a. Incorrect because locality considerations do not eliminate the need for data replication in a
distributed file system like HDFS.

b. Incorrect, as map tasks are typically scheduled on worker nodes close to the data, not
necessarily on the master node.

d. Incorrect because locality considerations do not directly affect the number of task retries.

The primary goal of locality in MapReduce is to reduce network data transfer, making option c the
correct choice.
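A toy scheduling sketch of the data-locality preference, with a hypothetical replica map and node names (real Hadoop schedulers also consider rack locality and task queues):

```python
# split -> set of nodes holding a replica of that input split
replicas = {
    "split1": {"node1", "node2"},
    "split2": {"node2", "node3"},
    "split3": {"node1", "node3"},
}
free_nodes = ["node1", "node2", "node3"]

def schedule(split):
    # Prefer a data-local node: the map task reads its input from local
    # disk and no split data crosses the network.
    for node in free_nodes:
        if node in replicas[split]:
            return node, "local"
    # Fall back to any free node; the input must then be fetched remotely.
    return free_nodes[0], "remote"

for s in ("split1", "split2", "split3"):
    print(s, schedule(s))
```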

9. How does Apache Spark's support for real-time data processing compare to Hadoop MapReduce?

a. Both support real-time data processing equally well.

b. Spark supports real-time data processing, whereas MapReduce does not.

c. MapReduce supports real-time data processing, whereas Spark does not.

d. Neither supports real-time data processing.

Ans: (b) Spark supports real-time data processing, whereas MapReduce does not.

Spark: Spark is designed to be a fast and general-purpose cluster computing system. It
supports a variety of data processing workloads, including real-time streaming data. Spark's
in-memory data processing capabilities and optimized execution engine allow it to handle real-
time data streams efficiently.

MapReduce: MapReduce is a distributed computing framework primarily designed for batch
processing of large datasets. While it can be used for some real-time scenarios, its batch-
processing nature and slower execution compared to Spark make it less suitable for truly real-
time applications.

So, Spark's in-memory processing and optimized execution make it well-suited for real-time data
processing, while MapReduce is primarily designed for batch processing.

10. What is the primary reason Apache Spark can be less efficient in certain scenarios compared to
Hadoop MapReduce?

a. Spark’s in-memory processing can lead to higher memory consumption.

b. MapReduce’s disk-based processing is always faster than Spark.

c. Spark cannot handle batch processing as effectively as MapReduce.

d. MapReduce has a better data replication mechanism than Spark.

Ans: (a) Spark’s in-memory processing can lead to higher memory consumption.

While Apache Spark's in-memory processing offers significant speed advantages, it can also lead to
higher memory consumption. This can be a limitation in environments where memory resources are
constrained, making MapReduce more efficient in such cases.
Week-8 Assignment 8
Solution

1. The computers that process transactions for the Bitcoin network are commonly called:

a. Miners

b. Truckers

c. Linesman

d. GPU

Ans: (a) Miners

Miners: In Bitcoin's Proof-of-Work (PoW) system, miners are responsible for validating transactions
and adding new blocks to the blockchain. They compete to solve a complex mathematical puzzle, and
the first miner to solve it gets to add the next block and earn a reward in Bitcoin.
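The puzzle miners race to solve can be sketched as a toy proof-of-work loop. The fixed hex-zero difficulty below is an illustrative stand-in for Bitcoin's real target mechanism:

```python
import hashlib

def mine(block_data, difficulty=4):
    # Search for a nonce such that SHA-256(data + nonce) begins with
    # `difficulty` hex zeros. Finding the nonce is costly; checking it
    # is a single hash, which is what makes the proof verifiable.
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block: alice pays bob 1 BTC")
print(nonce, digest[:12])
```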

2. Consider the following statements:

Statement 1: In one-way authentication, only one principal verifies the identity of the other
principal.

Statement 2: In mutual authentication, both communicating principals verify each other’s identity.

a. Statement 1 is true and statement 2 is false

b. Statement 1 is false and statement 2 is true

c. Both statements are false

d. Both statements are true

Ans: (d) Both statements are true

Statement 1: In one-way authentication, only one principal verifies the identity of the other principal.
This is true. Typically, a client verifies the identity of a server using a server certificate. The server
doesn't need to verify the client's identity in this scenario.

Statement 2: In mutual authentication, both communicating principals verify each other's identity.
This is also true. In mutual authentication, both the client and the server use certificates to establish
trust and verify each other's identities before proceeding with communication.

3. Which of the following is NOT a component of the Kerberos Protocol?


a. Ticket Granting Server (TGS)

b. Authentication Server (AS)

c. Key Distribution Center (KDC)

d. Session Layer Server (SLS)

Ans: (d) Session Layer Server (SLS)

Among the listed options, the one that is NOT a component of the Kerberos protocol is: Session Layer
Server (SLS)

Here's why:

• Kerberos Components: The Kerberos protocol relies on three main components:

o Authentication Server (AS): Handles initial user authentication and issues Ticket
Granting Tickets (TGTs).

o Ticket Granting Server (TGS): Grants service tickets for specific services based on a
valid TGT.

o Key Distribution Center (KDC): Encompasses both the AS and TGS functionalities,
acting as a trusted third party for authentication and key distribution.

• Non-existent Component: Session Layer Server (SLS) is not a standard component of the
Kerberos protocol. The protocol focuses on authentication and issuing tickets for service
access, not managing sessions directly. Session management might be handled by the
application layer or higher-level protocols.
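The AS/TGS division of labour can be sketched in a toy form. An HMAC stands in for real Kerberos ticket encryption, and the keys, user database, and realm name are all illustrative:

```python
import hashlib
import hmac
import json

KDC_KEY = b"kdc-secret"   # known only to the AS/TGS (the KDC)
user_db = {"alice": hashlib.sha256(b"alice-pw").hexdigest()}

def as_issue_tgt(user, password):
    # Authentication Server: verify the user's credentials, then issue
    # a Ticket Granting Ticket sealed with the KDC's own key.
    if user_db.get(user) != hashlib.sha256(password).hexdigest():
        return None
    tgt = json.dumps({"user": user, "realm": "EXAMPLE"})
    seal = hmac.new(KDC_KEY, tgt.encode(), hashlib.sha256).hexdigest()
    return tgt, seal

def tgs_check_tgt(tgt, seal):
    # Ticket Granting Server: accept only TGTs sealed by the KDC, so a
    # client cannot forge its own ticket.
    good = hmac.new(KDC_KEY, tgt.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, seal)

ticket = as_issue_tgt("alice", b"alice-pw")
print(ticket is not None and tgs_check_tgt(*ticket))   # True
```

Real Kerberos additionally encrypts tickets, binds them to timestamps and session keys, and never sends the password over the network; the sketch only shows the AS-issues, TGS-verifies flow.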

4. What fundamental feature allows Blockchain Technology to achieve decentralization?

a. Proof of Authority

b. Consensus mechanism

c. Centralized control

d. Data encryption

Ans: (b) Consensus mechanism

Blockchain technology achieves decentralization through: Consensus mechanism

Here's why:

• Distributed Ledger: Blockchain stores data in a distributed ledger across multiple nodes in a
network. This eliminates the need for a central authority to control the ledger.

• Agreement on State: The consensus mechanism is the core principle that allows all
participants in the network to agree on the current state of the ledger (i.e., the validity of
transactions). There's no single entity dictating what gets added to the blockchain.

Let's explore why the other options are not essential for decentralization:
• Proof of Authority: While Proof of Authority can be a specific consensus mechanism used in
some blockchains, it's not the only approach. Different consensus mechanisms can achieve
decentralization with varying levels of trust assumptions.

• Centralized control: This is the opposite of decentralization. A central entity controlling the
network would negate the benefits of a distributed ledger.

• Data encryption: Data encryption is important for security in blockchain, but it doesn't directly
lead to decentralization. Encryption can be implemented in both centralized and decentralized
systems.

5. What is the primary cryptographic method used by SSL to establish a secure session?

a. Symmetric key encryption

b. Asymmetric key encryption

c. Hashing

d. Stream cipher

Ans: (b) Asymmetric key encryption


Explanation: SSL uses asymmetric key encryption (public-private key pairs) during the handshake
process to securely exchange a session key. Once the session is established, symmetric encryption is
used for actual data transmission.
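The handshake idea, public-key math agreeing on a session key and symmetric crypto then taking over, can be sketched with a toy Diffie-Hellman exchange. Real TLS uses certificates and far larger parameters; every value here is illustrative:

```python
import hashlib

# Small public parameters (toy values; p = 2**32 - 5 is prime)
p, g = 0xFFFFFFFB, 5
client_secret, server_secret = 123456, 654321   # never transmitted

# Each side sends only its public value during the handshake
client_public = pow(g, client_secret, p)
server_public = pow(g, server_secret, p)

# Each side combines its own secret with the other's public value
client_session = pow(server_public, client_secret, p)
server_session = pow(client_public, server_secret, p)
assert client_session == server_session   # both derived the same secret

# Derive a symmetric key for the fast record-layer encryption
session_key = hashlib.sha256(str(client_session).encode()).digest()
print(len(session_key))   # 32-byte symmetric key
```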

6. What is the role of the Authentication Server (AS) in Kerberos?

a. It stores and maintains user passwords.

b. It manages encryption keys for file transfers.

c. It authenticates users and issues Ticket Granting Tickets (TGT).

d. It routes network packets between clients and servers.

Ans: (c) It authenticates users and issues Ticket Granting Tickets (TGT).
Explanation: The AS is responsible for verifying the user’s credentials (such as username and
password) and issuing a TGT, which can then be used to request access to services from the TGS.

7. Which technology is described as applicable to any digital asset transaction exchanged online
and has the potential to eliminate the need for trusted third parties in validating and mediating
transactions?

a. Internet commerce

b. Bitcoin

c. Blockchain

d. Financial institutions

Ans: (c) Blockchain

Blockchain is the technology that applies to any digital asset transaction exchanged online and has the
potential to eliminate the need for trusted third parties, such as banks or financial institutions, in
validating and mediating transactions. Blockchain is a decentralized, distributed ledger system where
transactions are securely recorded and verified by a network of participants using cryptographic
methods, ensuring trust and transparency without intermediaries.

8. What is the consequence of the fact that Kerberos uses a principal’s password (encryption key) as
the fundamental proof of identity?

a. Kerberos assumes network security

b. Compromise of host security affects Kerberos

c. Users must remember multiple passwords

d. Host security is not important in Kerberos

Ans: (b) Compromise of host security affects Kerberos

In Kerberos, a user's password is used to derive encryption keys that are essential for the
authentication process. If a host (computer) where a user logs in is compromised, an attacker could
potentially capture the user's password or the derived keys. This could allow the attacker to
impersonate the user, thereby compromising the security of the entire Kerberos authentication
process. Thus, the security of the host is crucial to the overall security of the Kerberos system.

9. In blockchain technology, ________________ are basically computer programs that can automatically execute the terms of a contract.

a. Smart Property

b. Public ledger

c. Smart contracts

d. Smart Consensus

Ans: (c) Smart contracts

In blockchain technology, smart contracts are self-executing computer programs that automatically
enforce and execute the terms of a contract when predefined conditions are met. These contracts
operate on decentralized blockchain networks, eliminating the need for intermediaries and ensuring
trust and transparency between the involved parties.
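A toy illustration of the self-executing idea in plain Python. Real smart contracts run on-chain (for example, Solidity on Ethereum); this escrow example and its condition are purely illustrative:

```python
class EscrowContract:
    """Releases locked funds automatically once a condition holds."""

    def __init__(self, amount, condition):
        self.amount = amount          # funds locked in the contract
        self.condition = condition    # predicate the contract checks
        self.released = False

    def execute(self, state):
        # Self-executing: pays out if and only if the agreed condition
        # is met; no intermediary decides or can intervene.
        if not self.released and self.condition(state):
            self.released = True
            return self.amount
        return 0

contract = EscrowContract(100, lambda s: s.get("goods_delivered"))
print(contract.execute({"goods_delivered": False}))   # 0: condition unmet
print(contract.execute({"goods_delivered": True}))    # 100: automatic payout
```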

10. The ________________ takes an application message to be transmitted, fragments the data into
manageable blocks, optionally compresses the data, applies a MAC, encrypts, adds a header, and
transmits the resulting unit in a TCP segment.

a. SSL record protocol

b. SSL TCP segment

c. SSL handshake protocol

d. None of the mentioned

Ans: (a) SSL record protocol

The SSL record protocol is responsible for taking application data, fragmenting it into manageable
blocks, optionally compressing it, applying a Message Authentication Code (MAC) to ensure integrity,
encrypting it for confidentiality, adding a header, and then transmitting the resulting unit. This
processed data is then encapsulated into a TCP segment for secure transmission. The SSL record
protocol ensures secure communication at the transport layer, making it a critical component of the
SSL/TLS suite.
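The pipeline described above can be sketched step by step. This is a toy simulation: XOR stands in for a real cipher, and the keys, record type, and fragment size are illustrative:

```python
import hashlib
import hmac
import struct
import zlib

MAC_KEY, ENC_KEY = b"mac-key", b"enc-key"   # illustrative keys

def xor_cipher(data, key):
    # Toy stand-in for the record layer's symmetric cipher
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def make_records(message, fragment_size=16):
    records = []
    for i in range(0, len(message), fragment_size):          # 1. fragment
        frag = zlib.compress(message[i:i + fragment_size])   # 2. compress
        mac = hmac.new(MAC_KEY, frag, hashlib.sha256).digest()  # 3. MAC
        body = xor_cipher(frag + mac, ENC_KEY)               # 4. encrypt
        header = struct.pack("!BH", 23, len(body))           # 5. header
        records.append(header + body)    # ready to hand to TCP
    return records

records = make_records(b"hello from the application layer!")
print(len(records))   # the message is split into 3 records
```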
