Chapter 2 (Part II)
Distributed Objects and File System
Chapter Outline
2.1 Introduction
2.2 Communication between distributed objects
2.3 Remote Procedure Call
2.4 Events and Notifications
2.5 Java RMI Case Study
2.6 Introduction to DFS
2.7 File Service Architecture
2.8 Sun Network File System
2.9 Introduction to Name Services
2.10 Name Services and DNS
2.11 Directory and Discovery Services
2.12 Comparison of Different Distributed File Systems
File System
● A file system is a method used by operating systems and software to organize
and store data on a computer's storage devices, such as hard drives, solid-state
drives (SSDs), or flash drives.
● It provides a structured way to store, retrieve, and manage files.
File systems manage various aspects of data storage
● File Naming: How files are named and identified within the system.
● File Organization: How files are stored on the storage medium and how they are
accessed.
● File Access Permissions: Controlling who can view, modify, or execute specific files.
● File Metadata: Storing additional information about files, such as creation date, last
modified date, size, and file type.
● File Recovery: Providing mechanisms for recovering data in case of corruption or
accidental deletion.
● File Compression and Encryption: Some file systems support compression and
encryption of data to save space or enhance security.
Examples
● NTFS (New Technology File System):
Developed by Microsoft for Windows operating systems.
Supports large file sizes, file compression, and file encryption.
● HFS (Hierarchical File System):
Developed by Apple Inc. for early versions of the Macintosh operating system.
Supports a hierarchical directory structure and metadata for file attributes.
● ext4 (Fourth Extended Filesystem):
A widely used file system in Linux distributions.
Supports large file sizes (up to 16 terabytes) and volumes (up to 1 exabyte).
2.6 Introduction to DFS
● A Distributed File System (DFS) is simply a classical model of a file system
distributed across multiple machines. The purpose is to promote sharing of
dispersed files.
● The resources on a particular machine are local to itself. Resources on other
machines are remote.
● A file system provides a service for clients. The server interface is the normal
set of file operations: create, read, etc. on files.
● Distributed file systems support the sharing of information in the form of files
throughout the intranet.
● A distributed file system enables programs to store and access remote files
exactly as they do on local ones, allowing users to access files from any
computer on the intranet.
● Recent advances in higher-bandwidth connectivity of switched local networks
and disk organization have led to high-performance and highly scalable file
systems.
● File systems are responsible for the organization, storage, retrieval, naming,
sharing and protection of files.
Characteristics of file systems
● Files contain data and attributes.
● The data consist of a sequence of data items (typically 8-bit bytes), accessible
by operations to read and write any portion of the sequence.
● The attributes are held as a single record containing information such as the
length of the file, timestamps, file type, owner’s identity and access control lists.
● In a file system, a directory is a container or a special type of file that holds
references to other files and directories. It acts as a hierarchical structure to
organize and manage files stored on a computer's storage device.
● Directories are commonly referred to as folders in graphical user interfaces.
● The term metadata is often used to refer to all of the extra information stored
by a file system that is needed for the management of files.
● It includes file attributes, directories and all other persistent information used
by the file system.
File attribute record structure
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access Control List
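As a rough illustration of the record above, the sketch below reads the attributes that a local POSIX-style file system exposes through Python's os.stat. The mapping from the listed fields to stat fields is an assumption made for illustration; a creation timestamp is not portable and is omitted, and the permission bits stand in for a full access control list.

```python
import os
import stat
import tempfile

def describe(path):
    """Return a dict of attributes roughly matching the attribute record."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        file_type = "directory"
    elif stat.S_ISREG(st.st_mode):
        file_type = "regular"
    else:
        file_type = "other"
    return {
        "length": st.st_size,                 # file length in bytes
        "read_timestamp": st.st_atime,        # last read (access) time
        "write_timestamp": st.st_mtime,       # last data modification time
        "attribute_timestamp": st.st_ctime,   # last attribute change on Unix
        "reference_count": st.st_nlink,       # number of hard links
        "owner": st.st_uid,                   # numeric owner id
        "file_type": file_type,
        "permission_bits": stat.filemode(st.st_mode),  # simple stand-in for an ACL
    }

# Create a small temporary file and inspect its attributes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    example_path = f.name

attrs = describe(example_path)
os.remove(example_path)
```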
File System Operations
file_des = open(name, mode)          Opens an existing file with the given name
file_des = create(name, mode)        Creates a new file with the given name
status = close(file_des)             Closes the open file
count = read(file_des, buffer, n)    Transfers n bytes from the file referenced by file_des to buffer
count = write(file_des, buffer, n)   Transfers n bytes to the file referenced by file_des from buffer
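The operations in the table correspond closely to the low-level calls in Python's os module. The following is a minimal sketch on a local temporary file; the file name and contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# create/open: returns a file descriptor referencing the open file
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# write: transfers bytes from the buffer to the file, returns the count
written = os.write(fd, b"distributed file systems")

# The descriptor has a current position; move it back to the start
os.lseek(fd, 0, os.SEEK_SET)

# read: transfers up to n bytes from the file into a new buffer
data = os.read(fd, 11)

# close: releases the descriptor
os.close(fd)
os.remove(path)
```

Note that the read position lives with the open descriptor, which is state that the operating system keeps on behalf of the process.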
File System Modules
Directory module:      relates file names to file IDs
File module:           relates file IDs to particular files
Access control module: checks permission for the operation requested
File access module:    reads or writes file data or attributes
Block module:          accesses and allocates disk blocks
Device module:         performs disk I/O and buffering
Distributed File System Requirements
● Transparency
● Concurrent file updates
● File replication
● Hardware and Operating System heterogeneity
● Fault tolerance
● Consistency
● Security
● Efficiency
Transparency
➔ The design of the file service should support many of the transparency
requirements for distributed systems.
➔ The design must balance the flexibility and scalability that derive from
transparency against software complexity and performance.
● Access Transparency
● Location transparency
● Mobility Transparency
● Performance Transparency
● Scaling transparency
Concurrent File updates
● Changes to a file by one client should not interfere with the operation of other
clients simultaneously accessing or changing the same file.
● File locking helps prevent conflicts that can arise when multiple processes or
threads attempt to access or modify the same file simultaneously.
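File locking can be sketched on a single Unix-like machine with Python's fcntl.flock, which provides advisory locks. This is only a local illustration under that assumption; a distributed file system needs its own locking service, which is not shown here.

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.txt")

# Two independent opens of the same file stand in for two clients.
first = open(path, "w")
second = open(path, "a")

# The first client takes an exclusive lock.
fcntl.flock(first, fcntl.LOCK_EX)

# The second client tries without blocking and is refused.
try:
    fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_got_lock = True
except BlockingIOError:
    second_got_lock = False

# Once the first client releases the lock, the second can acquire it.
fcntl.flock(first, fcntl.LOCK_UN)
fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
second_got_lock_after_release = True
fcntl.flock(second, fcntl.LOCK_UN)

first.close()
second.close()
os.remove(path)
```

Advisory locks only protect against processes that also ask for the lock; a process that ignores locking can still write to the file.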
File Replication
● File replication ensures that multiple copies of files are distributed across
different storage nodes or servers within the DFS.
● By distributing copies of files across multiple servers, DFS can distribute read
and write requests among these servers, thus balancing the workload and
improving overall performance.
● By replicating files across multiple nodes or servers, DFS provides fault
tolerance.
● DFS replication mechanisms typically ensure consistency and data integrity
across replicas. Changes made to files are synchronized across replicas,
ensuring that all copies remain consistent and up-to-date.
Hardware and Operating System Heterogeneity
● The service interfaces should be defined so that client and server software can
be implemented for different OS and computers.
● This is an important aspect of openness.
Fault Tolerance
● Fault tolerance is a critical requirement in a Distributed File System (DFS) to
ensure continuous availability and reliability of data, even in the presence of
failures or faults.
● For transient communication failures, the design can be based on at-most-once
invocation semantics, or on the simpler at-least-once semantics with idempotent
operations, so that duplicated requests do not result in invalid updates to files.
● Tolerance of connection or server failures requires file replication.
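Why duplicated requests matter can be seen in a small sketch. Under at-least-once delivery a request may be executed more than once, so an operation is only safe if repeating it has no further effect. The class and function names below are hypothetical and exist only for this illustration.

```python
class ToyFileServer:
    """Holds the contents of a single file in memory."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, payload):
        """Repeat-safe: writing the same bytes at the same offset twice
        leaves the file unchanged after the first write."""
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload

    def append(self, payload):
        """Not repeat-safe: every repeated request adds another copy."""
        self.data.extend(payload)


def deliver_at_least_once(operation, *args, duplicates=1):
    """Simulate a retransmitted request that the server executes again."""
    for _ in range(1 + duplicates):
        operation(*args)


a = ToyFileServer()
deliver_at_least_once(a.write_at, 0, b"abc")

b = ToyFileServer()
deliver_at_least_once(b.append, b"abc")

safe_result = bytes(a.data)      # the duplicate had no further effect
unsafe_result = bytes(b.data)    # the duplicate corrupted the file
```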
Consistency
● Consistency in a Distributed File System (DFS) refers to the property that all
clients accessing the same file or directory see the same, up-to-date version of
the data, regardless of which server or storage node they access.
● Ensuring consistency is crucial for maintaining data integrity and preventing
data corruption or inconsistencies in a distributed environment.
● Strong Consistency: Strong consistency guarantees that all read and write
operations on a file or directory follow a strict sequential order and appear to
be instantaneous from the perspective of all clients.
Security
● Security requirements in a Distributed File System (DFS) are critical to protect
sensitive data, prevent unauthorized access, and ensure the integrity and
confidentiality of files and user information.
● Key mechanisms include authentication (verifying the identity of users),
authorization (determining what actions users are allowed to perform),
encryption (protecting data in transit and at rest), and Access Control Lists
(ACLs, which specify detailed permissions for individual users or groups on
files and directories).
Efficiency
● A distributed file service should offer facilities that are of at least the same
power and generality as those found in conventional file systems and should
achieve a comparable level of performance.
● Efficiency requirements in a Distributed File System (DFS) are essential for
ensuring optimal performance, scalability, and resource utilization while
minimizing latency and overhead.
Stateful vs Stateless Service
● In a stateless server architecture, each client request is treated independently,
and the server does not maintain any information about the state of the
client's session between requests.
● Stateless servers handle each request in isolation, without relying on any
context or data from previous requests.
● After responding to a client request, the server forgets about the interaction
and does not store any information about the client's session.
● Stateless servers are typically simpler and more scalable because they do not
need to manage client state.
● They can distribute requests across multiple server instances without
worrying about session affinity or data synchronization.
● Examples of stateless protocols include HTTP, where each request from a web
browser to a web server is independent and stateless.
● In contrast, stateful server architectures maintain information about the state
of client sessions and use this information to manage subsequent interactions
with clients.
● Stateful servers keep track of client state between requests, storing session
data such as authentication tokens, user preferences, shopping cart contents,
or game state.
● When a client sends a request to a stateful server, the server can use the stored
session data to customize its response based on the client's previous
interactions.
● Stateful servers require additional resources and complexity to manage client
state, including mechanisms for session management, data storage, and
synchronization.
● While stateful servers can offer more personalized and efficient interactions
with clients, they are often less scalable and more prone to issues such as
session timeouts, data inconsistency, and single points of failure.
● Examples of stateful protocols include protocols used in online gaming, where
servers maintain real-time game state and player interactions.
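The contrast between the two styles can be sketched with the shopping cart example mentioned above. The names are hypothetical; the point is only where the cart lives, and what happens to it when the server restarts.

```python
class StatefulCartServer:
    """Keeps each client's cart in server memory between requests."""

    def __init__(self):
        self.sessions = {}

    def add_item(self, session_id, item):
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


def stateless_add_item(cart, item):
    """The request carries the whole cart; the server remembers nothing."""
    return cart + [item]


# Stateful: a restart between requests loses the session data.
server = StatefulCartServer()
server.add_item("session-1", "book")
server = StatefulCartServer()              # simulated crash and restart
stateful_cart = server.add_item("session-1", "pen")

# Stateless: the client holds the state, so any server instance can serve it.
client_cart = []
client_cart = stateless_add_item(client_cart, "book")
client_cart = stateless_add_item(client_cart, "pen")
```

In the stateful case the book is lost after the restart, while in the stateless case the client's copy of the cart survives because it never depended on the server's memory.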
2.7 File service architecture
● A distributed file system (DFS) architecture typically consists of multiple
interconnected components distributed across a network of computers or
storage nodes.
● An architecture that offers a clear separation of the main concerns in providing
access to files is obtained by structuring the file service as three components -
a flat file service, a directory service and a client module.
● Most importantly this architecture enables stateless implementation of the
server modules.
Flat File Service:
● Concerned with implementing operations on the contents of files.
● Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat
file service operations.
● The division of responsibilities between the file service and the directory
service is based on UFIDs. A UFID is a long sequence of bits chosen so that each
file has a UFID that is unique among all files in the distributed system.
● When the flat file service receives a request to create a file, it generates a new
UFID for it and returns the UFID to the requester.
Directory Service:
● It provides a mapping between text names for files and their UFIDs
● Clients obtain the UFID of a file by quoting its text name to the directory service.
● The directory service provides the functions needed to generate directories, to
add new file names to directories and to obtain UFIDs from directories.
● It is a client of flat file service; its directory files are stored in files of the flat file
service.
Client Module:
● It runs in each client computer.
● It integrates and extends the operations of the flat file service and the
directory service under a single application programming interface.
● The client module also holds information about the network locations of the
flat file server and directory server processes.
● Finally, it can play an important role in achieving satisfactory performance
through the implementation of a cache of recently used file blocks at the client.
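How the three components fit together can be sketched in memory. This is a simplified illustration, not the actual service interfaces: the class and method names are hypothetical, UFIDs are plain counters rather than long bit strings, the directory is a dict rather than a file in the flat file service, and there is no network or cache.

```python
import itertools


class FlatFileService:
    """Operates on file contents, addressed only by UFID."""

    def __init__(self):
        self._files = {}
        self._next_ufid = itertools.count(1)

    def create(self):
        ufid = next(self._next_ufid)
        self._files[ufid] = bytearray()
        return ufid

    def write(self, ufid, position, data):
        contents = self._files[ufid]
        end = position + len(data)
        if len(contents) < end:
            contents.extend(b"\0" * (end - len(contents)))
        contents[position:end] = data

    def read(self, ufid, position, n):
        return bytes(self._files[ufid][position:position + n])


class DirectoryService:
    """Maps text names to UFIDs."""

    def __init__(self):
        self._names = {}

    def add_name(self, name, ufid):
        if name in self._names:
            raise FileExistsError(name)
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]


class ClientModule:
    """Combines both services behind one programming interface."""

    def __init__(self, files, directory):
        self.files = files
        self.directory = directory

    def create_file(self, name, data):
        ufid = self.files.create()
        self.files.write(ufid, 0, data)
        self.directory.add_name(name, ufid)

    def read_file(self, name, position, n):
        ufid = self.directory.lookup(name)
        return self.files.read(ufid, position, n)


client = ClientModule(FlatFileService(), DirectoryService())
client.create_file("notes.txt", b"hello world")
tail = client.read_file("notes.txt", 6, 5)
```

Every read names the UFID and the position explicitly, so the flat file service in this sketch keeps no per-client state.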
2.8 SUN Network File System(NFS)
● One of the earliest and quite successful distributed systems was developed by
Sun Microsystems, and is known as the Sun Network File System (or NFS).
● In defining NFS, Sun took an unusual approach: instead of building a
proprietary and closed system, Sun instead developed an open protocol which
simply specified the exact message formats that clients and servers would use
to communicate.
● Many companies sell NFS servers (including Oracle/Sun, NetApp, EMC, IBM, and
others), and the widespread success of NFS is likely attributable to this
“open market” approach.
● It has a client/server architecture consisting of a client program, a server
program, and a protocol for communication between the client and the server.
● It's commonly used in environments where multiple users need to access the
same files or data, such as in UNIX and Linux systems.
● This protocol is mainly implemented on those computing environments where
the centralized management of resources and data is critical.
● It uses the Transmission Control Protocol (TCP) or the User Datagram Protocol
(UDP) as the transport for accessing and delivering data and files.
Objectives
● Machine and Operating System Independence
● Fast Crash Recovery
● Transparent Access
● UNIX semantics should be maintained on client
● “Reasonable” performance
NFS Architecture
Virtual File System:
● NFS provides access transparency: user programs can issue file operations for local or
remote files without distinction.
● This integration is achieved by a virtual file system(VFS) module, which has been
added to the UNIX kernel to distinguish between local and remote files and to
translate between the UNIX-independent file identifiers used by NFS and internal file
identifiers normally used in UNIX and other file systems.
● VFS keeps track of the file systems that are currently available both locally and
remotely, and it passes each request to the appropriate local system module.
Basic design
Three important parts:
• The protocol
• The server side
• The client side
The protocol
● Uses the Sun RPC mechanism and Sun eXternal Data Representation (XDR)
standard.
● Defined as a set of remote procedures.
● Protocol is stateless: each procedure call contains all the information
necessary to complete the call.
● The server does not track anything about what clients are doing; rather, the
protocol is designed to deliver in each protocol request all the information that
is needed in order to complete the request.
Advantages of Statelessness
1. Crash recovery is very easy:
• When a server crashes, the client simply resends the request until it gets an
answer from the rebooted server.
• The client cannot tell the difference between a server that has crashed and
recovered and a slow server.
2. A client can always repeat any request.
Consequences of statelessness
1. Reads and writes must specify their start offset
• Server does not keep track of current position in the file.
• Users still use conventional UNIX reads and writes.
2. Open system call translates into several lookup calls to server.
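The first consequence can be sketched as follows: the server's read takes an explicit offset, and the client side keeps the current position so that applications can still issue sequential reads. The class names and the string file handle are hypothetical and only serve the illustration.

```python
class StatelessReadServer:
    """Each request carries the handle, the offset and the count."""

    def __init__(self, files):
        self.files = files

    def read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]


class ClientOpenFile:
    """Client-side record of an open file, including the read position."""

    def __init__(self, server, handle):
        self.server = server
        self.handle = handle
        self.position = 0

    def read(self, count):
        data = self.server.read(self.handle, self.position, count)
        self.position += len(data)
        return data


stored = {"fh1": b"abcdefgh"}
f = ClientOpenFile(StatelessReadServer(stored), "fh1")
first = f.read(3)
second = f.read(3)

# A restarted server has nothing to recover: the position lives in the client.
f.server = StatelessReadServer(stored)
third = f.read(3)
```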
Server side
● The file identifiers used in NFS are called file handles.
● A file handle is opaque to clients and contains whatever information the server
needs to distinguish an individual file.
File handle: | Filesystem identifier | i-node number of file | i-node generation number |
1. File handle consists of
• Filesystem id: identifying disk partition/file system
• I-node number: identifying file within partition
• Generation number: changed every time i-node is reused to store a new file
2. Server will store
• Filesystem id in filesystem superblock.
• I-node generation number in i-node.
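The role of the generation number can be shown with a toy server. When an i-node is reused for a new file, the generation changes, so a handle that still refers to the old file is rejected instead of silently reading the new one. The classes and the error message below are assumptions for illustration, not the NFS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    generation: int


class ToyServer:
    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self.generations = {}   # i-node number -> current generation
        self.contents = {}      # i-node number -> file data

    def create(self, inode_number, data):
        # Reusing an i-node bumps its generation number.
        generation = self.generations.get(inode_number, 0) + 1
        self.generations[inode_number] = generation
        self.contents[inode_number] = data
        return FileHandle(self.filesystem_id, inode_number, generation)

    def remove(self, handle):
        self._check(handle)
        del self.contents[handle.inode_number]

    def read(self, handle):
        self._check(handle)
        return self.contents[handle.inode_number]

    def _check(self, handle):
        current = self.generations.get(handle.inode_number)
        if (handle.filesystem_id != self.filesystem_id
                or handle.inode_number not in self.contents
                or current != handle.generation):
            raise LookupError("stale file handle")


server = ToyServer(filesystem_id=1)
old_handle = server.create(7, b"old file")
server.remove(old_handle)
new_handle = server.create(7, b"new file")   # i-node 7 is reused

new_data = server.read(new_handle)
try:
    server.read(old_handle)
    old_handle_rejected = False
except LookupError:
    old_handle_rejected = True
```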
Client side
● Provides transparent interface to NFS.
● The NFS client module cooperates with the virtual file system in each client
machine.
● It transfers blocks of files to and from the server and caches the blocks in
local memory.
Mount Protocol
● The "mount" protocol refers to the process by which an NFS client system
connects to an NFS server and makes a shared directory from the server
available locally on the client system.
● Exporting Directories on the Server: First, on the NFS server, one or more
directories need to be designated for sharing with remote clients. This is
typically done by configuring the NFS server to export specific directories using
NFS export configurations (such as /etc/exports file on Unix-based systems).
● Client Configuration: On the NFS client system, administrators configure the
client to mount the exported directories from the server. This involves
specifying the server's hostname or IP address and the path to the shared
directory on the server.
● Mount Command: The administrator or user on the client system then issues
the "mount" command, specifying the NFS protocol and the server's address
along with the path to the shared directory.
● NFS Protocol Communication: The NFS client sends a mount request to the
server over the network using the NFS protocol. The server responds by
providing access to the shared directory.
● Mounting: Once the mount request is accepted, the shared directory from the
NFS server becomes accessible on the client system at the specified local
mount point. From the client's perspective, the remote directory appears as if it
is part of the local filesystem.
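The effect of mounting can be sketched as a table on the client that maps local mount points to a server and an exported directory. The host name, paths and function name below are hypothetical; a real client keeps this information in the kernel and uses file handles rather than path strings.

```python
import posixpath

# Local mount point -> (server, exported directory on that server)
MOUNTS = {
    "/mnt/projects": ("fileserver.example.org", "/export/projects"),
}


def locate(path, mounts):
    """Return (server, remote path) for a mounted path, else ("local", path)."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mounts:
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        if inside and (best is None or len(mount_point) > len(best)):
            best = mount_point   # prefer the longest matching mount point
    if best is None:
        return ("local", path)
    server, exported = mounts[best]
    remainder = path[len(best):].lstrip("/")
    remote = posixpath.join(exported, remainder) if remainder else exported
    return (server, remote)


remote_file = locate("/mnt/projects/report.txt", MOUNTS)
local_file = locate("/home/user/a.txt", MOUNTS)
```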
Challenges for NFS
● Limited Security
● Performance Overhead
● Susceptibility to Network Issues
● Single Point of Failure
● Scalability Challenges
2.9 Introduction to Name Services
● In a distributed system, names are used to refer to a wide variety of
resources such as: Computers, services, remote objects, and files, as well
as users.
● Resources are accessed using an identifier or a reference
○ An identifier can be stored in variables and retrieved from tables quickly.
○ Identifier includes or can be transformed to an address for an object. E.g. NFS
file handle, CORBA remote object reference.
● A name is a human-readable value (usually a string) that
can be resolved to an identifier or address.
○ Internet domain name, file pathname, process number
○ E.g. /etc/passwd, http://www.cdk3.net/
● A name service is a distinct service that is used by client
processes to obtain attributes, such as the addresses of resources
or objects, when given their names.
● You need to name an entity in order to use it.
● If you don’t have a name or don’t know a name you should be able to describe its
characteristics in order to identify it.
● According to these two requirements we have two services:
○ Naming service
○ Directory service
Naming Service
● Given the name of a resource, returns the information about the resource.
● For example consider the white pages(http://www.whitepages.com/ ): given the
name of a person you get the address/telephone number of that person.
● Another example: an LDAP (Lightweight Directory Access Protocol) lookup of a
person on UB (University of Buffalo, http://ldap.buffalo.edu/) computers gives you
information about the person’s email, campus address, phone number, position
held, etc.
Directory Services
● Given a description, find a service or resource that matches the description.
● For example consider the yellow pages( http://www.yellowpages.com/ ): when you
want to rent a car, it may give a list of car rental agencies.
● A more powerful service than naming: you look up names using their attributes
rather than the other way around.
● Clients can look up services by providing their attributes rather than their names.
● A discovery service provides registry and lookup for spontaneous networking.
● Registry is used by server to publish a service and lookup is used by a client to locate
a service.
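The difference between the two kinds of lookup can be sketched over a small in-memory registry. The entries, addresses and function names are invented for illustration.

```python
SERVICES = [
    {"name": "laser-1", "type": "printer", "color": False, "address": "10.0.0.21"},
    {"name": "inkjet-2", "type": "printer", "color": True, "address": "10.0.0.22"},
    {"name": "scan-1", "type": "scanner", "color": True, "address": "10.0.0.30"},
]


def lookup_by_name(name):
    """Naming service: name in, attributes out (white pages)."""
    for entry in SERVICES:
        if entry["name"] == name:
            return entry
    raise KeyError(name)


def search_by_attributes(**wanted):
    """Directory service: description in, matching names out (yellow pages)."""
    return [
        entry["name"]
        for entry in SERVICES
        if all(entry.get(key) == value for key, value in wanted.items())
    ]


address = lookup_by_name("laser-1")["address"]
color_printers = search_by_attributes(type="printer", color=True)
color_devices = search_by_attributes(color=True)
```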
Domain Name System
● Hierarchical Distributed Database.
● DNS is the foundation of the Internet naming scheme.
● DNS supports accessing resources by using alphanumeric names.
● InterNIC (Internet Network Information Center) was historically responsible for
managing the domain namespace; today ICANN coordinates it.
● DNS was created to support the Internet’s growing number of hosts.
Name Space
● A namespace refers to the hierarchical structure used to organize
domain names into a logical and navigable system.
● A namespace is a collection of all valid names recognized by a particular
service.
● Allow simple but meaningful names to be used.
● Potentially infinite number of names.
● Structured
○ to allow similar subnames without clashes/conflict.
○ to group related names
● Allow re-structuring of name trees
○ for some types of change, old programs should continue to work
● Management of trust.
● /etc/passwd is a hierarchic name with two components.
● The first, ‘etc’, is resolved relative to the context ‘/’, or root,
and the second part, ‘passwd’, is relative to the context
‘/etc’.
● The name /oldetc/passwd can have a different meaning
because its second component is resolved in a different
context.
● Similarly, the same name /etc/passwd may resolve to different
files in the contexts of two different computers.
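Resolving a hierarchic name one component at a time can be sketched with nested dictionaries standing in for naming contexts. The two "machines" and their contents are made up for the example.

```python
def resolve(root, path):
    """Resolve each component of the path relative to the current context."""
    context = root
    for component in path.strip("/").split("/"):
        context = context[component]   # the result becomes the next context
    return context


machine_a = {
    "etc": {"passwd": "passwd file on machine A"},
    "oldetc": {"passwd": "old passwd file on machine A"},
}
machine_b = {
    "etc": {"passwd": "passwd file on machine B"},
}

current = resolve(machine_a, "/etc/passwd")
old = resolve(machine_a, "/oldetc/passwd")
other_machine = resolve(machine_b, "/etc/passwd")
```

The component 'passwd' yields different results depending on the context it is resolved in, and the same full name yields different files on the two machines.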
Domain Name Space
Standard For DNS Naming
● The following characters are valid for DNS names:
○ A through Z
○ a through z
○ 0 through 9
○ Hyphen (-)
● The underscore (_) is a reserved character.
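The character rules above can be checked with a short sketch. It also applies two rules that are not stated on this slide but are part of the usual host name conventions: a label is 1 to 63 characters long and does not begin or end with a hyphen, and a full name is at most 253 characters. The function name is hypothetical.

```python
import re

# One label: letters and digits, with hyphens allowed only in the middle.
LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def is_valid_hostname(name):
    """Check each dot-separated label against the character rules."""
    if name.endswith("."):
        name = name[:-1]          # a trailing dot denotes the root
    if not name or len(name) > 253:
        return False
    return all(LABEL.match(label) for label in name.split("."))


ok = is_valid_hostname("www.example.com")
with_underscore = is_valid_hostname("my_host.example.com")
leading_hyphen = is_valid_hostname("-bad.example.com")
```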
DNS Resolution
● Resolution is an iterative process whereby a name is repeatedly presented
to the naming contexts.
● DNS resolution is the process of converting a domain name (such as
www.example.com) into the corresponding IP address (such as
192.0.2.1).
● The name is first presented to some initial naming context; resolution
iterates as long as further context and derived names are output.
● Example 1: /etc/passwd in which ‘etc’ is presented to context / and
‘passwd’ is presented to context /etc.
● Example 2: www.dcs.qmw.ac.uk in which the alias is resolved to another
domain name such as copper.dcs.qmw.ac.uk which is further resolved to
produce an IP address.
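Example 2 can be sketched as a loop over a toy record table: an alias (CNAME) yields a derived name that is presented again, and an address (A) record ends the resolution. The table is invented, and 192.0.2.1 is only a placeholder address; no real DNS query is made.

```python
RECORDS = {
    ("www.dcs.qmw.ac.uk", "CNAME"): "copper.dcs.qmw.ac.uk",
    ("copper.dcs.qmw.ac.uk", "A"): "192.0.2.1",
}


def resolve(name, max_steps=8):
    """Follow aliases until an address record is found."""
    for _ in range(max_steps):
        if (name, "A") in RECORDS:
            return RECORDS[(name, "A")]
        if (name, "CNAME") in RECORDS:
            name = RECORDS[(name, "CNAME")]   # derived name, resolve again
            continue
        raise LookupError(name)
    raise LookupError("alias chain too long")


address = resolve("www.dcs.qmw.ac.uk")
```

The step limit guards against alias loops, which would otherwise make the iteration run forever.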
What Are the Components of a DNS Solution?
DNS – How it works?
DNS – How it works (mechanism)
What Is a DNS Query?
● A query is a request for name resolution and is directed to a
DNS server.
● Queries are recursive or iterative.
● DNS clients and DNS servers both initiate queries
● A DNS server is either authoritative for a namespace (it holds complete and
up-to-date information about the domain) or non-authoritative.
What is a resource record?
● A domain contains resource records.
● Resource records are analogous to files.
● They are classified into types.
● Some of the important types are SOA, NS, A, CNAME and MX.
● They are normally defined in “zone files”.
Types of resource record
2.12 Comparison of Different DFS
End of Chapter Two