Distributed File Systems

The document provides an overview of Distributed File Systems (DFS), explaining their client/server architecture, mechanisms for data sharing, and key components. It discusses specific implementations like Sun NFS and Andrew File System (AFS), highlighting their features, advantages, and differences, particularly in caching strategies and scalability. Additionally, it addresses challenges such as cache consistency and update policies, along with the importance of replication and access control.

Distributed File Systems (DFS)
Group 6 Presentation
Introduction

 A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer.
 When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server.
Mechanisms for data sharing

 Web servers
 P2P file sharing
 Distributed storage systems
-Distributed file systems
-Distributed object systems
Components in a DFS Implementation
 Client side:
 What has to happen to enable applications to access a remote file in the same way as a local file
 Communication layer:
 Plain TCP/IP, or a protocol at a higher level of abstraction
 Server side:
 How the server services requests from clients
Goals of distributed file service
To enable programs to store and access remote files exactly as they
do local ones
 Data sharing among multiple users
 Transparency: access, location, mobility, performance, scaling
 Backups and centralized management
Distributed file system requirements
• File replication: a file may be represented by several copies of its contents at different locations
• Fault tolerance: the service continues to operate in the face of client and server failures
• Concurrent file updates: changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file
• Heterogeneity: different hardware and operating systems are supported, using middleware
• Consistency: when files are replicated or cached at several sites, there is an inevitable delay in propagating modifications to all of them, so copies can temporarily disagree
File service architecture
Sun NFS (Network File System)
 Introduced in 1985
 An important goal of NFS is to achieve a high level of support for hardware and operating system heterogeneity
 The first file service designed as a product
 RFC 1813: NFS protocol version 3
 Each computer can act as both a client and a server
 Industry standard for local networks since the 1980s
 OS independent (originally a UNIX implementation)
 Uses RPC over UDP or TCP
Access control and authentication
 The NFS server is a stateless server, so the user's identity and access rights must be checked by the server on each request.
 In a local UNIX file system, by contrast, access rights are checked against the file’s access permission attributes only when the file is opened.
 Every client request is accompanied by the userID and groupID
 Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution
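The per-request check described above can be sketched as follows. This is only an illustration, not the NFS implementation: the `Request` and `FileAttrs` types, the `check_access` function and the simplified owner/group/other permission model are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAttrs:
    owner_uid: int
    group_gid: int
    mode: int  # UNIX-style permission bits, e.g. 0o640


@dataclass(frozen=True)
class Request:
    uid: int   # userID sent with the request
    gid: int   # groupID sent with the request
    op: str    # "read" or "write"


def check_access(req: Request, attrs: FileAttrs) -> bool:
    """Decide a single request using only the credentials it carries.

    The server keeps no per-client state, so this runs on every request.
    """
    bit = {"read": 4, "write": 2}[req.op]
    if req.uid == attrs.owner_uid:
        shift = 6          # owner permission bits
    elif req.gid == attrs.group_gid:
        shift = 3          # group permission bits
    else:
        shift = 0          # "other" permission bits
    return bool((attrs.mode >> shift) & bit)


attrs = FileAttrs(owner_uid=1000, group_gid=50, mode=0o640)
owner_write = check_access(Request(1000, 50, "write"), attrs)
group_write = check_access(Request(1001, 50, "write"), attrs)
group_read = check_access(Request(1001, 50, "read"), attrs)
other_read = check_access(Request(2000, 99, "read"), attrs)
```

Because the IDs arrive with the request and are simply trusted in this sketch, a client could forge them, which is the weakness that stronger authentication such as Kerberos is meant to address.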
NFS architecture
Summary for NFS
 Access transparency: same system calls for local and remote files
 Clients cache individual blocks of data rather than whole files
 Scalability: the service can cope with an increase in the number of nodes without disruption; scalability also covers the ability to withstand high service load and to accommodate growth in users and the integration of resources
 File replication: read-only replication only; no support for replicating files that are updated
 Security: strengthened by the integration of Kerberos
 An excellent example of a simple, robust, high-performance distributed service
 Mobility transparency: not achieved, since the mount table needs to be updated on each client
Andrew File System (AFS)
 Developed at CMU for use as a campus computing and information system
 The design of AFS reflects an intention to support information sharing on a large scale by minimizing client-server communication
 This is achieved by transferring whole files between server and client computers and caching them at clients until the server receives a more up-to-date version
 Goal: provide transparent access to remote shared files
Andrew file system architecture
Andrew file system architecture (simplified)
Two unusual design characteristics
• Whole-file serving: the entire contents of directories and files are transmitted to client computers
• Whole-file caching: clients permanently cache a copy of a file or a chunk on their local disks
Scenario of AFS
 Open a new shared remote file
– A user process issues open() for a file that is not in the local cache
– The client then sends a request for the file to the server
– The server returns the requested file
– The copy is stored in the client’s local UNIX file system and the resulting UNIX file descriptor is returned to the client
• Subsequent read, write and other operations on the file are applied to the local copy
Best for university setup
Scenario of AFS continued…
 When the process in the client issues close()
– If the local copy has been updated, its contents are sent back to the server
– The server updates the contents and the timestamps on the file
– The copy on the client’s local disk is retained
Characteristics

• Good for shared files likely to remain valid for long periods
– files that are infrequently updated
– files normally accessed by only a single user
– these classes account for the overwhelming majority of file accesses
• The local cache can be allocated a substantial proportion of the disk space
– it should be large enough for the working set of files used by one user
Characteristics continued…
 Assumptions about average and maximum file size and reference locality
– Files are small; most are less than 10 KB in size
– Read operations are much more common than writes
– Sequential access is much more common than random access
– Most files are written by only one user. When a file is shared, it is usually only one user who modifies it
– Files are referenced in bursts. A file referenced recently is very likely to be referenced again soon
• Not a good fit for databases, which are typically shared by many users and updated frequently
Callback mechanism
Restart of workstation after failure
 If clients A and B both have a file open, a notification (callback) is sent to client B once A has edited and closed the file
 After a restart, the workstation retains as many locally cached files as possible, but callbacks may have been missed while it was down
 Venus therefore sends a cache validation request to the Vice server
– the request contains the file modification timestamp
– if the timestamp is current, the server sends valid and the callback promise is reinstated as valid
– if the timestamp is not current, the server sends cancelled
Callback mechanism continued…
Communication link failures
 The callback must be renewed with the above protocol before a new open if a time T has elapsed since the file was cached or the callback promise was last validated
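The callback and revalidation behaviour described above can be sketched as a toy model. The class and method names (`ViceServer`, `VenusClient`, `close_modified`, `restart`) and the integer timestamps are assumptions for illustration; the renewal after time T is left out to keep the example short.

```python
class ViceServer:
    def __init__(self):
        self.mtime = {}      # name -> modification timestamp of the master copy
        self.promises = {}   # name -> set of clients holding a callback promise

    def fetch(self, name, client):
        # Supplying a copy also records a callback promise for that client.
        self.promises.setdefault(name, set()).add(client)
        return self.mtime[name]

    def store(self, name, writer, new_mtime):
        self.mtime[name] = new_mtime
        # Notify every other client that holds a promise for this file.
        for other in self.promises.get(name, set()) - {writer}:
            other.callback(name)
        self.promises[name] = {writer}

    def validate(self, name, cached_mtime, client):
        if self.mtime.get(name) == cached_mtime:
            self.promises.setdefault(name, set()).add(client)
            return "valid"
        return "cancelled"


class VenusClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # name -> [cached timestamp, promise state]

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or entry[1] != "valid":
            self.cache[name] = [self.server.fetch(name, self), "valid"]

    def close_modified(self, name, new_mtime):
        self.server.store(name, self, new_mtime)
        self.cache[name] = [new_mtime, "valid"]

    def callback(self, name):
        if name in self.cache:
            self.cache[name][1] = "cancelled"

    def restart(self):
        # Callbacks may have been missed, so revalidate every cached file.
        for name, entry in self.cache.items():
            entry[1] = self.server.validate(name, entry[0], self)


server = ViceServer()
server.mtime["f"] = 1
a, b = VenusClient(server), VenusClient(server)
a.open("f")
b.open("f")

a.close_modified("f", 2)                  # A edits and closes: B gets a callback
b_state_after_callback = b.cache["f"][1]

b.open("f")                               # B refetches and gets a new promise
a.restart()
a_state_after_restart = a.cache["f"][1]

server.mtime["f"] = 3                     # an update whose callback B missed
b.restart()
b_state_after_restart = b.cache["f"][1]
```

In this model the server contacts a client only when a file it holds has changed, and a restarted client trusts nothing in its cache until the server has confirmed the timestamp.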
Scalability
 The AFS callback mechanism scales well with an increasing number of users
- communication is needed only when a file has been updated
- in the NFS timestamp approach, communication is needed for each open
 Since the majority of files are not accessed concurrently, and reads are more frequent than writes, the callback mechanism performs better
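A rough way to see the difference is to count client-server messages for one client and one file. The sketch below follows the simplification stated above (a timestamp check on every open for NFS, communication only around updates for AFS); the `count_messages` function and the event trace are made up for illustration, and real systems add details that this model ignores.

```python
def count_messages(trace):
    """Count messages for one client and one file.

    trace is a list of "open" events (by this client) and "update" events
    (the file is changed by some other client).
    """
    nfs = 0               # timestamp approach: one check per open
    afs = 0               # callback approach
    cached_valid = False  # does the AFS client hold a valid cached copy?
    for event in trace:
        if event == "open":
            nfs += 1
            if not cached_valid:
                afs += 1              # fetch the file and get a callback promise
                cached_valid = True
        elif event == "update":
            if cached_valid:
                afs += 1              # server sends a callback to this client
                cached_valid = False
    return nfs, afs


# 100 opens with a single update by another client in the middle.
trace = ["open"] * 50 + ["update"] + ["open"] * 50
nfs_msgs, afs_msgs = count_messages(trace)
```

With this read-mostly trace the timestamp approach needs 100 exchanges and the callback approach needs 3, which is the effect the slide attributes to callbacks.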
Cache Consistency problem
 Files are identified with one master copy residing at the server machine, but
copies (or parts) of the file are scattered in different caches
 When a cached copy is modified, the changes need to be reflected on the
master copy to preserve the relevant consistency semantics.
 The problem of keeping the cached copies consistent with the master file is
the cache-consistency problem
 A client machine is sometimes faced with the problem of deciding whether a
locally cached copy of data is consistent with the master copy (and hence can
be used).
 If the client machine determines that its cached data are out of date, it must
cache an up-to-date copy of the data before allowing further accesses
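One common way for a client to make this decision is to compare modification timestamps with the server, and to skip the comparison if the copy was validated recently. The sketch below assumes that approach; the `cached_copy_usable` function, its parameters and the numeric values are hypothetical.

```python
def cached_copy_usable(now, validated_at, freshness, cached_mtime, fetch_server_mtime):
    """Return True if the cached copy may be used without refetching.

    now, validated_at and freshness are in the same time unit.
    fetch_server_mtime is a callable that asks the server for the
    master copy's modification time.
    """
    if now - validated_at < freshness:
        return True                              # validated recently: assume consistent
    return cached_mtime == fetch_server_mtime()  # otherwise compare with the master copy


server_calls = []


def fetch_server_mtime():
    server_calls.append(1)   # record that the server was contacted
    return 42                # made-up modification time of the master copy


# Validated 2 time units ago with a freshness interval of 3: no server contact,
# even though the cached timestamp (40) is in fact out of date.
recent = cached_copy_usable(102, 100, 3, 40, fetch_server_mtime)
calls_after_recent = len(server_calls)

# Interval expired: the server is asked, and the timestamps decide.
unchanged = cached_copy_usable(110, 100, 3, 42, fetch_server_mtime)
stale = cached_copy_usable(110, 100, 3, 40, fetch_server_mtime)
```

The first call shows the trade-off: a longer freshness interval saves messages but widens the window in which stale data can be used.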
Cache-Update Policy
 The policy used to write modified data blocks back to the server’s
master copy has a critical effect on the system’s performance and
reliability.
 The simplest policy is to write data through to disk as soon as they are
placed in any cache.
 The advantage of a write-through policy is reliability: little information is
lost when a client system crashes.
 However, this policy requires each write access to wait until the
information is sent to the server, so it causes poor write performance.
 Caching with write-through is equivalent to using remote service for
write accesses and exploiting caching only for read accesses
Cache-Update Policy continued…
 An alternative is the delayed-write policy, also known as write-back caching,
where we delay updates to the master copy.
 Modifications are written to the cache and then are written through to the
server at a later time.
 This policy has two advantages over write-through.
 Firstly, because writes are made to the cache, write accesses complete
much more quickly.
 Secondly, data may be overwritten before they are written back, in which
case only the last update needs to be written at all.
 Unfortunately, delayed-write schemes introduce reliability problems,
since unwritten data are lost whenever a user machine crashes.
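The two policies can be contrasted with a small sketch. The class names (`ServerDisk`, `WriteThroughCache`, `WriteBackCache`) and the `flush` and `crash` methods are assumptions for illustration, with dictionaries standing in for the cache and the server's disk.

```python
class ServerDisk:
    """Master copy; counts how many writes reach the server."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, key, value):
        self.blocks[key] = value
        self.writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.server.write(key, value)   # every write waits for the server


class WriteBackCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()              # blocks not yet written to the server

    def write(self, key, value):
        self.cache[key] = value         # completes locally
        self.dirty.add(key)

    def flush(self):
        for key in sorted(self.dirty):  # only the latest value of each block is sent
            self.server.write(key, self.cache[key])
        self.dirty.clear()

    def crash(self):
        lost = set(self.dirty)          # unwritten data disappears with the cache
        self.cache.clear()
        self.dirty.clear()
        return lost


wt_server = ServerDisk()
wt = WriteThroughCache(wt_server)
for value in ("v1", "v2", "v3"):
    wt.write(0, value)

wb_server = ServerDisk()
wb = WriteBackCache(wb_server)
for value in ("v1", "v2", "v3"):
    wb.write(0, value)
writes_before_flush = wb_server.writes
wb.flush()

wb.write(1, "unsaved")
lost_blocks = wb.crash()
```

Three overwrites of the same block cost three server writes under write-through and one under delayed write, while the crash at the end loses the block that was never flushed.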
File Update Semantics

 To ensure strict one-copy update semantics, a modification to a cached file must be propagated to every other client caching the file before any client can access it.
Sun NFS vs AFS

 Both use remote procedure calls (RPC)
 The differences between them are primarily attributable to AFS treating scalability as its main design goal
 AFS caches whole files at client nodes, whereas NFS caches only a block of data
 NFS offers read-only replication only, with no support for replicating files that are updated
 AFS caches are permanent and survive reboots
Questions and answers

 What is replication?
 Why do we need replication?
 What is the high-concurrency concept?
 What is RPC?
