Reece McMillin

Distributed File Systems

2022-03-24

Files and File Systems

A file system is a basic service that any operating system provides.

A file contains both data and attributes.

Basic file operations:

  • open(), create()
  • read()
  • write()
  • close()
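On a UNIX-like system these basic operations correspond to system calls. Here is a minimal sketch using Python's `os` module, which wraps those calls. The file name `example.txt` is arbitrary.

```python
import os
import tempfile

# A scratch directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)  # create()/open()
os.write(fd, b"hello, file system")                                # write()
os.close(fd)                                                       # close()

fd = os.open(path, os.O_RDONLY)  # open() for reading
data = os.read(fd, 1024)         # read() up to 1024 bytes
os.close(fd)                     # close()

# A file also has attributes besides its data; os.stat returns them.
attrs = os.stat(path)
print(data, attrs.st_size)  # b'hello, file system' 18
```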

File System Operations

  • Directory Service
    • Organize files in a directory structure
    • Files have human-readable names
  • Storage of files onto hard disk
    • File consists of blocks (usually 4KB)

Distributed File System

Distributed File Systems (DFS) provide similar functionality across an intranet (LAN).

  • goal: provide access to files stored at a server with performance and reliability similar to those of files stored on local disks.

Requirements

  • Transparency
    • access, location, mobility, scale
  • Concurrency
  • Replication
    • share loads and enhance fault tolerance
  • Heterogeneity
  • Fault Tolerance
  • Consistency
    • one-copy update semantics
  • Security
  • Efficiency

Issues

  • file & directory naming: locate the file!
  • semantics: client/server operations, file-sharing
  • performance and scalability
    • fault tolerance
    • deal with remote server failures
  • implementation considerations
    • caching
    • replication
    • update protocols

Naming

  • Explicit naming
    • machine + path (/machine/path)
    • one namespace but not transparent
  • Implicit naming
    • location transparency
      • the file name doesn't include the name of the server where the file is stored
    • mounting remote filesystems onto local file hierarchy
      • view of the filesystem may be different on each computer
    • full naming transparency
      • a single namespace that looks the same on all machines
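Here is a minimal sketch of implicit naming through mounting. It assumes a hypothetical per-client mount table, and the server names `serverA`/`serverB` and their export paths are invented for illustration. A local path is resolved against the longest mount point that prefixes it.

```python
# Hypothetical mount table for one client: local mount point -> (server, exported dir).
MOUNTS = {
    "/usr/students": ("serverA", "/export/people"),
    "/usr/staff": ("serverB", "/nfs/users"),
}


def resolve(path: str):
    """Map a local path to (server, remote path); server is None for local files.

    The longest matching mount point wins, so the program sees one hierarchy
    even though the files live on several hosts.
    """
    best = None
    for mount_point in MOUNTS:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return (None, path)  # not under any mount point: a local file
    server, export = MOUNTS[best]
    return (server, export + path[len(best):])


print(resolve("/usr/students/big/notes.txt"))  # ('serverA', '/export/people/big/notes.txt')
print(resolve("/usr/x"))                        # (None, '/usr/x')
```

Each client keeps its own table, so two machines can mount the same export at different points. This is why the view of the file system may differ from one computer to another.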

Semantics - Operational

  • fault tolerance
    • at-most-once semantics for file operations
    • at-least-once semantics with a server protocol designed in terms of idempotent file operations
    • replication (stateless, so servers can be restarted after failure)
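Under at-least-once semantics, a client keeps retransmitting a request until it gets a reply, so the server may execute the same request more than once. The sketch below shows why this is only safe for idempotent operations. It assumes a hypothetical `LossyLink` transport that loses the first two replies after the server has already acted.

```python
class LossyLink:
    """Hypothetical transport: the server executes every request, but the
    first `lost_replies` replies never reach the client."""

    def __init__(self, lost_replies: int):
        self.lost_replies = lost_replies

    def call(self, op):
        result = op()  # the server has already performed the operation
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("reply lost")
        return result


def at_least_once(link: LossyLink, op, max_tries: int = 10):
    """Resend the request until a reply arrives, so op may run several times."""
    for _ in range(max_tries):
        try:
            return link.call(op)
        except TimeoutError:
            continue  # no reply: retransmit the same request
    raise TimeoutError("no reply after retries")


data = bytearray(8)  # a file's contents on the server
log = []             # an append-only record on the server


def write_at_offset():  # idempotent, like write(FileID, i, Data)
    data[0:4] = b"ABCD"


def append():  # not idempotent: each execution adds another record
    log.append(b"ABCD")


at_least_once(LossyLink(lost_replies=2), write_at_offset)
at_least_once(LossyLink(lost_replies=2), append)
print(bytes(data[:4]), len(log))  # b'ABCD' 3
```

Both requests ran three times. The positional write ends in the same state as a single execution, while the append left three records instead of one.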

Semantics - File Sharing

  • one-copy semantics
    • updates are written to the single copy and are available immediately
    • all clients see contents of file identically as if only one copy of file exists
    • if caching is used: after an update operation, no program can observe a discrepancy between data in cache and stored data
  • serializability
    • transaction semantics (file locking protocols implemented - share for read, exclusive for write)
    • session semantics
      • copy file on open, work on local copy and copy back on close
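Session semantics can be sketched with an in-memory server. The `Server` and `Session` classes are hypothetical and exist only for illustration. Updates stay in the client's private copy until `close()`, so concurrent sessions do not see each other's changes, and the last session to close overwrites earlier ones.

```python
class Server:
    """Hypothetical server holding the master copy of each file."""

    def __init__(self):
        self.files = {}


class Session:
    """Session semantics: copy the file on open, work on the local copy,
    and copy it back to the server on close."""

    def __init__(self, server: Server, name: str):
        self.server = server
        self.name = name
        self.local = bytearray(server.files.get(name, b""))  # copy on open

    def read(self) -> bytes:
        return bytes(self.local)

    def write(self, data: bytes) -> None:
        self.local += data  # only the local copy changes

    def close(self) -> None:
        self.server.files[self.name] = bytes(self.local)  # copy back on close


srv = Server()
srv.files["f"] = b"v1"

client_a = Session(srv, "f")
client_b = Session(srv, "f")
client_a.write(b"+a")
seen_by_b = client_b.read()  # b'v1': A has not closed yet
client_a.close()             # server now holds b'v1+a'
client_b.write(b"+b")
client_b.close()             # last close wins: server holds b'v1+b'
print(seen_by_b, srv.files["f"])  # b'v1' b'v1+b'
```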

File Service Architecture

Flat File Service

  • implements operations on the contents of files
  • uses Unique File Identifiers (UFIDs) to refer to files
  • a new UFID is generated whenever a file creation is requested

Directory Service

  • mapping between text names for files and their UFIDs
  • clients obtain a file's UFID by quoting its text name to the directory service (lookup)
  • provides functions to
    • create directories
    • add new file names to directories
    • obtain UFIDs from directories
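Here is a minimal sketch of the directory service sitting on top of the flat file service. It assumes hypothetical in-memory classes, and UFIDs here are simply sequential integers. The notes require UFIDs to identify files uniquely, which a real service must guarantee across all of its servers.

```python
import itertools


class FlatFileService:
    """Hypothetical flat file service: files are known only by their UFIDs."""

    def __init__(self):
        self._next_ufid = itertools.count(1)
        self.files = {}

    def create(self) -> int:
        ufid = next(self._next_ufid)  # a fresh UFID for every creation request
        self.files[ufid] = bytearray()
        return ufid


class DirectoryService:
    """Hypothetical directory service mapping text names to UFIDs.
    Each directory gets its own UFID from the flat file service, although its
    entries are kept in a dict here for brevity."""

    def __init__(self, ffs: FlatFileService):
        self.ffs = ffs
        self.entries = {}  # directory UFID -> {text name: UFID}

    def make_dir(self) -> int:
        dir_ufid = self.ffs.create()
        self.entries[dir_ufid] = {}
        return dir_ufid

    def add_name(self, dir_ufid: int, name: str, ufid: int) -> None:
        if name in self.entries[dir_ufid]:
            raise FileExistsError(name)
        self.entries[dir_ufid][name] = ufid

    def lookup(self, dir_ufid: int, name: str) -> int:
        return self.entries[dir_ufid][name]  # raises KeyError if the name is absent


ffs = FlatFileService()
ds = DirectoryService(ffs)
root = ds.make_dir()
notes = ffs.create()
ds.add_name(root, "notes.txt", notes)
print(ds.lookup(root, "notes.txt") == notes)  # True
```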

Client Module

  • runs on client machines
  • provides APIs that let user-level programs access the file and directory services
  • holds information about the network locations of the file and directory server processes
  • caches file blocks for improved performance

Network File System (NFS)

Operations

  • Flat File
    • read(FileID, i, n) -> Data
    • write(FileID, i, Data)
    • create() -> FileID
    • delete(FileID)
    • GetAttributes(FileID) -> Attr
    • SetAttributes(FileID, Attr)

This API is very similar to UNIX's file system operations, with some notable differences:

  • no open or close functions
  • all operations except create are idempotent (to allow at-least-once RPC semantics)
  • the operations can be implemented by a stateless server
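The flat-file operations above can be sketched as an in-memory server. The class and method names are hypothetical: `get_attributes`/`set_attributes` stand in for `GetAttributes`/`SetAttributes`. Each call carries the file ID and offset, so no open-file state is kept between requests. Repeating a `write` or `delete` leaves the same result.

```python
import itertools


class StatelessFlatFileServer:
    """Hypothetical in-memory flat file service. Every request names the file
    and the offset, so no per-client state (open files, file pointers) is kept."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._data = {}
        self._attrs = {}

    def create(self) -> int:
        # Not idempotent: every call creates a new, distinct file.
        fid = next(self._ids)
        self._data[fid] = bytearray()
        self._attrs[fid] = {"length": 0}
        return fid

    def read(self, fid: int, i: int, n: int) -> bytes:
        # Up to n bytes starting at offset i.
        return bytes(self._data[fid][i:i + n])

    def write(self, fid: int, i: int, data: bytes) -> None:
        # Writing the same data at the same offset twice has the same effect as once.
        buf = self._data[fid]
        if len(buf) < i:
            buf.extend(bytes(i - len(buf)))  # fill any gap with zero bytes
        buf[i:i + len(data)] = data
        self._attrs[fid]["length"] = len(buf)

    def delete(self, fid: int) -> None:
        # Deleting an already-deleted file is a no-op, so retries are harmless.
        self._data.pop(fid, None)
        self._attrs.pop(fid, None)

    def get_attributes(self, fid: int) -> dict:
        return dict(self._attrs[fid])

    def set_attributes(self, fid: int, attr: dict) -> None:
        # Stores the given values as-is; this sketch does not validate them.
        self._attrs[fid].update(attr)


srv = StatelessFlatFileServer()
fid = srv.create()
srv.write(fid, 0, b"hello")
srv.write(fid, 0, b"hello")  # a retransmitted write leaves the file unchanged
print(srv.read(fid, 0, 5), srv.get_attributes(fid))  # b'hello' {'length': 5}
srv.delete(fid)
srv.delete(fid)  # a retransmitted delete is harmless
```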

Modules

  • the server module resides in the kernel of each computer that acts as an NFS server
  • the client module resides on each client computer that accesses the files
  • requests referring to a remote file system (RFS) are translated by the client module into NFS protocol operations and passed to the server module
    • communication mostly happens via RPC

NFS provides access transparency

  • user programs issue file operations on local/remote files without distinction

The virtual file system (VFS) layer lets remote files appear as local files by means of file handles. File handles have three parts:

  • filesystem identifier
    • unique ID given to entire file system
  • i-node number
    • unique number to identify and locate a file within the file system
  • i-node generation number
    • in UNIX, i-node numbers are reused after a file is deleted; the generation number is incremented on each reuse, so a handle that still refers to the old file can be detected
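The role of the generation number can be sketched as follows, assuming a hypothetical server-side `InodeTable`. Reusing an i-node number bumps its generation. A handle that a client still holds for the deleted file then no longer matches and can be rejected as stale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fs_id: int       # filesystem identifier
    inode: int       # i-node number within that filesystem
    generation: int  # i-node generation number


class InodeTable:
    """Hypothetical server-side i-node bookkeeping for one filesystem."""

    def __init__(self, fs_id: int):
        self.fs_id = fs_id
        self.generation = {}  # i-node number -> current generation
        self.in_use = set()

    def allocate(self, inode: int) -> FileHandle:
        gen = self.generation.get(inode, 0) + 1  # bump on every (re)use
        self.generation[inode] = gen
        self.in_use.add(inode)
        return FileHandle(self.fs_id, inode, gen)

    def free(self, inode: int) -> None:
        self.in_use.discard(inode)

    def is_stale(self, fh: FileHandle) -> bool:
        return (fh.fs_id != self.fs_id
                or fh.inode not in self.in_use
                or self.generation[fh.inode] != fh.generation)


table = InodeTable(fs_id=7)
old = table.allocate(42)  # a client obtains a handle for this file
table.free(42)            # the file is deleted
new = table.allocate(42)  # i-node 42 is reused for a different file
print(table.is_stale(old), table.is_stale(new))  # True False
```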

The VFS layer keeps one v-node per open file (analogous to an i-node in UNIX). For a local file, the v-node refers to an i-node; for a remote file, it holds the file handle.

Client Authentication

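Unlike a conventional UNIX file system, the NFS server is stateless (it does not keep files "open" on behalf of clients).

  • every request is independent of the others, so the server must check the client's identity against the file's access permissions on every request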

Server Interface

  • create(dirfh, name, attr) -> newfh, attr

Mount Service

Makes a remote file system available to a local client by specifying the remote host name and the pathname of the directory.

  • Local: /usr/x
  • Remote: /usr/students/big
  • Remote: /usr/staff/jane
fname = "/usr/x"
fp = open(fname, "r")  # the same call works whether /usr/x is local or remote
data = fp.read(1024)

Mount Procedure

  • NFS Server runs a separate (user-level) mount service process
    • in the server, there is a file with a well-known name (/etc/exports) containing the names of local filesystems that are available for remote mounting
    • an access list may indicate which hosts can mount
  • NFS clients use the mount command to request mounting
    • by specifying the remote host's name, the pathname of a directory in the remote filesystem and the local name with which it is to be mounted
  • the mount command uses an RPC-based mount protocol
    • takes a directory pathname and returns a file handle
    • the location (IP/port) of the server and the file handle for the remote directory are passed on to the VFS layer and the NFS client
  • Two types of mounting:
    • hard mounted
      • user program is suspended until operation completes
      • if the server fails, waits until server is back (as though nothing happened)
    • soft mounted
      • an error is returned after some small number of retries
      • the client program then needs to do necessary recovery/reporting action
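The difference between hard and soft mounts comes down to the client's retry loop. The sketch below assumes a hypothetical `send` callable that performs one RPC attempt and raises `TimeoutError` when the server does not answer. The retry count and the error raised by the soft mount are illustrative choices, not NFS defaults.

```python
import time


def nfs_call(send, soft: bool, retries: int = 3, wait_s: float = 0.0):
    """Perform one operation on a hard- or soft-mounted file system.

    `send` is a hypothetical callable making a single RPC attempt; it raises
    TimeoutError when the server does not answer."""
    failures = 0
    while True:
        try:
            return send()
        except TimeoutError:
            failures += 1
            if soft and failures > retries:
                # Soft mount: give up and let the program recover or report.
                raise OSError("NFS server not responding")
            # Otherwise retry; a hard mount never stops retrying.
            time.sleep(wait_s)


def make_server(down_for: int):
    """Hypothetical server that ignores the first `down_for` requests."""
    remaining = [down_for]

    def send():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TimeoutError
        return "ok"

    return send


print(nfs_call(make_server(5), soft=False))  # ok: the hard mount waited it out
try:
    nfs_call(make_server(5), soft=True, retries=3)
except OSError as err:
    print("soft mount error:", err)
```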

Caching

Caching is a common operating-system technique for improving performance.

  • file content is stored on disk
  • if each request needs to go to disk directly, that would be too slow!
  • instead, OS usually "caches" content (blocks) read from the disk into memory

In NFS, both client and server can cache blocks.

  • read /remote/file1 1-1000
  • read /remote/file1 1001-2000

The server may want to cache a requested block in case other clients want it. A client may want to cache a requested block in case it expects to read from it many times.

What problems would this pose? Consistency!

Server Caching

NFS servers use cache to hold disk blocks. Typically two types of cache operations are used:

  • read-ahead: read blocks ahead of time (spatial locality from 281)
  • delayed-write: hold modified blocks in memory and write them to disk only when their buffer is needed for new content
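Here is a sketch of a server block cache that combines both techniques. It assumes a hypothetical `disk` dict mapping block number to bytes and uses LRU replacement. The `flush()` method stands in for a periodic sync; it is an assumption, not something described above.

```python
from collections import OrderedDict


class ServerBlockCache:
    """Sketch of a server buffer cache with read-ahead and delayed write.
    `disk` is a hypothetical dict mapping block number -> bytes."""

    def __init__(self, disk: dict, capacity: int = 4):
        self.disk = disk
        self.capacity = capacity
        self.cache = OrderedDict()  # block number -> data, least recently used first
        self.dirty = set()          # blocks modified in memory but not yet on disk
        self.disk_reads = 0

    def _insert(self, n: int, data: bytes) -> None:
        self.cache[n] = data
        self.cache.move_to_end(n)
        while len(self.cache) > self.capacity:
            victim, victim_data = self.cache.popitem(last=False)
            if victim in self.dirty:
                # Delayed write: a modified block reaches disk only when evicted.
                self.disk[victim] = victim_data
                self.dirty.discard(victim)

    def _load(self, n: int) -> None:
        if n not in self.cache and n in self.disk:
            self.disk_reads += 1
            self._insert(n, self.disk[n])

    def read(self, n: int) -> bytes:
        self._load(n)
        self.cache.move_to_end(n)
        data = self.cache[n]
        self._load(n + 1)  # read-ahead: fetch the next block as well
        return data

    def write(self, n: int, data: bytes) -> None:
        self._insert(n, data)
        self.dirty.add(n)  # kept in memory, not written to disk yet

    def flush(self) -> None:
        # Assumed periodic sync: write every dirty block back to disk.
        for n in self.dirty:
            self.disk[n] = self.cache[n]
        self.dirty.clear()


disk = {0: b"A", 1: b"B", 2: b"C"}
c = ServerBlockCache(disk)
c.read(0)                          # block 0 from disk, block 1 read ahead
print(1 in c.cache, c.disk_reads)  # True 2
c.write(0, b"A2")
print(disk[0])                     # b'A': the new data is only in memory
c.flush()
print(disk[0])                     # b'A2'
```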

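Reads do not cause any consistency issues, but writes might! Two options exist for writing:

  • write-through
    • data is written to disk before the server replies: safe at the cost of speed
  • write-commit
    • data is held only in the server's memory until the client sends a commit: fast at the cost of safety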

Client Caching

The NFS client module caches the results of certain operations:

  • usually read, write, getattr, lookup, and readdir operations.

Client caching may allow different versions of files to exist in different client nodes.

  • writes by a client do not update cached copies of other clients
  • clients need to check with the server (and pull updated ones)

To check the validity of a cached block, timestamp-based validation is used:

  • a cached block is valid if it hasn't been updated on the server since it was cached
  • the server is only checked after a certain interval (the freshness interval) has elapsed since the last validation

$$ \text{valid} \iff (T - T_c < t) \lor (T_{m,\text{client}} = T_{m,\text{server}}) $$

  • $T$ = current time
  • $T_c$ = time the cached entry was last validated
  • $t$ = freshness interval
  • $T_m$ = time of last modification (as recorded by the client and by the server)
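The validity condition translates directly into a check that contacts the server only once the freshness interval has expired. In the sketch below, `get_server_mtime` is a hypothetical callable standing in for a getattr request to the server.

```python
def cache_entry_valid(T, Tc, t, Tm_client, get_server_mtime) -> bool:
    """Return True if a cached block may be used.

    T: current time, Tc: time the entry was last validated,
    t: freshness interval, Tm_client: modification time recorded with the
    cached copy. get_server_mtime (hypothetical) is only called when the
    entry is no longer fresh, mirroring the short-circuit of the OR."""
    if T - Tc < t:
        return True                         # still fresh: no server contact
    return Tm_client == get_server_mtime()  # otherwise compare modification times


calls = []


def server_mtime():
    calls.append(1)  # count simulated getattr requests
    return 100.0


fresh = cache_entry_valid(T=10.0, Tc=8.0, t=3.0, Tm_client=90.0, get_server_mtime=server_mtime)
stale = cache_entry_valid(T=20.0, Tc=8.0, t=3.0, Tm_client=90.0, get_server_mtime=server_mtime)
print(fresh, stale, len(calls))  # True False 1
```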

Summary

  • Transparency
  • Replication
  • Hardware/OS Heterogeneity
  • Fault Tolerance
    • server failures may suspend clients (except with soft mounting)
    • NFS access protocol is stateless and idempotent (no state = easy recovery)
  • Consistency
    • approximately one-copy semantics