CIS 25 - Notes 0011

These notes are a continuation of last class's notes. Basically repeating the things that you should already know after using both Windows and UNIX operating systems.

File Systems Mounting

As discussed in previous notes, a file system resides on a partition. We can also mount one file system under another file system as a directory.

Sometimes the act has to be performed explicitly. Under UNIX for example, you need to mount all partitions under some directory. Under Windows, the mounting is done automatically at system boot time (and there are no provisions for mounting other file systems under directories).

In a networked world, one can even mount file systems from remote computers via the network.

File Sharing

When many people can access the system, the system must act as a moderator for file accessing between users. It must make sure that permissions are maintained on who can access what files, and to enforce those permissions.

There is also the matter of people accessing the same file at the same time. Should two users be allowed to write to the same file at the same time?

The network brings many sharing possibilities. As mentioned earlier, we can share files by allowing others to mount the file system. We can run a server (such as ftp, for http) for others to get access to the file.

Networking complicates the security issue immensely. Those are no longer concerned with a fairly fixed set of users, but with the entire network.

Others

The book also talks about distributed information systems, like DNS, etc., while it is important for us to understand those, they are not directly related to operating systems.

File System Implementation

File System Structure

Disks provide the bulk of secondary storage on which a file system is maintained. They have favorable characteristics that make them suitable for the task. They can rewrite data in place (read a block, modify it, and write it back to the same location). Disks also allow for either sequential or random access of files. To move from one file to another, all that is required is moving the read/write head.

Disk drives are block devices. Instead of reading data a byte at a time, they read blocks at a time (or sectors at a time). Usually, the sector is 512 bytes.

Layering

File Systems generally use layering, where the lower layer provides low level services, with higher level layers providing high level features. For example, when we read/write files in application programs, we only care about issuing read/write commands. The operating system decides which file those commands go to and under what file system, etc., it then sends commands to the lower level layer to read/write individual sectors from the disk.

This layering is very beneficial, since an operating system may provide access to files on various devices such as CD-ROMs, hard drives, floppies, etc., and layering provides a common interface and limits code duplication.

Overview

Disks generally have several general areas used for specific things. The first block (or sector) of the disk is used when booting the computer to start the operating system.

There is also a table containing partition information. How many partitions are on the disk, which partition is allocated to what OS, the format of the partition, etc. Then, there is the partition which contains the file system.

The file system is generally represented by some arrangement of FCB (file control blocks) which generally contain file permissions, file dates, file owner, file size, and file data blocks.

In addition to the on disk information, the operating system usually maintains lots of information in memory, such as open file lists (discussed in last class's notes), file system information, and caching info.

Virtual File System

Since many operating systems are not just concerned with representing files in their file systems, there is a concept as Virtual File System. It allows us to represent networked file systems as seamlessly as local on-disk file systems. On some systems (such as UNIX) we can represent majority of the hardware devices as files in the file system.

Directory Implementations

We can implement the directory in several ways.

Linear List

The simplest approach to implementing a directory is simply maintaining a list (or array) of file names (and other info).

This approach has some problems like: searching for a file means we must do a linear search, and inserting and deleting from a fixed list is not simple.

We can also maintain a sorted list, but that would mean maintaining the sorted structure.

Hash Table

We can also implement the directory as a hash table. This could be in addition to the linear list. A hash table allows us to quickly find any individual file.

Hash tables aren't the perfect solution either. One problem is hash clashes. These must be resolved, and ways to resolve them aren't exactly better than maintaining a linear list.

File Allocation Methods

Storing files is the primary job of the disk, but the method we use to allocate space to the file plays a key role in performance.

Contiguous Allocation

In Contiguous Allocation we allocate a file as a contiguous set of disk sectors.

This has benefits that seeking within a file is easy, since we can easily calculate where parts of file are on the disk. Reading a file is also efficient since the read/write had doesn't have to do too much seeking from sector to sector.

The disadvantage is that this approach causes external fragmentation. Finding space for a file may become a problem.

We can overcome some difficulties by allocating a contiguous block, then once that is used up, allocate another large contiguous block for a file and link those together.

Linked Allocation

Linked allocation creates inked lists using disk sectors as nodes. For example, if the sector size is 512 bytes, and it takes a 32 bit number to represent the next block address, then only 508 bytes can be used to store data (the other 4 bytes are used to locate the next file sector).

This approach eliminates external fragmentation (since even the smallest block can now be used).

Some problems with this approach are that seeking within a file is now difficult. For example, if we're interested in the last 100 bytes of a 100mb file, we need to traverse all the sectors of the file (follow links) to get to the last 100 bytes.

Another major issue is that we need to do a disk seek with every disk sector (unless disk sectors are contiguous).

And yet another issue with this is that storing pointers on every disk sector uses up a lot of disk space.

Hybrid Methods

Most systems don't implement clean linked allocation method. In order to minimize space wasted on the links per sector, we can logically cluster several sectors together, and link those.

DOS for example uses a File Allocation Table, which resides in the beginning of the partition, and identifies links between file clusters on the disk. This method is more efficient to the general linked method since if we cache the file allocation table, we can seek through the file (and find free clusters) fairly quickly.

Again, many of these methods have their strengths and weaknesses, and the method chosen for each operating system usually reflects the intended use of the system.

Indexed Allocation

Indexed allocation is similar to linked allocation, but instead of having each node having a reference to the next node, there is one node, that has links to all the other nodes (based on an index). We can also use nested indexes, etc.

Free Space Management

In order to make a file system work we not just have to care how to allocate files, but how to allocate and manage free space.

Bit Vector

One approach to allocating free space is to maintain a bit vector of free and used sectors. The big array would generally be stored in memory to make finding free space a simple matter of searching the bit list for unset bits.

This approach suffers from the fact that even with a bit for each sector the memory requirements for the bit vector are relatively large (especially for large disks).

We can alleviate it somewhat by indicating free sector clusters instead of free individual sectors. For example, if we cluster 4 sectors together into 1 logical allocation unit, we'd need to maintain 4 times less bits.

Linked Lists

We can also use linked lists of free sectors (or clusters). A linked list would contain the next free block, which would point to next free block, etc. Once a block is allocated, the next free block becomes the 'free block'.

Freeing a block means making it the free block, and pointing it to the next free block.

Performance

Depending on the chosen algorithm and its implementation we can specialize the system for a specific type of use and desired efficiency. For some systems, linked allocation scheme might work best, for others, contiguous might be better, and yet others, a hybrid approach might do well.

In any case, the system can improve efficiency by storing parts of the disk data in memory and accessing it there instead of the disk. For example, the system can maintain the file allocation table in memory, etc.

This presents some reliability problems, since in case of a crash, volatile memory gets wiped out, and the disk may no longer be in valid state.

We can apply all sorts of recovery and proper state management techniques to try to avoid any problems a crash might cause to the operating system.