CS 39 -- Project 3


Adding a filesystem

For this assignment, we will be adding filesystem support to the kernel. In order for the kernel to provide a filesystem, the virtual machine must include a block device that persistently stores data. Such a block device is provided with the new version of the virtual machine.

There is also a new version of the assembler that has already been installed on algol. It provides the capability to insert string and integer literals into your executable images, and then to use the addresses at which those values will be loaded within your assembly code.

In order to get started with this project, you must download the latest version of the virtual machine, version 1.3. You can find the tarball for this code in the expected place:

~sfkaplan/public/cs39/VP-v1.3.tar.bz2

This tarball will include code for the virtual machine, the MMU, and the block device that will serve as the persistent storage on which you will build a filesystem. Note, as always, that these components may not be changed.


The block device

The block device has a simple interface that allows the kernel to query the number of bytes in each block, and the number of total blocks available. It also allows the kernel to read and write particular blocks, and to synchronize the device (that is, flush buffers and commit changes). The functions provided by the block device are as follows:


New system calls

There are seven new system calls that the filesystem must support. Each of them return one of many possible return values to the caller through register %s0. We list the system calls themselves, and then list the return values.

  1. OPEN: 0xF0000000

    This system call should attempt to open a specified file in the filesystem. Register %s1 should contain the virtual address (with the address space of the calling process) of a null-terminated string that stores the requested filename.

    If the requested file exists, the filesystem should record that the file has been opened in the open-file table of that process. This table should contain one entry for each open file, and each entry should contain at least the following information:

    Once this entry is created, the kernel should place the file ID number in %s1. Upon success the FILE_SUCCESS code should be returned. If the requested file does not exist, then the FILE_BAD_NAME value should be returned.

  2. CLOSE: 0xF0000001

    This system call should close the file specified by the file ID provided by the caller in %s1. If successful, FILE_SUCCESS should returned; if the file ID does not specify an open file for this process, FILE_BAD_ID should be returned.

  3. CREATE: 0xF0000002

    This system call should attempt to create a specified file in the filesystem. Register %s1 should contain the virtual address of a null-terminated string that stores the requested filename. The file should be added to the metadata of the filesystem, making it a valid file that can be opened, and that has an initial length of 0 bytes.

    If the filename is valid, and the file can be created, then FILE_SUCCESS should returned; if the file already exists, FILE_EXISTS should be returned; if there is no space on the device to add another file, FILE_DEVICE_FULL should be returned.

  4. DELETE: 0xF0000003

    This system call should delete a specified file in the filesystem. Register %s1 should contain the virtual address of a null-terminated string that stored the requested filename. If the file is successfully deleted, then FILE_SUCCESS should be returned; if the file name is not valid, then FILE_BAD_NAME should be returned; if the file is currently open, FILE_OPEN should be returned.

  5. READ: 0xF0000004

    This system call should read the next byte in the open file specified by the file ID in %s1. If the read is successful, FILE_SUCCESS should be returned and the byte of data placed in the low-order byte of %s1. The system call should return FILE_BAD_ID for an invalid file ID, and FILE_EOF if the current position is already at the end of the file.

  6. WRITE: 0xF0000005

    This system call should write the low-order byte value from %s2 to the next byte of open file specified by the file ID in %s1. If the write is successful, FILE_SUCCESS should be returned. The system call should return FILE_BAD_ID for an invalid file ID, and FILE_DEVICE_FULL if writing the byte request space not available on the block device.

    Note that if the current file position is at the end of the file, then the new byte should be appended to the file.

  7. SEEK: 0xF0000006

    This system call moves the current position pointer for the file specified by the file ID in %s1 to the offset specified in %s2. If the seek is succesful, FILE_SUCCESS should be returned. The system call should return FILE_BAD_ID for an invalid file ID, and FILE_BAD_OFFSET if the requested offset is beyond the end of the file.

The return values that these system calls place into %s0 are:


Buffer space for the filesystem

When blocks are read by the kernel, they must be read into some memory space. Because a process may read or write only particular bytes within a block, it is up to the filesystem to load the whole block into its buffer space, and then serve the requested bytes from that block.

In real systems, there is a separate pool of page frames in main memory that are allocated for the caching of filesystem blocks. The system must arbitrate between the needs of the filesystem and the needs of virtual memory in allocating those page frames.

  1. The simple version: Allocate one cached filesystem block. That is, it may reserve one page frame in the simulated physical memory, and use that for file operations.
  2. The more complex version: Use an arbitrary number of page frames to cache filesystem blocks. You will need a data structure to track which blocks are cached in the filesystem cache pages. You will also need a page replacement policy that can select either a virtual memory page or a filesystem cache page. If a filesystem cache page is selected, its contents should not be evicted to the virtual memory swap space. Before reclaiming a filesystem cache page, be sure that the contents of the cached blocks have been written back to the disk!

The naming space

You must allow for at least 16 character filenames. You need only to support a flat naming spacing; that is, you do not need subdirectories. A single, root-level directory that stores the names of files will be sufficient. If you would like to try hierarchical naming, you are welcome to do so, but it will be solely for your edification.


Additions to the assembly language capabilities

Some of the system calls described above require processes to provide pointers to spaces that hold strings. Therefore, the assembler allows you to specify string and integer constants within programs. Each such value is given a label that can be used in place of any immediate opcode value for an instruction. Such labels will be automatically replaced with the address at which the constant value will be loaded into the executable image.

Take the following bit of contrived assembly code as an example:

  (
      ; Set the code for a DELETE system call request.
      (()    stri %s0 #xf0000003)

      ; Set the address at which the filename can be found.
      (()    stri %s1 filename)

      ; Do the DELETE.
      (()    sysc)

      ; Load into %s0 the code used to specify an EXIT system call.
      ; Then perform the call itself.
      (()    load %s0 %zero exitcode)
      (()    sysc)
  )
  (
      ; The system call code for exiting.
      (exitcode #x11111111)

      ; The name of a file to be deleted.
      (filename "Foo.txt")
  )
    

Note that the assembly code forms one outer list, and the constant values form another. The assembler calculates that addresses at which all instructions and constants will be loaded, and then substitutes those addresses wherever a corresponding label is used.

Each constant value must have a label. Each string is automatically null-terminated; that is, the character whose value is 0 is placed at the end of string to mark its termination. Only strings and integers may be specified as constants. The spaces named by the labels are virtual memory spaces -- although the values initially placed into them are constants in one sense, those values may be changed at runtime by the program simply by storing into the given address.

Also note that you should list all integers before any strings. The assembler is dumb with regards to alignment. Thus, if you first listed a string of, say, 2 characters, and then followed that by an integer, then the integer would not be word-aligned; load/store operations on the address at which that integer is stored will cause invalid memory accesses.


Submitting your work

When your group has completed its project, submit all files as project-3 using the cs39-submit program. Note that this program can take directory names as arguments.


This assignment is due on Wednesday, December 14th at 11:59 pm!

This due date (and time) is the very last possible moment in the semester that I can accept course work. There will be no extensions!


Scott F. Kaplan
Last modified: Sat Nov 26 12:33:08 EST 2005