kVMTrace: Quick start guide

This guide will help you to get a system installed with kVMTrace, collect traces with it, and process those traces. Please note: kVMTrace has not been extensively tested on many machines and many Linux distributions, and therefore may not work (or work well) on all systems. This is an experimental trace collection tool that has been made to work well enough to drive research on virtual memory and file system caching, and it is certainly not ready for critical or production systems.

Installing kVMTrace

kVMTrace collects references to the file system and virtual memory system by operating within the OS kernel. Specifically, it is a patch to a standard Linux kernel which you must build and install onto an otherwise normal system. Here are recommended steps to perform this task:

Select a Linux/x86 system on which to run the kVMTrace kernel. I recommend that you dedicate a system to this purpose since kVMTrace is not as stable as the stock kernel and because it has not been thoroughly tested with all systems.

kVMTrace has been tested using Slackware 10.1 [disc 1] [disc 2] [disc 3] on both a "vanilla" Pentium 4 system (SiS motherboard and chipset, IDE discs, etc.) and on VMware. Using the latter removes any concerns regarding hardware compatibility with kVMTrace, but it also reduces performance mildly.

Install your chosen system normally. Then log in as root to continue the installation.
Download a copy of the Linux 2.4.20 kernel onto your system. Extract the kernel:

shell% tar -xjpf linux-2.4.20.tar.bz2
Apply the UML patch: First download the UML patch, and then apply it like so:

shell% cd linux-2.4.20
shell% bunzip2 < uml-patch-2.4.20-8.bz2 | patch -p 1
Apply the kVMTrace patch: Download the kVMTrace patch and apply it:

shell% bunzip2 < ../2.4.20-kVMTrace-v0.5.0.patch.bz2 | patch -p 1
Configure the kernel: For simplicity, these instructions describe a kernel build without modules so that you can create the single kernel image file, put it into place, and boot from it. The kVMTrace patch includes a configuration file for VMware machines -- if you use a different machine, you can use this one as a starting point:

shell% make menuconfig

Then select the penultimate menu item, Load an Alternate Configuration File, clear the provided default, and enter:

vmware.config

You will be returned to the menu of choices. Make any changes to the configuration that you see fit, if any. Finally, use the right-arrow to select Exit and, when prompted, select Yes to saving changes.
Build the kernel: Enter the following commands to perform this step:

shell% make dep
shell% make bzImage
Install the kernel: First, copy your kernel to the appropriate directory:

cp arch/i386/boot/bzImage /boot/vmlinuz-kVMTrace

Now, edit the bootloader's configuration file so that this new kernel is a booting option. Editing that file depends on your bootloader (likely LILO or GRUB), but there is abundant documentation and examples for adding a kernel to these bootloaders.
Install the activator: Before rebooting (that is, while still running this system under the standard kernel), switch to the root
- Download the activator source code.
- Unpack and compile it, and move the executable to the home directory:
  
  cd activator
  make
  mv activator ~

All of the components of kVMTrace are in place! You can now move on to collecting traces.
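
For the bootloader step above, the following is a minimal sketch of a LILO stanza for the new kernel, written as a small shell function so that the root device can be filled in. The label `kVMTrace` and the example root device `/dev/hda1` are assumptions chosen for illustration; use the root device that your existing stanza for the stock kernel names. The function only prints the stanza and does not modify any file.

```shell
#!/bin/sh
# Print a LILO stanza that boots the kernel image copied to
# /boot/vmlinuz-kVMTrace in the installation step.
# $1 = root device (hypothetical example below: /dev/hda1)
lilo_stanza() {
    root_dev=$1
    cat <<EOF
image = /boot/vmlinuz-kVMTrace
  label = kVMTrace
  root = $root_dev
  read-only
EOF
}

# Print the stanza for review; nothing on the system is changed.
lilo_stanza /dev/hda1
```

If the output looks right, it can be appended to /etc/lilo.conf by hand, after which /sbin/lilo must be run for the change to take effect. GRUB users would add an equivalent entry to their own configuration file instead.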

Collecting traces

To trace a workload, perform the following steps:

Reboot your system, selecting the kVMTrace kernel.
Activate kernel event tracing: Log in as root and run the activator, providing pathnames to which the kernel events and reference traces should be written. This step will commence the logging of kernel events (but not references):

./activator /big-partition/kernel-trace /big-partition/reference-trace

Note that the kernel event and reference traces can get quite large, so they should be placed on partitions that have many GB of available space. Also note that the activator must remain running throughout tracing.
Log in as a user: Log in as whatever user you want to be to run your workload. Do any necessary preparations leading up to the running of your workload (although any preparations that can be done before booting under kVMTrace should be).
Activate reference tracing: Select option 1 from the activator's menu.
Run your workload! Do whatever you planned to do in running your workload, be it batch processing, interactivity, servers, etc.

kVMTrace is not production-level software. Some programs may crash with segmentation faults. More rarely, the kernel itself may crash. Reports of any consistent, reproducible errors of this kind are greatly appreciated! At the moment, kVMTrace is stable enough that many workloads can be handled successfully, and crashes are rarely reproduced easily.
Stop trace collection: Select menu item 2 from the activator menu, thus shutting down tracing. At this point, the trace files are closed, and we recommend that you reboot under a normal kernel.
Compress the traces: The original traces are emitted in an uncompressed form. We recommend that you compress these files (they tend to shrink to approximately 20% of their original size). If you compress using gzip, then post-processing steps will progress more easily. (We don't recommend the use of bzip2 -- although it produces smaller traces, it is exceedingly slow on such large files.)

gzip -9 /big-partition/kernel-trace /big-partition/reference-trace
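
Because the traces are large, it may be worth confirming that each compressed file is intact before relying on it. The following is a minimal sketch, not part of kVMTrace itself: the function name `compress_trace` is hypothetical, and it uses only standard `gzip` and `wc` options. It compresses one file, tests the archive with `gzip -t`, and reports the sizes before and after.

```shell
#!/bin/sh
# Compress one trace file with gzip -9, verify the archive, and
# report the sizes. Returns non-zero if any step fails.
# $1 = path to an uncompressed trace file
compress_trace() {
    trace=$1
    [ -f "$trace" ] || { echo "missing: $trace" >&2; return 1; }
    # The arithmetic expansion strips any padding that wc may emit.
    before=$(( $(wc -c < "$trace") ))
    gzip -9 "$trace" || return 1        # replaces $trace with $trace.gz
    gzip -t "$trace.gz" || return 1     # integrity check of the archive
    after=$(( $(wc -c < "$trace.gz") ))
    echo "$trace.gz: $before -> $after bytes"
}

# Example usage with the pathnames from this guide:
# compress_trace /big-partition/kernel-trace
# compress_trace /big-partition/reference-trace
```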

Post-processing the traces

Once you have collected a pair of trace files (kernel events and references), you will need to post-process them. Specifically, the post-processing merger utility translates the traces into a form usable by a simulator. In doing so, it associates each reference with its task and reconciles uses of shared space. It is capable of emitting many different trace formats; please see the merger's README file for more information.

Follow these steps to perform post-processing:

Download the merger utility.
Extract the source code:

tar -xjpf merger-v0.5.0.2.tar.bz2
Build the merger utility:

cd v0.5.0.2
make
Run the merger utility using the provided run-script. It will read the gzipped kernel event and reference traces, and it will produce gzipped output trace(s).

Specifically, this utility can produce four types of output traces:
- basic: A single trace that represents the reference stream by all processes and threads as scheduled during the original execution. Each record indicates whether the reference is a virtual memory read, a virtual memory write, a file system read, or a file system write. It also indicates the canonical page number -- numbers assigned so that references to shared pages always appear as references to the same page.
- basic-named: Like basic, but with additional information in each record that indicates whether the reference is to an anonymous or a file page, thus allowing simulators to more accurately account for compulsory fault costs.
- per-process: One reference trace file is produced for each process. Those processes with multiple threads will have their references interleaved in the same order as they occurred during the original execution.
- per-thread: One reference trace file is produced for each kernel-level thread. These files are grouped by process, with one directory per process.
Using these types, invoke the merger utility:

./run-merger basic v0.5 /big-partition/reference-trace /big-partition/kernel-trace /big-partition/basic-trace.gz

The resulting trace can then be read by the simulator of your choice. If you need a fundamentally different trace format, you can add an output type to the merger utility. Simply inherit from the Consumer class, which is responsible for reading the source traces and reconstructing the state of the processes at each moment. The existing examples show how to create your own output format with the information your simulator needs.
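
The run-merger invocation shown above takes its arguments in a fixed order: output type, version, reference trace, kernel event trace, and output pathname. The following sketch makes that order explicit by building the command line without running it. The function name `merger_cmd` is hypothetical, and it is an assumption that the run-script accepts the other three type names exactly as they are listed above; the merger's README is the authority on what the output argument means for the per-process and per-thread types.

```shell
#!/bin/sh
# Build (but do not run) a run-merger command line, using the
# argument order from the invocation in this guide.
# $1 = output type, $2 = reference trace, $3 = kernel event trace,
# $4 = output pathname
merger_cmd() {
    otype=$1
    ref=$2
    kern=$3
    out=$4
    case $otype in
        basic|basic-named|per-process|per-thread) ;;
        *) echo "unknown output type: $otype" >&2; return 1 ;;
    esac
    echo "./run-merger $otype v0.5 $ref $kern $out"
}

# Reproduces the invocation given in this guide.
merger_cmd basic /big-partition/reference-trace \
    /big-partition/kernel-trace /big-partition/basic-trace.gz
```

Printing the command first makes it easy to spot swapped reference and kernel trace arguments before starting a long merge.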

Scott F. Kaplan

Last modified: Wed Jul 13 15:27:39 EDT 2005