Laplace: Per-Process Merge Format

During trace collection, the scheduling of tasks (which, for our purposes, are equivalent to kernel threads) is reflected in the original reference sequence recorded by Laplace. To simulate a different schedule for these tasks, we must move the references of each task into its own trace file. However, one process may comprise several tasks that share a virtual address space, so we also want to group the traces of such tasks together. A per-process merge divides the original reference sequence in exactly this way: first by process, and then by task within each process.
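The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not Laplace's merge tool. It assumes that every record in the original sequence has already been attributed to a (process, task) pair, and the `Tagged` tuple, its field names, and `per_process_split` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Tagged(NamedTuple):
    """One record from the original sequence, already attributed to a task.

    Hypothetical structure: how the process and task of each record are
    recovered from the original traces is not shown here.
    """
    process: int
    task: int
    record: str


def per_process_split(sequence: Iterable[Tagged]) -> Dict[int, Dict[int, List[str]]]:
    """Split one interleaved sequence into process -> task -> records.

    Each task keeps its own records in their original relative order;
    only the interleaving imposed by the recorded schedule is discarded.
    """
    groups: Dict[int, Dict[int, List[str]]] = defaultdict(lambda: defaultdict(list))
    for item in sequence:
        groups[item.process][item.task].append(item.record)
    # Return plain dicts so later lookups cannot silently create empty groups.
    return {process: dict(tasks) for process, tasks in groups.items()}
```

Keeping each task's records in order while discarding the interleaving is what lets a simulator impose a different schedule later.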

The trace format

For each process, a directory is created. Within that directory, a trace file is created for each task that belonged to the process. Each trace contains both the memory references and the kernel-level events from the original traces, along with additional information reconstructed during merging. Specifically, each record from the kernel trace is emitted without modification, while each record from the raw reference trace is amended to include the canonical page number that was touched. (For more on canonical pages, see the Basic Merge Format and Merging Background pages.)
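The on-disk layout (one directory per process, one trace file per task) might be produced along the following lines. This is a sketch under stated assumptions: the `process-<id>` and `task-<id>.trace` names and the `write_per_process` function are hypothetical, chosen only to illustrate the layout, and records are assumed to be already formatted as text lines.

```python
import os
from typing import Dict, List


def write_per_process(groups: Dict[int, Dict[int, List[str]]], root: str) -> List[str]:
    """Create one directory per process and one trace file per task.

    The naming scheme (process-<id>/task-<id>.trace) is hypothetical.
    Each record is written as one line. Returns the paths written,
    sorted by process and then by task.
    """
    written = []
    for process, tasks in sorted(groups.items()):
        process_dir = os.path.join(root, f"process-{process}")
        os.makedirs(process_dir, exist_ok=True)
        for task, records in sorted(tasks.items()):
            path = os.path.join(process_dir, f"task-{task}.trace")
            with open(path, "w") as out:
                for line in records:
                    out.write(line + "\n")
            written.append(path)
    return written
```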

Note that all references performed by the kernel are collected into a single trace that represents "the kernel thread". Currently, the kernel trace does not contain enough information to correlate the processing of a specific system call with the sequence of references that the kernel performed on its behalf. That information could be added to the kernel trace if the kernel's reference behavior were of particular interest. Here, we assume that the kernel's referencing behavior should be factored out of subsequent simulations.
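Because the reference type character distinguishes kernel references (R, W, I) from user references (r, w, i), routing a reference to the kernel trace or to its task's trace needs only that character. The sketch below is a minimal illustration; the `KERNEL_THREAD` key and the `route_reference` function are hypothetical, and the process and task identifiers are assumed to be known already.

```python
from typing import Tuple, Union

KERNEL_TYPES = frozenset("RWI")  # kernel load, store, instruction fetch
USER_TYPES = frozenset("rwi")    # user load, store, instruction fetch

KERNEL_THREAD = "kernel"  # hypothetical key for the single kernel trace


def route_reference(type_char: str, process: int, task: int) -> Union[str, Tuple[int, int]]:
    """Return the trace a memory reference belongs in.

    Kernel references all go to the one "kernel thread" trace, whatever
    task was running; user references go to their own (process, task).
    """
    if type_char in KERNEL_TYPES:
        return KERNEL_THREAD
    if type_char in USER_TYPES:
        return (process, task)
    raise ValueError(f"not a memory reference type: {type_char!r}")
```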

Each trace produced by this type of merge takes the following form:

    r 123456789abcdef0 4 9a8b7 8888ffff3e456a21
    F 123456789abcdef7 100 77
    M 123456789abcdefe 100 1a2b3c4d 9000 8efa5567 35 5 1029384756afbecd foobar.so

The first field, an 8-bit type tag, indicates which kind of memory reference or kernel event the record represents. See the raw trace format descriptions for the possible tag values.
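A reader for these traces first has to tell the two kinds of record apart. The following minimal Python sketch assumes whitespace-separated fields, as in the example above, and assumes that only the six reference type characters listed below mark memory references. Every other tag is treated as a kernel event whose fields are left unparsed, since their layout is defined by the raw trace format rather than here. The function name `classify` is hypothetical.

```python
from typing import List, Tuple

# The six memory reference type characters; any other tag is a kernel event.
REFERENCE_TAGS = frozenset("rwiRWI")


def classify(line: str) -> Tuple[str, List[str]]:
    """Split a per-process trace line into (kind, fields).

    kind is "reference" for memory reference records and "event" for
    everything else. Kernel event fields are returned unparsed.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty record")
    kind = "reference" if fields[0] in REFERENCE_TAGS else "event"
    return kind, fields
```

Splitting on whitespace is a simplification: a field that itself contained spaces, such as a file name in an `M` record, would be broken into pieces, so a complete reader would need the raw format's rules for each event tag.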

Note that the format of the memory reference record differs somewhat from that in the raw reference trace. It takes the following form:

Type: This value is a character that takes on one of the following six values to indicate not only what kind of memory reference occurred, but also whether the reference was performed on behalf of the kernel or of a user-level process.
1. 'r': User process read (load)
2. 'w': User process write (store)
3. 'i': User process instruction fetch
4. 'R': Kernel read (load)
5. 'W': Kernel write (store)
6. 'I': Kernel instruction fetch
Timestamp: A 64-bit hexadecimal value of the processor's cycle counter at the time of the reference.
Length: A 4-bit hexadecimal value giving the number of bytes read or written by this reference.
Virtual page number: A 20-bit hexadecimal value giving the virtual page number of the referenced page.
Canonical page number: A 64-bit hexadecimal value giving the canonical page number of the referenced page.
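The field list above translates directly into a parser for a single memory reference record. This Python sketch assumes the fields appear in the order listed and shown in the example record (type, timestamp, length, virtual page, canonical page), separated by whitespace. Rejecting values wider than the stated bit widths is a choice made here for robustness, not something the format requires. The `Reference` class and `parse_reference` function are hypothetical names.

```python
from dataclasses import dataclass

KERNEL_TYPES = frozenset("RWI")
USER_TYPES = frozenset("rwi")


@dataclass(frozen=True)
class Reference:
    kind: str            # one of r, w, i (user) or R, W, I (kernel)
    timestamp: int       # 64-bit cycle counter value
    length: int          # bytes read or written (4-bit field)
    virtual_page: int    # 20-bit virtual page number
    canonical_page: int  # 64-bit canonical page number

    @property
    def is_kernel(self) -> bool:
        return self.kind in KERNEL_TYPES


def _hex_field(text: str, bits: int, name: str) -> int:
    """Parse a hexadecimal field and check that it fits in `bits` bits."""
    value = int(text, 16)
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} {text!r} does not fit in {bits} bits")
    return value


def parse_reference(line: str) -> Reference:
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    kind = fields[0]
    if kind not in KERNEL_TYPES | USER_TYPES:
        raise ValueError(f"not a memory reference type: {kind!r}")
    return Reference(
        kind=kind,
        timestamp=_hex_field(fields[1], 64, "timestamp"),
        length=_hex_field(fields[2], 4, "length"),
        virtual_page=_hex_field(fields[3], 20, "virtual page"),
        canonical_page=_hex_field(fields[4], 64, "canonical page"),
    )
```

For the sample record `r 123456789abcdef0 4 9a8b7 8888ffff3e456a21`, this yields a 4-byte user read of virtual page 0x9a8b7.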

Note first that there is no virtual address space identifier. Because the per-thread traces are grouped by process, all of the threads in a group can be assumed to share a virtual address space, so the original address space identifier is no longer relevant. Second, only page numbers are provided, not exact addresses, since we assume that these traces will be used for page-level memory management. Last, both the virtual and the canonical page numbers are provided: the canonical page number can be used to identify the use of shared space, while the virtual page number can be used to track spatial locality if desired.
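As an illustration of those two uses, the sketch below finds canonical pages touched by more than one trace, and computes one simple locality measure over virtual page numbers. Both are assumptions for illustration: the (type, virtual page, canonical page) tuple shape is hypothetical, and the "same or adjacent page" fraction is just one possible locality metric, not one defined by Laplace.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple


def shared_canonical_pages(
    traces: Dict[str, Iterable[Tuple[str, int, int]]],
) -> Dict[int, Set[str]]:
    """Map each canonical page touched by two or more traces to their names.

    Each trace is an iterable of (type, virtual_page, canonical_page)
    tuples; this shape is hypothetical.
    """
    users: Dict[int, Set[str]] = defaultdict(set)
    for name, refs in traces.items():
        for _kind, _vpage, cpage in refs:
            users[cpage].add(name)
    return {page: names for page, names in users.items() if len(names) > 1}


def adjacent_fraction(vpages: Sequence[int]) -> float:
    """Fraction of consecutive references that stay on the same or an
    adjacent virtual page (an illustrative spatial-locality measure)."""
    pairs = list(zip(vpages, vpages[1:]))
    if not pairs:
        return 0.0
    near = sum(1 for a, b in pairs if abs(b - a) <= 1)
    return near / len(pairs)
```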

Scott F. Kaplan

Last modified: Wed Dec 4 11:35:37 CST 2002