Laplace: Trace Reduction


Laplace can collect traces in their raw format faster than common disks can store them or compression algorithms can process them. Therefore, it takes little execution time for the traces to become unmanageably large. We present here a few common trace reduction techniques that we have employed with Laplace and that are available as part of the package.


Lossless reduction

If we seek to reduce the storage requirements for a trace, then we may employ methods that allow a trace to be compressed and then decompressed without loss of any information. One such technique is the use of common, "general purpose" lossless data compression utilities such as gzip or bzip2. While these utilities can compress traces substantially, there are other, complementary techniques, specific to reference traces, that allow for far greater compression.

Specifically, one can use difference encoding (as developed for the PDATS and Mache trace reducers) prior to the use of general purpose compressors. This approach is based on the simple observation that, for most of the numerical fields of each trace record, the value for one record is likely to be similar to the value for the previous record. Addresses (or page numbers) are likely to be similar to one another thanks to spatial locality. Timestamps advance by small amounts with each record. Virtual address space identifiers are most often identical for a large number of consecutive records. Therefore, we can reduce a trace by encoding those values as differences relative to the values in the previous record.

For example, if a raw reference trace began with the following records with the fields described below...

    r 4132ff93 4 1015a 80b1d32
    w 4132ff95 2 1015a 80b1000
    r 4132ff96 4 1015a 80b2152
      
  1. Reference type tag (read, write, instruction fetch)
  2. Timestamp
  3. Length (in bytes) of memory access
  4. Virtual address space identifier
  5. Virtual address

...then difference encoding would transform them into...

    r 4132ff93 4 1015a + 80b1d32
    w        2 2     0 -     d32
    r        1 4     0 +    1152
      
  1. Reference type tag
  2. Difference from previous timestamp (always positive)
  3. Length (in bytes) of the memory access (not difference encoded)
  4. Virtual address space identifier (or 0 for the same identifier as the last record)
  5. Sign of the difference in the virtual address from the last record
  6. Magnitude of the difference in the virtual address from the last record

Note that different fields can be encoded in different ways according to the regularities of the values in each field. Timestamps always advance, and so only the magnitude of the difference needs to be encoded. Lengths are small (8 bytes or less) and are often 1, 2, or 4 bytes, and so little is to be gained by difference encoding that field. Virtual address space identifiers simply don't change often, and when they do, there's no similarity between the previous and new values; therefore, we encode those in a new-value-or-no-change format. Finally, virtual addresses are likely to be similar to one another, although the difference may be positive or negative, requiring that we encode both the sign and the magnitude of the difference.

Using this technique exposes a semantic level of similarity between values in the raw reference trace that a general purpose compressor can exploit. Combining these techniques reduces the storage requirements of traces substantially.
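
To make the transformation concrete, the following sketch shows one way to perform this difference encoding, along with the inverse decoding. It is written in Python, assumes that each raw record has already been parsed into its five fields, and is only an illustration; it is not the encoder that ships with Laplace.

    # A minimal sketch of difference encoding for raw trace records, and the
    # inverse decoding.  Each raw record is assumed to be a parsed tuple
    # (tag, timestamp, length, vas, addr) of a one-character tag and four
    # integers.  Illustration only; not the encoder distributed with Laplace.

    def diff_encode(records):
        prev = None
        for tag, time, length, vas, addr in records:
            if prev is None:
                # First record: emit absolute values.
                yield "%s %x %x %x + %x" % (tag, time, length, vas, addr)
            else:
                dt = time - prev[0]                  # timestamps only advance
                dv = 0 if vas == prev[1] else vas    # 0 means "no change"
                da = addr - prev[2]
                sign = '+' if da >= 0 else '-'
                yield "%s %x %x %x %s %x" % (tag, dt, length, dv, sign, abs(da))
            prev = (time, vas, addr)

    def diff_decode(lines):
        time = vas = addr = None
        for line in lines:
            tag, f1, length, f2, sign, f3 = line.split()
            if time is None:
                time, vas, addr = int(f1, 16), int(f2, 16), int(f3, 16)
            else:
                time += int(f1, 16)
                if int(f2, 16) != 0:
                    vas = int(f2, 16)
                addr += int(f3, 16) if sign == '+' else -int(f3, 16)
            yield (tag, time, int(length, 16), vas, addr)

Applied to the three raw records above, diff_encode produces exactly the encoded lines shown (apart from the column alignment added there for readability).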

For a difference encoded trace to be processed, it must be decoded into its original form. Note that the simulator, using the decoded trace as input, will have to process the complete, original trace. The virtue of lossless reduction -- that the original trace is maintained -- is also its failing, as it is the number of trace records that a simulator must process that dictates the runtime of the simulation. Therefore, lossless reduction only reduces storage, and not runtime. To improve runtime, we must turn to lossy techniques.


Lossy reduction

We consider methods of trace reduction that selectively discard information from an original trace in order to achieve two goals:

  1. Reduction of simulation time: By reducing the total number of records in the trace, a lossy approach leaves less processing for a simulator to perform.
  2. Greater reduction in storage requirements: Lossless methods can substantially compress the representation of a trace. However, even tens of seconds of execution time will produce an overwhelming amount of trace data, even when difference encoded and text-compressed. Lossy techniques allow for far greater levels of storage reduction, decreasing trace sizes by orders of magnitude.

The basic premise behind most lossy reduction methods is that a vast majority of references are to pages that will already be resident and in no danger of eviction for any sensible policy. That is, most references are to pages already clearly in any ``working set''. Processing all of those references is superfluous, as each such referencing event does little or nothing to alter the contents of memory by forcing a significant event (a fetch or eviction of a page) to occur. Lossy reduction involves the removal of such ``useless'' memory references, leaving only those that would likely cause a significant event to occur.

Any such lossy method must rely on assumptions about the memory management policies that will be simulated. Given such assumptions, some lossy methods can provide certain guarantees about the amount of error introduced into the resultant simulation. No lossy reduction method should be used without an understanding of the assumptions on which it relies and of the policies to be simulated. No lossy reduction method is applicable to all policies, and a given method may impose different kinds of simulation error upon different policies.

We have experimented with SAD, a lossy trace reduction technique that assumes LRU or LRU-like page replacement. SAD takes a reduction memory size as a parameter. If r is that chosen memory size, then SAD guarantees that a simulation of the reduced trace under the LRU replacement policy will yield exact results for any memory of at least r pages. That is, the reduction memory size is chosen at reduction time as the smallest memory size likely to be simulated. Any LRU simulation using the reduced trace will contain no error (compared to the same simulation using the original, unreduced trace) so long as the simulated memory size is at least the reduction memory size.

We have found this guarantee to be useful for many common virtual memory experiments. Since real systems use approximations of LRU, and because SAD has been shown to introduce small error (less than 1 percent) into such simulations, SAD reduces the storage size and simulation time for such experiments with minimal interference.
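
To make the flavor of this kind of reduction concrete, the following sketch filters a page reference sequence by dropping any reference to a page that is among the r most recently used pages; such a reference cannot miss in any LRU memory of at least r pages. This deliberately simplified filter is only an illustration of the idea behind lossy reduction; it is not the actual SAD algorithm and does not carry SAD's exactness guarantee.

    # A deliberately simplified filter (NOT the actual SAD algorithm): keep
    # only references to pages that lie deeper than r in the LRU stack, since
    # only those references can miss in an LRU memory of at least r pages.

    from collections import OrderedDict

    def lru_filter(page_refs, r):
        stack = OrderedDict()                    # most recently used page last
        for page in page_refs:
            if page not in list(stack)[-r:]:     # not among the r MRU pages
                yield page                       # keep: a potential fault
            stack.pop(page, None)
            stack[page] = True                   # move page to the MRU position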

SAD and timestamps: SAD achieves its reduction by finding references that it can simply eliminate from the original trace. While that elimination is beneficial to storage and simulation time, it also implies that some desired information may be lost. In particular, Laplace traces contain timing information that we would like to maintain.

Therefore, we have modified SAD to maintain three different kinds of ``time'':

  1. Cycle time: The timestamps in the original trace, taken from the processor's cycle counter at tracing time, remain in those trace records that were not eliminated. Thus the gaps in cycle time between records may be much larger than in the original trace, and the knowledge of events that occurred during those gaps has been lost. However, the time at which the remaining records occurred is still available information.
  2. Reference time: It is possible to mark time by the number of references that have occurred. The original trace allows a simulator to track reference time simply by counting the number of references that it has processed. However, SAD eliminates references. Therefore, each record in a reduced trace contains an additional field that indicates a reference timestamp of that record. (In other words, it contains the number of that reference in the original, unreduced reference sequence.)
  3. Instruction time: It is also possible to mark time by the number of instructions that have been processed. Given the original trace, a simulator could track instruction time by counting the number of instruction fetches. As with reference time, the elimination of references can interfere with the tracking of instruction time. Therefore, SAD also introduces an additional field in each record of the reduced trace that provides an instruction timestamp.

Using these time metrics, it is possible to perform the scheduling of threads based on the information in reduced traces. A simulator that uses these reduced traces must take more care in tracking the use of time, as interrupts may occur at any point in between the remaining references. However, even with a small reduction memory size of a few pages (16 to 32), an order of magnitude reduction in storage and simulation time can be achieved.
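
As an illustration of how these counters might be carried onto the surviving records, consider the following sketch. The record layout, the 'i' tag for instruction fetches, and the keep() predicate (standing in for SAD's reduction decision) are all assumptions made for the sake of the example.

    # Sketch: carry reference and instruction timestamps onto kept records.
    # Each raw record is assumed to be (tag, cycle_time, vas, page), with an
    # assumed tag of 'i' for instruction fetches; keep() stands in for the
    # reduction decision that SAD actually makes.

    def stamp_and_filter(records, keep):
        ref_time = 0                  # references seen so far
        instr_time = 0                # instruction fetches seen so far
        for tag, cycle_time, vas, page in records:
            ref_time += 1
            if tag == 'i':
                instr_time += 1
            if keep(vas, page):
                # Cycle time comes from the original record; the other two
                # timestamps are the counters at this point in the trace.
                yield (tag, cycle_time, instr_time, ref_time, vas, page)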

SAD-reduced trace format: Once a trace has been reduced with SAD, it takes on the following format:

    r 123456789abcdef0 1bd8ef123 2ca4563118 9f8e7 800a3
      
  1. Type tag: An 8-bit character indicating whether a read or write was performed.
  2. Cycle timestamp: The 64-bit, hexadecimal cycle time at which this reference occurred.
  3. Instruction timestamp: The 64-bit, hexadecimal instruction time at which this reference occurred.
  4. Reference timestamp: The 64-bit, hexadecimal reference time at which this reference occurred.
  5. Virtual address space ID: The 20-bit page number of the top-level page table for the address space used by this reference.
  6. Virtual page number: The 20-bit page number referenced.

Note that only page numbers are emitted, and not addresses. SAD implicitly performs blocking -- the collection of multiple, consecutive references to the same block (or page) into a single record. Also, since byte-level addresses are lost, these records do not indicate the number of bytes accessed by each reference. SAD is careful to detect cases where a single original reference may have crossed a page boundary, and thus treats it as references to two pages. (This handling of two-page references is reasonable, as two virtual-to-physical translations would need to be performed in a real system.)
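
For instance, the translation from a byte-level reference to the page number or numbers it touches, including the boundary-crossing case, could look like the following sketch. The 4 KB page size is an assumption for illustration; the actual page size depends on the traced system.

    # Sketch: map a byte-level reference (address, length in bytes) to the
    # page number(s) it touches, assuming 4 KB pages.

    PAGE_SHIFT = 12                       # 4096-byte pages (an assumption)

    def pages_touched(addr, length):
        first = addr >> PAGE_SHIFT
        last = (addr + length - 1) >> PAGE_SHIFT
        # Most references fall within one page; a reference that crosses a
        # page boundary is recorded as references to both pages.
        return list(range(first, last + 1))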


Combining methods

The lossy and lossless methods we've described here are complementary. Specifically, the lossy method should be applied first, as it requires the information in the original trace to perform its reduction. The lossless method can then be applied, using a difference encoding designed specifically for SAD-reduced traces.

Given the description of the fields for each record in a SAD-reduced trace, given above, we can easily devise a difference encoding. The timestamp fields all advance from one record to the next, and so the magnitude of the difference is all that needs to be maintained. As described in the section on lossless reduction above, virtual address space identifiers tend to be identical for a long sequence of references and then change to a new value, thus allowing a new-value-or-no-change encoding. Virtual page numbers, like virtual addresses, can change in both positive and negative directions, and the differences are typically of small magnitude, and so both the sign and magnitude of the change are kept.

Therefore, a trace that has been reduced first with SAD and then with difference encoding will take the following form:

    r 100 232 25e 0 + a3
      
  1. Type tag
  2. Cycle time difference
  3. Instruction time difference
  4. Reference time difference
  5. Virtual address space ID, where 0 implies no change
  6. Sign of the virtual page number difference
  7. Magnitude of the virtual page number difference

Use of general purpose text-compression utilities on this trace format makes it possible to trace many minutes of execution. Note, however, that even with this combination of strategies, and using a 64 page reduction memory size, tracing of 5 minutes of execution would require tens of GB of storage. So many memory references occur during the course of normal processing that it is always beneficial to find new ways of reducing traces in both lossy and lossless forms.


Multi-programming, shared spaces, and reduction

The techniques described above have been applied to the original raw reference trace generated by Laplace. However, that original sequence reflects references performed by many threads, composing multiple processes, scheduled in one particular order by the kernel that controls the traced system. If the references performed by each thread are to be separated, as the per-process merger does, then the use of trace reduction has new implications that we must consider.

First, we observe that the lossless techniques pose no problems: prior to separating the references on a per-thread basis, the difference encoding can be decoded, and then the new, per-thread traces can themselves be difference encoded, as sketched below. However, the situation is not so simple for SAD (and for other lossy reduction techniques).
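
Here is a sketch of that lossless pipeline, reusing the hypothetical diff_encode and diff_decode helpers from the earlier sketch and splitting on the virtual address space identifier as a stand-in for the separation that the per-process merger actually performs.

    # Sketch: decode a difference-encoded raw trace, separate its records by
    # virtual address space, and difference-encode each per-space trace.
    # Reuses the hypothetical diff_decode/diff_encode sketches from above.

    from collections import defaultdict

    def split_and_reencode(encoded_lines):
        per_space = defaultdict(list)
        for record in diff_decode(encoded_lines):
            tag, time, length, vas, addr = record
            per_space[vas].append(record)
        return {vas: list(diff_encode(records))
                for vas, records in per_space.items()}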

Multi-programming and SAD: Since references to the various pages from different virtual address spaces are mixed together in the original trace, the reduction of that original trace applies to that mixed group of pages. Therefore, if we used a trace reduced by SAD with an r-page reduction memory to simulate those pages under the global LRU (gLRU) policy, then we would only need to ensure that the simulated memory size was at least r pages.

However, we may wish to simulate a policy that manages the pages for each virtual address space (that is, each process) separately. We can imagine that an LRU queue is maintained for each address space, and that some other policy arbitrates between these per-process queues. In this case, we can conservatively guarantee that the LRU management of pages for each process will be fully accurate so long as the allocation to each process is at least r pages. Put differently, the first r positions of the LRU queue for each address space may be out of order, but positions beyond that will be correct.

Shared spaces and SAD: Unfortunately, the above assertion is true only for completely isolated, unshared virtual address spaces. In real systems, multiple virtual pages, potentially in different virtual address spaces, may map to the same underlying canonical page (e.g., the mapping of a file into two address spaces). These shared spaces break SAD's guarantees of complete accuracy.

Consider that SAD would eliminate some references based on the original sequence as observed in the raw reference trace. Then consider that the references may be divided on a per-thread basis by the per-process merger. Finally, a simulator may schedule the threads and their references in a different order. If two threads, taken from two processes, addressed a shared space through their respective virtual address spaces, then the simulated rescheduling may change the order of references to that single, shared space. Therefore, references that SAD had removed as "irrelevant" may become relevant again due to the re-ordering.

In this case, the reduction memory size r becomes only an inexact control for the error introduced into simulation. For such shared spaces, small values of r are less likely to introduce error, but cannot be guaranteed not to do so. Larger values of r increase the probability of error. No empirical study has been done to measure the relationship between r and the error introduced for simulation of shared spaces. The amount of error will depend not only on r, but also on the characteristics of the scheduler: the more often a different thread is scheduled, the more likely it is that inaccuracy will be introduced.

We believe that, in practice, the amount of error introduced in this situation is likely to be small. However, use of smaller values of r is advised. Unfortunately, the smaller the reduction memory, the less substantial the benefits of SAD.


Scott F. Kaplan
Last modified: Thu Dec 5 17:22:17 CST 2002