Next: 2 Design Considerations
Up: Design of an Address Tracing System
Previous: Contents
The purpose of this project is to design and build a system that
records address traces from real applications running on an Intel
80x86-based PC. By an address trace we mean the list of all memory
addresses requested by the CPU for memory reads (code and data) and
writes (data) as a program runs. Such traces can be used to test
memory caching strategies and their effectiveness with applications
that reference memory in different ways.
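As an illustration of how a recorded trace might be used, the sketch below replays a short, made-up list of (address, kind) records through a direct-mapped cache model and counts hits and misses. The record layout, the cache parameters, and the name `simulate_direct_mapped` are assumptions chosen for the example; they are not part of the tracing system described here.

```python
def simulate_direct_mapped(trace, num_lines=4, line_size=16):
    """Replay (address, kind) records through a direct-mapped cache model.

    Returns (hits, misses). Writes are treated like reads (write-allocate),
    which is a simplifying assumption of this sketch.
    """
    tags = [None] * num_lines          # tag currently held by each cache line
    hits = misses = 0
    for address, _kind in trace:
        block = address // line_size   # memory block containing this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # evict whatever was there before
    return hits, misses


# Hypothetical trace: kind is "C" (code read), "R" (data read) or "W" (data write).
example_trace = [
    (0x1000, "C"), (0x1004, "C"), (0x2000, "R"),
    (0x1008, "C"), (0x2004, "W"), (0x1040, "C"),
]
hits, misses = simulate_direct_mapped(example_trace)
print(f"hits={hits} misses={misses}")
```

With these made-up parameters the code and data addresses all map to the same cache line and keep evicting each other, which is the kind of behaviour a real trace would let one measure for different cache organizations.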
We must consider problems of speed and storage space. The hardware must
be fast enough to read and latch valid addresses as they appear on the
system bus, and should be able to place the CPU in a "stall" state
while the addresses are stored to disk. Storage space is a problem
because a PC makes millions of memory references per second, and we
want to store several bytes of address and other information for each
reference. A further problem is that, to record all memory references
made by the CPU, we must disable any on-chip cache, such as that of
the i486.
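To give a feel for the scale of the storage problem, the following back-of-the-envelope sketch multiplies a reference rate by a record size. All three figures (5 million references per second, 8 bytes per record, a 500 MB disk) are assumed values picked only for illustration; the text states only "millions" of references and "several bytes" per reference.

```python
def trace_data_rate(refs_per_second, bytes_per_ref):
    """Bytes of trace data generated per second of untraced execution."""
    return refs_per_second * bytes_per_ref


def seconds_until_full(capacity_bytes, refs_per_second, bytes_per_ref):
    """How long a store of the given capacity lasts at the full reference rate."""
    return capacity_bytes / trace_data_rate(refs_per_second, bytes_per_ref)


# Assumed figures, chosen only to illustrate the scale of the problem.
REFS_PER_SECOND = 5_000_000      # stands in for "millions" of references/second
BYTES_PER_REF = 8                # stands in for "several bytes" per reference
DISK_BYTES = 500 * 10**6         # hypothetical 500 MB hard disk

rate = trace_data_rate(REFS_PER_SECOND, BYTES_PER_REF)
lifetime = seconds_until_full(DISK_BYTES, REFS_PER_SECOND, BYTES_PER_REF)
print(f"{rate / 1e6:.0f} MB/s of trace data; disk full after {lifetime:.1f} s")
```

Under these assumptions the trace fills the disk after only seconds of untraced execution time, which is why both stalling the CPU and reducing the data volume matter.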
The major goals of this project are summarized below. An assessment of
which goals were satisfied is presented in section 6.2.
- Simplicity: We will use commonly available components and for
control logic we will use programmable logic devices for ease of
modification. The performance benefit of custom ASICs is not worth
their design time, fabrication time, or cost, and custom parts would
keep others from duplicating the circuitry easily. Although dynamic RAM
would be cheaper, static RAM will be used because it does not require
refresh circuitry and it is faster.
- Modularity: The system will be modularized into a Bus Interface
Subsystem (Rocky) and a Recording Subsystem (Bullwinkle). To build a
system capable of tracing addresses on a different type of bus, such as
a NuBus or S-bus, only a new Interface Subsystem would have to be
built. In fact, Bullwinkle should be designed so that it can be used
for any type of high-speed digital data collection, not limited to
address traces. Once fully tested, Bullwinkle should be installed in a
fast i486 machine with a large capacity hard disk and network
connectivity.
The Bullwinkle card itself is modularized into four identical Buffer
Modules, each composed of an FPGA (Boris) with a closely coupled SRAM
interface (Natasha). Each Boris and Natasha pair can acquire, store, and
extract 16 bits of data, so the four modules together can trace a total
of 64 bits per cycle.
- Expandability: We will design logic which can be easily modified to
support larger buffer memory sizes or wider address bus traces. The
methods of selecting buffer sizes and timing should still be applicable
as technologies change. For example, if disk access rates improve
faster than bus speeds, a smaller buffer will give comparable time
dilation. Since the Natasha memory interface is modularized, increasing
storage capacity would involve redesigning just the Natasha module to
use larger SRAMs. Any multiple of 16 bits in width can be attained by
using the appropriate number of Boris and Natasha units, with a linear
increase in control logic complexity.
- Transparency: The application must be able to run just as it
would without the tracing hardware active. Ideally, the hardware would
be fast enough to read and store the address stream without a
significant slowdown of the program being executed. The possible
bottlenecks include latch delays from the bus to the interface card,
transmission delays from the interface card to the recording subsystem
hardware, and perhaps most significantly, hard disk write times on the
recording subsystem. In addition, having to disable the on-chip cache
of the i486 will inevitably slow down execution. The factor of
slowdown with respect to normal CPU operation is the time
dilation. Our goal is to achieve a time dilation of no more than 10%
above the physically dictated throughput ratio, which equals the address
reference generation rate divided by the memory-to-disk transfer rate.
- Testability: Both Rocky and Bullwinkle
should have headers suitable for connection to a logic analyzer for
debugging purposes. Buffer usage statistics should be available to
determine the ideal usage of the on-board memory banks. For testing
address trace accuracy, a small program should be simulated by hand and
run as a test application.
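The time dilation goal stated under Transparency can be written as a small check. This is a sketch under stated assumptions: "10% above" is read here as 1.1 times the throughput ratio, the function names are hypothetical, and the example rates are invented rather than measured.

```python
def time_dilation(traced_seconds, untraced_seconds):
    """Factor of slowdown relative to normal (untraced) CPU operation."""
    return traced_seconds / untraced_seconds


def throughput_ratio(generation_rate, disk_rate):
    """Address reference generation rate divided by memory-to-disk rate.

    Both rates must use the same units (for example bytes per second).
    """
    return generation_rate / disk_rate


def meets_goal(measured_dilation, generation_rate, disk_rate, margin=0.10):
    """True if the dilation is at most `margin` above the throughput ratio.

    Assumption: "10% above" means (1 + 0.10) * ratio.
    """
    limit = (1.0 + margin) * throughput_ratio(generation_rate, disk_rate)
    return measured_dilation <= limit


# Invented example: 40 MB/s of trace data against a 2 MB/s disk path.
ratio = throughput_ratio(40e6, 2e6)
print(f"throughput ratio = {ratio:.1f}")
print("goal met at 21.5x:", meets_goal(21.5, 40e6, 2e6))
print("goal met at 23.0x:", meets_goal(23.0, 40e6, 2e6))
```

The ratio is the floor set by the hardware: the disk path cannot absorb data faster than its transfer rate, so the goal only constrains the overhead added on top of that floor.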
A secondary goal is to give Bullwinkle a
hardware-based compression scheme that will reduce the amount of data
which must be written to the hard disk. Corresponding software can be
written to decompress the address trace at a later time, perhaps on a
machine with more on-line storage. The only reason this goal is
secondary is that the address trace system should be functional with or
without data compression. However, compression may be critical in
reducing the bottleneck of hard disk write times, resulting in a smaller
time dilation.
The hardware-based compression should be accomplished using
VHDL synthesized logic in the Boris FPGA modules because of speed of
implementation and ease of modification. The compression hardware
could even be reprogrammed through software via the PC bus, allowing
the user to choose from among several algorithms depending on the type
of data (address traces, network streams, etc.) being collected.
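No compression algorithm is named here. Purely as an illustration of the compress-then-decompress-in-software idea, the sketch below uses delta encoding, in which each address is stored as its difference from the previous one. Delta encoding is a choice made for this example, not the scheme selected for Boris, and the function names and sample trace are hypothetical.

```python
def delta_encode(addresses):
    """Replace each address with its difference from the previous one."""
    deltas = []
    previous = 0
    for address in addresses:
        deltas.append(address - previous)
        previous = address
    return deltas


def delta_decode(deltas):
    """Rebuild the original addresses by accumulating the deltas."""
    addresses = []
    current = 0
    for delta in deltas:
        current += delta
        addresses.append(current)
    return addresses


# Made-up trace: sequential code fetches with one data access in between.
trace = [0x1000, 0x1004, 0x1008, 0x100C, 0x2000, 0x1010]
encoded = delta_encode(trace)
decoded = delta_decode(encoded)
print(encoded)
print(decoded == trace)
```

Sequential fetches turn into small, repeated deltas, which a later coding stage could store in fewer bits than full addresses; whether that saving is worthwhile for real traces would have to be measured.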
Scott E. Harrington
Sat Apr 29 18:56:25 EDT 1995