TAB-BackSpace:
Unlimited-Length Trace Buffers with Zero Additional On-Chip Overhead
Flavio M. de Paula, Amir Nahir, Ziv Nevo, Avigail Orni, and Alan J. Hu. 2011. TAB-BackSpace: unlimited-length trace buffers with zero additional on-chip overhead. In Proceedings of the 48th Design Automation Conference (DAC '11). ACM, New York, NY, USA, 411-416.
Summary
BackSpace is a promising technique that uses formal methods to backtrace from a
failure to its root cause. It uses pre-image computation to realize an
unlimited-length trace buffer. However, because it manipulates state bits in the
concrete state space, it is extremely time consuming. Abstract BackSpace was then
proposed to reduce the runtime by focusing only on “visible” state bits, without
producing any false negatives. While this seems to produce a large improvement,
it still needs a formal engine to compute the pre-images. In addition, the
abstraction introduces false positives from (1) spurious transitions in the
trace and (2) false matches (breakpointing at the wrong time, which occurs
because one abstract state corresponds to many concrete states).
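To make the false-match problem concrete, here is a minimal Python sketch. The 4-bit states, the choice of visible bits, and the helper names (`abstract`, `breakpoint_fires`) are hypothetical and not taken from the paper. The point is only that a breakpoint that compares visible bits cannot distinguish two concrete states that agree on those bits.

```python
from typing import Tuple

State = Tuple[int, ...]  # one concrete state = a tuple of state bits


def abstract(state: State, visible: Tuple[int, ...]) -> State:
    """Project a concrete state onto the visible bit positions only."""
    return tuple(state[i] for i in visible)


# Hypothetical 4-bit design in which only bits 0 and 1 are observable.
VISIBLE = (0, 1)

target = (1, 0, 1, 1)     # concrete state we actually want to stop at
lookalike = (1, 0, 0, 0)  # different concrete state, same visible bits


def breakpoint_fires(current: State, wanted: State) -> bool:
    """A breakpoint that can only compare the visible bits."""
    return abstract(current, VISIBLE) == abstract(wanted, VISIBLE)


# The breakpoint fires on the lookalike even though it is not the target:
# this is a false match caused purely by the abstraction.
false_match = breakpoint_fires(lookalike, target) and lookalike != target
```

In this toy setting, any concrete state whose first two bits are `1, 0` triggers the breakpoint, so the run may stop at the wrong cycle.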
To address these problems, TAB-BackSpace is proposed to both reduce the runtime
and minimize the risk of spurious traces. Like BackSpace, it needs to rerun the
test many times to extend the length of the back trace, and it needs a
configurable breakpoint mechanism to link successive runs. The improvements
are: (1) TAB-BackSpace can trace back many cycles of history at once, while
BackSpace can compute only one step of pre-image at a time; (2) TAB-BackSpace
depends entirely on the contents recorded in the trace buffer, which are the
actual history of the chip execution, so it does not spend time on expensive
pre-image computation; and (3) because TAB-BackSpace finds trace overlap by
directly matching trace buffer contents without pre-image computation, one
source of spurious traces (spurious transitions) is eliminated entirely. The
other source, false matches, is also suppressed, because matching multiple
cycles of history makes the probability of a false match very small. According
to the experimental results, this approach reveals the root cause of a real bug
in the IBM POWER7 within 10 iterations.
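The rerun-and-stitch loop described above can be sketched in Python. This is a toy model under stated assumptions, not the authors' implementation: the chip is replaced by a fixed list of abstract states, the breakpoint fires on a single state value, and the names `run_until`, `tab_backspace`, `depth`, and `overlap` are all hypothetical. A real flow would retry or pick another breakpoint after a rejected match; this sketch simply stops.

```python
from typing import List


def run_until(execution: List[int], breakpoint: int, depth: int) -> List[int]:
    """Replay the toy execution and stop at the first cycle whose state
    equals `breakpoint`; return the trace buffer (last `depth` states,
    oldest first, including the breakpoint state)."""
    for cycle, state in enumerate(execution):
        if state == breakpoint:
            return execution[max(0, cycle - depth + 1): cycle + 1]
    raise RuntimeError("breakpoint never hit")


def tab_backspace(execution: List[int], crash_state: int,
                  depth: int, overlap: int, max_iters: int) -> List[int]:
    """Extend the trace backwards by rerunning with earlier breakpoints
    and stitching buffers that agree on `overlap` cycles."""
    trace = run_until(execution, crash_state, depth)
    for _ in range(max_iters):
        if len(trace) < overlap:
            break
        # Break on the state that should end the overlapping region.
        buf = run_until(execution, trace[overlap - 1], depth)
        # Accept the new buffer only if its tail matches the head of the
        # trace over several cycles; this is what rejects false matches.
        if buf[-overlap:] != trace[:overlap]:
            break
        new_part = buf[:-overlap]
        if not new_part:  # nothing older to add: start of execution reached
            break
        trace = new_part + trace
    return trace


# Toy execution with 30 unique states; the "crash" is the last one.
EXEC = list(range(100, 130))
full = tab_backspace(EXEC, 129, depth=8, overlap=3, max_iters=10)

# Same execution, but state 124 also occurs early, so the breakpoint
# fires at the wrong time and the overlap check rejects the buffer.
EXEC_DUP = list(range(100, 130))
EXEC_DUP[5] = 124
rejected = tab_backspace(EXEC_DUP, 129, depth=8, overlap=3, max_iters=10)
```

With an 8-entry buffer and a 3-cycle overlap, each accepted iteration adds 5 new cycles of history, so `full` recovers the entire 30-cycle toy execution. In the second run, the duplicated state triggers the breakpoint too early, the 3-cycle comparison fails, and `rejected` keeps only the original 8-cycle buffer.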
Comments:
In this paper, the authors take inspiration from BackSpace, a technique that
achieves unlimited-length backtracing by rerunning the tests and formally
computing the pre-image of the trace, and propose a more efficient backtracing
technique: TAB-BackSpace. One advantage of TAB-BackSpace over BackSpace is that
it doesn’t need time-consuming formal techniques to compute pre-images. The
other advantage is that it needs only basic debugging hardware (trace buffers
and breakpoint-related logic), which many state-of-the-art post-silicon
validation techniques already require, so it adds no extra on-chip area overhead.
While I consider this approach a very effective improvement over
BackSpace, I still have the following concerns:
(1) This approach cannot guarantee that the traces it finds
are short. Shortcuts may exist from the root cause to the breakpoint or
the crash state; however, because the approach is simulation-based in nature,
the search in each step is incomplete and depends heavily on the test patterns
or test programs. Thus, TAB-BackSpace may take a detour and fail to backtrace
to the root cause before reaching the runtime or iteration limit.
(2) Although the authors claim that they can handle
non-determinism/randomness to some extent, the approach still cannot cope with
electrical errors, because most such errors may be irreproducible.
Discussion
1. In the experiment, the authors demonstrate that
TAB-BackSpace can trace back to the root cause of the bug by backtracing about
2000 cycles within 10 iterations. Is this distance (from the root cause to the
crash state) typical? Can the distance for some bugs be several million cycles
or more?
2. Is this technique practical for NoC or multicore
processor validation? Although the authors claim that they can handle
non-determinism/randomness, TAB-BackSpace may have to run many extra
iterations, or even need luck, to reproduce the same partial trace.
3. The claim “zero additional on-chip overhead” looks
like a marketing slogan. The paper criticizes BackSpace for using too much area
on signature storage and the breakpoint circuit. However, TAB-BackSpace also
needs area for the trace buffer and the breakpoint circuit. Does it provide any
improvement in area?