Search This Blog

Nov 11, 2012

TAB-BackSpace: Unlimited-Length Trace Buffers with Zero Additional On-Chip Overhead


TAB-BackSpace: Unlimited-Length Trace Buffers with Zero Additional On-Chip Overhead
Flavio M. de Paula, Amir Nahir, Ziv Nevo, Avigail Orni, and Alan J. Hu. 2011. TAB-BackSpace: unlimited-length trace buffers with zero additional on-chip overhead. In Proceedings of the 48th Design Automation Conference (DAC '11). ACM, New York, NY, USA, 411-416. 

Summary
BackSpace is a promising formal-based root-cause backtracing technique. It uses pre-image computation to realize the unlimited-length trace buffer. However, because it manipulates state bits in the concrete space, it is extremely time consuming. Abstract BackSpace was then proposed to reduce the runtime requirement, by focusing only on “visible” state bits, without producing any false negative. While this seems produces a large improvement, it is still needs formal engine to compute the pre-images; besides, this abstraction introduces false positives brought by (1) spurious transitions in the trace and (2) false matches (breakpointing at the wrong time, occurs because of an abstract state is a combination of many concrete states).
Under such a circumstance, TAB-BackSpace is proposed to not only reduce the runtime but also minimize the risk of spurious traces. Similar to BackSpace, this technique needs to rerun the test many times to extend the length of the back trace. It also needs the configurable breakpoint mechanism to serve as the interconnection between runs. However, the improvements are (1) TAB-BackSpace can trace back many cycle of history at once, while BackSpace can only compute one step of pre-image, (2) TAB-BackSpace depends entirely on the contents recorded in the trace buffer, which is the actual history of the chip execution, and doesn’t need to spend time on the monstrous pre-image computation, and (3) because TAB-BackSpace tries to find trace overlap by directly matching the trace buffer without pre-image computation, one source of spurious trace is totally eliminated. The other source of spurious trace, brought by false matches, is also suppressed because it matches multiple cycles of history and makes the probability of false match become very small. According to the experimental result, this approach is able to reveal the root cause of a real bug in IBM POWER 7 within 10 iterations


Comments:
In this paper, the authors are inspired by BackSpace, a technique achieving unlimited-length backtracing by rerunning the tests and formally computing the pre-image of the trace, and propose a more efficient backtracing technique: TAB-BackSpace. One of TAB-BackSpace’s advantages over BackSpace is that it doesn’t need the time-consuming formal techniques to compute pre-images. The other advantage is that it needs only basic debugging hardware components, trace buffers and breakpoint-related logic, which are essential for a lot of state-of-the-art post-silicon validation techniques, and doesn’t cause extra on-chip area overhead.
While I consider this approach a very effective improvement over BackSpace, I still have the following concerns:
(1)     This approach cannot guarantee the traces they found are short enough. Shortcuts can exist from the root cause to the breakpoint or the crash state; however, because of its simulation nature, the search in each step is incomplete and highly depends on the test patterns or test programs. Thus, it is possible for TAB-BackSpace to detour and cannot backtrace to the root cause before reaching the limitation of runtime or iteration.
(2)     Although the authors claim that they can handle non-determinism/randomness to some extent, they still cannot cope with electrical errors because most of these errors can be irreproducible.

Discussion
1.       In the experiment, the authors demonstrate that TAB-BackSpace can trace back to the root cause of the bug by backtracing about 2000 cycles within 10 iterations. Is this distance (from the root cause to the crash state) a normal case? Can the distance of some bugs be more than several millions of cycles long?
2.       Is this technique practical for NoC or multicore processor validation? Although the authors claim that they can handle the non-determinism/randomness, it is possible that TAB-BackSpace has to run a lot of extra iterations, or even needs luck to bring it to the same partial trace.
3.       The claim “zero additional on-chip overhead” looks like a marketing slogan. This paper blames BackSpace for using too much area on signature storage and breakpoint circuit. However, TAB-BackSpace technique also needs the area for the trace buffer and breakpoint circuit. Does it provide any improvement on area?

GoldMine: Automatic Assertion Generation Using Data Mining and Static Analysis


GoldMine: Automatic Assertion Generation Using Data Mining and Static Analysis
Shobha Vasudevan, David Sheridan, Sanjay Patel, David Tcheng, Bill Tuohy, and Daniel Johnson. 2010. GoldMine: automatic assertion generation using data mining and static analysis. InProceedings of the Conference on Design, Automation and Test in Europe (DATE '10). European Design and Automation Association, 3001 Leuven, Belgium, Belgium, 626-629.
Summary
In this paper, the authors introduce a novel automatic assertion-generation methodology for RTL designs in which they strive to leverage the use of data mining techniques and static analysis to create insightful assertions. In comparison to traditional, handcrafting assertion generation techniques, in which engineers first manually deduce the assertions and then simulate the design, this approach runs a great amount of random simulation first in order to acquire enough simulation trace of the design. The data, along with/without the constraints analyzed by the static analyzer, is then fed into the data miner, in which the supervised, decision tree based learning algorithm is employed, to construct the propositional and temporal understanding of the design. Because of the incomplete nature of these steps, spurious assertions can be generated. Thus, the authors use formal verification tools the check the validity of the collected knowledge, and feedback the counterexamples to the miner as learnt constraints. Finally, human judgments which rank the assertions into four levels are also fed back to the miner for it to refinement the result accordingly. Besides, they demonstrate the effects brought by the evolutions of the underlying algorithm. The combined results seem to be very promising, even in comparison to commercial directed-random-simulation tools.

Comments
The idea seems simple but effective. Because of the logical nature of the designs, there exist patterns in signal interactions. Using data mining technique can not only explore the intended local implications, but also capture the unintended global behavioral correlations, which are hard to find in modularized designs. According to their experimental results, in a few cases, the assertions can astonishingly capture temporal relationships that are more than 6 logic-level deep in the design.
However, several basic assumptions and demonstrations, in my opinion, look either trivial or problematic. Let’s talk about the trivial parts first (mainly in the Case Study section):
(1)   The need for random generation is trivial. Since the authors employs data mining techniques, a great amount of data is undoubtedly necessary, in order to train the miner. This step, however, is basic for a data mining process, and need not to be advertised in the paper.
(2)   The double sampling technique looks useful at the first glance. However, the authors do not provide any example to demonstrate the quality of the extra assertions found by this technique. In this case, we will think that it is natural to introduce a lot of trivial temporal assertions like “if a_wire is 1 @ (negedge clock), a_reg will also be 1 @ (next posedge clock)”
(3)   The static analyzer doesn’t look essential. The two decision trees in the demonstrative example are not much different. The only difference locates at the hit rate; however, this difference also looks problematic and will be discussed again later.
Let’s go to the problematic parts. Both the basic assumption of this approach and the experimental results do not seem very much correct to me:
(1)   I think that the basic idea that generating the assertions based on the designs behavior is incorrect. If we want to generate tests to validate a DUV (noting that the correctness of the DUV’s behaviors is still unknown), how can we generate assertions based on the DUV’s behavior?
(2)   How do they apply the formal verification on explored assertions? Do they have a golden spec of the design?
(3)   In the demonstrative example, why would the assertion A2 fail to pass the formal verification? The signal int.L1_hit looks equivalent to int.has_dreg in Fig. 2.
(4)   In Case Study, why the number of assertions grows rapidly after the authors add the static analysis to the algorithm? It seems like that the solution space should become smaller because the static analysis results serve as constraints to the space.
There are several other shortcomings of this paper:
(1)   The editing is not done carefully. While the contents of the paper is easy to understand, there exist several typos and thus make the paper less convincing.
(2)   This paper is only 4-page long
Unfortunately, due to the space limit, the authors do not elaborate these parts.

Discussion Points
(1)   The double sampling technique looks useful at the first glance. However, the authors do not provide any example to demonstrate the quality of the extra assertions found by this technique. In this case, we will think that it is natural for this technique to introduce a lot of trivial temporal assertions like “if a_wire is 1 @ (negedge clock), a_reg will also be 1 @ (next posedge clock).” What kind of useful assertions can be found only be double sampling?
(2)   If we want to generate tests to validate a DUV (noting that the correctness of the DUV’s behaviors is still unknown), how can we generate assertions based on the DUV’s behavior?
(3)   In Case Study, why the number of assertions grows rapidly after the authors add the static analysis to the algorithm? It seems like that the solution space should become smaller because the static analysis results serve as constraints to the space.
(4)   Data mining is very time consuming. How well can this approach scale?

Nov 8, 2012

Accelerating Microprocessor Silicon Validation Exposing ISA diversity


Accelerating Microprocessor Silicon Validation Exposing ISA diversity
Nikos Foutris, Dimitris Gizopoulos, Mihalis Psarakis, Xavier Vera, and Antonio Gonzalez. 2011. Accelerating microprocessor silicon validation by exposing ISA diversity. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11). ACM, New York, NY, USA, 386-397
SUMMARY
Self-checking-based methods are flourishing in recent development in post-silicon validation, and this paper provides an insightful idea on improving the self-checking mechanism with ISA diversity. It aims at, and succeeds in, 3 major goals: (1) Self checking consistency without the need for golden response, (2) digesting the validation data to provide more refined and useful information, and (3) reducing the effect of blocking bugs. The utilization of the RIT-ERIT methodology to find out the inconsistencies between two executions enables them to point out the possible buggy location on the logic paths. One of the biggest advantages of the proposed methodology, compared to others, is its novel idea about bypassing the failing instructions. This is done by replaying the segment with its equivalent class, and thus can prevent the corrupted data from propagating (that is, bypass the blocking bugs) and can detect much more bugs in a single run. Besides, its post-processing enables only some succinct data to be passed to the engineers, and thus reduce the manual efforts needed.

COMMENTS
The idea in this paper is very novel and useful. As described above, it greatly reduces the redundant information and can detect more bugs in one test by using replaying to resume its normal operation. However, there are several drawbacks exist in this method:
(1) The first step, generating the ISA diversity database, needs through understanding about the underlying architecture, which obviously requires a large amount of human efforts and can be both error-prone and time consuming.
(2) The assumption that the equivalent class of an instruction is bug-free is very strong. Since the equivalent instruction segment may be executed on some same logic path as its counterpart is, the bug existing in the shared parts of circuit could prevent its being detected.
(3) As IFRA does, this method also suffers from a large amount of candidate signals. Especially, since this method identifies only the failing “segments of instructions” which are divided by the store instruction, it could be possible that such segments actually contain most types of instruction and thus leave the search space inreducible.
As for the paper itself, it doesn’t provide a comparison on the “bugs-detected-per-test”, which is claimed as an important improvement in this paper, with other self-checking-based methods. It also doesn’t provide a detailed description about its hardware overhead. Last but not least, the claimed result of 100% bug detection looks unconvincing because bugs could exist in the instructions that have no equivalence class. While this paper declares that their result is superior to Reversi’s because the latter may suffer from portion of instructions that lack of reverse counterpart, it should also notice that the same situation could also happen to their proposed methods.

DISCUSSION POINTS
1.     Recognizing equivalent classes of instruction sequences needs through knowledge about the architecture and a lot of human efforts. However, this process is time consuming and incomplete, even error-prone. Would it be possible to automatize this step by limiting the length of candidate sequence to a small number and then apply formal methods to find all the possible sequences?
2.     There might be a situation that both an instruction and its equivalence have a bug in their logic path, respectively. How possible is it? How to cope with such situation?
3.     Bug could exist on the logic path of the instructions which have no equivalent class. According to Figure 1, there are at least 8% and up to 20% of chances that such kind of instruction can be executed. How to detect bugs in such scenario?
4.    While attempting to replay the failing segment of the program binary, how is the previous processor state restored?




NOTES
  • The experimental result shows the snapshot when the proposed method reaches 100% bug coverage.
  • Area overhead: Store/Estore addr buffer-> can be implemented using L1 Cache
  • Performance Degradation: can be shut down in final tape-out