Search This Blog

Nov 11, 2012

GoldMine: Automatic Assertion Generation Using Data Mining and Static Analysis


GoldMine: Automatic Assertion Generation Using Data Mining and Static Analysis
Shobha Vasudevan, David Sheridan, Sanjay Patel, David Tcheng, Bill Tuohy, and Daniel Johnson. 2010. GoldMine: automatic assertion generation using data mining and static analysis. InProceedings of the Conference on Design, Automation and Test in Europe (DATE '10). European Design and Automation Association, 3001 Leuven, Belgium, Belgium, 626-629.
Summary
In this paper, the authors introduce a novel automatic assertion-generation methodology for RTL designs in which they strive to leverage the use of data mining techniques and static analysis to create insightful assertions. In comparison to traditional, handcrafting assertion generation techniques, in which engineers first manually deduce the assertions and then simulate the design, this approach runs a great amount of random simulation first in order to acquire enough simulation trace of the design. The data, along with/without the constraints analyzed by the static analyzer, is then fed into the data miner, in which the supervised, decision tree based learning algorithm is employed, to construct the propositional and temporal understanding of the design. Because of the incomplete nature of these steps, spurious assertions can be generated. Thus, the authors use formal verification tools the check the validity of the collected knowledge, and feedback the counterexamples to the miner as learnt constraints. Finally, human judgments which rank the assertions into four levels are also fed back to the miner for it to refinement the result accordingly. Besides, they demonstrate the effects brought by the evolutions of the underlying algorithm. The combined results seem to be very promising, even in comparison to commercial directed-random-simulation tools.

Comments
The idea seems simple but effective. Because of the logical nature of the designs, there exist patterns in signal interactions. Using data mining technique can not only explore the intended local implications, but also capture the unintended global behavioral correlations, which are hard to find in modularized designs. According to their experimental results, in a few cases, the assertions can astonishingly capture temporal relationships that are more than 6 logic-level deep in the design.
However, several basic assumptions and demonstrations, in my opinion, look either trivial or problematic. Let’s talk about the trivial parts first (mainly in the Case Study section):
(1)   The need for random generation is trivial. Since the authors employs data mining techniques, a great amount of data is undoubtedly necessary, in order to train the miner. This step, however, is basic for a data mining process, and need not to be advertised in the paper.
(2)   The double sampling technique looks useful at the first glance. However, the authors do not provide any example to demonstrate the quality of the extra assertions found by this technique. In this case, we will think that it is natural to introduce a lot of trivial temporal assertions like “if a_wire is 1 @ (negedge clock), a_reg will also be 1 @ (next posedge clock)”
(3)   The static analyzer doesn’t look essential. The two decision trees in the demonstrative example are not much different. The only difference locates at the hit rate; however, this difference also looks problematic and will be discussed again later.
Let’s go to the problematic parts. Both the basic assumption of this approach and the experimental results do not seem very much correct to me:
(1)   I think that the basic idea that generating the assertions based on the designs behavior is incorrect. If we want to generate tests to validate a DUV (noting that the correctness of the DUV’s behaviors is still unknown), how can we generate assertions based on the DUV’s behavior?
(2)   How do they apply the formal verification on explored assertions? Do they have a golden spec of the design?
(3)   In the demonstrative example, why would the assertion A2 fail to pass the formal verification? The signal int.L1_hit looks equivalent to int.has_dreg in Fig. 2.
(4)   In Case Study, why the number of assertions grows rapidly after the authors add the static analysis to the algorithm? It seems like that the solution space should become smaller because the static analysis results serve as constraints to the space.
There are several other shortcomings of this paper:
(1)   The editing is not done carefully. While the contents of the paper is easy to understand, there exist several typos and thus make the paper less convincing.
(2)   This paper is only 4-page long
Unfortunately, due to the space limit, the authors do not elaborate these parts.

Discussion Points
(1)   The double sampling technique looks useful at the first glance. However, the authors do not provide any example to demonstrate the quality of the extra assertions found by this technique. In this case, we will think that it is natural for this technique to introduce a lot of trivial temporal assertions like “if a_wire is 1 @ (negedge clock), a_reg will also be 1 @ (next posedge clock).” What kind of useful assertions can be found only be double sampling?
(2)   If we want to generate tests to validate a DUV (noting that the correctness of the DUV’s behaviors is still unknown), how can we generate assertions based on the DUV’s behavior?
(3)   In Case Study, why the number of assertions grows rapidly after the authors add the static analysis to the algorithm? It seems like that the solution space should become smaller because the static analysis results serve as constraints to the space.
(4)   Data mining is very time consuming. How well can this approach scale?

No comments:

Post a Comment