I've worked with many of our clients on Behavioral Compiler designs, and have accrued a few helpful tips for users of Synopsys BC. Many engineers have found BC to be extremely effective in reducing design time (5x) when used appropriately. Use these tips to help save you time and stress.
Behavioral Compiler needs to have variables initialized properly
to provide good results. If you fail to initialize the entire variable
in one chunk, BC doesn't recognize the initialization as valid. Failure
to initialize results in warning messages, may create simulation mismatches,
and may create unnecessary registers.
Warning: Variable 'MAIN/ERROR_CONDITON/x_loop_connect' is not initialized (HLS-155)

The following is an example that might puzzle designers expecting more than the tool currently delivers:

    reg [31:0] x;
    ...
    while (COND) begin :WHILE
      if (COND) begin
        x[31:8] = a;
      end //if
      else begin
        x[31:8] = b;
      end //else
      x[7:0] = 0;
      data_out <= x;
      @(posedge clock);
    end //while

Simply adding 'x = 0;' directly after the 'while' and before the 'if' saves a register and a warning message. Although the part-select assignments together cover every bit of x, BC doesn't recognize that x was fully initialized. Note that if BC sees a loop, it will insert a register unless it can determine that the old value is of no use. Initialization solves this. Alternatively, you can assign a register to itself in all branches of a conditional (a pain).
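As a minimal sketch of that fix, here is the same loop with the whole-variable initialization added. The enclosing module, its port list, and the names loop_cond and sel (standing in for the two COND placeholders) are assumptions for illustration, and the code has not been run through any particular BC release:

```verilog
// Hypothetical wrapper module; names and widths are illustrative only.
module init_example (clock, loop_cond, sel, a, b, data_out);
  input         clock;
  input         loop_cond;   // stands in for the 'while (COND)' test
  input         sel;         // stands in for the 'if (COND)' test
  input  [23:0] a, b;
  output [31:0] data_out;
  reg    [31:0] data_out;

  always begin : MAIN
    reg [31:0] x;
    @(posedge clock);
    while (loop_cond) begin : WHILE
      x = 0;                 // initialize the entire variable in one chunk
      if (sel) begin
        x[31:8] = a;
      end //if
      else begin
        x[31:8] = b;
      end //else
      x[7:0] = 0;            // now redundant, kept to match the original
      data_out <= x;
      @(posedge clock);
    end //while
  end //MAIN
endmodule
```

The only functional change from the puzzling version is the single 'x = 0;' line at the top of the loop body.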
I recommend eliminating all BC warning messages. Only occasionally is
it acceptable to leave the warnings.
Behavioral Compiler supports a graphical analysis tool called bc_view.
It's part of the BC package, but is not enabled by default. To turn on
bc_view, be sure to put the following in your scheduling scripts:

    bc_enable_analysis_info = true /* must occur before analyze & elaborate */

It makes the database slightly larger, but makes debug a lot easier. bc_view also makes scheduling reports much more readable. Anyone who has used BC for a real-world design knows the schedule report can be quite unmanageable. With the latest BC release there are features to help isolate sections of the report table, and the cross highlighting between source code, scheduling table and state graph is getting much better. The on-line documentation includes a manual on how to use it.
Two minor related tips:
Behavioral Compiler supports the use of Verilog's 'disable'
to emulate VHDL's 'exit' and 'next'. This works if you
follow the examples closely; however, overuse can lead to problems. There
are five common mistakes.
MISTAKE 1. Disabling the wrong block

    begin :LOOP
      forever begin :BODY
        @(posedge clock);
        if (COND1) disable BODY; // equivalent to VHDL 'NEXT'
        if (COND2) disable LOOP; // equivalent to VHDL 'EXIT'
      end
    end

All too often, engineers leave out the enclosing begin-end pair and put the LOOP label on the inner block instead. It seems more natural; however, Verilog rules dictate that the inner begin block is not connected with the forever, so disabling it only skips the rest of the current iteration.
MISTAKE 2. Asserting outputs and disabling without an intervening clock

    request_out <= 1;
    @(posedge clock);
    begin :LOOP
      forever begin
        if (acknowledge_in == 1'b1) begin
          request_out <= 0;
          disable LOOP;
        end
        @(posedge clock);
      end
    end

Verilog-XL simulation will reveal that request_out never gets set to zero. This behavior differs from VCS, which will provide the expected zero. The reason for the Verilog-XL behavior is that all events scheduled within a block that gets disabled are removed from the event queue. In this case, 0 was scheduled to be placed on request_out, but the disable cancelled it. Verilog semantics consider this behavior unspecified, and hence both simulators are within legal bounds. Inserting @(posedge clock) before the disable will fix this problem from both a simulation and a synthesis point of view.
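A minimal sketch of that fix, wrapped in a hypothetical module so it stands on its own (the module name and port list are assumptions; the signal names follow the fragment above). It has not been run through BC:

```verilog
// Hypothetical wrapper module for the request/acknowledge handshake.
module handshake_example (clock, acknowledge_in, request_out);
  input  clock;
  input  acknowledge_in;
  output request_out;
  reg    request_out;

  always begin : MAIN
    request_out <= 1;
    @(posedge clock);
    begin : LOOP
      forever begin
        if (acknowledge_in == 1'b1) begin
          request_out <= 0;
          @(posedge clock);  // let the scheduled update land before the disable
          disable LOOP;
        end
        @(posedge clock);
      end
    end //LOOP
  end //MAIN
endmodule
```

With the clock edge in place, the non-blocking update to request_out has already taken effect by the time the block is disabled, so there is no pending event left for a simulator to cancel.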
MISTAKE 3. Attempting to disable more than one level of hierarchy
Verilog allows disabling any block from a simulation point of view. Unfortunately, BC does not support exiting more than a single level of loop hierarchy. This should be addressed in a future BC version (time unspecified).
MISTAKE 4. Disabling a labeled block not associated with a loop
Verilog semantics allow for disabling many things. Intuitively, disabling a block is nice as an error escape mechanism. In code:

    begin :CODE_SEQUENCE
      ...
      if (ERROR_CONDITION) begin
        error_flag = 1;
        disable CODE_SEQUENCE;
      end
      ...
      if (ERROR_CONDITION) begin
        error_flag = 1;
        disable CODE_SEQUENCE;
      end
      ...
      if (ERROR_CONDITION) begin
        error_flag = 1;
        disable CODE_SEQUENCE;
      end
      ...
      if (ERROR_CONDITION) begin
        error_flag = 1;
        disable CODE_SEQUENCE;
      end
      ...
    end //CODE_SEQUENCE

Unfortunately, Synopsys does not support this at the present time. For some designs this appears to work occasionally (can you say "feature" with a sly grin?).
A work-around involves setting a bogus variable to true and using a loop that ends with a test that unconditionally exits. Unfortunately, there is a drawback as discussed in Mistake #5 below.
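One possible reading of that work-around, as a sketch only: wrap the sequence in a loop controlled by a variable that is always true, so that every disable targets a block associated with a loop, and end the loop body with a test that always exits. The module, the port names, the variable always_true, and the placement of the clock edges are all assumptions, and how BC schedules this has not been checked:

```verilog
// Hypothetical module; step1_bad/step2_bad stand in for ERROR_CONDITION.
module error_escape_example (clock, step1_bad, step2_bad, error_flag_o);
  input  clock;
  input  step1_bad, step2_bad;
  output error_flag_o;
  reg    error_flag_o;

  always begin : MAIN
    reg always_true;          // the "bogus" variable
    reg error_flag;
    always_true = 1;
    error_flag  = 0;
    @(posedge clock);
    begin : CODE_SEQUENCE
      while (always_true) begin
        // ... first step of the sequence ...
        if (step1_bad) begin
          error_flag = 1;
          disable CODE_SEQUENCE;
        end
        @(posedge clock);
        // ... second step of the sequence ...
        if (step2_bad) begin
          error_flag = 1;
          disable CODE_SEQUENCE;
        end
        @(posedge clock);
        // End of sequence: this test always passes, so the body runs once.
        if (always_true) disable CODE_SEQUENCE;
      end
    end //CODE_SEQUENCE
    error_flag_o <= error_flag;
    @(posedge clock);
  end //MAIN
endmodule
```

Each error escape adds another exit transition from the loop, which is where the scheduling cost comes from.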
MISTAKE 5. Too many disables leads to long schedule times
Synopsys has a complexity problem if the number of states and transitions
gets too large. Because BC looks for the best places to place operations,
the search space can grow exponentially as the number of states and
transitions increases. This leads to slow scheduling by the tool.
This is Synopsys' recommendation that the number of operations be kept
under 150 (an artificial number), viewed from another angle. If there
are a large number of operations OR transitions, then there are
a large number of combinations to consider. The number of combinations
directly impacts the tool's performance.
Behavioral Compiler places specific constraints on all I/O. Borrowing
VHDL terminology, all I/O are referred to as 'signals', whereas internal
temporaries and registers are referred to as variables. Because all I/O
is scheduled, it is important to be able to spot all I/O quickly in your
source code. Outputs are assigned with non-blocking assignments, which helps.
Inputs are identified merely by their appearance. Any net that crosses the
process (always block) boundary is designated a 'signal'.
I recommend BC designers combine a suffix-based naming convention with explicitly copying inputs into temporary variables. The naming convention is merely to append an _O (oh) to output signals and an _I to input signals.
Thus:

    data_O <= value;
    @(posedge clock);
    ack = ack_I;
    while (ack == 0) begin
      ack = ack_I;
      @(posedge clock);
    end

The convention makes it easier for yourself and others to see the I/O and be alert to BC restrictions.
Behavioral Compiler code is sometimes difficult to debug if problems
are found at the gate level. This happens often for teams using emulation
as part of their methodology. Part of the reason is the level of
abstraction: BC designs an efficient FSM that can be hard to follow.
I recommend adding a debug_State_o output port (registered by default)
and assigning it unique values throughout the code. Ideally, this register
should be conditionally compiled in (use of the m4 preprocessor is recommended).
This allows the register to be used for emulation or early prototypes,
and removed on subsequent "clean" compilations.

    // Verilog snippets...
    module Whopper (... , debug_State_o );
    ...
    output [7:0] debug_State_o;
    reg    [7:0] debug_State_o; // COMMENT OUT FOR FINAL GATES
    ...
    debug_State_o <= 0; // COMMENT OUT FOR FINAL GATES
    @(posedge clock);
    ...
    if (CONDITION) begin
      ...
      debug_State_o <= 1; // COMMENT OUT FOR FINAL GATES
      @(posedge clock);
    end
    else begin
      ...
      debug_State_o <= 2; // COMMENT OUT FOR FINAL GATES
      @(posedge clock);
    end
    ...
The latest release of Behavioral Compiler announced some really
cool features. Not the least of these was the promise of using tasks as
a level of hierarchy. What the announcement failed to point out was the
uselessness of the current implementation. Yes, you can use tasks; however,
you may not use signals (I/O) within the tasks. That is unfortunate because
a natural application of tasks would be to create I/O handlers (e.g., Read_PCI,
Write_PCI, Send_Packet, etc.).
The reason tasks don't handle I/O is twofold. First, Verilog specifies that task inputs are read statically at the time of invocation and outputs are written statically when the task finishes (returns). So if you write:

    task Read;
      input  [15:0] addr;
      output [7:0]  data;
      output [15:0] addr_port_o;
      input  [7:0]  data_port_i;
      ...
    endtask
    ...
    Read(fifo,result,real_addr_o,real_data_i);

The signal real_addr_o will be written at the END of the task, and the input data will be read at the BEGINNING of the task invocation! A natural response would be to try directly accessing the I/O ports; however, BC will issue an error message stating that it doesn't support side effects. The good news is that Synopsys is working to remedy this in a future version of BC (time unspecified as usual).
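The argument-copying behavior can be seen in plain simulation. The following testbench is a sketch for illustration only: the module name, the clock generator, the timing values and the data values are all assumptions, and it exercises Verilog task semantics rather than anything specific to BC.

```verilog
// Hypothetical testbench showing when task arguments are copied in and out.
module task_io_demo;
  reg        clock;
  reg [7:0]  real_data_i;
  reg [15:0] real_addr_o;
  reg [7:0]  result;

  task Read;
    input  [15:0] addr;
    output [7:0]  data;
    output [15:0] addr_port_o;
    input  [7:0]  data_port_i;
    begin
      addr_port_o = addr;     // local copy; reaches real_addr_o only on return
      @(posedge clock);
      data = data_port_i;     // value captured when Read was invoked
    end
  endtask

  initial begin
    clock = 0;
    forever #5 clock = ~clock;
  end

  // The "port" changes while the task is still waiting for the clock edge.
  initial #3 real_data_i = 8'h22;

  // Sample the address "port" mid-task; it still holds its old value.
  initial #4 $display("mid-task: real_addr_o=%h", real_addr_o);

  initial begin
    real_data_i = 8'h11;
    real_addr_o = 16'h0000;
    #1 Read(16'h1234, result, real_addr_o, real_data_i);
    // result holds the data value from invocation time, not the later 8'h22.
    $display("after task: result=%h real_addr_o=%h", result, real_addr_o);
    $finish;
  end
endmodule
```

The address only appears on real_addr_o after the task returns, and the data handed back is the value present when the task was called, which is exactly what makes such a task unusable as a cycle-accurate I/O handler.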
This topic is a difficult one to approach, but there are some key
ideas.
What follows is a technique to simplify application of cycle constraints
on fast handshake loops. This is necessary as part of defensive coding
to guard against possible future changes that could break proper scheduling
of fast handshake loops.
STEP 1: In the source code, label all fast handshake loops with
a unique and consistent convention. For example, begin each label with
FASTLOOP_ (as in FASTLOOP_READ_BUS in the example source code below).
Combined with the script in Step 2, that's all there is to it. A similar
technique may be applied to other situations. The advantage of using naming
conventions for this should be obvious: no scripting commands in the HDL
source code.
Example of problem:
Example of the solution:
EXAMPLE VERILOG SOURCE CODE:
read_req_o <= 1'b1;
@(posedge clock);
begin: FASTLOOP_READ_BUS forever begin
data = data_i;
if (bus_grant_i == 1'b1) begin
read_req_o <= 1'b0;
@(posedge clock);
disable FASTLOOP_READ_BUS;
end
@(posedge clock);
end end
STEP 2: Add Synopsys dc_shell script find() commands immediately
after elaboration that locate all your fast handshake loops. Follow this
with set_cycles constraints to keep the loop lengths to 1. The
following assumes "FASTLOOP_" was used as part of the labeling
scheme described in step 1.
/* Immediately following elaboration */
$LOOP_LIST = find("cell","*FASTLOOP_*",-hierarchy)
foreach ($LOOP,$LOOP_LIST) {
set_cycles 1 -from_begin $LOOP -to_end $LOOP
}/*endforeach*/
This constrains the body of the loops to be one cycle long. Since the initial
output occurs on a clock boundary just prior to entering the loop, there
can be no more than a single cycle from activating the output to entering
the loop. Similarly, the exit condition will only have a single cycle.
If additional code is added that implies more than a single clock cycle,
BC will indicate a scheduling failure.
The following is a list of common traps encountered by folks using
BC. Hopefully this list may be of use to future designs. Some are technical
and others are psychological. These are presented in no particular order.
The order does not reflect frequency of encounter, except that the traps
I have encountered most often are marked with an asterisk (*).
01. Disabling the wrong named block *
02. Mixing blocking and non-blocking assignments *
03. Leaving out clocks in one edge of a case/if branch (usually else) *
04. Attempting to disable more than one level of hierarchy *
05. Getting lost with indentation of begin/end blocks *
06. Adding clocks unnecessarily
07. Casual mixing of RTL/BC code
08. Omitting simulation of behavior before scheduling
09. Too many operations as a result of unrolled for loops
10. Missing DesignWare-Foundation libraries
11. Overlooking bc_time_design reports of multicycle candidates
12. Missing clock edges on entering/leaving loops
13. Failure to adhere to signal naming conventions
14. Not reviewing clock edge usage
15. Cognitive dissonance
16. Wrong version of Synopsys tools
17. Ignoring warnings/error messages from bc_check and/or schedule
18. Overlooking clock edges in m4 macros
19. Overzealous desire to save clock cycles (+)
20. Overzealous desire to save registers and gates (+)
21. Not labeling begin/end blocks rigorously
22. Overlooking clocks because coding style hides them
23. Mistakes due to lack of sleep from overworking
24. Not running small experiments on new coding styles
25. Using styles discouraged or not supported officially
26. Back-to-back loops add a clock in superstate causing mismatches
Sometimes folks like to code back-to-back loops without an intervening
clock edge. Unfortunately, BC doesn't support this and in superstate_fixed
scheduling mode will add the clock for you. If your code expects this to
be a tight interface, the extra clock will cause simulation mismatches.
// Get two bytes from the interface
request_o <= 1;
@(posedge clock);
begin :LOOP1 forever begin
data1 = data_i;
if (ack_i == 1) begin
@(posedge clock);
disable LOOP1;
end
@(posedge clock);
end end //LOOP1
// BC will require a clock here
// Superstate will silently insert a clock
begin :LOOP2 forever begin
data2 = data_i;
if (ack_i == 1) begin
request_o <= 0;
@(posedge clock);
disable LOOP2;
end
@(posedge clock);
end end //LOOP2
The solution described by Synopsys is to wrap the two loops into one. For
this example, the solution is straightforward. The BC style guide indicates
an alternative.
// Get two bytes from the interface
count = 2;
request_o <= 1;
@(posedge clock);
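begin :LOOP forever begin
if (ack_i == 1) begin
// Store each acknowledged byte in its own variable
if (count == 2) data1 = data_i;
else data2 = data_i;
count = count - 1'b1; // count only acknowledged bytes
if (count == 0) begin
request_o <= 0;
@(posedge clock);
disable LOOP;
end
end
@(posedge clock);
end end //LOOP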
Synopsys manpages are available and may be used at the UNIX command
line quite easily. The following script will do the trick:
#!/bin/csh
#
# @(#)$Info: dcman script to display Synopsys manpages. $
#
# NOTE: This requires that $SYNOPSYS point to the Synopsys
# root directory. In dc_shell use 'list synopsys_root'
# to determine the correct setting if you don't know
# this already.
#
setenv MANPATH "$SYNOPSYS/doc/syn/man"
man $*
exit 0
Example: getting the manpage for the set_structure command
% dcman set_structure
Example: getting the manpage for an error message (LINT-47)
% dcman LINT/LINT-47
-------
Brought to you by the Qualis
Library
http://www.qualis.com/library/