D. Technical Approach, Rationale, and Constructive Plan

D.1 Technical Rationale

An integrated circuit is designed to meet functional specifications under timing, power, and area constraints. Shrinking process geometries, coupled with the increased functionality demanded of integrated circuits, have steadily increased the complexity and level of integration of ICs since their introduction in 1961. The specified design constraints act as guidelines for optimizing the design to meet the desired functionality. An example path distribution for an arbitrary circuit is shown in Fig. 1. The figure shows the number of paths and their respective delays. If the target (specified) delay is Gt and the actual delay is Ga, then all paths with a delay above the target are slow signal paths, and their delays have to be reduced. Delay is not the only parameter that needs optimization; power and area must be considered as well. Ideally, all signal paths would have the same delay, Gt, which implies that the fast signal paths should be slowed down to save power by decreasing the width of the transistors of the gates on these paths; however, a path is slowed down only as long as doing so incurs no area or power penalty. The figure also shows the result for the circuit "Optimized for Area, Power and Delay", for which the delays of some of the fast paths have been increased to save power or area and no path has a delay greater than Gt.

Fig. 1. An arbitrary path distribution function
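
The classification of paths against the target delay can be illustrated with a short Python sketch. It is only an illustration: the path names, the delay values, and the function name classify_paths are hypothetical, and the target delay Gt is passed in as target_delay.

```python
def classify_paths(path_delays, target_delay):
    """Split paths into slow (delay > target) and fast (delay < target) sets.

    Slack is target minus delay: negative slack marks a slow path whose delay
    must be reduced, positive slack marks a fast path that may be slowed down
    to save power or area.
    """
    slack = {name: target_delay - delay for name, delay in path_delays.items()}
    slow = sorted(name for name, s in slack.items() if s < 0)
    fast = sorted(name for name, s in slack.items() if s > 0)
    return slack, slow, fast


# Hypothetical delays in nanoseconds for four signal paths.
delays_ns = {"p0": 4.0, "p1": 6.5, "p2": 5.0, "p3": 7.25}
slack, slow, fast = classify_paths(delays_ns, target_delay=5.0)
print(slow)  # ['p1', 'p3']
print(fast)  # ['p0']
```

In this sketch, path p2 already sits at the target and is left alone, while p0 is a candidate for transistor width reduction.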

In the traditional approach, constraint-driven digital design for a given technology requires a large, fully characterized cell library presented in catalog form, from which an automatic synthesizer or a designer picks the cell that meets the specified criteria. It is highly unlikely that the designer or the synthesizer will always find a cell that meets all of the desired constraints. To increase the probability of finding the desired gate in the library, either the size of the library, in terms of the number of gates, has to be increased, or some error in the constraints, e, has to be tolerated, as shown in NO TAG. The limits of the error decrease as the size of the library increases. This could be acceptable if the technology did not change so rapidly.
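
The tradeoff between library size and tolerated error can be sketched as follows. The cell names, delay values, and the function pick_cell are hypothetical; a real catalog would carry many more attributes than a single delay figure.

```python
def pick_cell(library, required_delay, tolerance):
    """Return the name of the catalog cell closest to the required delay,
    or None if even the closest cell misses by more than the tolerated error."""
    best = min(library, key=lambda cell: abs(cell[1] - required_delay))
    if abs(best[1] - required_delay) <= tolerance:
        return best[0]
    return None


# Hypothetical catalogs: (cell name, delay in ns).
small_library = [("nand2_x1", 1.0), ("nand2_x4", 0.5)]
large_library = small_library + [("nand2_x2", 0.75)]

print(pick_cell(small_library, 0.7, 0.1))   # None
print(pick_cell(small_library, 0.7, 0.25))  # nand2_x4
print(pick_cell(large_library, 0.7, 0.1))   # nand2_x2
```

With the small catalog the request succeeds only when a larger error is tolerated; adding one more drive strength lets the tighter tolerance be met.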

Fig. 2. Existing block/chip design methodology for fixed design rules, functional blocks & device sizes.

Fig. 3. Proposed block/chip design methodology for variable design rules and device sizes with capabilities to realize new logic function with standard or non-standard logic style for CMOS/BiCMOS technologies.

New, evolving technologies and extreme application requirements place a considerable burden on our design methodologies. For a given technology, developing an application-specific library with application-specific CAD tools is both costly and time consuming. It is extremely important to be able to obtain maximum benefit from a newly emerged technology as soon as possible. Fig. 2 shows a typical design methodology that may be used to design integrated circuits with catalog-based standard cell libraries and the associated data. Fig. 3 shows the enhancements to the existing design methodology, which incorporate

  1. layout generation with parameterized device sizing along with parameterized design rules,
  2. macromodeling of the timing for parameterizable leaf cells,
  3. CAD tools for determining the appropriate device sizes (based on design constraints),
  4. interface utilities to the synthesis tools, and
  5. interface utilities to the CAD physical design tools.

Some high performance, delay-sensitive design styles, e.g., wave pipelining, require precise control of path delays. Delays caused by routing traces of differing lengths, such as those found in standard cell or macro cell layout, cannot be controlled via buffering to the high degree of precision needed by such applications. Instead, varying the sizes of transistors along paths once the routing has been completed can be used to match delays within a datapath. With the design methodology depicted in Fig. 3, these high performance, delay-sensitive design styles can be implemented by using standard place and route schemes instead of time consuming full custom layout design.

D.2 Technical Approach

The effort to attain the desired objective of global optimization has been divided into several integral tasks. This division allows the design methodology to be developed and improved incrementally, and the technology to be provided to the community incrementally, beginning with delivery of the fixed characterized libraries early in the project.

D.2.1 Layout Generator

The layout generator is a generic leafcell synthesis tool which uses graph theory concepts to determine the placement of the transistors and the routing between them. This activity is carried out as shown in Fig. 4. A graph theory based heuristic algorithm has been developed that reads the schematic representation of the leafcell to determine the transistor placement that will provide maximum source/drain sharing and minimum routing tracks, thus minimizing the width and the height of the cell.

Fig. 4. Leafcell Layout Generation

The optimal transistor placement can either be defined by the user or determined automatically by the heuristic algorithm. This mechanism will allow us to constantly refine the algorithm to improve the results. Various cell configurations from different libraries will be investigated and will serve as the basis of comparison between the algorithms and the hand-crafted design. Existing routers, e.g., YACR, will be used, and the results obtained will be compared with the hand-crafted designs.
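
To make the idea of source/drain sharing concrete, the following Python sketch orders transistors so that neighbours share a diffusion net. It is a simple greedy stand-in and not the heuristic algorithm developed for the layout generator; the function name order_transistors and the device tuples are hypothetical, and the chain is only ever extended at its tail.

```python
def order_transistors(transistors):
    """Greedily order transistors so neighbours share a source/drain net.

    `transistors` is a list of (name, net_a, net_b) tuples, where net_a and
    net_b are the two diffusion terminals. Returns a list of chains; each
    chain is a list of device names that can abut with shared diffusion.
    Fewer chains means fewer diffusion breaks and a narrower cell.
    """
    unused = list(transistors)
    chains = []
    while unused:
        name, _, end_net = unused.pop(0)
        chain = [name]
        extended = True
        while extended:
            extended = False
            for i, (cand, a, b) in enumerate(unused):
                if end_net in (a, b):
                    # Flip the device if needed so the shared net abuts.
                    end_net = b if a == end_net else a
                    chain.append(cand)
                    unused.pop(i)
                    extended = True
                    break
        chains.append(chain)
    return chains


# Hypothetical NAND2 devices: (name, diffusion net, diffusion net).
pull_down = [("n1", "out", "x"), ("n2", "x", "gnd")]
pull_up = [("p1", "vdd", "out"), ("p2", "vdd", "out")]
print(order_transistors(pull_down))  # [['n1', 'n2']]
print(order_transistors(pull_up))    # [['p1', 'p2']]
```

Each NAND2 network collapses into a single chain, so no diffusion break is needed; devices with no common net would fall into separate chains.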

D.2.2 Fixed Cell Libraries

D.2.2.1 General Purpose CMOS Libraries

To obtain maximum benefit from the layout generator, it is imperative to target the cells that are most widely used in the community. The HP standard cell library and the ITD cell library are currently the most widely used public domain libraries. These libraries will be targeted first, and the optimal transistor placement and routing information, obtained either from the algorithm or through hand-crafting, will be stored in a suitable programming language form. This step has to be carried out only once per cell. The solution obtained from this procedure can then be used to automatically generate the layout for any defined technology.

The technology files will be defined in terms of common variables. The variables will attain a new value for the different design rules defined by the technology. Thus the optimal solution obtained by the transistor place and route will be reproducible over a technology spectrum.
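
As an illustration of design rules expressed as common variables, the sketch below computes the width of a transistor row from a rule table, so that the same placement solution yields a different geometry for each technology. The rule names, the rule values, and the simplified width formula are all assumptions made for the example and do not describe any real process.

```python
# Hypothetical design-rule tables in micrometres; not taken from a real process.
RULES_2UM = {"poly_width": 2.0, "poly_spacing": 3.0,
             "contact_size": 2.0, "poly_to_contact": 2.0}
RULES_1UM = {"poly_width": 1.0, "poly_spacing": 1.5,
             "contact_size": 1.0, "poly_to_contact": 1.0}


def chain_width(num_gates, rules):
    """Width of a row of series transistors sharing diffusion.

    Simplified model: one contact at each end of the row, one poly gate per
    transistor, and one poly-to-poly space between neighbouring gates.
    """
    ends = 2 * (rules["contact_size"] + rules["poly_to_contact"])
    gates = num_gates * rules["poly_width"]
    gaps = (num_gates - 1) * rules["poly_spacing"]
    return ends + gates + gaps


print(chain_width(2, RULES_2UM))  # 15.0
print(chain_width(2, RULES_1UM))  # 7.5
```

Only the rule table changes between the two calls; the placement (two gates in one shared-diffusion row) is reused unchanged.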

D.2.2.2 Datapath Cell Library

The use of datapath architectures with the various arithmetic units and other dataflow design styles is extremely important, since it is key to obtaining high performance. The cells designed for a datapath design style are regular and are aimed at a particular design tool, which normally allows routing over the cells. Various templates have been developed and are in use. The cell template with horizontal metal2 and vertical metal1 was used in the design of a self-timed router at MSU. This cell template allowed over-the-cell routing for a double layer metal technology. The over-the-cell routing was predominantly due to Vdd and GND signals in metal2, allowing the active and poly to grow underneath. However, adding metal3 to a cell designed with this philosophy does not help very much, since the adjacent rows cannot be compacted and current CAD tools are generally incompatible. In contrast, horizontal metal1, followed by vertical metal2 and horizontal metal3, allows the channel routing to be performed in metal3, which can be moved over the cell, allowing the adjacent rows to be compacted and then merged, provided alternate rows are reflected. A layout tutorial for students which covers these concepts is attached to this proposal for reference.

This technique is suitable not only for general logic design, but also for datapath designs, with all control signals provided horizontally in metal3 and the data signals provided vertically in metal2. The Vdd and GND signals are in metal1, and the intra-cell routing is predominantly in metal1 and poly. To demonstrate this approach, MSU will first capture the HYPER/Lager datapath library in this design style to produce characterized fixed standard cells that support parameterized design rules, as mentioned above. This will be followed by capturing the datapath cell library and the other CMOS libraries in our design methodology to support parameterized device sizes and design rules to produce optimal designs.

D.2.2.3 BiCMOS Libraries

The emerging BiCMOS technology allows the use of bipolar and MOS transistors on the same chip. Bipolar transistors provide larger currents than MOS transistors of the same size, thus providing high performance circuits with some increased power dissipation over purely CMOS designs. The technology files for the various CAD tools will be developed in a parameterized form to represent the superset of the various technologies. Variables will be defined for each design rule, through a spreadsheet or manually. The parameterized technology files can then be processed with cpp (the C preprocessor) to produce an instance of the technology file suitable for use by any subset technology. A rich predefined and characterized set of bipolar transistors, with a mechanism for choosing the correct transistor for a particular application, will be provided until good modeling procedures for bipolar transistors become available. The features required to support the use of predefined and characterized bipolar transistors and the definition of new transistors will be provided. The extension of the layout synthesis program will be investigated and demonstrated for the BiCMOS technologies.

D.2.2.4 Conventional Characterization

The MPL standard cell delay modeling and characterization methodology will be extended to include characterization of MSI functions, e.g., counters, adders, combinational logic, etc. To characterize a cell under a given "environment", characteristic equations for the propagation delay and for the transition times under the various loadings are required. To improve the modeling accuracy, the signal voltages are normalized by the power supply potential, and the timing is normalized by an output time constant. This output time constant is defined relative to the initial time rate of change of the output upon switching, and corresponds to the rail-to-rail transition time that would result if the output discharge current remained constant. The measured delays are obtained with HSpice using scalable (piecewise linear) signal generators at the inputs, which closely match the waveforms from fabricated CMOS. To circumvent the negative delay problem, the transition event is defined as occurring when the signal has changed by 30%, e.g., at the 70% value for turn-on. The rise time transition is defined to be 2.5 times the interval from the 30% to the 70% points of a signal, and the fall time transition to be 2.5 times the interval from the 70% point to the 30% point of a signal.
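
The transition time definitions can be sketched in Python as below. The factor 2.5 equals 1/(0.7 - 0.3), so it extrapolates the 30% to 70% interval to a full rail-to-rail swing. The function names and the sample waveforms are hypothetical, and linear interpolation between waveform samples is an assumption of the sketch, not a description of the HSpice measurement setup.

```python
def crossing_time(times, levels, threshold):
    """Linearly interpolate the first time the waveform crosses `threshold`."""
    points = list(zip(times, levels))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 != v1 and min(v0, v1) <= threshold <= max(v0, v1):
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts, vdd):
    """2.5 times the interval from the 30% point to the 70% point."""
    norm = [v / vdd for v in volts]  # normalize by the supply potential
    return 2.5 * (crossing_time(times, norm, 0.7) - crossing_time(times, norm, 0.3))


def fall_time(times, volts, vdd):
    """2.5 times the interval from the 70% point to the 30% point."""
    norm = [v / vdd for v in volts]
    return 2.5 * (crossing_time(times, norm, 0.3) - crossing_time(times, norm, 0.7))


# Hypothetical ideal ramps: 0 V to 5 V (and back) in 2 ns.
print(round(rise_time([0.0, 2.0], [0.0, 5.0], vdd=5.0), 6))  # 2.0
print(round(fall_time([0.0, 2.0], [5.0, 0.0], vdd=5.0), 6))  # 2.0
```

For an ideal linear ramp the definition recovers the full ramp duration, which is the intended rail-to-rail interpretation.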

Three different values of load capacitance are used in the simulations. Using a typical cell input capacitance, fan-outs of 1, 4, and 10 are selected. Ten different input transition times are used, which vary up to 10 times the output time constant. These inputs are controlled by scaling the piecewise linear CMOS generators. For each input-to-output response, four common quantities (turn-on, turn-off, rise time, and fall time) are measured over the 30 combinations of fan-outs and input transition times at a particular "environmental" condition of power supply, temperature, and process models.
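
The simulation grid can be enumerated with a short sketch. The even spacing of the ten input transition times (1 to 10 times the output time constant) and the numeric values are assumptions made for illustration; the text above only fixes the three fan-outs and the upper limit of the transition times.

```python
from itertools import product


def characterization_grid(input_cap, time_constant,
                          fanouts=(1, 4, 10), num_slews=10):
    """Enumerate (load capacitance, input transition time) simulation points."""
    loads = [f * input_cap for f in fanouts]
    # Assumed spacing: integer multiples of the output time constant.
    slews = [k * time_constant for k in range(1, num_slews + 1)]
    return list(product(loads, slews))


# Hypothetical values: 20 fF typical input capacitance, 0.5 ns time constant.
grid = characterization_grid(input_cap=20.0, time_constant=0.5)
print(len(grid))      # 30
print(grid[0])        # (20.0, 0.5)
print(len(grid) * 4)  # 120
```

With four measured quantities per point, each input-to-output response contributes 120 values per environmental condition.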

D.2.3 CAD Tool Interfaces

Ensuring that the cell libraries and generator methodology are compatible with the TimberWolf/YACR, Mentor and Cadence toolsets is largely a mechanical problem. Up-front planning will be done to identify the common requirements of all three toolsets so that the libraries and generator methodology can be designed to meet these requirements. The output of the MSU generator tool will be a netlist of physical cells; the netlist format will be EDIF, since that is a common format which can be read by all three toolsets.

D.2.4 Design Methodology

How to best make use of cells with parameterizable transistor sizes is an open question; however, there is no question that this capability will affect the entire design methodology. Current design styles emphasize a top down, modular, synthesis approach to design. Top down design requires that models for delay and power be available for building blocks so that tradeoffs can be made at a high level by the synthesis tool. Our approach will be to develop macromodels and characterization methods for cells with parameterizable transistors. These macromodels will abstract minimum and maximum bounds for delay and power based upon Vdd, temperature, transistor sizes, bus width (for datapath blocks) and output loading. The bounded delay model will be used by the synthesis tool to meet user constraints on path delays and power consumption. The synthesis tool will output a netlist which has specified delay and power targets attached to each block. (These delay and power targets must be within the minimum and maximum bounds specified by the macromodel for the block.) The layout generator will be responsible for meeting the specified delay and power targets via transistor sizing. In addition, we plan to add a post-routing transistor sizing capability which will be able to further fine-tune control path delays. The post-routing transistor sizing capability will be able to size transistors with a predictable perturbation of the routing geometry so that very precise control of path delays will be possible.
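
The requirement that block targets lie within the macromodel bounds can be sketched as a simple check on the annotated netlist. The class BlockBounds, the function infeasible_blocks, and all block names and numbers are hypothetical; a real macromodel would compute the bounds from Vdd, temperature, transistor sizes, bus width, and output loading rather than store constants.

```python
from dataclasses import dataclass


@dataclass
class BlockBounds:
    """Minimum and maximum delay and power abstracted by a macromodel."""
    min_delay: float
    max_delay: float
    min_power: float
    max_power: float

    def admits(self, delay_target, power_target):
        """True if both targets lie inside the bounded model."""
        return (self.min_delay <= delay_target <= self.max_delay
                and self.min_power <= power_target <= self.max_power)


def infeasible_blocks(netlist_targets, bounds):
    """Names of blocks whose (delay, power) targets fall outside the bounds."""
    return [name for name, (delay, power) in netlist_targets.items()
            if not bounds[name].admits(delay, power)]


# Hypothetical bounds (ns, mW) and synthesis targets for two datapath blocks.
bounds = {"adder": BlockBounds(2.0, 6.0, 1.0, 4.0),
          "shifter": BlockBounds(1.0, 3.0, 0.5, 2.0)}
targets = {"adder": (3.0, 2.0), "shifter": (0.5, 1.0)}
print(infeasible_blocks(targets, bounds))  # ['shifter']
```

Here the shifter target asks for a delay below the minimum the macromodel allows, so the layout generator could not meet it by transistor sizing alone.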

D.3 Constructive Plan

D.3.1 Layout Generator

The development of the automatic layout generator program will be done in three phases:

  1. Development of a prototype modular software model using the GENIE language (Mentor Graphics) to capture the various aspects of the algorithm given below.
  2. The modular development program will allow us to make various tradeoff decisions between the efficiency of the algorithm and the quality of the solution. The hand crafted cells will be used as the basis of comparison.
  3. During this phase the refined algorithm will be captured in a language such as C or C++, which offers program portability and, with its powerful data structures, will provide better execution speed than an interpretive language like GENIE.

D.3.2 Fixed Cell Libraries

D.3.2.1 CMOS Cell Libraries

The vertex (circuit node) and edge (circuit transistor) weighting function is used to set up the priority queues. The priority queue determines which vertex has to be examined to obtain one or more transistor chains for a given input. The Best-First Search (BFS) has the highest execution speed, but may not provide an optimal solution. A breadth-first search followed by a depth-first search for each vertex and each edge has the lowest execution speed, but provides a better solution. Various other schemes are also possible, and their use will be investigated to improve the quality of the solution for different cell configurations.
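
A priority queue over circuit nodes can be sketched with Python's heapq module. The weighting used here is an assumption chosen for illustration, not the weighting function of the proposed algorithm: odd-degree vertices are examined first, because an Euler trail through the diffusion graph must start at an odd-degree vertex whenever one exists, and ties are broken by lower degree. The function names and the device list are hypothetical.

```python
import heapq
from collections import Counter


def vertex_queue(transistors):
    """Build a priority queue (heap) of circuit nodes.

    `transistors` is a list of (name, net_a, net_b) edges. Each heap entry is
    (parity_rank, degree, net); parity_rank is 0 for odd-degree vertices so
    that they are popped first.
    """
    degree = Counter()
    for _, net_a, net_b in transistors:
        degree[net_a] += 1
        degree[net_b] += 1
    heap = [(0 if d % 2 else 1, d, net) for net, d in degree.items()]
    heapq.heapify(heap)
    return heap


def examination_order(heap):
    """Pop every vertex from a copy of the heap, best first."""
    work = list(heap)  # a copy of a valid heap is still a valid heap
    return [heapq.heappop(work)[2] for _ in range(len(work))]


# Hypothetical NAND2 pull-down network: two series transistors.
pull_down = [("n1", "out", "x"), ("n2", "x", "gnd")]
print(examination_order(vertex_queue(pull_down)))  # ['gnd', 'out', 'x']
```

The two end nodes of the series stack have odd degree and are examined before the internal node x, so a chain started from either of them can traverse both transistors.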

The MSU cell library is an extension of the ITD cell library and provides SSI functions up to the complexity of a full adder, incrementers, decrementers, etc., in various logic styles, e.g., dual/non-dual CMOS, CPL, etc. This library will be used to test and refine the various aspects of the algorithms. The solution produced by the algorithm will be captured and stored in a suitable programming language to enable reproduction of the solution for a different technology. A timing model for the parameterized cells will be developed to obtain characterized data over a reasonable range of environments and transistor sizes. This data can be used to drive the transistor optimization tool to produce the structural view of the leafcell. The optimized structural view will be used to automatically generate the layout of the leafcell.

The above procedure will also be used for the HP standard cell library. The refined algorithm and the set of routines developed will be suitable for layout synthesis of cells over a spectrum of technologies and for different implementations of the functional logic blocks.

D.3.2.2 Bi-CMOS Cell Libraries

An NPN bipolar library, modeled after Cypress, will be used to design and characterize the bipolar transistors in the technologies identified by MOSIS. This will serve as the basis of BiCMOS design. As the BiCMOS technology matures in the MOSIS environment, the mechanisms to define and characterize bipolar transistors will be added to the design tools.

D.3.3 CAD Tool Interfaces

The Mentor and TimberWolf/YACR tools are currently in use at MSU. The Cadence toolset will be acquired through the Cadence university program. The HYPER high-level synthesis system will be used to generate several complex datapaths in order to test the CAD tool interfaces.

D.3.4 Design Methodology

The design methodology will be developed in the following phases:

  1. Macromodels and a characterization methodology will be developed for parameterizable blocks. The macromodels will provide minimum and maximum bounds on delay and power. The macromodels will be parameterized via Vdd, temperature, transistor sizes, bus width (for datapath blocks) and output loading.
  2. The macromodels will be integrated into the HYPER synthesis system and used to generate designs which meet user specified delay and power constraints. The results will be compared with HYPER generated designs using conventional, fixed transistor-sized datapath components.
  3. A post-routing layout tool will be developed to further optimize path delays based upon known values for routing capacitance. These results will be compared against the results obtained in phase 2.