A JK Flip-Flop is designed as an example to illustrate an algorithmic approach to compact
standard cell designs which supports both two and three level metal silicon gate CMOS.
After reviewing a philosophy for using an optional third level of metal in silicon gate CMOS, which facilitates placing the global routing on top of active circuitry, appropriate design steps are presented as guidelines. Placement of devices is achieved to maximize the sharing of junctions by appropriate grouping and the sharing of supply connections, after first considering routing congestion throughout the cell. Straightforward routing decisions facilitate the completion of the design.
Figure 1 illustrates a philosophy of using mostly first metal, poly, and diffused regions in the layout of standard cells in order to facilitate global routing with second and third metal running across the active circuitry. First metal runs primarily horizontally, parallel with the diffused regions. Second metal is used sparingly to provide simple, straight, vertical connections, primarily to connect NMOS and PMOS diffused regions, which avoids needless blocking of second metal global channels.
While using place and route CAD tools with only two layers of metal, the rows of standard cells border routing channels above and below the rows. Horizontal global routing is implemented in first metal while vertical routing into and out of the cells is implemented with second metal. When using three layers of metal, the third metal is used for the horizontal routing interconnections (parallel to diffused regions) while second metal is used as before. With three layers, it is important to vertically mirror every other row in order to place similar supplies next to one another on adjacent rows. With the third metal scheme, the conventional place and route is achieved as before, but with the global parallel wiring in third metal. After routing, however, the adjacent rows are compacted, thus shoving the active regions under the routing channel. Conceivably, the supplies can be merged, as shown in figure 2. One observes the need to localize the second metal within the cell to the central region in order to facilitate compaction.
The logic for the example came from the MOSIS. The cell is traditional CMOS compatible with the Phillips style of layout. The F/F is a basic D type with the extra JK gate to create the D signal. The feedback is achieved with high resistance inverters in both the master and the slave latches. Both slave nodes are buffered for drive while the input clock uses buffers to drive the pair of transmission gates.
To maximize the sharing of diffused regions and minimize the number of junction contacts, for minimum capacitance and cell width, it is important to group transistors to achieve the longest sequence of interconnected devices, for both NMOS and PMOS. This sequence should start with devices connected to the supplies and is not necessarily constrained within a given logic gate. As an example, within the master latch, combine the input inverter, the transmission gate, and the feedback inverter with M1 being the output, labeling this group M1. For the S1 group, combine the master output inverter with the slave transmission gate and slave feedback inverter, with S1 as the main output and M2 as a secondary output. Both groups start and end with devices that are connected to the supplies in order to share with adjacent groups in the final layout. The remaining groups correspond to the remaining logic gates.
Within the standard cell layout the groups are arranged in a linear order to minimize the number of routing tracks required. Figure 4 illustrates a possible order obtained simply from the logic diagram.
One desirable property is the direct feed of an output to a neighbor, particularly when there is only a fan-out of one. The two clock outputs, Ph and PhN, feed to both the M1 and S1 groups and pass through the D group. Further, the D group has the two inputs Q and QN without using the clocks. Clearly, the D group is out of place. Consider exchanging the positions of the clock and D groups (Fig. 5). While the tracks required at the bottom of D dropped from 5 to 3, it really accomplished nothing in the most dense locations (6). Consider the notion of exchanging D and M1 in the original placement. The M1 cross-section (at the bottom of M1 unless otherwise indicated) has dropped from 6 to 5 tracks. The D group has a single connection to the M1 group but two to the output buffers.
As the D group moves away from the M1 group and towards the output buffers, the remaining cross-section has one less track. Q and QN tracks are removed, but the D track is added. Clearly, the D group should be moved lower until it is next to the output buffers. But where should the output buffers be? Exchanging the two buffers themselves only swaps the S1 and S2 "extensions," which has only a slight advantage in performance by minimizing the interconnect loading on the higher output impedance S1. However, if the QN buffer is placed next to the S1 group, S1 only needs to extend to the S2 inverter (figure 7).
Exchanging S2 and Q, as in figure 8, provides no change in the congestion as long as the D group is above the group QN. Placing the D group below QN decreases the congestion by one track above D, but raises it by one track below D. Figure 9 reflects the group placement at this point. Although the number of metal1 tracks above D is four, the inputs JN and K will take at least one, if not two, more tracks for cell entry. This is determined only by examining the internal features of group D. If there are six tracks, then group D becomes the most congested group. Moving the S2 and Q inverters above D removes the S1 and S2 feedthroughs but adds Q. Placing S2 just below group S1 with the Q inverter minimizes the S1 interconnect length and the interconnect density above QN as illustrated in figure 10.
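The track counting used in these placement trials can be sketched in a few lines of Python. This is only an illustration: the net-to-group map `NETS`, the group name `CLK` for the clock buffers, and both example orders are assumptions pieced together from the description, not the exact netlist or the figures. The sketch counts only nets that run between groups, so cell inputs such as JN and K and any tracks internal to a group are ignored.

```python
from typing import Dict, List, Set

# ASSUMPTION: simplified net -> groups map inferred from the text;
# it is not the exact netlist of the cell.
NETS: Dict[str, Set[str]] = {
    "Ph":  {"CLK", "M1", "S1"},
    "PhN": {"CLK", "M1", "S1"},
    "D":   {"D", "M1"},
    "M1":  {"M1", "S1"},
    "S1":  {"S1", "S2", "QN"},
    "S2":  {"S2", "Q"},
    "Q":   {"Q", "D"},
    "QN":  {"QN", "D"},
}

def boundary_cuts(order: List[str], nets: Dict[str, Set[str]]) -> List[int]:
    """Return, for each boundary between neighboring groups in a linear
    order, the number of nets that must cross that boundary."""
    pos = {group: i for i, group in enumerate(order)}
    cuts = [0] * (len(order) - 1)
    for groups in nets.values():
        idx = [pos[g] for g in groups]
        # A net spans every boundary between its first and last group.
        for b in range(min(idx), max(idx)):
            cuts[b] += 1
    return cuts

def worst_cut(order: List[str], nets: Dict[str, Set[str]]) -> int:
    """Track demand at the most congested boundary."""
    return max(boundary_cuts(order, nets))

# Two hypothetical orders: D near the clock group versus D at the bottom.
d_near_clock = ["CLK", "D", "M1", "S1", "S2", "Q", "QN"]
d_at_bottom = ["CLK", "M1", "S1", "QN", "S2", "Q", "D"]
for label, order in (("D near clock", d_near_clock), ("D at bottom", d_at_bottom)):
    print(label, boundary_cuts(order, NETS), "worst:", worst_cut(order, NETS))
```

Comparing the per-boundary counts for candidate orders gives the same kind of evidence as the figures, although the true count also depends on the internal features of each group, as noted for group D.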
Starting at the source of the input inverter, the series path is through the inverter, the transmission gate to M1, and back to the source through the feedback inverter, as shown in figure 11. Similarly, the S1 group is determined as shown in figure 12.
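A small Python sketch can check whether a proposed device order forms an unbroken diffusion strip. It is a minimal illustration under stated assumptions: each device is reduced to its two diffusion terminals, the node name `X` (between the input inverter and the transmission gate) and the extra `buf` device are hypothetical, and only the NMOS row with its GND supply is modeled.

```python
from typing import List, Tuple

Device = Tuple[str, str, str]  # (name, diffusion terminal a, diffusion terminal b)

def min_diffusion_breaks(devices: List[Device]) -> int:
    """Fewest diffusion breaks for devices placed in the given order,
    allowing each device to be flipped left-to-right."""
    best = [0, 0]   # best[o]: fewest breaks with the last device in orientation o
    prev = None     # (left, right) nodes of the previous device, per orientation
    for _, a, b in devices:
        ends = [(a, b), (b, a)]
        if prev is not None:
            # A break occurs when the previous right node differs from this left node.
            best = [
                min(best[p] + (prev[p][1] != ends[o][0]) for p in (0, 1))
                for o in (0, 1)
            ]
        prev = ends
    return min(best)

# NMOS side of the master latch group: inverter, transmission gate, feedback inverter.
master_nmos = [("inv", "GND", "X"), ("tg", "X", "M1"), ("fb", "M1", "GND")]
print(min_diffusion_breaks(master_nmos))

# Inserting an unrelated (hypothetical) device in the middle forces a break.
with_stranger = [("inv", "GND", "X"), ("buf", "GND", "Y"), ("tg", "X", "M1")]
print(min_diffusion_breaks(with_stranger))
```

A result of zero means every junction between neighbors is shared, which is the property the grouping step aims for.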
Large devices often require folding into two (or more) smaller devices in parallel. This reduces the parasitic capacitance on the output node -- often with no additional area penalty since the outside source contacts can often be shared with adjacent groups.
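The effect of folding on the output node can be illustrated with a rough area model in Python. This is a sketch under simplifying assumptions: only the bottom area of the drain junction is counted (sidewall capacitance is ignored), the fingers are equal, the sources sit on the outside so that each drain is shared by a pair of fingers, and the dimensions are arbitrary illustrative values in lambda.

```python
def drain_area(width: float, drain_len: float, fingers: int = 1) -> float:
    """Total drain junction area of a device of the given total width.

    fingers == 1 : unfolded device, one drain of full width.
    even fingers : sources outside, each drain shared by two fingers.
    """
    if fingers == 1:
        return width * drain_len
    if fingers % 2:
        raise ValueError("this sketch handles only an even number of fingers")
    finger_width = width / fingers
    shared_drains = fingers // 2
    return shared_drains * finger_width * drain_len

# Illustrative numbers only (lambda units), not taken from the layout.
unfolded = drain_area(width=24, drain_len=6)
folded = drain_area(width=24, drain_len=6, fingers=2)
print(unfolded, folded, folded / unfolded)
```

Under this model, folding into two halves the drain junction area on the output node, while the two outside source regions are the ones available for sharing with neighboring groups.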
The logic for the D gate is given in figure 14. The layout for a traditional CMOS circuit implementation with the NMOS and PMOS networks being the duals of each other is illustrated in figure 15.
However, this version requires more area and is slower than a modified version implemented with a parallel NMOS network of series devices. From the Kmap, the output D should be zero when Q*K + QN*JN is true. This layout does not have the "shorting bar" in the NMOS network that the previous layout contains, nor does it have the extra junction contacts (and capacitance). However, the poly interconnect is no longer straight across as in the original Phillips style of layout. If one desires to reduce set-up time (instead of toggle frequency), then the JN and K inputs should be next to the D output. Arranging Q and QN as straight across leaves JN and K crossing as shown. The straight metal 2 interconnects prevent needless blocking of global interconnect. In addition, the metal 2 tracks align to the junction contacts, at least before compaction, which facilitates exit routing and forces the input routing off of the poly gates.
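The pull-down condition read from the Kmap can be checked exhaustively. The Python sketch below assumes the standard JK characteristic equation, D = J*QN + KN*Q, for the next-state signal (the text does not write it out) and verifies that the NMOS network conducts, pulling D low, exactly when Q*K + QN*JN is true. The function names are illustrative.

```python
from itertools import product

def d_next_state(j: bool, k: bool, q: bool) -> bool:
    """Standard JK characteristic equation: D = J*QN + KN*Q."""
    return (j and not q) or (not k and q)

def nmos_pulldown(jn: bool, k: bool, q: bool, qn: bool) -> bool:
    """Two series pairs in parallel: conducts when Q*K + QN*JN is true."""
    return (q and k) or (qn and jn)

def pulldown_is_complement_of_d() -> bool:
    """True if the pull-down network conducts exactly when D must be 0."""
    for j, k, q in product([False, True], repeat=3):
        d = d_next_state(j, k, q)
        conducts = nmos_pulldown(jn=not j, k=k, q=q, qn=not q)
        if conducts == d:   # must always be opposite
            return False
    return True

print(pulldown_is_complement_of_d())
```

The check passes for all eight input combinations, so the two-branch network needs no shorting bar between its series pairs.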
One very desirable attribute of this layout is the placement of the source nodes at the outside, which supports sharing with neighbors. The dual network has a length along the diffusion of 40 lambda (tessellation box width). With sharing of the supplies, the second version has a length of 26 lambda. In addition, four metal 1 tracks, corresponding to JN, K, D, and Q/QN, are required. Referring back to the "first pass" group placement, the D group has signals S1 and S2 passing through; hence, six metal 1 tracks are required. Placing the D group on the bottom is required, as anticipated.
The groups can be mirrored or placed directly. As a general guideline, placing the inputs to facilitate short connections between nearest neighbors provides a sense of data flow. Starting at the highest density point, sharing supplies whenever possible, and working in both directions leads to figure 17. In order to provide a metal 2 track away from the top edge of the tessellation box, the PhN inverter is not placed on the top. Also shown are tentative PMOS to NMOS connections at the output nodes -- mostly metal 2.
In general, the gates with crossing poly can be handled with three center tracks, as illustrated in the group D layout (figure 16). The signal for the NMOS device in the slave transmission gate is assigned to track 2, i.e., Ph, while its complement, PhN, to track 4, as illustrated in figure 18. The master latch normally has greater feedback resistance which generally provides more room for the crossovers. Figure 18 also illustrates the extensions in Ph and PhN for the master transmission gate. Using track 2 for K completes the poly connections between the NMOS and PMOS gates.
There are no parallel "shorting bars" to consume the lowest number track for the NMOS network and the highest number track for the PMOS network. In general, these connections should be the next connections to be completed. In the absence of these "shorting bars", the long interconnection D is assigned to track 1.
In general, assign the forward connecting data flow signals closest to the center track 3. For example, assign M1 and S1 to track 3. However, certain congested areas may require this center channel. The D group is the second most congested area within the cell and tracks 1, 2, and 4 are already consumed. Preferably, the QN signal needs to be routed through the center to reach its bottom destination.
Assign S2 to track 4 since track 3 is used, and assign Q to track 2. Then, provide the final input terminals closest to the center track, e.g., JN, to track 4 and CLK to track 3. Notice that track 5 is used only for the master feedback.
A final clean-up should include reducing the metal2 interconnects to a minimum by moving their vias as close to the center as possible -- necessitating metal1 replacements. The M2 connection illustrates this reduction in metal2. PhN, M1, S1, S2, QN, Q, and D can all be compacted.
In reflection, one can observe that the layout could be improved by interchanging the roles of track 2 and track 4. Often, the NMOS devices are not as wide as the PMOS devices which translates into less area to place vias over the junctions. With the D running up through track 2, the vias cannot be moved closer to the center. Using this long run for D on the track 4 facilitates moving the vias off of the NMOS devices.
One final observation is in order. This exercise has demonstrated the uncertainty of trying to linearly order the groups prior to determining the local barriers within a group. Consequently, intragroup placement with local routing strategy should be determined first. Then proceed with the overall placement.
The final layout of the JK flip-flop is shown in figure 19. This layout is scalable from the MOSIS 2um to the MOSIS 0.8um technology, while making full use of the metal3 routing layer available with the 0.8um technology. However, the layout does not fully exploit the refractory silicides available with the MOSIS 0.8um technology. The sheet resistances of the doped polysilicon and the junctions are 20-40 Ω/square and 80-100 Ω/square, respectively, which limits their use for long interconnections.
An extra processing step, which does not require an extra mask, can be used to reduce the sheet resistance of the polysilicon gates and the source/drain junctions. Submicron technologies incorporate LDD (Lightly Doped Drain) regions at the channel edges to minimize punch-through and the hot electron injection/trapping reliability problem. The associated shallow junctions have high resistivity, which can be reduced to about 1.5-5 Ω/square with the addition of refractory metallurgy; e.g., the salicide process (Self Aligned siLICIDE) silicides the polysilicon and the shallow source and drain regions in a self-aligned manner. As the source and drain resistance decreases, the number of contacts required for the source and drain junctions can be reduced considerably. This allows the horizontal metal1 routing tracks to run over the diffusion regions, thereby reducing the height of the cell with no loss in performance. By using extra masking steps, the refractory metal can be extended into local interconnects which cross field oxides, even with the classic buried contacts similar to the poly to diffused region contacts in the NMOS technologies.
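The impact of sheet resistance on a long run can be estimated by counting squares, R = Rs * (L / W). The Python sketch below applies this to the sheet resistance ranges quoted for unsilicided polysilicon and for silicided material; the run length and width are arbitrary illustrative values (a run of 100 squares), not dimensions taken from the layout.

```python
def wire_resistance(sheet_res_ohm_per_sq: float, length: float, width: float) -> float:
    """Resistance of a uniform strip: sheet resistance times number of squares."""
    return sheet_res_ohm_per_sq * (length / width)

# Illustrative run: 200 units long, 2 units wide, i.e. 100 squares (assumed values).
LENGTH, WIDTH = 200.0, 2.0

ranges = {
    "doped polysilicon (20-40 ohm/sq)": (20.0, 40.0),
    "silicided (1.5-5 ohm/sq)": (1.5, 5.0),
}
for label, (low, high) in ranges.items():
    r_low = wire_resistance(low, LENGTH, WIDTH)
    r_high = wire_resistance(high, LENGTH, WIDTH)
    print(f"{label}: {r_low:.0f} to {r_high:.0f} ohms")
```

For the same geometry the resistance scales directly with the sheet resistance, which is why silicided poly and junctions become usable for longer interconnections and need fewer contacts.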
The local interconnect allows a direct connection between the drains of the transistors and the polysilicon gate. This extra layer can be utilized to interconnect drain outputs to gate inputs without the use of short metal stubs, as shown in figure 20. The figure shows how the use of local interconnect with buried contacts in CMOS eliminates a horizontal metal routing channel and a vertical metal2 interconnect. Since the gate can directly connect to the drain, the role of first metal as a connection means changes significantly. Due to advancements in the technology, at some point the special local interconnect layers, e.g., refractory metal, are going to be available to the MOSIS community.
The creation of different versions of the same logic function to take full advantage of an emerging technology is extremely tedious and expensive; however, a layout generator with built-in options can easily generate the entire cell library to take full advantage of the new technologies. The layout guidelines that have been used in this paper form the basis for an algorithm to create a general purpose layout generator.