CHAPTER 10
IP AND HIGH-SPEED
So far we have explored the tasks involved in porting an SoC design into an FPGA-equivalent form, and how following some Design-for-Prototyping rules can make these tasks easier. A significant part of the porting effort goes into handling pre-existing intellectual property (IP), particularly peripheral IP for the interfaces in the design, and so it deserves its own chapter. In this chapter we shall explore the two most popular groups of IP, namely CPUs and interfaces, and how we can model these in an FPGA-based prototype given that we may or may not have access to hard models, to RTL or even to an equivalent IP already available in FPGA-compatible form.
Almost all SoC designs today include some form of IP, which we shall define as design blocks not originating with the project but instead brought in from “outside.” We are limiting this discussion to digital IP because analog or mixed-signal IP will clearly not function in an FPGA and will need external support (see chapter 4).
From our prototyping perspective, it is valuable to know that the IP components are pre-tested and should work as specified. If so, then our prototyping task becomes somewhat easier. However, the prototype is also an excellent platform for testing the combination of the IP blocks and their interconnection, because our SoC design is probably the first platform in which a particular combination of IP blocks has ever been used together. As we shall see below, enabling our prototype to properly model the combination of IP blocks may require different approaches depending upon the format in which the IP is supplied. Some will be modeled inside the FPGA while others may require external test chips. In each case, the IP supplier should be willing and able to support its modeling in an FPGA-based prototyping environment. It is of great value in SoC design to know that the IP block is tried and tested on silicon but this should also be true for the prototyping options, such as a test chip or a working FPGA image. The SoC team should ask its IP supplier what help they can offer to the prototyping team.
Our challenge in linking from IP in our FPGA-based prototype to other forms of IP outside of the device is to remain functionally equivalent to the original SoC design.
Digital IP may take many forms and originate from many sources, internal and external. The obvious examples would be a CPU core from third-party suppliers such as ARM®, or peripheral IP from Synopsys. Smaller elements such as the DesignWare® Building Blocks from Synopsys or generated by the Xilinx® CORE Generator™ tool are also examples of IP and have very widespread use in SoC and FPGA respectively. Even the reuse of another designer’s block can be considered IP as long as it is packaged with all the documents, infrastructure and support necessary to bring the IP through the whole SoC tool flow. However, that is where we draw the line for the purposes of this discussion – just using somebody else’s RTL in our design does not qualify as IP.
IP is delivered in a number of forms, each of which presents us with a different challenge. The main formats are listed in Table 25 and most methods of IP delivery will fall into one of the listed categories.
Table 25: Formats for IP delivery to SoC Team
RTL source code | Whole source code in Verilog HDL or VHDL, either under vendor license or from open-source provider. |
Encrypted source code | RTL is protected and must be decrypted either explicitly or as part of the tool flow. Once decrypted, it behaves as any other RTL source. |
Soft IP | Delivered in an intermediate form, sometimes encrypted, but requiring back-end processing. |
Netlist | IP delivered as a pre-synthesized netlist of either SoC library elements or generic gate-level elements, such as Synopsys GTECH. |
Physical IP | Also known as hard IP. Pre laid-out by a silicon foundry. Would be represented by models or test chips during development and prototyping. |
Any mix of these types of IP may be found in any given SoC design. Sooner or later we may need to address all of them in our prototyping projects. Let’s look closer at each of these forms of IP in turn and explain how each can be handled in an FPGA-based prototype.
When we have the RTL source code for an IP block, our task would seem to be no different than it would be for any other RTL in the SoC design. There are, however, some differences in how much we can understand or alter the RTL within an IP block. For example, it may be that the RTL is delivered under a license that governs that it may not be altered without voiding any warranty or support agreements. In that case, the IP vendor could be asked for a separate license or consulting contract for support of prototyping.
Alternatively, the IP vendor may already have an FPGA-ready form of the IP but that may require an extra license agreement, possibly at extra cost. When choosing IP for the original SoC design, some thought might be given to the availability of FPGA versions of the IP from the vendor.
Licensing and support agreements aside, as long as the RTL is complete and well documented, there is no fundamental reason why we might not successfully prototype it along with the rest of our SoC design. Although the functionality of the IP will be the same in the FPGA, as with other SoC-targeted RTL, we should lower our expectations regarding performance. RTL that is developed and tailored for a leading edge SoC process will run considerably slower in even the fastest FPGA today.
One of the issues with supplying IP as RTL to large numbers of people within a company or to any third-party developers is IP pollution and even theft. IP pollution is the accidental or even deliberate alteration of the IP for a design-specific purpose without the knowledge of the IP vendor, leading to licensing and support issues. IP theft needs no explanation and is a key issue for all IP vendors, so RTL supply is very carefully monitored and protected between supplier and customer.
As RTL is so valuable, many IP providers and users minimize their exposure by using a secure method of delivery, limiting the spread of the RTL and the possibility of “reverse engineering.” Methods used for this include supplying the IP in more abstracted formats such as simulation models, encrypted netlists, encrypted FPGA bitstreams and full-silicon test chips. The advantages and limitations of each are listed in Table 26.
Table 26: Advantages and limitations of IP delivery formats in place of RTL
A model is really only a representation of the IP for the purposes of simulation. Whether written in RTL or in a higher-level format such as SystemC, it is not intended for synthesis or implementation in FPGA or any other physical form. Models are nevertheless required, not only for simulation but also because they recover some of the visibility otherwise lost by using an encrypted or test chip form of the IP. For example, a test chip of some IP may be delivered with a transaction-level model (TLM) for inclusion in a high-level testbench but we should not synthesize the TLM into silicon. A good measure of the maturity of an IP block (and indeed a good measure of the IP vendor) is the amount of extra support, such as models, provided with the IP.
In the case of CPU IP, models may also be provided for instruction set simulators which have very limited knowledge of the cycle accuracy of the IP but are very fast and ideal for use in software development before RTL is available. As we shall explore in chapter 13, models running on functional, virtual or instruction-set simulators can be interfaced to real hardware using standards like SCE-MI to give a solution partitioned between models and the real SoC RTL.
Let us now consider how we can prototype with these forms of IP for which we do not receive the raw RTL.
10.2.3. IP as encrypted source code
Beyond the relatively trivial example of including IP as RTL source code we find the first degree of difficulty is in RTL which is delivered only in an encrypted form. This means that we need the license to use the IP and the decryption mechanism to access it. IP vendors will each have their own approach to encryption and decryption, including common public domain methods such as PGP (“pretty good privacy”) as well as proprietary methods. If the protection is only used for shipping the source code then after decryption we will be in possession of the RTL, and, as above, fully able to proceed with our prototyping work.
However, some RTL is not only shipped encrypted but remains encrypted throughout the tools flow, automatically decrypted “on-the-fly” at each step as required. This is only possible if each tool has built-in understanding of the required decryption and the necessary keys. Two common examples of such an approach are the synenc encryption flow from Synopsys® and the encrypted ngc files generated by Xilinx® tools.
The overall aim is to get the source code for the IP if possible, or to create some route by which the IP instantiation black box can be filled. This is an exercise that we will not go into any further here, except to say that the industry will eventually move towards an IEEE standard for IP encryption and encapsulation.
Encrypted netlists can be supplied to companies for inclusion into their own FPGA designs but offer a low level of security since they are inevitably decrypted in the design flow. The resulting output from the design flow is encrypted again to ensure the final image file can only be used in a deterministic way in the target FPGA hardware.
FPGA vendors offering encrypted IP (from third parties) for their products rely on different techniques to limit its use in the target FPGA hardware. These include IP that will only work for a limited time when programmed into an FPGA and/or that requires the use of the download/debug hardware and software used to configure the FPGA. These are special netlist designs with additional logic to perform checking and security. They still rely on a legal agreement to ensure the final level of protection of the IP.
Some FPGAs have built-in encryption keys which are used to decrypt an encrypted FPGA bitstream on the fly as it is read in by the FPGA and configured. The latest -6 devices and the largest Xilinx® Spartan®-6 devices all use AES 256-bit encryption/decryption. The decryption key is stored in battery-backed key memory, or it may also be stored in a less secure poly eFUSE key. The battery-backed key is the most secure, as there are no known means to obtain the key from a device. Powering the key memory off causes the key to be lost completely. The eFUSE key is less secure, as destructive tear-down of the device may be used to read the value of the eFUSE bits. However, even then the eFUSE bits are not easy to read since there are three times as many bits used as are strictly required for the security key, further confusing the would-be key-copier.
This encryption methodology was originally intended to stop design cloning of FPGA-based products, but it is also a useful way to secure IP blocks supplied in FPGAs. Since all the decryption is performed within the FPGA and only the encrypted FPGA image is visible outside, the security is very good, making this an excellent method for vendors to deliver high-value IP.
Figure 123 (courtesy of ARM) shows how the FPGA image would replicate the integration (top) level of an IP block design. This would effectively be the same interfaces that would be exposed when using a hardened macro from the silicon provider or when we have hardened an IP block to our requirements. Due to the high number of signals that are normally associated with this level of the design, there may be a requirement to multiplex these signals to and from the FPGA. This then requires the opposite logic to reconstruct the signals to join them to the rest of the design. The vendor supplying the FPGA image should also provide the application notes and support to enable us to do that.
Figure 123: Top-level of IP used as encrypted FPGA image (source: ARM Ltd.)
A useful feature of IP delivered in this form is that the clocks in the FPGA do not make use of the internal MMCM and such elements. This allows the system to be clocked at speeds below that minimum limit that would have been imposed had they been present. Indeed, this approach may even support clock stopping and single stepping in the IP block.
As users then, we would need to provide the clocks for the different domains in the FPGA’s internal logic and the pin multiplexing and so forth for our prototyping system in order to ensure proper clock alignment.
This approach is used by ARM in its software macro models, or SMMs, which are encrypted FPGA images. ARM feels that the omission of clock infrastructure gives the SMM greater operational flexibility, which supports more end-user applications without the need for altering, or even viewing, the RTL.
Probably the most secure method of delivery for the IP vendor is for the design to be pre-implemented in silicon, as it is very hard to reverse engineer. However, it is also the most costly to create, maintain and support, especially if the vendor has to build a new test chip for each revision of its IP. Test chips are usually available for higher value IP blocks with wide usage, for example, most ARM CPU cores have test chips. It is less likely that a test chip would be available for either a new block (e.g., supporting a very new communications standard) or for a specialized IP block which is customized by the vendor for each user.
Test chips may require associated components to support their operation (e.g., memory controllers) but the combination of external test chip and support will allow the FPGA-based prototype as a whole to run at the highest speed possible. However, to achieve these higher speeds we often need to compromise on features.
The biggest limitation of the test chip is its lack of flexibility, owing to pre-defined interfaces and configuration options. This may impose restrictions on usage, for example memory maps or interrupt structures which may not match the expected use of the IP in the final SoC. Compromises to IP in order to allow its use in test chips include selective adherence to cycle accuracy, use of asynchronous bridges between the test chip and the FPGA, multiplexing of signals to accommodate the high pin count buses and even limiting the features of the implementation to meet the prototyper’s needs or silicon limitations (e.g., disabled test modes, fewer interrupts, merging buses on chip to bring a single bus to the pins, etc.).
Software or system settings would need to be altered to match the test chip’s capability, rather than the other way around and it may be that a more flexible RTL or FPGA-based delivery is required. This might mean that we need to obtain extra licenses from the IP vendor compared to a test-chip and model approach, but it may be worth the investment if it means that the FPGA-based prototype is going to be more useful with it.
Figure 124 (courtesy of ARM) gives an example test chip implemented for the same ARM processor example shown earlier in Figure 123. Here we can see the use of the SMC (static memory controller) and DMC (dynamic memory controller) to access boot memory and peripherals, together with the DMC for the run-time memory.
Figure 124: Top-level of test chip equivalent to Figure 123 (source: ARM Ltd.)
Test chips are able to support benchmarking and OS development in the early stages of a design, ensuring that we can make an early start on the software. However, having all these features does limit the flexibility of the test chip in the hardware prototyping system (due to memory map, interrupt and fixed configurations). The test chip of the kind illustrated is generally best placed to support OS development, benchmarking activities and development of extension IP blocks which will be connected via the AXI™ bus (in the case of the ARM).
As we saw in chapter 5, we recommend avoiding permanently linking FPGA pins to peripheral or other external components on the board. Instead we should keep such connections flexible in order to increase the chances for their reuse in future prototyping projects. In the case of IP test chips, we may need to connect our FPGA(s) to a great number of external pins on the test chip or a board/module upon which the test chip is mounted. We should try to make such connections via deferred or switched interconnect (see chapter 6) and this may involve adaptors or vendor-specific connectors.
Figure 125: ARM test chip on a CoreTile with associated adaptor for Synopsys HAPS®
An example of an ARM test chip mounted on a CoreTile and its associated adaptor with which it would communicate with a Synopsys HAPS® FPGA board is shown in Figure 125.
The use of wide buses in SoCs normally means that the connections between FPGA and test chip need to be multiplexed in order to reduce the number of FPGA pins required or simply to fit within the number of pins of the connector. This, however, will also introduce extra delay through the interconnect, IO pads and boards, probably reducing the overall system speed. The delay can be mitigated to some degree by using high-speed serial signaling techniques and higher speed multiplexing rates (see HSTDM discussion in chapter 8). More discussion of multiplexing and its impact on timing can be found in chapter 8.
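To get a feel for the cost of this multiplexing, the following minimal Python sketch models a time-division multiplexed link behaviorally. It is not RTL and it is not the HSTDM scheme; the function names, the LSB-first slice order and the optional overhead_slots allowance are assumptions made purely for illustration. It also assumes that every design clock cycle needs one complete transfer of the bus.

```python
def tdm_serialize(word, width, pins):
    """Split a `width`-bit word into `pins`-bit slices, least-significant slice first."""
    slots = -(-width // pins)          # ceiling division: time slots per transfer
    mask = (1 << pins) - 1
    return [(word >> (i * pins)) & mask for i in range(slots)]


def tdm_deserialize(slices, pins):
    """Rebuild the original word from slices produced by tdm_serialize."""
    word = 0
    for i, s in enumerate(slices):
        word |= s << (i * pins)
    return word


def max_design_clock_mhz(transfer_clock_mhz, width, pins, overhead_slots=0):
    """Upper bound on the design clock if each design cycle needs a full transfer.

    overhead_slots is an assumed allowance for synchronization or training
    slots; it is a placeholder, not a figure from any particular scheme.
    """
    slots = -(-width // pins) + overhead_slots
    return transfer_clock_mhz / slots


if __name__ == "__main__":
    slices = tdm_serialize(0xDEADBEEF, width=32, pins=8)
    print([hex(s) for s in slices])
    print(hex(tdm_deserialize(slices, pins=8)))
    print(max_design_clock_mhz(400, width=32, pins=8))
```

For example, a 32-bit bus carried over 8 pins needs four time slots, so under these assumptions a 400 MHz transfer clock would cap the design clock at 100 MHz before any pad or board delay is taken into account.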
Having discussed the different formats in which IP can be delivered, we shall explore the handling of soft IP and hard IP in more detail and also explain how these can be included in our FPGA-based prototype.
Soft IP can be any form of IP for which physical implementation is decided upon by the end-user. For example, IP delivered as RTL can be considered “soft” because we are at complete liberty to compile, synthesize and lay-out the IP in any way that we choose. Therefore, in our earlier discussion of the various forms of RTL delivery, with or without encryption, we were in fact exploring soft IP.
Figure 126: Example of soft IP: datasheet of a MAC from a DesignWare library
However, any form of IP for which we do not receive layout or any other physical information should be considered “soft.” Examples include netlists or binary forms of the IP, or IP that is pre-compiled into an intermediate library. Relatively low-value IP, or IP for which performance and area targets are reasonably easy to achieve, is often delivered as soft IP.
A very common example of soft IP is the DesignWare Building Block library from Synopsys. DesignWare is an extensive library of infrastructure IP for design and verification including arithmetic and datapath components, AMBA interconnect IP and microcontrollers. The datasheet for an example DesignWare component is shown in Figure 126 where we see an excerpt from the datasheet of a multiply-accumulate function, or MAC.
SoC designers will make use of such a soft IP block in one of three ways: instantiation, inference or operator replacement. Instantiation is the simplest, with the block’s module described and instantiated in the normal way. The descriptions would be available from a library or directly in the RTL. An example instantiation only is shown in the code excerpt in Figure 127.
Figure 127: Instantiation of DesignWare® MAC IP in Verilog HDL
DesignWare IP may also be inferred as function calls. This relies upon the inclusion of support references in the RTL (via an include statement in Verilog or a library reference in VHDL). In the example shown in Figure 128, the tool must also be set up so that a search path points to the files to be included. Here we see that one of two different functions is included, each inferring either a two’s complement or an unsigned version of the MAC. The synthesis combines these together into a common MAC with configurable least-significant-bit. Different forms of the IP may have been created, for example a power-optimized or a performance-optimized version of a multiplier. The user or the tools would have the ability to choose the most appropriate version in the context of the design at compile time.
Some tools may even infer soft macros synthetically during operator replacement or during a sub-function of synthesis called module compilation. For example, a multiplier soft IP macro would be inferred by the simple code c <= a*b, with the result that the gate-level netlist would have an extra level of hierarchy containing gates optimized for the target constraint.
Considering the value of soft IP, it is not surprising that most SoC designs today include such elements and so we need to be able to handle them during FPGA-based prototyping. Continuing with DesignWare as an example of soft IP, each of the above three methods for including the IP in the SoC design will require different solutions for their correct operation in FPGA.
Figure 128: Inference of DesignWare® MAC by function call
An IP instantiation may not be understood by the FPGA synthesis in the same way as the SoC synthesis, if at all. In most cases, the IP instantiation would appear as a black box, requiring contents at some point in the FPGA flow. If we are able to advise the SoC designers during their initial choice of IP, then we should ask them to ensure that only IP with an available and proven FPGA equivalent is chosen. In this way the original SoC design could have ifdef branching between two instantiations, based on a single define variable.
Perhaps even more preferable would be a single instantiation which serves both SoC and FPGA purposes, with the tools in the two different flows providing the appropriate contents for the instantiation. In that case the SoC designers would not be required to make any special provision in their RTL, except to choose only from a supported library of soft IP for which FPGA equivalents are available.
One way to provide this in the FPGA flow is for the prototyping team to write an additional RTL file with the same functionality as the soft IP. To enable this more easily, we would use wrappers in the original SoC design, as we did for memories in chapter 7. We could ask the SoC team that each time they instantiate a soft IP element that it is placed in a wrapper so that it can easily be replaced with the FPGA equivalent. This is another example of good Design-for-Prototyping practice as discussed in chapter 9.
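When we write our own FPGA equivalent for a wrapped soft IP block, we need some way to confirm that it really does have the same functionality. One option is a small golden model against which both the original and the replacement can be compared in simulation. The sketch below is a Python reference for a multiply-accumulate of the kind discussed above. The argument names, the output width (the sum of the two operand widths) and the truncation behavior are our assumptions for illustration and should be checked against the real datasheet before use.

```python
def to_signed(value, width):
    """Interpret a `width`-bit pattern as a two's complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mac_ref(a, b, c, a_width, b_width, tc):
    """Reference result of a*b + c, truncated to a_width + b_width bits.

    tc=True treats a, b and c as two's complement values;
    tc=False treats them as unsigned. (Assumed behavior, for illustration.)
    """
    out_width = a_width + b_width
    if tc:
        a = to_signed(a, a_width)
        b = to_signed(b, b_width)
        c = to_signed(c, out_width)
    else:
        a &= (1 << a_width) - 1
        b &= (1 << b_width) - 1
        c &= (1 << out_width) - 1
    # Mask the result so that it is returned as an unsigned bit pattern.
    return (a * b + c) & ((1 << out_width) - 1)


if __name__ == "__main__":
    print(hex(mac_ref(0xFF, 2, 0, 8, 8, tc=False)))  # 255 * 2, unsigned
    print(hex(mac_ref(0xFF, 2, 0, 8, 8, tc=True)))   # -1 * 2, two's complement
```

Expected values from such a model can be written to a vector file and applied to the SoC version and the FPGA version of the wrapper contents in the same testbench.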
A more automated approach may require some investment in time and effort, but if a particular soft IP library is to be used often, then it would be worth the investment. For example, at Synplicity®, Bangalore, before the acquisition by Synopsys, each DesignWare building block was analyzed and functionally equivalent RTL was created for use in the Certify® tool. In that case, all properly instantiated DesignWare elements would be automatically interpreted, not as a black box but as a new bottom-level of the RTL hierarchy. This was not particularly optimized for FPGA, but during FPGA synthesis was interpreted in the same way as any other piece of RTL.
More recently, Synopsys has modified its FPGA synthesis tools to allow native use of the DesignWare blocks for FPGA designers. In addition the DesignWare IP developers themselves have also performed some optimization for the blocks to better operate in FPGA. Any instantiation or inference of a DesignWare building block element in an SoC design will also be automatically interpreted correctly by the FPGA synthesis tools.
The result for inferred soft IP is very similar to that for instantiated soft IP. However, now it is not a matter of “filling” an empty black box but of the FPGA tools inferring the same functionality as the original SoC synthesis for a given piece of RTL. In both cases, the library references and/or include statements must be resolved to the respective target’s implementation. In the example RTL in Figure 128, the SoC synthesis tool (Design Compiler®) has its search path set up to include the path to the functions for all the DesignWare used in the design. When this same RTL is passed to the FPGA tool during the prototyping project, the equivalent path variable needs to be set to point to a set of functions for the DesignWare used. Alternatively, if the DesignWare elements are few in number and rarely used, then the required extra function definitions could be placed in a local include file.
The function definitions included will not be identical to the design blocks used to fill the instantiations mentioned in section 10.3.1. We refer the reader to other sources for how to use functions in Verilog HDL.
Figure 129: Logic automatically included for DW02_MAC functional inference
Referring back to our dw02_mac example in Figure 128, if we run that through FPGA synthesis then the resultant logic created is shown in Figure 129, and it is simple to see how this would be mapped into FPGA.
In SoC synthesis, an RTL operator – whether built into the language, like +, -, and *; or user-defined, like functions and procedures – can be linked to a synthetic operator. A synthetic operator is an abstraction that makes it possible for the synthesis tools to perform arithmetic and resource-sharing optimizations before binding the operation to a particular synthetic module.
The linking mechanism will vary from tool to tool but in Design Compiler it is a recognized HDL comment called a pragma. When the compiler sees the map_to_operator pragma as a comment in the RTL, the linked logic is used in place of the operator. Operator inference occurs when the synthesis tool encounters an HDL operator whose definition contains a map_to_operator pragma. The tool finds the specified synthetic operator, inserts it into the user’s design, and performs high-level optimizations on the resulting netlist. In fact, this is the mechanism used for the functions in the example of inferring the dw02_mac in Figure 128.
Table 27, taken from the DesignWare Developers Guide, lists the HDL operators that are mapped to synthetic operators in the Synopsys standard synthetic library for Design Compiler (see the references for more information).
Table 27: Synthetic soft IP mapped to operators
When soft IP is inferred by the SoC synthesis tools in the above way from generic RTL then its replacement for prototyping is an almost trivial task. The same RTL which infers the soft IP in the SoC synthesis will be automatically interpreted by the FPGA synthesis tool and mapped into relevant FPGA resources. For example, the SoC synthesis might employ a synthetic operator bound to a dw02_mult block in order to represent the * in a simple statement c <= a*b;. The exact same * will be inferred as a multiplier by the FPGA synthesis and mapped to a dedicated FPGA multiplier resource by default.
Because of this automation and simplicity, SoC teams should try to employ synthetic operators as often as possible rather than instantiate the soft IP directly into the RTL.
Xilinx has a large variety of IP which are licensed for use only within their own FPGAs. We can look at the functionality of the soft IP in the SoC design and find a close, or maybe even exact, equivalent in the FPGA library. Of course its use would be only temporary for the sake of prototyping but it may be that we can use a wrapper in the same way that we do for RAM. The more complex the IP, the more useful that this would be, but also the less likely that a match can be found between the SoC IP and the FPGA IP. The one exception to this is in the area of standards-based peripheral IP. Let’s look at that more closely now.
So far we have discussed handling of small elements or blocks of IP embedded in our RTL. For relatively low-level elements such as the DesignWare Building Blocks, the creation of an FPGA equivalent is a reasonable task but what of the subsystem level IP or whole CPU cores or peripheral functions, such as PCIe or USB? In some cases, we may need to spend a substantial effort replacing this kind of core IP with an FPGA equivalent or external component so before we do, we should be sure of our purpose.
There are a number of reasons high-speed IO needs to be prototyped and we summarize these in Table 28.
Table 28: Reasons and considerations in using peripheral IP in a prototype.
In the first use model, the prototype project is mainly performed in order to verify the IP design itself. In that case, our task is actually a small-scale version of a normal SoC prototyping project. We would prefer as high a speed as possible, up to the full expected speed of the IP in its final use in silicon. We would also prefer to make the prototype as accurate as possible and to interface with real-world peripherals. This is the use model for most of the prototyping performed within the Synopsys IP groups. For example, in Figure 130 (as previously mentioned in chapter 2), from the Synopsys IP design group in Porto, Portugal, we see the IP under test at the bottom of the FPGA block diagram while the rest of the FPGA is used to create a validation environment for the IP under test.
The channel from the HDMI Rx IC through the audio/visual processing out into an external PHY transmitter is what we want to prototype. However, we also need support elements for control and integration with the AV channel. This would be performed differently in each SoC implementation but for the purposes of running some software to test the HDMI channel, we pull together an infrastructure inside the FPGA using IP readily available from Xilinx and standard external memory components. The two halves are linked together using standard Synopsys IP for the ARM AMBA® interconnect.
Figure 130: FPGA IP augmenting a test rig for SoC IP
In this chapter, we shall focus not so much on this use mode as, in many ways, prototyping a specific piece of IP is really a subset of the tasks involved in SoC prototyping in general. In fact, IP prototyping is probably easier than for a whole SoC because many IP designs will fit into a single FPGA and thus we avoid partitioning. Let us move on to the more general case, then, of including peripheral IP into a larger SoC prototype.
The second use model, and the one most relevant to this chapter, is the use of an IP within a larger SoC. In that case, although we would indeed prefer to run at full-silicon speed, we will probably find that our overall system speed is limited by the performance of the SoC RTL when mapped into the FPGA core. In that case our task becomes a matter of providing a method for scaling the overall system speed in order to more closely match the peripheral IP speed with the SoC core. This may involve data buffering or splitting an input stream into parallel processing channels.
What can we do when the peripheral data is arriving too fast for the FPGA to be able to handle? A common way to remedy clock differences between the FPGA system and an external interface is to add “rate adapter” circuits between the external stimuli and the SoC. These circuits are logically FIFOs that “absorb” (and drive) the external stimuli at the full external speed, although probably at its lowest acceptable rate to still meet the standard. The adapter then processes only the subset of the stimuli that the FPGAs can handle at their reduced clock rate, and it then drives the stimuli to the external interface at the reduced clock rate.
For example, if an FPGA-based prototype runs at one-third the speed of the SoC then, to properly emulate the SoC’s performance, the FPGAs should be driven by one-third the amount of stimuli.
Another alternative is to predict the maximum amount of data that will arrive in a burst at the full peripheral rate. We can then implement a shift-register first-in-first-out (FIFO) buffer to receive it at full rate and then pass forward to the FPGA prototype at the reduced rate after all data is received. The same approach is used in the other direction to step up the rate from the FPGAs to the external interface. Some designs will not tolerate this approach but for many others this rate-adapting is adopted with great success.
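As a rough guide to the FIFO size and burst spacing implied by this approach, the sketch below does the arithmetic in Python. It assumes one word per clock on each side and a single buffer; the function names and the store_and_forward option are hypothetical. With store-and-forward, as described above, the FIFO must hold the whole burst. If reading were instead allowed to start while the burst is still arriving, only the difference between the two rates would accumulate.

```python
import math


def fifo_depth(burst_words, fast_mhz, slow_mhz, store_and_forward=True):
    """Words of storage needed to absorb one burst without loss."""
    if store_and_forward:
        # Nothing is read out until the whole burst has arrived.
        return burst_words
    # Reading overlaps writing: only the rate difference piles up.
    return math.ceil(burst_words * (fast_mhz - slow_mhz) / fast_mhz)


def min_burst_period_us(burst_words, fast_mhz, slow_mhz, store_and_forward=True):
    """Shortest spacing between burst starts that lets the FIFO empty in time."""
    fill_us = burst_words / fast_mhz    # MHz is words per microsecond here
    drain_us = burst_words / slow_mhz
    if store_and_forward:
        return fill_us + drain_us
    return max(fill_us, drain_us)


if __name__ == "__main__":
    # Illustrative figures only: a 512-word burst, read side ten times slower.
    print(fifo_depth(512, 125, 12.5))
    print(fifo_depth(512, 125, 12.5, store_and_forward=False))
    print(min_burst_period_us(512, 125, 12.5))
```

If the bursts in the real system can arrive more often than the minimum period calculated in this way, the design is one of those that will not tolerate simple rate adapting.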
The FIFO can be implemented in external components, in another FPGA (as in the example in Figure 131), or in the receiving FPGA itself, depending on speed and size requirements. In the example, designed by Synopsys consultants in New England, we see two external USB2.0 channels being run at 125MHz, and asynchronous FIFOs being used in each case to buffer incoming and outgoing packets. Complete packets themselves are recognized by the “sniffer” circuit and signaled to the rest of the prototype as being ready to read in at one-tenth speed.
This kind of rate adaptation works very well for regular packets of data and can handle reasonable rates as long as data arrives in bursts. It can also handle deep data if enough FIFO memory is available but, obviously, continuous data traffic would overflow the FIFO and packets would be lost or would need to be discarded until the rest of the prototype is ready to receive.
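The difference between bursty and continuous traffic can be seen in a small behavioral model. The Python sketch below is only an illustration under stated assumptions: the `simulate` function, the 64-word FIFO depth, the 10:1 clock ratio and the traffic patterns are all hypothetical, and the model counts individual words rather than packets.

```python
from collections import deque


def simulate(arrivals, depth, ratio):
    """Cycle-based model of a rate-adapting FIFO.

    arrivals: iterable of bools, one per fast-clock cycle
              (True means a word arrives in that cycle).
    depth:    FIFO capacity in words.
    ratio:    the slow side reads one word every `ratio` fast cycles.
    Returns (words_delivered, words_dropped).
    """
    fifo = deque()
    delivered = 0
    dropped = 0
    for cycle, has_word in enumerate(arrivals):
        if has_word:
            if len(fifo) < depth:
                fifo.append(cycle)   # word accepted into the buffer
            else:
                dropped += 1         # FIFO full: the word is lost
        # The slow side takes one word on every `ratio`-th fast cycle.
        if cycle % ratio == ratio - 1 and fifo:
            fifo.popleft()
            delivered += 1
    return delivered, dropped


if __name__ == "__main__":
    ratio, depth = 10, 64
    # Bursty traffic: 64 words back to back, then idle long enough
    # for the slow side to drain them, repeated four times.
    bursty = ([True] * 64 + [False] * 576) * 4
    # Continuous traffic: a word on every fast cycle, same duration.
    continuous = [True] * len(bursty)
    print("bursty     (delivered, dropped):", simulate(bursty, depth, ratio))
    print("continuous (delivered, dropped):", simulate(continuous, depth, ratio))
```

In this model the bursty pattern loses nothing because each burst fits in the buffer and is fully drained before the next one starts, whereas the continuous pattern fills the buffer and then drops most of the incoming words.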
Figure 131: Rate adapter concept for 10x USB reduction
This area of prototype design requires some additional engineering beyond that already in the SoC, but many find that to be the most interesting part of a prototyping project. The best time to consider buffering between internal and external data streams is when the SoC is originally being designed. Having the prototyping team involved at this early stage will allow rate-scaling design to be completed in time and for software teams to pre-empt the necessary scaling changes in their code.
The third common use case in Table 28 (on page 304) is when we want only to run applications software on a platform and our task is to provide data to the software and receive results as if we were running on the final system. From a software viewpoint we do not have to be cycle accurate as long as the channels run with “enough speed” to keep the channels in sync and provide approximately the required functionality. For example, as long as the software thinks that it is receiving video data from an HDMI source, the transceiver channel for that video data might be quite different from that instantiated in the SoC design itself. In fact, we could even completely replace the SoC IP with an FPGA version. We could also consider replacing the HDMI channel with a transactor that performs the same job, as we shall see in chapter 13.
As mentioned in section 10.2, the only form of IP available in some cases is an external test chip or hard IP. There are many reasons why we would want to use a hard IP block alongside our FPGAs, some of which may be forced upon us, for example when the RTL is not available or is not supported by the IP vendor for external usage.
The most common hard IP blocks found in prototypes are CPU cores and standard bus interfaces using PHY, including Ethernet, USB, PCI/PCI-X/PCIe and memory interfaces such as DDR2. Many vendors offer their IP as hard macros pre-implemented by various semiconductor partners and as end users we might choose a hard IP which most closely reflects our final silicon usage.
A good by-product of using test chips and external hard IP is that they free up FPGA resources. These resources can then be put to other uses or left unused, lowering the device utilization and therefore decreasing runtime and potentially increasing performance. We should note that the FPGA resources required for large IP blocks such as DSPs or processor sub-systems could easily consume a whole FPGA or even multiple FPGAs. In the latter case the cost and performance impact of splitting IP across multiple FPGAs might be too high, especially if we do not have intimate visibility of the internal workings of the IP. In these cases, it would make sense to use an external hard IP or test chip, accepting the limitations this may impose compared to the effort required to develop a larger multi-FPGA solution. These are amongst the decisions to be made in the early stages of the project (see chapter 4).
There are situations where we may need to include logic modules in the design for which there is no RTL available, but the same functionality is available from another source. FPGA vendors, including Xilinx, offer a wide range of IP cores that can be placed into the design in place of the core instantiated in the SoC. These cores are often optimized for FPGA and should give better performance and area results than would have been achieved by simply porting the RTL of the original SoC IP. Note that these Xilinx® IP cores may not be used in the SoC (Xilinx only licenses its IP for use in its own devices).
Some examples for such IP are specialized memories, processors, communication controllers, multi-gigabit controllers and many more. The cores are in the form of a low-level FPGA netlist, either generated with a Xilinx supplied tool (CORE Generator tool) or from previous FPGA implementation. The IP is instantiated into the RTL source before synthesis which supports the inclusion of FPGA IP blocks in the following ways:
The example we shall use is a PCIe-to-SATA bridge that was developed at the Synopsys IP development lab near Dublin, Ireland. Synopsys IP teams have used FPGA-based prototyping extensively to validate IP and its connectivity with external systems. In order to do so, demonstration and early-adopter platforms are created which need to closely resemble as many of the eventual targets for the IP as possible. The references give more details but the block diagram of the design is shown in Figure 132.
This example used the Virtex®-5 LS330 device on a HAPS-51 board to implement both the PCIe and SATA interfaces and the bridge between them; another testimony to the scale of modern FPGA devices. The interfaces use the built-in fast serial transceivers on the FPGAs and have bespoke blocks to drive from the cores to the FPGA-specific hardware; we can see the special FPGA-ready modifications labeled as pipe2v5gtp and sata2v5gtp. PIPE is an intermediate interface between the PHY of a PCIe interface and the rest.
Figure 132: Top-level implementation of PCIe-to-SATA bridge design
This arrangement allowed the prototypers to fully test the two sub-systems with real world data and realistic speeds.
In an ideal world we would want to be able to clock IP at any chosen speed, from DC to 1GHz+, to allow us to fully validate the system's features and enhance debug and development. However, we know that a crossover point normally occurs at about 20-100MHz when we need to migrate from a simple clocked design to something that requires internal clock generation and de-skew logic as mentioned in chapter 8. This requires the use of PLL and MMCM type blocks but this will enforce a minimum operation speed on test chips and FPGAs alike. This is limited by the PLL/MMCM operating specification and it is risky to try to go slower than that, even if it may seem to work in the lab.
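When choosing a scaling factor for the prototype clocks, it can help to check the scaled frequencies against the clock generator's operating range before committing to them. The Python sketch below shows one way to do this. The limit values and all names in it are placeholders chosen purely for illustration and are not taken from any device datasheet; the real limits must come from the PLL or MMCM specification of the device in use.

```python
# Placeholder limits for illustration only; NOT taken from any datasheet.
HYPOTHETICAL_PLL_MIN_HZ = 10_000_000
HYPOTHETICAL_PLL_MAX_HZ = 800_000_000


def scaled_clock(soc_hz: int, divide_by: int) -> int:
    """Clock frequency after slowing the SoC clock by an integer factor."""
    return soc_hz // divide_by


def pll_input_ok(freq_hz: int,
                 min_hz: int = HYPOTHETICAL_PLL_MIN_HZ,
                 max_hz: int = HYPOTHETICAL_PLL_MAX_HZ) -> bool:
    """True if the frequency lies inside the assumed operating range."""
    return min_hz <= freq_hz <= max_hz


def max_safe_divider(soc_hz: int,
                     min_hz: int = HYPOTHETICAL_PLL_MIN_HZ) -> int:
    """Largest integer divider that keeps the scaled clock at or above
    min_hz. Returns 0 if even the undivided clock is too slow."""
    return soc_hz // min_hz


if __name__ == "__main__":
    soc_hz = 200_000_000  # hypothetical SoC clock
    for div in (10, 20, 21):
        f = scaled_clock(soc_hz, div)
        print(f"divide by {div}: {f} Hz, within range: {pll_input_ok(f)}")
    print("largest safe divider:", max_safe_divider(soc_hz))
```

A scaled clock that falls below the specified minimum should be treated as out of range even if the hardware appears to lock in the lab.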
On a related note, the PLL in the SoC and its replacement in the FPGA may differ in jitter, duty cycle, frequency range, drift and accuracy. Therefore we need to be sure that the MMCM or other clock circuitry is accurate enough, and close enough to the SoC infrastructure, to give meaningful results. That is not usually in doubt for regular digital prototypes running within spec.
One of the biggest challenges for a new prototyping team is the inclusion of IP, especially high-speed peripheral IP. Third-party IPs are often provided without source code, so they cannot be synthesized as part of the RTL and an alternative must be found. This could be a netlist in the SoC library or one pre-mapped to FPGA elements. It could also be an external test chip or an encrypted version of the RTL.
All of these forms of the IP can be used but we should not lose sight of the purpose, which is not to verify the IP; that should have been guaranteed already. The reason to model the IP is so we do not leave a hole in the design. Having the IP present in some form allows us to validate the rest of the hardware and especially the software running upon it.
The authors gratefully acknowledge significant contribution to this chapter from
Spencer Saunders of ARM, Cambridge
Antonio Costa of Synopsys IP Group
Peter Gillen of Synopsys IP Group
Torrey Lewis of Synopsys IP Group
David Taylor of Xilinx, Edinburgh