Project Spring '00: Matrix Multiply Revisited
Objective

You are to implement the matrix multiply defined in Lab #8 in 12 clock cycles or fewer, operating at a clock frequency of at least 25 MHz. This will illustrate the resource versus time tradeoff that is present in all digital system designs.

To Do

Implement the matrix multiply as stated in the objective. There are no resource limitations; use as many multipliers, adders, registers, RAMs, etc. as you want. The device must be the EPF10K20RC240-4 (Flex 10K; note the speed grade!). The interface to the design has been altered to define two modes of operation: continuous mode and normal mode. The new interface for mmult is defined as follows:

Inputs
Outputs
There are two main changes from the previous implementation:
Clock Cycle Constraint

The clock cycle constraint of 12 clocks is measured between falling edges of the input_rdy output while in continuous mode (this number of clocks defines the initiation rate of the design). The smallest possible number of clocks is 4; this represents a continuous stream of X, Y, Z, W values on the DIN bus.

Extra Points

You can earn extra points in two ways:
The bonus points are added directly to the point total of your tests. If you earn the MAXIMUM bonus for #2, you have the option of keeping all of the bonus points or dropping one test grade (the final exam does not count as a test grade). Extra points will ONLY be awarded if you meet all functionality specs and timing specs (i.e., a design that fails functionality tests, but fails them fast at 100 MHz, still gets 0 extra points).

Hints
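As an illustration of the clock cycle constraint, the following is a minimal Python sketch that counts clocks between falling edges of input_rdy in a sampled trace. The function names and the shape of the example trace (high for 4 clocks, low for 7) are assumptions made for illustration; they are not taken from the lab materials, and the real check is made on your simulation waveforms.

```python
def initiation_intervals(input_rdy):
    """Return the clock counts between successive falling edges of input_rdy.

    input_rdy is a list of 0/1 samples, one sample per clock cycle.
    """
    falling = [
        i for i in range(1, len(input_rdy))
        if input_rdy[i - 1] == 1 and input_rdy[i] == 0
    ]
    # Difference between neighbouring falling-edge positions
    return [later - earlier for earlier, later in zip(falling, falling[1:])]


def meets_constraint(input_rdy, max_clocks=12):
    """True if every measured interval is within the clock cycle constraint."""
    intervals = initiation_intervals(input_rdy)
    return bool(intervals) and max(intervals) <= max_clocks


# Hypothetical trace: input_rdy high for 4 clocks, then low for 7, repeated
trace = ([1] * 4 + [0] * 7) * 3
print(initiation_intervals(trace))  # [11, 11]
print(meets_constraint(trace))      # True
```

A trace with no falling edges yields no intervals, so this sketch reports it as not meeting the constraint rather than guessing.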
Sample Waveforms

This waveform file (mmslow.scf) shows a solution that has an 11-clock initiation rate. The computation at 20 us shows the continuous mode of operation; the computation at 38 us shows the normal mode. This waveform file (mmfast.scf) is a solution for a 4-clock initiation rate. Again, both continuous and normal operation modes are demonstrated.

Testbench

This schematic (tbproj.gdf) is a testbench that MUST BE USED to demonstrate the final checkoff of your design. Your register-to-register timing check using the testbench must exceed 25 MHz; however, I will use the register-to-register timing check on your mmult design (minus the testbench) for assigning bonus points. The testbench has been designed to run on the Altera UP1 FPGA board.

The testbench uses the pushbuttons PB1 and PB2 to control operation. PB1 starts a computation sequence that loads in a coefficient matrix and then computes 64 matrix multiplies using the continuous operation mode. A 14-bit Linear Feedback Shift Register (LFSR) provides a pseudo-random number stream that is used for all data inputs. An 8-bit XOR-checksum register captures all dout data values whenever output_rdy is asserted. The output of the 8-bit XOR-checksum register is displayed via the two 7-segment displays on the Altera UP1 board.

The FSM testbench has been designed such that input values and output values are supplied using the input_rdy and output_rdy handshaking signals. This means that all designs, regardless of the initiation rate, will give the same checksum value after the 64 matrix multiplies have finished. The PB2 pushbutton is used to reset the FSM to its initial state. The input switches SW7-SW3 can be used to alter the initial value used by the 14-bit LFSR; this will cause a different pseudo-random number stream and thus a different checksum value. The testbench uses approximately 6% of the Flex10K20 resources and cannot be altered.
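To see why the handshaking makes the checksum independent of the initiation rate, here is a small behavioral sketch in Python. It is not the testbench itself. The LFSR tap positions, the function names, and the `placeholder_mmult` stand-in (which just passes the low 8 bits through in place of the Lab #8 matrix multiply) are all assumptions for illustration, and the coefficient matrix load is omitted.

```python
def lfsr14_stream(seed, count, taps=(14, 13, 12, 2)):
    """Yield `count` 14-bit states from a Fibonacci LFSR.

    The tap positions are an assumption; the testbench's actual feedback
    polynomial is not given in the text.
    """
    state = seed & 0x3FFF
    if state == 0:
        state = 1  # an all-zero state would lock up an XOR-feedback LFSR
    for _ in range(count):
        yield state
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1
        state = ((state << 1) | bit) & 0x3FFF


def placeholder_mmult(vec):
    """Hypothetical stand-in for the Lab #8 matrix multiply."""
    return [v & 0xFF for v in vec]


def run_testbench(seed, initiation_rate, multiplies=64):
    """Return (checksum, clocks) for a design with the given initiation rate."""
    lfsr = lfsr14_stream(seed, 4 * multiplies)
    acc = 0      # models the 8-bit XOR-checksum register
    clocks = 0
    for _ in range(multiplies):
        vec = []
        for cycle in range(initiation_rate):
            clocks += 1
            input_rdy = cycle < 4  # design accepts X, Y, Z, W on 4 clocks
            if input_rdy:
                vec.append(next(lfsr))  # LFSR advances only on the handshake
        for dout in placeholder_mmult(vec):
            acc ^= dout & 0xFF  # captured only when output_rdy is asserted
    return acc, clocks


slow = run_testbench(0x1, initiation_rate=11)
fast = run_testbench(0x1, initiation_rate=4)
print(slow[0] == fast[0])  # True: same checksum for both rates
print(slow[1], fast[1])    # 704 256
```

Because the LFSR only advances when input_rdy is high and the checksum only updates on output_rdy, the slower design sees exactly the same data as the faster one; it just takes more clocks to get through it.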
These two sample waveform files, tslow.scf and tbfast.scf, show testbench simulations for designs with initiation_rate = 11 and initiation_rate = 4. The bus labeled accout is the 8-bit XOR register checksum value. You will note that for the slow design, the first computation cycle (64 matrix multiplies) runs from about 50 us to 320 us, with a final checksum value of E7. The SW inputs are then changed to alter the initial value of the LFSR, and the second computation cycle gives a checksum of 3F. The fast design's first computation cycle runs from 50 us to only 150 us, and gives the same checksum of E7. The second computation cycle for the fast design also gives a checksum of 3F.

The files you need for the testbench are: tbproj.gdf (testbench schematic), tbfsm.vhd, and tbproj.acf (this file gives the pin number definitions needed for the Altera UP1 board). One of the tests that I will perform on your project submission is to download it to the Altera UP1 board and see if it works. You would be wise to do the same before submission; I will have boards available for checkout and there will also be a board in the Digital Systems workstation area.

Project Submission

I will place a script called "submit_cadproject.pl" in the /home/reese/bin directory on Leto (download the script submit_cadproject.pl if you are at a remote site). You will need to create a directory called "project" and copy all of your files to that directory. You will then need to execute the program "submit_cadproject.pl". This script will remove unnecessary files from that directory, zip it up, and email it to me. You must make sure that ALL files that I will need to compile and simulate your design are included. If you leave out any files such that I have to contact you after the submission date, I will deduct 5% from your final project grade.

Academic Dishonesty

I will be grading all of the projects. You MUST do your own work; treat this as a take-home test. I will treat any cases of project copying extremely harshly.

Grouping

You may do this project in groups of two or by yourself. If in a group of two, I would like to see some indication in the report as to how the labor was divided up. For a group of two, there only needs to be one report, and both members will receive the same project grade and the same bonus points.

Report

I want this report typed, and in an MSU lab folder. You must have the following:
Everything must be typed; nothing handwritten will be accepted.

Submission Dates

All files must have been submitted via the submission script by Thursday, April 27, 8:00 am. All final REPORTs must be in my Simrall office box or placed under my Simrall office door by 9:00 am, Friday, April 28.

Grading

The project is worth 10% of your total grade. I will use the following guidelines for grading:

a. Project does not work at all (poor report: 0%, average report: 10%, stellar report: 20%)
b. Matrix multiply seems to work somewhat, but testbench fails because handshaking is messed up (poor report: 30%, average report: 40%, stellar report: 50%)
c. Testbench works fine, but fails either the initiation rate or clock rate requirements
d. Meets all specs (poor report: 80%, average report: 90%, stellar report: 100%)

Obviously, the biggest impact on your final class grade is via the extra points. Start early so that you can get a fully functioning design and be able to earn these extra points.