
Elad Hadar Omer Norkin Supervisor: Mike Sumszyk Winter 2010/11, Single semester project.

Technion – Israel Institute of Technology, Faculty of Electrical Engineering, High Speed Digital System Lab (HS DSL). Exploring new implementation tools for the GIDEL PROCSTAR platform (PART I - PROC_HILs).


Presentation Transcript


  1. Technion – Israel Institute of Technology, Faculty of Electrical Engineering, High Speed Digital System Lab (HS DSL). Exploring new implementation tools for GIDEL PROCSTAR platform (PART I - PROC_HILs). Elad Hadar, Omer Norkin. Supervisor: Mike Sumszyk. Winter 2010/11, single semester project. Date: 30/5/11

  2. What is PROC_HILs? • PROC_HILs is a Hardware-In-the-Loop acceleration tool for running Simulink designs on FPGAs. • It automatically translates Simulink designs into FPGA code (compatible with the PROC board installed on the target PC) and runs them under Simulink.

  3. Why do we need PROC_HILs? • Dramatically improves simulation speed, with a dedicated accelerator for Simulink designs. • Enables building a design visually and downloading it directly, with minimal effort, into the PROC board. • Enables concurrent engineering at an early stage. • Cuts development cycle time (and costs). • Improves design reliability.

  4. Project motivation • Implementing a video analysis design on the GIDEL PROCSTAR III platform, enabling the use and exploration of a new development platform (PART I – PROC_HILs). • Proper use of development tools throughout all stages of implementation, from algorithm to hardware.

  5. How it works • PROC_HILs enables the user to download a Simulink design into the PROC board and run it. • The design runs on the on-board FPGAs, communicating with Simulink in real time. • The generation process is fully automatic.

  6. How it works – Main stages 1. Simulink design. 2. HDL code is generated, synthesized and compiled to produce an .rbf file (FPGA binary file) compatible with the specific PROC board. 3. A new Simulink design file is generated: a single HIL block including all the inputs and outputs that were present in the original design, connected to all the sources and sinks. 4. The design runs on the hardware fully synchronized with Simulink, receiving the signals from the simulation sources and outputting the results into the sinks.

  7. Hardware and development environment • Main development stages were carried out on a GiDEL PROCe III (Altera Stratix III) board (1 FPGA) • GiDEL PROC_HILs (Version 2.1.2) • Altera's DSP Builder blockset for Simulink (Version 10.1) • ProcWizard (Version 8.8) • Quartus II (Version 10.1) • Matlab (Version 2009a) • Additional development was carried out on a GiDEL PROCStar III (Altera Stratix III) board (4 FPGAs)

  8. Simulink Design - NLD • NLD is a hardware implementation of the Non Linear Diffusion algorithm for video images. • It enables local smoothing of the picture while preserving edges. • The Simulink design in this project is based on a previous project (performed in the Technion HS DSL by Tsion Bublil & Yony Dekell). • The original project was implemented on a PROCStar II (Altera Stratix II) board (4 FPGAs), using the Synplify DSP blockset library for Simulink.
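
To illustrate the kind of edge-preserving smoothing described on slide 8, here is a minimal sketch of nonlinear diffusion in the Perona-Malik style. The slides do not say which diffusivity function or parameters the NLD design uses, so the function `g`, the edge threshold `kappa` and the step size `lam` are assumptions chosen for illustration only.

```python
import math

def diffuse_step(img, kappa=10.0, lam=0.2):
    """One Perona-Malik-style diffusion iteration on a 2-D list of floats.

    kappa (edge threshold) and lam (step size) are illustrative assumptions,
    not values taken from the original design.
    """
    h, w = len(img), len(img[0])

    def g(d):
        # Diffusivity: close to 1 for small differences, close to 0 across strong edges.
        return math.exp(-(d / kappa) ** 2)

    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            total = 0.0
            # Four-neighbour differences; neighbours outside the image are skipped.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d = img[ny][nx] - c
                    total += g(d) * d
            out[y][x] = c + lam * total
    return out

def nld(img, iterations=3):
    """Apply several diffusion iterations in sequence."""
    for _ in range(iterations):
        img = diffuse_step(img)
    return img
```

Small intensity differences (noise) are averaged out, while differences much larger than `kappa` (edges) are left almost untouched, which is the behaviour the slide describes.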

  9. Simulink Design - NLD

  10. Design guidelines • All I/Os must be placed on the top level of the design. • Simulink sources must be configured to the same clock that toggles the input port they feed. • All signals from workspace blocks feeding input blocks, and all frame output blocks, must use the same frame size (as seen in the previous slide). • The design must obey the rules in the following table: * PROC_HILs User Guide V2.1.2 p. 49

  11. Simulink Design - NLD

  12. Simulink Design - NLD

  13. Simulink Design - NLD

  14. Timing considerations • Determining the clock rate • The video processing algorithm has to process 15 iterations of 256 by 256 pixels per frame, at a reasonable rate of 15 frames per second. • A long logical path prevents meeting the clock rate demands and fails compilation. • The Altera DSP Builder Advanced blockset supports automatic pipelining (not used in this project). • The Altera DSP Builder blockset supports user pipelining, either through the internal pipeline setting of a block (determined by the user) or by inserting Delays along the logical path. This method requires careful attention from the designer, who must ensure, by design, that the logical paths stay fully synchronized.
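
The clock-rate requirement on slide 14 follows from simple arithmetic. The sketch below assumes that one pixel-iteration completes per clock cycle and ignores pipeline fill and any interface overhead; those assumptions, and the function name, are not stated in the slides.

```python
def required_clock_hz(width, height, iterations, fps, pixels_per_cycle=1):
    """Minimum clock rate for the required pixel throughput.

    Assumes `pixels_per_cycle` pixel-iterations complete every clock cycle
    (an assumption, not a figure from the slides).
    """
    pixel_ops_per_frame = width * height * iterations
    return pixel_ops_per_frame * fps / pixels_per_cycle

rate = required_clock_hz(256, 256, iterations=15, fps=15)
print(f"{rate / 1e6:.2f} MHz")  # 14.75 MHz
```

Under these assumptions the design needs roughly 14.75 MHz, which is in line with the 15 MHz operating frequency reported on slide 25.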

  15. DSPBuilder internal blocks pipelining

  16. Simulink Design - NLD

  17. Compilation and synthesis flow • The performance of the completed design is first validated in the Simulink environment. • A fully automatic compilation and synthesis starts by activating the GiDEL HIL Generation Tool block. • A preliminary compatibility test starts by pressing the GUI button. • It checks that the design rules are met. • It does not check hardware fitting and feasibility. GiDEL HIL Generation Tool

  18. Compilation and synthesis (cont.) • The “GO” button issues a full compilation and synthesis of the design. • The generation flow can be adjusted by selecting the “Advanced Mode”. • Advanced Mode controls the enabling/disabling of different flow stages.

  19. Compilation and synthesis (cont.) • Generation ends with a new Simulink design file. • PROC_HILs does not fully evaluate the feasibility and hardware consumption of the design. • Quartus files exist only while the generation process is active and are then automatically deleted. • Solution: during generation, extract the Quartus top-level design and compile it independently with Quartus. <your_design_name>_HIL

  20. Quartus compilation example • NLD Hardware consumption:

  21. NLD output example • Original image: • Smoothed image (3 Iterations):

  22. Performances • Calculated warm-up time: • Simulation overhead: 9.9712 [sec] • Hardware overhead: 9.60422 [sec]

  23. Performances • Reduced overhead, time ratio: 128.645454 Eli’s comment: All Simulations were made on:

  24. Applications • Implementing NLD as part of real-time video capture/view streaming. • Webcam envelope: • Resizing the image (256x256) • Performing “log” on the resized image • Spreading the image to vector form • Reshaping to matrix form • Performing “power” on the processed image
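
The webcam envelope steps listed on slide 24 can be sketched as a thin wrapper around the processing block. This is only an illustration: the slides do not specify the resize method or the exact form of the “log” and “power” steps, so nearest-neighbour resizing, `ln(1 + p)` and its inverse `e^p - 1` are assumptions, and `process` is a hypothetical stand-in for the NLD hardware block.

```python
import math

def resize_nearest(img, size):
    """Nearest-neighbour resize to size x size (the method is an assumption)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

def to_vector(mat):
    """Spread a matrix into a row-major vector."""
    return [v for row in mat for v in row]

def to_matrix(vec, size):
    """Reshape a row-major vector back into a size x size matrix."""
    return [vec[r * size:(r + 1) * size] for r in range(size)]

def envelope(frame, process=lambda vec: vec, size=256):
    """Wrap `process` (stand-in for the hardware block) with the
    pre- and post-processing steps listed on the slide."""
    resized = resize_nearest(frame, size)
    logged = [[math.log1p(p) for p in row] for row in resized]  # "log", assumed ln(1 + p)
    vec = process(to_vector(logged))                            # vector in, vector out
    mat = to_matrix(vec, size)
    return [[math.expm1(p) for p in row] for row in mat]        # "power", assumed e^p - 1
```

With the default identity `process`, the output is simply the resized frame, which makes the wrapper easy to check before a real processing block is plugged in.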

  25. Applications (cont.) • The NLD algorithm hardware block is inserted into the webcam envelope. • The hardware dramatically decreases the frame rate, even though it is designed to sustain the desired frame rate. • Operating frequency is 15 MHz. • Conclusion: the Simulink/hardware interface overhead is too high to allow proper streaming in real-time applications.

  26. Hardware Loop • A possible way to take advantage of PROC_HILs is to use a hardware loop. Pipeline levels: 256. FIFO size: 256x256 - 256.
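
One plausible reading of the figures on slide 26 is that the FIFO is sized so that the pipeline delay plus the FIFO delay equals exactly one 256x256 frame, so each pixel returns to the processing block one frame later for its next iteration. The sketch below checks that arithmetic with a simple delay-line model; the interpretation and the function names are assumptions, not details confirmed by the slides.

```python
from collections import deque
from itertools import count

def fifo_depth(width, height, pipeline_levels):
    """FIFO depth that makes pipeline + FIFO delay equal one full frame."""
    return width * height - pipeline_levels

def loop_latency(width, height, pipeline_levels):
    """Count the cycles a tagged sample needs to travel once around the loop,
    modelling the pipeline and the FIFO together as one fixed delay line."""
    delay = pipeline_levels + fifo_depth(width, height, pipeline_levels)
    line = deque([None] * delay)
    for cycle in count():
        out = line.popleft()                         # sample leaving the loop this cycle
        line.append("tag" if cycle == 0 else None)   # sample entering the loop this cycle
        if out == "tag":
            return cycle

print(fifo_depth(256, 256, 256))    # 65280
print(loop_latency(256, 256, 256))  # 65536
```

The loop latency of 65536 cycles matches the 256x256 frame size, which is what keeps successive iterations aligned pixel for pixel under this reading.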

  27. Hardware Loop (cont.) • Repeated attempts to generate the full hardware-loop design ran into the hardware limits of the PROCe III board. • The same design was implemented on a PROCStar III board, with no problems reported in the generation flow. • Problem encountered: while the Simulink simulation showed reasonable results, the hardware simulation showed different results (efforts to find the origin and fix it were stopped due to the project's time constraints).

  28. Problems encountered • Strict software compatibility demands • Only one combination of versions of the software involved (Matlab, PROC_HILs, Altera DSP Builder, Quartus, ProcWizard) works together. • Algorithms of moderate size do not fit the common boards when using PROC_HILs and the Altera DSP Builder blockset. • The Altera DSP Builder blockset offers a poor variety of blocks and lacks common operations (log, exp, power, root, not, min/max…) • For effective use, one should use the Altera DSP Builder Advanced blockset, but it requires the Simulink Fixed Point license.

  29. Problems encountered (cont.) • Requires data to flow as vectors and does not support matrices. • Inconsistency between simulation results and hardware results. • Inconvenient existing blocks • Square Root: accepts and returns only whole numbers. • Divider: returns the result only as a whole-number quotient and a remainder.
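
To make the whole-number limitation on slide 29 concrete, the sketch below mimics integer-only square root and divider blocks, and shows a generic fixed-point scaling workaround for recovering fractional results. The workaround is a standard technique offered here as an illustration; the slides do not report that it was used in the project, and the function names and the choice of 8 fractional bits are assumptions.

```python
import math

def sqrt_fixed(x, frac_bits=8):
    """Integer-only square root whose result carries `frac_bits` fractional bits.
    Scaling the input by 2**(2*frac_bits) scales the root by 2**frac_bits."""
    return math.isqrt(x << (2 * frac_bits))

def divide_fixed(a, b, frac_bits=8):
    """Integer-only division whose quotient carries `frac_bits` fractional bits."""
    return (a << frac_bits) // b

# What whole-number blocks return for sqrt(10) and 10 / 4:
print(math.isqrt(10), divmod(10, 4))  # 3 (2, 2)

# The same operations with 8 fractional bits, converted back for display:
print(sqrt_fixed(10) / 256)           # 3.16015625
print(divide_fixed(10, 4) / 256)      # 2.5
```

The cost of this approach is wider operands: the square-root input grows by twice the number of fractional bits, which matters on a resource-limited FPGA.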

  30. Conclusions 1. Allows easy design and implementation of algorithms in the Simulink environment. • Direct hardware burn. • Direct generation of HDL code that matches the target board. • Fast HW simulation using the Simulink/Matlab interface. 2. Extremely efficient for resource-consuming processing algorithms. 3. Not suited to streaming-data designs (real-time designs).

  31. Future project plan (PART II) Motivation: learning and practicing an effective debug methodology using PROC API. GIDEL PROC API enables real-time configuration and querying of the board. Main goals/phases: • Learning PROC API and PROC MegaFIFO • Defining and building an integrated DSP Builder design combining PROC API video streaming functions, data channels and PROC MegaFIFO memories.

  32. Video stream diagram (PART II) PROC MegaFIFO RX - FIFO TX - FIFO PROC API

  33. Time table to final presentation
