
Fast SoC Architecture Exploration Using Traffic Simulation Techniques



Presentation Transcript


    1. Fast SoC Architecture Exploration Using Traffic Simulation Techniques Nadjib Mammeri, ARM

    2. Problems we are trying to solve: What interconnect topology should I use? Which arbitration and QoS schemes? How should I configure my memory controller (DMC queue length, memory width)? How do I optimally size my interconnect and memory system and still meet my performance requirements?

    3. SoC Architecture Exploration: Current Techniques
       Spreadsheet: not accurate, fast, cheap
       RTL simulation: 100% accurate, slow, expensive
       RTL emulation: accurate, fast, expensive
       Behavioural SystemC models: accurate, fast, expensive
       Traffic profiling: ~accurate, fast, cheap
       Traffic profiling abstracts away some components or parts of the system and replaces them with bus transactors that can: generate realistic traffic which is statistically equivalent to SoC data flows; re-use existing data flows to explore new architectures; use constrained random techniques.

    4. Our proposed approach VPE provides the accuracy of RTL simulation but drastically reduces cycle time compared with building a conventional system for analysis.
       Faster than developing a cycle-accurate SystemC model: "Generating a Mali200 traffic profile took us 3 days to create given the RTL testbench" (Project Technical Lead, PD Fabric Verification).
       More accurate than Excel: "VPE animates benchmarking data to bridge the gap between spreadsheet analysis and slow RTL simulation" (Senior Technical Marketing Manager, PD Fabric Marketing).
       The main advantage of VPE (formerly AVIP) is that it executes much more quickly than RTL while still generating traffic that you can represent and control, because it emulates the traffic patterns of the master or slave device instead of executing that device's functions. For this reason, traffic profiling is quicker than developing a cycle-accurate SystemC model. It is also more accurate than spreadsheet analysis, so it bridges the gap between spreadsheet analysis and slow RTL simulation.

    5. How is it done? When analysing performance, the content or functional intent of the data is not important, but the nature and flow of the traffic is. Simulation time can be reduced by trading off the functional accuracy of the end points. Accuracy should be preserved in the DUT and in the interconnect, because that is where the performance bottleneck lies. How simulation speed-up is achieved: by giving up execution of functions within the emulated device in favour of emulating its traffic (there is no need to model its cycle-accurate behaviour), and by replacing real data with constrained random data. So we need bus transactors that can generate meaningful and controllable traffic. This is VPE.
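
The idea of replacing real data with constrained random data can be illustrated with a small Python sketch. It is not VPE's implementation: the histogram shapes, the field names and the `generate_traffic` helper are all assumptions made for illustration. Each transaction draws its burst length and its gap from weighted histograms, and its data word is purely random because the content does not matter for performance analysis.

```python
import random


def sample(histogram, rng):
    """Draw one value with probability proportional to its recorded count."""
    values = list(histogram)
    weights = [histogram[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def generate_traffic(len_hist, itt_hist, count, seed=0):
    """Return `count` synthetic transactions drawn from the two histograms."""
    rng = random.Random(seed)  # fixed seed keeps a run reproducible
    now = 0
    transactions = []
    for _ in range(count):
        now += sample(itt_hist, rng)  # gap since the previous transaction
        transactions.append({
            "start": now,
            "beats": sample(len_hist, rng),  # burst length in beats
            "data": rng.getrandbits(32),     # random payload, content is irrelevant
        })
    return transactions


# Hypothetical histograms: value -> number of times it was observed.
len_hist = {4: 70, 8: 30}
itt_hist = {10: 50, 20: 25, 40: 25}
traffic = generate_traffic(len_hist, itt_hist, count=1000)
```

The generated stream only ever contains burst lengths and gaps that appear in the histograms, which is what "constrained" means here.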

    6. What is VPE (formerly AVIP)?
       Functional verification: complete AXI functional verification solution; SystemVerilog master, slave and monitor; RTL protocol assertions; RTL coverage points.
       Performance exploration: profile editor toolkit GUI; RTL profile extraction; RTL profile generation; AXI traffic characterization and analysis; AXI traffic replay and adaptation.
       VPE provides the benefit of two products in one: 1/ the unprecedented facility to capture, analyse and replay AXI bus performance statistics; 2/ all of the hygiene factors that conventional VIP from EDA vendors provides: functional directed and constrained random testing, protocol checking and protocol coverage. VPE is compatible with all the main SystemVerilog simulators: Synopsys VCS, Cadence Incisive and Mentor Questa (ModelSim).

    7. Abstraction example 1: If I would like to investigate my interconnect topology, I would keep the RTL for my interconnect, abstract away all end points (masters and slaves), and replace them with VPE masters and slaves.

    8. Abstraction example 2: If I would like to investigate my memory controller configurability, I would use the RTL for my interconnect and DMC, abstract away the other end points, and replace them with VPE masters and slaves.

    9. Traffic Profiling (1) Traffic profiles statistically characterise the traffic (transactions) on an AXI connection. A traffic flow is an identifiable stream of traffic (AXI transactions) between two points in a system. Examples: when profiling at Slave 1, traffic coming from Master 2 can be identified using AxID; if we know Master 1 always does 4-beat bursts, we can identify its traffic flow based on AxLEN.
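
The two examples on this slide amount to grouping observed transactions by one of their control fields. A minimal Python sketch follows, assuming transactions are available as dictionaries with hypothetical `axid` and `axlen` keys (this is not VPE's data model). In AXI, AxLEN encodes the number of beats minus one, so a 4-beat burst has AxLEN = 3.

```python
from collections import defaultdict


def split_into_flows(transactions, key):
    """Group transactions into flows by the value of one control field."""
    flows = defaultdict(list)
    for txn in transactions:
        flows[txn[key]].append(txn)
    return dict(flows)


# Hypothetical transactions observed at Slave 1.
observed = [
    {"axid": 2, "axlen": 7, "addr": 0x1000},  # from Master 2
    {"axid": 1, "axlen": 3, "addr": 0x2000},  # from Master 1, 4-beat burst
    {"axid": 2, "axlen": 7, "addr": 0x1020},  # from Master 2
]

by_id = split_into_flows(observed, "axid")    # identify flows using AxID
by_len = split_into_flows(observed, "axlen")  # identify flows using AxLEN

master2_flow = by_id[2]
four_beat_flow = by_len[3]  # AxLEN = 3 means 4 beats
```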

    10. Traffic Profiling (2) A profile is associated with a connection and can have multiple flows. Flows contain histograms that store statistical data for both payload and timing information. Payload histograms describe traffic payload information (the control and response of a transaction, but no data content): ADDRESS, ID, BURST, SIZE, LEN, RESP, etc. Timing histograms describe traffic timing information: ITT, AWW, WW, WIL, WBL, ARW, RW, RBL, etc.
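
The profile, flow and histogram hierarchy described on this slide could be modelled roughly as follows. This Python sketch is an assumption about one possible in-memory shape, not the actual VPE profile format; the class names, the `record` method and the connection name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Flow:
    """One traffic flow: named histograms for payload and timing parameters."""
    name: str
    payload: dict = field(default_factory=dict)  # e.g. "LEN" -> Counter
    timing: dict = field(default_factory=dict)   # e.g. "ITT" -> Counter

    def record(self, kind, parameter, value):
        """Add one observation to the histogram of `parameter`."""
        table = self.payload if kind == "payload" else self.timing
        table.setdefault(parameter, Counter())[value] += 1


@dataclass
class Profile:
    """A profile belongs to one connection and can hold multiple flows."""
    connection: str
    flows: dict = field(default_factory=dict)

    def flow(self, name):
        """Return the named flow, creating it on first use."""
        if name not in self.flows:
            self.flows[name] = Flow(name)
        return self.flows[name]


profile = Profile(connection="slave1_port")  # hypothetical connection name
master2 = profile.flow("master2")
master2.record("payload", "LEN", 3)
master2.record("payload", "LEN", 3)
master2.record("payload", "LEN", 7)
master2.record("timing", "ITT", 10)
```

Only control and response fields are recorded; no data content is stored, matching the slide's description of payload histograms.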

    11. AXI Timing Histograms Inter-transaction timings: ITT is the histogram parameter defining the inter-transaction timings in a flow (the time between successive transactions). Intra-transaction timings: flow timings describe the flow of traffic; connection timings are considered properties of the connection. For example, if I am the master, I set the AW payload, assert the valid signal and then wait for a ready signal. I do not control the time between my AWVALID and AWREADY; this time is a property of the connection, so it is a connection timing. But I do control the time between my AW request and when I send data on the W channel, so that is a flow timing.
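
As a rough illustration of the ITT parameter, the following Python sketch builds an inter-transaction timing histogram from a list of transaction start times. It assumes ITT is measured from the start of one transaction to the start of the next, in clock cycles; the slides do not say exactly which reference points VPE uses, so treat that choice as an assumption.

```python
from collections import Counter


def itt_histogram(start_cycles):
    """Histogram of gaps between successive transaction start times."""
    gaps = [later - earlier
            for earlier, later in zip(start_cycles, start_cycles[1:])]
    return Counter(gaps)


# Hypothetical start cycles of five transactions in one flow.
starts = [0, 10, 20, 35, 45]
hist = itt_histogram(starts)  # gaps are 10, 10, 15, 10
```

Five transactions give four gaps, so the histogram always holds one observation fewer than the number of transactions.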

    12. AXI Intra-Transaction Timings RIL: time between the handshake on the AR channel and the first read transfer on the R channel. RW: time between RVALID and RREADY. WIL: time between the handshake on the AW channel and the first write transfer on the W channel. WW: time between WVALID and WREADY.
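
The four timings defined on this slide can be worked through numerically. The Python sketch below assumes each event is given as the clock cycle at which the signal is first asserted for the first beat, that a transfer occurs in the first cycle where both VALID and READY are high, and that READY stays high once asserted. The function names and example cycle numbers are hypothetical.

```python
def channel_timings(addr_handshake, valid, ready):
    """Return (latency, wait) in cycles for the first beat of a transaction.

    latency: address-channel handshake to first data transfer (RIL or WIL)
    wait:    VALID asserted to READY asserted (RW or WW), zero if READY came first
    """
    first_transfer = max(valid, ready)  # both signals must be high
    return first_transfer - addr_handshake, max(0, ready - valid)


def read_timings(ar_handshake, rvalid, rready):
    ril, rw = channel_timings(ar_handshake, rvalid, rready)
    return {"RIL": ril, "RW": rw}


def write_timings(aw_handshake, wvalid, wready):
    wil, ww = channel_timings(aw_handshake, wvalid, wready)
    return {"WIL": wil, "WW": ww}


# Read: AR handshake at cycle 100, RVALID at 112, RREADY at 115.
read_result = read_timings(100, 112, 115)    # RIL = 15, RW = 3
# Write: AW handshake at cycle 200, WVALID and WREADY both at 202.
write_result = write_timings(200, 202, 202)  # WIL = 2, WW = 0
```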

    13. How accurate is it? The waveform view in the background shows two simulation traces separated by a green clock line in the middle of the waveform display. A/ The top trace is a capture from a Mali200 running in an RTL simulation, which executed 2 million cycles in approximately 4 hours. B/ The bottom trace is a replay capture from a VPE master emulating the bus traffic of a Mali200. This simulation took 4 minutes to run 2M cycles! The build shows how: 1/ The traffic profiling toolkit was used to define an 'empty' Mali200 profile, which was then populated using a VPE monitor. The current graphic shows a populated profile with bandwidth analysis figures visible. 2/ The populated traffic profile was then used to drive a VPE master in a simulation where the Mali200 was 'swapped out' for the VPE master. 3/ The next build points out the waveform generated by the VPE master; note visually how the distribution and payloads statistically match the original. 4/ The final build shows that the same monitor was used to re-capture the profile from the VPE master, giving the same bandwidth/latency distribution.
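
The run times quoted for the Mali200 example imply a speed-up that is easy to work out. The short Python calculation below uses only the figures from the slide (2 million cycles, roughly 4 hours for RTL, 4 minutes for the VPE replay); since the RTL time is approximate, so is the result.

```python
cycles = 2_000_000
rtl_seconds = 4 * 60 * 60  # approx. 4 hours of RTL simulation
vpe_seconds = 4 * 60       # 4 minutes of VPE replay

speedup = rtl_seconds / vpe_seconds  # 60.0
rtl_rate = cycles / rtl_seconds      # about 139 simulated cycles per second
vpe_rate = cycles / vpe_seconds      # about 8333 simulated cycles per second

print(f"speed-up: about {speedup:.0f}x")
print(f"RTL: about {rtl_rate:.0f} cycles/s, VPE: about {vpe_rate:.0f} cycles/s")
```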

    14. More VPE Features

    15. Conclusion System architects require novel techniques with short iteration times to analyse performance and fine-tune their SoCs. VPE introduces a new approach that combines high-level modelling with statistical, low-level random generation techniques to explore and verify IP performance. Traffic profiles can be used by VPE masters and slaves to generate statistically equivalent traffic, and by VPE monitors when monitoring performance.

    16. Questions
