Architectural Modeling in VCC
EE 249
Agenda
• System-level SoC Design – Message and Use Models
• A Commercial Solution – The VCC Design Flow
• Abstraction – A Brief History
• Performance Modeling
• System-level Design Exploration
• How to Get the Performance Numbers
• Architectural Services Example
• Summary
Embedded System on Chip (SoC) Design
[Figure labels: System Environment (Zone 1: In-Building, Zone 2: Urban, Zone 3: Suburban, Zone 4: Global; Pico-Cell, Micro-Cell, Macro-Cell, Satellite); Requirements Specification at the untimed, unclocked C/C++ level; Embedded Systems Design; Refinement; Testbench; Design Export; SOC Implementation at the timed, clocked RTL level (µP/C, Embedded Software, Firmware, Analog, Memory, CORE); Design, Characterization, Implementation]
Felix Partnership Members: Enabling the Electronic Design Chain
• Semiconductor Houses: Hitachi Micro Systems, Infineon Technologies AG, Motorola SPS, National Semiconductor, NEC Electronics, Philips Semiconductors, ST Microelectronics, Texas Instruments
• System Houses: BMW, Infineon Technologies AG, Magneti Marelli S.p.A., Motorola, National Semiconductor, Nokia, Telefonaktiebolaget LM Ericsson, Thomson CSF
• Virtual Component (IP) Providers: ARM, debis Systemhaus (now Infineon), Symbionics Ltd (now Cadence)
[Figure labels: System Houses, Semiconductor Houses, Virtual Component (IP) Providers, Manufacturing, SOC Creator and System Integrator]
The Platform-Based Design Concept: Taking Design Block Reuse to the Next Level
• Foundation Block + Reference Design
• Pre-Qualified/Verified Foundation-IP*
• Scalable bus, test, power, IO, clock, timing architectures
• Processor(s), RTOS(es) and SW architecture
• Foundry-Specific Pre-Qualification
• Foundry Targeting Flow
• Methodology / Flows: System-level performance evaluation environment
• Rapid Prototype for End-Customer Evaluation
• SoC Derivative Design Methodologies
[Figure labels: Application Space, MEM, CPU, FPGA, Hardware IP, SW IP, Programmable]
*IP can be hardware (digital or analogue) or software. IP can be hard, soft or ‘firm’ (HW), source or object (SW).
The Platform-Based Design Concept: Platform Type Examples
• SONICs Architecture: SiliconBackplane™ (patented)
• Improv JAZZ Platform
[Figure labels: DMA, DSP, CPU, MPEG, C, MEM, I, O]
Platform Based Design Objectives
System House Requirements… exploring and developing on top of SoC Platforms
• Define the application instance to be implemented to satisfy the product requirements defined by the consumer
• Specify the system platform together with suppliers accordingly
• Evaluate different instances of SOC platforms top down
[Figure labels: Application Space, Platform Specification, System Platform, Platform Design Space Exploration, Architectural Space]
System Houses and SOC Providers… enabling a close communication!
"The increasing complexity of telecom applications requires that we spend more time upfront exploring system architectures and IP alternatives. The Cierto VCC environment assisted us in providing a platform to clearly articulate these needs to our IP providers and we believe it will help architect the next-generation system design solutions."
Jan-Olof Kismalm, Director, Microelectronics, Corporate Function Technology, Telefonaktiebolaget LM Ericsson, January 10th, 2000
Platform Based Design Objectives
SOC Provider Requirements… designing SoC Platforms and Sub-systems
• Define the SOC platform instance so that multiple instances of applications can be mapped to the same system platform
• Present this to system customers as an SOC Design-Kit and optimally leverage economy of scale for the SOC platform instance
• Provide bottom up instances of the SOC platform for evaluation without disclosing the details of the IP
[Figure labels: Application Space, Platform Design Space Exploration, System Platform, Platform Specification, Architectural Space]
Customer Testimonials!
"As an original development partner during the development of VCC our focus has been the modeling of the IP in our SOC platforms. The memory and cache modeling features in VCC 2.0 will allow us and our customers to explore the impact of different memory hierarchies on overall system performance before we commit to implementation of our SOC platforms. VCC 2.0 will significantly optimize the interaction with our SOC customers to negotiate the system specification."
Jean-Marc Chateau, Director of Design, Consumer and Micro Groups, ST Microelectronics, September 25th, 2000
VCC Front End
Platform Configuration… at the un-clocked, timing-aware system level
• Enabling communication within the SOC Design Chain
• Design Space Exploration with abstracted Performance Models
• Untimed Functional and Performance Verification
• Integration Platform Design, Optimization and Configuration
[Figure labels: Embedded System Requirements; Platform Function (Functional IP: C/C++, SDL, SPW, Simulink); Platform Architecture (Architecture IP: CPU/DSP, RTOS, Bus, Memory, HW, SW); System Integration; Performance Analysis and Platform Configuration]
VCC Front End: steps shown on the same flow diagram
• Functional Integration and Analysis
• Define Architectural Options and Configuration
• Define Function Architecture Mapping
• Run Performance Analysis for Platform Configuration (analysis views: Processor Load, Process Gantt Chart, Cache Results)
VCC Backend
Design Export… after initial platform configuration through design refinement and communication synthesis!
• Linking System Level Design to Implementation
• Fast track to prototyping
• Fast track to software development
• Design consistency through the design flow
[Figure labels: Communication Refinement, Integration & Synthesis; Software Assembly; Hardware Assembly; Implementation Level Verification; Synthesis / Place & Route etc.]
VCC Backend: Communication Refinement and Synthesis
[Figure: an Abstract Token exchanged between two VCC Models is refined into a chain of protocol components: VCC Model to RTOS Protocol Component, RTOS to CPU Protocol Component, CPU to Bus Protocol Component, Bus to Bus Slave Component, Bus Slave to VCC Model Component. Blocks shown: VCC Model, RTOS, CPU, Bus, Bus Model, Bus Slave. Stages: Communication Refinement, Communication Synthesis.]
VCC Backend: Export to Implementation (Design and Test Bench)
[Figure labels: VCC System Exploration, Communication Refinement; Flow To Implementation: Hardware Top-level, System Test Bench, Software on RTOS; Software Assembly, Hardware Assembly, Implementation Level Verification, Synthesis / Place & Route etc.]
VCC Flow Summary
• Platform Configuration… at the un-clocked, timing-aware system level: Embedded System Requirements, Platform Function (Functional IP: C/C++, SDL, SPW, Simulink), Platform Architecture (Architecture IP: CPU/DSP, RTOS, Bus, Memory, HW, SW), System Integration, Performance Analysis and Platform Configuration
• Design Export… after initial platform configuration through design refinement and communication synthesis: Communication Refinement, Integration & Synthesis; Software Assembly; Hardware Assembly; Implementation Level Verification; Synthesis / Place & Route etc.
How did we use abstraction in the past? Step 1 – Layout to Transistor
• Digital Abstraction
  • Switching delay of the transistor
  • Interconnect delay between transistors
• 1970’s
  • The design complexity exceeds what designers can comprehend and think through at the layout level
  • Transistor level simulation makes it possible to verify the logic of digital and analog designs based on transistor switching characteristics
[Figure labels: Transistor Model, Capacity Load; cluster, abstract; 1970’s]
How did we use abstraction in the past? Step 2 – Transistors to Gates
• Digital Abstraction
  • Gate delay
  • Interconnect delay between gates
• 1980’s
  • The design complexity exceeds what designers can comprehend and simulate at the transistor level
  • Gate level simulation makes it possible to verify the logic of digital designs based on gate switching characteristics
[Figure labels: Transistor Model, Capacity Load (1970’s); Gate Level Model, Capacity Load (1980’s); cluster, abstract]
How did we use abstraction in the past? Step 3 – Gates to RTL-HDL
• Digital Abstraction
  • Not really an abstraction of performance (e.g. SDF only used for gate to layout to gate)
  • Textual statements result in “many gates” after synthesis
• 1990’s
  • The design complexity exceeds what designers can comprehend and simulate at the gate level alone
  • HDL is first used for fast verification; synthesis allows translation of text into gates
  • Synthesis algorithms map text to actual registers and the logic in between, based on characterized gate and wire-load libraries
  • Gate and wire-load delays are refined after layout. SDF emerges as a format
[Figure labels: Gate Level Model, Capacity Load (1980’s); RTL (1990’s); cluster, abstract]
So what did we do all the time?
• The industry abstracted the system function
  • Layout to transistor switching
  • Transistor to gate schematics
  • Gate schematics to RTL
• From level to level the industry abstracted performance data
  • Spice models to transistor models (switch + interconnect)
  • Transistor models to gate level models (gate switch + interconnect)
  • No real “new” performance models when going to RTL
• Resulting standard formats
  • SDF for delay characterization
  • Gate delays and wire-load (.db) enabling synthesis
And what is the next step? Year 2000 +
• IP Block Performance: modeling of performance for IP blocks… by attaching performance data to timing free functional models
• Inter IP Communication Performance: modeling of performance for communication between IP blocks
• Apply this to Hardware and Software
• Discontinuity: Embedded Software
[Figure: abstraction ladder from Transistor Model with Capacity Load (1970’s) through Gate Level Model with SDF and Capacity Load (1980’s) and RTL (1990’s) to RTL Clusters and SW Models. IP blocks shown: DMAC, uC, Register File, Ports, Timers, MPEG Audio Decoder, MPEG Video Decoder, Graphics Engine, I/F, Bus/Cache Control, On-Chip RAM, I-Cache, D-Cache, DRAM Ctrl. Software shown: Tasks, RTOS, Driver.]
Functional Simulation: Gate Level
• Gate switching defines functionality
• The combination of gate functionality defines the “functionality” of the design
• Simulation is slow in complex systems, as huge numbers of events have to be processed
Functional Simulation: Using VCC at the System-Level Abstraction
• Function of system blocks executed
  • General descriptions: C, C++, State Charts, OMI
  • Application specific: SPW, Telelogic SDL, Matlab Simulink, ETAS Ascet
• Functional execution defined as “fire and return” with an OMI 4.0 compliant discrete event simulation infrastructure
• Simulation is as fast as the abstract, un-timed models simulate
[Figure labels: Function; SPW, StateCharts, SDL, Simulink, C++, C]
Performance Simulation: Gate Level
• Functional Simulation: gate switching functionality
• Performance Simulation: functionality annotated with intrinsic gate delay; interconnect delay modeled from capacity
• Refinement: SDF data is refined after layout is carried out
[Figure labels: Function; Performance (Δt): SDF and Gate Level Library; Interconnect Performance: Capacity]
VCC Performance Simulation: System-Level Block Performance Modeling
• Performance Simulation: functionality annotated with intrinsic delay models
• Delay Script and Inline Models, refined after implementation
• Performance Abstraction: Function (e.g. Interleaver) plus Performance (Δt)

IP Functional Model (Forward Error Correction):
    FEC() {
        f = x.read();
        // FEC function here
        y.write(r);
    }

Inline Delay Model (Annotated IP Functional Model):
    FEC() {
        f = x.read();
        // FEC function part A here
        __DelayCycles(60*cps);
        // FEC function part B here
        __DelayCycles(78*cps);
        // FEC function part C here
        __DelayCycles(23*cps);
        y.write(r);
    }

Scripted Delay Model (Delay Script applied to the unchanged IP Functional Model):
    // FEC_ip_implem: FEC on CPU
    delay() { input(x); run(); delay(200*cps); output(y); }
    // FEC_ip_implem: FEC in slow HW
    delay() { input(x); run(); delay(128*cps); output(y); }
    // FEC_ip_implem: FEC in fast HW
    delay() { input(x); run(); delay(64*cps); output(y); }
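The difference between the inline and the scripted delay model can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `SimClock`, `fecInline`, `fecUntimed` and `fecScripted` are hypothetical names, the arithmetic stands in for the real FEC function, and the `cps` scaling factor from the slide is dropped so that delays are counted in plain cycles. It is not the VCC API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for simulated time; not part of VCC.
struct SimClock {
    std::uint64_t cycles = 0;
    void delayCycles(std::uint64_t n) { cycles += n; }
};

// Inline delay model: delay annotations are interleaved with the function.
int fecInline(SimClock& clk, int x) {
    int r = x;             // placeholder for FEC part A
    clk.delayCycles(60);
    r ^= 0x5A;             // placeholder for FEC part B
    clk.delayCycles(78);
    r += 1;                // placeholder for FEC part C
    clk.delayCycles(23);
    return r;
}

// The same function without any timing information.
int fecUntimed(int x) { return (x ^ 0x5A) + 1; }

// Scripted delay model: run the untimed function, then add one lump delay
// that depends on the chosen implementation (e.g. 200, 128 or 64 cycles).
int fecScripted(SimClock& clk, int x, std::uint64_t lumpCycles) {
    assert(lumpCycles > 0);
    int r = fecUntimed(x);
    clk.delayCycles(lumpCycles);
    return r;
}
```

Both variants compute the same result; they differ only in where the delay is attached. The inline model can place delays between reads and writes, while the scripted model leaves the functional code untouched, which makes it easy to swap implementations.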
VCC Performance Simulation: System Level Block Interconnect Performance Modeling
[Figure: Shared Memory Communication Pattern between a Sender (Post() from Behavior 1) and a Receiver (Value()/Enable() from Behavior 2). Pattern Services: RTOS, Standard C Library, Memory Access. Architecture Services: CPU, CPU Port, RAM, RAM Port, ASIC Port, Memory, Bus Adapter, Slave Adapter, Bus, Bus Arbiter. Abstraction: Function; Interconnect Capacity replaced by Interconnect Performance.]
VCC Performance Simulation: Enabled through Architecture Services in VCC
[Figure: Semaphore Protected communication. User visible: behavior A calls Post(5), behavior B calls Value(). Pattern Services: SemProt_Send, SemProt_Recv, SwMutexes, RTOS, setEnabled; operations shown: mutex_lock; memcpy; signal and wait; memcpy; signal. Architecture Services: MemoryAccess, CPU, BusMaster, BusArbiter, SlaveAdapter, Mem. Calls between services: write, read, busRequest, busIndication, arbiterRequest/Release.]
VCC Performance Modeling… the System Level extension of SDF!
• Classical Gate Level Technology: Function; Performance (Δt) from SDF and Gate Level Library; Interconnect Performance from Interconnect Capacity
• VCC System Level Technology: Function in C, C++, SPW, SDL, Simulink, Statecharts (e.g. Interleaver); IP Block Performance (Δt) from the System Level Library; IP Block Interconnect Performance
Design Space Exploration: From RTL through Gate Level options
• Technology provider characterizes silicon technology for gates and interconnects
• Synthesis tools
  • map constructs from RTL into registers and the logic between registers
  • perform logic optimization
  • explore the design space (“performance” – “area”) using gradient methods in an optimization process
[Figure labels: RTL; Synthesis Library (SDF, Wire Load; Models Abstracted from Layout); synthesize; Performance, Area]
Design Space Exploration… through Function and Architecture Abstraction
[Figure: Function and Architecture are integrated through an Optimal Mapping using a Perf. Model Library (IP Block Performance, Interconnect Performance), alongside RTL being synthesized with a Synthesis Library (SDF, Wire Load; Models Abstracted from Layout); axes: Performance, Area. Blocks shown: Tasks, RTOS, Driver, DMAC, uC, Register File, Ports, Timers, MPEG Audio Decoder, MPEG Video Decoder, Graphics Engine, I/F, On-Chip RAM, I-Cache, D-Cache, Bus/Cache Control, DRAM Ctrl; RTL Clusters, SW Models.]
Design Space Exploration using VCC… through Function and Architecture Abstraction
• SOC silicon provider characterizes the IP portfolio (typically in Integration Platforms) for intrinsic IP Block Performance and Inter IP Block Interconnect Performance
• System Integrator and SOC Provider
  • map function to architecture, setting up design experiments
  • determine the suitability of a function-architecture combination using performance simulation feedback
  • explore the design space through “function” and “architecture”
[Figure labels: Perf. Model Library (IP Block Performance, Interconnect Performance), integrate; Synthesis Library (SDF, Wire Load; Models Abstracted from Layout), synthesize; Performance, Area; RTL Clusters, SW Models]
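A design experiment of this kind can be thought of as a search over candidate mappings. The sketch below reuses the three FEC delay-script values from the block performance modeling slide (200, 128 and 64 cycles) and picks the slowest implementation that still fits a cycle budget. The selection rule rests on the illustrative assumption that a slower implementation is the cheaper one; `Candidate`, `fecCandidates` and `slowestWithinBudget` are hypothetical names, and a real exploration would rely on performance simulation rather than a static table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One function-to-architecture mapping and its estimated delay in cycles.
struct Candidate {
    std::string name;
    unsigned cycles;
};

// Delay values taken from the FEC delay scripts (cps factor omitted).
std::vector<Candidate> fecCandidates() {
    return {{"CPU", 200}, {"slow HW", 128}, {"fast HW", 64}};
}

// Returns the slowest candidate that meets the budget, or "" if none does.
// Assumption for illustration only: slower implementations cost less.
std::string slowestWithinBudget(const std::vector<Candidate>& cs, unsigned budget) {
    assert(!cs.empty());
    const Candidate* best = nullptr;
    for (const Candidate& c : cs) {
        if (c.cycles <= budget && (best == nullptr || c.cycles > best->cycles)) {
            best = &c;
        }
    }
    return best != nullptr ? best->name : std::string();
}
```

With a budget of 150 cycles this selects the slow hardware block; with a budget below 64 cycles no mapping qualifies and the function or the architecture has to change.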
How to get the performance numbers… IP Block Performance Modeling
• Top Down Flow
  • In a pure top down design flow the performance models are “Design Requirements” for functional models
  • They are refined using bottom up techniques in due course throughout the project
• Bottom Up Flow
  • SOC Provider characterizes the IP portfolio, e.g. of an Integration Platform
    • using HDL model simulation
    • using software simulation on ISS
    • using benchmarking on SOC
[Figure: the FEC IP Functional Model with its Inline Delay Model (__DelayCycles of 60*cps, 78*cps and 23*cps after function parts A, B and C) and its Scripted Delay Models (delay of 200*cps for FEC on CPU, 128*cps for FEC in slow HW, 64*cps for FEC in fast HW).]
How to get the performance numbers… IP Block Interconnect Performance Modeling
• Top Down Flow
  • Datasheet information for architectural IP is entered as parameters of architectural services
  • Can be done fast by the System Integrator without the SOC Provider!
  • Refinement with SOC Provider models
• Bottom Up Flows
  • Architectural IP is profiled using HDL simulation, ISS or silicon, and the data is entered in VCC architectural services
[Figure: Shared Memory Communication Pattern between Sender (Post() from Behavior 1) and Receiver (Value()/Enable() from Behavior 2). Pattern Services: RTOS, Standard C Library, Memory Access. Architecture Services: CPU, CPU Port, RAM, RAM Port, ASIC Port, Memory, Bus Adapter, Slave Adapter, Bus, Bus Arbiter.]
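As a rough sketch of what entering datasheet numbers as service parameters could look like, the following C++ computes the cycle cost of a burst transfer from three parameters. The structure `BusMemParams`, its fields and the cost formula are assumptions made for illustration; the actual parameters of VCC architectural services are not given in the slides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical datasheet-style parameters; names are illustrative only.
struct BusMemParams {
    std::uint32_t arbitrationCycles;  // cycles to win the bus, paid once per burst
    std::uint32_t busCyclesPerWord;   // bus occupancy per transferred word
    std::uint32_t memWaitStates;      // extra memory cycles per word
};

// Estimated cycles for one burst of `words` words:
// arbitration once, then bus plus memory cost for every word.
std::uint64_t transferCycles(const BusMemParams& p, std::uint32_t words) {
    assert(words > 0);
    return p.arbitrationCycles +
           static_cast<std::uint64_t>(words) * (p.busCyclesPerWord + p.memWaitStates);
}
```

In a top down flow the three numbers would come from a data book; in a bottom up flow they would be replaced by values profiled on HDL simulation, an ISS or silicon, without touching the code that uses them.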
How to get the performance numbers… Software Estimation for ANSI C code (“Whitebox C”)
• Estimation of software performance prior to implementation
• CPU characterized as a Virtual Processor Model
  • Using a Virtual Machine Instruction Set
  • Used for dynamic control SW estimation during performance simulation, taking into account bus loading, memory fetching, and register allocation
• Value
  • True co-design: SW estimation using annotation into C code (as opposed to simulation in instruction simulators used in co-verification)
  • Good for early system scheduling and processor load estimation
  • Two orders of magnitude faster than ISS
  • Greater than 80 percent accuracy
  • Enables pre-implementation decisions but is not a verification model
How to get the performance numbers… Virtual Processor Model Characterization Methods
• Data Book Approach
  • Use CPU data book information to count cycles and estimate the VIM
• Calibration Suite using “Best Fit”
  • Run the Calibration Suite on VIM and ISS
  • Solve a set of linear equations to minimize the difference
• Application Specific Calibration Suite
  • Uses the “Best Fit” method, but with application specific routines for automotive, wireless telecom, multimedia etc.
• Exact Count on ISS
  • Cycle counts exactly derived from an ISS run
  • Filter specific commands out (e.g. OPi etc.)
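One plausible reading of the “Best Fit” step is an ordinary least-squares fit: each calibration routine contributes one row of virtual-instruction counts and one measured ISS cycle count, and the per-instruction cycle costs are chosen to minimize the squared difference. The sketch below solves the normal equations with Gaussian elimination. It is an assumption about the method, not the tool's documented algorithm, and `bestFit`, `Vec` and `Mat` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// A[r][i]: how often virtual instruction i occurs in calibration routine r.
// b[r]:    cycle count measured for routine r on the ISS.
// Returns per-instruction cycle costs w minimizing ||A w - b||^2.
Vec bestFit(const Mat& A, const Vec& b) {
    assert(!A.empty() && A.size() == b.size());
    const std::size_t n = A.size(), k = A[0].size();
    // Build the augmented normal equations [A^T A | A^T b].
    Mat M(k, Vec(k + 1, 0.0));
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t i = 0; i < k; ++i) {
            for (std::size_t j = 0; j < k; ++j) M[i][j] += A[r][i] * A[r][j];
            M[i][k] += A[r][i] * b[r];
        }
    }
    // Forward elimination with partial pivoting.
    for (std::size_t c = 0; c < k; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < k; ++r)
            if (std::fabs(M[r][c]) > std::fabs(M[p][c])) p = r;
        std::swap(M[c], M[p]);
        assert(std::fabs(M[c][c]) > 1e-12);  // routines must be linearly independent
        for (std::size_t r = c + 1; r < k; ++r) {
            const double f = M[r][c] / M[c][c];
            for (std::size_t j = c; j <= k; ++j) M[r][j] -= f * M[c][j];
        }
    }
    // Back substitution.
    Vec w(k, 0.0);
    for (std::size_t i = k; i-- > 0;) {
        double s = M[i][k];
        for (std::size_t j = i + 1; j < k; ++j) s -= M[i][j] * w[j];
        w[i] = s / M[i][i];
    }
    return w;
}
```

An application specific calibration suite would change only the rows of `A` and `b`, so the fitted costs reflect the instruction mix of the target domain.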
How to get the performance numbers… Software Estimation for ANSI C code (“Whitebox C”)
Virtual Machine Instruction Set Model

Instruction  Cycles  Description
LD           3.0     Load from Data Memory
LI           1.0     Load from Instr. Mem.
ST           3.0     Store to Data Memory
OP.c         3.0     Simple ALU Operation
OP.s         3.0
OP.i         4.0
OP.l         4.0
OP.f         4.0
OP.d         6.0
MUL.c        9.0     Complex ALU Operation
MUL.s        10.0
MUL.i        18.0
MUL.l        22.0
MUL.f        45.0
MUL.d        55.0
DIV.c        19.0
DIV.s        110.0
DIV.i        118.0
DIV.l        122.0
DIV.f        145.0
DIV.d        155.0
IF           5.0     Test and Branch
GOTO         2.0     Unconditional Branch
SUB          19.0    Branch to Subroutine
RET          21.0    Return from Subroutine
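Given such a table, the cost of a basic block is the sum of the costs of its virtual instructions. A minimal sketch, using a subset of the values listed above; the functions `cycleTable` and `estimateCycles` and the sample instruction mix are hypothetical and only illustrate the lookup.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Subset of the virtual machine instruction costs from the table above.
const std::map<std::string, double>& cycleTable() {
    static const std::map<std::string, double> table = {
        {"LD", 3.0},    {"LI", 1.0},     {"ST", 3.0},  {"OP.i", 4.0},
        {"MUL.i", 18.0}, {"DIV.i", 118.0}, {"IF", 5.0},  {"GOTO", 2.0},
        {"SUB", 19.0},  {"RET", 21.0}};
    return table;
}

// Sums the cycle costs of the virtual instructions in one basic block.
double estimateCycles(const std::vector<std::string>& block) {
    double total = 0.0;
    for (const std::string& op : block) {
        const auto it = cycleTable().find(op);
        assert(it != cycleTable().end());  // opcode missing from the table
        total += it->second;
    }
    return total;
}
```

For example, two loads, an integer multiply and a store add up to 3.0 + 3.0 + 18.0 + 3.0 = 27.0 cycles.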
How to get the performance numbers… Software Estimation for ANSI C code (“Whitebox C”)
• Architecture Characterization: Virtual Processor Model
• Performance Estimation, starting from ANSI C input
  • Whitebox C: declare ports
  • Analyse basic blocks, compute delays
  • Generate new C with delay counts
  • Compile generated C and run natively

ANSI C input:
    char *event; int proc;
    if (*(event+proc) & 0x1: 0x0) ...

Assembler            Virtual instruction
    ld #event,R1     ld
    ld #proc,R2      ld
    add R1,R2,R3     op
    ld (R3),R4       ld
    ldi #0x1, R5     li
    and R4, R5, R6   op
    cmp R0, R6, R7   ts
    br R7, LTRUE     --
    ba LFALSE        br
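The “generate new C with delay counts” step can be pictured as follows. The sketch annotates the test-and-branch block from the slide with a precomputed cost. It assumes that the three `ld`, one `li` and two `op` entries map to LD, LI and OP.i in the instruction table and that the `ts`/`br` pair is charged once as IF; the counter `g_estimatedCycles` and the function `lowBitSet` are hypothetical, and the code a real tool generates may look quite different.

```cpp
#include <cassert>

// Hypothetical accumulator standing in for the delay hook of the generated code.
static double g_estimatedCycles = 0.0;

// Cost of the test-and-branch basic block, precomputed from the table:
// 3 x LD (3.0) + 1 x LI (1.0) + 2 x OP.i (4.0) + 1 x IF (5.0) = 23.0 cycles.
constexpr double kCondBlockCycles = 3 * 3.0 + 1.0 + 2 * 4.0 + 5.0;

// Annotated version of the condition `*(event + proc) & 0x1`.
// The function runs natively; only the counter models target timing.
bool lowBitSet(const char* event, int proc) {
    assert(event != nullptr);
    g_estimatedCycles += kCondBlockCycles;  // annotation for this basic block
    return (*(event + proc) & 0x1) != 0;
}
```

Because the annotated code executes natively and only adds to a counter, it runs far faster than an instruction set simulator, which is consistent with the speed claim on the earlier software estimation slide.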
Architecture Service
• The service is the element that defines the functionality of the architecture
• A service is coded in C++ and performs a specific role in modeling the architecture, for example:
  • bus arbitration
  • memory access
  • interrupt propagation
  • etc.
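To give a feel for what a service with the bus arbitration role might look like, here is a small fixed-priority arbiter in C++. The base class `ArchService` and the `request`/`grant` interface are assumptions made for this sketch; the slides do not show the actual VCC service interface, and a real service would also model timing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical common base for architecture services; not the VCC class.
class ArchService {
public:
    virtual ~ArchService() = default;
};

// Fixed-priority bus arbiter: the master with the lowest index wins.
class BusArbiter : public ArchService {
public:
    explicit BusArbiter(std::size_t masters) : pending_(masters, false) {
        assert(masters > 0);
    }

    // A bus master asks for the bus.
    void request(std::size_t master) { pending_.at(master) = true; }

    // Grants the bus to the highest-priority pending master and clears its
    // request; returns -1 when no request is pending.
    int grant() {
        for (std::size_t m = 0; m < pending_.size(); ++m) {
            if (pending_[m]) {
                pending_[m] = false;
                return static_cast<int>(m);
            }
        }
        return -1;
    }

private:
    std::vector<bool> pending_;
};
```

Swapping this class for a round-robin variant would change the arbitration policy without affecting the bus masters that call `request`, which is the kind of separation the service concept is meant to provide.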
Example of Services
[Figure labels: ASIC, Bus; Behavior, Post, Pattern Sender, BusMaster, BusArbiter, BusSlave, Mem, Memory]