HAsim Status Update
VSSAD, Intel; CSG Group, CSAIL, MIT; UT Austin; Princeton University
Joel Emer, Michael Adler, Angshuman Parashar, Michael Pellauer, Murali Vijayaraghavan, Nikhil Patil, Abhishek Bhattacharjee

Recap: Virtual Platform
• Set of Abstractions
• Provide common set of functionalities across multiple physical platforms
  • XUP Board
  • PCI-express Board
  • Intel FSB Socket
  • Bluesim/Vsim
  • BEE3
• Leverage Asim Plug N Play
• Minimize module replacements/recoding while moving across platforms

Virtual Platform Infrastructure
[Figure: FPGA Modules (hardware) and Software Modules (software) sit on top of the Virtual Platform. Each side stacks a Platform Interface, RRR Layers and Communication Layers. Module labels: Fetch, Decode, Exe, Memory, Front Panel, Control, FuncModel, Decode. Platform Interface labels: Front Panel, Memory.]

RRR Specification Language
    // ----------------------------------------
    // create a new service called ISA_EMULATOR
    // ----------------------------------------
    service ISA_EMULATOR
    {
        // --------------------------------
        // declare services provided by CPU
        // --------------------------------
        server CPU <- FPGA;
        {
            method UpdateRegister(in REG_INDEX i, in REG_VALUE v);
            method Emulate(in INST_INFO i, out INST_ADDR a);
        };

        // ---------------------------------
        // declare services provided by FPGA
        // ---------------------------------
        server FPGA <- CPU;
        {
            method SyncRegister(in REG_INDEX i, in REG_VALUE v);
        };
    };

Remote Request/Response
[Figure: RRR specification files feed a Client Stub on the FPGA and a Server Stub on the CPU. User Code on each side talks to its stub, and the stubs communicate through the Communication Layers (Runtime System).]
User Code (FPGA, client side):
    ClientStub_ISA_EMULATOR cpu;
    ...
    cpu.UpdateRegister_MakeRequest(REG_R27, regFile[REG_R27]);
    ...
    cpu.Emulate_MakeRequest(inst);
    ...
    targetPC <- cpu.Emulate_GetResponse();
User Code (CPU, server side):
    ISA_EMULATOR::UpdateRegister(REG_INDEX i, REG_VALUE v)
    {
        regFile[i] = v;
    }
    ISA_EMULATOR::Emulate(INST_INFO inst)
    {
        // emulate the instruction
        return target_PC;
    }

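The request/response flow on this slide can be sketched in plain C++. This is only an illustration of the stub pattern, not the generated RRR code: the Message layout, the in-memory Channel standing in for the communication layers, the method IDs and the placeholder Emulate behavior are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// Hypothetical wire format: a method ID plus 64-bit argument words.
struct Message {
    uint32_t methodId;
    std::vector<uint64_t> args;
};

// Stand-in for the communication layers: two in-memory queues.
struct Channel {
    std::deque<Message> toServer;
    std::deque<Message> toClient;
};

enum MethodId : uint32_t { kUpdateRegister = 0, kEmulate = 1 };

// Server-side user code (the CPU side on the slide).
class IsaEmulatorServer {
public:
    void UpdateRegister(uint64_t index, uint64_t value) { regFile_[index] = value; }

    // Placeholder emulation (assumption): treat the instruction word as a
    // register index and return that register's value as the target PC.
    uint64_t Emulate(uint64_t inst) { return regFile_[inst]; }

private:
    std::map<uint64_t, uint64_t> regFile_;
};

// Server stub: pops one request, dispatches it, and queues a response
// only for methods that declare an "out" argument.
bool ServiceOneRequest(Channel& ch, IsaEmulatorServer& server) {
    if (ch.toServer.empty()) return false;
    Message req = ch.toServer.front();
    ch.toServer.pop_front();
    switch (req.methodId) {
    case kUpdateRegister:
        server.UpdateRegister(req.args[0], req.args[1]);
        break;
    case kEmulate:
        ch.toClient.push_back(Message{kEmulate, {server.Emulate(req.args[0])}});
        break;
    }
    return true;
}

// Client stub: one MakeRequest per method, plus GetResponse where needed.
class ClientStub {
public:
    explicit ClientStub(Channel& ch) : ch_(ch) {}

    void UpdateRegister_MakeRequest(uint64_t index, uint64_t value) {
        ch_.toServer.push_back(Message{kUpdateRegister, {index, value}});
    }
    void Emulate_MakeRequest(uint64_t inst) {
        ch_.toServer.push_back(Message{kEmulate, {inst}});
    }
    uint64_t Emulate_GetResponse() {
        assert(!ch_.toClient.empty());  // caller must wait until a response exists
        uint64_t pc = ch_.toClient.front().args[0];
        ch_.toClient.pop_front();
        return pc;
    }

private:
    Channel& ch_;
};
```

Splitting each call into MakeRequest and GetResponse, as the slide does, lets the client keep working while a request is in flight. Methods without an "out" argument never produce a response message.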
Virtual Platform/RRR Status Update
• Software + Hardware, Client + Server Stubs
• Multiple Arguments for method calls
• Auto-generation of Soft Connections through Platform Interface, and Remote Stubs
• PCI-Express Physical Platform
• Physical Channel implementation using CSRs
• Soft Reset
• Several services in HAsim
• Very positive feedback from developers

HAsim: MIPS → Alpha
• Motivation
  • Couldn’t find any Full System MIPS simulator with multi-processor + large memory support
• HAsim-Alpha
  • M5 “running” in software
    • Target Memory Image
    • Syscall Emulation
    • Other instructions not implemented on FPGA (e.g. FP currently)
  • Functional + Timing model on FPGA

HAsim-Alpha Highlights
• Implemented Alpha Functional Model
  • Primary changes
    • ISA spec
    • Instruction Format + Queries
    • Datapath
    • Execution Semantics
  • Unchanged
    • Dependency logic
    • Register File
    • Memory Subsystem (incl. Store Buffer)
• Multiple timing models
  • Unpipelined
  • 5 Stage
  • In order with caches
  • OoO
• Running long Alpha programs (e.g. SPEC2k)

Old Instruction Emulation with Cache Flush
[Timing diagram, time on the horizontal axis. Lanes: FPGA (Execute, Functional Cache), RRR Layer, Software (Memory Server, Emulation Server, Instruction Simulator). Messages: Flush, Write Line, Done, Sync Registers, Emulate Instruction, Emulation Done.]

Hybrid Instruction Emulation
[Timing diagram, time on the horizontal axis. Lanes: FPGA (Execute, Functional Cache), RRR Layer, Software (Memory Server, Emulation Server, Instruction Simulator). Messages: Sync Registers, Emulate Instruction, Write Back or Invalidate, Done, Write Line, Ack, Emulation Done.]

RRR ISA Emulation Specification
    service ISA_EMULATOR
    {
        server sw (cpp, method) <- hw (bsv, connection)
        {
            method sync(in RNAME[RNAME_BITS] rname, in RVAL[RVAL_BITS] rval);
            method emulate(in INST[INST_BITS] inst,
                           in ISA_ADDRESS[FUNCP_ISA_V_ADDR_SIZE] pc,
                           out ISA_ADDRESS[FUNCP_ISA_V_ADDR_SIZE] newPc);
        };

        server hw (bsv, connection) <- sw (cpp, method)
        {
            method sync(in RNAME[RNAME_BITS] rname, in RVAL[RVAL_BITS] rval);
        };
    };

Dynamic Simulator Configuration
[Timing diagram, time on the horizontal axis. Lanes: FPGA (Param Nodes, Dynamic Param Controller), RRR Layer, Software (Dynamic Param Controller). Messages: Set Parameters, Set Value, Done, Done?. Example parameter: Enable Functional Cache?]

RRR Dynamic Parameter Specification
    service PARAMS
    {
        //
        // Send one dynamic parameter ID and value to the hardware.
        // An ACK is returned to guarantee that the parameter has
        // been received.
        //
        server hw (bsv, connection) <- sw (cpp, method)
        {
            method sendParam(in UINT32[32] pname, in UINT64[64] pval, out UINT8[8] ack);
        };
    };

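A minimal C++ sketch of the acknowledged parameter send described in the specification comment. The class name, the parameter ID kEnableFunctionalCache, and the ack encoding (1 for "received", 0 for an unknown ID) are assumptions for illustration; the specification only says that an ACK is returned.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical parameter ID, chosen for this example only.
constexpr uint32_t kEnableFunctionalCache = 1;

// Stand-in for the hardware-side parameter controller and its param nodes.
class HardwareParamController {
public:
    void RegisterNode(uint32_t pname, uint64_t defaultValue) {
        nodes_[pname] = defaultValue;
    }

    // Mirrors sendParam(in pname, in pval, out ack).
    // Assumed ack encoding: 1 = value stored, 0 = no such parameter node.
    uint8_t SendParam(uint32_t pname, uint64_t pval) {
        auto it = nodes_.find(pname);
        if (it == nodes_.end()) return 0;
        it->second = pval;
        return 1;
    }

    uint64_t Value(uint32_t pname) const { return nodes_.at(pname); }

private:
    std::unordered_map<uint32_t, uint64_t> nodes_;
};

// Software side: send parameters one at a time and count positive acks, so
// the caller knows when every parameter has reached the hardware.
std::size_t SendAllParams(
    HardwareParamController& hw,
    const std::vector<std::pair<uint32_t, uint64_t>>& params) {
    std::size_t acked = 0;
    for (const auto& p : params) {
        if (hw.SendParam(p.first, p.second) != 0) ++acked;
    }
    return acked;
}
```

Because each send returns an ack, the software can hold off starting the simulation until the count of acks matches the number of parameters it sent.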
Other Uses of RRR
• Stats
• Events
• Assertions
• Control Messages
• Streams

Modeling Back-Pressure using A-Ports
[Figure: Producer and Consumer connected by a Data A-Port, with a Credits A-Port running back from Consumer to Producer. The pair forms a Credit Port.]
Producer Interface:
    Bool canSend()              Do we have enough credits?
    Action enq(Maybe#(t) x)     Send data or invalid.
    Action pass()               Indicate end of cycle
Producer usage:
    if (canSend) enq(x) else pass()
Consumer Interface:
    Bool canReceive()           Is data available?
    AV#(Data) pop()             Receive data
    Action done(cred)           Indicate end of cycle, and send back credits
Consumer usage:
    if (canReceive) x <- pop()
    done(x)
No buffering present within the Ports

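The credit-port protocol above can be mimicked in software with a small cycle-stepped model. This is a sketch under stated assumptions, not the BSV implementation: both the data link and the credit link are given one model cycle of latency, tick() is a hypothetical helper that advances the model clock, and the "invalid" case of enq is folded into pass().

```cpp
#include <cassert>

// Cycle-stepped model of a credit port. The port itself holds no buffer:
// data that arrives must be popped by the consumer in that cycle.
template <typename T>
class CreditPort {
public:
    explicit CreditPort(unsigned initialCredits) : credits_(initialCredits) {}

    // Producer side.
    bool canSend() const { return credits_ > 0; }
    void enq(const T& x) {
        assert(credits_ > 0);
        --credits_;              // each item sent consumes one credit
        inFlight_ = x;
        inFlightValid_ = true;
    }
    void pass() { inFlightValid_ = false; }  // nothing sent this cycle

    // Consumer side.
    bool canReceive() const { return arrivedValid_; }
    T pop() {
        assert(arrivedValid_);
        arrivedValid_ = false;
        return arrived_;
    }
    void done(unsigned cred) { creditsInFlight_ = cred; }  // return credits

    // End of model cycle: data and credits each move one hop.
    void tick() {
        arrived_ = inFlight_;
        arrivedValid_ = inFlightValid_;
        inFlightValid_ = false;
        credits_ += creditsInFlight_;
        creditsInFlight_ = 0;
    }

private:
    unsigned credits_;
    unsigned creditsInFlight_ = 0;
    T inFlight_{};
    bool inFlightValid_ = false;
    T arrived_{};
    bool arrivedValid_ = false;
};

// Drives the port for `cycles` model cycles with a consumer that frees one
// slot per item received. Returns the number of items delivered.
inline unsigned RunDemo(unsigned cycles, unsigned credits) {
    CreditPort<unsigned> port(credits);
    unsigned next = 0;
    unsigned delivered = 0;
    for (unsigned c = 0; c < cycles; ++c) {
        if (port.canSend()) port.enq(next++); else port.pass();
        unsigned freed = 0;
        if (port.canReceive()) { port.pop(); ++delivered; freed = 1; }
        port.done(freed);
        port.tick();
    }
    return delivered;
}
```

In this model a single credit allows one item every two cycles, because the credit needs a full round trip before the producer can send again. Two credits cover that round trip, so the producer can send every cycle.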
Structures using Credit Ports
Model FIFOs using Credit Ports
• “Stall ports”: Producer to Consumer Data (A1), Consumer to Producer Credits (A1). A stall down the pipeline doesn’t get combinationally propagated
• “Pipeline ports”: Producer to Consumer Data (A1), Consumer to Producer Credits (A0). The pipeline registers in traditional pipelines

Caches
• Functional Partition
  • Functional Cache
    • Target memory image data from M5
  • Functional TLB
    • Target V→P translations
• Timing Partition
  • I and D Cache models
  • Attempting to unify interface for all caches

Timing Partition Cache Interface
[Figure: the MEMORY stage sends a Request to the L1 Cache and receives an Immediate Response and a Delayed Response. The L1 Cache sits in front of MAIN MEMORY.]
• Cache Req Interface:
  • LOAD
  • STORE
  • PREFETCH
  • INVALIDATE LINE
  • INVALIDATE ALL
  • KILL ALL
  • FLUSH LINE
  • FLUSH ALL
• Cache Response:
  • Immediate Response:
    • HIT
    • MISS SERVICING
    • MISS RETRY
  • Delayed Response:
    • MISS RESPONSE

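One way to read the immediate/delayed response split is sketched below in C++. The slide does not define the responses, so the meanings used here are assumptions: MISS SERVICING is taken to mean the miss was accepted and a MISS RESPONSE will follow later, and MISS RETRY is taken to mean the cache cannot track another miss and the requester must reissue. ToyL1Cache and its miss limit are hypothetical, and only LOAD is modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <set>

enum class ImmediateResp { HIT, MISS_SERVICING, MISS_RETRY };

// Toy L1 model keyed by line address.
class ToyL1Cache {
public:
    explicit ToyL1Cache(std::size_t maxOutstandingMisses)
        : maxOutstanding_(maxOutstandingMisses) {}

    // Immediate response to a LOAD request.
    ImmediateResp Load(uint64_t lineAddr) {
        if (lines_.count(lineAddr) != 0) return ImmediateResp::HIT;
        if (outstanding_.size() >= maxOutstanding_) return ImmediateResp::MISS_RETRY;
        outstanding_.push_back(lineAddr);  // miss accepted, fill pending
        return ImmediateResp::MISS_SERVICING;
    }

    // Delayed response: main memory returns the oldest outstanding miss.
    // Returns false when no miss is pending.
    bool MissResponse(uint64_t& lineAddr) {
        if (outstanding_.empty()) return false;
        lineAddr = outstanding_.front();
        outstanding_.pop_front();
        lines_.insert(lineAddr);           // line is now present
        return true;
    }

private:
    std::size_t maxOutstanding_;
    std::set<uint64_t> lines_;
    std::deque<uint64_t> outstanding_;
};
```

Under these assumptions the MEMORY stage can act on the immediate response in the same model cycle (proceed, wait, or reissue), while the delayed response arrives whenever the modeled memory latency has elapsed.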
Ongoing/Future Work
• Virtual Platform Infrastructure
  • More Sophisticated Type System
  • Virtual Memory for FPGA
    • Share page tables with software application
    • Cache V→P translations in a TLB
    • FPGA requests user software for translations
    • Software kernel must shoot down FPGA TLB when mapping changes
    • Note: distinct from HAsim Functional TLB
• Functional Model
  • Multiple Contexts
  • Ultimate goal: Run a full system
• Timing Model
  • Multiple Contexts
  • Realistic Microarchitecture

“Connection”-style Stubs
Open questions: Connections: Per-method or Per-service? How does Platform Interface get the RRR types?
User Code (hand-written):
    typedef struct {...} REG_INFO deriving (Bits, Eq);
    Connection_Send#(REG_INFO) link <- mkConnection_Send("ISA_EMULATOR_UpdateRegister");
    link.send(reg_info);
Soft connections
Platform Interface (auto-generated):
    Connection_Receive#(REG_INFO) link <- mkConnection_Receive("ISA_EMULATOR_UpdateRegister");
    ClientStub_ISA_EMULATOR <- mkClient...
    let a = link.receive();
    stub.makeRequest_UpdateRegister(a);
Stub (auto-generated):
    interface ClientStub_ISA_EMULATOR;
        method Action makeRequest_UpdateRegister(REG_INFO reg_info);
    endinterface
RRR Stack

User Code (hand-written):
    typedef struct {...} REG_INFO deriving (Bits, Eq);
    `include "remote_client_stub_ISA_EMULATOR.bsh"
    ClientStub_ISA_EMULATOR stub <- mkClientStub_ISA_EM...
    stub.makeRequest_UpdateRegister(reg_info);
Remote Stub (auto-generated):
    Connection_Send#(Bit#(70)) link <- mkConnection_Send("ISA_EMULATOR_UpdateRegister");
    method Action makeRequest_UpdateRegister(REG_INFO reg_info);
        link.send(pack(reg_info));
    endmethod
Soft connections
Platform Interface (auto-generated):
    Connection_Receive#(Bit#(70)) link <- mkConnection_Receive("ISA_EMULATOR_UpdateRegister");
    ClientStub_ISA_EMULATOR stub <- mkClientStub_ISA_EM...
    let a = link.receive();
    stub.makeRequest_UpdateRegister(a);
Stub (auto-generated):
    interface ClientStub_ISA_EMULATOR;
        method Action makeRequest_UpdateRegister(Bit#(70) reg_info);
    endinterface
RRR Stack

Hello, World!
hello.bsv
    module mkSystem#(LowLevelPlatformInterface llpi)();
        Streams streams <- mkStreams(llpi);
        Reg#(Bool) done <- mkReg(False);

        rule hello (!done);
            streams.makeRequest(`STREAMS_MESSAGE_HELLO);
            done <= True;
        endrule
    endmodule
hello.dict
    def STREAMS.MESSAGE.HELLO "Hello, World!\n";

RRR Memory Interface Specification
    service FUNCP_MEMORY
    {
        server sw (cpp, method) <- hw (bsv, connection)
        {
            method Load(in MEM_ADDRESS_RRR[64] addr, out MEM_VALUE[FUNCP_ISA_INT_REG_SIZE] data);
            method LoadCacheLine(in MEM_ADDRESS_RRR[64] addr, out MEM_CACHELINE[FUNCP_CACHELINE_BITS] data);
            method Store(in MEM_STORE_INFO_RRR[MEMORY_STORE_INFO_SIZE] info);
            method StoreCacheLine(in MEM_STORE_CACHELINE_INFO_RRR[MEMORY_STORE_CACHELINE_INFO_SIZE] info);
            // Store cache line with ACK
            method StoreCacheLine_Sync(in MEM_STORE_CACHELINE_INFO_RRR[MEMORY_STORE_CACHELINE_INFO_SIZE] info, out UINT32[32] ack);
            method VtoP(in MEM_VALUE[FUNCP_ISA_INT_REG_SIZE] va, out MEM_ADDRESS_RRR[64] pa);
        };

        server hw (bsv, connection) <- sw (cpp, method)
        {
            method Invalidate(in MEM_INVAL_CACHELINE_INFO_RRR[96] info, out UINT32[32] ack);
            method InvalidateAll(in UINT32[32] req, out UINT32[32] ack);
        };
    };

Timing Partition Cache Interface
[Figure: the MEMORY stage sends a Request to the L1 Cache and receives an Immediate Response and a Delayed Response. The L1 Cache sits in front of MAIN MEMORY.]
• Cache Req Interface:
  • LOAD
  • STORE
  • PREFETCH
  • INVALIDATE LINE
  • INVALIDATE ALL
  • KILL ALL
  • FLUSH LINE
  • FLUSH ALL
• Cache Response:
  • Immediate Response:
    • HIT
    • HIT SERVICING
    • MISS SERVICING
    • MISS RETRY
  • Delayed Response:
    • MISS RESPONSE
    • HIT RESPONSE

Credit Ports
[Figure: Producer and Consumer connected by a Data A-Port, with a Credits A-Port running back from Consumer to Producer.]
Producer Interface:
    Bool canSend()              Do we have enough credits?
    Action enq(Maybe#(t) x)     Send data or invalid.
    Action pass()               Indicate end of cycle
Producer usage:
    if (canSend) enq(x) else pass()
Consumer Interface:
    Bool canReceive()           Is data available?
    AV#(Data) pop()             Receive data
    Action done(cred)           Indicate end of cycle, and send back credits
Consumer usage:
    if (canReceive) x <- pop()
    done(x)
No buffering present in the Ports

Structures using Credit Ports
• Since buffering is not modeled in credit ports using FIFOs, any sort of buffer can sit on the consumer side
• Reduced the code size of timing models drastically
[Figure: Producer sends Data to a Consumer that holds a Completion Buffer; Credits flow back from Consumer to Producer.]