
LUM final presentation

LUM final presentation. Chanit Giat, Rachel Stahl. Instructor: Artyom Borzin. PROXY CACHE ENGINE. The proxy cache engine gives hardware support to a server’s OS in order to improve its service rate, and adds security features.


Presentation Transcript


  1. LUM final presentation. Chanit Giat, Rachel Stahl. Instructor: Artyom Borzin

  2. PROXY CACHE ENGINE
     • The proxy cache engine gives hardware support to a server’s OS in order to improve its service rate, and adds security features.
     • The main memory of a network server is its fast storage device, where recently accessed data is kept. When a new request for data arrives, the application must search the memory. If the data is found, the response is sent; otherwise the data must be read from a slower storage device (disk, tape) and then sent to the user.

  3. PROXY CACHE ENGINE
     • The system stores the mapping of all files in main memory. If a requested file is present in main memory, the system calculates its exact path; if not, it orders the operating system to bring the file from the storage device and supplies a path to free memory space.
     • The system holds 2 main databases:
       • A main memory, which holds up to 2Meg paths to the server’s memory, together with their aging parameters.
       • A bit map table, which speeds up memory management by holding the free-space image of the main memory.
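The bit map table can be pictured as one bit per path slot. Below is a minimal Python model under that assumption (bit i set means slot i is in use); the class and method names are hypothetical and do not come from the design, and the hardware works on its SRAM rather than on a single integer.

```python
from typing import Optional


class FreeSpaceBitmap:
    """Hypothetical model of the bit map: bit i == 1 means path slot i is in use."""

    def __init__(self, n_slots: int) -> None:
        self.n_slots = n_slots
        self.mask = (1 << n_slots) - 1
        self.bits = 0  # every slot is free after Init

    def find_free(self) -> Optional[int]:
        """Return the lowest free slot index, or None if the memory is full."""
        free = ~self.bits & self.mask  # 1 bits now mark free slots
        if free == 0:
            return None
        # free & -free isolates the lowest set bit, i.e. the lowest free slot
        return (free & -free).bit_length() - 1

    def allocate(self, index: int) -> None:
        self.bits |= 1 << index

    def release(self, index: int) -> None:
        self.bits &= ~(1 << index)

    def count_free(self) -> int:
        return self.n_slots - bin(self.bits).count("1")


bm = FreeSpaceBitmap(8)
bm.allocate(0)
bm.allocate(1)
print(bm.find_free())   # 2
bm.release(0)
print(bm.find_free())   # 0
print(bm.count_free())  # 7
```

Keeping the free-space image separate from the path records means a free slot can be found without scanning the records themselves.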

  4. SEARCH command fields: CID=1 | ASIS | Site# | Length | Data
     Main functions:
     • Search – returns the path to the data in main memory, or a path to free space in the memory.
     • Set attributes – sets the file’s aging attributes, as supplied by the OS.
     • Delete – deletes a given path from the memory.
     • Count free – returns the number of free path slots in the memory.
     • Init – initializes the machine.
     • (Aging – when the number of records exceeds a specified number, the system cleans some of them up.)
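The main functions can be summarised as a small software model. This is only a behavioural sketch: the `PathTable` class, the free list standing in for the bit map, keying records by the raw path (the hardware works from the path’s CRC), and the aging policy of evicting the record with the lowest aging value are all hypothetical choices, not taken from the design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    index: int    # slot in main memory holding the file's data
    age: int = 0  # aging attribute, as supplied by the OS


class PathTable:
    """Hypothetical software model of the engine's main commands."""

    def __init__(self, n_slots: int, age_limit: int) -> None:
        self.n_slots = n_slots
        self.age_limit = age_limit  # record count that triggers aging
        self.init()

    def init(self) -> None:
        """Init: empty the table and mark every slot as free."""
        self.records: Dict[bytes, Record] = {}
        self.free: List[int] = list(range(self.n_slots))

    def search(self, path: bytes) -> Tuple[bool, Optional[int]]:
        """Search: (True, index) on a hit, (False, free index) on a miss.

        On a miss the new record is added at once, so the OS can load the
        file into the returned slot. (False, None) means no slot is free.
        """
        record = self.records.get(path)
        if record is not None:
            return True, record.index
        if self.records and len(self.records) >= self.age_limit:
            self._age()
        if not self.free:
            return False, None
        index = self.free.pop(0)
        self.records[path] = Record(index)
        return False, index

    def set_attributes(self, path: bytes, age: int) -> bool:
        record = self.records.get(path)
        if record is None:
            return False
        record.age = age
        return True

    def delete(self, path: bytes) -> bool:
        record = self.records.pop(path, None)
        if record is None:
            return False
        self.free.append(record.index)
        return True

    def count_free(self) -> int:
        return len(self.free)

    def _age(self) -> None:
        # Hypothetical policy: evict the record with the lowest aging value.
        victim = min(self.records, key=lambda p: self.records[p].age)
        self.delete(victim)


table = PathTable(n_slots=4, age_limit=3)
print(table.search(b"/a"))  # (False, 0): miss, slot 0 handed out
print(table.search(b"/a"))  # (True, 0): hit
table.search(b"/b")
table.set_attributes(b"/a", 5)
table.search(b"/c")
print(table.search(b"/d"))  # (False, 3): aging evicted /b first
print(table.count_free())   # 1
```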

  5. Previous uArchitecture (block diagram): Local Bus Interface, Reg. file, Output FIFO, Data Stream Controller, Input FIFO, Database Manager (DBM), SRAM (Bit Map), CRC unit, Decoder, UTCAM.

  6. uArchitecture changes:
     • Doubling the front end of the machine, including the:
       • Input FIFO
       • Decoder
       • CRC unit
     • Buffering between the decoders and the DBM with a FIFO.
     • The search for a free index in the Bit Map is now done in parallel with the rest of the command execution.

  7. Previous uArchitecture (same block diagram as slide 5).

  8. New uarchitecture (diagram): a front end made of Input FIFO, Decoder and CRC.

  9. New uarchitecture (diagram): two front ends, FrontEnd0 and FrontEnd1, each with its own Input FIFO, Decoder and CRC.

  10. New uarchitecture (diagram): the double front end feeds the DBM FIFO.

  11. New uarchitecture (diagram): adds the Local Bus Interface, Data Stream Controller, Output FIFO and Reg. file around the double front end and the DBM FIFO.

  12. New uarchitecture (diagram): adds the Double DBM after the DBM FIFO.

  13–15. Data Flow (animation over the new uarchitecture diagram: Local Bus Interface, Data Stream Controller, Input FIFO, Decoder and CRC of FrontEnd0 and FrontEnd1, DBM FIFO, Double DBM, Output FIFO, Reg. file).

  16. Data Stream Controller state machine (diagram): Sys_clr leads to state FIFO 0; the transitions between states FIFO 0 and FIFO 1 are labelled !sot & lwr. Surrounding blocks: Local Bus Interface, Input FIFO 0, Input FIFO 1, Output FIFO, Reg. file.
     SOT – start of transaction. lwr – specifies write/read from the system.
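The state diagram and the following simulation show write transactions alternating between Input FIFO 0 and Input FIFO 1. A minimal sketch of that steering, under assumptions not stated on the slides: SOT is active low (so `!sot` marks the start of a transaction, modelled by the hypothetical `sot_n` argument), `lwr = 1` means a write, and a whole transaction's dwords are handled in one call.

```python
from typing import List, Optional


class DataStreamController:
    """Hypothetical model: steer each new write transaction to the other input FIFO."""

    def __init__(self) -> None:
        self.fifos: List[List[int]] = [[], []]
        self.state = 0  # state "FIFO 0" after Sys_clr

    def sys_clr(self) -> None:
        self.state = 0
        for fifo in self.fifos:
            fifo.clear()

    def on_transaction(self, sot_n: int, lwr: int, dwords: List[int]) -> Optional[int]:
        """Return the input FIFO that received the data, or None if nothing was written."""
        if sot_n == 0 and lwr == 1:  # "!sot & lwr": a write transaction starts
            target = self.state
            self.fifos[target].extend(dwords)
            self.state ^= 1  # the next command goes to the other front end
            return target
        return None  # e.g. a read, served from the register file


dsc = DataStreamController()
print(dsc.on_transaction(0, 1, [0x1, 0x2]))  # 0
print(dsc.on_transaction(0, 1, [0x3]))       # 1
print(dsc.on_transaction(0, 0, []))          # None
print(dsc.on_transaction(0, 1, [0x4]))       # 0
```

Alternating on every write keeps both front ends busy when commands arrive back to back, which is where the doubled front end pays off.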

  17. Sim: Data Stream Controller. Annotated events: data enters FIFO 0; data enters FIFO 1; reading from the register file (crc).

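  18. DBM FIFO state machine (diagram). States: WAIT ON GO0, DEC0, WAIT ON GO1, DEC1; Sys_clr leads to WAIT ON GO0.
     • WAIT ON GO0 → DEC0 on go0 & !dbm_full; DEC0 → WAIT ON GO1 on fifo_wrdone.
     • WAIT ON GO1 → DEC1 on go1 & !dbm_full; DEC1 → WAIT ON GO0 on fifo_wrdone.
     Signals: go0/1 – FrontEnd0/1 (decoder0/1) is ready; dbm_full – the DBM FIFO is full; fifo_wrdone – the write to the FIFO is done.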

  19. Sim: DBM FIFO (state machine of slide 18). State encoding: 1 – wait on go0, 2 – DEC0, 4 – wait on go1, 8 – DEC1. Annotated events: the DBM FIFO samples data from decoder 0; the DBM FIFO samples data from decoder 1.
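Combining the DBM FIFO state diagram with the one-hot encoding from the simulation gives a controller that takes commands from the two decoders in strict alternation. A minimal next-state function, assuming DEC0 returns to WAIT ON GO1 and DEC1 to WAIT ON GO0 once `fifo_wrdone` is seen; the function name and the boolean `clear` argument (standing in for Sys_clr, polarity abstracted away) are illustrative.

```python
from enum import IntEnum


class DbmFifoState(IntEnum):
    # one-hot encoding, as in the simulation slide
    WAIT_ON_GO0 = 1
    DEC0 = 2
    WAIT_ON_GO1 = 4
    DEC1 = 8


def next_state(state: DbmFifoState, clear: bool, go0: bool, go1: bool,
               dbm_full: bool, fifo_wrdone: bool) -> DbmFifoState:
    if clear:
        return DbmFifoState.WAIT_ON_GO0
    if state is DbmFifoState.WAIT_ON_GO0:
        # take decoder 0's command only if the DBM FIFO has room
        return DbmFifoState.DEC0 if go0 and not dbm_full else state
    if state is DbmFifoState.DEC0:
        # once the write completes, it is decoder 1's turn
        return DbmFifoState.WAIT_ON_GO1 if fifo_wrdone else state
    if state is DbmFifoState.WAIT_ON_GO1:
        return DbmFifoState.DEC1 if go1 and not dbm_full else state
    return DbmFifoState.WAIT_ON_GO0 if fifo_wrdone else state  # DEC1


state = DbmFifoState.WAIT_ON_GO0
for inputs in [
    dict(go0=True, go1=True, dbm_full=True, fifo_wrdone=False),    # FIFO full: wait
    dict(go0=True, go1=True, dbm_full=False, fifo_wrdone=False),   # take decoder 0
    dict(go0=False, go1=True, dbm_full=False, fifo_wrdone=True),   # write done
    dict(go0=False, go1=True, dbm_full=False, fifo_wrdone=False),  # take decoder 1
]:
    state = next_state(state, clear=False, **inputs)
    print(state.name, int(state))
```

In this reading WAIT ON GO1 ignores go0, so commands reach the DBM in the same order the Data Stream Controller distributed them to the two front ends.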

  20. DBM interface (diagram: DBM FIFO, Double DBM, issue logic, packer, REQ packet, bit map unit, execution unit). The interface saves the last bad decoder status, which goes to the Output FIFO with the next successful command.
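Holding the last bad decoder status until the next successful command can be sketched as below. The status encoding (0 = OK, non-zero = error code), the `Response` fields, and the assumption that a failed command produces no response of its own are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    command_id: int
    prev_bad_status: Optional[int]  # last failing decoder status, if any


class BadStatusLatch:
    """Hypothetical model: remember the last bad decoder status for the next good command."""

    def __init__(self) -> None:
        self.saved_bad: Optional[int] = None
        self.output_fifo: List[Response] = []

    def on_decoded(self, command_id: int, status: int) -> None:
        if status != 0:
            # assumed: no response for a failed command; only the latest bad status is kept
            self.saved_bad = status
            return
        self.output_fifo.append(Response(command_id, self.saved_bad))
        self.saved_bad = None  # the status has now been reported


latch = BadStatusLatch()
latch.on_decoded(command_id=1, status=3)
latch.on_decoded(command_id=2, status=5)
latch.on_decoded(command_id=3, status=0)
latch.on_decoded(command_id=4, status=0)
print(latch.output_fifo)
```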

  21. Sim: bad decoder status

  22. Register file
     • Previously, the user could read the system’s current parameters from the register file: command ID, CRC value, the file’s site, etc.
     • Since there are now 2 pipes, the register file had to be changed:
       • Some registers contain data from both pipes.
       • For the others, the user must specify which pipe’s parameters to read.
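One way to picture the two access styles follows. Which registers are shared and which are per pipe, the `pipe_select` control, and packing both pipes' values into one word are all hypothetical; the slide does not give the register map.

```python
from typing import Dict, List


class RegisterFile:
    """Hypothetical two-pipe register file."""

    def __init__(self, names: List[str]) -> None:
        # one copy of every parameter per pipe: index 0 = pipe 0, 1 = pipe 1
        self.regs: Dict[str, List[int]] = {name: [0, 0] for name in names}
        self.pipe_select = 0  # set by the user before a per-pipe read

    def update(self, pipe: int, name: str, value: int) -> None:
        self.regs[name][pipe] = value

    def read_selected(self, name: str) -> int:
        """Per-pipe register: return the value for the currently selected pipe."""
        return self.regs[name][self.pipe_select]

    def read_both(self, name: str, width: int = 16) -> int:
        """Shared register: pipe 1's value in the high half, pipe 0's in the low half."""
        mask = (1 << width) - 1
        pipe0, pipe1 = self.regs[name]
        return ((pipe1 & mask) << width) | (pipe0 & mask)


rf = RegisterFile(["command_id", "crc", "site"])
rf.update(pipe=0, name="command_id", value=1)
rf.update(pipe=1, name="command_id", value=2)
rf.update(pipe=1, name="crc", value=0xBEEF)
rf.pipe_select = 1
print(hex(rf.read_selected("crc")))     # 0xbeef
print(hex(rf.read_both("command_id")))  # 0x20001
```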

  23. ADD and s4f state machines, old design (diagram).
     • States: IDLE, ADD_NINDX, FND_NINDX, ACK_NINDX (ADD – old) and IDLE, FNEW, ACKN (s4f – old).
     • Transition labels: (ad_en) && (!ad_done), (ad_new_done), (bm_s4f_done), (ad_error), !Sys_clr, Bm_s4f_new_ack, !Bm_s4f_new_ack, Fnew_done, Ackn_done.
     • Annotations: updating the bit map takes ~10 clk cycles; finding a new free index takes ~40 clk cycles.

  24. ADD and s4f state machines, new design (diagram). A new index is found while the ‘ADD’ module is idle (which is for more than 50 cycles…)!
     • States: IDLE, ADD_NINDX, ACK_NINDX (ADD – new) and WT_FOR_ACK, FNEW, ACKN (s4f – new).
     • Transition labels: (ad_en) && (!ad_done) && (bm_index_valid), (ad_new_done), (bm_ack_rcvd), (ad_err), !Sys_clr, Add_ack, Bm_index_valid, Ackn_done.
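The key change is that s4f looks for the next free index ahead of time, so a valid index is already waiting when ADD needs one. The sketch models the new s4f machine with the encoding from the simulation slides (0 – wait for ack, 1 – find new index, 2 – ack old index), one call per state transition rather than per clock. Assumptions: the model starts in FNEW, the bit map is a Python list of booleans, and ACKN marks the consumed index as used; the class name is hypothetical.

```python
from enum import IntEnum
from typing import List, Optional


class S4fState(IntEnum):
    WT_FOR_ACK = 0  # a valid free index is held, waiting for ADD to take it
    FNEW = 1        # searching the bit map for a new free index (~40 clk per slide 23)
    ACKN = 2        # acknowledging the old index: marking it used in the bit map


class S4fPrefetcher:
    """Hypothetical model of the new s4f machine."""

    def __init__(self, n_slots: int) -> None:
        self.used: List[bool] = [False] * n_slots
        self.state = S4fState.FNEW
        self.index: Optional[int] = None

    @property
    def bm_index_valid(self) -> bool:
        return self.state is S4fState.WT_FOR_ACK

    def step(self, add_ack: bool = False) -> S4fState:
        if self.state is S4fState.FNEW:
            if False in self.used:
                self.index = self.used.index(False)
                self.state = S4fState.WT_FOR_ACK
            # memory full: stay in FNEW
        elif self.state is S4fState.WT_FOR_ACK:
            if add_ack:  # ADD has consumed the index
                self.state = S4fState.ACKN
        else:  # ACKN
            assert self.index is not None
            self.used[self.index] = True
            self.index = None
            self.state = S4fState.FNEW  # Ackn_done: look for the next index
        return self.state


s4f = S4fPrefetcher(4)
s4f.step()                            # FNEW -> WT_FOR_ACK, index 0 ready
print(s4f.bm_index_valid, s4f.index)  # True 0
s4f.step(add_ack=True)                # ADD took index 0 -> ACKN
s4f.step()                            # bit 0 marked used -> FNEW
s4f.step()                            # next free index found -> WT_FOR_ACK
print(s4f.index)                      # 1
```

Because the search runs while ADD is idle, ADD sees `bm_index_valid` already set instead of waiting out the search itself.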

  25–28. Sim: add, s4f. ADD state encoding: 1 – idle, 2 – add index, 4 – ack index. s4f state encoding: 0 – wait for ack, 2 – ack old index, 1 – find new index.

  29. Performance
     • The main function is the ‘search’ command:
       • A long path (up to 512 bytes) means a long CRC calculation, and therefore a long decoding stage.
       • Access to main memory: if the requested path is not found, a new record is added to the memory, which includes finding a new index and acknowledging the added record (at least 4 memory accesses).
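The slides do not name the CRC polynomial or how many bytes the hardware consumes per clock. Assuming, for illustration only, the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib) processed one byte per step, the sketch shows why decoding time grows linearly with path length: a 512-byte path takes about five times as many steps as a 102-byte one.

```python
import zlib
from typing import Tuple

POLY = 0xEDB88320  # reflected CRC-32 polynomial (an assumption; the slides do not say)


def crc32_with_steps(data: bytes) -> Tuple[int, int]:
    """Bitwise CRC-32; returns (crc, number of byte steps)."""
    crc = 0xFFFFFFFF
    steps = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift right, folding in the polynomial when the dropped bit was 1
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        steps += 1
    return crc ^ 0xFFFFFFFF, steps


for length in (102, 512):
    path = (b"/srv/files/" * 60)[:length]
    crc, steps = crc32_with_steps(path)
    assert crc == zlib.crc32(path)  # cross-check against zlib's implementation
    print(length, steps, hex(crc))
```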

  30. Performance
     • 2 input FIFOs – data is received from the OS at double the rate.
     • 2 decoders – 2 commands can be decoded in parallel. This is significant for several long ‘search’ commands in a row.
     • DBM FIFO – decouples the decoding of commands from their execution, so the two can proceed in parallel.

  31. Performance: 2 search commands, each with 102 bytes of path (on which the CRC is working). Times: Old Architecture / New Architecture.
     First search:
     • Ads_n falls: 628n / 628n
     • First dword in Input FIFO (is_usedw): 719n / 718n
     • End of decoding (crc_done): 6128n
     • DBM FIFO input is ready (dbm_fifo->fifo_input, new architecture only): 6202n
     • Pck__en rises: 9344n / 8560n
     Second search:
     • Sot falls: 9468 / 2318n
     • First dword in Input FIFO (is_usedw): 2380n / 2380n
     • End of decoding (crc_done): 14574n / 7869
     • Pck__en rises: 15408n / 9486
     Figure annotations: 8625, 7942, 926, 6064.
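Using the Pck__en and Ads_n timestamps from slide 31 (units as printed there, presumably ns), the gap between the two responses and the total time for both commands work out as below; the computed gaps match the 6064 and 926 figures on the slide. The dictionary keys are illustrative names.

```python
from typing import Dict

# Timestamps copied from slide 31 (units as printed there)
TIMES: Dict[str, Dict[str, int]] = {
    "old": {"ads_n_falls": 628, "pck_en_first": 9344, "pck_en_second": 15408},
    "new": {"ads_n_falls": 628, "pck_en_first": 8560, "pck_en_second": 9486},
}


def summarize(t: Dict[str, int]) -> Dict[str, int]:
    return {
        # time between the responses to the first and the second search
        "response_gap": t["pck_en_second"] - t["pck_en_first"],
        # from the first command's Ads_n edge to the second response
        "total": t["pck_en_second"] - t["ads_n_falls"],
    }


old = summarize(TIMES["old"])
new = summarize(TIMES["new"])
print(old)  # {'response_gap': 6064, 'total': 14780}
print(new)  # {'response_gap': 926, 'total': 8858}
print(round(old["total"] / new["total"], 2))  # 1.67
```

Judging by the Sot timestamps (9468 versus 2318n), most of the gain comes from the second search starting while the first is still in progress, rather than from a faster single command.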

  32. Performance
     • The search for a free index now executes in parallel with the other execution stages of a command. This saves ~50 clock cycles per ‘search’ command, which usually takes ~400–1000 cycles.
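The relative size of this saving follows directly from the quoted figures (slide 23 gives ~40 cycles to find a new free index plus ~10 to update the bit map, consistent with the ~50 here):

```latex
\frac{50}{1000} = 5\% \quad\text{to}\quad \frac{50}{400} = 12.5\%
```

So the parallel free-index search shortens a typical ‘search’ command by roughly 5% to 12.5%.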

  33. The end…

  34–35. Sim: s4f. State machine: WT_FOR_ACK, FNEW, ACKN (labels: !Sys_clr, Add_ack, Bm_index_valid, Ackn_done). State encoding: 0 – wait for ack, 2 – ack old index, 1 – find new index.
