SPARSE MATRIX VECTOR MULTIPLICATION
Mihir Awatramani, Lakshmi kiran Tondehal, Xinying Wang, Y. Ravi Chandra
SPARSE MATRICES
WHAT ARE THEY?
• Simply, matrices with a large number of zero elements
WHY ARE CONVENTIONAL ALGORITHMS NOT EFFICIENT FOR SPARSE MATRICES?
• Processing sparse matrices with conventional algorithms takes a long time, since most operations involve zero elements
• Storing the redundant zero elements adds a huge overhead
WHERE ARE THEY USED?
• Sparse matrices arise when systems are modelled as large sets of differential equations
• Typical domains are image processing, industrial process simulation, and data retrieval
BASICS OF SPARSE MATRICES
• Compressed Sparse Row / Column to Matrix Market
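To make the two storage formats concrete, here is a minimal sketch in C that converts coordinate triplets (the layout used by Matrix Market coordinate files) into Compressed Sparse Row arrays. The names `csr_matrix` and `coo_to_csr` are hypothetical and chosen only for this illustration; the slides do not show the authors' own data structures. The sketch assumes 0-based indices, so the 1-based indices of a Matrix Market file would need 1 subtracted on read.

```c
#include <stdlib.h>

/* CSR storage (hypothetical layout for this sketch). */
typedef struct {
    int rows, cols, nnz;
    int *row_ptr;   /* rows + 1 entries; row i occupies [row_ptr[i], row_ptr[i+1]) */
    int *col_idx;   /* column index of each stored nonzero */
    double *val;    /* value of each stored nonzero */
} csr_matrix;

/* Build CSR from 0-based (row, col, value) triplets given in any order.
 * Returns 0 on success, -1 if an allocation fails. */
int coo_to_csr(int rows, int cols, int nnz,
               const int *coo_row, const int *coo_col, const double *coo_val,
               csr_matrix *out)
{
    size_t n = (size_t)(nnz > 0 ? nnz : 1);
    int *next = malloc((size_t)(rows > 0 ? rows : 1) * sizeof *next);

    out->rows = rows;
    out->cols = cols;
    out->nnz = nnz;
    out->row_ptr = calloc((size_t)rows + 1, sizeof *out->row_ptr);
    out->col_idx = malloc(n * sizeof *out->col_idx);
    out->val = malloc(n * sizeof *out->val);
    if (!next || !out->row_ptr || !out->col_idx || !out->val) {
        free(next);
        free(out->row_ptr);
        free(out->col_idx);
        free(out->val);
        return -1;
    }

    /* Pass 1: count the nonzeros of each row, stored one slot to the right. */
    for (int k = 0; k < nnz; k++)
        out->row_ptr[coo_row[k] + 1]++;

    /* Prefix sum turns the counts into row start offsets. */
    for (int i = 0; i < rows; i++)
        out->row_ptr[i + 1] += out->row_ptr[i];

    /* Pass 2: scatter each triplet to the next free slot of its row. */
    for (int i = 0; i < rows; i++)
        next[i] = out->row_ptr[i];
    for (int k = 0; k < nnz; k++) {
        int dst = next[coo_row[k]]++;
        out->col_idx[dst] = coo_col[k];
        out->val[dst] = coo_val[k];
    }

    free(next);
    return 0;
}
```

Only the nonzeros and `rows + 1` offsets are stored, which is where the memory saving over a dense array comes from. Entries within a row keep the order in which they appeared in the input.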
MOTIVATION FOR ALTERNATE STRATEGIES
• Low memory bandwidth
• Irregular memory access patterns
• High latency of load/store instructions
• High ratio of load/store instructions to arithmetic instructions
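The points above are easy to see in a plain software version of the kernel. The following is a minimal sketch of sparse matrix vector multiplication over CSR arrays; the function name `spmv_csr` and its parameter names are assumptions made for this illustration and are not taken from the authors' code.

```c
#include <stddef.h>

/* y = A * x, with A given as CSR arrays (hypothetical parameter names). */
void spmv_csr(int rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < rows; i++) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
            /* Each multiply-add needs three loads: val[j], col_idx[j]
             * and x[col_idx[j]]. The last one is indirect, so its
             * address depends on the matrix contents and jumps around
             * in memory. */
            sum += val[j] * x[col_idx[j]];
        }
        y[i] = sum;
    }
}
```

The inner loop does one multiplication and one addition per three loads, and the access to `x` cannot be predicted from the loop index, which illustrates both the high load/store ratio and the irregular access pattern.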
CONVEY - A QUICK LOOK INSIDE
• It has 4 FPGAs for user-defined application personalities
• 8 memory controllers enable parallel and pipelined access to memory
• 256 MB coherent cache for memory requests from the coprocessor to host memory
• The Application Engine Hub (AEH) runs scalar instructions and routes memory requests from the Application Engines (AEs)
Details of the C Code
• The host processor populates the input matrices
• Memory is allocated for array 1 from mem_base 1, for array 2 from mem_base 2, and for the result from mem_base 3
• The COP_CALL routine passes the base addresses to the coprocessor
[Figure: sequential processor, AEH and AE1 to AE4; labels A8, A9, A10 and MB1, MB2, MB3]
Details of the Assembly Code
• Use assembly to move the base addresses to the Application Engine registers
• Logical operations: AND, OR, XOR
• Arithmetic operations: multiplication, addition
• Complex calculations involving vectors could be done without writing VHDL code
[Figure: AEH, AE1, MB1 to MB3, AEG 0 to AEG 31, main memory]
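MEMORY INTERFACING
[Figure: our module connected through ROQ 0 and MC 0 to main memory. Signals shown: ADDRESS, REQ ID, DATA, DATA VALID, POP. Addresses A 0, A 1 are sent with request IDs in the range ID 0 to ID 255; data D 0, D 1 come back paired with their IDs (I & D 0 to I & D 4).]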
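The figure suggests that each load carries a request ID and that data is popped from the ROQ once it is valid. As a rough software model of that idea, the sketch below tags each request with an ID from 0 to 255, lets data come back in any order, and only releases it in request order. This assumes the ROQ behaves as an in-order return queue; the type and function names (`roq`, `roq_issue`, `roq_complete`, `roq_pop`) are hypothetical and do not correspond to the actual Convey hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROQ_SLOTS 256   /* the slide shows request IDs 0 to 255 */

/* Software model of an in-order return queue (assumed behaviour). */
typedef struct {
    uint64_t data[ROQ_SLOTS];
    bool valid[ROQ_SLOTS];   /* true once the data for that ID has arrived */
    unsigned head;           /* ID of the oldest outstanding request */
    unsigned tail;           /* next ID to hand out */
    unsigned count;          /* number of outstanding requests */
} roq;

void roq_init(roq *q)
{
    q->head = q->tail = q->count = 0;
    for (unsigned i = 0; i < ROQ_SLOTS; i++)
        q->valid[i] = false;
}

/* Reserve an ID for a new load request; returns -1 if all IDs are in use. */
int roq_issue(roq *q)
{
    if (q->count == ROQ_SLOTS)
        return -1;
    unsigned id = q->tail;
    q->tail = (q->tail + 1) % ROQ_SLOTS;
    q->count++;
    q->valid[id] = false;
    return (int)id;
}

/* Memory returns data for a given ID, possibly out of order. */
void roq_complete(roq *q, int id, uint64_t value)
{
    q->data[id] = value;
    q->valid[id] = true;
}

/* Pop the oldest request's data, but only if it has already arrived. */
bool roq_pop(roq *q, uint64_t *out)
{
    if (q->count == 0 || !q->valid[q->head])
        return false;
    *out = q->data[q->head];
    q->valid[q->head] = false;
    q->head = (q->head + 1) % ROQ_SLOTS;
    q->count--;
    return true;
}
```

In this model a later request that completes early simply waits in its slot until every older request has been popped, so the consumer always sees data in the order the loads were issued.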
IMPLEMENTATION
• Give the base address and generate the LD signal
• Generate 21 load signals, followed by the load complete signal
• In this way, we do 21 reads from the data bus
• We now have the required inputs for SMVM
• After processing, the SMVM gives a DONE signal
• In this way, we write all 11 outputs to memory
• One cycle of computation is complete
[Figure: master control, address decoder (example addresses 0X454C…..400 and 0X454C…..040), MCs, ROQs and memory, data read engine, input buffers, SMVM, output buffers. Signals shown: DATA VALID, START READ, DATA BUS, READ COMPLETE, START WRITE, DONE, START SMVM.]
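The read, compute, write cycle on this slide can be modelled in software as a small state machine. The sketch below is only an illustration of that control flow, using the counts quoted on the slide (21 reads, 11 writes). The state names, the struct fields and `ctrl_step` are hypothetical, and returning to an idle state after the last write is a simplifying assumption rather than something the slide states.

```c
#include <stdbool.h>

enum { NUM_LOADS = 21, NUM_STORES = 11 };   /* counts quoted on the slide */

typedef enum { IDLE, LOADING, COMPUTING, STORING } phase;

typedef struct {
    phase state;
    int loads_seen;    /* words received from the data bus so far */
    int stores_sent;   /* results written back so far */
    int cycles_done;   /* completed read, compute, write cycles */
} master_ctrl;

/* Inputs sampled on each step (hypothetical signal names). */
typedef struct {
    bool start;        /* base address has been supplied */
    bool data_valid;   /* one word arrived on the data bus */
    bool smvm_done;    /* compute engine raised DONE */
    bool store_sent;   /* one result was written to memory */
} ctrl_inputs;

void ctrl_step(master_ctrl *c, ctrl_inputs in)
{
    switch (c->state) {
    case IDLE:
        if (in.start) {
            c->loads_seen = 0;
            c->state = LOADING;
        }
        break;
    case LOADING:
        /* Stay here until all 21 inputs are in the input buffer. */
        if (in.data_valid && ++c->loads_seen == NUM_LOADS)
            c->state = COMPUTING;
        break;
    case COMPUTING:
        if (in.smvm_done) {
            c->stores_sent = 0;
            c->state = STORING;
        }
        break;
    case STORING:
        /* One computation cycle ends after the 11th store. */
        if (in.store_sent && ++c->stores_sent == NUM_STORES) {
            c->cycles_done++;
            c->state = IDLE;
        }
        break;
    }
}
```

A hardware version would express the same transitions in VHDL; the C model is only meant to make the ordering of the load, compute and store phases explicit.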
Simulation Results - Co-Processor Instruction Execution
• Decode Move instruction (6 Move instructions)
• Base address and size values are moved to internal registers
• Decode CAEP instruction
• Starts the custom personality
Simulation Results – Load Request to MC
• Starts the read procedure
• Check the address and decode it
• Append the ID from the ROQ to the load request
• Send the load request to the respective MC
• Send 21 data load requests
• Start the read process after sending the requests to the MC
Simulation Results – Receive Data from MC through ROQ
• Start the load process
• Valid data may be available at several MCs, but data is read sequentially from MC0, MC1, MC2
• The load process is done after receiving 21 data inputs
• Start the next read (if there is nothing to write)
Simulation Results – Write Back Results from SpMV-Engine using MC
• Start the write if valid data is received from the SpMV engine
• Decode the address
• Send the store request to the respective MCs
• The write process is done after 11 store operations
• Once the write process is done, start the next read cycle
FUTURE SCOPE
• Increasing memory bandwidth
• Partitioning SMVM calculations across four Application Engines
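One possible way to partition the calculation, sketched here as an assumption rather than as the authors' plan, is to give each Application Engine a contiguous block of rows holding roughly the same number of nonzeros. The function name `partition_rows_by_nnz` and its parameters are hypothetical; the input is the `row_ptr` array of a CSR matrix.

```c
/* Split `rows` rows into `parts` contiguous blocks with roughly equal
 * numbers of nonzeros. Block p covers rows [row_start[p], row_start[p+1]).
 * `row_start` must have room for parts + 1 entries. */
void partition_rows_by_nnz(int rows, const int *row_ptr,
                           int parts, int *row_start)
{
    long nnz = row_ptr[rows];
    int r = 0;

    row_start[0] = 0;
    for (int p = 1; p < parts; p++) {
        /* Block boundary p should sit where p/parts of the nonzeros
         * have been passed. */
        long target = (nnz * p) / parts;
        while (r < rows && row_ptr[r] < target)
            r++;
        row_start[p] = r;
    }
    row_start[parts] = rows;
}
```

With `parts` set to 4, each engine would run the multiplication over its own row block and write a disjoint slice of the result vector, so no synchronisation on the output is needed. Balancing by nonzeros rather than by row count matters when some rows are much denser than others.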