Designing Efficient Matrix Transposition on Various Interconnection Networks Using Tensor Product Formulation Presented by Chin-Yi Tsai
Outline • Introduction • Tensor Product Notation • Matrix Transposition • Designing Matrix Transposition on Various Interconnection Networks • Conclusions and Future Work
Introduction • Matrix transposition is a simple but important computational problem. • A matrix is a two-dimensional data structure that is stored in one-dimensional computer memory. • A simple double-loop transposition program performs poorly on modern computer architectures with a memory hierarchy.
Introduction (cont’d) • We develop matrix transposition algorithms on various interconnection networks, including omega, baseline, and hypercube networks. • The tensor product has been used successfully to design block recursive algorithms, such as the FFT, Strassen’s matrix multiplication, the parallel prefix algorithm, the Hilbert space-filling curve, and Karatsuba’s multiplication. • Tensor product formulas are also suitable for specifying interconnection networks.
Introduction (cont’d) • Different interconnection networks have their own architectural characteristics and properties. • Distributed-memory algorithms and VLSI circuit design. • A major goal of this study is to provide an effective way for designing VLSI circuits of DSP algorithms.
Tensor Product Notation • Let A and B be two matrices of size and , respectively • Stride permutation
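The two operations named on this slide can be sketched in a few lines of Python. This is a minimal illustration under one common convention, not the slides' own definition: the function names `kron` and `stride_perm` are hypothetical, matrices are plain lists of rows, and the stride permutation is taken to gather the elements of a length-mn vector at stride n.

```python
def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as lists of rows.

    Every entry a[i][j] is replaced by the block a[i][j] * b.
    """
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def stride_perm(x, n):
    """Stride permutation of a vector of length m*n (assumed convention).

    Output position j*m + i receives input element i*n + j, i.e. the
    vector is read at stride n.
    """
    m = len(x) // n
    return [x[i * n + j] for j in range(n) for i in range(m)]
```

With this convention, applying `stride_perm(x, n)` to an m × n matrix stored row by row yields its transpose stored row by row, which is why stride permutations are the natural building block for transposition.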
Matrix Transposition • Matrix transposition can be viewed as reordering the elements from row-major order to column-major order. • Matrix A is stored in computer memory; the index scheme of element : • Row-major order • Column-major order • Various matrix transposition algorithms can be designed by manipulating stride permutations:
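As a concrete reading of the two orderings, the sketch below assumes an m × n matrix with zero-based indices, so that element (i, j) sits at offset i·n + j in row-major order and at offset j·m + i in column-major order. The function name `transpose_flat` and the symbols m, n, i, j are illustrative choices, not taken from the slides.

```python
def transpose_flat(a, m, n):
    """Transpose an m x n matrix held in row-major order in the flat list a."""
    t = [None] * (m * n)
    for i in range(m):
        for j in range(n):
            # Element (i, j) is at offset i*n + j in row-major order.
            # In the n x m transpose it is element (j, i), at offset j*m + i,
            # which is also its column-major offset in the original matrix.
            t[j * m + i] = a[i * n + j]
    return t
```

This is the simple double-loop program: the reads walk memory sequentially while the writes jump by m elements, which is the access pattern that interacts badly with a memory hierarchy.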
Matrix Transposition (cont’d) • Step 1: blocks with qs elements in each block • Step 2: perform transposition of matrix for pr blocks • Step 3: transpose a block matrix with each block of qs elements • Step 4: convert a block structure order of blocks with qs elements in each block to the row-major order of the transposed matrix
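The four steps can be checked with a small pure-Python sketch. It assumes the matrix has pq rows and rs columns and is partitioned into a p × r grid of q × s blocks, which matches the "pr blocks with qs elements" wording but is otherwise an assumption; the helper `permute_axes` and the function `block_transpose` are hypothetical names introduced for illustration.

```python
from itertools import product


def permute_axes(x, dims, order):
    """Reorder the axes of a flat row-major array of shape dims.

    Output axis t is input axis order[t]. Returns the new flat list
    and its shape.
    """
    strides = [1] * len(dims)
    for k in range(len(dims) - 2, -1, -1):
        strides[k] = strides[k + 1] * dims[k + 1]
    new_dims = [dims[k] for k in order]
    out = [x[sum(idx[t] * strides[order[t]] for t in range(len(order)))]
           for idx in product(*(range(d) for d in new_dims))]
    return out, new_dims


def block_transpose(a, p, q, r, s):
    """Transpose a (p*q) x (r*s) row-major matrix in four steps."""
    # The flat index is read as (i, u, j, v): block row i, row u inside
    # the block, block column j, column v inside the block.
    x, d = permute_axes(a, [p, q, r, s], [0, 2, 1, 3])  # Step 1: block order
    x, d = permute_axes(x, d, [0, 1, 3, 2])  # Step 2: transpose each block
    x, d = permute_axes(x, d, [1, 0, 2, 3])  # Step 3: transpose block grid
    x, d = permute_axes(x, d, [0, 2, 1, 3])  # Step 4: back to row-major
    return x
```

Each step only swaps two adjacent index digits, so each one corresponds to a single stride permutation tensored with identity matrices.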
Designing Matrix Transposition on Various Interconnection Networks • We consider two kinds of networks: • multistage interconnection networks, • direct interconnection networks. • The basic component of a multistage interconnection network is a switching element. • A direct interconnection network is a set of processors connected by a set of links. [Figure: switching element with inputs x0, x1 and outputs y0, y1]
Designing Matrix Transposition on Various Interconnection Networks • Suppose that N = 2^n, • Omega network • Baseline network • Hypercube network
[Figure: 16-element example, elements numbered 0 to 15]
Omega Interconnection Network
 0   0   0   0   0
 1   8   4   2   1
 2   1   8   4   2
 3   9  12   6   3
 4   2   1   8   4
 5  10   5  10   5
 6   3   9  12   6
 7  11  13  14   7
 8   4   2   1   8
 9  12   6   3   9
10   5  10   5  10
11  13  14   7  11
12   6   3   9  12
13  14   7  11  13
14   7  11  13  14
15  15  15  15  15
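The data movement between stages of an omega network is a perfect shuffle. The sketch below traces 16 lines through four shuffle stages, assuming every switching element passes its inputs straight through; the function names are hypothetical and the switch settings are an assumption, not something the slide states.

```python
def perfect_shuffle(x):
    """Interleave the first and second halves of x.

    The element on line p moves to the line whose binary address is
    p rotated left by one bit.
    """
    half = len(x) // 2
    out = []
    for a, b in zip(x[:half], x[half:]):
        out += [a, b]
    return out


def shuffle_stages(n_lines, n_stages):
    """Contents of every line after 0, 1, ..., n_stages shuffles."""
    stages = [list(range(n_lines))]
    for _ in range(n_stages):
        stages.append(perfect_shuffle(stages[-1]))
    return stages
```

For 16 lines, `shuffle_stages(16, 4)` returns to the identity after the fourth stage, because rotating a 4-bit address four times restores it.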
Derivation of Algorithm on Baseline Interconnection Network • Bit-reversal operation • Partial bit-reversal operation
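A small sketch may help relate these two operations to transposition. Bit reversal is standard; "partial bit-reversal" is interpreted here as reversing the bits inside each fixed-width field of the index, which is an assumption about the slide's meaning. Both function names are hypothetical.

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def reverse_fields(i, n, k):
    """Reverse the bits inside each k-bit field of the n-bit index i.

    Assumes k divides n. This is one possible reading of a partial
    bit-reversal.
    """
    out = 0
    mask = (1 << k) - 1
    for shift in range(0, n, k):
        out |= bit_reverse((i >> shift) & mask, k) << shift
    return out
```

For a 4 × 4 matrix (n = 4), a full bit reversal followed by a reversal inside each 2-bit half swaps the row bits with the column bits, which is exactly the index map of transposition: index 6 (row 1, column 2) goes to 9 (row 2, column 1).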
Baseline Interconnection Network
 0   0   0   0   0
 1   2   2   1   4
 2   1   1   4   8
 3   3   3   5  12
 4   4   4   8   1
 5   6   6   9   5
 6   5   5  12   9
 7   7   7  13  13
 8   8   8   2   2
 9  10  10   3   6
10   9   9   6  10
11  11  11   7  14
12  12  12  10   3
13  14  14  11   7
14  13  13  14  11
15  15  15  15  15
Hypercube Interconnection Network [Figure: hypercube interconnection network]
Hypercube Interconnection Network (cont’d) [Figure: data movement of 16 elements on the hypercube network]
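To make the hypercube case concrete, the sketch below assumes one matrix element per node of an n-dimensional hypercube, with the node address equal to the element's row-major index. It computes where each element must end up and one possible route along hypercube links, using simple dimension-ordered routing. This is an illustrative stand-in, not necessarily the algorithm shown in the slides, and both function names are hypothetical.

```python
def swap_halves(i, n):
    """Destination of element i when a 2^(n/2) x 2^(n/2) matrix is transposed.

    The high half of the address (row bits) and the low half
    (column bits) trade places. Assumes n is even.
    """
    h = n // 2
    return ((i & ((1 << h) - 1)) << h) | (i >> h)


def ecube_route(src, dst, n):
    """Dimension-ordered route from src to dst on an n-dimensional hypercube.

    Differing address bits are corrected from bit 0 upward; each hop
    crosses exactly one hypercube link.
    """
    path = [src]
    cur = src
    for d in range(n):
        if ((cur ^ dst) >> d) & 1:
            cur ^= 1 << d
            path.append(cur)
    return path
```

For example, on 16 nodes element 1 must reach node 4, and `ecube_route(1, 4, 4)` visits nodes 1, 0, 4. Elements on the diagonal (addresses 0, 5, 10, 15) do not move.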
Conclusions and Future Work • We use the tensor product as the framework for designing matrix transposition algorithms on various interconnection networks. • We manipulate stride permutation operations so that they fit the networks. • VLSI circuit design for DSP and image processing algorithms on various interconnection networks.