Solving the Longest Common Subsequence (LCS) Problem Using the Associative ASC Processors with Reconfigurable 2D Mesh

Virdi Sabegh Singh (Advisor: Dr. Robert A. Walker), Computer Science Department, Kent State University

Presentation Transcript


  1. Solving the Longest Common Subsequence (LCS) problem using the Associative ASC Processors with Reconfigurable 2D Mesh Virdi Sabegh Singh (Advisor Dr. Robert A. Walker) Computer Science Department Kent State University

  2. Presentation Outline • String matching and its variations • Motivation of LCS • Role of LCS in Molecular Biology • Overview of LCS • Discussion on Folklore algorithm • Parallel Algorithms for LCS • Discussion on ASC processor • Brief introduction on Coterie Network

  3. Presentation Outline • Reconfigurable Network in the ASC Processor • Modifying the Network for LCS Algorithm • Longest Common Subsequence on Reconfigurable 2D Mesh: Exact match • Longest Common Subsequence on Reconfigurable 2D Mesh: Approximate match • Summary and Future work

  5. String Matching • Fundamental operation in computing • Comparison of characters, words, etc. to determine their similarity • Of particular interest in bioinformatics, especially for searching genetic databases • Strings are enormous, so efficient string processing is a requirement

  6. String Matching Variations • Is exact match the only solution? • What if the pattern does not occur in the text? • Find the longest subsequence that occurs both in the pattern and in the text • Longest Common Subsequence, Longest Common Substring, sequence alignment and edit distance are all variations of the string matching (SM) problem

  7. Sequence alignment vs. LCS • Sequence alignment • Procedure of comparing two or more sequences • Searches for a series of individual characters or character patterns that occur in the same order in the sequences • LCS • Find a common subsequence of both sequences, preserving symbol order
     [Figure: example alignment of two sequences, with "." marking gaps]
     GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
     |||::::| : |::| ||:::||||:|:|||:: ::| |::::
     GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV

  9. Motivation of LCS • Molecular biology • File comparison • Screen redisplay • Cheater finder • Plagiarism detection • Codes and error control • Spell checking • Human speech • Gas chromatography • Bird song analysis • Data compression • Speech recognition

  11. Role of LCS in Molecular Biology • DNA sequences (genes) are represented by the four letters A, C, G, T, corresponding to the four submolecules forming DNA • When biologists find a new sequence, they typically want to know which other sequences it is most similar to • One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence

  12. Role of LCS in Molecular Biology • This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS but also, e.g., how gaps occur when the LCS is embedded in the two original sequences • An obvious measure of the closeness of two strings is the maximum number of identical symbols (preserving symbol order) • This, by definition, is the length of the longest common subsequence of the strings

  14. Longest Common Subsequences • Formally, we compare two strings, X[1..m] and Y[1..n], which are elements of the set Σ*; here Σ denotes the input alphabet containing σ symbols • The LCS of strings X and Y, lcs(X,Y), is a common subsequence of maximal length • Special case of the edit distance problem • The distance between X and Y is defined as the minimal number of elementary operations needed to transform the source string X into the target string Y • In practical applications, operations are restricted to insertions, deletions and substitutions • For each operation, an application-dependent cost is assigned
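The "special case" relation on this slide can be made concrete. As a sketch, assume only insertions and deletions are allowed, each at unit cost (the symbol d_ID is a name introduced here, not one used in the slides). Every symbol outside a longest common subsequence must then be deleted from X or inserted from Y, which gives:

```latex
d_{\mathrm{ID}}(X, Y) \;=\; m + n - 2\,\lvert \mathrm{lcs}(X, Y) \rvert ,
\qquad m = \lvert X \rvert,\; n = \lvert Y \rvert .
% Example: X = AGACTGAGGTA (m = 11), Y = ACTGAG (n = 6), |lcs(X,Y)| = 6
% gives d_ID = 11 + 6 - 2*6 = 5, i.e. five deletions from X.
```

When substitutions or other costs are permitted, this identity no longer holds in general.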

  15. Longest Common Subsequences • LCS(X,Y) is typically solved with dynamic programming by filling an m×n table • Table elements act as vertices in a graph, and the simple dependencies between the table values define the edges • The task is to find the longest path between the vertices in the upper-left and lower-right corners of the table
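The table-filling approach described on this slide can be sketched as follows. This is a minimal illustration of the standard dynamic programming recurrence, not code from the presentation; the function names `lcs_table` and `lcs_length` are chosen here for illustration, and an extra row and column of zeros is added for the empty prefixes.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Fill the (len(x)+1) x (len(y)+1) table of LCS lengths.

    table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    """
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Match: extend the LCS of the two shorter prefixes (diagonal).
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # No match: carry over the better of the upper and left cells.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence (lower-right table corner)."""
    return lcs_table(x, y)[-1][-1]
```

For example, `lcs_length("ACTGAG", "AGACTGAGGTA")` evaluates to 6, the value in the lower-right corner of the table.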

  17. Folklore Algorithm • Foundation of most LCS algorithms • Given two strings, find the LCS common to both strings • Example: • String 1: AGACTGAGGTA • String 2: ACTGAG • List of possible alignments:
     AGACTGAGGTA
     --ACTGAG---
     --ACTGA-G--
     A--CTGA-G--
     A--CTGAG---
     • The time complexity of this algorithm is clearly O(nm)

  18. Folklore Algorithm • Complexity does not depend on the sequences u and v themselves but only on their lengths • By carefully choosing the order in which the d(i,j)'s are computed, one can execute the above algorithm in O(n+m) space • The bottleneck in efficient parallelization of the LCS problem is calculating the values of the diagonal elements, as shown
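One way to obtain the reduced space bound mentioned on this slide is to keep only the previous and current rows of the table. The sketch below is an assumption about how this could be done, not the presentation's own procedure. It computes the LCS length only (not the subsequence itself), and its working storage is proportional to the shorter string, which is within O(n+m).

```python
def lcs_length_two_rows(x: str, y: str) -> int:
    """LCS length using two table rows instead of the full table."""
    if len(y) > len(x):
        x, y = y, x  # keep the shorter string along the row
    prev = [0] * (len(y) + 1)  # row for the previous character of x
    for cx in x:
        curr = [0]  # column 0 corresponds to the empty prefix of y
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)  # diagonal dependency
            else:
                curr.append(max(prev[j], curr[j - 1]))  # upper or left cell
        prev = curr
    return prev[-1]
```

The result depends only on the table values, so it agrees with the full-table version for every input.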

  19. Folklore Algorithm • As seen, the value at (i,j) depends on the previous diagonal element (i-1,j-1) when a match is found • We may have more than one LCS for the same problem • In order to find the best LCS, we associate a parameter with each candidate • The Smith-Waterman algorithm uses the same concept as the Folklore algorithm, but gives us the optimal result (LCS)

  20. Folklore Algorithm (LCS length table, with String 1 = AGACTGAGGTA across the columns and String 2 = ACTGAG down the rows)
           A  G  A  C  T  G  A  G  G  T  A
        0  0  0  0  0  0  0  0  0  0  0  0
     A  0  1  1  1  1  1  1  1  1  1  1  1
     C  0  1  1  1  2  2  2  2  2  2  2  2
     T  0  1  1  1  2  3  3  3  3  3  3  3
     G  0  1  2  2  2  3  4  4  4  4  4  4
     A  0  1  2  3  3  3  4  5  5  5  5  5
     G  0  1  2  3  3  3  4  5  6  6  6  6
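Once such a table is filled, one LCS can be read back by walking from the lower-right corner toward the upper-left. The following is a hedged sketch of that traceback. The function name `lcs_string` and the tie-breaking rule (prefer moving up) are choices made here, so when several LCSs exist it returns just one of them.

```python
def lcs_string(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the lower-right corner, collecting matched symbols.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # a match: step diagonally
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1  # tie-break assumption: prefer the upper cell
        else:
            j -= 1
    return "".join(reversed(out))
```

For String 2 = ACTGAG and String 1 = AGACTGAGGTA the call `lcs_string("ACTGAG", "AGACTGAGGTA")` returns `"ACTGAG"`, since all of String 2 occurs as a subsequence of String 1.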

  22. Parallel Counterpart • The serial LCS algorithm runs in O(nm) time, where n is the length of the text string and m is the length of the pattern string • Efficient parallel algorithms exist for this computationally intensive task • Some algorithms run in O(max{n,m}) time using O(min{n,m}) processors • Others run in O(log n) time using O(mn/log n) processors • There are constant-time algorithms for the LCS problem using the DP approach, under some assumptions
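A common way to expose parallelism in the DP table is to process it one anti-diagonal at a time, since all cells with the same i + j depend only on the two preceding anti-diagonals. The sketch below simulates this sequentially. It is an illustration of the general wavefront idea, not necessarily the algorithm behind the bounds quoted on the slide. The returned step count (a name introduced here) is the number of anti-diagonals, and no anti-diagonal holds more than min(m, n) cells.

```python
def lcs_length_wavefront(x: str, y: str) -> tuple[int, int]:
    """Return (LCS length, number of anti-diagonal steps)."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    for d in range(2, m + n + 1):  # d = i + j for the cells of one wavefront
        lo, hi = max(1, d - n), min(m, d - 1)
        if lo > hi:
            continue  # no interior cells on this anti-diagonal
        # The cells below are mutually independent; a parallel machine
        # could assign one processor to each of them.
        for i in range(lo, hi + 1):
            j = d - i
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
        steps += 1
    return table[m][n], steps
```

For two non-empty strings the step count is m + n - 1, which is at most 2·max{n,m}.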

  23. Computation Model • Various network models have been used to solve the LCS problem • PRAM model, Suffix Tree, 2D-Mesh Network, Mesh with Reconfigurable Buses, Mesh with Multiple Buses, etc. • Algorithms that run in constant time assume that most of the operations are done in constant time • In the parallel version, one of the important tasks is to distribute data efficiently and in an easy manner

  25. The ASC Processor • A scalable design implemented on a million gate Altera FPGA • SIMD-like architecture • Searches data by content instead of address • 8-bit Instruction Stream (IS) control unit with 8-bit Instruction and Data addresses, 32-bit instructions

  26. The ASC Architecture

  27. The ASC Architecture • Each PE listens to the IS through the broadcast and reduction network • PEs can communicate amongst themselves using the PE Network • A PE may either execute or ignore the microcode instruction broadcast by the IS, under the control of the Mask Stack

  28. The ASC Features • Associative Search • Each PE can search its local memory for a key under the control of the IS • Responder Resolution • A special circuit signals if ‘at least one’ record was found • Masked Operation • Local Mask Stacks can turn the execution of instructions from the IS on or off
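The three features listed above can be mimicked in a small software model. This is purely an illustrative sketch: the `PE` class, its `record` and `active` fields, and the three function names are hypothetical, and a single boolean stands in for the top of the mask stack. It makes no claim about how the ASC hardware implements these operations.

```python
from dataclasses import dataclass


@dataclass
class PE:
    record: dict          # local memory, modelled as one record per PE (assumption)
    active: bool = True   # stands in for the top of the local mask stack


def associative_search(pes, field, key):
    """Every active PE compares its own record with the broadcast key."""
    return [pe.active and pe.record.get(field) == key for pe in pes]


def any_responder(responders):
    """Responder resolution: was at least one matching record found?"""
    return any(responders)


def masked_update(pes, mask, field, value):
    """Masked operation: only PEs whose mask bit is set execute the write."""
    for pe, enabled in zip(pes, mask):
        if enabled:
            pe.record[field] = value
```

A search by content followed by a masked write then looks like `mask = associative_search(pes, "base", "A")` and `masked_update(pes, mask, "hit", 1)`, with no PE addressed by index.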

  29. Communication between PEs • In a 2D mesh network, communication between the PEs themselves takes place in two different ways • By using the nearest-neighbor mesh interconnection network • By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication • Processors in a group share common properties and purpose; we call the group a coterie, hence the name coterie network

  31. Coteries [Weems & Herbordt] • “A small often selected group of persons who associate with one another frequently” • Features: • Related to other reconfigurable broadcast networks • Describable using hypergraphs • Dynamic in nature • Advantages: • Propagation of information quickly over long distances at electrical speed • Support of one-to-many communication within a coterie, and reconfigurability of the coterie

  32. Coterie Network • Provides a method of performing operations on regions of an image in parallel • Used extensively for matrix arithmetic, FFT, convex hull computation, simulating a pyramid processor, general permutation routing and parallel prefix • Note that the coterie network is separate from the nearest-neighbor mesh, which we refer to as the SEWN network • The coterie network results in a new mode of parallelism that falls between SIMD and MIMD

  33. PEs form Coteries • 5 x 5 coterie network with switches shown in “arbitrary” settings • Shaded areas denote coteries (the sets of PEs sharing the same circuit)

  34. Coterie’s Physical Structure • In the physical implementation, each PE controls a set of switches • Four of these switches control access in the different directions (N, S, E, W) • Two switches, H and V, are used to emulate horizontal and vertical buses • The two switches NE and NW are used to create eight-way connected regions • [Figure: coterie structure, showing the switches labeled N, NE, NW, V, E, W, H, WS, ES and S]

  35. Coterie Network • The isolated groups of processors, called coteries, have access only to the multicast within their own coterie • When the switches are set, connected processors form a coterie • The coterie network switches are set by loading the corresponding bits of the mesh control register in each PE
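How switch settings partition the mesh into coteries can be illustrated with a small union-find sketch. It rests on a simplifying assumption that is not stated in the slides: two neighboring PEs share a circuit only when both facing switches are closed (for example the E switch of one and the W switch of the other). Only the N, S, E and W switches are modelled, and the function name `find_coteries` is hypothetical.

```python
def find_coteries(switches):
    """Group PEs into coteries.

    switches[r][c] is the set of closed switches ('N', 'S', 'E', 'W')
    of the PE at row r, column c.
    """
    rows, cols = len(switches), len(switches[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            # Assumed rule: a link exists only if both facing switches are closed.
            if c + 1 < cols and "E" in switches[r][c] and "W" in switches[r][c + 1]:
                union((r, c), (r, c + 1))
            if r + 1 < rows and "S" in switches[r][c] and "N" in switches[r + 1][c]:
                union((r, c), (r + 1, c))

    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())
```

Closing only the S switches in the top row and the N switches in the bottom row of a 2 x 3 mesh, for instance, yields three column-shaped coteries.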

  36. Basic Coterie structure algorithms • The complexity is assumed to be O(1) unless otherwise stated • Transfer of data between two adjacent coteries • Symmetry breaking between a pair of nodes in a coterie • Two nodes within a coterie exchange information

  38. Reconfigurable Network in the ASC Processor • Scalable design with reconfigurable network • Can be used as a dedicated ASIC or co-processor • Implemented on an Altera APEX20KC1000: single CPU, 50 pipelined PEs and a linear PE interconnection network • Key to reconfigurability is the Data Switch inside each PE • [Figure: data switch with N, W, E and S ports]

  39. Reconfigurable Network in the ASC Processor • Linear network: a PE communicates in both directions • 2D reconfigurable network: a PE communicates with all of its neighbors (N-E-S-W) • The data switch has a bypass mode that allows PE communication to skip non-responders, so as to support associative computing
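The effect of bypass mode on a linear network can be sketched in software. This is a behavioral illustration under the assumption that a non-responder's data switch simply passes data through, so each responder receives the value sent by the nearest responder on its left. The function name and the left-to-right direction are choices made here, not details taken from the slides.

```python
def shift_right_with_bypass(values, responders):
    """Each responder receives the value of the previous responder.

    Non-responders are bypassed: they neither send nor receive.
    """
    received = [None] * len(values)
    in_flight = None  # value currently travelling along the linear network
    for i, (value, is_responder) in enumerate(zip(values, responders)):
        if is_responder:
            received[i] = in_flight  # take what arrived from the left
            in_flight = value        # then send own value onward
        # Non-responder: switch in bypass mode, in_flight passes unchanged.
    return received
```

With values `[10, 20, 30, 40, 50]` and responders at positions 0, 3 and 4, PE 3 receives 10 directly from PE 0, skipping the two non-responders in between.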

  41. Modifying the Network for LCS Algorithm • The Coterie Network is a powerful network • But we don’t need its full feature set for the LCS algorithm • Augmented ASC with a new 2D mesh, with row and column broadcast buses • Modified the linear network into a 2D mesh • Added features inspired by the Coterie network • A PE can now communicate with any of its four neighbors • Bypass mode augmented to support H and V bypass as well

  43. LCS Algorithm on Reconfigurable 2D Mesh • We assume that initially all the internal switches of the PEs are open • Each PE has a Match Register “M” and a Length Register “L”, both initially holding the value 0 • Let the text string T = T(1)T(2)…T(n) be fed into row 1 of the Reconfigurable 2D Mesh • PE(0,j) stores T(j), where 0<=j<=n, as shown • This step takes unit time.

  44. LCS Algorithm on Reconfigurable 2D Mesh • [Figure: text string loaded into the first row of the mesh]
     A G A C T G A C T G A

  45. LCS Algorithm on Reconfigurable 2D Mesh • Broadcast each character of the text string along its column, using the column broadcast bus • In the case of a Coterie network • Form coteries along the columns • Perform a multicast operation in all coteries • This step takes unit time.

  46. LCS Algorithm on Reconfigurable 2D Mesh • [Figure: after the column broadcast, every row of the mesh holds the text string]
     A G A C T G A C T G A
     A G A C T G A C T G A
     A G A C T G A C T G A
     A G A C T G A C T G A
     A G A C T G A C T G A
     A G A C T G A C T G A

  47. LCS Algorithm on Reconfigurable 2D Mesh • Let the pattern string P = P(1)P(2)…P(m) be fed into column 1 of the Reconfigurable 2D Mesh • PE(i,0) stores P(i), where 0<=i<=m, as shown • This step takes unit time

  48. LCS Algorithm on Reconfigurable 2D Mesh • [Figure: pattern string loaded into the first column of the mesh, top to bottom]
     A C T G A C

  49. PEs form Coteries • Broadcast each character of the pattern string along its row, using the row broadcast bus • In the case of a Coterie network • Form coteries along the rows • Perform a multicast operation in all coteries • This step takes unit time

  50. LCS Algorithm on Reconfigurable 2D Mesh • [Figure: after the row broadcast, every column of the mesh holds the pattern string]
     A A A A A A A A A A A
     C C C C C C C C C C C
     T T T T T T T T T T T
     G G G G G G G G G G G
     A A A A A A A A A A A
     C C C C C C C C C C C
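The loading and broadcast steps on slides 43 to 50 can be simulated with two character grids, one per broadcast. The sketch below is a software model only. The function names are hypothetical, and the final comparison that sets the Match Register M is an assumed follow-up step, since the slides up to this point introduce M but do not yet show how it is filled.

```python
def load_and_broadcast(text: str, pattern: str):
    """Simulate the text (column) and pattern (row) broadcasts on the mesh."""
    rows, cols = len(pattern), len(text)
    # Text is fed into the top row, then broadcast down every column,
    # so each row of PEs ends up holding the whole text string.
    t_reg = [list(text) for _ in range(rows)]
    # Pattern is fed into the first column, then broadcast along every row,
    # so each PE in row i holds pattern[i].
    p_reg = [[pattern[i]] * cols for i in range(rows)]
    return t_reg, p_reg


def match_registers(t_reg, p_reg):
    """Assumed next step: each PE sets M = 1 if its two characters agree."""
    return [
        [int(t == p) for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(t_reg, p_reg)
    ]
```

With the text AGACTGACTGA and the pattern ACTGAC from the figures, the first row of M has ones in the columns where the text holds an A.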
