Suffix Trees and Suffix Arrays
OUTLINE • Suffix trees • Suffix arrays
Suffix trees • Indexing techniques are used to locate highest-scoring alignments. • One method of indexing uses the suffix tree. • A suffix is a sub-sequence that starts at some position of a string and runs to the end of the string.
Suffix trees • Problems: • Given a pattern P (a sub-sequence), find all occurrences of P in a text S. • Given two strings, find their longest common substring.
Suffix trees • Problems in Bioinformatics: • Multiple genome alignment • Identification of sequence repeats
Suffix trees • Suffix tree: • For example: • S: abdfrg (length:6) • S has 6 suffixes: g, rg, frg, dfrg, bdfrg, abdfrg
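The suffixes listed above can be generated by slicing the string at every position. A minimal Python sketch (the function name `suffixes` is illustrative, not from the slides):

```python
def suffixes(s: str) -> list[str]:
    """Return every suffix of s, from the shortest to the longest."""
    # The suffix starting at index i is s[i:]; walk i from the end backwards.
    return [s[i:] for i in range(len(s) - 1, -1, -1)]


print(suffixes("abdfrg"))
# → ['g', 'rg', 'frg', 'dfrg', 'bdfrg', 'abdfrg']
```

Listing the suffixes this way copies n(n+1)/2 characters in total, which is why a compact index such as a suffix tree is preferable for long sequences.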
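Suffix trees • The suffixes of a string can be stored in a suffix tree, and this tree can be constructed in O(n) time (n: length of the string) • A pattern of length m can then be searched for in O(m) time, plus time proportional to the number of occurrences if all of them are reported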
Suffix trees • Suffix tree: • S = S[1…n] is a string of length n • A suffix tree of S is a rooted tree with exactly n leaves • Each leaf represents one of the n suffixes: the edge labels on the path from the root to leaf i spell out S[i…n] • Example: ababc$
Suffix trees • If one suffix is a prefix of another suffix, we cannot construct a tree in which every suffix ends at a leaf • Example: in xabxa, the suffixes xa and a are prefixes of xabxa and abxa, so they do not end at leaf nodes.
Suffix trees • To solve the problem, append a special character that does not occur anywhere else in the string (for example $) to its end • xabxa$
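The condition above can be checked directly. This minimal sketch (the function name `prefix_suffixes` is hypothetical) lists the suffixes that are a prefix of some other suffix, first for xabxa and then for xabxa$ with the terminator appended:

```python
def prefix_suffixes(s: str) -> list[str]:
    """Return the suffixes of s that are also a prefix of another suffix.

    Such suffixes would end at an internal point of the tree
    instead of at a leaf.
    """
    sufs = [s[i:] for i in range(len(s))]
    # Suffixes all have different lengths, so b != a means "another suffix".
    return [a for a in sufs if any(b != a and b.startswith(a) for b in sufs)]


print(prefix_suffixes("xabxa"))   # → ['xa', 'a']
print(prefix_suffixes("xabxa$"))  # → []
```

With a unique terminator, two suffixes always differ at or before the $ of the shorter one, so no suffix can be a prefix of another.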
Suffix trees • How to construct a suffix tree: • Assume we have a string S[1…n] • Start from the longest suffix, S[1…n] (the whole string) • For example, consider vbacxad$
Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[2…n] • Which is bacxad$
Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[3…n] • Which is acxad$
Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[4…n] • Which is cxad$
Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[5…n] • Which is xad$
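Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[6…n] • Which is ad$. Its first character matches the edge leading to the leaf of acxad$, and the match breaks right after a (c versus d). So split that edge after a: the new internal node gets two children, cxad$ and d$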
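Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[7…n] • Which is d$; no edge from the root starts with d, so add a new leaf
Suffix trees • How to construct a suffix tree (cont.): • Enter the next suffix, S[8…n] • Which is $; again no edge from the root starts with $, so add a new leaf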
Suffix trees • Suffix tree of vbacxad$:
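The insertion steps above can be sketched as a naive builder that inserts the suffixes one at a time, longest first, and splits an edge when a new suffix diverges in the middle of it. This is an illustration under stated assumptions, not the linear-time construction mentioned earlier: inserting suffixes this way takes O(n²) time in the worst case. The class and function names are hypothetical, and positions are 0-based, so the leaf for S[i…n] is tagged i − 1.

```python
class Node:
    """A suffix-tree node; leaves record where their suffix starts."""

    def __init__(self, suffix_index=None):
        self.children = {}  # first character of edge -> (edge label, child node)
        self.suffix_index = suffix_index  # 0-based start of the suffix, leaves only


def insert_suffix(root, suffix, index):
    """Insert one suffix, splitting an edge where the suffix diverges from it."""
    node, rest = root, suffix
    while True:
        first = rest[0]
        if first not in node.children:
            # No edge starts with this character: hang a new leaf here.
            node.children[first] = (rest, Node(index))
            return
        label, child = node.children[first]
        k = 0  # length of the common prefix of the edge label and the rest
        while k < len(label) and k < len(rest) and label[k] == rest[k]:
            k += 1
        if k == len(label):
            # The whole edge matched: continue below the child node.
            node, rest = child, rest[k:]
            continue
        # Mismatch inside the edge: split it with a new internal node.
        middle = Node()
        node.children[first] = (label[:k], middle)
        middle.children[label[k]] = (label[k:], child)
        middle.children[rest[k]] = (rest[k:], Node(index))
        return


def build_suffix_tree(s):
    """Naive O(n^2) construction; s must end with a unique terminator such as $."""
    assert s and s.count(s[-1]) == 1, "last character must be a unique terminator"
    root = Node()
    for i in range(len(s)):
        insert_suffix(root, s[i:], i)
    return root


def render(node, depth=0):
    """One line per edge, children in sorted order, indented by depth."""
    lines = []
    for first in sorted(node.children):
        label, child = node.children[first]
        tag = f"  [leaf {child.suffix_index}]" if child.suffix_index is not None else ""
        lines.append("  " * depth + label + tag)
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(build_suffix_tree("vbacxad$"))))
# Expected output:
# $  [leaf 7]
# a
#   cxad$  [leaf 2]
#   d$  [leaf 5]
# bacxad$  [leaf 1]
# cxad$  [leaf 3]
# d$  [leaf 6]
# vbacxad$  [leaf 0]
# xad$  [leaf 4]
```

The only internal node besides the root is the one created by the split when ad$ is inserted; every other suffix hangs directly off the root.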
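Suffix trees • Pattern matching using suffix trees: • Try to match the pattern along a path, starting from the root. There are three possible outcomes: • The pattern does not match • The match ends at a node u of the tree • The match ends inside an edge • In the last two cases, the leaves below the point where the match ends give the starting positions of all occurrences.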
Suffix trees • Example (consider vbacxad$): • Suffixes: • vbacxad$ • bacxad$ • acxad$ • cxad$ • xad$ • ad$ • d$ • $ • Search for: • cxa • a • xdb
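The three search outcomes can be reproduced on the example. This minimal sketch rebuilds the tree with the same naive insertion used for the construction slides (all names are hypothetical), walks each pattern down from the root, reports which outcome occurred, and collects the 0-based start positions from the leaves below the match:

```python
class Node:
    def __init__(self, suffix_index=None):
        self.children = {}  # first character of edge -> (edge label, child node)
        self.suffix_index = suffix_index  # 0-based suffix start, leaves only


def common_prefix_length(a, b):
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def build_suffix_tree(s):
    """Naive construction: insert each suffix, splitting edges on divergence.

    Assumes s ends with a unique terminator such as $.
    """
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            if rest[0] not in node.children:
                node.children[rest[0]] = (rest, Node(i))
                break
            label, child = node.children[rest[0]]
            k = common_prefix_length(label, rest)
            if k == len(label):
                node, rest = child, rest[k:]
                continue
            middle = Node()
            node.children[rest[0]] = (label[:k], middle)
            middle.children[label[k]] = (label[k:], child)
            middle.children[rest[k]] = (rest[k:], Node(i))
            break
    return root


def leaf_positions(node):
    """Start positions of all suffixes in the subtree below node."""
    if node.suffix_index is not None:
        return [node.suffix_index]
    return [p for _, child in node.children.values() for p in leaf_positions(child)]


def search(root, pattern):
    """Walk pattern down from the root; report the outcome and occurrences."""
    node, rest = root, pattern
    while rest:
        if rest[0] not in node.children:
            return "no match", []
        label, child = node.children[rest[0]]
        k = common_prefix_length(label, rest)
        if k < len(rest) and k < len(label):
            return "no match", []  # mismatch inside the edge
        if k == len(rest):
            where = "ends at a node" if k == len(label) else "ends inside an edge"
            return where, sorted(leaf_positions(child))
        node, rest = child, rest[k:]  # whole edge matched; keep going
    return "ends at a node", sorted(leaf_positions(node))  # empty pattern


tree = build_suffix_tree("vbacxad$")
for p in ["cxa", "a", "xdb"]:
    print(p, search(tree, p))
# cxa ('ends inside an edge', [3])
# a ('ends at a node', [2, 5])
# xdb ('no match', [])
```

Storing edge labels as copied substrings keeps the sketch short; practical suffix trees store each label as a pair of indices into S so that the whole tree takes O(n) space.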
Suffix arrays • Consider the string: • The suffix array:
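A suffix array stores the starting positions of all suffixes of a string, sorted in lexicographic order of the suffixes. The slide's own example string did not survive extraction, so this minimal sketch assumes mississippi$ (the string used on the search slide) and 0-based positions; $ sorts before the letters here because it does in ASCII. Sorting whole suffixes is a naive construction chosen for clarity, not speed.

```python
def suffix_array(s):
    """0-based start positions of the suffixes of s, in lexicographic order.

    Naive construction by sorting; fine for illustration, far from the
    fastest known methods.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


s = "mississippi$"
sa = suffix_array(s)
print(sa)  # → [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
for rank, start in enumerate(sa):
    print(rank, start, s[start:])
```

Unlike a suffix tree, the array holds only n integers, at the cost of binary search instead of a direct walk from the root.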
Suffix arrays • Search for the pattern "is" in mississippi$:
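Because the suffixes that begin with a pattern occupy one contiguous block of the suffix array, a search needs only two binary searches, each comparing at most m characters per step, so O(m log n) character comparisons in total. This minimal sketch assumes the pattern searched on the slide is "is", reports 0-based positions, and uses hypothetical helper names:

```python
def suffix_array(s):
    """0-based suffix start positions in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s, sa, p):
    """Return the sorted start positions of pattern p, using binary search on sa.

    Suffixes that start with p form one contiguous block of the suffix array,
    so two binary searches on the length-|p| prefixes find its boundaries.
    """
    m = len(p)

    def prefix(rank):
        return s[sa[rank]:sa[rank] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose prefix is >= p
        mid = (lo + hi) // 2
        if prefix(mid) < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first rank whose prefix is > p
        mid = (lo + hi) // 2
        if prefix(mid) <= p:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


s = "mississippi$"
sa = suffix_array(s)
print(find_occurrences(s, sa, "is"))  # → [1, 4]
```

In the 1-based S[1…n] notation used on the slides, [1, 4] corresponds to positions 2 and 5.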