
On some researches...

On some researches. Chiara Epifanio. Outline: the multidimensional Critical Factorization Theorem; compact representation of local automata. The multidimensional Critical Factorization Theorem, by Chiara Epifanio and Filippo Mignosi.


Presentation Transcript


  1. On some researches... Chiara Epifanio

  2. Outline: The multidimensional Critical Factorization Theorem. Compact representation of local automata.

  3. The multidimensional Critical Factorization Theorem. Chiara Epifanio, Filippo Mignosi

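  4. A word is a sequence of characters over an alphabet $A$: $w \in A^{\{1,2,\dots,n\}}$, $A^{\mathbb{N}}$ or $A^{\mathbb{Z}}$. A word $w = a_1 \dots a_n$ is periodic if $\exists\, p \in \mathbb{N}$ such that $w(x+p) = w(x)$ $\forall x$, $1 \le x \le n-p$; in this case $p$ is a period of $w$.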

  5. A word may have more than one period (e.g. abaababaabaababaaba, which has periods 8 and 13). The smallest period of w is called “the” period of w.
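The definition of period can be checked mechanically on the example word. The following is a minimal sketch, assuming 0-based indexing and a non-empty word; the function names `periods` and `period` are illustrative and do not come from the slides.

```python
def periods(w: str) -> list[int]:
    """Return every p in 1..len(w) such that w[x] == w[x + p] whenever both indices exist."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[x] == w[x + p] for x in range(n - p))]


def period(w: str) -> int:
    """'The' period of a non-empty word w: its smallest period."""
    return periods(w)[0]


word = "abaababaabaababaaba"
print(periods(word))  # the list contains 8 and 13
print(period(word))   # the smallest one
```

Note that the length of a word is always one of its periods under this definition, since the condition is then vacuous.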

  6. A factor $v = w_j \dots w_{j+n-1}$ of length $n$ of $w$ is a repetition of order $a$ if there exists a natural number $p$, $0 < p \le n$, such that $w_i = w_{i+p}$ for $i = j, \dots, j+n-1-p$ and such that $n/p \ge a$. The number $p$ is called a period of the repetition. The smallest period of the repetition is called the period of the repetition. Ex.: abaaba is a repetition of period 6 and order 1, of period 5 and order 6/5, and of period 3 and order 2.
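The three (period, order) pairs of the example abaaba can be recomputed directly. This sketch reports, for each period p of a factor of length n, the largest admissible order n/p (any smaller order also satisfies the definition); `repetition_orders` is a hypothetical helper name.

```python
from fractions import Fraction


def repetition_orders(v: str) -> dict[int, Fraction]:
    """Map each period p of v (0 < p <= len(v)) to the largest order len(v)/p."""
    n = len(v)
    return {p: Fraction(n, p)
            for p in range(1, n + 1)
            if all(v[i] == v[i + p] for i in range(n - p))}


for p, order in sorted(repetition_orders("abaaba").items(), reverse=True):
    print(f"period {p}, order {order}")
```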

  7. A word w has a central repetition of order $a$ in position $i$ if there exists a factor v centered in $i$ that is a repetition of order $a$. In this case we denote by $c_a(w,i)$ the smallest period among all the central repetitions of order $a$ in position $i$ and we call it the central local period of order $a$ in $i$. We denote by $P_a(w)$ the maximum of the central local periods of order $a$ in w. A position $i$ is critical if $c_a(w,i) = P_a(w)$.

  8. The Critical Factorization Theorem. Let w be a word having length $|w| \ge 2$. In every sequence of $l \ge \max\{1, p(w)-1\}$ consecutive positions there is a critical one, and $P_a(w) = p(w)$ for $a = 2$.

  9. The Critical Factorization Theorem in particular states that for $a = 2$ there exists at least one point such that the central local period detected at this point coincides with the (global) period of the word, i.e., there exists an integer $j$, $1 \le j \le |w|$, such that $c_a(w,j) = p(w)$ for $a = 2$. We have given a new proof for $a = 4$.
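The statement can be explored experimentally in the one-dimensional case. The sketch below assumes the classical formulation of the local period (as in Lothaire), where a position is the cut between two consecutive letters and the square centered there may overflow the ends of the word; this may differ in detail from the central repetitions of the slides, and it covers order 2 only, not the new proof for order 4. All function names are illustrative.

```python
def period(w: str) -> int:
    """Smallest p >= 1 with w[x] == w[x + p] wherever both indices exist."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[x] == w[x + p] for x in range(n - p)))


def local_period(w: str, i: int) -> int:
    """Smallest p such that a square of half-length p is centered at position i.

    Position i (1 <= i <= len(w) - 1) is the cut between w[i-1] and w[i].
    Letters that would fall outside w are unconstrained (overflow is allowed).
    """
    n = len(w)
    for p in range(1, n + 1):
        if all(w[j] == w[j + p] for j in range(max(0, i - p), min(i, n - p))):
            return p
    raise AssertionError("unreachable: p = len(w) always qualifies")


def critical_positions(w: str) -> list[int]:
    """Positions whose local period equals the global period."""
    p = period(w)
    return [i for i in range(1, len(w)) if local_period(w, i) == p]


word = "abaababaabaababaaba"
print(period(word), critical_positions(word))
```

The theorem predicts that the printed list is non-empty and that any `max(1, period(word) - 1)` consecutive positions contain one of its elements.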

  10. Lemma 1. Let u, v, w be words such that uv and vw have period p and $|v| \ge p$. Then the word uvw has period p. (cf. Lemma 8.1.2, Lothaire 2, chapter 8)

  11. Lemma 2. Suppose that w has period q and that there exists a factor v of w with $|v| \ge q$ that has period r, where r divides q. Then w has period r. (cf. Lemma 8.1.3, Lothaire 2, chapter 8)
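Both lemmas lend themselves to an exhaustive check on short words. The following sketch is a sanity test, not a proof: it enumerates all binary words up to a small length (the alphabet and the length bounds are arbitrary choices made here) and looks for a counterexample.

```python
from itertools import product


def has_period(w: str, p: int) -> bool:
    """True if w[i] == w[i + p] for every i where both indices exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def words(max_len: int, alphabet: str = "ab"):
    """Yield every word over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def check_lemma1(max_len: int = 4) -> bool:
    """uv and vw have period p, |v| >= p  =>  uvw has period p."""
    pool = list(words(max_len))
    for u, v, w in product(pool, repeat=3):
        for p in range(1, len(v) + 1):  # guarantees |v| >= p
            if has_period(u + v, p) and has_period(v + w, p):
                if not has_period(u + v + w, p):
                    return False
    return True


def check_lemma2(max_len: int = 7) -> bool:
    """w has period q, a factor v with |v| >= q has period r, r | q  =>  w has period r."""
    for w in words(max_len):
        n = len(w)
        for q in (q for q in range(1, n + 1) if has_period(w, q)):
            for r in (r for r in range(1, q + 1) if q % r == 0):
                for start in range(n - q + 1):
                    for end in range(start + q, n + 1):
                        if has_period(w[start:end], r) and not has_period(w, r):
                            return False
    return True


print(check_lemma1(), check_lemma2())
```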

  12. Fine and Wilf Theorem. Let w be a word having periods p and q, with $q \le p$. If $|w| \ge p + q - \gcd(p,q)$, then w also has period $\gcd(p,q)$.
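The earlier example word with periods 8 and 13 illustrates why the length bound matters: its length is exactly one less than $8 + 13 - \gcd(8,13) = 20$, and it does not have period 1. The sketch below checks this and brute-forces the theorem on short binary words; the helper names and the length limit are assumptions made for illustration.

```python
from itertools import product
from math import gcd


def has_period(w: str, p: int) -> bool:
    """True if w[i] == w[i + p] for every i where both indices exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def fine_wilf_bound(p: int, q: int) -> int:
    """Length from which periods p and q force period gcd(p, q)."""
    return p + q - gcd(p, q)


def check_fine_wilf(max_len: int = 9) -> bool:
    """Search binary words up to max_len for a counterexample to the theorem."""
    for n in range(1, max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            for p in range(1, n + 1):
                for q in range(1, p + 1):
                    if (n >= fine_wilf_bound(p, q)
                            and has_period(w, p) and has_period(w, q)
                            and not has_period(w, gcd(p, q))):
                        return False
    return True


word = "abaababaabaababaaba"
print(len(word), fine_wilf_bound(8, 13))          # length versus bound
print(has_period(word, 8), has_period(word, 13))  # both periods hold
print(has_period(word, gcd(8, 13)))               # the gcd is not a period
print(check_fine_wilf())
```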

  13. Multidimensional case. Multidimensional periodicity was introduced by Amir and Benson for the design of pattern matching algorithms (1991). Since then, many authors have worked on it, giving slightly different definitions.

  14. If u is a factor of w, then v is a periodicity vector for u if $w((x,y)+v) = w(x,y)$ $\forall (x,y) \in \mathrm{Dom}(u)$ such that $((x,y)+v) \in \mathrm{Dom}(u)$. v is a periodicity vector for w if $w((x,y)+v) = w(x,y)$ $\forall (x,y)$.
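A periodicity vector of a finite two-dimensional factor can be tested point by point. In this sketch a factor is represented, as an assumption of the example, by a dictionary from integer coordinates to letters; the chessboard pattern is an invented test case, not one from the slides.

```python
def is_periodicity_vector(u: dict, v: tuple[int, int]) -> bool:
    """True if u(x + v) == u(x) for every x in Dom(u) with x + v in Dom(u)."""
    dx, dy = v
    return all(u[(x + dx, y + dy)] == letter
               for (x, y), letter in u.items()
               if (x + dx, y + dy) in u)


# A 4 x 4 chessboard over the alphabet {a, b}.
board = {(x, y): "ab"[(x + y) % 2] for x in range(4) for y in range(4)}

print(is_periodicity_vector(board, (1, 1)))  # diagonal shift keeps the colour
print(is_periodicity_vector(board, (2, 0)))  # so does a shift by two columns
print(is_periodicity_vector(board, (1, 0)))  # a shift by one column does not
```

A vector that moves every point outside the domain, such as (5, 0) here, satisfies the condition vacuously.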

  15. A factor u of w is lattice-periodic with respect to $v_1$ and $v_2$ if every $v \in \langle v_1, v_2 \rangle$ is a periodicity vector for u. Ex.: $L = \langle (2,2), (-2,2) \rangle = \langle (2,2), (4,0) \rangle$.
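The equality of the two generating sets in the example can be verified by checking that each generator of one lattice is an integer combination of the generators of the other. The sketch below does this for rank-2 lattices in the plane with Cramer's rule; the function names are illustrative.

```python
def in_lattice(v, b1, b2) -> bool:
    """True if v = s*b1 + t*b2 for some integers s, t (b1, b2 linearly independent)."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        raise ValueError("generators must be linearly independent")
    # Cramer's rule: s = s_num / det and t = t_num / det must both be integers.
    s_num = v[0] * b2[1] - v[1] * b2[0]
    t_num = b1[0] * v[1] - b1[1] * v[0]
    return s_num % det == 0 and t_num % det == 0


def same_lattice(basis_a, basis_b) -> bool:
    """Two generating pairs span the same lattice iff each contains the other's generators."""
    return (all(in_lattice(v, *basis_b) for v in basis_a)
            and all(in_lattice(v, *basis_a) for v in basis_b))


print(same_lattice([(2, 2), (-2, 2)], [(2, 2), (4, 0)]))
print(same_lattice([(2, 0), (0, 2)], [(2, 2), (4, 0)]))
```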

  16. Given a subgroup $H$ of $\mathbb{Z}^d$, a transversal $T_H$ of $H$ is a subset of $\mathbb{Z}^d$ such that for any element $i \in \mathbb{Z}^d$ there exists a unique element $j \in T_H$ such that $i - j \in H$. An n-cubic factor v is a repetition of order $a$ if v is $L$-periodic, with $L$ a lattice, and n is such that $n/h_L \ge a$, where $h_L$ is the smallest integer such that every hypercube of side $h_L$ contains a transversal of $L$. The lattice $L$ is called a period of the $a$-repetition v.
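For a rank-2 lattice in the plane, $h_L$ can be found by brute force. The sketch below relies on two facts: the number of cosets of $L$ equals the absolute value of the determinant of a basis, and translating a square maps cosets to cosets bijectively, so it is enough to inspect the square $[0,h)^2$. The function names are hypothetical, and the approach is meant for small examples only.

```python
from itertools import product


def in_lattice(v, b1, b2) -> bool:
    """True if v is an integer combination of the independent vectors b1, b2."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    s_num = v[0] * b2[1] - v[1] * b2[0]
    t_num = b1[0] * v[1] - b1[1] * v[0]
    return s_num % det == 0 and t_num % det == 0


def cosets_in_square(h: int, b1, b2) -> int:
    """Number of distinct cosets of <b1, b2> met by the points of [0, h)^2."""
    representatives = []
    for point in product(range(h), repeat=2):
        if not any(in_lattice((point[0] - r[0], point[1] - r[1]), b1, b2)
                   for r in representatives):
            representatives.append(point)
    return len(representatives)


def smallest_side(b1, b2) -> int:
    """Smallest h such that a square of side h contains a transversal of <b1, b2>."""
    index = abs(b1[0] * b2[1] - b1[1] * b2[0])  # number of cosets
    h = 1
    while cosets_in_square(h, b1, b2) < index:
        h += 1
    return h


print(smallest_side((2, 2), (4, 0)))
```

For the lattice of the earlier example, a square of side 3 meets only 7 of the 8 cosets, which is why the loop continues to side 4.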

  17. A word w has a central repetition of order $a$ in position $j \in \mathbb{Z}^d$ if there exists a factor v of w centered in $j$ that is a repetition of order $a$. If w has at least one central repetition of order $a$ and period $L$ in $j$, we consider the set $H$ of the values $h_L$ over all such periods $L$, where $h_L$ is such that every hypercube of side $h_L$ contains a transversal of $L$, and we denote $c_a(w,j) = \min(H)$. Let $P_a(w) = \limsup\{c_a(w,j) : j \text{ position in } w\}$.

  18. Lemma 3. Let $v_1$ and $v_2$ be two factors of the same word w over $\mathbb{Z}^d$ that both have a subgroup $H$ as period. If $sh(v_1) \cap sh(v_2)$ contains a transversal of $H$, then the factor v having shape $sh(v) = sh(v_1) \cup sh(v_2)$ also has period $H$.

  19. Lemma 4. Let $v_1$ and $v_2$ be two factors of the same word w over $\mathbb{Z}^d$ such that $sh(v_2) \subseteq sh(v_1)$. Suppose that $v_1$ has period $H_1$ and that $v_2$ has period $H_2$, with $H_1$ a subgroup of $H_2$, and that $sh(v_2)$ contains a transversal of $H_1$. Under these hypotheses $v_1$ has period $H_2$.

  20. A generalization of the Fine & Wilf Theorem If w has two periodicity vectors v1 and v2 and w is “big enough” with respect to v1 and v2, then w is lattice-periodic with respect to v1 and v2.

  21. The multidimensional Critical Factorization Theorem
     • Informally, the C.F.T. states that the maximal local repetition of order 2 is also a period of the whole word.
     • But there is no total order among lattices!
     • Our solution is to order lattices by using the length $h_L$ of the side of the smallest hypercube that contains a transversal of $L$.
     • We further have to prove that all the lattices with the same maximal $h_L$ coincide over the word.
     • To do this, for the moment, we lose the tightness of the local repetition order (4 instead of 2).

  22. Theorem. Let w be a cubic bidimensional word and X be a cube included in the shape of w. Every cube $T \subseteq X$ of side $\max(1, P_4(X)-1)$ contains a position $l$ such that $c_4(w,l) = P_4(w)$. Let v be the factor of w whose shape is the intersection between $sh(w)$ and the union $X'$ of the shapes of the 4-repetitions centered in positions $l \in X$ such that $c_4(w,l) = P_4(X)$. Then v has period $L$, where $L$ is a subgroup such that every cube of side $P_4(X)$ contains a transversal of $L$.

  23. Proof of the theorem (diagram): Lemma 3, Fine & Wilf generalization, Lemma 4, Thesis.

  24. Conclusions and open problems. Importance of the extension to the d-dimensional case ($d \ge 2$). Difficulties of such an extension (new definitions, extension of already known results). It is known that for d=1 the tight value is a=2. It remains an open problem to find the tight value of a for any dimension. Applications.

  25. Compact representation of local automata. M. Crochemore, C. Epifanio, R. Grossi, F. Mignosi

  26. Compacting is a standard technique for reducing the size of data structures such as factor automata, DAWGs and suffix trees; it consists in replacing paths in automata with single edges. In 2000 Crochemore, Mignosi, Restivo and Salemi gave an algorithm for “self-compressing” the trie of antifactorial binary sets of words. The aim of that algorithm was to represent antidictionaries compactly, so that they can be sent to the decoder of a static compression scheme. We have worked on an improvement of that algorithm that works for sets of words over any alphabet.

  27. The suffix trie Tr(w) of a word w is a trie whose leaves are the suffixes of w that do not appear previously as factors in w. (Example figure omitted.)
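A suffix trie can be built naively by inserting every suffix letter by letter. The sketch below uses nested dictionaries as nodes and takes quadratic time and space; it is meant only to make the definition concrete, and the word "abaab" is an example chosen here, not the one from the missing figure.

```python
def suffix_trie(w: str) -> dict:
    """Build the suffix trie of w as nested dicts: letter -> child node."""
    root: dict = {}
    for i in range(len(w)):
        node = root
        for letter in w[i:]:
            node = node.setdefault(letter, {})
    return root


def leaves(node: dict, path: str = "") -> list[str]:
    """Labels of the root-to-leaf paths below a node, in sorted order."""
    if not node:
        return [path]
    found = []
    for letter in sorted(node):
        found.extend(leaves(node[letter], path + letter))
    return found


def count_nodes(node: dict) -> int:
    """Number of nodes in the subtree, the node itself included."""
    return 1 + sum(count_nodes(child) for child in node.values())


trie = suffix_trie("abaab")
print(leaves(trie))       # suffixes that never occur earlier in the word
print(count_nodes(trie))  # one node per distinct factor, plus the root
```

The suffixes "ab" and "b" are not leaves because each also occurs earlier in the word followed by another letter.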

  28. The suffix tree T(w) of a word w is a compressed suffix trie, where only leaves and forks are kept. Each edge is labelled with a substring of w. In this way the number of nodes and leaves of T(w) is smaller than 2|w|. But if the labels of the arcs are stored explicitly, the implementation can have quadratic size. The simple solution is to represent labels by pairs of integers (position, position) or (position, length) and to keep the text aside. (Example figure omitted.)
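The (position, length) encoding of edge labels can be illustrated by compacting a naively built suffix trie. This is a sketch under simplifying assumptions: it uses quadratic time and space, keeps the root even when it is not a fork, and is not one of the linear-time constructions; the node numbering and the function name are invented for the example.

```python
from itertools import count


def suffix_tree_edges(w: str) -> list[tuple[int, int, int, int]]:
    """Return the edges (parent_id, child_id, position, length) of the compacted trie.

    The label of an edge is w[position : position + length]; the root has id 0.
    """
    # Each trie node stores its children and the end index of its first occurrence.
    root = {"children": {}, "end": 0}
    for i in range(len(w)):
        node = root
        for j in range(i, len(w)):
            node = node["children"].setdefault(w[j], {"children": {}, "end": j + 1})

    ids = count(1)
    edges = []

    def compact(node: dict, node_id: int) -> None:
        for child in node["children"].values():
            length = 1
            # Skip over unary nodes: only leaves and forks are kept.
            while len(child["children"]) == 1:
                child = next(iter(child["children"].values()))
                length += 1
            child_id = next(ids)
            edges.append((node_id, child_id, child["end"] - length, length))
            compact(child, child_id)

    compact(root, 0)
    return edges


word = "abaab"
edges = suffix_tree_edges(word)
for parent, child, position, length in edges:
    print(parent, child, (position, length), word[position:position + length])
```

Each edge costs two integers regardless of the length of its label, which is what keeps the representation linear as long as the text itself is stored alongside.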

  29. There are classical on-line linear-time implementations. All of them use the suffix link function s, which is defined over all the nodes of the suffix trie and of the suffix tree by s(root) = root and s(v) = v', where $s_v = a\,s_{v'}$, $s_v$ being the label of the path from the root to v and a being the first letter of $s_v$. (Example figure omitted.)
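If each trie node is identified with the label of its path from the root, the suffix link simply drops the first letter. The sketch below uses this string representation, which is an assumption made for readability and not how a linear-time implementation stores nodes.

```python
def trie_nodes(w: str) -> set[str]:
    """Nodes of the suffix trie, each identified by its path label (a factor of w)."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def suffix_link(v: str) -> str:
    """s(v): the node whose label is v without its first letter; s(root) = root."""
    return v[1:]  # the empty string is the root, and ""[1:] == ""


def link_chain(v: str) -> list[str]:
    """Follow suffix links from v down to the root."""
    chain = [v]
    while chain[-1]:
        chain.append(suffix_link(chain[-1]))
    return chain


nodes = trie_nodes("abaab")
print(all(suffix_link(v) in nodes for v in nodes))  # links stay inside the trie
print(link_chain("abaab"))                          # visits every suffix in turn
```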

  30. Our new approach is basically the same as that of the suffix tree, but we compact a bit less: we keep all the nodes of the suffix tree plus some more nodes of the trie, namely all the nodes v of the trie such that s(v) is a node of the suffix tree. For any arc of the form (v,v') with label a in the trie, we have an arc (v,x) with the same label in our compacted trie T2(w), where x is v' if v' ∈ T2(w), and x is the first node in T2(w) that is a descendant of v' in the original trie if v' ∉ T2(w). In this second case, we consider that (v,x) represents the whole path from v to x in the suffix trie, and we add a + sign to node x in order to maintain this information.

  31. To complete the definition of T2(w) we keep the suffix link function over these nodes. Notice that, by definition, for any node v of T2(w), s(v) is always a node of the suffix tree T(w) and hence it also belongs to T2(w). This new approach frees us from keeping the text aside.
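The node set of T2(w) and the closure property of the suffix link can be checked on a small word. The sketch below identifies nodes with their path labels and assumes that the word ends with a unique terminator "$", so that every suffix is a leaf; the terminator, the example word and the function names are assumptions of this illustration, and only the node set is computed, not the arcs or the + marks.

```python
def trie_nodes(w: str) -> set[str]:
    """Nodes of the suffix trie, identified by their path labels."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def tree_nodes(w: str) -> set[str]:
    """Root, leaves and forks of the suffix trie: the nodes of the suffix tree."""
    trie = trie_nodes(w)
    kept = {""}  # the root is always kept
    for v in trie:
        degree = sum(1 for letter in set(w) if v + letter in trie)
        if degree != 1:  # 0 children: leaf; 2 or more children: fork
            kept.add(v)
    return kept


def t2_nodes(w: str) -> set[str]:
    """Suffix tree nodes plus the trie nodes whose suffix link lands on a tree node."""
    tree = tree_nodes(w)
    return tree | {v for v in trie_nodes(w) if v[1:] in tree}


word = "abaab$"
tree, t2 = tree_nodes(word), t2_nodes(word)
print(sorted(t2 - tree))              # the extra nodes kept by the new approach
print(all(v[1:] in tree for v in t2)) # s(v) is a suffix tree node for every v in T2
```

Without a terminator the closure check can fail with this simple representation (for instance on "aba", where the leaf "ba" links to the unary node "a"), which is why the terminator is assumed here.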

  32. State of the art. We have given compacting and decompacting algorithms; we have proved that the number of nodes in our compacted suffix tree is still linear; we have given an algorithm that can be used to check whether a pattern is present in a text without “decompacting” the automaton; we are currently running experiments on the Calgary and Canterbury corpora.
