This research aims to automate the migration of documents while preserving the semantic properties of embedded queries. It presents a framework for specifying and tracing semantic properties, and for the automated evaluation and construction of embedded queries. Results and conclusions are also discussed.
Towards Automatic Document Migration: Semantic Preservation of Embedded Queries
Thomas Triebsees
University of the German Federal Armed Forces Munich, Department of Computer Science
Thomas.Triebsees@unibw.de
Winnipeg, 31st August 2007
Thomas Triebsees, Department of Computer Science
Agenda
• Research Context and Motivation
• Our Approach
• Property Specification and Tracing
• Automated Query Evaluation and Construction
• Results
• Conclusions
Research Context and Motivation
Research Context
Task: Semantic preservation
• high degree of process reliability necessary (trustworthiness)
• amount of documents requires automation
• document representations (formats) change
• still: most quality assurance (QA) is done by hand
Example Property – Link Consistency
Aim: improve portability
WWW harvest: Website Calculation on 137.193.60.82 (source), containing an absolute link:
<html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html>
Store: copy on 137.193.60.99, with the link rewritten as a relative link:
<html> <head> <title>Calculation</title> </head> <body> <a href="./calc05/calc.pdf/"> documents </a> </body> </html>
[Figure: the same directory tree on both hosts, with nodes source, style.css, start.html, calc05 and calc.pdf]
Example Property – Link Consistency
WWW harvest: Website Calculation on 137.193.60.82 (source), containing an absolute link:
<html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html>
Store: restructured copy on 137.193.60.99, where the relative link must follow the new directory layout:
<html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html>
[Figure: source tree "Website Calculation" (nodes source, style.css, start.html, calc05, calc.pdf) and restructured target tree "Calculation" (nodes html, index.html, resources, style.css, calc05, calc.pdf)]
Semantic Queries
Queries embedded in documents; formalize semantic preservation:
- evaluation
- construction?
Example document on 137.193.60.99:
<html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html>
[Figure: target tree "Calculation" with nodes html, index.html, resources, style.css, calc05 and calc.pdf]
Examples:
• URLs query server/directory structure
• style sheets (CSS) query XML/HTML documents
• XPath expressions query XML documents
• …
Our Approach – Semantic Evaluation and Construction of Embedded Queries
Our Approach
(1) Property specifications: What are the relevant properties? What are the different representation forms?
(2) Preservation requirements: What is to be preserved?
(3) Implement transformation: Notify system on transformation steps.
(4) Automated verification: Trace relevant object histories. Verify preservation requirements w.r.t. source and target objects.
[Figure: the framework (tracing, property specifications, preservation requirements) receives notifications from the migration process and performs property matching on source documents and target documents]
(1) Property Specification
Concept + Interface
• define role names for property
• assign roles in different implementations
Concept LinksTo with roles link_source, link_anchor, link_target, implemented in Context LinkAbs and Context LinkRel.
LinkAbs (137.193.60.82):
<html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html>
LinkRel (137.193.60.99):
<html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html>
[Figure: source tree "Website Calculation" and stored target tree "Calculation", with the roles link_source, link_anchor and link_target assigned to objects in both]
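The concept-plus-interface idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Concept`, the functions `links_to_abs`, `links_to_rel` and `holds`, and the concrete file locations are hypothetical and not taken from the framework; only the names LinksTo, LinkAbs, LinkRel and the three roles come from the slides.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Binding = Dict[str, str]  # role name -> object (here: plain path strings)


@dataclass(frozen=True)
class Concept:
    """A named property together with the role names it is defined over."""
    name: str
    roles: Tuple[str, ...]


LINKS_TO = Concept("LinksTo", ("link_source", "link_anchor", "link_target"))


def links_to_abs(b: Binding) -> bool:
    # LinkAbs (assumed reading): the anchor spells out the target's full location.
    return posixpath.normpath(b["link_anchor"]) == b["link_target"]


def links_to_rel(b: Binding) -> bool:
    # LinkRel (assumed reading): the anchor is resolved against the source's directory.
    base = posixpath.dirname(b["link_source"])
    return posixpath.normpath(posixpath.join(base, b["link_anchor"])) == b["link_target"]


# One implementation of the same concept per context.
IMPLEMENTATIONS: Dict[str, Callable[[Binding], bool]] = {
    "LinkAbs": links_to_abs,
    "LinkRel": links_to_rel,
}


def holds(concept: Concept, context: str, binding: Binding) -> bool:
    """Check the concept in one context; every role must be bound."""
    if set(binding) != set(concept.roles):
        raise ValueError("binding must assign exactly the concept's roles")
    return IMPLEMENTATIONS[context](binding)


# Illustrative bindings; the file locations are assumptions read off the slides.
ABS = {"link_source": "137.193.60.82/start.html",
       "link_anchor": "137.193.60.82/calc05/calc.pdf/",
       "link_target": "137.193.60.82/calc05/calc.pdf"}
REL = {"link_source": "Calculation/index.html",
       "link_anchor": "./resources/calc05/calc.pdf/",
       "link_target": "Calculation/resources/calc05/calc.pdf"}
```

The point of the split is that the role names stay fixed while each context decides how the property is checked for its representation.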
(2) Expressing Preservation Requirements
Requirement: When transforming a website, translate all absolute links to relative links while preserving link consistency.
Expressed semi-formally using concepts and contexts: When transforming a link source, a link anchor, and a link target to a new representation, preserve the concept LinksTo for these objects in the context LinkRel.
Expressed formally:
presK( {s → link_source, a → link_anchor, t → link_target}, LinksTo(s, a, t), {LinkAbs, LinkRel}, {LinkRel})
(3) Tracing Semantic Properties - Preservation
presK( {s → link_source, a → link_anchor, t → link_target}, LinksTo(s, a, t), {LinkAbs, LinkRel}, {LinkRel})
LinkAbs (137.193.60.82):
<html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html>
LinkRel (137.193.60.99):
<html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html>
[Figure: the objects in the roles link_source, link_anchor and link_target are traced from the source tree "Website Calculation" to the stored target tree "Calculation"]
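One possible reading of the presK requirement, sketched in Python: if the concept holds for the source objects in one of the source contexts, it must hold for the traced target objects in one of the target contexts. The exact semantics of presK is not given on the slides, so this reading, the class `PreservationRequirement`, the function `is_preserved`, and the toy fact table are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Binding = Dict[str, str]                     # role name -> object identifier
Holds = Callable[[str, str, Binding], bool]  # (concept, context, binding) -> bool

ROLES = ("link_source", "link_anchor", "link_target")


@dataclass(frozen=True)
class PreservationRequirement:
    concept: str
    source_contexts: FrozenSet[str]
    target_contexts: FrozenSet[str]


def is_preserved(req: PreservationRequirement, holds: Holds,
                 source: Binding, trace: Dict[str, str]) -> bool:
    """Assumed reading of presK; `trace` maps each source object to its target object."""
    held_before = any(holds(req.concept, c, source) for c in req.source_contexts)
    if not held_before:
        return True  # nothing to preserve for these objects
    target = {role: trace[obj] for role, obj in source.items()}
    return any(holds(req.concept, c, target) for c in req.target_contexts)


# Toy fact table standing in for real property evaluation (hypothetical identifiers).
FACTS = {
    ("LinksTo", "LinkAbs", ("src/start.html", "src/anchor", "src/calc.pdf")),
    ("LinksTo", "LinkRel", ("tgt/index.html", "tgt/anchor", "tgt/calc.pdf")),
}


def holds(concept: str, context: str, binding: Binding) -> bool:
    return (concept, context, tuple(binding[r] for r in ROLES)) in FACTS


REQ = PreservationRequirement("LinksTo",
                              frozenset({"LinkAbs", "LinkRel"}),
                              frozenset({"LinkRel"}))
SOURCE = {"link_source": "src/start.html",
          "link_anchor": "src/anchor",
          "link_target": "src/calc.pdf"}
TRACE = {"src/start.html": "tgt/index.html",
         "src/anchor": "tgt/anchor",
         "src/calc.pdf": "tgt/calc.pdf"}
```

With these inputs, `is_preserved(REQ, holds, SOURCE, TRACE)` succeeds; redirecting the traced link target to a different object makes it fail.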
Preservation of Embedded Queries
Integrating embedded queries
Targets: Semantic preservation of link consistency
• links can be evaluated semantically
• only valid URLs are accepted as links
• links can be constructed automatically
• only valid URLs are constructed
• constructions allow for formal proofs w.r.t. preservation requirement
Steps:
(1) Formalize queried structure for link evaluation and construction
(2) Formalize syntactically valid URLs
(3) Combine both
Can be generalized to other applications
Tools:
• Automata Theory (Finite State Automata, FSA)
• Graph Theory
Specification of Queried Structure
(1) Formalize queried structure
• vertices (objects) yield query semantics
• labels carry URL substrings
• generate finite state automaton
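A minimal sketch of this step, assuming the queried structure is a directory tree given as labelled edges: vertices become automaton states and each edge label is the URL substring that moves between them. The edge list, the added "." and ".." transitions, and the function names are illustrative assumptions, not the construction from the talk.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Delta = Dict[Tuple[str, str], str]  # (state, label) -> next state

# Hypothetical target tree; vertex names are assumed to be unique.
EDGES: List[Tuple[str, str, str]] = [
    ("Calculation", "index.html", "index.html"),
    ("Calculation", "resources", "resources"),
    ("resources", "style.css", "style.css"),
    ("resources", "calc05", "calc05"),
    ("calc05", "calc.pdf", "calc.pdf"),
]


def build_structure_automaton(edges: Iterable[Tuple[str, str, str]]) -> Delta:
    """Turn (parent, label, child) edges into a transition function."""
    delta: Delta = {}
    for parent, label, child in edges:
        delta[(parent, label)] = child  # descend along the labelled edge
        delta[(child, "..")] = parent   # step back up to the parent
        delta[(parent, ".")] = parent   # stay in the current directory
    return delta


def evaluate(delta: Delta, start: str, labels: Iterable[str]) -> Optional[str]:
    """Run the automaton; the reached vertex is the meaning of the query."""
    state = start
    for label in labels:
        nxt = delta.get((state, label))
        if nxt is None:
            return None  # no such transition: the query does not match the structure
        state = nxt
    return state


DELTA = build_structure_automaton(EDGES)
```

For example, `evaluate(DELTA, "Calculation", [".", "resources", "calc05", "calc.pdf"])` ends in the vertex `calc.pdf`, while a label that is not in the tree yields `None`.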
Specification of Queried Structure
Specification of Syntactically Valid URLs
(2) Formalize syntactically valid URLs
• reduce URI-reference grammar
• construct query automaton
[Figure: Grammar for URI-references]
Specification of Syntactically Valid URLs
Construction of query automaton
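To illustrate what a query automaton for syntactically valid URLs might look like, the sketch below hand-codes a small deterministic automaton for relative paths only. It is a drastic simplification chosen for this example: the reduced grammar used in the talk is not reproduced on the slides, and the state names, the character set and `is_valid_relative_url` are assumptions.

```python
import string
from typing import Dict, Tuple

# Characters allowed inside a path segment in this simplified grammar (assumption).
SEGMENT_CHARS = set(string.ascii_letters + string.digits + "._~-")

# Simplified grammar: segment ("/" segment)* ["/"], with non-empty segments.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("START", "char"): "SEG",
    ("SEG", "char"): "SEG",
    ("SEG", "slash"): "SLASH",
    ("SLASH", "char"): "SEG",
}
ACCEPTING = {"SEG", "SLASH"}


def symbol_class(ch: str) -> str:
    """Map an input character to the alphabet of the automaton."""
    if ch == "/":
        return "slash"
    if ch in SEGMENT_CHARS:
        return "char"
    return "other"


def is_valid_relative_url(url: str) -> bool:
    """Accept exactly the strings of the simplified grammar above."""
    state = "START"
    for ch in url:
        nxt = TRANSITIONS.get((state, symbol_class(ch)))
        if nxt is None:
            return False  # no transition: syntactically invalid
        state = nxt
    return state in ACCEPTING
```

This automaton accepts `./resources/calc05/calc.pdf/` and rejects inputs such as an empty string, a leading slash, or an empty segment.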
Combine both – Full link automaton
(3) Combine both
• basically: let both automata run in parallel
• match non-terminal transitions of the URL automaton with appropriate transitions of the structure automaton
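The parallel run can be sketched as follows, under the assumption that the only non-terminal of the URL automaton is a path segment: the syntax automaton reads characters, and each time a segment is completed the structure automaton takes the transition labelled with that segment. The transition table `DELTA`, the character set and `run_link_automaton` are hypothetical and stand in for the construction shown in the talk.

```python
import string
from typing import Dict, Optional, Tuple

Delta = Dict[Tuple[str, str], str]  # structure automaton: (vertex, label) -> vertex

SEGMENT_CHARS = set(string.ascii_letters + string.digits + "._~-")

# Hypothetical structure automaton for the target tree.
DELTA: Delta = {
    ("Calculation", "."): "Calculation",
    ("Calculation", "resources"): "resources",
    ("resources", "."): "resources",
    ("resources", ".."): "Calculation",
    ("resources", "calc05"): "calc05",
    ("calc05", "."): "calc05",
    ("calc05", ".."): "resources",
    ("calc05", "calc.pdf"): "calc.pdf",
}


def run_link_automaton(delta: Delta, start: str, url: str) -> Optional[str]:
    """Return the vertex denoted by `url` relative to `start`, or None if the
    URL is syntactically invalid or does not match the structure."""
    vertex = start    # state of the structure automaton
    syntax = "START"  # state of the URL (syntax) automaton
    segment = ""      # segment currently being read
    for ch in url:
        if ch == "/":
            if syntax != "SEG":
                return None  # empty segment or leading slash
            nxt = delta.get((vertex, segment))
            if nxt is None:
                return None  # segment has no matching structure transition
            vertex, segment, syntax = nxt, "", "SLASH"
        elif ch in SEGMENT_CHARS:
            segment += ch
            syntax = "SEG"
        else:
            return None  # character outside the simplified grammar
    if syntax == "START":
        return None  # empty URL
    if syntax == "SEG":
        return delta.get((vertex, segment))  # match the final segment
    return vertex  # URL ended with a trailing slash
```

A link is accepted only if both automata accept it, so a syntactically valid URL that points nowhere in the structure is rejected just like a malformed one.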
Integration and Benefit
Concept LinksTo with roles link_source, link_anchor, link_target in the contexts LinkAbs and LinkRel.
LinkAbs (137.193.60.82):
<html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html>
LinkRel (137.193.60.99):
<html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html>
[Figure: source tree "Website Calculation" and stored target tree "Calculation", with link evaluation and link construction steps annotated as working and provably correct]
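Link construction can be illustrated as a search in the structure automaton: a breadth-first search from the directory of the link source to the link target yields a label sequence, which is then spelled as a relative URL and checked by evaluating it again. This is a sketch under assumptions; `DELTA`, `construct_link` and `evaluate` are hypothetical, and the talk's own construction is not shown on the slides.

```python
from collections import deque
from typing import Dict, Optional, Tuple

Delta = Dict[Tuple[str, str], str]  # (vertex, label) -> vertex

# Hypothetical structure automaton for the target tree.
DELTA: Delta = {
    ("Calculation", "."): "Calculation",
    ("Calculation", "index.html"): "index.html",
    ("Calculation", "resources"): "resources",
    ("resources", "."): "resources",
    ("resources", ".."): "Calculation",
    ("resources", "style.css"): "style.css",
    ("resources", "calc05"): "calc05",
    ("calc05", "."): "calc05",
    ("calc05", ".."): "resources",
    ("calc05", "calc.pdf"): "calc.pdf",
}


def construct_link(delta: Delta, source_dir: str, target: str) -> Optional[str]:
    """Breadth-first search for a shortest label path from source_dir to target."""
    queue = deque([(source_dir, [])])
    seen = {source_dir}
    while queue:
        vertex, labels = queue.popleft()
        if vertex == target:
            return "." + "".join("/" + label for label in labels)
        for (state, label), nxt in delta.items():
            if state == vertex and label != "." and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, labels + [label]))
    return None  # target is not reachable in the structure


def evaluate(delta: Delta, start: str, url: str) -> Optional[str]:
    """Evaluate a constructed link by following its segments."""
    vertex = start
    for label in url.split("/"):
        nxt = delta.get((vertex, label))
        if nxt is None:
            return None
        vertex = nxt
    return vertex
```

Because every constructed link is a path in the automaton, evaluating it from the same start vertex leads back to the intended target, which is the kind of round-trip property a correctness proof would rely on.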
Results
Conclusions and Outlook
Automated evaluation and construction of embedded queries
• Based on formal, automata-theoretic constructions -> provable correctness
• Integration into framework for semantic preservation
• Future work:
  • Computing structures on demand
  • Regular expressions as queries
  • Include extensions like CSS or XPath predicates
Questions?
Thomas Triebsees
Universität der Bundeswehr München, Department of Computer Science
www.unibw.de/Thomas.Triebsees
Thomas.Triebsees@unibw.de