The use of SGML and XML at the Publications Office

  1. The use of SGML and XML at the Publications Office
     Dr. Holger Bagola
     Dir A – Cell “Methods and Development – Formats”
     Holger.Bagola@cec.eu.int

  2. Table of contents
     • Historical overview
     • Formex
     • Other areas of XML usage
     • Conclusion

  3. Table of contents
     • Historical overview
     • Formex
     • Other areas of XML usage
     • Conclusion

  4. Historical overview
     • Tasks of the Publications Office
     • Archiving of legislative publications
     • First steps in SGML
     • Migration to XML
     • Basic advantage: availability of tools

  5. Table of contents
     • Historical overview
     • Formex
     • Other areas of XML usage
     • Conclusion

  6. Formex (1)
     • Basic principles
       • XML Schema instead of DTD
       • One single schema
       • Number of root elements: 12 instead of 30
       • Number of elements: about 350 instead of 1200
       • Distinction between semantic and physical markup

  7. Formex (2)
     ARTICLE     (TI.ARTICLE, (PARAG+ | ALINEA+))
     TI.ARTICLE  (#PCDATA)
     PARAG       (NO.PARAG, ALINEA+)
     NO.PARAG    (#PCDATA)
     ALINEA      ((#PCDATA | NOTE | HT | FT)* | (P | LIST | TABLE)+)
     . . .
     Colour key on the slide: blue = semantic markup, red = physical markup.

  8. Formex (3)
     • Table model
       • Analysis of CALS, HTML, Formex v. 3
       • Choice:
         • Model close to HTML (top-down approach, nested tables)
         • Maintenance of semantic information such as in Formex v. 3

  9. Formex (4)
     • Footnotes
       • Distinction between notes in text and notes in tables, for readability and production simplicity
       • Text notes are inserted into the surrounding text
       • ID/IDREF to signal identical footnotes
       • Numbering is a matter of presentation
       • Table notes are assembled at the top of the table
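The footnote principles above can be sketched in Python. This is only an illustration under assumptions: `NOTE` with a `NOTE.ID` attribute follows the grammar fragment shown later in the deck, but the `NOTE.REF` element and its `IDREF` attribute are hypothetical names used here to stand for "a pointer to an identical footnote". The point is that no number is stored in the markup; numbers are computed when the text is presented.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: NOTE carries the footnote text inline; NOTE.REF points
# back (by IDREF) to a footnote that has already appeared in the text.
SAMPLE = (
    '<ALINEA>First claim<NOTE NOTE.ID="N1"><P>OJ L 1, p. 1.</P></NOTE>, '
    'second claim<NOTE NOTE.ID="N2"><P>OJ L 2, p. 5.</P></NOTE>, '
    'first source again<NOTE.REF IDREF="N1"/>.</ALINEA>'
)


def number_notes(xml_text):
    """Return (markers in document order, mapping of note ID to number)."""
    root = ET.fromstring(xml_text)
    numbers = {}   # note ID -> number assigned at presentation time
    markers = []   # the footnote markers a renderer would print, in order
    for elem in root.iter():
        if elem.tag == "NOTE":
            note_id = elem.get("NOTE.ID")
            numbers[note_id] = len(numbers) + 1
            markers.append(numbers[note_id])
        elif elem.tag == "NOTE.REF":
            # An identical footnote reuses the number of the note it points to.
            markers.append(numbers[elem.get("IDREF")])
    return markers, numbers
```

Calling `number_notes(SAMPLE)` yields the markers 1, 2, 1: the third marker reuses the first footnote's number because it only references it.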

  10. Formex (5)
      • Quotations
        • Structured quotations vs. ‘#PCDATA’ quotations
        • Elements signalling start and end of a quotation (quotation marks)
        • Element with the function of a container for structured quotations

  11. Formex (6)
      Example:

      Article 2
      In article 1(2) of regulation (EC) 1234/94 the word ‘car’ is replaced by ‘bus’.
      Article 6 of the same regulation is replaced by the following text:
      ‘Article 6
      This is the new text of article 6.’

  12. Formex (7)
      Example:

      <ARTICLE IDENTIFIER="002">
        <TI.ARTICLE>Article 2</TI.ARTICLE>
        <ALINEA>In article 1(2) of regulation (EC) 1234/94 the
          <QUOT.START ID="QS0001" REF.END="QE0001" CODE="2018"/>car<QUOT.END ID="QE0001" REF.START="QS0001" CODE="2019"/>
          is replaced by
          <QUOT.START ID="QS0002" REF.END="QE0002" CODE="2018"/>bus<QUOT.END ID="QE0002" REF.START="QS0002" CODE="2019"/>.</ALINEA>
        <ALINEA>
          <P>Article 6 of the same regulation is replaced by the following text:</P>
          <QUOT.S>
            <ARTICLE IDENTIFIER="006">
              <TI.ARTICLE><QUOT.START ID="QS0003" REF.END="QE0003" CODE="2018"/>Article 6</TI.ARTICLE>
              <ALINEA>This is the new text of article 6.<QUOT.END ID="QE0003" REF.START="QS0003" CODE="2019"/></ALINEA>
            </ARTICLE>
          </QUOT.S>
        </ALINEA>
      </ARTICLE>
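Because a structured quotation can open in one element and close in another, the start and end marks are empty elements linked by `ID`, `REF.END` and `REF.START`. The following Python sketch checks that every mark has a matching partner. It is an illustration rather than part of the Formex tooling: the function names are invented here, the sample is a shortened version of the slide's example, and reading `CODE` as a hexadecimal Unicode code point (2018 and 2019 being the single quotation marks) is an assumption.

```python
import xml.etree.ElementTree as ET

# Shortened version of the slide's example: one inline quotation and one
# structured quotation whose marks sit in two different elements.
SAMPLE = """<ARTICLE IDENTIFIER="002">
  <TI.ARTICLE>Article 2</TI.ARTICLE>
  <ALINEA>the <QUOT.START ID="QS0001" REF.END="QE0001" CODE="2018"/>car<QUOT.END ID="QE0001" REF.START="QS0001" CODE="2019"/> is replaced</ALINEA>
  <ALINEA>
    <QUOT.S>
      <ARTICLE IDENTIFIER="006">
        <TI.ARTICLE><QUOT.START ID="QS0003" REF.END="QE0003" CODE="2018"/>Article 6</TI.ARTICLE>
        <ALINEA>This is the new text of article 6.<QUOT.END ID="QE0003" REF.START="QS0003" CODE="2019"/></ALINEA>
      </ARTICLE>
    </QUOT.S>
  </ALINEA>
</ARTICLE>"""


def unpaired_quotes(xml_text):
    """Return the IDs of quotation marks whose partner is missing or mismatched."""
    root = ET.fromstring(xml_text)
    starts = {e.get("ID"): e.get("REF.END") for e in root.iter("QUOT.START")}
    ends = {e.get("ID"): e.get("REF.START") for e in root.iter("QUOT.END")}
    problems = []
    for start_id, end_id in starts.items():
        if ends.get(end_id) != start_id:
            problems.append(start_id)
    for end_id, start_id in ends.items():
        if starts.get(start_id) != end_id:
            problems.append(end_id)
    return problems


def quote_char(elem):
    """Assumption: CODE holds a hexadecimal Unicode code point."""
    return chr(int(elem.get("CODE"), 16))
```

`unpaired_quotes(SAMPLE)` returns an empty list for the sample; breaking one of the references makes both affected marks show up in the result.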

  13. Formex (8)
      • Splitting large documents
        • Fragmentation by defining inclusions for the main document
        • Secondary instances reference the inclusions by means of the XML entity mechanism
        • Inclusions need not be valid XML instances on their own

  14. Formex (9)
      frag-1.frg:
          <text>…</text>
          <text>…</text>

      main.xml:
          <?xml version="1.0"?>
          <doc>
            <ti>title</ti>
            <chap no="1">
              <incl ref="frag-1.frg"/>
            </chap>
          </doc>

      container.xml:
          <?xml version="1.0"?>
          <!DOCTYPE frag [
            <!ENTITY cnt SYSTEM "frag-1.frg">
          ]>
          <frag>&cnt;</frag>
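The role of the container instance can be shown with a small Python sketch. A fragment such as `frag-1.frg` has several top-level elements, so it is not a well-formed document by itself; it only becomes parseable once it is placed inside a single root, which is what the entity reference in `container.xml` achieves. The code below does not resolve external entities (Python's `xml.etree.ElementTree` does not load them); it imitates the expanded result by string wrapping, and the function names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for the content of frag-1.frg: two sibling elements, no single root.
FRAGMENT = "<text>first</text>\n<text>second</text>"


def is_well_formed(xml_text):
    """True if the text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def wrap_fragment(fragment, root_tag="frag"):
    """Imitate what a parser sees after expanding &cnt; inside <frag>."""
    return f"<{root_tag}>{fragment}</{root_tag}>"
```

The fragment alone fails the check, while `wrap_fragment(FRAGMENT)` parses into a `frag` element with two `text` children.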

  15. Formex (10)
      • Character set
        • OJ publications in 20 (21) languages
        • Different alphabets
        • International character set definition: Unicode (UTF-8)
        • Definition of allowed character ranges
        • Special font ‘EU-Albertina’

  16. Formex (11)
      • Meta-data
        • OJ publications are composed of different levels:
          • Publication
          • Document
          • ‘Contents’
        • Meta-data separated according to these levels

  17. Formex (12)
      Diagram of the publication levels:
      • Publication
        • Meta-data concerning the publication
        • Structure of the publication with references to documents
      • Document
        • Meta-data for document
        • References to components
        • Contents main part: 001
        • Contents Annex 1: 001.001
        • Contents Annex 2: 001.002
      • Document
        • Meta-data for document
        • References to components
        • Contents main part: 002
      • ProCat

  18. Formex (13)
      • Meta-data (continued)
        • Extraction of meta-data by means of automatic processes (pre-notices)
        • Extension of pre-notices by juridical analysis
        • Availability of notices in ProCat for other productions (Celex) and projects

  19. Formex (14)
      • Final remark on Formex specifications
        • Only a few complete production chains from the author to the printer
        • Concentration on publication of the Official Journal

  20. Formex (15)
      • Validation of Formex deliveries
        • In-depth validation necessary
        • Automatic procedures
        • Manual procedures

  21. Formex (16)
      • Validation of Formex deliveries (continued)
        • Automatic procedures
          • Control of filename conventions
          • Parsing of various components
          • Control of completeness
          • Execution of additional validation rules
          • Comparison of contents between Formex and PDF
      Result: report (XML instance)
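A minimal Python sketch of how automatic checks of this kind could feed an XML report is given below. It covers only two of the listed checks (filename conventions and parsing). Everything specific in it is an assumption made for illustration: the filename pattern, the `report`, `file` and `error` element names, and the `status` attribute are invented and are not taken from the Publications Office's procedures.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical filename convention: lower-case letters, digits, "_", ".", "-".
FILENAME_RULE = re.compile(r"^[a-z0-9_.-]+\.xml$")


def validate_delivery(files):
    """Check a delivery given as {filename: XML text}; return a report element."""
    report = ET.Element("report")
    for name in sorted(files):
        entry = ET.SubElement(report, "file", name=name)
        # Check 1: control of the filename convention.
        if not FILENAME_RULE.match(name):
            ET.SubElement(entry, "error", check="filename")
        # Check 2: the component must at least parse as well-formed XML.
        try:
            ET.fromstring(files[name])
        except ET.ParseError as exc:
            ET.SubElement(entry, "error", check="parsing").text = str(exc)
    has_errors = report.find(".//error") is not None
    report.set("status", "rejected" if has_errors else "accepted")
    return report
```

Serialising the result with `ET.tostring(report)` gives an XML instance that a later manual step could review before the delivery is archived or rejected.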

  22. Formex (17)
      • Validation of Formex deliveries (continued)
        • Manual procedures
          • Verification of the report generated by the automatic validation procedure
          • Control of the use of Formex specifications in all language versions
      Result: report (XML instance) = basis for archiving or rejection

  23. Formex (18)
      • Conversion of Formex v. 3 into Formex v. 4
        • Conversion of character set (ISO 2022 to UTF-8)
        • Transformation of SGML instances into well-formed XML instances
        • Extraction of tables and conversion into an intermediate model
        • Generation of meta-data levels
        • Conversion of old elements and generation of new elements
        • Validation of the results

  24. Formex (19)
      • Specifications: http://formex.publications.eu.int/

  25. Table of contents
      • Historical overview
      • Formex
      • Other areas of XML usage
      • Conclusion

  26. Other areas of XML usage (1)
      • Index of OJ publications
        • Biannual issues
        • Monthly issues
        • Extraction from Celex/ProCat
        • Transformation into PDF by means of XSLT and XSL FO (biannual version only)

  27. Other areas of XML usage (2)
      • Consolidation of legal documents
        • Mainly based on Formex
        • Additional administrative data in XML
          • Relations between historical levels
          • Description of the composition of a given historical level
          • Concordance of information on numbering schemes (articles, …) for each level

  28. Other areas of XML usage (3)
      • Conversion to RTF
        • Compatibility with other EU services
        • Input in SGML or XML
        • Results with LegisWrite templates

  29. Other areas of XML usage (4)
      Diagram of the conversion to RTF; labels:
      • SGML instance (Formex v. 3)
      • XML instance (Formex v. 4)
      • Transformation into internal XML format
      • Character conversion
      • Transformation into RTF (LegisWrite)
      • Transformation into well-formed XML
      • Output in RTF (LegisWrite)

  30. Other areas of XML usage (5)
      • Production of the EU budget
        • Creation and maintenance of a common central repository (XML)
        • Markup of modified elements during the decision process in the working language
        • Translation of modified parts only
        • Update of repository after publication

  31. Other areas of XML usage (6)
      Diagram of the budget production; labels:
      • Budget services
      • Translation service
      • Publications Office
      • Budget XML repository
      • Formex archive
      • Pre-printing
      • Post-printing
      • Printer

  32. Other areas of XML usage (7)
      • ‘Secondary legislation’
        • Publication of legislation in force in ‘new’ languages
        • XML production on basis of Formex archive
          • Transformation of translated input
          • Transformation of SGML into XML of Formex instance
          • Merging of XML instances

  33. Other areas of XML usage (8)
      Diagram of the ‘secondary legislation’ production; labels:
      • Word document
      • Formex archive
      • Celex
      • Conversion into XML (Word document)
      • Conversion into XML (Formex archive)
      • Extraction of text
      • Extraction of skeleton
      • ProCat
      • Merging skeleton & text
      • Simplify structure
      • Publication

  34. Other areas of XML usage (9)
      • European document repository
        • TIFF of publications
        • PDF of publications
        • Formex instances of OJ publications
        • Exchange of information by XML messages

  35. Other areas of XML usage (10)
      • Publication of calls for tender (OJ-S)
        • Input in different electronic formats
        • Harmonization in XML
        • Updating of the TED database
        • Production of CD-ROM version

  36. Table of contents
      • Historical overview
      • Formex
      • Other areas of XML usage
      • Conclusion

  37. Conclusion
      • Difficult start with SGML
      • Successful use of XML as well as of other standards such as XSLT/XPath and XSL FO
      • Powerful possibilities for re-use of XML instances
      • How to profit from our experiences?

  38. Proposal for technical solution
      • An example: a regulation in the European legislative context and a ‘Verordnung’ in German legislation
        • Evident structural differences
        • Evident common structural objects

  39. Differences and common objects (1)
      EU regulation:
      • Title
      • Preamble
        • Citations
        • Recitals
      • Enacting terms
        • Articles
          • Article header
            • Numbering
          • Paragraphs or alineas
      German regulation:
      • Title
      • Preamble
        • Paragraphs
      • Enacting terms
        • Articles
          • Article header
            • Numbering + text
          • Alineas

  40. Differences and common objects (2)
      EU regulation:
      • Final
        • Applicability
        • Signature
      German regulation:
      • Final
        • Signature

  41. Differences and common objects (3)
      • Preamble
        • European model
            PREAMBLE        (PREAMBLE.INIT, CITATION+, RECITAL+, PREAMBLE.FINAL)
            PREAMBLE.INIT   (P)
            CITATION        (P)
            RECITAL         (NP)
            PREAMBLE.FINAL  (P)
        • German model
            PREAMBLE        (P)

  42. Differences and common objects (4)
      • Article
        • European model
            ARTICLE         (ARTICLE.HEADER, (PARAG+ | ALINEA+))
            ARTICLE.HEADER  (#PCDATA)
            PARAG           (NO.PARAG, ALINEA+)
            ALINEA          (P | LIST)+
        • German model
            ARTICLE         (ARTICLE.HEADER, (PARAG+ | ALINEA+))
            ARTICLE.HEADER  (NP)
            NP              (NO.P, TXT)
            PARAG           (NO.PARAG, ALINEA+)
            ALINEA          (P | LIST)+

  43. Differences and common objects (5)
      • Final
        • European model
            FINAL          (APPLICABILITY, SIGNATURE)
            APPLICABILITY  (P)
            SIGNATURE      (PL.DATE, SIGNATORY)
            PL.DATE        (P)
            SIGNATORY      (P+)
        • German model
            FINAL          (SIGNATURE)
            SIGNATURE      (PL.DATE, SIGNATORY)
            PL.DATE        (P)
            SIGNATORY      (P+)

  44. Differences and common objects (6)
      Diagram; labels:
      • Specific models for European regulation
      • Common models for European and German regulation
      • Specific models for German regulation

  45. Differences and common objects (7)
      • Common grammar fragment

      <!ELEMENT ALINEA (P | LIST)+ >
      <!ELEMENT ARTICLE (ARTICLE.HEADER, (ALINEA+ | PARAG+)) >
      <!ELEMENT ENACTING.TERMS (ARTICLE+) >
      <!ELEMENT ITEM (NP, (P | LIST)) >
      <!ELEMENT NO.P (#PCDATA) >
      <!ELEMENT NOTE (P+) >
      <!ATTLIST NOTE NOTE.ID ID #REQUIRED >
      <!ELEMENT NP (NO.P, TXT) >
      <!ELEMENT P (#PCDATA | NOTE)* >
      <!ELEMENT PARAG (PARAG.NO, ALINEA+) >
      <!ELEMENT PARAG.NO (#PCDATA) >
      <!ELEMENT PL.DATE (P+) >
      <!ELEMENT REGULATION (TITLE, PREAMBLE, ENACTING.TERMS, FINAL) >
      <!ATTLIST REGULATION CTRY (DE | EU-EN) #REQUIRED >
      <!ELEMENT SIGNATORY (P+) >
      <!ELEMENT SIGNATURE (PL.DATE, SIGNATORY) >
      <!ELEMENT TITLE (P+) >
      <!ELEMENT TXT (#PCDATA | LIST | NOTE)* >

  46. Differences and common objects (8)
      • Specific grammar for EU regulation

      <!ENTITY % common SYSTEM "regulation-common.dtd">
      %common;
      <!ELEMENT APPLICABILITY (P) >
      <!ELEMENT ARTICLE.HEADER (P) >
      <!ELEMENT CITATION (P) >
      <!ELEMENT FINAL (APPLICABILITY, SIGNATURE) >
      <!ELEMENT PREAMBLE (PREAMBLE.INIT, CITATION+, RECITAL.INIT?, RECITAL+, PREAMBLE.FINAL) >
      <!ELEMENT PREAMBLE.FINAL (P) >
      <!ELEMENT PREAMBLE.INIT (P) >
      <!ELEMENT RECITAL (P | NP) >
      <!ELEMENT RECITAL.INIT (P) >

  47. Differences and common objects (9)
      • Specific grammar for German regulation

      <!ENTITY % common SYSTEM "regulation-common.dtd">
      %common;
      <!ELEMENT ARTICLE.HEADER (NP) >
      <!ELEMENT FINAL (SIGNATURE) >
      <!ELEMENT PREAMBLE (P+) >
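The layering of a common grammar with a small specific grammar per jurisdiction can be mirrored in a few lines of Python. The sketch below is an illustration only: it represents a handful of the declarations from the slides as dictionaries (element name to content model) and the function names are invented here. It rejects an element declared in both layers, since a DTD may not declare the same element type twice, and it lists the elements whose models differ between the two resulting grammars.

```python
# A few declarations copied from the slides, keyed by element name.
COMMON = {
    "ARTICLE": "(ARTICLE.HEADER, (ALINEA+ | PARAG+))",
    "SIGNATURE": "(PL.DATE, SIGNATORY)",
    "REGULATION": "(TITLE, PREAMBLE, ENACTING.TERMS, FINAL)",
}
EU_SPECIFIC = {
    "APPLICABILITY": "(P)",
    "ARTICLE.HEADER": "(P)",
    "FINAL": "(APPLICABILITY, SIGNATURE)",
    "PREAMBLE": "(PREAMBLE.INIT, CITATION+, RECITAL.INIT?, RECITAL+, PREAMBLE.FINAL)",
}
DE_SPECIFIC = {
    "ARTICLE.HEADER": "(NP)",
    "FINAL": "(SIGNATURE)",
    "PREAMBLE": "(P+)",
}


def build_grammar(common, specific):
    """Combine the common declarations with one jurisdiction's declarations."""
    clash = common.keys() & specific.keys()
    if clash:
        # A DTD must not declare the same element type more than once.
        raise ValueError(f"declared twice: {sorted(clash)}")
    return {**common, **specific}


def differing_elements(grammar_a, grammar_b):
    """Names present in both grammars but with different content models."""
    shared = grammar_a.keys() & grammar_b.keys()
    return sorted(name for name in shared if grammar_a[name] != grammar_b[name])
```

Comparing `build_grammar(COMMON, EU_SPECIFIC)` with `build_grammar(COMMON, DE_SPECIFIC)` reports only `ARTICLE.HEADER`, `FINAL` and `PREAMBLE` as differing, while `ARTICLE` and the other common elements are identical in both.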

  48. Final remarks
      • Possible objects:
        • Metadata on document level
        • Metadata on archiving level (research aspects)
        • Common models for complex objects: tables, quotations, etc.
