
Physical and Logical Structure


gayora

Presentation Transcript


  1. Physical and Logical Structure SNU IDB Lab.

  2. XML Documents 1: structure • Peeping into XML document • At physical view: Entity • At logical view: DTD

  3. Peeping into XML document(1/5) <?xml version="1.0" standalone="yes"?> <GREETING> Hello, XML!! <!--this is greeting--> </GREETING> • The tags and the comment are mark-up • "Hello, XML!!" is character data

  4. Peeping into XML document(2/5) XML document: date.xml • XML declaration: declares that this is an XML document; it begins with <? and ends with ?> • <?xml version="1.0" standalone="yes"?> • DTD (Document Type Definition): defines the tags the user will use; here it defines the DATE tag • <!DOCTYPE DATE [ <!ELEMENT DATE (#PCDATA)> ]> • Comment: the parser ignores it • <!--This is date --> • Content • <DATE> 001224 </DATE>
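The four parts of date.xml can be seen by feeding the document to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the DOCTYPE is written as `<!DOCTYPE DATE [...]>`, which is an assumption about what the slide intended. The parser skips the comment and returns only the element and its character data.

```python
import xml.etree.ElementTree as ET

# date.xml from the slide, with straight quotes and a well-formed DOCTYPE
# (assumed reconstruction of the slide's garbled declaration).
DATE_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [
  <!ELEMENT DATE (#PCDATA)>
]>
<!--This is date -->
<DATE> 001224 </DATE>"""

root = ET.fromstring(DATE_XML)
date_tag = root.tag             # name of the root element
date_text = root.text.strip()   # character data inside <DATE>, whitespace trimmed
print(date_tag, date_text)
```

Note that this parser is non-validating: it reads the DTD in the internal subset but does not check the document against it.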

  5. Peeping into XML document(3/5) • Structure of XML document • Physical structure: lets components of the document, called entities, be named and stored separately • Logical structure: lets a document be divided into named units and sub-units, called elements

  6. Peeping into XML document(4/5) [Figure: physical structure, where a document consists of entities (internal or separate), next to logical structure, where a document consists of units and sub-units (elements)]

  7. Peeping into XML document(5/5) [Figure: the same document shown twice, once annotated by entity and once by element] <person> <name> kim </name> <ID>771224</ID> <phone>1830</phone> <office>301-453</office> <photo source="k.jpg"/> </person>

  8. XML Documents 1: structure • Peeping into XML document • At physical view: Entity • At logical view: DTD

  9. Content of Physical structure • Entity • Figures of Document Entity • Defining an entity • Grammar in Declaring Entity • Examples of Entity Declaration • URL format

  10. Entity (1/3) • A unit for physically isolating and storing any part of a document (a unit of information storage) • Each unit of information is called an entity • [Figure: physical structure, with the person document as an internal entity and the photo k.jpg as a separate entity] <person> <name> kim </name> <ID>771224</ID> <phone>1830</phone> <office>301-453</office> <photo source="k.jpg"/> </person> SNU OOPSLA Lab.

  11. Entity (2/3) • Purpose of Entity • contain all the information (well-formed XML data, other text file, binary data…) • [Figure: the person document is the document entity; the file k.jpg is the image entity] <person> <name> kim </name> <ID>771224</ID> <phone>1830</phone> <office>301-453</office> <photo source="k.jpg"/> </person>

  12. Entity (3/3) • Internal Entity • An entity that is defined completely within the document itself • External Entity • An entity that obtains its content from an external source identified by a URL

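  13. Figures of Document Entity [Figure: three arrangements of the document entity: alone with no other entities; holding the main content alongside other entities; acting as a framework file that pulls in other entities. Boxes labeled A, B, C, D]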

  14. Defining an entity • Entity definition in DTD: <!DOCTYPE DOCUMENT [ <!ENTITY EMAIL "sjlee@oopsla.snu.ac.kr"> <!ENTITY TEXT "(#PCDATA)"> ]> • Entities are declared in the DTD (Document Type Definition) • An entity must be defined before the first reference to it in the data stream
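A minimal sketch of what the entity definition buys you, using Python's standard `xml.etree.ElementTree`. The `<DOCUMENT>` body with its "Contact:" text is a hypothetical example, not from the slides; the `EMAIL` entity is the one declared above. When the parser meets the reference `&EMAIL;` it substitutes the declared replacement text.

```python
import xml.etree.ElementTree as ET

# Internal subset declares EMAIL; the document body (hypothetical) references it.
DOC = """<!DOCTYPE DOCUMENT [
  <!ENTITY EMAIL "sjlee@oopsla.snu.ac.kr">
]>
<DOCUMENT>Contact: &EMAIL;</DOCUMENT>"""

root = ET.fromstring(DOC)
contact = root.text  # the entity reference has been replaced by its value
print(contact)
```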

  15. Example: Entity Declaration (1/3) • Internal text entities • <!ENTITY XML "eXtensible Markup Language"> • <!ENTITY DemoEntity 'The rule is 6" long.'> • Built-in entities • &lt; for '<' • &gt; for '>' • &amp; for '&' • &apos; for the apostrophe (') • &quot; for the quotation mark (") • <!ENTITY sample "Use &quot; and ' as delimiters.">
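The five built-in entities need no declaration. As a small illustration (the `<T>` element and the sample strings are made up for this sketch), a parser decodes them on input, and `xml.sax.saxutils.escape` from the Python standard library produces them on output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the built-in references are replaced by the characters they name.
root = ET.fromstring("<T>&lt; &gt; &amp; &apos; &quot;</T>")
decoded = root.text

# Writing: escape() converts &, < and > by default; quotes can be added
# through the optional mapping argument.
escaped = escape("a < b & c")
escaped_with_quotes = escape('6" long', {'"': "&quot;"})
print(decoded, escaped, escaped_with_quotes)
```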

  16. Example: Entity Declaration (2/3) • External text entities • <!ENTITY myent SYSTEM "/ENTS/MYENT.XML"> • <!ENTITY myent PUBLIC "-//MyCorp//ENTITY Superscript Chars//EN"….> • Binary entities • <!ENTITY Jsphoto SYSTEM "/ENTS/Jsphoto.tif" NDATA TIFF>

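  17. Example: Entity Declaration (3/3) URL format • A relative system identifier is resolved against the location of the declaring document • <!ENTITY ent9 SYSTEM "../entities/entity9.xml"> declared in /xml/docs/document.xml refers to /xml/entities/entity9.xml • <!ENTITY ent9 SYSTEM "entities/entity9.xml"> declared in /xml/document.xml refers to /xml/entities/entity9.xml • [Figure: directory trees; in the first, xml contains docs (with document.xml) and entities (with entity9.xml); in the second, xml contains document.xml and entities (with entity9.xml)]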
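The two cases on this slide follow ordinary relative-URL resolution, which can be checked with `urllib.parse.urljoin` from the Python standard library. The host `http://example.org` is a placeholder assumed for the sketch; the paths are the ones on the slide.

```python
from urllib.parse import urljoin

BASE = "http://example.org"  # hypothetical host, only the paths come from the slide

# Case 1: the document lives in /xml/docs/, the identifier climbs one level.
case1 = urljoin(BASE + "/xml/docs/document.xml", "../entities/entity9.xml")

# Case 2: the document lives in /xml/, the identifier descends into entities/.
case2 = urljoin(BASE + "/xml/document.xml", "entities/entity9.xml")

print(case1)
print(case2)
```

Both declarations end up pointing at the same file, which is the point of the slide's two directory trees.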

  18. XML Documents 1: structure • Peeping into XML document • At physical view: Entity • At logical view: DTD

  19. Content of Logical structure • Concepts • DTD Structure • Element Declaration • Attribute Declarations • Parameter Entities • Conditional Sections • Notation Declarations • DTD Processing Issues

  20. Concepts of DTD(1/3) • DTD (Document Type Definition) • An optional but powerful feature of XML • Comprises a set of declarations that define a document structure tree • XML processors read the DTD, check whether the document is valid, and use it to build the document model in memory • Describes the user's own tag set, acting as a meta-markup language

  21. Concepts of DTD(2/3) • A DTD describes: • Elements, attributes, notations, and the relations between elements • Establishes formal document structure rules

  22. Concepts of DTD(3/3) • Declare vs. define • Declare: "This document is a concert poster" • Define: "A concert poster must have the following features" • A DTD defines element types, attributes, and entities • Valid vs. invalid • Valid: conforms to the DTD • Invalid: fails to conform to the DTD • [Figure: well-formed XML documents and valid XML documents]

  23. Valid & Invalid Documents Example: <!DOCTYPE GREETING [ <!ELEMENT GREETING (#PCDATA)> ]> • Valid: • <GREETING> • various random text but no markup • </GREETING> • Invalid: anything else, including • <GREETING> • <sometag>various random text</sometag> • <someEmptyTag/> • </GREETING>
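Python's standard library parser is non-validating, so it accepts both documents above as well-formed. The sketch below hand-codes the single rule that `(#PCDATA)` imposes, namely that the element has no child elements. The function name `has_only_character_data` is hypothetical, and this is a stand-in for one rule, not a DTD validator.

```python
import xml.etree.ElementTree as ET

def has_only_character_data(xml_text: str) -> bool:
    """Return True when the root element contains text but no child elements."""
    root = ET.fromstring(xml_text)
    return len(root) == 0  # len() of an Element counts its child elements

valid_doc = "<GREETING>various random text but no markup</GREETING>"
invalid_doc = (
    "<GREETING>"
    "<sometag>various random text</sometag>"
    "<someEmptyTag/>"
    "</GREETING>"
)

print(has_only_character_data(valid_doc))
print(has_only_character_data(invalid_doc))
```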

  24. DTD structure • DTD is composed of a number of declarations • ELEMENT (tag definition) • ATTLIST (attribute definitions) • ENTITY (entity definition) • NOTATION(data type notation definition) • DTD can be stored in an external subset or an internal subset

  25. Internal and External Subset(1/3) • Internal subset • Form: • <!DOCTYPE … [ • <!-- Internal Subset --> • … • ]> • Pros • Easy to write XML • Can be edited without moving between two files • Cons • Other documents can't reuse the declarations without copying the internal subset

  26. Internal and External Subset(2/3) • External subset • It is better to use external DTDs • Why? • Many benefits • document management • updating • editing • A few reasons • If you use an external DTD, you can use public DTDs (capability) • External DTDs provide for better document management • External DTDs make it easier to validate your document

  27. Internal and External Subset(3/3) [Figure: full parsing path through the internal subset and the external subset]

  28. Element Declarations • Element type declaration: '<!ELEMENT' S Name S contentspec S? '>' • Defines a new element by giving its name and its content model, which specifies the allowed content • Each tag must be declared in a <!ELEMENT> declaration • The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element

  29. Content Specifications • ANY • #PCDATA • Sequences • Choices • Mixed Content • Modifiers • Empty

  30. ANY <!ELEMENT SEASON ANY> A SEASON can contain any child element and/or raw text (parsed character data) Rarely used in practice, due to the lack of constraint on structure it encourages.

  31. #PCDATA <!ELEMENT YEAR (#PCDATA)> • Parsed Character Data; i.e. raw text, no markup • Represents normal data • The keyword is preceded by the hash symbol, '#', to avoid confusion with an identically named element when used within a model group (for example, '(#PCDATA | PCDATA)')

  32. Use of #PCDATA in XML • Valid: • <YEAR>1999</YEAR> • <YEAR>99</YEAR> • <YEAR>1999 .E.</YEAR> • <YEAR> The year of our Lord one thousand, nine hundred, and ninety-nine </YEAR> • Invalid: • <YEAR> <MONTH>January</MONTH> <MONTH>February</MONTH> <MONTH>March</MONTH> <MONTH>April</MONTH> <MONTH>May</MONTH> <MONTH>June</MONTH> <MONTH>July</MONTH> <MONTH>August</MONTH> <MONTH>September</MONTH> <MONTH>October</MONTH> <MONTH>November</MONTH> <MONTH>December</MONTH> </YEAR>

  33. Child Elements • To declare that a LEAGUE element must have a LEAGUE_NAME child: • <!ELEMENT LEAGUE (LEAGUE_NAME)> • <!ELEMENT LEAGUE_NAME (#PCDATA)>

  34. Sequences(1/2) • Separate multiple required child elements with commas; e.g. • <!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)> • <!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)> • One or More Children: + • <!ELEMENT DIVISION_NAME (#PCDATA)> • <!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>

  35. Sequences(2/2) • Zero or More Children: * • <!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)> • <!ELEMENT TEAM_CITY (#PCDATA)> • <!ELEMENT TEAM_NAME (#PCDATA)> • Choices • <!ELEMENT PAYMENT (CASH | CREDIT_CARD)> • <!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>

  36. Grouping With Parentheses • <!ELEMENT dl (dt, dd)*> • <!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)> • Parentheses combine several elements into a single element • A parenthesized element can be nested inside other parentheses in place of a single element • A parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark
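Because content models are regular expression-like, sequences, choices, groups, and the `+`, `*`, `?` modifiers map almost directly onto Python's `re` syntax. The sketch below is an illustration under assumptions, not how a real XML processor works: the helper names `content_model_to_regex` and `children_match` are hypothetical, and only element-only models are handled (no #PCDATA, EMPTY, or ANY).

```python
import re

# An element name as it appears inside a content model.
NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a compiled regex."""
    compact = "".join(model.split())  # drop all whitespace
    # Each name becomes a group matching "NAME " (name plus a separator space).
    with_names = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    # A comma means "followed by", which in a regex is plain concatenation.
    return re.compile(with_names.replace(",", ""))

def children_match(model: str, children: list) -> bool:
    """Check whether a list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

ARTICLE = "(TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)"
print(children_match(ARTICLE, ["TITLE", "P", "PHOTO", "BYLINE"]))
print(children_match(ARTICLE, ["P", "TITLE"]))
print(children_match("(dt, dd)*", ["dt", "dd", "dt", "dd"]))
print(children_match("(YEAR, LEAGUE, LEAGUE)", ["YEAR", "LEAGUE"]))
```

The choice operator `|`, the parentheses, and the three modifiers are passed through unchanged because they mean the same thing in both notations.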

  37. Mixed Content <!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*> • Both #PCDATA and child elements in a choice • #PCDATA must come first • #PCDATA cannot be used in a sequence • Empty elements <!ELEMENT BR EMPTY>

  38. Attribute Declarations <!ATTLIST Element_name Attribute_name Type Default_value> • Consider this element: • <GREETING LANGUAGE="Spanish"> • Hola! • </GREETING> • It is declared like this: • <!ELEMENT GREETING (#PCDATA)> • <!ATTLIST GREETING LANGUAGE CDATA "English">
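The literal "English" in the ATTLIST is a default: it applies whenever the attribute is omitted. A short sketch with Python's standard `xml.etree.ElementTree`, whose underlying expat parser supplies defaults declared in the internal subset even though it does not validate. The second document, with "Hello!", is a hypothetical example added for contrast.

```python
import xml.etree.ElementTree as ET

# Shared internal subset: the declarations from the slide.
DTD = """<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
  <!ATTLIST GREETING LANGUAGE CDATA "English">
]>
"""

# Attribute given explicitly in the document.
spanish = ET.fromstring(DTD + '<GREETING LANGUAGE="Spanish">Hola!</GREETING>')

# Attribute omitted (hypothetical document): the declared default is used.
defaulted = ET.fromstring(DTD + "<GREETING>Hello!</GREETING>")

print(spanish.get("LANGUAGE"))
print(defaulted.get("LANGUAGE"))
```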

  39. Multiple Attribute Declarations • Consider this element: <RECTANGLE LENGTH="70px" WIDTH="85px"/> • <!ELEMENT RECTANGLE EMPTY> • With two attribute declarations: • <!ATTLIST RECTANGLE LENGTH CDATA "0px"> • <!ATTLIST RECTANGLE WIDTH CDATA "0px"> • With one attribute declaration: • <!ATTLIST RECTANGLE LENGTH CDATA "0px" WIDTH CDATA "0px"> • Indentation is a convention, not a requirement

  40. Attribute Types • CDATA • ID • IDREF • IDREFS • ENTITY • ENTITIES • NOTATION • NMTOKEN • NMTOKENS • Enumerated

  41. CDATA Most general attribute type Value can be any string of text not containing a less-than sign (<) or quotation marks (")

  42. ID • Value must be an XML name • May include letters, digits, underscores, hyphens, and periods • May not include whitespace • May contain colons only if used for namespaces • Value must be unique within ID type attributes in the document • Generally the default value is #REQUIRED

  43. IDREF • Value matches the ID of an element in the same document • Used for links and the like • IDREFS • A list of ID values in the same document • Separated by white space
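The ID and IDREF constraints amount to two checks: ID values must be unique, and every IDREF or IDREFS value must name an existing ID. A validating parser does this from the DTD; the sketch below hard-codes it instead, since Python's standard library does not validate. The function `check_ids`, the attribute names `ID` and `REF`, and the sample TEAM document are all hypothetical.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="ID", ref_attr="REF"):
    """Return (duplicate_ids, dangling_refs) for the given element tree."""
    seen, duplicates = set(), set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.add(value)  # ID values must be unique
            seen.add(value)
    dangling = set()
    for elem in root.iter():
        # An IDREFS value is a whitespace-separated list; IDREF is the one-item case.
        for ref in elem.get(ref_attr, "").split():
            if ref not in seen:
                dangling.add(ref)      # reference to an ID that does not exist
    return duplicates, dangling

doc = ET.fromstring(
    '<TEAM><PLAYER ID="p1"/><PLAYER ID="p2"/>'
    '<CAPTAIN REF="p1"/><SUBS REF="p2 p9"/></TEAM>'
)
duplicates, dangling = check_ids(doc)
print(duplicates, dangling)
```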

  44. ENTITY • Value is the name of an unparsed general entity declared in the DTD • ENTITIES • Value is a list of names of unparsed general entities declared in the DTD • Separated by white space

  45. NOTATION • Value is the name of a notation declared in the DTD • <!NOTATION Tex SYSTEM "..\TEXVIEW.EXE"> • <!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex> • [Figure: four numbered steps linking the entity Logo to the file LOGO.TEX and, through the notation Tex, to TEXVIEW.EXE]
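A parser does not open LOGO.TEX itself; it only reports the declarations so that the application can find the helper program. The sketch below collects them with the declaration handlers of Python's standard `xml.parsers.expat`. The empty `<DOC/>` root element and the dictionary names are assumptions made for the example.

```python
import xml.parsers.expat

# Raw string so the backslash in the slide's Windows-style path is kept as-is.
DOC = r"""<!DOCTYPE DOC [
  <!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
  <!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
]>
<DOC/>"""

notations = {}  # notation name -> system identifier of the helper
unparsed = {}   # entity name -> (system identifier, notation name)

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

def on_entity(name, is_parameter_entity, value, base,
              system_id, public_id, notation_name):
    # Only unparsed (NDATA) entities carry a notation name.
    if notation_name is not None:
        unparsed[name] = (system_id, notation_name)

parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)

print(notations)
print(unparsed)
```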

  46. NMTOKEN • Value is a legal XML name token (like an XML name, except that any name character may come first) • NMTOKENS • Value is a list of XML name tokens • Separated by white space

  47. Enumerated • "Enumerated" is not a keyword • Refers to a list of possible values from which one must be chosen • <!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE"> • Default value is generally provided explicitly

  48. Attribute Default Values • A literal string value • One of these three keywords • #REQUIRED • #IMPLIED • #FIXED

  49. #REQUIRED • No default value is provided in the DTD • Document authors must provide the attribute value for each element • <!ELEMENT IMG EMPTY> • <!ATTLIST IMG ALT CDATA #REQUIRED> • <!ATTLIST IMG WIDTH CDATA #REQUIRED> • <!ATTLIST IMG HEIGHT CDATA #REQUIRED>
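With #REQUIRED there is no default to fall back on, so a validating parser rejects an IMG that lacks any of the three attributes. The sketch below hand-codes that one rule for illustration; the function `missing_required` and the attribute values in the sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute names declared #REQUIRED for IMG on the slide.
REQUIRED = ("ALT", "WIDTH", "HEIGHT")

def missing_required(elem, required=REQUIRED):
    """List the required attribute names that the element does not carry."""
    return [name for name in required if name not in elem.attrib]

complete = ET.fromstring('<IMG ALT="logo" WIDTH="10" HEIGHT="20"/>')
partial = ET.fromstring('<IMG ALT="logo"/>')

print(missing_required(complete))
print(missing_required(partial))
```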

  50. #IMPLIED • No default value is provided in the DTD • The author may (but does not have to) provide a value with each element
