XML – an introduction
David Nathan, ELDP training, March 2010
XML • an in-line markup system • a single sequence of plain text only (which can be Unicode) • equivalent to a tree structure • consists of elements and content • elements: tag syntax, eg <noun>dog</noun> • entities: &name; syntax, eg &amp; • reserved characters: < > & " '
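Reserved characters in content are written as predefined entities (&lt; &gt; &amp; &quot; &apos;). A minimal Python sketch using the standard library's xml.sax.saxutils.escape; the sample string and the <note> element are made up for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry say "1 < 2"'

# escape() always replaces &, < and > with &amp; &lt; &gt;
safe = escape(raw)

# Quotes only need escaping inside attribute values; pass them explicitly
safe_attr = escape(raw, {'"': "&quot;", "'": "&apos;"})

# A parser turns the entities back into the original characters
doc = ET.fromstring(f"<note>{safe}</note>")
print(safe)
print(safe_attr)
print(doc.text == raw)
```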
XML syntax • structures are defined by tags in angle brackets, eg <noun> • tags usually come in pairs: • a start (open) tag and an end (close) tag: the <noun> dog </noun> chased ... • but a tag can also be single and self-closing (an empty element): the dog <pause /> sat down
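A minimal sketch with Python's standard-library xml.etree.ElementTree showing that a paired tag and a self-closing tag both produce elements. The <s> wrapper is an assumption added so the fragment has the single root element XML requires:

```python
import xml.etree.ElementTree as ET

# A start/end pair around "dog", and a self-closing <pause />
sentence = ET.fromstring("<s>the <noun>dog</noun> <pause /> sat down</s>")

for child in sentence:
    # .text is the content inside the element; .tail is the text after it
    print(child.tag, repr(child.text), repr(child.tail))

# An empty pair and a self-closing tag are the same element
empty_form = ET.tostring(ET.fromstring("<pause></pause>"))
print(empty_form)
```

The words outside the tags ("the ", " sat down") are kept as text and tails, so mixed prose and markup survive parsing intact.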
XML syntax • tags can have attributes with values: the <noun num="1"> dog </noun> sat down • you can call your tags and attributes (almost) anything, and give attributes (almost) any values • there are some restrictions: • elements can be nested in a hierarchy, but cannot overlap: <a>the <b><c>cat</c> sat</b> on the mat</a> is allowed, but <a>the <b><c>cat</b> sat</c> on the mat</a> is not
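A short Python sketch (standard-library ElementTree) reading an attribute and walking the nested example as a tree. The <s> wrapper and the "case" attribute are hypothetical, used only to show the default returned for a missing attribute:

```python
import xml.etree.ElementTree as ET

# Attribute values must use straight quotes (" or ')
phrase = ET.fromstring('<s>the <noun num="1">dog</noun> sat down</s>')
noun = phrase.find("noun")
num = noun.get("num")             # attribute values are always strings
case = noun.get("case", "none")   # hypothetical attribute; default if absent

# Properly nested elements form a hierarchy (a tree)
nested = ET.fromstring("<a>the <b><c>cat</c> sat</b> on the mat</a>")
path = [el.tag for el in nested.iter()]  # elements in document order
inner = nested.find("b/c").text          # <c> inside <b>
print(num, case, path, inner)
```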
XML is used to add knowledge ... • add knowledge to content: • usually structures and labels • add the knowledge that’s relevant to your domain or task • knowledge priorities: • what’s required • what’s visually represented (eg by format/layout) • what’s implicit
Compare to HTML • ... the man who really liked the book The Lawyer Who Lost, about habeas corpus ... • in HTML: ... the man who <i>really</i> liked the book <i>The Lawyer Who Lost</i>, about <i>habeas corpus</i> ... (the same <i> tag marks emphasis, a book title and a foreign phrase alike) • in XML, we can define our own elements that focus on logical structure rather than visual format
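A sketch of what meaning-based elements make possible. The element names emphasis, bookTitle and foreignPhrase and the lang attribute are hypothetical choices, not a standard vocabulary; the point is that a program can ask for book titles specifically, which <i> cannot express:

```python
import xml.etree.ElementTree as ET

# Hypothetical element names chosen for meaning, not appearance
text = ("<p>the man who <emphasis>really</emphasis> liked the book "
        "<bookTitle>The Lawyer Who Lost</bookTitle>, about "
        '<foreignPhrase lang="la">habeas corpus</foreignPhrase></p>')
p = ET.fromstring(text)

# Each kind of information can now be retrieved separately
titles = [el.text for el in p.iter("bookTitle")]
latin = [el.text for el in p.iter("foreignPhrase") if el.get("lang") == "la"]
print(titles, latin)
```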
Compare to HTML • XML: • is flexible and extensible • must be well-formed • can be validated • is application-, platform- and vendor-independent • is machine readable (ie it can be parsed by computer programs)
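"Well-formed" can be checked mechanically: a parser either accepts the text or rejects it. A minimal Python sketch; is_well_formed is a hypothetical helper name, and it checks well-formedness only, not validity against a DTD or schema:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML (well-formedness only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "nested": "<a>the <b><c>cat</c> sat</b> on the mat</a>",
    "overlapping": "<a>the <b><c>cat</b> sat</c> on the mat</a>",
    "unclosed": "<noun>dog",
    "space in end tag": "<noun>dog</ noun>",
    "curly quotes": "<noun num=\u201c1\u201d>dog</noun>",
}
for name, text in samples.items():
    print(f"{name}: {is_well_formed(text)}")
```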
... in XML
<story>
  <metaDataField>The Guardian</metaDataField>
  <metaDataField>July 1, 1997</metaDataField>
  <metaDataField>Andrew Higgins in Hong Kong</metaDataField>
  <headLine>A last hurrah and an empire closes down</headLine>
  <p>With a clenched-jaw nod from the Prince of Wales, a last rendition of <title>God Save the Queen</title>, and a wind machine to keep the Union flag flying for a final 16 minutes of indoor pomp...</p>
</story>
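Because the story is marked up by structure, a program can pull out each part directly. A minimal sketch using Python's standard-library ElementTree on the same story (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

story_xml = """<story>
  <metaDataField>The Guardian</metaDataField>
  <metaDataField>July 1, 1997</metaDataField>
  <metaDataField>Andrew Higgins in Hong Kong</metaDataField>
  <headLine>A last hurrah and an empire closes down</headLine>
  <p>With a clenched-jaw nod from the Prince of Wales, a last rendition of
  <title>God Save the Queen</title>, and a wind machine to keep the Union flag
  flying for a final 16 minutes of indoor pomp...</p>
</story>"""

story = ET.fromstring(story_xml)
metadata = [m.text for m in story.findall("metaDataField")]  # direct children
headline = story.findtext("headLine")
# .iter() searches all descendants, so the <title> inside <p> is found too
titles = [t.text for t in story.iter("title")]
print(metadata, headline, titles, sep="\n")
```

Note that the three metaDataField elements are distinguished only by their order; an attribute such as a hypothetical type="date" would make each one's role explicit.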
Where does XML come from? • write “raw” XML (we will do this) • XML editors • generated, eg from databases or by programs
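A sketch of generating XML from a program. The records stand in for rows fetched from a database and, like the lexicon/entry element names, are hypothetical. Building the tree with ElementTree rather than gluing strings together means reserved characters are escaped automatically:

```python
import xml.etree.ElementTree as ET

# Hypothetical records, eg rows fetched from a database
records = [
    {"id": "w1", "form": "dog", "pos": "noun"},
    {"id": "w2", "form": "sat", "pos": "verb"},
]

lexicon = ET.Element("lexicon")
for rec in records:
    # Keyword arguments become attributes of the new child element
    entry = ET.SubElement(lexicon, "entry", id=rec["id"], pos=rec["pos"])
    entry.text = rec["form"]

xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```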
What is XML used for? • any symbolic data • data exchange • data transformation • structure • format • content
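Data transformation can be sketched as mapping one vocabulary onto another. The Python example below turns hypothetical meaning-based elements into HTML <i>; it is a stand-in for illustration, since XSLT is the W3C language designed for this kind of transformation:

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical mapping from meaning-based element names to HTML presentation
TO_HTML = {"emphasis": "i", "bookTitle": "i", "foreignPhrase": "i"}

def to_html(root: ET.Element) -> str:
    """Return HTML for root with mapped elements renamed; root is not modified."""
    out = copy.deepcopy(root)
    for el in out.iter():
        if el.tag in TO_HTML:
            el.tag = TO_HTML[el.tag]
            el.attrib.clear()  # drop attributes such as lang that <i> does not use
    return ET.tostring(out, encoding="unicode")

source = ET.fromstring(
    "<p>the book <bookTitle>The Lawyer Who Lost</bookTitle>, about "
    '<foreignPhrase lang="la">habeas corpus</foreignPhrase></p>'
)
html = to_html(source)
print(html)
```

The transformation only works in one direction: once everything is <i>, a program can no longer tell a title from a foreign phrase.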
Why do I need to know about it? • you already consume a lot of XML! • many linguistic software tools (eg ELAN) use XML as their data format • XML is very powerful and flexible, especially for certain tasks, and for archiving • XML is an open standard (a W3C Recommendation, based on the ISO standard SGML) • XML is growing in use and support! • XML is easy!
XML exercise 1 James departed from Manila on Wednesday 11 May and arrived in Boston on Thursday 12 May. • identify the times and names and mark them up as XML • draw the result as a tree structure • then add more information to your XML as attributes • draw the tree structure again
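One possible answer, sketched in Python. The element names (sentence, name, time) and attributes (type, weekday, day, month) are hypothetical choices, not the only correct ones. The draw_tree helper prints one indented line per element; it shows each element's own text but not the words between elements (ElementTree keeps those in .tail):

```python
import xml.etree.ElementTree as ET
from typing import List

# One possible markup; element and attribute names are hypothetical
answer = ET.fromstring(
    '<sentence><name type="person">James</name> departed from '
    '<name type="city">Manila</name> on '
    '<time weekday="Wednesday" day="11" month="5">Wednesday 11 May</time> '
    'and arrived in <name type="city">Boston</name> on '
    '<time weekday="Thursday" day="12" month="5">Thursday 12 May</time>.'
    "</sentence>"
)

def draw_tree(el: ET.Element, depth: int = 0) -> List[str]:
    """Return one indented line per element: tag, attributes, and text."""
    attrs = "".join(f' {k}="{v}"' for k, v in el.attrib.items())
    text = (el.text or "").strip()
    lines = ["  " * depth + el.tag + attrs + (f": {text}" if text else "")]
    for child in el:
        lines.extend(draw_tree(child, depth + 1))
    return lines

print("\n".join(draw_tree(answer)))
```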
XML exercise 2 • draw a simple linguistic tree structure • represent it as XML
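A sketch for exercise 2: a small constituency tree for "the dog sat" written as XML, one element per node, then printed in labelled-bracket notation to show that both are the same tree. The category labels (S, NP, VP, Det, N, V) and the bracketed helper are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# A hypothetical constituency tree for "the dog sat"
tree = ET.fromstring(
    "<S><NP><Det>the</Det><N>dog</N></NP><VP><V>sat</V></VP></S>"
)

def bracketed(el: ET.Element) -> str:
    """Render the tree in labelled-bracket notation."""
    if len(el) == 0:  # a node with no children holds a word
        return f"[{el.tag} {el.text}]"
    return f"[{el.tag} " + " ".join(bracketed(c) for c in el) + "]"

print(bracketed(tree))
```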
Congratulations • you have now taken your first steps in XML