
Introduction to XML and TEI for Digital Archives


tait




Presentation Transcript


  1. Introduction to XML and TEI for Digital Archives: Entities, ROMA, the ODD, and Transcribing Manuscripts

  2. Character Entities
     Character entities let you refer by name to characters outside the basic ASCII set. Apart from the five entities that XML predefines, every named entity must be declared in a “DOCTYPE” declaration at the beginning of the document (below, “rootelement” stands for the name of the document’s root element). Two of the predefined characters must always be written as entities in text so that they aren’t mistaken for markup: “&” (“&amp;”) and “<” (“&lt;”).

     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE rootelement [
       <!ENTITY Aelig "&#198;">
       <!ENTITY thorn "&#254;">
       <!ENTITY aelig "&#230;">
     ]>
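A minimal sketch, using Python’s standard xml.etree.ElementTree module, of what a parser does with the DOCTYPE above: declared entities are expanded to their characters, the predefined “&amp;” and “&lt;” need no declaration, and an undeclared entity is rejected. The <w> elements, the sample words, and the helper name parse are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny document modelled on the slide's DOCTYPE. "rootelement" and the
# <w> children are placeholders for illustration, not TEI elements.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rootelement [
  <!ENTITY Aelig "&#198;">
  <!ENTITY thorn "&#254;">
  <!ENTITY aelig "&#230;">
]>
<rootelement><w>&Aelig;lfric</w><w>&thorn;e</w><w>A &amp; B &lt; C</w></rootelement>"""


def parse(text: str) -> ET.Element:
    # Hypothetical helper: encode to bytes so the parser reads the
    # encoding declaration itself.
    return ET.fromstring(text.encode("utf-8"))


root = parse(DOC)
for w in root.iter("w"):
    print(w.text)  # declared and predefined entities arrive as plain characters

# &eth; is never declared here, so the document is not well-formed.
try:
    parse("<rootelement>&eth;</rootelement>")
except ET.ParseError as err:
    print("rejected:", err)
```

The three words print as “Ælfric”, “þe”, and “A & B < C”; the final parse fails because no declaration for “eth” is in scope.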

  3. Character Entity Codes
     Characters can also be written as numeric character references, which any XML parser resolves without a DOCTYPE declaration. The simplest is the decimal form, &#1234;, where 1234 stands for the character’s Unicode code point written in decimal. The hexadecimal form is &#x1234;, with the code point written in hexadecimal. For example, “Æ” can be written either as &#198; or as &#xC6;.
     You can find lists of character entities online:
     http://www.w3schools.com/tags/ref_entities.asp
     An exhaustive series of code charts is available here:
     http://www.unicode.org/charts/
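The relationship between a character and its two numeric forms can be checked with a short Python sketch. The helper name char_refs is hypothetical; the characters are the three declared on slide 2.

```python
import xml.etree.ElementTree as ET


def char_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal numeric references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


for ch in "\u00c6\u00fe\u00e6":  # Æ, þ, æ
    print(ch, *char_refs(ch))

# Both forms name the same code point, and no DOCTYPE is needed to resolve them.
root = ET.fromstring("<r><dec>&#198;</dec><hex>&#xC6;</hex></r>")
print(root.find("dec").text == root.find("hex").text)
```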

  4. Marking Up Manuscripts
     There are a number of special tags designed to represent features of manuscripts. The most important of these mark abbreviations, additions, and deletions. Abbreviations can be marked with an <abbr> tag. Wrapping it in a <choice> tag together with an <expan> tag lets you record both the abbreviation and its expansion:

     <choice><abbr>HMS</abbr><expan>Her Majesty’s Ship</expan></choice>
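Because <choice> keeps both forms, one transcription can yield either a diplomatic reading (the abbreviation as written) or an expanded one. A minimal Python sketch, assuming un-namespaced elements as on the slides (a real TEI file places them in the TEI namespace, http://www.tei-c.org/ns/1.0, so tag names would need that prefix). The function name reading and the sample sentence are hypothetical.

```python
import xml.etree.ElementTree as ET


def reading(elem: ET.Element, prefer: str) -> str:
    """Flatten elem to text, resolving each <choice> to its <abbr> or <expan> child."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            picked = child.find(prefer)  # "abbr" or "expan"
            parts.append(reading(picked, prefer) if picked is not None else "")
        else:
            parts.append(reading(child, prefer))
        parts.append(child.tail or "")  # text that follows the child
    return "".join(parts)


line = ET.fromstring(
    "<p>Aboard <choice><abbr>HMS</abbr><expan>Her Majesty\u2019s Ship</expan></choice> Beagle</p>"
)
print(reading(line, "abbr"))   # diplomatic reading
print(reading(line, "expan"))  # expanded reading
```

The first call gives “Aboard HMS Beagle” and the second “Aboard Her Majesty’s Ship Beagle”.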

  5. Manuscript Additions and Deletions
     Manuscript additions are shown using (surprise surprise!) an <add> tag. Several attributes can be added to it, the most useful of which is the “place” attribute, used to indicate where the addition appears:

     <add place="margin">Stuff added</add>

     Manuscript deletions are shown in a similar fashion, using a <del> tag, with the “rend” attribute recording how the deletion was made:

     <del rend="crossout">Stuff that’s been deleted</del>
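Marking additions and deletions explicitly means a processor can reconstruct the text both before and after revision. A minimal Python sketch under that reading; the function name layer and the sample line are hypothetical.

```python
import xml.etree.ElementTree as ET


def layer(elem: ET.Element, drop: str) -> str:
    """Flatten elem to text, skipping every element whose tag is `drop`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != drop:
            parts.append(layer(child, drop))
        # The tail follows the child in the source, so keep it either way.
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(
    '<l>The <del rend="crossout">old</del><add place="margin">new</add> text</l>'
)
print(layer(line, drop="add"))  # before revision: deletions kept, additions dropped
print(layer(line, drop="del"))  # after revision: deletions dropped, additions kept
```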

  6. Combined Additions and Deletions
     Where additions and deletions occur together, as in the substitution of one word for another, they can be nested within a <subst> tag:

     <subst>
       <del rend="strikethrough">hastle</del>
       <add place="overline">hassle</add>
     </subst>
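Grouping the two halves in <subst> makes each substitution recoverable as a single before/after pair, which is handy when reviewing a transcription. A minimal Python sketch; the wrapping <p> and the function name substitutions are hypothetical.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<p>
  <subst>
    <del rend="strikethrough">hastle</del>
    <add place="overline">hassle</add>
  </subst>
</p>"""


def substitutions(root: ET.Element) -> list[tuple[str, str]]:
    """Pair the deleted and the added text inside every <subst>."""
    pairs = []
    for subst in root.iter("subst"):
        deleted = " ".join("".join(d.itertext()).strip() for d in subst.findall("del"))
        added = " ".join("".join(a.itertext()).strip() for a in subst.findall("add"))
        pairs.append((deleted, added))
    return pairs


for old, new in substitutions(ET.fromstring(FRAGMENT)):
    print(f"{old} -> {new}")
```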

  7. Indicating Responsibility
     There are several forms of “responsibility” that you may wish to indicate when tagging manuscripts. Most obviously, you may wish to record your best estimate of who made the additions and deletions in the manuscript. The best way to do this is to declare each hand once in the document header, inside the “profile description,” giving it an “xml:id” attribute, and then to point to that identifier wherever the hand appears. The <profileDesc> element “provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.”

  8. Indicating Responsibility II
     The <handNotes> element appears in the <profileDesc>, which is in turn a child of the <teiHeader> element:

     <teiHeader>
       …
       <profileDesc>
         <handNotes>
           <handNote xml:id="RH" scribe="RobertHooke" script="handwritten" medium="pen">
             <p>the document's main hand, Robert Hooke</p>
           </handNote>
         </handNotes>
       </profileDesc>
       …
     </teiHeader>
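Once hands are declared this way, a script can collect them into a lookup table. A minimal Python sketch, using the header above without its “…” elisions and with un-namespaced elements. Note that ElementTree reports xml:id under the XML namespace URI, hence the XML_ID constant; the function name hand_registry is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id in Clark notation, keyed by the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

HEADER = """<teiHeader>
  <profileDesc>
    <handNotes>
      <handNote xml:id="RH" scribe="RobertHooke" script="handwritten" medium="pen">
        <p>the document's main hand, Robert Hooke</p>
      </handNote>
    </handNotes>
  </profileDesc>
</teiHeader>"""


def hand_registry(header: ET.Element) -> dict[str, dict[str, str]]:
    """Map each handNote's xml:id to its remaining attributes."""
    registry = {}
    for note in header.iter("handNote"):
        attrs = {k: v for k, v in note.attrib.items() if k != XML_ID}
        registry[note.get(XML_ID)] = attrs
    return registry


print(hand_registry(ET.fromstring(HEADER)))
```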

  9. Indicating Responsibility III
     Once you’ve established the identity of your “hands,” you can refer to them in your markup of the manuscript’s deletions and additions by giving the hand’s xml:id, prefixed with “#”, in the “hand” attribute:

     <subst>
       <del hand="#RH" rend="strikethrough">hastle</del>
       <add hand="#RH" place="overline">hassle</add>
     </subst>
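Because each “#RH” is a pointer to a declared xml:id, a simple consistency check can catch hands that were referenced but never declared. A minimal Python sketch on a hypothetical, stripped-down document; the marginal <add hand="#XX"> is a deliberately broken reference added for illustration, and the function name check_hands is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<TEI>
  <teiHeader><profileDesc><handNotes>
    <handNote xml:id="RH" scribe="RobertHooke"/>
  </handNotes></profileDesc></teiHeader>
  <text><body><p>
    <subst>
      <del hand="#RH" rend="strikethrough">hastle</del>
      <add hand="#RH" place="overline">hassle</add>
    </subst>
    <add hand="#XX" place="margin">note</add>
  </p></body></text>
</TEI>"""


def check_hands(root: ET.Element) -> list[tuple[str, str]]:
    """Return (tag, hand) pairs whose hand pointer matches no declared handNote."""
    declared = {note.get(XML_ID) for note in root.iter("handNote")}
    problems = []
    for elem in root.iter():
        ref = elem.get("hand")
        if ref is None:
            continue
        # A local pointer is "#" followed by the target's xml:id.
        target = ref[1:] if ref.startswith("#") else None
        if target not in declared:
            problems.append((elem.tag, ref))
    return problems


print(check_hands(ET.fromstring(DOC)))
```

Only the marginal addition is reported, since “XX” matches no <handNote>.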

  10. Showing Restored Text
      In some instances, there may be an indication that deleted text has been “restored.” You can record this as well, using a <restore> tag:

      <restore hand="#RH">
        <subst>
          <del rend="crossout">out</del>
          <add place="margin">off</add>
        </subst>
      </restore>

      This indicates that the deletion was later cancelled: the deleted text has been deemed acceptable after all, and is again a legitimate part of the “reading” of the text.
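A minimal Python sketch of how a processor might compute the current reading once <restore> is involved. The policy is an assumption, not something the slide fixes: outside <restore>, <del> text is dropped and <add> text kept; inside <restore>, the deletion counts as cancelled, so the <del> text is kept and the <add> that replaced it is dropped. The sample sentence and the function name reading are hypothetical.

```python
import xml.etree.ElementTree as ET


def reading(elem: ET.Element, restored: bool = False) -> str:
    """Flatten elem to its current reading under an assumed restore policy."""
    parts = [elem.text or ""]
    for child in elem:
        # Everything below a <restore> is treated as restored (assumption).
        inside = restored or child.tag == "restore"
        # Outside <restore> deletions vanish; inside it the replacing <add> does.
        skip = (child.tag == "del" and not inside) or (child.tag == "add" and inside)
        if not skip:
            parts.append(reading(child, inside))
        parts.append(child.tail or "")
    return "".join(parts)


SUBST = '<subst><del rend="crossout">out</del><add place="margin">off</add></subst>'
without_restore = ET.fromstring(f"<l>Take it {SUBST} now</l>")
with_restore = ET.fromstring(f'<l>Take it <restore hand="#RH">{SUBST}</restore> now</l>')

print(reading(without_restore))  # the substitution stands
print(reading(with_restore))     # the substitution is cancelled
```

Under this policy the first line reads “Take it off now” and the second “Take it out now”.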
