
The TEI Gaiji module: Representing non-standard characters and glyphs

M. J. Driscoll, Arnamagnæan Institute

Presentation Transcript


  1. The TEI Gaiji module: Representing non-standard characters and glyphs (M. J. Driscoll, Arnamagnæan Institute)

  2. Using Unicode
     • In most cases, Unicode already covers most of the characters that scholars need for transcribing texts in most writing systems.
     • There are, however, many characters and uncommon glyphs that have yet to make it into Unicode.
     • Moreover, one may wish to record variants of a single character to facilitate scribal identification, for statistical purposes, or simply to reproduce the original as closely as possible.
     • The TEI ‘Gaiji’ module provides a means of doing this.

  3. Phonemes, characters and glyphs
     • The phoneme /a/ can be represented in different ways, but in most cases by the character <a> (in lower case): Unicode code point U+0061 in the Latin alphabet, or U+0430 in the Cyrillic alphabet.
     • The following are all glyphs of <a>:
     • Characters like <a> can also be referred to as graphemes, and glyphs as allographs of those graphemes.
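To make the distinction between a character and its code point concrete, the short Python sketch below (an illustration added here, using only the standard unicodedata module) shows that the Latin and Cyrillic letters, though they look alike, are different characters with different Unicode names:

```python
import unicodedata

# Two visually identical lower-case letters from different scripts.
latin_a = "\u0061"
cyrillic_a = "\u0430"

for ch in (latin_a, cyrillic_a):
    # Print the code point in U+XXXX notation together with its Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0061 LATIN SMALL LETTER A
# U+0430 CYRILLIC SMALL LETTER A

# Same appearance, but not the same character.
print(latin_a == cyrillic_a)  # False
```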

  4. Characters vs. glyphs
     • Particularly in older documents, a phoneme can be represented by several different characters or combinations of characters.
     • In medieval Icelandic manuscripts, for example, the following characters can be used to represent /á/ (long /a/):
     • For each of these characters, various glyphs may occur:
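The sequences in the sketch below are chosen for illustration and are not necessarily those on the original slide. They show that one long vowel can correspond to several different character sequences, and that Unicode normalisation (NFC) treats only canonically equivalent sequences as the same text: a doubled vowel and the aa ligature remain distinct characters.

```python
import unicodedata

# Illustrative spellings of long /a/ (an assumption, not the slide's own list).
SPELLINGS = {
    "precomposed a-acute": "\u00e1",   # á as a single code point
    "a + combining acute": "a\u0301",  # a followed by U+0301 COMBINING ACUTE ACCENT
    "doubled a": "aa",                 # two ordinary letters
    "aa ligature": "\ua733",           # U+A733 LATIN SMALL LETTER AA
}

def same_text(x: str, y: str) -> bool:
    """True if x and y are canonically equivalent (identical after NFC)."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

for label, s in SPELLINGS.items():
    codes = " ".join(f"U+{ord(c):04X}" for c in s)
    print(f"{label}: {codes}")

print(same_text(SPELLINGS["precomposed a-acute"], SPELLINGS["a + combining acute"]))  # True
print(same_text(SPELLINGS["doubled a"], SPELLINGS["aa ligature"]))                   # False
```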

  5. Variant letter forms
     • Variant letter forms (glyphs) are often distinguished in diplomatic transcriptions of manuscripts (and early printed materials). For Icelandic sources, such variant forms include:
        ◦ high and round s
        ◦ ordinary and round r (r-rotunda)
        ◦ ordinary and round d
        ◦ ordinary and insular forms of f and v
        ◦ dotted and dotless i
        ◦ small capitals, used originally to denote geminates (principally N and R, but occasionally also D, G, M, S and T)
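Several of the variant forms listed above already have their own Unicode code points. The sketch below is an illustrative table (the selection is an assumption; round d and insular v are simply left out) that prints each form's code point and Unicode name next to the standard letter it varies:

```python
import unicodedata

# Hypothetical selection of variant forms, paired with the standard letter.
VARIANTS = {
    "\u017f": "s",  # long (high) s
    "\ua75b": "r",  # r rotunda
    "\ua77c": "f",  # insular f
    "\u0131": "i",  # dotless i
    "\u0274": "n",  # small capital N
    "\u0280": "r",  # small capital R
}

for variant, base in VARIANTS.items():
    # Code point, the form itself, its official Unicode name, and its base letter.
    print(f"U+{ord(variant):04X}  {variant}  {unicodedata.name(variant)}  -> {base}")
```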

  6. Retaining special characters and glyphs
     • In a strictly diplomatic transcription, variant letter forms such as high s, undotted i, small capital r and so on are retained.

  7. Semi- and fully normalised transcriptions
     • In a semi-normalised transcription most, and in a fully normalised transcription all, of these variant letter forms are replaced with their standard equivalents.
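A minimal sketch of the three levels, assuming a hypothetical project table that maps variant forms to standard letters: the diplomatic level leaves the text alone, the fully normalised level replaces every listed variant, and the semi-normalised level replaces all but a project-chosen subset (here, purely for illustration, insular f is kept).

```python
# Hypothetical table mapping variant letter forms to standard equivalents.
STANDARD = {
    "\u017f": "s",  # long (high) s
    "\ua75b": "r",  # r rotunda
    "\ua77c": "f",  # insular f
    "\u0131": "i",  # dotless i
}

def transcribe(text: str, level: str, keep: frozenset = frozenset()) -> str:
    """Render text at the 'dipl', 'semi' or 'norm' level.

    'dipl' keeps every variant form, 'norm' replaces all of them, and
    'semi' replaces all except those listed in `keep` (a project decision).
    """
    if level == "dipl":
        return text
    if level == "norm":
        keep = frozenset()
    elif level != "semi":
        raise ValueError(f"unknown level: {level!r}")
    # str.translate expects a mapping from code point (int) to replacement string.
    table = {ord(v): s for v, s in STANDARD.items() if v not in keep}
    return text.translate(table)

word = "\ua77ca\ua75b"  # insular f + a + r rotunda
print(transcribe(word, "dipl"))                              # ꝼaꝛ
print(transcribe(word, "semi", keep=frozenset({"\ua77c"})))  # ꝼar
print(transcribe(word, "norm"))                              # far
```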

  8. Defining characters and glyphs
     • Using the ‘Gaiji’ module, one can encode characters or glyphs by defining them in one or more <charDecl> (‘character declarations’) elements in the TEI header and then referring to them with the <g> element in the body of the text.
     • Within <charDecl>, one uses either the <char> element to define a new character, or <glyph> to define a glyph of an existing character.
     • Within these, several sub-elements are available, including:
        ◦ <charName>/<glyphName> contains the name of the character or glyph, expressed following Unicode conventions.
        ◦ <charProp> provides a name and value for some property of the character or glyph, in keeping with Unicode conventions and/or according to some locally defined scheme.
        ◦ <mapping> contains one or more mappings for the character or glyph, in accordance with some typology, specified by the @type attribute.
        ◦ <graphic> can be used to provide a picture, in some suitable format, of the character or glyph.
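Because <charDecl> is ordinary XML in the TEI namespace, its declarations can be read with any XML library. The following Python sketch, using the standard xml.etree.ElementTree module, collects each <char> or <glyph> into a dictionary keyed by @xml:id; the declaration itself (IDs, names and mapping types) is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
NS = {"tei": TEI}

# A hypothetical declaration; the IDs, names and mapping types are illustrative.
CHAR_DECL = """\
<charDecl xmlns="http://www.tei-c.org/ns/1.0">
  <char xml:id="vdot">
    <charName>LATIN SMALL LETTER V WITH DOT ABOVE</charName>
    <mapping type="standard">v\u0307</mapping>
  </char>
  <glyph xml:id="slong">
    <glyphName>LATIN SMALL LETTER LONG S</glyphName>
    <mapping type="dipl">\u017f</mapping>
    <mapping type="norm">s</mapping>
  </glyph>
</charDecl>
"""

def read_char_decl(xml_text: str) -> dict:
    """Return {xml:id: {'kind', 'name', 'mappings'}} for each <char>/<glyph>."""
    root = ET.fromstring(xml_text)
    decls = {}
    for kind in ("char", "glyph"):
        for el in root.iter(f"{{{TEI}}}{kind}"):
            # <charName> for characters, <glyphName> for glyphs.
            name = el.findtext(f"tei:{kind}Name", default="", namespaces=NS)
            mappings = {m.get("type"): m.text for m in el.findall("tei:mapping", NS)}
            decls[el.get(XML_ID)] = {"kind": kind, "name": name, "mappings": mappings}
    return decls

for xml_id, info in read_char_decl(CHAR_DECL).items():
    print(xml_id, info)
```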

  9. Defining characters
     • A new character can be defined and assigned to a position in the Unicode Private Use Area (PUA), and/or described in terms of Unicode combining characters:
     • The use of entities (e.g. &aacute; for <á>) is now discouraged in XML, since apart from the five predefined entities every named entity has to be declared in a DTD. A human-readable, entity-like name can, however, be used as the value of the @xml:id attribute, rather than, say, the Unicode code point. This is then referred to in the body of the text as the value of the @ref attribute on <g>:
     • <g ref="#vdot"/>
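As a sketch of the two options named above, the function below builds a <char> that is both assigned a Private Use Area code point and described as a combining sequence. The values are assumptions for illustration: U+E000 is simply the first PUA code point, not an assignment from any registry, and the mapping types ‘PUA’ and ‘standard’ are locally chosen labels.

```python
import unicodedata
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI)  # serialise TEI elements without a prefix

def define_char(char_id: str, name: str, pua_code_point: int, standard: str) -> ET.Element:
    """Build a TEI <char> with a PUA mapping and a combining-sequence mapping."""
    # Unicode general category "Co" marks private-use code points.
    if unicodedata.category(chr(pua_code_point)) != "Co":
        raise ValueError(f"U+{pua_code_point:04X} is not a private-use code point")
    char = ET.Element(f"{{{TEI}}}char", {f"{{{XML_NS}}}id": char_id})
    ET.SubElement(char, f"{{{TEI}}}charName").text = name
    ET.SubElement(char, f"{{{TEI}}}mapping", type="PUA").text = chr(pua_code_point)
    ET.SubElement(char, f"{{{TEI}}}mapping", type="standard").text = standard
    return char

# Hypothetical values: v + U+0307 COMBINING DOT ABOVE, parked at U+E000.
vdot = define_char("vdot", "LATIN SMALL LETTER V WITH DOT ABOVE", 0xE000, "v\u0307")
print(ET.tostring(vdot, encoding="unicode"))
```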

  10. Defining glyphs
     • Glyphs are defined in the same way:
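Glyph declarations have the same shape as character declarations, so a project can also generate them from a table. The sketch below (the IDs and the table are hypothetical) takes each glyph's name from the Unicode character database, following Unicode naming conventions, and records ‘dipl’ and ‘norm’ mappings of the kind used on slide 12.

```python
import unicodedata
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical project table: xml:id -> (variant form, standard letter).
GLYPHS = {
    "slong": ("\u017f", "s"),
    "rrot": ("\ua75b", "r"),
    "inodot": ("\u0131", "i"),
}

def build_glyph_decl(glyphs: dict) -> ET.Element:
    """Build a <charDecl> with one <glyph> per table entry."""
    decl = ET.Element(f"{{{TEI}}}charDecl")
    for glyph_id, (form, standard) in glyphs.items():
        glyph = ET.SubElement(decl, f"{{{TEI}}}glyph", {XML_ID: glyph_id})
        # Name the glyph after the Unicode name of its variant form.
        ET.SubElement(glyph, f"{{{TEI}}}glyphName").text = unicodedata.name(form)
        ET.SubElement(glyph, f"{{{TEI}}}mapping", type="dipl").text = form
        ET.SubElement(glyph, f"{{{TEI}}}mapping", type="norm").text = standard
    return decl

decl = build_glyph_decl(GLYPHS)
print(ET.tostring(decl, encoding="unicode"))
```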

  11. Using <g> in the text
     • The characters and glyphs are then invoked in the text using <g>:
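Once the text refers to declarations through @ref, a simple consistency check is to confirm that every <g> points at a declared <char> or <glyph>. In the sketch below the document is a hypothetical, deliberately minimal fragment rather than schema-valid TEI, and only local "#id" references are resolved; it reports the references that have no declaration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI}

# Hypothetical minimal document; #rrot is deliberately left undeclared.
DOC = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <charDecl>
        <glyph xml:id="slong"><mapping type="dipl">\u017f</mapping></glyph>
      </charDecl>
    </encodingDesc>
  </teiHeader>
  <text><body><p><g ref="#slong">s</g>at <g ref="#rrot">r</g></p></body></text>
</TEI>
"""

def undeclared_refs(xml_text: str) -> list:
    """Return the @ref values on <g> that match no declaration's @xml:id."""
    root = ET.fromstring(xml_text)
    declared = {el.get(XML_ID) for el in root.iterfind(".//tei:charDecl/*", NS)}
    missing = []
    for g in root.iterfind(".//tei:g[@ref]", NS):
        ref = g.get("ref")
        # Anything that is not a local "#id" pointer is reported as unresolved.
        if not ref.startswith("#") or ref[1:] not in declared:
            missing.append(ref)
    return missing

print(undeclared_refs(DOC))  # ['#rrot']
```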

  12. Generating multi-level transcriptions
     • Using mark-up like this, multi-level transcriptions – from strictly diplomatic to fully normalised – can easily be generated from a single encoded text by choosing the <reg> or <orig> form along with the relevant mapping (‘dipl’ or ‘norm’):
     • <w><choice><reg>sat</reg><orig><g ref="#slong"/>at</orig></choice></w>
     • Default values can also be built into the encoding, with the content of <g> supplying the default rendering:
     • <w><g ref="#slong">s</g>at</w>
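The selection logic described above can also be sketched in Python. This is an illustrative stand-in, not a translation of any published stylesheet: it flattens a <w> to plain text, choosing <orig> or <reg> inside <choice> and the ‘dipl’ or ‘norm’ mapping for each <g>, and falling back to the content of <g> when no mapping applies. The declaration is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI}

# Hypothetical declaration of long s with 'dipl' and 'norm' mappings.
DECLS = ET.fromstring("""\
<charDecl xmlns="http://www.tei-c.org/ns/1.0">
  <glyph xml:id="slong">
    <mapping type="dipl">\u017f</mapping>
    <mapping type="norm">s</mapping>
  </glyph>
</charDecl>""")
BY_ID = {el.get(XML_ID): el for el in DECLS}

def render(el: ET.Element, level: str) -> str:
    """Flatten a TEI element to plain text at level 'dipl' or 'norm'."""
    if level not in ("dipl", "norm"):
        raise ValueError(f"unknown level: {level!r}")
    parts = [el.text or ""]
    for child in el:
        tag = child.tag.split("}")[-1]  # local name without the namespace
        if tag == "choice":
            # <orig> holds the diplomatic reading, <reg> the normalised one.
            picked = child.find("tei:orig" if level == "dipl" else "tei:reg", NS)
            if picked is not None:
                parts.append(render(picked, level))
        elif tag == "g":
            decl = BY_ID.get(child.get("ref", "").lstrip("#"))
            mapping = None
            if decl is not None:
                mapping = decl.find(f"tei:mapping[@type='{level}']", NS)
            # Without a matching mapping, fall back to the content of <g> itself.
            parts.append(mapping.text if mapping is not None else (child.text or ""))
        else:
            parts.append(render(child, level))
        parts.append(child.tail or "")
    return "".join(parts)

W = '<w xmlns="http://www.tei-c.org/ns/1.0">{}</w>'
explicit = ET.fromstring(W.format('<choice><reg>sat</reg><orig><g ref="#slong"/>at</orig></choice>'))
implicit = ET.fromstring(W.format('<g ref="#slong">s</g>at'))
for w in (explicit, implicit):
    print(render(w, "dipl"), render(w, "norm"))  # prints "ſat sat" for both
```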

  13. Including character declarations in the header
     • Character declarations may be integrated into <encodingDesc> in two separate ways:
        ◦ directly, as XML elements;
        ◦ XIncluded from another location.
     • Which of these methods is used depends on the circumstances of a particular project. The advantage of using XInclude is that character declarations are always drawn from a single external source, separate from the XML documents that use them, so they need not be added manually to each document to which they apply. In this way, character declarations function essentially as an authority file. For a project with many documents, this can facilitate management, reduce redundancy and reduce the incidence of errors.
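Python's standard library can resolve XInclude through xml.etree.ElementInclude. In the sketch below the shared declarations file, its name charDecl.xml and the in-memory loader are all hypothetical; a real project would normally let the default loader read the file from disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}

# Hypothetical shared "authority file" of character declarations.
SOURCES = {
    "charDecl.xml": """\
<charDecl xmlns="http://www.tei-c.org/ns/1.0">
  <glyph xml:id="slong"><mapping type="dipl">\u017f</mapping></glyph>
</charDecl>""",
}

# An <encodingDesc> that pulls the declarations in via XInclude.
HEADER = """\
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0"
              xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="charDecl.xml" parse="xml"/>
</encodingDesc>"""

def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude that serves documents from SOURCES."""
    if parse != "xml":
        raise ValueError(f"only parse='xml' is handled here, got {parse!r}")
    return ET.fromstring(SOURCES[href])

header = ET.fromstring(HEADER)
# Replaces each xi:include child with the element returned by the loader.
ElementInclude.include(header, loader=load_from_memory)
print(ET.tostring(header, encoding="unicode"))
```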

  14. Processing <g> elements
     <!-- assumes XSLT 2.0 (for doc()) and the tei prefix bound to http://www.tei-c.org/ns/1.0 on xsl:stylesheet -->
     <xsl:template match="tei:g[@ref]">
       <!-- load the character declarations from the external file into a variable -->
       <xsl:variable name="charDecls" select="doc('encodingDesc_fasnl_mss.xml')/descendant::tei:charDecl"/>
       <!-- load the value of @ref into a variable, chopping the leading '#' -->
       <xsl:variable name="href" select="substring-after(@ref, '#')"/>
       <!-- look up the <char> or <glyph> whose @xml:id matches @ref -->
       <xsl:variable name="g" select="($charDecls//*[@xml:id = $href])[1]"/>
       <!-- determine how we will process the element -->
       <xsl:choose>
         <xsl:when test="exists($g)">
           <!-- lookup was successful: emit both mappings as HTML -->
           <span class="g">
             <span class="g_dipl"><xsl:value-of select="$g/tei:mapping[@type = 'dipl']"/></span>
             <span class="g_norm"><xsl:value-of select="$g/tei:mapping[@type = 'norm']"/></span>
           </span>
         </xsl:when>
         <!-- fallback: no matching declaration, so output a '?' placeholder -->
         <xsl:otherwise>
           <xsl:text>?</xsl:text>
         </xsl:otherwise>
       </xsl:choose>
     </xsl:template>
