Text Encoding Initiative
The XML Version of the TEI Guidelines
<character>
<character> | defines one unit in a writing system, supplementing or overriding information provided in the base coded character sets, writing system declarations, and entity sets.
Attributes |
(In addition to global attributes)
Example |
Note |
The notion of `characters' as units in a writing system is widespread, but not consistently defined; the <character> element should be used to identify whatever units the encoder wishes to distinguish as the meaningfully distinct graphic units of the writing system. In most cases, these will correspond to the units of coded character sets, but this is not a requirement: a-umlaut, for example, may be treated as one character or two, depending on the user's preference, regardless of how the coded character set in use treats it.
In most cases, also, the units distinguished by the <character> element will be the `graphemic' units of the writing system in question. However, experts disagree on whether items like umlaut (let alone a given set of Chinese characters with regional variations in China, Korea, and Japan) are best treated as distinct graphemes or not. The association of <character> elements with the graphemes of a writing system is therefore at most a heuristic device for making reasonable decisions, rather than a definitive, unambiguous test.
Different forms of the same `character' may be distinguished for whatever reason, as in the three-R example of chapter 4 Languages and Character Sets. In this case the different letter forms are distinguished by documenting them in different <form> elements; the fact that the different letter shapes do not make a lexical difference in the text may be expressed by grouping all three letter forms under the same <character> element. (Alternatively, the three forms may be treated as three distinct characters, for convenience or for whatever reason, by defining a distinct <character> element for each.)
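The two groupings described in this note might be sketched as follows. This is an illustration only, not the example from chapter 4: the description texts are placeholders, and the use of <desc> inside <form> is an assumption, since this entry does not give the content model of <form>.

```xml
<!-- Option 1: one character, three forms.
     Grouping the forms says the letter shapes make no lexical difference. -->
<character class="lexical">
  <desc>letter R (placeholder description)</desc>
  <!-- desc inside form is assumed; the descriptions are hypothetical -->
  <form><desc>first form of R</desc></form>
  <form><desc>second form of R</desc></form>
  <form><desc>third form of R</desc></form>
</character>

<!-- Option 2: three distinct characters, one form each. -->
<character class="lexical">
  <desc>first form of R</desc>
  <form><desc>first form of R</desc></form>
</character>
<character class="lexical">
  <desc>second form of R</desc>
  <form><desc>second form of R</desc></form>
</character>
<character class="lexical">
  <desc>third form of R</desc>
  <form><desc>third form of R</desc></form>
</character>
```

Either option satisfies the requirement that every <character> contain at least one <form>; the choice between them records an encoding decision, not a fact about the coded character set.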
Module | Declared in file teiwsd2; Auxiliary tag set for Writing System Declarations
Data Description | May contain an optional series of <desc> elements, followed by one or more <form> elements identifying different forms of the character, and an optional series of <note> elements.
May contain | desc form note
May occur within |
Declaration |
<!ELEMENT character %om.RO; (desc*, form+, note*)>
<!ATTLIST character
      %a.global;
      class (lexical | punc | lexpunc | digit | space | DL | LD | dia | joiner | other) "lexical">
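As a rough sketch of what this declaration permits, the fragment below shows a <character> that relies on the default value of class and one that sets it explicitly. The description and note texts are hypothetical, and placing <desc> inside <form> is an assumption, because the content of <form> is not declared in this entry.

```xml
<!-- class omitted: the declared default "lexical" applies -->
<character>
  <form><desc>hypothetical letter form</desc></form>
</character>

<!-- class given explicitly; children follow the order desc*, form+, note* -->
<character class="punc">
  <desc>full stop (placeholder description)</desc>
  <form><desc>hypothetical form description</desc></form>
  <note>Hypothetical note. Notes come after all form elements.</note>
</character>
```

Only <form> is required. A <character> with no <form>, or with a <note> before a <form>, would not match the content model (desc*, form+, note*).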
See further | 25.4.2 Exceptions in the WSD