string |
gives the byte string used to encode the letter form in the text. |
|
Datatype: CDATA |
|
Values: any string of characters (often a single byte) |
|
Default: #IMPLIED |
|
Example: <form string="a/">
<desc>lowercase Greek alpha with acute accent</desc>
</form>
|
Note |
If the character is encoded only using entity
references, then the value of string should be ''
(the empty string).
In coded character sets which use character-set shifting (e.g. JIS
0208), the string attribute should typically contain the
required shift characters, in order to render the value unambiguous. In
such a case, there is no expectation that every occurrence of the
character will be immediately preceded by the shift sequence; processing
software is responsible for understanding the shift mechanism and acting
accordingly.
The same string value may not appear on more than one <form>
element (except the empty string), unless each occurrence is
associated with a different coded character set.
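
The uniqueness rule just stated can be checked mechanically. The following is a minimal sketch only: the function name duplicate_strings and the tuple layout (form identifier, string value, codedCharSet value) are assumptions made for illustration, not part of the writing system declaration itself.

```python
from collections import defaultdict


def duplicate_strings(forms):
    """Return the (string, codedCharSet) pairs used by more than one form.

    `forms` is a list of (form_id, string, coded_char_set) tuples.
    This layout is hypothetical; it only mirrors the attributes above.
    """
    seen = defaultdict(list)
    for form_id, string, charset in forms:
        if string:  # the empty string (or an absent value) is exempt
            seen[(string, charset)].append(form_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


forms = [
    ("f1", "a/", "cs1"),
    ("f2", "a/", "cs2"),  # same string, different coded character set: allowed
    ("f3", "a/", "cs1"),  # clashes with f1
    ("f4", "", "cs1"),
    ("f5", "", "cs1"),    # empty strings may repeat
]
print(duplicate_strings(forms))  # {('a/', 'cs1'): ['f1', 'f3']}
```

Keying on the pair of string and coded character set, rather than on the string alone, is what lets the same byte string recur in different coded character sets.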
|
codedCharSet |
(coded character set)
specifies which base coded character set the string
value occurs in. |
|
Datatype: IDREF |
|
Values: a reference to the identifier of a <codedCharSet>
element in the current writing system declaration. |
|
Default: #IMPLIED |
|
Example:
|
Note |
If more than one <codedCharSet> is specified as a
base component of the writing system declaration, then it is expected
that character-set shifting is in use, as described in ISO 2022 or some
equivalent. In this case, each <form> element which has a value
for the string attribute should also identify, by means of
the codedCharSet attribute, which coded
character set actually contains the string in question. Proper shifting
among character sets is the responsibility of the user.
|
entityStd |
(standard entity name)
gives the name of one or more entities defined for this character
form in some standard entity set(s). |
|
Datatype: ENTITIES |
|
Values: One or more valid SGML entity names declared in the document
type definition of the WSD; the entity must also be included in an
entity set mentioned in an <entitySet> declaration in the current
writing system declaration or in some base writing system referred to by
a <baseWsd> element. |
|
Default: #IMPLIED |
|
Example: <form entityStd="thorn">
<desc>lowercase Old English or Icelandic thorn</desc>
</form>
|
Note |
If the same letter form is defined by more than
one public entity set, more than one value may appear in this
attribute.
The same entity name may not appear in the entityStd or
entityLoc attributes of more than one <form> element.
|
entityLoc |
(local entity name)
gives one or more entity names used locally for this character
form. |
|
Datatype: ENTITIES |
|
Values: One or more valid SGML entity names declared in the document
type definition of the WSD; the entity must also be included in an
entity set mentioned in an <entitySet> declaration in the current
writing system declaration or in some base writing system referred to by
a <baseWsd> element. |
|
Default: #IMPLIED |
|
Example: <form entityStd="thorn" entityLoc="t">
<desc>lowercase Old English or Icelandic thorn</desc>
<note>The standard entity name is <ident>thorn</ident>; the local
entity <ident>t</ident>
is used for brevity and legibility.</note>
</form>
|
Note |
The same entity name may not appear in the entityStd or
entityLoc attributes of more than one <form> element.
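
The constraint that no entity name may be shared between <form> elements, whether it appears in entityStd or in entityLoc, could be checked along the following lines. This is a sketch under stated assumptions: the function shared_entity_names, the form identifiers, and the dictionary layout are hypothetical, and names are compared exactly as written (case-sensitively).

```python
def shared_entity_names(forms):
    """Map each entity name to the forms using it, when more than one does.

    `forms` maps a hypothetical form identifier to a pair
    (entityStd value, entityLoc value). Each value is a blank-separated
    list of names, as for the ENTITIES datatype, or None if absent.
    """
    users = {}
    for form_id, (std, loc) in forms.items():
        # A set, so a name given in both attributes of one form counts once.
        names = set((std or "").split()) | set((loc or "").split())
        for name in names:
            users.setdefault(name, []).append(form_id)
    return {name: ids for name, ids in users.items() if len(ids) > 1}


forms = {
    "thorn-lc": ("thorn", "t"),
    "thorn-uc": ("THORN", "T"),
    "t-lc": (None, "t"),  # reuses the local name "t": not permitted
}
print(shared_entity_names(forms))  # {'t': ['thorn-lc', 't-lc']}
```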
|
ucs-4 |
(universal-character-set code)
gives the position of the character form in the thirty-two bit
`universal character set' defined by ISO 10646. |
|
Datatype: CDATA
|
|
Values: one or more sets of two or four two-digit hexadecimal numbers
giving a valid ISO 10646 code point for the character form; for
legibility the two-digit hexadecimal numbers should be separated by
hyphens. If more than one UCS-4 code is associated with a given
character form, the codes should be given separated by blanks.
If the character form is associated with a sequence of UCS-4 codes (e.g.
a base character followed by one or more non-spacing diacritics), then
the components of the sequence should be separated by +.
|
|
Default: #IMPLIED |
|
Example:
|
Note |
The same UCS-4 code (or sequence) may not appear within more
than one <character> element within the writing system
declaration. It may however appear on several forms of the same
character.
Multiple UCS-4 codes can be given for a single character; this allows
sequences treated as distinct by ISO 10646 to be documented as referring
to a single `character' as defined by the WSD (e.g.
‘lowercase a-umlaut’ and ‘lowercase a’ plus ‘umlaut’).
If a single UCS-4 code is to be treated as relating to two distinct
`characters' as defined by the WSD (e.g. to reverse
the effects of Han unification on some character), then one of the
<character> elements should be associated with the UCS-4 code in
the normal way, and the others should call attention to the relevant
UCS-4 code by a comment in a <note> element.
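
The value syntax described for this attribute (blank-separated alternatives, each a +-separated sequence of hyphenated hexadecimal codes) can be illustrated with a small parser. This is a sketch, not a normative definition: the function names parse_code and parse_ucs4 are hypothetical, and the sketch assumes that the hyphens are always present, although the text only recommends them for legibility.

```python
import re

_HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def parse_code(code):
    """Convert one code such as '00-E4' or '00-00-00-E4' to an integer."""
    pairs = code.split("-")
    if len(pairs) not in (2, 4) or not all(_HEX_PAIR.fullmatch(p) for p in pairs):
        raise ValueError(f"not a valid code: {code!r}")
    return int("".join(pairs), 16)


def parse_ucs4(value):
    """Parse a ucs-4 attribute value.

    Returns a list of alternatives; each alternative is a list of code
    points (a single code, or a sequence joined by '+').
    """
    return [[parse_code(c) for c in alt.split("+")] for alt in value.split()]


# Lowercase a-umlaut, or lowercase a followed by a combining diaeresis.
print(parse_ucs4("00-E4 00-61+03-08"))  # [[228], [97, 776]]
```

Under this reading, a single character form documented with both a precomposed code and a base-plus-diacritic sequence yields two alternatives, the second of which has two components.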
|