<form>

<form> (letter form) identifies one letter form taken by a particular character in a writing system declaration.

Attributes (In addition to global attributes)

string gives the byte string used to encode the letter form in the text.
Datatype: CDATA

Values: any string of characters (often a single byte)

Default: #IMPLIED

Example:
<form string="a/"> <desc>lowercase Greek alpha with acute accent</desc> </form>

Note
If the character is encoded only using entity references, then the value of string should be '' (the empty string).

In coded character sets which use character-set shifting (e.g. JIS X 0208), the string attribute should typically contain the required shift characters, so that the value is unambiguous. In such a case, there is no expectation that every occurrence of the character will be immediately preceded by the shift sequence; processing software is responsible for understanding the shift mechanism and acting accordingly.

The same string value (other than the empty string) may not appear on more than one <form> element, unless each occurrence is associated with a different coded character set.
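The ambiguity that shift sequences resolve can be seen with Python's standard iso2022_jp codec, which moves between ASCII and JIS X 0208 by means of ISO 2022 escape sequences. This is an illustration outside the TEI tag set, not part of any WSD tooling: the two bytes that encode HIRAGANA LETTER A in JIS X 0208 are also the ASCII characters $ and ", so a string value recorded without its shift sequence could denote either.

```python
# Illustration only: Python's standard "iso2022_jp" codec, which switches
# between ASCII and JIS X 0208 by means of ISO 2022 escape sequences.
SHIFT_TO_JIS_X_0208 = b"\x1b$B"  # ESC $ B
SHIFT_TO_ASCII = b"\x1b(B"       # ESC ( B

hiragana_a = "\u3042".encode("iso2022_jp")  # HIRAGANA LETTER A
print(hiragana_a)  # b'\x1b$B$"\x1b(B'

# Without its shift sequences, the same two bytes read as the ASCII text $"
payload = hiragana_a[len(SHIFT_TO_JIS_X_0208):-len(SHIFT_TO_ASCII)]
print(payload.decode("ascii"))  # $"
```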

codedCharSet (coded character set) specifies which base coded character set the string value occurs in.
Datatype: IDREF

Values: a reference to the identifier of a <codedCharSet> element in the current writing system declaration.

Default: #IMPLIED

Note
If more than one <codedCharSet> is specified as a base component of the writing system declaration, then it is expected that character-set shifting is in use, as described in ISO 2022 or some equivalent. In this case, each <form> element which has a value for the string attribute should also identify, by means of the codedCharSet attribute, which coded character set actually contains the string in question. Proper shifting among character sets is the responsibility of the user.
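The rules on string and codedCharSet lend themselves to a mechanical check. The following is a minimal sketch using Python's xml.etree.ElementTree, assuming a simplified, hypothetical WSD fragment in XML syntax with id attributes on <codedCharSet>; the function name check_forms and the character-set identifiers are illustrative only. It flags a non-empty string repeated within one coded character set and, when more than one <codedCharSet> is declared, any <form> whose codedCharSet does not name one of them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified WSD fragment showing only what the check needs.
WSD = """
<writingSystemDeclaration>
  <codedCharSet id="ascii"/>
  <codedCharSet id="jis0208"/>
  <character><form string="a" codedCharSet="ascii"><desc>lowercase a</desc></form></character>
  <character><form string="$&quot;" codedCharSet="jis0208"><desc>hiragana a</desc></form></character>
  <character><form string="a" codedCharSet="ascii"><desc>duplicate of lowercase a</desc></form></character>
  <character><form string="b"><desc>lowercase b</desc></form></character>
</writingSystemDeclaration>
"""

def check_forms(root: ET.Element) -> list[str]:
    """Report <form> elements that break the string/codedCharSet rules."""
    declared = {cs.get("id") for cs in root.iter("codedCharSet")}
    seen: Counter = Counter()
    problems = []
    for form in root.iter("form"):
        string = form.get("string")
        if not string:
            continue  # absent or empty string: exempt from both rules
        charset = form.get("codedCharSet")
        if len(declared) > 1 and charset not in declared:
            problems.append(f"{string!r}: needs a declared codedCharSet")
        seen[(string, charset)] += 1
        if seen[(string, charset)] == 2:  # report each repeated pair once
            problems.append(f"{string!r}: repeated in codedCharSet {charset!r}")
    return problems

for problem in check_forms(ET.fromstring(WSD.strip())):
    print(problem)
# 'a': repeated in codedCharSet 'ascii'
# 'b': needs a declared codedCharSet
```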

entityStd (standard entity name) gives the name of one or more entities defined for this character form in some standard entity set(s).
Datatype: ENTITIES

Values: One or more valid SGML entity names declared in the document type definition of the WSD; the entity must also be included in an entity set mentioned in an <entitySet> declaration in the current writing system declaration or in some base writing system referred to by a <baseWsd> element.

Default: #IMPLIED

Example:
<form entityStd="thorn"> <desc>lowercase Old English or Icelandic thorn</desc> </form>

Note
If the same letter form is defined by more than one public entity set, more than one value may appear in this attribute.

The same entity name may not appear in the entityStd or entityLoc attributes of more than one <form> element.

entityLoc (local entity name) gives one or more entity names used locally for this character form.
Datatype: ENTITIES

Values: One or more valid SGML entity names declared in the document type definition of the WSD; the entity must also be included in an entity set mentioned in an <entitySet> declaration in the current writing system declaration or in some base writing system referred to by a <baseWsd> element.

Default: #IMPLIED

Example:
<form entityStd="thorn" entityLoc="t"> <desc>lowercase Old English or Icelandic thorn</desc> <note>The standard entity name is <ident>thorn</ident>; the local entity <ident>t</ident> is used for brevity and legibility.</note> </form>

Note
The same entity name may not appear in the entityStd or entityLoc attributes of more than one <form> element.
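Because the uniqueness rule pools entityStd and entityLoc, a check has to collect names from both attributes. A minimal sketch, assuming each <form> is represented as a Python dict of its attribute values (a hypothetical representation) and that ENTITIES values are whitespace-separated names; the local entity name d is invented for the example. A name used in both attributes of the same <form> is not reported, since the rule concerns more than one <form> element.

```python
from collections import Counter

def duplicate_entity_names(forms: list[dict[str, str]]) -> set[str]:
    """Names appearing in entityStd/entityLoc of more than one <form>.

    Each form is a dict of attribute values (hypothetical representation);
    ENTITIES values are whitespace-separated lists of names.
    """
    owners: Counter = Counter()
    for form in forms:
        # Pool both attributes; a set counts each form at most once per name.
        names = set(form.get("entityStd", "").split()) | set(form.get("entityLoc", "").split())
        owners.update(names)
    return {name for name, count in owners.items() if count > 1}

forms = [
    {"entityStd": "thorn", "entityLoc": "t"},
    {"entityStd": "eth", "entityLoc": "d"},
    {"entityLoc": "t"},  # breaks the rule: "t" is already used by the first form
]
print(duplicate_entity_names(forms))  # {'t'}
```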

ucs-4 (universal-character-set code) gives the position of the character form in the thirty-two-bit ‘universal character set’ defined by ISO 10646.
Datatype: CDATA

Values: one or more sets of two or four two-digit hexadecimal numbers, each set giving a valid ISO 10646 code point for the character form; for legibility, the two-digit hexadecimal numbers should be separated by hyphens. If more than one UCS-4 code is associated with a given character form, the codes should be separated by blanks. If the character form is associated with a sequence of UCS-4 codes (e.g. a base character followed by one or more non-spacing diacritics), then the components of the sequence should be separated by plus signs (+).

Default: #IMPLIED

Note
The same UCS-4 code (or sequence) may not appear in more than one <character> element within the writing system declaration. It may, however, appear on several forms of the same character.

Multiple UCS-4 codes can be given for a single character; this allows sequences treated as distinct by ISO 10646 to be documented as referring to a single ‘character’ as defined by the WSD (e.g. ‘lowercase a-umlaut’ and ‘lowercase a’ plus ‘umlaut’).

If a single UCS-4 code is to be treated as relating to two distinct ‘characters’ as defined by the WSD (e.g. to reverse the effects of Han unification on some character), then one of the <character> elements should be associated with the UCS-4 code in the normal way, and the other should call attention to the relevant UCS-4 code by a comment in a <note> element.
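The ucs-4 value syntax and the uniqueness rule can be sketched together. The Python below is a hypothetical helper, not part of any TEI software: it accepts codes of two or four two-digit hexadecimal numbers (hyphens optional), splits alternatives on blanks and sequence components on +, and then reports any code or sequence claimed by more than one <character>. The character IDs are invented, range validity against ISO 10646 is not checked, and a single code point and a sequence beginning with it are treated as different keys.

```python
import re
from collections import defaultdict

_OCTET = re.compile(r"[0-9A-Fa-f]{2}")

def parse_code(code: str) -> int:
    """Parse one code such as "00-E4" or "00-00-00-E4" (hyphens optional)."""
    octets = code.split("-") if "-" in code else [code[i:i + 2] for i in range(0, len(code), 2)]
    if len(octets) not in (2, 4) or not all(_OCTET.fullmatch(o) for o in octets):
        raise ValueError(f"not a two- or four-octet code: {code!r}")
    return int("".join(octets), 16)  # range validity is not checked

def parse_ucs4(value: str) -> list[tuple[int, ...]]:
    """Blank-separated alternatives, each a '+'-separated sequence of codes."""
    return [tuple(parse_code(c) for c in alt.split("+")) for alt in value.split()]

def shared_codes(characters: dict[str, list[str]]) -> dict[tuple[int, ...], list[str]]:
    """Codes or sequences claimed by more than one <character>.

    characters maps a hypothetical character ID to the ucs-4 values of
    that character's <form> elements.
    """
    owners: dict[tuple[int, ...], set[str]] = defaultdict(set)
    for char_id, values in characters.items():
        for value in values:
            for seq in parse_ucs4(value):
                owners[seq].add(char_id)
    return {seq: sorted(ids) for seq, ids in owners.items() if len(ids) > 1}

print(parse_ucs4("00-E4 00-61+03-08"))  # [(228,), (97, 776)]

characters = {
    # One character with two forms: repeating U+00E4 across its own forms is allowed.
    "a-umlaut": ["00-00-00-E4 00-00-00-61+00-00-03-08", "00-E4"],
    "a": ["00-61"],
    # A second character claiming U+00E4 breaks the rule.
    "a-umlaut-variant": ["00-00-00-E4"],
}
print(shared_codes(characters))  # {(228,): ['a-umlaut', 'a-umlaut-variant']}
```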

Note
The <form> element documents one form of a character; in most cases there will be only one. If more than one form is given, the forms are in general to be regarded as free variants of the character unless the notes specify otherwise.

The distinction between <character> and <form> makes it possible to distinguish, in an encoding, among different letter forms (which may have historical, aesthetic, linguistic, or other significance) without having to claim that the different forms constitute different ‘characters’ in any normal sense. (In the technical terminology sometimes encountered, the <form> element can be used to record each allograph of a given character or grapheme.) The concepts of ‘character’ and ‘letter form’, however, vary from analyst to analyst; the decision to treat a given set of forms as a single character or as a set of characters is not always obvious, and may require the application of considerable learning and judgement. The <note> element should be used to record the reasoning behind any particularly difficult decision.

Module: declared in file teiwsd2; auxiliary tag set for Writing System Declarations

Data Description: may contain a series of description elements, optionally followed by one or more figure or extFigure elements showing the character form in question, and optionally a series of notes.

May contain desc extFigure figure note

May occur within dictScrap eg entry entryFree form hom re sense superEntry trans

Declaration
<!ELEMENT form %om.RO; (desc+, (figure | extFigure)*, note*)>
<!ATTLIST form
          %a.global;
          string       CDATA    #IMPLIED
          codedCharSet IDREF    #IMPLIED
          entityStd    ENTITIES #IMPLIED
          entityLoc    ENTITIES #IMPLIED
          ucs-4        CDATA    #IMPLIED>
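The content model (desc+, (figure | extFigure)*, note*) is a regular expression over the names of the child elements, so it can be checked with Python's re module. A minimal sketch with a hypothetical helper name; it ignores the %om.RO; tag-omission settings and does not validate the children themselves.

```python
import re

# (desc+, (figure | extFigure)*, note*) rewritten as a regular expression
# over the space-terminated names of the child elements, in document order.
FORM_CONTENT = re.compile(r"(desc )+((figure|extFigure) )*(note )*")

def valid_form_children(children: list[str]) -> bool:
    """True if the child element names satisfy the <form> content model."""
    return FORM_CONTENT.fullmatch("".join(name + " " for name in children)) is not None

print(valid_form_children(["desc", "note"]))                         # True
print(valid_form_children(["desc", "extFigure", "figure", "note"]))  # True
print(valid_form_children(["figure", "desc"]))                       # False: desc must come first
print(valid_form_children([]))                                       # False: at least one desc
```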

See further 25.4.2 Exceptions in the WSD


Text Encoding Initiative

The XML Version of the TEI Guidelines
