target |
points at the
elements whose markup is
uncertain. |
|
Datatype: IDREFS |
|
Values: one or more valid
identifiers, separated by white space. |
|
Default: #REQUIRED |
|
Example: Elizabeth went to <persName id="P1">Essex</persName>
<certainty target="P1" locus="#gi" degree="0.6"/>
|
Note |
If more than one identifier is given, the
<certainty> element is interpreted as applying to all.
If no identifier is present on the element being annotated, the
attribute should give the identifier of a <ptr> element which
points at the element being annotated; for further discussion
of this indirect pointing mechanism, see chapter 14 Linking, Segmentation, and Alignment.
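The whitespace-separated reading of target described above can be sketched in Python. This is only an illustration, not part of the scheme: the function name certainty_targets, the wrapping <p> element, and the second name ‘Cecil’ with identifier P2 are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET


def certainty_targets(element):
    """Return the identifiers listed in the target attribute, in order."""
    # IDREFS: one or more identifiers separated by white space.
    return element.get("target", "").split()


# Hypothetical fragment: the second persName (P2) is invented for illustration.
fragment = ET.fromstring(
    '<p>Elizabeth went to <persName id="P1">Essex</persName> with '
    '<persName id="P2">Cecil</persName>'
    '<certainty target="P1 P2" locus="#gi" degree="0.6"/></p>'
)

cert = fragment.find("certainty")
targets = certainty_targets(cert)

# Index every element that carries an id, then resolve each target.
ids = {el.get("id"): el for el in fragment.iter() if el.get("id")}
annotated = [ids[t].text for t in targets]
```

Here the single <certainty> element is taken to apply to both P1 and P2, as the note specifies when more than one identifier is given.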
|
locus |
indicates the precise location of the uncertainty in the
markup: applicability of the element, precise position of the
start- or end-tag, value of a specific attribute, etc. |
|
Datatype: CDATA |
|
Suggested values include:
#gi |
uncertain whether the element used actually applies
to the passage. |
#startloc |
start-tag may not be correctly located. |
#endloc |
end-tag may not be correctly located. |
#location |
both the start-tag and the end-tag may not
be correctly located. |
name |
the value given for the attribute
name is uncertain. |
#transcribedContent |
the content of the element may
not be a correct transcription of the source text. |
#suppliedContent |
the content of the element may not
have been correctly supplied by the reader, e.g. as in the
cases of corr and abbrev elements. |
|
|
Default: #REQUIRED |
Note |
The ‘#’ distinguishes the terms of the
controlled vocabulary from possible collisions with attribute
names. Extensions to this vocabulary should also use this
prefix.
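A processor might use the ‘#’ convention as follows. This is a sketch under stated assumptions: the function name classify_locus and the three category labels are hypothetical, and ‘#dating’ is an invented project extension used only to show the third case.

```python
# Suggested values listed above; a project may extend this set.
CONTROLLED = {
    "#gi",
    "#startloc",
    "#endloc",
    "#location",
    "#transcribedContent",
    "#suppliedContent",
}


def classify_locus(value):
    """Classify a locus value as vocabulary term, extension, or attribute name."""
    if value.startswith("#"):
        # The prefix marks a vocabulary term rather than an attribute name.
        return "vocabulary" if value in CONTROLLED else "extension"
    # Anything without the prefix is read as the name of an attribute.
    return "attribute"


kinds = [classify_locus(v) for v in ("#gi", "gi", "scribe", "#dating")]
```

Note that ‘gi’ without the prefix would be read as an attribute named gi, which is exactly the collision the prefix is meant to avoid.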
|
assertedValue |
provides an alternative value for the aspect of the markup in
question—an alternative generic identifier, transcription,
or attribute value, or the identifier of an <anchor> element (to
indicate an alternative starting or ending location). If an
assertedValue is given, the confidence level specified by
degree applies to the alternative markup specified by
assertedValue; if none is given, it applies to the markup
in the text. |
|
Datatype: CDATA |
|
Values: generic identifier, attribute value, location (e.g.
indicated by a reference to an <anchor> element or to an
<xptr> element), or other appropriate alternative
value. |
|
Default: #IMPLIED |
|
Example: Elizabeth went to <persName id="p1">Essex</persName>
<certainty target="p1" locus="#gi" assertedValue="place"
degree="0.2"/>
|
Note |
This attribute makes it possible to indicate the
degree of confidence in a specific alternative to some aspect
of the markup. In the example above the encoder is expressing
the likelihood (0.2) that the generic identifier should be
<place> rather than <persName>, the element actually
encoded.
|
desc |
further describes the uncertainty in prose, perhaps
indicating its nature, cause, or the justification for the
degree of confidence asserted. |
|
Datatype: CDATA |
|
Values: a prose description of how and why the markup is
uncertain. |
|
Default: #IMPLIED |
|
Example: Elizabeth went to
<persName id="p1">Essex</persName>
<certainty target="p1" locus="#gi" degree="0.2"
desc="Time of writing indicates the Earl rather than the town" />
|
Note |
In a given project, it may be possible to enumerate
a finite list of recognized types and causes of uncertainty; in
such cases, it will be useful to control the vocabulary used in
this attribute, to aid later mechanical manipulation. It is
not possible to suggest such a controlled vocabulary for
general use.
|
given |
indicates conditions assumed in the assignment of a
degree of confidence. |
|
Datatype: CDATA |
|
Values: a characterization of the conditions which are assumed
in the assignment of a degree of confidence. This may be in
prose. |
|
Default: #IMPLIED |
|
Example: <!-- in the header, hand H1 is identified as that of MSM -->
<hand id="H1" scribe="MSM"/>
<!-- ... -->
<!-- in the text, the scribe has corrected 'Wessex' to 'Essex' -->
Elizabeth went to <corr id="C1" sic="Wessex" resp="MSM">Essex</corr>.
<!-- we are 60% certain that hand H1 is MSM,
and 90% certain that if H1 is MSM, then
it is MSM who corrected 'Wessex' into 'Essex'. -->
<certainty target="H1" locus="scribe" degree="0.6" id="P1"/>
<certainty target="C1" locus="resp" given="P1" degree="0.9"/>
|
Note |
A project may wish to control the vocabulary used
in this attribute.
The envisioned typical value of this attribute would be the
identifier of another <certainty> element or a list of
such identifiers.
It may thus be possible to
construct probability networks by chaining <certainty>
elements together. Such networks would ultimately be grounded
in unconditional <certainty> elements (with no value for
given). The semantics of this chaining would be
understood in this way: if a <certainty> element is
specified, via a reference, as the assumption, then it is not the
attribution of uncertainty that is the assumption, but rather
the assertion itself. For instance, in the example above,
the first <certainty> element indicates the degree of
confidence in the identification of the scribe of hand H1 as
‘MSM’. The second indicates the degree of confidence that
MSM made the correction to ‘Essex’, given that the scribe of
H1 is ‘MSM’. Note that the given in the second <certainty>
element is not the assertion that the likelihood that MSM is
the scribe is 0.6, but simply the assertion that MSM is the
scribe; this is a recommended convention to facilitate
building networks.
The ambitious encoder may wish to attempt complex networks
or probability assertions, experimenting with references to
other elements or prose assertions, and deploying feature
structure connectives such as <alt>, <join>, and
<not>. However, we do not believe that the
<certainty> element gives, at this time, a comprehensive
ambiguity-free system for indicating certainty.
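One possible reading of such a chain can be sketched in Python. It rests on assumptions the text does not make: degree values are treated as probabilities, multiple conditions are treated as independent, and the network is assumed to contain no cycles. The function name joint_confidence, the dictionary layout, and the key ‘C1-resp’ (the second <certainty> in the example has no identifier) are all hypothetical.

```python
def joint_confidence(cert_id, certainties):
    """Multiply a certainty's degree by the joint confidence of its conditions.

    Assumes degrees are probabilities, conditions are independent,
    and the chain is acyclic (grounded in unconditional elements).
    """
    cert = certainties[cert_id]
    result = cert["degree"]
    for ref in cert["given"]:
        result *= joint_confidence(ref, certainties)
    return result


# The two <certainty> elements from the example above.
certainties = {
    "P1": {"degree": 0.6, "given": []},           # H1 is the hand of MSM
    "C1-resp": {"degree": 0.9, "given": ["P1"]},  # MSM made the correction
}

unconditional = joint_confidence("P1", certainties)
chained = joint_confidence("C1-resp", certainties)
```

Under these assumptions the confidence that MSM both wrote hand H1 and made the correction is the product of the two degrees, about 0.54. Other readings of the chain are possible, and nothing in the scheme requires this one.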
|
degree |
indicates the degree of confidence assigned to the aspect
of the markup named by the locus attribute. |
|
Datatype: CDATA |
|
Values: Values of degree might be yes or no, real
numbers between 0 and 1, or traditional characterizations such as
‘doubtful’, ‘circa’, etc. Generally we recommend
decimal numbers between 0 and 1, where larger numbers denote
a greater degree of confidence in the assertions, with 0
representing ‘certainly false’ and 1 representing ‘certainly
true’. |
|
Default: #IMPLIED |
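Because degree is CDATA and admits several kinds of value, software reading it may need to normalize. The following Python sketch is one way to do so. The function name parse_degree is hypothetical, and the mapping of yes to 1.0 and no to 0.0 is an assumption of this example rather than something stated above; traditional characterizations such as ‘doubtful’ are left uninterpreted.

```python
def parse_degree(value):
    """Return a float in [0, 1] for numeric or yes/no values, else None."""
    text = value.strip().lower()
    # Assumption: yes/no are treated as the two ends of the scale.
    if text == "yes":
        return 1.0
    if text == "no":
        return 0.0
    try:
        number = float(text)
    except ValueError:
        # Prose characterizations ('doubtful', 'circa') are not mapped.
        return None
    # Reject numbers outside the recommended range (NaN also fails here).
    return number if 0.0 <= number <= 1.0 else None


parsed = [parse_degree(v) for v in ("0.6", "yes", "doubtful", "1.5")]
```

Values that return None would need project-specific handling, for example a lookup table agreed on by the encoders.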