In addition to the morphosyntactic specifications (described in detail in later sections) and the corpus tags (specified in the Multext-East files tbl.tag.corpus.xx, where xx is the two-letter language code), a common set of corpus tags for punctuation marks has been defined for all the languages involved in the project. The table below lists the punctuation marks together with their assigned corpus tags. These punctuation tags appear in the cesAna format of the disambiguated parallel multilingual corpus, as described in Deliverable D2.3 F. All seven language components of the corpus share this common set of punctuation tags.

=========== ========== ==============================
Orthography Corpus tag Definition
=========== ========== ==============================
.           PERIOD     period (full stop)
,           COMMA      comma
;           SCOLON     semicolon
:           COLON      colon
?           QUEST      question mark
!           EXCL       exclamation mark
...         HELLIP     ellipsis
—           DASH       dash
(           LPAR       left (opening) parenthesis
)           RPAR       right (closing) parenthesis
"           ODBLQ      opening double quotes
"           CDBLQ      closing double quotes
\-          HYPHEN     hyphen
/           SLASH      slash
[           LSQR       left (opening) square bracket
]           RSQR       right (closing) square bracket
=========== ========== ==============================
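
The table amounts to a lookup from orthography to corpus tag. The Python sketch below is a minimal, hypothetical illustration of applying it to an already tokenized sentence; ``PUNCT_TAGS`` and ``tag_punctuation`` are illustrative names, not part of any Multext-East tool. Because the straight double quote maps to both ODBLQ and CDBLQ, the sketch assumes, purely for illustration, that quotes alternate between opening and closing; the actual corpus annotation may resolve this differently.

```python
"""Hypothetical sketch: assign the common Multext-East punctuation
corpus tags to tokens. Names here are illustrative, not an official API."""

# Orthography -> corpus tag, taken from the punctuation table.
# The straight double quote is deliberately absent: it maps to two
# tags (ODBLQ / CDBLQ) and is handled separately below.
PUNCT_TAGS = {
    ".": "PERIOD",
    ",": "COMMA",
    ";": "SCOLON",
    ":": "COLON",
    "?": "QUEST",
    "!": "EXCL",
    "...": "HELLIP",
    "\u2014": "DASH",  # em dash character
    "(": "LPAR",
    ")": "RPAR",
    "-": "HYPHEN",
    "/": "SLASH",
    "[": "LSQR",
    "]": "RSQR",
}


def tag_punctuation(tokens):
    """Return (token, tag) pairs; tokens that are not punctuation get None.

    Assumption (illustrative only): double quotes alternate, so the
    first '"' is tagged ODBLQ, the next CDBLQ, and so on.
    """
    tagged = []
    quote_open = False
    for tok in tokens:
        if tok == '"':
            tagged.append((tok, "CDBLQ" if quote_open else "ODBLQ"))
            quote_open = not quote_open
        else:
            tagged.append((tok, PUNCT_TAGS.get(tok)))
    return tagged


if __name__ == "__main__":
    sample = ["He", "said", ":", '"', "Wait", "...", '"', "(", "twice", ")", "."]
    for tok, tag in tag_punctuation(sample):
        print(tok, tag)
```

Running the script prints each token with its punctuation tag, or ``None`` for word tokens. A real tagger would need a more reliable way to distinguish opening from closing quotes, for example by looking at adjacent whitespace in the untokenized text.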