The Corpus Encoding Standard (CES) is an effort supported by EAGLES (the EAGLES Text Representation subgroup) and the European projects MULTEXT (LRE) and MULTEXT-East. It aims to develop a standard optimally suited for use in language engineering that can serve as a widely accepted set of encoding conventions for corpus-based work. The overall goal is to identify a minimal encoding level that corpora must achieve to be considered standardized, both in descriptive representation (marking of structural and linguistic information) and in general architecture (so as to be maximally suited for use in a text database). CES also provides encoding conventions for more extensive encoding and for linguistic annotation. CES is, to a large extent, TEI-conformant.
CES provides:
Linguistic annotation is encoded in separate documents, linked to the primary data.
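The idea of keeping annotation in a separate document that points back into the primary data can be illustrated with a minimal sketch. The element and attribute names used here (`text`, `s`, `annotations`, `ann`, `target`, `start`, `end`, `pos`) are hypothetical and chosen only for illustration; they are not the actual CES tagset or its linking mechanism. The sketch assumes each annotation refers to a sentence by id and to a character span within that sentence.

```python
import xml.etree.ElementTree as ET

# Primary data document (hypothetical markup, not the real CES tagset).
PRIMARY = """<text>
  <s id="s1">Corpora need standards.</s>
  <s id="s2">Annotation lives elsewhere.</s>
</text>"""

# Separate annotation document. Each <ann> points at a sentence by id
# and at a character span inside it (attribute names are assumptions).
ANNOTATION = """<annotations>
  <ann target="s1" start="0" end="7" pos="NOUN"/>
  <ann target="s1" start="8" end="12" pos="VERB"/>
  <ann target="s2" start="0" end="10" pos="NOUN"/>
</annotations>"""


def resolve(primary_xml, annotation_xml):
    """Return (sentence id, annotated substring, pos) for each annotation."""
    # Index the primary sentences by id so annotations can be looked up.
    sentences = {
        s.get("id"): s.text for s in ET.fromstring(primary_xml).iter("s")
    }
    resolved = []
    for ann in ET.fromstring(annotation_xml).iter("ann"):
        text = sentences[ann.get("target")]
        span = text[int(ann.get("start")):int(ann.get("end"))]
        resolved.append((ann.get("target"), span, ann.get("pos")))
    return resolved


RESULT = resolve(PRIMARY, ANNOTATION)
for sentence_id, span, pos in RESULT:
    print(sentence_id, span, pos)
```

Because the primary document is never modified, several independent annotation documents could in principle be layered over the same text in this way.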