Corpus Encoding Standard - Document CES 1. Annex 1. Version 1.1. Last Modified 18 June 1996.

Relevant standards

This is a list of standards referred to in the CES or relevant to text encoding generally.

Document encoding

ISO 8879:1986

Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML)

ISO/IEC DIS 13673:1993

Information Technology -- Text and Office Systems -- Conformance Testing for Standard Generalized Markup Language (SGML) Systems

TEI P3:1994

Sperberg-McQueen, C.M., Burnard, L. (Eds.) (1994) Guidelines for Electronic Text Encoding and Interchange, Text Encoding Initiative, Chicago and Oxford. Available online at

<URL:http://etext.virginia.edu/TEI.html>

ISO/IEC DIS 10744:1992

Hypermedia/Time-based Document Structuring Language (HyTime)

ISO 12083

Standardized SGML document type definitions for books, articles with tables, formulae, etc.

ISO 8601:1988

Representation of dates and times.

"This standard defines a lot of details of the calendar. E.g. the ISO definition of the week numbers is that the first day (day number 1) of a week is Monday and that the first week in a year (week number 1) is the week that includes the first Thursday in January, i.e. the first week that has at least four days in January. Other definitions are, e.g., that hours of a day are counted from 0 to 24 and that the international notation of dates is the Bigendian format year-month-day, e.g. 1993-04-17 and that for time is e.g. 20:36:04 (hh:mm:ss). There are also string formats for computer applications specified that have to represent date and time in files and protocol packets. (See

<URL:ftp://ftp.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z>
for a very detailed summary.)"
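
The week-numbering rule and the year-month-day notation described in the quotation can be checked with a minimal sketch using Python's standard datetime module. The helper name first_iso_week_monday is hypothetical and introduced only for illustration; the date 1993-04-17 and the time 20:36:04 are the examples quoted above.

```python
from datetime import date, time, timedelta


def first_iso_week_monday(year: int) -> date:
    """Return the Monday that starts week 1 of the given year.

    Week 1 is the week containing the first Thursday in January
    (hypothetical helper, written for illustration only).
    """
    jan1 = date(year, 1, 1)
    # isoweekday() counts Monday as 1 and Sunday as 7, so Thursday is 4.
    days_to_thursday = (4 - jan1.isoweekday()) % 7
    first_thursday = jan1 + timedelta(days=days_to_thursday)
    # The Monday of that week is three days before the Thursday.
    return first_thursday - timedelta(days=3)


d = date(1993, 4, 17)

# Year-month-day notation for dates, hh:mm:ss for times.
date_text = d.isoformat()
time_text = time(20, 36, 4).isoformat()

# ISO year, week number, and weekday (Monday = 1) of the example date.
iso_year, iso_week, iso_weekday = d.isocalendar()

print(date_text, time_text)
print(iso_year, iso_week, iso_weekday)
print(first_iso_week_monday(1993))
```

Under these assumptions, 1 January 1993 falls on a Friday, so it belongs to the last week of 1992, and week 1 of 1993 starts on Monday 4 January.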

ISO 4217

Codes for the representation of currencies and funds

ITU-T/CCITT Recommendation E.123

Notation for international telephone numbers (a '+' followed by the country code, followed by a space, ...).

Language and country codes

ISO 639:1988

Code for the representation of names of languages

Provides two-letter codes for about 140 languages and is intended primarily for use in terminology, lexicography and linguistics.

The list is available online at
<URL:http://www.stonehand.com/unicode/standard/iso639.html>

ISO 639-2:1995

Code for the representation of names of languages -- Alpha-3 code

"Three-letter codes for the representation of names of languages for information interchange", developed by a Joint Working Group of ISO TC37/SC2 and TC46/SC2. Covers a wider range of the world's languages than ISO 639.

The list is available online at

<URL:http://www.stonehand.com/unicode/standard/cd639-2.html>

ISO 3166:1993

Codes for the representation of names of countries

This standard defines a 2-letter, a 3-letter and a numeric code for each country (e.g. US/USA/840=United States, DE/DEU/276=Germany, GB/GBR/826=United Kingdom, FR/FRA/250=France, ...). The 2-letter codes are well known on the Internet as top-level domain names. The 3-letter versions are often used at international sports events.
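
The three code forms can be illustrated with a small lookup, given here as a minimal sketch in Python. The table holds only the four examples quoted above, not the full ISO 3166 list, and the names CountryCode, SAMPLE_CODES and find_country are hypothetical, chosen for illustration.

```python
from typing import NamedTuple, Optional, Union


class CountryCode(NamedTuple):
    alpha2: str   # 2-letter code
    alpha3: str   # 3-letter code
    numeric: int  # numeric code
    name: str


# Only the examples given in the text; NOT the complete standard.
SAMPLE_CODES = [
    CountryCode("US", "USA", 840, "United States"),
    CountryCode("DE", "DEU", 276, "Germany"),
    CountryCode("GB", "GBR", 826, "United Kingdom"),
    CountryCode("FR", "FRA", 250, "France"),
]


def find_country(code: Union[str, int]) -> Optional[CountryCode]:
    """Look up a sample entry by its 2-letter, 3-letter, or numeric code."""
    key = str(code).strip().upper()
    if key.isdigit():
        # Numeric codes are written with three digits, e.g. 004.
        key = key.zfill(3)
    for entry in SAMPLE_CODES:
        if key in (entry.alpha2, entry.alpha3, f"{entry.numeric:03d}"):
            return entry
    return None


print(find_country("de"))
print(find_country(840))
print(find_country("xx"))
```

A real application would load the complete, current code list from an authoritative source rather than hard-coding entries.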

Character sets

ISO 646.IRV:1991

Information Processing -- ISO 7-bit coded character set for information interchange [=ANSI X3.4-1986]

ISO-8859

Information Processing -- 8-bit Single-Byte Coded Graphic Character Sets

Part 1: Latin alphabet No. 1, ISO 8859-1:1987
Part 2: Latin alphabet No. 2, ISO 8859-2:1987
Part 3: Latin alphabet No. 3, ISO 8859-3:1988
Part 4: Latin alphabet No. 4, ISO 8859-4:1988
Part 5: Latin/Cyrillic alphabet, ISO 8859-5:1988
Part 6: Latin/Arabic alphabet, ISO 8859-6:1987
Part 7: Latin/Greek alphabet, ISO 8859-7:1987
Part 8: Latin/Hebrew alphabet, ISO 8859-8:1988
Part 9: Latin alphabet No. 5, ISO 8859-9:1990

ISO/IEC 10646-1

Information technology -- Character sets and information coding -- Universal multiple-octet coded character set -- Part 1: Architecture and basic multilingual plane

GLOSIX 0.1

EAGLES Tools subgroup. DOCUMENT MUL/EAG--LSD1 Version of December 1995.
Guidelines for Linguistic Software Development - Draft proposal

<URL:http://www.lpl.univ-aix.fr/projects/multext/LSD/LSD.html>

UNICODE 1.1

"The Unicode Standard, Version 1.1": Version 1.0, Volume 1 (ISBN 0-201-56788-1), Version 1.0, Volume 2 (ISBN 0-201-60845-6), and "Unicode Technical Report #4, The Unicode Standard, Version 1.1" (available from The Unicode Consortium, and soon to be published by Addison-Wesley).

[This character set is identical with the character repertoire and coding of the international standard ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2; Subset=300; Implementation Level=3.]