Recoding Between Different Character Encodings

Apart from vendor solutions (e.g. iconv(1) on HP-UX), there are several free tools for conveting text files from one character encoding to another existing. The most comprehensive among them are:

GNU iconv
Author: Ulrich Drepper, Red Hat, Inc.
Current release: 2.2.4 (part of GNU libc 2.2.4)
Supported character encodings:
437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865, 866, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4, 8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110, ARABIC, ARABIC7, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5, BIG-FIVE, BIG5, BIG5HKSCS, BIGFIVE, BS_4730, CA, CN-BIG5, CN-GB, CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278, CP280, CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424, CP437, CP500, CP737, CP775, CP813, CP819, CP850, CP851, CP852, CP855, CP856, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP868, CP869, CP870, CP871, CP874, CP875, CP880, CP891, CP903, CP904, CP905, CP912, CP915, CP916, CP918, CP920, CP922, CP930, CP932, CP933, CP935, CP936, CP937, CP939, CP949, CP950, CP1004, CP1026, CP1046, CP1047, CP1070, CP1079, CP1081, CP1084, CP1089, CP1124, CP1129, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258, CP1361, CPIBM861, CSA7-1, CSA7-2, CSASCII, CSA_T500-1983, CSA_T500, CSA_Z243.4-1985-1, CSA_Z243.4-1985-2, CSDECMCS, CSEBCDICATDE, CSEBCDICATDEA, CSEBCDICCAFR, CSEBCDICDKNO, CSEBCDICDKNOA, CSEBCDICES, CSEBCDICESA, CSEBCDICESS, CSEBCDICFISE, CSEBCDICFISEA, CSEBCDICFR, CSEBCDICIT, CSEBCDICPT, CSEBCDICUK, CSEBCDICUS, CSEUCKR, CSEUCPKDFMTJAPANESE, CSGB2312, CSHPROMAN8, CSIBM037, CSIBM038, CSIBM273, CSIBM274, CSIBM275, CSIBM277, CSIBM278, CSIBM280, CSIBM281, CSIBM284, CSIBM285, CSIBM290, CSIBM297, CSIBM420, CSIBM423, CSIBM424, CSIBM500, CSIBM851, CSIBM855, CSIBM856, CSIBM857, CSIBM860, CSIBM863, CSIBM864, CSIBM865, CSIBM866, CSIBM868, CSIBM869, CSIBM870, CSIBM871, CSIBM880, CSIBM891, CSIBM903, CSIBM904, CSIBM905, CSIBM918, CSIBM922, CSIBM930, CSIBM932, CSIBM933, CSIBM935, CSIBM937, CSIBM939, CSIBM943, CSIBM1026, CSIBM1124, CSIBM1129, CSISO4UNITEDKINGDOM, CSISO10SWEDISH, CSISO11SWEDISHFORNAMES, CSISO14JISC6220RO, CSISO15ITALIAN, CSISO16PORTUGESE, CSISO17SPANISH, CSISO18GREEK7OLD, CSISO19LATINGREEK, CSISO21GERMAN, CSISO25FRENCH, CSISO27LATINGREEK1, CSISO49INIS, CSISO50INIS8, CSISO51INISCYRILLIC, CSISO58GB1988, CSISO60DANISHNORWEGIAN, CSISO60NORWEGIAN1, CSISO61NORWEGIAN2, CSISO69FRENCH, CSISO84PORTUGUESE2, CSISO85SPANISH2, CSISO86HUNGARIAN, CSISO88GREEK7, CSISO89ASMO449, CSISO90, CSISO92JISC62991984B, CSISO99NAPLPS, CSISO103T618BIT, CSISO111ECMACYRILLIC, CSISO121CANADIAN1, CSISO122CANADIAN2, CSISO139CSN369103, CSISO141JUSIB1002, CSISO143IECP271, CSISO150, CSISO150GREEKCCITT, CSISO151CUBA, CSISO153GOST1976874, CSISO646DANISH, CSISO2022CN, CSISO2022JP, CSISO2022JP2, CSISO2022KR, CSISO2033, CSISO5427CYRILLIC, CSISO5427CYRILLIC1981, CSISO5428GREEK, CSISO10367BOX, CSISOLATIN1, CSISOLATIN2, CSISOLATIN3, CSISOLATIN4, CSISOLATIN5, CSISOLATIN6, CSISOLATINARABIC, CSISOLATINCYRILLIC, CSISOLATINGREEK, CSISOLATINHEBREW, CSKOI8R, CSKSC5636, CSMACINTOSH, CSNATSDANO, CSNATSSEFI, CSN_369103, CSPC8CODEPAGE437, CSPC775BALTIC, CSPC850MULTILINGUAL, CSPC862LATINHEBREW, CSPCP852, CSSHIFTJIS, CSUCS4, CSUNICODE, CUBA, CWI-2, CWI, CYRILLIC, DE, DEC-MCS, DEC, DIN_66003, DK, DS2089, DS_2089, E13B, EBCDIC-AT-DE-A, EBCDIC-AT-DE, EBCDIC-BE, EBCDIC-BR, EBCDIC-CA-FR, EBCDIC-CP-AR1, EBCDIC-CP-AR2, EBCDIC-CP-BE, EBCDIC-CP-CA, EBCDIC-CP-CH, EBCDIC-CP-DK, EBCDIC-CP-ES, EBCDIC-CP-FI, EBCDIC-CP-FR, EBCDIC-CP-GB, EBCDIC-CP-GR, EBCDIC-CP-HE, EBCDIC-CP-IS, EBCDIC-CP-IT, EBCDIC-CP-NL, EBCDIC-CP-NO, EBCDIC-CP-ROECE, EBCDIC-CP-SE, EBCDIC-CP-TR, EBCDIC-CP-US, EBCDIC-CP-WT, EBCDIC-CP-YU, EBCDIC-CYRILLIC, EBCDIC-DK-NO-A, EBCDIC-DK-NO, EBCDIC-ES-A, EBCDIC-ES-S, EBCDIC-ES, EBCDIC-FI-SE-A, EBCDIC-FI-SE, EBCDIC-FR, EBCDIC-GREEK, EBCDIC-INT, EBCDIC-INT1, EBCDIC-IS-FRISS, EBCDIC-IT, EBCDIC-JP-E, EBCDIC-JP-KANA, EBCDIC-PT, EBCDIC-UK, EBCDIC-US, ECMA-114, ECMA-118, ECMA-128, ECMA-CYRILLIC, ELOT_928, ES, ES2, EUC-CN, EUC-JP, EUC-KR, EUC-TW, EUCCN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000, GB18030, GBK, GB_1988-80, GOST_19768-74, GOST_19768, GREEK-CCITT, GREEK, GREEK7-OLD, GREEK7, GREEK8, HEBREW, HP-ROMAN8, HU, IBM-856, IBM-922, IBM-930, IBM-932, IBM-933, IBM-935, IBM-937, IBM-939, IBM-943, IBM-1046, IBM-1124, IBM-1129, IBM037, IBM038, IBM256, IBM273, IBM274, IBM275, IBM277, IBM278, IBM280, IBM281, IBM284, IBM285, IBM290, IBM297, IBM367, IBM420, IBM423, IBM424, IBM437, IBM500, IBM775, IBM813, IBM819, IBM850, IBM851, IBM852, IBM855, IBM856, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, IBM868, IBM869, IBM870, IBM871, IBM874, IBM875, IBM880, IBM891, IBM903, IBM904, IBM905, IBM912, IBM915, IBM916, IBM918, IBM920, IBM922, IBM930, IBM932, IBM933, IBM935, IBM937, IBM939, IBM943, IBM1004, IBM1026, IBM1046, IBM1047, IBM1089, IBM1124, IBM1129, IEC_P27-1, INIS-8, INIS-CYRILLIC, INIS, ISIRI-3342, ISO-2022-CN-EXT, ISO-2022-CN, ISO-2022-JP-2, ISO-2022-JP, ISO-2022-KR, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4, ISO-10646/UTF-8, ISO-10646/UTF8, ISO-IR-4, ISO-IR-6, ISO-IR-8-1, ISO-IR-9-1, ISO-IR-10, ISO-IR-11, ISO-IR-14, ISO-IR-15, ISO-IR-16, ISO-IR-17, ISO-IR-18, ISO-IR-19, ISO-IR-21, ISO-IR-25, ISO-IR-27, ISO-IR-37, ISO-IR-49, ISO-IR-50, ISO-IR-51, ISO-IR-54, ISO-IR-55, ISO-IR-57, ISO-IR-60, ISO-IR-61, ISO-IR-69, ISO-IR-84, ISO-IR-85, ISO-IR-86, ISO-IR-88, ISO-IR-89, ISO-IR-90, ISO-IR-92, ISO-IR-98, ISO-IR-99, ISO-IR-100, ISO-IR-101, ISO-IR-103, ISO-IR-109, ISO-IR-110, ISO-IR-111, ISO-IR-121, ISO-IR-122, ISO-IR-126, ISO-IR-127, ISO-IR-138, ISO-IR-139, ISO-IR-141, ISO-IR-143, ISO-IR-144, ISO-IR-148, ISO-IR-150, ISO-IR-151, ISO-IR-153, ISO-IR-155, ISO-IR-156, ISO-IR-157, ISO-IR-166, ISO-IR-179, ISO-IR-193, ISO-IR-197, ISO-IR-199, ISO-IR-203, ISO-IR-226, ISO646-CA, ISO646-CA2, ISO646-CN, ISO646-CU, ISO646-DE, ISO646-DK, ISO646-ES, ISO646-ES2, ISO646-FI, ISO646-FR, ISO646-FR1, ISO646-GB, ISO646-HU, ISO646-IT, ISO646-JP-OCR-B, ISO646-JP, ISO646-KR, ISO646-NO, ISO646-NO2, ISO646-PT, ISO646-PT2, ISO646-SE, ISO646-SE2, ISO646-US, ISO646-YU, ISO6937, ISO8859-1, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, ISO8859-10, ISO8859-13, ISO8859-14, ISO8859-15, ISO8859-16, ISO_646.IRV:1991, ISO_2033-1983, ISO_2033, ISO_5427-EXT, ISO_5427, ISO_5427:1981, ISO_5428, ISO_5428:1980, ISO_6937-2, ISO_6937-2:1983, ISO_6937, ISO_6937:1992, ISO_8859-1, ISO_8859-1:1987, ISO_8859-2, ISO_8859-2:1987, ISO_8859-3, ISO_8859-3:1988, ISO_8859-4, ISO_8859-4:1988, ISO_8859-5, ISO_8859-5:1988, ISO_8859-6, ISO_8859-6:1987, ISO_8859-7, ISO_8859-7:1987, ISO_8859-8, ISO_8859-8:1988, ISO_8859-9, ISO_8859-9:1989, ISO_8859-10, ISO_8859-10:1992, ISO_8859-14:1998, ISO_8859-15:1998, ISO_9036, ISO_10367-BOX, IT, JIS_C6220-1969-RO, JIS_C6229-1984-B, JOHAB, JP-OCR-B, JP, JS, JUS_I.B1.002, KOI-7, KOI-8, KOI8-R, KOI8-U, KSC5636, L1, L2, L3, L4, L5, L6, L7, L8, L10, LATIN-GREEK-1, LATIN-GREEK, LATIN1, LATIN2, LATIN3, LATIN4, LATIN5, LATIN6, LATIN7, LATIN8, LATIN10, MAC-IS, MAC-UK, MAC, MACINTOSH, MS-ANSI, MS-ARAB, MS-CYRL, MS-EE, MS-GREEK, MS-HEBR, MS-TURK, MSCP949, MSCP1361, MSZ_7795.3, MS_KANJI, NAPLPS, NATS-DANO, NATS-SEFI, NC_NC00-10, NC_NC00-10:81, NF_Z_62-010, NF_Z_62-010_(1973), NF_Z_62-010_1973, NO, NO2, NS_4551-1, NS_4551-2, OS2LATIN1, OSF00010001, OSF00010002, OSF00010003, OSF00010004, OSF00010005, OSF00010006, OSF00010007, OSF00010008, OSF00010009, OSF0001000A, OSF00010020, OSF00010100, OSF00010101, OSF00010102, OSF00010104, OSF00010105, OSF00010106, OSF00030010, OSF0004000A, OSF0005000A, OSF05010001, OSF100201A4, OSF100201A8, OSF100201B5, OSF100201F4, OSF100203B5, OSF1002011C, OSF1002011D, OSF1002035D, OSF1002035E, OSF1002035F, OSF1002036B, OSF1002037B, OSF10010001, OSF10020025, OSF10020111, OSF10020115, OSF10020116, OSF10020118, OSF10020122, OSF10020129, OSF10020352, OSF10020354, OSF10020357, OSF10020359, OSF10020360, OSF10020364, OSF10020365, OSF10020366, OSF10020367, OSF10020370, OSF10020387, OSF10020388, OSF10020396, OSF10020402, OSF10020417, PT, PT2, R8, ROMAN8, SE, SE2, SEN_850200_B, SEN_850200_C, SHIFT-JIS, SHIFT_JIS, SJIS, SS636127, ST_SEV_358-88, T.61-8BIT, T.61, TIS-620, TIS620-0, TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK, UNICODE, UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF8, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256, WINDOWS-1257, WINDOWS-1258, WS2, YU

Availability: any major FTP site mirroring GNU suite.

GNU recode/Free recode
Author: François Pinard, Université de Montréal <pinard@iro.umontreal.ca>
Current release: 3.6 (2001-01-22).
Supported character encodings:
ANSI_X3.4-1968 ASCII-BS ASMO_449 Apple-Mac AtariST BS_4730 BS_viewdata Bang-Bang CDC-NOS CSA_Z243.4-1985-1 CSA_Z243.4-1985-2 CSA_Z243.4-1985-gr CSN_369103 DEC-MCS DIN_66003 DS_2089 EBCDIC EBCDIC-AT-DE EBCDIC-AT-DE-A EBCDIC-CA-FR EBCDIC-CCC EBCDIC-DK-NO EBCDIC-DK-NO-A EBCDIC-ES EBCDIC-ES-A EBCDIC-ES-S EBCDIC-FI-SE EBCDIC-FI-SE-A EBCDIC-FR EBCDIC-IBM EBCDIC-IT EBCDIC-PT EBCDIC-UK EBCDIC-US ECMA-cyrillic ES ES2 GB_1988-80 GOST_19768-74 HTML IBM-PC IBM037 IBM038 IBM1026 IBM273 IBM274 IBM275 IBM277 IBM278 IBM280 IBM281 IBM284 IBM285 IBM290 IBM297 IBM420 IBM423 IBM424 IBM437 IBM500 IBM850 IBM851 IBM852 IBM855 IBM857 IBM860 IBM861 IBM862 IBM863 IBM864 IBM865 IBM868 IBM869 IBM870 IBM871 IBM880 IBM891 IBM903 IBM904 IBM905 IBM918 IEC_P27-1 INIS INIS-8 INIS-cyrillic INVARIANT ISO_10367-box ISO_2033-1983 ISO_5427 ISO_5427:1981 ISO_5428:1980 ISO_646.basic:1983 ISO_646.irv:1983 ISO_6937-2-25 ISO_8859-1:1987 ISO_8859-2:1987 ISO_8859-3:1988 ISO_8859-4:1988 ISO_8859-5:1988 ISO_8859-6:1987 ISO_8859-7:1987 ISO_8859-8:1988 ISO_8859-9:1989 ISO_8859-supp IT Icon-QNX JIS_C6220-1969-jp JIS_C6220-1969-ro JIS_C6229-1984-a JIS_C6229-1984-b JIS_C6229-1984-b-add JIS_C6229-1984-hand JIS_C6229-1984-hand-add JIS_C6229-1984-kana JIS_X0201 JUS_I.B1.002 JUS_I.B1.003-mac JUS_I.B1.003-serb KSC5636 LaTeX Latin-greek-1 MSZ_7795.3 NATS-DANO NATS-DANO-ADD NATS-SEFI NATS-SEFI-ADD NC_NC00-10:81 NF_Z_62-010 NF_Z_62-010_(1973) NS_4551-1 NS_4551-2 NeXT PT PT2 RFC SEN_850200_B SEN_850200_C T.61-7bit Texte dk-us flat greek-ccitt greek7 greek7-old hp-roman8 latin-greek latin-lap latin6 macintosh us-dk

Availability: any major FTP site mirroring GNU suite.

N.B. Microsoft codepages 1250-1259 are not supported in GNU recode 3.4 and earlier versions. If you are unwilling to upgrade and CP1250 is your only concern, apply Branimir Dolicki's patch.

trans
Author: Kosta Kostis, <kostis@acm.org>
Current release: 1.13 (1997-06-14).
Supported character encodings:
adobeiso adobestd adobesym applecro applegk2 applegrk appleice applerom applerom.net applerum appletur atarist cp1250 cp1251 cp1252 cp1252.net cp1253 cp1254 cp1255 cp1256 cp437 cp737 cp850 cp850.net cp851 cp852 cp852.net cp853 cp853.net cp855 cp857 cp857.net cp860 cp861 cp862 cp863 cp864 cp865 cp866 cp866.net cp869 cp869.net cp895 cyrilbas decmcs ebc037 ebc1047 ebc1047.net ebc875 ebc875.net hp48 hproman8 hproman8.net iso10646 iso646.ca iso646.ch iso646.de iso646.es iso646.fi iso646.fr iso646.gb iso646.irv iso646.it iso646.nl iso646.no iso646.pt iso646.se iso8859.1 iso8859.10 iso8859.2 iso8859.3 iso8859.4 iso8859.5 iso8859.6 iso8859.7 iso8859.8 iso8859.9 koi8-r koi8-r.net mslinedr nextstep.enc nextstep.net symbol template tex-dcr.in tex-dcr.out wingding wingreek

Availability: Universität Erlangen

C3 System (Coded Character set Conversion)
Authors: C3 Task Force within TERENA (Borka Jerman-Blazic <jerman-blazic@ijs.si>, Olle Jarnefors <ojarnef@admin.kth.se>, Peter Svanberg <psv@nada.kth.se>, Keld Simonsen <keld@dkuug.dk>).
Current release: ap45 (1994-12-15).
Supported character encodings:
ANSI X3.4, SS 63 61 27, SS 63 61 27, NS 4551, BS 4730, JUS I.Bl.002, ISO 8859-1, ISO 8859-2, ISO 8859-5, IBM CP437, IBM CP850, ISO 10646

Availability: Kungl Teknikska Högskolan

MtRecode: The Multext character translation program
Author: Claude de Loupy, <loupy@lpl.univ-aix.fr>
Current release: 1.2 (1996-08-07)
Supported character encodings:
ISO 646-IRV, ISO 8859-1, ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5, ISO 8859-6, ISO 8859-7, ISO 8859-8, ISO 8859-9, ISO 8859-10, Mac Roman, EasyFrench, SGML entities

Availability: Centre National de la Recherche Scientifique


Created 1996-09-10 by P. Peterlin
Last revision $Date: 2001/09/13 13:21:40 $ ($Author: gnusl $)
To ISO 8859-2 (Latin 2) Resources