ISO-IR-165
The CCITT Chinese Primary Set[2] is a multi-byte graphic character set for Chinese communications created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992.[3] It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex.[2] It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165,[4] and encodable in the ISO-2022-CN-EXT code version.[1]
MIME / IANA | iso-ir-165 |
---|---|
Alias(es) | CN-GB-ISOIR165 (EUC form)[1] |
Language(s) | Simplified Chinese, English, Russian; partial support: Greek, Japanese |
Standard | ITU T.101, annex C |
Definitions | ISO-IR 165 |
Extends | GB 2312 |
Encoding formats | ISO-2022-CN-EXT, Videotex Data Syntax 2 |
Succeeded by | GB 18030 |
It is an extended modification of GB 2312-80 and corresponds to the union of the Mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modifications and extensions. A subset of the GB 6345.1 extensions is incorporated into GB 18030, while GB 8565.2 serves as the Mainland Chinese source reference for certain CJK Unified Ideographs.
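As a rough illustration of the ISO-2022-CN-EXT form, the following Python sketch encodes one line of ISO-IR-165 text given as row-cell pairs. It assumes the SO designation ESC $ ) E (final byte 0x45) listed for ISO-IR-165 in RFC 1922, and the usual ISO 2022 convention that each byte of a 94 × 94 set is the row or cell number plus 0x20. `encode_line` is a hypothetical helper rather than part of any library, and the sketch ignores the rules on when a designation must be repeated.

```python
# Assumption: ISO-IR-165 is designated to G1 with ESC $ ) E (final byte 0x45)
# and invoked with SO, as listed in RFC 1922; check the RFC before relying on it.
DESIGNATE_ISO_IR_165 = b"\x1b$)E"
SO, SI = b"\x0e", b"\x0f"

def encode_line(rowcells: list[tuple[int, int]]) -> bytes:
    """Encode one line of ISO-IR-165 characters given as (row, cell) pairs."""
    body = bytearray()
    for row, cell in rowcells:
        if not (1 <= row <= 94 and 1 <= cell <= 94):
            raise ValueError(f"row-cell {row:02d}-{cell:02d} is outside 1..94")
        # 7-bit form: row and cell each offset by 0x20.
        body += bytes([0x20 + row, 0x20 + cell])
    # Designate the set, shift out to it, emit the bytes, then shift back in (SI).
    return DESIGNATE_ISO_IR_165 + SO + bytes(body) + SI

print(encode_line([(79, 81)]).hex(" "))
# 1b 24 29 45 0e 6f 71 0f
```

Here 79-81 (锺) becomes the two bytes 0x6F 0x71 between SO and SI.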
GB 6345.1
GB 6345.1-86 (32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange) includes both a corrigendum and an extension for GB 2312. The corrigendum alters the following two characters:[3]
Row-cell | EUC | Unamended | GB 6345.1 | Notes |
---|---|---|---|---|
03-71 | 0xA3E7 | ɡ | g | [a] |
79-81 | 0xEFF1 | 鍾 | 锺 | [b] |
- [a] Corresponds to U+FF47 ｇ in Unicode; however, the unamended reference glyph can also correspond to U+0261 ɡ. See below for how U+0261 is mapped to and from GB 6345.1, versus how it is mapped to and from ISO-IR-165.
- [b] The unamended reference glyph is a Traditional Chinese character corresponding to U+937E. In Simplified Chinese this character is usually replaced with 钟 (U+949F, also the simplification of 鐘) except in personal names; the amended glyph is an alternative simplified form corresponding to U+953A.
Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections when selecting their Unicode mappings.[5]
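The Row-cell and EUC columns in these tables are related by a fixed offset: in the 7-bit ISO 2022 form each byte is the row or cell number plus 0x20, and the EUC form sets the high bit of both bytes (adding 0x80), so each EUC byte is the row or cell number plus 0xA0. The minimal Python sketch below shows the conversions; the function names are illustrative only.

```python
# Conversions between the row-cell (kuten) notation, the 7-bit ISO 2022
# form, and the EUC form of a 94 x 94 double-byte set such as GB 2312.

def rowcell_to_7bit(row: int, cell: int) -> bytes:
    """Return the 7-bit form: row and cell each offset by 0x20."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row-cell {row:02d}-{cell:02d} is outside 1..94")
    return bytes([0x20 + row, 0x20 + cell])

def rowcell_to_euc(row: int, cell: int) -> bytes:
    """Return the EUC form: the 7-bit bytes with the high bit set (+0x80)."""
    return bytes(b + 0x80 for b in rowcell_to_7bit(row, cell))

def euc_to_rowcell(code: bytes) -> tuple[int, int]:
    """Invert rowcell_to_euc for a single two-byte EUC code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError(f"not a two-byte EUC code: {code!r}")
    return code[0] - 0xA0, code[1] - 0xA0

# The two corrigendum positions from the table above.
for row, cell in [(3, 71), (79, 81)]:
    print(f"{row:02d}-{cell:02d} -> 0x{rowcell_to_euc(row, cell).hex().upper()}")
# 03-71 -> 0xA3E7
# 79-81 -> 0xEFF1
```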
The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3), extends the set of 26 non-ASCII pinyin characters in row 8 with six additional such characters, and adds half-width forms of these 32 pinyin characters to row 11.[3] These GB 6345.1 extensions are also incorporated into GB/T 12345, the Traditional Chinese counterpart to GB 2312, in addition to 29 vertical presentation forms in row 6.[3][6]
The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB 12345, but not the half-width forms, are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN),[7] and also as two-byte codes in GB 18030.[8] The additional pinyin characters are as follows:[7]
Row-cell | EUC | Character[7][8] | Notes |
---|---|---|---|
08-27 | 0xA8BB | U+0251 ɑ | |
08-28 | 0xA8BC | U+1E3F ḿ | [a] |
08-29 | 0xA8BD | U+0144 ń | |
08-30 | 0xA8BE | U+0148 ň | |
08-31 | 0xA8BF | U+01F9 ǹ | [b] |
08-32 | 0xA8C0 | U+0261 ɡ | [c] |
- [a] Mapped to the Private Use Area code point U+E7C7 by the first (2000) edition of GB 18030; this was amended by the 2005 edition.[8]
- [b] This precomposed character was added in Unicode 3.0. Before then, Apple mapped it to its decomposition sequence, U+006E U+0300 (n followed by a combining grave accent).[7] This change predates the stabilisation of Unicode normalisation forms, which was introduced in Unicode 3.1.[9]
- [c] Matches the unamended reference glyph for 03-71 (see above). ISO-IR-165 differs here (see below).
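The canonical equivalence behind Apple's earlier mapping of 0xA8BF can be checked with Python's standard `unicodedata` module. This sketch copies the six mappings from the table above into a dictionary (`EXTRA_PINYIN` is an illustrative name, not a published table) and shows that NFC composes U+006E U+0300 into U+01F9, while NFD decomposes it again.

```python
import unicodedata

# EUC code -> Unicode for the six GB 6345.1 pinyin additions in row 8,
# copied from the table above.
EXTRA_PINYIN = {
    0xA8BB: "\u0251",  # ɑ
    0xA8BC: "\u1E3F",  # ḿ (U+E7C7 in GB 18030-2000)
    0xA8BD: "\u0144",  # ń
    0xA8BE: "\u0148",  # ň
    0xA8BF: "\u01F9",  # ǹ (added in Unicode 3.0)
    0xA8C0: "\u0261",  # ɡ
}

# Apple's pre-Unicode 3.0 mapping for 0xA8BF: n followed by combining grave.
legacy = "n\u0300"

# NFC composes the sequence into the single code point now used...
assert unicodedata.normalize("NFC", legacy) == EXTRA_PINYIN[0xA8BF]
# ...and NFD decomposes it back, so the two forms are canonically equivalent.
assert unicodedata.normalize("NFD", EXTRA_PINYIN[0xA8BF]) == legacy
print("canonically equivalent")
```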
GB 8565.2
GB 8565.2-88 (Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters) defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.[3]
The Unihan database references GB 8565.2 as the Mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is G8.[2]
CCITT changes
ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.[3] Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”).[3][4] These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.[2] In total the set contains 8446 characters.
A number of patterned semigraphic characters are included in row 6.[4] These collide with the vertical presentation forms that other extensions, such as Mac OS Simplified Chinese[7] and GB 18030,[8] place in the same row.
The GB 6345.1 corrections to GB 2312 are only partly applied: the 79-81 correction is adopted, but the 03-71 correction is not. As a result, the Unicode mappings of 03-71 and 08-32 are swapped compared with other encodings that include GB 2312 with the GB 6345.1 extensions:
Row-cell | EUC | GB 2312 (unamended) | GB 6345.1 | GB 6345.1 mapping[7][8] | ISO-IR-165[4] | ISO-IR-165 mapping[10] |
---|---|---|---|---|---|---|
03-71 | 0xA3E7 | ɡ | g | U+FF47 | ɡ | U+0261 |
08-32 | 0xA8C0 | (absent) | ɡ | U+0261 | g | U+FF47 |
79-81 | 0xEFF1 | 鍾 | 锺 | U+953A | 锺 | U+953A |
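The practical effect of the swap can be shown with two small lookup tables holding only the three code points compared above. The tables are illustrative fragments copied from this comparison, not complete converters and not taken from any particular library: the first follows the GB 6345.1-style mappings used by Mac OS Simplified Chinese and GB 18030, the second the ISO-IR-165 mappings.

```python
# Illustrative fragments covering only the three code points compared above.
GB6345_STYLE = {  # as in Mac OS Simplified Chinese and GB 18030
    0xA3E7: "\uFF47",  # fullwidth g
    0xA8C0: "\u0261",  # ɡ
    0xEFF1: "\u953A",  # 锺
}
ISO_IR_165 = {
    0xA3E7: "\u0261",  # ɡ (unamended glyph kept)
    0xA8C0: "\uFF47",  # fullwidth g
    0xEFF1: "\u953A",  # 锺 (correction applied)
}

def decode_euc(data: bytes, table: dict[int, str]) -> str:
    """Decode a run of two-byte EUC codes using the given lookup table."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    return "".join(table[(data[i] << 8) | data[i + 1]] for i in range(0, len(data), 2))

sample = bytes.fromhex("A3E7A8C0EFF1")
a = decode_euc(sample, GB6345_STYLE)
b = decode_euc(sample, ISO_IR_165)
for code, x, y in zip(("A3E7", "A8C0", "EFF1"), a, b):
    print(f"0x{code}: U+{ord(x):04X} vs U+{ord(y):04X}")
# 0xA3E7: U+FF47 vs U+0261
# 0xA8C0: U+0261 vs U+FF47
# 0xEFF1: U+953A vs U+953A
```

A converter that treats ISO-IR-165 text as GB 2312 with the GB 6345.1 extensions (or vice versa) would therefore exchange U+0261 and U+FF47 while leaving 0xEFF1 unaffected.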
References
- Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
- Chung, Jaemin (2018-01-24). "Pseudo-G8 characters" (PDF). ISO/IEC JTC 1/SC 2/WG 2/IRG N2276.
- Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 94–111. ISBN 978-0-596-51447-1.
- CCITT (1992-07-13). Codes of the Chinese graphic character set for communication (PDF). ITSCJ/IPSJ. ISO-IR-165.
- Steele, Shawn (2000). "cp936 to Unicode table". Microsoft, Unicode Consortium.
- Lunde, Ken (1998). "Appendix F: GB/T 12345" (PDF). CJKV Information Processing. O'Reilly Media. ISBN 9781565922242.
- "Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later". Apple, Inc.
- Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set.
- "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
- Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM. (Note: codes are listed in the source in 7-bit form: add 0x80 to each byte for EUC form, or subtract 0x20 for kuten form)
External links
- ISO-IR-165: Code of the Chinese graphic character set for communication (registered 1992, amended 1994)
- Unicode mappings for ISO-IR-165