KS X 1001

KS X 1001, "Code for Information Interchange (Hangul and Hanja)",[lower-alpha 1][1] formerly called KS C 5601, is a South Korean coded character set standard to represent hangul and hanja characters on a computer.

KS X 1001
MIME / IANA: ks_c_5601-1987
Alias(es): KS C 5601
Language(s): Korean, English, Russian. Partial support: Greek, Japanese
Standard: KS X 1001
Classification: ISO-2022-compatible DBCS, CJK encoding
Encoding formats: EUC-KR, ISO 2022, UHC, Johab
Preceded by: N-byte Hangul code (KS C 5601-1974)
Other related encoding(s): KS X 1002, KPS 9566, JIS X 0208, GB 2312, GB 12052

KS X 1001 is encoded by the most common legacy (pre-Unicode) character encodings for Korean, including EUC-KR and Microsoft's Unified Hangul Code (UHC). It contains Korean Hangul syllables, CJK ideographs (Hanja), Greek, Cyrillic, Japanese (Hiragana and Katakana) and some other characters.

KS X 1001 is arranged as a 94×94 table, following the structure of 2-byte code words in ISO 2022 and EUC. Therefore, its code points are pairs of integers 1–94. However, some encodings (UHC and Johab), in addition to providing codes for every code point, provide additional codes for characters otherwise representable only as code point sequences.

History

This standard was previously known as KS C 5601. It has been revised several times, including in 1987, 1992, 1998 and 2002.

The present, double-byte, Wansung (완성, Wanseong, 'precomposing')[1] character set was standardised by the third edition of KS C 5601,[2] which was published in 1986.[3] It is an ISO 2022 compatible encoding, typically used in EUC form, which assigns double-byte codes for non-Hangul, Hangul jamo, and the most common Hangul syllables, in contrast to Johab (조합, Johap, 'combining')[1] which assigns double-byte codes to all Hangul syllables using modern jamo. Wansung is technically a variable-length encoding, allowing other syllables to be represented with eight-byte sequences (using the jamo and Hangul Filler character), but this feature is not always implemented.[4]

The earliest edition of KS C 5601, published in 1974,[2] defined a variable-length[2] 7-bit character set which assigned single-byte code points to 51[3] basic Hangul jamo, somewhat analogously to JIS C 6220, in an encoding known as "N-byte Hangul".[5] The second edition, published in 1982, retained the main character set from the 1974 edition but defined two supplementary sets, including Johab. Neither edition was adopted as widely as intended.[2]

Wansung was kept unchanged in the 1987 and 1992 editions. In the 1992 edition, additional annex material was added,[3] including the definition of the Johab encoding[6] in annex 3, and the older N-byte Hangul encoding in annex 4.[1][5] The Johab annex was published in response to industry use of Johab as a competing encoding to Wansung; Johab was used at the time by Hangul Word Processor. Following the introduction of Unified Hangul Code by Microsoft in Windows 95, and Hangul Word Processor abandoning Johab in favour of Unicode in 2000, Johab ceased to be commonly used.[2]

Encodings

Various CJK encodings, including four based on KS X 1001, supported by Mozilla Firefox as of 2004. (This support has been reduced in later versions to avoid certain cross site scripting attacks.)

Encoding schemes of KS X 1001 include EUC-KR (in both ASCII and ISO 646-KR based variants, the latter of which includes a won currency sign (₩) at byte 0x5C rather than a backslash) and ISO-2022-KR,[7] as well as ISO-2022-JP-2 (which also encodes JIS X 0208 and JIS X 0212). These all have the drawback that they only assign codes for the 2350 precomposed Hangul syllables which have their own KS X 1001 codepoints (out of 11172 in total, not counting those using obsolete jamo), and require others to use eight-byte composition sequences, which are not supported by some partial implementations of the standard.[4]

The Johab encoding (stipulated in annex 3 of the 1992 version of the standard) and the EUC-KR superset known as Unified Hangul Code (UHC, also called Windows-949) provide single codes for all 11172 Hangul syllables.[7][6] ISO-2022-KR and Johab are rarely used. Some operating systems extend this standard in other non-uniform ways, e.g. the EUC-KR extensions MacKorean on the classic Mac OS, and IBM-949 by IBM.
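The difference can be illustrated with Python's standard codecs `euc_kr`, `cp949` (UHC) and `johab`. This is only a sketch: the helper name `two_byte_encodings` is hypothetical, and how a given `euc_kr` implementation treats a syllable outside the 2350 (an error or a longer sequence) is deliberately left open.

```python
def two_byte_encodings(ch: str) -> dict:
    """Report, per codec, whether `ch` is given a single two-byte code."""
    result = {}
    for codec in ("euc_kr", "cp949", "johab"):
        try:
            result[codec] = len(ch.encode(codec)) == 2
        except UnicodeEncodeError:
            # The codec offers no representation for this character.
            result[codec] = False
    return result


print(two_byte_encodings("가"))  # syllable with its own KS X 1001 code point
print(two_byte_encodings("뢔"))  # syllable without one
```

Under these assumptions only UHC and Johab should report a two-byte code for the second syllable.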

Hangul Filler

The Hangul Filler character is used to introduce eight-byte Hangul composition sequences[8][9] and to stand in for an absent element (usually an empty final) in such a sequence.[9]
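A composition sequence can be sketched as follows in its EUC-KR (GR) form. The function names are hypothetical, and the sketch assumes the jamo follow the introducing filler in initial, vowel, final order, with every element taken from row 4 (so each is prefixed by 0xA4).

```python
# Compatibility jamo for the 19 initials and the 27 finals (index 0 = no final).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
FILLER = "\u3164"  # Hangul Filler, KS X 1001 cell 4-52


def row4_bytes(jamo: str) -> bytes:
    """EUC-KR bytes of a row 4 character (U+3131..U+318E map to cells 4-1..4-94)."""
    cell = ord(jamo) - 0x3130
    if not 1 <= cell <= 94:
        raise ValueError("not a KS X 1001 row 4 character")
    return bytes([0xA4, 0xA0 + cell])


def composition_sequence(syllable: str) -> bytes:
    """Build the eight bytes: filler, initial, vowel, final (or filler)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a modern precomposed Hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    parts = [
        FILLER,                   # introduces the sequence
        INITIALS[initial],
        chr(0x314F + vowel),      # the 21 vowels are contiguous from U+314F
        FINALS[final] or FILLER,  # the filler stands in for an empty final
    ]
    return b"".join(row4_bytes(p) for p in parts)


print(composition_sequence("뢔").hex(" "))  # a4 d4 a4 a9 a4 c9 a4 d4
```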

Unicode includes the Wansung code Hangul Filler in the Hangul Compatibility Jamo block for round-trip compatibility, but uses its own system (with its own, differently used, filler characters) for composing Hangul. The KS X 1001 Hangul composition system is not used in Unicode, and the filler renders merely as an empty space; KS X 1001 composition sequences using modern jamo may be mapped to precomposed characters in Unicode.[9] This is not usually done with Unified Hangul Code.

For round-trip compatibility, Unicode also includes the N-byte Hangul code Hangul Filler separately in the Halfwidth and Fullwidth Forms block, named the "Halfwidth Hangul Filler".

N-byte Hangul code

This is the N-byte Hangul code,[5] as specified by KS C 5601-1974 and by annex 4 of KS C 5601-1992. The second half of IBM's Code page 1040[10] is a superset of this, assigning the characters ¢¬\~ (although not £) to the same locations as in Code page 1041. Character 0x40/0xC0 is a Hangul Filler (see above), used in combining sequences.

Similarly to its Japanese counterpart JIS C 6220 (JIS X 0201), N-byte Hangul code could be used as a 7-bit encoding, with character allocations over the range 0x40 through 0x7C.[5] The chart below shows the code in an 8-bit environment with the high bit set (i.e. over 0xC0 through 0xFC), as it is used in e.g. code page 1040.

KS C 5601-1974 / N-byte Hangul[11]
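Each cell gives the Unicode code point (hexadecimal) of the halfwidth character at that byte value; -- marks an unassigned position. Rows 8_ (128) to B_ (176) are entirely unassigned.
Row (decimal) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
C_ (192) | FFA0 (HWHF) | FFA1 | FFA2 | FFA3 | FFA4 | FFA5 | FFA6 | FFA7 | FFA8 | FFA9 | FFAA | FFAB | FFAC | FFAD | FFAE | FFAF
D_ (208) | FFB0 | FFB1 | FFB2 | FFB3 | FFB4 | FFB5 | FFB6 | FFB7 | FFB8 | FFB9 | FFBA | FFBB | FFBC | FFBD | FFBE | --
E_ (224) | -- | -- | FFC2 | FFC3 | FFC4 | FFC5 | FFC6 | FFC7 | -- | -- | FFCA | FFCB | FFCC | FFCD | FFCE | FFCF
F_ (240) | -- | -- | FFD2 | FFD3 | FFD4 | FFD5 | FFD6 | FFD7 | -- | -- | FFDA | FFDB | FFDC | -- | -- | --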
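The relationship between the byte values and the halfwidth code points can be sketched as a constant offset. This is an assumption inferred from the endpoints given above (0xC0 for the filler at U+FFA0, 0xFC for U+FFDC), not something the standard is quoted as stating; the function name and the `ASSIGNED` ranges are illustrative, the latter copied from the code points listed in the chart.

```python
# Code point ranges that appear in the chart (filler, consonants, vowels).
ASSIGNED = [
    (0xFFA0, 0xFFBE),
    (0xFFC2, 0xFFC7),
    (0xFFCA, 0xFFCF),
    (0xFFD2, 0xFFD7),
    (0xFFDA, 0xFFDC),
]


def nbyte_to_unicode(byte: int) -> str:
    """Map an N-byte Hangul byte (7-bit or 8-bit form) to a halfwidth character."""
    code = byte | 0x80               # 0x40..0x7C becomes 0xC0..0xFC
    cp = 0xFFA0 + (code - 0xC0)      # assumed constant offset
    if not any(lo <= cp <= hi for lo, hi in ASSIGNED):
        raise ValueError(f"byte 0x{byte:02X} is not assigned in the chart")
    return chr(cp)


print(hex(ord(nbyte_to_unicode(0x40))))  # 0xffa0 (Halfwidth Hangul Filler)
```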

Wansung code charts

Following are the code charts for KS X 1001 in Wansung layout. Where a pair of hexadecimal numbers is given, the smaller is used when encoded over GL (0x21-0x7E), as in ISO-2022-KR when the Korean set has been shifted to, and the larger is used in the more typical case of it being encoded over GR (0xA1-0xFE), as in EUC-KR or UHC. Johab changes the arrangement to encode all 11172 Hangul clusters separately and in order.
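The relationship between a row and cell number and the two byte forms can be sketched as below. The function name `row_cell_to_bytes` is hypothetical; the final decode uses Python's built-in `euc_kr` codec purely as a cross-check.

```python
def row_cell_to_bytes(row: int, cell: int, high: bool = True) -> bytes:
    """Encode a KS X 1001 code point (row, cell) over GR (default) or GL."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    base = 0xA0 if high else 0x20  # GR adds 0xA0 to each number, GL adds 0x20
    return bytes([base + row, base + cell])


gr = row_cell_to_bytes(16, 1)              # first precomposed Hangul syllable
gl = row_cell_to_bytes(16, 1, high=False)
print(gr.hex(), gl.hex(), gr.decode("euc_kr"))  # b0a1 3021 가
```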

Character set 0x21 / 0xA1 (row number 1, special characters)

This set contains punctuation and other symbols, excluding punctuation present in KS X 1003 (which is included in row 3). Encodings which combine KS X 1001 with single-byte ASCII may use alternative Unicode mapping to the Halfwidth and Fullwidth Forms block for the backslash. Unicode mapping of the wave dash (tilde dash) also differs between vendors, and may be U+301C (favoured by IBM and Apple)[12][13][14] or U+223C (favoured by Microsoft).[15][16] Compare the similar but not identical handling of the JIS wave dash, and the handling of the tilde in the next row.

Except for the backslash, if two mappings are shown below, the first is used by Apple and the second is used by Microsoft.[14][16]

KS X 1001 (prefixed with 0x21 / 0xA1)
Each cell gives the glyph (where available) and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (1-1 to 1-15) | -- | 3000 (IDSP) | 3001 | 3002 | · 00B7 | 2025 | 2026 | ¨ 00A8 | 3003 | 2013/00AD (SHY) | 2014/2015 | 2016/2225 | \ 005C/FF3C | 301C/223C | 2018 | 2019
3_/B_ (1-16 to 1-31) | 201C | 201D | 3014 | 3015 | 3008 | 3009 | 300A | 300B | 300C | 300D | 300E | 300F | 3010 | 3011 | ± 00B1 | × 00D7
4_/C_ (1-32 to 1-47) | ÷ 00F7 | 2260 | 2264 | 2265 | 221E | 2234 | ° 00B0 | 2032 | 2033 | 2103 | 212B | ¢ 00A2/FFE0 | £ 00A3/FFE1 | ¥ 00A5/FFE5 | 2642 | 2640
5_/D_ (1-48 to 1-63) | 2220 | 22A5 | 2312 | 2202 | 2207 | 2261 | 2252 | § 00A7 | 203B | 2606 | 2605 | 25CB | 25CF | 25CE | 25C7 | 25C6
6_/E_ (1-64 to 1-79) | 25A1 | 25A0 | 25B3 | 25B2 | 25BD | 25BC | 2192 | 2190 | 2191 | 2193 | 2194 | 3013 | 226A | 226B | 221A | 223D
7_/F_ (1-80 to 1-94) | 221D | 2235 | 222B | 222C | 2208 | 220B | 2286 | 2287 | 2282 | 2283 | 222A | 2229 | 2227 | 2228 | ¬ 00AC/FFE2 | --


Character set 0x22 / 0xA2 (row number 2, special characters)

This set contains additional punctuation and symbols. Similarly to the tilde character in the previous row, different mappings are used by Apple and Microsoft for the tilde character in this row (U+02DC by Apple, U+FF5E by Microsoft),[14][16] which is intended to be shown as a raised tilde, whereas the tilde in the previous row is intended to be shown in-line at dash height.[17] Mapping of the circled dot also differs.[14][16]

The euro and registered trademark sign were added in 1998, while the postal mark (㉾) was added in 2002.[1]

KS X 1001 (prefixed with 0x22 / 0xA2)
Each cell gives the glyph (where available) and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (2-1 to 2-15) | -- | 21D2 | 21D4 | 2200 | 2203 | ´ 00B4 | ˜ 02DC/FF5E | ˇ 02C7 | ˘ 02D8 | ˝ 02DD | ˚ 02DA | ˙ 02D9 | ¸ 00B8 | ˛ 02DB | ¡ 00A1 | ¿ 00BF
3_/B_ (2-16 to 2-31) | ː 02D0 | 222E | 2211 | 220F | ¤ 00A4 | 2109 | 2030 | 25C1 | 25C0 | 25B7 | 25B6 | 2664 | 2660 | 2661 | 2665 | 2667
4_/C_ (2-32 to 2-47) | 2663 | 25C9/2299 | 25C8 | 25A3 | 25D0 | 25D1 | 2592 | 25A4 | 25A5 | 25A8 | 25A7 | 25A6 | 25A9 | 2668 | 260F | 260E
5_/D_ (2-48 to 2-63) | 261C | 261E | 00B6 | 2020 | 2021 | 2195 | 2197 | 2199 | 2196 | 2198 | 266D | 2669 | 266A | 266C | 327F | 321C
6_/E_ (2-64 to 2-79) | 2116 | 33C7 | 2122 | 33C2 | 33D8 | 2121 | 20AC | ® 00AE | 327E | -- | -- | -- | -- | -- | -- | --
7_/F_ (2-80 to 2-94) | all positions unassigned

Character set 0x23 / 0xA3 (row number 3, basic Latin / ISO 646-KR)

This set corresponds to KS X 1003 (the ISO 646 variant for Korean, a similar set to ASCII), but as two-byte codes preceded by 0x23 (or 0xA3 in GR-delegated (EUC) form). It includes the English alphabet / Basic Latin alphabet, western Arabic numerals and punctuation.

Compare the Roman set of JIS X 0201, which differs by including a Yen sign rather than a Won sign. Contrast the third rows of KPS 9566 and of JIS X 0208, which follow the ISO 646 layout but only include letters and digits.

KS X 1001 (prefixed with 0x23 / 0xA3); non-fullwidth mappings
Each cell gives the glyph and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (3-1 to 3-15) | -- | ! 0021 | " 0022 | # 0023 | $ 0024 | % 0025 | & 0026 | ' 0027 | ( 0028 | ) 0029 | * 002A | + 002B | , 002C | - 002D | . 002E | / 002F
3_/B_ (3-16 to 3-31) | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003A | ; 003B | < 003C | = 003D | > 003E | ? 003F
4_/C_ (3-32 to 3-47) | @ 0040 | A 0041 | B 0042 | C 0043 | D 0044 | E 0045 | F 0046 | G 0047 | H 0048 | I 0049 | J 004A | K 004B | L 004C | M 004D | N 004E | O 004F
5_/D_ (3-48 to 3-63) | P 0050 | Q 0051 | R 0052 | S 0053 | T 0054 | U 0055 | V 0056 | W 0057 | X 0058 | Y 0059 | Z 005A | [ 005B | ₩ 20A9 | ] 005D | ^ 005E | _ 005F
6_/E_ (3-64 to 3-79) | ` 0060 | a 0061 | b 0062 | c 0063 | d 0064 | e 0065 | f 0066 | g 0067 | h 0068 | i 0069 | j 006A | k 006B | l 006C | m 006D | n 006E | o 006F
7_/F_ (3-80 to 3-94) | p 0070 | q 0071 | r 0072 | s 0073 | t 0074 | u 0075 | v 0076 | w 0077 | x 0078 | y 0079 | z 007A | { 007B | 007C (vertical bar) | } 007D | ‾ 203E | --

Encodings such as EUC-KR and UHC combine KS X 1001 with single-byte ASCII or KS X 1003, and hence use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block for the double-byte representations of these characters.
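The two alternative mappings of this row can be sketched as simple arithmetic on the cell number, with cells 3-60 and 3-94 as the only exceptions (the won sign and the overline). The function names are hypothetical, and the cross-check against Python's `euc_kr` codec assumes that codec uses the fullwidth mapping.

```python
def row3_basic(cell: int) -> str:
    """Non-fullwidth mapping: the KS X 1003 (ISO 646-KR) character itself."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in 1..94")
    if cell == 60:
        return "\u20a9"  # won sign in place of the backslash
    if cell == 94:
        return "\u203e"  # overline in place of the tilde
    return chr(0x20 + cell)


def row3_fullwidth(cell: int) -> str:
    """Fullwidth mapping, as used when row 3 coexists with single-byte codes."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in 1..94")
    if cell == 60:
        return "\uffe6"  # fullwidth won sign
    if cell == 94:
        return "\uffe3"  # fullwidth macron
    return chr(0xFF00 + cell)


# Cell 3-33 is the letter A: U+0041 or U+FF21 depending on the mapping.
print(row3_basic(33), hex(ord(row3_fullwidth(33))))  # A 0xff21
```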

KS X 1001 (prefixed with 0x23 / 0xA3); fullwidth mappings
Each cell gives the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (3-1 to 3-15) | -- | FF01 | FF02 | FF03 | FF04 | FF05 | FF06 | FF07 | FF08 | FF09 | FF0A | FF0B | FF0C | FF0D | FF0E | FF0F
3_/B_ (3-16 to 3-31) | FF10 | FF11 | FF12 | FF13 | FF14 | FF15 | FF16 | FF17 | FF18 | FF19 | FF1A | FF1B | FF1C | FF1D | FF1E | FF1F
4_/C_ (3-32 to 3-47) | FF20 | FF21 | FF22 | FF23 | FF24 | FF25 | FF26 | FF27 | FF28 | FF29 | FF2A | FF2B | FF2C | FF2D | FF2E | FF2F
5_/D_ (3-48 to 3-63) | FF30 | FF31 | FF32 | FF33 | FF34 | FF35 | FF36 | FF37 | FF38 | FF39 | FF3A | FF3B | FFE6 | FF3D | FF3E | FF3F
6_/E_ (3-64 to 3-79) | FF40 | FF41 | FF42 | FF43 | FF44 | FF45 | FF46 | FF47 | FF48 | FF49 | FF4A | FF4B | FF4C | FF4D | FF4E | FF4F
7_/F_ (3-80 to 3-94) | FF50 | FF51 | FF52 | FF53 | FF54 | FF55 | FF56 | FF57 | FF58 | FF59 | FF5A | FF5B | FF5C | FF5D | FFE3 | --

Character set 0x24 / 0xA4 (row number 4, Hangul jamo)

This set includes modern Hangul consonants, followed by vowels, both ordered by South Korean collation customs, followed by obsolete consonants. When used individually, these characters map to the Unicode Hangul Compatibility Jamo block, and do not have a one-to-one mapping with the position-specific characters in the Hangul Jamo block. Compare with row 4 of the North Korean KPS 9566. Character 04-52 is a Hangul Filler (see above), used in combining sequences.

KS X 1001 (prefixed with 0x24 / 0xA4)
Each cell gives the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (4-1 to 4-15) | -- | 3131 | 3132 | 3133 | 3134 | 3135 | 3136 | 3137 | 3138 | 3139 | 313A | 313B | 313C | 313D | 313E | 313F
3_/B_ (4-16 to 4-31) | 3140 | 3141 | 3142 | 3143 | 3144 | 3145 | 3146 | 3147 | 3148 | 3149 | 314A | 314B | 314C | 314D | 314E | 314F
4_/C_ (4-32 to 4-47) | 3150 | 3151 | 3152 | 3153 | 3154 | 3155 | 3156 | 3157 | 3158 | 3159 | 315A | 315B | 315C | 315D | 315E | 315F
5_/D_ (4-48 to 4-63) | 3160 | 3161 | 3162 | 3163 | 3164 (HF) | 3165 | 3166 | 3167 | 3168 | 3169 | 316A | 316B | 316C | 316D | 316E | 316F
6_/E_ (4-64 to 4-79) | 3170 | 3171 | 3172 | 3173 | 3174 | 3175 | 3176 | 3177 | 3178 | 3179 | 317A | 317B | 317C | 317D | 317E | 317F
7_/F_ (4-80 to 4-94) | 3180 | 3181 | 3182 | 3183 | 3184 | 3185 | 3186 | 3187 | 3188 | 3189 | 318A | 318B | 318C | 318D | 318E | --

Character set 0x25 / 0xA5 (row number 5, Roman numerals and Greek)

This set contains Roman numerals and basic support for the Greek alphabet, without diacritics or the final sigma.

Contrast row 6 of KPS 9566, which includes the same characters but in a different layout.

KS X 1001 (prefixed with 0x25 / 0xA5)
Each cell gives the glyph (where available) and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (5-1 to 5-15) | -- | 2170 | 2171 | 2172 | 2173 | 2174 | 2175 | 2176 | 2177 | 2178 | 2179 | -- | -- | -- | -- | --
3_/B_ (5-16 to 5-31) | 2160 | 2161 | 2162 | 2163 | 2164 | 2165 | 2166 | 2167 | 2168 | 2169 | -- | -- | -- | -- | -- | --
4_/C_ (5-32 to 5-47) | -- | Α 0391 | Β 0392 | Γ 0393 | Δ 0394 | Ε 0395 | Ζ 0396 | Η 0397 | Θ 0398 | Ι 0399 | Κ 039A | Λ 039B | Μ 039C | Ν 039D | Ξ 039E | Ο 039F
5_/D_ (5-48 to 5-63) | Π 03A0 | Ρ 03A1 | Σ 03A3 | Τ 03A4 | Υ 03A5 | Φ 03A6 | Χ 03A7 | Ψ 03A8 | Ω 03A9 | -- | -- | -- | -- | -- | -- | --
6_/E_ (5-64 to 5-79) | -- | α 03B1 | β 03B2 | γ 03B3 | δ 03B4 | ε 03B5 | ζ 03B6 | η 03B7 | θ 03B8 | ι 03B9 | κ 03BA | λ 03BB | μ 03BC | ν 03BD | ξ 03BE | ο 03BF
7_/F_ (5-80 to 5-94) | π 03C0 | ρ 03C1 | σ 03C3 | τ 03C4 | υ 03C5 | φ 03C6 | χ 03C7 | ψ 03C8 | ω 03C9 | -- | -- | -- | -- | -- | -- | --

Character set 0x26 / 0xA6 (row number 6, box drawing)

KS X 1001 (prefixed with 0x26 / 0xA6)
Each cell gives the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (6-1 to 6-15) | -- | 2500 | 2502 | 250C | 2510 | 2518 | 2514 | 251C | 252C | 2524 | 2534 | 253C | 2501 | 2503 | 250F | 2513
3_/B_ (6-16 to 6-31) | 251B | 2517 | 2523 | 2533 | 252B | 253B | 254B | 2520 | 252F | 2528 | 2537 | 253F | 251D | 2530 | 2525 | 2538
4_/C_ (6-32 to 6-47) | 2542 | 2512 | 2511 | 251A | 2519 | 2516 | 2515 | 250E | 250D | 251E | 251F | 2521 | 2522 | 2526 | 2527 | 2529
5_/D_ (6-48 to 6-63) | 252A | 252D | 252E | 2531 | 2532 | 2535 | 2536 | 2539 | 253A | 253D | 253E | 2540 | 2541 | 2543 | 2544 | 2545
6_/E_ (6-64 to 6-79) | 2546 | 2547 | 2548 | 2549 | 254A | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
7_/F_ (6-80 to 6-94) | all positions unassigned

Character set 0x27 / 0xA7 (row number 7, unit symbols)

KS X 1001 (prefixed with 0x27 / 0xA7)
Each cell gives the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (7-1 to 7-15) | -- | 3395 | 3396 | 3397 | 2113 | 3398 | 33C4 | 33A3 | 33A4 | 33A5 | 33A6 | 3399 | 339A | 339B | 339C | 339D
3_/B_ (7-16 to 7-31) | 339E | 339F | 33A0 | 33A1 | 33A2 | 33CA | 338D | 338E | 338F | 33CF | 3388 | 3389 | 33C8 | 33A7 | 33A8 | 33B0
4_/C_ (7-32 to 7-47) | 33B1 | 33B2 | 33B3 | 33B4 | 33B5 | 33B6 | 33B7 | 33B8 | 33B9 | 3380 | 3381 | 3382 | 3383 | 3384 | 33BA | 33BB
5_/D_ (7-48 to 7-63) | 33BC | 33BD | 33BE | 33BF | 3390 | 3391 | 3392 | 3393 | 3394 | 2126 | 33C0 | 33C1 | 338A | 338B | 338C | 33D6
6_/E_ (7-64 to 7-79) | 33C5 | 33AD | 33AE | 33AF | 33DB | 33A9 | 33AA | 33AB | 33AC | 33DD | 33D0 | 33D3 | 33C3 | 33C9 | 33DC | 33C6
7_/F_ (7-80 to 7-94) | all positions unassigned

Character set 0x28 / 0xA8 (row number 8, extended Latin, encircled, fractions)

KS X 1001 (prefixed with 0x28 / 0xA8)
Each cell gives the glyph (where available) and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (8-1 to 8-15) | -- | Æ 00C6 | Ð 00D0 | ª 00AA | Ħ 0126 | -- | Ĳ 0132 | -- | Ŀ 013F | Ł 0141 | Ø 00D8 | Œ 0152 | º 00BA | Þ 00DE | Ŧ 0166 | Ŋ 014A
3_/B_ (8-16 to 8-31) | -- | 3260 | 3261 | 3262 | 3263 | 3264 | 3265 | 3266 | 3267 | 3268 | 3269 | 326A | 326B | 326C | 326D | 326E
4_/C_ (8-32 to 8-47) | 326F | 3270 | 3271 | 3272 | 3273 | 3274 | 3275 | 3276 | 3277 | 3278 | 3279 | 327A | 327B | 24D0 | 24D1 | 24D2
5_/D_ (8-48 to 8-63) | 24D3 | 24D4 | 24D5 | 24D6 | 24D7 | 24D8 | 24D9 | 24DA | 24DB | 24DC | 24DD | 24DE | 24DF | 24E0 | 24E1 | 24E2
6_/E_ (8-64 to 8-79) | 24E3 | 24E4 | 24E5 | 24E6 | 24E7 | 24E8 | 24E9 | 2460 | 2461 | 2462 | 2463 | 2464 | 2465 | 2466 | 2467 | 2468
7_/F_ (8-80 to 8-94) | 2469 | 246A | 246B | 246C | 246D | 246E | ½ 00BD | 2153 | 2154 | ¼ 00BC | ¾ 00BE | 215B | 215C | 215D | 215E | --

Character set 0x29 / 0xA9 (row number 9, extended Latin, encircled, superscript and subscript)

KS X 1001 (prefixed with 0x29 / 0xA9)
Each cell gives the glyph (where available) and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (9-1 to 9-15) | -- | æ 00E6 | đ 0111 | ð 00F0 | ħ 0127 | ı 0131 | ĳ 0133 | ĸ 0138 | ŀ 0140 | ł 0142 | ø 00F8 | œ 0153 | ß 00DF | þ 00FE | ŧ 0167 | ŋ 014B
3_/B_ (9-16 to 9-31) | ŉ 0149 | 3200 | 3201 | 3202 | 3203 | 3204 | 3205 | 3206 | 3207 | 3208 | 3209 | 320A | 320B | 320C | 320D | 320E
4_/C_ (9-32 to 9-47) | 320F | 3210 | 3211 | 3212 | 3213 | 3214 | 3215 | 3216 | 3217 | 3218 | 3219 | 321A | 321B | 249C | 249D | 249E
5_/D_ (9-48 to 9-63) | 249F | 24A0 | 24A1 | 24A2 | 24A3 | 24A4 | 24A5 | 24A6 | 24A7 | 24A8 | 24A9 | 24AA | 24AB | 24AC | 24AD | 24AE
6_/E_ (9-64 to 9-79) | 24AF | 24B0 | 24B1 | 24B2 | 24B3 | 24B4 | 24B5 | 2474 | 2475 | 2476 | 2477 | 2478 | 2479 | 247A | 247B | 247C
7_/F_ (9-80 to 9-94) | 247D | 247E | 247F | 2480 | 2481 | 2482 | ¹ 00B9 | ² 00B2 | ³ 00B3 | 2074 | 207F | 2081 | 2082 | 2083 | 2084 | --

Character set 0x2A / 0xAA (row number 10, Hiragana)

This set contains Hiragana for writing the Japanese language.

Compare row 10 of KPS 9566, which uses the same layout. Compare and contrast row 4 of JIS X 0208, which also uses the same layout, but in a different row.

KS X 1001 (prefixed with 0x2A / 0xAA)
Each cell gives the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (10-1 to 10-15) | -- | 3041 | 3042 | 3043 | 3044 | 3045 | 3046 | 3047 | 3048 | 3049 | 304A | 304B | 304C | 304D | 304E | 304F
3_/B_ (10-16 to 10-31) | 3050 | 3051 | 3052 | 3053 | 3054 | 3055 | 3056 | 3057 | 3058 | 3059 | 305A | 305B | 305C | 305D | 305E | 305F
4_/C_ (10-32 to 10-47) | 3060 | 3061 | 3062 | 3063 | 3064 | 3065 | 3066 | 3067 | 3068 | 3069 | 306A | 306B | 306C | 306D | 306E | 306F
5_/D_ (10-48 to 10-63) | 3070 | 3071 | 3072 | 3073 | 3074 | 3075 | 3076 | 3077 | 3078 | 3079 | 307A | 307B | 307C | 307D | 307E | 307F
6_/E_ (10-64 to 10-79) | 3080 | 3081 | 3082 | 3083 | 3084 | 3085 | 3086 | 3087 | 3088 | 3089 | 308A | 308B | 308C | 308D | 308E | 308F
7_/F_ (10-80 to 10-94) | 3090 | 3091 | 3092 | 3093 | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

Character set 0x2B / 0xAB (row number 11, Katakana)

This set contains Katakana for writing the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included.[18]

Compare row 11 of KPS 9566, which uses the same layout. Compare and contrast row 5 of JIS X 0208, which also uses the same layout, but in a different row.

KS X 1001 (prefixed with 0x2B / 0xAB)
Each cell gives the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (11-1 to 11-15) | -- | 30A1 | 30A2 | 30A3 | 30A4 | 30A5 | 30A6 | 30A7 | 30A8 | 30A9 | 30AA | 30AB | 30AC | 30AD | 30AE | 30AF
3_/B_ (11-16 to 11-31) | 30B0 | 30B1 | 30B2 | 30B3 | 30B4 | 30B5 | 30B6 | 30B7 | 30B8 | 30B9 | 30BA | 30BB | 30BC | 30BD | 30BE | 30BF
4_/C_ (11-32 to 11-47) | 30C0 | 30C1 | 30C2 | 30C3 | 30C4 | 30C5 | 30C6 | 30C7 | 30C8 | 30C9 | 30CA | 30CB | 30CC | 30CD | 30CE | 30CF
5_/D_ (11-48 to 11-63) | 30D0 | 30D1 | 30D2 | 30D3 | 30D4 | 30D5 | 30D6 | 30D7 | 30D8 | 30D9 | 30DA | 30DB | 30DC | 30DD | 30DE | 30DF
6_/E_ (11-64 to 11-79) | 30E0 | 30E1 | 30E2 | 30E3 | 30E4 | 30E5 | 30E6 | 30E7 | 30E8 | 30E9 | 30EA | 30EB | 30EC | 30ED | 30EE | 30EF
7_/F_ (11-80 to 11-94) | 30F0 | 30F1 | 30F2 | 30F3 | 30F4 | 30F5 | 30F6 | -- | -- | -- | -- | -- | -- | -- | -- | --

Character set 0x2C / 0xAC (row number 12, Cyrillic)

This set contains the modern Russian alphabet, and is not necessarily sufficient to represent other forms of the Cyrillic script.

Compare row 5 of KPS 9566 and row 7 of JIS X 0208, which use the same layout (but in a different row).

KS X 1001 (prefixed with 0x2C / 0xAC)
Each cell gives the glyph and the Unicode code point in hexadecimal; -- marks a position with no assigned character.
Row (cells) | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
2_/A_ (12-1 to 12-15) | -- | А 0410 | Б 0411 | В 0412 | Г 0413 | Д 0414 | Е 0415 | Ё 0401 | Ж 0416 | З 0417 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D
3_/B_ (12-16 to 12-31) | О 041E | П 041F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ф 0424 | Х 0425 | Ц 0426 | Ч 0427 | Ш 0428 | Щ 0429 | Ъ 042A | Ы 042B | Ь 042C | Э 042D
4_/C_ (12-32 to 12-47) | Ю 042E | Я 042F | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
5_/D_ (12-48 to 12-63) | -- | а 0430 | б 0431 | в 0432 | г 0433 | д 0434 | е 0435 | ё 0451 | ж 0436 | з 0437 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D
6_/E_ (12-64 to 12-79) | о 043E | п 043F | р 0440 | с 0441 | т 0442 | у 0443 | ф 0444 | х 0445 | ц 0446 | ч 0447 | ш 0448 | щ 0449 | ъ 044A | ы 044B | ь 044C | э 044D
7_/F_ (12-80 to 12-94) | ю 044E | я 044F | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

Precomposed Hangul sets (rows number 16 through 40)

Code points for precomposed Hangul are included in a continuous sorted block between code points 16-01 and 40-94 inclusive. Not all possible syllable clusters are included in this range. Compare the different ordering and availability in KPS 9566.

Note that initial+vowel+final syllables 뢨, 썅, 쏀, 쓩, and 쭁 are included but their initial+vowel counterparts 뢔, 쌰, 쎼, 쓔, and 쬬 are not. This used to cause problems when inputting, because input methods have to go through an initial+vowel syllable first in order to input an initial+vowel+final syllable (e.g. ㅎ → 하 → 한).

Those which are not listed here may be represented using eight-byte composition sequences. All other modern-jamo clusters are assigned codes elsewhere by UHC. All possible modern-jamo clusters are assigned codes by Johab.

  • Row 16: 가 각 간 갇 갈 갉 갊 감 갑 값 갓 갔 강 갖 갗 같 갚 갛 개 객 갠 갤 갬 갭 갯 갰 갱 갸 갹 갼 걀 걋 걍 걔 걘 걜 거 걱 건 걷 걸 걺 검 겁 것 겄 겅 겆 겉 겊 겋 게 겐 겔 겜 겝 겟 겠 겡 겨 격 겪 견 겯 결 겸 겹 겻 겼 경 곁 계 곈 곌 곕 곗 고 곡 곤 곧 골 곪 곬 곯 곰 곱 곳 공 곶 과 곽 관 괄 괆
  • Row 17: 괌 괍 괏 광 괘 괜 괠 괩 괬 괭 괴 괵 괸 괼 굄 굅 굇 굉 교 굔 굘 굡 굣 구 국 군 굳 굴 굵 굶 굻 굼 굽 굿 궁 궂 궈 궉 권 궐 궜 궝 궤 궷 귀 귁 귄 귈 귐 귑 귓 규 균 귤 그 극 근 귿 글 긁 금 급 긋 긍 긔 기 긱 긴 긷 길 긺 김 깁 깃 깅 깆 깊 까 깍 깎 깐 깔 깖 깜 깝 깟 깠 깡 깥 깨 깩 깬 깰 깸
  • Row 18: 깹 깻 깼 깽 꺄 꺅 꺌 꺼 꺽 꺾 껀 껄 껌 껍 껏 껐 껑 께 껙 껜 껨 껫 껭 껴 껸 껼 꼇 꼈 꼍 꼐 꼬 꼭 꼰 꼲 꼴 꼼 꼽 꼿 꽁 꽂 꽃 꽈 꽉 꽐 꽜 꽝 꽤 꽥 꽹 꾀 꾄 꾈 꾐 꾑 꾕 꾜 꾸 꾹 꾼 꿀 꿇 꿈 꿉 꿋 꿍 꿎 꿔 꿜 꿨 꿩 꿰 꿱 꿴 꿸 뀀 뀁 뀄 뀌 뀐 뀔 뀜 뀝 뀨 끄 끅 끈 끊 끌 끎 끓 끔 끕 끗 끙
  • Row 19: 끝 끼 끽 낀 낄 낌 낍 낏 낑 나 낙 낚 난 낟 날 낡 낢 남 납 낫 났 낭 낮 낯 낱 낳 내 낵 낸 낼 냄 냅 냇 냈 냉 냐 냑 냔 냘 냠 냥 너 넉 넋 넌 널 넒 넓 넘 넙 넛 넜 넝 넣 네 넥 넨 넬 넴 넵 넷 넸 넹 녀 녁 년 녈 념 녑 녔 녕 녘 녜 녠 노 녹 논 놀 놂 놈 놉 놋 농 높 놓 놔 놘 놜 놨 뇌 뇐 뇔 뇜 뇝
  • Row 20: 뇟 뇨 뇩 뇬 뇰 뇹 뇻 뇽 누 눅 눈 눋 눌 눔 눕 눗 눙 눠 눴 눼 뉘 뉜 뉠 뉨 뉩 뉴 뉵 뉼 늄 늅 늉 느 늑 는 늘 늙 늚 늠 늡 늣 능 늦 늪 늬 늰 늴 니 닉 닌 닐 닒 님 닙 닛 닝 닢 다 닥 닦 단 닫 달 닭 닮 닯 닳 담 답 닷 닸 당 닺 닻 닿 대 댁 댄 댈 댐 댑 댓 댔 댕 댜 더 덕 덖 던 덛 덜 덞 덟 덤 덥
  • Row 21: 덧 덩 덫 덮 데 덱 덴 델 뎀 뎁 뎃 뎄 뎅 뎌 뎐 뎔 뎠 뎡 뎨 뎬 도 독 돈 돋 돌 돎 돐 돔 돕 돗 동 돛 돝 돠 돤 돨 돼 됐 되 된 될 됨 됩 됫 됴 두 둑 둔 둘 둠 둡 둣 둥 둬 뒀 뒈 뒝 뒤 뒨 뒬 뒵 뒷 뒹 듀 듄 듈 듐 듕 드 득 든 듣 들 듦 듬 듭 듯 등 듸 디 딕 딘 딛 딜 딤 딥 딧 딨 딩 딪 따 딱 딴 딸
  • Row 22: 땀 땁 땃 땄 땅 땋 때 땍 땐 땔 땜 땝 땟 땠 땡 떠 떡 떤 떨 떪 떫 떰 떱 떳 떴 떵 떻 떼 떽 뗀 뗄 뗌 뗍 뗏 뗐 뗑 뗘 뗬 또 똑 똔 똘 똥 똬 똴 뙈 뙤 뙨 뚜 뚝 뚠 뚤 뚫 뚬 뚱 뛔 뛰 뛴 뛸 뜀 뜁 뜅 뜨 뜩 뜬 뜯 뜰 뜸 뜹 뜻 띄 띈 띌 띔 띕 띠 띤 띨 띰 띱 띳 띵 라 락 란 랄 람 랍 랏 랐 랑 랒 랖 랗
  • Row 23: 래 랙 랜 랠 램 랩 랫 랬 랭 랴 략 랸 럇 량 러 럭 런 럴 럼 럽 럿 렀 렁 렇 레 렉 렌 렐 렘 렙 렛 렝 려 력 련 렬 렴 렵 렷 렸 령 례 롄 롑 롓 로 록 론 롤 롬 롭 롯 롱 롸 롼 뢍 뢨 뢰 뢴 뢸 룀 룁 룃 룅 료 룐 룔 룝 룟 룡 루 룩 룬 룰 룸 룹 룻 룽 뤄 뤘 뤠 뤼 뤽 륀 륄 륌 륏 륑 류 륙 륜 률 륨 륩
  • Row 24: 륫 륭 르 륵 른 를 름 릅 릇 릉 릊 릍 릎 리 릭 린 릴 림 립 릿 링 마 막 만 많 맏 말 맑 맒 맘 맙 맛 망 맞 맡 맣 매 맥 맨 맬 맴 맵 맷 맸 맹 맺 먀 먁 먈 먕 머 먹 먼 멀 멂 멈 멉 멋 멍 멎 멓 메 멕 멘 멜 멤 멥 멧 멨 멩 며 멱 면 멸 몃 몄 명 몇 몌 모 목 몫 몬 몰 몲 몸 몹 못 몽 뫄 뫈 뫘 뫙 뫼
  • Row 25: 묀 묄 묍 묏 묑 묘 묜 묠 묩 묫 무 묵 묶 문 묻 물 묽 묾 뭄 뭅 뭇 뭉 뭍 뭏 뭐 뭔 뭘 뭡 뭣 뭬 뮈 뮌 뮐 뮤 뮨 뮬 뮴 뮷 므 믄 믈 믐 믓 미 믹 민 믿 밀 밂 밈 밉 밋 밌 밍 및 밑 바 박 밖 밗 반 받 발 밝 밞 밟 밤 밥 밧 방 밭 배 백 밴 밸 뱀 뱁 뱃 뱄 뱅 뱉 뱌 뱍 뱐 뱝 버 벅 번 벋 벌 벎 범 법 벗
  • Row 26: 벙 벚 베 벡 벤 벧 벨 벰 벱 벳 벴 벵 벼 벽 변 별 볍 볏 볐 병 볕 볘 볜 보 복 볶 본 볼 봄 봅 봇 봉 봐 봔 봤 봬 뵀 뵈 뵉 뵌 뵐 뵘 뵙 뵤 뵨 부 북 분 붇 불 붉 붊 붐 붑 붓 붕 붙 붚 붜 붤 붰 붸 뷔 뷕 뷘 뷜 뷩 뷰 뷴 뷸 븀 븃 븅 브 븍 븐 블 븜 븝 븟 비 빅 빈 빌 빎 빔 빕 빗 빙 빚 빛 빠 빡 빤
  • Row 27: 빨 빪 빰 빱 빳 빴 빵 빻 빼 빽 뺀 뺄 뺌 뺍 뺏 뺐 뺑 뺘 뺙 뺨 뻐 뻑 뻔 뻗 뻘 뻠 뻣 뻤 뻥 뻬 뼁 뼈 뼉 뼘 뼙 뼛 뼜 뼝 뽀 뽁 뽄 뽈 뽐 뽑 뽕 뾔 뾰 뿅 뿌 뿍 뿐 뿔 뿜 뿟 뿡 쀼 쁑 쁘 쁜 쁠 쁨 쁩 삐 삑 삔 삘 삠 삡 삣 삥 사 삭 삯 산 삳 살 삵 삶 삼 삽 삿 샀 상 샅 새 색 샌 샐 샘 샙 샛 샜 생 샤
  • Row 28: 샥 샨 샬 샴 샵 샷 샹 섀 섄 섈 섐 섕 서 석 섞 섟 선 섣 설 섦 섧 섬 섭 섯 섰 성 섶 세 섹 센 셀 셈 셉 셋 셌 셍 셔 셕 션 셜 셤 셥 셧 셨 셩 셰 셴 셸 솅 소 속 솎 손 솔 솖 솜 솝 솟 송 솥 솨 솩 솬 솰 솽 쇄 쇈 쇌 쇔 쇗 쇘 쇠 쇤 쇨 쇰 쇱 쇳 쇼 쇽 숀 숄 숌 숍 숏 숑 수 숙 순 숟 술 숨 숩 숫 숭
  • Row 29: 숯 숱 숲 숴 쉈 쉐 쉑 쉔 쉘 쉠 쉥 쉬 쉭 쉰 쉴 쉼 쉽 쉿 슁 슈 슉 슐 슘 슛 슝 스 슥 슨 슬 슭 슴 습 슷 승 시 식 신 싣 실 싫 심 십 싯 싱 싶 싸 싹 싻 싼 쌀 쌈 쌉 쌌 쌍 쌓 쌔 쌕 쌘 쌜 쌤 쌥 쌨 쌩 썅 써 썩 썬 썰 썲 썸 썹 썼 썽 쎄 쎈 쎌 쏀 쏘 쏙 쏜 쏟 쏠 쏢 쏨 쏩 쏭 쏴 쏵 쏸 쐈 쐐 쐤 쐬 쐰
  • Row 30: 쐴 쐼 쐽 쑈 쑤 쑥 쑨 쑬 쑴 쑵 쑹 쒀 쒔 쒜 쒸 쒼 쓩 쓰 쓱 쓴 쓸 쓺 쓿 씀 씁 씌 씐 씔 씜 씨 씩 씬 씰 씸 씹 씻 씽 아 악 안 앉 않 알 앍 앎 앓 암 압 앗 았 앙 앝 앞 애 액 앤 앨 앰 앱 앳 앴 앵 야 약 얀 얄 얇 얌 얍 얏 양 얕 얗 얘 얜 얠 얩 어 억 언 얹 얻 얼 얽 얾 엄 업 없 엇 었 엉 엊 엌 엎
  • Row 31: 에 엑 엔 엘 엠 엡 엣 엥 여 역 엮 연 열 엶 엷 염 엽 엾 엿 였 영 옅 옆 옇 예 옌 옐 옘 옙 옛 옜 오 옥 온 올 옭 옮 옰 옳 옴 옵 옷 옹 옻 와 왁 완 왈 왐 왑 왓 왔 왕 왜 왝 왠 왬 왯 왱 외 왹 왼 욀 욈 욉 욋 욍 요 욕 욘 욜 욤 욥 욧 용 우 욱 운 울 욹 욺 움 웁 웃 웅 워 웍 원 월 웜 웝 웠 웡 웨
  • Row 32: 웩 웬 웰 웸 웹 웽 위 윅 윈 윌 윔 윕 윗 윙 유 육 윤 율 윰 윱 윳 융 윷 으 윽 은 을 읊 음 읍 읏 응 읒 읓 읔 읕 읖 읗 의 읜 읠 읨 읫 이 익 인 일 읽 읾 잃 임 입 잇 있 잉 잊 잎 자 작 잔 잖 잗 잘 잚 잠 잡 잣 잤 장 잦 재 잭 잰 잴 잼 잽 잿 쟀 쟁 쟈 쟉 쟌 쟎 쟐 쟘 쟝 쟤 쟨 쟬 저 적 전 절 젊
  • Row 33: 점 접 젓 정 젖 제 젝 젠 젤 젬 젭 젯 젱 져 젼 졀 졈 졉 졌 졍 졔 조 족 존 졸 졺 좀 좁 좃 종 좆 좇 좋 좌 좍 좔 좝 좟 좡 좨 좼 좽 죄 죈 죌 죔 죕 죗 죙 죠 죡 죤 죵 주 죽 준 줄 줅 줆 줌 줍 줏 중 줘 줬 줴 쥐 쥑 쥔 쥘 쥠 쥡 쥣 쥬 쥰 쥴 쥼 즈 즉 즌 즐 즘 즙 즛 증 지 직 진 짇 질 짊 짐 집 짓
  • Row 34: 징 짖 짙 짚 짜 짝 짠 짢 짤 짧 짬 짭 짯 짰 짱 째 짹 짼 쨀 쨈 쨉 쨋 쨌 쨍 쨔 쨘 쨩 쩌 쩍 쩐 쩔 쩜 쩝 쩟 쩠 쩡 쩨 쩽 쪄 쪘 쪼 쪽 쫀 쫄 쫌 쫍 쫏 쫑 쫓 쫘 쫙 쫠 쫬 쫴 쬈 쬐 쬔 쬘 쬠 쬡 쭁 쭈 쭉 쭌 쭐 쭘 쭙 쭝 쭤 쭸 쭹 쮜 쮸 쯔 쯤 쯧 쯩 찌 찍 찐 찔 찜 찝 찡 찢 찧 차 착 찬 찮 찰 참 찹 찻
  • Row 35: 찼 창 찾 채 책 챈 챌 챔 챕 챗 챘 챙 챠 챤 챦 챨 챰 챵 처 척 천 철 첨 첩 첫 첬 청 체 첵 첸 첼 쳄 쳅 쳇 쳉 쳐 쳔 쳤 쳬 쳰 촁 초 촉 촌 촐 촘 촙 촛 총 촤 촨 촬 촹 최 쵠 쵤 쵬 쵭 쵯 쵱 쵸 춈 추 축 춘 출 춤 춥 춧 충 춰 췄 췌 췐 취 췬 췰 췸 췹 췻 췽 츄 츈 츌 츔 츙 츠 측 츤 츨 츰 츱 츳 층
  • Row 36: 치 칙 친 칟 칠 칡 침 칩 칫 칭 카 칵 칸 칼 캄 캅 캇 캉 캐 캑 캔 캘 캠 캡 캣 캤 캥 캬 캭 컁 커 컥 컨 컫 컬 컴 컵 컷 컸 컹 케 켁 켄 켈 켐 켑 켓 켕 켜 켠 켤 켬 켭 켯 켰 켱 켸 코 콕 콘 콜 콤 콥 콧 콩 콰 콱 콴 콸 쾀 쾅 쾌 쾡 쾨 쾰 쿄 쿠 쿡 쿤 쿨 쿰 쿱 쿳 쿵 쿼 퀀 퀄 퀑 퀘 퀭 퀴 퀵 퀸 퀼
  • Row 37: 큄 큅 큇 큉 큐 큔 큘 큠 크 큭 큰 클 큼 큽 킁 키 킥 킨 킬 킴 킵 킷 킹 타 탁 탄 탈 탉 탐 탑 탓 탔 탕 태 택 탠 탤 탬 탭 탯 탰 탱 탸 턍 터 턱 턴 털 턺 텀 텁 텃 텄 텅 테 텍 텐 텔 템 텝 텟 텡 텨 텬 텼 톄 톈 토 톡 톤 톨 톰 톱 톳 통 톺 톼 퇀 퇘 퇴 퇸 툇 툉 툐 투 툭 툰 툴 툼 툽 툿 퉁 퉈 퉜
  • Row 38: 퉤 튀 튁 튄 튈 튐 튑 튕 튜 튠 튤 튬 튱 트 특 튼 튿 틀 틂 틈 틉 틋 틔 틘 틜 틤 틥 티 틱 틴 틸 팀 팁 팃 팅 파 팍 팎 판 팔 팖 팜 팝 팟 팠 팡 팥 패 팩 팬 팰 팸 팹 팻 팼 팽 퍄 퍅 퍼 퍽 펀 펄 펌 펍 펏 펐 펑 페 펙 펜 펠 펨 펩 펫 펭 펴 편 펼 폄 폅 폈 평 폐 폘 폡 폣 포 폭 폰 폴 폼 폽 폿 퐁
  • Row 39: 퐈 퐝 푀 푄 표 푠 푤 푭 푯 푸 푹 푼 푿 풀 풂 품 풉 풋 풍 풔 풩 퓌 퓐 퓔 퓜 퓟 퓨 퓬 퓰 퓸 퓻 퓽 프 픈 플 픔 픕 픗 피 픽 핀 필 핌 핍 핏 핑 하 학 한 할 핥 함 합 핫 항 해 핵 핸 핼 햄 햅 햇 했 행 햐 향 허 헉 헌 헐 헒 험 헙 헛 헝 헤 헥 헨 헬 헴 헵 헷 헹 혀 혁 현 혈 혐 협 혓 혔 형 혜 혠
  • Row 40: 혤 혭 호 혹 혼 홀 홅 홈 홉 홋 홍 홑 화 확 환 활 홧 황 홰 홱 홴 횃 횅 회 획 횐 횔 횝 횟 횡 효 횬 횰 횹 횻 후 훅 훈 훌 훑 훔 훗 훙 훠 훤 훨 훰 훵 훼 훽 휀 휄 휑 휘 휙 휜 휠 휨 휩 휫 휭 휴 휵 휸 휼 흄 흇 흉 흐 흑 흔 흖 흗 흘 흙 흠 흡 흣 흥 흩 희 흰 흴 흼 흽 힁 히 힉 힌 힐 힘 힙 힛 힝

Hanja sets

Johab encoding

Diagram of Johab encoding layout

KS X 1001, since 1992, also defines an alternative encoding known as Johab. This represents a hangul syllable as a sequence of three five-bit values (initial, vowel and final), packed into two 8-bit bytes, most significant bit first. The one remaining bit, the most significant bit of the lead byte, is always set (allowing combination with single-byte ASCII or KS X 1003). This encoding is also used for the modern jamo from row 4 of KS X 1001, by using the filler values for the other components. The Johab encoding for hangul is shown in the table below.[19]
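The bit layout described above can be illustrated with a minimal Python sketch. The function names `pack_johab` and `unpack_johab` are hypothetical and not part of any standard or library. The sketch only handles the packing of three five-bit values behind a set top bit; it does not check whether a given value is actually assigned to a jamo.

```python
def pack_johab(initial: int, vowel: int, final: int) -> bytes:
    """Pack three five-bit component values into a two-byte Johab code."""
    for value in (initial, vowel, final):
        if not 0 <= value <= 0b11111:
            raise ValueError("each component must fit in five bits")
    # Bit layout, most significant bit first: 1 iiiii vvvvv fffff
    word = (1 << 15) | (initial << 10) | (vowel << 5) | final
    return word.to_bytes(2, "big")


def unpack_johab(pair: bytes) -> tuple[int, int, int]:
    """Split a two-byte Johab hangul code into (initial, vowel, final)."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    word = int.from_bytes(pair, "big")
    if not word & 0x8000:
        raise ValueError("lead byte must have its most significant bit set")
    return (word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F
```

For example, assuming the component values 2 (initial ㄱ), 3 (vowel ㅏ) and 1 (empty final) for the syllable 가, `pack_johab(2, 3, 1)` gives the bytes 0x88 0x61, and `unpack_johab(b"\x88\x61")` returns `(2, 3, 1)`.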

Johab encodes the remainder of KS X 1001 using lead bytes which do not correspond to an initial jamo (0xE0–0xF9 for hanja and 0xD9–0xDE[20] for non-hanja, excluding hangul syllables and modern jamo), with trail bytes in the ranges 0x31–0x7E and 0x91–0xFE.[19] These codes are algorithmically mapped from the characters' KS X 1001 code points,[20] with two KS X 1001 rows per lead byte (compare and contrast Shift JIS).
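As a rough sketch of this mapping, the following Python function converts a non-hangul Johab byte pair to a KS X 1001 row and cell. It assumes a plain linear layout: the two trail-byte ranges together form 188 consecutive positions (two rows of 94), lead bytes 0xD9–0xDE start at row 1, and lead bytes 0xE0–0xF9 start at row 42. These row bases are inferred from the ranges quoted above rather than taken from the standard, and the name `johab_to_ksx1001` is hypothetical. The sketch does not reject code points that Johab leaves unassigned or encodes another way (such as the modern jamo in row 4).

```python
def johab_to_ksx1001(lead: int, trail: int) -> tuple[int, int]:
    """Map a non-hangul Johab byte pair to a 1-based KS X 1001 (row, cell)."""
    if 0xD9 <= lead <= 0xDE:
        first_row = 2 * (lead - 0xD9) + 1    # assumed base: rows 1-12 (non-hanja)
    elif 0xE0 <= lead <= 0xF9:
        first_row = 2 * (lead - 0xE0) + 42   # assumed base: rows 42-93 (hanja)
    else:
        raise ValueError("not a non-hangul Johab lead byte")

    if 0x31 <= trail <= 0x7E:
        index = trail - 0x31                 # positions 0-77
    elif 0x91 <= trail <= 0xFE:
        index = trail - 0x91 + 78            # positions 78-187
    else:
        raise ValueError("not a valid Johab trail byte")

    # Each lead byte covers two consecutive rows of 94 cells.
    return first_row + index // 94, index % 94 + 1
```

Under these assumptions, `johab_to_ksx1001(0xD9, 0x31)` returns `(1, 1)` and `johab_to_ksx1001(0xE0, 0x31)` returns `(42, 1)`. Adding 0xA0 to both numbers gives the corresponding EUC-KR byte pair.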

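| Five-bit sequence | As initial | As vowel | As final |
|---|---|---|---|
| 00000 | Not used | Not used[lower-alpha 2] | Not used |
| 00001 | Filler | Not used[lower-alpha 3] | Filler (empty final) |
| 00010 | ㄱ | Filler | ㄱ |
| 00011 | ㄲ | ㅏ | ㄲ |
| 00100 | ㄴ | ㅐ | ㄳ |
| 00101 | ㄷ | ㅑ | ㄴ |
| 00110 | ㄸ | ㅒ | ㄵ |
| 00111 | ㄹ | ㅓ | ㄶ |
| 01000 | ㅁ | Not used[lower-alpha 2] | ㄷ |
| 01001 | ㅂ | Not used[lower-alpha 3] | ㄹ |
| 01010 | ㅃ | ㅔ | ㄺ |
| 01011 | ㅅ | ㅕ | ㄻ |
| 01100 | ㅆ | ㅖ | ㄼ |
| 01101 | ㅇ | ㅗ | ㄽ |
| 01110 | ㅈ | ㅘ | ㄾ |
| 01111 | ㅉ | ㅙ | ㄿ |
| 10000 | ㅊ | Not used[lower-alpha 2] | ㅀ |
| 10001 | ㅋ | Not used[lower-alpha 3] | ㅁ |
| 10010 | ㅌ | ㅚ | Not used |
| 10011 | ㅍ | ㅛ | ㅂ |
| 10100 | ㅎ | ㅜ | ㅄ |
| 10101 | Not used | ㅝ | ㅅ |
| 10110 | Non-Hangul lead bytes | ㅞ | ㅆ |
| 10111 | Non-Hangul lead bytes | ㅟ | ㅇ |
| 11000 | Non-Hangul lead bytes | Not used[lower-alpha 2] | ㅈ |
| 11001 | Non-Hangul lead bytes | Not used[lower-alpha 3] | ㅊ |
| 11010 | Non-Hangul lead bytes | ㅠ | ㅋ |
| 11011 | Non-Hangul lead bytes | ㅡ | ㅌ |
| 11100 | Non-Hangul lead bytes | ㅢ | ㅍ |
| 11101 | Non-Hangul lead bytes | ㅣ | ㅎ |
| 11110 | Non-Hangul lead bytes | Not used | Not used |
| 11111 | Not used | Not used | Not used |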

Footnotes

  1. Korean: 정보 교환용 부호계 (한글 및 한자), romanized: Jeongbo Gyohwan'yong Buhogye (Hangeul mich Hanja)
  2. Were this one used, it would result in a trail byte in the C0 control codes range.
  3. Were this one used, it would result in trail bytes in the 0x2_ and 0x3_ rows of ASCII. Johab does not use the 0x2_ row for trail bytes, similarly to most common legacy CJK encodings (compare Shift JIS, GBK, Big5).

References

  1. Lunde, Ken (2009). "Chapter 3: Character Set Standards". CJKV Information Processing. pp. 143–148. ISBN 978-0596514471.
  2. Hwang, Jinsang (2005). The Social Shaping of ICTs Standards: A Case of National Coded Character Set Standards Controversy in Korea (PDF). University of Edinburgh.
  3. Lunde, Ken (1995-12-18). "2.4.6: Obsolete Standards". CJK.INF Version 1.9.
  4. Shin, Jungshik. "What are KS X 1001(KS C 5601) and other Hangul codes?". Hangul & Internet in Korea FAQ.
  5. Lunde, Ken (1995-12-18). "3.3.6: N-byte Hangul". CJK.INF Version 1.9.
  6. "INFO: Hangul (Korean) Character Sets", Microsoft Support, Microsoft
  7. Zsigri, Gyula (2002-06-18). "KSC and UHC".
  8. Chang, Hye-Shik. "cpython/Modules/cjkcodecs/_codecs_kr.c (revision d3faf43)". CPython source tree. Python Software Foundation.
  9. Chung, Jaemin (2017-03-30). Proposal to add an informative note to U+3164 HANGUL FILLER (PDF). Unicode Consortium. UTC L2/17-081.
  10. "Code Page 01040" (PDF). IBM. Archived from the original (PDF) on 2015-07-08.
  11. "KSRI-87-37-IR: 항을・한자 코드 표준화에 관한 예연구: A Study on Standardization of Hangul and Hanja Codes" (PDF) (in Korean). Ministry of Science and Technology. 1987. p. 68. Archived from the original (PDF) on 2019-03-01.
  12. "ibm-1363_P110-1997 (lead byte A1)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  13. "euc-kr (lead byte A1)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  14. "Map (external version) from Mac OS Korean encoding to Unicode 3.2 and later". Apple.
  15. "windows-949-2000 (lead byte A1)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  16. "Lead Byte A1-A2 (Code page 949)". MSDN. Microsoft.
  17. Korea Bureau of Standards (1988-10-01). Korean Graphic Character Set for Information Interchange (PDF). ITSCJ/IPSJ. ISO-IR-149.
  18. Lunde, Ken (2009). "Seemingly Missing Characters". CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. p. 180. ISBN 978-0-596-51447-1.
  19. Lunde, Ken (2008). "Chapter 4: Encoding Methods (§ Johab Encoding—KS X 1001:2004)". CJKV Information Processing (2nd ed.). Sebastopol, California: O'Reilly Media. pp. 268–273. ISBN 978-0-596-51447-1.
  20. Shin, Jungshik (2011-10-14) [1999-08-16]. Johab to Unicode table. Unicode Consortium.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.