Friday, May 30, 2008

Unicode problem of wave dash and fullwidth tilde

Test platform: Windows XP x64 Edition
File Generation: using Python 2.5
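# wave_dash and fullwidth_tilde are defined as in test 5 below;
# output is the target file name and codec is the encoding name.
out = open(output, 'wb')
string = u'w%sf%s' % (wave_dash, fullwidth_tilde)
out.write(string.encode(codec))
out.close()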

Hex dump of the file in UTF-16 encoding (little-endian, including the BOM):
FF FE 77 00 1C 30 66 00 5E FF
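
The dump above can be reproduced with a short sketch. It is written for Python 3 rather than the Python 2.5 used in this post, and the helper names `utf16_le_with_bom` and `hex_dump` are made up for illustration. The little-endian BOM is written out explicitly so the result does not depend on the byte order of the machine.

```python
import codecs

WAVE_DASH = "\u301c"        # U+301C WAVE DASH
FULLWIDTH_TILDE = "\uff5e"  # U+FF5E FULLWIDTH TILDE


def utf16_le_with_bom(text):
    """Encode text as little-endian UTF-16, preceded by the FF FE BOM."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def hex_dump(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join("%02X" % b for b in data)


if __name__ == "__main__":
    sample = "w%sf%s" % (WAVE_DASH, FULLWIDTH_TILDE)
    print(hex_dump(utf16_le_with_bom(sample)))
```

In the output, 1C 30 is the wave dash (U+301C) and 5E FF is the fullwidth tilde (U+FF5E), each stored low byte first.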

Test Software
1. EmEditor Professional x64 Edition 6.00.4
Save the file under a different encoding.
Shift_JIS: a warning appears saying that 1C 30 (U+301C) cannot be converted correctly. 5E FF (U+FF5E) is converted to 81 60, the Shift_JIS code for the wave dash.
EUC-JP: a warning appears saying that 1C 30 (U+301C) cannot be converted correctly. 5E FF (U+FF5E) is converted to A1 C1, the EUC-JP code for the wave dash.

2. MadEdit v0.2.8 Beta
Tools -> Convert File Encoding, starting from the UTF-16 file.
Convert to SHIFT-JIS: 5E FF (U+FF5E) is converted to 81 60. 1C 30 cannot be converted, and MadEdit displays it as U+301C.

3. gVim 7.1 (on Windows)
Open the UTF-8 encoded file.
:set fenc=sjis
:w
A warning appears: the wave dash (U+301C, shown as 1C 30 in the UTF-16 dump) cannot be converted.

4. VIM 7.1.39 on FreeBSD 6.2
Open the UTF-8 encoded file.
:set fenc=sjis
:w
A warning appears: the fullwidth tilde (U+FF5E, shown as 5E FF in the UTF-16 dump) cannot be converted.


5. Python 2.5
fullwidth_tilde = unichr(0xFF5E)
wave_dash = unichr(0x301C)

wave_dash.encode('sjis') # OK
fullwidth_tilde.encode('sjis') # OK
wave_dash.encode('euc-jp') # OK
fullwidth_tilde.encode('euc-jp') # error
wave_dash.encode('big5') # error
fullwidth_tilde.encode('big5') # error
wave_dash.encode('gbk') # error
fullwidth_tilde.encode('gbk') # OK

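Conclusion: On Windows, software that looks characters up through the system's code page tables maps U+FF5E to the Shift_JIS wave dash. Software with its own built-in tables, such as Python, maps U+301C to the wave dash, and so does Vim on Linux and FreeBSD (the platforms tested so far). Note that on non-Windows systems neither U+301C nor U+FF5E can be converted to Big5, while on Windows the Big5 code of course maps to U+FF5E.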
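
The two mappings can be compared from Python itself. This sketch assumes that Python's `cp932` codec follows the Windows code page table and that `shift_jis` is Python's own built-in table; it is written for Python 3, and `describe` is a hypothetical helper.

```python
import unicodedata

SJIS_WAVE_DASH = b"\x81\x60"  # Shift_JIS bytes for the wave dash, as seen in the tests


def describe(codec):
    """Decode the Shift_JIS wave dash bytes with codec and name the result."""
    char = SJIS_WAVE_DASH.decode(codec)
    return "U+%04X %s" % (ord(char), unicodedata.name(char))


if __name__ == "__main__":
    for codec in ("shift_jis", "cp932"):
        print(codec, "->", describe(codec))
```

The same two bytes decode to U+301C with `shift_jis` and to U+FF5E with `cp932`, which is why a file round-tripped through both kinds of software can come back with a different character.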
