Kaixo!
On Mon, Oct 08, 2001 at 03:07:25PM -0000, stephen.holmes@eircom.net wrote:
Hi again,
I have a bunch of Big5 PO files. I want to compile them to MO, but I am getting an error when msgfmt attempts to compile some of the strings. Converting to EUC-TW etc. works just fine; the error only occurs with Big5. Now, these files do compile on Solaris 8 as Big5, so I suspect that it's related to the \ that is the trail byte of some double-byte sequences, in some of the GNU packages.
Yes, the problem is the "\" used by big5 (big5 is a very bad encoding for programmers). Previous versions of GNU gettext didn't have any notion of charset; they handled only sequences of bytes, and "\" had a special meaning, so it had to be escaped (written as "\\") when it was part of a multibyte big5 char.
The result was that po files were not in big5, but in a bizarre encoding similar to big5 but incompatible with it...
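To make the problem concrete, here is a minimal Python sketch (not gettext code). It assumes Python's built-in "big5" codec; the function names are made up for illustration. It finds the characters whose big5 trail byte is 0x5C, the ASCII backslash, and shows what byte-wise escaping does to them.

```python
# Sketch only: uses Python's "big5" codec, not gettext's own tables.
BACKSLASH = 0x5C  # ASCII "\", also a legal big5 trail byte


def backslash_trail_chars(text):
    """Return the characters of `text` whose big5 encoding ends in 0x5C."""
    hits = []
    for ch in text:
        encoded = ch.encode("big5")
        if len(encoded) == 2 and encoded[1] == BACKSLASH:
            hits.append(ch)
    return hits


def old_style_escape(raw):
    """Double every 0x5C byte, as a charset-unaware tool would require.

    This also doubles trail bytes, so the result is no longer the
    original big5 text.
    """
    return raw.replace(b"\\", b"\\\\")


raw = "功".encode("big5")          # two bytes: 0xA5 0x5C
escaped = old_style_escape(raw)    # three bytes: 0xA5 0x5C 0x5C
print(backslash_trail_chars("中功"))
print(escaped.decode("big5"))      # the char followed by a stray backslash
```

Decoding the escaped bytes as real big5 gives the character followed by an extra backslash, which is why such files look wrong in a big5-aware editor.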
Now, newer versions of gettext know about charsets, and they can see *chars* instead of bytes; and they can detect invalid byte sequences for a given charset. Those new versions require that real big5 be used.
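As a rough illustration of what "detecting invalid byte sequences" means, the sketch below does a strict decode with Python's codecs. It is only an approximation: gettext does its own checking, and Python's big5 table may not match it exactly. The helper name is hypothetical.

```python
def is_valid_in_charset(raw, charset="big5"):
    """Return True if `raw` decodes cleanly in `charset` (strict mode)."""
    try:
        raw.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_in_charset("中文".encode("big5")))  # complete characters
print(is_valid_in_charset(b"\xa4"))                # lead byte with no trail byte
```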
What is the recommended version?
The new versions of GNU gettext are better, as they allow using real big5 encoding; that is, the po files can be read and edited with any text editor, which is a big plus.
PS: note that the produced *.mo files are the same in both cases, so you can keep a copy of the old msgfmt and convert the files with:
    msgfmt.old -o tmpfile foo.po
    msgunfmt -o foo.po tmpfile
Note also that the encoding used for the po files is independent of the encoding used to display the text to the user (gettext does the conversion if needed), so you can use utf-8 for the po files if you want. It is also advised not to put the encoding name in the file name; that is, the po files should be named zh_TW.po and zh_CN.po and not zh_TW.Big5.po or zh_CN.GB2312.po. The use of "zh" alone should be avoided too, as it is completely ambiguous: it doesn't tell you whether the text is Traditional or Simplified Chinese.
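If you do want utf-8 po files, the conversion amounts to re-encoding the text and updating the charset declared in the header. Below is a minimal Python sketch. It assumes the input is already real big5 (not the old escaped variant) and has the usual "Content-Type: text/plain; charset=..." header line. The function name is made up, and recent gettext releases ship a msgconv tool that is the more natural choice when available.

```python
import re

# Matches the charset value in the PO header's Content-Type line.
# In the file, the value is followed by a literal backslash-n and a quote.
_CHARSET_RE = re.compile(r'(Content-Type:\s*text/plain;\s*charset=)[^\s\\"]+')


def po_bytes_to_utf8(raw, source_charset="big5"):
    """Re-encode PO file contents to UTF-8 and fix the header charset."""
    text = raw.decode(source_charset)  # raises on invalid byte sequences
    text = _CHARSET_RE.sub(r"\g<1>UTF-8", text, count=1)
    return text.encode("utf-8")


po_big5 = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=Big5\\n"\n'
    "\n"
    'msgid "OK"\n'
    'msgstr "中文"\n'
).encode("big5")

print(po_bytes_to_utf8(po_big5).decode("utf-8"))
```

The file name does not need to change: zh_TW.po stays zh_TW.po whatever encoding it is stored in.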