I am interested to know whether the following characters which are part of ETen and the Microsoft Big5 implementation are so commonplace nowadays in Traditional Chinese "Big5" locales that they might be considered to be 'de-facto' part of the common Big5 standards.
0xF9D6, 0xF9D7, 0xF9D8, 0xF9DA, 0xF9DB, 0xF9DC
The Big5 mapping on the unicode.org site and the list of source characters from the original (1984?) Big5 de-facto standard mapping table (in Unihan.txt for Unicode 3.1) don't include these characters.
In reality are these characters now so widely implemented in Traditional Chinese/Taiwanese Big5 locales on all major information processing platforms (e.g. windows/unix/mainframe) that it makes sense to fold them back into the common Big5 character set and to view them as de-facto rather than as characters which are particular to one or more of the Big5 extensions?
--Ian.
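One way to see how a given implementation treats these code points is to try decoding them. The following is a minimal sketch, assuming Python's bundled `big5` and `cp950` codecs stand in for "plain Big5" and the Microsoft variant; the helper names `try_decode` and `compare` are made up for illustration, and the results only describe that one implementation, not Big5 usage in general.

```python
# Byte pairs exactly as listed in the message above.
CODES = [0xF9D6, 0xF9D7, 0xF9D8, 0xF9DA, 0xF9DB, 0xF9DC]


def try_decode(code, codec):
    """Return the decoded character, or None if the codec rejects the bytes."""
    raw = code.to_bytes(2, "big")  # lead byte first, then trail byte
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def compare(codes=CODES, codecs=("big5", "cp950")):
    """Map each code to {codec name: decoded character or None}."""
    return {code: {c: try_decode(code, c) for c in codecs} for code in codes}


if __name__ == "__main__":
    for code, row in compare().items():
        # Show Unicode code points rather than glyphs, so the output does
        # not depend on the terminal's own encoding.
        shown = {c: (f"U+{ord(ch):04X}" if ch else "unmapped")
                 for c, ch in row.items()}
        print(f"0x{code:04X}", shown)
```

A code that comes back as "unmapped" under one codec but not the other is one that this particular implementation treats as an extension rather than as part of the base table.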
To unsubscribe from this group, send an email to: i18n-chinese-unsubscribe@egroups.com
URL to this group: http://www.egroups.com/group/i18n-chinese
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Kaixo!
On Wed, Aug 22, 2001 at 11:38:12AM -0000, ianlittle@yahoo.com wrote:
I am interested to know whether the following characters which are part of ETen and the Microsoft Big5 implementation are so commonplace nowadays in Traditional Chinese "Big5" locales that they might be considered to be 'de-facto' part of the common Big5 standards.
0xF9D6, 0xF9D7, 0xF9D8, 0xF9DA, 0xF9DB, 0xF9DC
That is the case on XFree86: *-big5-0 and *-big5.eten-0 are aliases. (Some fonts may not have them; with TTF fonts there shouldn't be that problem, and if a bitmap font is missing them, it should be reported as a bug, imho.)
In reality are these characters now so widely implemented in Traditional Chinese/Taiwanese Big5 locales on all major information processing platforms (e.g. windows/unix/mainframe) that it makes sense to fold them back into the common Big5 character set and to view them as de-facto rather than as characters which are particular to one or more of the Big5 extensions?
As Big5 is in fact a Windows encoding (cp950), I think it is wise to follow whatever the current state is on the Windows platform. Just as ISO is the authoritative source for ISO standards, Microsoft should be the authoritative source for Microsoft charsets.
--- In i18n-chinese@y..., Pablo Saratxaga <pablo@m...> wrote:
Kaixo!
On Wed, Aug 22, 2001 at 11:38:12AM -0000, ianlittle@y... wrote:
I am interested to know whether the following characters which are part of ETen and the Microsoft Big5 implementation are so commonplace nowadays in Traditional Chinese "Big5" locales that they might be considered to be 'de-facto' part of the common Big5 standards.
0xF9D6, 0xF9D7, 0xF9D8, 0xF9DA, 0xF9DB, 0xF9DC
That is the case on XFree86, *-big5-0 and *-big5.eten-0 are aliases (now, some fonts may not have them, when using TTF ones there shouldn't be that problem; and if a bitmap font is missing them, it should be reported as a bug imho)
In reality are these characters now so widely implemented in Traditional Chinese/Taiwanese Big5 locales on all major information processing platforms (e.g. windows/unix/mainframe) that it makes sense to fold them back into the common Big5 character set and to view them as de-facto rather than as characters which are particular to one or more of the Big5 extensions?
As Big5 is in fact a Windows encoding (cp950) I think it is wise to follow whatever is the current state in Windows platform, same as for iso standards the authoritative source is the ISO, for Microsoft charsets the authoritative source should be Microsoft.
Many thanks for the quick response!
Just to clarify - by Big5 here I mean the de-facto standard, for which I guess there is no authoritative source other than some kind of periodic re-appraisal of which sets of characters are minimally expected to be part of any Big5 implementation. It seems that originally these 7 characters weren't listed as part of the Big5 repertoire, and that over time, as vendor implementations and extensions became available, they became widespread enough that it might be reasonable to assume that they have been de-facto absorbed into the common Big5 repertoire. Also, do you consider the graphical (box/line) characters in the range 0xF9DD-0xF9DE inclusive to be part of the common Big5 repertoire as well? Just to underscore - by common here I mean common across all major platforms, as opposed to common within the most prevalent client platform (i.e. win32).
--Ian.
Kaixo!
Disclaimer: I don't speak Chinese, nor do I live in a country that uses it, so my comments should be taken with care.
As Big5 is in fact a Windows encoding (cp950) I think it is wise to follow whatever is the current state in Windows platform, same as for iso standards the authoritative source is the ISO, for Microsoft charsets the authoritative source should be Microsoft.
Many thanks for the quick response!
Just to clarify - by Big5 here I mean to refer to the de-facto standard for which I guess there is no authoritative source other than some kind of periodic re-appraisal of which sets of characters minimally are expected to be part of any Big5
But am I right in believing that Microsoft is the driving force behind making any new character included in Big5 a de-facto standard?
Also, do you consider the graphical (box/line) characters in the range 0xF9DD-0xF9DE inclusive also to be part of the common Big5 repertoire.
This is a different thing. Those are only useful in a console/xterm, and even if they are missing it doesn't harm reading, as they are not letters, nor are they punctuation marks or anything like that.
The same holds for the koi8-r and koi8-u Cyrillic encodings: their complete character repertoire includes those drawing chars, yet a lot of koi8-{r,u} fonts and implementations of the koi8 encodings just ignore them. It is just proof that those encodings (as well as big5) originated in the DOS world. Anyway, in the POSIX world hardcoding drawing chars from a given character set is not portable; it should be done through upper-layer text toolkits like ncurses, newt, etc., or at worst through vt100 sequences or things like that.
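To illustrate the toolkit route, here is a rough sketch using Python's `curses` binding to ncurses. The function name `draw_frame` and the sample text are made up for the example; the point is only that the program asks the toolkit for a border instead of emitting encoding-specific box-drawing bytes itself.

```python
import curses


def draw_frame(win, text="hello"):
    """Draw a border with whatever line-drawing glyphs the terminal offers."""
    win.border()            # default arguments: curses picks the line glyphs
    win.addstr(1, 2, text)  # row 1, column 2: just inside the border
    win.refresh()


# Typical use from a real terminal (not run here):
#   curses.wrapper(lambda stdscr: (draw_frame(stdscr), stdscr.getch()))
```

Nothing in this program depends on whether the active font or charset carries the drawing characters at particular code points; that mapping is left to the terminal description.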
Just to underscore - by common here I mean to suggest common across all major platforms as opposed to common within the most prevalent client platform (ie. win32)
I think it isn't very important for those drawing chars: they carry no meaning, so even if you display a text without them, the meaning of the text won't change.
That being said, they are listed in the file /usr/X11R6/lib/X11/fonts/encodings/large/big5.eten-0.enc.gz (which gives the encoding for *-big5.eten-0 and *-big5-0 fonts built on the fly out of Unicode-encoded TTF fonts). So, yes, they should be common, but some fonts may lack them if whoever drew the font felt it wasn't worth the time to draw those drawing chars (well, for a font with several thousand glyphs that is a rather strange thing to say; but for the koi8 ones, for example, it is quite common that the drawing chars have not been done).
In short: I think yes, they should be included in the big majority of 'big5' fonts out there; but even if not, it is much less of a problem than the lack of a hanzi.