About Language.

Ag. D. Hatzimanikas a.hatzim at gmail.com
Thu May 31 23:39:41 PDT 2007


On Thu, May 31, at 01:25 Matthias Feichtinger wrote:
> 
> Thinking about UTF-16 could be fine too, couldn't it?

You can do it even today if you like; glibc supports it through its
conversion functions, although Ulrich Drepper has stated that native
support for UTF-16 in glibc (and, I would add, in pango/gtk+
applications) will never happen.

But is there any reason why this switch would ever happen? I am under
the impression that UTF-16 carries many of the drawbacks of UTF-8 and
UTF-32 and few of their advantages.

But it is also true that for text-intensive applications, like database
software with a high memory load that needs single code unit access to
characters [1], UTF-16 is much more efficient than UTF-8 and still uses
about half the space of UTF-32 for most text.
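
To put rough numbers on that trade-off, here is a minimal sketch in Python
that counts the bytes a few sample strings occupy in each encoding form.
The helper names (encoded_sizes, utf16_code_units) and the sample strings
are my own illustrations, not anything taken from TR17, and the little-endian
codecs are used only so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding form (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf16_code_units(text):
    """Return how many 16-bit code units `text` needs in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# Hypothetical sample strings, written with escapes to avoid encoding surprises.
samples = {
    "ascii": "hello",
    "greek": "\u03b3\u03bb\u03ce\u03c3\u03c3\u03b1",
    "cjk": "\u65e5\u672c\u8a9e",
    "supplementary": "\U0001D11E",  # MUSICAL SYMBOL G CLEF, outside the BMP
}

for name, text in samples.items():
    print(name, encoded_sizes(text), "utf-16 code units:", utf16_code_units(text))
```

Characters inside the Basic Multilingual Plane take one 16-bit code unit
each, which is where the "half of UTF-32" figure comes from; the last sample
needs two code units (a surrogate pair), so there UTF-16 saves nothing over
UTF-32 and access is no longer one code unit per character.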

Right now Windows uses it (since NT, if I am not mistaken), Java uses it,
and some others like Python and Mozilla's ECMAScript use it internally,
although Python can also be compiled for UTF-32* (there is a switch for
it when you run configure).

* UTF-32 seems like a natural choice for the future and will eventually
  become the default mode (compatibility is no issue in that case),
  but today it just seems wasteful.
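
If you want to see which way a given Python interpreter was built, a small
sketch like the following should do; sys.maxunicode is a real attribute,
while the function name and the wording of the labels are mine. Note that
from Python 3.3 onward the build-time distinction no longer exists and the
value is always 0x10FFFF.

```python
import sys


def unicode_build_kind():
    """Classify the interpreter by the highest code point one string element can hold."""
    if sys.maxunicode == 0xFFFF:
        # Narrow build: strings are sequences of 16-bit code units.
        return "narrow (16-bit code units, UTF-16 style)"
    # Wide build: each string element is a full code point.
    return "wide (full code point range, UTF-32 style)"


print(hex(sys.maxunicode), unicode_build_kind())
```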

For the moment, and for quite some time, UTF-8 will be the standard in
Linux, and UTF-16 will remain only for compatibility with the systems that
already use it.

For much more detailed and accurate information than my amateur approach,
the linux-utf8 list archives are also available for download at:
http://mail.nl.linux.org/linux-utf8/

where it's quite fascinating to watch the whole evolution to UTF-8 (how
various 'well known' applications got ported to UTF-8 from plain ASCII);
definitely worth reading, but you have to clean the spam out of the archive.
I had to filter it through formail to catch most of it (about 500 spam
messages in a total of 7200 emails).

1. http://www.unicode.org/unicode/reports/tr17/



More information about the alfs-discuss mailing list