[elinks-users] unicode conversion
Rick R
rick.richardson at gmail.com
Mon Jan 28 14:28:44 MST 2008
Hmm. Now that I look more closely at the results: with either the
http-equiv tag or your command-line fix, elinks places the Unicode
replacement character (U+FFFD, bytes EF BF BD in UTF-8) wherever the
input has a character in the E0-E9 range.
But é (E9) is a valid Unicode character; the input is Unicode and the
output is Unicode. Does elinks need any additional information to decide
whether or not to translate that character?
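One possible explanation, sketched below in Python purely as an illustration: if the file stores é as the single byte E9 (as ISO-8859-1 does) rather than as the UTF-8 sequence C3 A9, then a UTF-8 decoder sees an invalid byte and substitutes U+FFFD, which re-encodes as EF BF BD. That the file is really ISO-8859-1 is an assumption, not something confirmed here, and `roundtrip_as_utf8` is a hypothetical helper name, not part of elinks.

```python
# Assumption: the page stores "é" as the lone Latin-1 byte E9.
latin1_bytes = "publi\u00e9".encode("iso-8859-1")  # b'publi\xe9'
utf8_bytes = "publi\u00e9".encode("utf-8")         # b'publi\xc3\xa9'


def roundtrip_as_utf8(raw: bytes) -> bytes:
    """Decode raw as UTF-8, replacing invalid bytes with U+FFFD, then re-encode."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


if __name__ == "__main__":
    # Latin-1 input: the E9 byte is not valid UTF-8, so it becomes EF BF BD.
    print(roundtrip_as_utf8(latin1_bytes).hex(" "))
    # Genuine UTF-8 input: C3 A9 survives unchanged.
    print(roundtrip_as_utf8(utf8_bytes).hex(" "))
```

If this is what is happening, E9 is a valid Unicode code point (U+00E9) but not a valid UTF-8 byte sequence on its own, which would account for the replacement characters.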
The two methods I've tested (with a patched v0.12) are:
(no elinks.conf for either method)
./elinks -no-home -eval 'set document.codepage.assume = "utf8"' \
-eval 'set document.codepage.force_assumed = 1' \
-dump -dump-charset utf8 -dump-color-mode 0 \
./$1 > out/$2
OR
./elinks -dump -dump-charset utf8 -dump-color-mode 0 \
./$1 > out/$2
For this second method I added:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
to the <head>.
In both cases, the results (copied from less) are:
Oberthur a publi<EF><BF><BD> ce mercredi, au titre de son
3<EF><BF><BD>me trimestre 2007, un CA
en croissance de 19,2% <EF><BF><BD> 148,2 ME. Au cours de la
p<EF><BF><BD>riode, Oberthur Card
Systems a livr<EF><BF><BD> 97,7 millions de cartes <EF><BF><BD>
microprocesseur, soit une
augmentation de 47,3% par rapport au T3 2006.
Dans les communications mobiles, "la demande sur le
march<EF><BF><BD> de la carte SIM
est toujours soutenue", avec plus de 64 millions de cartes SIM vendues au
cours du trimestre. Ceci repr<EF><BF><BD>sente une croissance de
49% par rapport <EF><BF><BD> la
m<EF><BF><BD>me p<EF><BF><BD>riode de 2006. Les ventes sur ce
segment ont atteint 57,6 ME
(+26,6%). La Soci<EF><BF><BD>t<EF><BF><BD> a enregistr<EF><BF><BD>
la meilleure performance de son
histoire, en d<EF><BF><BD>passant ainsi les records du T4 2006 et T2 2007.
...
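A quick way to check whether the input file is really valid UTF-8 is to try decoding it strictly and report the first offending byte. This is only a sketch in Python; `first_invalid_utf8` is a hypothetical helper name, and the sample bytes assume é is stored as a lone E9.

```python
def first_invalid_utf8(raw: bytes):
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None if the data is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


if __name__ == "__main__":
    # Assumed sample: "publié ce" with é stored as the single byte E9.
    print(first_invalid_utf8(b"publi\xe9 ce"))
    # The same text as real UTF-8 decodes cleanly.
    print(first_invalid_utf8("publi\u00e9 ce".encode("utf-8")))
    # To check the actual page, something like:
    #   with open("test_fr.html", "rb") as f:
    #       print(first_invalid_utf8(f.read()))
```

A result other than None would mean the file is not UTF-8 as stored, whatever the meta tag or the assumed codepage says.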
I have attached the test_fr.html file.
On Jan 28, 2008 3:31 PM, Kalle Olavi Niemitalo <kon at iki.fi> wrote:
> "Rick R" <rick.richardson at gmail.com> writes:
>
> > I invoke this as such:
> > ./elinks -config-file ./elinks.conf -dump -dump-charset utf8
> > -dump-color-mode 0 ./test_fr.html > out/test_fr.txt
>
> -config-file must be relative to -config-dir.
> So this normally reads ~/.elinks/./elinks.conf, not $PWD/elinks.conf.
> I don't know whether that is a bug or a feature.
> Anyway, you could try this instead:
>
> ./elinks -no-home -eval 'set document.codepage.assume = "utf8"' \
> -eval 'set document.codepage.force_assumed = 1' \
> -dump -dump-charset utf8 -dump-color-mode 0 \
> ./test_fr.html > out/test_fr.txt
--
"Myths and legends die hard in America. We love them for the extra
dimension they provide, the illusion of near-infinite possibility to
erase the narrow confines of most men's reality. Weird heroes and
mould-breaking champions exist as living proof to those who need it
that the tyranny of 'the rat race' is not yet final." -- Hunter S.
Thompson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linuxfromscratch.org/pipermail/elinks-users/attachments/20080128/0026119b/attachment.html