Hi,
I ran into a strange character encoding effect when using Maypole. As it
took me quite a while to figure out what's causing the problem and how
to get rid of it, I'll post it here. Maybe it will save someone else's
time. I'm also curious to find out one last detail about the problem.
My templates are encoded in iso-8859-1 and so are the strings in the
database, but somehow everything was magically converted to utf-8
before being sent to the web browser. As Maypole defaults to
sending a utf-8 header, everything worked fine. I verified that a utf-8
header was sent and that the files I received really were utf-8.
There is a problem with utf-8 and Class::DBI::AsForm (see my E-Mail:
<86b14df50412141456417988e1 at mail.gmail.com> ) and the database
needs to be in iso-8859-1 (because other applications I don't control also
use this database). I also didn't want to decode and encode all the strings
coming from and going to the database. So I thought it would be easier to
make the application work with iso-8859-1 encoding.
I know that Maypole/View/Base.pm has
$r->{document_encoding} ||= "utf-8";
but this only affects the declaration that is sent to the browser.
It is used for the HTTP header:
Content-Type: text/html; charset=utf-8
and for the HTML header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
The character encoding that is set here must be the same one that was
used for writing the templates and the same as the encoding of the
strings retrieved from the database (unless I explicitly do an encoding
conversion myself in the application).
So if my templates are iso-8859-15 and I output a header with utf-8,
the browser will not display the non-ASCII characters correctly. On the
other hand, if the templates are utf-8 and I output iso-8859-15, we have
the same problem.
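
To make the mismatch concrete, here is a minimal sketch using only the
core Encode module. It encodes the same character, "ä", in both
encodings and prints the resulting bytes. The variable names are only
for illustration and do not come from Maypole.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "ä" as a Perl character string (U+00E4), written as an escape so the
# example does not depend on the encoding of this source file.
my $char = "\x{E4}";

# Encode the same character in the two encodings discussed here.
my $latin = encode("iso-8859-15", $char);
my $utf8  = encode("UTF-8", $char);

# Hex dump of each byte string.
my $latin_hex = join " ", map { sprintf("%02X", ord($_)) } split(//, $latin);
my $utf8_hex  = join " ", map { sprintf("%02X", ord($_)) } split(//, $utf8);

print "iso-8859-15: $latin_hex\n";   # E4
print "utf-8:       $utf8_hex\n";    # C3 A4
```

A browser that is told "charset=utf-8" but receives the single byte E4
sees an invalid sequence, and one that is told "charset=iso-8859-15"
but receives C3 A4 shows two characters instead of one.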
This is the exact behavior I get with the beerdb example. My
application, with a few thousand lines, behaves differently, as described
above. Everything is converted to utf-8 no matter which encoding is set
in Maypole.
After many hours I managed to track down the cause of this automatic
conversion from iso-8859-15 to utf-8. I store the uri base (and other
stuff) in an XML file and retrieve and set it like this:
$config = XMLin("conf.xml");
__PACKAGE__->config->uri_base($config->{uri_base});
XMLin sets the utf-8 flag on all strings, even though the file
contains only ASCII and the encoding of the file is set to iso-8859-15
(this behavior is documented for XML::Simple).
My hack to solve this problem was to clear the utf-8 flag on the
strings I got from XMLin using Encode::_utf8_off().
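
A minimal sketch of that workaround, using only core Perl and Encode.
The flagged string here is produced with Encode::_utf8_on() as a
stand-in for a value returned by XMLin, so XML::Simple is not needed to
run it. Clearing the flag like this is only safe because the content is
pure ASCII.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for a value returned by XMLin: pure ASCII, but flagged as UTF-8.
my $uri_base = "http://localhost/beerdb/";
Encode::_utf8_on($uri_base);
my $flag_before = utf8::is_utf8($uri_base) ? 1 : 0;

# The workaround: clear the flag. For ASCII-only content the bytes
# themselves are unchanged.
Encode::_utf8_off($uri_base);
my $flag_after = utf8::is_utf8($uri_base) ? 1 : 0;

print "flag before: $flag_before, after: $flag_after\n";
```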
At the moment I have no idea where this magical conversion takes
place. I know that concatenating a string with cleared utf-8 flag and
a string with set utf-8-flag will result in a string with utf-8-flag
set, but I can't see where the actual conversion takes place.
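
The concatenation effect itself can be reproduced outside Maypole. The
following is a sketch under stated assumptions, not a diagnosis: the
template content is simulated by a one-byte string, and whether this is
the step responsible inside Maypole, Template Toolkit or mod_perl is not
established here. It only shows that joining a flagged string with a
byte string changes the internal representation of the 0xE4 byte.

```perl
use strict;
use warnings;
use Encode ();

# One byte 0xE4: "ä" in iso-8859-15, standing in for template content.
my $template = "\xE4";

# ASCII-only string with the UTF-8 flag switched on, as in the XMLin case.
my $uri_base = "http://localhost/beerdb/";
Encode::_utf8_on($uri_base);

# Concatenation upgrades the byte string: Perl reads 0xE4 as the
# character U+00E4, and the result carries the UTF-8 flag.
my $page = $uri_base . $template;
my $flag = utf8::is_utf8($page) ? 1 : 0;

# Inspect Perl's internal buffer on a copy, leaving $page untouched.
# Code that reads the buffer without checking the flag would see these bytes.
my $raw = $page;
Encode::_utf8_off($raw);
my $tail = join " ", map { sprintf("%02X", ord($_)) } split(//, substr($raw, -2));

print "flag set: $flag\n";
print "last bytes: $tail\n";   # C3 A4
```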
If you want to try this, take the beerdb example and add
use Encode;
and change the uri-base to
my $uri_base = "http://localhost/beerdb/";
Encode::_utf8_on($uri_base);
BeerDB->config->uri_base($uri_base);
Then add some non-ASCII characters to the frontpage template. I added
iso-8859-15 "ä" (0xE4 LATIN SMALL LETTER A WITH DIAERESIS).
Leaving the Maypole default set to utf-8 will give a perfectly valid
utf-8 page where my umlaut has been converted to the utf-8 two-byte
encoding (0xC3 0xA4) and will show correctly in the browser.
Does anyone have an idea where this conversion takes place? Maybe in
mod_perl, perl-file-io, apache, template toolkit or maypole?
Btw, I use additional_data to set the document encoding back to the old
default:
sub additional_data {
my $r = shift;
$r->{document_encoding} = "iso-8859-15";
}
Is that the way it is supposed to be done or is there a better way?
Regards, Kester.