hello,
did anyone use Xinha or Htmlarea with different charsets?
does anybody have experience with utf8?
Currently no charset is defined in the iframe, but one could be set via a new config variable. Here is a small patch:
--- htmlarea.js (Revision 23)
+++ htmlarea.js (working copy)
@@ -241,6 +241,9 @@
if (this.baseURL && this.baseURL.match(/(.*)\/([^\/]+)/))
this.baseURL = RegExp.$1 + "/";
+ // custom Charset for the iframe document
+ this.charSet = "";
+
// URL-s
this.imgURL = "images/";
this.popupURL = "popups/";
@@ -1387,6 +1390,10 @@
doc.open();
var html = "<html>\n";
html += "<head>\n";
+ if(editor.config.charSet != '')
+ {
+ html += "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + editor.config.charSet + "\">";
+ }
if(typeof editor.config.baseHref != 'undefined')
{
html += "<base href=\"" + editor.config.baseHref + "\"/>";
...what happens if the user switches to text mode?
Then the charset of the textarea will be used - which might be different!
Or is it like this:
the iframe uses the same charset as the parent page (if none is defined)?
Edit: ...I just tested this... and it doesn't work.
So imo we NEED a charset for the iframe if we want to use utf-8.
Thanks for any help on this...
niko
Last edited by niko (2005-03-03 06:36:47)
Niko
I don't know about iframe, but textarea definitely uses the charset of the parent page (if none supplied). It would be really strange if iframe behaved differently.
Last edited by anzenews (2005-03-03 20:32:27)
anzenews wrote: I don't know about iframe, but textarea definitely uses the charset of the parent page (if none supplied). It would be really strange if iframe behaved differently.
hmmm... ok, I take everything back....
it works without any change - the iframe uses the charset of the parent page....
thanks
Niko
niko wrote: hello, did anyone use Xinha or Htmlarea with different charsets? does anybody have experience with utf8?
Yes, I use it as utf8. The one problem with that, though, is that the translations are not (all) utf8, which kinda messes things up a bit.
When I finish the new translation system I would hope to see the translations converted to utf8; possibly they could also be made available in other "common" character sets for the given language.
James Sleeman
We are using utf8, as MSIE tended to forget all post-data when submitting text which contained characters like (c) or (r) or TM or other Word stuff, which led to database errors. Xinha seems to work like a charm with utf8.
It was the same with htmlarea2 and ISO-8859-2 (Latin2). I even wrote some JS to convert offending characters before submitting. It worked, but it was clumsy. You wouldn't believe how many characters MSWord replaces automatically...
About converting codepages: there is a utility, iconv, under Linux (no idea about Windows) that translates codepages rather nicely. You just say:
iconv -f iso-8859-2 -t utf8 somefile.txt > somefile.utf8.txt
If you wish, I can take care of that... Let me know.
Enjoy!
Anze
OK... utf-8-encoded pages are working great....
but I have a problem with non-utf-8 pages:
the SuperClean plugin doesn't work correctly when calling tidy.
This is because tidy is called with the parameter -utf8.
When the page uses latin1 instead of utf-8, all chars like ü and ä get replaced by ? when I call tidy.
A solution is calling tidy with the right charset.
But then we would have to pass the right charset to the PHP script as a GET parameter, I think(?)
Please comment on this - if this would be a solution, I could write a patch...
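A rough sketch of what I mean (the tidy.php path and the "charset" parameter name are illustrative, not the actual plugin code; it assumes a config.charSet variable like in my patch above):

// take the charset the editor was configured with, falling back to utf-8
var charSet = editor.config.charSet || 'utf-8';
// pass it along so the server-side script can hand the matching flag to tidy
var tidyUrl = _editor_url + 'plugins/SuperClean/tidy.php'
            + '?charset=' + encodeURIComponent(charSet);
// the PHP script could then call tidy with e.g. -latin1 instead of the hard-coded -utf8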
Niko
niko wrote: OK... utf-8-encoded pages are working great....
but I have a problem with non-utf-8 pages:
the SuperClean plugin doesn't work correctly when calling tidy.
This is because tidy is called with the parameter -utf8. When the page uses latin1 instead of utf-8, all chars like ü and ä get replaced by ? when I call tidy.
A solution is calling tidy with the right charset.
But then we would have to pass the right charset to the PHP script as a GET parameter, I think(?) Please comment on this - if this would be a solution, I could write a patch...
Yes, I think we'll need to add a config variable to HTMLArea.Config to specify the character set that should be used, probably defaulting to utf8. I can't find any way to find the character encoding of the page from javascript, or we could default it to the correct setting automagically.
The tricky bit is that I'm not totally sure how the character set in the iframe is set currently. Well, I do know - it's simply not set; what I don't know is whether that means the iframe inherits the page's character set or uses the browser's default (which is probably ISO-8859-1 for most). It may be that we will also need to add a meta tag to the iframe to try and persuade it into using the correct encoding.
James Sleeman
Offline
So, to summarise my rambling: niko, if you would like to add the config variable and the meta tag to the iframe, and fix up any of those minor character-set issues you can handle, such as the one in SuperClean (you might try a search for "utf", which should find any), then submit a patch as a ticket - that'd be just great.
James Sleeman
Why not just use utf8 everywhere? We could convert all the lang stuff to utf8 and set the iframe's meta tag to utf8 - both are easy tasks.
That way it would be much easier to implement and maintain, and nobody loses anything - utf8 is as good as all other charsets combined.
The only possible side-effect is that the Xinha contents would be sent to the PHP script in utf8 and not in the charset specified on the parent page - but personally I wouldn't care about that.
Still better than tying the iframe's character set to whatever my language setting uses - and getting a different encoding depending on the language.
Just my 0.02 EUR.
Last edited by anzenews (2005-03-07 05:34:10)
anzenews wrote: Why not just use utf8 everywhere? We could convert all the lang stuff to utf8 and set the iframe's meta tag to utf8 - both are easy tasks.
The problem is that while learned developers such as ourselves already use utf8, most of the world really is still using ISO-8859-1, and most of the Asian world, I believe, is using various other encodings (EUC-JP, Big5, Shift_JIS).
James Sleeman
Here's a link on character encoding which may be useful in this discussion:
http://www.crazysquirrel.com/compgen/form-encoding.php
James Sleeman
gogo wrote: I can't find any way to find the character encoding of the page from javascript, or we could default it to the correct setting automagically.
I found some non-standard document properties:
gecko:
document.characterSet
document.actualEncoding
ie:
document.charset
I don't know the difference between characterSet and actualEncoding.
The good thing about this: if you switch the charset in your browser (in the View menu), this property gets updated too!
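A quick throwaway test shows it (nothing editor-specific):

// Gecko exposes document.characterSet, MSIE document.charset
alert(document.characterSet || document.charset); // e.g. "ISO-8859-1" or "UTF-8"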
So we don't need a new config-setting!
Basically, this patch would be enough:
--- htmlarea.js (Revision 23)
+++ htmlarea.js (working copy)
@@ -1387,6 +1387,12 @@
doc.open();
var html = "<html>\n";
html += "<head>\n";
+ if (HTMLArea.is_gecko) {
+ var charSet = editor._mdoc.characterSet; // or should I use document.characterSet directly?
+ } else {
+ var charSet = editor._mdoc.charset;
+ }
+ html += "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + charSet + "\">";
if(typeof editor.config.baseHref != 'undefined')
{
html += "<base href=\"" + editor.config.baseHref + "\"/>";
But there are surely better solutions to implement this.
imho it could be a good thing to have an editor.charSet variable available (in plugins),
so it might be better to set editor.charSet somewhere else (I don't know where the right place would be).
greets
niko
Niko
niko wrote:
gogo wrote: I can't find any way to find the character encoding of the page from javascript, or we could default it to the correct setting automagically.
I found some non-standard document properties:
...
I don't know the difference between characterSet and actualEncoding.
Sweet! I suspect that actualEncoding is the encoding returned by the server in the response, while characterSet is what the document is actually being displayed as (possibly modified by a meta tag or a user selection).
niko wrote: so we don't need a new config-setting!
I think it would be better, as you allude to, to have an
HTMLArea.Config.charSet
config variable which defaults to the main document's charset as reported by those document properties (so that everybody can leave it at the default unless they have some peculiar need). Makes it easier in the long run.
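Something along these lines, perhaps (a rough sketch of the default; the placement inside the HTMLArea.Config constructor is my assumption):

// default to the parent document's charset; Gecko and MSIE
// expose it through different non-standard properties
this.charSet = typeof document.characterSet != 'undefined'
             ? document.characterSet  // Gecko
             : document.charset;      // MSIE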
James Sleeman
...I had some troubles getting SuperClean to work with utf-8.
...try using the SuperClean plugin with an ä, for example - it won't work (even in the current svn-head version!)
This is what I found out:
HTMLArea._postback uses encode(data[i]) (around line 4351),
which ALWAYS turns my ä into %E4 - ignoring any character set (correct would be %C3%A4).
A possible solution for that problem:
use the function encodeURIComponent instead of encode!
encodeURIComponent ALWAYS uses utf-8 encoding - even if the charset of the page is different.
(see http://www.js-examples.com/javascript/r … oplev.php)
And this is very good for the SuperClean/tidy part (and other plugins that call external applications) - they ALWAYS get utf-8 data and we don't have to handle different charsets!
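For illustration, the built-in escape() shows the same charset-blind behaviour (I'm assuming encode works roughly like escape - correct me if not):

escape('ä');             // "%E4"    - raw code point, charset-blind
encodeURIComponent('ä'); // "%C3%A4" - always the utf-8 byte sequence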
Please report if this change from encode to encodeURIComponent would cause any problems.
thanks!
niko
Niko
ok, submitted:
http://xinha.gogo.co.nz/cgi-bin/trac.cgi/ticket/57
btw: james, I want to thank you for your work here!
It is really great: when I want to add or change anything in the code, I can just post a patch and it will be included! (no more hacking the code and re-applying all my changes again and again)
niko
Niko
I think the naming of the language files should at least be logical, e.g.
gb.js <= this is gb2312
I think it should be named
zh-CN.GB2312.js, or
zh-CN.UTF-8.js for the utf-8 charset,
or something similar.
In htmlarea, I had to write a hash that converts the ISO locale names (e.g. "zh-CN", "en-AU") into the disorganized javascript language file names.
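For illustration, the kind of mapping hash I mean looks roughly like this (the entries are made up to show the shape, not copied from my code):

// maps ISO locale names onto the inconsistently named lang files
var LANG_FILES = {
    'zh-CN': 'gb.js',  // actually GB2312-encoded
    'en-AU': 'en.js',
    'de-DE': 'de.js'
};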
Cheers, Wei.
gogo wants to write a completely new i18n system....
I don't know how it will work or how far along he is...
Niko