corruption of HTML copyright entity

raj · 2006-02-21 12:48:24

I'm experiencing a problem with HTML entities in my Xinha content. If the user enters an HTML entity like &copy; it works fine at first, but on subsequent saves the entity is corrupted. I think this is because Firefox interprets the entity and displays it using its character set, and that starts the problems with Xinha on subsequent saves.

I've tried a few things to fix this issue by referring to the xinha docs. Specifically, I've looked at:

http://xinha.python-hosting.com/wiki/CharacterSets
and made sure to set my file encodings properly, adding charset="utf-8" to the script tags on my Xinha-containing page like so:
<script type="text/javascript">
_editor_url = "/cms/xinha/" ;
_editor_lang = "en";
</script>
<script type="text/javascript" src="/cms/xinha/htmlarea.js" charset="utf-8"></script>
<script type="text/javascript" src="/cms/xinha/my_config.js" charset="utf-8"></script>

I've also referred to the following bug report: http://xinha.python-hosting.com/ticket/127
and based on mharrisonline's comment, I added the following line to the htmlEncode function in htmlarea.js -
str = str.replace(/©/g, "&copy;");

(OK - I can't get the first © to display a copyright symbol in this preview box, but that is what I am trying to search on in my JavaScript code. I was able to put together a test case where a regular expression matched the copyright symbol, so I assume that is not the problem.)

But this does not seem to succeed in replacing the literal copyright character with the HTML entity. I've been looking at the htmlarea.js code but am still digesting it and would love any pointers.
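
One thing that may be worth trying, assuming the literal © inside htmlarea.js is being mangled by the encoding the file is saved or served in: write the character as a Unicode escape, so the pattern no longer depends on the file's encoding. A minimal sketch (encodeCopyright is a hypothetical helper name, not part of Xinha):

```javascript
// Hypothetical helper, not part of Xinha.
// \u00A9 is the code point of the copyright sign, so this source stays
// pure ASCII no matter how the .js file is saved or served.
function encodeCopyright(str) {
  return str.replace(/\u00A9/g, "&copy;");
}
```

Called as encodeCopyright("\u00A9 2006"), this returns "&copy; 2006". The same one-line replace could be placed inside htmlEncode in place of the literal-character version.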

Thanks!

mleiv · 2006-03-23 15:26:20

I had the same problem and I found that the easiest way to fix it was to switch to the iso-8859-1 charset (this was on the full page, not the script tag). You may have a different issue, but for me it was that when I submitted to the server-side script (PHP) using UTF-8, it would corrupt the characters at that point. When using ISO, the characters came through correctly.
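
To illustrate the kind of corruption this describes, here is a small sketch that assumes the cause is a UTF-8 / ISO-8859-1 mismatch between the page and whatever reads the submitted bytes. That is an assumption, not something confirmed in this thread, and utf8ReadAsLatin1 is a made-up name:

```javascript
// Assumption: text is submitted as UTF-8 but read back as ISO-8859-1.
// This only demonstrates that mechanism; it is not Xinha or PHP code.
function utf8ReadAsLatin1(text) {
  // Encode the text as UTF-8 bytes, as a UTF-8 page would submit it.
  const bytes = new TextEncoder().encode(text);
  // Read every byte as one ISO-8859-1 character, as a mismatched reader would.
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}
```

The copyright sign is two bytes in UTF-8 (0xC2 0xA9), so utf8ReadAsLatin1("\u00A9") comes back as "\u00C2\u00A9", which displays as "Â©". Applying the function to its own output adds more stray characters each time, which would match damage that gets worse on every save.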

However, before I discovered that, I added this to the bottom of htmlEncode():
var len = str.length;
var res = "";
for (var i = 0; i < len; i++)
{
    var charOrd = str.charCodeAt(i);
    // Codes from 160 (non-breaking space) upward become numeric references.
    if (charOrd >= 160) res += '&#' + charOrd + ";";
    else res += str.charAt(i);
}
return res;
It uses numeric character codes (such as &#169;) instead of named entities (&copy;). Note: I don't think this works for extended non-ASCII charsets, like German umlaut characters.
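
For what it's worth, umlauts have character codes above 160 (ü is 252), so a loop with this threshold should cover them. The case where charCodeAt falls short is characters outside the Basic Multilingual Plane, which it reports as two surrogate halves. A sketch of the same idea as a standalone function that iterates by code point instead (encodeNonAscii is a hypothetical name, and this uses newer JavaScript than htmlarea.js itself does):

```javascript
// Hypothetical standalone variant of the loop, not part of Xinha.
function encodeNonAscii(str) {
  let res = "";
  // for...of walks the string by code point, so a character outside the
  // BMP yields one numeric reference instead of two surrogate halves.
  for (const ch of str) {
    const code = ch.codePointAt(0);
    // Same threshold as the loop in this post: 160 and above is encoded.
    res += code >= 160 ? "&#" + code + ";" : ch;
  }
  return res;
}
```

With this, encodeNonAscii("\u00A9") returns "&#169;" and encodeNonAscii("\u00FC") returns "&#252;", while plain ASCII text passes through unchanged.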

Last edited by mleiv (2006-03-23 15:28:17)

ray · 2006-03-23 20:01:38

Frankly, why do you need the entity when you have Unicode? It was designed to supply all these characters that you otherwise have to call indirectly with entities. The only characters that have to be written as entities are <, >, " and &, because they have reserved functions in HTML.
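
A minimal sketch of that approach, assuming the page and the server agree on UTF-8 end to end: escape only the reserved characters and leave everything else, including ©, as it is. escapeReserved is a hypothetical name, not a Xinha function:

```javascript
// Hypothetical helper, not part of Xinha: escapes only the characters
// that have a reserved meaning in HTML and leaves all others untouched.
function escapeReserved(str) {
  return str
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

For example, escapeReserved('\u00A9 "A" & <B>') returns '\u00A9 &quot;A&quot; &amp; &lt;B&gt;', with the copyright sign passed through as a plain character.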

Xinha Discussion Forum
