Hi,
Is there any way to get rid of everything that Microsoft Word produces (apart from the standard HTML, of course)? Stuff like the <span>, <div> etc. tags, so that it basically copies as plain text when a user chooses to.
If we could have something along the lines of 'copy as plain text', it would be very handy indeed... apart from tables (obviously).
Not asking for much, am I ;-)
P.S. By the way, I am using ASP on an IIS server. I think this can be done in PHP, but I have not seen it done for ASP/IIS (without CGI/Perl).
Last edited by januszjasinski (2005-07-14 05:29:20)
I guess you have checked out the new SuperClean plugin, which does a really good job. Other solutions involve using htmltidy, but my past experience is that htmltidy doesn't work that well with Word-bloated code. (In some cases htmltidy in fact tidied away the entire code...)
You are on the Windows platform, however, and have more possibilities. I have seen quite a few Word cleaners out there, and I'm sure most of them can be run from the command line (meaning you could easily create your own plugin which tidies the code nicely).
There is another solution which isn't as nice, but then again works just as well. Include a little text field on your page. Tell your customers that this is the "converter" window. Let them paste their Word clipboard into this window with a quick CTRL+V, then CTRL+A and CTRL+X, to get HTML-free text that they can paste into the editor. No plugin needed, just a little coaching of your customers, who might as well be happy to learn some Windows shortcuts.
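If you would rather do the same reduction on the server, here is a rough PHP sketch of a "copy as plain text, but keep tables" filter, as asked for in the first post. It relies on PHP's built-in strip_tags() and preg_replace(). The helper name word_to_plain_text() and the sample input are made up for illustration, and the regular expressions are an assumption about what typical Word markup looks like, not a tested Word cleaner.

```php
<?php
// Sketch only: reduce pasted Word HTML to plain text, keeping table markup.
// word_to_plain_text() is a hypothetical helper name, not part of any plugin.
function word_to_plain_text($html)
{
    // strip_tags() keeps the text inside <style>/<script>, so drop those blocks first.
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);

    // Remove every tag except the table-related ones.
    $text = strip_tags($html, '<table><tr><td><th>');

    // Allowed tags keep their attributes, so strip those as well.
    $text = preg_replace('#<(table|tr|td|th)\b[^>]*>#i', '<$1>', $text);

    return trim($text);
}

// Made-up sample resembling Word output.
$word = '<style>p.MsoNormal{margin:0}</style>'
      . '<p class="MsoNormal"><span style="color:red">Hello</span> <b>world</b></p>'
      . '<table class="MsoTable"><tr><td width="50">A</td></tr></table>';

echo word_to_plain_text($word), "\n";
// prints: Hello world<table><tr><td>A</td></tr></table>
```

Note that strip_tags() works on tag names only, which is why the attributes on the surviving table tags are removed in a separate step.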
Just to throw out an idea, one way to clean up all the Word code could be to do it manually with PHP. You could split the entire HTML code into an array, walk through it and remove all tags and attributes that you don't accept.
Here is some quick code to illustrate the procedure:
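<?php
// Quick tag remover / validator: strips unaccepted attributes from selected tags.
$html = '<a href="test" onclick="alert(1);" onmouseover="alert(2);">test</a> <p style="border:1px;">text</p>';
$tags_to_inspect = array('img', 'a', 'span', 'div');
$accepted_attributes = array('href', 'onclick');

echo htmlentities($html) . "<br>\r\n";

// Split on "<" and ">" but keep the delimiters, so a tag's content is the piece right after "<".
$r = preg_split('/(<|>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
$count = count($r);
for ($i = 0; $i < $count; $i++) {
    if ($r[$i] != "<") {
        continue;
    }
    $i++; // move on to the tag content
    // Note: splitting on spaces breaks attribute values that contain spaces.
    $l = explode(' ', $r[$i]);
    // Skip closing tags and tags that are not on the inspection list.
    if (!$l[0] || !in_array($l[0], $tags_to_inspect)) {
        continue;
    }
    // Drop every attribute whose name is not accepted. The length is cached
    // because unset() shrinks count($l) and would end the loop too early.
    for ($ii = 1, $n = count($l); $ii < $n; $ii++) {
        $a = explode('=', $l[$ii]);
        if (!in_array($a[0], $accepted_attributes)) {
            unset($l[$ii]);
        }
    }
    $r[$i] = implode(' ', $l);
}
$html = implode('', $r);

echo htmlentities($html) . "<br>\r\n";
?>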
Using this technique you could do it two ways: either specify all the attributes you want to delete, or the attributes you want to keep. You could also easily extend it to expand the style attribute and look for certain CSS code that you want to either keep or delete as well.
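To give a rough idea of the style extension, here is a small sketch that keeps only whitelisted CSS properties from a style attribute value. The function name filter_style() and the property list are hypothetical, picked just for this example. It assumes simple "property: value" declarations separated by semicolons, so a value that itself contains a semicolon would be cut incorrectly.

```php
<?php
// Sketch only: keep whitelisted CSS properties inside a style attribute value.
// filter_style() and the accepted property list are hypothetical names for illustration.
function filter_style($style, array $accepted_properties)
{
    $kept = array();
    foreach (explode(';', $style) as $declaration) {
        // Split "property: value" on the first colon only.
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // empty or malformed declaration
        }
        $property = strtolower(trim($parts[0]));
        if (in_array($property, $accepted_properties, true)) {
            $kept[] = $property . ':' . trim($parts[1]);
        }
    }
    return implode(';', $kept);
}

echo filter_style(
    'mso-bidi-font-size:10pt; color: red; FONT-WEIGHT:bold;',
    array('color', 'font-weight')
), "\n";
// prints: color:red;font-weight:bold
```

The same function could be turned into a blacklist by flipping the in_array() test, which matches the "delete" versus "keep" choice described above.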