Automatically detect and fix HTML errors on comments/posts. December 6, 2005 2:45 PM

Automatically detect and fix HTML errors on comments/posts. In case of invalid HTML, it will automatically try to fix the markup using HTML Tidy, list the errors and force a preview.
posted by Sharcho to Feature Requests at 2:45 PM (21 comments total)

It will? Cool.
posted by gleuschk at 3:17 PM on December 6, 2005


I don't see any ports of tidy for coldfusion
posted by mathowie (staff) at 3:20 PM on December 6, 2005


Actually, this looks like an ugly hack for it, but if anyone spots anything better, feel free to post it.
posted by mathowie (staff) at 3:23 PM on December 6, 2005


Really, how many HTML errors are there?
posted by smackfu at 4:38 PM on December 6, 2005


There are a few.
posted by cortex at 4:42 PM on December 6, 2005


I dont <see any?/
posted by blue_beetle at 5:05 PM on December 6, 2005


Hmm.. how is it a hack?

I just say that because it looks like a simple Java call, albeit with a lot of lines of wrapper code. Doesn't look like anything can be trimmed though.
posted by holloway at 6:02 PM on December 6, 2005


By the way, Radium's been going through comment validation issues in rewriting the SA forum software.
posted by holloway at 6:29 PM on December 6, 2005


Why not just use body.onLoad regexes on the comments?
posted by Civil_Disobedient at 6:44 PM on December 6, 2005


Regexes don't catch half the things tidy/sgml/xml parsers do. Try making a regex that understands what's wrong in this,

<table><tr><td>
    &bsp; <table><tr><td>
</td></tr></table>
posted by holloway at 6:55 PM on December 6, 2005
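
A rough sketch of the parser-based check described in the comment above, assuming Python's standard html.parser rather than Tidy or anything MetaFilter actually runs. The names NestingChecker and check_nesting are made up for illustration. It keeps a stack of open tags, which is the part a flat regex cannot do, and it ignores the entity problem entirely.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records nesting problems without fixing them."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.errors = []     # human-readable problems found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_nesting(markup):
    """Return a list of nesting errors; an empty list means balanced tags."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Anything still on the stack was opened but never closed.
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.open_tags]


sample = "<table><tr><td>\n    &bsp; <table><tr><td>\n</td></tr></table>"
print(check_nesting(sample))
```

For the table snippet, the inner table closes cleanly and the outer table, row, and cell are reported as unclosed. A list like this could feed the "list the errors and force a preview" step from the original request.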


Heh... excluding the entity problem.
posted by holloway at 6:55 PM on December 6, 2005


When did the concept "automatically fix" actually start working? I missed it. I thought that was all a Microsoft beat-off fantasy.
posted by scarabic at 6:59 PM on December 6, 2005


What?
posted by holloway at 7:08 PM on December 6, 2005


Try making a regex that understands what's wrong in this...

Oh, that's easy. Just replace the < and > with &lt; and &gt; -- users should be making tables in their comments anyway. :)
posted by Civil_Disobedient at 7:09 PM on December 6, 2005


shouldn't, that is.
posted by Civil_Disobedient at 7:14 PM on December 6, 2005


how is it a hack?

It turns text strings into files before running through tidy. Seems that doing that many thousands of times a day here could be problematic, given all that filesystem and memory use.
posted by mathowie (staff) at 7:17 PM on December 6, 2005


Ok then, as you're being awkward how about this,

<small><b><small>what</b><i></small></i>
posted by holloway at 7:19 PM on December 6, 2005
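
For overlapping tags like the ones in the comment above, a parser with a stack can also attempt a repair. This is only a sketch under assumptions: it uses Python's html.parser, the names NestingRepairer and repair_nesting are invented here, attributes are dropped for brevity, and the strategy (close everything opened inside a tag when that tag closes, drop stray close tags) is one simple choice, not a description of what HTML Tidy does.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class NestingRepairer(HTMLParser):
    """Hypothetical helper: re-emits markup with properly nested tags."""

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.open_tags = []  # stack of currently open elements
        self.out = []        # pieces of the repaired output

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}>")  # attributes dropped for brevity
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag with no matching open: drop it
        # Close every element opened inside this one, then the element itself.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def repair_nesting(markup):
    """Return markup with overlapping or unclosed tags made well nested."""
    repairer = NestingRepairer()
    repairer.feed(markup)
    repairer.close()
    # Close whatever is still open, innermost first.
    for tag in reversed(repairer.open_tags):
        repairer.out.append(f"</{tag}>")
    return "".join(repairer.out)


print(repair_nesting("<small><b><small>what</b><i></small></i>"))
```

The result is well nested, though a repair like this can change how the text renders, which is one reason to show the user a preview afterwards.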


Ah, right. Good point about the temp files. There's a function named tidyParseString in the .net bindings -- maybe there's something similar for Java.
posted by holloway at 7:28 PM on December 6, 2005
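
One way to sidestep the temp files raised above, short of an in-process call like tidyParseString, is to pipe the comment text through a command's standard input. This is a sketch in Python, not the ColdFusion/Java setup discussed in the thread. It assumes a Tidy binary named "tidy" is on the PATH and reads stdin when given no file argument; the function name filter_in_memory is hypothetical. It still spawns a process per comment, so it trades filesystem use for process overhead.

```python
import subprocess

# Assumption: an HTML Tidy binary named "tidy" is installed and on PATH.
# "-q" asks it to suppress nonessential output.
DEFAULT_COMMAND = ["tidy", "-q"]


def filter_in_memory(markup, command=DEFAULT_COMMAND):
    """Pipe markup through an external filter without writing a temp file."""
    result = subprocess.run(
        command,
        input=markup,         # sent to the child's stdin
        capture_output=True,  # collect stdout and stderr in memory
        text=True,
    )
    # Tidy exits non-zero for warnings and errors, so no check=True here;
    # the caller gets the output, the diagnostics, and the exit code.
    return result.stdout, result.stderr, result.returncode
```

The diagnostics returned as the second value are what would be shown to the user alongside the forced preview.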


BTW, thanks for that link Holloway. Interesting stuff.
posted by smackfu at 9:42 PM on December 6, 2005


Yeah, that site's pretty good. He's coming at it from the standpoint of writing code Knuth would love. Parsing variables the fewest number of times, and with something that understands *ML, rather than adding another line of regex replacement.

It's that kind of thinking that got me to give up on PHP. It's better than old-style ASP, CFMX, and the rest, but that's not the test anymore. It's up against Ruby and Python, .Net and Perl. 'cause when I want to write state-machines and use SAX/STX those former languages just get in the way.

I've been writing a cached XML pipeline streaming system based around SAX/STX. Like the 2 previous versions of Phpilfer it's XML-based, but not as pure as Apache Cocoon, so it can be fast. It's built to avoid the filesystem and to scale across boxes (memcached). Each subsequent version has less code and has been easier to program - it's a really good feeling :)

I don't know whether Radium is a great programmer but his approach is inspiring and it's helped me appreciate algorithms again. Years writing commercial software gave me the idea that a good engineer would pragmatically concentrate on architecture and UI because that's what users want. That's true, but unbalanced. thx radium.
posted by holloway at 1:46 AM on December 7, 2005


There's a better code sample here
There are also quite a few alternatives to JTidy here
posted by Sharcho at 3:36 AM on December 7, 2005

