You can now leave tags open willy-nilly on MetaTalk November 4, 2001 12:00 AM

You can now leave tags open willy-nilly on MetaTalk (coming soon to MetaFilter).

This is a test to make sure unclosed tags get closed. Of course it will work. (I purposefully left the bold and italics tags open on this post.)
posted by mathowie (staff) to MetaFilter-Related at 12:00 AM (93 comments total)

Thanks to the hyper coding work of Leonard Lin (also of randomfoo.net fame), there's now a function to automatically close any tags you leave open. No more italicized front pages of MetaFilter. Go ahead and try it out, and view source after posting. It's pretty bulletproof so far.
posted by mathowie (staff) at 12:04 AM on November 4, 2001


I just added it to MetaFilter comment and new thread posting as well. Much thanks to Leonard for building this functionality!
posted by mathowie (staff) at 12:14 AM on November 4, 2001


I just wiped out the old tests, and updated the tag looking for unclosed tags, so continue testing now with the new code
posted by mathowie (staff) at 9:17 AM on November 4, 2001


So how is it done? Does it actually parse, diagnose and fix the code? Seems like a lot of cycles.
posted by rodii at 9:33 AM on November 4, 2001


It only does it when committing the post to the database. It does a quick search for things encased in < and > and makes sure there is a matching one surrounded with </ and >, and if not, adds the appropriate one.
posted by mathowie (staff) at 9:48 AM on November 4, 2001
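
The check mathowie describes can be sketched roughly as follows. This is only an illustration in Python, not the site's actual ColdFusion code; the names `TAG_RE`, `VOID_TAGS`, and `close_open_tags` are invented here, and the tag pattern is an assumption about what counts as a tag.

```python
import re

# Hypothetical pattern: matches simple tags such as <b>, </i>, <a href="x">.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>")

# Tags assumed never to need a closing tag.
VOID_TAGS = {"br", "img", "hr"}


def close_open_tags(html):
    """Append a closing tag for every opening tag that has no match."""
    still_open = []  # names of opening tags not yet matched by a close
    for match in TAG_RE.finditer(html):
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        if name in VOID_TAGS:
            continue
        if not is_close:
            still_open.append(name)
        elif name in still_open:
            # Drop the most recent unmatched opener with this name.
            last = len(still_open) - 1 - still_open[::-1].index(name)
            del still_open[last]
    # Emit the missing closers at the end, innermost first.
    return html + "".join(f"</{name}>" for name in reversed(still_open))
```

For example, `close_open_tags("<b><i>hi")` returns `<b><i>hi</i></b>`, while already balanced input comes back unchanged.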


OK, Now what?
posted by Steven Den Beste at 10:14 AM on November 4, 2001


Another test.
posted by Steven Den Beste at 10:16 AM on November 4, 2001


Surely I can come up with something here.
posted by Steven Den Beste at 10:19 AM on November 4, 2001


i'll assume it leaves tags that don't have a closing tag alone? like <img>, for example.

although, i guess the page wouldn't necessarily break with an </img> tag...

testing:
posted by pnevares at 11:05 AM on November 4, 2001


good work, Leonard Lin and mathowie.
posted by pnevares at 11:06 AM on November 4, 2001


...and yet it seems as though something's gone horribly wrong...maybe it's just me but after 'The Chairman' link I gets nuttin'
posted by yonderboy at 11:32 AM on November 4, 2001


...until the bottom of the sideblog sidebar
posted by yonderboy at 11:39 AM on November 4, 2001


Will it catch a broken link?
posted by alana at 12:12 PM on November 4, 2001


Looks like the Tidy functionality, or the core of it anyway. Neat.
posted by rodii at 12:19 PM on November 4, 2001


How about embedded tags
posted by alana at 1:24 PM on November 4, 2001


Er, i mean nested (-:
posted by alana at 1:24 PM on November 4, 2001


Oooooooh, a chance to break things! Cool.

So, can it catch lots of things that are weird oh yes oooooh la laaaaaaaa hummmm lets see just ignore me traaaa lee laaaa laa de daa.

Can't think of any other way to break it :-\

posted by smaugy at 1:32 PM on November 4, 2001


This breaks it:

So, can it catch <a href=> <a href="> lots of <i><i><> things that are <b><b><b>weird oh yes</> <b> <i>< i > oooooh la laaaaaaaa </i> <i><i > hummmm lets see just ignore me traaaa lee laaaa <i<i<i> laa de daa.
posted by smaugy at 1:55 PM on November 4, 2001


fun and healthy
posted by fuq at 4:18 PM on November 4, 2001


Smaugy, it seems to have worked correctly. I still wonder whether it will handle smaller and smaller text without enough closes.

And I want them on multiple
lines.


Another interesting thought is overlapping formatting.

Of course, it has to be kept in mind: "Whenever you make something foolproof they invent a better fool" and "You can't make anything foolproof because fools are so ingenious". Even if this fools our white-box testers in this thread, someone out there will come up with a way of blowing it up on the main board.

posted by Steven Den Beste at 6:32 PM on November 4, 2001


Well, a partial success. It caught one unclosed format but not the other until the very end of the string.
posted by Steven Den Beste at 6:35 PM on November 4, 2001


wee, so coding for mefi has finally persuaded me to get an account so that i can post.

thanks for the user testing. there have been two bugs caught. one with the original code (i was popping the tags off the stack and closing tags before appending the last token of text).

the old code was just a tag balancer, but i got off my butt today and forced myself to convert the tag filtering code to cold fusion. so, the new code does malicious tag and attribute filtering, as well as html entity conversions. should handle just about anything now (a few of the above posts weren't really testing the balancer, but rather sneaking in html entities and malformed tags. now that the new version is installed, that should be valid testing).

steven - due to the way the algorithm works, it can't know your intentions about when the tag should be closed. :P in all seriousness, although the function does a stack search to try to ensure proper nesting, the tags aren't necessarily guaranteed to be closed until the very end, where it pops off any unclosed tags it sees.

also, there was a misplaced lcase in the attribute cleaner that was causing this problem.
posted by lhl at 8:15 PM on November 4, 2001
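
A stack-based balancer of the kind lhl describes might look something like the sketch below. It is written in Python for illustration and is not the PHP or ColdFusion code from the thread; `balance_tags`, the `keep_stray_closes` switch, and the tag pattern are all assumptions made here. Note that the trailing text is appended before the leftover tags are popped, which is the ordering the first bug report was about.

```python
import re

# Hypothetical pattern for simple tags; real-world input is messier.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>")

# Tags assumed never to go on the stack.
VOID_TAGS = {"br", "img", "hr"}


def balance_tags(html, keep_stray_closes=False):
    """Return html with nesting repaired and leftover tags closed at the end."""
    out = []    # output fragments, joined at the end
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for match in TAG_RE.finditer(html):
        out.append(html[pos:match.start()])  # text before this tag
        pos = match.end()
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        if name in VOID_TAGS:
            if not is_close:
                out.append(match.group(0))
            continue
        if not is_close:
            stack.append(name)
            out.append(match.group(0))
        elif name in stack:
            # Stack search: close everything opened after the matching tag,
            # then the matching tag itself, so nesting stays proper.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        elif keep_stray_closes:
            out.append(match.group(0))  # leave unbalanced closes in place
    out.append(html[pos:])  # last token of text goes in BEFORE the closers
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

With this sketch, `balance_tags("<b><i>x</b>y")` gives `<b><i>x</i></b>y`, and a tag that is never closed, as in `balance_tags("<i>open to the end")`, is only closed at the very end of the string.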


hmm, that link should be to 1300.

oh, in case anyone's interested, the php version, which i coded first, is 6-8x faster and was a heckuvalot easier to code.

of course, i wanted to do a cf version to learn the language better, which certainly worked. for example, i learned that mixing js syntax and cfml (used for cfscript udfs) conventions is moronic (wow, macromedia outdoes itself [think actionscript]), the cfscript parser is braindead, cf regex's are retarded (and the docs are wrong, the escape codes don't work), and the whole counting from 1 isn't just unconventional, but really does make a bunch of things much more difficult.
posted by lhl at 8:27 PM on November 4, 2001


Testing...">
posted by nicwolff at 8:49 PM on November 4, 2001


More testing">
posted by nicwolff at 8:54 PM on November 4, 2001


Well, if it caught those then it's probably better than the Hotmail HTML filter!
posted by nicwolff at 8:59 PM on November 4, 2001


Testing.

posted by Neale at 9:02 PM on November 4, 2001


Testing.
posted by Neale at 9:03 PM on November 4, 2001


Testing.

posted by Neale at 9:05 PM on November 4, 2001




Very interesting.
posted by Neale at 9:08 PM on November 4, 2001


Testing

posted by Neale at 9:12 PM on November 4, 2001




Hello

posted by Neale at 9:16 PM on November 4, 2001


But that last post looked wild in preview.
posted by Neale at 9:18 PM on November 4, 2001


LHL, I think you've done a fine job. But I find it confusing that it closed my dangling underscores and bolds in one place but closed the italics at the very end. Why was the italic different? (Maybe because it was the first? I'll try another experiment here.)

Normal fancy and the theory is that this time it will be the bold tag which auto-close is deferred on.

posted by Steven Den Beste at 9:44 PM on November 4, 2001


I'm also going to try fooling it with some nbsp's:

Hello there and this is outside my crime.
posted by Steven Den Beste at 9:46 PM on November 4, 2001


>> Inside and outside
posted by Steven Den Beste at 9:48 PM on November 4, 2001


Ha! Success! (or failure, depending on your point of view
posted by Steven Den Beste at 9:49 PM on November 4, 2001


>> And let's see if we can turn bold on>
posted by Steven Den Beste at 9:52 PM on November 4, 2001


>> And try again>
posted by Steven Den Beste at 9:53 PM on November 4, 2001


I think maybe the reason that bold isn't sticking is that you're explicitly setting it in the style sheet for the "posted by" section so that its stuck setting no longer matters.

Anyway, this is what I added for the italics to make it stick:

&lt;i &lt;i &nbsp>>&lt;i &lt;i &nbsp>> inside&lt;/i &lt;i &nbsp>> and outside

After hitting "preview", the nbsps had been replaced by actual spaces, and I put the nbsps back in again before posting.

(And now it looks as if I did manage to make "bold" stick, too, by doing the same thing.)
posted by Steven Den Beste at 9:56 PM on November 4, 2001


Hmmm... well, what I tried to provide as my example of what I entered looked right in the preview but got munged in the post. Anyway, you can see it honestly here.
posted by Steven Den Beste at 10:00 PM on November 4, 2001


By the way, remember the motto of SIGHACK:

"Nobody would ever actually do that!"

(A very old joke... I saw it when I was a student intern in college, about 1974.)
posted by Steven Den Beste at 10:02 PM on November 4, 2001


Now for another experiment: can I turn all that crap off again?

>>Inside>>>> and outside
posted by Steven Den Beste at 10:06 PM on November 4, 2001


testing
posted by riffola at 10:21 PM on November 4, 2001


hmm, steven, i'm looking at the code on your page, and it processes to:

&lt;i &lt;i >>&lt;i &lt;i >>Inside&lt;/i>> &lt;/i>

on the local version of my parser. of course, it'll die on validation, but when i placed text after that string, i didn't get bad affected text after it. let me try it out...


>>Inside> will this text be affected?
posted by lhl at 10:32 PM on November 4, 2001


ahh, well, it affects the text after it, but again, catches it at the very end. all is well with the world. ;)

i'll be posting the code up (cf and php versions) once i get a server back up (new dedicated server should be running by monday or tuesday), so if you're interested in all the particulars, you can take a look then.

this is by no means a black box operation. the tag balancing isn't really a security risk (although it's the more fun part), but the tag/attribute cleaner will be good to have reviewed.
posted by lhl at 10:37 PM on November 4, 2001


I want to do a wholesale brute-force closure of tags just to see if that will clean things up. So that was ten of each.
posted by Steven Den Beste at 10:40 PM on November 4, 2001


&lt;
posted by lhl at 10:40 PM on November 4, 2001


And it did close everything in the preview but you seem to have taken them all out again in the post and we're still fratzed. Sigh.
posted by Steven Den Beste at 10:40 PM on November 4, 2001


One thing I think you should do is to not get rid of unbalanced closes. That way good-hearted citizens can try to undo the damage which sneaks through your filter.
posted by Steven Den Beste at 10:41 PM on November 4, 2001


ahh, my filterText function is being a bit overzealous, heheh. will fix that up.
posted by lhl at 10:44 PM on November 4, 2001


hmm, steven, you're right about the unbalanced closes. i actually have a switch for that. it was really a decision of aesthetics vs paranoia.

for the life of me though, i can't see what's fratzed?

this must be something that is based on your browser's interpretation of broken tags. ie5.5 and moz.95 are both fine. what are you viewing with?
posted by lhl at 10:47 PM on November 4, 2001


btw, my icq is 5280167, aimname is randomfoo if you want to talk about this w/ less lag
posted by lhl at 10:49 PM on November 4, 2001


Now what would also be useful is a parser that checks for invalid (x)HTML and fixes it.
Test.

">Hmm
posted by mkn at 11:36 PM on November 4, 2001


Hmmm :\


It seems to treat [br]s a bit strangely. Can't you get it to convert unclosed [br]s and [img]s into [br /]s and [img /]s?
posted by mkn at 11:40 PM on November 4, 2001


">Testing <> inside title

<span style="font-weight: bold;" Testing: if you forget to add a closing bracket it won't fix it.
posted by riffola at 11:52 PM on November 4, 2001


Oh, that broke interestingly: the first line rendered correctly in the preview and the second line didn't, but here the second line is written out while the first has errors.
posted by riffola at 11:53 PM on November 4, 2001


ok, i sent matt a new version that should allow extra closing tags...

and also deals with html entities better. um.. as in allowing them to be used, heheh. <tesing>

i'm certain, btw, that cold fusion's regex's are completely fubared.
posted by lhl at 11:57 PM on November 4, 2001


ok, my spelling could use some work, but it looks like the new changes are working.
posted by lhl at 11:58 PM on November 4, 2001


mike, since mefi isn't xhtml, i'm not going to bother with the / handling. (although it is pretty trivial - i'll leave that up to matt). right now it doesn't add br's, img's, or hr's to the tagstack at all (it does still clean the attributes).

riffola: don't be silly, of course, it's not going to close unclosed brackets. :P also, the regex i'm using is pretty strict to minimize accidental tag interpretations.

if it sees a tag like "
also, since a lot of people have been playing around w/ the attribute cleaning, here's what it's supposed to do: the attributes are parsed and tokenized, and two pieces of state are tracked: whether the current token is a name or a value, and whether there's an open quote or not. it starts out expecting a name, and filters out "style", "type", or "on*"; if there's an equals sign it knows a value comes next. values are filtered for "javascript:". the attribute cleaning is probably the most critical part that needs to be working from a security standpoint, so feel free to bang on it / make suggestions. however, mefi's survived for 2 years w/o one so it's not a huge deal (not like one can really secure a site running on iis anyway, heheh).

well, i'm going to sleep now. i'll check on this thread tomorrow morning probably, but i'm sure matt will let me know if there are any problems. as an extra bonus w/ the regex's all figured out, it was a cinch to write a highlighter for the search (highlighting the search term, except for when in a tag, nooch)

posted by lhl at 12:52 AM on November 5, 2001
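
The attribute-cleaning rules lhl lists (drop "style", "type", and "on*" names, and drop values containing "javascript:") can be sketched as below. This is an illustrative Python approximation, not the code running on the site: it tokenizes with a single regular expression rather than the name/value and quote state machine described above, and `ATTR_RE`, `BLOCKED_NAMES`, and `clean_attributes` are hypothetical names. The plain substring check on the value is deliberately simplistic.

```python
import re

# Hypothetical tokenizer: an attribute name, optionally followed by
# = and a double-quoted, single-quoted, or unquoted value.
ATTR_RE = re.compile(
    r"""([a-zA-Z][\w:-]*)                            # attribute name
        (?:\s*=\s*
            (?:"([^"]*)"|'([^']*)'|([^\s"'>]+))      # attribute value
        )?""",
    re.VERBOSE,
)

BLOCKED_NAMES = {"style", "type"}


def clean_attributes(attr_text):
    """Return the attribute string with disallowed names and values removed."""
    kept = []
    for match in ATTR_RE.finditer(attr_text):
        name = match.group(1).lower()
        # Exactly one of the three value groups is set when a value exists.
        value = next((g for g in match.group(2, 3, 4) if g is not None), None)
        if name in BLOCKED_NAMES or name.startswith("on"):
            continue  # style, type, and event handlers such as onclick
        if value is None:
            kept.append(name)  # bare attribute with no value
            continue
        if "javascript:" in value.lower():
            continue  # script URL in the value
        # Re-quote the value consistently with double quotes.
        kept.append('{}="{}"'.format(name, value.replace('"', "&quot;")))
    return " ".join(kept)
```

For instance, `clean_attributes('href="http://a.example/" onclick="evil()" style="color:red"')` keeps only the `href` attribute.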


hmm, maybe not strict enough. nm, talking to myself. i'll stop now.
posted by lhl at 12:55 AM on November 5, 2001


lhl: I know that what I was doing would not be commonly done, I was just trying to see what would happen if I tried that out. I like how it converted the unclosed style tag into regular text. That's great, the other one with the <> in the title of the style tag was purely for experimentation. :)
posted by riffola at 3:51 AM on November 5, 2001


What do you mean by "malicious tag and attribute filtering"? How do you define malicious?
posted by rodii at 6:01 AM on November 5, 2001


LHL, I'm using IE6, and on it the whole section which I referred to as "fratzed" was bold face and italicized. You turned it back off again in your 11:57 PM post and everything after that looks fine. (Should I post a screen capture?)
posted by Steven Den Beste at 6:21 AM on November 5, 2001


OK, This is what it looks like to me.
posted by Steven Den Beste at 6:26 AM on November 5, 2001


hmm, must be a quirk with ie6's html rendering engine. actually, i don't know if there's an official w3c recommendation with handling malformed tags - it must be an implementation choice. i added an extra line in the attribute cleaner so it'll remove extra <s, which seems to be what's causing the problems on ie6. of course, since i'm not going to install ie6, i can't really test this.

rodii, for your edification:
CERT® Advisory CA-2000-02 Malicious HTML Tags Embedded in Client Web Requests
CERT: Understanding Malicious Content Mitigation for Web Developers
WhiteHat Security: Web Application Security: "In theory and practice" Defcon 9 Presentation
(probably has everything you'll ever think of)

posted by lhl at 12:29 PM on November 5, 2001


test for script source:


test
posted by mathowie (staff) at 12:40 PM on November 5, 2001


favorite line from the cert advisory: Web Users Should Not Engage in Promiscuous Browsing

and remember kids, don't drink fake shakes.
posted by lhl at 12:41 PM on November 5, 2001


I’d just like to say cooooooooooooool.

Cooooooooooooool.

Thank you very much.
posted by gleemax at 7:40 PM on November 5, 2001


This sounds like a bunch of amateur radio operators testing out audio on the air.
posted by ParisParamus at 8:18 PM on November 5, 2001


Looks like the HTML tag self-healer tripped on the fake sarcasm tag. It inserted an empty tag immediately after the slash-sarcasm.
posted by hijinx at 9:24 PM on November 5, 2001


<pondering>this should work</pondering>

but this shouldn't

but this should show up < extra space> < /extra>
posted by lhl at 9:53 PM on November 5, 2001


hmm, interesting. looks like the tag filtering regex needs some work. oh how i hate cold fusion's regular expressions... i need to take a break from cold fusion soon before i go postal on macromedia. cipro city!
posted by lhl at 10:00 PM on November 5, 2001


RE: mefi/xhtml.
Yeah, I understand that MeFi is not xhtml... but I just thought it would be useful in general.

I'm wondering if you will make the code available? Especially the PHP code? I'd like to look and learn from it, if possible.
posted by mkn at 8:59 AM on November 6, 2001


mike, i'll post a link to the code soon when my new dedicated server gets set up (any day now barring any problems)

heh, whether you'll be able to learn anything from the php code, well, can't promise that, it's sorta icky. [glotis]but faaaaast manny, faaaassst. sweet decal work...[/glotis]
posted by lhl at 12:56 PM on November 6, 2001


test


posted by Nothing at 10:36 PM on November 6, 2001


test
posted by Nothing at 10:37 PM on November 6, 2001


test |
fix it

Using character references in the word "javascript" seems to get around the javascript filter. The code above replaced the "j" in javascript with &#0106;


posted by Nothing at 10:55 PM on November 6, 2001


Sorry about that. A stupid mistake and my "fix it" just makes it worse. Refresh to fix everything.
posted by Nothing at 10:56 PM on November 6, 2001


If this is fuchsia, the same goes for the "style" filter


posted by Nothing at 11:00 PM on November 6, 2001


So it looks like adding character reference filtering to the values filter would fix it. Attributes seem okay. Unless THIS is fuchsia, (or this)
posted by Nothing at 11:08 PM on November 6, 2001
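
The bypass reported above works because a filter that looks for the literal text "javascript:" never sees it when a letter is written as a character reference. One way to close that gap, sketched here in Python as an assumption rather than as the fix that was actually deployed, is to decode character references before checking the value. `html.unescape` is the standard-library decoder; `is_script_url` is a name invented for this example.

```python
import html
import re


def is_script_url(value):
    """Return True if an attribute value decodes to a javascript: URL."""
    # Decode references such as &#0106; or &#x6A; back into plain characters.
    decoded = html.unescape(value)
    # Strip whitespace and control characters, which can be used to split
    # the scheme name, then compare case-insensitively.
    compact = re.sub(r"[\s\x00-\x1f]+", "", decoded).lower()
    return compact.startswith("javascript:")
```

With this check, `is_script_url("&#0106;avascript:alert(1)")` is `True`, while an ordinary link such as `http://example.com/` is not flagged.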


btw, nothing, i noticed your first test, and sent matt an updated version. entities within tags definitely should've been filtered. lets see how this works now... foo

posted by lhl at 2:21 AM on November 7, 2001


I was being stupid and assuming that perhaps there was some secret trick that was filtering my javascript anyway, unobviously. Thus the version that actually did something. Sorry about all that.

Looks good now. Excellent work!
posted by Nothing at 3:53 AM on November 7, 2001


one more, for good measure... (probably won't work, but it's worth the test. )
posted by Nothing at 4:13 AM on November 7, 2001


Can I try?

posted by tranquileye at 4:49 AM on November 7, 2001


What about hanging hidden comments <!-- like this
posted by Monk at 10:05 PM on November 7, 2001


What about... <!--
posted by Monk at 10:06 PM on November 7, 2001


One last try < !-- boo -->
posted by Monk at 10:07 PM on November 7, 2001


chaost strikes again
posted by geoff. at 4:53 PM on November 19, 2001


>
posted by geoff. at 5:01 PM on November 19, 2001


n

one last time


posted by geoff. at 5:03 PM on November 19, 2001






Hooray for once again making Australia look bad!



Hooray for being a Pacific Island!

Hooray for drinking in the morning!

posted by geoff. at 2:24 PM on November 25, 2001

