Don't Fix Me, Bro May 8, 2008 4:58 PM

Has code been added to "fix" "invalid" HTML?

I just previewed a post which had a few unclosed <li> tags (perfectly legal under the HTML 4.0 Transitional Doctype) and when I previewed, it "fixed" my HTML by stacking a bunch of closing tags at the end.

That is, it found
<ul>
  <li>foo
  <li>bar
  <li>baz
  </ul>


and "corrected" it to
<ul>
  <li>foo
  <li>bar
  <li>baz
  </li></li></li>
</ul>


Which, I'm going to go with, bad idea.
posted by AmbroseChapel to Bugs at 4:58 PM (49 comments total) 2 users marked this as a favorite

It has done this since, like, forever.
posted by cortex (staff) at 5:00 PM on May 8, 2008


Yeah, we've always had auto-closing tags, and it's a good idea because before we had the feature (going on four years ago), we would routinely have a whole thread bolded or hit with a trailing small tag, etc, so we close open tags at the end of a post.

Yes, I know technically you can leave a list item unclosed in HTML 4, but in order to fix a lot of broken and incomplete HTML, we auto-close everything we see get opened.
posted by mathowie (staff) at 5:01 PM on May 8, 2008
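Taken literally, "we close open tags at the end of a post" could be as simple as counting. Below is a minimal Python sketch of that reading. It assumes a regex tokenizer and a small hand-picked set of void elements. `close_by_counting`, `TAG_RE`, and `VOID` are hypothetical names, and this is not MetaFilter's actual code.

```python
import re
from collections import Counter

# Hypothetical tokenizer: captures an optional "/" and the tag name of each tag.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
# Assumed set of elements that never get a closing tag.
VOID = {"br", "img", "hr"}


def close_by_counting(html: str) -> str:
    """Append one closing tag for every opening tag that was never closed."""
    unclosed = Counter()
    for slash, name in TAG_RE.findall(html):
        name = name.lower()
        if name not in VOID:
            unclosed[name] += -1 if slash else 1
    # All leftover closers go at the very end of the post, in first-seen order.
    return html + "".join(f"</{name}>" * n for name, n in unclosed.items() if n > 0)
```

On the original post's list, this sketch appends `</li></li></li>` *after* `</ul>`, not before it as the preview did. So the site's closer appears to track nesting order rather than just totals. Counting alone can also mis-nest closers: `<b><i>x` comes out as `<b><i>x</b></i>`.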


This fixes more problems than it creates. What problem does it create, besides that you don't like it?
posted by jessamyn (staff) at 5:08 PM on May 8, 2008


Where there’s foo there’s fire!
posted by tellurian at 5:22 PM on May 8, 2008


>It has done this since, like, forever.

I could swear this is the first time I've seen it. Does it happen only on Preview or on Save? It may be that I've relied on Live Preview in the past, and as we all know, it's often completely different to what really happens to your post.

>What problem does it create, besides that you don't like it?

The HTML I put in was valid. The HTML it produces is invalid.

Clearly, the fix should be:

<ul>
  <li>foo</li>
  <li>bar</li>
  <li>baz</li>
</ul>

So I guess my concern is, you're not parsing or validating the HTML properly, you're just counting tags.
posted by AmbroseChapel at 5:24 PM on May 8, 2008


Does it happen only on Preview or on Save?

It might be only on post, not on preview. I almost never preview. Let me check:

Seems to do it on preview as well. I think you just managed to not notice before.

So I guess my concern is, you're not parsing or validating the HTML properly, you're just counting tags.

Well, like Matt said, just counting tags and closing them solved the actual huge pain-in-the-ass problem of frequent page-munging (or even page-breaking) unmatched tags that used to pop up back in the day. If the edge-case result of e.g. a chain of unnecessary closing tags doesn't actually cause any problems for rendering in common browsers, that it's invalid html doesn't seem like a great big deal.
posted by cortex (staff) at 5:56 PM on May 8, 2008


Looks like it's another thread devoted to the unnecessary pursuit of external validation.
posted by MrVisible at 6:06 PM on May 8, 2008 [9 favorites]


>Looks like it's another thread devoted to the unnecessary pursuit of external validation.

I don't know what you mean by "another", "unnecessary" or "external" in that sentence.

But anyway. I see the problem you've solved, and I agree that you've solved it, and I think my misgivings are obvious to anyone who cares about code, so I've had my say.
posted by AmbroseChapel at 6:38 PM on May 8, 2008


The HTML I put in was valid.

But it sucks, so don't do it. Close your tags, move on with life.
posted by blacklite at 7:03 PM on May 8, 2008


He was making a joke, AC. Emotional vs. syntactic validation, metatalk threads recently on subject of former in one sense or another, etc.
posted by cortex (staff) at 7:11 PM on May 8, 2008


I really don't see it as something to get hung up on.

Also, the year 2000 called and said even if it's optional, just close your tags.
posted by Deathalicious at 7:14 PM on May 8, 2008 [3 favorites]


The HTML I put in was valid. The HTML it produces is invalid.

So? It's not a valid XHTML fragment, just close yer tags.
posted by delmoi at 7:21 PM on May 8, 2008


It's not a valid XHTML fragment

...speaking of the year 2000.
posted by timeistight at 8:43 PM on May 8, 2008


I had this really awesome psychedelic tiling on the background of my webpage back in 1995.
posted by cortex (staff) at 8:48 PM on May 8, 2008


I think this thread needs psychedelic tiling.
posted by Pants! at 9:40 PM on May 8, 2008 [1 favorite]


Don't lick the brown tiles.
posted by MrVisible at 10:26 PM on May 8, 2008


So? It's not a valid XHTML fragment, just close yer tags.

In all fairness, the site doesn't use XHTML, so there isn't any reason the OP would expect to need valid XHTML.
posted by !Jim at 2:40 AM on May 9, 2008 [1 favorite]


Knowing nothing about how the validator works, this seems like a legitimate complaint. If the output is
<ul>
<li>foo
<li>bar
</li></li>
</ul>
rather than
<ul>
<li>foo
<li>bar
</ul>
</li></li>
then there is already some logic about what elements may contain which others. <li>s may not contain <li>s.

A request to fix this is not unreasonable, or urgent.
posted by fantabulous timewaster at 7:00 AM on May 9, 2008
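The containment rule fantabulous timewaster mentions (an `<li>` may not contain an `<li>`) is what would turn the original post's list into the tidy version AmbroseChapel wanted. Below is a minimal Python sketch of that idea. It assumes a regex tokenizer, a hand-picked void-element set, and a one-entry rule table. `close_tags_aware`, `TAG_RE`, `VOID`, and `CLOSES_OPEN_SIBLING` are hypothetical names, not MetaFilter's code.

```python
import re

# Hypothetical tokenizer: captures an optional "/" and the tag name of each tag.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
VOID = {"br", "img", "hr"}  # assumed: never pushed on the stack
# Assumed rule table: opening one of these first closes an open sibling of the
# same kind sitting on top of the stack (an <li> can't hold an <li>).
CLOSES_OPEN_SIBLING = {"li"}


def close_tags_aware(html: str) -> str:
    out, stack, pos = [], [], 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])  # text before this tag
        pos = m.end()
        is_close, name = m.group(1) == "/", m.group(2).lower()
        if name in VOID:
            out.append(m.group(0))
        elif not is_close:
            if name in CLOSES_OPEN_SIBLING and stack and stack[-1] == name:
                out.append(f"</{stack.pop()}>")  # close the previous item in place
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close anything opened after the matching tag, then the tag itself.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(m.group(0))
        # A close tag with no matching open tag is dropped (a design assumption).
    out.append(html[pos:])
    out.extend(f"</{n}>" for n in reversed(stack))  # close leftovers at the end
    return "".join(out)
```

The sketch only checks the top of the stack. A nested list's `<li>`, whose parent on the stack is a `<ul>`, is therefore left alone. The parsing rules browsers actually follow cover more cases than this single rule.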


Wow, what an unreasonable request.

Why unreasonable? Because the system, as is, keeps you from breaking Metafilter. That's what it's for. It counts all the open tags and makes sure they're closed. It's fairly simple code. This obviously can make bad HTML, but it's not HTML that's bad enough to, at least theoretically, break any threads.

If you "care about code" as you claim to, you must realize what an enormous difference there is between a tag counter and a previewer that actually understands HTML. The logic involved in correcting submitted HTML would be insane, and very prone to failure. You'd end up with countless people complaining that the 'fix' to their HTML actually broke it.

You are, in other words, asking them to spend a whole BUNCH of time developing a new feature that will make their lives, and ours, much harder than it was before.

As an alternate solution, you can close your own damn tags.

Offhand, I'd vote for the latter.

And, the thought occurs... if you care about code, the first step is caring about your OWN code and doing it right, not expecting Uncle PB to fix your messes.
posted by Malor at 7:25 AM on May 9, 2008


Where is all this anger coming from? I don't see any request at all in either of Ambrose's posts. Seems like he's pointing out a bug. That's not unreasonable at all.
posted by stubby phillips at 7:31 AM on May 9, 2008 [1 favorite]


And, the thought occurs... if you care about code, the first step is caring about your OWN code and doing it right, not expecting Uncle PB to fix your messes.

You seem to be confused. AmbroseChapel's code is right. It's been valid HTML since HTML existed and it will continue to be valid HTML into the future.
posted by timeistight at 9:52 AM on May 9, 2008


stubby phillips: Where is all this anger coming from? I don't see any request at all in either of Ambrose's posts. Seems like he's pointing out a bug. That's not unreasonable at all.

Well, er...

AmbroseChapel: Has code been added to "fix" "invalid" HTML?... I'm going to go with, bad idea... I don't know what you mean by "another", "unnecessary" or "external" in that sentence. But anyway. I see the problem you've solved, and I agree that you've solved it, and I think my misgivings are obvious to anyone who cares about code, so I've had my say.

See, this wasn't a "bug" "report." It was a "snotty" "implication" that someone else's code (which works just fucking fine, thank you very much) wasn't "clean" "enough." There was no bug. Extra list-tag-closers lumped at the end of the code of a comment doesn't fuck anything up. As jessamyn says, it doesn't create any problems besides the fact that Ambrose doesn't like it.

Here's a brief translation of AmbroseChapel's beef:

"Hey! My HTML was unnecessarily corrected- see, look at the transitional doctype!- and I resent the implication that I might happen to write unclean code. So can we please unnecessarily correct the code of the site so that I won't have to suffer this implication again? Metafilter just isn't coded very cleanly in this instance."

Only with more quotation marks.
posted by Viomeda at 9:52 AM on May 9, 2008


Also, to be charitable, there is a good point here. It was just expressed badly. (If you will, it wasn't up to snuff on the Natural Language 2.5 Transitional Doctype.) It should've gone something more like this:

"Hey, I just noticed that the comment form auto-closes any tags. Now, since I just did a search of Meta I've discovered that it's been doing this for a while, and I see why it's doing it. However, it seems like it could be a little more efficient; specifically, it's closing list tags in a lumped-up way at the end of the comment code, whereas it ought to be nesting the close-tags of lists behind the object they refer to. Is there a way to fix this?"

...whereas it came out:

"Did somebody try to 'fix' this? I'm going to go with, bad idea."

There's some snarkiness back and forth because nobody who's worked hard on code before likes being corrected like that any more than they would've liked getting their essays back from professors in high school with comments on them like, "did you think going in this direction would make your essay 'interesting?' I don't really know what you were thinking."
posted by Viomeda at 10:05 AM on May 9, 2008 [1 favorite]


Guys, if you fix how HTML is formed, it can't have babby.
posted by proj08 at 10:17 AM on May 9, 2008 [5 favorites]


It's more like getting your code back from a professor with comments like, "This code doesn't generate properly formatted HTML".
posted by stubby phillips at 10:33 AM on May 9, 2008 [1 favorite]


I've been annoyed by this too, for what it's worth, but it seems like a non-issue. Browsers pretty much ignore the </li> tag.
posted by ikkyu2 at 11:05 AM on May 9, 2008


Sure, I guess. I suppose a widget-maker (or greasemonkey?) who wanted to do something heretofore-unthought-of with the content might be vexed because his or her HTML parser won't handle improperly formatted HTML. That (and the risk of offending a programmer's sensibilities) is probably the only issue. Far-fetched, I'll grant you.

I guess what amazed me was how angry this callout made some people.
posted by stubby phillips at 11:17 AM on May 9, 2008


stubby phillips: I guess what amazed me was how angry this callout made some people.

Auto-closing tags isn't a bad idea. Ambrose here might be great at HTML, but we mere mortals aren't always. If you'd been around in the days when an open tag meant an entire italicized thread, well, you'd know that. So it's sort of obnoxious to say it was a bad idea to start auto-closing tags.

I guess Ambrose might have meant that we should be auto-closing tags more intelligently, but that's not what he said. Reread the post and the comments.

The snooty line about "people who care about code will understand" is probably what made some of us think this was silly. I don't know that anybody's angry besides Ambrose, however.
posted by Viomeda at 11:50 AM on May 9, 2008


Well, Ambrose seemed polite enough to me. Especially compared to you and Malor and a couple others.

So maybe the word "angry" wasn't quite the word I should have used. Maybe I should have used the word "rude".
posted by stubby phillips at 11:57 AM on May 9, 2008


Let's just let everyone use the <> tag and be done with it!

[posts recursive link to this page]

Ha!
posted by cowbellemoo at 12:20 PM on May 9, 2008


aw lame. was supposed to be IFRAME. /pout
posted by cowbellemoo at 12:21 PM on May 9, 2008


then there is already some logic about what elements may contain which others.

I'm pretty sure there's not. The only logic is to determine that tags are closed in the proper order.
1 <ul>
2   <li>foo
3   <li>bar
4   <li>baz
5 </ul>
Parser sez:
1: push ul on the tag stack
2-4: push li on the tag stack
5: pop tags off the stack and close them until one of them is a ul

Which doesn't require knowing anything about HTML apart from which elements are typically closed internally, like <br /> and <img />. Therefore fixing this really would involve adding a lot more specific knowledge about HTML, and isn't really reasonable.
posted by moift at 3:39 PM on May 9, 2008
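moift's walkthrough amounts to a tag stack. Below is a minimal Python sketch of that idea. It assumes a regex tokenizer and a hand-picked list of void elements. `close_with_stack`, `TAG_RE`, and `VOID` are hypothetical names, not MetaFilter's code. On the list from the original post, written without whitespace, it reproduces the `</li></li></li></ul>` pile-up.

```python
import re

# Hypothetical tokenizer: captures an optional "/" and the tag name of each tag.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
VOID = {"br", "img", "hr"}  # assumed: elements never pushed on the stack


def close_with_stack(html: str) -> str:
    out, stack, pos = [], [], 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])  # text before this tag
        pos = m.end()
        is_close, name = m.group(1) == "/", m.group(2).lower()
        if name in VOID:
            out.append(m.group(0))
        elif not is_close:
            stack.append(name)  # steps 1-4: push ul, then each li
            out.append(m.group(0))
        elif name in stack:
            # Step 5: pop and close tags until we reach the matching one.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(m.group(0))
        # A close tag with no matching open tag is dropped (a design assumption).
    out.append(html[pos:])
    out.extend(f"</{n}>" for n in reversed(stack))  # close leftovers at the end
    return "".join(out)
```

Nothing here knows which elements may contain which. That is why every open `<li>` is still on the stack when `</ul>` arrives.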


In general, either code conforms to the standard or it doesn't. (If this is not the case, then the standard is ambiguous, which is a Bad Thing™.) AmbroseChapel's code conforms to the applicable standard (or, at least, I haven't seen anyone dispute that). AmbroseChapel asserted that the code generated by the automatic tag-closer is invalid HTML. I don't know whether or not that's true, and I'm too lazy to look it up in the standard. (And I wasn't going to comment, for that reason, but apparently the quality bar for comments in this thread isn't that high, so what the hell.)

If Metafilter is generating invalid HTML, then it's not something to be proud of. On the other hand, it's not exactly rocket science (thankfully) that we're up to here. And, apparently, most browsers handle it OK. So maybe it's not as high a priority as some other things (e.g., making <blockquote> render without excessive vertical space, and things of this nature).
posted by Crabby Appleton at 3:50 PM on May 9, 2008


moift is right.

parser: push every tag i see. pop every /tag i see. if anything isn't closed, close it before writing the /tag.

1. get a ul, push it
2-4 get an li, push it
5 get a /ul, start popping

when the parser gets to the /ul, it realizes that everything the ul contains is not closed, so it pops all the contents of the ul (the li tags) and closes them before writing the /ul token.

this gives the result ambrose noticed.

this much is totally HTML agnostic. any tag can contain any tag and all of a tag's contents are flushed when it closes.
posted by stubby phillips at 4:03 PM on May 9, 2008


AND it's stateless...
posted by stubby phillips at 4:11 PM on May 9, 2008


There's only one reasonable solution: MeFiML
posted by stubby phillips at 4:16 PM on May 9, 2008


making blockquote render without excessive vertical space,

Why do people keep bringing this up? Just don't put a break before and after it.
blockquote
Is that so hard?
posted by languagehat at 4:42 PM on May 9, 2008


I actually write HTML for other purposes besides Metafilter, on occasion. When I write this:
<p> Here is a blockquote:
<blockquote>
Stuff in blockquote.
</blockquote>
That was a blockquote.
in an HTML file, it renders as follows:
    Here is a blockquote:
    Stuff in blockquote.
    That was a blockquote.
But if I write the exact same code in the Metafilter comment box, it renders as shown in my previous comment. If I want it to render as above, I have to write this:
<p> Here is a blockquote.<blockquote>Stuff in blockquote.</blockquote>That was a blockquote.
Is that hard? Not terribly. Is it annoying? Yes. Is it ugly? I think so, and it's uglier when there's more text. Are most people aware of this subtlety? Apparently not, because I see the extra vertical space all the time; also, I suspect people avoid using <blockquote> when they should use it, because it's ugly (unless they know the "easy" trick). But to me it's even uglier to italicize quoted text. So that's why people (or at least I) keep bringing it up.
posted by Crabby Appleton at 5:42 PM on May 9, 2008


Here is a blockquote:
Stuff in blockquote.
That was a blockquote.
posted by stubby phillips at 5:46 PM on May 9, 2008


Here is a blockquote.
Stuff in blockquote.
That was a blockquote.
posted by stubby phillips at 5:50 PM on May 9, 2008


The blockquote rendering problem results from the conversion of every line break into a <br> tag.

One simple fix might be to skip the line-break conversion after the opening and closing blockquote tags, or to write a special rule to strip off <br> tags when they appear just after the opening and closing tags for any block element.
posted by macrone at 7:28 PM on May 9, 2008
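macrone's first suggestion was to skip the line-break conversion right after block-level tags. Below is a minimal Python sketch of that suggestion. It assumes the site turns each newline into `<br>` with a plain substitution. `newlines_to_br`, `BLOCK_TAGS`, and `AFTER_BLOCK_TAG` are hypothetical names, and the block-tag list is a guess.

```python
import re

# Assumed set of block-level tags; extend as needed.
BLOCK_TAGS = ("blockquote", "ul", "ol", "li", "p")
# Matches an opening or closing block tag immediately followed by a newline.
AFTER_BLOCK_TAG = re.compile(
    r"(</?(?:" + "|".join(BLOCK_TAGS) + r")\b[^>]*>)\n", re.IGNORECASE
)


def newlines_to_br(text: str) -> str:
    """Convert newlines to <br>, skipping a newline that directly follows a block tag."""
    text = AFTER_BLOCK_TAG.sub(r"\1", text)  # swallow the newline after the tag
    return text.replace("\n", "<br>\n")
```

Feed it Crabby Appleton's example and the `<br>` that would otherwise follow `<blockquote>` and `</blockquote>` is never emitted. Per macrone's diagnosis, those are the breaks that add the extra space. The `\b` keeps tags like `<pre>` from matching the `p` entry.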


moift and stubby phillips, if the tag closer does enough pattern matching to not emit </img> or </br>, it does enough pattern matching to not emit </li>. I guess that's also stateless.

Thank you for your attention to this devastatingly important issue.
posted by fantabulous timewaster at 8:25 PM on May 9, 2008 [1 favorite]


You're new here, aren't you?
posted by Crabby Appleton at 8:43 PM on May 9, 2008


If you took all of the characters that were typed into this thread, including my own, and channeled that same effort into adding closing </li> tags where needed, this thread would not be necessary.

note: it isn't necessary even without all that.
posted by davejay at 9:13 PM on May 9, 2008


fantabulous: it does emit /br (try it)

it doesn't emit /img probably because it doesn't emit img.
posted by stubby phillips at 4:23 PM on May 10, 2008


what happens if i make a bunch of unclosed br's inside a ul?
    foo bar baz
it doesn't close the br's! that's what.

the plot thickens.................
posted by stubby phillips at 4:26 PM on May 10, 2008


i take it back. it does not emit /br.

the thought plickens.
posted by stubby phillips at 8:20 PM on May 10, 2008



one more test


heh. i think fantab figured it out.
posted by stubby phillips at 8:39 PM on May 10, 2008


AND it's stateless!
posted by stubby phillips at 8:56 PM on May 10, 2008

