MeFi RSS feed is invalid September 8, 2011 11:41 PM

My RSS reader says the MetaFilter RSS feed is invalid. I'd really like to use it because I've become RSS-oriented and I'm not checking MeFi. It could be a problem with the RSS reader, Thunderbird 2.0.24 (I know there are other readers, but TB uniquely meets my needs), but the MeFi feed has worked previously and the AskMefi RSS feed works currently. I've tried the feeds coming from the feed.feedburner.com host and the one from feed2.. Thanks in advance.
posted by guanxi to Bugs at 11:41 PM (21 comments total)

It looks like there's a weird character in the Canada telecom post that's not encoded correctly. Specifically, the apostrophe (or right before the apostrophe) in "CBC's Spark podcast..."
posted by kmz at 11:56 PM on September 8, 2011

Yeah, I've fixed that, but we may need to wait for the feed to update.
posted by jessamyn (staff) at 12:01 AM on September 9, 2011 [1 favorite]

Still two weird characters in there, at least one of which is breaking it.

That's one reason CBC^]'s <cite>Spark</cite> podcast <a href="http://www.cbc.ca/spark/podcasts/" title="'Spark'#0160;podcasts
posted by sfenders at 5:14 AM on September 9, 2011

ok, all fixed up. There was an invisible control character (ASCII 29, Group Separator) in the text that RSS feeds hate and that we weren't yet filtering out. Should be all set now going forward.
posted by pb (staff) at 8:27 AM on September 9, 2011
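pb doesn't say how MetaFilter's filtering works, so this is only a minimal Python sketch of the idea, assuming the feed is built with the standard library's xml.etree.ElementTree; strip_xml_illegal and make_item are hypothetical names. It removes anything outside the XML 1.0 Char production before the text goes into the feed. The filter matters because ElementTree escapes &, < and > but writes control characters such as ASCII 29 straight through, producing output no strict parser will accept.

```python
import re
import xml.etree.ElementTree as ET

# Everything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This includes C0 controls such as U+001D (ASCII 29, Group Separator).
_XML_ILLEGAL = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def strip_xml_illegal(text: str) -> str:
    """Remove characters that may not appear in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)


def make_item(title: str) -> str:
    """Build a minimal RSS <item>, filtering the title first.

    ElementTree escapes &, < and > but passes control characters through
    unchanged, so without the filter the output would not be well-formed.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = strip_xml_illegal(title)
    return ET.tostring(item, encoding="unicode")


print(make_item("CBC\x1d's Spark"))  # <item><title>CBC's Spark</title></item>
```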

"There was an invisible control character in the text."

Sounds like a phrase from a Borges story.
posted by Kattullus at 8:46 AM on September 9, 2011 [3 favorites]

It works! If only everyone I deal with today would be as responsive and capable ... Many thanks.
posted by guanxi at 9:29 AM on September 9, 2011

I now know how to ping the RSS feed to get it to reload on feedburner, so we've even fixed the metaproblem going forward.
posted by jessamyn (staff) at 9:35 AM on September 9, 2011

YAY PROBLEMS
posted by The Devil Tesla at 10:35 AM on September 9, 2011

The real problem is writers of XML parsers deciding that the right thing to do is just give up and return an error when they encounter an illegal character. Maybe that is what the standard demands, but it's wrong. Just replace it with a '.' or something and move on, you stupid software; or at least provide the option, which 9 out of 10 applications would probably prefer.

That's what led to the tag soup and other bullshit the web has always been full of.

(Opera does actually have an option to "reparse as HTML" when it encounters broken XML, but I can totally understand why most feed readers don't.)
posted by kmz at 10:59 AM on September 9, 2011 [1 favorite]
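The "replace it with a '.' or at least provide the option" idea from the quoted comment can be sketched in Python with the standard library's ElementTree, which uses expat and in practice rejects a document containing ASCII 29 with a ParseError. parse_feed and its lenient flag are hypothetical; this is not how Opera or any particular feed reader behaves. Only characters outside XML 1.0's Char production are patched, so other well-formedness errors (unclosed tags and so on) still raise.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 Char production.
_XML_ILLEGAL = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def parse_feed(xml_text: str, lenient: bool = False) -> ET.Element:
    """Parse strictly; if lenient, retry once with illegal characters replaced by '.'."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        if not lenient:
            raise  # strict behaviour: the whole document is rejected
        # Lenient fallback: swap each illegal character for '.' and try again.
        # Any remaining well-formedness error still raises ParseError.
        return ET.fromstring(_XML_ILLEGAL.sub(".", xml_text))


feed = "<rss><channel><title>CBC\x1d's Spark</title></channel></rss>"
root = parse_feed(feed, lenient=True)
print(root.find("channel/title").text)  # CBC.'s Spark
```

Defaulting to strict and making leniency opt-in keeps the spec's behaviour for applications that need it while giving the other nine out of ten a way to keep reading.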

Hardly tag soup. I'm all in favour of being strict with tags, and would be happy with a big loud error message inserted in the output instead of one of my rss feeds totally failing every other week.
posted by sfenders at 12:04 PM on September 9, 2011

guanxi: "Thunderbird 2.0.24"

Thunderbird 5 or maybe even 6 is out now. Have you tried upgrading?
posted by IndigoRain at 8:21 PM on September 9, 2011

Yikes, I should have reloaded the thread. Carry on.
posted by IndigoRain at 8:21 PM on September 9, 2011

Yeah, I've tried the "it's the RSS feed!" argument on a mailing list discussing my software:

Him: "Your software sucks! The Blah feed won't work!"

Me: "That's because the feed is broken."

Him: "But it works in Another RSS Reader! And Yet Another RSS Reader! And..."

Me: Oh good, now I get to write my own XML parser. Which won't give up when it reaches a character that offends its delicate sensibilities. More work for me.

Grumble, grumble...
posted by alasdair at 5:25 AM on September 10, 2011

The real problem is writers of XML parsers deciding that the right thing to do is just give up and return an error when they encounter an illegal character.

This attitude is a big part of the reason that so much software sucks.

I don't know a damned thing about RSS, so someone please correct me if I'm wrong here. I assume that there's a specification for the format of the stream. Sounds like it has to be valid XML. Well, then, guess what? It should damn well be valid. fucking. XML.

Adhering to precise specifications is one way to control software complexity. And controlling and limiting complexity is the main issue in creating correct, reliable, and maintainable software.
posted by Crabby Appleton at 6:39 PM on September 10, 2011 [1 favorite]

Sure, it has some resemblance to attitudes that have made some software go wrong. It is also in line with one of the basic principles that has made the Internet great: "Be lenient in what you accept, strict in what you send." See for example the popularity of HTTP, where this rule of leniency is relied upon often, with very little ill effect. A web search turns up the W3C explicitly recommending it, for instance here. Has MIME become obsolete and unreliable as a result?

I think my comment is on the right side of the fine line between the two ideas, and if the XML spec really does mandate that when you encounter invalid characters your parser has to throw up its hands in defeat and discard the entire document, then XML isn't.

Besides which, "It should damn well be valid. fucking. XML." is not valid English syntax, and therefore I could not read any of your comment.
posted by sfenders at 4:18 AM on September 11, 2011

Ah, here it is, a bit of Internet history...

What Nick's referring to here is half of the "Robustness Principle", presented by Jon Postel in RFC 793 (STD 7, Transmission Control Protocol):

TCP implementations will follow a general principle of robustness:
be conservative in what you do, be liberal in what you accept from
others. (2.10)

That comment clarifies the point that the right thing to do is NOT to simply accept invalid characters and treat them as if they were valid. That would indeed lead to security problems and other bad things. I maintain that they should be discarded and some kind of warning generated for those applications which, unlike RSS readers, have reason to care. In the case of that MIME parser interface, it already has a handy mechanism in the form of MimeHeaderHolder notify functions by which it could and should provide notifications for the caller of invalid headers that were dropped.
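One way to read "discarded and some kind of warning generated" is a sanitizer that always drops characters outside XML 1.0's Char production but can optionally report each one to the caller. The Python sketch below assumes a hypothetical on_warning hook that receives (offset, code point); it is not the MimeHeaderHolder interface mentioned above, whose actual API isn't shown here.

```python
import re
from typing import Callable, List, Optional, Tuple

# Characters outside the XML 1.0 Char production.
_XML_ILLEGAL = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")

# Hypothetical callback type: called with (offset, code point) of each dropped char.
WarningHook = Callable[[int, int], None]


def drop_invalid(text: str, on_warning: Optional[WarningHook] = None) -> str:
    """Discard characters outside XML 1.0's Char production.

    Callers that don't care (e.g. a feed reader) pass no hook and simply get
    the cleaned text; callers that do care are told what was dropped.
    """
    if on_warning is not None:
        for match in _XML_ILLEGAL.finditer(text):
            on_warning(match.start(), ord(match.group()))
    return _XML_ILLEGAL.sub("", text)


dropped: List[Tuple[int, int]] = []
clean = drop_invalid("CBC\x1d's Spark", lambda off, cp: dropped.append((off, cp)))
print(clean)    # CBC's Spark
print(dropped)  # [(3, 29)]
```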

XML 1.0 (Fifth Edition), W3C:

Conforming XML processors fall into two classes: validating and non-validating.

Validating and non-validating processors alike MUST report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.
...
Note that when processing invalid documents with a non-validating processor the application may not be presented with consistent information. For example, several requirements for uniqueness within the document may not be met, including more than one element with the same id, duplicate declarations of elements or notations with the same name, etc. In these cases the behavior of the parser with respect to reporting such information to the application is undefined.

posted by sfenders at 5:11 AM on September 11, 2011

Right, well... regarding the XML rules I linked to, it comes to my attention that although the section titled "Conformance" doesn't mention it (despite that being exactly what one would expect it to cover), section 1.2 "Terminology" says that violations of "well-formedness constraints" are "fatal errors", after which normal processing MUST NOT continue. Things like this are one reason why so many people hate XML.
posted by sfenders at 8:07 AM on September 11, 2011

But wait! The excitement continues! I thought I'd seen something saying that illegal characters made a document not well-formed, but that was just for "character references". So, as far as I can tell, it's not a fatal error to have invalid literal characters.
posted by sfenders at 9:51 AM on September 11, 2011

"XML processors MUST accept any character in the range specified for Char. ... any Unicode character, excluding the surrogate blocks, FFFE, and FFFF."
...
"It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding."

Control characters are perfectly legal UTF-8, which is the character encoding specified, so parsers that broke on that one are in fact in violation of the rules and should be punished.

posted by sfenders at 10:15 AM on September 11, 2011

Of course pb meant 29 decimal, not hex, so although it's valid UTF-8 I guess it's in the range that the parser isn't actually required to accept. But it's also not required to fall over and die.
posted by sfenders at 10:34 AM on September 11, 2011
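The decimal/hex distinction is easy to check. This Python sketch transcribes XML 1.0's Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) into a hypothetical is_xml_char helper, then decodes byte 29 decimal (U+001D, Group Separator) and byte 0x29 (U+0029, ')') as UTF-8. Both decode without error, but only the second falls inside Char.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Byte 29 decimal (0x1D) is a one-byte UTF-8 sequence: U+001D, Group Separator.
gs = bytes([29]).decode("utf-8")
# Byte 0x29 (41 decimal) is also valid UTF-8: U+0029, ')'.
paren = bytes([0x29]).decode("utf-8")

print(repr(gs), is_xml_char(ord(gs)))        # '\x1d' False
print(repr(paren), is_xml_char(ord(paren)))  # ')' True
```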


MetaTalk