There seems to be a problem with the RSS Feed November 16, 2002 10:08 AM

There seems to be a problem with the RSS Feed: XML Parsing Error: not well-formed
posted by Steve_at_Linnwood to Bugs at 10:08 AM (11 comments total)

It says this when you attempt to read the RSS:


XML Parsing Error: not well-formed
Location: http://xml.metafilter.com/rss.xml
Line Number 117, Column 15:

<title>Art & Physics</title>
--------------^
posted by Steve_at_Linnwood at 10:10 AM on November 16, 2002


According to Mac IE, the problem is at line 107, char 371. It looks like a non-ASCII character (an accented letter or some such) lurking in the title of the "Maroons" link. Why I bothered to find that out I don't know.
posted by richardm at 11:07 AM on November 16, 2002
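richardm tracked the bad character down by its line and character position. Here is a minimal sketch of automating that hunt in Python, assuming the feed has already been decoded into a string. The hypothetical helper `find_non_ascii` lists every non-ASCII character with a 1-based line and column. That gives a reasonable list of suspects, although not every non-ASCII character is actually illegal. The sample feed is made up.

```python
def find_non_ascii(text):
    """Return (line, column, char) for every non-ASCII character in text.

    Line and column numbers are 1-based, like the positions browsers
    print in their XML parsing error messages.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch))
    return hits


# Made-up sample feed with one accented character on line 2.
feed = "<item>\n<title>Caf\u00e9 news</title>\n</item>"
for line_no, col_no, ch in find_non_ascii(feed):
    print(f"line {line_no}, column {col_no}: {ch!r} (U+{ord(ch):04X})")
# line 2, column 11: 'é' (U+00E9)
```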


Happens all the time with human-edited content. Half my AmphetaDesk channels error out now and then (the Wired feeds seem to be the worst; almost always one of them fails). Like richardm said, it's just an illegal-character issue: once the post with the funky (to XML) character scrolls off the feed, all is well in rssLand.
posted by Doktor at 2:10 PM on November 16, 2002
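A single bad character makes the whole document unparseable, so the entire channel fails until the offending item scrolls off. Here is a minimal sketch of how an aggregator might contain that, assuming feeds arrive as already-downloaded bytes. The `refresh_channels` function and its in-memory cache are hypothetical and are not how AmphetaDesk itself works. Each channel is parsed on its own, and a failure keeps the last good copy on screen.

```python
import xml.etree.ElementTree as ET


def refresh_channels(raw_feeds, cache):
    """Parse each channel independently.

    raw_feeds maps a channel name to the downloaded feed bytes.
    cache maps a channel name to its last successfully parsed root element.
    Returns a dict of channel name -> error message for channels that failed.
    """
    errors = {}
    for name, data in raw_feeds.items():
        try:
            cache[name] = ET.fromstring(data)
        except ET.ParseError as exc:
            # One bad character rejects the whole document, so leave the
            # previous good copy (if any) in the cache instead.
            errors[name] = str(exc)
    return errors


cache = {}
feeds = {
    "good": b"<rss><channel><title>Fine</title></channel></rss>",
    "bad": b"<rss><channel><title>Art & Physics</title></channel></rss>",
}
print(refresh_channels(feeds, cache))
print(sorted(cache))  # only "good" was parsed
```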


so... rss is essentially useless then, huh?
posted by quonsar at 4:07 PM on November 16, 2002


so... rss is essentially useless then, huh?

Strictly speaking, I believe this is the behavior the XML spec requires of every parser. If something isn't right with the XML document, the parser is supposed to fail (unlike many HTML parsers, which just improvise, which leads to differences in browser behavior, which leads to people exploiting specific browser behaviors, which leads to HTML nightmares). If every XML parser only will parse legal XML documents, then (in theory) people will start making sure that their documents are always well formed.
posted by iceberg273 at 1:03 PM on November 17, 2002
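You can see the contrast iceberg273 describes with Python's standard library. `xml.etree.ElementTree` rejects the exact title from the error report above, while the lenient `html.parser` just recovers the text. This is a sketch for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = "<title>Art & Physics</title>"


class TextCollector(HTMLParser):
    """Collects whatever text the HTML parser recovers."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def parse_as_xml(markup):
    """Return the element text, or None if the XML parser rejects the input."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError as exc:
        print("XML parser refused:", exc)
        return None


def parse_as_html(markup):
    """Return the text the lenient HTML parser recovers."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


print(parse_as_xml(SNIPPET))                              # error message, then None
print(parse_as_xml("<title>Art &amp; Physics</title>"))  # Art & Physics
print(parse_as_html(SNIPPET))                             # Art & Physics
```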


I think the problem is that you cannot have foreign-language characters (or special characters or whatever) in XML documents (even in CDATA sections); you have to convert them to &whatever.
posted by richardm at 1:17 PM on November 17, 2002
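For what it's worth, XML itself does allow non-ASCII characters. What it requires is that the bytes match the document's declared encoding, which is UTF-8 when there is no declaration. Inside a CDATA section, a bare & or < is also fine. The minimal sketch below uses made-up sample data to show one common failure mode: Latin-1 bytes served with no encoding declaration. Whether that was the cause here is an assumption.

```python
import xml.etree.ElementTree as ET

# The same title stored different ways (made-up sample data).
latin1_undeclared = "<title>Caf\u00e9</title>".encode("latin-1")
utf8_undeclared = "<title>Caf\u00e9</title>".encode("utf-8")
latin1_declared = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>Caf\u00e9</title>'
).encode("latin-1")
cdata_ampersand = b"<title><![CDATA[Art & Physics]]></title>"


def try_parse(data):
    """Return the element text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as exc:
        return f"error: {exc}"


for label, data in [
    ("Latin-1 bytes, no declaration", latin1_undeclared),  # fails: read as UTF-8
    ("UTF-8 bytes, no declaration", utf8_undeclared),      # parses
    ("Latin-1 bytes, declared", latin1_declared),          # parses
    ("bare & inside CDATA", cdata_ampersand),              # parses
]:
    print(f"{label}: {try_parse(data)}")
```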


Not only that, but you can only use numeric character references (e.g., &#8212;); using named entities like &mdash; breaks it, since XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;.
posted by gleuschk at 1:30 PM on November 17, 2002
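Here is a sketch of the workaround gleuschk describes, assuming the input uses HTML 4 entity names. The hypothetical `named_to_numeric` rewrites names like &mdash; as numeric references using Python's `html.entities.name2codepoint` table, and it leaves the five XML predefined entities alone. Unknown names are left untouched, so the parser will still flag them.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML predefines only these five named entities; every other name
# (mdash, eacute, nbsp, ...) is undefined unless a DTD declares it.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Rewrite HTML named entities as numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave it for the parser to flag
        return f"&#{codepoint};"

    return ENTITY_RE.sub(replace, text)


title = "<title>Art &mdash; Physics &amp; more</title>"
fixed = named_to_numeric(title)
print(fixed)  # <title>Art &#8212; Physics &amp; more</title>
print(ET.fromstring(fixed).text == "Art \u2014 Physics & more")  # True
```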


If every XML parser only will parse legal XML documents, then (in theory) people will start making sure that their documents are always well formed.
ah! so if every xml parser will only accept an idealized reality, then (in theory) reality will alter itself. cool. [holding breath] uh, when's this gonna happen?
posted by quonsar at 1:37 PM on November 17, 2002

ah! so if every xml parser will only accept an idealized reality, then (in theory) reality will alter itself.
I think that's putting the blame in the wrong place. It's valid to submit "&mdash;" when the intended use is HTML. It's also valid to submit "345x343" when the intended use is GoldenEye, apparently. The problem is expecting whatever a person types to be valid HTML and XML at the same time. Since that doesn't happen in real life, you need to translate both valid and poorly written HTML into XML, which is difficult, so no one bothers. Programmers often think only a few characters need replacing, because those are all the XML processor gets a chance to complain about. Then their input contains incorrectly nested tags or unquoted attribute values, and you end up writing HTML Tidy all over again.

But it's the assumption that HTML will be XML too that's at fault, not the XML processor.
posted by holloway at 6:20 PM on November 17, 2002
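Holloway's argument leaves room for one narrower case. If the title is treated as plain text rather than HTML, you can build the element with an XML library and let its serializer do the escaping. The sketch below assumes that, and the hypothetical `make_title` does not convert arbitrary HTML. As the second half shows, mis-nested tags still fail, which is the job HTML Tidy exists for.

```python
import xml.etree.ElementTree as ET


def make_title(plain_text):
    """Build an RSS <title> element from plain text.

    The serializer escapes & and <, and with the default us-ascii output
    encoding it writes non-ASCII characters as numeric references.
    """
    element = ET.Element("title")
    element.text = plain_text
    return ET.tostring(element)


print(make_title("Art & Physics \u2014 notes"))
# b'<title>Art &amp; Physics &#8212; notes</title>'

# Character escaping cannot rescue markup that is not well-formed XML:
# browsers tolerate this mis-nesting, but an XML parser rejects it.
try:
    ET.fromstring("<p><b><i>bold</b></i></p>")
except ET.ParseError as exc:
    print("still broken:", exc)
```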


Good point.
posted by timeistight at 8:28 PM on November 17, 2002


If XML will only parse legal documents, then only outlaws will have guns! Uh... wrong thread...
posted by mecran01 at 10:35 AM on December 5, 2002

