XML strips out HTML including links January 4, 2002 3:26 PM   Subscribe

Hi. I'm not sure if this is a bug or a feature request... Why does the XML feed at http://xml.metafilter.com/ strip out any HTML (namely links) out of the nodeValue of the body elements? Why does the feed at http://www.metafilter.com/metafilter.xml provide a synopsis and a link to more while the previously mentioned feed shows the complete text (it also strips out HTML too)? Boy, it'd be nice to implement some XML-RPC functions for common tasks (posting, logging in, etc.) and spice up the XML feed so that a developer could build a different client for MeFi...
posted by internook to Feature Requests at 3:26 PM (19 comments total)

Why does the XML feed at http://xml.metafilter.com/ strip out any HTML (namely links) out of the nodeValue of the body elements?

All previous experiments with leaving HTML in produced regular errors in XML processors.
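The failure mode is easy to reproduce: HTML's bare ampersands and unclosed tags are not well-formed XML, while entity-escaping the markup lets it ride along as plain text. A quick sketch, in Python and purely for illustration:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# A post body with typical HTML of the era: a bare "&" and an
# unclosed <br>, both fine in HTML but fatal to an XML parser.
body = 'Tom & Jerry<br>see <a href="http://example.com">this</a>'

# Embedding the HTML raw produces a document no XML parser accepts.
raw_failed = False
try:
    ET.fromstring("<item><body>%s</body></item>" % body)
except ET.ParseError:
    raw_failed = True
print("raw HTML parses:", not raw_failed)  # False

# Escaping the markup first yields well-formed XML; the parser
# hands the original HTML back as plain text on the other end.
safe = "<item><body>%s</body></item>" % escape(body)
roundtrip = ET.fromstring(safe).find("body").text
print(roundtrip == body)  # True
```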

Why does the feed at http://www.metafilter.com/metafilter.xml provide a synopsis and a link to more while the previously mentioned feed shows the complete text (it also strips out HTML too)?

Since I don't have clearly defined RSS elements like a predictable URL, description and short summary, I can't provide much of an RSS feed, so I hacked together something that allows syndication of the posts, and points back at the comments on the site.

Boy, it'd be nice to implement some XML-RPC functions for common tasks (posting, logging in, etc.) and spice up the XML feed so that a developer could build a different client for MeFi...

I'm currently experiencing server processing issues, my time is finite, and I don't see how building a fully-fledged XML-RPC interface to MetaFilter helps with either.

Actually, it's an interesting question, what benefits does a site/service owner get from opening up an xml-rpc interface? I would argue the potential benefit for users is high, as they get a new variety of interfaces presented to them, but I would venture to guess it's more processing for the service owner and could potentially create security and control problems.
posted by mathowie (staff) at 3:50 PM on January 4, 2002


oops, actually that second link was a test file, when I was still working on getting the xml working, and I removed it.

I thought you were referring to this file:
http://xml.metafilter.com/rss.xml

because I've gotten a lot of feedback on it, mostly about how it's not a good RSS feed.
posted by mathowie (staff) at 4:14 PM on January 4, 2002


Speaking of which, this here thread is killing the RSS feed right now.
posted by scottandrew at 4:21 PM on January 4, 2002


Don't blame me, blame the nordic people that came up with the high-ascii umlaut entity.
posted by mathowie (staff) at 4:24 PM on January 4, 2002


Yeah. The very same thread is murdering the XML parser I use. Kablooie, the thing pukes on that umlaut. Perhaps I'm idealistic, but considering the current state of MeFi, essentially bandwidth saturation and a lack of developer (you) resources, it seems to me that the MeFi server should be serving only XML or RSS data. Some enterprising developers could implement a client, or a series of clients, to do all of the stuff that could and should be moved off of your beaten server and onto the client. This would only benefit you if you turned on the XML-RPC interface and killed off the old MetaFilter as it is known today.

I was about to ask if you had considered inviting some developers in to assist you with optimization/feature requests, but I see that there has been about zero interest in this thread and around 25 posts to the "what is Jason Kottke wearing" thread, so I guess people really don't care all that much.


posted by internook at 4:12 AM on January 5, 2002


All previous experiments with leaving HTML in produced regular errors in XML processors.

On my site, I've been putting HTML inside an RSS feed by putting it inside a <![CDATA[ block. I'm not sure if it's valid by XML or RSS standards, but it looks nice in AmphetaDesk.
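The CDATA approach is in fact well-formed XML: a conforming parser returns everything inside the section as literal character data, so no escaping of <, > or & is needed (the one thing a CDATA section cannot contain is its own closing sequence `]]>`). A small illustration in Python:

```python
import xml.etree.ElementTree as ET

html = 'Check out <a href="http://example.com">this</a> & more.'

# Wrapping the HTML in a CDATA section keeps the item well-formed;
# the parser treats the contents as character data, not markup.
# (Caveat: the payload itself must not contain the sequence "]]>".)
item = """<item>
  <title>A post</title>
  <description><![CDATA[%s]]></description>
</item>""" % html

desc = ET.fromstring(item).find("description").text
print(desc == html)  # True: the HTML survives intact
```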

Actually, it's an interesting question, what benefits does a site/service owner get from opening up an xml-rpc interface?

If client software could use XML-RPC to only ask for new front-page posts and comments, it could reduce the server load. How much of the processing and bandwidth here is consumed by the delivery of posts and comments the user has already received one or more times?

Also, XML-RPC could enable several MetaFilter mirrors on different servers, exchanging new posts and comments with each other to keep each one up-to-date.
posted by rcade at 11:16 AM on January 5, 2002
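The delta-fetching idea can be sketched without any XML-RPC plumbing; the method name `get_comments_since` and the data shapes here are hypothetical, not anything MetaFilter actually exposes:

```python
# Sketch of the server-side filtering a hypothetical XML-RPC method
# like getCommentsSince(last_seen) might do.
comments = [
    {"id": 1, "posted": "2002-01-04T15:50", "body": "first"},
    {"id": 2, "posted": "2002-01-04T16:14", "body": "second"},
    {"id": 3, "posted": "2002-01-05T11:16", "body": "third"},
]

def get_comments_since(last_seen):
    """Return only comments newer than the client's last-seen timestamp."""
    return [c for c in comments if c["posted"] > last_seen]

# A client that remembers what it has already fetched asks only for
# the delta, instead of re-downloading the whole thread each visit.
new = get_comments_since("2002-01-04T16:14")
print([c["id"] for c in new])  # only comment 3
```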


If client software could use XML-RPC to only ask for new front-page posts and comments, it could reduce the server load.

I don't see how this would help anything, since all the data is here. Allowing for anyone to create a metafilter interface on a remote server may reduce the number of people at metafilter.com, but it still means a call to the data to get the posts.
posted by mathowie (staff) at 2:38 PM on January 5, 2002


I don't see how this would help anything, since all the data is here.

On the Web, by requesting a page like this one, I'm asking MetaFilter to send me all the comments in a thread *plus* the new ones I haven't seen yet. I've probably received your first comment 10 different times.

If I could use a client that stored everything I had seen already and only requested new stuff (like a Usenet reader), the client would use a lot less bandwidth than a browser.
posted by rcade at 3:41 PM on January 5, 2002


Not to mention that someone else could add functionality (like thread tracking) a priori, and then send a streamlined request to MeFi for only the data it needs.

The client handles the presentation. Code on the client works as a middle tier to request a minimal set of data from the DB (via XML-RPC calls to the MeFi datastore).

You could get really selective about what requests are received and the format of the datasets returned to cut your bandwidth to a fraction of what it is with the current implementation.

Just a thought... hey- is that Jason Kottke over there! Look! What were we talking about...? Oh, nevermind.
posted by internook at 4:15 PM on January 5, 2002


Forumzilla is a Usenet-like interface for websites that does most of what's been asked (caching comments and stories). It doesn't fit the MeFi model, though, as Usenet apps avoid server load by using lightweight lists of comment titles and caching individual comments. MeFi has blobs for each comment, and no lighter-weight way of glancing at a list of comments, I think.

phpbuilder article on http caching
posted by holloway at 9:45 PM on January 5, 2002
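The caching article's core trick, conditional GET, is simple to sketch: the server compares the client's `If-Modified-Since` header against the page's last change and answers `304 Not Modified` with no body when the cached copy is still current. A toy version (a hypothetical handler, not MeFi's code):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# When the thread last changed (toy value for illustration).
page_modified = datetime(2002, 1, 5, 12, 0, tzinfo=timezone.utc)

def respond(if_modified_since=None):
    """Return (status, body) for a request, honoring If-Modified-Since."""
    if if_modified_since is not None:
        client_has = parsedate_to_datetime(if_modified_since)
        if client_has >= page_modified:
            return 304, ""  # client's cached copy is current: send no body
    return 200, "<html>the full thread</html>"

# First visit: no cache header, full page. Revisit: 304, zero bytes of body.
first = respond()
revisit = respond(format_datetime(page_modified))
print(first[0], revisit[0])  # 200 304
```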


If you don't build a better Metafilter Matt, someone else will.
posted by internook at 3:03 AM on January 6, 2002


When, though? The better MetaFilter I've been planning in my head isn't due to be completed until calendar year 2020.
posted by rcade at 7:20 AM on January 6, 2002


internook: internook.com, HTTP/1.1 New Application Failed

rcade: two weeks, phpilfer@holloway.co.nz

(the motivation guy says to tell everyone when you're going to do something so that you'll do it out of fear of shame. In the next two weeks I'll be testing that theory)
posted by holloway at 12:06 PM on January 6, 2002


PHPilfer 0.1.
posted by holloway at 11:36 PM on January 20, 2002


PHPilfer 0.11
- email validation of accounts
- fixed randomness of session password
- misc fixes
posted by holloway at 6:49 PM on January 27, 2002


"Mmmm... them good misc fixes."
posted by holloway at 7:44 PM on January 27, 2002


No release this week. MySQL is being silly again and I wasted Sunday trying to understand the problem. What number is larger than 20020202000000 and smaller than 20020203000000? Why it's 20020203180122, of course.
posted by holloway at 11:51 AM on February 4, 2002
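For anyone puzzled by the numbers: MySQL's old TIMESTAMP type surfaces as a bare YYYYMMDDHHMMSS integer, and decoding the value shows it falls outside the queried range, which is the bug being complained about. A quick check:

```python
from datetime import datetime

# Old MySQL TIMESTAMP columns come back as bare YYYYMMDDHHMMSS numbers.
raw = "20020203180122"
ts = datetime.strptime(raw, "%Y%m%d%H%M%S")
print(ts)  # 2002-02-03 18:01:22

# The value is *larger* than the range's upper bound, so a query for
# BETWEEN 20020202000000 AND 20020203000000 should never return it.
in_range = 20020202000000 < int(raw) < 20020203000000
print(in_range)  # False
```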


Phpilfer 0.1 (and the next version number won't be the same)
posted by holloway at 5:28 PM on May 4, 2002


Phpilfer 0.12
posted by holloway at 1:41 PM on May 22, 2002


