The guy at Suburban Limbo had a great idea for a BlogReader May 16, 2002 2:36 PM
The guy at Suburban Limbo had a great idea for a BlogReader. In theory, you could enter the blogs you read daily (weekly, hourly, obsessively) and it would tell you which ones had been updated. (More Inside)
posted by Ufez Jones at 2:36 PM on May 16, 2002
Existing versions of (things like) this include blogtracker and blo.gs. wander-lust incorporates the headline-notion (which the ones above do not), but is geared more toward random surfing (like blogsnob).
posted by gleuschk at 2:41 PM on May 16, 2002
There is also Bloglet, which allows people to add email subscriptions to their site. Then every day you get an email from them with the updated blogs. Of course, all your favorite sites would have to use this.
posted by thebwit at 7:07 PM on May 16, 2002
Jerry Halstead is someone who's written his own, as well, which I covet.
posted by stavrosthewonderchicken at 7:55 PM on May 16, 2002
Nifty. This is an idea we've thought about embedding in Scoop, as a kind of user service. So, having thought a lot about how to do this smartly, here's how I think it should be done.
It's a service, so you have an account. You enter a URL to watch. Could be any URL, just something you want to be notified of changes to. Behind the scenes, the application stores the URL, and fetches the page three or four times (maybe more, see below), over the course of the next ten minutes or so. The first time it just grabs the page. Each time thereafter, it compares the new grab with the old grab, using something like unix's diff() tool, or any library module that does the same thing (perl has several). We don't really care what's changed, we just care how much of a page changes every time you reload it. So there's any number of ways you can slice it, but one simple method would be to just count lines of diff() output.
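A minimal sketch of that grab-and-count step in Perl, assuming the CPAN modules LWP::UserAgent and Text::Diff (fetch_page() and diff_lines() are illustrative names, not an existing API):

    # A sketch of the grab-and-diff step, assuming the CPAN modules
    # LWP::UserAgent and Text::Diff. fetch_page() and diff_lines()
    # are illustrative names, not an existing API.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Text::Diff ();

    my $ua = LWP::UserAgent->new( timeout => 30 );

    # Grab the raw HTML of a page, or die if the fetch fails.
    sub fetch_page {
        my ($url) = @_;
        my $res = $ua->get($url);
        die 'fetch of ' . $url . ' failed: ' . $res->status_line
            unless $res->is_success;
        return $res->decoded_content;
    }

    # "How much changed": count the lines of diff output between two
    # grabs (unified-diff hunk headers get counted too, which is fine
    # for a rough size-of-change heuristic).
    sub diff_lines {
        my ( $old, $new ) = @_;
        my @lines = split /\n/, Text::Diff::diff( \$old, \$new );
        return scalar @lines;
    }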
In theory, the initial several grabs will establish a baseline "average constant change," which should account for ads and other kinds of random fluctuations which would otherwise tend to produce piles of false positives. It would also be sexy to do some sanity checking on your diff outputs, and throw out any values that seem way out of whack (like maybe someone posts a new entry right in the middle of your sampling -- you wanna chuck that one). So the initial phase might have to be flexible as to how long it needs to take. Probably the best way would be to say "We need five samples that ended up within 5% of each other, and if we can't get that in 20 tries, then we declare the page unintelligible."
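Here's one way the calibration loop could look, under one reading of that rule (five consecutive samples whose spread stays within 5% of the largest, giving up after 20 grabs); it builds on the fetch_page() and diff_lines() helpers sketched above:

    # Baseline calibration: sample until five consecutive diff sizes
    # sit within 5% of one another, or give up after 20 grabs and
    # declare the page unintelligible.
    sub calibrate_baseline {
        my ($url) = @_;
        my $prev = fetch_page($url);
        my @samples;
        for ( 1 .. 20 ) {
            sleep 120;    # space the grabs a couple of minutes apart
            my $curr = fetch_page($url);
            push @samples, diff_lines( $prev, $curr );
            $prev = $curr;
            shift @samples while @samples > 5;    # keep the last five

            if ( @samples == 5 ) {
                my @sorted = sort { $a <=> $b } @samples;
                my ( $min, $max ) = @sorted[ 0, -1 ];
                # "Within 5% of each other": the spread is small relative
                # to the largest sample (an all-zero run counts as settled).
                if ( $max == 0 or ( $max - $min ) / $max <= 0.05 ) {
                    my $sum = 0;
                    $sum += $_ for @samples;
                    return $sum / @samples;    # "average constant change"
                }
            }
        }
        return undef;    # couldn't settle: page is unintelligible
    }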
Anyway, from now on, you just grab the page at set intervals (like 6 hours or something), diff the current and the last grab, throw out the old grab, and see if your diff is some meaningful amount higher than the "baseline" change factor. For each page to watch, you'll be storing one copy of the page itself (the most recent), and some metadata like name, URL, baseline average change, perhaps an adjustable "alert factor" (i.e.: "when the diff is this much greater than the average, alert me"), that sort of thing. Not much data. If it's run as a service, you could also consolidate, like when someone requests a watch on a page, check if you already have it in the system. If so, just link their account to it. If you have something similar, ask if what you have is actually what they want (i.e. "You requested http://metafilter.com/. Will http://www.metafilter.com/ work just as well?").
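A sketch of that recurring check; the watch record and the alert() hook are hypothetical stand-ins for whatever storage and notification layer the service actually uses:

    # The recurring check. $watch is a hashref holding the metadata
    # described above: url, last_copy, baseline, alert_factor.
    # alert() is a hypothetical notification hook.
    sub check_for_update {
        my ($watch) = @_;
        my $curr    = fetch_page( $watch->{url} );
        my $changed = diff_lines( $watch->{last_copy}, $curr );
        $watch->{last_copy} = $curr;    # throw out the old grab

        # "Meaningfully higher" than the baseline: an alert_factor of 2
        # means "alert me when the diff is twice the average constant
        # change". (A baseline of zero means any change trips the alert.)
        if ( $changed > $watch->{baseline} * $watch->{alert_factor} ) {
            alert( $watch->{url}, $changed );
            return 1;
        }
        return 0;
    }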
You have the last copy, and also a diff of what has changed, so pretty presentation of the various info derived from those two things is left as an exercise for the reader.
Advantages to this method: You can watch anything. It relies solely on the actual page itself for its "updated" data (unlike Blogtracker). You can pull up your "watch list" from any web browser, assuming it's all being done on a server somewhere (unlike some browsers' "watch list" feature). It doesn't rely on (notoriously lazy ;-) site administrators to do anything (unlike Bloglet). It also should be pretty damn smart about distinguishing between real updates and "pseudo-updates" like rotating ads or small constant changes, which are typical of a lot of sites that aren't exactly traditional blogs, but would be worth keeping tabs on. It should also prove to be pretty tweakable, if you put in some good hooks to add in other heuristics as they become necessary.
If anyone implements this in perl, please do it in modules, and let me know. I (almost) guarantee it'll get into Scoop. :-)
posted by rusty at 11:48 PM on May 16, 2002
Rusty: The once-free Netmind, which has apparently evolved into the not-free Mind-It, performs, I think, many of the functions you describe. Or is there a subtlety I missed?
posted by vacapinta at 12:38 AM on May 17, 2002
why is this so much better than just using IE's subscription monitor..?
posted by patricking at 12:47 AM on May 17, 2002
I can't believe that in my other post I forgot to mention Blog Rolling. Take that, combined with a PHP helper, PHP Blog Roll, and you can see who has and has not updated.
posted by thebwit at 6:23 AM on May 17, 2002
you might check out fyuze. it allows you to build lists of sites you read (such as mefi, for example), lay them out on the page, and then shows you the latest headlines from each site. it uses RSS/RDF to gather all the data and is connected to syndic8 to discover new sources (work on this part has just started). go there, create a free account, and check it out. it has some other nifty features too, but the basic gist is to make it easy to monitor/skim a large number of sites without having to visit each one.
posted by ikarus at 8:46 AM on May 17, 2002
why is this so much better than just using IE's subscription monitor..?
because it has no ulterior motive.
posted by quonsar at 9:51 AM on May 17, 2002
patricking: Because it doesn't live on your desktop. You could use a service like this from any browser anywhere. If you only ever use one machine, it probably doesn't make any difference. But most people at least have a home computer and a work computer. You could use it from either one. How well does IE's thing work, by the way?
vacapinta: From my brief skim, it looks like Mind-it does do roughly what I described. I figured someone else had already invented it. :-)
Actually, reading their FAQ, it might not be so smart about determining when something's changed:
Why did Mind-it send me a notice about a change when I can't see one on the page?
Mind-it uses a number of Pumatech patents and trade secrets to try to detect only relevant changes. Nonetheless, some unimportant page changes can get through our screening process every once in awhile.
And patents? Ick. I hope they don't have patents on using diff() to compare two page states.
Anyhow, depending on what you want, there are lots of ways to do this. I just thought I'd put my design out there in case anyone did decide to code it.
posted by rusty at 10:22 AM on May 17, 2002
I've always wanted to develop an app that periodically checks for site updates and subsequently ICQs (or AIMs or whatever) the user a list of sites that've changed since the previous check. This would effectively convert the system from pull to push, and would prevent the user from having to unnecessarily access a non-updated page, which would conserve system resources.
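One possible shape for that poll-then-push loop, reusing the check_for_update() sketch from rusty's design above; $user, @watches, and send_instant_message() are placeholders, the last of which you might back with something like Net::OSCAR for AIM:

    # The pull-to-push loop, reusing check_for_update() from the sketch
    # above. @watches holds one record per tracked site; $user and
    # send_instant_message() are hypothetical stand-ins.
    my $user    = 'danelope';    # placeholder recipient
    my @watches = (
        { url => 'http://www.metafilter.com/', last_copy => '',
          baseline => 10, alert_factor => 2 },
    );

    while (1) {
        my @updated = grep { check_for_update($_) } @watches;
        if (@updated) {
            my $message = "Updated since last check:\n"
                        . join( "\n", map { $_->{url} } @updated );
            send_instant_message( $user, $message );
        }
        sleep( 6 * 60 * 60 );    # every six hours, per the design above
    }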
posted by Danelope at 11:04 AM on May 17, 2002
I think Danelope is talking more along the lines of what this guy wants. He sent me an e-mail saying:
"Thanks for mentioning my BlogReader idea at Metafilter. I followed the
links others offered and I always found the same things - services
geared toward blog creators publicizing their sites. Engineer types
tend to dismiss my idea with "That's already being done" (Dave from
Scrippting News said it was a built in part of Radio). But it's not.
The difference is I envision something that user-oriented, not oriented
to the blog creator.
My ideal is something that requires no special coding, RSS, or anything
else insert by the blog creator on their site. In fact, the blog creator
should have nothing to do with this. I just want a web page that allows
me (the user) to insert the blogs I want tracked and then provides a web
page where I can see which of those blogs have been updated.
It seems so simple, but the first barrier is getting people to see that
a list of 5,000 blogs updated in the past 3 hours is not what I'm
talking about. "
It seems like BlogTracker can do this, but only with sites that are registered with them, which many, many blogs aren't.
Dan - if you've got the technology... then do it!
posted by Ufez Jones at 11:34 AM on May 17, 2002
"Thanks for mentioning my BlogReader idea at Metafilter. I followed the
links others offered and I always found the same things - services
geared toward blog creators publicizing their sites. Engineer types
tend to dismiss my idea with "That's already being done" (Dave from
Scrippting News said it was a built in part of Radio). But it's not.
The difference is I envision something that user-oriented, not oriented
to the blog creator.
My ideal is something that requires no special coding, RSS, or anything
else insert by the blog creator on their site. In fact, the blog creator
should have nothing to do with this. I just want a web page that allows
me (the user) to insert the blogs I want tracked and then provides a web
page where I can see which of those blogs have been updated.
It seems so simple, but the first barrier is getting people to see that
a list of 5,000 blogs updated in the past 3 hours is not what I'm
talking about. "
It seems like BlogTracker can do this, but only with sites that are registered with them, which many, many blogs aren't.
Dan - if you've got the technology....then do it!
posted by Ufez Jones at 11:34 AM on May 17, 2002
Well, back when Dave re-jiggered weblogs.com (which is the source database for blogtracker, not to be overlooked), I suggested that the service could go beyond the user-submitted updates. Right now, the blog writer -- or somebody -- has to notify weblogs.com to check for updates, via the ping site form. I thought the obvious next step would be a democratization of the updates so that various sources could notify weblogs.com of a list of updated blogs (and Ev, for example, has admitted that the Blogger server application should one day be able to do this) -- and if you go that far you may as well go all the way to a desktop application being able to do the same thing. At that level you're only one step up from something like IE subscriptions -- but you get the community benefit as well.
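For reference, the ping behind that form is a one-call XML-RPC interface, weblogUpdates.ping, which a server or desktop app could send directly; here's a sketch using XMLRPC::Lite from the SOAP::Lite distribution, with a placeholder site name and URL:

    # Pinging weblogs.com programmatically rather than through the
    # ping-site form. XMLRPC::Lite ships with the SOAP::Lite
    # distribution on CPAN; the site name and URL are placeholders.
    use strict;
    use warnings;
    use XMLRPC::Lite;

    my $result = XMLRPC::Lite
        ->proxy('http://rpc.weblogs.com/RPC2')
        ->call( 'weblogUpdates.ping',
                'My Example Weblog', 'http://example.com/blog/' )
        ->result;

    # weblogs.com answers with a struct: flerror (boolean) and message.
    print $result->{flerror}
        ? "ping failed: $result->{message}\n"
        : "pinged: $result->{message}\n";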
The parts are out there to work with or alongside weblogs.com to provide an improved service, if someone really thinks about what these presently do and what they could do better. I'd like to see a proposal that's aware of the present toolset -- which this one doesn't seem to have been -- and tries to improve on it. On its face, I agree -- there's little reason that IE subscriptions, or Spyonit, couldn't fill the bill of this relatively simple idea.
posted by dhartung at 4:20 PM on May 19, 2002