Postus Interruptus . . .

Oftentimes, obscure and noncommercial websites are hosted on free but metered sites like Tripod and Geocities.
These simple sites make for great FPPs, except the instant surge in traffic puts the site over its hourly bandwidth quota. Here's an example where the poster anticipated this and even cautioned everyone to stay away from the video link on the page for a day or two. But to no avail. . .
When the masses will foreseeably shut the site down, wouldn't it make more sense to link to the Google cache of that site instead?

I'll embark on a little fantasy here:

One could conceive of a function built into MetaFilter where, if the URL being submitted contains a known bandwidth limiter ('', '', etc.), it could automatically run a wget (or comparable NT-based function) and pull down a mirror of the site. The link to said archive could be displayed at the end of the original post in the form of "[MetaFilter Archive]". The sites' addresses could be entered into a small db table, and a nightly cron (or comparable NT-based function) would see if they'd passed a 3-day expiration date, at which time the mirror would be deleted and the automatic archive link would stop appearing.
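
To make the fantasy a little more concrete, here is a minimal sketch in Python of the moving parts described above: spotting a metered host, building a wget mirror command, recording the mirror in a small table, and finding mirrors older than three days. The hostnames in METERED_HOSTS, the table name "mirrors", and all function names are assumptions made up for illustration; nothing here is how MetaFilter actually works, and the wget command is only built, never run.

```python
import sqlite3
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Assumed hostnames for illustration only; the real list is not given in the post.
METERED_HOSTS = ("geocities.com", "tripod.com")
MIRROR_LIFETIME = timedelta(days=3)


def is_metered(url):
    """Return True if the URL's host is, or is a subdomain of, a metered host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in METERED_HOSTS)


def wget_command(url, dest_dir):
    """Build (but do not run) a wget invocation that mirrors a site into dest_dir."""
    return ["wget", "--mirror", "--page-requisites", "--convert-links",
            "--no-parent", f"--directory-prefix={dest_dir}", url]


def create_table(conn):
    """Create the hypothetical table that tracks active mirrors."""
    conn.execute("CREATE TABLE IF NOT EXISTS mirrors (url TEXT, created TEXT)")


def record_mirror(conn, url, created):
    """Remember when a mirror was made so it can be expired later."""
    conn.execute("INSERT INTO mirrors (url, created) VALUES (?, ?)",
                 (url, created.isoformat()))


def expired_mirrors(conn, now):
    """Return and forget every mirror that is at least three days old.

    A nightly job would call this, then delete the mirrored files itself.
    """
    cutoff = (now - MIRROR_LIFETIME).isoformat()
    rows = conn.execute("SELECT url FROM mirrors WHERE created <= ?",
                        (cutoff,)).fetchall()
    conn.execute("DELETE FROM mirrors WHERE created <= ?", (cutoff,))
    return [row[0] for row in rows]
```

The posting code would call is_metered() on the submitted URL, and the nightly cron job would call expired_mirrors() with the current time and stop showing the archive link for whatever it returns.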

Like I said, a fantasy, and it likely wouldn't pass the "would the feature benefit MeFi more than detriment it" litmus test, but it's functionally possible. An alternative would be for the poster to mirror the site themselves, though given the technological limitations of many, this would be unlikely.
Good ideas; this is a valid and important concern.

I'd also suggest that the poster could contact the site's author / administrator / person-in-charge and ask permission to mirror before doing so.

Just because some people might get annoyed about it. I personally don't think it would be a wrong thing to do, but the point is that the site's author might not agree with the benevolent mirrorer. Getting permission beforehand would be a prudent way to avoid conflict and stepping on toes.

I wonder about the legality of knowingly mirroring someone's content without their permission. At the very least, it might be considered impolite.

(And for what it's worth I think the Google cache and other wide-ranging web archives are a bit different.)
Doesn't Google's cache only cache the text and not images or files? That was my understanding.
This is a tricky one. I'd like to be able to mirror a page I wanted to point to, but it sounds like most usages without explicit permission would construe copyright breach. Under the Berne Convention, copyright essentially subsists in any original work from the moment it's realised in a tangible medium. If anyone else publishes a copy of it, they have breached my copyright unless they asked permission first. Not that I'm worried about Google or personally, but I can conceive of situations where someone might delete something and not want it to still be made available by others. Text is just as subject to copyright law as any other material.
mkelley is right. is a better method for mirroring image intensive sites.
mefi.replace( orig_txt => "construe", rep_txt => "constitute", comment => "2412#42993" );
Linking to a Google cache, or to is not a bad idea, especially if the poster realises the site may become unavailable (e.g. Geocities, or whatever). However, I think it's best that the primary link should always be to the original source, and then the link to the cache can be included as a courtesy by the poster, at their discretion:

Stupid website [Google cache]

Regarding caching sites locally, this suggestion has been brought up at Slashdot before, and their FAQ has something to say about it.
If mirroring a site sans express permission is an infringement, then how's google getting away with it??
> However, I think it's best that the primary link should always be to the original source, and then the link to the cache can be included as a courtesy by the poster, at their discretion:

That's helpful for MetaFilter readers, but doesn't really do anything for the site owner. MetaFilter may not be the author's intended audience, and they may not want their content to be hidden from their intended readers by a Tripod bandwidth limit for a day or two just because MeFi'ers are having a good laugh about their spelling and use of caps lock. Arguments about the sensibility of putting important information on reliable servers aside, should MetaFilter extend the courtesy of asking permission to link these sites when the site will invariably be pulled under?
> then how's google getting away with it??

They know all the sordid details about every lawyer in the land.
I'm sure Google would argue that by merely caching pages, they aren't doing anything more than your local ISP. Google also respects the <META NAME="ROBOTS" CONTENT="NOARCHIVE"> and <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"> tags, so you can prevent them from caching pages if you wish.
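
For anyone building their own mirror who wants to honor the same opt-out, here is a rough Python sketch of checking a page for those tags. The class and function names are made up for illustration, and this only looks at the two META names mentioned above; it is not a description of how Google's crawler is implemented.

```python
from html.parser import HTMLParser


class NoArchiveFinder(HTMLParser):
    """Scan HTML for a robots/googlebot META tag that includes NOARCHIVE."""

    def __init__(self):
        super().__init__()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        meta_name = attr.get("name", "").lower()
        directives = {d.strip().lower()
                      for d in attr.get("content", "").split(",")}
        if meta_name in ("robots", "googlebot") and "noarchive" in directives:
            self.noarchive = True


def forbids_archiving(html):
    """Return True if the page asks not to be cached or archived."""
    finder = NoArchiveFinder()
    finder.feed(html)
    return finder.noarchive
```

A mirroring script could fetch the page first and skip the mirror whenever forbids_archiving() returns True.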

Astirling: It's not our responsibility to watch out for someone's bandwidth caps, it's the author's own "fault"* for going with a cheapo/free server. They can't know how many people will visit, and we can't know how much we'll impact the server. That's just life on the web. You post it, it's public, and sudden blasts of bandwidth are to be accepted, even if not expected.

*I'm not actually laying blame here, but I've been up a long time, and can't come up with a better word at the moment.
I agree that seeking the author's permission to mirror the page would be ideal, but I don't believe it would infringe upon their copyright to do so without. Google and have been mirroring sites locally for years (and the latter, as stated above, mirrors images as well.) These services respect robots.txt restrictions, as do many mirroring programs, including wget.
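
As a rough illustration of what respecting robots.txt could look like in a homegrown mirror, here is a small Python sketch using the standard library's urllib.robotparser. The function name may_mirror and the "MeFiMirror" user agent in the usage note are hypothetical, and the robots.txt text is passed in as a string so the sketch does not touch the network.

```python
from urllib.robotparser import RobotFileParser


def may_mirror(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a live check would use set_url() and read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with the rules "User-agent: *" and "Disallow: /private/", calling may_mirror(rules, "MeFiMirror", url) would allow a URL outside /private/ and refuse one inside it.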

As specified in my original comment, the mirror would only exist for three days, ample time for the MeFi effect to wind down without presenting any of the problems mentioned in the Slashdot FAQ.

More than anything, I feel this would be a service to those site authors who keep sites on servers like Geocities or Tripod. If my site were about to disappear due to excessive bandwidth, I would certainly appreciate any attempt to relieve the traffic, particularly from a site whose mirroring policy exists to help me.

Again, this is merely a hypothetical solution, but it's also a very real possibility should someone choose to implement it. I'm tempted to do so myself (in a Debian/PHP/Perl/MySQL configuration) as I'm now running my own Web server with unlimited bandwidth.

(For any interested parties, wget for Windows.)
"FPP" makes me want to die.
I have a somewhat-related question. Does anyone else feel kind of uncomfortable when others link to the "print-only" (read: sans-advertising) version of an article?

(the related-ness, of course, is to linking to something other than the version of a site/article intended by its creators)

If we're going to be linking to content, the least we can do is link to the version that pays the bills. It seems disingenuous to purposely evade advertising in such a manner, and to me, it speeds the day that content will all be locked behind paid registration.


