Automated Coral Cache function? November 24, 2007 11:55 AM

Feature request: an automatic Coral Content Distribution Network function for front page posts.

Inspired by this comment by Kadin2048.

Methinks it should be pretty easy to implement. We include a checkbox on the posting page that says something like "Use Coral CDN to cache all links? (In case of heavy server load.)" (or something similarly descriptive).

Then, the script reworks the links in the post to include the secondary CDN links as a tag or note after each of the original links.
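The rewrite itself is trivial, since coralizing a URL just means tacking .nyud.net:8080 onto the hostname. A rough sketch of the helper (the function name is made up for illustration, not anything MeFi actually has):

    // Coralizing a URL means appending .nyud.net:8080 to the hostname,
    // e.g. http://example.com/cats.html -> http://example.com.nyud.net:8080/cats.html
    // Sketch only; handles plain http URLs without an explicit port.
    function coralize(url) {
      var m = url.match(/^http:\/\/([^\/:]+)(\/.*)?$/);
      if (!m) return url;                          // leave https/odd URLs alone
      if (/\.nyud\.net$/i.test(m[1])) return url;  // already coralized
      return "http://" + m[1] + ".nyud.net:8080" + (m[2] || "/");
    }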

For example "HURF DURF BUTTERCAT SCANNERS" becomes "HURF DURF BUTTERCAT SCANNERS [CDN]

Or something similar. Ideally, it'd be a "function on preview", allowing the poster to tailor the CDN links before posting. It should work on every HREF in the post, whether it's in the title link field or in the description field.

Coral CDN has been around for ages now, and it's a stable, mature technology. It doesn't seem to be vanishing any time soon.

Digg has diggmirror, Slashdot has MirrorDot, and so on. What about us?
posted by loquacious to Feature Requests at 11:55 AM (30 comments total) 2 users marked this as a favorite

Although, for old posts, it'd probably be better not to coralize the links. It'd be nice if the coralization could be turned on and off after the fact, and was by default on for, say, 1 week after posting.
posted by hattifattener at 12:22 PM on November 24, 2007


It would be quite ugly to have a [CDN] after every single link. How about a small [CDN] link in the "posted by" part, next to the timestamp, which when clicked would use JavaScript to replace each HREF in the post with the Coral equivalent?
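Roughly, that click handler would only need to be a few lines, something like this sketch (the post-container id is hypothetical, and coralize() is the same hostname rewrite sketched in the post above, repeated here so this stands alone):

    // Sketch only: swap every link inside one post for its Coral equivalent.
    function coralize(url) {
      var m = url.match(/^http:\/\/([^\/:]+)(\/.*)?$/);
      if (!m || /\.nyud\.net$/i.test(m[1])) return url;
      return "http://" + m[1] + ".nyud.net:8080" + (m[2] || "/");
    }

    function coralizePost(postId) {
      var post = document.getElementById(postId);   // hypothetical container id
      if (!post) return;
      var links = post.getElementsByTagName("a");
      for (var i = 0; i < links.length; i++) {
        links[i].href = coralize(links[i].href);
      }
    }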
posted by matthewr at 12:24 PM on November 24, 2007 [1 favorite]


If this would preserve links in the archives, I'm all for it. Finding old dead links is a bummer.
posted by cgc373 at 12:30 PM on November 24, 2007


This seems like a solution in search of a problem. What's wrong with things the way they are now?
posted by Steven C. Den Beste at 1:16 PM on November 24, 2007


Surely there's a bookmarklet or Greasemonkey script that people could use instead of cluttering up Metafilter posts with this. Just think about how it would look on posts where[CDN] multiple[CDN] words[CDN] in[CDN] a[CDN] row[CDN] are[CDN] links[CDN].
posted by ckolderup at 1:27 PM on November 24, 2007


(Upon a quick google search, it looks like these would do the trick.)
posted by ckolderup at 1:29 PM on November 24, 2007


What's wrong with things the way they are now?

Nothing is outright wrong with the way things are now, but here's why I think using Coral CDN is right for many posts:

We have a known habit of overloading smaller servers, which makes the link(s) unusable by some users.

Also, many posts here are to unique, unusual or DIY projects, often run on small homebuilt servers or cheaper shared hosts, where bandwidth costs can be a real issue above and beyond the overloaded server itself. It's a nice thing to do for such folks, since a sudden leap in bandwidth can ding them financially pretty heavily. Using Coral CDN would defray those costs substantially without unduly reducing presentation or exposure.

Because of the way Coral CDN works, it doesn't matter if the "cache" link is new or old. As long as the original links and source files exist - Coral CDN will re-cache the files on an as-needed basis.

Automating the usage of CCDN and making it a no-brainer for folks would be a nifty and useful feature.


However, an active script along the lines matthewr suggests, one that renders the CCDN links on the fly as needed, might be the best way to do it. But as I outlined above, putting an expiration date on the CCDN links isn't required, given the flexible, self-healing nature of CCDN itself.
posted by loquacious at 1:29 PM on November 24, 2007


Just think about how it would look on posts where[CDN] multiple[CDN] words[CDN] in[CDN] a[CDN] row[CDN] are[CDN] links[CDN].

Yeah, stylistically that's not so good. A single small link in the "posted by" line that, when clicked, swaps all of the links in the FPP for CDN links would probably be much better, or something like it.

Or it could be placed below the "tags" sidebar inside the post itself, perhaps worded something like "Broken links? Try here!" However, the key to all of this would be to have the CCDN links loaded at least once upon posting, to initiate the caching process before the post actually hits the main page.

If the links aren't cached at least once before the actual server gets hammered, it's a moot issue.
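Priming is cheap, though. Something like this, run once at preview or submit time with the already-coralized URLs, would do it (just a sketch; Image objects are only a convenient way to fire a GET from the browser without cross-domain XMLHttpRequest headaches):

    // Hit each coralized URL once so Coral fetches and caches the target
    // before the post goes live. The responses aren't images and get
    // thrown away, but the requests still reach Coral and warm its cache.
    function primeCoralCache(coralizedUrls) {
      for (var i = 0; i < coralizedUrls.length; i++) {
        var img = new Image();
        img.src = coralizedUrls[i];
      }
    }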
posted by loquacious at 1:35 PM on November 24, 2007


I support the implementation of Coral cache.

How about this? I use the CacheIt! extension. It adds Coral cache, Google cache and the Wayback Machine to everything. Why not add all those to the links?
posted by puke & cry at 1:44 PM on November 24, 2007


I think I've suggested this here myself. Obviously it hasn't happened.

The "ugly" part could be taken care of by some whizzy DHTML. Suppose that links with a coralized equivalent were rendered in a different color to signal "hey, there's a coralized version of me". Clicking on them took you directly to the original link, but hovering for a second would cause the coralized link to appear in a floating layer, which you could then click on (this is sorta how flickr renders avatar icons--they're directly clickable, but hovering reveals an arrow for a menu). This approach could be extended to point to the google cache, archive.org cache, etc, for the same link.
posted by adamrice at 1:48 PM on November 24, 2007


Speaking as someone who owns his own server, and has had it overloaded on occasion when linked by a big site, I resent those kinds of caching servers.

If I write something good, and if a lot of people are motivated to read it, I want to know. If hits go to the caching server instead of to mine, I'll never find out about it. When you're working hard to produce material, and giving it away, the only emotional reward you get is knowing it's being read. Caching servers steal that from you.

I'd rather have my server overloaded on occasion.
posted by Steven C. Den Beste at 2:08 PM on November 24, 2007


Who cares what site owners want? With all due respect, SCDB, Metafilter policy should be made in Metafilter's best interests. Site owners' 'emotional rewards' aren't really our problem.
posted by matthewr at 2:19 PM on November 24, 2007


I actually went ahead and wrote the script that Kadin2048 asked about in the comment -- the one that primes the Coral caches with links from MeFi (so if a server does get overloaded, at least there's a mirror in Coral). It's running against the RSS feed.

There are a number of plugins that can help you use Coral if/when you want to. Personally, I think this sort of thing should be left up to the user, rather than doing it server-side.
posted by toxic at 2:23 PM on November 24, 2007


Matthew, it's in Metafilter's best interest that people be motivated to produce cool content for the web.
posted by Steven C. Den Beste at 2:43 PM on November 24, 2007


I just wrote a little greasemonkey thingy to rewrite all the links on a Metafilter page with their CoralCDN equivalents, and it seems to work fine. I think I want to make it not coralize links from older posts, though, since if a server is not overloaded, then going through the coral cache is noticeably slower.
posted by hattifattener at 2:45 PM on November 24, 2007


Steven, Metafilter's existence has a negligible effect on the amount of cool stuff produced by other people on the web. The number of owners of posted sites who have even heard of Metafilter is minuscule, and the number who would not have created their site if not for Metafilter is probably zero.
posted by matthewr at 2:52 PM on November 24, 2007


...the number who would not have created their site if not for Metafilter is probably zero.

Well, it's at least one. I created mine because of Metafilter.
posted by Steven C. Den Beste at 3:44 PM on November 24, 2007


The CDN link could be stored in the database when the post is made and only made active if a large number of users flag the post as "unavailable" or "site down".
posted by null terminated at 4:14 PM on November 24, 2007


On principle, I am against this idea. What if, for instance, a site becomes unavailable because the owner decides to remove a portion of it? This wrests too much control away from the owner of the content and, while it's not a typical use case, the widespread caching of sites assumes that once your work is popular enough to get on Metafilter, it no longer belongs to you.

Links die, and often come back. Or not. We're not inherently entitled to look at something forever just because it existed at some point.
posted by dhammond at 4:54 PM on November 24, 2007


There isn't an obvious win here; the chief advantage would be when the site in question is on a bandwidth-capped host like AngelFire, or on something way underspecified and likely to be overwhelmed by Metafilter-level traffic. But the downsides are as above: a lot of people would like the traffic and are in a position to handle it. I also have to wonder whether ad providers will properly account for and pay for the view when the page is served by the Coral cache rather than the site itself. And Coral doesn't solve the problem of sites that are unresponsive because of slow embedded content servers (like, oh, fmpub.net, just to name a random example), since it doesn't rewrite URLs within the cached page and doesn't cache anything that's linked via an absolute URL.
posted by George_Spiggott at 5:40 PM on November 24, 2007


Does Coral respect robots.txt?
posted by monju_bosatsu at 5:58 PM on November 24, 2007


robots.txt is for spiders, whereas Coral is a proxy/mirroring service. It does not "respect" robots.txt, because robots.txt doesn't apply.

Coral does, however, respect the Expires: header, as well as Pragma: no-cache and the Cache-Control: header (you know, the ones that are aimed at caches and proxies). If the content owner doesn't want something to stay in Coral, they can say so.

By default, it keeps content for 12 hours. We're not talking about archive.org here.
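So if a site owner doesn't want copies hanging around, the usual response headers do the job; something like:

    HTTP/1.1 200 OK
    Cache-Control: no-cache, no-store
    Pragma: no-cache
    Expires: Thu, 01 Jan 1970 00:00:00 GMT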
posted by toxic at 8:13 PM on November 24, 2007


Wow, and I was just taking the piss there for a minute...

I've seen the to-cache-or-not-to-cache argument go around a few times on other sites (cf. Slashdot's FAQ); I am personally in the pro-cache-links camp as long as the caching system respects robots.txt.

But IMO the real problem occurs even before you get to the decision of whether to put the cache links on the page: it's getting the caching system "primed" before the site goes down to begin with. All it takes is one person to do it, so toxic's script basically solves the problem, but optimally I'd imagine it being part of the submission process.

Having [CDN] links appear on hover would be icing on the cake; even better if they were triggered automagically when the site was flagged as down. Actually, I guess the ultimate thing would be if flagging the post as "server down" or similar actually triggered the coralization of the links on the page, rather than there being a separate button...

But I've probably done enough suggesting-without-implementing for one day.
posted by Kadin2048 at 8:38 PM on November 24, 2007


Durh ... and by "respects robots.txt" what I really meant was "respects Cache-Control" (RFC 2616). I really should have known better; toxic is quite right. Also, I will preview.
posted by Kadin2048 at 8:41 PM on November 24, 2007


Tell me more about these buttercat scanners, hurf durf.
posted by XMLicious at 10:30 PM on November 24, 2007


Who has their servers go down these days anyway? I mean, I can get a 15mbit connection from the cable company in the middle of Iowa. That's the downstream, but I imagine the up is similar. I mean, unless you get linked on the MSN homepage, or maybe digg, or unless maybe you're a key developer of Lotus Notes, who goes down in this day and age?
posted by delmoi at 10:37 PM on November 24, 2007


but I imagine the up is similar.

*bzzzt* Wrong. Very wrong. Good luck getting over 768kbps up on any non-FIOS residential connection. Small business lines with semi-decent upload speeds are MUCH more expensive. Speakeasy will encourage you to run servers, sure, but they can't give you a lot of upload bandwidth to work with. I speak with the experience of having run personal webservers on just about every major US residential ISP - using both residential and small-biz lines - due to 10 apartments in 8 years.

It's precisely the people hosting neat personal projects on residential servers*, or ultra-budget/free hosting that we should be caching for. Anybody else, we're depriving of ad revenue.

Actually implementing this is totally a time-cost/benefit ratio judgement call. Matt's track record when making ROI calls, and the relative success of those calls (several features/sections of the site are very niche), are both so completely random I wouldn't even presume to guess or suggest whether it's worth it.

*This is pretty rare in these modern days of cheap Dreamhost ZOMG levels of hosted site bandwidth, but it still happens and we should be caching for these people.
posted by Ryvar at 6:29 AM on November 25, 2007


p&c: Thanks for the link to that plugin. I'm using it now and it's pretty cool.
posted by philomathoholic at 1:52 PM on November 25, 2007


Since when did we overload servers? Someone linked to my blog earlier in the year in the blue, and I got exactly 343 hits. Obviously some FPPs must be more "must-click" than others, but we're no Slashdot or Digg.
posted by roofus at 3:03 PM on November 25, 2007


FWIW, here is the script I put together. It coralizes links on the front page and on any post within the last few days; it also adds a "coralize/decoralize" dhtml button to the post itself.

After using this for a while, I think the best way to do things would be for the poster to flag any possibly-Coral-needing links when submitting the post, and for MeFi to coralize those links for, say, as long as the post is on the front page. For big, well-connected sites, going through the coral cache is more of a pain than not, and there are too many such sites to have an exclusion list in the script (though I do have a small one there). For posts that are older and presumably not getting as much traffic, no links should be coralized.
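In other words the per-link decision boils down to a check like this (a sketch; the excluded hosts and the cutoff are just placeholders, not what the script actually uses):

    // Coralize a link only if the post is recent and the target isn't a
    // big site that can take the traffic. Host list and one-week cutoff
    // are illustrative only.
    var EXCLUDED_HOSTS = /(^|\.)(youtube\.com|nytimes\.com|flickr\.com)$/i;
    var ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

    function shouldCoralize(url, postDate) {
      if (new Date() - postDate > ONE_WEEK_MS) return false;
      var m = url.match(/^http:\/\/([^\/:]+)/);
      return m ? !EXCLUDED_HOSTS.test(m[1]) : false;
    }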
posted by hattifattener at 4:09 PM on November 25, 2007



This thread is closed to new comments.