It was there. Now it isn't. May 9, 2011 7:07 AM

First I read the thread about the names at the 9/11 memorial. Great article. Then I see a "related post" at the end of that thread and it brings me to a thread about controversial art one year after 9/11. And when I try to follow that thread... the memory hole has gobbled everything...

One of the truly great things about MeFi is its snapshots of time. I wanted to go back to 2002, for a moment, because of the great article I read today. And when I did, there were a bunch of dead links to something quite notable.

I don't know what is to be done about it, but I wanted to open the discussion to see if I am the only one with this thought (in which case, I will move along...) and if I'm not, what can we or should we do about it?

posted by andreaazure to Etiquette/Policy at 7:07 AM (27 comments total) 2 users marked this as a favorite

what can we or should we do about it?

Use the Google Cache for recent outages and the Wayback Machine for older stuff?

It's not a perfect solution. The News from Babylon robots.txt prevented the Wayback Machine from archiving it.
posted by jedicus at 7:11 AM on May 9, 2011

One of the things that most disturbs me as a web designer is that the web is worse than ephemeral. With old-timey ephemera, some of it hung around for us later generations to examine. Once it's gone from the internet, the electrons just flitter away. Wayback is only a partial solution. I look at some of my old designs, and get pages full of broken image links, etc.

In this world of short attention spans, we just have to accept the fact that some stuff is just gone.
posted by crunchland at 7:18 AM on May 9, 2011 [1 favorite]

Hm, it shouldn't be too hard to whip up an rss follower to mirror all the links/perquisites of FPPs. I imagine it would have to be community-driven though; I doubt #1 would be interested in the hilarious lawyer's fees.
posted by Skorgu at 7:22 AM on May 9, 2011

Some clever person could probably make a script that would automatically create alternate archive.org wayback links for the urls that appear in a post, for folks who want to go wandering through older stuff and at least save themselves a couple clicks per link regardless of whether or not archive.org turns out to have a copy of the old linked page.
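A rough sketch of what such a script might look like, assuming the post body is available as HTML and that the Wayback Machine's `http://web.archive.org/web/*/URL` form (which lists whatever captures exist for a URL) is acceptable as the "alternate" link. The names `LinkCollector` and `wayback_links` are made up for illustration; nothing here is MeFi's actual code.

```python
from html.parser import HTMLParser

# Assumption: the "*" form of a Wayback URL lists all captures for a page.
WAYBACK_PREFIX = "http://web.archive.org/web/*/"


class LinkCollector(HTMLParser):
    """Collects absolute http(s) hrefs from <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip relative and non-web links; only outgoing URLs can rot.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def wayback_links(post_html):
    """Return (original_url, wayback_lookup_url) pairs for a post body."""
    collector = LinkCollector()
    collector.feed(post_html)
    collector.close()
    return [(url, WAYBACK_PREFIX + url) for url in collector.links]
```

This only saves the clicks; it says nothing about whether archive.org actually holds a copy of any given page.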

Beyond that, though, it's hard to see what would be a reliably good use of energy for fixing old rotten links as a general plan: if archive.org has it, great, but they don't have everything; for stuff they don't have, the alternatives are to try and find a cache of some sort elsewhere, or to produce an alternate link entirely; those are both highly manual processes and would involve investigation and guesswork to even find the appropriate link; and, at that, the new links might rot in a few years too if the alternate source isn't stable.

It's a frustrating thing, because it'd be great if there was some way to retroactively make that stuff rock solid. But locally caching the content of links posted to Mefi is a non-starter, and a moot point for the archives in any case.

So I both sort of like the idealized notion of, and don't see a realistic angle for, fielding the backtagging energy around here to slog through replacement links on old posts. Especially since we'd probably have to do the same thing again five years later.
posted by cortex (staff) at 8:12 AM on May 9, 2011

In this world of short attention spans, we just have to accept the fact that some stuff is just gone.

In a world with a problem, make the problem worse.
posted by DU at 8:13 AM on May 9, 2011

In a world without hope, have the courage to open your heart to love.
posted by Meatbomb at 8:19 AM on May 9, 2011 [2 favorites]

I tried that and you did terrible, disgusting and interesting things. No more.
posted by Brandon Blatcher at 8:41 AM on May 9, 2011 [2 favorites]

Memento project.
posted by infinite intimation at 9:07 AM on May 9, 2011

Once it's gone from the internet, the electrons just flitter away. Wayback is only a partial solution. I look at some of my old designs, and get pages full of broken image links, etc.

I discovered this recently trying to find a bunch of pictures that friends had taken at various parties I went to several years ago before the era of facebook and flickr. All gone. Some of it was archived at the wayback machine, but it's mostly broken links.
posted by empath at 9:32 AM on May 9, 2011

everything (sorry, the others seem not to have been archived).

I have been trying to go through the "history of napster as told in Metafilter" using the tools recently. There is a lot of dead-link wasteland, but also really neat stuff.

Like the link in this post here. Which gives a 404 error.

"The record companies have created this situation themselves," says Simon Wright, CEO of Virgin Entertainment Group, which operates Virgin Megastores (from a Rolling Stone article which has fallen into the pit of broken links on the internet, which I was able to make accessible using the incredible “Memento Project”).
posted by infinite intimation at 9:33 AM on May 9, 2011 [1 favorite]

I imagined like a "topic specific" blog, sort of like the "deleted posts blog", where interested people could use the various archive tools, and sort of do "topic/themed history posts", attempting to "recreate a timeline", including several old posts, and/or comments that had good links or interesting points, accounting for, and filling in dead links, as opposed to an "onsite" thing, but I didn't know what sort of complexities there are in using other's comments, so it didn't go far beyond the "neat idea" stage.
posted by infinite intimation at 9:39 AM on May 9, 2011

The Internet: You Can Know EVERYTHING, But You Can Only Know It RIGHT NOW
posted by briank at 10:04 AM on May 9, 2011 [9 favorites]

infinite intimation: "Memento project."

MeMento
posted by zarq at 10:05 AM on May 9, 2011 [1 favorite]

I suppose I shouldn't rename the Obituaries page on the wiki, MeMori. :D
posted by zarq at 10:11 AM on May 9, 2011 [2 favorites]

This interests me. I know there are sites that proactively mirror front-page links from Slashdot, Reddit, & Digg. There's software to automatically query caches like Google & the Wayback Machine, & to download video. I wonder how much media there is? The big lists - British comedies come to mind, along with Craig Ferguson musical openers.
posted by Pronoiac at 11:14 AM on May 9, 2011

I hate yahoo for this reason. After just a few years all their links are useless. Never, ever link to a yahoo news article if there is any other reputable place to link the article from.

Also, I'm not great with pictures but if anyone is looking for specific articles from back then I can try to help you locate them - just memail me.
posted by cashman at 12:00 PM on May 9, 2011

Pinboard offers this on paid accounts. I wonder if we could leverage that.
posted by mendel at 12:51 PM on May 9, 2011

On a more Metafilter-related issue, it's regrettable (but totally understandable) that some of the more important threads in our history are full of broken links now. For example, I don't think there's a single link in the Kaycee Nicole threads (1, 2) still standing. Of course, those threads are just ten days shy of being a decade old, so I'm not sure what we would expect...

I'm not sure how to correct it, though, or if we even should. Manually updating old links seems like a fool's errand.

Maybe someone should grab all the Russian Girls links while they're still fresh. They won't be linked in the threads once they go 404, but at least we'll have them.
posted by Ian A.T. at 1:59 PM on May 9, 2011

Hm, it shouldn't be too hard to whip up an rss follower to mirror all the links/perquisites of FPPs.

This will become less and less straightforward as more sites start to copy the gawker/twitter style of page that requires javascript and renders the whole page client-side without providing any kind of progressive enhancement or fallback. It's yet another reason that I want to punch these jerkfaces for breaking the web and convincing everyone else it's a good idea.
posted by Rhomboid at 8:09 PM on May 9, 2011 [4 favorites]

Could the MeFi server perhaps be taught to at least have a stab at automagically supplying archive.org versions, if such are available, of links that have 404'd today?
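One way the "have a stab" part could work, as a sketch only: check each link with a HEAD request and, if it errors out or is unreachable, hand back a Wayback Machine address instead. The function names are hypothetical, and the `http://web.archive.org/web/` + URL form is assumed to send the reader on to a capture if one exists.

```python
import urllib.error
import urllib.request

# Assumption: prefixing a URL this way leads to an archived copy, if any.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 410, 500, ...
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout


def link_or_archive(url, status_of=head_status):
    """Keep a live link as-is; swap a dead one for its Wayback address."""
    status = status_of(url)
    if status is not None and status < 400:
        return url
    return WAYBACK_PREFIX + url
```

`status_of` is passed in so the check can be cached or run in a batch job rather than on every page view. Some servers reject HEAD or answer 200 with a "not found" page, so this would miss a fair share of dead links.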
posted by flabdablet at 10:36 PM on May 9, 2011

I'm interested in building this custom internet archive - it seems like a decent reason to finally get a VPS. (This, and properly seeding the Mefi Music torrents.) Now, to find a decent VPS...
posted by Pronoiac at 5:50 PM on May 15, 2011

I'm interested in building this custom internet archive

Allow me to suggest get_flash_videos which can be easily automated to grab videos from about three dozen various sites in combination with jwplayer. That will take care of the problem of videos that get taken down or otherwise removed, although surely you don't want to be fully in the video hosting business so you might explore some access controls or whatnot so that your mirrored copy only kicks in if the hosted one is gone or something along those lines.
posted by Rhomboid at 6:57 PM on May 16, 2011 [1 favorite]

hmm, ignorant question here; is it possible/how impossible is it to make a 'thing' that automatically "copies" all links in new posts (on Mefi-Blue), and puts them in a text box, and then presses a button (also, can this be done in a "behind the scenes way")?

If you take any given link (or all outgoing links on the blue [or a triage of links as suggested in a modified "MoSCow" method here, starting with the places that always kill links quickly, like, if any still get posted, yahoo!]), and paste it in the box HERE (wayback machine, Beta), and click "show latest", it automatically has Archive.org take a snapshot, at that time, and then in a month or so, it will be permanently in the archive (which looks like this)... which can then be queried by any of many tools. So, basically, is there a way to get a computer to strip and copy links, paste them there, and then "press" a button on a web-page? Or is one of these tools more appropriate for this "archiving" task (Web Curator tool, Firefox Page-Saver/Scrapbook plugin); should I be asking this on AskMe?

I should also clarify, the memento project is not for the "archiving" part, it is for the navigation, and interconnection of the disparate "archive-sources" -- after they are captured; such as, http://www.webcitation.org/archive.php, http://www.archive-it.org/, http://webarchives.cdlib.org/p/projects, Backupurl, and the Heritrix open source crawling tool.

-these resources might help anyone who is looking at this major problem with web architecture, and thinking they want to "do something". Links via "A Guide for Archiving Web Pages"

Monitoring changes to web pages, an annotated list of detection tools from Rhodes-Blakeman Associates (2008).

Update Scanner, a FireFox add-on monitoring tool.

Preservation of Web Resources Handbook, (pdf) from the University of London Computing Centre, pp. 23-27 (2008).

Tools, a section of Harvard University's Web Archiving Resources pages (2008).

Resource List of harvesting tools from the National Archives and Records Administration, USA (2005).

HTTrack Website Copier, a harvesting tool that is easy to install and use.

Copy an entire web site with HTTrack from ez-nets, home and small office networking support site (2005).

Web Curator Tool, an easy to use, but not easy to install, comprehensive web harvesting toolset.

A Year of Selective Web Archiving with the Web Curator at the National Library of New Zealand, by Gordon Paynter et al; D-Lib Magazine, May/June 2008, Volume 14 Number 5/6. "The Web Curator Tool is an open-source tool for managing selective web archiving developed as a joint project between the National Library of New Zealand and the British Library. It has now been in everyday use at the National Library of New Zealand since January 2007. This article describes our first year of selective web archiving with the new tool. The National Library of New Zealand is reaping the benefits of the Web Curator Tool development and will continue our selective harvesting program with the Web Curator Tool for the foreseeable future."

So one would find a way of making auto-archivisation of Mefi outgoing links first, then on a server, would do something like link to "memento/timeportals" (or something, it is explained more clearly here [Having your server link to http://purl.org/memento/timegate/ will cause Memento clients to talk to the timegate aggregator, which will check 10+ public archives for the appropriate pages. This of course assumes that public archives have been crawling your site; if the site is very new it might not have been crawled & archived yet.])... which then parses the archives, and sees which, if any, possess the proper resources.

The following terms specific to the Memento framework are introduced here:
Original Resource: An Original Resource is a resource that exists or used to exist, and for which access to one of its prior states is desired.
Memento: A Memento for an Original Resource is a resource that encapsulates a prior state of the Original Resource. A Memento for an Original Resource as it existed at time Tj is a resource that encapsulates the state that the Original Resource had at time Tj.
TimeGate: A TimeGate for an Original Resource is a resource that supports negotiation to allow selective, datetime-based, access to prior states of the Original Resource.
TimeMap: A TimeMap for an Original Resource is a resource from which a list of URIs of Mementos of the Original Resource is available.
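To make the TimeGate idea a little more concrete, here is a hedged sketch of the request a Memento client might build. The `Accept-Datetime` header is the one the Memento protocol uses for datetime negotiation; appending the Original Resource's URL directly to the timegate address quoted above is an assumption, and `timegate_request` is a name invented for this example. Nothing is sent over the network here.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime

# Aggregator address quoted in the comment above; the URL-appending
# convention used below is an assumption for illustration.
TIMEGATE = "http://purl.org/memento/timegate/"


def timegate_request(original_url, when):
    """Build (but do not send) a request asking a TimeGate for the
    Memento of original_url closest to the datetime `when`."""
    # Accept-Datetime takes an HTTP-date in GMT; format_datetime with
    # usegmt=True requires a datetime whose tzinfo is exactly UTC.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(
        TIMEGATE + original_url,
        headers={"Accept-Datetime": stamp},
    )


# Usage: ask for a page as it stood one year after 9/11.
req = timegate_request(
    "http://example.com/", datetime(2002, 9, 11, tzinfo=timezone.utc)
)
```

A TimeGate that holds a matching Memento would normally answer with a redirect to it, so passing `req` to `urllib.request.urlopen` would follow that redirect on its own.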

The original poster might find this site interesting and on-topic, it is created by the Library of Congress Web Archives, it is the "minerva archive", which has a whole lot of archives from immediately pre 9/11, and then also many from after... it essentially documents "how" America, and the world used the internet both during 9/11, and in the aftermath. And here is the list of other LCWA topics.

Oh, wow, this site is incredible.
Spanamwar.com; Action Reports and First Hand Accounts, Diver Charles Morgan, USS NEW YORK Describes his Descent into the MAINE (*graphic description of the results of war). Not sure what the "Battleship Maine" is? No excuses now. Via "single sites archive". Gratuitous image of awesome three dollar bill; Continental Currency... seriously, are archives actually singularities, from over the event-horizon of which my time may never return?
posted by infinite intimation at 10:52 PM on May 16, 2011 [1 favorite]

So, basically, is there a way to get a computer to strip and copy links, paste them there, and then "press" a button on a web-page?

You could certainly use a macro/automation program to do that but that's making it far more complicated than it needs to be. At the low level of the web there are really no such things as buttons to press or forms to paste into; everything is a series of GET or POST (and occasionally HEAD or PUT but those don't really matter here) HTTP requests. Or put differently, pasting and pushing are things a human does to a browser, but the browser takes those actions and turns them into HTTP requests. You can cut out the browser entirely and just initiate those same requests. In your example, entering FOO in the url field and pressing the 'Latest' button results in an HTTP GET to http://web.archive.org/form-submit.jsp?url=FOO&type=replay which simply returns a 302 redirect with a Location header pointing to http://replay.web.archive.org/FOO. So you can cut out the middleman entirely: for each URL U in a post, retrieve http://replay.web.archive.org/U and throw away the result. This is very simple to implement in almost any scripting language.
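For instance, a minimal Python sketch of that loop, assuming the replay.web.archive.org address behaves as described above (it is quoted from the 2011 beta and may well change). The function names and the one-second pause between requests are illustrative choices, not anything archive.org specifies.

```python
import time
import urllib.request

# Endpoint as described in this comment; treat it as an assumption.
REPLAY_PREFIX = "http://replay.web.archive.org/"


def replay_url(url):
    """Address that, when retrieved, should prompt a snapshot of url."""
    return REPLAY_PREFIX + url


def fetch_and_discard(url, timeout=30):
    """GET the url and throw the body away."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def ping_archive(urls, fetch=fetch_and_discard, delay=1.0):
    """Request the replay address for each URL; return the addresses tried."""
    tried = []
    for url in urls:
        target = replay_url(url)
        try:
            fetch(target)
        except OSError:
            pass  # one failed ping should not stop the rest
        tried.append(target)
        time.sleep(delay)  # be gentle with someone else's server
    return tried
```

`fetch` is a parameter so the loop can be dry-run without touching the network, e.g. `ping_archive(links, fetch=print, delay=0)`.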
posted by Rhomboid at 5:19 AM on May 17, 2011 [2 favorites]

Wow. I'm going to have to read that comment in stages.

Upon consideration, sharing video will require something with more space than a vps.

Going forward, pinging the Internet Archive could work well.

Going back, mirroring Metafilter articles & rewriting the links to various caches sounds best. Um. That sounds a lot like just going to the Wayback Machine for a Mefi page, except the link format has changed here over the years.
posted by Pronoiac at 7:25 PM on May 17, 2011

This thread's closing in a week, & I thought I'd note what's up:
I've exchanged email with someone at the Internet Archive, & automating the "add this one page" struck them as misuse of the system. They suggested their fee-for-service Archive-It, who I'm waiting to hear back from. Sunday, there's an Internet Archive party in the East Bay, where I plan to inquire further.
posted by Pronoiac at 12:34 AM on June 2, 2011 [2 favorites]

So, I made it to the Internet Archive's Physical Archive launch party, & chatted with Gordon Mohr, the technical lead for their webcrawler. He felt Archive-It wasn't a good fit, & encouraged me to take a shot at doing it myself, with a server or even from my desktop. Getting my archived copies into the Wayback Machine is very unlikely, due to issues of provenance, but he was open to taking suggestions & some measure of coordination.

I look forward to digging into this further.
posted by Pronoiac at 12:07 AM on June 9, 2011 [1 favorite]
