What Links Just. Won't. Die!? August 23, 2010 4:50 PM

Per this comment, what sites/content/links are most frequently deleted on the Blue?
posted by zarq to MetaFilter-Related at 4:50 PM (54 comments total)

I was going to say The Onion, but I think most mefites know better than that. Big Picture has got to be up there.

If I were a dick, I'd say youtube & NYT, but I'm totally not.
posted by synaesthetichaze at 5:03 PM on August 23, 2010


Whatever is currently on the front page of your favorite blog/link aggregator/bookmarking widget.

Also, The Ever-Imminent Death of James Brown.
posted by carsonb at 5:22 PM on August 23, 2010


There was that time that James Brown died while playing violin undercover in the subway
posted by flatluigi at 5:23 PM on August 23, 2010 [2 favorites]


There are a couple of taboo sites too, now that I think about it. English Russia, stormfront, list sites/content mills, ebaumsworld, etc.
posted by carsonb at 5:25 PM on August 23, 2010


I'd say ones that I pick tend to get deleted quite often. :(
posted by mccarty.tim at 5:43 PM on August 23, 2010 [1 favorite]


Whoa! Moderators have the power to delete sites???
posted by qvantamon at 5:48 PM on August 23, 2010 [2 favorites]


What's the trouble with English Russia, carsonb? I remember following a link from ... stbalbach maybe? Lemme see ... yeah, stbalbach—and not noticing anything especially disturbing.
posted by cgc373 at 5:49 PM on August 23, 2010


Oh, I might have just lumped that one in with ebaumsworld as one of those content aggregators that doesn't attribute properly. Bit of quick googly and it looks like I'm just misinformed. Carry on (or it will on-carry YOU!)
posted by carsonb at 5:53 PM on August 23, 2010 [1 favorite]


Posts linking to your friend's midget porn site have a deletion rate approaching 100%.
posted by killdevil at 6:06 PM on August 23, 2010


Well, example.com gets linked on the blue quite frequently. So you know, stuff that's not on there.
posted by special-k at 6:09 PM on August 23, 2010 [1 favorite]


There might have been an English Russia if those pesky Bolshies hadn't beaten back our expedition to Arkhangelsk.
posted by Abiezer at 6:21 PM on August 23, 2010 [1 favorite]


I see a fairly high number of posts linking here but for some reason they tend to just have their links redirected instead of being deleted.
posted by DU at 6:23 PM on August 23, 2010 [1 favorite]


Links to www.thetruthaboutcortex.com are also routinely deleted. They also have a script to mangle the address to www.thetruthaboutcortex.com instead, so that it results in a DNS error.
posted by qvantamon at 6:37 PM on August 23, 2010


webpark.ru is the sleazy counterpart to English Russia, I think.
posted by Rumple at 7:20 PM on August 23, 2010


Daily Mail.
posted by smoke at 7:25 PM on August 23, 2010


Despite threats of banning, warnings of legal action, the impassioned pleas of the crew, and my own growing sense of helplessness and shame, I am compelled to post a link to a jpg image of my naked butt to Metafilter as soon as I wake up every day.

The links are deleted without fail.
posted by stavrosthewonderchicken at 7:34 PM on August 23, 2010 [4 favorites]


It's always amusing to me when a link to InfoWars or PrisonPlanet shows up. There's a few comments worth of "Uh...this seems extreme, but hey there's a picture of KRS-1 on the site" and then the flags from people who know who the hell Alex Jones is kick in and the link gets deleted.
posted by Lentrohamsanin at 7:38 PM on August 23, 2010


HuffPo would be my wild, uneducated guess.
posted by cj_ at 7:44 PM on August 23, 2010


I killed EIT!
posted by The Devil Tesla at 7:46 PM on August 23, 2010


CNN, wikipedia, reddit, onion, bigpicture... and if it were up to me, boingboing would be on the list, too.
posted by crunchland at 7:57 PM on August 23, 2010 [3 favorites]


OKTrends will make the auto-delete list sooner or later.
posted by klarck at 8:00 PM on August 23, 2010 [1 favorite]


Argh, sometime recently I did a little roundup of one day's worth of link sources, in a discussion about whether traditional news media were useless. It turns out that, at least on MeFi, they are decidedly not. Looking for it...

And damn, I almost posted the Russian pics myself yesterday. I saw it was a double when I previewed...but damn, those Russian color pics!

...Found it! Sampled on February 8, 2010:
The LA Times blog - print news
Theme Park Insider - ad-supported travel site-cum-industry rag
The LA Times - print news
The Washington Post - print news
Advertising Age - ad industry rag
CNN - broadcast news
Time - print news
Multichannel News - consumer media industry rag
The American Bar Association - nonprofit professional association
LiveScience - ad-supported vehicle for TechMedia network; aggregated content
NFL.com - for-profit enterprise
A bunch of YouTube videos of songs created by many different users about the NO Saints - some original content made by user, some grabbed from local TV, some professional artists
Nola.com - local news
NowPublic - crowdsourced news reporting site, ad-supported
The Miami Herald - print news, (republishing something from the Washington Post)
NewTimes - ad-supported Miami news blog
WLOX 13 Biloxi - broadcast news
KOTV News on 6 Tulsa - broadcast news
AL.com - print news (an Alabama newspaper group)
The Stars and Stripes - print news
The Boston Herald - print news
Buddy's Boards - seems to be a sports fan site, has ads by Google, a little unclear - hobbyfarm?
WTSP, Tampa - broadcast news
Wisconsin Historical Society - publicly funded not-for-profit
Rotten Tomatoes - media enterprise
Boogalu Productions - media enterprise
NYTimes Wheels blog - print news
Oak Ridge Associated Universities (ORAU) Health Physics Historical Instrumentation Museum Collection - publicly funded, donations, not-for-profit
Health Physics Society - not-for-profit professional association
PhysicsWorld.com - site of the Institute of Physics, publisher and trade group
Salon.com - online print news
Chicago Sun-Times - print news
The entire movie Slackers posted by Film Buff on YouTube
Wikipedia - online crowdsourced nonprofit
The Wall Street Journal - print news
World Heritage Tour - nonprofit, donation-based photography pool
UNESCO - United Nations funded
The Suburban Emergency Management Project - grant-funded public-health consortium
Inkling Magazine - for-profit ad-supported science news site, aggregator
Torkzadeh.com - artist and photographer promoting his work
Tomorrow Museum - ad-supported blog, aggregator
Internet Archaeology - not for profit project w/ donors (the Internet Archive among them)
The Guardian - print news
Jesus, Kirk, and Vinny - hobbyfarm?
I would love to see this kind of categorical MeFi source analysis done over the course of a year or even a month.
posted by Miko at 8:38 PM on August 23, 2010 [3 favorites]


(for both cited and deleted sites, that is. It would be a blast to compare the two in, say, pie chart form.)
posted by Miko at 8:40 PM on August 23, 2010 [1 favorite]


TimeCube seems to keep coming back like a bad hangover...
posted by mannequito at 9:56 PM on August 23, 2010


Posts linking to your friend's midget porn site have a deletion rate approaching 100%

Dwarf shortage :(
posted by turgid dahlia at 10:33 PM on August 23, 2010 [1 favorite]


Also, The Ever-Imminent Death of James Brown.

Touch wood, and shut your mouth.
posted by Meatbomb at 10:40 PM on August 23, 2010 [1 favorite]


No flies on midget wrestling though. Apparently Ronald Reagan was really big on this stuff.
posted by philip-random at 10:41 PM on August 23, 2010


I blame Metafilter for the fact that I have 3,553 photos of Stavros' naked butt on my hard drive.
posted by taz at 11:35 PM on August 23, 2010 [2 favorites]


I thank MetaFilter for my thousands and thousands of naked wonderchickenbutt photos.
posted by cgc373 at 11:55 PM on August 23, 2010 [1 favorite]


So that's what all those chicken butts are doing in my temporary internet files.
posted by amyms at 11:57 PM on August 23, 2010


Chicken Butt. You know What? We haz it.
posted by taz at 12:22 AM on August 24, 2010


You think you have problems? Try explaining those photos to your kids...
posted by dg at 3:31 AM on August 24, 2010


Daily Mail.
posted by smoke at 3:25 AM on August 24 [+] [!]


If only...
posted by i_cola at 5:00 AM on August 24, 2010


How many times did the vibrating broomstick get deleted before it was allowed to stand? At least I thought it was allowed to stand, but I'm having trouble finding it.
posted by Secret Life of Gravy at 5:28 AM on August 24, 2010


Here.
posted by crunchland at 5:37 AM on August 24, 2010


Well, example.com gets linked on the blue quite frequently. So you know, stuff that's not on there.

It took me much longer than it should have to realize that example.com is what they change links to when they delete spammy posts. I kept seeing example.com links showing up in the deleted threads blog and thinking "Oh man, it's those example.com guys again. Why don't they just add a filter to the new post code to block any more of these."

And then of course, months later, "Oh." ::facepalm::

Anyway, regarding the topic at hand, it might be interesting to run some stats on URLs that appear in posts, and in comments for that matter. That stuff is not available in the Infodump though, so someone with access to the actual database containing these would have to be involved.
posted by FishBike at 6:14 AM on August 24, 2010 [1 favorite]


I only have one chickenbutt in my temporary files, and it's all jessamyn's fault.
posted by graventy at 6:20 AM on August 24, 2010


thats no butt
posted by Potomac Avenue at 7:19 AM on August 24, 2010 [1 favorite]


Lots of posts get deleted just for reminding the mods of Fark.
posted by Joe Beese at 9:06 AM on August 24, 2010


The death of James Brown is immanent.
posted by cortex (staff) at 9:30 AM on August 24, 2010


Did you say the death of James Brown is immolent?

*sets this thread on fire*
posted by carsonb at 11:19 AM on August 24, 2010


Oh good grief...
posted by i_cola at 11:36 AM on August 24, 2010


I wonder why y'all seem to think that James Brown is really dead.
posted by blucevalo at 11:45 AM on August 24, 2010


thats no butt

Fun fact: Chickens only got one hole down there. Indeed, that is not butt; it is a SUPERBUTT.

Also, ew. I have apparently not adequately wiped that image from my brain.
posted by Sys Rq at 11:50 AM on August 24, 2010


Well, one thing for sure--you can't call James Brown the most insinuating soul singer evar. That man did not insinuate. He declared.
posted by y2karl at 1:26 PM on August 24, 2010


Oh man, it's gotta be example.com...About half the deleted posts I see go directly to or a subsite of example.com.
posted by piratebowling at 7:45 PM on August 24, 2010


I was curious, so I scraped the results (with some judicious sleeps between requests so as not to pound the server). YouTube takes the prize for being present in a whopping 10.7% of deleted posts; Wikipedia is at 6.7%. It starts trailing off significantly after that. There are ~7000 unique domains. However, if you do some aggregation by content type, news sites make up 41% of the top 50 domains seen in deleted posts and only 20% are video. I have no idea how this compares to posts that are allowed to stand; scraping that is out of the question.

Anyway, here's the top 50 and how many deleted posts they appear in. Mostly it just looks like a list of sites ranked by popularity rather than anything surprising or interesting, but there you have it. If someone wants the raw data (all 7000 hosts and their counts, plus a list of IDs for all deleted posts), memail me.

732 youtube.com
514 metafilter.com (previously links)
457 en.wikipedia.org
243 nytimes.com
194 news.bbc.co.uk
162 example.com
151 cnn.com
125 washingtonpost.com
111 google.com
111 guardian.co.uk
 72 flickr.com
 71 news.yahoo.com
 68 amazon.com
 66 msnbc.msn.com
 59 huffingtonpost.com
 55 video.google.com
 54 imdb.com
 50 wired.com
 47 latimes.com
 46 story.news.yahoo.com
 44 reuters.com
 43 abcnews.go.com
 42 time.com
 42 boston.com
 41 npr.org
 41 sfgate.com
 40 boingboing.net
 38 telegraph.co.uk
 35 salon.com
 34 whitehouse.gov
 31 timesonline.co.uk
 31 slate.com
 30 vimeo.com
 29 ask.metafilter.com
 28 foxnews.com
 28 cbsnews.com
 27 newyorker.com
 26 abc.net.au
 26 metatalk.metafilter.com
 26 bbc.co.uk
 25 twitter.com
 25 dailykos.com
 24 theonion.com
 23 myspace.com
 23 apple.com
 23 dailymail.co.uk
 22 usatoday.com
 20 geocities.com
 20 news.google.com
 19 cgi.ebay.com
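
A tally like the one above could be produced along these lines. This is only a minimal sketch, not the script actually used: it assumes the links have already been pulled out of each deleted post, and the names `host_of`, `tally_hosts`, and `sample_posts` (along with the sample URLs) are made up for illustration. It counts each host once per post, with a leading www. chopped off and other subdomains kept, as described elsewhere in this thread.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host with a leading 'www.' removed, or None."""
    host = urlparse(url).hostname
    if not host:
        return None  # relative or malformed link: nothing to count
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def tally_hosts(posts):
    """Count how many posts each host appears in (at most once per post)."""
    counts = Counter()
    for links in posts.values():
        hosts = {host_of(url) for url in links}
        hosts.discard(None)
        counts.update(hosts)
    return counts


# Hypothetical sample data: post id -> link URLs found in that post.
sample_posts = {
    101: ["http://www.youtube.com/watch?v=abc", "http://youtube.com/watch?v=def"],
    102: ["http://en.wikipedia.org/wiki/James_Brown"],
    103: ["http://www.youtube.com/watch?v=ghi", "http://news.google.com/"],
}

counts = tally_hosts(sample_posts)
for host, n in counts.most_common():
    share = 100 * n / len(sample_posts)
    print(f"{n:3d} {host} ({share:.1f}% of posts)")
```

Counting per post rather than per link keeps a single post with ten YouTube links from inflating the numbers.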

posted by cj_ at 7:59 PM on August 24, 2010 [4 favorites]


Yeah, that is interesting - and how many of the sources (more than a few) are also on the "most cited" list is also interesting.
posted by Miko at 8:40 PM on August 24, 2010


Yeah what I think would be most interesting is to see where it deviates from the posts allowed to stand; there's just no great way for me to get that in any reasonable timeframe without being evil about crawling the site, which isn't gonna happen. It'd be cool to get this in an infodump from the database though.

It's interesting how many self-referential links show up. I didn't look at the posts themselves, but I assume links back to metafilter.com are "previously" links. This might represent a pattern where posters trying to milk a done-to-death topic or post updates to subjects that still have open threads are shut down. But again, without seeing how often these show up in the total corpus, it's hard to draw conclusions.

Overall, it's the kind of stuff I'd expect to be deleted. boingboing.net, apple.com, and google.com kind of stood out to me though. Do people really link to these sites that often? I don't notice if so. Linking to a google search seems bizarre, and I can't think of any interesting content on apple.com except the occasional trailer. But they have shown up enough to be in the top 50.
posted by cj_ at 9:25 PM on August 24, 2010


Maybe it's a google doodle?
posted by mippy at 8:19 AM on August 25, 2010


google.com could show up in all sorts of situations: cache links, maps references, news listings, flat out search results, videos, etc. I think people just do link to stuff there that often.

boingboing.net as the direct referent or the via link on thin posts or doubles would not be shocking.

apple.com I don't have any real explanation for other than thin trailery posts, duplicate/gratuitous gadget posts, and misc. thin apple-related posts. But 23 posts in eleven years isn't all that much, so, eh.
posted by cortex (staff) at 8:25 AM on August 25, 2010


It'd be cool to get this in an infodump from the database though.

It would indeed. Someone recently asked me a question that could have been answered easily with a dump of URLs extracted from comments, for example, which would have been neat.

The current Infodump contains entirely stuff that has been categorized as "not content", although a couple of things, like tags and post titles, are pretty close to the dividing line there. There are technical details to worry about, like how to extract URLs efficiently and how large the resulting files might be, but to me the bigger question is how this community would feel about doing that.

Are link URLs from posts and/or comments "content"? And if so, how would people feel about including them as another set of files in the Infodump? I'm not saying if the answers are favorable that we should jump up and down and demand it be done. There's still those practical, technical issues, for one thing. I'm just wondering if the concept itself is even a desirable one overall?
posted by FishBike at 8:38 AM on August 25, 2010


Just the domain seems sufficiently divorced from content, but I'm not familiar with any objections that might exist about this sort of thing. In my view, it's content posted publicly and indexed by Google.

fwiw, I extracted the links with BeautifulSoup.
posted by cj_ at 6:48 PM on August 25, 2010


google.com could show up in all sorts of situations: cache links, maps references, news listings, flat out search results, videos, etc. I think people just do link to stuff there that often.

Well, I chopped off www. but otherwise the counts respect subdomains. So these are links directly to google.com. There are only 9 references to maps.google.com and 20 to news.google.com. Google cache is a completely different domain but I'm not sure that's always been true.

boingboing.net as the direct referent or the via link on thin posts or doubles would not be shocking.

Yeah, good point. I do see a lot of "via boingboing". The results could probably be improved by stripping out links that wrap "via" or "previously" but I didn't think of it at the time.
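
For what it's worth, a rough sketch of that filtering might look like the following. It uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra installs, and `LinkCollector`, `content_links`, `SKIP_TEXT`, and the sample markup are all hypothetical names and data made up for the example. It assumes the "via" and "previously" links can be recognized purely by their anchor text, which won't catch every variation.

```python
from html.parser import HTMLParser

# Anchor texts treated as attribution or back-reference, not content (an assumption).
SKIP_TEXT = {"via", "previously"}


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def content_links(html):
    """Return hrefs whose anchor text is not just 'via' or 'previously'."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if text.lower() not in SKIP_TEXT]


# Hypothetical post body for illustration.
sample_html = (
    '<a href="http://example.org/story">A story</a> '
    '(<a href="http://boingboing.net/x">via</a>; '
    '<a href="http://www.metafilter.com/1">Previously</a>)'
)
print(content_links(sample_html))
```

The surviving hrefs could then be counted by domain as before.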
posted by cj_ at 2:32 AM on August 26, 2010

