User comment dumps April 14, 2008 4:53 PM

Many of us have spent quite a bit of time writing our various comments and posts on MetaFilter. In the spirit of "All posts are © their original authors," I would like to request the ability for a user to be able to download all of their own contributions as a simple data dump.

This would accomplish at least three things. First, it would provide us with a permanent archive of our contributions here even if the site were ever taken over or shut down. I know this probably seems very unlikely right now, but members of other forums have lost their contributions, and it probably didn't seem likely to them either.

Second, it would make it easier for us to do full-text searching of our own comments/posts/MeMails/etc.

Third, it would establish MetaFilter as a shining example to others of how a community that depends on its users for content generation should treat those users, and make it easy for them to do whatever they want with their own content they contributed.
posted by grouse to Feature Requests at 4:53 PM (66 comments total) 11 users marked this as a favorite

I should add that full-text searching of my own comments has been something that would have been useful to me many a time. Sometimes original research goes into MeFi/AskMe comments, and it is useful to be able to find them and avoid doing the same stuff over again. Perhaps I should be keeping my own notes, but that would get tedious and hard to organize.
posted by grouse at 4:56 PM on April 14, 2008


you can search your contributions right now from your profile page with the form at the top right of the page.
posted by pb (staff) at 5:01 PM on April 14, 2008


I don't think we're opposed to a data dump philosophically. I think the sticking point is finding a format that will work. Do you have a format you'd like to see your comments in?
posted by pb (staff) at 5:03 PM on April 14, 2008


furthermore, you can search anyone else's contributions in full text. pb has a cat.
posted by desjardins at 5:05 PM on April 14, 2008


That is a good point about the full-text searching, but if I had my own copy of it, I could do find-as-you-type searching, search for partial words, etc.

As for format, I don't have any pre-set ideas. HTML with a div for each comment, and the standard permalink so we can find the context would be a great start. Maybe other people have better ideas.
posted by grouse at 5:09 PM on April 14, 2008


I'm writing an entire novel, but I'm lazy, so I do it in small increments as MeFi comments. The data dump would save me copying and pasting once I want to send it to a publisher. Good call!
posted by Fuzzy Skinner at 5:12 PM on April 14, 2008


JSON or XML would work nicely, maybe.
posted by Blazecock Pileon at 5:16 PM on April 14, 2008


1) HTML skeleton page with html/head/body etc explaining what it's all about.
2) After the explanation text, #include 'userid.html' if it exists. If not, count the comments and if it wouldn't liquify the server, generate it. If it would liquify the server, throw up a "We'll generate this in the next batch" message and queue it.
3) Every day/week/when the other infodump happens, append comments made since the last dump (by comment ID probably) to the end of the userid.html
4) Wipe hands on pants.

No crushed servers, parsing basic HTML is easy for the recipient, nothing gets generated until it's used.
posted by Skorgu at 5:21 PM on April 14, 2008
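
The four steps Skorgu outlines amount to a lazily built, incrementally appended per-user cache file. The following is only a rough sketch of that idea, not anything the site is known to run: the function names, the `MAX_INLINE_COMMENTS` threshold, the `(comment_id, permalink, html_body)` tuple shape, and the div-per-comment markup (borrowed from grouse's earlier suggestion) are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical "won't liquify the server" threshold; not from the thread.
MAX_INLINE_COMMENTS = 5000


def serve_or_queue(cache_dir, user_id, comment_count, queue):
    """Decide what to do when a user requests their dump (steps 1-2)."""
    if (Path(cache_dir) / f"{user_id}.html").exists():
        return "serve"          # cached file exists: just include it
    if comment_count > MAX_INLINE_COMMENTS:
        queue.append(user_id)   # too big to build now: do it in the next batch
        return "queued"
    return "generate"


def append_new_comments(cache_dir, user_id, comments, last_dumped_id):
    """Append comments newer than last_dumped_id to the cached file (step 3).

    comments is an iterable of (comment_id, permalink, html_body) tuples.
    Returns the highest comment ID now present in the file.
    """
    new = sorted(c for c in comments if c[0] > last_dumped_id)
    if not new:
        return last_dumped_id
    path = Path(cache_dir) / f"{user_id}.html"
    with path.open("a", encoding="utf-8") as f:
        for comment_id, permalink, body in new:
            f.write(f'<div class="comment" id="c{comment_id}">\n')
            f.write(f"{body}\n")
            f.write(f'<a href="{permalink}">permalink</a>\n</div>\n')
    return new[-1][0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        pending = []
        print(serve_or_queue(cache, 42, 10, pending))
        last = append_new_comments(
            cache, 42, [(1, "http://example.com/thread#1", "hello")], 0
        )
        print(serve_or_queue(cache, 42, 10, pending), last)
```

Tracking the last dumped comment ID keeps each batch run proportional to the new comments rather than to a user's whole history, which is the point of step 3.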


You said "dump."
posted by Eideteker at 5:23 PM on April 14, 2008 [4 favorites]


This is a great idea. I'd put a blue ribbon on this pony if it were a literal one.
posted by ignignokt at 5:28 PM on April 14, 2008


I don't want to host an API so JSON or XML will not be available. Since most people want something in notepad they can control-F and read, we can come up with a compromise that is easy to read and search, and if you're a perl hacker that wants to stuff it in a db, it could be possible to do that as well. I suspect we'll do something with unique breaks between comments (I suggest 10 dashes, followed by a space, with a line break before and after).

I suppose we could just intermix comments from all subsites in one big file for you, ordered by posting date and every comment would have a permalink so you could see it in context again.
posted by mathowie (staff) at 5:48 PM on April 14, 2008


pb: I don't think we're opposed to a data dump philosophically. I think the sticking point is finding a format that will work. Do you have a format you'd like to see your comments in?

Sky writing.
posted by loiseau at 5:52 PM on April 14, 2008 [12 favorites]


First, it would provide us with a permanent archive of our contributions here

Eh. when making contributions here I'm not looking to see it archived forever and ever.

Second, it would make it easier for us to do full-text searching of our own comments/posts/MeMails/etc.

Yeah can do that already.


Third, it would establish MetaFilter as a shining example to others of how a community that depends on its users for content generation should treat those users, and make it easy for them to do whatever they want with their own content they contributed.

I'm not sure what else I can do with the many comments I've made about specific subjects.

Not opposed to this idea, but it and the rationales seem odd. What I do here is meant for use here, so using it somewhere else just seems really strange to me.
posted by Brandon Blatcher at 5:54 PM on April 14, 2008 [1 favorite]


To say MeFi "depends on its users for content generation" whilst at the same time requesting a user comment dump (i.e. not a post dump) eerily implies that MeFi is about The Discussion, which of course we all know it's not, right? Right?

Damn loiseau, I was going to go with 'format of choice - tattooed on pb'
posted by cosmonik at 5:54 PM on April 14, 2008


I don't want to host an API so JSON or XML will not be available.

Perhaps I'm not understanding your acronyms, but what would hosting an Application Programmers Interface have to do with serving XML pages? Just have a cgi-bin that takes a username as an argument and dumps the XML output. People can Save-To-File as necessary.

Needless to say I'm making a big time vote for XML here. I spend too much time scraping HTML as it is.
posted by tkolar at 6:10 PM on April 14, 2008


I don't understand a word of the technical nitty-gritty, but I think it's a cool idea.
posted by rtha at 6:39 PM on April 14, 2008


Ooh, maybe also we could have the option of paying some sum to order a printout of all of our detritus to be shipped to us printed on fancy Metafilter stationery! Would that be feasible? Also, might it be possible to have a big container of hand lotion included in each order? Preferably the kind with the built-in pump.

(GAG ME WITH A SPOON in re: this horrible idea.)
posted by 1 at 6:40 PM on April 14, 2008


tkolar, this isn't intended to be first and foremost parseable by programmers, but instead readable by humans, with the ability to parse the readable text second. I would argue a giant XML file is mostly noisy junk to regular non-programmer folks. And I don't want to host a free-for-all API that lets you query anyone else's content but your own. I don't want to see mirrors of the entire site somewhere else plastered in ads. You'll be able to request a text file of your own comments and posts in a flat file and that's pretty much it. It should satisfy the original request to create a simple searchable text file for use by someone to look at their own history.
posted by mathowie (staff) at 6:42 PM on April 14, 2008 [1 favorite]


most of the comments i'd actually want have been deleted.
posted by quonsar at 6:50 PM on April 14, 2008 [6 favorites]


I really like this idea, but I think it would be much more useful to me personally if you could parse my posts and comments between the scrape and the dump such that they receive a complete copy edit, and then replace most of them with something Miko or robocop is bleeding wrote.
posted by It's Raining Florence Henderson at 6:50 PM on April 14, 2008


Metafilter 2.0 = robomiko is reading, with gradients.
posted by Brandon Blatcher at 6:52 PM on April 14, 2008


The file should be called IAmVeryCleverAndSpecial.txt
posted by Artw at 6:59 PM on April 14, 2008 [3 favorites]


BTW, that Markov Chain thing of cortex's that I was abusing the other day looks very much like it works off of a feed of user comments.
posted by Artw at 7:02 PM on April 14, 2008


specialsnowflake.html
posted by desjardins at 7:03 PM on April 14, 2008


most of the comments i'd actually want have been deleted.

Maybe we could pay extra for those.
posted by timeistight at 7:04 PM on April 14, 2008




----------



Like


----------


this?


----------


I


----------


am


----------


a


----------


jerk


----------


to


----------


myself!


----------


posted by klangklangston at 7:21 PM on April 14, 2008


*gags 1 with a spoon*

BTW, that Markov Chain thing of cortex's that I was abusing the other day looks very much like it works off of a feed of user comments.

It's working directly from the database, actually. Which is more or less what this request would be doing, too, except without all the algorithmic fucking up after the fetch.
posted by cortex (staff) at 9:14 PM on April 14, 2008 [1 favorite]


iamaspecialsnowflakeDATEGOESHERE.html

Jebus, don't you people do ANY archival work?
posted by SlyBevel at 9:17 PM on April 14, 2008


Ooh, maybe also we could have the option of paying some sum to order a printout of all of our detritus to be shipped to us printed on fancy Metafilter stationery! Would that be feasible? Also, might it be possible to have a big container of hand lotion included in each order? Preferably the kind with the built-in pump.

What is wrong with you
posted by Pope Guilty at 9:20 PM on April 14, 2008


Dry hands, apparently.
posted by It's Raining Florence Henderson at 9:37 PM on April 14, 2008


I would like my comments laser-etched on to the face of a diamond, encased in lead, then shot into the depths of space at sufficient velocity to escape the gravitational pull of the sun. I would then like to track the path of said object in real-time, with updates on location and velocity appearing on my user page. Eventually, once the distance from the sun exceeds that of the Voyager and Pioneer probes, I want everyone on earth to use it as a rough measure of interstellar distances. As in: "See that star? That star is 17.7 BBC's from earth (blue_beetle's comments)." One day a race of hyper intelligent machines will find those comments, incorporate them into their programming, and send an emissary to earth. That emissary will attempt to make contact with my descendants, but due to the sarcastic and snarky nature of the communication (based on my comments), some sort of interstellar war will break out, leading to the utter destruction of earth, the human race, and all we've ever known.

And I will call that pony Death, and she will be swift and pale.
posted by blue_beetle at 9:47 PM on April 14, 2008 [12 favorites]


and general asshattery.

1 appears to be the essence of the angry kid who always thought he was smarter than everyone around him, which is why no one would hang out with him, because he intimidated them with his obvious superior intellect and snark.

But on a separate note, this sounds like a pretty fun idea in and of itself.
posted by mrzarquon at 9:50 PM on April 14, 2008


1 appears to be the essence of the angry kid who always thought he was smarter than everyone around him, which is why no one would hang out with him, because he intimidated them with his obvious superior intellect and snark.

MetaFilter: Find your archetypes here.

Also, that sounds like the kind of copy that would go on the back of an action figure blister pack or a collectible card game card.
posted by ignignokt at 9:58 PM on April 14, 2008


which ironically would be something that character would collect himself.

self referential archetypes.
posted by mrzarquon at 10:02 PM on April 14, 2008


Would it be possible to write a function to re-attribute all my posts to someone else? (I don't really care if they want them or not.)
posted by loiseau at 10:34 PM on April 14, 2008


This would be a great idea for reminding those of us that need it what an arsehole we can be at times.

Well, me anyway.
posted by dg at 10:46 PM on April 14, 2008


Ooooh, I remember asking for this years ago. I dunno what I'd do with it exactly (foulmouthed wonderchicken lexical corpus to upload and inform the silicon sensibilities of the Artificial Wonderchicken Intelligences that will one day roam the earth, eliminating the last few outposts of human life, maybe), but I'd sure love to have it!
posted by stavrosthewonderchicken at 11:22 PM on April 14, 2008


So pb built a prototype we'll release in a day or two. It took about five minutes to download my more than 10,000 comments posted to date into a 6 MB text file, and I have to admit it's kind of cool to be able to jump to anything I've said in the past with a quick find command.
posted by mathowie (staff) at 11:35 PM on April 14, 2008


I hereby declare Pony Success.
posted by cosmonik at 11:47 PM on April 14, 2008


I vote awesome.
posted by allkindsoftime at 11:47 PM on April 14, 2008


That is awesome, pb and Matt, and I can't wait to play with it. Thanks for being so open-minded.
posted by grouse at 11:54 PM on April 14, 2008


You're saying "text file" but I'm thinking that much of what I post here is useful because it's got links in it. Is that text file actually going to be HTML?
posted by flabdablet at 2:36 AM on April 15, 2008


This would be a great idea for reminding those of us that need it what an arsehole we can be at times.

Can I have mine filtered to remove my obnoxious drunken shite? I'm pretty sure I said something worth saying back in 2006.
posted by Sparx at 4:08 AM on April 15, 2008


mathowie: So pb built a prototype we'll release in a day or two.

Wow, two days from request to new shiny pony. That's pretty efficient. Hats off to you, gentlemen.
posted by sveskemus at 5:35 AM on April 15, 2008


I dunno what I'd do with it exactly

I shall engrave my words on copperplate, print them on Arches Moulin du Gué paper, and bind them in calf.
posted by octobersurprise at 6:07 AM on April 15, 2008


I don't want to host an API so JSON or XML will not be available.

What does an "API" have to do with the format of the output? And JSON is hardly "noisy" or "unintelligible", it was designed to be readable. And no one asked for anyone else's transcript, certainly not me, so what's up with the angry response. ?????
posted by Blazecock Pileon at 6:34 AM on April 15, 2008


You're saying "text file" but I'm thinking that much of what I post here is useful because it's got links in it. Is that text file actually going to be HTML?

It'll be a raw text file; some of that text will consist of html code. You could load the whole thing into your browser and have the links (and other formatting) work, but it'd also collapse the whitespace into an awful mess.

If someone is feeling generous, they could probably end up throwing together some sort of post-processing script to nice-ify the raw text output into an okay html view by adding in some linebreak tags / etc, but with the basic file you'll just need to copy and paste your links (or use a text editor that will autolink raw urls, I guess).
posted by cortex (staff) at 6:44 AM on April 15, 2008
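
A post-processing script of the kind cortex describes could be quite small. This is a minimal sketch, assuming the export uses a line of five dashes between comments and puts each permalink on its own line (as in the sample pasted later in the thread); `export_to_html` is a made-up name, not a tool anyone in the thread released.

```python
import html


def export_to_html(text: str) -> str:
    """Turn a raw comment export into a browsable HTML page.

    Comment bodies are passed through unescaped on purpose: the export
    already contains the HTML the user originally typed, links included.
    """
    out = ["<html><body>"]
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "-----":
            # Separator between comments becomes a horizontal rule.
            out.append("<hr>")
        elif stripped.startswith("http://") and " " not in stripped:
            # A bare URL on its own line (e.g. the permalink) becomes a link.
            href = html.escape(stripped, quote=True)
            out.append(f'<a href="{href}">{href}</a><br>')
        else:
            # Everything else keeps its line break via an explicit <br>.
            out.append(line + "<br>")
    out.append("</body></html>")
    return "\n".join(out)
```

Adding `<br>` per line is the "linebreak tags" fix for the collapsed-whitespace problem; a comment whose body happens to contain a line of exactly five dashes would be split by this sketch.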


On the one hand, I like the idea that some of the articulate and creative people here will be able to see all of their comments in one clean interface.

On the other, I realize that I've made far too many stupid and useless comments to fall into that category. Better for me to just forget most of what I've written and only hold onto the good bits.

Cool idea though.
posted by quin at 7:33 AM on April 15, 2008


I hope this thing allows me to send all my comments to everyone else.
posted by Mister_A at 8:29 AM on April 15, 2008


I prefer my comments to come in smoke signals from the Sistine Chapel.
posted by cowbellemoo at 8:46 AM on April 15, 2008


Do you have a format you'd like to see your comments in?

mbox.
posted by ikkyu2 at 9:18 AM on April 15, 2008 [1 favorite]


And I don't want to host a free-for-all API that lets you query anyone else's content but your own.
Yeh, this. The increasing permanence of online discussion is making me increasingly less likely to participate anywhere that doesn't have barriers to data mining.
posted by bonaldi at 9:52 AM on April 15, 2008


Here's the prototype mathowie mentioned: Export Your Comments. We set it up so you can run an export once every seven days so the server isn't overwhelmed with exports. And because there's no way for us to tell if a download was successful or not, you could get locked out for a week if you cancel the download once it starts. So just make sure you're ready when you click the "export comments" button.
posted by pb (staff) at 2:13 PM on April 15, 2008 [13 favorites]
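
The lockout pb warns about follows from recording the export time when the download starts rather than when it finishes. A minimal sketch of that kind of cooldown check, assuming an in-memory dict of last-export times (the names `request_export` and `reset_export` are hypothetical; the real site presumably stores this in its database):

```python
from datetime import datetime, timedelta

EXPORT_COOLDOWN = timedelta(days=7)


def request_export(last_exports, user, now):
    """Return True and start the cooldown if the user may export now.

    The timestamp is recorded up front because the server cannot tell
    whether the download later completed, so a cancelled download
    still counts as the user's export for the week.
    """
    last = last_exports.get(user)
    if last is not None and now - last < EXPORT_COOLDOWN:
        return False
    last_exports[user] = now
    return True


def reset_export(last_exports, user):
    """Admin override: clear the cooldown so the user can try again."""
    last_exports.pop(user, None)
```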


Thanks pb!
posted by timeistight at 3:04 PM on April 15, 2008


Thanks! I'm going to do this when I get home...
posted by rtha at 3:42 PM on April 15, 2008


I don't know if I did something wrong, but I downloaded it and then when I clicked to view the file on my computer all I got was this:


Sorry—Can't Export

Your last export was April 15 at 03:49 and you can only export your comments once every seven days. If you feel this message is an error, please contact the site admins to let them know.


and I'm sure that I only hit download once. I did use DownThemAll when doing it so that might have screwed something up.
posted by lilkeith07 at 4:19 PM on April 15, 2008


For those who are curious the downloaded contents look like this:

2008-04-15 14:47:40.403
http://www.metafilter.com/70858/America-the-Godly#2081804
Your lack of sentence structure!
-----
2008-04-15 14:44:10.357
http://www.metafilter.com/70858/America-the-Godly#2081803
Darth Puppy finds your lack of distressing.
-----
2008-04-15 11:27:24.09
http://metatalk.metafilter.com/16114/25-The-Train-Episode#535612
Hey, Georgia has trees, ok?
-----

I only had 1.6 MB of comments though.
posted by Brandon Blatcher at 4:32 PM on April 15, 2008


On the above, the Darth Puppy link was in regular, and obviously working, html code.
posted by Brandon Blatcher at 4:35 PM on April 15, 2008
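
For anyone who wants to load the export into something more structured than a text editor, the sample Brandon Blatcher pasted suggests a simple record layout: a timestamp line, a permalink line, the comment body, then a line of five dashes. The sketch below parses that layout under those assumptions; the `Comment` type and the function names are invented here, and the timestamp format string is inferred from the three sample records only.

```python
from datetime import datetime
from typing import List, NamedTuple

SEPARATOR = "-----"


class Comment(NamedTuple):
    posted: datetime
    permalink: str
    body: str


def _flush(record: List[str], comments: List[Comment]) -> None:
    """Convert the lines of one record into a Comment, if it is complete."""
    nonblank = [i for i, line in enumerate(record) if line.strip()]
    if len(nonblank) < 2:
        return  # empty chunk, e.g. after the final separator
    timestamp = record[nonblank[0]].strip()
    permalink = record[nonblank[1]].strip()
    body = "\n".join(record[nonblank[1] + 1:]).strip()
    # %f accepts 1-6 fractional digits, so ".09" and ".403" both parse.
    posted = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S.%f")
    comments.append(Comment(posted, permalink, body))


def parse_export(text: str) -> List[Comment]:
    """Split an export into comments on lines consisting only of dashes."""
    comments: List[Comment] = []
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == SEPARATOR:
            _flush(record, comments)
            record = []
        else:
            record.append(line)
    _flush(record, comments)
    return comments


def search(comments: List[Comment], needle: str) -> List[Comment]:
    """Case-insensitive substring search, so partial words match too."""
    needle = needle.lower()
    return [c for c in comments if needle in c.body.lower()]
```

A call such as `search(parse_export(open("comments.txt").read()), "georg")` (with `comments.txt` standing in for whatever the downloaded file is called) would cover the partial-word searching grouse asked for. A comment whose body contains a line of exactly five dashes would be split in two by this sketch.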


lilkeith07, I don't think you'll be able to use a download manager to grab the file. I'm not even sure how DownThemAll would come into play with a form post. Anyway, I reset your account so you can try again. If you can disable that extension, you might try that.
posted by pb (staff) at 4:44 PM on April 15, 2008


Thank you!
posted by Pope Guilty at 6:39 PM on April 15, 2008


Thanks, guys.
posted by stavrosthewonderchicken at 8:00 PM on April 15, 2008


I guarantee that I will never search for this comment. It has no content and isn't even funny.
posted by iamkimiam at 8:41 PM on April 15, 2008 [1 favorite]


Not that I know a damn thing about how these infodump things work, or how databases are constructed, or anything, but in my download, I get a few doubles (which did/do not appear in the original). Most curiously - so far, anyway - I got this:

-----

2007-01-12 16:33:49.573

http://ask.metafilter.com/54929/#827141

Break up with her.

-----

2007-01-12 16:33:49.573

http://travel.metafilter.com/6432/#38375

Break up with her.

---

How did that happen? I do remember - vaguely - posting in that deleted askme thread, but how did it move over to travel.metafilter?
posted by rtha at 12:54 PM on April 16, 2008


We moved a bunch of travel-related threads and comments from Ask over to Travel. And those threads that weren't travel-related but happened to make it over were deleted. But we didn't delete the comments associated with those threads. So you hit a glitch with the system.

For now I'll take Travel out of the comment export, and restore it once Travel is officially live.
posted by pb (staff) at 1:25 PM on April 16, 2008


I just got 2.98 MB worth of my comments. All I can say is that 1.) it's pretty humbling to have six years' worth of writing all laid bare before me, and 2.) I'm kinda inarticulate a lot of the time.

Credit due though, it came down in less than a minute and worked perfectly, so excellent work on the technical side as well as the giving us cool little horses one.
posted by quin at 2:22 PM on April 16, 2008


Glitch! Glitch! I hit a glitch!

(Hope it's okay - it jumped up and ran away, so I figured it was fine.)
posted by rtha at 2:55 PM on April 16, 2008

