Metafilter Uber-API. Possible? July 28, 2011 10:32 PM

It's been a while since we discussed the possibility of a metafilter API. Is there any news on this front? Is it worth reconsidering?

In the past I've used the rss feeds to harvest metafilter data, though I have noticed there are a few pages with character encoding issues/entity issues, and the parsing of user ids and timestamps is bloody awful.

I would love to see something like:
{ 'post_title' : {{ post_title }},
  'id': {{ post_id }},
  'user_id': {{ post_user_id }},
  'post_body': {{ post_body }},
  'more_inside': {{ more_inside }},
  'date': {{ date_published }},
  'deleted': {{ is_deleted }},
  "comments" : [
    {'id': {{ comment_id }},
     'date': {{ comment_date_published }},
     'user_id': {{ comment_user_id }},
     'comment': {{ comment }},
     'deleted': {{ comment_is_deleted }}
    },
    ...
  ]
}
Here is my thought: Start with a read-only system using the "Accept" header to specify that JSON be returned. Start with a basic set of returned fields for posts and comments and then slowly whitelist more fields to be returned, per your comfort level of course. Only logged-in users could use the API, and I would think that use of the API for commercial purposes (and such) should require your permission and perhaps remuneration. In fact, developing an API could eventually be a good money-making venture for the site.
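For illustration, the server side of that could look something like this toy sketch (the `negotiate` function, the sample data, and the HTML fallback are all invented; this is not how MeFi actually renders anything):

```python
import json

def negotiate(accept_header, post):
    """Pick a representation based on the client's Accept header.

    Returns (content_type, body): JSON when the client asks for it,
    an HTML rendering otherwise.
    """
    if "application/json" in accept_header.lower():
        return "application/json", json.dumps(post)
    return "text/html", "<h1>%s</h1>" % post["post_title"]

sample = {"post_title": "Metafilter Uber-API. Possible?", "id": 1}
ctype, body = negotiate("application/json, */*", sample)
print(ctype)  # application/json
```

The point being that the same URLs keep serving HTML to browsers; only clients that explicitly ask for JSON get it.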

One question, though: is MetaFilter ready for an API, and does it make sense to open the data-gates in this fashion? I think a good compromise is a read-only API system in the interim. Personally, I don't want to see MetaFilter damaged by any hasty transitions; it is a great asset. So of course we must defer to good judgment in this matter.
posted by kuatto to Feature Requests at 10:32 PM (55 comments total) 1 user marked this as a favorite

We have the infodump and RSS feeds and that's as comfortable as we are with sharing data at the moment. We're very careful about site culture, and we feel like comments belong within the context of threads on MetaFilter. An API like this is for displaying that content somewhere else, and we'd rather not go down the road right now. It's a nice idea in theory, but what do you want to build with it? Maybe we could discuss some concrete examples of applications we would see with an API.
posted by pb (staff) at 10:40 PM on July 28, 2011 [6 favorites]


Well, for my part, I would like to write an iPhone app using this notional API that replaces everything cortex says with 'BUTTS LOL'.

Because it makes me laugh when he says that!
posted by stavrosthewonderchicken at 10:44 PM on July 28, 2011 [9 favorites]


I know, I know: Greasemonkey.
posted by stavrosthewonderchicken at 10:44 PM on July 28, 2011 [2 favorites]


butts lol
posted by cortex (staff) at 10:59 PM on July 28, 2011 [40 favorites]


And yeah, what pb said. Part of me loves loves loves the idea of having a general API for the site because, woo, data manipulation, yay! But on the other hand it's a great big Pandora's Box, even setting aside implementation and resource issues, and for all my generic enthusiasm about the idea over the years I've never come up with an actual "this is how this will be a net gain for Metafilter as a community" vision for an API.

With the Infodump and some of the Corpus stuff I've been working on, we can at least get a tasty chunk of data out there in a more historic sense, but that's all been intentionally derivative of site activity rather than just dumping raw db content out on demand. I'm always interested in possible additional projects in that vein, but we've been erring intentionally on the side of caution with what gets added to the public dump.

Folks interested in doing specific research or data analysis projects are totally welcome to contact me and we can talk about one-off approaches to things. That's been the case for a few academic projects, and it works out pretty well since (a) there's a context in which someone has enough at stake educationally/professionally that giving them some extra data isn't nerve-wracking and (b) it means someone's got enough of an interest in both the site and their idea that they're willing to contact me and talk about it.
posted by cortex (staff) at 11:06 PM on July 28, 2011 [3 favorites]


Imagine someone writes a jabber chatroom gateway using your API, meaning people view their comments as IMs, much like facebook messages now. Instant reply notification using a read-only API might encourage fights & chat too. Bad idea!

An infrequently updated rss or opt-in email digest for threads you've favorited maybe? meh
posted by jeffburdges at 11:08 PM on July 28, 2011


{ 'post_title' : {{ post_title }},
'id': {{ post_id }},
'user_id': {{ post_user_id }}
'post_body': {{ post_body }},
'more_inside': {{ more_inside }},
'date': {{ date_published }},
'deleted': {{ is_deleted }},
"comments" : [
{'id': {{ comment_id }},
'date': {{ comment_date_published }},
'user_id': {{ comment_user_id }},
'comment': {{ comment }},
'deleted': {{ comment_is_deleted }}
},
...
]
}


That has to be some of the worst ASCII art I've ever seen.
posted by dg at 11:55 PM on July 28, 2011 [16 favorites]


It's supposed to be an archaeopteryx looking dejected.
posted by pracowity at 12:20 AM on July 29, 2011 [9 favorites]


If you squint and look at it sideways, you can just make out that it still looks like shit.
posted by Blazecock Pileon at 1:07 AM on July 29, 2011 [1 favorite]


I'm in favor of an API as long as every call has to pay five bucks.
posted by twoleftfeet at 2:23 AM on July 29, 2011 [2 favorites]


I don't think it's a matter of Metafilter being "ready" for an API, like an API is a logical next step following from the level of maturity of the site. MeFi works as well as it does because of the firm but fair hand the ops have on the interface end. If someone takes away my Chromed Bird on Twitter I'll cut them, I swear, but I don't want to see the same thing happen here.
posted by chmmr at 2:51 AM on July 29, 2011 [1 favorite]


Sometimes I wish you people would just speak in a language I can understand. Insert grumpy luddite frown here.
posted by malibustacey9999 at 4:32 AM on July 29, 2011


Hah. Wanting to learn to use django, I spent some time this year building a Magic the Gathering type online game where each playing card represented a MeFi user, whose fighting characteristics were based on their profile info and stats. I got pretty far, but abandoned it because 1) it probably wasn't going to be that much fun 2) I was pretty sure the mods wouldn't approve 3) I felt kind of devious scraping offline MeFi profiles and parsing the HTML 4) I got busy with other stuff.

So this would have made that development a little bit easier - take that as a good or bad thing as you will.
posted by Salvor Hardin at 4:51 AM on July 29, 2011 [3 favorites]


I did become familiar with django, so anyhow it was a win for me.
posted by Salvor Hardin at 4:52 AM on July 29, 2011


For some reason I read this as asking for a Metafilter IPO. And I was like 'What the hell...?'
posted by shakespeherian at 5:10 AM on July 29, 2011

When Brandon Blatcher is put into play, all other MeFites in play at that moment become spouses. Mefites put into play after Brandon Blatcher are not spoused. The player can tap Brandon Blatcher and spend 3 karma points to extract +1 favorites from each spouse in play, or spend 10 favorites to spouse any unspoused MeFites in play.

Note: Brandon Blatcher is immune to Giant Donut, but takes 2x DMG from Ray of DTMFA.
posted by BeerFilter at 5:13 AM on July 29, 2011 [10 favorites]


I'm a big fan of open data, a big fan of APIs, a huge fan of JSON, and now officially a big fan of kuatto. But I can't think of many uses that this would have that the infodump (possibly with a little expansion) wouldn't have. There are some neat scripting opportunities, but I think that cleaning up the HTML to be more semantic and standardized (cross-subsites) would be more helpful and have fewer pitfalls.

One time, I got so annoying in a thread about MeFi's HTML that Matt called my bluff and told me to do it. I should take him up on that, four years later.
posted by Plutor at 5:33 AM on July 29, 2011


The point of an API is largely that you can't predict what people will do with it. You make the API available and people come up with stuff. If the stuff is abusive or bandwidth intense, you turn off their access.

Personally I have intermittent fantasies about a MeFi stats thing. Without an API though, there isn't any way to authenticate users against their MeFi user accounts, so that's going nowhere.
posted by DarlingBri at 6:11 AM on July 29, 2011 [1 favorite]


Personally, I'd love to have an API so I could build a custom MetaFilter client. That way, I could implement all of the features and usability stuff that I want.

However, the mods have strong opinions with regard to the way that people use the site, and exposing a robust API would remove their ability to control that. For example, the mods have said "no no no no no" when it comes to killfiles. If I had access to a good API and I were writing my own client, killfile functionality would be one of the first things I implemented. Why? Because I think it has value and a lot of other users have expressed that they do, too.

This is the sort of can of worms that is opened by making an API available, and it's part of why I think it probably won't happen.

Please note that none of this is intended to be criticism.
posted by DWRoelands at 6:27 AM on July 29, 2011 [1 favorite]


"comments" : [
{'id': {{ comment_id }},
'date': {{ comment_date_published }},
'user_id': {{ comment_user_id }},
'comment': {{ comment }},
'deleted': {{ comment_is_deleted }}
},


This is all just a ploy to be able to read deleted comments, isn't it? I'm on to you.
posted by Johnny Assay at 6:27 AM on July 29, 2011 [1 favorite]


DWRoelands: "For example, the mods have said "no no no no no" when it comes to killfiles."

Well, yes, but there are a lot of userscripts for Metafilter, including a killfile script. So that horse has already left the barn, so to speak.

I do agree with your general point that the way the site runs influences how the community works, and a public API could mess with that. If there were specialized MeFi clients, we'd be having different experiences with Metafilter: perhaps I'd be using Client A that would allow for inlined images, you'd be using Client B that muted certain posts or users, etc, and that would fracture the shared experience—there'd be a camp of people using Client A that were posting inlined images as gags that other users didn't share in.

Again, this kind of thing is already happening with userscripts, but that's outside the mods' control. A public API is in their control. I can imagine good and bad coming from it, and I respect their conservatism regarding it.
posted by adamrice at 7:19 AM on July 29, 2011


APIs are tricky things to make. It's really easy to do it wrong & expose a part of the system you didn't intend to, leave open a door you didn't mean to. I'll add my voice in caution. It may be obvious but security's got to be an issue on this.
posted by scalefree at 7:33 AM on July 29, 2011


Totally off-topic, but this


Well, for my part, I would like to write an iPhone app using this notional API that replaces everything cortex says with 'BUTTS LOL'.

Because it makes me laugh when he says that!
posted by stavrosthewonderchicken at 1:44 AM on July 29 [1 favorite]

-------

butts lol
posted by cortex at 1:59 AM on July 29 [10 favorites]


is one of the great things about metafilter.
posted by oddman at 7:52 AM on July 29, 2011 [2 favorites]


Would you like to comment on the article "Furry's, the new world order"? Please create an account or login with your MetaFilter account.

--

Hello Jessamyn, I see you're a member of MetaFilter, would you like to see our display of viking helmets?

--

User cortex is also watching the video "naked blondes with banjos"! Would you like to chat with him?

--

You have over 250 comments in the category Health and Fitness on AskMetafilter, congratulations! Click here for your chance to win a pedometer!
posted by Brandon Blatcher at 8:21 AM on July 29, 2011 [1 favorite]


Well, yes, but there are a lot of userscripts for Metafilter, including a killfile script. So that horse has already left the barn so to speak.

Yes, this is my opinion as well. And this is why a *read-only* API addresses most (if not all) of these concerns about site culture, etc. I don't think the site would change if users' scrapers employed a read-only API versus polling the content directly. It's just a matter of clarity and elegance.

It's a nice idea in theory, but what do you want to build with it? Maybe we could discuss some concrete examples of applications we would see with an API.

I would like a fresh copy of the site data so I can build experiments in search and language processing. The infodump lacks dumps of the actual info, so I have written various scrapers and crawlers, but they suffer from a big problem: polling for new data (especially comments) is really hard. It essentially amounts to fast polling across a moving 30-day window of posts, which is *super* lame. (Anyone out there have a better way to scrape comments?)

With a read-only service, this would be really easy: give me the latest comments (with a reading cursor) starting from comment x, paginated in blocks of 100, or 1000. The resources and effort expended by everyone (including the server!) would be minimal. Is it a question of resources? I would help!
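The reading-cursor loop is trivial on the client side. Here's a sketch with the network call stubbed out by a fake in-memory source, since the endpoint itself is purely hypothetical:

```python
def drain(fetch, start=0, limit=100):
    """Pull everything newer than `start` by advancing a cursor.

    `fetch(since_id, limit)` stands in for the hypothetical API call:
    it returns up to `limit` comments with id > since_id, ordered by id.
    """
    out, cursor = [], start
    while True:
        batch = fetch(cursor, limit)
        if not batch:
            break                    # caught up; poll again later
        out.extend(batch)
        cursor = batch[-1]["id"]     # the reading cursor
    return out

# Fake data source standing in for the server:
COMMENTS = [{"id": i} for i in range(1, 251)]

def fake_fetch(since_id, limit):
    newer = [c for c in COMMENTS if c["id"] > since_id]
    return newer[:limit]

print(len(drain(fake_fetch)))  # 250, fetched in blocks of 100
```

Compare that to fast-polling a 30-day window of RSS feeds: the server does three cheap indexed queries instead of re-rendering hundreds of pages.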

The benefit for you guys lies in a culture of experimentation and development; new technologies and uses will emerge that could enhance the site. There is always a danger in the "new", but think of it this way: the Internet is changing rapidly, your userbase is changing rapidly, the "new" is coming! And why should MetaFilter remain the same, forever? How can it remain the same? I'm not suggesting that we destroy some beautiful thing here in an attempt to provoke change, far from it. What I'm suggesting is that we make MetaFilter more useful, more valuable, more indispensable, and ready to face the horizon of possibility. MetaFilter is relatively stable right now, I agree, but we all know the internet is a harbinger, and of what we cannot tell. Let us not be fearful of consequence, even as it is upon us right now!

Thank you, amen.
posted by kuatto at 9:16 AM on July 29, 2011


Well, again, if you're interested in doing a specific thing with site data you're welcome to drop me a line and talk about it a bit. It wouldn't be the first time, and to the degree that it's something that might have some value or interest to the community at large I'm generally pretty down with such stuff. It'd also mean cleaner data and less of us feeling slightly ooky about ongoing scraping of the site.

I'm a big fan of the new and the future, and I'm down with you on the general enthusiasm, but "the future!" is not by itself a compelling reason for us to leap into implementing an API service. Ideas need to come first, at least where we are in the present.
posted by cortex (staff) at 9:26 AM on July 29, 2011


For Mix Party, I'd like to get JSON rather than XML for the 50 random MeMusic tracks, and maybe add tags or something so I can filter out holiday music while it's still not that time of the year.

But I can live with what I have because it is working, and nobody is really visiting Mix Party at the moment anyway. :(
posted by narwhal bacon at 9:41 AM on July 29, 2011


Cortex, look at it this way. MeFi as an institution has been very, very super reluctant to make UI changes. Fine; MeFi is fine and works and whatever, but with an API, some of us who want to could build different UIs, or apps, or 3rd party services, or... whatever. Things that are literally unimaginable because there are currently no services to imagine them against.

I am sort of at a loss as to how you could look around at Google and Amazon and last.fm and flickr and Posterous and all of the services that offer APIs, and all of the cool things people have built on them, and not think this was a really good use of MeFi's resources.
posted by DarlingBri at 9:54 AM on July 29, 2011


DarlingBri, the sites you mentioned are not unified communities sharing sometimes very personal stories and information. They are utilities. Maybe MetaFilter could become more like a utility, and maybe that would improve the community here, but we're not sure that's the case.

Building an API is no small task for such a small team. It requires the design time, build time, documentation time, and then added support time indefinitely in the future. We're not opposed to doing that if we feel there are enough benefits to the community. I personally love the idea of build it and the benefits will appear, but we don't have the excess time or manpower to justify that kind of faith. Combine that with the hesitation of the bad uses we can imagine, and it doesn't make sense to us right now.
posted by pb (staff) at 10:01 AM on July 29, 2011 [1 favorite]


We're not a behemoth and we're not a monetization-focused service-provider, which are two really big problems with making a comparison to those others. It is actively in their best interests to provide a ton of service hooks because they're all, to some extent, in the business of trying to become a ubiquitous presence in their service markets so they can continue to grow, grow, grow their userbases and solidify their hold on their respective markets and mindshare.

I can look at the APIs they've built, and the awesome stuff people have done with it, and be really glad that's happened, and still not have a clear picture of what Mefi's net gain from an API will be.

It's not that I fail to get the neatness of the unspecific possibilities; I've said more than once in this thread, and many, many times in the past, that I really like the idea of people doing cool stuff with Mefi-related data and content, and I have personally made an effort to make more data available than was available before I came onboard.

But there has to be something more than "but it could be cool!" to justify the design and implementation difficulty of putting a robust API together, the resource costs involved in servicing it, the policy and enforcement issues of making it work well for those who want to use it in good faith while also preventing misuse or overuse, and the great big tarpit of finding a balance between people who want data to be more easily retrievable and manipulable and folks who aren't actually terribly comfortable with that idea at all.

We're not a data outlet, we're a community site. We have limited resources and a desire not to bork the site or the community dynamic in service of untested notions, no matter how shiny, and an API is a very shiny but also a very big and very untested notion.

If people have ideas—not just the idea that future ideas will occur, but actual specific "this is a thing I wish I could implement right now, but for access to the data" ideas—they are totally welcome to get ahold of me to talk about it. If it's something that's practical and something we're okay with, I will seriously try to help them make it happen. If it's something that we're not so okay with, I'll explain why.
posted by cortex (staff) at 10:10 AM on July 29, 2011 [2 favorites]


API access should cost $50, weeding out the skimmers but granting access to people doing it for educational purposes. Each $50 payment should go directly to PB, who will add exactly one new API call for each $50 payment received. The first $50 contributor will receive getAllComments(postID[, skimMemes]), plus whatever the second API call PB feels like adding is.

skimMemes is boolean
posted by davejay at 10:37 AM on July 29, 2011 [1 favorite]


Building an API is no small task for such a small team.

Well, for a full-fledged API that may be true. But a read-only API that prints out JSON instead of HTML for just one or two data models is actually quite manageable. It's a very modest step, I think. The site is built in ColdFusion, right? I volunteer to help :)

...less of us feeling slightly ooky about ongoing scraping of the site.

Just curious, what are the policies towards site scrapers?

But there has to be something more than "but it could be cool!" to justify the design and implementation difficulty of putting a robust API together

I keep emphasizing a simple read-only API because that drives interest towards metafilter.com but eliminates tweet-like responses from mobile devices. The net effect is that more people are reading MetaFilter, and the userbase becomes more involved and committed to the community. I'm talking about building diverse connections within the MetaFilter community. Is that "something more"? This is the same philosophy behind the RSS feeds here on the site. However, I think that considering the data in terms of an API will focus internal development on what is important: metafilter.com and the data housed in it; this is a utility of the community.

So let me turn this conversation on its head: What kind of internal features could the site admins implement with an API? What kind of useful scripts have you written to get notifications, track events, etc.? I think this is to the point; the lexicon and grammar of a community reflects its tools.
posted by kuatto at 11:41 AM on July 29, 2011


Note: Brandon Blatcher is immune to Giant Donut...

This is exactly the kind of thing that worries me about an API, it'll be used to do blatantly false things.

But seriously, I can't think of a feature that an API will bring that's "HOLY MOLY MUST HAVE". The admins and pb are very conservative about implementing new features, and I think that's a good thing.

The benefit for you guys lies in a culture of experimentation and development

The community went nuts over whether to display favorites or not. We're still rocking the '90s website look. The site used to allow images and nixed that. It nixed user-made CSS pages. Various redesigns of the site have been mocked up and then not implemented. I don't think Metafilter is the place for cutting-edge web and database experimentation.
posted by Brandon Blatcher at 11:58 AM on July 29, 2011


Just curious, what are policies towards site scrapers?

We prefer people not do it much, and we holler at people who do it in a sketchy way. The latter camp includes spammy content repackagers and people who don't understand the concept of throttling, including the occasional hungry hungry spider that thinks trying to request a ton of pages all at once is a good idea.

Small scrape jobs for noble purposes aren't really a big problem, and insofar as they are done without any foolishness aren't particularly detectable, which is why on that side it's just "prefer people not do it much". We'd rather not people suck up giant chunks of the site, we'd rather people not maintain hand-assembled copies of large chunks of the site's content without us knowing about it and knowing why, that sort of thing.

If you've got something you're working on for which scraping is currently your only option, seriously, let me know; I might be able to both get you away from the scraping not-greatness issue and provide cleaner data directly.

I keep emphasizing a simple read-only api because that drives interest towards metafilter.com

More interest from whom? In what capacity? In what context? What's the expected outcome of generating that interest?

Is that "something more"?

Specific ideas, specific use cases, are something more. A restated form of "cool stuff could happen" is still just "cool stuff could happen". I like cool stuff, I'm not against it happening, and I'm really not trying to give you a hard time here, because I think in general we're probably excited about the same stuff.

What kind of internal features could the site admins implement with an API? What kind of useful scripts have you written to get notifications? Track events etc?

To the extent that we have internal tools that interact with the database, those could in principle be rewritten using an API if it existed, to what advantage I am not clear on. Adding another layer to existing tools to justify the existence of that added layer isn't really a big win for us. Aside from which, we're now talking about needing to have distinct access restrictions for classes of API users, since there's data we handle on the admin side that needs to never, ever be publicly available.
posted by cortex (staff) at 12:13 PM on July 29, 2011


More interest from who? In what capacity? In what context? What's expected outcome of generating that interest?

Think in terms of the RSS feed, the use-case is approximate.
posted by kuatto at 12:18 PM on July 29, 2011


JSON would be a nice start, but I'd also like to see MeFi's "Accept" header support YAML, BSON, ProtoBuf, MessagePack, Thrift, Avro, Hessian, XML, and ASN.1 XER.
posted by finite at 6:29 PM on July 29, 2011


cortex, re. corpus stuff: is the Markov generator ever coming back?
posted by Meatbomb at 6:48 PM on July 29, 2011


By the stones of the fathers, Markov will one day return.
posted by cortex (staff) at 7:36 PM on July 29, 2011


Majcher's old markov thing once spat out, for y6y6y6, the sentence "I'm only hoping that you'll hate me so
much that you'll lie awake in bed late and masturbate", which was just fantastic.
posted by kenko at 9:54 PM on July 29, 2011


There shouldn't really be a hard line break in there, though.
posted by kenko at 9:54 PM on July 29, 2011


By Grabthar's hammer, by the suns of Warvan, you shall have a viking helmet!
posted by arcticseal at 4:15 AM on July 30, 2011


I still have a Markovized quote of myself on my profile page. The day it returns will be a happy one.
posted by Johnny Assay at 6:45 AM on July 30, 2011


The mods seldom use one word where five hundred will be more confusing, but I think their answer here is 'no'.
posted by joannemullen at 8:45 PM on July 30, 2011


Why does it need to be in JSON? We already have per-thread RSS, but Atom is actually a much, much cleaner format. I would recommend that rather than a custom JSON system we simply add an Atom feed and make sure that the right data is there. You could also use AtomPub to push changes, so clients could keep up to date with new posts.

APIs are great, but when every site uses their own API to do, basically, the same thing it becomes somewhat of a pain. So I think it's best to use existing standards when possible.
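For instance, consuming an Atom feed takes only the standard library. This is a sketch; the feed below is invented sample data, not a real MeFi feed:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Metafilter Uber-API. Possible?</title>
  <entry>
    <id>tag:metafilter.com,2011:comment/1</id>
    <title>butts lol</title>
    <updated>2011-07-28T22:59:00Z</updated>
  </entry>
</feed>"""

def entries(feed_xml):
    """Return (id, title) for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [(e.findtext(ATOM + "id"), e.findtext(ATOM + "title"))
            for e in root.findall(ATOM + "entry")]

print(entries(SAMPLE))  # [('tag:metafilter.com,2011:comment/1', 'butts lol')]
```

Every feed reader and half the scripting languages out there already understand this format, which is the whole argument for not inventing a new one.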
posted by delmoi at 9:32 PM on July 30, 2011


I'd be cool with Atom. An RSS or Atom feed for new comments would also solve the polling problem.
posted by kuatto at 11:26 AM on July 31, 2011


You can use the RSS to get posts, and comments-latest.mefi to get the comments for those posts.

You'll probably need to reverse engineer the latter, but it should give you enough for whatever application you wish to write.
posted by seanyboy at 4:35 AM on August 1, 2011


Here's something I think would be killer: A user-centric view of MetaFilter. You'd give some app the usernames of users you want to 'follow', then it'd present a gestalt "Recent Activity" containing all of their comments and posts with links back to the threads they're from. Sorta like following someone in mlkshk, then seeing their stuff in your Friend Shake.

I was going to do something like this for myself, but I'd need my free time debt ceiling raised yet again.
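The merge itself would be the easy part. A sketch, with made-up items standing in for real user activity:

```python
import heapq

def recent_activity(streams):
    """Merge several users' newest-first activity lists into one
    newest-first feed. Items are (timestamp, username, text) tuples;
    the shape is made up for illustration.
    """
    return heapq.merge(*streams, key=lambda item: item[0], reverse=True)

a = [("2011-08-01 19:08", "ignignokt", "killer idea"),
     ("2011-07-29 10:01", "ignignokt", "earlier comment")]
b = [("2011-07-30 09:26", "cortex", "butts lol")]

feed = list(recent_activity([a, b]))
print([who for _, who, _ in feed])  # ['ignignokt', 'cortex', 'ignignokt']
```

The hard part is getting those per-user streams in the first place, which is exactly what there's no API for.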
posted by ignignokt at 7:08 PM on August 1, 2011


I was hoping to make a markov generator that dumpster-dived for most popular comments from a user, and then attempted to imitate said user at their best.

Then I learned the infodump doesn't contain comment text, and I'd need to webscrape for it. Which I'm not so good at, and where an API would be great, or an infodump with comment text built in. And scraping it doesn't seem so friendly, either.

But I don't even want to think how huge that file would be.

So, long story short, I'm debating doing it with Reddit once I learn "how do I parse JSON?" Yeah, I'm new to this stuff.
posted by mccarty.tim at 10:29 AM on August 2, 2011


On review, apparently someone already made that?

For the record, I was planning on using python, which I really want to learn to do something cool with. Maybe I'll just play around with pygame instead and leave the servers alone. They got problems of their own.
posted by mccarty.tim at 10:31 AM on August 2, 2011


There have been two websites that ran MetaFilter through Markov chains: Genefilter, by majcher, which ran for about a year, and MarkovFilter, by cortex, which disappeared in the Great Hacking of Aught-Nine. But that's not to say that another incarnation wouldn't be appreciated, and the idea of weighting by favourites is an interesting angle.
posted by Johnny Assay at 6:00 PM on August 2, 2011


Yeah, as I look over it, to take advantage of favorites weighting, it looks like the easiest to code solution is to slurp the IDs of the comments from the infodumpster, and then slurp those comments from MeFi. Put them all in a big list, and you got yourself a markov stew going.

I foresee settings including a favorites threshold (excluding comments with fewer than, say, 2 favorites), a userID number (I could search for names, but that's a pain and I assume mefites are savvy enough to copy that part of the URL), and a length.

Course, I'm not very good at webscraping or web frameworks yet, but I am playing with Scrapy and Django, and I like this idea, so maybe I'll post it to the gray or MeFi Projects if I ever bring it to fruition.
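The chain itself is the simple part. Here's a toy favorites-weighted version with fake comments; real text would come from the scrape described above:

```python
import random
from collections import defaultdict

def build_chain(comments):
    """Word-level Markov chain from (text, favorites) pairs. A
    comment's transitions are counted once per favorite, which is
    the crudest possible weighting."""
    chain = defaultdict(list)
    for text, faves in comments:
        words = text.split()
        for a, b in zip(words, words[1:]):
            chain[a].extend([b] * max(faves, 1))
    return chain

def babble(chain, start, length=10, seed=None):
    """Walk the chain from `start`, emitting at most `length` words."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

demo = build_chain([("butts lol butts", 3), ("lol what a thread", 1)])
print(babble(demo, "butts", length=5, seed=1))
```

A real version would want word pairs as keys instead of single words, but that's the stew.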

Here's a markov of a few of my top faved comments:
movement thrives is philosophically bankrupt and cables. BUSH: Do you I'm deeply offended, and stupid on the major issue of a nice, vaguely offensive haze when viewed from watching you!

posted by mccarty.tim at 2:25 PM on August 3, 2011


loyal? I learned it would hurt troop unity. If you how to tell you how to be obedient and end all over it! Who taught you can train a mental hospital) for the major issue from afar), the higher ups are acting like to me as a foodie. Trying new foods is almost never about trying to tell you I'm going to start a fight or anything, but that's not in adulthood. Depriving yourself that we have no Stephani Germanotta and cables. BUSH: Do you how to wash dishes and that we have special ed. If the existing troops are
posted by mccarty.tim at 2:27 PM on August 3, 2011


One more:
for a pig to not racist, making for gay troops, too. Every economics professor will work as the rest of their lives, or them contributing what this movement thrives is the major issue of a nice, vaguely offensive haze when it exploded. And then a foodie. Trying new foods is the kid gloves and welfare queens (but the older brother of way. I always ship my little cousins back home by helium balloon. Meanwhile, The Onion predicts McCain's response. Let's take those accusations almost never about trying to not eat a black president and how to act like to be
posted by mccarty.tim at 2:34 PM on August 3, 2011


So, I came up with a scraping script, but I feel really bad about it. What it does is it downloads each ENTIRE thread's HTML that a highly favorited comment is in, and then parses out the relevant comment. This seems really wasteful for more than just a handful of comments.

Is there an easier way to go about this? I could use RSS for each thread, but I feel like that's just a slightly lighter solution to a bloated problem, and were I to use the infodump, I'd still have to parse the page to get the title. A page with just the single comment would be handy, as would a single huge RSS feed of every comment a user has made, so I could just pick out the ones I want by ID or other attributes. And on review, it looks like the mods have some reservations about making comments as easy to slurp as posts (so maybe this is deliberate). In particular, I read cortex say, on the concept of a huge comments.txt file for every public comment, that he's intrigued by the idea but understands it would probably alienate a lot of people. This is about halfway there.

I'm not asking for such things to be made (although it'd be nice), I'm just wondering aloud if they exist and I've missed them.

So, tl;dr: Unless I can find a more server-friendly and fast way to scrape comments, the favorited-comments markov script I wrote will just be a learning exercise for myself.
posted by mccarty.tim at 5:54 AM on August 4, 2011


One scrape-free route to doing this right now for a specific account with that user's buy-in would be to have any interested user use the Export Your Comments function at the bottom of their Preferences page to get a dump of all their comments. Content of that dump looks like this:
...
2011-08-03 10:46:44.8
http://metatalk.metafilter.com/20866/BrowserOS-stats#912494
Do we have any numbers on how many people are Side-talkin'?
-----
2011-08-03 09:39:29.447
http://music.metafilter.com/5715/July-Challenge-Irrational-Songs#29862
I may be underthinking this, but I feel like it'd be sufficient to just make sure challenge playlists aren't timebound and 
then maybe reframe the presentation on the Challenges page to emphasize like "the latest challenge" vs. 
"the current challenge".  <br>
<br>
Maybe find a simple way to prompt a couple of older challenges as well.
-----
2011-08-03 08:12:52.473
http://www.metafilter.com/106133/Humble-Indie-Bundle-3-or-is-it-4#3849709
Limbo is super great as a little bit of art gaming.
-----
Write up a script to parse that for specific comments by subset and comment id; then hit the Infodump's favorites data (or the Infodumpster's calculated lists) to figure out which comment ids you need to go fishing for.

It's not slick as shit, but it's doable, and in the absence of us just unilaterally dumping everybody's comments out of context (either in a flat file or a la carte via some comment-viewer call) it's definitely a friendlier way to get at it than scraping a whole bunch of threads for individual comments, yeah.
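A rough cut at the parsing half, assuming the `-----`-delimited layout shown above (the export format isn't formally specified anywhere, so treat this as a guess):

```python
def parse_export(dump_text):
    """Split a comment export into (comment_id, timestamp, url, body)
    tuples. Records are separated by '-----' lines; the comment id is
    the URL fragment. Format inferred from the sample above."""
    records = []
    for chunk in dump_text.split("-----"):
        lines = [l for l in chunk.splitlines() if l.strip()]
        if len(lines) < 2:
            continue
        timestamp, url = lines[0].strip(), lines[1].strip()
        body = "\n".join(lines[2:])
        comment_id = url.rsplit("#", 1)[-1] if "#" in url else None
        records.append((comment_id, timestamp, url, body))
    return records

# ids you'd actually pull from the Infodump favorites data:
wanted = {"912494", "3849709"}

SAMPLE = """2011-08-03 10:46:44.8
http://metatalk.metafilter.com/20866/BrowserOS-stats#912494
Do we have any numbers on how many people are Side-talkin'?
-----
2011-08-03 08:12:52.473
http://www.metafilter.com/106133/Humble-Indie-Bundle-3-or-is-it-4#3849709
Limbo is super great as a little bit of art gaming.
"""
hits = [r for r in parse_export(SAMPLE) if r[0] in wanted]
print(len(hits))  # 2
```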
posted by cortex (staff) at 7:17 AM on August 4, 2011


