How to determine Metafilter Contribution Quotient March 16, 2003 11:50 PM
I just wrote Dan Hersham asking this: Do you suppose there could be a way to determine Metafilter Contribution by word count? Could that be automated? Is this within the realm of possibility? How about by links in posts and comments, or average length per person? And have you ever considered that MetaFilter is a gift culture, as is the so-called Hacker culture, where participants compete for prestige by giving time, energy, and creativity away in antagonistic cooperation, such as it is? Just wonderin'...
In my opinion, the fastest way to do it would be with direct access to the database. Since I think Dan scrapes MeFi rather than getting info via the database, it would be ridiculously tasking to do it.
posted by riffola at 12:05 AM on March 17, 2003
It would be ridiculously tasking to do it.
And in English dogs and cats can understand that is?
posted by y2karl at 12:06 AM on March 17, 2003
Well, since a member's comments are not all listed on one page (note: I mean the comments themselves, not links to them, 'cause links don't help much), it would be hard to gather total word count, number of links per comment, etc. So instead of loading one page and getting the info, he'd have to load, say, 1,000-odd pages if you've made, say, 1,300 comments.
posted by riffola at 12:12 AM on March 17, 2003
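Riffola's point is that every comment has to be fetched on its own; once a comment's HTML is in hand, the counting is the easy part. Below is a minimal Python sketch of that step, assuming each comment body is available as an HTML string. The names `comment_stats` and `totals` are hypothetical, not part of any MeFi tool. Words are counted by splitting text on whitespace, which is only a rough heuristic: text broken up by inline tags, as in `foo<b>bar</b>`, counts as two words.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class CommentStats(HTMLParser):
    """Tally visible words and <a href> links in one comment's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.words = 0
        self.links = 0

    def handle_starttag(self, tag, attrs):
        # Count only anchors that actually point somewhere (skip <a name=...>).
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_data(self, data):
        # Rough word count: whitespace-separated tokens of visible text.
        self.words += len(data.split())


def comment_stats(html: str) -> Tuple[int, int]:
    """Return (word count, link count) for one comment."""
    parser = CommentStats()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.words, parser.links


def totals(comments: List[str]) -> Tuple[int, int, float]:
    """Return (total words, total links, links per comment) for a user's comments."""
    words = links = 0
    for html in comments:
        w, n = comment_stats(html)
        words += w
        links += n
    per_comment = links / len(comments) if comments else 0.0
    return words, links, per_comment
```

For example, `comment_stats('See <a href="http://example.com/">this link</a> now.')` returns `(4, 1)`.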
I'm such a technopeasant. I'm used to magic. I expect miracles. I want a little cyber Maxwell's demon to do all the pointing, clicking and counting for me while I mindlessly scratch my ass and appreciate the running lights.
posted by y2karl at 12:21 AM on March 17, 2003
Upon reflection, I must replace gift culture with
Bermuda Triangle between gift culture, Department Of Motor Vehicles and Christmas dinner with the whole family.
posted by y2karl at 12:31 AM on March 17, 2003
stop italicizing gift culture, technopeasant.
posted by _sirmissalot_ at 12:56 AM on March 17, 2003
Just guesstimating, y2karl, but your ~4000 comments probably work out to about 400,000 words. That puts you just shy of Leo Tolstoy's War and Peace (500,000 words).
posted by Ljubljana at 1:51 AM on March 17, 2003
Just guesstimating, y2karl, but your ~4000 comments probably work out to about 400,000 words.
And, y2karl, you've earned valuable MeFi Prestige Points®!
These points may be exchanged at the current rate of one (1) regular visitor to your personal blog per 73,000,000 MeFi Prestige Points. MeFi Prestige Points are non-transferable. Offer may be cancelled at any time. Void where prohibited by law.
posted by Opus Dark at 2:25 AM on March 17, 2003
Sounds like Karma to me.
posted by Space Coyote at 3:42 AM on March 17, 2003
as I see it, you're basing this on some pretty arbitrary stuff -- who's to say that the number of links per (comment, 1000 words, whatever) or word count is the best reflection of "contribution"?
(and why quantify this stuff anyway? This seems like the Contribution Index -- nifty to look at once or twice, but ultimately not telling us anything that we don't already know. I know, just from hanging out on the site, who the interesting people are, who the trolls are, who the blowholes are. Who's good for a laugh, who thinks they're funny but isn't, and who's good for a thoughtful point that never would have occurred to me. I can't see a way of analyzing MeFi that would do this "real-world" knowledge any justice.)
I think you have hit it on the head, though; MeFi definitely sounds like a gift culture. The Web -- and MeFi members' life experiences -- are certainly abundant, and we're giving each other what we think is the best or most interesting. And those who have the most Whuffie around these parts tend to be those who make the most insightful comments or post the most interesting sites.
posted by Vidiot at 4:25 AM on March 17, 2003
word count is nothing is nothing word count is nothing to hang to hang a hat on on. it will only spawn a spawn a contest to see who to see who can rack (heh he said rack) up the most words is all is all this would accomplish (could have said "do" here, but possibly there is a 'letter count' somewhere in our future future of the all encompassing future times way out there in the future, don't you know.
posted by quonsar at 4:39 AM on March 17, 2003
Ladies and gentlemen: Quonsar as Gertrude Stein.
posted by macadamiaranch at 5:25 AM on March 17, 2003
It would be fairly easy to write a script that would spider through all users, then all comments by each user. But you would have to figure out a way to throttle the requests, or the spider would in effect be a DoS attack on the server.
From there getting things like word count and links per comment is trivial.
Of course it would be better to replicate the database on another machine and do the parsing there, but Matt would have to do that. And personally I don't see the value.
I suggest you learn some PHP and go for it. It's not hard.
posted by y6y6y6 at 5:26 AM on March 17, 2003
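The throttling y6y6y6 describes amounts to fetching one page at a time with a fixed pause between requests. Here is a sketch in Python using only the standard library. The 30-second default delay and whatever list of URLs you feed it are assumptions; `get` and `sleep` are parameters so the loop can be exercised without touching the network.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch(url: str) -> str:
    """Download one page as text (no retries, no error handling)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def polite_crawl(urls: Iterable[str],
                 delay_seconds: float = 30.0,
                 get: Callable[[str], str] = fetch,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Fetch URLs strictly one after another, pausing between requests so
    the server sees a trickle of hits rather than a flood."""
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i:  # no need to wait before the very first request
            sleep(delay_seconds)
        pages[url] = get(url)
    return pages
```

Spaced out like this, the crawl adds roughly the load of one extra reader clicking through the site. A real crawler should also honor the site's robots.txt, which Python's `urllib.robotparser` module can check.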
And personally I don't see the value.
IN 400 POINT FREAKIN' LETTERS OF FIRE.
posted by machaus at 5:32 AM on March 17, 2003
your task: find out how many times I use the words 'la cucaracha' on sundays.
posted by angry modem at 5:35 AM on March 17, 2003
To reconstruct the MeFi database you would have to spider 17,000 user pages, 24,000 MetaFilter threads, and 3,100 MetaTalk threads: about 45,000 page loads. How much bandwidth, in terms of $$, would this use? If Matt gives his okay for someone to do this at an off-peak time of day, and the funds for the project could be scraped together, this would be a worthwhile (and extremely interesting) project.
posted by PrinceValium at 5:36 AM on March 17, 2003
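PrinceValium's three figures actually sum to 44,100 page loads, which the comment rounds up to about 45,000. The quick Python check below also shows how long the crawl would take if requests were sent one at a time at an even pace. The 30-second spacing is an assumed value, not one anyone in the thread proposed.

```python
from typing import Dict, Tuple

SECONDS_PER_DAY = 86_400


def crawl_estimate(page_counts: Dict[str, int], delay_seconds: float) -> Tuple[int, float]:
    """Return (total page loads, days needed) when requests are sent one
    at a time, delay_seconds apart."""
    total = sum(page_counts.values())
    days = total * delay_seconds / SECONDS_PER_DAY
    return total, days


counts = {"user pages": 17_000, "MetaFilter threads": 24_000, "MetaTalk threads": 3_100}
total, days = crawl_estimate(counts, delay_seconds=30)
print(total, round(days, 1))  # prints: 44100 15.3
```

At that pace the whole job takes a bit over two weeks, with never more than one request in flight.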
"How much bandwidth, in terms of $$"
Unless you were an asshole, you'd distribute requests over time so that the cost would be zero. Since there is a flat rate for connectivity, costs don't increase as the site gets busier. The problem isn't the cost of bandwidth. The problem is slowing the site to a crawl.
"this would be a worthwhile (and extremely interesting) project."
In what way? Still wondering why this is worthwhile beyond its value as silly trivia.
posted by y6y6y6 at 5:54 AM on March 17, 2003
Also there are copyright issues. In theory the database of all content here has value and is copyrighted. If you copy the data and then repackage it you violate Matt's IP rights. Maybe.
posted by y6y6y6 at 5:58 AM on March 17, 2003
Still wondering why this is worthwhile beyond its value as silly trivia.
1. Because knowing things is better than not knowing things.
2. This is interesting, and by interesting I mean academically and socially interesting, far beyond silly trivia. Sociologists and computer scientists would find the evolution of activity in a relatively controlled online environment fascinating, especially one with a population as large as MeFi's.
Also there are copyright issues. In theory the database of all content here has value and is copyrighted.
Individual posts belong to their authors. Aggregate statistics, however, are more murky. IANAL. Yet.
posted by PrinceValium at 6:11 AM on March 17, 2003
this would be a worthwhile (and extremely interesting) project
yeah. like jacking off while sitting in a different chair.
posted by quonsar at 6:23 AM on March 17, 2003
quonsar, were you on the debate team in high school? just curious.
posted by PrinceValium at 6:33 AM on March 17, 2003
"Sociologists and computer scientists would find the evolution of activity...."
Still not seeing it. I think your sociologist and computer scientist friends need to find something worthwhile to study.
User x has .02 links per comment. User y has .1. Well. I see.
Since your community sample size is one, I don't think you can argue this has some sort of scientific value.
But by all means go for it. It's not hard.
posted by y6y6y6 at 6:51 AM on March 17, 2003
maybe this will help - see the (1,2,3) part. It is not the statistic itself, .02 links per comment, but the distribution of these values and how they change over time.
posted by MzB at 7:29 AM on March 17, 2003
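MzB's point can be made concrete: rather than one ratio per user, compute every active user's links-per-comment within each month and watch how that spread of values shifts over time. A Python sketch, assuming comments have already been reduced to hypothetical `(user, month, link_count)` tuples with months written as `YYYY-MM`:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def monthly_link_distribution(
        comments: Iterable[Tuple[str, str, int]]) -> Dict[str, List[float]]:
    """comments: one (user, 'YYYY-MM', link_count) tuple per comment.

    Returns {month: sorted list of each active user's links-per-comment
    ratio for that month}, with months in chronological order."""
    # month -> user -> [links so far, comments so far]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for user, month, links in comments:
        tally = tallies[month][user]
        tally[0] += links
        tally[1] += 1
    # 'YYYY-MM' strings sort chronologically, so sorted() gives time order.
    return {
        month: sorted(links / count for links, count in tallies[month].values())
        for month in sorted(tallies)
    }
```

Each month's list is the distribution itself; it can be summarized with `statistics.median`, a histogram, or percentiles, and compared month over month.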
quonsar, were you on the debate team in high school? just curious.
kicked off ignominiously.
posted by quonsar at 7:58 AM on March 17, 2003
note: i still attend the debates, though my participation these days is limited to shouting occasional non sequiturs from the bleachers.
posted by quonsar at 8:13 AM on March 17, 2003
I just want to get this straight. This place is about to be crushed by the weight of non-stop war coverage due to the lack of a post rating system (not that it is the silver bullet), and we're concerned with these minutiae? If no one reads them out of pure frustration, do my 400 words per post still exist?
posted by machaus at 8:25 AM on March 17, 2003
I think that this idea could generate a lot of data, but no information.
posted by SpecialK at 9:09 AM on March 17, 2003
if matt's happy enough about the idea to accept that kind of screen-scraping then surely he could just burn a backup of the db to cd and post it to whoever was interested...
posted by andrew cooke at 10:04 AM on March 17, 2003
I responded to y2karl's email this morning, but just saw this thread now. My reply:
It would be possible, but impractical. Right now, the script queries user pages. In order to get a word count, it would have to query every comment the user had ever made, and for folks like Miguel, that could be a lifelong process.
It's Hersam by the way, not Hersham. For some reason, people always want to add the second 'h'.
posted by jaden at 10:06 AM on March 17, 2003
it will only spawn a spawn a contest to see who to see who can rack (heh he said rack) up the most words is all is all this would accomplish
Au contraire, Quonsaire--I think the Highest Contribution Indexes list on the MeFi Toplists has had a moderating influence on the chattier members. (from time to time) I think a word count index would have the same effect.
And it's not like anyone's bombing civilians or anything *thinks* Oh, wait...
Ladies and gentlemen: Quonsar as Gertrude Stein.
I hope this doesn't make madamjujujive Alice B. Toklas...
But if not her, then who?
posted by y2karl at 10:11 AM on March 17, 2003
Jesus H Kee-rist!--excuse me, jaden... We all must have a surname spell check, you know, like the way our Spell Checker gives Aztecs for zydeco. Lucky you.
posted by y2karl at 10:18 AM on March 17, 2003
I think I would find the stats interesting, but I hesitate to grab any other pages besides the user page without Matt's okay. I emailed him a while ago to find out how he felt about it, but I never got a reply. I was considering a query of all 17,000+ users to make sure the top 25 lists were accurate. I don't want to cause any undue strain on the server.
As discussed previously, a dump of the database would be a better solution for computing these types of stats.
posted by jaden at 10:46 AM on March 17, 2003
" I was considering a query of all 17,000+ users"
A better way would be to loop through the numbers for each thread and save that locally. Then open each of those and parse the comments back into your own database. So basically you're grabbing the raw goods and then recreating the database. This saves you a lot of trouble getting from the user page to each comment, requires substantially fewer page requests, and allows you to do all sorts of future processing without going back to MetaFilter.
posted by y6y6y6 at 11:22 AM on March 17, 2003
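y6y6y6's approach has two stages: save every thread page locally once, then rebuild a comments table from the saved copies so that later analysis never has to go back to MetaFilter. A sketch using Python's built-in `sqlite3`. The table layout is an assumption, and `get` (which should include throttling) and `parse_comments` (which would have to understand MeFi's actual markup, not shown here) are hypothetical callables supplied by the caller.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

# A parsed comment: (author, body). Extracting these from real MetaFilter
# pages is left to a hypothetical parse_comments function.
Comment = Tuple[str, str]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS raw_pages (
            thread_id INTEGER PRIMARY KEY,
            html      TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS comments (
            thread_id INTEGER NOT NULL,
            author    TEXT NOT NULL,
            body      TEXT NOT NULL
        );
    """)
    return db


def save_threads(db: sqlite3.Connection, thread_ids: Iterable[int],
                 get: Callable[[int], str]) -> None:
    """Stage 1: store each thread page once; already-saved threads are skipped,
    so an interrupted crawl can resume without re-fetching anything."""
    for tid in thread_ids:
        if db.execute("SELECT 1 FROM raw_pages WHERE thread_id = ?", (tid,)).fetchone():
            continue
        db.execute("INSERT INTO raw_pages VALUES (?, ?)", (tid, get(tid)))
        db.commit()


def rebuild_comments(db: sqlite3.Connection,
                     parse_comments: Callable[[str], List[Comment]]) -> None:
    """Stage 2: re-parse every saved page into the comments table, offline."""
    db.execute("DELETE FROM comments")
    for tid, html in db.execute("SELECT thread_id, html FROM raw_pages").fetchall():
        db.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                       [(tid, author, body) for author, body in parse_comments(html)])
    db.commit()
```

Because the raw pages are kept, a better parser (say, one that handles quoted text) can be rerun with `rebuild_comments` without a single new request to the site.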
Alice B. Toklas?? Well I guess there are worse people to be, y2karl, but I prefer to think of myself in these terms.
posted by madamjujujive at 1:26 PM on March 17, 2003
Statistics are interesting and maybe they would divert our attention from pointless bickering with each other over who is the worst poster/commenter/troll/communist/fascist/jew-hater/whatever (for a few hours at least).
posted by dg at 2:30 PM on March 17, 2003
A better way would be to loop through the numbers for each thread and save that locally.
I was only going to query the user pages to calculate everyone's CI, not their comment count.
That's a good suggestion for the word count though.
posted by jaden at 2:35 PM on March 17, 2003
Would someone else's quoted words count?
posted by thomcatspike at 3:34 PM on March 17, 2003
Thomas - No reliable way to parse that out.
Like I said, though, this kind of stats-gathering will create a lot of meaningless data and virtually no meaningful information. I admit I'm confused as to what the purpose of it will be.
posted by SpecialK at 4:42 PM on March 17, 2003
{David Niven voice}
"Karl, I am thinking rather that one doesn't use the term "hacker" in analogy. Thats in the handbook ole boy, even i would not use that term. (lites cigarette). Perhaps the term makes me nervous. But the word count idea is meta...good, good idea i say...."
i think,
by unmperial thought:
the greatest dead-pan link goes to Madamjujujive.
(that receives an Abe Vigoda t-shirt and Bob Newhart coffee mug.......
posted by clavdivs at 4:45 PM on March 17, 2003
jonathan richman tonight ! : )
posted by sgt.serenity at 4:59 PM on March 17, 2003
Alice B. Toklas?? Well I guess there are worse people to be, y2karl, but I prefer to think of myself in these terms.
Another illusion shattered.
posted by y2karl at 1:27 AM on March 18, 2003
jaden writes: I was considering a query of all 17,000+ users to make sure the top 25 lists were accurate. I don't want to cause any undue strain on the server.
I was curious about that myself, for fairly obvious reasons, and I did go through with this -- scraping all 17,000 user pages and putting them into my own database. It was slightly justifiable as practice for another project I was planning. I spread the page requests over two consecutive holiday weeks, mostly in the wee hours -- which really isn't much more bandwidth, or server load, than a single person visiting the site during the same period (it's the same as about 100 loads of the Corrie thread -- not that I'm suggesting everyone do this). I determined a few things in the process. First, MeFi's user pages are dangerously unstructured. Second, there were 3 or 4 old-time members who belonged on the top 25 lists but hadn't been discovered by the CI pages, which build their lists only from the queries that pass through them. Third, keeping track of the top 100 from that list -- I recently ran an update -- should easily be sufficient to keep the top 25 from becoming outdated, and indeed, it seems to manage quite well by itself. In other words, the users' curiosity about each other seems to generate the needed requests to keep the top 25 surprisingly accurate.
Fourth, writing this stuff may be easy in comparison to some other things, but making it even halfway automated is a lot of work for the reward. I have new respect for the efforts of Dan Hersam and even more so for Matt, and I'm not going to be asking for any ponies anytime soon. Finally, looking at my own contributions through the long end of the telescope, i.e. merely numerically, immediately made me re-evaluate the quality and value of those contributions, which are not measured numerically.
posted by dhartung at 10:29 PM on March 18, 2003
posted by y2karl at 12:05 AM on March 17, 2003