Infodump updates: contact dates, comment length, metatalk closures, munging December 14, 2009 7:20 AM

Infodump update: we've added a few new features.

New since the August relaunch:

1. Comment length files. For folks interested in analyzing how the general size of a comment correlates to other aspects of site activity, you can now work with number-of-characters information about each comment on mefi, askme, meta and music. These are stored in new files separate from the existing commentdata files.

2. Metatalk thread closure information. We've had a "deleted" column in the postdata files previously, listing a 0 for undeleted and 1 for deleted threads, but now that column in the metatalk file can also have a value of 2 for closed threads and 3 for (rare) threads that are both closed and deleted.

3. Contact creation dates. If you're interested in looking at networking activity over time, you can now explicitly examine contact info in that light. Some of this info is approximate, since we didn't originally track creation date in that table. Details are on the wiki.

4. ID munging. On request from one user back in August, there's now an ID-munging function built into the Infodump scripts, which, for any user who specifically requests to be on the munge list, swaps out their actual userid for a unique 7-digit fake id throughout the dump. It's a very low hurdle to identification, but it's there, for whatever that's worth. Folks doing any analysis that makes assumptions about userids themselves as meaningful values should be aware of and account for this in setting up their analyses.

My to-do list is now completely clear. If folks have other Infodump additions they'd like to see in the future, let me know.

Also, there have been some interesting graphs coming out of this post-November thread, in case you're interested in datawankery but missed it somehow.
posted by cortex (staff) to MetaFilter-Related at 7:20 AM (151 comments total) 4 users marked this as a favorite
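
For anyone loading the metatalk postdata file, here is a minimal sketch of decoding the "deleted" column from item 2, assuming the value has already been parsed to an integer. The function name is hypothetical; only the four values 0 through 3 come from the post. The values happen to behave like two bit flags, which is all the sketch relies on.

```python
def decode_deleted(value):
    """Split the metatalk 'deleted' value into (deleted, closed) booleans.

    Values per the post: 0 = neither, 1 = deleted, 2 = closed,
    3 = closed and deleted.
    """
    if value not in (0, 1, 2, 3):
        raise ValueError(f"unexpected deleted value: {value!r}")
    deleted = bool(value & 1)  # low bit: thread was deleted
    closed = bool(value & 2)   # next bit: thread was closed
    return deleted, closed


for v in range(4):
    print(v, decode_deleted(v))
```

For the other subsites the column only takes 0 or 1, so the same function works there with `closed` always coming back False.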

And please let's just not talk about this. I'm still on my first cup of tea.
posted by cortex (staff) at 7:20 AM on December 14, 2009 [6 favorites]


aww... I was just coming in here to poke fun about the story in the blue.

but, since you've put the kibosh on that, all I can say is thanks for the good work. I've never used the infodump myself, but I think it's super cool that you guys go to the effort to make this information so easily available.

/tip of the hat.
posted by 256 at 7:26 AM on December 14, 2009


wag of the finger.
posted by gman at 7:27 AM on December 14, 2009 [1 favorite]


pump of the rump.
posted by Secret Life of Gravy at 7:31 AM on December 14, 2009


Also, since I'm not positive I hadn't made a couple small script revisions since the last run of the dump, I'm regenerating it now. The contacts data is I think the only section that would be affected by this, however, so if you're raring to go on the rest of it you're all set immediately.
posted by cortex (staff) at 7:33 AM on December 14, 2009


diddle of the piddle
posted by cgc373 at 7:34 AM on December 14, 2009


Stan: Those New Yorker kids are gonna be here any second, and we still don't know what queef means.
Kyle: Well, we can still pre-tend like we know what it means.
Stan: No, they'll catch on. Hey, wait a minute. I've got a great idea. Let's make up our own word. We can make up a word, and then use it, …and then they'll act like they know it, and then we'll bust 'em.
Kyle: Yeah. That'll make 'em look stupid!
Stan: What word could we make up?
Kyle: How about… finkleroy?
Stan: No, uhno, not finkleroy.
Cartman: How about geebo, or, or mung?
Stan: Yeah, mung.
Kyle: Mung's good.
Stan: Sh. Here they come. [the New Yorkers arrive]
Tough Guy 1: Well hel-lo there, queefs. All bundled up nice and warm, are we?
Stan: You know what you guys are? You guys are nothing but mung?
Tough Guy 2: We're not mung. You're mung.
Kyle: Oh, so you know what mung means, hunh?
Tough Guy 1: Of course we know what mung means!
Athlete: Yeah, D'ya think we wouldn't know what mung means? [Stan laughs, then Kyle, Cartman, and Kenny join in]
Stan: We busted you!
Kyle: Hyeh. Yeah. Mung isn't even a word! We made it up! [they resume laughing]
Tough Guy 1: You guys are even stupider than I thought! Mung is so a word!
Stan: [the boys stop laughing] It is?
New Yorkers: [behind the two toughs and two others] Yeah. [they turn around]
Athlete: It sure is.
New Yorker 1: Yeah.
New Yorker 2: Uh huh. [turns around]
Tough Guy 1: Yeah! Mung is the stuff that comes out when you push down on a pregnant woman's stomach.
Kyle: [winces] Eewww.
Stan: Ooogh.
Tough Guy 1: You guys didn't know that? [the rest of the New Yorkers turn around and they all laugh. Then, the rest of the 4 million+ kids laugh with them] Come on, guys. Let's get away from these rednecks before we get redneckasitis, or somethin'! [they leave. Stan, Kyle, and Kenny turn on Cartman]
Stan: You dumbass, Cartman!
Kyle: Yeah! Next time you make up a word, don't make up one that already exists!
posted by jefficator at 7:56 AM on December 14, 2009


My to-do list is now completely clear.

Perfect! Could you go run some errands for me?
posted by amyms at 8:00 AM on December 14, 2009


Quiver of the liver.
posted by the littlest brussels sprout at 8:01 AM on December 14, 2009


I was playing around with the infodump just yesterday. I wanted to know how many users had achieved the triple-K, which I define as >= 1000 comments on each of MeFi, Ask, and MeTa. To my surprise it was a relatively low number: 44 people.

I fess up: I'm 12 blue comments away from achieving this and I was a little curious how common it was.
posted by Rhomboid at 8:17 AM on December 14, 2009
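
A rough sketch of the triple-K count described above, not Rhomboid's actual script. It assumes the comment data has already been reduced to (site, userid) pairs; the site labels and function name are placeholders rather than Infodump names.

```python
from collections import Counter


def triple_k_users(comments, threshold=1000):
    """Return the set of userids with at least `threshold` comments on every site.

    `comments` is an iterable of (site, userid) pairs, one per comment.
    """
    sites = ("mefi", "askme", "meta")  # placeholder labels
    counts = {site: Counter() for site in sites}
    for site, userid in comments:
        if site in counts:  # ignore other subsites such as music
            counts[site][userid] += 1
    qualifying = [
        {user for user, n in counts[site].items() if n >= threshold}
        for site in sites
    ]
    return set.intersection(*qualifying)


# Toy data with a threshold of 2 so the example stays small.
sample = [
    ("mefi", 1), ("mefi", 1), ("askme", 1), ("askme", 1), ("meta", 1), ("meta", 1),
    ("mefi", 2), ("mefi", 2), ("askme", 2), ("askme", 2), ("meta", 2),
]
print(triple_k_users(sample, threshold=2))
```

User 2 falls one MeTa comment short in the toy data, so only user 1 qualifies.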


I guess it doesn't hurt, but for anyone that wants the info, the userid masking is so easy to get around as to seem nearly pointless.
posted by gsteff at 8:17 AM on December 14, 2009


Fart of the heart?
posted by Captain Cardanthian! at 8:22 AM on December 14, 2009


cortex is a self-linker!
posted by cjorgensen at 8:28 AM on December 14, 2009


Rhomboid: "To my surprise it was a relatively low number: 44 people."

Considering I'm 2.5-K (I've got exactly 500 in the green), and I consider myself far from prolific, I'm shocked that it's that low.
posted by Plutor at 8:30 AM on December 14, 2009 [1 favorite]


Yeah, me too. I didn't investigate any farther but I suspect that there's shitloads of people with a kilo of blue and a crapload with a kilo of green, and a ton with both, but also having the kilo of grey really narrows the field.
posted by Rhomboid at 8:31 AM on December 14, 2009


Here's the output of the script.
posted by Rhomboid at 8:36 AM on December 14, 2009


(And BTW that's with the Dec-6 dataset, not that I'd expect it to change much over a week.)
posted by Rhomboid at 8:37 AM on December 14, 2009


I was playing around with the infodump just yesterday. I wanted to know how many users had achieved the triple-K, which I define as >= 1000 comments on each of MeFi, Ask, and MeTa. To my surprise it was a relatively low number: 44 people.


I'd say "Challenge accepted," but I don't think there's any possible way I could make another 883 quality answers on the green in less than a long long time. I'm just not that smart.
posted by Caduceus at 8:49 AM on December 14, 2009


Time certainly does seem to be a factor. Of the 44, only 4 signed up in 2006 or later; 25 were pre-$5-signup users.
posted by Rhomboid at 9:02 AM on December 14, 2009


I'm another 2.5Ker (500 AskMe answers). I'd be curious how many people break the 1000 comment barrier in MetaTalk.
posted by Kattullus at 9:08 AM on December 14, 2009


To my surprise it was a relatively low number: 44 people.

only 4 signed up in 2006 or later; 25 were pre-$5-signup users.


I guess I shouldn't be surprised to be on this list - my nickname when I was a kid was motormouth.
posted by rtha at 9:09 AM on December 14, 2009


So, is that the cabal I've heard so much about?
posted by MadamM at 9:10 AM on December 14, 2009


Ohh, I'm getting closer! Is there a prize? I like prizes!
posted by iamkimiam at 9:11 AM on December 14, 2009


I wanted to know how many users had achieved the triple-K

*checks profile*

Damn it. I guess I'm going to be spending some time in Askme.
posted by quin at 9:13 AM on December 14, 2009


Burhanistan: "24Heh. And even stranger"

Somebody should drop him a MeFi Mail..."Don't touch anything!"
posted by iamkimiam at 9:15 AM on December 14, 2009 [1 favorite]


There are only 44 and I'm one of them? Maybe I should spend less time here.
posted by grouse at 9:16 AM on December 14, 2009 [1 favorite]


Also, FLAGGED.
posted by iamkimiam at 9:19 AM on December 14, 2009


Here's the full breakdown:

blue-K: 634
green-K: 287
grey-K: 114
blue-K + grey-K: 108
blue-K + green-K: 142
grey-K + green-K: 45
triple-K: 44

posted by Rhomboid at 9:23 AM on December 14, 2009
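
The breakdown above is enough to fill in every region of a three-circle Venn diagram. The arithmetic below is a sketch that assumes each pair count (for example "blue-K + grey-K: 108") includes the triple-K users, which is consistent with grey-K + green-K being 45 against a triple-K of 44.

```python
# Counts as posted in the breakdown above.
blue, green, grey = 634, 287, 114
blue_grey, blue_green, grey_green = 108, 142, 45
triple = 44

# Users in exactly two groups: remove the triple-K users from each pair.
only_blue_grey = blue_grey - triple
only_blue_green = blue_green - triple
only_grey_green = grey_green - triple

# Users in exactly one group.
only_blue = blue - only_blue_grey - only_blue_green - triple
only_green = green - only_blue_green - only_grey_green - triple
only_grey = grey - only_blue_grey - only_grey_green - triple

# Inclusion-exclusion: users with at least one K.
at_least_one = blue + green + grey - blue_grey - blue_green - grey_green + triple

regions = {
    "blue only": only_blue,
    "green only": only_green,
    "grey only": only_grey,
    "blue+grey only": only_blue_grey,
    "blue+green only": only_blue_green,
    "grey+green only": only_grey_green,
    "triple-K": triple,
}
print(regions)
print(at_least_one)
```

Under that assumption the seven regions add up to the inclusion-exclusion total, which is a handy sanity check on the posted numbers.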


Man I wish google chart's venn diagrams weren't such shit.
posted by cortex (staff) at 9:28 AM on December 14, 2009


I'm deeply disturbed to be in a group of people 108 strong. If it turns out we all spent too much time watching Lost I think I might just 'splode.
posted by Kattullus at 9:33 AM on December 14, 2009


Kattullus: "If it turns out we all spent too much time watching Lost I think I might just 'splode."

If I said "there's no such thing as 'too much time watching Lost'" would that help your head situation?
posted by Plutor at 9:41 AM on December 14, 2009


So green K is for Superman, everyone knows that; I'm pretty sure blue K works on Bizarros, but what's this grey K for?
posted by jtron at 9:52 AM on December 14, 2009 [1 favorite]


I stopped watching when they had that incredibly contrived mudwrestling scene at the end of season 3. Did it ever pick up again?
posted by Kattullus at 9:52 AM on December 14, 2009


Obligatory "I want Markovfilter to come back" post.
posted by flatluigi at 10:06 AM on December 14, 2009 [1 favorite]


I'm deeply disturbed to be in a group of people 108 strong

I think maybe you, me, and the other hundred and six should get together and establish some kind of club.

We'll call ourselves the MetaMilitia and sit around drinking and cursing the cabal. (all while secretly trying to figure out how to get in ourselves.)

Not that there is one, of course.
posted by quin at 10:13 AM on December 14, 2009


There are only 44 and I'm one of them? Maybe I should spend less time here.
posted by grouse


Uhhh ditto. And that's not counting my sockpuppet.

But I like calling us The Fab 44, so there's that.
posted by The Deej at 10:14 AM on December 14, 2009


I'm in. Let's rumble!
posted by The Deej at 10:18 AM on December 14, 2009


I think "triple K" has bad connotation though.
posted by The Deej at 10:19 AM on December 14, 2009 [1 favorite]


I guess the SQL DDLs need to be reworked to include the new columns too.
posted by smackfu at 10:21 AM on December 14, 2009


I think "triple K" has bad connotation though.

Agreed. Maybe the "Three G" instead? Or, as I like to call you, "Bastards who have answered more questions than me".
posted by quin at 10:39 AM on December 14, 2009


Matt's Minions?
posted by The Deej at 10:46 AM on December 14, 2009


Huh, I am 6 away from elite 44 status. Time for an alphabet thread!
posted by Rumple at 11:06 AM on December 14, 2009


Let's just shorten it to 'Dumpers.

Good, it's settled... 'Dumpers.
posted by the littlest brussels sprout at 11:06 AM on December 14, 2009


So what's the status of MarkovFilter? I tried finding it the other night but ran into some 404's :(
posted by localhuman at 11:07 AM on December 14, 2009 [1 favorite]


It remains in Unsolved Security Issue territory. Me and pb have talked about a couple possible ways to try and make it resurrectable without any worries, but it's a little complicated and just hasn't been that high of a priority.
posted by cortex (staff) at 11:32 AM on December 14, 2009


I'm a bit imbalanced, so I need to post more in Metatalk.
posted by smackfu at 11:33 AM on December 14, 2009


geek of the week
posted by Cranberry at 11:34 AM on December 14, 2009


smackfu: I'm a bit imbalanced, so I need to post more in Metatalk.

I believe that it's in fact that because you're balanced you don't post that much in MetaTalk.

I was mighty perturbed to realize that I am fast approaching 2000 MetaTalk comments. And I seriously never thought of myself as a top 100 MetaTalk commenter.
posted by Kattullus at 11:37 AM on December 14, 2009


I guess the SQL DDLs need to be reworked to include the new columns too

I'm on it.
posted by FishBike at 11:56 AM on December 14, 2009 [1 favorite]


Huh, I'm only 36 35 comments away from being blue-K + grey-K.
posted by Pronoiac at 12:22 PM on December 14, 2009


That's a great list.
posted by OmieWise at 12:33 PM on December 14, 2009


I'm
posted by OmieWise at 12:34 PM on December 14, 2009


only
posted by OmieWise at 12:34 PM on December 14, 2009


about
posted by OmieWise at 12:34 PM on December 14, 2009


100
posted by OmieWise at 12:34 PM on December 14, 2009


comments
posted by OmieWise at 12:34 PM on December 14, 2009


from
posted by OmieWise at 12:34 PM on December 14, 2009 [1 favorite]


being
posted by OmieWise at 12:34 PM on December 14, 2009


over
posted by OmieWise at 12:34 PM on December 14, 2009


2k
posted by OmieWise at 12:35 PM on December 14, 2009


in
posted by OmieWise at 12:35 PM on December 14, 2009


all
posted by OmieWise at 12:35 PM on December 14, 2009


three.

I really didn't mean this joke to go on this long.
posted by OmieWise at 12:35 PM on December 14, 2009


Ok, I've updated the SQL scripts to create and load Infodump databases. If you have any trouble with these, MeFiMail me.

Also, as I don't have MySQL, I can't test the scripts for it, so would someone please volunteer to try the two MySQL scripts and let me know if they work for you?
posted by FishBike at 12:46 PM on December 14, 2009


Hey cortex, on a related note, I noticed a few months back that MarkovFilter was down. (I wanted to show a friend what distilled me sounded like and was disappointed that he had to make do with undistilled me.) It's still down. Is this intentional? Any chance we could get it running again?
posted by painquale at 12:52 PM on December 14, 2009


Since August, it's been an open question whether the jump in contacts was due to spouses or the Tenth Anniversary parties. I've compiled a graph of contact adding dates, using the new timestamp data (thanks, cortex!). You can see the stunning conclusion there.

spoiler: *shruggo*
posted by Pronoiac at 12:54 PM on December 14, 2009
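
A minimal sketch of the counting step behind a graph like this, assuming the contact creation dates have been read in as strings. The timestamp format and the sample values are made up for illustration; check the wiki for the real file layout.

```python
from collections import Counter
from datetime import datetime


def contacts_per_month(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count contact creations per (year, month) from timestamp strings."""
    counts = Counter()
    for stamp in timestamps:
        created = datetime.strptime(stamp, fmt)
        counts[(created.year, created.month)] += 1
    return sorted(counts.items())


# Made-up timestamps, not real Infodump rows.
sample = ["2009-07-04 10:00:00", "2009-07-18 21:30:00", "2009-08-02 09:15:00"]
print(contacts_per_month(sample))
```

Since some of the creation dates are approximate, coarse buckets like months are probably safer than days.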


Wow, the 3K and 2-out-of-3K numbers are indeed surprisingly low. Can we attach usernames to these numbers somehow or is that unkosher? I'm curious now.
posted by goodnewsfortheinsane at 12:59 PM on December 14, 2009


Er, yeah I just looked at the pastebin dump. Can you maybe do this for the "2 out of 3"s?
posted by goodnewsfortheinsane at 1:00 PM on December 14, 2009


Incidentally, I love the comment length data. Some more charts will be forthcoming as a result of this.

I'm not too sure what the MeTa thread closure information will be good for, but I assume somebody had a use for it already, and that's why it's there now.

Contact creation dates, even approximate ones, have a variety of cool uses. They're useful in the analysis of how a person's contact network influences their use of the site, because now we can tell roughly what that network looked like on any particular date, instead of just what it looks like now. And this adds another dimension to the contact network visualization possibilities.

I have a couple of questions about the userid munging:
  1. Is the userid in the usernames table also munged? (From a quick inspection, I think it's not.)
  2. Is a munged userid munged to the same value in all the tables where it appears, other than the usernames table? (I hope it is).
I'm hoping that the way this works is, if you don't include the username in your query, it works exactly as it would have before userid munging and you can match up the various records from different tables. And if you do include the usernames table in your query, you either get nothing for the munged userids or you get a null for the username, depending on how you do the join.
posted by FishBike at 1:05 PM on December 14, 2009


Hmm. I'm 522..er 521 grey comments out of contention.
I don't think I have enough _________ to make up that ground before 2k becomes the new 1k.
posted by juv3nal at 1:15 PM on December 14, 2009


We need more Metafilter Achievements, with unlockable DLC and special profile themes.
posted by backseatpilot at 1:16 PM on December 14, 2009 [3 favorites]


Is the userid in the usernames table also munged? (From a quick inspection, I think it's not.)

Nope. That's left as normal; low a barrier as the munging is, it'd be even lower if the username table explicitly identified the munged id with the username, heh.

Is a munged userid munged to the same value in all the tables where it appears, other than the usernames table? (I hope it is).

It is. Every munge is unique from every other munge (through a not-very-fancy arithmetic transform), and the munge should remain static over time—the transformation function won't need revisiting until we close in on userid 1,000,000 and at that point we'll all be driving around the post-apocalyptic landscape in dunebuggies, wearing leather and spikes and discussing who it is, exactly, that runs Bartertown.

Obviously it's possible to demunge manually where necessary, but for politeness' sake it'd be good to take apparent munging as a hint to not include a munged user's identity in any name-based results generated from analysis.
posted by cortex (staff) at 1:19 PM on December 14, 2009
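
To make the join behavior concrete, here is a small sketch using SQLite as a stand-in database. The table layout, column names, and the fake id 9000002 are all assumptions for illustration; the actual munging transform isn't described in this thread and isn't reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usernames (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE commentdata (commentid INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO usernames VALUES (1, 'alice'), (2, 'bob');
    -- 9000002 stands in for a munged id; the usernames table keeps real ids.
    INSERT INTO commentdata VALUES (10, 1), (11, 9000002), (12, 9000002);
""")

# Counting per userid needs no names, so munged users are counted as usual.
per_user = conn.execute(
    "SELECT userid, COUNT(*) FROM commentdata GROUP BY userid ORDER BY userid"
).fetchall()

# LEFT JOIN keeps munged rows with a NULL name.
left = conn.execute("""
    SELECT c.commentid, u.name FROM commentdata c
    LEFT JOIN usernames u ON u.userid = c.userid ORDER BY c.commentid
""").fetchall()

# INNER JOIN drops munged rows entirely.
inner = conn.execute("""
    SELECT c.commentid, u.name FROM commentdata c
    JOIN usernames u ON u.userid = c.userid ORDER BY c.commentid
""").fetchall()

print(per_user)
print(left)
print(inner)
```

Either join keeps munged users out of name-based results, which matches the stated intent; the LEFT JOIN version just makes the gap visible.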


Obviously it's possible to demunge manually where necessary, but for politeness' sake it'd be good to take apparent munging as a hint to not include a munged user's identity in any name-based results generated from analysis

Cool, thanks for the clarifications. Yeah, I assumed the point was to prevent accidental disclosure of names for those MeFites who don't want to appear in the results of our datawankery efforts. This is a good way of doing that as it would be hard to include them by accident.

Anyone who includes them on purpose, well, I guess they just get the rubber hose treatment, or similar.
posted by FishBike at 1:33 PM on December 14, 2009


I've got over 3,000 comments on the blue, 2,000 on the grey, and 2,500 favorites received. I've also met almost 150 mefites. I'm going to play with the dump while I wait to find out what my prize is.
posted by Eideteker at 1:38 PM on December 14, 2009


Oh, INFOdump. Shit.

I need to wash my hands.
posted by Eideteker at 1:38 PM on December 14, 2009


Also, as I don't have MySQL, I can't test the scripts for it, so would someone please volunteer to try the two MySQL scripts and let me know if they work for you?

Thanks! It seems to be fine. I did get some warnings on a few rows in a few tables but that might have been existing (and I don't really know how to tell what the warning was for in MySQL).
posted by smackfu at 1:56 PM on December 14, 2009


Although... apparently MySQL does not like doing joins between two three-million-row tables. That makes the length tables somewhat less useful. Maybe I can just add length to the commentdata table.
posted by smackfu at 2:12 PM on December 14, 2009


Here's a top 25 list of users with more than 500 comments in the blue, green, and grey combined, by ratio of grey comments to blue+green comments.
  1. pb: (9.570:1)
  2. If I Had An Anus: (3.067:1)
  3. and hosted from Uranus: (2.811:1)
  4. gramschmidt: (2.727:1)
  5. timeistight: (2.669:1)
  6. cortex: (2.510:1)
  7. little e: (1.865:1)
  8. cgc373: (1.524:1)
  9. dg: (1.352:1)
  10. Kwine: (1.331:1)
  11. Cranberry: (1.299:1)
  12. Plutor: (1.193:1)
  13. breezeway: (1.180:1)
  14. It's Raining Florence Henderson: (1.100:1)
  15. team lowkey: (1.083:1)
  16. SpiffyRob: (1.047:1)
  17. Dave Faris: (0.996:1)
  18. jessamyn: (0.989:1)
  19. carsonb: (0.957:1)
  20. Alvy Ampersand: (0.929:1)
  21. FishBike: (0.923:1)
  22. Ethereal Bligh: (0.920:1)
  23. CKmtl: (0.918:1)
  24. anapestic: (0.916:1)
  25. gleemax: (0.906:1)
Uh oh, I'm on that list! And I'm the newest MeFite on that list, too.
posted by FishBike at 2:13 PM on December 14, 2009 [1 favorite]


If we got achievements for Metafilter, I'd be tempted to aim for the Triple-K (100GC). As it is... whatever... have your elite cabal.
posted by yeti at 2:18 PM on December 14, 2009


If we got achievements for Metafilter, I'd be tempted to aim for the Triple-K (100GC).

At great personal peril, I hacked into the sekrit cabal-only site and found this.
posted by juv3nal at 2:48 PM on December 14, 2009


Here's a top 25 list of users with more than 500 comments in the blue, green, and grey combined, by ratio of grey comments to blue+green comments.

It's interesting to see people who spend a disproportionate amount of time on one specific subsite. How about the same list for [green vs grey/blue] and [blue vs green/grey]?
posted by chrisamiller at 2:51 PM on December 14, 2009


I expect that would be much more lopsided. There are tons of people that primarily just hang out on AskMe or the blue and infrequently venture to other parts of the site.
posted by Rhomboid at 3:00 PM on December 14, 2009


there's now an ID-munging function built into the Infodump scripts, which, for any user who specifically requests to be on the munge list

Hi cortex, can you please put my ID on the munge list? Thanks!
posted by Blazecock Pileon at 3:18 PM on December 14, 2009


pump of the rump.

Throb of the knob. Angle of the dangle. And, away we go.
posted by ericb at 3:21 PM on December 14, 2009


Should be possible to do a ternary plot of proportion of users' comments in each of blue, grey, green.
posted by Electric Dragon at 3:23 PM on December 14, 2009
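
A sketch of the coordinate math a ternary plot needs, assuming per-user comment counts for blue, green, and grey are already in hand. The corner placement is an arbitrary choice and the function name is hypothetical.

```python
import math


def ternary_point(blue, green, grey):
    """Map comment counts to proportions and 2D ternary-plot coordinates.

    Corners: blue at (0, 0), green at (1, 0), grey at (0.5, sqrt(3)/2).
    """
    total = blue + green + grey
    if total == 0:
        raise ValueError("user has no comments")
    p_blue, p_green, p_grey = blue / total, green / total, grey / total
    # Weighted average of the three corner positions.
    x = p_green + 0.5 * p_grey
    y = (math.sqrt(3) / 2) * p_grey
    return (p_blue, p_green, p_grey), (x, y)


print(ternary_point(500, 300, 200))
```

A user who only posts in one subsite lands exactly on that corner, and an evenly split user lands in the middle of the triangle.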


Consider yourself munged, BP. The Infodump regenerates weekly (on Sunday morning, I think), so changes to the munge list will take up to a week to percolate into the data files.
posted by cortex (staff) at 3:24 PM on December 14, 2009


Thanks, cortex.
posted by Blazecock Pileon at 3:34 PM on December 14, 2009


Never mind - I got bored and crunched them myself. This shows users who disproportionately contribute to one subsite.

Same criteria as FishBike - only users that have 500 or more comments on all three sites combined. This obviously excludes those who have only contributed to one of the three subsites to avoid divisions by zero.

Blue : (Green+Grey)

1) Foosnark 1112.0
2) tomplus2 962.0
3) HTuttle 950.5
4) spazzm 941.5
5) peeping_Thomist 823.0
6) rough ashlar 768.333
7) raygirvan 656.0
8) Cerebus 636.0
9) tgrundke 594.0
10) Western Infidels 579.0
11) Relay 540.0
12) QuietDesperation 539.0
13) dwivian 530.0
14) mike3k 513.0
15) FormlessOne 476.0
16) kozad 436.25
17) zoogleplex 424.6
18) kliuless 415.833
19) Elim 369.5
20) Mitrovarr 318.25
21) fleener 273.333
22) CynicalKnight 263.833
23) digaman 258.333
24) sfts2 251.333
25) kgasmart 250.0



Green : (Grey+Blue)

1) thinkingwoman 783.333
2) 4ster 650.0
3) rhizome 626.2
4) cooker girl 590.0
5) autojack 566.0
6) londongeezer 501.0
7) mu~ha~ha~ha~har 381.0
8) sully75 374.666
9) JimN2TAW 334.5
10) peanut_mcgillicuty 333.0
11) Nelsormensch 314.5
12) christinetheslp 270.0
13) jon1270 239.25
14) vanoakenfold 216.333
15) anadem 207.75
16) PatoPata 200.5
17) Gerard Sorme 171.333
18) bkeene12 169.0
19) devilsbrigade 135.8
20) Sara Anne 131.8
21) crazycanuck 119.0
22) sharkfu 110.25
23) Good Brain 108.047
24) advicepig 105.6
25) micawber 102.6



Grey : (Blue+Green)

1) pb 9.57
2) If I Had An Anus 3.067
3) and hosted from Uranus 2.81
4) gramschmidt 2.727
5) timeistight 2.668
6) cortex 2.51
7) little e 1.864
8) mathowie 1.557
9) cgc373 1.523
10) dg 1.352
11) Kwine 1.33
12) Cranberry 1.299
13) Plutor 1.192
14) breezeway 1.179
15) It's Raining Florence Henderson 1.099
16) team lowkey 1.082
17) SpiffyRob 1.046
18) Dave Faris 0.996
19) jessamyn 0.989
20) carsonb 0.957
21) Alvy Ampersand 0.928
22) FishBike 0.923
23) Ethereal Bligh 0.92
24) CKmtl 0.917
25) anapestic 0.916


BTW, FishBike, my numbers differ from yours for the grey - I've got mathowie in at #8, and he's absent from your list altogether. The rest seem the same, though
posted by chrisamiller at 3:52 PM on December 14, 2009
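
A minimal sketch of the ratio calculation behind lists like these, not the script either poster used. It assumes per-user totals are available as (blue, green, grey) tuples; the usernames and counts in the sample are invented.

```python
def subsite_ratios(counts, minimum=500):
    """Rank users by each subsite's comments relative to the other two.

    `counts` maps a username to (blue, green, grey) comment totals.
    Users with fewer than `minimum` total comments are skipped, and a
    user is left out of a ranking when the other two subsites sum to
    zero, which avoids dividing by zero.
    """
    labels = ("blue", "green", "grey")
    rankings = {label: [] for label in labels}
    for user, totals in counts.items():
        if sum(totals) < minimum:
            continue
        for i, label in enumerate(labels):
            others = sum(totals) - totals[i]
            if others == 0:
                continue
            rankings[label].append((round(totals[i] / others, 3), user))
    for label in labels:
        rankings[label].sort(reverse=True)
    return rankings


# Invented users and counts, for illustration only.
sample = {
    "user_a": (1112, 1, 0),    # blue-heavy
    "user_b": (100, 50, 957),  # grey-heavy
    "user_c": (600, 0, 0),     # blue only: no blue ratio possible
    "user_d": (10, 10, 10),    # under the minimum
}
rankings = subsite_ratios(sample)
print(rankings["blue"])
print(rankings["grey"])
```

Slicing each ranking with `[:25]` gives a top-25 list in the same shape as the ones posted above.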


"I'm going to play with the dump"

ew.
posted by mr_crash_davis mark II: Jazz Odyssey at 4:24 PM on December 14, 2009


BTW, FishBike, my numbers differ from yours for the grey - I've got mathowie in at #8, and he's absent from your list altogether. The rest seem the same, though

Ha! That's because mathowie is the first user in the usernames table, and stupidly I tried to fix what looked like a bug in the data load script, and instead fixed the part of it that was right. It was skipping the first row in each file, which would be virtually unnoticeable for any table except username. Thanks for catching that!

I've re-updated the infodump_load.sql script to correct this, sorry to anyone who downloaded it earlier this evening. (The MySQL scripts I think are fine in this regard.)
posted by FishBike at 5:04 PM on December 14, 2009


I don't know what the munge list even is, but... I want to be on it.

And I know that's fucked up.
posted by flapjax at midnite at 5:38 PM on December 14, 2009


Oh, I just read the post. Yes, munge me, please. i wanna lunge into the munge.
posted by flapjax at midnite at 5:39 PM on December 14, 2009


Heh. Added.
posted by cortex (staff) at 5:45 PM on December 14, 2009


Do longer comments get more favorites? Yes. And graphing comment length frequency sure is regular.

(These are using the MeFi comment data only, rounding the comment length up to the nearest 10, and then weeding out any that then had a count of less than 1000. The AskMe data is essentially the same graphs.)

The only two zero-length comments with favorites: 1, 2
posted by smackfu at 5:47 PM on December 14, 2009 [4 favorites]
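
Here is a rough sketch of the bucketing described in the parenthetical above: round each length up to the nearest 10, then drop buckets with too few comments. Averaging favorites per bucket is an assumption about what gets plotted, and the sample data is invented.

```python
import math
from collections import defaultdict


def favorites_by_length(comments, bucket=10, min_count=1000):
    """Average favorites per comment-length bucket.

    `comments` is an iterable of (length, favorites) pairs. Lengths are
    rounded up to the next multiple of `bucket`; buckets with fewer than
    `min_count` comments are dropped.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [comment count, favorite sum]
    for length, favorites in comments:
        key = math.ceil(length / bucket) * bucket
        totals[key][0] += 1
        totals[key][1] += favorites
    return {
        key: fav_sum / count
        for key, (count, fav_sum) in sorted(totals.items())
        if count >= min_count
    }


# Invented (length, favorites) pairs with a tiny min_count for illustration.
sample = [(3, 0), (10, 2), (11, 1), (19, 3), (20, 2), (95, 7)]
print(favorites_by_length(sample, min_count=2))
```

In the toy data the lone 95-character comment lands in the 100 bucket and is weeded out, just as sparse buckets would be at the real threshold of 1000.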


I thought it might be interesting to find the shortest best answers, but they are just "yes" or "no". So it's not.
posted by smackfu at 5:57 PM on December 14, 2009


I've got a lot of comments to make on the gray if I'm going to win this. *frantically composes erotic haiku involving decapods and traditional Bavarian garments in need of a polish*

I've also met almost 150 mefites.

I have 69 contacts that I claim to have met, but I have no idea what the actual number is. I suck at remembering names at the best of times, but give me a real name plus an internet name to remember, and throw in my excessive alcohol consumption over the summer and I'm hopeless. I know I've forgotten a decent number of names. It is also entirely possible that I've claimed to have met someone, when in reality I met someone with a remotely similar username. Though I have yet to receive any baffled mefi mail from a new contact questioning our alleged encounter.
posted by little e at 6:08 PM on December 14, 2009


I thought it might be interesting to find the shortest best answers, but they are just "yes" or "no". So it's not.

I'm curious what the relationship of comment length to "best answer" is. Do longer comments tend to be marked "best answer" more often than shorter comments?
posted by FishBike at 6:10 PM on December 14, 2009


smackfu: "Although... apparently MySQL does not like doing joins between two three-million-row tables. "

It doesn't mind unless your tables are totally un-indexed, which they are. Here are my recommendations for indexes to add to the MySQL file. Can't test it from here, though.
posted by Plutor at 6:12 PM on December 14, 2009
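
To illustrate the general idea (not Plutor's actual recommendations, which are behind the link), here is a sketch that indexes the join column on both tables. It uses SQLite as a stand-in for MySQL, and the table and column names are assumptions; the `CREATE INDEX` statements themselves use syntax that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commentdata (commentid INTEGER, userid INTEGER);
    CREATE TABLE commentlength (commentid INTEGER, length INTEGER);
    INSERT INTO commentdata VALUES (1, 7), (2, 8), (3, 7);
    INSERT INTO commentlength VALUES (1, 120), (2, 45), (3, 300);
    -- Index the join column on both sides so lookups can avoid full scans.
    CREATE INDEX idx_commentdata_commentid ON commentdata (commentid);
    CREATE INDEX idx_commentlength_commentid ON commentlength (commentid);
""")

query = """
    SELECT c.userid, SUM(l.length)
    FROM commentdata c
    JOIN commentlength l ON l.commentid = c.commentid
    GROUP BY c.userid
    ORDER BY c.userid
"""
rows = conn.execute(query).fetchall()
print(rows)

# Show how SQLite plans the join; the exact text varies by SQLite version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

On a three-row table the index makes no practical difference; the point is only the shape of the statements to run before joining the multi-million-row tables.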


Oops, I meant to come back and mention that too. No primary keys in there either, which you've added too. Maybe Fishbike can swap out your create script for the one he has.
posted by smackfu at 7:10 PM on December 14, 2009


I thought I'd leave the indexes as the dreaded exercise for the reader, since useful indexes will depend on what queries you're running and what you're running them on. But if there's a set that will be generally useful, I can put a script or two for those onto that page with the other scripts, if anybody would like to send me a file.

Similarly, if defining a primary key makes a material difference with a particular database server platform, and anybody wants to send me an updated copy of the table creation script, I'll update the page with that.

I have a few other things I do with the Infodump data once it's loaded into a database, such as combining the data that's split into 4 separate files (for the 4 sub-sites) into a set of consolidated tables with a "siteid" field. I then transform the favorites data so it's easy to match up with those consolidated tables. If anybody would find that sort of stuff useful, let me know and I'll add that to the page too.
posted by FishBike at 7:25 PM on December 14, 2009


Is it possible to have an indexed sqlite database available for download? It would allow exportation to a variety of formats (csv, etc), and shouldn't be difficult to create. Having this format available with the text files would solve the various parsing issues that crop up and would reduce the amount of time needed to start using the data.
posted by null terminated at 7:50 PM on December 14, 2009 [1 favorite]


Also, since the database is static, it makes sense to index everything (where 'everything' is some subset of everything).
posted by null terminated at 7:51 PM on December 14, 2009


There's something oddly fitting, to me, that 2 of the top 3 posters to metatalk are literally assholes.
posted by Cold Lurkey at 8:51 PM on December 14, 2009


2 of the top 3 posters to metatalk are literally assholes.

And, iirc, the same person? or no?

I think I have met the most MeFites. True?
posted by jessamyn (staff) at 9:10 PM on December 14, 2009


Probably, although cortex caught up a bit with his nationwide tour. Alas, the contact details like "met" aren't in the dump. Blah blah privacy or some such.
posted by smackfu at 9:15 PM on December 14, 2009


According to her contacts, jessamyn has met 374 users; cortex stands at 295. DaShiv, who's been at a bunch of meetups, has 158, and Ambrosia Voyeur at 108. Many of the users I often see on local meetup threads have 30-60 contacts (yes I read meetup threads for places I don't live in).

There might, of course, be other people with a lot of "met" contacts, since I just did that by browsing around.
posted by Monday, stony Monday at 9:34 PM on December 14, 2009


(Did I ever miss someone: ThePinkSuperHero, at 278.)
posted by Monday, stony Monday at 9:37 PM on December 14, 2009


I have 135.

DaShiv, I'm coming for you!

I miss DaShiv... anyone know why he disappeared?
posted by Kattullus at 9:42 PM on December 14, 2009


I can confirm through the infodump that jessamyn, cortex and ThePinkSuperHero have "met" the most people through contacts.
posted by Monday, stony Monday at 9:52 PM on December 14, 2009


SQL question: how can I treat all the commentdata_* tables as a single big table? Something like a view (call it "commentdata_all") with a primary key of (site,commentid). My SQL was never very good to start with, and I haven't done any since back when MoFi was so active they were having hosting problems.
posted by Monday, stony Monday at 11:03 PM on December 14, 2009


how can I treat all the commentdata_* tables as a single big table
UNION.
posted by Electric Dragon at 2:44 AM on December 15, 2009
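
Spelled out a little, the UNION approach might look like the sketch below, run here against SQLite with two toy tables. The table suffixes, column names, and site labels are assumptions. A view can't carry a primary key, but the added site column means (site, commentid) still identifies each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commentdata_mefi (commentid INTEGER, userid INTEGER);
    CREATE TABLE commentdata_askme (commentid INTEGER, userid INTEGER);
    INSERT INTO commentdata_mefi VALUES (1, 7), (2, 8);
    INSERT INTO commentdata_askme VALUES (1, 7);

    -- UNION ALL keeps every row; the literal site column tells them apart.
    CREATE VIEW commentdata_all AS
        SELECT 'mefi' AS site, commentid, userid FROM commentdata_mefi
        UNION ALL
        SELECT 'askme' AS site, commentid, userid FROM commentdata_askme;
""")

rows = conn.execute(
    "SELECT site, commentid, userid FROM commentdata_all ORDER BY site, commentid"
).fetchall()
print(rows)
```

UNION ALL is used rather than plain UNION because the subsites never share rows once the site column is added, so there are no duplicates to remove and the database can skip that work.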


I would be really interested in seeing a graph of favorites over time both site total and by user, because I vaguely remember having a pretty middling number of favorites until like nine months ago when I achieved some sort of SNARK APOTHEOSIS. I was curious whether favorites behavior had changed or I just posted a shitload more.
posted by Optimus Chyme at 7:21 AM on December 15, 2009


Man I wish google chart's venn diagrams weren't such shit.

Triple-K full breakdown
posted by Akeem at 7:25 AM on December 15, 2009 [1 favorite]


Monday, stony Monday: "SQL question: how can I treat all the commentdata_* tables as a single big table? Something like a view (call it "commentdata_all") with a primary key of (site,commentid). My SQL was never very good to start with, and I haven't done any since back when MoFi was so active they were having hosting problems"

I guess this answers my earlier question about whether or not anybody would find my table consolidation code helpful. So I've posted it here in a new Table Consolidation and Favorites Transformation section of my SQL scripts page.

The script does two things:
  • consolidates all the Infodump data that is split into separate tables by sub-site (adding a 'siteid' field to distinguish them)
  • makes a transformed version of the favorites data that replaces the 'type, target, parent' business with 'siteid, postid, commentid' for easier joining with other tables
posted by FishBike at 7:35 AM on December 15, 2009 [1 favorite]


I can confirm through the infodump that jessamyn, cortex and ThePinkSuperHero have "met" the most people through contacts.

I wanted to find out who has "married" the most people through contacts, but apparently that information is not in the dump. Anyway, I bet I know who it is [NOT SPOUSIST].
posted by grouse at 7:38 AM on December 15, 2009


Optimus Chyme: "I would be really interested in seeing a graph of favorites over time both site total and by user, because I vaguely remember having a pretty middling number of favorites until like nine months ago when I achieved some sort of SNARK APOTHEOSIS. I was curious whether favorites behavior had changed or I just posted a shitload more"

It's not quite a graph, but here's a table showing how your activity compares to the site as a whole since January 2008, by month.
                        /- comments ----------------------\ /- favorites----------------------\ /- favorites per comment ----\
                        all         user        user        all         user        user                   
year        month       #           #           % of all    #           #           % of all    all        user       user/all

2008        1           84934       31          0.036%      48299       102         0.211%      0.569      3.290      5.786
2008        2           76556       37          0.048%      41213       97          0.235%      0.538      2.622      4.870
2008        3           79243       28          0.035%      49141       74          0.151%      0.620      2.643      4.262
2008        4           80191       15          0.019%      43806       47          0.107%      0.546      3.133      5.736
2008        5           74541       36          0.048%      41725       136         0.326%      0.560      3.778      6.749
2008        6           75003       61          0.081%      48774       240         0.492%      0.650      3.934      6.050
2008        7           80187       73          0.091%      55553       173         0.311%      0.693      2.370      3.421
2008        8           78087       39          0.050%      57007       72          0.126%      0.730      1.846      2.529
2008        9           81515       69          0.085%      78116       525         0.672%      0.958      7.609      7.940
2008        10          80923       78          0.096%      69410       281         0.405%      0.858      3.603      4.200
2008        11          74267       36          0.048%      64087       149         0.232%      0.863      4.139      4.796
2008        12          75877       85          0.112%      64664       282         0.436%      0.852      3.318      3.893
2009        1           84811       89          0.105%      79682       326         0.409%      0.940      3.663      3.899
2009        2           75059       127         0.169%      71410       395         0.553%      0.951      3.110      3.269
2009        3           87028       166         0.191%      80043       786         0.982%      0.920      4.735      5.148
2009        4           85818       157         0.183%      81562       529         0.649%      0.950      3.369      3.545
2009        5           80044       128         0.160%      81900       566         0.691%      1.023      4.422      4.322
2009        6           92503       160         0.173%      101426      1011        0.997%      1.096      6.319      5.763
2009        7           95361       120         0.126%      103988      451         0.434%      1.090      3.758      3.447
2009        8           90459       100         0.111%      98420       424         0.431%      1.088      4.240      3.897
2009        9           90221       154         0.171%      106918      1135        1.062%      1.185      7.370      6.219
2009        10          90370       102         0.113%      111718      497         0.445%      1.236      4.873      3.941
2009        11          88463       115         0.130%      112296      918         0.817%      1.269      7.983      6.288
2009        12          43060       85          0.197%      49820       652         1.309%      1.157      7.671      6.630
So, around February 2009 your number of comments went up, both in absolute terms, and as a percentage of the total number of comments site-wide that were yours. The same thing happened with favorites--more for you in absolute numbers, and more as a percentage of site-wide favorites given in each month.

Average favorites per comment has been going up steadily, site-wide. Your average favorites per comment has also been going up steadily. The last column is especially noisy, but I think it shows that overall, your favorites/comment growth outpaces the general growth rate on the site as a whole a little bit.
posted by FishBike at 9:15 AM on December 15, 2009 [5 favorites]
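
For anyone wanting to reproduce the derived columns in that table, here is a small sketch of the arithmetic. The function and key names are made up for illustration; the four input numbers are the January 2008 row from the table.

```python
def favorites_summary(all_comments, user_comments, all_favorites, user_favorites):
    """Derive the share and favorites-per-comment columns from four raw counts."""
    all_rate = all_favorites / all_comments
    user_rate = user_favorites / user_comments
    return {
        "comment_share_pct": 100 * user_comments / all_comments,
        "favorite_share_pct": 100 * user_favorites / all_favorites,
        "all_fav_per_comment": all_rate,
        "user_fav_per_comment": user_rate,
        # How many times the site-wide rate the user's rate is.
        "user_vs_all": user_rate / all_rate,
    }

# January 2008 row: 84934 comments site-wide, 31 by the user,
# 48299 favorites site-wide, 102 on the user's comments.
jan_2008 = favorites_summary(84934, 31, 48299, 102)
for name, value in jan_2008.items():
    print(f"{name}: {value:.3f}")
```

The last three values come out to roughly 0.569, 3.290 and 5.786, matching the row in the table.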


That is awesome. You are the best.
posted by Optimus Chyme at 9:32 AM on December 15, 2009


Metafilter: I can confirm through the infodump
posted by flapjax at midnite at 4:57 PM on December 15, 2009


I didn't investigate any farther but I suspect that there's shitloads of people with a kilo of blue and a crapload with a kilo of green, and a ton with both, but also having the kilo of grey really narrows the field.

Is having a kilo of grey as good as having buns of steel?
posted by Secret Life of Gravy at 5:21 PM on December 15, 2009


So far in 2009:

MetaFilter: The most frequent commenting 10% (885 people) made 72% of all the comments.
AskMe: The most frequent commenting 10% (1202 people) made 64% of all the comments.
MetaTalk: The most frequent commenting 10% (380 people) made 71% of all the comments.
posted by smackfu at 5:38 PM on December 15, 2009 [3 favorites]
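
A rough sketch of how a "top 10% of commenters" share could be computed from a list of userids, one entry per comment. This is not necessarily how the numbers above were produced: it assumes the 10% is taken from people with at least one comment, and that the cutoff is rounded up.

```python
import math
from collections import Counter

def top_decile_share(comment_userids):
    """Return (number of users in the top 10%, their share of all comments)."""
    counts = sorted(Counter(comment_userids).values(), reverse=True)
    # Assumption: round the 10% cutoff up, and always include at least one user.
    top_n = max(1, math.ceil(len(counts) / 10))
    return top_n, sum(counts[:top_n]) / sum(counts)

# Toy data: ten commenters, one of whom made 11 of the 20 comments.
toy = [1] * 11 + list(range(2, 11))
top_n, share = top_decile_share(toy)
print(top_n, share)
```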


Is that 10% of the "active workforce" (all people with at least one comment in 2009) or 10% of all registered users?

But thanks for that; that's something I really wanted to see. In fact (and I may already have said as much), if anybody knows a really good way of showing the repartition of "wealth" among commenters (in terms of volume of comment), I'd like to know.
posted by Monday, stony Monday at 6:18 PM on December 15, 2009


10% of the people who commented in 2009.

Unless I did something wrong. Which I might have, because by my stats, there are 42k users and only 20k have EVER posted a comment to the blue. Same numbers for the green. Unless those two sets of users are very disjoint, that seems weird.
posted by smackfu at 6:27 PM on December 15, 2009


I just ran a quick count of how many distinct users have commented in each sub-site:
  • AskMe: 19847
  • MeFi: 19892
  • MeTa: 8411
  • Music: 1634
What's more, only 27900 users have posted a comment at all, anywhere. I think your stats are right, smackfu.
posted by FishBike at 6:35 PM on December 15, 2009 [1 favorite]
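
Those counts also put a floor under how much the blue and green commenter sets overlap, without touching the data again. A quick sketch of the inclusion-exclusion arithmetic, using only the three numbers quoted above:

```python
askme_commenters = 19847
mefi_commenters = 19892
any_site_commenters = 27900  # commented anywhere, so an upper bound on AskMe-or-MeFi

# |A and B| = |A| + |B| - |A or B|, and |A or B| can't exceed 27900.
min_overlap = askme_commenters + mefi_commenters - any_site_commenters
max_overlap = min(askme_commenters, mefi_commenters)
print(min_overlap, max_overlap)
```

So at least 11839 users have commented on both, which means the two sets are far from disjoint. The exact overlap would need a query against the comment tables.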


Ah, I think a big part of that is that there are around 10k free users who never posted. Which makes sense, since free isn't much of a barrier.
posted by smackfu at 7:06 PM on December 15, 2009


This is a preventive (and probably unnecessary) service announcement: http://www.metafilter.com/81106/Information-doesnt-want-to-be-scale-free
posted by Monday, stony Monday at 7:19 PM on December 15, 2009


The rest will be posted as links to another site, but I thought this was really interesting. Here's the number of comments and commenters for every month, ever, on mefi.
+------+-----------+----------+-------+
| yyyy | mmmmm     | comments | users |
+------+-----------+----------+-------+
| 1999 | June      |        1 |     1 |
| 1999 | July      |       11 |     3 |
| 1999 | August    |        8 |     3 |
| 1999 | September |       21 |     4 |
| 1999 | October   |       16 |     5 |
| 1999 | November  |       48 |     9 |
| 1999 | December  |       26 |    10 |
| 2000 | January   |      312 |    64 |
| 2000 | February  |      853 |   150 |
| 2000 | March     |     1371 |   196 |
| 2000 | April     |     1931 |   269 |
| 2000 | May       |     3750 |   367 |
| 2000 | June      |     3723 |   359 |
| 2000 | July      |     3299 |   368 |
| 2000 | August    |     3222 |   384 |
| 2000 | September |     3961 |   412 |
| 2000 | October   |     5355 |   434 |
| 2000 | November  |     6056 |   505 |
| 2000 | December  |     4769 |   475 |
| 2001 | January   |     9677 |   662 |
| 2001 | February  |     8144 |   755 |
| 2001 | March     |    10478 |   862 |
| 2001 | April     |    12324 |   995 |
| 2001 | May       |    13983 |  1183 |
| 2001 | June      |    15416 |  1052 |
| 2001 | July      |    15226 |  1024 |
| 2001 | August    |     9028 |  1097 |
| 2001 | September |    22378 |  1955 |
| 2001 | October   |    25053 |  1683 |
| 2001 | November  |    20565 |  1458 |
| 2001 | December  |    16531 |  1537 |
| 2002 | January   |    22730 |  1632 |
| 2002 | February  |    22049 |  1490 |
| 2002 | March     |    20921 |  1507 |
| 2002 | April     |    20110 |  1512 |
| 2002 | May       |    18138 |  1437 |
| 2002 | June      |    15761 |  1367 |
| 2002 | July      |    18875 |  1446 |
| 2002 | August    |    24407 |  1967 |
| 2002 | September |    24977 |  2152 |
| 2002 | October   |    26055 |  2325 |
| 2002 | November  |    19668 |  2002 |
| 2002 | December  |    16105 |  1834 |
| 2003 | January   |    18980 |  1759 |
| 2003 | February  |    22479 |  1960 |
| 2003 | March     |    21440 |  1851 |
| 2003 | April     |    20634 |  1806 |
| 2003 | May       |    16518 |  1733 |
| 2003 | June      |    17787 |  1705 |
| 2003 | July      |    21308 |  1703 |
| 2003 | August    |    17823 |  1643 |
| 2003 | September |    19177 |  1642 |
| 2003 | October   |    19263 |  1629 |
| 2003 | November  |    19607 |  1604 |
| 2003 | December  |    22625 |  1734 |
| 2004 | January   |    23809 |  1765 |
| 2004 | February  |    17445 |  1579 |
| 2004 | March     |    26638 |  1822 |
| 2004 | April     |    27461 |  1878 |
| 2004 | May       |    24825 |  1777 |
| 2004 | June      |    30673 |  1894 |
| 2004 | July      |    28376 |  1830 |
| 2004 | August    |    29036 |  1806 |
| 2004 | September |    29570 |  1821 |
| 2004 | October   |    28602 |  1806 |
| 2004 | November  |    42375 |  3031 |
| 2004 | December  |    56359 |  3479 |
| 2005 | January   |    53239 |  3553 |
| 2005 | February  |    52862 |  3657 |
| 2005 | March     |    49939 |  3719 |
| 2005 | April     |    47690 |  3779 |
| 2005 | May       |    46391 |  3625 |
| 2005 | June      |    49878 |  3715 |
| 2005 | July      |    53149 |  3863 |
| 2005 | August    |    55293 |  4081 |
| 2005 | September |    60548 |  4112 |
| 2005 | October   |    58220 |  4218 |
| 2005 | November  |    70861 |  4431 |
| 2005 | December  |    70252 |  4502 |
| 2006 | January   |    72000 |  4699 |
| 2006 | February  |    62773 |  4653 |
| 2006 | March     |    75535 |  4883 |
| 2006 | April     |    64176 |  4753 |
| 2006 | May       |    66221 |  4870 |
| 2006 | June      |    67162 |  4988 |
| 2006 | July      |    70283 |  5102 |
| 2006 | August    |    72707 |  5310 |
| 2006 | September |    62130 |  5058 |
| 2006 | October   |    70227 |  5315 |
| 2006 | November  |    67866 |  5297 |
| 2006 | December  |    65938 |  5404 |
| 2007 | January   |    74891 |  5663 |
| 2007 | February  |    65430 |  5367 |
| 2007 | March     |    68375 |  5477 |
| 2007 | April     |    76363 |  5664 |
| 2007 | May       |    80248 |  5758 |
| 2007 | June      |    73615 |  5659 |
| 2007 | July      |    78175 |  5763 |
| 2007 | August    |    76476 |  5734 |
| 2007 | September |    75698 |  5697 |
| 2007 | October   |    82245 |  5984 |
| 2007 | November  |    85161 |  5978 |
| 2007 | December  |    74929 |  5912 |
| 2008 | January   |    84934 |  6322 |
| 2008 | February  |    76556 |  6240 |
| 2008 | March     |    79243 |  6503 |
| 2008 | April     |    80191 |  6425 |
| 2008 | May       |    74541 |  6322 |
| 2008 | June      |    75003 |  6310 |
| 2008 | July      |    80187 |  6467 |
| 2008 | August    |    78087 |  6364 |
| 2008 | September |    81515 |  6468 |
| 2008 | October   |    80923 |  6369 |
| 2008 | November  |    74267 |  6393 |
| 2008 | December  |    75877 |  6551 |
| 2009 | January   |    84811 |  6716 |
| 2009 | February  |    75059 |  6512 |
| 2009 | March     |    87028 |  6886 |
| 2009 | April     |    85818 |  6850 |
| 2009 | May       |    80044 |  6699 |
| 2009 | June      |    92503 |  6906 |
| 2009 | July      |    95361 |  7217 |
| 2009 | August    |    90459 |  7025 |
| 2009 | September |    90221 |  7095 |
| 2009 | October   |    90370 |  7086 |
| 2009 | November  |    88463 |  7076 |
| 2009 | December  |    43060 |  5427 |
+------+-----------+----------+-------+
posted by Monday, stony Monday at 10:51 PM on December 15, 2009 [1 favorite]
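
A table like this can be built by grouping comments by year and month and counting distinct userids in each group. A minimal sketch in plain Python, assuming the comments are already loaded as (userid, datestamp) pairs; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def monthly_activity(comments):
    """comments: iterable of (userid, datetime) pairs.

    Returns {(year, month): (comment_count, distinct_user_count)}.
    """
    totals = defaultdict(int)
    users = defaultdict(set)
    for userid, stamp in comments:
        key = (stamp.year, stamp.month)
        totals[key] += 1
        users[key].add(userid)
    return {key: (totals[key], len(users[key])) for key in sorted(totals)}

# Made-up sample rows, not real Infodump data.
sample = [
    (1, datetime(1999, 6, 1)),
    (1, datetime(1999, 7, 2)),
    (2, datetime(1999, 7, 3)),
    (1, datetime(1999, 7, 4)),
]
activity = monthly_activity(sample)
print(activity)
```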


Nice, MsM. Clearly the next step is some pretty graphs. But that more or less confirms the impression I've had from previous data-diving outings: we've had steady growth over time, but at a pretty slow rate compared to what sometimes feels like the popular perception.

The last two years have seen something like a 10% increase in total comments and commenting users, which is nothing to sneeze at but also not quite the storming-of-the-gates that gets suggested.

Of course, the flat numbers like that don't express the rate of turnover; if there's approximately the same number of folks commenting each month but a hundred oldbies bail and a hundred newbies take their place, that's a lot of displacement, which would create a legitimate sense of growth/change even if the raw aggregate numbers are fairly static.

Also interesting to look at the peak-and-decline around Sept 2002, that only ever partially recovered before Nov 2004 when signups reopened. The whole stretch from Sept 2001 to Nov 2004 is an interesting period in userbase history.
posted by cortex (staff) at 7:03 AM on December 16, 2009


Metafilter owes its success to The Events of September 11th.

I leave the reader to work out the implications.
posted by Rumple at 10:11 AM on December 16, 2009


The most active 10% of those active in a given year: how often are they commenting?
posted by Pronoiac at 10:45 AM on December 16, 2009


I made a chart of the number of active users across all sub-sites, by month, with some extra information to look at user turnover. I categorized the users who were active (posted or commented) in each month as follows:
  • Arrived and stayed: their first activity was in this month, and they were active in a subsequent month.
  • Stayed: their first activity was in an earlier month, and they were active in a subsequent month, too.
  • Arrived and left: their only activity was in this month.
  • Left: their first activity was in an earlier month, but their last activity was in this month.
The data for users leaving the site in the last few months looks a bit funky, because the less active users haven't done anything for a while and so are considered to have "left" the site whenever they were last active. And in the absence of time machines, none of us have done anything in January 2010 yet, so all remaining active users "left" in December 2009.
posted by FishBike at 10:57 AM on December 16, 2009
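
The four categories above can be sketched in a few lines once each user's set of active months is known. This is an illustration of the definitions, not the query behind the chart; the function name and input shape are assumptions.

```python
def classify_turnover(active_months):
    """active_months: {userid: set of sortable month keys such as (2009, 11)}.

    Returns {month: {category: user_count}}.
    """
    result = {}
    for months in active_months.values():
        first, last = min(months), max(months)
        for month in months:
            if month == first and month == last:
                category = "arrived and left"
            elif month == first:
                category = "arrived and stayed"
            elif month == last:
                category = "left"
            else:
                category = "stayed"
            by_category = result.setdefault(month, {})
            by_category[category] = by_category.get(category, 0) + 1
    return result

# Made-up users: 1 is active three months running, 2 shows up once, 3 skips a month.
toy = {
    1: {(2009, 1), (2009, 2), (2009, 3)},
    2: {(2009, 2)},
    3: {(2009, 1), (2009, 3)},
}
turnover = classify_turnover(toy)
print(turnover)
```

Note that a user is only counted in months where they were actually active, so user 3 does not appear in the middle month at all.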


I'm curious, do posts to the Blue made during the weekend get fewer comments and favorites on average than posts made during the week?
posted by Kattullus at 6:58 AM on December 20, 2009


Hmm, we've looked at the timing of posts, comments, and favorites before, but not how the timing of the post itself affects these. So based on the datestamp of every front-page post:
Day	Avg Comments	Avg Favorites

Sun	32.892721	4.764401
Mon	33.367296	4.028600
Tue	34.662431	3.989863
Wed	33.393797	3.988335
Thu	33.086604	3.909726
Fri	31.663129	3.708222
Sat	31.191443	4.626126
There seems to be a slight drop in comments on Friday, Saturday, and Sunday... but a significant increase in average favorites for posts made on the weekend. That's for all of MeFi's history. If we look at just the data for posts in 2009:
Day	Avg Comments	Avg Favorites

Sun	46.402571	13.958456
Mon	49.941473	13.713150
Tue	50.714749	12.743989
Wed	49.653896	13.207142
Thu	49.413397	12.966203
Fri	48.208473	12.802286
Sat	45.667016	13.702731
The numbers are larger, but the overall trend looks about the same.
posted by FishBike at 7:13 AM on December 20, 2009 [1 favorite]
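
A small sketch of the per-weekday averaging, assuming each post is available as a (date, comment count, favorite count) tuple. The input shape and function name are assumptions; the day names are spelled out by hand so the result doesn't depend on locale settings.

```python
from collections import defaultdict
from datetime import date

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday() order

def averages_by_weekday(posts):
    """posts: iterable of (post_date, comment_count, favorite_count).

    Returns {day_name: (avg_comments, avg_favorites)}.
    """
    sums = defaultdict(lambda: [0, 0, 0])  # post count, comments, favorites
    for post_date, comments, favorites in posts:
        bucket = sums[DAYS[post_date.weekday()]]
        bucket[0] += 1
        bucket[1] += comments
        bucket[2] += favorites
    return {day: (c / n, f / n) for day, (n, c, f) in sums.items()}

# Made-up posts: two on a Monday, one on a Saturday.
toy = [
    (date(2009, 12, 14), 40, 10),
    (date(2009, 12, 14), 60, 14),
    (date(2009, 12, 19), 30, 20),
]
averages = averages_by_weekday(toy)
print(averages)
```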


Huh? That's so not what I was expecting. I was pretty certain that there'd be little or no difference between the weekend and weekdays and I briefly considered that posts made during the weekend got a little bit less in terms of comments and favorites, never that they'd get more.

Also, 13 is considerably higher than I thought the favorites average would be. I thought it would be more like 10.
posted by Kattullus at 7:36 AM on December 20, 2009


What's the favorites median?
posted by Kattullus at 7:39 AM on December 20, 2009


For MeFi's entire history, median favorites on posts is 0 (more than half of all posts don't have any, probably since most of them pre-date the favorites feature).

For 2009 so far, the median favorites per post is 8.

I wonder if the minor difference in comments and favorites on the weekend has to do with a subjective difference in the types of content people post on weekends? When I previously looked at the comments and favorites vs. tags used on the post, it was clear that the subjects people favorite a lot differ a great deal from the subjects people comment on a lot.
posted by FishBike at 7:52 AM on December 20, 2009 [2 favorites]
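
To illustrate why the all-time median can sit at 0 while the mean is well above it, here is a toy example with made-up favorite counts where more than half the posts have none.

```python
import statistics

# Made-up favorite counts: five of the eight posts have no favorites.
favorites = [0, 0, 0, 0, 0, 3, 8, 21]

median_favorites = statistics.median(favorites)
mean_favorites = statistics.mean(favorites)
print(median_favorites, mean_favorites)
```

A handful of heavily favorited posts pull the mean up to 4 here, while the median stays at 0.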


Do we see a dip in the raw volume of posts on the weekend? Somewhat fewer posts, each getting a slightly-higher-than-average disbursement of favorites, would un-rock my conceptual world, for example.
posted by cortex (staff) at 8:14 AM on December 20, 2009


Do we see a dip in the raw volume of posts on the weekend?

Yes, in a previous Infodump discussion I made this chart showing site activity by weekday. Front page posting activity definitely drops off on weekends, as does commenting and favoriting. It looks like favoriting doesn't drop off as much, in relative terms, as posting activity.

The lines for favoriting activity include favorites on posts and comments, not just posts, but given the numbers above I don't see how the curves for favorites on posts only could be any different.
posted by FishBike at 8:28 AM on December 20, 2009


To see if people post significantly different stuff on the weekends, I ran a list of the top 25 most popular tags for posts on weekdays vs weekends, for all of MeFi history and for just 2009. They look pretty similar, but there are minor differences:
All-Time                      2009
Weekdays       Weekends       Weekdays       Weekends

music          music          music          music          
art            art            art            art            
politics       politics       photography    Photography    
history        photography    history        history        
photography    history        video          video          
flash          iraq           science        youtube        
science        video          youtube        politics       
iraq           war            film           science        
war            science        politics       film           
video          film           obama          comics         
bush           flash          game           food           
film           bush           flash          flash          
internet       youtube        food           obama          
usa            USA            Comics         game           
humor          religion       Movies         documentary    
games          humor          Games          tv             
movies         Movies         design         design         
Religion       games          animation      animation      
youtube        television     books          China          
television     terrorism      war            japan          
game           books          Internet       economics      
terrorism      game           tv             technology     
technology     Internet       literature     movies         
law            technology     obituary       batshitinsane  
books          design         religion       books          
posted by FishBike at 8:54 AM on December 20, 2009
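
A sketch of how a weekday/weekend tag ranking like this could be computed. The mixed capitalization in the table ("usa" next to "USA") suggests the tags were grouped case-insensitively, but that is a guess; this version lowercases tags explicitly. The input shape and function name are assumptions.

```python
from collections import Counter
from datetime import date

def top_tags(tagged_posts, n=25):
    """tagged_posts: iterable of (post_date, list_of_tags).

    Returns (weekday_top, weekend_top), each a list of up to n tags.
    """
    weekday, weekend = Counter(), Counter()
    for post_date, tags in tagged_posts:
        # date.weekday() is 5 for Saturday and 6 for Sunday.
        target = weekend if post_date.weekday() >= 5 else weekday
        target.update(tag.lower() for tag in tags)
    return (
        [tag for tag, _ in weekday.most_common(n)],
        [tag for tag, _ in weekend.most_common(n)],
    )

# Made-up posts: two on weekdays, two on a weekend.
toy = [
    (date(2009, 12, 14), ["music", "Art"]),
    (date(2009, 12, 15), ["Music"]),
    (date(2009, 12, 19), ["USA", "music"]),
    (date(2009, 12, 20), ["usa"]),
]
weekday_top, weekend_top = top_tags(toy, n=2)
print(weekday_top, weekend_top)
```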


There's some difference, but largely it seems it's not that great. Nothing that leaps out at me, anyway.
posted by Kattullus at 9:09 AM on December 20, 2009


Metafilter: weekdays usa weekends USA
posted by Rumple at 9:33 AM on December 20, 2009


Perhaps people have more time to put together stupidly awesome mega-link posts over a weekend that then attract loads of favourites.
posted by Electric Dragon at 4:33 PM on December 21, 2009


Perhaps people have more time to put together stupidly awesome mega-link posts over a weekend that then attract loads of favourites.

It's possible. Other hypotheses that come to mind:
  • Posts stay on the front page longer on weekends, giving people more time to notice and favorite them.
  • Favoriting is far less time-consuming than posting, so people making minimal use of MeFi on weekends have time to favorite things but not to create new posts.
  • People are in a better mood on weekends and are thus more likely to enjoy a post enough to favorite it.
  • People don't have as much time to read the linked articles or the comment thread on weekends, so they're more likely to favorite it as a bookmark to come back to later.
The fun part about all of these, including your hypothesis, is I have no idea how to tell which, if any, are true based on data in the Infodump.
posted by FishBike at 6:33 PM on December 21, 2009


Random science project, if anyone is interested.
posted by cortex (staff) at 10:46 AM on December 22, 2009


but a significant increase in average favorites for posts made on the weekend.

I hate to be overly pedantic, but did you actually do a t-test or something to see if the apparent difference is likely to be caused by chance? Given the size of the dataset, I'm also inclined to believe that it's significant, but stats have a way of being non-intuitive sometimes.
posted by chrisamiller at 2:26 PM on December 22, 2009


I hate to be overly pedantic, but did you actually do a t-test or something to see if the apparent difference is likely to be caused by chance?

No, and since step 1 of me doing that would be "look up t-test on Wikipedia", I'm pretty sure I wouldn't do it right if I tried.
posted by FishBike at 3:02 PM on December 22, 2009
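
For reference, the core of a two-sample t-test is short enough to sketch with the standard library. This is Welch's version, which doesn't assume the two groups have equal variance; it returns the t statistic and degrees of freedom but not a p-value, since that needs the t distribution. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) does the whole thing. The sample numbers below are made up, not favorites data.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each mean: sample variance divided by n.
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_stat, dof)
```

Favorites per post are heavily skewed (the median and mean above differ a lot), so it would be worth checking any result against a rank-based test as well.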


I've made a filter for Infodump files, which might help with analysis.
posted by Pronoiac at 12:21 PM on December 23, 2009


(the link is busticated, Pronoiac)
posted by FishBike at 12:26 PM on December 23, 2009


Heh, whoops, let me try that again: infodump filter.
posted by Pronoiac at 1:40 PM on December 23, 2009


Doing some sanity checks on beanplate, I ran across some gotchas.

1. These post titles are likely to cause hiccups while processing the Infodump. Maybe they should be edited.

AskMe 30523: ^M
Five minutes here, ten there, oh no!

Mefi 37099: Living can be lovely, here in New York State Ah, but I wish that I^M
Living can be lovely, here in New York state / Ah, but I wish that I were home again

Mefi 77685: “Every man dies - Not every man really lives.”^M
^M
--William Ross Wallace

2. It looks like the AskMe & Music postdata files have some (er, 1031?) entries without the "[NULL]" deletion reason. This is, admittedly, possibly utterly irrelevant to everyone & everything whatsoever.

I caught these with the beanplate condition: "\$#read_fields != \$#fields" - though running this on the posttitles files returns a not-terribly-helpful list of posts without titles.

3. The wiki has a note about an error - with favorites for comments without parents - but checking for "type =~ /^(2|4|6|9|12|13)$/ && parent == 0" showed that those have definitely been fixed. (I confirmed the output on an older Infodump with those errors.)
posted by Pronoiac at 3:59 PM on January 4, 2010
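
For anyone without beanplate handy, the two checks above can be approximated in Python. This is a sketch, not a port: the function names are made up, and it assumes the dump rows are tab-separated with type and parent available as fields. The set of comment-favorite types is taken from the condition quoted in item 3.

```python
COMMENT_FAVORITE_TYPES = {2, 4, 6, 9, 12, 13}  # from the check quoted above

def orphaned_comment_favorites(rows):
    """rows: iterable of dicts with 'type' and 'parent' keys (strings or ints)."""
    return [
        row for row in rows
        if int(row["type"]) in COMMENT_FAVORITE_TYPES and int(row["parent"]) == 0
    ]

def wrong_field_count(lines, expected_fields, sep="\t"):
    """Return 1-based line numbers whose field count differs from expected.

    A stray carriage return or newline inside a title splits one row across
    two lines, so both halves show up here.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.rstrip("\r\n").split(sep)) != expected_fields
    ]

# Made-up rows for illustration.
favorites = [
    {"type": "2", "parent": "0"},
    {"type": "2", "parent": "55"},
    {"type": "1", "parent": "0"},
]
orphans = orphaned_comment_favorites(favorites)

lines = ["1\tA fine title\t0\n", "2\tA broken\n", "title\t0\n"]
bad_lines = wrong_field_count(lines, expected_fields=3)
print(orphans, bad_lines)
```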


Pretty sure I just fixed those titles, not sure about the other stuff.
posted by jessamyn (staff) at 4:49 PM on January 4, 2010


The titles look better - thanks!
posted by Pronoiac at 6:16 PM on January 4, 2010

