Infodump 2.0 August 13, 2009 1:55 PM

The Metafilter Infodump is back, and moderately better than ever! Statistics nerds rejoice!

For those who don't know what I'm talking about, the Infodump is a collection of files generated from the Metafilter database, containing a wealth of vital stats about posts, comments, tags, favorites, and so on.

Brief history: after being launched in January, 2008, the Infodump spent a happy year being nerdly and occasionally updated before we took it down (along with some other things on the stuff.mf subdomain) as a safety measure after the site got hacked in January of this year.

It is not a full-text dump of the site—that'd be gigantic and is not something we're necessarily comfortable with regardless—but it does contain most of the quantitative information available about activity on the site, enough for folks to have done a variety of cool-ass things with it.

For those of you who are already familiar with it, note that there are a few neat additions (and some important format-change caveats if you have existing automated tasks you've been importing older versions of these files into):

  • There are four new "Post Titles" files, listing thread id and title text for Mefi, Ask, Meta and Music.
  • There is now a tag data file for Metatalk, since we've added tags since the Infodump originally launched last year.
  • The comment data files now include two new columns: (1) favorite count, to save folks from having to cross-reference against the fave data file for a simple count check, and (2) best answer boolean so that folks who want to look into Best Answer-related stuff or create ad hoc search tools for same can now do so. Note that this BA column is present in all comment data files for format consistency but has no meaningful content except in the askme data.
  • The post data files also have a new column, for category code. There's meaningful data for Ask, Meta, and Music; Mefi has no category data, but includes a dummy column just for, again, format consistency.
  • There is now an all-in-one zip file available for folks who intend to download most/all of the files regardless, to reduce the amount of file/transfer wrangling required.
  • I've done a bunch of updating and expanding of the wiki's Infodump page, to note current format stuff. Big, big thanks to Pronoiac for doing a fantastic job of creating that page in the first place.
  • Invisible but nice: the process for generating the Infodump is now significantly streamlined, which makes regenerating it by hand very easy, but beyond that pb is planning to set it up as an automated task so it'll likely just run weekly in the middle of the night without any human intervention at all. And then: Skynet.
  • The Infodump page is now automatically updated when stats are regenerated, and includes a timestamp and up-to-date file-size info for the downloadable zips. The page is also less ugly than it used to be.

    So! Get your nerd on. If folks have specific ideas/requests for additions, let me know; there're some things we just aren't going to do (full text dumps, flag data), but there may be other things worth adding going forward. I'm considering doing some sort of word-frequency and collocation tables, for example, so that folks could dig into some of that in the absence of an entire corpus.

    If you've done work with the Infodump (or related mefi datawankery) that isn't currently noted on the MetaAnalysis wiki page, please mention it here or in mefimail or just add it to the wiki yourself if you're into that sort of thing.
    posted by cortex (staff) to MetaFilter-Related at 1:55 PM (502 comments total) 44 users marked this as a favorite

    Badass. I did some very light monkeying before it went down, but now my monkeying can really get going again. Thanks!
    posted by SpiffyRob at 1:59 PM on August 13, 2009


    Hooray! Thanks!
    posted by jedicus at 2:01 PM on August 13, 2009


    *rejoices*

    Oh. Wait. I'm not really a statistics nerd. But still, so cool!
    posted by rtha at 2:07 PM on August 13, 2009


    Shoutout to desjardins for bringing the term datawankery to MetaFilter!
    posted by lukemeister at 2:07 PM on August 13, 2009


    Can this be used to calculate the degree to which my mother never loved me?
    posted by It's Raining Florence Henderson at 2:07 PM on August 13, 2009 [3 favorites]


    Next up: resurrection of the MarkovFilter!
    posted by cgomez at 2:10 PM on August 13, 2009 [3 favorites]


    I got all excited until I saw "Info-".
    posted by Eideteker at 2:10 PM on August 13, 2009


    I just now realized that Anonymous has more favorites than me, and it's driving me crazy.
    posted by Astro Zombie at 2:14 PM on August 13, 2009


    Ah! Input!
    posted by contraption at 2:16 PM on August 13, 2009 [4 favorites]


    I love this, and would love to do cool stuff with it. However, I only have limited understanding of Unix and corpus search software. It'd be absolutely awesome if somebody wanted to do a step-by-step of how to do simple things with the data dump. I realize that's kind of a big task, and there's no real incentive to help us half-assed techie geeks into the bowels of the inner sanctum of nerdery, but for those of us who can make use of this, but don't know where/how to start, it would be very, very helpful.

    I can assure you that, with a little bit of knowledge, I could go a long way towards expanding on these skills on my own, and I would try to do some cool stuff with the data. I've got ideas for analyzing linguistic trends on MeFi like you wouldn't believe. Anybody feel like helping me out?

    I'd also be down for spending a few hours with somebody in a bar/restaurant (my treat), if you'd show me what I can do with this stuff, and how to do it. I'm serious.

    If there's more interest in this, it might be fun to do a bay area meetup, centered around our laptops and focused on swapping nerd skills. But maybe that's just me, and my weird idea of fun.
    posted by iamkimiam at 2:30 PM on August 13, 2009


    Very cool. Will try to think up new and interesting ways to compare data.
    posted by ocherdraco at 2:32 PM on August 13, 2009




    Astro Zombie: I just now realized that Anonymous has more favorites than me, and it's driving me crazy.

    Yeah, but Anonymous is a one-trick pony, having never posted here or on the blue.
    posted by gman at 2:35 PM on August 13, 2009


    Sweet! Now hopefully some awesome statistical nerd can whip up some charts which qualitatively prove that I have worth.

    Or failing that, show me the others here who fit that category so that I can begin my cyberstalking and inevitable assumption of their identities.
    posted by quin at 2:40 PM on August 13, 2009


    I have no clue why I'm so excited about this, but this is cool as hell.
    posted by Slack-a-gogo at 2:46 PM on August 13, 2009


    It is not a full-text dump of the site—that'd be gigantic and is not something we're necessarily comfortable with regardless

    Do you think it would be possible in the future to release a full text dump, especially of AskMe, to researchers on a case by case basis? This kind of thing would be really, really valuable to the Natural Language Processing Community.

    It's really disappointing to attend talks at conferences and realize that most, if not all, of the automated question answering systems in development are being trained on Yahoo! Answers. It's kind of like training a dialog system on YouTube comments.

    We need better data!
    DO IT FOR SCIENCE!
    posted by Alison at 2:50 PM on August 13, 2009 [3 favorites]


    I'm not a statistics nerd but it's Friday so I've decided to rejoice anyway.
    posted by shelleycat at 2:53 PM on August 13, 2009


    Can this be used to calculate the degree to which my mother never loved me?

    I don't think we need a calculator to do that.
    posted by blue_beetle at 2:54 PM on August 13, 2009


    I'm updating the row counts now. Some notable stuff since last October:
    * The number of favorites has almost doubled.
    * The number of AskMe tags has gone up five times, thanks to backtaggers.
    * There's a 40% jump in the number of contacts, probably due to spouses.
    posted by Pronoiac at 2:58 PM on August 13, 2009 [1 favorite]


    Do you think it would be possible in the future to release a full text dump, especially of AskMe, to researchers on a case by case basis? This kind of thing would be really, really valuable to the Natural Language Processing Community.

    I'm a big sucker for SCIENCE; the "would it be possible" question is broad and who-knows enough that the only responsible answer in that context is "maybe". I'd certainly be willing to entertain discussion with someone who was serious about NLP stuff.

    "Help me out or I'll have to use Yahoo! Answers" is its own compelling kind of sales pitch. Heh.

    For stuff that's borderline territory or something that we're not including in the infodump, I'm always game for special requests on specific search tasks, regardless, so if someone has a given thing they want to research they can always drop me a line. My hit rate for actually following through on such things is well under 100%, but it's worth a shot.
    posted by cortex (staff) at 3:00 PM on August 13, 2009


    Alison, since I've been taking linguistics/corpora experimentation seminars at the LSA Institute for the last several weeks, I've started looking into the different web-based corpus search sites and software packages that are out there. Some that might be good, but messy for Mefi data, are WebCorp, TIGERsearch (which would require some automated and/or manual annotation or tagging of the data first), QueryGoogle, and TextSTAT (requires Python download). I haven't started using these yet, so I don't know what the dis/advantages or quirks are. I also have a list of links to other, free corpora databases, in various languages, if you're interested.
    posted by iamkimiam at 3:00 PM on August 13, 2009


    nerd drool is drooling
    posted by chillmost at 3:01 PM on August 13, 2009


    Alison: couldn't such a researcher (you, I assume) just crawl AskMe with your own robot? With some polite bandwidth limit and perhaps limited by metadata from the infodump? Does Y!A offer some special access to researchers?
    posted by fantabulous timewaster at 3:09 PM on August 13, 2009


    Oh yeah: this is great
    posted by fantabulous timewaster at 3:09 PM on August 13, 2009


    There's a 40% jump in the number of contacts, probably due to spouses.
    and post-10th party, "Hey I just met a bunch of people," contacts.
    posted by ODiV at 3:23 PM on August 13, 2009


    And then: Skynet.

    Excellent! I call shotgun on first in line for consciousness upload. See you in The Singularity, guys!
    posted by Meatbomb at 3:34 PM on August 13, 2009


    I see you in your Singularity RIGHT NOW. And it's hot.
    posted by carsonb at 3:40 PM on August 13, 2009


    BRING BACK THE MARKOVFILTER
    or at least tell me where to find it?
    posted by spiderwire at 3:45 PM on August 13, 2009


    I also wrote up some starting documentation for looking at the files in Excel.

    Just want to second the info in there that Excel below 2007 cannot handle these files. Last time I did something with that, I had to preprocess the data to extract only the rows I wanted, then load it into Excel.
    posted by smackfu at 3:49 PM on August 13, 2009
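The preprocessing step smackfu describes (pull out only the rows you want, then hand the smaller file to Excel) could be sketched roughly like this. It is a minimal sketch, not anyone's actual script: it assumes the data file is tab-separated with a header row, and the column names (`commentid`, `postid`, `userid`, `faves`) and the sample rows are made up for illustration.

```python
import csv
import io


def filter_rows(src, dst, column, wanted):
    """Copy the header plus every row whose `column` value is in `wanted`.

    Rows are streamed one at a time, so the full file never has to fit
    in memory (or in a spreadsheet).
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column] in wanted:
            writer.writerow(row)
            kept += 1
    return kept


# Tiny made-up sample standing in for a large comment data file.
sample = io.StringIO(
    "commentid\tpostid\tuserid\tfaves\n"
    "1\t10\t292\t3\n"
    "2\t10\t7418\t0\n"
    "3\t11\t292\t5\n"
)
out = io.StringIO()
kept = filter_rows(sample, out, "userid", {"292"})
```

With real files you would pass open file handles instead of the `io.StringIO` objects; the filtered output should then be small enough for older spreadsheet versions to open.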


    Even Microsoft Access threw up its tiny little hands and walked off the job when given some of the queries I tried to run on this data at the beginning of the user matching thread.

    For corporate types (such as myself) who don't feel like trying to figure out another database server than the one you're used to already, you can download Microsoft SQL Server Express for free. That's what I ended up doing for that MeFi User Matching thread.
    posted by FishBike at 4:06 PM on August 13, 2009


    iamkimiam: I love this, and would love to do cool stuff with it. However, I only have limited understanding of Unix and corpus search software. It'd be absolutely awesome if somebody wanted to do a step-by-step of how to do simple things with the data dump.

    There was an infodump data playground made by, if I remember correctly, null terminated, that's now down. That was a lot of fun to play around with.
    posted by Kattullus at 4:12 PM on August 13, 2009


    Oh yeah: this is great
    posted by fantabulous timewaster at 6:09 PM on August 13


    Eponysterical.
    posted by Horace Rumpole at 4:12 PM on August 13, 2009 [2 favorites]


    I do most of my analysis stuff in perl; I could try and write up a little tutorial on that sort of thing, but I can't honestly think of a less newbie-friendly way to go on such stuff, so I'm not sure who that would be helping.

    or at least tell me where to find it?

    It turns out that MarkovFilter is a Harder Problem, due to the need for interactivity to make it work well combined with some extant security issues with perl on 64-bit Windows. Or something like that—pb knows the score on that front much, much better than I do.
    posted by cortex (staff) at 4:14 PM on August 13, 2009


    And yeah, the Data Playground that null terminated put together was awesome. My impression is that it became a pain to deal with for one reason or another; if I remember right, he was doing all the data import stuff manually and that became a problem (something with which I sympathize deeply).
    posted by cortex (staff) at 4:16 PM on August 13, 2009


    if I remember right, he was doing all the data import stuff manually and that became a problem (something with which I sympathize deeply)

    So, here's a thought: I've already got database tables for most of the files in the Infodump, and SQL statements to bulk load them from the text files. Would anybody be interested if I was to put table creation scripts and data load scripts all together and put them online somewhere?

    With these you could get the complete Infodump into an SQL database as follows:

    1. Install SQL Server 2005 (or something close to it)
    2. Create an empty database
    3. Run the table creation script
    4. Download the everything-in-one Infodump zip file
    5. Unzip the Infodump files into a specified folder
    6. Run the data load script

    Repeat only steps 4-6 to update in future.
    posted by FishBike at 4:28 PM on August 13, 2009 [2 favorites]
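The create-then-bulk-load routine in steps 3 and 6 above can be sketched as follows. To be clear about what is swapped in: this uses Python's built-in `sqlite3` module rather than SQL Server, and it is not FishBike's script. The table name, column names, tab-separated layout, and sample rows are all assumptions for illustration.

```python
import csv
import io
import sqlite3


def create_table(conn, table, columns):
    """Step 3: create the table if it does not exist yet."""
    col_defs = ", ".join(f"{name} {sqltype}" for name, sqltype in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")


def load_table(conn, table, n_columns, src):
    """Step 6: empty the table, then bulk insert rows from a tab-separated file."""
    conn.execute(f"DELETE FROM {table}")  # full refresh, so reruns don't duplicate rows
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the (assumed) header row
    placeholders = ", ".join("?" * n_columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
columns = [("ownerid", "INTEGER"), ("contactid", "INTEGER")]  # hypothetical schema
create_table(conn, "contacts", columns)

# Made-up rows standing in for an unzipped Infodump text file.
sample = io.StringIO("ownerid\tcontactid\n101\t102\n103\t102\n102\t101\n")
load_table(conn, "contacts", len(columns), sample)
row_count = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
```

Because `load_table` clears the table first, running it again against a fresh download mirrors the "repeat only steps 4-6" update path.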


    The playground had the interface where you just typed in SQL statements? See, that scared me. Though, if it were useful, I'd think about running one, if someone can talk me through securing it.
    posted by Pronoiac at 4:30 PM on August 13, 2009


    FishBike: I could use that!

    I just put the contacts one into an Oracle DB for fun. I was curious as to who are mefi's most social users. My metric for this is the number of 'incoming' contact connections - that is, the number of people who call them a contact. Anyways, here's the top 25:

    USERNAME                  USERID  INCOMING
    ------------------------------------------
    jessamyn                     292       621
    languagehat                14752       426
    mathowie                       1       403
    cortex                      7418       355
    Miko                       19344       309
    ThePinkSuperhero           17721       275
    Anonymous                  17564       261
    loquacious                 17349       260
    amberglow                  14275       250
    jonmc                         58       239
    nickyskye                   1228       235
    ColdChef                    7683       225
    BrandonBlatcher            17675       210
    asavage                    23392       208
    madamjujujive              15971       207
    DaShiv                      3502       200
    quonsar                      986       197
    scody                      16239       193
    Ambrosia Voyeur            42440       184
    flapjax at midnite         39010       173
    Astro Zombie               31765       172
    klangklangston             23558       171
    miss lynster               20640       170
    robocop is bleeding        17391       169
    stavrosthewonderchicken     2238       169
    posted by vacapinta at 4:40 PM on August 13, 2009
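The "incoming contacts" metric above is just a count of how often each user appears on the receiving end of a contact link. A minimal sketch in Python, assuming each contact record is an (owner, contact) pair; the user ids and pairs here are invented, and vacapinta's actual query was run in Oracle, not with this code.

```python
from collections import Counter

# Made-up (owner_id, contact_id) pairs: owner_id has listed contact_id as a contact.
contacts = [
    (101, 102),
    (103, 102),
    (104, 102),
    (102, 101),
    (103, 101),
    (102, 104),
]

# Count how many people list each user as a contact.
incoming = Counter(contact_id for _owner_id, contact_id in contacts)

# The N most "social" users under this metric, highest count first.
top = incoming.most_common(2)
```

Joining the resulting user ids against a usernames file would give a table like the one above.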


    There's a 40% jump in the number of contacts

    Sorry, that was me. I got amnesia and forgot you all.

    Oh, you mean an upward jump.
    posted by DU at 4:40 PM on August 13, 2009


    I'm about to take a quant/research stats class for my Master's and am going to try my damnedest to get my prof to let me use the infodump in a project.
    posted by subbes at 4:40 PM on August 13, 2009


    And yeah, the SQL stuff would be great. Assuming it isn't too MS-specific--I'm using postgresql here.
    posted by DU at 4:43 PM on August 13, 2009


    Can't we just cut out the middleman and download an MDF?

    Just attach to SQLExpress and you're laughing.

    or crying... so cold... so lonely...
    posted by blue_beetle at 4:45 PM on August 13, 2009


    Older MS Excel versions can at least open the smaller files, right? If the problem was just file size, I put a warning about file size at the bottom of the Infodump page, though that could also go at the top of the Infodump & Excel page. I'm really surprised that Access choked on any of these: what limit did you hit, FishBike?

    We could test "spousing sprees vs. the 10th parties" hypothesis, if we either get timestamps in the contacts file or scrape the entries that have changed since January.

    My webhost offers MySQL, if that's a data point. That's, what, the fifth database mentioned in the thread? (On preview: Sixth?)
    posted by Pronoiac at 4:47 PM on August 13, 2009


    Can't we just cut out the middleman and download an MDF?

    No, because it takes a middleman to go from the Infodump's text files to an SQL Server database. If I'm the middleman, you have to wait for me to refresh it from time to time. If I post the scripts for you to run, you can be your own middleman on whatever schedule you feel like.
    posted by FishBike at 4:54 PM on August 13, 2009


    We could test "spousing sprees vs. the 10th parties" hypothesis, if we either get timestamps in the contacts file or scrape the entries that have changed since January.

    Hmm. There is a "date created" column in the contact stuff; I could consider adding that to the dump.
    posted by cortex (staff) at 4:58 PM on August 13, 2009 [1 favorite]


    I'm really surprised that Access choked on any of these: what limit did you hit, FishBike?

    Ultimately, file size, though not for the reasons you might think. Just importing all the Infodump files was fine. But some of the queries I tried to run required a great deal of temporary storage for the Jet database engine to do its thing. Temporary storage that it tries to stuff in the .MDB file, and when that hits 2GB, game over.

    Even without that problem, the speed was sufficiently slow that the switch to SQL Express was worthwhile. By which I mean 100x faster execution of the same queries.
    posted by FishBike at 4:58 PM on August 13, 2009


    Oh, and I just ran a fun little query to do with favorites. The question of who has the most seems kind of dull, so I thought, who has the highest average number of favorites per comment?

    I thought it might be a good way to get a list of Metafilter's favorite commenters, in a way that doesn't encourage posting a lot to troll for favorites, since comments that don't get many hurt the score.

    I decided to limit it to people with at least 100 comments, otherwise a certain stevewoz wins it with a single very popular comment, followed by a bunch of other users with single-digit comment counts.

    Would it be evil to post some results from that analysis here?
    posted by FishBike at 5:06 PM on August 13, 2009
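The calculation described here (average favorites per comment, with a minimum comment count so one-hit wonders don't dominate) might look something like the sketch below. It assumes each comment record carries a user id and a favorite count, as the new comment-file column suggests; the sample users and numbers are made up, and this is not FishBike's SQL.

```python
from collections import defaultdict


def favorites_per_comment(comments, min_comments):
    """Rank users by average favorites per comment.

    `comments` is an iterable of (userid, favorite_count) pairs.
    Users with fewer than `min_comments` comments are left out.
    """
    totals = defaultdict(lambda: [0, 0])  # userid -> [favorites, comments]
    for userid, faves in comments:
        totals[userid][0] += faves
        totals[userid][1] += 1
    ranked = [
        (faves / count, userid, faves, count)
        for userid, (faves, count) in totals.items()
        if count >= min_comments
    ]
    ranked.sort(key=lambda row: row[0], reverse=True)
    return ranked


# Made-up data: "a" has a single hugely popular comment.
sample = [("a", 500), ("b", 4), ("b", 2), ("b", 3), ("c", 1), ("c", 0), ("c", 2)]

with_threshold = favorites_per_comment(sample, min_comments=3)
without_threshold = favorites_per_comment(sample, min_comments=1)
```

Dropping the threshold to 1 puts the single-comment user straight at the top, which is the effect the 100-comment cutoff is meant to avoid.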


    Yes, but this is MetaTalk: The Home of Evil.
    posted by Kattullus at 5:20 PM on August 13, 2009 [1 favorite]


    So I've pretty much got a pair of SQL scripts ready, one to create the 19 tables to hold the Infodump, and another to actually bulk insert into those tables from the Infodump text files. Would it be possible to put these scripts on the page with the Infodump files? Or where else would be a good place to host these?

    Also I am noticing some weirdness when trying to load the posttitles data. For example postid number 30523 in posttitles_askme seems to have a line break right before the title text. Bulk insert is unhappy about this (but fortunately seems to carry on rather than aborting).
    posted by FishBike at 5:50 PM on August 13, 2009


    I have prepared a list of the top 25 most helpful Ask Metafilter users, for this very particular definition of 'helpful.' Among users with at least 10 best answers, these are the users with the highest best answer : answer ratio in threads where a best answer was given.

    NB: These results are skewed by those assumptions in various ways. Answers in anonymous questions don't count against the user, for example. And if a user gives 100 non-best answers in a thread and one best answer, then the 100 non-best answers still count 'against' his or her ratio.

    The first number is the userid, the second number is the ratio, expressed as a percentage. I was 95th, fairly respectable if I say so myself.

    7356  65.22%
    60514 65.00%
    78231 59.46%
    371   56.52%
    61738 56.52%
    17461 53.44%
    17408 52.63%
    89006 52.63%
    77551 52.00%
    32115 51.72%
    25860 51.52%
    15219 50.00%
    34339 50.00%
    39624 50.00%
    43417 50.00%
    57514 50.00%
    65601 50.00%
    66776 50.00%
    30860 48.84%
    75229 48.15%
    30895 47.13%
    14251 46.88%
    47284 46.88%
    15650 46.15%
    42193 46.15%

    posted by jedicus at 5:58 PM on August 13, 2009
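One way to read the ratio defined above, as a rough sketch: only threads with at least one best answer count, every comment a user makes in those threads counts as an answer, and users need a minimum number of best answers to qualify. The record layout (post id, user id, best-answer flag) is an assumption, the sample data is invented, and details such as how anonymous questions or the asker's own comments are handled are not modelled here.

```python
from collections import defaultdict


def best_answer_ratios(comments, min_best):
    """Return (userid, percent) pairs sorted by best answer : answer ratio.

    `comments` is a list of (postid, userid, is_best) tuples.
    """
    # Only threads where some answer was marked best are considered.
    threads_with_best = {postid for postid, _user, is_best in comments if is_best}

    answers = defaultdict(int)
    best = defaultdict(int)
    for postid, userid, is_best in comments:
        if postid not in threads_with_best:
            continue
        answers[userid] += 1
        if is_best:
            best[userid] += 1

    ratios = {
        userid: 100.0 * best[userid] / answers[userid]
        for userid in best
        if best[userid] >= min_best
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Made-up sample: thread 3 has no best answer, so it is ignored entirely.
sample = [
    (1, "u1", True), (1, "u2", False), (1, "u1", False),
    (2, "u2", True), (2, "u1", False),
    (3, "u1", False), (3, "u2", False),
]
result = best_answer_ratios(sample, min_best=1)
```

Note how the extra non-best answer from "u1" in thread 1 counts against that user's ratio, matching the caveat in the NB above.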


    Would it be evil to post some results from that analysis here?

    Putting it on t-shirts or taking out full-page ads in major dailies might be evil; chattering about numbers in here is pretty much what we're here for.

    Also I am noticing some weirdness when trying to load the posttitles data. For example postid number 30523 in posttitles_askme seems to have a line break right before the title text.

    Yeah, there's bound to be some weird stuff in there; since that data is brand new, no one's really bonked their head against it much, so by all means mention any fucked up bits and I can look into either modifying them in the db or filtering them out of the dump in the future, etc.
    posted by cortex (staff) at 6:01 PM on August 13, 2009


    Here, then, are the top 20 all-time favorite commenters (highest average number of favorites per comment). 100 total comments required to get into this club.
    4.20:Parasite Unseen [1697 favorites on 404 comments]
    4.17:Kiablokirk [575 favorites on 138 comments]
    4.07:Christ, what an asshole [892 favorites on 219 comments]
    4.05:dyoneo [490 favorites on 121 comments]
    3.66:asavage [673 favorites on 184 comments]
    3.61:mhoye [451 favorites on 125 comments]
    3.56:cebailey [356 favorites on 100 comments]
    3.43:East Manitoba Regional Junior Kabaddi Champion '94 [5436 favorites on 1583 comments]
    3.43:Damn That Television [1538 favorites on 448 comments]
    3.28:TryTheTilapia [1027 favorites on 313 comments]
    3.15:Combustible Edison Lighthouse [870 favorites on 276 comments]
    3.10:Legomancer [459 favorites on 148 comments]
    3.03:Mr. Bad Example [522 favorites on 172 comments]
    3.03:Dee Xtrovert [3480 favorites on 1147 comments]
    3.00:Pater Aletheias [3568 favorites on 1189 comments]
    3.00:the quidnunc kid [1058 favorites on 353 comments]
    2.90:Avenger [4982 favorites on 1715 comments]
    2.83:Stonewall Jackson [348 favorites on 123 comments]
    2.83:billyfleetwood [1691 favorites on 598 comments]
    2.79:Ratio [298 favorites on 107 comments]
    posted by FishBike at 6:20 PM on August 13, 2009 [1 favorite]


    I love that the third most helpful AskMe user is Jeeves.
    posted by ocherdraco at 6:23 PM on August 13, 2009 [2 favorites]


    And just for fun, here's the same list but with the entrance requirement raised to 1000 comments or more. I think I recognize virtually all of these names, unlike the first list.
    3.43:East Manitoba Regional Junior Kabaddi Champion '94 [5436 favorites on 1583 comments]
    3.03:Dee Xtrovert [3480 favorites on 1147 comments]
    3.00:Pater Aletheias [3568 favorites on 1189 comments]
    2.90:Avenger [4982 favorites on 1715 comments]
    2.77:Pastabagel [11804 favorites on 4263 comments]
    2.68:Greg Nog [6065 favorites on 2259 comments]
    2.57:adipocere [6435 favorites on 2508 comments]
    2.29:felix betachat [3836 favorites on 1675 comments]
    2.27:ND¢ [6514 favorites on 2875 comments]
    2.15:Rhaomi [3154 favorites on 1465 comments]
    2.11:Astro Zombie [20158 favorites on 9568 comments]
    2.03:Mutant [2997 favorites on 1478 comments]
    1.94:The Straightener [4599 favorites on 2368 comments]
    1.79:robocop is bleeding [5822 favorites on 3256 comments]
    1.74:EatTheWeak [2444 favorites on 1403 comments]
    1.69:Pope Guilty [7680 favorites on 4534 comments]
    1.66:Joe Beese [4447 favorites on 2671 comments]
    1.62:allkindsoftime [2771 favorites on 1715 comments]
    1.59:netbros [3316 favorites on 2084 comments]
    1.59:ThePinkSuperhero [11963 favorites on 7545 comments]
    Also corrected spelling of ND¢. Do you know how hard it is to search for that on the site? And there's ThePinkSuperhero again, courtesy of ~1 billion favorites from davey_darling. Anomalies abound!
    posted by FishBike at 6:27 PM on August 13, 2009 [1 favorite]


    Don't use average. Use median. Although the 1000 comment constraint probably mostly fixes the outlier problem.
    posted by DU at 6:35 PM on August 13, 2009


    Can someone just do straight up most favorites and most comments?
    posted by empath at 6:36 PM on August 13, 2009


    Oh NM, median is going to be much, much harder to compute.
    posted by DU at 6:37 PM on August 13, 2009
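For what it's worth, outside of SQL the median is not much harder than the mean once the per-comment favorite counts are grouped by user. A minimal sketch using Python's standard `statistics` module, with made-up numbers chosen so that one outlier comment skews the average:

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (userid, favorite_count) pairs; user "a" has one outlier comment.
comments = [
    ("a", 0), ("a", 1), ("a", 0), ("a", 200), ("a", 1),
    ("b", 2), ("b", 3), ("b", 2),
]

# Group favorite counts per user.
by_user = defaultdict(list)
for userid, faves in comments:
    by_user[userid].append(faves)

# (mean, median) favorites per comment for each user.
summary = {userid: (mean(counts), median(counts)) for userid, counts in by_user.items()}
```

Here user "a" wins on average thanks to a single comment, while the median puts "b" ahead. The trade-off is memory: every count has to be kept per user, not just a running total.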


    Clearly, I need to make fewer comments so I can up my average.
    posted by Pater Aletheias at 6:43 PM on August 13, 2009 [3 favorites]


    Damn. Why did I do that?
    posted by Pater Aletheias at 6:45 PM on August 13, 2009 [3 favorites]


    ARGH!
    posted by Pater Aletheias at 6:45 PM on August 13, 2009 [9 favorites]


    That favorites:comments count should probably include posts, or exclude favorites on posts (which would be neat, but probably impossible).
    posted by Sys Rq at 7:00 PM on August 13, 2009


    Sys Rq, the favorites count in those two tables I posted so far already exclude favorites on posts, so not only is it possible, it's how I did it!

    Another Mefite contacted me to point out that they should have appeared on one of these lists based on the counts appearing on their profile, where it is not easy to figure out how many are on comments and how many are on posts. But there's a favorites count attached to each comment record in the Infodump, so it's easy to just add 'em up.
    posted by FishBike at 7:07 PM on August 13, 2009


    Another Mefite contacted me to point out that they should have appeared on one of these lists based on the counts appearing on their profile

    TEACHER TEACHER I GOT 8/10 ON MY QUIZ BUT YOU FORGOT TO PUT MY NAME ON THE BOARD
    posted by brain_drain at 7:32 PM on August 13, 2009


    Another couple of tables. These ones are average favorites per post.

    First, the top 20 considering people who have made at least 10 posts:
    40.76:limon [1386 favorites on 34 posts]
    35.85:phoenixy [466 favorites on 13 posts]
    34.92:cthuljew [454 favorites on 13 posts]
    28.80:Christ, what an asshole [1469 favorites on 51 posts]
    28.67:Carialle [430 favorites on 15 posts]
    28.42:revgeorge [341 favorites on 12 posts]
    27.50:decoherence [330 favorites on 12 posts]
    26.83:Rinku [483 favorites on 18 posts]
    26.18:Saxon Kane [288 favorites on 11 posts]
    24.35:turgid dahlia [633 favorites on 26 posts]
    22.75:Metroid Baby [637 favorites on 28 posts]
    22.10:steinwald [221 favorites on 10 posts]
    22.00:archagon [1012 favorites on 46 posts]
    21.89:mrzarquon [416 favorites on 19 posts]
    21.80:Upton O'Good [1090 favorites on 50 posts]
    21.57:pasici [302 favorites on 14 posts]
    20.60:lampshade [206 favorites on 10 posts]
    20.53:forrest [308 favorites on 15 posts]
    20.00:churl [220 favorites on 11 posts]
    19.84:2or3whiskeysodas [754 favorites on 38 posts]
    And then the top 20 of the more prolific posters, those with at least 100 posts:
    16.50:Kattullus [6057 favorites on 367 posts]
    16.43:nickyskye [4963 favorites on 302 posts]
    15.74:blahblahblah [3810 favorites on 242 posts]
    14.79:Ambrosia Voyeur [1671 favorites on 113 posts]
    12.40:carsonb [1773 favorites on 143 posts]
    12.20:dersins [2281 favorites on 187 posts]
    12.15:netbros [3923 favorites on 323 posts]
    12.01:not_on_display [2041 favorites on 170 posts]
    11.72:vronsky [2695 favorites on 230 posts]
    11.08:flapjax at midnite [4101 favorites on 370 posts]
    10.93:Miko [2340 favorites on 214 posts]
    10.54:jbickers [2561 favorites on 243 posts]
    9.97:fearfulsymmetry [1834 favorites on 184 posts]
    9.54:empath [2422 favorites on 254 posts]
    9.48:amyms [2351 favorites on 248 posts]
    9.11:miss lynnster [2787 favorites on 306 posts]
    8.90:Joe Beese [1291 favorites on 145 posts]
    8.86:psmealey [2101 favorites on 237 posts]
    8.77:cortex [1623 favorites on 185 posts]
    8.71:Artw [2333 favorites on 268 posts]
    There should be no names in common on these two, since the first list doesn't include anybody with 100 or more posts (not as a criterion, it just works out that way).
    posted by FishBike at 7:33 PM on August 13, 2009 [1 favorite]


    I'd be curious to see a couple things:

    1. Post-centric versions of the above (posting and commenting being rather significantly different avocations in a lot of ways)
    2. Per-subsite breakdowns (folks having different preferences as to where they hang out, comment, favorite, etc)

    The question of how favoriting in general is skewed between subsites is something I'd love to see explored. Do favorites pretty much get distributed in the same way on each subsite, or do they differ on (a) proportion of post vs. comment faving, (b) degree to which faves are clustered (e.g. are 30% of askme comments faved vs. 20% of mefi comments, does the median faved meta comment receive more faves than the median faved mefi comment...)?

    Does favoriting volume follow commenting volume by subsite, i.e. does someone who comments more on askme than on mefi fave more on askme than on mefi by the same proportion?
    posted by cortex (staff) at 7:35 PM on August 13, 2009
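Two of the per-subsite questions above (what share of comments get faved at all, and what the median faved comment receives) could be explored along these lines. This is only a sketch under assumptions: it takes a list of per-comment favorite counts for each subsite, and the numbers are invented rather than taken from the Infodump.

```python
from statistics import median


def fave_profile(fave_counts):
    """Return (share of comments with at least one favorite, median among faved comments)."""
    faved = [n for n in fave_counts if n > 0]
    share = len(faved) / len(fave_counts) if fave_counts else 0.0
    return share, (median(faved) if faved else None)


# Made-up per-comment favorite counts for two subsites.
subsites = {
    "mefi": [0, 0, 3, 1, 0],
    "askme": [1, 0, 2, 4],
}
profiles = {name: fave_profile(counts) for name, counts in subsites.items()}
```

Comparing the resulting pairs across subsites gives a first look at whether favorites are clustered differently from one part of the site to another.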


    8.77:cortex [1623 favorites on 185 posts]

    Good god. I've only got a dozen and a half actual mefi posts; the balance is half Music and the rest metatalk, askme, and projects stuff. Again with the interest in subsite breakdowns.
    posted by cortex (staff) at 7:38 PM on August 13, 2009


    Damn you and your timing. I'm off to the back woods of New Hampshire for a long weekend. I'll try to come up with a shell script to calculate this inconvenience on Monday.
    posted by Plutor at 7:40 PM on August 13, 2009


    Deleted posts per admin:

    vacapinta 87
    mathowie 343
    jessamyn 678
    cortex 1527
    posted by gwint at 7:40 PM on August 13, 2009 [1 favorite]


    And finally (for tonight, because I should be asleep already... more tomorrow I'm sure), here are some stats on whose posts generate the highest average number of comments.

    I wonder if this means they are good posts, or controversial posts? I don't know how to tell that from the Infodump. But... it might be interesting to produce a table of "words occurring in post title vs. number of comments generated" and see which words are associated with the most comments in the resulting thread. Hmm. Maybe tomorrow.

    Anyway, here are the highest average comments-per-post, considering people with at least 10 posts:
    149.45:missbossy [1644 comments on 11 posts]
    144.21:motty [2740 comments on 19 posts]
    126.97:heatherann [4952 comments on 39 posts]
    103.80:jaduncan [1038 comments on 10 posts]
    101.81:shetterly [2138 comments on 21 posts]
    98.55:hippybear [2168 comments on 22 posts]
    94.79:landis [1327 comments on 14 posts]
    92.23:SirOmega [1199 comments on 13 posts]
    91.22:you just lost the game [1642 comments on 18 posts]
    87.07:coudal [1306 comments on 15 posts]
    85.67:yhbc [6254 comments on 73 posts]
    84.50:kgasmart [1183 comments on 14 posts]
    84.07:taosbat [1177 comments on 14 posts]
    82.10:twoleftfeet [4844 comments on 59 posts]
    82.10:dorian [821 comments on 10 posts]
    81.33:dw [4229 comments on 52 posts]
    80.00:landedjentry [800 comments on 10 posts]
    79.00:Poagao [790 comments on 10 posts]
    78.85:Stewriffic [2050 comments on 26 posts]
    78.38:darkstar [1019 comments on 13 posts]
    And the same, but for more prolific posters with at least 100 posts:
    71.14:ericb [14726 comments on 207 posts]
    69.12:digaman [13202 comments on 191 posts]
    64.95:orthogonality [15392 comments on 237 posts]
    60.64:three blind mice [7034 comments on 116 posts]
    58.39:Artw [15648 comments on 268 posts]
    54.89:plexi [8837 comments on 161 posts]
    53.54:four panels [10923 comments on 204 posts]
    52.87:empath [13430 comments on 254 posts]
    52.01:Saucy Intruder [6449 comments on 124 posts]
    51.67:Joe Beese [7492 comments on 145 posts]
    50.86:bardic [5289 comments on 104 posts]
    50.76:ThePinkSuperhero [9086 comments on 179 posts]
    50.73:CunningLinguist [6747 comments on 133 posts]
    50.22:The Jesse Helms [8386 comments on 167 posts]
    50.01:billysumday [5401 comments on 108 posts]
    49.88:Brandon Blatcher [10674 comments on 214 posts]
    49.53:amberglow [19417 comments on 392 posts]
    49.50:Afroblanco [5990 comments on 121 posts]
    49.45:insomnia_lj [8851 comments on 179 posts]
    48.40:XQUZYPHYR [17377 comments on 359 posts]
    posted by FishBike at 7:42 PM on August 13, 2009
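
For anyone who wants to reproduce the comments-per-post lists above, here is a minimal Python sketch. It assumes you have already boiled the post data down to one (username, comment_count) pair per post; that input shape and the function names are illustrative, not the Infodump's actual column layout.

```python
from collections import defaultdict


def comments_per_post(posts, min_posts=10):
    """Rank users by average comments per post.

    `posts` is an iterable of (username, comment_count) pairs, one per post.
    Users with fewer than `min_posts` posts are left out.
    """
    totals = defaultdict(lambda: [0, 0])  # username -> [comments, posts]
    for username, comment_count in posts:
        totals[username][0] += comment_count
        totals[username][1] += 1

    rows = [
        (name, comments / n_posts, comments, n_posts)
        for name, (comments, n_posts) in totals.items()
        if n_posts >= min_posts
    ]
    # Highest average first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_row(row):
    """Render one row in the same style as the lists in the thread."""
    name, average, comments, n_posts = row
    return f"{average:.2f}:{name} [{comments} comments on {n_posts} posts]"
```

Raising `min_posts` to 100 gives the "more prolific posters" variant of the same list.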


    Again with the interest in subsite breakdowns.

    Yeah, I'm curious to see who's "MetaFilter's favorite poster on the blue" and so forth. I'd also like to run some of these over a limited time period. "Poster of the year" kind of thing.
    posted by FishBike at 7:45 PM on August 13, 2009


    Sweet... my stream graph code still works. Now with New and Improved Filtering of Bogus AskMe Data that Does Not Belong in MetaTalk!

    The music graphs benefit the most from an extra year's worth of data.
    posted by Galvatron at 7:45 PM on August 13, 2009 [2 favorites]


    Oh, and while there's no way I'm naming names... there are people with 1,000+ comments and *no* favorites received on any of them. I feel like looking up a comment of theirs and favoriting it just to pop their favorite-receiving cherry. Eh, maybe when I am more awake.
    posted by FishBike at 7:48 PM on August 13, 2009


    this is awesome. thanks pb, cortex, matt and anyone else involved in making it happen. I've already spent a bunch of time with the old dump, and it is a really fun infoset.

    I can't wait to dig into the new data, and hopefully make something out of it. It usually takes a bit of work to transform raw files into something usable for presentation (XML or some other form, perhaps byte arrays?). If I do that conversion, I'll be sure to post any results on the wiki.
    posted by localhuman at 7:50 PM on August 13, 2009


    Anyway, here are the highest average comments-per-post,

    I bet the average number of comments per post has increased over time - do you feel like normalizing for that?

    shorter shothotbot: Could I have even better free ice cream
    posted by shothotbot at 7:55 PM on August 13, 2009


    Can you run some kind of query on who's thoroughly average, or maybe the most mediocre or something so I can see my name on one of those lists?

    Fun stuff!
    posted by Devils Rancher at 8:02 PM on August 13, 2009 [1 favorite]

    2.15:Rhaomi [3154 favorites on 1465 comments]
    2.11:Astro Zombie [20158 favorites on 9568 comments]
    2.03:Mutant [2997 favorites on 1478 comments]
    Oh great... of all the users to barely squeak by, it had to be the zombie and the mutant. I fear for my brain (and my rem count).

    FishBike, I noticed you limited your analysis to either posts or comments only. I'm not savvy enough to mess with the infodump, but jaden's awesome Metafilter Contribution Index has a handy function summing all of a user's comments, posts, questions, answers, projects, etc., across all subsites into a single number. I spent a few minutes recalculating the averages you gave using this "total contribution count" and the publicly available favorites counts -- it really shakes up the list:
    3.742 - East Manitoba Regional Junior Kabaddi Champion '94
    3.176 - Pater Aletheias
    3.072 - Avenger
    3.054 - Dee Xtrovert
    3.025 - netbros
    2.870 - Greg Nog
    2.833 - Pastabagel
    2.609 - adipocere
    2.596 - Rhaomi
    2.474 - Mutant
    2.400 - ND¢
    2.320 - felix betachat
    2.296 - Astro Zombie
    2.043 - Joe Beese
    2.027 - The Straightener
    2.001 - allkindsoftime
    1.966 - EatTheWeak
    1.926 - robocop is bleeding
    1.823 - Pope Guilty
    1.706 - ThePinkSuperhero
    netbros, for instance, leaps from 1.59 to 3.025, likely because he comments less but makes awesome FPPs more. A few more stellar posters would probably make the list if you included this metric in your number crunching.

    One caveat when dealing with favorites, though: they weren't introduced until May 2006, and the most prolific users have posting histories that start far before then. To get the truest view you'd have to somehow limit your scan to contributions made after that date, though I've got no idea how. And of course that would eliminate popular comments that received a lot of favorites after the fact, but I don't think it would impact averages that much.

    Incidentally, all this talk of methods for identifying top contributors reminds me of a funny parallel. Wired recently published an article on the so-called "Genius Index", one attempt at a simple method for determining the influence and impact scientists have on their field. To calculate it, you take all the papers published by a given scientist and rank them by the number of times each has been cited. You then start counting down the list, starting with one, until the number of cites is less than the number you're on. As the article puts it:

    Say paper number one has been cited 10,000 times. Paper number two, 8,000 cites. Paper number 32 has 33 citations, but number 33 has received just 28. You've published 32 papers with more than 32 citations—your h-index is 32. Or to put it more technically, the h-index is the number n of a researcher's papers that have been cited by other papers at least n times.

    Since reading that, I've wondered how well this method might work on Mefi. I doubt there's a simple way to calculate it automatically, but you can do it for yourself easily enough -- just go to your Popular Favorites page (http://www.metafilter.com/activity/USERNUMBER/favorited/popular) and start counting down the list. My index is 25, for instance; cortex's is 38. I think it's cool because it recognizes a combination of quality and quantity of favorites, and can be reckoned without resorting to huge databases of statistics. Of course it has the same problems here as it does in academia, namely:

    On the other hand, Hirsch acknowledges that the h-index has its own intrinsic weaknesses. It's kind to older folks, for example, but not great to younger scientists. If a scientist writes six brilliant papers and dies, his h-index will never be higher than six, even if each paper is cited 10,000 times. And by putting the onus on individuals, it encourages researchers to write about sexy topics and hew close to the conventional wisdom—exactly what Hirsch was trying to avoid.

    ...which, if you think about it, sounds similar to some of the criticisms of the favorites system (especially that "hewing to the conventional wisdom" part).

    On preview: I think the numbers here are off. By my math, I should qualify for the list as-is (by virtue of having one really popular FPP out of only 16, no doubt). But given that at least one other user said they were left out on the first set, it casts some doubt on all of the charts -- there's probably a programming hiccup that's excluding some groups of people from the different lists.
    posted by Rhaomi at 8:02 PM on August 13, 2009
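
The counting-down procedure quoted from the Wired article is short enough to write out. This is a minimal sketch, assuming you already have a plain list of favorite counts for one user's contributions (copied off the Popular Favorites page, say); the function name is made up for illustration.

```python
def h_index(favorite_counts):
    """Return the largest n such that n items have at least n favorites each."""
    ranked = sorted(favorite_counts, reverse=True)
    h = 0
    for rank, favorites in enumerate(ranked, start=1):
        if favorites < rank:
            # This item has fewer favorites than its position in the list,
            # so the count stops at the previous position.
            break
        h = rank
    return h
```

The "six brilliant papers" weakness Hirsch mentions falls straight out of this: `h_index([10000] * 6)` can never be higher than 6, however large the individual counts are.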


    so, does this mean we can have an updated "MetaFilter:" tagline list?

    This is Important.
    posted by ArgentCorvid at 8:05 PM on August 13, 2009 [1 favorite]


    The "Genius Index" is fascinating. Mine is 19, and would be several fewer if I hadn't had a gonzo week on the Blue.
    posted by ocherdraco at 8:22 PM on August 13, 2009


    Mine's 40, but I can't see what it means except that I've made a lot of posts. It seems like a weird metric.
    posted by Kattullus at 8:40 PM on August 13, 2009


    9.48:amyms [2351 favorites on 248 posts]

    Channeling Navin R. Johnson: "Millions of people look at this book website every day! This is the kind of spontaneous publicity - your name in print - that makes people. I'm in print in a list on Metafilter! Things are going to start happening to me now."
    posted by amyms at 8:50 PM on August 13, 2009 [1 favorite]


    Kattullus, it's impressive not simply because you've made a lot of posts, but specifically 40 posts that each received at least 40 favorites. From what I've seen, the average FPP gets somewhere in the neighborhood of 10-20 favorites -- getting 40 or 80 or 100+ is difficult and requires a lot of work finding and presenting interesting content that appeals to a lot of people. But to accomplish that 40 times over? That's a remarkable track record. For me to reach that, I'd have to make at least 23 posts/comments/whatever, all getting at least 40 favorites each. That's a lot of high-quality contributing. I guess that's easier to do when you've been a member for many years, but then again seniority is usually a factor in determining "impact" or "influence" or whatever you want to call it.
    posted by Rhaomi at 8:56 PM on August 13, 2009


    I'm so happy right now. :)
    posted by East Manitoba Regional Junior Kabaddi Champion '94 at 8:59 PM on August 13, 2009 [9 favorites]


    FishBike, I noticed you limited your analysis to either posts or comments only. I'm not savvy enough to mess with the infodump, but jaden's awesome Metafilter Contribution Index has a handy function summing all of a user's comments, posts, questions, answers, projects, etc., across all subsites into a single number. I spent a few minutes recalculating the averages you gave using this "total contribution count" and the publicly available favorites counts -- it really shakes up the list:

    Yeah, this time around I was specifically interested in breaking out the comments vs. posts into separate listings. But in the user matching thread, I did a bunch of stats that took into account all of those things you mentioned.

    From that previous exercise, I have a table full of derivative data that counts "total activity" per user, and since their sign-up date is in the usernames table (I think), it ought to be relatively easy to calculate the top users by Metafilter Contribution Index (being just activities/day).

    I had actually already thought about doing that to see who the biggest lunatics most active contributors are. So it's neat that others have already thought of this and built a tool to calculate it for individual users.
    posted by FishBike at 9:07 PM on August 13, 2009


    I'm so happy right now. :)
    posted by East Manitoba Regional Junior Kabaddi Champion '94


    You realize if you don't get at least 4 favorites for saying that, your score on one of these charts is going to drop right? ;)
    posted by FishBike at 9:09 PM on August 13, 2009 [1 favorite]


    FishBike: "From that previous exercise, I have a table full of derivative data that counts "total activity" per user, and since their sign-up date is in the usernames table (I think), it ought to be relatively easy to calculate the top users by Metafilter Contribution Index (being just activities/day).

    I had actually already thought about doing that to see who the biggest lunatics most active contributors are. So it's neat that others have already thought of this and built a tool to calculate it for individual users.
    "

    FishBike, I might be misunderstanding your comment, but you do know that the Contribution Index site already has multiple top lists, right? It includes rankings for most posts, most comments, and most contributions per day. I only linked to the Index's search page above, and from your last comment it sounded like, although you knew you could find the numbers by searching for individual people, you didn't know that rankings were already available and that you were fixing to calculate them yourself.
    posted by Rhaomi at 9:20 PM on August 13, 2009


    Those lists are static; they don't reflect the current data.
    posted by ocherdraco at 9:23 PM on August 13, 2009


    Good point, ocherdraco, I hadn't noticed the difference. Carry on, then!
    posted by Rhaomi at 9:30 PM on August 13, 2009


    FishBike, can you do another one of these?
    posted by not_on_display at 9:37 PM on August 13, 2009


    Cool, now instead of admitting that I didn't notice the lists already on that site, I can pretend I noticed them and realized they weren't real-time like the single-user inquiry is.

    It's worth re-doing them anyway, if only because it then leads to other interesting "highest (x) per day" queries. Who favorites the most items per day? Who receives the most favorites per day? I might try some of these, unless someone else runs them first of course.
    posted by FishBike at 9:38 PM on August 13, 2009


    Genius index -- mine is 30.
    posted by empath at 9:41 PM on August 13, 2009


    Can someone compute the genius index for everyone, i'm dying to know the top 20 now.
    posted by empath at 9:45 PM on August 13, 2009


    On the other hand, Hirsch acknowledges that the h-index has its own intrinsic weaknesses. It's kind to older folks, for example, but not great to younger scientists.

    Yeah, exactly. It strikes me that there must be some way to deal with this—either introduce a scaling factor for total number of contributions, or examine the data in discrete chunks, or something.

    so, does this mean we can have an updated "MetaFilter:" tagline list?

    Heh. That's its own thing, but I probably have the code sitting around somewhere so I'll see if I can find it and re-run it.
    posted by cortex (staff) at 9:57 PM on August 13, 2009


    Can someone compute the genius index for everyone, i'm dying to know the top 20 now.

    Jeez, you people come up with some complicated requests sometimes. I think I have managed to do this though, at least for the "genius index" based on posts only (not comments):
    Anonymous: 50
    Kattullus: 38
    blahblahblah: 34
    nickyskye: 32
    netbros: 29
    jonson: 29
    homunculus: 28
    flapjax at midnite: 28
    jbickers: 26
    vronsky: 26
    madamjujujive: 25
    miss lynnster: 24
    empath: 24
    orthogonality: 24
    Miko: 24
    not_on_display: 24
    Effigy2000: 23
    stbalbach: 23
    Artw: 23
    dersins: 23
    As I understand it, the basic algorithm is to take all your posts, rank them in descending order of favorites received, and then find the last one where the favorites count is equal to or higher than that post's rank on your list. Whichever entry on the list that is, that's your genius index.

    Getting the favorites count for each post is trivial, of course, because it's right there in the Infodump already. But I wondered, how am I going to figure out where each post is on each user's personal list of most-favorited posts?

    What I did was compute, for each post, how many 'better' posts have been made by the same user, where 'better' means more favorites received, or the same number of favorites received but a lower postid (to break ties).

    Create a temporary table of only those posts where the number of better posts is less than or equal to the number of favorites it received. Find the maximum "number of better posts" in this table for each user, and sort in descending order. Take the top 20 and this is what I get.
    posted by FishBike at 10:14 PM on August 13, 2009 [1 favorite]
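
The "count the better posts" approach described above can be sketched with Python's built-in sqlite3 module. The table and column names here are assumptions for illustration, not the actual Infodump load scripts. One interpretation is also baked in: the number of better posts is a zero-based count, so the sketch adds one to turn it into a rank before comparing it with the favorites count. Taking the maximum "number of better posts" literally can come out one lower than the usual h-index definition; whether the original query did that is not known.

```python
import sqlite3


def genius_index_by_user(posts):
    """Compute a per-user index from (postid, userid, favorites) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (postid INTEGER PRIMARY KEY, "
        "userid INTEGER, favorites INTEGER)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", posts)

    # For each post, count the same user's 'better' posts: more favorites,
    # or equal favorites and a lower postid (the tie-break).
    conn.execute(
        """
        CREATE TEMP TABLE ranked AS
        SELECT p.userid, p.favorites,
               (SELECT COUNT(*) FROM posts q
                 WHERE q.userid = p.userid
                   AND (q.favorites > p.favorites
                        OR (q.favorites = p.favorites
                            AND q.postid < p.postid))) AS better
        FROM posts p
        """
    )

    # Keep posts whose rank (better + 1) does not exceed their favorites,
    # then take the highest such rank per user.
    rows = conn.execute(
        """
        SELECT userid, MAX(better + 1) AS idx
        FROM ranked
        WHERE better + 1 <= favorites
        GROUP BY userid
        ORDER BY idx DESC, userid
        """
    ).fetchall()
    conn.close()
    return rows
```

Users with no qualifying post (for example, nothing but zero-favorite posts) simply do not appear in the result. The correlated subquery is quadratic per user, which is fine for a sketch but would want an index on `userid` for the full data set.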


    nicely done.

    So the person who is the biggest genius of metafilter says the genius stat doesn't mean anything :)
    posted by empath at 10:20 PM on August 13, 2009


    FishBike, can you do another one of these?

    Yup. So this is an updated version of the "MetaFilter Mutual Appreciation Society", or at least one variation of it. This one shows the percentage of each user's total favorites given to the other user, and then ranks the list by the lowest percentage of either user (all possible pairings of users are considered and ranked).
    aleahey [21.17%] ---- [37.36%] ginagina
    chococat [8.87%] ---- [8.19%] micayetoca
    jessamyn [5.97%] ---- [4.72%] not_on_display
    sleepy pete [4.09%] ---- [6.25%] cog_nate
    Artw [3.85%] ---- [30.19%] fearfulsymmetry
    Stynxno [40.61%] ---- [3.65%] ThePinkSuperhero
    melissa may [3.60%] ---- [6.13%] sleepy pete
    loquacious [4.51%] ---- [3.48%] Ambrosia Voyeur
    NortonDC [9.82%] ---- [3.05%] onlyconnect
    jonmc [2.85%] ---- [4.08%] Divine_Wino
    snsranch [2.81%] ---- [4.74%] micayetoca
    matteo [3.68%] ---- [2.41%] Mayor Curley
    sleepy pete [2.20%] ---- [7.33%] micayetoca
    ColdChef [2.16%] ---- [4.56%] yhbc
    yerfatma [2.77%] ---- [2.12%] Mayor Curley
    ThePinkSuperhero [3.16%] ---- [2.02%] hermitosis
    languagehat [2.03%] ---- [1.99%] Kattullus
    BitterOldPunk [1.81%] ---- [3.53%] nola
    nickyskye [1.58%] ---- [2.81%] homunculus
    nickyskye [1.53%] ---- [2.29%] flapjax at midnite
    posted by FishBike at 10:25 PM on August 13, 2009 [1 favorite]
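
A minimal sketch of the pairing logic described above, assuming the favorites data has been reduced to one (faver, favee) pair per favorite given. That input shape, the function name, and the decision to skip self-favorites as pairs are all assumptions for illustration.

```python
from collections import Counter


def mutual_appreciation(favorites, top=20):
    """Rank user pairs by the lower of their two mutual-favorite percentages.

    `favorites` is an iterable of (faver, favee) pairs, one per favorite.
    Returns (user_a, pct_a_gave_b, pct_b_gave_a, user_b) tuples.
    """
    given_total = Counter()  # faver -> total favorites given
    pair_counts = Counter()  # (faver, favee) -> favorites given to that user
    for faver, favee in favorites:
        given_total[faver] += 1
        if faver != favee:
            pair_counts[(faver, favee)] += 1

    pairs = []
    for (a, b), a_to_b in pair_counts.items():
        b_to_a = pair_counts.get((b, a), 0)
        # `a < b` visits each unordered pair once; one-way pairs are skipped.
        if a < b and b_to_a:
            pct_a = 100.0 * a_to_b / given_total[a]
            pct_b = 100.0 * b_to_a / given_total[b]
            pairs.append((a, pct_a, pct_b, b))

    # Rank by the weaker side of each pairing, strongest first.
    pairs.sort(key=lambda p: min(p[1], p[2]), reverse=True)
    return pairs[:top]
```

Ranking on the lower percentage is what keeps lopsided pairings (one user giving 40% of their favorites, the other giving 3%) from floating to the top.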


    empath: So the person who is the biggest genius of metafilter says the genius stat doesn't mean anything :)

    Hah!

    Well, looking at that list it's all users who've made a lot of posts since the favorites system was implemented. I wouldn't be surprised if there was a correlation between number of posts since then and your "genius" number.
    posted by Kattullus at 10:40 PM on August 13, 2009


    Clever workaround! Unfortunately it discriminates against users who derive the lion's share of their favorites through comments as opposed to posts, which I suspect would include the majority of Mefites. For instance, my number is 25, but only four of the items constituting that number are posts -- the other 21 contributions are comments. So as it is, only hardcore FPP factories qualify, which, while a good measure of site impact, excludes a universe of equally-valid contributions.

    Would generating an index number for a combination of posts and comments even be possible via the Infodump? If not, would it be possible to automate its calculation on a user-by-user basis like the Contribution Index does? It pulls posting data and account age from people's profiles on-the-fly. Maybe a more sophisticated script could scan through people's "Popular Favorites" pages, counting each entry on the way. It shouldn't be too intensive, as the ceiling appears to be around 40, or two full pages from the Popular Favorites listing (each page has 20 items).
    posted by Rhaomi at 10:48 PM on August 13, 2009


    Something to note about Kattullus's posting history. Most of his FPPs are single links with just a few words. There's something to be said for brevity.
    posted by empath at 11:01 PM on August 13, 2009


    Arr, doh, i'm an idiot.. the 'popular favorites' page just shows the post title, lol.
    posted by empath at 11:03 PM on August 13, 2009


    Yeah, if anything I'm wordy :)

    Though I almost never use the more inside field.
    posted by Kattullus at 4:05 AM on August 14, 2009


    Update to my deleted threads by admin:

    I hadn't listed non-attributed deletions, or deletions from pb.

    pb 29
    vacapinta 87
    mathowie 343
    jessamyn 678
    cortex 1527
    unknown 2496

    Total: 5190 which is 16% of all posts (which is surprisingly high, maybe my numbers are off?)
    posted by gwint at 5:22 AM on August 14, 2009
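
A quick way to sanity-check a tally like the one above is to re-add the rows. This sketch uses only the six counts listed in the comment; the variable names are illustrative. Note that the listed rows add up to 5160 rather than the quoted 5190, so one of the figures may be slightly off.

```python
# Deletion counts exactly as listed in the comment above.
deletions = {
    "pb": 29,
    "vacapinta": 87,
    "mathowie": 343,
    "jessamyn": 678,
    "cortex": 1527,
    "unknown": 2496,
}

total = sum(deletions.values())
attributed = total - deletions["unknown"]

# Each row's share of all listed deletions, as a percentage.
shares = {name: 100.0 * count / total for name, count in deletions.items()}

for name, count in deletions.items():
    print(f"{name:10s} {count:5d} {shares[name]:5.1f}%")
print(f"{'total':10s} {total:5d}")
```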


    Alison, one thing I've started looking into since I've been taking linguistics/corpora experimentation seminars at the LSA Institute for the last several weeks, are the different web-based corpus search sites and software packages that are out there.

    Let me tell you, not all corpora are created equal. Natural language Q&A researchers have different needs in a corpus than someone, say, involved in statistical machine translation (bilingual corpora, plus monolingual corpora for building language models), or data extraction (big corpus, possibly limited knowledge domain, preferably with some annotations for training). The idea is to develop a system that any human can ask a question in plain language, even an open ended one, and get some sort of acceptable answer.

    Now, this might not seem very useful if you can just google for something, but there are certain subject areas where people might have a harder time finding information because they don't have the right words, the medical field is especially prone to this for laymen. There are varying methodologies to achieve systems that are able to answer these questions; some are better than others and most of the better ones require a lot of human power (expensive and slow) to get working and as a result can only be applied to a small domain of information. We need better stuff, better data, in order to get closer to the goal of answering all kinds of questions automatically and well.

    The data we would get from AskMefi is very, very special and very rich for natural language Q&A researchers. First, there's the simple fact that the data is arranged so that we have a list of questions and a list of answers to each question. Secondly, most of these answers are annotated with favorites and best answer markings so I can separate the good stuff from just the stuff. Questions are even further annotated with tags and categories so it's even easier to compare apples to apples.

    Lastly, a lot of the really good corpora aren't free; they cost thousands of dollars. Well, let's just hope that Matt and Cortex keep it affordable.
    posted by Alison at 5:28 AM on August 14, 2009


    jessamyn [5.97%] ---- [4.72%] not_on_display

    That's sweet! :-)
    posted by Devils Rancher at 5:44 AM on August 14, 2009 [1 favorite]


    Huh. It really is a popularity contest.
    posted by plinth at 5:47 AM on August 14, 2009


    On my to-do list for this evening is to put the SQL table creation and data load scripts on my web site. I'll post a link to it here, since several people seem to be interested in these.

    Would generating an index number for a combination of posts and comments even be possible via the Infodump?

    It's relatively easy. I'll just create a table that contains a combination of posts and comments history and I'll pull in the favorites counts. Then I can just re-run what I did last night, with the source being this "contributions" table instead of the comments table. That'll work for quite a few of these stats so it's worthwhile. Also on the to-do list for this evening.
    posted by FishBike at 5:56 AM on August 14, 2009


    Could somebody please crunch these numbers to the point where it looks like I am popular and/or and influential?
    posted by dirtdirt at 5:57 AM on August 14, 2009 [2 favorites]


    I've developed a proprietary metric of popularity and influentialness and run it on the DB.

    1. DU [100%]
    2. dirtdirt [95%]

    Everyone else fell into the margin of error.
    posted by DU at 6:10 AM on August 14, 2009 [5 favorites]


    Could somebody please crunch these numbers to the point where it looks like I am popular and/or and influential?

    I would also appreciate some form of list in which I rank prominently, thanks in advance.
    posted by Meatbomb at 6:20 AM on August 14, 2009


    DU, I applaud your continued advances in the field of statistics.
    posted by dirtdirt at 6:26 AM on August 14, 2009


    list of Meatbombs:
    1. Meatbomb

    (p = 0.00001)
    posted by Cold Lurkey at 6:29 AM on August 14, 2009 [5 favorites]


    Ctrl-F Optimus Ch          1 of 2

    goddamn it
    posted by Optimus Chyme at 6:42 AM on August 14, 2009 [1 favorite]


    So is this a geek thing or a nerd thing? I just want to be sure which subset of the MeFi population I'm dealing with here.


    Seriously, I'm as far from a stat person as you can imagine, but this discussion is very interesting, even if I can only follow about half of it. I just like seeing the top 20 list of whatevers.
    posted by slogger at 6:52 AM on August 14, 2009


    This is clearly a data nerd thing. This would only be a data geek thing if it involved figuring out something like who got mentioned more often, Captain Kirk or Darth Vader.
    posted by Kattullus at 6:55 AM on August 14, 2009


    No, this is a geek thing because it's amateur. If we were doing it professionally, that would be nerdy.

    Arguing about Star Trek isn't a data geek thing, it's a Data geek thing.
    posted by DU at 7:13 AM on August 14, 2009 [2 favorites]


    Arguing about Star Trek isn't a data geek thing, it's a Data geek thing.

    No way! TNG hasn't held up as well as TOS.
    posted by robocop is bleeding at 7:18 AM on August 14, 2009


    robocop is bleeding: "Arguing about Star Trek isn't a data geek thing, it's a Data geek thing.

    No way! TNG hasn't held up as well as TOS.
    "

    Watching TNG on a modern TV is like watching "Multimedia CD-ROM" from 1995. TOS still looks kitschy and cool.
    posted by minifigs at 7:31 AM on August 14, 2009 [1 favorite]


    But TNG had holodecks!

    Holodecks, people!
    posted by not_on_display at 7:34 AM on August 14, 2009


    Watching TNG on a modern TV is like watching "Multimedia CD-ROM" from 1995.

    I've been watching TOS with my kids (a "next generation" of nerds, if you will). I thought about switching to TNG, but a) I don't like TNG nearly as much and b) I figured this "multimedia CDROM" issue would be true.

    Also, if you consider the project to be an education into nerd/SF tropes, then TOS has way more/better examples of them than TNG.
    posted by DU at 7:40 AM on August 14, 2009


    Voyager, baby.
    posted by dirtdirt at 7:48 AM on August 14, 2009


    DS9, clearly.
    posted by Kattullus at 7:50 AM on August 14, 2009 [2 favorites]


    Vader weighs in.
    posted by brain_drain at 7:51 AM on August 14, 2009


    Oh god. I have to bookmark this for later, because I will get sucked into a 48-hour coding marathon that produces lots of awesome stats, but will prevent me from ever getting out of grad school. Later, methinks.
    posted by chrisamiller at 8:29 AM on August 14, 2009


    Putting this into Origin 8 makes it look all academic and serious. I wish I knew how to work this program.
    posted by geoff. at 9:14 AM on August 14, 2009


    Update to my deleted threads by admin:

    pb 29
    vacapinta 87
    mathowie 343
    jessamyn 678
    cortex 1527
    unknown 2496


    Oh god that's a relief. I know I do a lot of the post deletions at this point, but I was gonna be goddamned if I'd deleted well more since I came on board than both Jess and Matt had deleted combined since the site started. Clearly I've only deleted well more than both of them combined since I started, which, heh. That's still interesting.

    Attribution of reasons started very shortly after I came onboard, specifically in fact as a result of the "what, cortex is a mod?" discussions that popped up around the beginning of March 2007. So the "unknown" set consists, with maybe the exception of one or two posts, of deletions by Matt and Jess. I should probably add a note about this to the wiki; I have this vague aspiration to get as much specific meta-information about the structure of this data as I can into documentation, since (a) I think stuff like that is interesting and (b) it may be useful to info-spelunkers to know what's known and what's unknown about some of these things.

    As far as I know, there's no way to be sure who deleted what between Matt and Jess in the pre-attribution deletions, but I've always wondered whether it'd be possible to separate the two a little bit through some text analysis.

    Which seems like a Hard problem for a variety of reasons, but could be fun: given the existing corpus of attributed reasons, someone could try to build up a profile for both Matt and Jess and then run that against the unknown stuff. Maybe introduce a cap on what needs analyzing to the approximate time that Jess came on board, which would also introduce some extra Matt-specific data for any of the reason-laden deletions that preceded that.

    Total: 5190 which is 16% of all posts (which is surprisingly high, maybe my numbers are off?)

    It's high but not crazy high. I was carrying around a number like 10% in my head as an approximation, based on some numbers I ran for Jessamyn for a talk she was giving a while back. Some days we delete nothing, most days we delete something, some days we delete a lot of things. Even if you assume like 20 posts a day and an average of one deletion on a normal day, it only takes one big day a week (say, a Brownian obit storm, or a quadruple about some news thing, or a couple extra spammers, or or or...) to drive the average for the week up.

    Factor in some test posts, too, and I don't find 16% unbelievable. Feels a bit high, but not by an order of magnitude or anything.

    You could sanity check by running the same analysis on askme, where the deletion rate should be much lower. I'd guess like 3-4% there.

    Metatalk can be checked as well, though because the database originally just nuked metatalk posts instead of hiding them like it does now (and like the other subsites had for a very long time), you need to actually look for gaps in postids for metatalk up until the point deletion reasons started being recorded and count most of THOSE as deletions. But! Since Meta and Ask shared a code base from the late-3000s to somewhere in I think the 8000s, you have to be sure to disregard couch-surfing Ask posts (category id 10) from the count altogether, too.
    posted by cortex (staff) at 9:24 AM on August 14, 2009
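
The gap-counting idea for Metatalk can be sketched as follows. This is one reading of the description above, under a few loudly stated assumptions: the input is (postid, category) rows from the Metatalk post file, the stray Ask posts show up in that file with category id 10, and `cutoff_id` is a hypothetical postid marking where hard deletes stopped. None of those details are confirmed by the Infodump documentation.

```python
def estimate_nuked_metatalk_posts(rows, cutoff_id, ask_category=10):
    """Find postid gaps that probably correspond to hard-deleted posts.

    `rows` is an iterable of (postid, category) pairs.
    Returns (missing_ids, surviving_meta_post_count) for ids <= cutoff_id.
    """
    rows = [(postid, category) for postid, category in rows
            if postid <= cutoff_id]
    if not rows:
        return [], 0

    # Every id that appears in the file is accounted for, including the
    # stray Ask posts, so those ids are never counted as gaps.
    present = {postid for postid, _ in rows}

    # Stray Ask posts are also left out of the Metatalk post count.
    surviving = sum(1 for _, category in rows if category != ask_category)

    missing = [postid for postid in range(min(present), cutoff_id + 1)
               if postid not in present]
    return missing, surviving
```

Since only "most" gaps are deletions, `len(missing)` is best read as an upper bound; dividing it by `len(missing) + surviving` gives a rough ceiling on the early Metatalk deletion rate.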


    Another thought: I'd like to see a breakdown of deletions, binned by perpetrator, graphed against time.
    posted by cortex (staff) at 9:26 AM on August 14, 2009


    You know, I should really include closure data for Metatalk threads, since that's a further confounding factor in deletion-rate analysis. Because Metatalk threads were poof, gone forever upon deletion, the bar for deletion there was much higher, which created a greater incentive to close-rather-than-delete in Metatalk when a thread needed to be shut down.

    I'd be interested to see how the closure rate changed once we implemented proper delete-but-don't-nuke functionality in, I think, late 2007. Obvious guess is that deletion rate went up and closure rate went down, in a roughly proportional fashion, but that's just an assumption.

    Perhaps I should change the postdata_meta.txt output so that the "deletion" column can contain, along with 0-for-undeleted and 1-for-deleted, a 2-for-closed value. That way the format stays the same, with just the need to check for a third value explicitly if folks want to account for all three cases vs. checking for does-or-does-not-equal-1 if they're looking just for deleted or not. Might bite someone inattentive on the ass if they just check for boolean truth-as-non-0, of course, but data ain't easy, baby.

    (I'm not sure if anything's ever been first closed and THEN deleted. I suppose I could take a look and if I find any include a 3-for-deleted-and-closed value as well. And then people can use bitmasks! Yeah!)
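
    To make the bitmask idea concrete, here is a minimal Python sketch. It assumes the proposed (not yet implemented) encoding of 0 = open, 1 = deleted, 2 = closed, 3 = deleted and closed; the constant and function names are made up for illustration.

```python
DELETED = 1  # bit 0 of the proposed "deletion" column
CLOSED = 2   # bit 1 of the proposed "deletion" column


def is_deleted(value: int) -> bool:
    """True when the deleted bit is set (values 1 and 3)."""
    return bool(value & DELETED)


def is_closed(value: int) -> bool:
    """True when the closed bit is set (values 2 and 3)."""
    return bool(value & CLOSED)


for value in (0, 1, 2, 3):
    # The last column is the naive truthiness check, which lumps
    # closed-but-not-deleted threads (value 2) in with the deleted ones.
    print(value, is_deleted(value), is_closed(value), bool(value))
```

    Testing the bits separately avoids both pitfalls: the truthiness check miscounts 2 as deleted, and an equals-1 check would miss 3.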
    posted by cortex (staff) at 9:35 AM on August 14, 2009


    Another thought: I'd like to see a breakdown of deletions, binned by perpetrator, graphed against time.

    By "perpetrator", do you mean the moderator who deleted the thread, or the user who posted it originally?
    posted by FishBike at 9:44 AM on August 14, 2009


    Ah, heh, I mean the mod in question.

    An interesting related graph, though, might be deletions by datestamp of post (i.e. time the post was made) vs. time of last comment, if any, in the deleted post, to get a sense of (approximately) when the deletions occur as well as the average lifetime of a deleted post as a function of when it was posted.
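
    As a sketch of that lifetime estimate in Python (the function name and timestamps are invented; the real data files would need parsing first), using time of last comment as a stand-in for time of deletion:

```python
from datetime import datetime


def hours_to_last_comment(posted_at, comment_times):
    """Approximate lifetime of a deleted post, in hours.

    Uses the latest comment as a proxy for the deletion time; returns None
    for posts that were deleted before anyone commented.
    """
    if not comment_times:
        return None
    return (max(comment_times) - posted_at).total_seconds() / 3600


posted = datetime(2009, 8, 14, 9, 0)
comments = [datetime(2009, 8, 14, 9, 20), datetime(2009, 8, 14, 10, 30)]
print(hours_to_last_comment(posted, comments))
```

    This only gives a lower bound on the lifetime, since a post can sit for a while after its last comment before being deleted.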
    posted by cortex (staff) at 9:48 AM on August 14, 2009


    yeah, we started tracking who deleted what officially on the admin side in May 2007, probably about the time we started appending signatures to deletion reasons. So that data probably isn't much help but something worth checking.
    posted by pb (staff) at 10:16 AM on August 14, 2009


    Ah, yes, the secret admin-activity stats. Another thing that isn't gonna go in the Infodump, though I might do some analysis myself sometime for fun, since it would give us more specific info about time-of-deletion than would the time-of-last-comment estimate I suggested above.
    posted by cortex (staff) at 10:23 AM on August 14, 2009


    And here's a February 28th, 2007 thread in which attribution reasons are requested and Matt threatens to whip it up.
    posted by cortex (staff) at 10:32 AM on August 14, 2009


    Fun fact*: pb is responsible for all of the unattributed deletions. Because pb is The Digital Assassin.

    * for certain values of "fact"
    posted by Rhaomi at 11:03 AM on August 14, 2009


    Sure, you people may have a lot of favorites, but I have this. Who's crying now?!?!
    posted by blue_beetle at 11:04 AM on August 14, 2009 [1 favorite]


    Brownian obit storm

    I hate to imagine what that day is going to look like. You know how even a semi-important celeb's death usually gets 4 or 5 deleted obits? I bet that James Brown might get enough to overload the server and kick-start the Singularity.
    posted by Meatbomb at 11:31 AM on August 14, 2009 [2 favorites]




    Meta.
    posted by cortex (staff) at 11:47 AM on August 14, 2009


    Galvatron's visualizations are particularly awesome. Can someone tell me what happened to anonymous toward the beginning of 2005 as depicted here: www.pessimization.com/random/mefi-stream/2009/08/askme-posts-3000.png and also please tell me what the hell the name for that kind of representation of data is?


    I wonder if the data from blogs could be converted somehow into RPG player attributes?
    And updated daily, that would make for an interesting website, half blog half game where the data associated with your nickname would translate into your avatar. Whatever system you chose: favorites, karma, some point system using clickthroughs from non member/players to your post, you would have to control by IP and/or login somehow. Favoriting a post/comment would be broken into categories like intelligence, strength, imagination and so on and it would translate to your avatar's attributes. If, for example, you wanted to add "intelligence" points to your avatar you would make a post/comment that was likely to be favorited as "intelligent". Post favorites would carry more weight than comment favorites. Posts, comments and favorites each member/player could make would be limited in number in order to ensure the member/player would try to make posts and comments of high quality and spend favorites only on posts and comments of high quality. Adjusting the weight given to clickthroughs by non member/players would serve as a governor to control the degree to which alliances influenced the game portion of the website. To further control this you could manipulate if and how many favorites could be spent on oneself. This website idea seems most natural for demos like /., using something like WoW for the game half. It will be harder to create a game that would attract a more broadly interested nontech crowd, maybe something similar to the Sims or Second Life. Lots of interesting dynamics for both halves of the website and how they influence each other. Just a rough sketch. Now where did I leave my meds?

    Has this been tried or done before and I just never heard of it?
    Sorry for the thinking out loudish tone
    posted by vapidave at 11:48 AM on August 14, 2009


    This bit of goodness and the money mathowie sent to the guy with the medical bills were all the final straw... I sent in a $25 donation.
    posted by crapmatic at 11:55 AM on August 14, 2009


    Finally: statistics to prove that I'm the 9th best commenter here.
    posted by Damn That Television at 11:55 AM on August 14, 2009


    Has this been tried or done before and I just never heard of it?

    I think that's sort of the idea behind Justin Hall's Nethernet. You progress through the game by using social tools to interact with other players. Not sure how attributes and all that work, but it sounds like a similar idea.
    posted by pb (staff) at 11:55 AM on August 14, 2009


    It's a neat idea, vapidave. I've had similar thoughts before—the idea of community website as fantasy sports league is another way of framing it.

    But it's also something I wouldn't want to let anywhere near Metafilter if I could help it. Giving folks incentives to game the mechanics of the site or their contributions is weird and dangerous territory to get into—take the argument that favorites have some measurable effect on site activity and multiply it by a thousand.

    I've found it interesting to look at how some of these things have played out on Stack Overflow, actually, since they've got a highly visible reputation system (the mefi equivalent of having e.g. favorite count listed next to everyone's username on their byline) as well as some game-like "achievement" Badges for specific sorts of quantifiable behavior.

    And I think SO is a neat place and actually doing an okay job of achieving its general goal of being a solid Q&A site for programmers, but these game-like aspects are unavoidably wrapped up in the site culture there, and they've had to deal with internal discussions about things like whether some of the badges are actually encouraging bad behavior—there's a tag-related badge, for example, that rewards people for using new tags, which on the one hand can be seen as an incentive toward actively building up the local tag taxonomy, but on the other hand can be seen as an incentive to litter, by looking for excuses to use novel-and-unnecessary tags in order to secure a shiny reward from the site.

    It's fascinating stuff, but it worries me.

    On the other hand, something with less active feedback but playing with the same ideas—perhaps just a snapshot analysis tool, or something focused on historical site content that's already static—could be cool without venturing into the same tricky territory.

    I've mentioned a few times before the notion of building up a Metafilter Baseball Card app, for example, that'd just turn someone's site activity to date into a stats-laden Topps card, maybe, and that's something I really do want to make happen one of these days.
    posted by cortex (staff) at 12:03 PM on August 14, 2009


    Another, less rigorous view of Mefi stats: Google Trends' profile of the site

    From this data, it looks like we hit a local maximum of traffic in late August/early September of 2008 -- right around the introduction of McCain's VP (and the burgeoning thread that followed). It's all been downhill from there. As such, I dub this moment Peak Palin.

    Also, the "Also Visited" chart suggests that Mefites are a bunch of Mac-heads -- five of the top six related sites are Apple-centric. Of course, the "Also Searched" chart kind of casts doubt on the whole affair -- though the high placement for "recursion" sounds about right.
    posted by Rhaomi at 12:15 PM on August 14, 2009 [1 favorite]



    As far as I know, there's no way to be sure who deleted what between Matt and Jess in the pre-attribution deletions, but I've always wondered whether it'd be possible to separate the two a little bit through some text analysis.


    I know you guys try to cover for each other's absences, so I'd suggest using their activity on the site at the time of deletion. (if times aren't in the dump, go by the time of the last comment in the thread). If Jess was around and making comments, but there was no sign of mathowie (or vice versa), you might make a pretty strong case for who it was likely to be.
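
    Roughly, in Python, that heuristic might look like the sketch below. Everything here is an assumption for illustration: the one-hour window, the function name, and the made-up timestamps. The deletion time itself would have to be estimated, e.g. from the last comment in the thread.

```python
from datetime import datetime, timedelta


def likely_deleter(deleted_at, mod_activity, window=timedelta(hours=1)):
    """Guess which mod deleted a post from who was active around that time.

    mod_activity maps a mod's name to the timestamps of their own comments.
    Returns a name only when exactly one mod was active within the window;
    otherwise None (nobody around, or more than one candidate).
    """
    active = [
        mod
        for mod, times in mod_activity.items()
        if any(abs(t - deleted_at) <= window for t in times)
    ]
    return active[0] if len(active) == 1 else None


activity = {
    "mathowie": [datetime(2006, 5, 1, 8, 0)],
    "jessamyn": [datetime(2006, 5, 1, 14, 10), datetime(2006, 5, 1, 14, 50)],
}
print(likely_deleter(datetime(2006, 5, 1, 14, 30), activity))
```

    Returning None for ambiguous cases keeps the guesses conservative; those deletions would just stay unattributed.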
    posted by chrisamiller at 12:22 PM on August 14, 2009


    Ooh, that's a clever idea, chrisamiller.
    posted by cortex (staff) at 12:25 PM on August 14, 2009


    If you don't have an average of at least 3 favorites per comment you are subhuman trash that I literally cannot even see.
    posted by Damn That Television at 12:29 PM on August 14, 2009 [1 favorite]


    So that's why you wouldn't talk to me when I showed up at your house with a giftbasket full of money.
    posted by Kattullus at 12:37 PM on August 14, 2009


    kattullus, i'll gladly take that giftbasket. sooner rather than later, and in small bills please.
    posted by the aloha at 1:07 PM on August 14, 2009


    Huh? James Brown is dead?
    posted by Meatbomb at 1:10 PM on August 14, 2009


    God. Imagine if Michael Jackson ever died!
    posted by Astro Zombie at 1:24 PM on August 14, 2009 [1 favorite]


    Michael who?
    posted by not_on_display at 1:29 PM on August 14, 2009


    Best Answers per day: Top 50

          1 Chocolate Pickle                        0.276859504132231
          2 paulsc                                  0.239698200812536
          3 Conrad Cornelius o'Donald o'Dell        0.21
          4 valkyryn                                0.208872458410351
          5 damn dirty ape                          0.197416974169742
          6 flabdablet                              0.19615832363213
          7 gjc                                     0.185121107266436
          8 rokusan                                 0.183632734530938
          9 ikkyu2                                  0.183168316831683
         10 EmpressCallipygos                       0.174840085287846
         11 Kadin2048                               0.172356369691923
         12 Forktine                                0.17223382045929
         13 LobsterMitten                           0.169166666666667
         14 jamaro                                  0.16711590296496
         15 burnmp3s                                0.166402535657686
         16 jessamyn                                0.164084911072863
         17 desjardins                              0.161497326203209
         18 FishBike                                0.15819209039548
         19 aquafortis                              0.152380952380952
         20 pseudostrabismus                        0.148867313915858
         21 le morte de bea arthur                  0.143817204301075
         22 iconomy                                 0.142857142857143
         23 grouse                                  0.141482739105829
         24 Class Goat                              0.141025641025641
         25 PhoBWanKenobi                           0.135064935064935
         26 Mike1024                                0.133177570093458
         27 DarlingBri                              0.130979498861048
         28 Miko                                    0.128546612623046
         29 Cool Papa Bell                          0.124267291910903
         30 VikingSword                             0.121212121212121
         31 Fuzzy Skinner                           0.11969111969112
         32 Inspector.Gadget                        0.118374558303887
         33 rhizome                                 0.117504051863857
         34 peanut_mcgillicuty                      0.116788321167883
         35 hiteleven                               0.111111111111111
         36 rtha                                    0.11100569259962
         37 jon1270                                 0.10930576070901
         38 chrisamiller                            0.108974358974359
         39 GuyZero                                 0.107094133697135
         40 charmcityblues                          0.105839416058394
         41 mdonley                                 0.105082417582418
         42 Blazecock Pileon                        0.103345724907063
         43 x46                                     0.102564102564103
         44 Houstonian                              0.101101101101101
         45 phunniemee                              0.1
         46 thebazilist                             0.0995850622406639
         47 -harlequin-                             0.0979064039408867
         48 salvia                                  0.0961986035686579
         49 Rhomboid                                0.0958811613774477
         50 box                                     0.0932642487046632


    Notes:
    * score = Best answers / Days of activity
    * limited to users who have been around for at least 90 days
    * It'd be interesting to break this down by tag or category.
    * Is there a "date account disabled" somewhere in the dump? I know, for example, that ikkyu2 closed his account a while back, which means his score is skewed - it should be higher.
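
    For reference, the score boils down to something like this Python sketch. The user names and counts are invented, and "days of activity" is taken here to mean days since the account's first activity, which is an assumption.

```python
def ba_per_day(best_answers, days_active, min_days=90):
    """Best answers per day of activity; None for accounts under min_days."""
    if days_active < min_days:
        return None
    return best_answers / days_active


# Hypothetical (best_answers, days_active) pairs, not real Infodump numbers.
users = {"alice": (21, 100), "bob": (5, 30), "carol": (50, 500)}

scores = {name: ba_per_day(ba, days) for name, (ba, days) in users.items()}
ranked = sorted(
    ((score, name) for name, score in scores.items() if score is not None),
    reverse=True,
)
for rank, (score, name) in enumerate(ranked, start=1):
    print(rank, name, score)
```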
    posted by chrisamiller at 1:49 PM on August 14, 2009 [1 favorite]


    Also, take this with a grain of salt. As has been pointed out before, the people marking Best Answers are often, by definition, the people least qualified to judge whether they're accurate.
    posted by chrisamiller at 1:55 PM on August 14, 2009


    For the supposedly hardest working man in showbusiness, Mr Brown's recent output has been disappointing.
    posted by East Manitoba Regional Junior Kabaddi Champion '94 at 1:55 PM on August 14, 2009


    vapidave: Can someone tell me what happened to anonymous toward the beginning of 2005 as depicted here: www.pessimization.com/random/mefi-stream/2009/08/askme-posts-3000.png

    It looks like some sort of a glitch in the data set, actually. "Anonymous" didn't start until October 2004, which is probably where the pinch point is, so I'm not sure what's going on with the bulge to the left of that. If I have time tomorrow, I might dig a little deeper.

    and also please tell me what the hell the name for that kind of representation of data is?

    "Stream graph" seems to be the common term.
    posted by Galvatron at 2:18 PM on August 14, 2009


    If you don't have an average of at least 3 favorites per comment you are subhuman trash that I literally cannot even see.

    We're about to become really good friends, aren't we?
    posted by Pater Aletheias at 2:29 PM on August 14, 2009


    I've had similar thoughts before—the idea of community website as fantasy sports league is another way of framing it.

    The social-networking-as-competition aspect is really interesting for both its intended and unintended effects. Foursquare is another great example.
    posted by Combustible Edison Lighthouse at 2:30 PM on August 14, 2009


    Ctrl-F The Whe

    goddamn it
    posted by The Whelk at 2:33 PM on August 14, 2009


    Is there a "date account disabled" somewhere in the dump? I know, for example, that ikkyu2 closed his account a while back, which means his score is skewed - it should be higher.

    There isn't, no. You could fudge it by noting date of last comment (or go fancier and do last measured activity, across posts and comments and favorites).

    Other approaches for deskewing the BA/day stats:

    1. Divide by total askme answers. A high raw BA/day value isn't as impressive from someone who makes 10 answers a day as it is from someone who only makes 1 answer a day, for example, so you need some way to account for the shotgunning factor of higher-volume answerers.

    You could call this the normalized BA/day rate. Instead of expressing raw BA productivity, it expresses BA productivity as a function of total output.

    2. Count only days on which the user actually provides one or more answers. If someone answers one question every day and gets one BA for every seven answers, and someone else answers only one question a week but gets a BA every time, they have the same raw BA/day score despite the second person having a much, much higher hit rate.

    You could call this the Good Hair Day metric—folks who do well at it are by implication more likely to answer only when they're able to hit the nail on the head.
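
    Here is the two-answerer example worked through in Python over an assumed 28-day window. The function names are made up, and reading "divide by total askme answers" as best answers over total answers is one interpretation, not the only possible one.

```python
def raw_ba_per_day(best_answers, days):
    """Best answers divided by all days in the window."""
    return best_answers / days


def normalized_ba_rate(best_answers, total_answers):
    """Best answers as a share of everything the user answered."""
    return best_answers / total_answers


def good_hair_day(best_answers, days_with_answers):
    """Best answers per day on which the user answered at all."""
    return best_answers / days_with_answers


DAYS = 28
# Daily answerer: one answer every day, one best answer per seven answers.
daily = {"answers": 28, "answer_days": 28, "best": 4}
# Weekly answerer: one answer a week, best answer every time.
weekly = {"answers": 4, "answer_days": 4, "best": 4}

for label, u in (("daily", daily), ("weekly", weekly)):
    print(
        label,
        round(raw_ba_per_day(u["best"], DAYS), 3),
        round(normalized_ba_rate(u["best"], u["answers"]), 3),
        round(good_hair_day(u["best"], u["answer_days"]), 3),
    )
```

    The raw score comes out identical for both, while the other two measures separate them.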
    posted by cortex (staff) at 3:14 PM on August 14, 2009


    For people who wanted the SQL scripts to create an Infodump database and load the text files into it, I put together a super-cheap Google Sites page with these things and some really basic instructions:
    http://sites.google.com/site/fishbikeonmefi/infodump-stuff/sql-scripts
    I'd love to hear if this works, or doesn't work, for people.
    posted by FishBike at 3:19 PM on August 14, 2009 [2 favorites]


    I'm #36? Seriously? Not that I don't know stuff, because I do, but to appear in the top 50 of best answers/days of activity means either I spend too much time on askme or....There was an article sometime in the last couple of months, maybe in the NYT, about how people believed people who answered their questions in the most authoritative way, even if the answer was wrong, and even if the person answering was known to be wrong a lot.

    Of course, I am always correct in the answers I give in askme.

    Now I have to go look for that article.
    posted by rtha at 3:30 PM on August 14, 2009


    Some more top-20 lists, along similar lines to yesterday's. These are the top 20 by "favorites per contribution", combining all posts, comments, etc., together.

    First, the top 20 for users with 100 or more contributions:
    8.74:Christ, what an asshole [2361 favorites on 270 contributions]
    5.90:Upton O'Good [1150 favorites on 195 contributions]
    5.18:limon [1731 favorites on 334 contributions]
    5.11:giggleknickers [670 favorites on 131 contributions]
    4.67:cthuljew [537 favorites on 115 contributions]
    4.60:Anonymous [30601 favorites on 6654 contributions]
    4.27:Balonious Assault [778 favorites on 182 contributions]
    4.19:Parasite Unseen [1783 favorites on 426 contributions]
    4.18:mintchip [711 favorites on 170 contributions]
    4.14:Kiablokirk [579 favorites on 140 contributions]
    4.11:Rinku [650 favorites on 158 contributions]
    4.05:dyoneo [490 favorites on 121 contributions]
    3.99:asavage [878 favorites on 220 contributions]
    3.90:katala [448 favorites on 115 contributions]
    3.82:btkuhn [477 favorites on 125 contributions]
    3.70:East Manitoba Regional Junior Kabaddi Champion '94 [6115 favorites on 1654 contributions]
    3.70:chrisalbon [595 favorites on 161 contributions]
    3.61:mhoye [451 favorites on 125 contributions]
    3.59:AceRock [901 favorites on 251 contributions]
    3.51:Damn That Television [1617 favorites on 461 contributions]
    And again but for users with 1000 or more contributions:
    4.60:Anonymous [30601 favorites on 6654 contributions]
    3.70:East Manitoba Regional Junior Kabaddi Champion '94 [6115 favorites on 1654 contributions]
    3.30:blahblahblah [4002 favorites on 1214 contributions]
    3.16:Pater Aletheias [3900 favorites on 1234 contributions]
    3.06:Avenger [5320 favorites on 1738 contributions]
    3.04:Dee Xtrovert [3540 favorites on 1163 contributions]
    3.01:netbros [7239 favorites on 2407 contributions]
    2.84:Greg Nog [6500 favorites on 2287 contributions]
    2.83:Pastabagel [12164 favorites on 4301 contributions]
    2.60:adipocere [6645 favorites on 2553 contributions]
    2.59:Rhaomi [3899 favorites on 1503 contributions]
    2.47:Mutant [3926 favorites on 1590 contributions]
    2.46:jbickers [3472 favorites on 1410 contributions]
    2.39:NDó [7032 favorites on 2944 contributions]
    2.32:felix betachat [3964 favorites on 1710 contributions]
    2.22:Astro Zombie [21837 favorites on 9838 contributions]
    2.04:Joe Beese [5738 favorites on 2816 contributions]
    2.02:The Straightener [4823 favorites on 2392 contributions]
    2.00:Horace Rumpole [2166 favorites on 1085 contributions]
    1.98:allkindsoftime [3708 favorites on 1875 contributions]
    (scientific proof that Anonymous rules!)
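
    The calculation behind these lists is simple enough to sketch in Python. The three rows below are copied from the lists above; the function name and tuple layout are just for illustration, and the actual query was presumably SQL against the loaded Infodump tables.

```python
def favorites_per_contribution(rows, min_contributions):
    """Rank users by favorites received per contribution.

    rows: (name, favorites, contributions) tuples.
    Users below min_contributions are dropped before ranking.
    """
    scored = [
        (round(favs / contribs, 2), name)
        for name, favs, contribs in rows
        if contribs >= min_contributions
    ]
    return sorted(scored, reverse=True)


rows = [
    ("Anonymous", 30601, 6654),
    ("limon", 1731, 334),
    ("Astro Zombie", 21837, 9838),
]

for threshold in (100, 1000):
    print(threshold, favorites_per_contribution(rows, threshold))
```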
    posted by FishBike at 3:44 PM on August 14, 2009


    Random analysis idea, really kind of a restatement of this thought from the old Mutual Appreciate thread but without the mutuality focus:

    Analyze any given user's favorites in terms of how many other favorites were given to the things they favorited. Call it an Esoteric Index (or a Hipster Index if you're nasty): a lower average number of total favorites per favorited item implies somewhat less mainstream tastes in favorited content.

    Another idea: an Early Worm index, that expresses a user's average placement in the list of favoriters on those things that they favorite. A lower average placement means they tend to get to things sooner.

    That idea could be extended just as well to commenting, actually—for each thread that a given user commented in, how long (in either time, which is probably easier to calculate, or in terms of number of comments into the thread) is it before they make their first comment?
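
    A minimal Python sketch of the Early Worm index, assuming favorites are available as (item, user, timestamp) records. The names and sample data are hypothetical, and integers stand in for real timestamps.

```python
from collections import defaultdict


def early_worm_index(favorites):
    """Average position of each user in the favoriter list of items they favorited.

    favorites: iterable of (item_id, user, timestamp) tuples.
    Position 1 means the user was the first to favorite that item.
    """
    by_item = defaultdict(list)
    for item_id, user, ts in favorites:
        by_item[item_id].append((ts, user))

    placements = defaultdict(list)
    for favs in by_item.values():
        # Sort by timestamp; ties fall back to user name order.
        for position, (_, user) in enumerate(sorted(favs), start=1):
            placements[user].append(position)

    return {user: sum(p) / len(p) for user, p in placements.items()}


favorites = [
    ("c1", "ann", 1), ("c1", "bob", 2), ("c1", "cat", 3),
    ("c2", "bob", 1), ("c2", "ann", 5),
]
print(early_worm_index(favorites))
```

    Raw position penalizes anyone who favorites very popular items late, so dividing each position by the length of that item's favoriter list might be a fairer variant.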
    posted by cortex (staff) at 3:49 PM on August 14, 2009


    A "talked about" index - top 20 most frequent mentions of a username within posts / comments on the site.
    posted by Meatbomb at 4:18 PM on August 14, 2009


    Analyze any given user's favorites in terms of how many other favorites were given to the things they favorited. Call it an Esoteric Index (or a Hipster Index if you're nasty): a lower average number of total favorites per favorited item implies somewhat less mainstream tastes in favorited content.

    I like it. So the 20 users with the numerically highest Esoteric Index are:
    95.03: robot [10548/111]
    94.12: finallymarki [11859/126]
    91.51: jprind [10158/111]
    89.34: badrolemodel [15098/169]
    88.35: ilona [8835/100]
    88.71: Shutter [17121/193]
    88.39: curbstop [21655/245]
    87.73: adrianhon [9650/110]
    86.39: Earl the Polliwog [13477/156]
    86.42: m0nm0n [34137/395]
    84.61: aldurtregi [29699/351]
    83.62: Kupo? [8947/107]
    80.17: kookaburra [8338/104]
    80.48: jontyjago [8611/107]
    78.50: Christ, what an asshole [8870/113]
    76.03: aqhong [7679/101]
    74.23: imabanana [12470/168]
    73.69: socratic [9358/127]
    72.39: Parasite Unseen [7673/106]
    72.69: The GoBotSodomizer [7996/110]
    This considers only users who have favorited at least 100 things. The number on the left is the average number of favorites by all users given to items favorited by the named user. The numbers in square brackets just show how this is calculated (favorites by all users on items user x favorited/count of items user x favorited).

    Similarly, the 20 users with the lowest Esoteric Index (which seems to be the point of calling it this) are:
    1.24: Tennyson D'San [1875/1507]
    1.56: Horken Bazooka [205/131]
    1.23: watercarrier [209/170]
    2.45: frosty_hut [939/383]
    2.44: picaro [419/172]
    2.00: davey_darling [16470/8215]
    2.24: nofundy [2068/922]
    3.58: TeachTheDead [372/104]
    3.99: nameless.k [423/106]
    3.63: thilmony [635/175]
    3.39: chicainthecity [24505/7237]
    3.42: CorporateHippy [1162/340]
    3.04: Capri [913/300]
    3.88: jbenben [1518/391]
    3.75: gregb1007 [2293/612]
    3.98: zaphod [2026/509]
    3.06: x46 [738/241]
    3.01: mijuta [623/207]
    3.80: kuppajava [1716/452]
    3.03: Gungho [666/220]
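
    For anyone who wants to reproduce this outside SQL, here is a rough Python equivalent. It assumes one (user, item) pair per favorite with no duplicates, and it leaves the user's own favorite out of each item's count; the function name and sample data are made up.

```python
from collections import Counter, defaultdict


def esoteric_index(favorites, min_favorited=100):
    """Average number of other people's favorites on the items a user favorited.

    favorites: iterable of (user, item_id) pairs, one per favorite.
    Only users who favorited at least min_favorited items are returned.
    """
    favorites = list(favorites)
    per_item = Counter(item for _, item in favorites)

    totals = defaultdict(int)
    counts = Counter()
    for user, item in favorites:
        totals[user] += per_item[item] - 1  # exclude the user's own favorite
        counts[user] += 1

    return {
        user: totals[user] / counts[user]
        for user in counts
        if counts[user] >= min_favorited
    }


sample = [("ann", "p1"), ("bob", "p1"), ("cat", "p1"), ("ann", "p2")]
print(esoteric_index(sample, min_favorited=1))
```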
    posted by FishBike at 4:23 PM on August 14, 2009 [1 favorite]


    Wow, something is really screwed up with the sort order there, isn't it?
    posted by FishBike at 4:25 PM on August 14, 2009


    Let me try the above again:

    20 highest by Esoteric Index:
    95.03: robot [10548/111]
    94.12: finallymarki [11859/126]
    91.51: jprind [10158/111]
    89.34: badrolemodel [15098/169]
    88.71: Shutter [17121/193]
    88.39: curbstop [21655/245]
    88.35: ilona [8835/100]
    87.73: adrianhon [9650/110]
    86.42: m0nm0n [34137/395]
    86.39: Earl the Polliwog [13477/156]
    84.61: aldurtregi [29699/351]
    83.62: Kupo? [8947/107]
    80.48: jontyjago [8611/107]
    80.17: kookaburra [8338/104]
    78.50: Christ, what an asshole [8870/113]
    76.03: aqhong [7679/101]
    74.23: imabanana [12470/168]
    73.69: socratic [9358/127]
    72.78: specialagentwebb [11863/163]
    72.69: The GoBotSodomizer [7996/110]
    20 lowest by Esoteric Index:
    1.23: watercarrier [209/170]
    1.24: Tennyson D'San [1875/1507]
    1.56: Horken Bazooka [205/131]
    2.00: davey_darling [16470/8215]
    2.24: nofundy [2068/922]
    2.44: picaro [419/172]
    2.45: frosty_hut [939/383]
    3.01: mijuta [623/207]
    3.03: Gungho [666/220]
    3.04: Capri [913/300]
    3.06: x46 [738/241]
    3.39: chicainthecity [24505/7237]
    3.42: CorporateHippy [1162/340]
    3.58: TeachTheDead [372/104]
    3.63: thilmony [635/175]
    3.75: gregb1007 [2293/612]
    3.80: kuppajava [1716/452]
    3.88: jbenben [1518/391]
    3.98: zaphod [2026/509]
    3.99: nameless.k [423/106]
    posted by FishBike at 4:27 PM on August 14, 2009


    I would also be fascinated to see some sort of geekery with these numbers that makes me seem more important.
    posted by desuetude at 4:42 PM on August 14, 2009 [1 favorite]


    2.00: davey_darling [16470/8215]

    The intersection of this with the known fact that thousands upon thousands of ThePinkSuperhero's favorites are from davey_darling creates some kind of thing that I don't know what it is but which is definitely a thing of some kind.
    posted by cortex (staff) at 4:51 PM on August 14, 2009 [1 favorite]


    Also, it's interesting that tehloki's massively-outlier favoriting volume manages not to land him on either of these lists. My gut assumption is that his Esoteric Index is lower than median, but I suppose there's no reason it would have to be significantly lower.
    posted by cortex (staff) at 4:52 PM on August 14, 2009


    Tehloki's love cannot be contained in your mere "lists".
    posted by The Whelk at 4:58 PM on August 14, 2009


    Tehloki's love cannot be contained in your mere "lists".

    Yes it can. His score is 7.55, so he's ranked #104 on the lowest Esoteric Index scale. Significantly lower than the median score of 24.10, just as cortex predicted.
    posted by FishBike at 5:01 PM on August 14, 2009


    like, whatever, you can prove anything with facts, man.
    posted by The Whelk at 5:04 PM on August 14, 2009 [4 favorites]


    The intersection of this with the known fact that thousands upon thousands of ThePinkSuperhero's favorites are from davey_darling creates some kind of thing that I don't know what it is but which is definitely a thing of some kind.

    Well, extending the results from some earlier queries, I see that ThePinkSuperhero averages 1.70 favorites per contribution. If we assume that pretty much 1.0 of those on average are from davey_darling, that means we should see a score of 0.7 for davey_darling if we calculate the Esoteric Index for only contributions made by ThePinkSuperhero. Because I'm calculating the EI based on average favorites received by things you favorited, NOT including the 1 favorite you gave it yourself.

    And when I run that query, I get:
    0.70: davey_darling [5280/7524]
    Look at that, it works! So favoriting all those posts of ThePinkSuperhero's must be dragging down davey_darling's EI, even though a 1.70 favorites/contribution average is pretty good. It's the non-selective nature of the favoriting that reduces the EI.

    So what would davey_darling's EI be without all those favorited contributions of ThePinkSuperhero? Here's what it looks like with those excluded:
    16.19: davey_darling [11190/691]
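
    The split can be sanity-checked from the bracketed counts alone, since the two subsets have to add back up to the overall davey_darling figures. A quick Python check, with variable names invented for the purpose:

```python
# Counts taken from the bracketed figures in this thread.
all_favs, all_items = 16470, 8215   # every item davey_darling favorited
tps_favs, tps_items = 5280, 7524    # only ThePinkSuperhero's contributions

# Whatever is left over must be the non-ThePinkSuperhero items.
rest_favs = all_favs - tps_favs
rest_items = all_items - tps_items

print(f"overall: {all_favs / all_items:.2f} [{all_favs}/{all_items}]")
print(f"TPS only: {tps_favs / tps_items:.2f} [{tps_favs}/{tps_items}]")
print(f"excluding TPS: {rest_favs / rest_items:.2f} [{rest_favs}/{rest_items}]")
```

    The overall score is a weighted average of the two subsets, which is why a large pile of low-scoring items pulls it so far down.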
    posted by FishBike at 5:17 PM on August 14, 2009


    vapidave: Upon closer inspection, the AskMe graph looks reasonable. In March of 2005 there was only one anonymous question--way below the average. I'm guessing that Matt and Jessamyn allowed the approval queue to get a little backed up. Maybe the JRuns were being particularly vexing that month...
    posted by Galvatron at 5:41 PM on August 14, 2009


    Genius Index 2.0 - now including posts and comments:
    Anonymous: 50
    Astro Zombie: 45
    Pastabagel: 42
    orthogonality: 39
    Kattullus: 39
    Miko: 38
    cortex: 37
    netbros: 36
    jessamyn: 35
    DU: 34
    blahblahblah: 34
    Greg Nog: 34
    nickyskye: 33
    robocop is bleeding: 33
    NDó: 33
    East Manitoba Regional Junior Kabaddi Champion '94: 32
    Optimus Chyme: 32
    loquacious: 31
    Blazecock Pileon: 31
    adipocere: 31
    Also since both cortex and jessamyn appear on this list, my suckup index for this comment is 2.0
    posted by FishBike at 5:42 PM on August 14, 2009 [2 favorites]


    That idea could be extended just as well to commenting, actually—for each thread that a given user commented in, how long (in either time, which is probably easier to calculate, or in terms of number of comments into the thread) is it before they make their first comment?

    Time is indeed easier to calculate. Here are the top 20 users ranked by shortest average delay between posts being made and their first comment in those posts. Restricted to users with at least 100 comments to their name.
    0.63 hours: rancidchickn [121 comments]
    1.10 hours: AaRdVarK [251 comments]
    1.11 hours: saeculorum [150 comments]
    1.17 hours: Inspector.Gadget [1262 comments]
    1.24 hours: frieze [127 comments]
    1.28 hours: Oktober [254 comments]
    1.37 hours: BobbyVan [121 comments]
    1.41 hours: mrt [141 comments]
    1.42 hours: billysumday [1528 comments]
    1.43 hours: phil [256 comments]
    1.43 hours: Poolio [719 comments]
    1.43 hours: foodgeek [146 comments]
    1.53 hours: Chocolate Pickle [985 comments]
    1.55 hours: @troy [299 comments]
    1.58 hours: Sassyfras [570 comments]
    1.58 hours: mhoye [110 comments]
    1.59 hours: Class Goat [1283 comments]
    1.60 hours: torquemaniac [329 comments]
    1.65 hours: unexpected [281 comments]
    1.68 hours: kellyblah [211 comments]
    And the slowest on the draw appear to be:
    488.91 hours: browolf [106 comments]
    373.23 hours: phalkin [129 comments]
    365.40 hours: kaspen [128 comments]
    308.46 hours: mlang [191 comments]
    245.00 hours: FlyingMonkey [110 comments]
    237.56 hours: nicolin [484 comments]
    229.43 hours: Neale [488 comments]
    228.91 hours: Elysum [197 comments]
    227.32 hours: pedmands [104 comments]
    218.12 hours: Jaybo [122 comments]
    210.19 hours: spinturtle [154 comments]
    208.64 hours: kathryn [230 comments]
    208.54 hours: LeisureGuy [149 comments]
    207.53 hours: abc123xyzinfinity [247 comments]
    194.95 hours: BrnP84 [383 comments]
    190.85 hours: MUD [241 comments]
    182.42 hours: Saellys [102 comments]
    176.87 hours: mhh5 [126 comments]
    171.16 hours: jheiz [100 comments]
    169.64 hours: bloggboy [128 comments]
    These numbers amaze me so much that I worry there must be something wrong with the data or my method of looking at it. But picking the top of the latter list, browolf, I do seem to see a lot of comments made weeks and months after posts were made.
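
    In case anyone wants to check the method, this is roughly the calculation as a Python sketch. The data layout and names are assumptions, and the sample timestamps are made up.

```python
from collections import defaultdict
from datetime import datetime


def average_first_comment_delay(posts, comments, min_comments=100):
    """Mean hours between a post going up and each user's first comment in it.

    posts: {post_id: posted_at}
    comments: iterable of (user, post_id, commented_at)
    Only users with at least min_comments total comments are returned.
    """
    first = {}
    n_comments = defaultdict(int)
    for user, post_id, ts in comments:
        n_comments[user] += 1
        key = (user, post_id)
        if key not in first or ts < first[key]:
            first[key] = ts  # keep only the earliest comment per thread

    delays = defaultdict(list)
    for (user, post_id), ts in first.items():
        delays[user].append((ts - posts[post_id]).total_seconds() / 3600)

    return {
        user: sum(d) / len(d)
        for user, d in delays.items()
        if n_comments[user] >= min_comments
    }


posts = {"p1": datetime(2009, 8, 14, 12, 0), "p2": datetime(2009, 8, 14, 12, 0)}
comments = [
    ("ann", "p1", datetime(2009, 8, 14, 12, 30)),
    ("ann", "p1", datetime(2009, 8, 14, 13, 0)),
    ("ann", "p2", datetime(2009, 8, 14, 13, 30)),
]
print(average_first_comment_delay(posts, comments, min_comments=1))
```

    A mean is very sensitive to a handful of months-late comments, so a median might be a more robust way to rank the slow end of the list.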
    posted by FishBike at 6:02 PM on August 14, 2009


    Huh, so, looking at this Metatalk thread about askme updates from Feb. 16, 2005, categories were apparently a late addition to AskMe. Which means that someone had to go back and categorize all of the existing questions by hand, I guess?

    Here's a thread a week later asking about back-categorization, in which there's a mention of back-tagging as well. Didn't find a concrete answer on the category thing at a quick glance, though.

    Also, on Encyclopedia Anonymous and the Mysterious Missing March:

    The feature disappeared temporarily as part of the Great Bugfix mentioned above, was still missing on March 21st, and then on April 16th Matt replied to a post requesting its return by saying he'll fix it on Sunday.

    Someone can go diving for the timing on that; I didn't see any obvious "it's back!" post while skimming the April archives on Metatalk, but I can't say for sure it didn't get declared somehow.

    Given that it was apparently down for all of March (along with the last half of Feb. and the first half of April), the sole March anonymous question is almost certainly one that was not originally anonymous at all but was requested to be made so by the asker in retrospect. Makes more sense than Matt approving one stray question in the middle of a drought.
    posted by cortex (staff) at 6:13 PM on August 14, 2009


    Notably, browolf's comments are basically all askme. I'd guess that Late Worms probably all have primarily askme activity; the likelihood of coming to a late askme and commenting seems greater to me than for mefi, setting aside even the powerful late-making outliers that would be months-late comments which aren't even possible outside of ask (and Music).

    Seeing that broken down by subsite might be neat.
    posted by cortex (staff) at 6:19 PM on August 14, 2009


    Seeing that broken down by subsite might be neat.

    Indeed. Top 10's by subsite, then, to keep this comment to a slightly less insane length.

    Ask
    493.55 hours: browolf [105 comments]
    259.62 hours: Eideteker [116 comments]
    256.25 hours: FlyingMonkey [104 comments]
    249.90 hours: nickyskye [131 comments]
    236.63 hours: LeisureGuy [131 comments]
    230.06 hours: Elysum [196 comments]
    178.79 hours: ostranenie [261 comments]
    175.58 hours: eritain [593 comments]
    163.97 hours: prodevel [104 comments]
    162.48 hours: mjao [118 comments]
    MeFi
    351.12 hours: mlang [104 comments]
    279.26 hours: j.edwards [190 comments]
    260.92 hours: Neale [353 comments]
    231.39 hours: MUD [186 comments]
    226.47 hours: phalkin [123 comments]
    203.87 hours: bloggboy [106 comments]
    198.03 hours: Sean Meade [102 comments]
    195.04 hours: werty [108 comments]
    183.20 hours: goneill [305 comments]
    156.58 hours: redfoxtail [129 comments]
    Meta
    230.31 hours: gleemax [135 comments]
    182.10 hours: moss [115 comments]
    172.83 hours: Neale [113 comments]
    115.38 hours: j.edwards [216 comments]
    104.27 hours: feelinglistless [235 comments]
    95.00 hours: gramschmidt [176 comments]
    83.68 hours: ook [248 comments]
    83.48 hours: waxpancake [126 comments]
    82.63 hours: davidmsc [106 comments]
    76.08 hours: iceberg273 [200 comments]
    Music
    791.77 hours: not_on_display [132 comments]
    587.19 hours: The Great Big Mulp [137 comments]
    567.15 hours: nicolin [156 comments]
    455.35 hours: BrnP84 [154 comments]
    432.55 hours: abc123xyzinfinity [117 comments]
    346.65 hours: ageispolis [173 comments]
    289.45 hours: Ynoxas [134 comments]
    250.54 hours: umbú [294 comments]
    232.25 hours: Jofus [120 comments]
    214.21 hours: ORthey [279 comments]
    posted by FishBike at 6:30 PM on August 14, 2009


    I'm surprised homunculus doesn't place on the "late to the party" index, considering how often I see him bumping old threads in my Recent Activity queue with single-link news reports and other miscellaneous updates.
    posted by Rhaomi at 6:41 PM on August 14, 2009


    Top 20 users named homunculus:
    21.02 hours: homunculus [6466 comments]
    ;)

    So, not even close to being on those earlier lists.
    posted by FishBike at 6:46 PM on August 14, 2009


    hom's late linking wouldn't necessarily be well-represented here anyway, since it's the time of the first comment, and not the last, that gets counted. So if he comments on the first day of a post and also strings in some links two weeks later, his Early Worm footprint is still only first-day stuff for that thread.

    And I'm not nearly so surprised by the numbers for Music—I think pretty much all the regulars over there end up going diving through the archives for one reason or another, and Music threads stay open forever, so you could do some pretty hefty average-weighting work by commenting in a two-year-old thread or whatever.

    I'd be curious to see what a distribution of answers vs. time looks like for some of these high achievers.
    posted by cortex (staff) at 6:55 PM on August 14, 2009


    Here's a bit more of a breakdown of the post-to-first-comment delay stats for Homunculus:
    0-1 hours: 1410 comments
    1-2 hours: 951 comments
    2-4 hours: 1092 comments
    4-8 hours: 946 comments
    8-24 hours: 1138 comments
    24-72 hours: 593 comments
    72 hours-1 week: 171 comments
    1 week +: 165 comments
    posted by FishBike at 6:57 PM on August 14, 2009


    For browolf, across all sites:
    0-1 hours: 2 comments
    1-2 hours: 9 comments
    2-4 hours: 3 comments
    4-8 hours: 12 comments
    8-24 hours: 15 comments
    24-72 hours: 19 comments
    72 hours-1 week: 10 comments
    1 week +: 36 comments
    Same for phalkin:
    0-1 hours: 50 comments
    1-2 hours: 23 comments
    2-4 hours: 16 comments
    4-8 hours: 20 comments
    8-24 hours: 12 comments
    24-72 hours: 2 comments
    1 week +: 6 comments
    And for kaspen:
    0-1 hours: 20 comments
    1-2 hours: 15 comments
    2-4 hours: 18 comments
    4-8 hours: 25 comments
    8-24 hours: 37 comments
    24-72 hours: 8 comments
    72 hours-1 week: 2 comments
    1 week +: 3 comments
    cortex, for comparison:
    0-1 hours: 3439 comments
    1-2 hours: 1193 comments
    2-4 hours: 974 comments
    4-8 hours: 853 comments
    8-24 hours: 1252 comments
    24-72 hours: 302 comments
    72 hours-1 week: 73 comments
    1 week +: 54 comments
    (bet that jump in numbers for the 8-24 hour bracket has something to do with the need to sleep!)
    posted by FishBike at 7:04 PM on August 14, 2009
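
    The bracket breakdowns above amount to a histogram with uneven bin edges. A rough sketch, assuming the delays have already been computed in hours; the decision to put a delay of exactly 1 hour in the "1-2 hours" bracket is a guess, since the comments don't say which side the boundaries fall on.

```python
from bisect import bisect_right

# Upper edge (in hours) of every bracket except the open-ended last one.
EDGES = [1, 2, 4, 8, 24, 72, 168]
LABELS = ["0-1 hours", "1-2 hours", "2-4 hours", "4-8 hours",
          "8-24 hours", "24-72 hours", "72 hours-1 week", "1 week +"]

def bucket_delays(delays_in_hours):
    """Count how many delays fall into each bracket, in bracket order."""
    counts = [0] * len(LABELS)
    for hours in delays_in_hours:
        # bisect_right returns how many edges are <= hours,
        # which is exactly the index of the bracket the delay belongs to.
        counts[bisect_right(EDGES, hours)] += 1
    return list(zip(LABELS, counts))
```

    Unlike the phalkin list above, which has no "72 hours-1 week" row, this version always returns every bracket, including empty ones.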


    Oh-Duh addendum to previous comment about anony posting function: here's an April 17, 2005 It's Back Now post from Matt.
    posted by cortex (staff) at 7:08 PM on August 14, 2009


    Ok yeah, got home and sorting for anon only I show these consecutively:
    16358 17564 15/03/2005 6:15 PM
    17716 17564 19/04/2005 6:42 PM
    The exaggerated effect must be because of smoothing and then a ramp-up as people started to realize anon was back up.

    Thanks both
    posted by vapidave at 10:21 PM on August 14, 2009


    FishBike: Genius Index 2.0

    Kattullus: 39


    Something's up. I just went and did a handcount and my number should be 40. Well, it's 41 now but "41" is today's post, so that has changed since the dump. However, "40" is way old.
    posted by Kattullus at 10:43 PM on August 14, 2009


    And Astro Zombie should be 46. Pastabagel's 42 seems right, as he made a really popular comment recently. Orthogonality should be 40. Miko should be 39.

    Keep in mind though that a) I'm drunk & b) I'm not sure I understand the damn thing.
    posted by Kattullus at 10:51 PM on August 14, 2009


    Speaking of music threads staying open forever... can you guys do that on an ad-hoc basis? Like maybe this thread, for instance? Because I am just in love with everyone doing this dumpcrunching right now. I realize it can continue without this thread, or any thread, but it'd be awesome to have a perpetual discussion area for it.
    posted by SpiffyRob at 5:10 AM on August 15, 2009


    Something's up. I just went and did a handcount and my number should be 40. Well, it's 41 now but "41" is today's post, so that has changed since the dump. However, "40" is way old.

    Yup, as soon as I read this I had a theory what was wrong. Confirmed it just now. Stupidly, the "count how many contributions had more favorites than this one" as a way of numbering them ends up numbering from 0 rather than 1. So line #39 on your list is actually the 40th item. Oops.

    So how come it seems right for some people? Well, since the criteria for making the list is favorites>=position on the list, if e.g. pastabagel's 43rd most popular contribution had 42 favorites, it would still be included and shouldn't be. But since that line gets numbered as "42" it accidentally comes up with the right answer anyway. But only for some people. Like the people I checked to see if this was working right.

    I'm going to re-run it now with a +1 added to the count and that should fix it. Generating the ordered list of qualifying contributions for all users is quite a monster query though and it didn't scale all that well with millions of comments. It takes over an hour to run it, so it'll be a while before I post v2.01 of the index.
    posted by FishBike at 6:48 AM on August 15, 2009


    Genius Index 2.01, now with more accuracy. It looks to me like everybody's number was off by exactly one. There was another mistake in my query in that I only included contributions where the number of favorites was greater than the position on the list, not equal or greater. But since the position numbers were understated by 1, these two bugs offset each other in most cases.
    Anonymous: 51
    Astro Zombie: 46
    Pastabagel: 43
    orthogonality: 40
    Kattullus: 40
    Miko: 39
    cortex: 38
    netbros: 37
    jessamyn: 36
    DU: 35
    blahblahblah: 35
    Greg Nog: 35
    nickyskye: 34
    robocop is bleeding: 34
    NDó: 34
    East Manitoba Regional Junior Kabaddi Champion '94: 33
    Optimus Chyme: 33
    loquacious: 32
    Blazecock Pileon: 32
    adipocere: 32
    posted by FishBike at 8:31 AM on August 15, 2009 [1 favorite]
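
    To make the off-by-one concrete, here is a small sketch of the index as FishBike describes it (a contribution qualifies when its favorites are at least its position on the user's list, counted from 1), next to a version that reproduces the two bugs mentioned for 2.0. The real thing was a database query; these Python functions and their names are only an illustration of the logic, and the buggy variant is one reading of the description.

```python
def genius_index(favorite_counts):
    """Largest n such that n contributions each have at least n favorites."""
    ranked = sorted(favorite_counts, reverse=True)
    index = 0
    for position, favorites in enumerate(ranked, start=1):
        if favorites >= position:
            index = position
        else:
            break
    return index

def genius_index_v2_0(favorite_counts):
    """Same idea with both 2.0 bugs: positions start at 0, and '>' is used."""
    ranked = sorted(favorite_counts, reverse=True)
    index = 0
    for position, favorites in enumerate(ranked):   # bug 1: numbering from 0
        if favorites > position:                    # bug 2: '>' instead of '>='
            index = position
        else:
            break
    return index
```

    Because "favorites > position counted from 0" selects the same rows as "favorites >= position counted from 1", the two bugs cancel in which contributions qualify, and only the reported number comes out one lower.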


    Favorites : my friend favorite me, i show metafilter. why no?
    posted by Damn That Television at 8:59 AM on August 15, 2009 [1 favorite]


    > Hmm. There is a "date created" column in the contact stuff; I could consider adding that to the dump.

    What about type of contact? I'm interested in 'met' contacts specifically.
    posted by mathlete at 9:08 AM on August 15, 2009


    Including contact information at all was a borderline decision. We don't take it terribly seriously (and the great-enspousening is obviously pretty silly in its own right), but at the same time raw contact-pair metrics feel to me like a little less of a problem than full XFN dumps; I don't want to make anyone feel overly exposed about some aspect of the relationship-acknowledgment stuff they did via the contact form, etc.

    It's sort of a hand-wavy thing, I know, but that somewhat straddles the line of what I'm down with including.
    posted by cortex (staff) at 9:56 AM on August 15, 2009 [1 favorite]


    I've always been interested to see a "most accurate/best AskMe answerer" stat - a ratio of number of best answers to total number of answers.
    posted by davey_darling at 10:05 AM on August 15, 2009


    davey_darling: jedicus posted that earlier in this thread. I think that's what you were looking for anyway. It's a shame there are not user names on there, just user IDs, but they're not too hard to look up.
    posted by FishBike at 10:12 AM on August 15, 2009


    When I read that, I thought that it implied that it was comparing best answers in a thread against total number of answers in a thread. Am I just being tripped up by the wording?
    posted by davey_darling at 10:16 AM on August 15, 2009


    Maybe jedicus can confirm this, but I read that calculation as:

    best answers by user / total answers by user

    ... but taking into account only those threads where a best answer was awarded at all (e.g. the total isn't the total number of answers given by the user across all threads, but only the total number of answers given by the user in threads where a best answer was awarded).
    posted by FishBike at 10:26 AM on August 15, 2009
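
    FishBike's reading of the ratio can be sketched as follows. This is not jedicus's actual calculation, just one way to express "best answers by user / total answers by user, counting only threads where a best answer was awarded". The tuple layout is an assumption.

```python
from collections import defaultdict

def best_answer_ratios(answers):
    """answers: iterable of (thread_id, user, is_best) tuples (assumed shape).

    Returns {user: best / total}, counting only answers given in threads
    where at least one answer was marked best.
    """
    answers = list(answers)
    threads_with_best = {thread for thread, _, is_best in answers if is_best}

    best = defaultdict(int)
    total = defaultdict(int)
    for thread, user, is_best in answers:
        if thread not in threads_with_best:
            continue                  # no best answer awarded in this thread
        total[user] += 1
        if is_best:
            best[user] += 1
    return {user: best[user] / total[user] for user in total}
```

    The alternative davey_darling asks about (best answers in a thread against total answers in that thread) would group by thread_id instead of by user.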


    Hey davey_darling, while we have you, can you explain why you favourite all of TPS's shit?
    posted by gman at 10:33 AM on August 15, 2009 [1 favorite]


    Ooh, fun with contacts data. Remember that game, Six Degrees of Kevin Bacon? Well, I thought, how about Six Degrees of FishBike as defined by MeFi contact linkages? So I am degree 0 (self), all my contacts and "contacted by" users are degree 1, their contacts and contacted by's are degree 2, and so on.

    8,289 users are within 6 degrees of FishBike. Way, way too many to post here. Even the 2nd degree of FishBike contains 626 users, so is way too big to post. And the 1st degree is trivial to figure out from my profile page.

    Interestingly, the 7th degree of FishBike has only one additional user in it, MShades, and (one of) the paths between me and that user is:
    FishBike<-amyms->[NOT HERMITOSIS-IST]->M.C. Lo-Carb!<-signalnine->pts->wintersweet->MShades
    ... where -> means contact of, and <- means contacted by. I didn't try to figure out mutual contacts to have them show up as <->, so "contact of" links take precedence.

    Degrees 8+ of FishBike are empty -- 8290 users in 7 degrees is it for me. There are only 8926 users with non-empty contact networks, so most of MeFi is linked to me somehow already.

    This is neat stuff, but I'm not sure what tables I can post. Instead, I guess if anybody wants to know stuff like the size of their contact network, the most distant contact (like MShades is mine), or the path between yourself and any arbitrary user, let me know and I'll run some queries for you later this weekend.
    posted by FishBike at 10:54 AM on August 15, 2009
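
    The "Six Degrees" numbering above is a breadth-first search outward from one user. A minimal sketch, assuming the contact data is available as (user, contact) pairs and, as in FishBike's description, following links in both directions:

```python
from collections import defaultdict, deque

def degrees_of_separation(contact_pairs, start):
    """contact_pairs: iterable of (user, contact). Returns {user: degree}."""
    neighbours = defaultdict(set)
    for user, contact in contact_pairs:
        neighbours[user].add(contact)    # "contact of"
        neighbours[contact].add(user)    # "contacted by"

    degree = {start: 0}                  # the starting user is degree 0
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other not in degree:      # first visit is the shortest route
                degree[other] = degree[current] + 1
                queue.append(other)
    return degree
```

    Counting how many users share each degree value gives the per-degree totals; users who never appear in the result are not connected to the starting user at all.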


    Well, contacts is a directed thing. A may choose B as a contact. But B may not reciprocate. This brings up some potentially interesting tables:

    Most Promiscuous Mefite
    Ordered by number of mutual contacts.

    Celebrity Mefites
    Ordered by number of people who contact them - mutual contacts.
    I'd imagine that that list would include people like steve wozniak?

    Groupie Mefites
    Ordered by number of mefites they contact - mutual contacts.
    posted by vacapinta at 12:41 PM on August 15, 2009
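
    The three tables vacapinta proposes fall out of two sets per user: who they link to and who links to them. A sketch, reading the "-" in the descriptions as subtraction (incoming minus mutual, outgoing minus mutual), which is an interpretation; the (user, contact) pair format is also assumed.

```python
from collections import defaultdict

def contact_counts(contact_pairs):
    """Return {user: (mutual, celebrity, groupie)} from (user, contact) pairs."""
    outgoing = defaultdict(set)   # user -> people they list as contacts
    incoming = defaultdict(set)   # user -> people who list them
    for user, contact in contact_pairs:
        outgoing[user].add(contact)
        incoming[contact].add(user)

    result = {}
    for user in set(outgoing) | set(incoming):
        out_links = outgoing.get(user, set())
        in_links = incoming.get(user, set())
        mutual = out_links & in_links
        result[user] = (
            len(mutual),                   # "promiscuous": reciprocated links
            len(in_links) - len(mutual),   # "celebrity": unreciprocated incoming
            len(out_links) - len(mutual),  # "groupie": unreciprocated outgoing
        )
    return result
```

    Sorting the result by the first, second, or third element of each tuple gives the three rankings.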


    I've also been playing around with the degrees of separation thing. I am representing my network as a directed graph. I've only been messing around for a couple of hours, but here is what I've come up with so far.

    I am the starting point and all paths go through contactees. That is, I have a link to little e who has a link to toomuchpete who has a link to FishBike. Right now, I only look for a single path--no duplicates yet.
    user    degrees path
    87842   3       mathlete -> little e -> toomuchpete -> FishBike
    
    1       3       mathlete -> thatelsagirl -> danb -> mathowie
    8       4       mathlete -> terrapin -> nthdegx -> Aaaugh! -> OneBallJay
    16      3       mathlete -> terrapin -> bluishorange -> jjg
    17      4       mathlete -> terrapin -> jessamyn -> ClaudiaCenter -> honkzilla
    27      3       mathlete -> terrapin -> jessamyn -> peterme
    29      3       mathlete -> terrapin -> jessamyn -> camworld
    40      3       mathlete -> little e -> Kattullus -> gac
    44      3       mathlete -> terrapin -> bluishorange -> fraying
    46      3       mathlete -> terrapin -> yerfatma -> davewiner
    53      4       mathlete -> terrapin -> jessamyn -> heeeraldo -> rider
    58      3       mathlete -> terrapin -> bkudria -> jonmc
    61      4       mathlete -> terrapin -> yerfatma -> joeclark -> grant
    91      4       mathlete -> little e -> Lipstick Thespian -> graventy -> jgilliam
    92      4       mathlete -> terrapin -> Eideteker -> plep -> riley370
    93      5       mathlete -> terrapin -> hugsnkisses -> sambosambo -> mimi -> jsapn
    94      4       mathlete -> terrapin -> bluishorange -> delfuego -> mikewas
    98      4       mathlete -> Navelgazer -> Miko -> limeonaire -> <unknown user>
    100     3       mathlete -> terrapin -> bluishorange -> Jeremy
    
    17807   7       mathlete -> zennie -> grapefruitmoon -> joelf -> flipper -> bachelor#3 -> juiceCake -> magog
    ... over 7000 more omitted ...
    
    degrees total users
    0       1
    1       18
    2       279
    3       2619
    4       3467
    5       846
    6       124
    7       18
    total   7372
    
    There are some black holes where a user has linked to someone whose username is not in the infodump. See above user 98. I've put "<unknown user>" in that case.

    More at 11. Or when I have time later this weekend.
    posted by mathlete at 12:45 PM on August 15, 2009 [1 favorite]
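
    The single-path lookup mathlete describes, following links only from a user to their contacts, can be sketched as a breadth-first search that remembers how each user was reached. This is not mathlete's code; the function name and the (user, contact) input format are made up for the example.

```python
from collections import defaultdict, deque

def shortest_contact_path(contact_pairs, start, target):
    """Return one shortest start -> ... -> target path, or None if unreachable.

    Only outgoing links (user -> contact) are followed, so the graph is directed.
    """
    outgoing = defaultdict(list)
    for user, contact in contact_pairs:
        outgoing[user].append(contact)

    parent = {start: None}           # user -> who we reached them from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            break
        for contact in outgoing.get(current, []):
            if contact not in parent:
                parent[contact] = current
                queue.append(contact)

    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:          # walk back to the start
        path.append(node)
        node = parent[node]
    path.reverse()
    return path
```

    The number of degrees is the length of the returned path minus one. Because the graph is directed, a path from A to B does not imply a path from B to A.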


    Degrees of mathowie would be interesting.

    mathowie only links to 125 users. So, 125 people have a mathowie number of 1. What does the rest of the network look like?
    posted by vacapinta at 12:51 PM on August 15, 2009


    gman: Hey davey_darling, while we have you, can you explain why you favourite all of TPS's shit?

    davey_darling made an offer that ThePinkSuperhero couldn't refuse because she has no shame.
    posted by Pronoiac at 1:34 PM on August 15, 2009 [1 favorite]


    Alright, that makes it less creepy.
    posted by gman at 1:50 PM on August 15, 2009


    Degrees of mathowie would be interesting.

    mathowie only links to 125 users. So, 125 people have a mathowie number of 1. What does the rest of the network look like?
    degrees total users
    0       1
    1       125
    2       2017
    3       3728
    4       1270
    5       203
    6       23
    7       5
    total   7372
    
    I am getting 7372 for pretty much every user that has more than a few contacts.

    I calculate the total number of users who have links into them as 8059 and the total number of users who have links out from them as 5453.
    posted by mathlete at 2:31 PM on August 15, 2009 [2 favorites]


    Thanks, cortex, vacapinta, FishBike, mathlete and everybody. This is fascinating.

    I'm a little curious to know who has made the most first comments, and who has the highest first comment/comment ratio. Hmm. It would also be neat, along the same lines, to know who has commented on the most deleted posts.
    posted by box at 6:46 PM on August 15, 2009


    I am getting 7372 for pretty much every user that has more than a few contacts.

    When considering links in both directions ('contacts' and 'contacted by'), there is a sort of big island that most users are on. As soon as you are linked to anyone in that big network, your network includes everyone else in it. Since it contains the majority of users, it wouldn't be too surprising to see most users have the same size contact network.

    I think there is more potential for this to vary when looking at links in only one direction. Imagine a case where a user has a single link to the contacts network. From their own point of view, they're part of it, but from everyone else's point of view, they're not (or vice-versa). This doesn't happen when looking at both directions of linkages.

    Anyway, since the big island doesn't include all MeFites with non-zero contact lists, I wonder if there is another fairly significant island that most of the rest are on, but which is disconnected from the main one? If there is, it shouldn't be too hard to discover by randomly testing users who aren't on the big island. Finding one member of another large network should reveal the whole thing.

    Something I will take a stab at tomorrow, unless somebody else does it first (which would be cool, too.) And if such a thing is found, there is fairly staggering potential for one person to link up the two islands by adding one more user to their contact list.
    posted by FishBike at 9:03 PM on August 15, 2009
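
    Rather than randomly testing users, the islands can all be found in one pass by grouping users into connected components. A sketch using a union-find structure, assuming (user, contact) pairs and ignoring link direction as FishBike does here:

```python
from collections import defaultdict

def find_islands(contact_pairs):
    """Group users into islands, ignoring link direction; largest island first."""
    parent = {}

    def find(user):
        parent.setdefault(user, user)
        root = user
        while parent[root] != root:        # climb to the island's representative
            root = parent[root]
        while parent[user] != root:        # path compression for later lookups
            parent[user], user = root, parent[user]
        return root

    for a, b in contact_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b        # a link merges the two islands

    islands = defaultdict(set)
    for user in list(parent):
        islands[find(user)].add(user)
    return sorted(islands.values(), key=len, reverse=True)
```

    If a second sizeable island exists, it would show up as the second entry in the result, and adding a single pair that spans the two would merge them on the next run.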


    I think there is more potential for this to vary when looking at links in only one direction.

    I am, in fact, only looking at links in one direction. My network representation is a directed graph. When representing the network as an undirected graph, the total is around 8300.

    My guess is that there is one big network -- the 7372 network -- and numerous much smaller networks to make up the other 700 or so users.
    posted by mathlete at 9:47 PM on August 15, 2009


    > Most Promiscuous Mefite
    Ordered by number of mutual contacts.
    #       num mutual contacts     user
    1       313     jessamyn
    2       223     nickyskye
    3       201     languagehat
    4       185     Brandon Blatcher
    5       164     miss lynnster
    6       159     cortex
    7       156     Miko
    8       153     ThePinkSuperhero
    9       146     Kattullus
    10      145     flapjax at midnite
    11      143     madamjujujive
    12      139     Marisa Stole the Precious Thing
    13      129     not_on_display
    14      126     taz
    15      119     scody
    15      119     Ambrosia Voyeur
    17      115     hadjiboy
    18      112     ColdChef
    19      109     amberglow
    20      103     Meatbomb
    21      101     interrobang
    21      101     melissa may
    23      98      hermitosis
    24      95      rtha
    25      94      DaShiv
    
    posted by mathlete at 10:35 PM on August 15, 2009


    "9th Most Promiscuous MeFite"

    That's so going on my headstone.
    posted by Kattullus at 10:59 PM on August 15, 2009


    > Celebrity Mefites
    Ordered by number of people who contact them - mutual contacts.
    I'd imagine that that list would include people like steve wozniak?
    #       num celebrity contacts  user
    1       267     jessamyn
    1       267     mathowie
    3       251     Anonymous
    4       198     languagehat
    5       195     asavage
    6       183     cortex
    7       159     loquacious
    8       151     grumblebee
    9       148     stavrosthewonderchicken
    10      140     Miko
    10      140     Mutant
    12      138     quonsar
    13      135     Astro Zombie
    14      130     jonmc
    15      125     amberglow
    16      123     ikkyu2
    17      122     orthogonality
    18      118     Pastabagel
    19      111     ThePinkSuperhero
    20      108     robocop is bleeding
    21      107     klangklangston
    22      104     Blazecock Pileon
    23      101     DaShiv
    24      100     ColdChef
    25      91      orange swan
    ...
    41      60      stevewoz
    

    > Groupie Mefites
    Ordered by number of mefites they contact - mutual contacts.
    #       num groupie contacts    user
    1       289     tellurian
    2       242     toomuchpete
    3       203     jessamyn
    4       181     hadjiboy
    5       145     hermitosis
    6       143     Foci for Analysis
    7       127     ThePinkSuperhero
    8       125     Jofus
    9       114     not_on_display
    9       114     sambosambo
    9       114     schyler523
    12      108     ClaudiaCenter
    13      107     mitzyjalapeno
    14      102     cortex
    15      101     taz
    15      101     amberglow
    17      100     homunculus
    18      97      goodnewsfortheinsane
    18      97      limeonaire
    20      96      LobsterMitten
    21      94      languagehat
    21      94      marsha56
    23      93      ColdChef
    24      91      KokuRyu
    24      91      Taksi Putra
    
    posted by mathlete at 10:59 PM on August 15, 2009


    FishBike: "Anyway, since the big island doesn't include all MeFites with non-zero contact lists, I wonder if there is another fairly significant island that most of the rest are on, but which is disconnected from the main one? If there is, it shouldn't be too hard to discover by randomly testing users who aren't on the big island. Finding one member of another large network should reveal the whole thing."

    ATTN: HIGH COMMAND
    SUBJ: INTERLOPER FOXTROT BRAVO

    HE KNOWS

    PREPARE OBFUSCATORY COUNTERMEASURES

    ---
    AGENT 53126
    CABAL ISLAND - LISTENING POST OMEGA

    posted by Rhaomi at 11:15 PM on August 15, 2009 [4 favorites]


    I'm still buzzed from the meet up. It's amazing enough I wrote code for the above. More tomorrow.
    posted by mathlete at 11:16 PM on August 15, 2009


    Is it possible for an individual user to request his/her stats not appear, at all, in the InfoDump?
    posted by paulsc at 7:32 AM on August 16, 2009


    It's not something that's accounted for in current design of the dump process. Is there something specific you're concerned about?
    posted by cortex (staff) at 8:13 AM on August 16, 2009


    "... Is there something specific you're concerned about?"
    posted by cortex at 11:13 AM on August 16

    I don't participate here to have my contribution stats analyzed, and I don't see that exposing stats connected to my nic for public inspection, nor seeing my nic/user ID coming up in various discussions of rankings in Meta, or not, very much contributes to any useful understanding of the site, as a whole. It's enough that people can pound public search engines with my nic, if they so desire; at least, doing that, they'll be linked to particular comments/posts, and can make of them, in context, what they like.

    I imagine it would be trivial to add a user ID exceptions list as a first step in the process that runs the dump.
    posted by paulsc at 8:37 AM on August 16, 2009


    My general take is that this is all data which could be trivially scraped by non-members independent of the Infodump by someone with malicious intent, and it's the case that doing something meanspirited with such data whether it came from the Infodump or manual scraping is and always has been something we're very much not okay with, and would mean serious problems for any member caught doing so. We've intentionally excluded some data that, while still easily scrapable, seemed like it was pushing too far into the realm of genuinely personal in one respect or another.

    There's a root problem here of cost/benefit that comes with public participation on a website that filtering the Infodump doesn't really address, and as far as concerns about that go I'd say the ugly downsides of that have more to do with someone indeed pounding search engines for text content than with abstract crunching of text-free skeletal data. That was true before the Infodump existed and would be true if it disappeared entirely—crappy motives are format-independent, and there's no option to opt out of metafilter visibility itself after the fact.

    But I understand that we might just disagree about this stuff. It's not something I'd like to implement, but if it's something you feel very strongly about in practice, I can look into it.

    Excluding lines from the commentdata and postdata files where the respective comment or post is attributed to your userid would be a gimme, as far as that goes.

    Other statistics that involve the interaction of two users are less clearcut—a contacter/contactee relationship belongs just as much to one user as it does to the other, ditto faver/favee relationships. At that point we'd be blowing holes in a bunch of other folks' activity by running an aggressive exclusion filter, which goes beyond just opting one's own data out and moves from a matter of personal inclination into the realm of imposition on others.
    posted by cortex (staff) at 8:59 AM on August 16, 2009


    A middle ground might be that a user ID exception list entry just produced a munge for that nic/user ID in InfoDump. All user info would still contribute to aggregate uses and totals, and there would be whatever ongoing two way activity relations you care to support -- but one side of those relations, and all personally identifiable references to nic/user ID entries in the user ID exception list would just be a munge in the InfoDump. How about that?
    posted by paulsc at 9:16 AM on August 16, 2009


    I don't participate here to have my contribution stats analyzed, and I don't see that exposing stats connected to my nic for public inspection, nor seeing my nic/user ID coming up in various discussions of rankings in Meta, or not, very much contributes to any useful understanding of the site, as a whole.

    I'm glad this has come up, because I think it's a valuable discussion to have. I'm actually not sure how to interpret this comment, though.

    Taken literally, I would conclude that you don't think these statistics serve any useful purpose or benefit. In which case, I mostly agree (subject to potential disagreements about whether or not "fun" counts as useful or beneficial). But that seems like a fairly weak argument against doing this.

    I think that maybe the implication is supposed to be that this type of analysis is actually causing harm of some sort, perhaps by encouraging the wrong sort of behavior on the site, or by revealing something private about specific users that would have been difficult to stumble across any other way. This would be potentially a much stronger argument for at least changing something in this process.

    So, I wonder if you could clarify whether the former or latter interpretation is what you meant (or something else entirely)? And if you think there are harmful results of the information being posted here, could you be more specific about what harm you see being done? Depending on what it is, of course, it may be unfair to ask you to post specifics here.
    posted by FishBike at 9:18 AM on August 16, 2009


    A middle ground might be that a user ID exception list entry just produced a munge for that nic/user ID in InfoDump. All user info would still contribute to aggregate uses and totals, and there would be whatever ongoing two way activity relations you care to support -- but one side of those relations, and all personally identifiable references to nic/user ID entries in the user ID exception list would just be a munge in the InfoDump. How about that?

    That'd be a happier compromise, yeah. Throwing an excepted userid through some sort of hash to produce an unambiguously fake userid, leaving the data in place for analysis but thus dissociated from the actual account's id. I'll think about how to best implement it and make it function well for a potentially dynamic list should anyone else ever voice similar reservations.
    posted by cortex (staff) at 9:32 AM on August 16, 2009
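
    One possible shape for the "some sort of hash" idea, purely as a sketch: nothing here reflects how the dump process is actually built, and the function name, the "munge-" prefix, and the secret key are all invented for the example. A keyed hash (HMAC) is used rather than a plain hash so that the fake id cannot be reversed by simply hashing every known userid.

```python
import hashlib
import hmac

def munge_userid(userid, secret_key, excepted_ids):
    """Return the userid unchanged unless it is on the exception list.

    Excepted ids map to a stable, obviously fake token, so a user's rows
    still line up with each other across files without exposing the real id.
    """
    if userid not in excepted_ids:
        return str(userid)
    digest = hmac.new(secret_key, str(userid).encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return "munge-" + digest[:12]
```

    The same key has to be reused for every file in a given dump, otherwise one user's comment rows and favorite rows would get different fake ids.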


    > Anyway, since the big island doesn't include all MeFites with non-zero contact lists, I wonder if there is another fairly significant island that most of the rest are on, but which is disconnected from the main one? If there is, it shouldn't be too hard to discover by randomly testing users who aren't on the big island. Finding one member of another large network should reveal the whole thing.

    table which includes everyone with contacts outward

    > This brings up some potentially interesting tables:

    full tables of the excerpts I posted earlier

    note: new domain. dns may not have propagated for you yet.
    posted by mathlete at 9:42 AM on August 16, 2009


    "... And if you think there are harmful results of the information being posted here, could you be more specific about what harm you see being done? ..."
    posted by FishBike at 12:18 PM on August 16

    Take chrisamiller's Best Answer/Day ranking, where my nic pops up #2. What does anyone learn, that is at all useful from that, given that there are methodological problems with the data that chrisamiller mentions, that a "Best Answer" is a pretty subjective thing, given entirely by the Asker, who many think might not even be the best person to give that, and that many questions never have a "Best Answer" chosen by an Asker, and others have every answer marked "Best?" I suppose you could infer I comment in AskMe a lot, but that is about all that anyone could responsibly infer from that "metric," if they knew anything about the site. I certainly don't want anybody thinking, mistakenly, that the quality of answers I post is better than other contributors, because it isn't. People disagree with me in AskMe all the time, and I suppose many people familiar with the site discount 90% of what I do post, considering the source. That's fine by me.

    I just think, that if you are producing a corpus that supports such calculations, it would be better for me to appear, if I must, as "munge101" or some such; I gain nothing I seek by appearing as paulsc.
    posted by paulsc at 9:42 AM on August 16, 2009


    Once you hit that 'post' button, you no longer have any reasonable expectation of privacy. As cortex said - it's nothing that anyone else couldn't do with a web-scraping script and a little bit of time.

    I realize that hindsight is 20/20, but one of the great lessons being learned in this age of connectedness is that if you didn't want people to read it, analyze it, and chop it up looking for trends, you shouldn't have posted it on the web.
    posted by chrisamiller at 9:43 AM on August 16, 2009


    I think these stats are not only helpful, but an interesting online adaptation of the way that we build mental maps of social relationships and hierarchies in online communities. These things are present and have conventionalized ways of developing in face-to-face communities such as work environments, social clubs, regulars at gyms, classrooms and bars, etc., but the limitations of online communities make this somewhat different. I would imagine that this curiosity and desire for social structure is part of what drives interest in these types of stats here. That, and we're all nerds and tend to make mini-games out of EVERYTHING. But maybe that's just me.
    posted by iamkimiam at 9:48 AM on August 16, 2009


    Munging the user id and user name in the Infodump only helps a little, I think. There are all kinds of truly trivial ways to find out who they are anyway. For instance, just find the comment data for one comment from an anonymized user. Now you know the thread they commented in and the exact time of their comment. Go look up the thread and find their comment, and now you know who they are.

    I'm saying this not to give people an easy recipe to de-anonymize specific users if this ever gets implemented. My point is just that it's hard for me to think of problems that are big enough to do anything about, yet small enough that such a trivial speed bump would prevent people from looking up the real user name anyway.

    As various large companies have found out the hard way, truly anonymizing this type of data is very hard.
    posted by FishBike at 9:50 AM on August 16, 2009 [1 favorite]


    Oh, I also just wanted to add that I am sensitive (I think) to the concerns about attributing too much meaning, or the wrong meaning, to the statistics and lists we are producing. And about posting things that are fundamentally creepy even when interpreted correctly.

    See my earlier comment about how there are a few users with 1,000+ comments but no favorites received on any of them. There's no way I'm posting their user names because I'm sure someone will feel it's a kind of "you are the worst commenters on MetaFilter" list. When it could be that they're just users who have been inactive since before favoriting was implemented.

    That's also why you won't see a lowest favorites/comment list from me, and hopefully we won't see a lowest "best answer" ratio list either. Maybe the best solution isn't to allow specific users to opt out (given how hard that is to do properly) but to have good discussions about what should and shouldn't be posted?

    Just a thought...
    posted by FishBike at 9:56 AM on August 16, 2009


    table which includes every one with contacts outward

    mathlete, does the existence of columns out to 13 there suggest that there are in fact a few cases of folks with chains significantly higher than 7? I'd look at those folks as potential island-gappers as far as the "small second island" (or, I'm betting, more an archipelago of very small islands) analysis goes. Though I'm not sure from what you were doing earlier whether that's actually possible.

    What does anyone learn, that is at all useful from that, given that there are methodological problems with the data that chrisamiller mentions, that a "Best Answer" is a pretty subjective thing, given entirely by the Asker, who many think might not even be the best person to give that, and that many questions never have a "Best Answer" chosen by an Asker, and others have every answer marked "Best?"

    What anyone learns from that particular bit of calculation is probably up to the person. They may learn something about methodology, or about the connotation vs. denotation of aggregate numbers; they might learn to look at the meaningfulness or validity of their existing assumptions about answer rates or Best Answers or what have you in a different way than they did before; they might be inspired to try and more thoroughly pursue an analysis of some subset of the original analysis and hence advance our collective insight into the data.

    But more generally, it's a wealth of interesting emergent data about a place a lot of folks care about, and, like iamkimiam says, we're nerds who think data is fun. I don't think the BA/day analysis is particularly insightful, but the BA stuff is brand new to this release of the data and so folks who were already excited about the Infodump in previous incarnations are digging in and playing with the obvious stuff just for the heck of it. It doesn't need to serve any great purpose, and I don't think most people playing with the data are doing so with the intent to either help or hinder any particular mefite.

    Framing it in terms of whether it helps you gain something you seek seems kind of off-point, all else aside; very, very little that happens on metafilter is in any way intended to help any single specific mefite gain something they seek—there's tens of thousands of us, and mefi is mostly a place that is more than a place that does. Askme is a nice exception of something explicitly utility-driven, but even then the point is to help people get their questions answered, not to help one specific user get ahead on that front.

    I certainly don't want anybody thinking, mistakenly, that the quality of answers I post is better than other contributors, because it isn't.

    I don't think that things like this are all that realistic a concern, for what it's worth; none of the produce of the Infodump is going to be sanctioned as Official Facts About Who Is Good At Metafilter, certainly, and we're not going out of our way to broadcast these stats or analysis to unclueful masses regardless. If someone locates and digests without thought or reservation some dry stats posted in Metatalk or on the wiki, they probably have analytical problems that transcend any particular numerical misconception about one mefite or another, basically.
    posted by cortex (staff) at 10:01 AM on August 16, 2009


    I don't think the BA/day analysis is particularly insightful

    Nor do I, for what it's worth. I had about 10 minutes to kill while a script ran and threw up the first stat I thought of. I don't see it as a leaderboard, or anything like that, and I would actively protest the creation of any official metric like that. We don't answer questions for the points, we answer them because we have insight to share with the community (or at least, we think we do).
    posted by chrisamiller at 10:19 AM on August 16, 2009


    mathlete, does the existence of columns out to 13 there suggest that there are in fact a few cases of folks with chains significantly higher than 7? I'd look at those folks as potential island-gappers as far as the "small second island" (or, I'm betting, more an archipelago of very small islands) analysis goes. Though I'm not sure from what you were doing earlier whether that's actually possible.

    Yes. The most degrees of separation any one has is 12, though that group is quite small. (The 13 degrees column is all zeros.) However, it does not seem to me that those people are on another island. Rather, it looks like they contact a single person who contacts a single person who then contacts a single person who then contacts someone who has many contacts.

    To me, there appear to be three categories:
    • total >= 7372 -- the big island. those with more than 7372, I believe, have one way bridges to the smaller archipelago of very small islands.
    • 1 < total < 10 -- the smaller archipelago of very small islands. not sure how many.
    • total == 1 -- no contacts outward.
    posted by mathlete at 10:31 AM on August 16, 2009
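
A rough sketch of how per-user totals and degrees of separation like these could be computed, assuming the contact data is a directed graph and that "total" counts every user reachable by following outward contact links, including the starting user (which would explain why total == 1 means no contacts outward). The function name `reach` and the toy data are made up for illustration; this is not mathlete's actual script.

```python
from collections import deque


def reach(contacts, start):
    """Breadth-first search over outward contact links.

    Returns (total, max_degree): how many users are reachable from `start`,
    counting `start` itself, and the largest number of hops needed to reach one.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for other in contacts.get(user, ()):
            if other not in depth:
                depth[other] = depth[user] + 1
                queue.append(other)
    return len(depth), max(depth.values())


# Toy data: a -> b -> c -> d, d and e link to each other, f links to nobody.
contacts = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["d"], "f": []}
```

In this toy graph, user a reaches the d/e pair only through a single chain of one-contact users, which is the shape described above for the long-chain cases. Treating the links as undirected would mean adding each link in both directions before running the same search.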


    "As cortex said - it's nothing that anyone else couldn't do with a web-scraping script and a little bit of time. "
    posted by chrisamiller at 12:43 PM on August 16

    I haven't spent much time looking into the InfoDump, to see for myself what is included, so perhaps I should. But lacking detailed familiarity, it seems to me that what's different about the InfoDump is that you're publishing transactional detail, and handing it out to passersby, that they'd have more than a little difficulty compiling themselves, just by screen scraping or index harvesting. You're making it far more convenient for people to suss out relationships that I think might be tough to pound out of Google's general index, as for example, your Best Answer/Day metric.

    Accordingly, I think a mechanism like a user ID exception list, or at the very least, some kind of effective munge scheme, is a pretty reasonable accommodation to provide, for those not interested in going down such paths, as easily as lambs led to water. I say this with fishbike's comment about the difficulty of anonymizing such data fully in mind.

    "... I don't think most people playing with the data are doing so with the intent to either help or hinder any particular mefite. ..."
    posted by cortex at 1:01 PM on August 16

    I'm not saying that there are nefarious forces at work on dastardly schemes, but seeing as how any passerby that wants it can download the InfoDump files, how do you really know who is playing with it, or what their intent might be? And so why would you just toss in new fields, like Best Answer, with really vague meaning and inconsistent usage on the site? It's that kind of "GEE WHIZ, GUYS, LOOK WHAT WE GOT" attitude concerning future revisions/data additions to InfoDump that concerns me. Presumably, the data you're posting in InfoDump should go to some beneficial purpose that offsets the possibly low risk of it being misused, to the extent that it can be.

    If it's just for fun, please take me out.
    posted by paulsc at 10:36 AM on August 16, 2009


    Presumably, the data you're posting in InfoDump should go to some beneficial purpose that offsets the possibly low risk of it being misused, to the extent that it can be.

    Well, but also, sometimes data is published and that data is then used to some interesting purpose by somebody else. Metafilter is unique in the online social networking sphere and may provide some insights about how a community grows and develops.

    To give a concrete example, the flickr api allowed users such as GustavoG to capture early portraits of how flickr really took off, fueled by a network of bloggers and by... people from arab countries! Here's one of his beautiful graphs.

    So, I'd argue that this data is not for fun, but for knowledge.
    posted by vacapinta at 10:47 AM on August 16, 2009


    And so why would you just toss in new fields, like Best Answer, with really vague meaning and inconsistent usage on the site?

    Well, folks have been asking for a direct implementation of a BA count on profile pages for a long time now. We've never done that, not because we thought the data was supposed to be secret but because Matt never crossed the threshold of actively deciding he wanted to make that change.

    Including the info in the Infodump lets curious souls play around with the data, which is a middle-path approach to the whole thing and will hopefully let those who would otherwise be inclined to go scraping 100K+ askme threads get their nerd on without sucking up a bunch of our bandwidth in the process.

    It's that kind of "GEE WHIZ, GUYS, LOOK WHAT WE GOT" attitude concerning future revisions/data additions to InfoDump that concerns me.

    To be clear, this is the first major revision to what was included in the dump since it launched a year and a half ago, and accounts for basically every addition request that's been greenlit (and excludes various things requested but refused) in that interim. BA data was one of the frequent requests from the data nerds here; metatalk tags were another, post titles were another yet. There's nothing else on the table at this point that's come up and hasn't been either included with this revision or plainly rejected.

    If we make further additions, it'll probably be of data new to the site, not more of what's already here, and like in previous passes we'll probably talk about it a lot to figure out what works and what doesn't. Nothing's getting added willy-nilly.

    Presumably, the data you're posting in InfoDump should go to some beneficial purpose that offsets the possibly low risk of it being misused, to the extent that it can be.

    It's hard to quantify "some beneficial purpose" here; it's very beneficial to those of us who value being able to look at the quantitative skeleton of this community in new ways, while it sounds like it has no perceived benefit to you personally, and I'm not sure there's a clearcut way to (oh thematic irony) establish something like a median beneficialness as far as that goes. But it's certainly A Good Thing for at least a subset of folks here, and the things that would make it a potentially bad thing (malicious intent, essentially) are in no way dependent on the Infodump to manifest themselves.

    That said, I do hear you, and will actively look into munging implementation.
    posted by cortex (staff) at 10:56 AM on August 16, 2009


    table which includes every one with contacts outward

    same table except with network represented as undirected graph
    posted by mathlete at 10:59 AM on August 16, 2009


    As a consumer of the Infodump files, having the user IDs and names of certain users anonymized would certainly be fine with me. The process for me would not have to change at all.

    A little checkbox on the preferences page along the lines of 'anonymize my records in the Infodump' might be a good way to give people that option, near where similar options are given about hiding or showing certain information.

    Of course I say this as a person who a) doesn't have to implement the checkbox and b) doesn't have to implement the munging algorithm for the user name column (of which there is one) and user ID columns (of which there are many, in many tables). ;)

    If people who are concerned with their names appearing in Infodump analysis posts are happy with the level of anonymity provided by that, it sure sounds like a good solution to me. Yes it would still be easy for someone to figure out who you are if they want to, but I would feel better about at least not handing them the information on a plate if people do not want to see their names posted here.
    posted by FishBike at 10:59 AM on August 16, 2009


    So, I'd argue that this data is not for fun, but for knowledge.

    Yeah, it's natural for me to describe analysis stuff as "ooh, nerd-fun", but it can be in practice a kind of productive fun that I don't associate with e.g. sitting down to play some Geometry Wars on my DS. Building a community website is a certain kind of fun, doing ethnographic research is a certain kind of fun, etc.

    I mean neither to canonize datawankery as being automatically Important nor to imply that productive research is automatically genuinely fun, but there's some semantic overlap here between the two that carries it a bit beyond huffing paint as far as the production of useful artifacts goes.
    posted by cortex (staff) at 11:00 AM on August 16, 2009


    "... So, I'd argue that this data is not for fun, but for knowledge."
    posted by vacapinta at 1:47 PM on August 16

    So how is something as inconsistently applied, as little understood, and as capriciously used when it is used, on the site itself, as "Best Answer," ever going to be statistically valid knowledge to the greater world? That's a thin, thin bit of vapor you're blowing, vacapinta.

    And anyway, all my contributions, total, are such a small bit of the site, that excluding them is no great statistical damage to the corpus. If it's a small accommodation to make, like an exception list in the generation of the thing, why not offer it to those few iconoclasts like me, who might request such?

    On preview:

    "That said, I do hear you, and will actively look into munging implementation."
    posted by cortex at 1:56 PM on August 16

    Thank you, cortex. Please keep us posted as to your results. And my sincere wishes that good uses be found for InfoDump, including recreational uses by other members of the site. Recreational uses by the public, I've got a harder time getting behind...
    posted by paulsc at 11:03 AM on August 16, 2009


    Of course I say this as a person who a) doesn't have to implement the checkbox and b) doesn't have to implement the munging algorithm for the user name column (of which there is one) and user ID columns (of which there are many, in many tables). ;)

    Heh. Well, the munging isn't a biggie—done right, it's a one-time implementation, and doing exception checking on every row will slow down the code a bit but shouldn't be Hard unless I'm making some bad assumptions.

    The profile-checkbox idea is more of a general mefi UI issue, and would be a lot more to ask for—not just for the html on the page and the db management behind it, but in terms of user education and additional support.

    Simpler would be to just include information about the munging possibility in discussions about the infodump or on the dump's page, and let those who are concerned contact me for prompt addition to the exception list. It's not something that has come up at all in the previous 18 months, so the likelihood that it'd become a volume issue is pretty darned low.
    posted by cortex (staff) at 11:05 AM on August 16, 2009


    I'm going to be honest with you here, paulsc. If I saw that users were being obscured in the infodump, one of the first scripts I'd write would be one to deanonymize all of those users, so that my other analyses wouldn't be crippled. Truly anonymizing this data is a very hard problem and you're not likely to find a good solution, when all of the data is already out there.

    I'd also point you to some of the sites that were doing mefi stats long before the info dump came around. Waxy has been doing this for years, using nothing but a little screen scraping. No, his stats aren't as user-specific or detailed as these, but it's only a short and simple hop to get the kind of data being posted in the dump.

    Presumably, the data you're posting in InfoDump should go to some beneficial purpose that offsets the possibly low risk of it being misused

    I'd argue that these insights into online communities are scholarship, and less impressive analyses have certainly been used as dissertation topics in various social sciences.

    I'd also ask you to come up with some examples of how this might be misused to your detriment. I mean, anytime we put something on the web, there's an implicit understanding that the information is no longer under our control. Are you worried about your boss cross-correlating your posts with times you're supposed to be working? Or does this just inspire some general icky big-brotherish feeling in you? If it's the latter, then I don't know what to tell you, because, man, society is becoming more open, and short of dropping off the grid, there's very little we can do about it.
    posted by chrisamiller at 11:09 AM on August 16, 2009 [1 favorite]


    So how is something as inconsistently applied, as little understood, and as capriciously used when it is used, on the site itself, as "Best Answer," ever going to be statistically valid knowledge to the greater world? That's a thin, thin bit of vapor you're blowing, vacapinta.

    I was referring to the dataset as a whole and the value in making it public. I offered no opinions as to the value of "Best Answer."
    posted by vacapinta at 11:10 AM on August 16, 2009


    So how is something as inconsistently applied, as little understood, and as capriciously used when it is used, on the site itself, as "Best Answer," ever going to be statistically valid knowledge to the greater world? That's a thin, thin bit of vapor you're blowing, vacapinta.

    From a data geek's perspective, that's a kind of a backwards way of looking at it. The fact that BA data is so muddy and overloaded with conflicting interpretation is part of what makes it an interesting prospect: exploring the ways in which those inconsistencies manifest themselves, trying to make some analytical sense of the different interpretations as they affect the use of the feature, trying to identify and partition distinct classes of BA-deployment, all are interesting possible approaches to it.

    Again, the BA/day stuff is (as chrisamiller has himself acknowledged) just a bit of a first-blush lark, but that's not the standard by which to judge the potential utility or depth of the analysis that could come out of such data, anymore than the first tentative strums on a guitar are the measure by which to judge the instrument's potential. There are rich, fascinating veins in here, waiting to be mined, but that takes time and we're all inclined to stretch our legs a bit first and pluck at the low-hanging fruit.
    posted by cortex (staff) at 11:11 AM on August 16, 2009


    The profile-checkbox idea is more of a general mefi UI issue, and would be a lot more to ask for—not just for the html on the page and the db management behind it, but in terms of user education and additional support.

    I guess it depends how many people want to make use of the capability. At a certain point, it should be easier to completely automate it than to have to manually add an entry to a table every time someone asks. But I'm not anticipating a lot of people asking, either.

    Which raises another issue. What if paulsc is the only one who asks? I'm not kidding -- if that's what happens, every time we see 'MungedUser101' won't we know that's him?
    posted by FishBike at 11:13 AM on August 16, 2009


    Hey, are you guys just trying to sell me a hundred sock puppet accounts?

    Seriously, if a munge isn't effective user ID obfuscation, then just take me out, please. I'll never be a large enough part of the site to be statistically significant to the dataset as a whole. You'll never miss me, I promise.
    posted by paulsc at 11:21 AM on August 16, 2009


    Which raises another issue. What if paulsc is the only one who asks? I'm not kidding -- if that's what happens, every time we see 'MungedUser101' won't we know that's him?

    We, the folks in this thread? Yes. But I'm guessing paulsc's concern isn't really that one of the dozen or so particularly chattery geeks is out to get him.

    We, mefites in general? Not if they don't go looking, which a casual observer wouldn't.

    We, the potentially malicious jerks in our midst? Yes, they could figure out it's him, but if they're malicious jerks then this is, again, not something that has much to do with the Infodump.

    I'm going to be honest with you here, paulsc. If I saw that users were being obscured in the infodump, one of the first scripts I'd write would be one to deanonymize all of those users, so that my other analyses wouldn't be crippled.

    The munging would be done in a non-crippling fashion; your data would now just include a Mysterious User instead of paulsc. The polite quid-pro-quo here to folks accepting a munging compromise would be other folks being willing to nod and go along with that.
    posted by cortex (staff) at 11:27 AM on August 16, 2009


    Would simply changing the username for that user id in the dump to something else be acceptable?

    This sounds a bit like "favorites are killing the site."
    posted by Pronoiac at 11:47 AM on August 16, 2009


    I've started a little Charts and Graphs page, since we can't post images in a comment directly. The first one is a scatter plot showing contributions on the blue vs. the green. Every dot on the chart represents one user. It's a log/log plot to expand the low end of the scale, where most people are. I'm not sure if it's omitting the users with a 0 contribution count for one or the other, but I think it is.

    I wondered if there would be one cloud of dots more or less along one axis, or several distinct clouds along different axes... looks to me more like the former at the moment. Maybe a plot of AskMe vs. MeFi contribution ratio distribution would be interesting?
    posted by FishBike at 1:03 PM on August 16, 2009


    "... The polite quid-pro-quo here to folks accepting a munging compromise would be other folks being willing to nod and go along with that."
    posted by cortex at 2:27 PM on August 16

    Would that polite society also include some means of restricting distribution of the InfoDump to the dozen or so researchers/chattery geeks whose continued goodwill and confidence we (meaning, of course, first and foremost, me, but then, also, the larger MeFite user community) could continue to expect? I mean, as opposed to distribution: world from the current download Web site? Because, while I don't want to go into this in a lot of depth, I think there is a substantive difference in running a Web site, which might be scraped by an outsider for user data, and then having to choose to allow or deny such automated access, and field any user complaints about what is done with that data by a third party, as opposed to packing it up yourselves, including fields that might not be exposed in dynamically generated pages, and putting it up for download by anyone who wants it.

    I don't want to get all hypothetical, nor suggest that "chattery geeks" should either, with regard to the data's possible nefarious uses. And I'm not unduly hand wavy, or paranoid. I'm open to reasonable accommodation of what seems to me a reasonable request, and a munge among known gentlepersons, with some promise that only those known gentlepersons will get the data, seems reasonable to me.

    What goes up under my name, I'll stand to. But analysis did not go up under my name, and my name is not a statistically worthwhile aspect of understanding how the dynamics of this site operate. And that goes double for unknown passersby.
    posted by paulsc at 1:06 PM on August 16, 2009


    Would that polite society also include some means of restricting distribution of the InfoDump to the dozen or so researchers/chattery geeks whose continued goodwill and confidence we (meaning, of course, first and foremost, me, but then, also, the larger MeFite user community) could continue to expect?

    My bid for politeness was directed at chrisamiller, not at you, but beyond that, interest in the data is already a powerfully self-selecting thing: the only practical way to know it exists is to be a regular metatalk reader or otherwise attentive enough to metafilter to notice something that only gets mentioned outside of metatalk comments every once in a while. And that sort of readership disposition plus an inclination to play with data makes for a pretty comfortable target audience. We could put this stuff under lock and key, but that seems like an onerous approach to data that is fundamentally available to anybody who wants to go looking for it, as in...

    as opposed to packing it up yourselves, including fields that might not be exposed in dynamically generated pages, and putting it up for download by anyone who wants it.

    To be clear, there is zero data in the infodump that is not openly exposed on mefi to unknown passersby already (and a great deal that is intentionally left out that would be nonetheless trivially scrapable by anyone so inclined). We're reducing the sort of scraping (and, beyond that, duplicated effort by different individuals) that has already traditionally occurred by interested data nerds before the Infodump existed.

    The reason I emphasize that what we're packing up is very skeletal stuff is that it is as a result about as far separated from any potential notion of e.g. prurient interest as I could imagine. That's part of why we're not dumping actual comment or post text, among other things.

    Again, I'm okay with a munge solution, and as far as it goes as a way to reduce the casual correlation of activity data with your username I think it's a reasonable compromise. Anything aiming to be more thorough than that is kind of quixotic given the plain visibility of this data on the site itself to interested parties and the practical impossibility of preventing a malicious dataminer from DIYing something far more personal and content-oriented than what's available here. Practically speaking, there is no solution to that problem other than traveling back in time and choosing not to participate on the site.
    posted by cortex (staff) at 1:25 PM on August 16, 2009


    The first one is a scatter plot showing contributions on the blue vs. the green.

    Interesting. I'd be curious to see a version of that binned out by year of join date vs. contributions from late 2003 (when askme came into being) forward; my presumption is that we'd see a swing over time toward the askme side in overall distribution, but I wonder what shape exactly that would take.
    posted by cortex (staff) at 1:29 PM on August 16, 2009


    FishBike: You might introduce different color dots for different densities, like I tried. Or for different years, for cortex.
    posted by Pronoiac at 1:45 PM on August 16, 2009


    So here's how the ratio of MeFi vs. AskMe contribution looks. This comes out okay as a text table, so no need for a fancy graph.

    I am calculating the percentage of the user's total contributions to MeFi and AskMe (combined) that are MeFi contributions. So a 0% score means no activity on MeFi at all (all on AskMe), and a 100% score means all activity is on MeFi (none on AskMe).

    I've sorted these percentages into ranges, and counted how many users fall into each range:
    0 - 10% : 9510
    10 - 20% : 1249
    20 - 30% : 1117
    30 - 40% : 964
    40 - 50% : 856
    50 - 60% : 1088
    60 - 70% : 960
    70 - 80% : 945
    80 - 90% : 1089
    90 - 100% : 9157
    I find this distribution surprising given the earlier graph. I expected it to look more like a bell curve, but instead it's just about the opposite of that. The only way I could make sense of this is if there are large numbers of users who have 0 contributions on MeFi or AskMe. And indeed, there are! They don't show up on a log/log plot, either, so they are basically missing from the earlier graph.

    Admittedly, a user with 1 comment on MeFi will be shown in the 100% MeFi bucket, and a user with 1 question posted on AskMe will show up as 0% MeFi activity. So how about excluding all those users with less than 100 contributions on the two areas combined:
    0 - 10% : 1166
    10 - 20% : 495
    20 - 30% : 481
    30 - 40% : 467
    40 - 50% : 484
    50 - 60% : 497
    60 - 70% : 511
    70 - 80% : 499
    80 - 90% : 533
    90 - 100% : 1415
    Now it's closer to a flat line, but is still interesting.
    posted by FishBike at 2:05 PM on August 16, 2009
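
The bucketing described in the comment above can be sketched roughly as follows. This assumes each user is reduced to a pair of counts (MeFi contributions, AskMe contributions), that a share landing exactly on a boundary goes into the higher bucket, and that users with no contributions on either subsite are skipped. Those choices and the names `mefi_share_bucket` and `histogram` are assumptions for illustration; FishBike's actual SQL may differ.

```python
def mefi_share_bucket(mefi: int, askme: int) -> int:
    """Return the bucket index 0..9 for a user's MeFi share of MeFi + AskMe activity."""
    total = mefi + askme
    if total == 0:
        raise ValueError("user has no contributions on either subsite")
    # Integer arithmetic avoids floating-point edge cases at bucket boundaries.
    # A 100% share would compute as 10, so it is folded into the top bucket.
    return min(10 * mefi // total, 9)


def histogram(users, min_total=1):
    """Count users per bucket, keeping only those with at least `min_total` contributions."""
    counts = [0] * 10
    for mefi, askme in users:
        if mefi + askme >= max(min_total, 1):
            counts[mefi_share_bucket(mefi, askme)] += 1
    return counts
```

Calling `histogram(users)` corresponds to the first table and `histogram(users, min_total=100)` to the second. A user with a single MeFi comment and nothing on AskMe lands in the top bucket, which is the effect the 100-contribution cutoff is meant to filter out.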


    So, now considering only contributions since AskMe went live, here's the percent MeFi vs. AskMe as an average for all users who signed up in the listed year:
    2000 : 53.32%
    2001 : 58.82%
    2002 : 62.64%
    2003 : 59.63%
    2004 : 59.60%
    2005 : 45.77%
    2006 : 31.67%
    2007 : 30.23%
    2008 : 29.93%
    2009 : 28.46%
    ... and the same again, but only including users with at least 100 contributions in total:
    2000 : 46.99%
    2001 : 51.36%
    2002 : 54.44%
    2003 : 57.10%
    2004 : 54.34%
    2005 : 44.22%
    2006 : 35.36%
    2007 : 36.92%
    2008 : 38.59%
    2009 : 37.19%
    posted by FishBike at 2:21 PM on August 16, 2009


    Oh, and going one level further than this meta-MetaFilter discussion... here's some meta-meta-MetaFilter information: of the 12 visitors to my SQL scripts download page so far, 9 used FireFox and 3 used Chrome. Nobody used IE. The correlation of preferred browser use with statistics geekiness is interesting, though perhaps not surprising.
    posted by FishBike at 2:27 PM on August 16, 2009


    Finally got around to reading this thread. Fascinating!
    posted by ThePinkSuperhero at 3:50 PM on August 16, 2009 [1 favorite]


    Now it's closer to a flat line, but is still interesting.

    Yeah, that's pretty neat. I'm trying to think of ways to quantify that sort of cross-site thing better or in more detail.

    For one thing, I wonder if it'd be useful to do a weighted version of that askme-vs-mefi percentage thing, assigning each person in each bucket a weight of total number of contributions maybe. You wouldn't have to exclude the low-comment-count folks at that point, either, since they wouldn't be overrepresenting their total activity.
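
One way the weighted version might look, as a rough Python sketch: each user adds their total contribution count to their bucket instead of adding 1. The input shape and the function name are assumptions for illustration, and the sample data is made up.

```python
def weighted_share_buckets(counts):
    """counts maps user id -> (mefi_contributions, askme_contributions).

    Returns, per 10% range, the total contributions made by users in it.
    """
    weights = [0] * 10
    for mefi, askme in counts.values():
        total = mefi + askme
        if total == 0:
            continue  # no activity, so no percentage to compute
        share = 100.0 * mefi / total
        index = min(int(share // 10), 9)
        weights[index] += total  # weight by activity rather than by head count
    return weights

# Made-up sample: two one-contribution users barely move the result
sample = {1: (1, 0), 2: (0, 1), 3: (60, 40), 4: (2, 198)}
print(weighted_share_buckets(sample))
```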

    Looking at the same thing in terms of ask/meta and mefi/meta would be interesting. I did some rough analysis of the blue/green/grey activity habits question a while back, but I'd be curious to see how it looks in more detail—the Venn diagram approach captures absolute activity but not proportional activity, and seeing how folks proportionally split their time across the three subsites would be a nice expansion of that line of thinking.

    Are there, for example, subgroups of people who spend a lot of time on askme and a fair amount of time on meta without any significant time on the blue? Or on the blue and the grey but not on the green?

    I have this untested feeling that there might be something to be found in terms of the likelihood of that compared to that of people who spend a great deal of time on both the blue and the green but very little on the grey, the hazy reasoning being that heavy involvement with just one or the other of the blue and the green is probably accomplishable without needing much awareness of the other, but being thoroughly active on both implies a kind of holistic approach to the site that makes metatalk constituency likely as well. But that's just rash speculation.
    posted by cortex (staff) at 4:07 PM on August 16, 2009


    cortex: "Are there, for example, subgroups of people who spend a lot of time on askme and a fair amount of time on meta without any significant time on the blue? Or on the blue and the grey but not on the green? "

    Just want to add that the answer to this question can vary wildly for any particular user at any point in time for that user's history. For example, when I first joined MeFi, I stuck mostly to the green and was pretty much scared shitless of the blue. Then the election came around and I have slowly gotten sucked into the blue, and have pretty much stayed there since. So, if you look at my stats today, you could assume that I bounce around the sites evenly, since # of posts/comments are pretty comparable. Track this by months/seasons though, and you get a different trend. That'd be pretty neat to see, averaged out for a statistical sampling of users since their inception(s) on the site. Then you'd get some sort of picture of how people tend to experience MeFi, i.e. do they generally start on the green, like I did, and then move over to the blue and grey, etc.? Anyways, just some thoughts.
    posted by iamkimiam at 4:41 PM on August 16, 2009


    That's a good point, iamkimiam. It'd be interesting to graph user activity by subsite against time, sort of see the engagement curves (or blips) as their attention goes from one area to another.
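
A rough sketch of the per-month, per-subsite tally that such a graph would be drawn from. The event list (timestamp plus subsite name) is a hypothetical input shape, not the Infodump file layout.

```python
from collections import defaultdict
from datetime import datetime

def monthly_activity(events):
    """events: iterable of (timestamp, subsite) pairs for one user."""
    table = defaultdict(lambda: defaultdict(int))
    for when, subsite in events:
        table[when.strftime("%Y-%m")][subsite] += 1
    # Plain dicts, ordered by month, ready to feed to a plotting tool
    return {month: dict(sites) for month, sites in sorted(table.items())}

# Made-up history: starts on the green, drifts to the blue
events = [
    (datetime(2008, 1, 5), "askme"),
    (datetime(2008, 1, 20), "askme"),
    (datetime(2008, 11, 4), "mefi"),
]
print(monthly_activity(events))
```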
    posted by cortex (staff) at 7:15 PM on August 16, 2009


    list of all users and how many connected to, including degrees of separation
    list of all networks (excluding networks with only one person, i.e. users without outward contacts)

    some more data to help sort through the above list:
    network to total users in the network mapping
    network to total users connected to the network mapping (if a user is connected to a network I mean their outward contacts lead them to every person in that network and that network is the largest such network that satisfies that condition -- one person connects to one network)

    There are 3877 people who connect to the network of 7372. The next most connected networks each have 5 people connected to them.

    The largest network by total users has 7384 people in it and 2 people are connected to it.
    posted by mathlete at 8:35 PM on August 16, 2009


    Music
    791.77 hours: not_on_display [132 comments]

    posted by FishBike

    Hey, they were new songs to me!
    posted by not_on_display at 9:59 PM on August 16, 2009


    # of comments it took Fishbike to break the Greasemonkey script which counts comments: 49.
    posted by gman at 4:39 AM on August 17, 2009

    There are 3877 people who connect to the network of 7372.
    I don't understand this. I count 5453 contacters, 8059 contactees, and only 8928 unique users connected to the contact system in total.
    posted by fantabulous timewaster at 4:40 AM on August 17, 2009


    # of comments it took Fishbike to break the Greasemonkey script which counts comments: 49.

    ha! This idiot and his morning eyes didn't realize that was how not_on_display quotes people.
    posted by gman at 4:48 AM on August 17, 2009


    mathlete: "There are 3877 people who connect to the network of 7372."

    Can you explain this to me? Can you give an example of a user who would be "in this network" but not "connected to this network"?
    posted by Plutor at 5:11 AM on August 17, 2009


    It might be interesting to estimate the efficiency of the contact network.

    To do this you would have to come up with some measure of co-interested-ness. For instance, you could divide the number of threads where two users both comment by the total number of threads where either user has commented. This gives a probability that one user will show up in a discussion when the other has. You could also compute the probability of co-favoriting rather than co-commenting. Are people more likely to add cointerested users as contacts? How much?
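
The ratio described here (threads both users commented in, divided by threads either commented in) is the Jaccard index of the two users' thread sets. A minimal sketch, with made-up thread ids:

```python
def co_interest(threads_a, threads_b):
    """Shared threads divided by threads where either user commented."""
    a, b = set(threads_a), set(threads_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two of the four distinct threads are shared
print(co_interest([1, 2, 3], [2, 3, 4]))
```

The same function works for co-favoriting if the inputs are the ids of items each user favorited.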
    posted by fantabulous timewaster at 5:16 AM on August 17, 2009


    Plutor: Can you explain this to me? Can you give an example of a user who would be "in this network" but not "connected to this network"?

    Imagine a network with 100 people connected to it by various contacter/contactee relationships. Imagine a person not in that network with 0 contacts outward and 0 contacts inward. One of those 100 people makes a contact to that person not in the network such that the network now has 101 people in it. However, that new person is not connected to the network because they have no outward contacts.

    In short, if you have contacts in, you can be in the network, but if you don't have any contacts out, you won't be connected to the network.

    fantabulous timewaster: I don't understand this.

    Can you elaborate?
    posted by mathlete at 6:21 AM on August 17, 2009


    You could also compute the probability of co-favoriting rather than co-commenting. Are people more likely to add cointerested users as contacts? How much?

    Ha, that would complement nicely the existing Mutual Appreciation calculations.
    posted by cortex (staff) at 7:28 AM on August 17, 2009


    mathlete: "In short, if you have contacts in, you can be in the network, but if you don't have any contacts out, you won't be connected to the network."

    Ah, thanks. I think I'll have to go back to discrete math and re-read the chapter on digraphs. I find it surprising that there are so many more people in the network than there are people connected to the network.
    posted by Plutor at 7:31 AM on August 17, 2009


    Sorry: I meant that your big network, plus the number of people who "connect to" it, is substantially larger than the number of unique userids in the contactdata.txt dump. So, as Plutor asked better, you're apparently using "in" and "connected to" in a technical way that I'm not familiar with.

    What you apparently mean is that, of the large network containing nearly everyone, about half (nearly 3900) are contactees, included in the network only by the action of others. This is larger than the number of people who are contactees but not contacters, so it's not entirely due to people simply opting out of using contacts. Interesting.

    It would be neat if the contactdata dump had the timestamp that the contact was made, so you could watch the evolution of the network.
    posted by fantabulous timewaster at 7:37 AM on August 17, 2009


    I find it surprising that there are so many more people in the network than there are people connected to the network.

    I didn't expect that number either, but I'm not really surprised by it per se. Contacting isn't compulsory behavior or even explicitly encouraged, and judging by the raw numbers of folks who have outbound contacts it's only a fraction of the userbase (though probably a very hefty fraction of the more active portion*) that uses it. So folks with the inclination to contact may be contacting folks without that inclination on a regular basis.

    I haven't checked the numbers, but I'd bet a lot of the people in the network but without inbound links to the network are contacted by just one person and contact no one else—needles on our contact cactus, staring out into the desert.

    *That's another thing to look at: how does number of contacts correlate to number of contributions? Generally positively, I'd guess, but I wonder how tight that mapping really is beyond that?
    posted by cortex (staff) at 7:54 AM on August 17, 2009


    Oh thanks, cortex. I didn't remember that FishBike had already worked out co-commenting and co-favoriting stuff. Too bad he confirmed my suspicion that it's computationally hard.
    posted by fantabulous timewaster at 7:56 AM on August 17, 2009


    I didn't remember that FishBike had already worked out co-commenting and co-favoriting stuff. Too bad he confirmed my suspicion that it's computationally hard.

    Some of it is, for sure, although I can recall only one thing I tried where I gave up completely due to that. When I use the phrase "computationally difficult" it just means the query would take somewhere between "longer than I feel like waiting" and "longer than the universe has existed". So it's kind of a vague statement.

    And things are often too difficult to compute only until someone clever enough comes along and figures out an efficient way to compute them. There are much clever-er people on MeFi than me (and even in this thread).

    I don't think it would be at all difficult to compute % of items favorited by people who are contacts vs. people who are not (both in the "things my contacts wrote that I favorited" and "things I wrote that my contacts favorited" senses). It would be interesting to know, for MeFi as a whole, if the % of items favorited is higher for contacts than for non-contacts. If I wasn't at work, I'd be running that query right now.
    posted by FishBike at 8:39 AM on August 17, 2009


    Ok, I have a question about the Infodump:

    Favorites data includes information for three sub-sites (projects, jobs, and travel) in addition to the four main sites (askme, mefi, meta, and music), all in one file.

    Meanwhile, for most other types of data, we've got 4 separate files (askme, mefi, meta, and music).

    I can't seem to locate any data about those first 3 sub-sites in the 4 files (no post data, no comment data, no post titles). The target IDs for favorites with type "project post" (for example) seem to already exist under askme, mefi, meta, and music and none of them are the project in question.

    So my question is, are the posts, comments, titles, etc. for these 3 sub-sites represented in the Infodump in a way I am not figuring out, or are they actually not in there?
    posted by FishBike at 4:07 PM on August 17, 2009


    They're just not there. When I first started in on the dump, I was focusing on the Big Three since that was the vast majority of site activity; I threw in Music because I'm totally biased and also because the table design for it in the db is basically identical to that for the big three and it was trivial to include.

    Projects when it launched back in 2005 had its own idiosyncratic viewable-only-to-the-poster comment style, and when I started in on the Infodump at the start of 2008 it was still using that format (we introduced public comments in May of that year); Jobs has no comments period; and travel never really launched properly in the first place. All three were weird fits for the existing scripts as well as being relatively very low traffic areas, so getting them fit into the data dump wasn't a priority.

    I may go back and look at supporting them; Projects certainly could be incorporated at this point, and I suppose a posts-only dump of Jobs would be possible though I sort of feel like that one's in slightly-personal territory given the practical who-is-trying-to-employ-who nature of the beast.

    Travel will come up for review again if it ever springs to life; at this point doing a dump of a subsite that never really started and has since stopped even twitching feels kind of pointless, since basically all of the data is test posts and outliers.
    posted by cortex (staff) at 4:35 PM on August 17, 2009


    I dipped my toe into the infodump pool this weekend, keeping mainly to the mefi post data to try and determine a heat versus light metric for threads. Basically threads with many comments get a high heat factor and those with many favorites get a high light factor, but I didn't want massive threads like the Sarah Palin one to skew results so I wanted to find out what are large comment and favorite counts for most threads.

    These thread stats show the percentiles of comments and favorites over the years (I based the June start date on the month after favorites were implemented) and I ran with the 99th percentile to calculate heat and light factors. A couple more gory details later, and out popped these hidden gems which are threads shedding lots of light.
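
The exact formula isn't spelled out here, so the following is only a guess at the general shape: take the 99th percentile of a count as a cap, then scale each thread's count against that cap so a handful of giant threads can't dominate. Both function names and the nearest-rank percentile method are assumptions for this sketch.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def capped_factor(count, cap):
    """Scale a count to the range 0..1, treating anything above cap as 1."""
    return min(count / cap, 1.0) if cap else 0.0

# Made-up comment counts with one runaway thread
comment_counts = list(range(1, 100)) + [5000]
cap = percentile(comment_counts, 99)
print(cap, capped_factor(50, cap), capped_factor(5000, cap))
```

Heat would use comment counts and light would use favorite counts, each with its own cap.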
    posted by hoppytoad at 4:52 PM on August 17, 2009 [6 favorites]


    Nice, hoppytoad. Looking at the percentile graphs, it's interesting to me how cleanly the various percentiles track. I guess that's not really surprising, but I suppose it says something about the regularity of the distribution of activity even as the volume increases.

    That jump between the June 04 and June 05 strikes me as aligning perfectly with both the re-opening of signups (greetings, 17kers and friends) and the 04 election cycle (greetings, way too much political hair-tearing on the blue). I wonder if the big jump between June 07 and June 08 can be attributed largely to the heated, early-start primary season—that's all pre-Palin, so it can't be blamed on her, outlier threads or not.

    Favorite distribution being nearly flat for the last two years does confirm somewhat my suspicion that growth on that had trailed off after the first year of folks really getting to know the feature and integrating it into their daily habits. I think there's more meat to chew on re: the proportional distribution of favorites compared with other site activity over time, actually.

    But that hidden gems list is great, and that goes back to something I'd been talking about in Metatalk early this year regarding the idea of "referenceability" of askme threads as an explanation for high favorite counts: some threads look, regardless of your interest in commenting, like they'll be useful. The links look like nice reference material for a rainy day, etc. And a lot of stuff on your list looks like just that sort of thing.
    posted by cortex (staff) at 5:07 PM on August 17, 2009


    Thanks, cortex.

    I had to do some transformation on the favorites data anyway, to match up with the way I merged all the comments and posts data together from the big 4 sites into one table. So it was easy enough to exclude favorites for those excluded sites in the following analysis.

    WARNING: Huge and astonishing numbers ahead.

    Here is my very preliminary analysis of the "do people favorite stuff written by their contacts more than they favorite stuff in general?" question.

    I looked at how many favorites everyone could have given ("favoriting opportunities") vs. how many they actually gave. Then I looked at how many favorites everyone could have given to only things written by their contacts, and how many favorites they actually gave to those things.

    Here are the numbers and percentages:
    Favoriting opportunities: 230,296,321,224 (yes, 230 billion!)
    Favorites given: 2,410,574
    % of favoriting opportunities favorited: 0.001047%

    Favoriting opportunities for contacts contributions: 169,951,333
    Favorites given to contacts contributions: 122,111
    % of favoriting opportunities for contacts contributions favorited: 0.071851%

    MeFites are, overall, 6864 times more likely to favorite contributions by their contacts vs. contributions in general. Wow!
    The main problem with this analysis is that it seriously overestimates the number of favoriting opportunities. That number of 230 billion is how many favorites there would be if every user went back and favorited every post and comment on the entire site. Similarly the 169 million number assumes every user could go back and favorite every post and comment by every one of their contacts.

    Next (maybe tomorrow) I am going to work on defining a "favoriting window" for each user, that is a date range from their first favorite to their last. Only contributions made during a user's favoriting window will count as a favoriting opportunity. This should better account for:
    • Favoriting functionality not being there from day 1 of Metafilter.
    • Users generally not going back and favoriting things from before their sign-up date (but also taking into account those who did that)
    • Users who are no longer active on MeFi
    • Users who don't use favoriting functionality (or don't use it any more, or didn't use it from the start, etc.)
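
A small sketch of that windowing idea, assuming the window is inclusive at both ends and that dates are all that matter. The function names are made up for this example, and real data would have timestamps rather than bare dates.

```python
from datetime import date

def favoriting_window(favorite_dates):
    """Date range from a user's first favorite to their last, or None."""
    if not favorite_dates:
        return None  # user never favorited anything
    return min(favorite_dates), max(favorite_dates)

def opportunities(window, contribution_dates):
    """Count contributions that fall inside the favoriting window."""
    if window is None:
        return 0
    first, last = window
    return sum(1 for d in contribution_dates if first <= d <= last)

# Made-up example: only the middle two contributions count
window = favoriting_window([date(2007, 3, 1), date(2008, 6, 30)])
posts = [date(2005, 1, 1), date(2007, 3, 1), date(2008, 1, 15), date(2009, 8, 1)]
print(opportunities(window, posts))
```

A user with no favorites gets no window and so contributes zero opportunities, which covers the "doesn't use favoriting" case in the list.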
    posted by FishBike at 5:08 PM on August 17, 2009


    So far most of these analyses seem to be favoring extremes -- most, least, highest, lowest. How about average? Can one of you calculate the average # of posts/comments/favorites/contacts across sites, find the mean, then list the 25 or 50 "most average" users? It'd be interesting to see what is smack dab in the middle of user participation.
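
One possible reading of "most average", sketched in Python: standardize each activity count, then rank users by how close they sit to the mean on all counts at once. The distance measure is an assumption, and the sample users are invented.

```python
from statistics import mean, pstdev

def most_average(users, top=25):
    """users maps name -> tuple of activity counts, same order for everyone."""
    columns = list(zip(*users.values()))
    centers = [mean(c) for c in columns]
    # Fall back to 1.0 when a column has no spread, to avoid dividing by zero
    spreads = [pstdev(c) or 1.0 for c in columns]

    def distance(stats):
        return sum(((x - m) / s) ** 2 for x, m, s in zip(stats, centers, spreads))

    return sorted(users, key=lambda name: distance(users[name]))[:top]

# Invented users with (comments, favorites) counts
users = {"a": (0, 0), "b": (10, 10), "c": (20, 20), "d": (11, 9)}
print(most_average(users, top=2))
```

With activity as skewed as it is on the site, the median may be a better center than the mean.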
    posted by Devils Rancher at 5:26 PM on August 17, 2009


    Holy butts, FishBike. I look forward to the tightened up analysis, yeah, but that looks like an interesting effect regardless.

    However, either I'm misunderstanding or you slipped a decimal somehow (maybe in switching between percentage and raw decimal representation?) because I'm seeing the factor being 68.6, not 6.8K. Still a great big number, but not an insane one.

    One benefit that would come from having the creation data info in the contactdata file is that you could break analysis of this behavior into per-user before-and-after chunks to look at whether favoriting seems to predict contacting or vice versa at all.
    posted by cortex (staff) at 5:34 PM on August 17, 2009


    Average/median stuff would totally be worth looking into, yeah.

    I have a grand notion to eventually put up a companion page to the Infodump that would have something like a daily/weekly/monthly/yearly site activity zeitgeist—commenting and posting activity, popular askme categories, maybe some more granular tag analysis, trending (up and down) vocabulary over time. I think representing not just the extremes but the middle-of-the-road numbers in that context would be really important and make for a nice snapshot of how the guts of mefi are currently churning.

    Obviously the word-content analysis would have to be something done behind the scenes, but almost all of the rest is practically calculable from Infodump data. If anyone is interested in putting together ideas for autogenerated, nice looking graphs that could get spat out easily by a perl script, I'm all ears. I should also talk to pb about it, actually, since he's thrown together a few nice graphs in the past and may have good tools available.

    But, basically, ideas for what would be good on such a snapshot page are welcome. I don't know for sure when or in what form it would happen, but it's something I find exciting as an idea.
    posted by cortex (staff) at 5:38 PM on August 17, 2009


    However, either I'm misunderstanding or you slipped a decimal somehow (maybe in switching between percentage and raw decimal representation?) because I'm seeing the factor being 68.6, not 6.8K. Still a great big number, but not an insane one.

    Guilty as charged. I had accidentally formatted that cell as a percentage, and with the huge number of digits after the decimal, totally didn't see the % sign at the end. So yeah, it's 68.6x more likely and not 6800x. Nevertheless, I was prepared to be wowed by a number like 2.
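
The corrected factor can be checked straight from the four counts posted earlier. A small sketch; the function name `favoriting_lift` is made up for this example.

```python
def favoriting_lift(favs_all, opps_all, favs_contacts, opps_contacts):
    """Contact favoriting rate divided by the overall favoriting rate."""
    return (favs_contacts / opps_contacts) / (favs_all / opps_all)

# Counts as posted in the thread, before any windowing
lift = favoriting_lift(2_410_574, 230_296_321_224, 122_111, 169_951_333)
print(f"{lift:.1f}x")
```

Working with the raw ratio rather than the formatted percentages avoids the stray factor of 100.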

    One benefit that would come from having the creation data info in the contactdata file is that you could break analysis of this behavior into per-user before-and-after chunks to look at whether favoriting seems to predict contacting or vice versa at all.

    I had that thought earlier today, and was going to include it in the posting of the tightened-up analysis. Spooky.

    Although it wouldn't be perfect as people can delete and re-create contacts at will... so we wouldn't know for sure that a person wasn't a contact of another person at the time of a specific post or comment. But we can be sure about the ones who were contacts prior to the contribution, and who still are now, which is probably good enough.
    posted by FishBike at 5:43 PM on August 17, 2009


    hoppytoad, very nice. I also have a thought about a giant pile-o-links, which prompts me to ask: what's a good set of regular expressions for making canonical URLs from the posttitles dumps? It looks like something along the lines of s/[^\w\s]//g; s/\s+/-/g; gets most things right, but maybe there are some edges. This would be nice to get right since my browser remembers the canonical URLs and will properly color the ones I've visited, which isn't the case with the recently-encouraged http://metafilter.com/mefi/NNNNN scheme.
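
For anyone not working in Perl, the two substitutions translate to Python roughly as below. This is only a sketch of the regexes as written; whether the result matches the site's real URL scheme in every case is not confirmed, and trimming leading and trailing whitespace first is an added assumption.

```python
import re

def title_to_slug(title):
    """Drop punctuation, then turn each run of whitespace into a hyphen."""
    stripped = re.sub(r"[^\w\s]", "", title)
    # strip() is an addition, so stray edge spaces don't become hyphens
    return re.sub(r"\s+", "-", stripped.strip())

print(title_to_slug("Infodump 2.0"))
```

Note that `\w` keeps underscores and, in Python 3, non-ASCII letters, which may be one of the edge cases.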

    FishBike, interesting.
    posted by fantabulous timewaster at 5:43 PM on August 17, 2009


    Sorry: I meant that your big network, plus the number of people who "connect to" it, is substantially larger than the number of unique userids in the contactdata.txt dump. So, as Plutor asked better, you're apparently using "in" and "connected to" in a technical way that I'm not familiar with.

    Sorry for confusing everyone with my terminology. It is not technical. I am just lazy and that was what popped into my head first. An actual graph would probably help. That, or better terms.

    If I wasn't at work, I'd be running that query right now.

    I hear you. There is still stuff I'd like to do too, but, yeah, work.
    posted by mathlete at 5:53 PM on August 17, 2009


    I would like to see a nice, simple line graph showing the life of a thread, as indicated by number of comments and favorites per [day? hour?] over time.

    It would be super-mega-hyper-neat if this could be done automagically for:
    1. Individual threads
    2. Daily, weekly, monthly, yearly, and foreverly averages
    3. Individual users' posts' averages, in a handy (albeit mostly useless) widget to plunk into their profiles
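
The data behind such a graph for a single thread could be tallied like this. A sketch only, assuming comment timestamps are available as `datetime` values; the times below are made up.

```python
from collections import Counter
from datetime import datetime

def comments_per_hour(posted_at, comment_times):
    """Comment counts for each whole hour elapsed since the thread was posted."""
    buckets = Counter(
        int((t - posted_at).total_seconds() // 3600) for t in comment_times
    )
    last = max(buckets) if buckets else -1
    # Include quiet hours as zeros so the line graph has no gaps
    return [buckets[h] for h in range(last + 1)]

posted = datetime(2009, 8, 13, 13, 55)
times = [
    datetime(2009, 8, 13, 14, 0),
    datetime(2009, 8, 13, 14, 30),
    datetime(2009, 8, 13, 16, 5),
]
print(comments_per_hour(posted, times))
```

Averaging these lists across many threads would give the daily, weekly, or all-time curves.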

    Hop to it, Stats Fairy!
    posted by Sys Rq at 5:54 PM on August 17, 2009


    Although it wouldn't be perfect as people can delete and re-create contacts at will... so we wouldn't know for sure that a person wasn't a contact of another person at the time of a specific post or comment. But we can be sure about the ones who were contacts prior to the contribution, and who still are now, which is probably good enough.

    Yeah, there's bound to be some noise in there. The biggest source of it I'd guess is revoked contacts, since we'd have no way of knowing that a contact relationship once existed. (Unless there was some profoundly predictive effect found, I guess, in which case we could guess. But that seems like a daydream.)

    I'd bet the number of revoked contacts is pretty low; non-zero, definitely, I know it's happened, but not common. Someone could check the old infodump I have cached from the start of the year and look for things that are gone now, I guess, as a test of that theory.

    But I'd bet the number of revoked and then reinstated contacts is even lower. Complicated set of motivations required for that, excepting the possibility of someone doing a gone-and-back-in-no-time cycle of an existing contact for bizarre XFN-fiddling reasons. So probably not a big source of noise regardless.
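
The old-dump comparison is a plain set difference. A sketch, assuming each dump has been read into (contacter id, contactee id) pairs; the ids below are invented.

```python
def revoked_contacts(old_pairs, new_pairs):
    """Contact pairs present in the older dump but missing from the newer one."""
    return set(old_pairs) - set(new_pairs)

old = [(1, 2), (1, 3), (4, 1)]
new = [(1, 2), (4, 1), (5, 1)]
print(revoked_contacts(old, new))
```

This can't see contacts that were revoked and reinstated between the two dumps, or ones created and revoked between them.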
    posted by cortex (staff) at 5:55 PM on August 17, 2009


    Sorry for confusing everyone with my terminology. It is not technical. I am just lazy and that was what popped into my head first. An actual graph would probably help. That, or better terms.

    If I may, how about the following analogy to clarify it:

    Imagine Metafilter is a country. Every Metafilter user is a city. Every contact is a one-way road leading from the contacter to the contactee.

    There are a big bunch of cities with roads leading in and out, linked together in a big road network. You can go from any one of these to any of the others. This is the "main network" that we are talking about with thousands of cities (users) in it.

    There are also cities with a one-way road coming in from this network, but no road going back out to the network. These, too, are considered "part of" the network because you can get to them from other cities in the network.

    And finally, there are cities with a one-way road going to the network, but no road going back to that city from the network. These are the ones which are "connected to" the network (you can get to the network from them) but aren't part of the network themselves (since you can't get back to them from the network, so they might as well not exist from the point of view of others in that network).

    Is that about right?

    The weirdness caused by this was one of the motivations for studying the contact links in both directions... that way you are either fully in or fully out of the main network of links with neither of those latter two weird cases.
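
The road analogy maps onto reachability in a directed graph. Here is a sketch that splits users into the three groups relative to one starting user who is known to be in the main network. The names `classify`, `core`, `inbound_only` and `outbound_only` are made up for this example, and it may not match how the posted lists were actually generated.

```python
from collections import defaultdict, deque

def reachable(start, edges):
    """All nodes reachable from start by following edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def classify(contacts, seed):
    """contacts: (contacter, contactee) pairs; seed: a user in the main network."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for a, b in contacts:
        out_edges[a].add(b)
        in_edges[b].add(a)
    downstream = reachable(seed, out_edges)  # cities you can drive to from seed
    upstream = reachable(seed, in_edges)     # cities that can drive to seed
    core = downstream & upstream             # roads both in and out
    inbound_only = downstream - core         # "part of" but no road back
    outbound_only = upstream - core          # "connected to" but not reachable
    return core, inbound_only, outbound_only

print(classify([("a", "b"), ("b", "a"), ("b", "c"), ("d", "a")], "a"))
```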
    posted by FishBike at 6:07 PM on August 17, 2009


    I hear you. There is still stuff I'd like to do too, but, yeah, work.

    Work, sleep, and essential chores are sort of like the refractory period for this datawankery business.

    Did I just say that out loud?
    posted by FishBike at 6:09 PM on August 17, 2009


    Thanks, all. Yeah, there's a definite reference feel to the hidden gems list. Those are the threads that scored a 99% light factor. When I lower the light threshold to mid 80% level then fun threads like Lasagna Cat begin to pop up (and totally sidetrack me from stats).
    posted by hoppytoad at 6:12 PM on August 17, 2009 [1 favorite]


    It's possible that we're confusing cause and effect here. You could just as easily reverse the question and say "Are you more likely to add someone as a contact if you have favorited lots of their comments?".

    Given that we don't have dates for when contacts are added, it's impossible to get a good read on that.

    I'd like to see that favorites analysis broken down by subsite, as well. I feel like MetaTalk skews things significantly: It's comprised of the highly active (and interconnected) users, and contains whole threads full of favorited one-liners. I feel like the green is likely to be a bit more democratic, where favorites are given to useful information, regardless of source, and the blue will be somewhere in the middle.
    posted by chrisamiller at 6:43 PM on August 17, 2009


    It's possible that we're confusing cause and effect here. You could just as easily reverse the question and say "Are you more likely to add someone as a contact if you have favorited lots of their comments?"

    The contact activity sidebar seems like a probable mechanism of action for the correlation we are seeing. I'm sure there are people who've added contacts when they've noticed a user writing a lot of interesting stuff, to make sure everything they write shows up on the sidebar, so as not to miss anything.

    Heck, just check out asavage's profile for a likely contactee for a lot of those sort of contact records (196 incoming contacts, 0 outgoing). And I was one of them at one point for exactly that reason!

    And once their activity shows up on the sidebar, you are much more likely to see and favorite things they write. So it would be really interesting to know which of these things (linking because you favorite them a lot vs. favoriting a lot because you've linked to them) is the main one. Or maybe the correlation is so impressive because these two effects multiply together?
    posted by FishBike at 6:55 PM on August 17, 2009


    I don't understand this. I count 5453 contacters, 8059 contactees, and only 8928 unique users connected to the contact system in total.

    This morning, while replying to this, I almost said: theoretically with the network data I've gathered, if you add up the sum of every "Total users who are connected to this network: X" number, it should add up to 5453.

    So let's try it:
    $ cat _Total_Users_Who_Are_Connected.txt | sed -r 's/network(.*): //' | awk '{ sum += $1; } END { print sum; }'
    5432
    
    Hmm. We're 21 short of 5453. Must be a bug... unless--. Since I excluded networks with only 1 user in them, the only explanation would be users who have an outward contact to themselves and only themselves. They would show as an outward contact but the network would be excluded from my output because they are the only member of that network.

    First, can you even create an outward contact to yourself? Well, I tried it this morning and yes you can. (Note: You may get MeMails from people calling it freaky and asking how you did it.)

    Second, how many people have only one outward contact with that outward contact being to themselves?
    Self Contacters Whose Only Outward Contact Is Themselves
    id	name
    268	stefnet
    534	julen
    671	stopgap
    14270	blm
    14274	mss
    14734	triggerfinger
    17098	meehawl
    17642	unrepentanthippie
    17962	CMcKinnon
    18170	rooftop secrets
    19223	bigbigdog
    25130	zueod
    27288	gman
    27605	nowonmai
    39130	outlier
    46341	Partial Law
    49417	almostmanda
    52373	Cool Papa Bell
    61772	be11e
    75438	Xany
    91354	preview
    
    I count 21.
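
The same check can be sketched in a few lines of Python, assuming the contact data has been read into (contacter id, contactee id) pairs. The ids in the example are invented, not the ones listed above.

```python
from collections import defaultdict

def self_only_contacters(pairs):
    """Users whose one and only outward contact is themselves."""
    outward = defaultdict(set)
    for contacter, contactee in pairs:
        outward[contacter].add(contactee)
    return sorted(u for u, targets in outward.items() if targets == {u})

# User 1 contacts only themselves; user 2 contacts themselves and someone else
print(self_only_contacters([(1, 1), (2, 2), (2, 3), (4, 5)]))
```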

    For fun: the whole list of self contacters.
    posted by mathlete at 7:20 PM on August 17, 2009 [1 favorite]


    narcissistic pricks.
    posted by gman at 7:28 PM on August 17, 2009 [1 favorite]


    Second, how many people have only one outward contact with that outward contact being to themselves?

    Ha, nicely done. I recall a couple of metatalk threads about (or at least derailed to be about) self-contacting; I'd bet most if not all of these 21 are folks who saw one of those threads and tried it out despite their otherwise having no apparent interest in the contact system.

    Here's one of those threads.
    posted by cortex (staff) at 7:32 PM on August 17, 2009 [1 favorite]


    And in fact the self-contacters fall off pretty seriously after the 62k mark of users who would have been active in early November, 2007, when that thread went up.
    posted by cortex (staff) at 7:34 PM on August 17, 2009


    Oh, there's another class of user I'd be interested in seeing. I'll call them Special Guests.
    These would be people who have made less than, say, 5 comments, but have received more than say 10,20? favorites total.

    I'm betting these are people who signed up just to make a comment in a thread to which they could add a lot of value. Often because they were the subject of the thread.
    posted by vacapinta at 2:58 AM on August 18, 2009


    Updated statistics for favoriting of contacts vs. general favoriting... this was pretty close to "computationally difficult" territory as the query took about 4 hours to run. This is based on the "favoriting window" method I described earlier (only contributions made during each user's favoriting window count as a favoriting opportunity):
    Favoriting opportunities: 20,762,561,524
    Favorites given: 2,410,574
    % of favoriting opportunities favorited: 0.011610%

    Favoriting opportunities for contacts contributions: 80,654,556
    Favorites given to contacts contributions: 122,111
    % of favoriting opportunities for contacts contributions favorited: 0.151400%

    MeFites are, overall, 13.04 times more likely to favorite contributions by their contacts vs. contributions in general based on this windowing method. Still wow!
    posted by FishBike at 3:32 AM on August 18, 2009
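
As a quick check on the arithmetic in the comment above, the two rates and their ratio can be recomputed from the four posted totals. This is only a sketch of the final division step; the windowing query that produced the totals is not reproduced, and the `rate` helper is a name made up here.

```python
def rate(favorites, opportunities):
    """Fraction of favoriting opportunities that ended in a favorite."""
    return favorites / opportunities

# Totals as posted in the comment above.
overall = rate(2_410_574, 20_762_561_524)
contacts = rate(122_111, 80_654_556)

print(f"overall rate:  {overall:.6%}")
print(f"contacts rate: {contacts:.6%}")
print(f"ratio:         {contacts / overall:.2f}")
```

The ratio works out to about 13.04, in line with the figure quoted.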


    FishBike, neat.

    I was asking the inverse question: given that U favorites F% of contributions by V, what's the probability U has added V as a contact? To do this you would want to compute everyone's F, and then see whether its histogram is different for contacts vs. non-contacts.
    posted by fantabulous timewaster at 4:29 AM on August 18, 2009
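
One way the histogram comparison suggested above might be set up, as a rough sketch: compute F for each (U, V) pair, then bin the values separately for contact and non-contact pairs. The three input layouts and the function name are assumptions for illustration; the sketch also ignores the favoriting-window refinement, and pairs absent from `fav_counts` are simply not counted.

```python
def favorite_rate_histograms(fav_counts, contrib_counts, contacts, bins=10):
    """Bin F = favorites / contributions per (u, v) pair by contact status.

    fav_counts:     {(u, v): favorites u gave to v's contributions}
    contrib_counts: {v: number of contributions by v}
    contacts:       set of (u, v) pairs where u lists v as a contact
    All three layouts are assumptions made for this sketch.
    """
    hist = {"contact": [0] * bins, "non_contact": [0] * bins}
    for (u, v), favs in fav_counts.items():
        total = contrib_counts.get(v, 0)
        if total == 0:
            continue  # no contributions, so F is undefined
        # Integer arithmetic avoids float edge cases; F == 1 goes in the last bin.
        index = min((favs * bins) // total, bins - 1)
        key = "contact" if (u, v) in contacts else "non_contact"
        hist[key][index] += 1
    return hist

# Made-up sample data.
favs = {("a", "b"): 5, ("a", "c"): 1, ("d", "b"): 0}
contribs = {"b": 10, "c": 10}
print(favorite_rate_histograms(favs, contribs, {("a", "b")}, bins=4))
```

Normalizing each histogram by its total would make the two shapes directly comparable, since non-contact pairs vastly outnumber contact pairs.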


    You can't self-contact anymore, apparently. :(
    posted by Optimus Chyme at 6:11 AM on August 18, 2009




    Yeah, that's what I did; it still doesn't show up on either the contact or contacted by pages.
    posted by Optimus Chyme at 6:33 AM on August 18, 2009


    I also listed myself as a contact a few hours ago. It showed up in my sidebar, as did Optimus' self-contacting, but it doesn't show up in my profile. So: Does it work the same way as any other contact, even though it's not listed in my profile?
    posted by Dumsnill at 6:46 AM on August 18, 2009


    I don't remember exactly what the resolution there was; pb may have altered the profile page queries to exclude self in order to keep the numbers intuitive, without actually blocking the self-contacting itself. Which may give you the sidebar activity payoff, at least.
    posted by cortex (staff) at 7:01 AM on August 18, 2009


    Yeah, it doesn't show up in your contacts, just the sidebar items you've chosen to display about yourself.
    posted by gman at 7:06 AM on August 18, 2009


    FishBike has made 56 comments in this thread and cortex has made 50. What are the (say) 25 or 50 highest number of comments made by one user in a single thread?
    posted by Dumsnill at 11:31 AM on August 18, 2009


    I bet there are folks with hundreds of comments in longboat threads.
    posted by box at 11:32 AM on August 18, 2009


    I'd put money on the Ted Haggard thread being one of the high-water marks there, though it seems regardless scaling to thread length would be worth incorporating as an alternate view for any such analysis.
    posted by cortex (staff) at 11:34 AM on August 18, 2009


    What are the (say) 25 or 50 highest number of comments made by one user in a single thread?

    Here are the top 25, with post titles and (hopefully) incredible linking action to the actual posts:
    1. 502 comments by gramschmidt in MetaTalk thread 9622 ("Spoonfed used to host a metatalk parody that's gone now")
    2. 422 comments by cortex in MetaTalk thread 15931 ("Sans quoi?")
    3. 387 comments by peeping_Thomist in MetaFilter thread 51500 ("Political Science & Promiscuity")
    4. 361 comments by mrzarquon in MetaTalk thread 15931 ("Sans quoi?")
    5. 333 comments by dersins in MetaTalk thread 15931 ("Sans quoi?")
    6. 295 comments by Kafkaesque in MetaFilter thread 9622 ("-no title-")
    7. 282 comments by Neale in MetaFilter thread 1142 ("-no title-")
    8. 268 comments by ericb in MetaFilter thread 74487 ("Sarah Palin as McCain's running-mate")
    9. 264 comments by LobsterMitten in MetaTalk thread 15931 ("Sans quoi?")
    10. 237 comments by wendell in MetaTalk thread 9622 ("Spoonfed used to host a metatalk parody that's gone now")
    11. 214 comments by daveadams in MetaFilter thread 1142 ("-no title-")
    12. 199 comments by twoleftfeet in MetaTalk thread 17517 ("Bigger Front Page?")
    13. 197 comments by Cranberry in MetaTalk thread 9622 ("Spoonfed used to host a metatalk parody that's gone now")
    14. 193 comments by fourcheesemac in MetaFilter thread 75125 ("Because there are more important things to do on Friday night")
    15. 193 comments by peeping_Thomist in MetaFilter thread 56002 ("Ted Haggard | New Life Church")
    16. 191 comments by kenko in MetaTalk thread 9622 ("Spoonfed used to host a metatalk parody that's gone now")
    17. 188 comments by sgt.serenity in MetaFilter thread 22802 ("Weblogs & the Disruptive Web ")
    18. 186 comments by fourcheesemac in MetaFilter thread 74881 ("Palin, pancakes, and the straight talk express")
    19. 184 comments by ericb in MetaFilter thread 55140 ("Washington D.C. Pages and Interns")
    20. 184 comments by languagehat in MetaTalk thread 15931 ("Sans quoi?")
    21. 178 comments by Burhanistan in MetaTalk thread 17517 ("Bigger Front Page?")
    22. 178 comments by ericb in MetaFilter thread 39609 ("It's not about the sex.")
    23. 175 comments by bradlands in MetaFilter thread 27495 ("Fun with Puns")
    24. 170 comments by not_on_display in MetaTalk thread 16706 ("No comment from me....")
    25. 164 comments by wendell in MetaTalk thread 15931 ("Sans quoi?")
    I think I'll also try cortex's suggestion of calculating this by percentage of comments in the thread rather than absolute count, with some minimum thread size limits to avoid the list being dominated by single-comment threads.
    posted by FishBike at 3:12 PM on August 18, 2009 [1 favorite]
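
A ranking like the one above could be produced along these lines. This is a sketch, not the query actually used: it assumes the comment data has been loaded as `(site, thread_id, user)` tuples, which is a made-up layout. The site is part of the key because thread ids repeat across subsites (9622 shows up above in both MetaTalk and MetaFilter).

```python
from collections import Counter

def top_commenters(comments, n=25):
    """Rank (site, thread_id, user) triples by number of comments.

    `comments` is an iterable of (site, thread_id, user) tuples, one per
    comment; this layout is an assumption for the sketch.
    """
    counts = Counter(comments)
    return counts.most_common(n)

# Made-up sample data.
sample = (
    [("meta", 9622, "alice")] * 3
    + [("mefi", 9622, "bob")] * 2
    + [("mefi", 9622, "alice")]
)
for (site, thread_id, user), count in top_commenters(sample, n=2):
    print(f"{count} comments by {user} in {site} thread {thread_id}")
```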


    And here is the same list by percentage of comments in the thread, limited to threads with at least 50 comments:
    1. 96% of 196 comments by sgt.serenity in MetaFilter thread 22802 ("Weblogs & the Disruptive Web ")
    2. 92% of 109 comments by sgt.serenity in MetaFilter thread 22857 ("Bush Wants $60B for 2004 Fed IT Budget ")
    3. 62% of 281 comments by bradlands in MetaFilter thread 27495 ("Fun with Puns")
    4. 60% of 67 comments by UbuRoivas in MetaTalk thread 17598 ("Favs Broken?")
    5. 53% of 80 comments by evariste in Ask MetaFilter thread 38276 ("Knocked Out Before The Bell Rang")
    6. 51% of 112 comments by amberglow in MetaFilter thread 57026 ("Jersey caught the gay")
    7. 46% of 56 comments by amberglow in MetaFilter thread 58903 ("persecution complex? prosecution complex?")
    8. 46% of 120 comments by homunculus in MetaFilter thread 70026 ("Trouble on the Roof....... of the World")
    9. 45% of 66 comments by amberglow in MetaFilter thread 60178 ("Bikes Against Bush")
    10. 43% of 65 comments by amberglow in MetaFilter thread 52738 ("Win is irreversible, says ruling party's candidate")
    11. 43% of 91 comments by amberglow in MetaFilter thread 60372 ("Fairtrade towns")
    12. 43% of 134 comments by demo in MetaTalk thread 3318 ("My name is demo, for we are many")
    13. 42% of 59 comments by spock in MetaFilter thread 39676 ("Fears growing that an H5 pandemic is likely")
    14. 42% of 69 comments by Satapher in MetaFilter thread 30224 ("What's American About American Poetry?")
    15. 42% of 118 comments by ericb in MetaFilter thread 40078 ("Gannongate update")
    16. 40% of 52 comments by amberglow in MetaFilter thread 62843 ("Low Cost Media and Distribution || High Impact Message Delivery")
    17. 40% of 223 comments by amberglow in MetaFilter thread 41768 ("UK Election 2005")
    18. 40% of 70 comments by amberglow in MetaTalk thread 15190 ("Rome Meetup")
    19. 40% of 80 comments by baltimoretim in Ask MetaFilter thread 103231 ("How is my house making me, my girlfriend, and my dogs sick? ")
    20. 39% of 112 comments by bornjewish in MetaFilter thread 61239 ("I mean..gosh; the're just like us!")
    21. 39% of 87 comments by dgaicun in MetaFilter thread 46778 ("Chuck Norris does not sleep. He waits.")
    22. 39% of 54 comments by amberglow in MetaFilter thread 59822 (""i honestly don't have a recollection..."")
    23. 39% of 75 comments by amberglow in MetaFilter thread 62015 ("Nuevo Havana")
    24. 39% of 476 comments by ericb in MetaFilter thread 55140 ("Washington D.C. Pages and Interns")
    25. 39% of 57 comments by chunking express in MetaFilter thread 81787 (""I started the movement with the firm resolve that I will never be caught alive by the enemy. That has spread down the ranks."")
    The top two entries on this list definitely tripped my "this cannot possibly be right" alarm. But they do appear to be accurate statistics.
    posted by FishBike at 3:27 PM on August 18, 2009
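
The percentage version with a minimum thread size can be sketched the same way. Again the `(site, thread_id, user)` tuple layout and the function name are assumptions. As a sanity check against the two lists, 188 comments in a 196-comment thread is 188 / 196, about 96%.

```python
from collections import Counter

def top_thread_shares(comments, min_thread_size=50, n=25):
    """Rank users by their share of comments within a single thread.

    `comments` is an iterable of (site, thread_id, user) tuples (an
    assumed layout). Threads under `min_thread_size` comments are skipped
    so that tiny threads do not dominate the list.
    """
    per_user = Counter()
    per_thread = Counter()
    for site, thread_id, user in comments:
        per_user[(site, thread_id, user)] += 1
        per_thread[(site, thread_id)] += 1

    rows = [
        (count / per_thread[(site, thread_id)], count, site, thread_id, user)
        for (site, thread_id, user), count in per_user.items()
        if per_thread[(site, thread_id)] >= min_thread_size
    ]
    # Highest share first; break ties by raw comment count.
    rows.sort(key=lambda row: (-row[0], -row[1]))
    return rows[:n]

# Made-up sample: one 4-comment thread and one 1-comment thread.
sample = [("mefi", 1, "a")] * 3 + [("mefi", 1, "b")] + [("meta", 1, "c")]
for share, count, site, thread_id, user in top_thread_shares(sample, min_thread_size=2):
    print(f"{share:.0%} ({count} comments) by {user} in {site} thread {thread_id}")
```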


    Man, I called it on the Haggard thread but had no idea peeping_thomist had done nearly twice that streak on the PoliSci one.

    It's interesting to see how dominated this is by longboat threads, actually. 15931, both 9622s, 1142, 16706 (a, or perhaps the?, alphabet thread) are all unabashed longboats; 17517 started out as a legit discussion but fell into alphabetism after a while as well. 27495 is a great big punathon, though bradlands gets special credit for doing more than his (and more than a majority) share of the work there.

    Of those 25, the only ones that aren't the result of cooperative silliness are:

    #3 - in which peeping_thomist argues at great length with several people
    #8 - in which ericb does a great deal of linking and pull-quoting
    #14 - in which fourcheesemac links and quotes but also just plain comments a lot
    #15 - in which peeping_thomist does the same as in #3
    #17 - in which sgt. serenity goes on a bizarre (and thematic?) solo spam fest
    #18 - fourcheesemac in the same thread as #8
    #19 - in which ericb pulls a #8 in an older thread
    #22 - ericb again, in an older thread yet

    peeping_thomist's showings both involve contentious discussion of sex & politics/ideology; ericb's appearances all involve hyperattentive following of developing political news stories; fourcheesemac likewise, but with more personal commentary and less citation.
    posted by cortex (staff) at 3:37 PM on August 18, 2009


    vacapinta: Oh, there's another class of user I'd be interested in seeing. I'll call them Special Guests.

    These would be people who have made less than, say, 5 comments, but have received more than say 10,20? favorites total.

    I'm betting these are people who signed up just to make a comment in a thread to which they could add a lot of value. Often because they were the subject of the thread.


    There are a surprising number of such users, like 80-100 depending on exactly how I interpret the criteria. So how about instead of that, the following top-20 list of most favorites received, for users with 5 or fewer posts.

    I must say your bet appears to be accurate for the first two, even when the subject of the post is a hexadecimal string!
    336 favorites: stevewoz [1 contribution(s)]
    306 favorites: 09-F9-11-02-9D-74-E3-5B-D8-41-56-C5-63-56-88-C0 [3 contribution(s)]
    295 favorites: oalocke [4 contribution(s)]
    127 favorites: goliche [3 contribution(s)]
    99 favorites: torietorie [3 contribution(s)]
    95 favorites: Hitler [4 contribution(s)]
    92 favorites: blocked [1 contribution(s)]
    85 favorites: Percy the Scarab-Bedazzled Skull [1 contribution(s)]
    83 favorites: 10jubilee [4 contribution(s)]
    81 favorites: amanlikeme [4 contribution(s)]
    79 favorites: SeanOfTheHillPeople [5 contribution(s)]
    69 favorites: richar4 [5 contribution(s)]
    67 favorites: amandapalmer [1 contribution(s)]
    53 favorites: jack [1 contribution(s)]
    52 favorites: corhermitex [3 contribution(s)]
    47 favorites: eotvos [5 contribution(s)]
    47 favorites: fred [3 contribution(s)]
    46 favorites: runflats [4 contribution(s)]
    44 favorites: cellocat [4 contribution(s)]
    43 favorites: Barack Obama [4 contribution(s)]
    posted by FishBike at 3:43 PM on August 18, 2009
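
The "Special Guests" filter comes down to a threshold on contribution count plus a sort on favorites received. A minimal sketch, assuming per-user totals are already available as `(name, contributions, favorites)` tuples (a layout invented here, as is the sample data):

```python
def special_guests(users, max_contributions=5, n=20):
    """Users with few contributions, ranked by favorites received.

    `users` is an iterable of (name, contributions, favorites_received)
    tuples; this layout is an assumption for the sketch.
    """
    eligible = [u for u in users if 1 <= u[1] <= max_contributions]
    eligible.sort(key=lambda u: -u[2])
    return eligible[:n]

# Made-up sample data.
sample = [
    ("one_hit_wonder", 1, 300),
    ("regular", 4000, 9000),
    ("quiet_lurker", 2, 0),
    ("drive_by", 5, 40),
]
for name, contributions, favorites in special_guests(sample):
    print(f"{favorites} favorites: {name} [{contributions} contribution(s)]")
```

Adding a minimum-favorites cutoff would recover the original "more than 10 or 20 favorites" framing.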


    Previous comment discussing the first list specifically, to be clear. Interesting (but not, I guess, all that surprising) to see how much amberglow represents the second list.

    The top two entries on this list definitely tripped my "this cannot possibly be right" alarm. But they do appear to be accurate statistics.

    These days we'd probably catch something like that fairly promptly and give a timeout before it got even halfway done, and clean it up afterwards too, but in 2003 it was just Matt, and there was no Recent Activity as far as I can recall to tip folks off.

    Here's a metatalk thread about sgt.'s posting spree.
    posted by cortex (staff) at 3:44 PM on August 18, 2009


    cortex, do you think you could change your name to context for the purposes of this discussion? It's really great to have someone who remembers all this stuff to provide some meaning to go with these numbers.
    posted by FishBike at 3:49 PM on August 18, 2009


    It may be better to call that particular result the Special Guest/Apt Sockpuppet/Confused Person/Referenceability Axis. Though that's a pretty wordy name.

    Breaking it down:

    Special guests:
    - stevewoz (because he's Woz)
    - amandapalmer (because she's Amanda Palmer)
    - eotvos (because he's our 10th meetup Antarctic rep)

    Apt Sockpuppets:
    - hex string (take THAT, conflict of community desires and corporate considerations on some other community website!)
    - Hitler (it's, you know, Hitler)
    - Percy is upset that his owners don't like him
    - jack has been doing cock all (exceptional case, this was an OLD account, possibly hijacked by jack_mo by some reeaaaaallly easy password-guessing type thing and later closed by Matt)
    - corhermitex welcomes me aboard the Mefi, LLC Cube
    - Barack Obama (it's, you know, Barack Obama)

    Confused person:
    - goliche wants you to hope them
    - 10jubilee was actually a fairly disturbing situation
    - fred just wanted to buy a damn used car

    Referenceability:
    - oalocke asks about bombproof goods
    - blocked wants to know how to focus on writing
    - amanlikeme is looking for R&B
    - richar4 wants to overcome fear of disapproval
    - runflats wants to build a social life
    - cellocat wants to sustain a marital relationship

    SeanOfTheHillPeople is the odd man out:
    1. He's got big favorites on two different contributions,
    2. both of which are metafilter comments.

    One talking about Rich Guy Disappearance stuff just yesterday, the other a riffy Five Guys burger 'interview' back in June.

    torietorie is also an oddity, and actually looks like a faves-on-removed-content bug in the dump. I need to look into that, seems like I'm including favorites on content that's been deleted, which may or may not be useful data if controlled for but is very much a source of noise if it's included without some sort of clear markup at the very least.

    So a couple things, then, about this list:

    All of the stuff in the Referenceability section is askme post content and is explicitly about the question asked (and presumably the answers given or expected to be given, depending on the time of the favorite) rather than about the user.

    The Confused Person list is two old, waaaay pre-favorites situations that got retroactive faves, and one faves-era askme that was deeply contentious at the time for a few different reasons. That latter almost fits in with Referenceability in superficial features, but there's no reference value in that utilitarian sense. It's drama instead; OMG factor.

    The socks and the special guests seem pretty obvious in their own right.

    It'd probably be easy to sort out the Referenceability stuff based just on the form: askme posts, rather than any-subsite comments. But automatically separating out the other sorts seems harder. The backdatedness of some of the Confused Person stuff could be taken advantage of heuristically (and harnessing the date-of-faves-launch as an analytical tool is probably a good idea in general), but how do you tell a special guest from a sock without introducing human semantic analysis into the equation?
    posted by cortex (staff) at 4:14 PM on August 18, 2009 [1 favorite]


    cortex, do you think you could change your name to context for the purposes of this discussion? It's really great to have someone who remembers all this stuff to provide some meaning to go with these numbers.

    Heh. Half of it I remember (and half of that only vaguely); the other half I look up, and pin down in Metatalk by searching the archives by date.
    posted by cortex (staff) at 4:19 PM on August 18, 2009


    Most favorites and most comments on delated threads/comments would be interesting, too.
    posted by empath at 4:23 PM on August 18, 2009


    (deleted, even)
    posted by empath at 4:24 PM on August 18, 2009


    A delated thread is a thread where people are thrilled to see it finally go away.
    posted by cortex (staff) at 4:30 PM on August 18, 2009 [1 favorite]


    What I find amusing in both the sgt.serenity threads is, after many utterly random comments, he posted:
    sorry!
    As if to say, "whew! I feel much better! I think I got it out of my system! I do not think I could do any more." Then he catches his breath & gets right back to spewing.

    He's a trooper.
    posted by Pronoiac at 4:50 PM on August 18, 2009


    Most favorites and most comments on delated threads/comments would be interesting, too.

    Well, the favorite counts on deleted threads all seem to be zero. But here are the 25 most-commented deleted threads (with links to the threads, thread title, and deletion reason).
    1. 502 comments on MetaFilter post 77075, "Editing Window Test" (deletion reason:"Test in progress. -- pb")
    2. 321 comments on MetaFilter post 82789, "The King of Pop Joins the King of Rock-and-Roll" (deletion reason:"We have a keeper. -- cortex")
    3. 319 comments on MetaFilter post 45141, "George Bush needs to go pee." (deletion reason:"jesus jumped up christ. ")
    4. 291 comments on MetaFilter post 44915, "Burning man" (deletion reason:"not really a good post (an excuse to talk is more like it)")
    5. 261 comments on MetaFilter post 65232, "Leave The Grocery Shopping to the Women" (deletion reason:"What in the hell? -- cortex")
    6. 251 comments on MetaFilter post 54682, "September 11, 2006" (deletion reason:"somehow I missed this thread this morning. ugh.")
    7. 234 comments on MetaFilter post 54968, "MySpace & RSS Obituary Wall Remembers the Dead" (deletion reason:"thanks for the ok on deletion.")
    8. 213 comments on MetaFilter post 21405, "-no title-" (deletion reason:"[NULL]")
    9. 202 comments on Ask MetaFilter post 47255, "Help someone find a job without pressuring them?" (deletion reason:"removed at poster's request")
    10. 198 comments on MetaFilter post 71201, "Exclusive Free Album Project > Audiobulb" (deletion reason:"oh for christ's sake. -- jessamyn")
    11. 179 comments on MetaFilter post 62093, "BUCKETS!" (deletion reason:"a post about an image? that's the best of the web? -- mathowie")
    12. 176 comments on MetaFilter post 46545, "Starbucks vs. Red Bull" (deletion reason:"Self-link. Jesus, you're a douche. -- cortex")
    13. 170 comments on MetaFilter post 34705, "nigger" (deletion reason:"some dipshit at 64.110.74.244 posted this")
    14. 166 comments on Ask MetaFilter post 97504, "What songs make you wonder whether humanity should even bother?" (deletion reason:"I'm very sorry, but I was busy earlier and this is a terribly chatfilter question for AskMe. -- jessamyn")
    15. 165 comments on MetaFilter post 51633, "This is the new" (deletion reason:"self-link, banned")
    16. 162 comments on MetaFilter post 40482, "The fate of the ANWR" (deletion reason:"double post")
    17. 159 comments on MetaFilter post 46240, "Look at me riding a luck dragon!" (deletion reason:"self-link")
    18. 155 comments on MetaFilter post 47630, "For the season...." (deletion reason:"your point is....?")
    19. 152 comments on MetaFilter post 52397, "A Declaration of War" (deletion reason:"self-link, banned. please read the faq for more details about the commmunity guidelines.")
    20. 151 comments on MetaFilter post 74475, "Wither Obama?" (deletion reason:"What the barack? -- cortex")
    21. 150 comments on MetaFilter post 82890, "It's a story of two uncommon and very different friendships--hers for me and mine for her." (deletion reason:"this is pretty creepy-stalkery, while a great trainwreck but not a good idea for here. -- jessamyn")
    22. 146 comments on MetaFilter post 78834, "Gun Safety" (deletion reason:"Tragic story, but using the photos of them to make a joke was kind of crass. Thread went downhill from there. -- mathowie")
    23. 145 comments on MetaFilter post 54575, "maintaining a health 24/7 computer" (deletion reason:"you are confused")
    24. 127 comments on Ask MetaFilter post 85255, "Craigslist flagging" (deletion reason:"please don't do this here -- jessamyn")
    25. 125 comments on Ask MetaFilter post 116922, "Anal sex, not so great actually..." (deletion reason:"poster's request -- jessamyn")

    posted by FishBike at 5:04 PM on August 18, 2009 [2 favorites]


    Oh, and deleted comments are not in the Infodump, so no reporting on those. (Right?)
    posted by FishBike at 5:05 PM on August 18, 2009


    Yeah, they're not in there.
    posted by cortex (staff) at 5:33 PM on August 18, 2009


    Ooh, I've got one. Which member has posted in the largest proportion of deleted posts posted since their join date?
    posted by box at 5:54 PM on August 18, 2009


    I bet that would correlate positively to someone with a high "posts early in threads" index, for obvious reasons.
    posted by cortex (staff) at 6:03 PM on August 18, 2009


    And with a high comment-count-since-join-date, I hypothesize.
    posted by box at 6:14 PM on August 18, 2009


    *scribble, scribble*

    When I have time later this week or this weekend, I am going to try and implement requests if they haven't been fulfilled already.
    posted by mathlete at 6:59 PM on August 18, 2009


    Man, I've got like three solid notebook pages of one-liner stats research ideas I wrote down in a dentist office waiting room, back around when I first found out I'd have access to the db. I need to track that down.
    posted by cortex (staff) at 7:00 PM on August 18, 2009


    Who's posted the most 'Metafilter:' gags?
    posted by box at 7:10 PM on August 18, 2009


    box: "Who's posted the most 'Metafilter:' gags?"

    I don't think it's possible to tell this from the current Infodump; you'd need a full database snapshot that included all the comment text.

    I was going to suggest trying to search for them using the built-in search, but it looks like someone already thought of that:
    You searched for "Metafilter:". Looks like you're looking for Metafilter tagline jokes. You might take a look at Metafilter admin cortex's list of taglines here.

    As you can imagine, the word Metafilter shows up across hundreds of thousands of posts and comments on the site. If you really want to search for the word Metafilter, try your search at Yahoo! or Google.
    Apparently they get searched for a lot...
    posted by Kadin2048 at 9:04 PM on August 18, 2009 [1 favorite]


    Woah.
    posted by carsonb at 10:14 PM on August 18, 2009


    I should really rerun that. It's been a while. I could probably incorporate userid while I'm at it.

    I forgot that we incorporated that as an exceptional search result. pb is awesome.
    posted by cortex (staff) at 10:57 PM on August 18, 2009


    A page on the unofficial MetaFilter Wiki listing specific statistic ideas, and links to the result if somebody takes them on, might be a nice thing to have. I know there's already a page about meta-analysis, but if somebody wanted to go into more detail about specific things that had been requested and/or posted, I think that would be a good thing. I was planning to go back through this thread at some point and write down all the neat ideas that I got distracted from by other neat ideas.
    posted by FishBike at 5:30 AM on August 19, 2009


    A page on the unofficial MetaFilter Wiki listing specific statistic ideas, and links to the result if somebody takes them on, might be a nice thing to have.

    What, you can crunch stats and write crazy database queries six ways to sunday but suddenly you can't edit a wiki?

    (Just kidding-- seriously. Love all the awesome stuff you've come up with in this and prior infodump threads. My only critique is that I am grossly underrepresented in your results. Plz change your metrics to reflect my awesomeness. tia.)
    posted by dersins at 12:40 PM on August 19, 2009


    Man, I've got like three solid notebook pages of one-liner stats research ideas I wrote down in a dentist office waiting room, back around when I first found out I'd have access to the db. I need to track that down.

    Oh, THAT'S where the records of my awesomeness languish. On a forgotten notebook in cortex's house, probably in some box you never got around to unpacking. Great.
    posted by desuetude at 1:36 PM on August 19, 2009


    Oh, THAT'S where the records of my awesomeness languish. On a forgotten notebook in cortex's house, probably in some box you never got around to unpacking. Great.

    You think you've got it bad, my actual awesomeness was also left in a forgotten notebook in some box that God never got around to unpacking.
    posted by It's Raining Florence Henderson at 1:40 PM on August 19, 2009


    FishBike: A page on the unofficial MetaFilter Wiki listing specific statistic ideas, and links to the result if somebody takes them on, might be a nice thing to have.

    I don't think brainstorming in the wiki is a good fit. There's far more of an audience in MetaTalk, & the topic could use an active, ongoing discussion, which the wiki doesn't excel at.
    posted by Pronoiac at 2:12 PM on August 19, 2009


    I don't think brainstorming in the wiki is a good fit. There's far more of an audience in MetaTalk, & the topic could use an active, ongoing discussion, which the wiki doesn't excel at.

    To clarify, then, what I imagined was a page with links to the comments themselves where various kinds of stats had been requested. If someone then posts said stats, a follow-up link to that comment also goes in the wiki.

    I wasn't thinking of a place to record new ideas and discuss them, more just a sort of shared notebook to accompany these discussions, to make it easy to find out what has already been done, and what has already been requested and not done yet.
    posted by FishBike at 3:21 PM on August 19, 2009


    What, you can crunch stats and write crazy database queries six ways to sunday but suddenly you can't edit a wiki?

    Heh, I can, but that doesn't mean I enjoy it. The fact that a wiki even exists suggests there are people who do enjoy creating and maintaining such things, however, and that is awesome.

    My only critique is that I am grossly underrepresented in your results. Plz change your metrics to reflect my awesomeness.

    Well, you could tell us what kind of statistics you think would feature you and then we could run them and see if you're right. Or, if you'd just like to see your name in lights[1] I could post a set of per-user stats for your account following the same format as I did in the MeFi User Matching thread.

    [1]: tiny, but extremely numerous lights.
    posted by FishBike at 3:27 PM on August 19, 2009


    Mostly I'm just curious as to who I favorite most / who favorites me the most. Also, would like some justification that I rank at #1 on the awesomeness scale.

    Which I do.

    FYI.
    posted by dersins at 3:41 PM on August 19, 2009


    Ooh, I've got one. Which member has posted in the largest proportion of deleted posts posted since their join date?

    Well, I thought I'd start with just an outright count of how many deleted posts people have commented on. It's a top 29 list instead of the usual 20 or 25, just for you dersins.
    1002: delmoi
    617: caddis
    587: DU
    563: mr_crash_davis
    553: Blazecock Pileon
    553: quonsar
    537: languagehat
    466: smackfu
    456: jonmc
    446: Astro Zombie
    425: matteo
    424: jessamyn
    418: ericb
    395: cortex
    387: loquacious
    386: pyramid termite
    384: klangklangston
    367: grouse
    349: stavrosthewonderchicken
    349: dhartung
    330: quin
    325: Burhanistan
    322: OmieWise
    322: ThePinkSuperhero
    318: blue_beetle
    316: mathowie
    314: flapjax at midnite
    313: mr_roboto
    312: dersins
    I'll try to do this as a percent of deleted posts since the user's join date next, which was the original request.
    posted by FishBike at 3:45 PM on August 19, 2009


    Yeah! #29! I fucking RULE! In your face, rest of metafilter!
    posted by dersins at 3:49 PM on August 19, 2009


    Ambrolicious!

    I ain't promiscuous!
    And if you was suspicious,
    All that shit is fictitious.
    I blow kisses.

    [+] [!] [+] [!]
    posted by Ambrosia Voyeur at 3:55 PM on August 19, 2009




    Aw, man, I messed it up and forgot to control for duplicate postid numbers between the 4 sites.... sorry folks. Here is what I think are the right numbers for 'most deleted posts that a user has commented in':
    678: delmoi
    462: DU
    387: quonsar
    381: mr_crash_davis
    367: Blazecock Pileon
    320: Astro Zombie
    279: ericb
    270: loquacious
    267: caddis
    265: pyramid termite
    250: jonmc
    250: matteo
    223: fenriq
    223: Burhanistan
    222: languagehat
    213: flapjax at midnite
    212: HuronBob
    212: dersins
    204: boo_radley
    200: mr_roboto
    (you're #18 now, dersins!)
    posted by FishBike at 4:05 PM on August 19, 2009
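
The correction above comes down to keying on (site, post id) rather than post id alone, since the same id can exist on more than one subsite. A small sketch of the difference, with a made-up data layout and function names:

```python
def deleted_threads_commented(comments, deleted, user):
    """Count distinct deleted threads that `user` commented in.

    comments: iterable of (site, post_id, user) tuples, one per comment
    deleted:  set of (site, post_id) pairs for deleted posts
    Both layouts are assumptions for this sketch.
    """
    return len({
        (site, post_id)
        for site, post_id, who in comments
        if who == user and (site, post_id) in deleted
    })

def deleted_threads_commented_buggy(comments, deleted, user):
    """Same count, but matching on post_id only (overcounts)."""
    deleted_ids = {post_id for _site, post_id in deleted}
    return len({
        (site, post_id)
        for site, post_id, who in comments
        if who == user and post_id in deleted_ids
    })

# Made-up sample: post 100 is deleted on one subsite only, but the user
# also commented in an undeleted post 100 on another subsite.
deleted = {("mefi", 100)}
comments = [("mefi", 100, "x"), ("askme", 100, "x")]
print(deleted_threads_commented(comments, deleted, "x"))
print(deleted_threads_commented_buggy(comments, deleted, "x"))
```

Matching on the bare id can only inflate the counts, which fits the first list's numbers being higher than the corrected ones.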


    Mostly I'm just curious as to who I favorite most / who favorites me the most.

    That was pretty much how MeFi User Matching got going, yeah. And I kept all the queries from that, so:

    Stats for:dersins
    Who does dersins favorite the most?
    (simple count of favorites)

    Anonymous [15]
    jonson [9]
    jessamyn [9]
    Pastabagel [8]
    mathowie [7]
    Miko [7]
    robocop is bleeding [6]
    klangklangston [6]
    cortex [5]
    Pater Aletheias [5]

    Who does dersins favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.41% (5 of 1234) of Pater Aletheias's comments+posts
    0.26% (9 of 3422) of jonson's comments+posts
    0.24% (15 of 6370) of Anonymous's comments+posts
    0.22% (6 of 2717) of robocop is bleeding's comments+posts
    0.19% (8 of 4301) of Pastabagel's comments+posts
    0.09% (7 of 7669) of Miko's comments+posts
    0.08% (7 of 8455) of mathowie's comments+posts
    0.07% (5 of 7012) of mediareport's comments+posts
    0.07% (9 of 13603) of jessamyn's comments+posts
    0.04% (6 of 13339) of klangklangston's comments+posts

    Who favorites dersins the most?
    (simple count of favorites)

    Tennyson D'San [132]
    tehloki [95]
    Pope Guilty [53]
    scrump [44]
    nasreddin [42]
    DevilsAdvocate [40]
    Caduceus [39]
    koeselitz [38]
    shmegegge [36]
    grouse [36]

    Who favorites dersins the most?
    (percent of your comments+posts since they joined)

    smoke: 2.68% (4 of 149) of dersins's comments+posts
    Tennyson D'San: 2.45% (132 of 5392) of dersins's comments+posts
    tehloki: 1.94% (95 of 4905) of dersins's comments+posts
    Joe Beese: 1.67% (25 of 1497) of dersins's comments+posts
    Pope Guilty: 1.09% (53 of 4881) of dersins's comments+posts
    Caduceus: 1.00% (39 of 3892) of dersins's comments+posts
    xingcat: 0.96% (1 of 104) of dersins's comments+posts
    Cats' Concert: 0.96% (1 of 104) of dersins's comments+posts
    SLC Mom: 0.96% (1 of 104) of dersins's comments+posts
    george_morgan: 0.96% (1 of 104) of dersins's comments+posts

    Who are dersins's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    dersins [9] ---- [12] jessamyn
    dersins [7] ---- [14] Miko
    dersins [6] ---- [36] klangklangston
    dersins [9] ---- [6] jonson
    dersins [7] ---- [4] mathowie
    dersins [4] ---- [7] Meatbomb
    dersins [4] ---- [17] LobsterMitten
    dersins [3] ---- [36] shmegegge
    dersins [3] ---- [13] Baby_Balrog
    dersins [3] ---- [3] cmonkey

    Who are dersins's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    dersins [0.23%] ---- [0.32%] MaryDellamorte
    dersins [0.31%] ---- [0.21%] TryTheTilapia
    dersins [0.18%] ---- [0.34%] snofoam
    dersins [0.18%] ---- [0.20%] sleepy pete
    dersins [0.20%] ---- [0.16%] h00py
    dersins [0.63%] ---- [0.16%] numinous
    dersins [0.16%] ---- [0.15%] mudpuppie
    dersins [0.19%] ---- [0.14%] Xere
    dersins [0.14%] ---- [0.15%] 0xFCAF
    dersins [0.13%] ---- [0.45%] flatluigi

    Of the threads where dersins has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 30.7% [915 of 2977]
    Joe Beese: 23.7% [177 of 747]
    jessamyn: 22.9% [683 of 2977]
    quin: 20.5% [611 of 2977]
    filthy light thief: 19.3% [160 of 828]
    Blazecock Pileon: 19.0% [525 of 2757]
    languagehat: 18.0% [537 of 2977]
    DU: 18.0% [423 of 2346]
    klangklangston: 17.6% [524 of 2977]
    Astro Zombie: 16.8% [463 of 2757]

    Of the threads where other users have been active, in whose has dersins also been the most active by percentage?
    (limited to threads active after dersins has joined MetaFilter)

    danOstuporStar: 33.3% [22 of 66]
    double block and bleed: 24.6% [35 of 142]
    Bango Skank: 23.6% [26 of 110]
    panboi: 23.6% [33 of 140]
    waraw: 23.2% [107 of 461]
    cosmonik: 22.4% [19 of 85]
    PugAchev: 22.2% [14 of 63]
    Horken Bazooka: 21.8% [12 of 55]
    little e: 21.6% [30 of 139]
    Hovercraft Eel: 21.6% [30 of 139]

    Who has favorited the same items as dersins the most?

    ifjuly [104]
    divabat [93]
    deborah [84]
    scrump [82]
    flibbertigibbet [75]
    agropyron [65]
    graventy [63]
    CunningLinguist [62]
    streetdreams [62]
    LobsterMitten [57]

    posted by FishBike at 4:26 PM on August 19, 2009
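For anyone wanting to reproduce the "mutual favorites" ranking, here is a minimal sketch of the rule stated in the report: rank by whichever of the two directions has the smaller count. The (faver, favee) pair layout, the function name, and the alphabetical tie-break are assumptions; the queries behind the report above are not shown in the thread.

```python
from collections import Counter

def top_mutual_favorites(user, favorites, n=10):
    """Rank mutual favoriters by the smaller of the two directional counts.

    favorites: list of (faver, favee) username pairs, one per favorite
    (hypothetical layout). Returns (given, received, other_user) tuples.
    """
    given = Counter(favee for faver, favee in favorites
                    if faver == user and favee != user)
    received = Counter(faver for faver, favee in favorites
                       if favee == user and faver != user)
    mutual = set(given) & set(received)  # must have favorited each other
    # Tie-break by name is an assumption; the report does not say.
    ranked = sorted(mutual, key=lambda u: (-min(given[u], received[u]), u))
    return [(given[u], received[u], u) for u in ranked[:n]]

# Made-up sample data.
sample = ([("a", "b")] * 3 + [("b", "a")] * 1 +
          [("a", "c")] * 2 + [("c", "a")] * 5 +
          [("a", "d")] * 4)  # d never favorites a back, so d is excluded
result = top_mutual_favorites("a", sample)
```

Ranking by the minimum means a lopsided pair such as [3] and [36] sorts with the 3s, which matches the ordering in the list above.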


    Fascinating. I apparently only favorite well-known mefites. I had no idea. (Although I'm sure this is significantly influenced by my arbitrary-- and completely childish-- decision to maintain a constant number of favorites; in other words, when I add one, I subtract one.)
    posted by dersins at 4:33 PM on August 19, 2009


    And here are the top 21 users who have commented in the greatest percentage of deleted posts that were posted after their sign-up date:
    11.65%: DU (462 of 3967 deleted posts)
    8.69%: delmoi (678 of 7801 deleted posts)
    8.28%: Joe Beese (83 of 1003 deleted posts)
    6.56%: Blazecock Pileon (367 of 5596 deleted posts)
    6.25%: Arandia (1 of 16 deleted posts)
    5.82%: Astro Zombie (320 of 5498 deleted posts)
    5.29%: Burhanistan (223 of 4213 deleted posts)
    5.26%: dunkadunc (83 of 1577 deleted posts)
    5.00%: s0urc3 (1 of 20 deleted posts)
    4.96%: quonsar (387 of 7801 deleted posts)
    4.88%: mr_crash_davis (381 of 7800 deleted posts)
    4.35%: flapjax at midnite (213 of 4893 deleted posts)
    4.27%: ericb (279 of 6541 deleted posts)
    4.25%: filthy light thief (53 of 1246 deleted posts)
    3.93%: loquacious (270 of 6870 deleted posts)
    3.79%: orme (32 of 845 deleted posts)
    3.71%: Mister_A (176 of 4741 deleted posts)
    3.61%: hippybear (21 of 581 deleted posts)
    3.57%: pyramid termite (265 of 7427 deleted posts)
    3.54%: caddis (267 of 7543 deleted posts)
    3.40%: dersins (212 of 6238 deleted posts)
    posted by FishBike at 4:33 PM on August 19, 2009
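The percentage version changes the denominator per user: only deleted posts made on or after that user's sign-up date count. A rough sketch, assuming a hypothetical mapping from (site, postid) to post date rather than the real Infodump columns:

```python
from datetime import date

def deleted_participation(user_joined, commented_keys, deleted_posts):
    """Share of eligible deleted posts that a user commented in.

    user_joined: the user's sign-up date.
    commented_keys: set of (site, postid) pairs the user commented in.
    deleted_posts: dict mapping (site, postid) -> post date (assumed layout).
    Returns (hits, eligible_count, percent).
    """
    eligible = {key for key, posted in deleted_posts.items()
                if posted >= user_joined}
    hits = len(eligible & commented_keys)
    pct = 100.0 * hits / len(eligible) if eligible else 0.0
    return hits, len(eligible), pct

# Made-up sample: one deleted post predates the user and is ignored.
sample_deleted = {
    ("mefi", 1): date(2004, 6, 1),
    ("mefi", 2): date(2005, 3, 1),
    ("askme", 2): date(2006, 1, 1),
    ("meta", 5): date(2007, 1, 1),
}
sample_commented = {("mefi", 1), ("mefi", 2), ("meta", 5)}
hits, eligible, pct = deleted_participation(
    date(2005, 1, 1), sample_commented, sample_deleted)

# Checking one row of the list above: DU, 462 of 3967.
du_pct = round(100 * 462 / 3967, 2)  # 11.65
```

Rows such as "1 of 16" show why a minimum denominator might be worth adding: recent sign-ups with a single comment can rank above long-time users.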


    Although I'm sure this is significantly influenced by my arbitrary-- and completely childish-- decision to maintain a constant number of favorites; in other words, when I add one, I subtract one.

    How do you decide which one to subtract?
    posted by FishBike at 4:35 PM on August 19, 2009


    In general (although there are some exceptions), I am trying to transition from using favorites as both bookmarks AND marks of approval to using them solely as bookmarks. Generally this means that when I add a favorite, I then scan back through the older comments I've favorited and remove one that I feel I'm unlikely to want to find again.

    Eventually I'll run out of such comments. Not sure what exactly I'll do then, but perhaps by then we'll have the ability to sort and categorize our favorites so I'll feel less compelled to artificially limit the number of favorites I've bestowed.

    I may be overthinking this.
    posted by dersins at 4:47 PM on August 19, 2009


    dersins: I may be overthinking this.

    Sorry, we're approaching 350 comments in a thread about MetaFilter statistics, and you're worried you're overthinking something?

    But that is interesting about how you're using the favorites function. I'm still trying to make up my mind how I want to use it, and I think I'm mainly leaning towards treating it like a bookmark system too. The reason I haven't favorited all the comments I like is the fear that I'll lose the value of favoriting as a bookmarking system, and end up having to do a big clear-out to get it back.
    posted by FishBike at 5:12 PM on August 19, 2009


    That's pretty much what I did. I was going to scale back even more, but then the Powers that Be (omg that's what pb stands for!) gave us the ability to search favorites, so I felt like I was still OK.
    posted by dersins at 5:16 PM on August 19, 2009


    It's a top 29 list instead of the usual 20 or 25...

    330: quin


    Yay!

    Aw, man, I messed it up

    *scans new list*

    Boo!

    I'd sulk, but there's gotta be some list out there I fall in the top 20 of, even if it's just "users with the word 'quin' in their name."
    posted by quin at 5:19 PM on August 19, 2009


    I'd sulk, but there's gotta be some list out there I fall in the top 20 of, even if it's just "users with the word 'quin' in their name."

    Alphabetically, you're #15 on that list. That's kind of funny:
    -harlequin-
    Arlequin
    dequinix
    Felix_Bouquin
    Harlequin
    JonathanAquino
    jquinby
    jquinnace
    lauramarquina
    Maquinna
    Marquinho
    ParemosLasMaquinas
    perequin
    pquinn
    quin
    Quinbus Flestrin
    quincepuppet
    quindo
    quine's_gavagai
    Quinn
    posted by FishBike at 5:25 PM on August 19, 2009 [2 favorites]


    But still the top 20. (I could only think of about five other names that fit the bill.)
    posted by quin at 5:48 PM on August 19, 2009


    Ooh! FishBike, can you do me like you did dersins?
    posted by Kattullus at 6:40 PM on August 19, 2009


    I may or may not have intended the innuendo
    posted by Kattullus at 6:40 PM on August 19, 2009


    I got sloppy seconds!

    err, ...thirds!

    [Disclaimer: kidding; don't run that on me ... or if you do, send it to me in MeMail. Interested, but not dying to know.]

    posted by not_on_display at 6:43 PM on August 19, 2009


    Ooh! FishBike, can you do me like you did dersins?

    Stats for:Kattullus
    Who does Kattullus favorite the most?
    (simple count of favorites)

    jessamyn [71]
    cortex [56]
    languagehat [54]
    klangklangston [41]
    koeselitz [40]
    UbuRoivas [36]
    mathowie [32]
    Artw [28]
    Marisa Stole the Precious Thing [28]
    lunit [26]

    Who does Kattullus favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    3.13% (11 of 351) of little e's comments+posts
    2.94% (26 of 885) of lunit's comments+posts
    1.38% (15 of 1085) of Horace Rumpole's comments+posts
    1.24% (7 of 566) of Dumsnill's comments+posts
    1.22% (17 of 1393) of The Great Big Mulp's comments+posts
    1.19% (12 of 1009) of pb's comments+posts
    1.18% (9 of 760) of ricochet biscuit's comments+posts
    1.08% (5 of 461) of Damn That Television's comments+posts
    0.89% (5 of 559) of winna's comments+posts
    0.73% (40 of 5451) of koeselitz's comments+posts

    Who favorites Kattullus the most?
    (simple count of favorites)

    orrnyereg [110]
    nicolin [109]
    Cassilda [56]
    muymuy [53]
    UbuRoivas [53]
    nasreddin [52]
    Haruspex [51]
    languagehat [48]
    tehloki [40]
    nickyskye [38]

    Who favorites Kattullus the most?
    (percent of your comments+posts since they joined)

    orrnyereg: 5.96% (110 of 1846) of Kattullus's comments+posts
    nicolin: 2.90% (109 of 3764) of Kattullus's comments+posts
    Cassilda: 1.84% (56 of 3038) of Kattullus's comments+posts
    Marisa Stole the Precious Thing: 1.40% (32 of 2291) of Kattullus's comments+posts
    Dr Dracator: 1.37% (5 of 364) of Kattullus's comments+posts
    muymuy: 1.30% (53 of 4070) of Kattullus's comments+posts
    nasreddin: 1.27% (52 of 4106) of Kattullus's comments+posts
    UbuRoivas: 1.20% (53 of 4414) of Kattullus's comments+posts
    inire: 1.18% (9 of 760) of Kattullus's comments+posts
    Haruspex: 1.16% (51 of 4414) of Kattullus's comments+posts

    Who are Kattullus's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Kattullus [54] ---- [48] languagehat
    Kattullus [40] ---- [37] koeselitz
    Kattullus [36] ---- [53] UbuRoivas
    Kattullus [28] ---- [32] Marisa Stole the Precious Thing
    Kattullus [41] ---- [27] klangklangston
    Kattullus [28] ---- [23] Artw
    Kattullus [20] ---- [27] Abiezer
    Kattullus [17] ---- [52] nasreddin
    Kattullus [17] ---- [26] The Great Big Mulp
    Kattullus [14] ---- [17] flapjax at midnite

    Who are Kattullus's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    Kattullus [0.92%] ---- [1.05%] HumanComplex
    Kattullus [2.52%] ---- [0.86%] weebil
    Kattullus [1.24%] ---- [0.76%] Dumsnill
    Kattullus [0.73%] ---- [0.84%] koeselitz
    Kattullus [0.62%] ---- [1.27%] nasreddin
    Kattullus [1.22%] ---- [0.59%] The Great Big Mulp
    Kattullus [0.58%] ---- [1.40%] Marisa Stole the Precious Thing
    Kattullus [0.52%] ---- [0.66%] Abiezer
    Kattullus [0.83%] ---- [0.49%] dyoneo
    Kattullus [0.69%] ---- [0.49%] burnmp3s

    Of the threads where Kattullus has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    languagehat: 25.0% [529 of 2118]
    cortex: 24.7% [524 of 2118]
    quin: 17.3% [366 of 2118]
    Joe Beese: 17.1% [96 of 561]
    jessamyn: 17.1% [362 of 2118]
    The Whelk: 16.8% [118 of 702]
    Blazecock Pileon: 16.3% [323 of 1977]
    DU: 15.5% [290 of 1874]
    klangklangston: 15.5% [321 of 2076]
    Astro Zombie: 15.4% [303 of 1972]

    Of the threads where other users have been active, in whose has Kattullus also been the most active by percentage?
    (limited to threads active after Kattullus has joined MetaFilter)

    double block and bleed: 26.8% [38 of 142]
    GenjiandProust: 25.5% [13 of 51]
    little e: 17.3% [24 of 139]
    waraw: 15.4% [71 of 461]
    lunit: 15.3% [92 of 601]
    psmith: 14.8% [20 of 135]
    DaDaDaDave: 14.3% [11 of 77]
    subbes: 13.7% [32 of 234]
    aldurtregi: 13.6% [9 of 66]
    Dumsnill: 13.6% [45 of 331]

    Who has favorited the same items as Kattullus the most?

    koeselitz [173]
    scrump [162]
    nasreddin [136]
    schyler523 [136]
    deborah [134]
    tehloki [130]
    shmegegge [125]
    burnmp3s [120]
    misha [117]
    Pope Guilty [114]

    posted by FishBike at 7:12 PM on August 19, 2009 [1 favorite]
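The last section of the report, "who has favorited the same items", amounts to counting overlapping favorites. One way to sketch it, assuming favorites arrive as hypothetical (faver, item) pairs where an item is any hashable key such as (site, commentid):

```python
from collections import Counter, defaultdict

def shared_favorites(user, favorites, n=10):
    """Count how many favorited items each other user shares with `user`.

    favorites: iterable of (faver, item) pairs (assumed layout).
    Returns up to n (other_user, shared_count) pairs, highest first.
    """
    favers_by_item = defaultdict(set)
    for faver, item in favorites:
        favers_by_item[item].add(faver)

    overlap = Counter()
    for favers in favers_by_item.values():
        if user in favers:
            for other in favers:
                if other != user:
                    overlap[other] += 1
    return overlap.most_common(n)

# Made-up sample data.
sample = [
    ("a", ("mefi", 1)), ("b", ("mefi", 1)), ("c", ("mefi", 1)),
    ("a", ("askme", 1)), ("b", ("askme", 1)),
    ("c", ("meta", 9)),
]
shared = shared_favorites("a", sample)
```

Grouping by item first avoids comparing every pair of users directly, which matters once the favorites file gets large.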


    not_on_display: [Disclaimer: kidding; don't run that on me ... or if you do, send it to me in MeMail. Interested, but not dying to know.]

    Sent. It was worth it just to be able to note that the statistics for not_on_display are, in fact, not on display.
    posted by FishBike at 7:29 PM on August 19, 2009 [1 favorite]


    Ooh, ooh, do me, do me!
    posted by ocherdraco at 7:46 PM on August 19, 2009


    Thanks FishBike! That was interesting to read. I never would've guessed that my most favorited user is jessamyn, though, thinking about it, she does say a lot of awesome things :)
    posted by Kattullus at 8:29 PM on August 19, 2009


    FishBike, have you considered running some of your queries for every user and outputting them to a file for all to see?
    posted by mathlete at 10:00 PM on August 19, 2009


    Thanks, Fishbike, for the memailed stats! Indeed quite interesting, to be "done" in that manner!

    I'm gonna go shower now.
    posted by not_on_display at 10:39 PM on August 19, 2009


    Which mod's ass does Kattullus enjoy kissing the most?
    (simple count of favorites)

    jessamyn [71]
    cortex [56]
    mathowie [32]
    posted by gman at 5:27 AM on August 20, 2009


    Oh shit! So I do! I'm such a shill for the administration I should apply for a job with MSNBC.
    posted by Kattullus at 6:02 AM on August 20, 2009 [2 favorites]


    FishBike, have you considered running some of your queries for every user and outputting them to a file for all to see?

    Considered, yes, and rejected the idea for now.

    I feel OK about posting results from queries that consider all users and spit out a top-N list with a one-line stat for each user. I'll feel even more OK about that if there's a process for specific users to have their identifying information obfuscated in the Infodump.

    But the single-user statistics reports that I've been running go into considerable depth of analysis on one user. I wouldn't like to post that unless specifically OK'ed by the user being analyzed.

    There's also a practical matter of processing time. That whole long report now takes over 16 minutes to run for one user. I figure to run it as-is for everyone would take more than a year.

    For the data in the last Infodump it took only 10 minutes, though, so obviously this is not scaling up well to larger amounts of data. If it continues to be popular, I'll have to do something about that. There might be some economies of scale to be had by re-writing the queries to produce these stats for all users at once. But I'd still only post them for individual users on request.
    posted by FishBike at 6:53 AM on August 20, 2009
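The "more than a year" figure is easy to sanity-check with a little arithmetic. The thread does not state the user count, so the 40,000 below is a purely hypothetical number chosen for illustration; only the 16 minutes per user comes from the comment above.

```python
MINUTES_PER_USER = 16               # from the comment above
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600

# Number of users at which a one-at-a-time run takes exactly one year.
users_for_one_year = MINUTES_PER_YEAR / MINUTES_PER_USER  # 32,850

def total_days(user_count, minutes_per_user=MINUTES_PER_USER):
    """Days needed to run the report sequentially for user_count users."""
    return user_count * minutes_per_user / (24 * 60)

# Hypothetical user count, not a figure from the thread.
days_for_40k = total_days(40_000)
```

So any user base larger than about 32,850 accounts puts the sequential run past the one-year mark, which is consistent with the estimate.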


    When Kattullus was in town for the 10th, we went and got coffee and he favorited me like three times.
    posted by cortex (staff) at 7:10 AM on August 20, 2009


    FishBike, if you would do that analysis for me and post it here, I would be very thankful because that looks cool as heck.
    posted by Optimus Chyme at 7:12 AM on August 20, 2009


    This analysis would be boring for me since I appear to be among the tiny minority here that uses favorites as bookmarks not as "I approve of this post/comment" but more of "I'll mark this to check back later or this may be useful to me."

    Of the threads where Kattullus has been active, who else has been active in the highest percentage?


    I'm seeing the same names here as in all the other lists. I think the reason is of course that the same list of users, more or less, are active in all the threads - Kattullus or no Kattullus.
    posted by vacapinta at 7:22 AM on August 20, 2009


    When Kattullus was in town for the 10th, we went and got coffee and he favorited me like three times.

    Pics, plz.
    posted by desuetude at 7:40 AM on August 20, 2009


    Actually I had cider and cortex had lemongrass tea, which, despite frantic assertions otherwise, are not the Sacred Imbibations of the Cabal.
    posted by Kattullus at 7:43 AM on August 20, 2009


    vacapinta: This analysis would be boring for me since I appear to be among the tiny minority here that uses favorites as bookmarks not as "I approve of this post/comment" but more of "I'll mark this to check back later or this may be useful to me."

    It would be fascinating to see if there were any marked differences.

    I'm seeing the same names here as in all the other lists. I think the reason is of course that the same list of users, more or less, are active in all the threads - Kattullus or no Kattullus.

    True. The only difference is that dersins has filthy light thief in his list whereas I have The Whelk.

    Both dersins and I comment quite a lot in MeTa; I suspect that this would look very different for people who aren't much in MeTa, especially people who mostly comment in AskMe.

    Conversely the "most active by percentage" lists only share two names, little e and double block and bleed.
    posted by Kattullus at 7:50 AM on August 20, 2009


    Actually I had cider and cortex had lemongrass tea

    I have just discovered about myself that I will use "get coffee" for "go to a coffeeshop" without blinking. Huh.
    posted by cortex (staff) at 8:58 AM on August 20, 2009


    cortex: " When Kattullus was in town for the 10th, we went and got coffee and he favorited me like three times. "

    desuetude: " Pics, plz. "

    SCANDALOUS
    posted by Rhaomi at 3:33 PM on August 20, 2009


    Ooh, ooh, do me, do me!
    posted by ocherdraco at 10:46 PM on August 19



    Stats for:ocherdraco
    Who does ocherdraco favorite the most?
    (simple count of favorites)

    cortex [30]
    hermitosis [13]
    ColdChef [12]
    Astro Zombie [11]
    Optimus Chyme [11]
    Miko [11]
    EmpressCallipygos [10]
    jessamyn [10]
    BitterOldPunk [9]
    languagehat [9]

    Who does ocherdraco favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    1.87% (6 of 321) of Ian A.T.'s comments+posts
    1.06% (12 of 1135) of ColdChef's comments+posts
    0.95% (5 of 527) of zoomorphic's comments+posts
    0.94% (13 of 1379) of hermitosis's comments+posts
    0.68% (11 of 1606) of Optimus Chyme's comments+posts
    0.63% (6 of 957) of Jofus's comments+posts
    0.62% (9 of 1447) of BitterOldPunk's comments+posts
    0.42% (6 of 1442) of Greg Nog's comments+posts
    0.35% (6 of 1709) of Devils Rancher's comments+posts
    0.35% (10 of 2867) of EmpressCallipygos's comments+posts

    Who favorites ocherdraco the most?
    (simple count of favorites)

    Optimus Chyme [13]
    limeonaire [11]
    chicainthecity [8]
    deborah [8]
    bitter-girl.com [8]
    Caduceus [7]
    nickyskye [7]
    nicolin [7]
    JHarris [7]
    ShawnStruck [6]

    Who favorites ocherdraco the most?
    (percent of your comments+posts since they joined)

    Optimus Chyme: 1.20% (13 of 1084) of ocherdraco's comments+posts
    scrutiny: 1.13% (3 of 265) of ocherdraco's comments+posts
    The Biggest Dreamer: 1.10% (1 of 91) of ocherdraco's comments+posts
    stuck on an island: 1.10% (1 of 91) of ocherdraco's comments+posts
    pleasebekind: 1.10% (1 of 91) of ocherdraco's comments+posts
    Diagonalize: 1.10% (1 of 91) of ocherdraco's comments+posts
    Cuddo: 1.10% (1 of 91) of ocherdraco's comments+posts
    limeonaire: 1.01% (11 of 1084) of ocherdraco's comments+posts
    deborah: 0.74% (8 of 1084) of ocherdraco's comments+posts
    bitter-girl.com: 0.74% (8 of 1084) of ocherdraco's comments+posts

    Who are ocherdraco's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    ocherdraco [11] ---- [13] Optimus Chyme
    ocherdraco [9] ---- [6] languagehat
    ocherdraco [13] ---- [4] hermitosis
    ocherdraco [5] ---- [4] koeselitz
    ocherdraco [12] ---- [3] ColdChef
    ocherdraco [6] ---- [3] iamkimiam
    ocherdraco [4] ---- [3] mathowie
    ocherdraco [5] ---- [2] grapefruitmoon
    ocherdraco [4] ---- [2] carsonb
    ocherdraco [2] ---- [7] nickyskye

    Who are ocherdraco's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    ocherdraco [0.68%] ---- [1.20%] Optimus Chyme
    ocherdraco [0.84%] ---- [0.39%] fantine
    ocherdraco [0.94%] ---- [0.37%] hermitosis
    ocherdraco [0.33%] ---- [0.56%] peacheater
    ocherdraco [0.61%] ---- [0.29%] so_gracefully
    ocherdraco [0.76%] ---- [0.29%] JoeXIII007
    ocherdraco [1.06%] ---- [0.28%] ColdChef
    ocherdraco [0.32%] ---- [0.28%] kidsleepy
    ocherdraco [0.35%] ---- [0.28%] iamkimiam
    ocherdraco [0.27%] ---- [0.74%] deborah

    Of the threads where ocherdraco has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 14.2% [96 of 674]
    jessamyn: 13.8% [93 of 674]
    Brandon Blatcher: 12.6% [85 of 674]
    EmpressCallipygos: 11.8% [77 of 653]
    kathrineg: 11.0% [27 of 246]
    DU: 10.2% [69 of 674]
    quin: 9.8% [66 of 674]
    box: 9.1% [61 of 674]
    klangklangston: 8.9% [60 of 674]
    filthy light thief: 8.8% [57 of 647]

    Of the threads where other users have been active, in whose has ocherdraco also been the most active by percentage?
    (limited to threads active after ocherdraco has joined MetaFilter)

    y6y6y6: 15.9% [11 of 69]
    little e: 14.4% [20 of 139]
    rakish_yet_centered: 13.8% [8 of 58]
    yawper: 13.5% [10 of 74]
    Lush: 12.1% [7 of 58]
    JeffK: 11.9% [16 of 135]
    motsque: 11.8% [6 of 51]
    Captain Cardanthian!: 11.7% [13 of 111]
    eamondaly: 11.6% [8 of 69]
    double block and bleed: 11.3% [16 of 142]

    Who has favorited the same items as ocherdraco the most?

    deborah [92]
    scrump [88]
    koeselitz [67]
    misha [65]
    schyler523 [59]
    Optimus Chyme [56]
    blueberry [54]
    minifigs [53]
    twins named Lugubrious and Salubrious [53]
    shmegegge [52]

    posted by FishBike at 3:34 PM on August 20, 2009


    Any analysis of daily activity will find a weird bump or two today, especially the two hours without comments in the blue. Heh.
    posted by cortex (staff) at 3:57 PM on August 20, 2009


    FishBike, if you would do that analysis for me and post it here, I would be very thankful because that looks cool as heck.
    posted by Optimus Chyme at 10:12 AM on August 20



    Stats for:Optimus Chyme
    Who does Optimus Chyme favorite the most?
    (simple count of favorites)

    Pope Guilty [113]
    cortex [66]
    Blazecock Pileon [65]
    zoomorphic [57]
    hermitosis [53]
    orthogonality [53]
    DU [46]
    delmoi [44]
    jessamyn [37]
    scody [35]

    Who does Optimus Chyme favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    9.36% (57 of 609) of zoomorphic's comments+posts
    4.12% (19 of 461) of Damn That Television's comments+posts
    4.08% (6 of 147) of scrutiny's comments+posts
    2.46% (113 of 4597) of Pope Guilty's comments+posts
    1.98% (7 of 354) of grounded's comments+posts
    1.93% (7 of 362) of VikingSword's comments+posts
    1.82% (6 of 329) of Benjy's comments+posts
    1.29% (53 of 4117) of hermitosis's comments+posts
    1.23% (6 of 488) of Dormant Gorilla's comments+posts
    1.20% (13 of 1084) of ocherdraco's comments+posts

    Who favorites Optimus Chyme the most?
    (simple count of favorites)

    Pope Guilty [175]
    blueberry [79]
    zoomorphic [78]
    tehloki [74]
    Blazecock Pileon [69]
    scrump [48]
    scody [45]
    axon [43]
    marble [41]
    Nattie [37]

    Who favorites Optimus Chyme the most?
    (percent of your comments+posts since they joined)

    Pope Guilty: 9.46% (175 of 1850) of Optimus Chyme's comments+posts
    scrutiny: 5.41% (8 of 148) of Optimus Chyme's comments+posts
    tehloki: 3.81% (74 of 1944) of Optimus Chyme's comments+posts
    Joe Beese: 3.36% (36 of 1073) of Optimus Chyme's comments+posts
    zoomorphic: 2.78% (78 of 2805) of Optimus Chyme's comments+posts
    Nattie: 2.44% (37 of 1514) of Optimus Chyme's comments+posts
    Reverend John: 2.08% (33 of 1583) of Optimus Chyme's comments+posts
    Blazecock Pileon: 1.94% (69 of 3558) of Optimus Chyme's comments+posts
    minifigs: 1.69% (22 of 1300) of Optimus Chyme's comments+posts
    Twicketface: 1.62% (26 of 1606) of Optimus Chyme's comments+posts

    Who are Optimus Chyme's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Optimus Chyme [113] ---- [175] Pope Guilty
    Optimus Chyme [65] ---- [69] Blazecock Pileon
    Optimus Chyme [57] ---- [78] zoomorphic
    Optimus Chyme [35] ---- [45] scody
    Optimus Chyme [44] ---- [29] delmoi
    Optimus Chyme [46] ---- [25] DU
    Optimus Chyme [53] ---- [24] orthogonality
    Optimus Chyme [20] ---- [20] klangklangston
    Optimus Chyme [20] ---- [22] sotonohito
    Optimus Chyme [18] ---- [21] shmegegge

    Who are Optimus Chyme's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    Optimus Chyme [4.08%] ---- [5.41%] scrutiny
    Optimus Chyme [9.36%] ---- [2.78%] zoomorphic
    Optimus Chyme [2.46%] ---- [9.46%] Pope Guilty
    Optimus Chyme [1.96%] ---- [1.49%] thsmchnekllsfascists
    Optimus Chyme [0.92%] ---- [0.84%] HumanComplex
    Optimus Chyme [0.83%] ---- [1.29%] kathrineg
    Optimus Chyme [1.15%] ---- [0.82%] East Manitoba Regional Junior Kabaddi Champion '94
    Optimus Chyme [0.76%] ---- [1.01%] scrump
    Optimus Chyme [1.20%] ---- [0.68%] ocherdraco
    Optimus Chyme [3.23%] ---- [0.68%] Frobenius Twist

    Of the threads where Optimus Chyme has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    delmoi: 30.8% [814 of 2641]
    Joe Beese: 21.5% [129 of 599]
    DU: 21.4% [248 of 1161]
    Astro Zombie: 20.4% [389 of 1909]
    EmpressCallipygos: 18.3% [157 of 859]
    filthy light thief: 17.5% [116 of 663]
    klangklangston: 17.3% [443 of 2555]
    Blazecock Pileon: 17.2% [344 of 1999]
    five fresh fish: 16.6% [438 of 2641]
    Smedleyman: 16.6% [437 of 2640]

    Of the threads where other users have been active, in whose has Optimus Chyme also been the most active by percentage?
    (limited to threads active after Optimus Chyme has joined MetaFilter)

    bevets: 29.9% [20 of 67]
    Farengast: 24.0% [12 of 50]
    darukaru: 23.0% [67 of 291]
    slf: 22.1% [15 of 68]
    Reggie Knoble: 21.9% [14 of 64]
    all-seeing eye dog: 21.5% [28 of 130]
    jnaps: 21.3% [16 of 75]
    Combustible Edison Lighthouse: 20.5% [36 of 176]
    lord_wolf: 18.9% [78 of 413]
    mooncrow: 18.8% [22 of 117]

    Who has favorited the same items as Optimus Chyme the most?

    tehloki [446]
    Pope Guilty [404]
    blueberry [362]
    scrump [324]
    lalochezia [238]
    marble [236]
    schyler523 [227]
    nooneyouknow [205]
    JHarris [202]
    Nattie [201]
    posted by FishBike at 4:13 PM on August 20, 2009 [1 favorite]


    I was messing around more with the contact network stuff. Instead of looking at contacts in one direction, or either direction, I'm now looking at only mutual contacts and ignoring non-mutual contacts.

    Looked at this way:
    • There are 4,095 users with at least 1 mutual contact.
    • 3,488 of us are on the big island that mathowie owns (by virtue of having the lowest userid on the island).
    • The next biggest island has only 7 users on it. Maybe this is the cabal?
    • There are only 51 islands with more than 2 users.
    • There are 268 islands in total, most with only 2 users.
    • Everyone other than those 4,095 users is adrift in a sea of mutual contactlessness.
    There are also a bunch of duplicate usernames in the usernames table (but all userids appear to be unique).
    posted by FishBike at 4:22 PM on August 20, 2009
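The "islands" described above are the connected components of the mutual-contact graph. A minimal union-find sketch follows, assuming contacts come as hypothetical (from_userid, to_userid) pairs; keeping the lowest userid as each component's root mirrors the "owner" convention in the comment, though how the original numbers were computed is not shown in the thread.

```python
from collections import defaultdict

def mutual_islands(contacts):
    """Group users into islands linked by mutual contacts only.

    contacts: iterable of (from_userid, to_userid) directed pairs
    (assumed layout). Returns {owner_userid: set_of_member_userids},
    where the owner is the lowest userid on the island.
    """
    contacts = set(contacts)
    # Keep a pair only when the contact exists in both directions.
    mutual = {(a, b) for a, b in contacts if a < b and (b, a) in contacts}

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in mutual:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the higher root under the lower one, so the root
            # of every island is always its lowest userid.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    islands = defaultdict(set)
    for user in list(parent):
        islands[find(user)].add(user)
    return dict(islands)

# Made-up sample: 6 -> 1 and 7 -> 8 are one-way, so 6, 7 and 8 are adrift.
sample = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4), (6, 1), (7, 8)]
islands = mutual_islands(sample)
```

Users who never appear in a mutual pair simply do not show up in the result, which corresponds to the "sea of mutual contactlessness".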


    0.35% (6 of 1709) of Devils Rancher's comments+posts

    So it took a week of goosing & tweaking the stats every which-a-way for my name to finally show up in this thread. Thanks ocherdraco!
    posted by Devils Rancher at 4:25 PM on August 20, 2009


    I wonder if there are some good ways to gerrymander Matt's island. Every time I've looked at this kind of graph it tends to look like a big lump, but there's got to be some alternate views that lend themselves to some kind of partitioning or other. Weighting plurality of contactdom by a few heavyweights on the big island, turning that into binary choices to create (fragile, semi-arbitrary) districts, etc.
    posted by cortex (staff) at 4:29 PM on August 20, 2009


    I wonder if there are some good ways to gerrymander Matt's island. Every time I've looked at this kind of graph it tends to look like a big lump, but there's got to be some alternate views that lend themselves to some kind of partitioning or other. Weighting plurality of contactdom by a few heavyweights on the big island, turning that into binary choices to create (fragile, semi-arbitrary) districts, etc.

    When I generated the "degrees of mathowie" table based on mutual contacts, I filtered out redundant paths. So I can't use that to find the choke points in the network where disconnecting a few links would partition it. However, it was easy enough to generate the full set of network connections for the big island. I've been experimenting with graph drawing programs to see if there's any hope of visualizing this.

    Results so far are not encouraging. While there are a few users connected to the network via single paths, the middle of it is just a giant intractable mass of redundant connections. So far, it is not at all obvious how to analyze this. I also think there must be a way, though.
    posted by FishBike at 6:35 PM on August 20, 2009
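The "users connected to the network via single paths" correspond to what graph theory calls bridges: links whose removal splits the graph. The sketch below uses a standard low-link depth-first search to find them; it is a textbook technique offered as one possible starting point, not what FishBike ran. It is written iteratively because an island of several thousand users could exceed Python's default recursion limit. The edge-list input format is an assumption.

```python
from collections import defaultdict

def find_bridges(edges):
    """Return the links whose removal would disconnect part of the graph.

    edges: iterable of (userid_a, userid_b) undirected mutual-contact
    pairs (assumed layout). Returns a sorted list of (low_id, high_id).
    """
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)

    disc = {}  # discovery index of each node
    low = {}   # lowest discovery index reachable from the node's subtree
    bridges = []
    counter = 0

    for start in list(graph):
        if start in disc:
            continue
        disc[start] = low[start] = counter
        counter += 1
        # Each stack entry: (node, parent, iterator over its neighbours).
        stack = [(start, None, iter(graph[start]))]
        while stack:
            node, parent, neighbours = stack[-1]
            descended = False
            for nxt in neighbours:
                if nxt == parent:
                    continue
                if nxt in disc:
                    # Back edge: node can reach an earlier node.
                    low[node] = min(low[node], disc[nxt])
                else:
                    disc[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append((nxt, node, iter(graph[nxt])))
                    descended = True
                    break
            if not descended:
                stack.pop()
                if parent is not None:
                    low[parent] = min(low[parent], low[node])
                    if low[node] > disc[parent]:
                        # Nothing below node reaches parent or above
                        # except through this one link.
                        bridges.append((min(parent, node), max(parent, node)))
    return sorted(bridges)

# Made-up sample: two triangles joined by one link, plus a dangling user.
sample = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4), (6, 7)]
bridges = find_bridges(sample)
```

This only finds single-link choke points. The dense middle of the big island would need something heavier, such as minimum cuts or community detection, to partition.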


    FishBike, have you considered running some of your queries for every user and outputting them to a file for all to see?

    Considered, yes, and rejected the idea for now.


    I asked because I didn't want to duplicate anything you were doing, but I may end up doing so. I don't know. We'll see.

    I am also finding it hard to follow all the analysis posted here. My scroll bar thingy is getting quite narrow.

    There are also a bunch of duplicate usernames in the usernames table (but all userids appear to be unique).

    I noticed a couple of seemingly identical user names, but one would have white space at the end. Perhaps this is the case?

    While there are a few users connected to the network via single paths, the middle of it is just a giant intractable mass of redundant connections.

    This is kinda why I asked about type of contact. But even without it, I think there could be ways to scale down the amount of noise, e.g., by the user's join date. Also, instead of expecting there to be easily distinguishable groupings connected to each other by relatively few contacts, I think, likely, there are groupings with 1000s of contacts connected to others by 100s of contacts. Obviously, more analysis is needed.
    posted by mathlete at 7:52 PM on August 20, 2009


    Thanks for running the numbers FishBike! Eentawesting. It's pretty obvious from those stats who I know in real life. But there were some surprises, too.

    Also: you're welcome, DevilsAdvocate.
    posted by ocherdraco at 8:11 PM on August 20, 2009


    FishBike, if you're not too overwhelmed, you can "do me" too :)
    posted by amyms at 8:23 PM on August 20, 2009


    I am an island.
    posted by iamkimiam at 8:38 PM on August 20, 2009


    I am a rock.
    posted by ocherdraco at 8:49 PM on August 20, 2009


    I am an island.

    No, youarekimyouare.
    posted by rtha at 8:52 PM on August 20, 2009


    I noticed a couple of seemingly identical user names, but one would have white space at the end. Perhaps this is the case?

    Possible other hijinks too, including unicode trickery using look-alike entities with different core values. That got plugged up eventually after some folks on IRC got overly creative with the opportunities, I think.
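
    For anyone who wants to hunt for these in the usernames file, here is a minimal sketch. It assumes the rows have already been read into (userid, username) pairs; the sample names are invented, and ignoring case is an assumption about what should count as "the same" name. Unicode NFKC normalization folds compatibility characters such as fullwidth letters, but it does not catch look-alikes from other scripts (a Cyrillic letter posing as a Latin one), which would need a confusables table.

```python
import unicodedata
from collections import defaultdict

def normalize_username(name):
    """Fold a username to a comparison key: NFKC, collapsed whitespace, case-folded."""
    folded = unicodedata.normalize("NFKC", name)
    return " ".join(folded.split()).casefold()

def find_lookalikes(users):
    """Group userids whose usernames collapse to the same comparison key."""
    groups = defaultdict(list)
    for userid, name in users:
        groups[normalize_username(name)].append(userid)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical rows, not real accounts.
users = [
    (1, "example user"),
    (2, "example user "),          # trailing space
    (3, "\uff25xample user"),      # fullwidth capital E
    (4, "someone else"),
]
print(find_lookalikes(users))  # {'example user': [1, 2, 3]}
```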
    posted by cortex (staff) at 8:56 PM on August 20, 2009


    Also: you're welcome, DevilsAdvocate.

    I see what you did there. :(
    posted by Devils Rancher at 8:58 PM on August 20, 2009


    I see what you did there. :( (Devils Rancher)

    *facepalm*

    Sorry, friend. You are welcome to call me otherdraco all you want, if it'll make up for it.

    posted by ocherdraco at 9:20 PM on August 20, 2009


    I've mis-read your username as "Ocahedro" since forever, if that helps to mitigate in any way.

    Hey, at least I like that DevilsAdvocate guy. He's alright!
    posted by Devils Rancher at 9:40 PM on August 20, 2009


    Ocahedro is a new one. I will add it to otherdraco, och-er-a-CHAY-do, et al.
    posted by ocherdraco at 10:04 PM on August 20, 2009


    Who does Optimus Chyme favorite the most?
    (simple count of favorites)

    Pope Guilty [113]
    cortex [66]
    Blazecock Pileon [65]
    zoomorphic [57]
    hermitosis [53]
    orthogonality [53]
    DU [46]
    delmoi [44]
    jessamyn [37]
    scody [35]


    heh heh BOY WHO WOULDA GUESSED?

    Thanks, FishBike, this is awesome.
    posted by Optimus Chyme at 6:59 AM on August 21, 2009


    I would be interested in the FishBike analysis, if you've got some free time.
    posted by Pope Guilty at 7:29 AM on August 21, 2009


    I haven't been following this whole thread (although I've done some searching) so apologies if I'm repeating a request. Can we get some sort of a dump of the Tenth Anniversary site? I'd like to figure out how many people attended their first meetup for the Tenth celebration, and the simplest estimate would be the number of people who commented/rsvpd for tenth meetups and had never commented in a "get-together"-categorized MetaTalk thread before.
    posted by Plutor at 8:44 AM on August 21, 2009


    Aw, this just confirms my suspicion that while Optimus Chyme likes me as a girlfriend, he'll probably still leave me for Pope Guilty.
    posted by zoomorphic at 1:35 PM on August 21, 2009 [3 favorites]


    Stats for: Pope Guilty
    Who does Pope Guilty favorite the most?
    (simple count of favorites)

    DU [224]
    Blazecock Pileon [218]
    Astro Zombie [177]
    Optimus Chyme [175]
    cortex [139]
    Artw [128]
    Avenger [114]
    Marisa Stole the Precious Thing [108]
    delmoi [104]
    jessamyn [95]

    Who does Pope Guilty favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    9.46% (175 of 1850) of Optimus Chyme's comments+posts
    7.14% (7 of 98) of mobunited's comments+posts
    6.74% (114 of 1692) of Avenger's comments+posts
    5.56% (7 of 126) of Project F's comments+posts
    5.41% (8 of 148) of Legomancer's comments+posts
    5.38% (5 of 93) of aihal's comments+posts
    4.70% (65 of 1383) of XQUZYPHYR's comments+posts
    4.62% (9 of 195) of graymouser's comments+posts
    4.32% (16 of 370) of Talez's comments+posts
    4.20% (5 of 119) of teece's comments+posts
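
    A guess at how a percentage list like this could be computed, for anyone wanting to reproduce it from the Infodump: divide the favorites given to each author by the number of comments+posts that author made since the favoriter joined, and drop authors favorited fewer than 5 times. The record shapes and names here are assumptions for illustration, not the actual file layout or the query used above.

```python
from collections import Counter
from datetime import date

def favorite_percentages(favoriter, join_date, favorites, items, min_favorites=5):
    """Rank authors by the share of their items that `favoriter` has favorited.

    favorites: iterable of (favoriter, author) pairs, one per favorite given
    items: iterable of (author, posted_on) pairs, one per comment or post
    Only items posted on or after `join_date` count toward the denominator.
    """
    given = Counter(author for who, author in favorites if who == favoriter)
    written = Counter(author for author, posted_on in items if posted_on >= join_date)
    rows = []
    for author, count in given.items():
        if count >= min_favorites and written[author] > 0:
            rows.append((100.0 * count / written[author], count, written[author], author))
    rows.sort(key=lambda row: (-row[0], row[3]))
    return rows

# Made-up data: "u" joined in 2008, so author "a"'s 2007 items are ignored.
favorites = [("u", "a")] * 5 + [("u", "b")] * 6 + [("u", "c")] * 2 + [("v", "a")] * 9
items = ([("a", date(2009, 1, 1))] * 50 + [("a", date(2007, 1, 1))] * 50
         + [("b", date(2009, 2, 1))] * 200 + [("c", date(2009, 2, 1))] * 4)
rows = favorite_percentages("u", date(2008, 1, 1), favorites, items)
```

    The 5+ cutoff matters: without it, a single favorite on a user with a handful of comments would dominate the top of the list.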

    Who favorites Pope Guilty the most?
    (simple count of favorites)

    tehloki [221]
    axon [135]
    Optimus Chyme [113]
    JHarris [96]
    Blazecock Pileon [73]
    aeschenkarnos [63]
    DU [61]
    Marisa Stole the Precious Thing [59]
    MikeKD [53]
    ShawnStruck [52]

    Who favorites Pope Guilty the most?
    (percent of your comments+posts since they joined)

    tehloki: 4.81% (221 of 4597) of Pope Guilty's comments+posts
    axon: 2.94% (135 of 4597) of Pope Guilty's comments+posts
    Marisa Stole the Precious Thing: 2.70% (59 of 2187) of Pope Guilty's comments+posts
    Optimus Chyme: 2.46% (113 of 4597) of Pope Guilty's comments+posts
    dunkadunc: 2.27% (43 of 1895) of Pope Guilty's comments+posts
    JHarris: 2.09% (96 of 4597) of Pope Guilty's comments+posts
    Blazecock Pileon: 1.59% (73 of 4597) of Pope Guilty's comments+posts
    thsmchnekllsfascists: 1.39% (20 of 1438) of Pope Guilty's comments+posts
    aeschenkarnos: 1.37% (63 of 4597) of Pope Guilty's comments+posts
    liza: 1.37% (39 of 2855) of Pope Guilty's comments+posts

    Who are Pope Guilty's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Pope Guilty [175] ---- [113] Optimus Chyme
    Pope Guilty [218] ---- [73] Blazecock Pileon
    Pope Guilty [224] ---- [61] DU
    Pope Guilty [108] ---- [59] Marisa Stole the Precious Thing
    Pope Guilty [52] ---- [47] sotonohito
    Pope Guilty [114] ---- [46] Avenger
    Pope Guilty [77] ---- [45] five fresh fish
    Pope Guilty [44] ---- [42] nasreddin
    Pope Guilty [46] ---- [30] rokusan
    Pope Guilty [34] ---- [27] fleetmouse
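
    The ordering above is consistent with ranking each pair by the smaller of the two favorite counts. A minimal sketch of that ranking, assuming favorites are available as (favoriter, author) pairs; the tie-break is arbitrary, since the list above does not reveal one.

```python
from collections import Counter

def top_mutual_favorites(user, favorites, limit=10):
    """Rank other users by min(favorites given, favorites received).

    favorites: iterable of (favoriter, author) pairs, one per favorite.
    Returns (given, received, other_user) tuples, strongest mutual pair first.
    """
    given = Counter()
    received = Counter()
    for favoriter, author in favorites:
        if favoriter == user and author != user:
            given[author] += 1
        elif author == user and favoriter != user:
            received[favoriter] += 1
    mutual = [(given[o], received[o], o) for o in given if o in received]
    # Sort by the weaker direction first; ties fall back to the stronger
    # direction and then the name (an arbitrary choice for this sketch).
    mutual.sort(key=lambda row: (-min(row[0], row[1]), -max(row[0], row[1]), row[2]))
    return mutual[:limit]

# Made-up data: "c" is favorited by "p" but never returns the favor.
favs = ([("p", "a")] * 5 + [("a", "p")] * 3 + [("p", "b")] * 2
        + [("b", "p")] * 9 + [("p", "c")] * 7 + [("d", "p")] * 4)
pairs = top_mutual_favorites("p", favs)
```

    Using the minimum means a lopsided pair (say 224 one way, 61 the other) is ranked by the 61, so only relationships that are strong in both directions float to the top.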

    Who are Pope Guilty's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    Pope Guilty [9.46%] ---- [2.46%] Optimus Chyme
    Pope Guilty [2.22%] ---- [2.70%] Marisa Stole the Precious Thing
    Pope Guilty [2.39%] ---- [1.59%] Blazecock Pileon
    Pope Guilty [1.96%] ---- [1.39%] thsmchnekllsfascists
    Pope Guilty [2.32%] ---- [1.33%] DU
    Pope Guilty [1.78%] ---- [1.25%] Joe Beese
    Pope Guilty [1.23%] ---- [1.33%] Caduceus
    Pope Guilty [3.92%] ---- [1.15%] MikeKD
    Pope Guilty [1.08%] ---- [2.27%] dunkadunc
    Pope Guilty [1.07%] ---- [2.09%] JHarris

    Of the threads where Pope Guilty has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    delmoi: 30.4% [661 of 2173]
    DU: 29.1% [632 of 2173]
    Blazecock Pileon: 27.5% [598 of 2173]
    quin: 26.0% [564 of 2173]
    Joe Beese: 24.4% [148 of 606]
    filthy light thief: 20.9% [150 of 719]
    Artw: 19.3% [419 of 2173]
    Astro Zombie: 19.1% [414 of 2173]
    Smedleyman: 18.6% [405 of 2173]
    Marisa Stole the Precious Thing: 18.0% [191 of 1063]
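
    The varying denominators above (2173 for long-time users, 606 for a recent one) suggest the comparison only counts threads dated after the other user joined. A rough sketch of that idea follows; the input shapes are assumed, and "thread date on or after join date" is a stand-in for whatever "active after" means in the real query.

```python
from collections import defaultdict
from datetime import date

def co_activity(user, activity, thread_dates, join_dates, limit=10):
    """Rank other users by the share of `user`'s threads they also took part in.

    activity: iterable of (username, thread_id) pairs
    thread_dates: thread_id -> date of the thread
    join_dates: username -> date the user joined
    """
    threads_by_user = defaultdict(set)
    for name, thread_id in activity:
        threads_by_user[name].add(thread_id)
    mine = threads_by_user[user]
    rows = []
    for other, theirs in threads_by_user.items():
        if other == user:
            continue
        # Only threads the other user could have joined count in the denominator.
        eligible = {t for t in mine if thread_dates[t] >= join_dates[other]}
        if not eligible:
            continue
        shared = len(eligible & theirs)
        rows.append((100.0 * shared / len(eligible), shared, len(eligible), other))
    rows.sort(key=lambda row: (-row[0], row[3]))
    return rows[:limit]

# Made-up data: "new" joined in mid-2008, so only thread 4 is eligible for them.
activity = [("me", 1), ("me", 2), ("me", 3), ("me", 4),
            ("old", 1), ("old", 2), ("new", 4), ("new", 5)]
thread_dates = {1: date(2005, 1, 1), 2: date(2006, 1, 1), 3: date(2007, 1, 1),
                4: date(2009, 1, 1), 5: date(2009, 2, 1)}
join_dates = {"me": date(2004, 1, 1), "old": date(2004, 6, 1), "new": date(2008, 6, 1)}
overlap = co_activity("me", activity, thread_dates, join_dates)
```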

    Of the threads where other users have been active, in whose has Pope Guilty also been the most active by percentage?
    (limited to threads active after Pope Guilty has joined MetaFilter)

    mobunited: 35.8% [24 of 67]
    Gnostic Novelist: 33.0% [31 of 94]
    erikharmon: 31.7% [19 of 60]
    uri: 29.3% [29 of 99]
    rockhopper: 29.0% [27 of 93]
    ELF Radio: 28.8% [19 of 66]
    nofundy: 28.4% [19 of 67]
    quarter waters and a bag of chips: 27.8% [32 of 115]
    kuatto: 27.3% [15 of 55]
    Legomancer: 27.1% [36 of 133]

    Who has favorited the same items as Pope Guilty the most?

    tehloki [2131]
    scrump [823]
    JHarris [723]
    axon [694]
    schyler523 [683]
    blueberry [640]
    Blazecock Pileon [570]
    shmegegge [522]
    aeschenkarnos [512]
    nasreddin [499]


    posted by FishBike at 3:42 PM on August 21, 2009


    I was going to do some analysis of tags vs. favorites and comments last night, but I got distracted by the contact network visualization stuff, and then some tornadoes came along which were also distracting.

    So, I hope you folks will forgive the super-lengthy comment, because I think these results are sort of awesome.

    First, what do MeFites like? This is average number of favorites given to posts with the following key words, separately for each site. Limited to tags that have been used at least 10 times.
    MetaFilter
    51.20: 90s [512 favorites on 10 posts)
    41.09: cartoonnetwork [452 favorites on 11 posts)
    33.45: gladwell [368 favorites on 11 posts)
    28.30: recipe [934 favorites on 33 posts)
    27.36: Franklin [383 favorites on 14 posts)
    27.25: hplovecraft [327 favorites on 12 posts)
    27.08: freestuff [677 favorites on 25 posts)
    26.26: vegetarian [499 favorites on 19 posts)
    26.25: middleages [315 favorites on 12 posts)
    26.20: effects [262 favorites on 10 posts)
    26.09: lecture [600 favorites on 23 posts)
    25.92: lessons [337 favorites on 13 posts)
    25.88: thewire [621 favorites on 24 posts)
    25.72: cuisine [643 favorites on 25 posts)
    25.42: silentfilm [305 favorites on 12 posts)
    25.30: capitalmarkets [253 favorites on 10 posts)
    25.27: Excel [278 favorites on 11 posts)
    24.92: stephenfry [299 favorites on 12 posts)
    24.89: AdultSwim [448 favorites on 18 posts)
    24.81: palin [1042 favorites on 42 posts)

    Ask Metafilter
    103.18: introduction [1135 favorites on 11 posts)
    67.40: metafilterhistory [1348 favorites on 20 posts)
    55.08: materialism [716 favorites on 13 posts)
    53.93: luxury [809 favorites on 15 posts)
    36.73: frugality [404 favorites on 11 posts)
    32.67: sandwiches [392 favorites on 12 posts)
    30.24: non-fiction [514 favorites on 17 posts)
    29.00: thriller [319 favorites on 11 posts)
    28.75: mailorder [345 favorites on 12 posts)
    28.45: ingredients [569 favorites on 20 posts)
    28.36: catalogue [312 favorites on 11 posts)
    27.96: laziness [755 favorites on 27 posts)
    26.70: IndieRock [267 favorites on 10 posts)
    25.90: memories [803 favorites on 31 posts)
    24.93: slowcooker [349 favorites on 14 posts)
    24.62: erotic [320 favorites on 13 posts)
    24.55: sandwich [540 favorites on 22 posts)
    24.37: lifehacks [463 favorites on 19 posts)
    23.82: challenge [262 favorites on 11 posts)
    23.65: essays [615 favorites on 26 posts)

    MetaTalk
    20.30: bestof [203 favorites on 10 posts)
    12.25: best [196 favorites on 16 posts)
    10.00: memes [100 favorites on 10 posts)
    7.36: sexism [81 favorites on 11 posts)
    6.40: writing [64 favorites on 10 posts)
    6.06: history [109 favorites on 18 posts)
    5.98: metafilterhistory [365 favorites on 61 posts)
    5.94: wiki [101 favorites on 17 posts)
    5.49: threads [192 favorites on 35 posts)
    4.82: 10th [53 favorites on 11 posts)
    4.67: games [84 favorites on 18 posts)
    4.62: video [60 favorites on 13 posts)
    4.61: anniversary [83 favorites on 18 posts)
    4.23: death [55 favorites on 13 posts)
    3.75: jessamyn [75 favorites on 20 posts)
    3.60: google [108 favorites on 30 posts)
    3.57: swap [82 favorites on 23 posts)
    3.53: advice [53 favorites on 15 posts)
    3.50: mail [35 favorites on 10 posts)
    3.50: socialapps [35 favorites on 10 posts)

    Music
    45.25: mefi [543 favorites on 12 posts)
    37.07: metafilter [556 favorites on 15 posts)
    25.36: parody [279 favorites on 11 posts)
    19.81: nsfw [317 favorites on 16 posts)
    16.10: rpm [161 favorites on 10 posts)
    15.87: bgm [841 favorites on 53 posts)
    11.59: stevegoldberg [197 favorites on 17 posts)
    11.23: jawharp [146 favorites on 13 posts)
    10.96: violin [296 favorites on 27 posts)
    10.27: indiepop [113 favorites on 11 posts)
    10.25: banjo [738 favorites on 72 posts)
    9.67: French [116 favorites on 12 posts)
    9.50: nintendo [114 favorites on 12 posts)
    9.48: solo [569 favorites on 60 posts)
    9.25: requests [185 favorites on 20 posts)
    8.69: collaboration [226 favorites on 26 posts)
    8.56: trumpet [137 favorites on 16 posts)
    8.16: hiphop [465 favorites on 57 posts)
    8.10: alternative [170 favorites on 21 posts)
    7.92: bluegrass [206 favorites on 26 posts)
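
    For anyone wanting to reproduce averages like these, a minimal sketch of the calculation, assuming the tag file has been read into (post_id, tag) pairs and the per-post favorite counts into a dict. Those shapes and names are illustrative, not the Infodump's actual column layout.

```python
from collections import defaultdict

def average_by_tag(post_tags, post_counts, min_posts=10):
    """Average a per-post count (favorites or comments) over each tag's posts.

    post_tags: iterable of (post_id, tag) pairs
    post_counts: post_id -> count; posts missing from the dict count as 0
    Returns (average, tag, total, number_of_posts) rows, highest average first.
    """
    posts_for_tag = defaultdict(set)
    for post_id, tag in post_tags:
        posts_for_tag[tag].add(post_id)
    rows = []
    for tag, posts in posts_for_tag.items():
        if len(posts) < min_posts:
            continue
        total = sum(post_counts.get(p, 0) for p in posts)
        rows.append((total / len(posts), tag, total, len(posts)))
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows

# Made-up data, with the threshold lowered so the tiny sample produces output.
post_tags = [(1, "x"), (2, "x"), (3, "x"), (1, "y"), (4, "y"), (5, "z")]
post_counts = {1: 10, 2: 20, 3: 0, 4: 2}
averages = average_by_tag(post_tags, post_counts, min_posts=2)
```

    Passing comment counts instead of favorite counts gives the most-commented ranking with the same function. With a threshold as low as 10 posts, one very popular post can carry a tag's average, so a median may be worth comparing.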

    Okay, so that's what MeFites appear to favorite a lot. But what do we like to comment on a lot? Here are the most commented-on (average comments per post) tags, for tags used 10 times or more:
    MetaFilter
    687.40: Sarah [6874 comments on 10 posts)
    630.83: VP [7570 comments on 12 posts)
    507.22: sarahpalin [11666 comments on 23 posts)
    471.85: Election2008 [12268 comments on 26 posts)
    452.40: palin [19001 comments on 42 posts)
    357.88: Biden [5726 comments on 16 posts)
    315.96: RNC [8531 comments on 27 posts)
    297.22: vicepresident [6836 comments on 23 posts)
    262.99: McCain [21039 comments on 80 posts)
    185.73: evangelicals [2786 comments on 15 posts)
    183.40: prop8 [1834 comments on 10 posts)
    178.73: barack [2681 comments on 15 posts)
    147.79: meth [3547 comments on 24 posts)
    145.42: metafilterhistory [10034 comments on 69 posts)
    139.90: snobbery [1399 comments on 10 posts)
    137.11: blue [3702 comments on 27 posts)
    137.10: CivilUnions [1371 comments on 10 posts)
    135.40: taser [2031 comments on 15 posts)
    135.33: atheist [2030 comments on 15 posts)
    131.21: AIG [1837 comments on 14 posts)

    Ask MetaFilter
    66.85: metafilterhistory [1337 comments on 20 posts)
    55.10: nickname [551 comments on 10 posts)
    54.64: nicknames [765 comments on 14 posts)
    52.95: viral [1006 comments on 19 posts)
    51.92: adultery [623 comments on 12 posts)
    51.33: affair [770 comments on 15 posts)
    50.50: grammer [505 comments on 10 posts)
    49.33: bullying [592 comments on 12 posts)
    48.92: materialism [636 comments on 13 posts)
    48.73: creepy [1949 comments on 40 posts)
    48.54: arguments [631 comments on 13 posts)
    48.13: atheist [1540 comments on 32 posts)
    48.02: cheating [4226 comments on 88 posts)
    48.00: rude [480 comments on 10 posts)
    47.64: insults [524 comments on 11 posts)
    46.92: atheism [1783 comments on 38 posts)
    46.30: witty [463 comments on 10 posts)
    45.18: inappropriate [768 comments on 17 posts)
    44.23: sexism [575 comments on 13 posts)
    44.00: betrayal [616 comments on 14 posts)

    MetaTalk
    265.27: sexism [2918 comments on 11 posts)
    148.71: frontpage [2082 comments on 14 posts)
    135.55: sockpuppets [1491 comments on 11 posts)
    131.14: obama [1836 comments on 14 posts)
    115.57: religion [1618 comments on 14 posts)
    109.36: contacts [2406 comments on 22 posts)
    103.95: metafilterhistory [6341 comments on 61 posts)
    99.64: language [1395 comments on 14 posts)
    96.79: obit [2710 comments on 28 posts)
    95.80: derail [958 comments on 10 posts)
    92.60: 2008 [926 comments on 10 posts)
    92.17: snark [1659 comments on 18 posts)
    91.35: troll [1553 comments on 17 posts)
    90.85: death [1181 comments on 13 posts)
    90.77: cortex [1180 comments on 13 posts)
    88.64: callouts [1241 comments on 14 posts)
    88.46: privacy [1150 comments on 13 posts)
    88.35: jessamyn [1767 comments on 20 posts)
    86.31: trolling [1381 comments on 16 posts)
    86.15: demographics [1120 comments on 13 posts)

    Music
    27.00: mefi [324 comments on 12 posts)
    24.13: metafilter [362 comments on 15 posts)
    18.45: parody [203 comments on 11 posts)
    15.56: nsfw [249 comments on 16 posts)
    13.40: rpm [134 comments on 10 posts)
    12.62: collaboration [328 comments on 26 posts)
    12.31: hallelujah [357 comments on 29 posts)
    12.09: bgm [641 comments on 53 posts)
    11.77: jawharp [153 comments on 13 posts)
    11.00: requests [220 comments on 20 posts)
    10.40: MeFiMuCover [104 comments on 10 posts)
    10.06: banjo [724 comments on 72 posts)
    9.71: NotInEnglish [466 comments on 48 posts)
    9.58: French [115 comments on 12 posts)
    9.57: VelvetUnderground [354 comments on 37 posts)
    9.52: percussion [257 comments on 27 posts)
    9.38: trumpet [150 comments on 16 posts)
    9.17: solo [550 comments on 60 posts)
    9.00: electricukulele [90 comments on 10 posts)
    8.92: bluegrass [232 comments on 26 posts)

    posted by FishBike at 3:42 PM on August 21, 2009 [4 favorites]


    Dear Ask MetaFilter:

    Which mailorder slowcooker should I order to make the most erotic sandwiches? And what ingredients should I use?

    (that ought to take care of trolling for favorites for a while...)
    posted by FishBike at 3:46 PM on August 21, 2009 [1 favorite]


    FishBike, are the keywords tags or are they extracted from the text?
    posted by pb (staff) at 3:55 PM on August 21, 2009


    Gotta be tags; we don't provide text in the dump. (Unless that's from a scrape on FishBike's part, I guess.)

    That's totally awesome, FishBike.
    posted by cortex (staff) at 3:59 PM on August 21, 2009


    pb, they're the tags. I wanted to do the same for words extracted from post titles, but I'm too lazy to try to parse them into separate words right now. I also wondered about whether titles are often only obliquely related to the subject of the actual post.

    Actually I started thinking about this from the point of view of the post titles, then remembered the tags are in the Infodump too. And probably much better represent what the post is really about, because human judgement has been used in applying the appropriate ones.
    posted by FishBike at 3:59 PM on August 21, 2009


    Those definitely look like tags; also, it's pretty obvious where certain outliers are throwing off the results.
    posted by dersins at 3:59 PM on August 21, 2009


    ahh thanks, very nice work. It'd be interesting to see how fast these lists change over time. It could serve as a MeFi Zeitgeist.
    posted by pb (staff) at 4:06 PM on August 21, 2009


    It'd be interesting to do a token comparison between tags and titles for posts. See where they intersect and where they don't.
    posted by cortex (staff) at 4:07 PM on August 21, 2009



    66.85: metafilterhistory [1337 comments on 20 posts)
    (FishBike)

    leet!
    posted by ocherdraco at 4:07 PM on August 21, 2009 [1 favorite]


    These tag-based queries are really fast to run, too, so if anybody wants to see different variations of these, I'd be happy to do that. Some possibilities: total number of favorites or comments (instead of average per post), restricting to a specific date range of posts (this sounds kinda Zeitgeist-y), longer than top 20 to see what else is buried under the political stuff...
    posted by FishBike at 4:12 PM on August 21, 2009


    It'd be interesting to do a token comparison between tags and titles for posts. See where they intersect and where they don't.

    Just for the blue, top 20 tags which have been used at least 10 times, in order of percentage of the time that the post title contains the same text as the tag name:
    84.62%: a
    63.64%: foster
    61.54%: salt
    61.11%: beard
    60.00%: pie
    60.00%: saint
    60.00%: mud
    60.00%: night
    56.25%: hat
    54.55%: Banksy
    54.55%: faq
    54.55%: hammer
    54.55%: hummer
    54.55%: Tintin
    54.17%: bacon
    53.85%: rain
    53.33%: super
    51.28%: face
    50.00%: bigfoot
    50.00%: self
    I was going to do the opposite end of this list, too, but there are 1,492 tags which have been used more than 10 times without ever appearing in the post title.

    But the following tags have been used 200 or more times, and yet appear in the post title only 0 or 1 times:
    0.00%: sciencefiction
    0.00%: NewYork
    0.00%: unitedstates
    0.00%: newyorktimes
    0.00%: middleeast
    0.00%: iraqwar
    0.00%: brokenlink
    0.00%: globalwarming
    0.00%: brokenlinks
    0.00%: GeorgeWBush
    0.00%: webdesign
    0.00%: deadlink
    0.18%: obituary
    0.19%: batshitinsane
    0.21%: 9-11
    0.21%: georgebush
    0.28%: gwb
    0.31%: newsfilter
    0.32%: nytimes
    0.45%: starwars


    The latter mainly looks like an artifact of tags being condensed to single words, whereas post titles contain spaces.
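
    A sketch of the comparison, assuming a plain case-insensitive substring test of tag against title (which would also explain why a one-letter tag like "a" tops the first list). The input shape is made up, and the optional `ignore_spaces` flag is an addition for illustration: it strips whitespace from the title first so a run-together tag can match a multi-word title.

```python
from collections import defaultdict

def tag_title_overlap(posts, min_uses=10, ignore_spaces=False):
    """Percent of a tag's posts whose title contains the tag text.

    posts: iterable of (title, tags) pairs
    A hit is a case-insensitive substring match of the tag inside the title.
    With ignore_spaces=True the title is compared with its whitespace removed.
    Returns (percent, tag, hits, uses) rows, highest percent first.
    """
    uses = defaultdict(int)
    hits = defaultdict(int)
    for title, tags in posts:
        haystack = title.casefold()
        if ignore_spaces:
            haystack = "".join(haystack.split())
        for tag in set(tags):
            uses[tag] += 1
            if tag.casefold() in haystack:
                hits[tag] += 1
    rows = [(100.0 * hits[t] / n, t, hits[t], n) for t, n in uses.items() if n >= min_uses]
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows

# Made-up posts, with the threshold lowered to suit the tiny sample.
posts = [("New York Stories", ["NewYork", "a"]),
         ("A pie chart", ["pie", "a"]),
         ("Nothing", ["NewYork", "pie"])]
strict = tag_title_overlap(posts, min_uses=2)
loose = tag_title_overlap(posts, min_uses=2, ignore_spaces=True)
```

    In the strict run "NewYork" never matches; in the loose run it matches "New York Stories". Matching on whole words rather than substrings would be the next refinement, to keep short tags from matching inside longer words.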
    posted by FishBike at 4:28 PM on August 21, 2009


    Metafilter: hammer hummer tintin bacon, super face bigfoot self.
    posted by cortex (staff) at 4:33 PM on August 21, 2009 [1 favorite]


    I can't wait to make my next post, entitled "Brokenlink Sciencefiction Newsfilter: GeorgeWBush Obituary Batshitinsane."
    posted by dersins at 4:35 PM on August 21, 2009


    Metafilter: hammer hummer tintin bacon, super face bigfoot self.

    Ok, that made me laugh. But do I want to know what a "foster salt beard pie" is? Because it sounds terrible.
    posted by FishBike at 4:35 PM on August 21, 2009


    foster salt beard pie. Australian for cunnilingus.
    posted by dersins at 4:38 PM on August 21, 2009 [3 favorites]


    FishBike, if you're not too overwhelmed, you can "do me" too :)
    posted by amyms at 11:23 PM on August 20


    (Not overwhelmed, it's a couple of minutes effort from me and the rest from the machine. What? You all have dirty minds!)

    Stats for: amyms
    Who does amyms favorite the most?
    (simple count of favorites)

    flapjax at midnite [26]
    nickyskye [25]
    madamjujujive [21]
    scody [14]
    Miko [13]
    Blazecock Pileon [11]
    Astro Zombie [11]
    ericb [9]
    fourcheesemac [9]
    EarBucket [9]

    Who does amyms favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    1.53% (21 of 1376) of madamjujujive's comments+posts
    1.04% (6 of 578) of melissa may's comments+posts
    0.67% (7 of 1038) of sleepy pete's comments+posts
    0.66% (8 of 1207) of maryh's comments+posts
    0.63% (5 of 795) of anastasiav's comments+posts
    0.59% (6 of 1013) of Secret Life of Gravy's comments+posts
    0.51% (9 of 1750) of EarBucket's comments+posts
    0.48% (25 of 5180) of nickyskye's comments+posts
    0.40% (14 of 3489) of scody's comments+posts
    0.39% (5 of 1275) of jonp72's comments+posts

    Who favorites amyms the most?
    (simple count of favorites)

    nickyskye [32]
    nicolin [31]
    tehloki [26]
    divabat [26]
    melorama [20]
    limeonaire [20]
    agregoli [20]
    Tennyson D'San [16]
    UbuRoivas [16]
    deborah [16]

    Who favorites amyms the most?
    (percent of your comments+posts since they joined)

    Joe Beese: 1.57% (8 of 509) of amyms's comments+posts
    nicolin: 1.06% (31 of 2925) of amyms's comments+posts
    Marisa Stole the Precious Thing: 0.95% (11 of 1156) of amyms's comments+posts
    nickyskye: 0.87% (32 of 3698) of amyms's comments+posts
    so_gracefully: 0.72% (2 of 276) of amyms's comments+posts
    inire: 0.72% (2 of 276) of amyms's comments+posts
    divabat: 0.70% (26 of 3698) of amyms's comments+posts
    tehloki: 0.70% (26 of 3698) of amyms's comments+posts
    LittleMissItneg: 0.65% (1 of 155) of amyms's comments+posts
    marchismo: 0.65% (1 of 155) of amyms's comments+posts

    Who are amyms's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    amyms [25] ---- [32] nickyskye
    amyms [21] ---- [12] madamjujujive
    amyms [26] ---- [7] flapjax at midnite
    amyms [7] ---- [11] Marisa Stole the Precious Thing
    amyms [9] ---- [7] fourcheesemac
    amyms [13] ---- [7] Miko
    amyms [6] ---- [11] loquacious
    amyms [7] ---- [6] sleepy pete
    amyms [6] ---- [8] psmealey
    amyms [5] ---- [8] grapefruitmoon

    Who are amyms's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    amyms [0.48%] ---- [0.87%] nickyskye
    amyms [3.03%] ---- [0.36%] dolca
    amyms [1.53%] ---- [0.32%] madamjujujive
    amyms [0.37%] ---- [0.32%] scblackman
    amyms [0.27%] ---- [0.35%] GaelFC
    amyms [0.30%] ---- [0.26%] [NOT HERMITOSIS-IST]
    amyms [0.29%] ---- [0.25%] prefpara
    amyms [0.79%] ---- [0.25%] Askiba
    amyms [0.23%] ---- [0.35%] MaryDellamorte
    amyms [0.50%] ---- [0.22%] Dillonlikescookies

    Of the threads where amyms has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 15.7% [390 of 2485]
    languagehat: 14.1% [351 of 2485]
    jessamyn: 13.7% [341 of 2485]
    quin: 13.3% [331 of 2485]
    flapjax at midnite: 12.4% [307 of 2485]
    The Whelk: 12.2% [60 of 492]
    Blazecock Pileon: 12.0% [297 of 2485]
    Joe Beese: 11.8% [46 of 391]
    Brandon Blatcher: 11.3% [280 of 2485]
    Astro Zombie: 11.2% [279 of 2485]

    Of the threads where other users have been active, in whose has amyms also been the most active by percentage?
    (limited to threads active after amyms has joined MetaFilter)

    maggieb: 19.6% [10 of 51]
    django_z: 18.8% [12 of 64]
    Duncan: 17.6% [13 of 74]
    bigbigdog: 16.4% [9 of 55]
    ltracey: 16.2% [12 of 74]
    MarvinTheCat: 16.1% [9 of 56]
    h00py: 16.0% [52 of 324]
    Miss Otis' Egrets: 15.8% [9 of 57]
    mmahaffie: 15.2% [25 of 165]
    event: 15.1% [8 of 53]

    Who has favorited the same items as amyms the most?

    blueberry [120]
    tehloki [110]
    deborah [82]
    madamjujujive [82]
    schyler523 [80]
    nickyskye [79]
    dejah420 [76]
    flibbertigibbet [72]
    LobsterMitten [72]
    scody [71]

    posted by FishBike at 5:34 PM on August 21, 2009 [1 favorite]


    O! O! O! me next! me next!
    posted by The Whelk at 5:37 PM on August 21, 2009


    Stats for: The Whelk
    Who does The Whelk favorite the most?
    (simple count of favorites)

    Artw [15]
    not_on_display [9]
    Astro Zombie [9]
    Joe Beese [8]
    Pastabagel [8]
    tkchrist [8]
    jb [8]
    jessamyn [8]
    cortex [8]
    Marisa Stole the Precious Thing [7]

    Who does The Whelk favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    1.10% (8 of 727) of Pastabagel's comments+posts
    0.99% (8 of 805) of jb's comments+posts
    0.95% (7 of 737) of Greg Nog's comments+posts
    0.89% (5 of 560) of Lipstick Thespian's comments+posts
    0.80% (7 of 875) of dirtynumbangelboy's comments+posts
    0.76% (7 of 921) of JHarris's comments+posts
    0.69% (7 of 1014) of Navelgazer's comments+posts
    0.67% (8 of 1203) of tkchrist's comments+posts
    0.59% (6 of 1022) of hermitosis's comments+posts
    0.55% (9 of 1646) of not_on_display's comments+posts

    Who favorites The Whelk the most?
    (simple count of favorites)

    JHarris [59]
    Joe Beese [44]
    liza [43]
    tehloki [43]
    Artw [34]
    nasreddin [32]
    Pope Guilty [31]
    deborah [29]
    misha [29]
    darkstar [28]

    Who favorites The Whelk the most?
    (percent of your comments+posts since they joined)

    JHarris: 2.04% (59 of 2887) of The Whelk's comments+posts
    Joe Beese: 1.91% (44 of 2308) of The Whelk's comments+posts
    tehloki: 1.49% (43 of 2887) of The Whelk's comments+posts
    liza: 1.49% (43 of 2887) of The Whelk's comments+posts
    polymodus: 1.29% (2 of 155) of The Whelk's comments+posts
    Artw: 1.18% (34 of 2887) of The Whelk's comments+posts
    nasreddin: 1.11% (32 of 2887) of The Whelk's comments+posts
    Pope Guilty: 1.07% (31 of 2887) of The Whelk's comments+posts
    deborah: 1.00% (29 of 2887) of The Whelk's comments+posts
    misha: 1.00% (29 of 2887) of The Whelk's comments+posts

    Who are The Whelk's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    The Whelk [15] ---- [34] Artw
    The Whelk [9] ---- [21] not_on_display
    The Whelk [8] ---- [44] Joe Beese
    The Whelk [7] ---- [59] JHarris
    The Whelk [7] ---- [23] dirtynumbangelboy
    The Whelk [7] ---- [9] Navelgazer
    The Whelk [7] ---- [20] Marisa Stole the Precious Thing
    The Whelk [6] ---- [11] Kattullus
    The Whelk [6] ---- [9] hermitosis
    The Whelk [6] ---- [27] Blazecock Pileon

    Who are The Whelk's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    The Whelk [1.06%] ---- [0.94%] LMGM
    The Whelk [0.80%] ---- [0.80%] dirtynumbangelboy
    The Whelk [0.76%] ---- [2.04%] JHarris
    The Whelk [0.70%] ---- [0.83%] minifigs
    The Whelk [0.55%] ---- [0.73%] not_on_display
    The Whelk [0.54%] ---- [1.11%] nasreddin
    The Whelk [0.44%] ---- [0.62%] Rock Steady
    The Whelk [2.63%] ---- [0.43%] Philby
    The Whelk [0.73%] ---- [0.42%] MrVisible
    The Whelk [0.89%] ---- [0.39%] Lipstick Thespian

    Of the threads where The Whelk has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    DU: 31.9% [441 of 1382]
    Joe Beese: 26.5% [291 of 1099]
    quin: 20.5% [283 of 1382]
    Astro Zombie: 20.0% [277 of 1382]
    Blazecock Pileon: 19.5% [269 of 1382]
    Artw: 18.5% [255 of 1382]
    filthy light thief: 18.4% [247 of 1342]
    Brandon Blatcher: 17.2% [238 of 1382]
    delmoi: 17.1% [237 of 1382]
    Marisa Stole the Precious Thing: 17.1% [236 of 1382]

    Of the threads where other users have been active, in whose has The Whelk also been the most active by percentage?
    (limited to threads active after The Whelk has joined MetaFilter)

    leftcoastbob: 34.0% [17 of 50]
    kanewai: 31.0% [18 of 58]
    octobersurprise: 31.0% [48 of 155]
    lord_wolf: 30.8% [20 of 65]
    h00py: 30.6% [34 of 111]
    flipyourwig: 29.7% [22 of 74]
    Mr. Bad Example: 29.0% [40 of 138]
    Servo5678: 28.6% [16 of 56]
    vbfg: 28.0% [21 of 75]
    An Infinity Of Monkeys: 27.6% [37 of 134]

    Who has favorited the same items as The Whelk the most?

    tehloki [76]
    JHarris [69]
    nasreddin [62]
    Caduceus [60]
    Pope Guilty [55]
    lalochezia [49]
    deborah [48]
    Marisa Stole the Precious Thing [48]
    minifigs [47]
    burnmp3s [46]


    posted by FishBike at 5:51 PM on August 21, 2009


    Ah, just as I thought. I'm downright cliquish.
    posted by The Whelk at 5:58 PM on August 21, 2009


    FishBike: "First, what do MeFites like? This is average number of favorites given to posts with the following key words, separately for each site. Limited to tags that have been used at least 10 times.
    MetaFilter
    51.20: 90s [512 favorites on 10 posts)
    41.09: cartoonnetwork [452 favorites on 11 posts)
    "
    Oh man. Something tells me that a certain epic post I'm planning for next month is going to be pretty well-received.

    PS: As long as you're taking requests, FishBike, I'd like to see my mutual activity stats too. Mainly so I know who to stop favoriting in order to balance my Favorites Feng Shui.
    posted by Rhaomi at 6:00 PM on August 21, 2009


    Stats for: Rhaomi
    Who does Rhaomi favorite the most?
    (simple count of favorites)

    homunculus [15]
    Astro Zombie [12]
    ericb [11]
    empath [11]
    Pastabagel [9]
    East Manitoba Regional Junior Kabaddi Champion '94 [9]
    ColdChef [9]
    allkindsoftime [8]
    orthogonality [8]
    WCityMike [8]

    Who does Rhaomi favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.74% (6 of 807) of Shepherd's comments+posts
    0.74% (9 of 1220) of ColdChef's comments+posts
    0.72% (8 of 1112) of allkindsoftime's comments+posts
    0.69% (9 of 1306) of East Manitoba Regional Junior Kabaddi Champion '94's comments+posts
    0.58% (5 of 866) of XQUZYPHYR's comments+posts
    0.51% (9 of 1775) of Pastabagel's comments+posts
    0.50% (6 of 1205) of robocop is bleeding's comments+posts
    0.46% (7 of 1530) of Greg Nog's comments+posts
    0.38% (8 of 2102) of WCityMike's comments+posts
    0.35% (5 of 1419) of EarBucket's comments+posts

    Who favorites Rhaomi the most?
    (simple count of favorites)

    JHarris [29]
    tehloki [28]
    blueberry [28]
    flatluigi [28]
    ShawnStruck [26]
    Pope Guilty [20]
    liza [18]
    Caduceus [16]
    iamkimiam [15]
    Nattie [15]

    Who favorites Rhaomi the most?
    (percent of your comments+posts since they joined)

    This Guy: 2.73% (3 of 110) of Rhaomi's comments+posts
    JHarris: 1.93% (29 of 1503) of Rhaomi's comments+posts
    blueberry: 1.86% (28 of 1503) of Rhaomi's comments+posts
    tehloki: 1.86% (28 of 1503) of Rhaomi's comments+posts
    flatluigi: 1.86% (28 of 1503) of Rhaomi's comments+posts
    ShawnStruck: 1.73% (26 of 1503) of Rhaomi's comments+posts
    Pope Guilty: 1.33% (20 of 1503) of Rhaomi's comments+posts
    liza: 1.23% (18 of 1463) of Rhaomi's comments+posts
    Potomac Avenue: 1.16% (10 of 862) of Rhaomi's comments+posts
    filthy light thief: 1.11% (8 of 720) of Rhaomi's comments+posts

    Who are Rhaomi's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Rhaomi [9] ---- [10] East Manitoba Regional Junior Kabaddi Champion '94
    Rhaomi [7] ---- [13] Marisa Stole the Precious Thing
    Rhaomi [6] ---- [12] Artw
    Rhaomi [5] ---- [8] Kattullus
    Rhaomi [9] ---- [5] ColdChef
    Rhaomi [6] ---- [5] Shepherd
    Rhaomi [5] ---- [7] scody
    Rhaomi [5] ---- [8] delmoi
    Rhaomi [5] ---- [13] EarBucket
    Rhaomi [5] ---- [10] goodnewsfortheinsane

    Who are Rhaomi's top 10 mutual favorites?
    (by percentage favorited of others' posts since joining)

    Rhaomi [0.84%] ---- [1.73%] ShawnStruck
    Rhaomi [0.69%] ---- [0.77%] shadytrees
    Rhaomi [0.68%] ---- [0.69%] litterateur
    Rhaomi [0.69%] ---- [0.67%] East Manitoba Regional Junior Kabaddi Champion '94
    Rhaomi [1.05%] ---- [0.53%] kosher_jenny
    Rhaomi [0.57%] ---- [0.51%] Captain Cardanthian!
    Rhaomi [0.49%] ---- [0.60%] crossoverman
    Rhaomi [0.88%] ---- [0.47%] chrismear
    Rhaomi [0.41%] ---- [0.67%] madamjujujive
    Rhaomi [0.41%] ---- [0.60%] scrump

    Of the threads where Rhaomi has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    DU: 25.8% [264 of 1022]
    quin: 24.1% [246 of 1022]
    Blazecock Pileon: 20.1% [205 of 1022]
    filthy light thief: 17.8% [101 of 566]
    The Whelk: 17.2% [105 of 612]
    Joe Beese: 17.0% [75 of 440]
    delmoi: 16.9% [173 of 1022]
    cortex: 16.3% [167 of 1022]
    hippybear: 15.6% [37 of 237]
    Astro Zombie: 14.9% [152 of 1022]

    Of the threads where other users have been active, in whose has Rhaomi also been the most active by percentage?
    (limited to threads active after Rhaomi has joined MetaFilter)

    leftcoastbob: 30.7% [23 of 75]
    vac2003: 20.0% [13 of 65]
    JoeXIII007: 19.6% [22 of 112]
    joedan: 19.4% [12 of 62]
    adamt: 19.0% [12 of 63]
    thewittyname: 18.2% [16 of 88]
    effwerd: 18.2% [28 of 154]
    hellbient: 16.7% [10 of 60]
    Combustible Edison Lighthouse: 15.4% [27 of 175]
    Surfurrus: 15.4% [14 of 91]

    Who has favorited the same items as Rhaomi the most?

    JHarris [142]
    schyler523 [121]
    tehloki [114]
    flibbertigibbet [113]
    limeonaire [108]
    graventy [105]
    BrotherCaine [100]
    iamkimiam [98]
    scrump [98]
    deborah [97]
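
    For anyone wondering how the "mutual favorites (by simple count)" list above is ordered: it looks like each pair is ranked by the smaller of the two counts. Here is a minimal sketch of that idea, not FishBike's actual script. The `favorites` list of (favoriter, favoritee) pairs is a hypothetical in-memory stand-in for the Infodump favorites data, and the alphabetical tie-break is an assumption.

```python
from collections import Counter


def top_mutual_favorites(user, favorites, limit=10):
    """Rank other users by the smaller of the two favorite counts.

    `favorites` is a hypothetical list of (favoriter, favoritee) pairs.
    Returns (other_user, given_by_user, received_by_user) tuples.
    """
    given = Counter(to for frm, to in favorites if frm == user and to != user)
    received = Counter(frm for frm, to in favorites if to == user and frm != user)

    # Only users with favorites flowing in both directions count as mutual.
    mutual = [
        (other, given[other], received[other])
        for other in given.keys() & received.keys()
    ]
    # Larger "weakest direction" first; ties broken by name (an assumption).
    mutual.sort(key=lambda row: (-min(row[1], row[2]), row[0]))
    return mutual[:limit]


# Invented sample data: a->b x3, b->a x1, a<->c x2 each way, a->d x5 (one-way).
sample = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("a", "c")] * 2 + [("c", "a")] * 2
    + [("a", "d")] * 5
)
ranking = top_mutual_favorites("a", sample)
```

    Ranking on the minimum keeps one-sided relationships (lots of favorites in one direction, almost none back) from floating to the top.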

    posted by FishBike at 6:13 PM on August 21, 2009


    Who's got two thumbs and favorites me the most (as a percentage of my contributions since they joined)? THIS GUY!

    Thanks, FishBike, this is great stuff. Apparently me and JHarris are of the same mind about a lot of things, since he both faves a lot of my stuff and a lot of the same stuff I do.

    posted by Rhaomi at 6:55 PM on August 21, 2009


    Thanks for doing my stats, FishBike!
    posted by amyms at 7:22 PM on August 21, 2009


    Can I ride?
    posted by contraption at 7:50 PM on August 21, 2009


    Thank you FishBike! That was interesting (though not terribly surprising).
    posted by Pope Guilty at 8:48 PM on August 21, 2009


    Okay, I've been watching all this scroll by in recent activity and I'm still riveted. I thought I didn't care about my stats but I was wrong.

    I'd love to see my mutual activity stuff, too, please, if you have time/inclination. Thanks!
    posted by rtha at 10:56 PM on August 21, 2009


    Does anyone else sing the Naaaaame Game every time someone is like "oh, do me!"?

    (e)rtha-e)rtha-bo-bertha-bananafanafofertha-me-mi-mo-(e)rtha-(e)rrrrrr-tha!
    whelk-whelk-bo-belk-bananafanafofelk-me-mi-mo-melk. whellll-ellk!
    contraption-contraption-bo-paption-bananafanafofaption-me-mi-mo-maption. contraaaaaaa-ption!
    posted by desuetude at 11:09 PM on August 21, 2009


    Can I ride?
    posted by contraption at 10:50 PM on August 21


    Sure!

    Stats for: contraption
    Who does contraption favorite the most?
    (simple count of favorites)

    Ambrosia Voyeur [73]
    loquacious [22]
    adipocere [11]
    hermitosis [10]
    Astro Zombie [9]
    jessamyn [9]
    cortex [8]
    ikkyu2 [5]
    Burhanistan [5]
    rtha [4]

    Who does contraption favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    1.22% (73 of 5991) of Ambrosia Voyeur's comments+posts
    0.60% (22 of 3656) of loquacious's comments+posts
    0.48% (11 of 2285) of adipocere's comments+posts
    0.32% (10 of 3167) of hermitosis's comments+posts
    0.17% (5 of 3014) of ikkyu2's comments+posts
    0.14% (9 of 6494) of Astro Zombie's comments+posts
    0.09% (9 of 9592) of jessamyn's comments+posts
    0.07% (5 of 7564) of Burhanistan's comments+posts
    0.05% (8 of 15688) of cortex's comments+posts

    Who favorites contraption the most?
    (simple count of favorites)

    Ambrosia Voyeur [18]
    tehloki [11]
    DevilsAdvocate [6]
    Rock Steady [4]
    grouse [4]
    grobstein [4]
    loquacious [3]
    divabat [3]
    rtha [3]
    melorama [3]

    Who favorites contraption the most?
    (percent of your comments+posts since they joined)

    Ambrosia Voyeur: 2.55% (18 of 706) of contraption's comments+posts
    tehloki: 1.56% (11 of 706) of contraption's comments+posts
    MesoFilter: 0.97% (1 of 103) of contraption's comments+posts
    ejazen: 0.97% (1 of 103) of contraption's comments+posts
    Think_Long: 0.97% (1 of 103) of contraption's comments+posts
    DevilsAdvocate: 0.85% (6 of 706) of contraption's comments+posts
    fantine: 0.65% (1 of 155) of contraption's comments+posts
    Joe Beese: 0.65% (1 of 155) of contraption's comments+posts
    quosimosaur: 0.65% (1 of 155) of contraption's comments+posts
    mumstheword: 0.63% (2 of 315) of contraption's comments+posts

    Who are contraption's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    contraption [73] ---- [18] Ambrosia Voyeur
    contraption [22] ---- [3] loquacious
    contraption [4] ---- [3] rtha
    contraption [3] ---- [2] mudpuppie
    contraption [3] ---- [2] klangklangston
    contraption [1] ---- [1] mek
    contraption [1] ---- [1] felix
    contraption [1] ---- [1] languagehat
    contraption [1] ---- [1] Kwine
    contraption [1] ---- [4] grouse

    Who are contraption's top 10 mutual favorites?
    (by percentage favorited of others' posts since joining)

    contraption [1.22%] ---- [2.55%] Ambrosia Voyeur
    contraption [0.60%] ---- [0.42%] loquacious
    contraption [0.25%] ---- [0.28%] mudpuppie
    contraption [0.16%] ---- [0.27%] Countess Elena
    contraption [0.19%] ---- [0.16%] Atom Eyes
    contraption [0.20%] ---- [0.14%] swift
    contraption [0.37%] ---- [0.14%] felix
    contraption [0.16%] ---- [0.14%] mek
    contraption [0.17%] ---- [0.14%] scrump
    contraption [0.28%] ---- [0.14%] Cat Pie Hurts

    Of the threads where contraption has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    Ambrosia Voyeur: 28.9% [143 of 495]
    turgid dahlia: 11.6% [25 of 215]
    klangklangston: 10.7% [53 of 495]
    jessamyn: 10.5% [52 of 495]
    cortex: 10.3% [51 of 495]
    Joe Beese: 10.0% [10 of 100]
    not_on_display: 9.8% [30 of 306]
    quin: 9.7% [48 of 495]
    mr_crash_davis mark II: Jazz Odyssey: 9.1% [6 of 66]
    Blazecock Pileon: 8.5% [42 of 495]

    Of the threads where other users have been active, in whose has contraption also been the most active by percentage?
    (limited to threads active after contraption has joined MetaFilter)

    tatiana wishbone: 7.4% [5 of 68]
    wzcx: 6.8% [8 of 118]
    bobot: 6.3% [4 of 64]
    lilnemo: 6.0% [3 of 50]
    theoddball: 6.0% [4 of 67]
    vaportrail: 5.9% [3 of 51]
    speicus: 5.8% [8 of 139]
    Gilbert: 5.7% [3 of 53]
    quintessencesluglord: 5.6% [4 of 71]
    flod logic: 5.6% [5 of 89]

    Who has favorited the same items as contraption the most?

    scrump [59]
    tehloki [51]
    Ambrosia Voyeur [40]
    deborah [36]
    flibbertigibbet [35]
    Pope Guilty [35]
    schyler523 [35]
    lalochezia [34]
    loquacious [32]
    nasreddin [28]
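
    The "who else has been active in the highest percentage" section above is basically a thread-overlap calculation. A rough sketch of how it could be done follows. The input shapes are hypothetical, and using one date per thread to decide whether it was "active after the comparison user joined" is a simplifying assumption.

```python
from collections import defaultdict


def co_activity(user, activity, joined):
    """Percentage of `user`'s threads in which each other user was also active.

    activity: iterable of (username, thread_id, thread_date) rows (hypothetical shape).
    joined:   dict of username -> join date, comparable with thread_date.
    Only threads dated on or after the other user's join date are counted.
    """
    threads_by_user = defaultdict(set)
    thread_date = {}
    for name, thread_id, date in activity:
        threads_by_user[name].add(thread_id)
        thread_date[thread_id] = date

    mine = threads_by_user.get(user, set())
    rows = []
    for other, theirs in threads_by_user.items():
        if other == user:
            continue
        # Threads the other user could have joined in on.
        eligible = {t for t in mine if thread_date[t] >= joined[other]}
        if not eligible:
            continue
        shared = len(eligible & theirs)
        rows.append((other, shared, len(eligible), 100.0 * shared / len(eligible)))

    # Highest percentage first, names as a tie-break.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows


# Invented sample: user "c" joined at time 4, so thread 1 (time 1) is excluded for them.
sample_activity = [
    ("a", 1, 1), ("a", 2, 5), ("a", 3, 9),
    ("b", 1, 1), ("b", 2, 5),
    ("c", 2, 5), ("c", 3, 9),
]
sample_joined = {"a": 0, "b": 0, "c": 4}
overlap_rows = co_activity("a", sample_activity, sample_joined)
```

    This is also why the "of N" denominator differs between rows in the output: newer users are only compared against the threads that came after they signed up.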


    posted by FishBike at 5:46 AM on August 22, 2009


    rtha: I thought I didn't care about my stats but I was wrong.


    Now you understand the power of the Dark Side...


    Incidentally, there's some odd stuff in the "who favorites rtha the most (by percentage)..." section that maybe needs some explaining. The comment+post count since the other user joined MetaFilter is an approximation: it's actually the count from the month in which they joined onward, because computing an exact count against every other user's join date is too slow.

    So that's why a bunch of recent users all have 132 as the count of posts+comments you've made since they joined. That's not many, and since they've favorited one or two things, they show up high on the list. If you like, I could try to re-run that section with some additional criteria (like only showing users who've given you 5+ favorites).
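
    To make the month-level approximation concrete, here is a small sketch of one way it could work (a guess at the approach, not the real script): pre-compute a running "contributions in this month or later" total once, then look it up by each favoriter's join month. The function names and the sample numbers are made up for illustration.

```python
from collections import Counter


def contributions_since_month(contribution_months):
    """Map each (year, month) to the number of contributions in that month or later.

    contribution_months: one (year, month) tuple per post or comment.
    """
    per_month = Counter(contribution_months)
    running = 0
    since = {}
    # Walk from the newest month backwards, accumulating a running total.
    for month in sorted(per_month, reverse=True):
        running += per_month[month]
        since[month] = running
    return since


def favorite_share(favorites_given, join_month, since):
    """Percent of contributions (since the favoriter's join month) that they favorited."""
    # If nothing was contributed in the join month itself, use the next month with data.
    candidates = [m for m in since if m >= join_month]
    if not candidates:
        return None
    return 100.0 * favorites_given / since[min(candidates)]


# Invented history: 50 contributions in June, 50 in July, 82 in August 2009.
months = [(2009, 6)] * 50 + [(2009, 7)] * 50 + [(2009, 8)] * 82
since = contributions_since_month(months)
share = round(favorite_share(2, (2009, 7), since), 2)
```

    Everyone who joined in the same month gets the same denominator, which is how several recent users can all end up with an identical count.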

    Stats for: rtha
    Who does rtha favorite the most?
    (simple count of favorites)

    jessamyn [89]
    cortex [80]
    Miko [57]
    languagehat [49]
    loquacious [45]
    Smedleyman [43]
    Astro Zombie [42]
    scody [36]
    Marisa Stole the Precious Thing [35]
    Ambrosia Voyeur [34]

    Who does rtha favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    5.05% (10 of 198) of DaShiv's comments+posts
    2.76% (22 of 796) of gingerbeer's comments+posts
    1.63% (19 of 1163) of Dee Xtrovert's comments+posts
    1.50% (7 of 468) of kyrademon's comments+posts
    1.43% (8 of 559) of winna's comments+posts
    1.38% (24 of 1742) of mrzarquon's comments+posts
    1.30% (17 of 1306) of padraigin's comments+posts
    1.19% (8 of 673) of melissa may's comments+posts
    1.03% (5 of 487) of ooga_booga's comments+posts
    1.01% (7 of 690) of mattdidthat's comments+posts

    Who favorites rtha the most?
    (simple count of favorites)

    Pope Guilty [39]
    tehloki [38]
    languagehat [33]
    scrump [29]
    deborah [28]
    marble [28]
    klangklangston [26]
    Miko [24]
    DevilsAdvocate [23]
    blueberry [21]

    Who favorites rtha the most?
    (percent of your comments+posts since they joined)

    Tsuga: 1.52% (2 of 132) of rtha's comments+posts
    Blue Jello Elf: 1.26% (4 of 317) of rtha's comments+posts
    Lexica: 1.16% (8 of 688) of rtha's comments+posts
    Pope Guilty: 0.88% (39 of 4415) of rtha's comments+posts
    tehloki: 0.83% (38 of 4602) of rtha's comments+posts
    Tooty McTootsalot: 0.76% (1 of 132) of rtha's comments+posts
    Wordwoman: 0.76% (1 of 132) of rtha's comments+posts
    Diagonalize: 0.76% (1 of 132) of rtha's comments+posts
    zer0render: 0.76% (1 of 132) of rtha's comments+posts
    Marisa Stole the Precious Thing: 0.73% (18 of 2461) of rtha's comments+posts

    Who are rtha's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    rtha [49] ---- [33] languagehat
    rtha [26] ---- [39] Pope Guilty
    rtha [57] ---- [24] Miko
    rtha [45] ---- [19] loquacious
    rtha [35] ---- [18] Marisa Stole the Precious Thing
    rtha [36] ---- [18] scody
    rtha [16] ---- [26] klangklangston
    rtha [34] ---- [16] Ambrosia Voyeur
    rtha [14] ---- [15] LobsterMitten
    rtha [14] ---- [18] Blazecock Pileon

    Who are rtha's top 10 mutual favorites?
    (by percentage favorited of others' posts since joining)

    rtha [0.72%] ---- [0.73%] Marisa Stole the Precious Thing
    rtha [2.44%] ---- [0.63%] fraula
    rtha [0.93%] ---- [0.63%] scrump
    rtha [0.57%] ---- [0.88%] Pope Guilty
    rtha [0.52%] ---- [0.71%] languagehat
    rtha [0.98%] ---- [0.52%] Miko
    rtha [0.86%] ---- [0.41%] loquacious
    rtha [0.96%] ---- [0.39%] scody
    rtha [1.00%] ---- [0.39%] h00py
    rtha [0.39%] ---- [0.35%] velvet winter

    Of the threads where rtha has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 19.5% [543 of 2787]
    Blazecock Pileon: 19.1% [532 of 2787]
    jessamyn: 18.9% [528 of 2787]
    quin: 18.6% [519 of 2787]
    Brandon Blatcher: 18.2% [508 of 2787]
    languagehat: 16.8% [467 of 2787]
    DU: 16.7% [453 of 2718]
    ericb: 15.5% [432 of 2787]
    klangklangston: 14.1% [393 of 2787]
    Astro Zombie: 14.1% [393 of 2787]

    Of the threads where other users have been active, in whose has rtha also been the most active by percentage?
    (limited to threads active after rtha has joined MetaFilter)

    gingerbeer: 31.1% [171 of 550]
    rjs: 27.8% [22 of 79]
    obloquy: 26.7% [20 of 75]
    shiu mai baby: 26.3% [55 of 209]
    every_one_needs_a_hug_sometimes: 25.4% [16 of 63]
    ArmyOfKittens: 24.4% [22 of 90]
    cybercoitus interruptus: 23.6% [65 of 276]
    double block and bleed: 23.2% [33 of 142]
    merelyglib: 22.8% [21 of 92]
    subbes: 22.6% [53 of 234]

    Who has favorited the same items as rtha the most?

    scrump [326]
    tehloki [272]
    schyler523 [254]
    Pope Guilty [245]
    deborah [243]
    scody [218]
    lalochezia [209]
    flibbertigibbet [200]
    blueberry [198]
    nooneyouknow [197]
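
    The "favorited the same items" list is the simplest of these to reproduce. A minimal sketch, assuming the favorites data can be read as (favoriter, item_id) pairs (a hypothetical shape, not the actual file layout):

```python
from collections import Counter, defaultdict


def shared_favorites(user, favorites, limit=10):
    """Count, per other user, how many items both they and `user` have favorited.

    favorites: iterable of (favoriter, item_id) pairs (hypothetical shape).
    """
    fans_by_item = defaultdict(set)
    for favoriter, item in favorites:
        fans_by_item[item].add(favoriter)

    overlap = Counter()
    for fans in fans_by_item.values():
        if user in fans:
            # Everyone else who favorited this item shares one favorite with `user`.
            overlap.update(fans - {user})
    return overlap.most_common(limit)


# Invented sample: "a" and "b" share items 1 and 2; "a" and "c" share item 1 only.
sample_favorites = [
    ("a", 1), ("b", 1), ("c", 1),
    ("a", 2), ("b", 2),
    ("b", 3), ("c", 3),
]
shared = shared_favorites("a", sample_favorites)
```

    Note that this is a raw count, so prolific favoriters will show up near the top of almost everybody's list.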


    posted by FishBike at 6:24 AM on August 22, 2009 [1 favorite]


    Some more fun with the tags now. I thought it might be fun to review what MeFites have been talking about on the blue for the past 10 years. So here are the tags with the most comments associated with them, by year. I've excluded "brokenlink" since it really has nothing to do with the subject of the posts, and tends to be in the #1 position for posts more than about 5 years old.
    1999

    webdesign: 17 comments on 18 posts
    internet: 15 comments on 21 posts
    redesign: 13 comments on 9 posts
    Mac: 9 comments on 8 posts
    search: 9 comments on 8 posts
    software: 7 comments on 10 posts
    apple: 7 comments on 9 posts
    MattDabrowski: 7 comments on 7 posts
    design: 7 comments on 7 posts
    shopping: 6 comments on 8 posts

    2000

    politics: 3363 comments on 287 posts
    metafilterhistory: 1962 comments on 21 posts
    usa: 1737 comments on 149 posts
    blogs: 1695 comments on 102 posts
    election: 1693 comments on 144 posts
    webdesign: 1532 comments on 123 posts
    internet: 1504 comments on 180 posts
    algore: 1373 comments on 102 posts
    elections: 1351 comments on 119 posts
    bush: 1232 comments on 83 posts

    2001

    9-11: 6953 comments on 303 posts
    Politics: 6655 comments on 305 posts
    terrorism: 5553 comments on 267 posts
    music: 5297 comments on 253 posts
    usa: 5091 comments on 225 posts
    911: 4705 comments on 219 posts
    Bush: 4589 comments on 171 posts
    metafilterhistory: 3892 comments on 22 posts
    war: 3720 comments on 136 posts
    Television: 3668 comments on 175 posts

    2002

    music: 9461 comments on 373 posts
    politics: 9050 comments on 312 posts
    terrorism: 6306 comments on 211 posts
    war: 5712 comments on 195 posts
    Iraq: 5459 comments on 149 posts
    Bush: 5167 comments on 137 posts
    usa: 4778 comments on 168 posts
    religion: 4334 comments on 131 posts
    television: 3817 comments on 150 posts
    Art: 3730 comments on 252 posts

    2003

    iraq: 14442 comments on 371 posts
    politics: 10073 comments on 308 posts
    war: 9489 comments on 274 posts
    music: 8690 comments on 358 posts
    bush: 6827 comments on 170 posts
    USA: 5420 comments on 159 posts
    Iraqwar: 5322 comments on 142 posts
    georgewbush: 4168 comments on 86 posts
    art: 4125 comments on 334 posts
    religion: 3799 comments on 99 posts

    2004

    politics: 17197 comments on 457 posts
    bush: 11597 comments on 236 posts
    iraq: 10615 comments on 269 posts
    music: 8802 comments on 410 posts
    war: 6694 comments on 163 posts
    election: 6427 comments on 143 posts
    georgebush: 6088 comments on 110 posts
    religion: 6018 comments on 134 posts
    USA: 5711 comments on 163 posts
    JohnKerry: 4827 comments on 72 posts

    2005

    iraq: 16671 comments on 286 posts
    politics: 16016 comments on 311 posts
    bush: 14111 comments on 198 posts
    music: 12930 comments on 477 posts
    war: 10597 comments on 202 posts
    katrina: 9737 comments on 142 posts
    Religion: 8914 comments on 147 posts
    art: 8762 comments on 494 posts
    science: 8195 comments on 235 posts
    terrorism: 6448 comments on 106 posts

    2006

    politics: 17098 comments on 301 posts
    music: 16189 comments on 506 posts
    Bush: 15105 comments on 201 posts
    iraq: 13937 comments on 221 posts
    war: 10673 comments on 191 posts
    video: 10603 comments on 299 posts
    art: 10128 comments on 493 posts
    batshitinsane: 9778 comments on 140 posts
    religion: 8134 comments on 133 posts
    youtube: 8055 comments on 195 posts

    2007

    music: 25556 comments on 682 posts
    youtube: 17031 comments on 405 posts
    politics: 15758 comments on 252 posts
    art: 11080 comments on 522 posts
    iraq: 10464 comments on 198 posts
    war: 9962 comments on 196 posts
    video: 9867 comments on 301 posts
    batshitinsane: 8404 comments on 125 posts
    Photography: 7479 comments on 277 posts
    science: 7310 comments on 200 posts

    2008

    politics: 29059 comments on 275 posts
    Obama: 27053 comments on 164 posts
    election: 24560 comments on 128 posts
    music: 21341 comments on 650 posts
    McCain: 19837 comments on 63 posts
    Palin: 16799 comments on 33 posts
    art: 11596 comments on 504 posts
    election2008: 11532 comments on 19 posts
    youtube: 10318 comments on 294 posts
    video: 9456 comments on 261 posts

    2009

    music: 10403 comments on 316 posts
    obama: 9688 comments on 113 posts
    politics: 9229 comments on 126 posts
    art: 7399 comments on 300 posts
    science: 6907 comments on 148 posts
    youtube: 6345 comments on 141 posts
    video: 5813 comments on 159 posts
    food: 5579 comments on 85 posts
    photography: 5230 comments on 184 posts
    film: 4963 comments on 127 posts
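
    For anyone who wants to try this at home, the per-year tag lists above boil down to a group-by. A rough sketch follows, with hypothetical in-memory inputs rather than the real Infodump column layout. Folding tags to lower case before grouping is an assumption on my part, and each post's full comment count is credited to every one of its tags.

```python
from collections import defaultdict

EXCLUDED_TAGS = {"brokenlink"}


def top_tags_by_year(posts, post_tags, limit=10):
    """Return {year: [(tag, total_comments, post_count), ...]} sorted by comments.

    posts:     dict of post_id -> (year, comment_count)   (hypothetical shape)
    post_tags: iterable of (post_id, tag) pairs           (hypothetical shape)
    """
    totals = defaultdict(lambda: [0, 0])  # (year, tag) -> [comments, posts]
    seen = set()
    for post_id, tag in post_tags:
        key = tag.lower()  # assumption: "Politics" and "politics" are the same tag
        if key in EXCLUDED_TAGS or post_id not in posts or (post_id, key) in seen:
            continue
        seen.add((post_id, key))
        year, comments = posts[post_id]
        totals[(year, key)][0] += comments
        totals[(year, key)][1] += 1

    by_year = defaultdict(list)
    for (year, tag), (comments, n_posts) in totals.items():
        by_year[year].append((tag, comments, n_posts))

    return {
        year: sorted(rows, key=lambda r: (-r[1], r[0]))[:limit]
        for year, rows in by_year.items()
    }


# Invented sample data.
sample_posts = {1: (2008, 100), 2: (2008, 50), 3: (2009, 10)}
sample_tags = [(1, "politics"), (1, "Obama"), (2, "Politics"), (2, "brokenlink"), (3, "music")]
tag_table = top_tags_by_year(sample_posts, sample_tags)
```

    Because a post's comments count toward all of its tags, a single huge thread can push several of its tags into the list at once.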

    posted by FishBike at 6:38 AM on August 22, 2009 [1 favorite]


    And now the same thing for favorites, which look like they took off in 2006 (although there are small numbers of them for posts back to 1999). Since there are only 4 years to show, I'll make these top-20 lists instead.
    2006

    music: 2495 favorites on 506 posts
    art: 1950 favorites on 493 posts
    video: 1296 favorites on 299 posts
    history: 902 favorites on 229 posts
    youtube: 851 favorites on 195 posts
    photography: 826 favorites on 258 posts
    science: 673 favorites on 192 posts
    film: 543 favorites on 145 posts
    TV: 452 favorites on 67 posts
    books: 440 favorites on 89 posts
    animation: 410 favorites on 90 posts
    literature: 405 favorites on 64 posts
    flash: 376 favorites on 188 posts
    games: 370 favorites on 122 posts
    illustration: 366 favorites on 50 posts
    streaming: 354 favorites on 11 posts
    comics: 352 favorites on 100 posts
    war: 339 favorites on 191 posts
    poetry: 322 favorites on 56 posts
    philosophy: 318 favorites on 40 posts

    2007

    music: 9175 favorites on 682 posts
    art: 4986 favorites on 522 posts
    youtube: 4595 favorites on 405 posts
    Photography: 3283 favorites on 277 posts
    video: 3180 favorites on 301 posts
    history: 3024 favorites on 250 posts
    film: 2102 favorites on 172 posts
    science: 2035 favorites on 200 posts
    games: 1730 favorites on 110 posts
    food: 1607 favorites on 102 posts
    design: 1473 favorites on 141 posts
    Comedy: 1419 favorites on 103 posts
    television: 1377 favorites on 94 posts
    tv: 1342 favorites on 104 posts
    animation: 1329 favorites on 125 posts
    game: 1320 favorites on 115 posts
    flash: 1301 favorites on 150 posts
    Books: 1275 favorites on 96 posts
    war: 1273 favorites on 196 posts
    humor: 1237 favorites on 122 posts

    2008

    music: 9386 favorites on 650 posts
    art: 6448 favorites on 504 posts
    history: 5116 favorites on 310 posts
    photography: 4744 favorites on 318 posts
    youtube: 4692 favorites on 294 posts
    video: 3833 favorites on 261 posts
    science: 3154 favorites on 224 posts
    games: 2917 favorites on 142 posts
    film: 2874 favorites on 184 posts
    politics: 2804 favorites on 275 posts
    Books: 2718 favorites on 130 posts
    Obama: 2681 favorites on 164 posts
    Literature: 2539 favorites on 109 posts
    game: 2265 favorites on 157 posts
    flash: 2179 favorites on 155 posts
    design: 2075 favorites on 141 posts
    election: 2006 favorites on 128 posts
    food: 1868 favorites on 98 posts
    television: 1812 favorites on 94 posts
    animation: 1796 favorites on 111 posts

    2009

    music: 5586 favorites on 316 posts
    art: 4016 favorites on 300 posts
    history: 3494 favorites on 176 posts
    video: 2823 favorites on 159 posts
    photography: 2714 favorites on 184 posts
    youtube: 2688 favorites on 141 posts
    science: 2309 favorites on 148 posts
    film: 2034 favorites on 127 posts
    food: 1722 favorites on 85 posts
    comics: 1442 favorites on 89 posts
    flash: 1410 favorites on 110 posts
    game: 1405 favorites on 101 posts
    books: 1383 favorites on 65 posts
    obama: 1324 favorites on 113 posts
    games: 1318 favorites on 78 posts
    movies: 1261 favorites on 69 posts
    literature: 1232 favorites on 60 posts
    tv: 1229 favorites on 63 posts
    animation: 1183 favorites on 72 posts
    politics: 1171 favorites on 126 posts

    posted by FishBike at 6:46 AM on August 22, 2009


    can you do me, too?
    posted by empath at 7:47 AM on August 22, 2009


    Thank you, FishBike! This is really interesting.

    sings/
    contraption and Ambrosia Voyeur
    sittin' in a tree
    k-i-s-s-i-n-g
    /sings
    posted by rtha at 8:06 AM on August 22, 2009


    So we've established that, sure, we're gonna talk about Iraq, but we're not gonna like it.
    posted by cortex (staff) at 8:24 AM on August 22, 2009 [1 favorite]


    Stats for: empath
    Who does empath favorite the most?
    (simple count of favorites)

    Astro Zombie [13]
    It's Raining Florence Henderson [9]
    klangklangston [8]
    loquacious [7]
    Alvy Ampersand [7]
    robocop is bleeding [6]
    felix betachat [6]
    geoff. [5]
    dw [5]
    nasreddin [5]

    Who does empath favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.48% (5 of 1031) of gompa's comments+posts
    0.47% (6 of 1274) of felix betachat's comments+posts
    0.25% (6 of 2366) of robocop is bleeding's comments+posts
    0.22% (5 of 2250) of Navelgazer's comments+posts
    0.19% (5 of 2702) of geoff.'s comments+posts
    0.18% (5 of 2764) of nasreddin's comments+posts
    0.17% (5 of 2967) of dw's comments+posts
    0.16% (9 of 5682) of It's Raining Florence Henderson's comments+posts
    0.13% (13 of 9838) of Astro Zombie's comments+posts
    0.11% (7 of 6578) of Alvy Ampersand's comments+posts

    Who favorites empath the most?
    (simple count of favorites)

    tehloki [133]
    Pope Guilty [54]
    JHarris [54]
    blueberry [31]
    liza [30]
    wires [26]
    flatluigi [26]
    Joe Beese [25]
    jeffburdges [23]
    Senor Cardgage [23]

    Who favorites empath the most?
    (percent of your comments+posts since they joined)

    tehloki: 2.70% (133 of 4926) of empath's comments+posts
    Joe Beese: 1.67% (25 of 1496) of empath's comments+posts
    neewom: 1.52% (1 of 66) of empath's comments+posts
    kiwi-epitome: 1.52% (1 of 66) of empath's comments+posts
    Pope Guilty: 1.26% (54 of 4277) of empath's comments+posts
    liza: 0.98% (30 of 3075) of empath's comments+posts
    JHarris: 0.85% (54 of 6377) of empath's comments+posts
    Malice: 0.68% (4 of 584) of empath's comments+posts
    inire: 0.65% (5 of 767) of empath's comments+posts
    flatluigi: 0.61% (26 of 4277) of empath's comments+posts

    Who are empath's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    empath [7] ---- [8] loquacious
    empath [8] ---- [6] klangklangston
    empath [5] ---- [21] nasreddin
    empath [5] ---- [19] delmoi
    empath [4] ---- [14] DU
    empath [4] ---- [10] orthogonality
    empath [4] ---- [10] Potomac Avenue
    empath [7] ---- [4] Alvy Ampersand
    empath [4] ---- [9] shmegegge
    empath [4] ---- [11] Rhaomi

    Who are empath's top 10 mutual favorites?
    (by percentage favorited of others' posts since joining)

    empath [0.33%] ---- [0.33%] naju
    empath [2.22%] ---- [0.29%] Sebmojo
    empath [0.27%] ---- [0.32%] Rhaomi
    empath [0.23%] ---- [0.23%] MaryDellamorte
    empath [0.22%] ---- [0.60%] Potomac Avenue
    empath [0.28%] ---- [0.22%] penduluum
    empath [0.19%] ---- [0.24%] scrump
    empath [0.18%] ---- [0.40%] nasreddin
    empath [0.16%] ---- [0.51%] @troy
    empath [0.20%] ---- [0.16%] Richard Daly

    Of the threads where empath has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    delmoi: 29.3% [953 of 3250]
    DU: 23.6% [524 of 2221]
    Joe Beese: 22.1% [140 of 634]
    hippybear: 21.1% [70 of 331]
    Blazecock Pileon: 19.4% [623 of 3218]
    cortex: 18.7% [607 of 3250]
    filthy light thief: 18.2% [129 of 709]
    Astro Zombie: 18.1% [577 of 3186]
    The Whelk: 16.8% [125 of 742]
    quin: 16.4% [533 of 3250]

    Of the threads where other users have been active, in whose has empath also been the most active by percentage?
    (limited to threads active after empath has joined MetaFilter)

    MetaMan: 26.5% [18 of 68]
    theroadahead: 24.6% [17 of 69]
    aqhong: 24.0% [12 of 50]
    every_one_needs_a_hug_sometimes: 23.8% [15 of 63]
    Reggie Knoble: 23.4% [15 of 64]
    chance: 22.2% [26 of 117]
    avriette: 22.0% [11 of 50]
    The Loch Ness Monster: 21.8% [19 of 87]
    aihal: 21.8% [17 of 78]
    stumcg: 21.5% [17 of 79]

    Who has favorited the same items as empath the most?

    tehloki [94]
    scrump [78]
    Pope Guilty [72]
    schyler523 [63]
    nasreddin [55]
    graventy [49]
    JHarris [48]
    flibbertigibbet [47]
    iamkimiam [47]
    papafrita [44]


    posted by FishBike at 8:35 AM on August 22, 2009


    What we've been talking about on the blue this year, by month:
    1/2009

    Obama: 2959 comments on 37 posts
    music: 2157 comments on 53 posts
    politics: 1600 comments on 27 posts
    inauguration: 1518 comments on 12 posts
    Israel: 1431 comments on 7 posts
    Gaza: 1061 comments on 7 posts
    film: 1014 comments on 24 posts
    bush: 1010 comments on 14 posts
    photography: 980 comments on 28 posts
    youtube: 931 comments on 26 posts

    2/2009

    politics: 1728 comments on 19 posts
    economy: 1543 comments on 16 posts
    obama: 1273 comments on 13 posts
    music: 1076 comments on 34 posts
    art: 1002 comments on 33 posts
    batshitinsane: 842 comments on 11 posts
    Photography: 824 comments on 28 posts
    food: 822 comments on 16 posts
    tv: 794 comments on 11 posts
    movies: 716 comments on 11 posts

    3/2009

    music: 1928 comments on 59 posts
    Politics: 1723 comments on 20 posts
    art: 1461 comments on 58 posts
    science: 1171 comments on 27 posts
    YouTube: 1081 comments on 18 posts
    economy: 1066 comments on 13 posts
    video: 927 comments on 20 posts
    Obama: 916 comments on 11 posts
    women: 896 comments on 5 posts
    games: 847 comments on 16 posts

    4/2009

    gay: 1190 comments on 8 posts
    music: 1140 comments on 38 posts
    Obama: 1091 comments on 13 posts
    film: 946 comments on 17 posts
    America: 910 comments on 8 posts
    food: 845 comments on 12 posts
    video: 832 comments on 23 posts
    science: 824 comments on 21 posts
    movies: 770 comments on 7 posts
    tv: 725 comments on 10 posts

    5/2009

    music: 1585 comments on 43 posts
    torture: 1466 comments on 11 posts
    art: 1298 comments on 46 posts
    video: 1063 comments on 24 posts
    film: 1057 comments on 26 posts
    politics: 1037 comments on 14 posts
    murder: 958 comments on 4 posts
    Science: 932 comments on 17 posts
    ScienceFiction: 867 comments on 10 posts
    youtube: 853 comments on 15 posts

    6/2009

    batshitinsane: 1792 comments on 10 posts
    obit: 1651 comments on 10 posts
    politics: 1402 comments on 16 posts
    music: 1381 comments on 46 posts
    Iran: 1353 comments on 10 posts
    2012: 1255 comments on 3 posts
    obama: 1188 comments on 14 posts
    science: 1134 comments on 24 posts
    election: 1107 comments on 5 posts
    gender: 1096 comments on 4 posts

    7/2009

    sarahpalin: 1672 comments on 4 posts
    Palin: 1593 comments on 4 posts
    science: 1301 comments on 20 posts
    Obama: 1096 comments on 11 posts
    health: 1092 comments on 7 posts
    healthcare: 1079 comments on 6 posts
    food: 1023 comments on 14 posts
    race: 1006 comments on 2 posts
    police: 1004 comments on 2 posts
    africanamerican: 990 comments on 1 post

    8/2009

    murder: 621 comments on 2 posts
    Youtube: 589 comments on 8 posts
    autobiography: 535 comments on 1 post
    diary: 535 comments on 1 post
    gunman: 535 comments on 1 post
    journal: 535 comments on 1 post
    killer: 535 comments on 1 post
    shooter: 535 comments on 1 post
    healthcare: 464 comments on 5 posts
    slate: 453 comments on 3 posts

    Anybody want to see this by month for previous years, or for any of the other three sites?
    posted by FishBike at 8:41 AM on August 22, 2009


    MattDabrowski

    That's his name! cortex and I spent about five minutes trying unsuccessfully to remember that kid's name the other day.
    posted by Kattullus at 9:58 AM on August 22, 2009


    In case you were wondering, Matt Dabrowski is tdecius, who was one of the heaviest posters in the first year and kind of treated it like his own blog. This is before editorializing and ownblogging became bad things on MetaFilter.
    posted by Kattullus at 10:01 AM on August 22, 2009


    When Kattullus was in town for the 10th, we went and got coffee and he favorited me like three times.

    I DROVE HIS ASS to the 10th and has he favorited me? Of course not. Ungrateful hyper-intelligent moptop.
    posted by dw at 1:38 PM on August 22, 2009 [1 favorite]


    I should probably ask to have my stats done. But I'm also interested in another thing.

    I now have six MeFi contributions (posts + comments) with 50 or more favorites. I know that's nowhere near the top 20, but I'm wondering if there's something there in terms of tuning the MeFi Genius Index, though I'd probably want to parse the scoring for posts vs. comments -- since comments get favorited for being pithy where posts rarely are.
    posted by dw at 1:51 PM on August 22, 2009


    I should probably ask to have my stats done.

    Ask away! The process for me is as simple as changing a user name at the top of a script, clicking an icon to run the script, leaving it to do its thing for 15 minutes or so, and then cut-and-pasting the output into a comment here. If you or others would rather get the results privately, just send me some MeFi Mail and we'll do it that way.

    I now have six MeFi contributions (posts + comments) with 50 or more favorites. I know that's nowhere near the top 20, but I'm wondering if there's something there in terms of tuning the MeFi Genius Index, though I'd probably want to parse the scoring for posts vs. comments -- since comments get favorited for being pithy where posts rarely are.

    I'm not sure if you saw the comment from Rhaomi where the calculation of that index was defined. There don't seem to be any arbitrary numeric criteria in it that would be subject to tuning, unlike some of the other calculations here where we've set minimum numbers of posts, etc., to qualify.

    I would love it if we could find another name for that index, too. I guess calling it the f≥p index ("favorites greater than or equal to position index") would be more accurate, since the correlation between this index and "genius" has not been established. So, given that that's what it really is, what variation of it would you like to see calculated?
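
    For anyone who wants to experiment with variations, here is a minimal sketch of how an index like this might be computed. It assumes one reading of "favorites greater than or equal to position": sort a user's contributions by favorite count, highest first, and take the last position whose favorite count is at least as large as the position number. The exact definition from Rhaomi's comment is not reproduced here, and the function name `fp_index` is made up for illustration.

```python
def fp_index(favorite_counts):
    """Return the largest position p such that the p-th most-favorited
    contribution has at least p favorites (assumed reading of f>=p)."""
    ranked = sorted(favorite_counts, reverse=True)
    index = 0
    for position, favorites in enumerate(ranked, start=1):
        if favorites >= position:
            index = position
        else:
            # Counts only go down from here while positions go up,
            # so no later position can qualify.
            break
    return index
```

    Under this reading, a handful of contributions with 50+ favorites can't raise the index much by themselves, since each contribution only ever counts once no matter how many favorites it has. Scoring posts and comments separately would just mean calling the function on two separate lists.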
    posted by FishBike at 2:47 PM on August 22, 2009


    f≥p

    Looks sort of like FAP, but with the A disassembled. Yeah, The FAP Index!

    Also, do me? Please?
    posted by carsonb at 2:58 PM on August 22, 2009 [1 favorite]


    (apparently Ceiling Cat really is watching you)

    Stats for: carsonb
    Who does carsonb favorite the most?
    (simple count of favorites)

    loquacious [27]
    cortex [21]
    Kattullus [18]
    y2karl [15]
    madamjujujive [15]
    klangklangston [14]
    It's Raining Florence Henderson [14]
    Ambrosia Voyeur [14]
    Anonymous [13]
    goodnewsfortheinsane [13]

    Who does carsonb favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.41% (18 of 4420) of Kattullus's comments+posts
    0.33% (15 of 4506) of madamjujujive's comments+posts
    0.26% (27 of 10313) of loquacious's comments+posts
    0.24% (14 of 5941) of It's Raining Florence Henderson's comments+posts
    0.22% (14 of 6364) of Ambrosia Voyeur's comments+posts
    0.22% (13 of 5913) of goodnewsfortheinsane's comments+posts
    0.22% (8 of 3669) of not_on_display's comments+posts
    0.22% (6 of 2765) of vronsky's comments+posts
    0.20% (13 of 6654) of Anonymous's comments+posts
    0.19% (10 of 5250) of jonson's comments+posts

    Who favorites carsonb the most?
    (simple count of favorites)

    tehloki [47]
    muymuy [22]
    nicolin [18]
    not_on_display [16]
    nickyskye [14]
    misha [12]
    DevilsAdvocate [12]
    Ambrosia Voyeur [12]
    Pope Guilty [12]
    klangklangston [12]

    Who favorites carsonb the most?
    (percent of your comments+posts since they joined)

    tehloki: 1.53% (47 of 3074) of carsonb's comments+posts
    andrewcilento: 1.09% (1 of 92) of carsonb's comments+posts
    soft and hardcore taters: 1.09% (1 of 92) of carsonb's comments+posts
    not_on_display: 0.87% (16 of 1832) of carsonb's comments+posts
    muymuy: 0.73% (22 of 3004) of carsonb's comments+posts
    nicolin: 0.67% (18 of 2705) of carsonb's comments+posts
    smoke: 0.65% (1 of 153) of carsonb's comments+posts
    Antidisestablishmentarianist: 0.65% (1 of 153) of carsonb's comments+posts
    dabug: 0.65% (1 of 153) of carsonb's comments+posts
    ixohoxi: 0.65% (1 of 153) of carsonb's comments+posts

    Who are carsonb's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    carsonb [14] ---- [12] Ambrosia Voyeur
    carsonb [14] ---- [12] klangklangston
    carsonb [15] ---- [11] madamjujujive
    carsonb [10] ---- [14] nickyskye
    carsonb [8] ---- [16] not_on_display
    carsonb [27] ---- [8] loquacious
    carsonb [7] ---- [11] flapjax at midnite
    carsonb [6] ---- [11] vronsky
    carsonb [18] ---- [6] Kattullus
    carsonb [9] ---- [4] mathowie
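
    The "whoever has favorited the other the least" ordering above can be sketched roughly like this. It is only an illustration, not the script that produced these tables: the function name, the two dictionaries of counts, and the alphabetical tie-break are all assumptions.

```python
def top_mutual_favorites(given, received, limit=10):
    """Rank mutual favoriters by the smaller of the two counts.

    given:    {other_user: favorites the user gave to other_user}
    received: {other_user: favorites other_user gave to the user}
    Both shapes are hypothetical; only users present in both are mutual.
    """
    mutual = [
        (other, given[other], received[other])
        for other in given.keys() & received.keys()
    ]
    # Highest minimum first; ties broken by name (an arbitrary choice here).
    mutual.sort(key=lambda row: (-min(row[1], row[2]), row[0]))
    return mutual[:limit]
```

    Using the minimum means a lopsided pair like 27 vs. 8 ranks by the 8, which is why it lands below more evenly matched pairs.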

    Who are carsonb's top 10 mutual favorites?
    (by percentage favorited of each other's posts since joining)

    carsonb [0.33%] ---- [0.27%] madamjujujive
    carsonb [0.68%] ---- [0.26%] Del Far
    carsonb [0.22%] ---- [0.36%] Ambrosia Voyeur
    carsonb [0.22%] ---- [0.87%] not_on_display
    carsonb [0.22%] ---- [0.27%] vronsky
    carsonb [0.26%] ---- [0.20%] loquacious
    carsonb [0.39%] ---- [0.18%] flatluigi
    carsonb [0.18%] ---- [0.23%] ocherdraco
    carsonb [0.18%] ---- [0.18%] katillathehun
    carsonb [0.15%] ---- [0.33%] nickyskye

    Of the threads where carsonb has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 35.2% [891 of 2534]
    jessamyn: 24.7% [627 of 2534]
    languagehat: 20.6% [510 of 2474]
    klangklangston: 18.6% [454 of 2437]
    quin: 18.5% [458 of 2474]
    mathowie: 18.0% [455 of 2534]
    Alvy Ampersand: 17.2% [409 of 2376]
    loquacious: 16.5% [403 of 2444]
    stavrosthewonderchicken: 16.1% [409 of 2534]
    blue_beetle: 16.1% [398 of 2470]

    Of the threads where other users have been active, in whose has carsonb also been the most active by percentage?
    (limited to threads active after carsonb has joined MetaFilter)

    Ceiling Cat: 24.5% [13 of 53]
    pb: 18.9% [93 of 493]
    Kwine: 16.7% [147 of 880]
    Duncan: 15.9% [24 of 151]
    and hosted from Uranus: 15.9% [91 of 573]
    h00py: 15.1% [49 of 324]
    double block and bleed: 14.8% [21 of 142]
    team lowkey: 14.7% [99 of 673]
    SteveTheRed: 14.6% [36 of 246]
    phoque: 14.6% [21 of 144]

    Who has favorited the same items as carsonb the most?

    nicolin [99]
    nickyskye [69]
    flibbertigibbet [67]
    ifjuly [67]
    limeonaire [66]
    not_on_display [58]
    roll truck roll [57]
    deborah [56]
    tickingclock [55]
    divabat [54]


    posted by FishBike at 3:18 PM on August 22, 2009 [1 favorite]


    I decided to run the top 20 tags (based on number of comments in the threads they're used on), by month, from the beginning of MetaFilter until the end of July 2009 (last complete month). Of course, the results are too long to post here. So I put them on a web page instead.

    I present An Automated History of MetaFilter.
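
    A rough sketch of the kind of tally involved, for the curious. This is not the actual script; it assumes the post and tag data have already been joined into (month, tags, comment count) records, and the function name and the alphabetical tie-break are made up for illustration.

```python
from collections import defaultdict

def top_tags_by_month(posts, limit=20):
    """posts: iterable of (month, tags, comment_count) tuples, one per post.

    Returns {month: [(tag, total_comments, post_count), ...]} with the
    tags ranked by total comments on the posts they were used on.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for month, tags, comment_count in posts:
        for tag in set(tags):  # count each tag once per post
            entry = totals[month][tag]
            entry[0] += comment_count
            entry[1] += 1
    result = {}
    for month, tag_totals in totals.items():
        # Most comments first; ties broken alphabetically (an assumption).
        ranked = sorted(tag_totals.items(), key=lambda item: (-item[1][0], item[0]))
        result[month] = [(tag, c, p) for tag, (c, p) in ranked[:limit]]
    return result
```

    Each entry can then be printed as "tag: N comments on M posts". Note that one very busy thread puts all of its tags on the list at once, which is why several tags can show up with identical counts.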
    posted by FishBike at 3:26 PM on August 22, 2009


    Ha!
    posted by cortex (staff) at 3:52 PM on August 22, 2009


    FishBike, no worries if you're inundated with requests, but I'd love to see mine.

    This is all reading like a somewhat nerdy Dark Art, and it's delightful.
    posted by carbide at 6:42 PM on August 22, 2009


    carbide: This is all reading like a somewhat nerdy Dark Art, and it's delightful.

    Only somewhat? What on earth would we have to do to reach a nerdiness quotient of 1.0?

    Stats for: carbide
    Who does carbide favorite the most?
    (simple count of favorites)

    Anonymous [12]
    jessamyn [8]
    Greg Nog [6]
    Miko [5]
    Juliet Banana [5]
    scody [5]
    EmpressCallipygos [5]
    cortex [5]
    doobiedoo [4]
    drjimmy11 [4]

    Who does carbide favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    2.23% (8 of 358) of carbide's comments+posts
    0.97% (5 of 514) of Juliet Banana's comments+posts
    0.52% (6 of 1159) of Greg Nog's comments+posts
    0.40% (12 of 2988) of Anonymous's comments+posts
    0.21% (5 of 2374) of scody's comments+posts
    0.17% (5 of 2867) of EmpressCallipygos's comments+posts
    0.15% (8 of 5310) of jessamyn's comments+posts
    0.15% (5 of 3356) of Miko's comments+posts
    0.07% (5 of 7477) of cortex's comments+posts

    Who favorites carbide the most?
    (simple count of favorites)

    chicainthecity [7]
    yohko [4]
    arcticwoman [3]
    streetdreams [3]
    libraryhead [3]
    elisynn [2]
    melorama [2]
    hought20 [2]
    lunit [2]
    limeonaire [2]

    Who favorites carbide the most?
    (percent of your comments+posts since they joined)

    carbide: 2.23% (8 of 358) of carbide's comments+posts
    chicainthecity: 1.94% (7 of 360) of carbide's comments+posts
    inire: 1.72% (1 of 58) of carbide's comments+posts
    joseph conrad is fully awesome: 1.49% (2 of 134) of carbide's comments+posts
    ejazen: 1.25% (1 of 80) of carbide's comments+posts
    yohko: 1.11% (4 of 360) of carbide's comments+posts
    filthy light thief: 0.99% (2 of 202) of carbide's comments+posts
    macg02: 0.97% (1 of 103) of carbide's comments+posts
    Potomac Avenue: 0.88% (2 of 228) of carbide's comments+posts
    arcticwoman: 0.83% (3 of 360) of carbide's comments+posts

    Who are carbide's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    carbide [3] ---- [2] Potomac Avenue
    carbide [3] ---- [2] filthy light thief
    carbide [5] ---- [2] scody
    carbide [1] ---- [1] AceRock
    carbide [1] ---- [2] PhoBWanKenobi
    carbide [3] ---- [1] koeselitz
    carbide [1] ---- [1] RedEmma
    carbide [1] ---- [1] phrontist
    carbide [1] ---- [3] streetdreams
    carbide [6] ---- [1] Greg Nog

    Who are carbide's top 10 mutual favorites?
    (by percentage favorited of each other's posts since joining)

    carbide [2.23%] ---- [2.23%] carbide
    carbide [1.82%] ---- [0.83%] streetdreams
    carbide [0.86%] ---- [0.56%] blueberry
    carbide [0.56%] ---- [0.39%] Grlnxtdr
    carbide [0.38%] ---- [0.56%] bristolcat
    carbide [0.38%] ---- [0.44%] cranberrymonger
    carbide [2.20%] ---- [0.36%] Ira_
    carbide [0.30%] ---- [0.36%] minifigs
    carbide [1.12%] ---- [0.30%] doobiedoo
    carbide [1.04%] ---- [0.28%] Flying Squirrel

    Of the threads where carbide has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    jessamyn: 13.7% [41 of 299]
    Brandon Blatcher: 13.7% [41 of 299]
    desjardins: 12.0% [36 of 299]
    PhoBWanKenobi: 11.9% [28 of 236]
    Potomac Avenue: 11.7% [23 of 196]
    EmpressCallipygos: 11.3% [33 of 292]
    filthy light thief: 11.3% [21 of 186]
    DU: 11.0% [33 of 299]
    klangklangston: 10.7% [32 of 299]
    turgid dahlia: 10.2% [30 of 295]

    Of the threads where other users have been active, in whose has carbide also been the most active by percentage?
    (limited to threads active after carbide has joined MetaFilter)

    uxo: 8.3% [5 of 60]
    still_wears_a_hat: 8.0% [4 of 50]
    twistofrhyme: 7.8% [4 of 51]
    Miss Otis' Egrets: 7.0% [4 of 57]
    beccaj: 6.3% [5 of 80]
    fizzix: 6.0% [3 of 50]
    penguinliz: 6.0% [4 of 67]
    kosem: 5.8% [7 of 120]
    govtdrone: 5.8% [3 of 52]
    Eicats: 5.8% [3 of 52]

    Who has favorited the same items as carbide the most?

    limeonaire [49]
    flibbertigibbet [42]
    DulcineaX [41]
    ifjuly [41]
    yohko [38]
    hot soup girl [37]
    nooneyouknow [36]
    khaibit [34]
    scrump [32]
    nasreddin [31]


    posted by FishBike at 7:00 PM on August 22, 2009 [1 favorite]


    FishBike, I'd love to see mine, if you have time :D
    posted by ThePinkSuperhero at 7:59 PM on August 22, 2009 [1 favorite]


    It's too bad the HTML that would make this comment's text show up pink would be stripped out...

    Stats for: ThePinkSuperhero
    Who does ThePinkSuperhero favorite the most?
    (simple count of favorites)

    Stynxno [22]
    hermitosis [19]
    jessamyn [11]
    mullacc [11]
    Optimus Chyme [9]
    grumblebee [7]
    NDó [6]
    CunningLinguist [6]
    dame [6]
    Burhanistan [5]

    Who does ThePinkSuperhero favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    1.24% (22 of 1772) of Stynxno's comments+posts
    0.56% (11 of 1976) of mullacc's comments+posts
    0.46% (19 of 4117) of hermitosis's comments+posts
    0.44% (5 of 1134) of onlyconnect's comments+posts
    0.23% (5 of 2214) of bingo's comments+posts
    0.21% (6 of 2926) of dame's comments+posts
    0.20% (6 of 2944) of NDó's comments+posts
    0.18% (9 of 5053) of Optimus Chyme's comments+posts
    0.15% (7 of 4578) of grumblebee's comments+posts
    0.14% (6 of 4347) of CunningLinguist's comments+posts

    Who favorites ThePinkSuperhero the most?
    (simple count of favorites)

    davey_darling [7524]
    Stynxno [147]
    tehloki [76]
    blueberry [56]
    DevilsAdvocate [41]
    Nattie [39]
    grouse [38]
    nooneyouknow [38]
    melorama [34]
    chicainthecity [33]

    Who favorites ThePinkSuperhero the most?
    (percent of your comments+posts since they joined)

    davey_darling: 97.85% (7524 of 7689) of ThePinkSuperhero's comments+posts
    kiki_s: 3.42% (28 of 818) of ThePinkSuperhero's comments+posts
    Nattie: 2.05% (39 of 1899) of ThePinkSuperhero's comments+posts
    PhoBWanKenobi: 1.98% (22 of 1109) of ThePinkSuperhero's comments+posts
    Lexica: 1.93% (4 of 207) of ThePinkSuperhero's comments+posts
    Stynxno: 1.90% (147 of 7724) of ThePinkSuperhero's comments+posts
    tehloki: 1.56% (76 of 4885) of ThePinkSuperhero's comments+posts
    futureisunwritten: 1.45% (3 of 207) of ThePinkSuperhero's comments+posts
    oinopaponton: 1.22% (1 of 82) of ThePinkSuperhero's comments+posts
    wiretap: 1.22% (1 of 82) of ThePinkSuperhero's comments+posts

    Who are ThePinkSuperhero's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    ThePinkSuperhero [22] ---- [147] Stynxno
    ThePinkSuperhero [19] ---- [28] hermitosis
    ThePinkSuperhero [9] ---- [14] Optimus Chyme
    ThePinkSuperhero [5] ---- [9] klangklangston
    ThePinkSuperhero [5] ---- [11] Brandon Blatcher
    ThePinkSuperhero [5] ---- [7] onlyconnect
    ThePinkSuperhero [4] ---- [7524] davey_darling
    ThePinkSuperhero [4] ---- [38] grouse
    ThePinkSuperhero [4] ---- [10] [NOT HERMITOSIS-IST]
    ThePinkSuperhero [4] ---- [19] ColdChef

    Who are ThePinkSuperhero's top 10 mutual favorites?
    (by percentage favorited of each other's posts since joining)

    ThePinkSuperhero [1.24%] ---- [1.90%] Stynxno
    ThePinkSuperhero [0.63%] ---- [97.85%] davey_darling
    ThePinkSuperhero [0.61%] ---- [0.53%] [NOT HERMITOSIS-IST]
    ThePinkSuperhero [0.46%] ---- [0.39%] hermitosis
    ThePinkSuperhero [2.33%] ---- [0.37%] Never teh Bride
    ThePinkSuperhero [0.26%] ---- [0.34%] anniecat
    ThePinkSuperhero [0.35%] ---- [0.24%] Lipstick Thespian
    ThePinkSuperhero [0.56%] ---- [0.23%] taff
    ThePinkSuperhero [0.27%] ---- [0.22%] Locative
    ThePinkSuperhero [0.39%] ---- [0.22%] The1andonly

    Of the threads where ThePinkSuperhero has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    jessamyn: 23.2% [1013 of 4374]
    cortex: 19.8% [867 of 4374]
    kathrineg: 18.8% [24 of 128]
    languagehat: 17.1% [749 of 4374]
    Brandon Blatcher: 16.5% [721 of 4374]
    klangklangston: 14.9% [613 of 4122]
    jonmc: 14.2% [623 of 4374]
    turgid dahlia: 13.7% [143 of 1042]
    delmoi: 13.5% [591 of 4374]
    Alvy Ampersand: 13.4% [498 of 3728]

    Of the threads where other users have been active, in whose has ThePinkSuperhero also been the most active by percentage?
    (limited to threads active after ThePinkSuperhero has joined MetaFilter)

    Stynxno: 34.0% [430 of 1264]
    PugAchev: 33.3% [21 of 63]
    every_one_needs_a_hug_sometimes: 30.2% [19 of 63]
    Lola_G: 29.3% [22 of 75]
    kkokkodalk: 29.2% [81 of 277]
    lacedback: 27.3% [24 of 88]
    Help, I can't stop talking!: 25.5% [52 of 204]
    smallstatic: 25.3% [24 of 95]
    Duncan: 25.0% [37 of 148]
    Ceiling Cat: 24.5% [13 of 53]

    Who has favorited the same items as ThePinkSuperhero the most?

    tehloki [38]
    scrump [36]
    nooneyouknow [35]
    Nattie [34]
    blueberry [29]
    grouse [29]
    schyler523 [28]
    scody [28]
    deborah [27]
    nasreddin [26]

    posted by FishBike at 8:15 PM on August 22, 2009 [1 favorite]


    this is fun and I'd like to go next, FishBike.
    I'm a relatively quiet user, though; will the stats run okay as is or should I go favorite and get favorited some more?
    posted by heeeraldo at 10:08 PM on August 22, 2009

    Who favorites ThePinkSuperhero the most?
    (percent of your comments+posts since they joined)

    davey_darling: 97.85% (7524 of 7689) of ThePinkSuperhero's comments+posts

    Christ, davey_darling, pick up the pace already.
    posted by Rhaomi at 10:27 PM on August 22, 2009 [1 favorite]


    Ok, my inner metric junkie has been fighting it out with my anti-ego-search angel and, after a couple of days' worth of numeric temptations, the statistical demon finally won out. So if I may, FishBike, would you mind running me through the engine?
    posted by quin at 11:12 PM on August 22, 2009


    Please will you do me, fishbike?
    posted by Meatbomb at 11:16 PM on August 22, 2009


    FishBike sent me my results in a MeMail at my request. Thanks, bud!

    Geez, I'm really going to have to buy Blazecock a beer, aren't I?

    Well, I just hope he's a cheap date.

    Stats for: Cool Papa Bell

    Who does Cool Papa Bell favorite the most?
    (simple count of favorites)

    Blazecock Pileon [25]
    cortex [23]
    Brandon Blatcher [15]
    Astro Zombie [13]
    DU [12]
    Pastabagel [11]
    rokusan [10]
    tkchrist [10]
    Artw [9]
    klangklangston [9]

    Who does Cool Papa Bell favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.56% (5 of 896) of felix betachat's comments+posts
    0.53% (5 of 936) of Ynoxas's comments+posts
    0.42% (9 of 2123) of HuronBob's comments+posts
    0.42% (11 of 2598) of Pastabagel's comments+posts
    0.36% (5 of 1373) of Chocolate Pickle's comments+posts
    0.36% (5 of 1383) of XQUZYPHYR's comments+posts
    0.35% (6 of 1692) of Avenger's comments+posts
    0.29% (7 of 2378) of Flunkie's comments+posts
    0.29% (6 of 2045) of Greg Nog's comments+posts
    0.28% (5 of 1771) of Class Goat's comments+posts

    Who favorites Cool Papa Bell the most?
    (simple count of favorites)

    tehloki [70]
    liza [31]
    Pope Guilty [31]
    BrotherCaine [31]
    Joe Beese [28]
    JHarris [26]
    sebastienbailard [25]
    koeselitz [23]
    nasreddin [23]
    blueberry [23]

    Who favorites Cool Papa Bell the most?
    (percent of your comments+posts since they joined)

    Joe Beese: 2.58% (28 of 1084) of Cool Papa Bell's comments+posts
    tehloki: 1.79% (70 of 3905) of Cool Papa Bell's comments+posts
    Peztopiary: 1.75% (1 of 57) of Cool Papa Bell's comments+posts
    liza: 1.10% (31 of 2830) of Cool Papa Bell's comments+posts
    thsmchnekllsfascists: 1.01% (11 of 1084) of Cool Papa Bell's comments+posts
    inire: 0.97% (7 of 720) of Cool Papa Bell's comments+posts
    PhoBWanKenobi: 0.95% (18 of 1899) of Cool Papa Bell's comments+posts
    Pope Guilty: 0.80% (31 of 3872) of Cool Papa Bell's comments+posts
    BrotherCaine: 0.79% (31 of 3905) of Cool Papa Bell's comments+posts
    ThatRandomGuy: 0.78% (2 of 255) of Cool Papa Bell's comments+posts

    Who are Cool Papa Bell's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Cool Papa Bell [25] ---- [17] Blazecock Pileon
    Cool Papa Bell [10] ---- [19] rokusan
    Cool Papa Bell [9] ---- [22] Artw
    Cool Papa Bell [12] ---- [9] DU
    Cool Papa Bell [9] ---- [15] klangklangston
    Cool Papa Bell [7] ---- [8] quin
    Cool Papa Bell [9] ---- [6] jonmc
    Cool Papa Bell [6] ---- [28] Joe Beese
    Cool Papa Bell [5] ---- [7] Sys Rq
    Cool Papa Bell [5] ---- [5] KokuRyu

    Who are Cool Papa Bell's top 10 mutual favorites?
    (by percentage favorited of each other's posts since joining)

    Cool Papa Bell [0.48%] ---- [0.41%] the other side
    Cool Papa Bell [0.40%] ---- [0.57%] Reverend John
    Cool Papa Bell [0.52%] ---- [0.37%] benzenedream
    Cool Papa Bell [0.31%] ---- [0.31%] contessa
    Cool Papa Bell [0.29%] ---- [0.32%] jock@law
    Cool Papa Bell [0.27%] ---- [0.33%] Snyder
    Cool Papa Bell [0.27%] ---- [0.44%] Blazecock Pileon
    Cool Papa Bell [0.26%] ---- [0.31%] Inspector.Gadget
    Cool Papa Bell [0.55%] ---- [0.25%] Malice
    Cool Papa Bell [0.23%] ---- [0.46%] aeschenkarnos

    Of the threads where Cool Papa Bell has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    Blazecock Pileon: 19.2% [492 of 2562]
    Joe Beese: 17.2% [130 of 755]
    quin: 17.0% [435 of 2562]
    filthy light thief: 15.7% [143 of 909]
    DU: 15.5% [398 of 2562]
    delmoi: 14.4% [370 of 2562]
    Astro Zombie: 14.0% [359 of 2562]
    turgid dahlia: 13.7% [220 of 1602]
    cortex: 13.6% [349 of 2562]
    ericb: 12.8% [327 of 2562]

    Of the threads where other users have been active, in whose has Cool Papa Bell also been the most active by percentage?
    (limited to threads active after Cool Papa Bell has joined MetaFilter)

    mano: 24.2% [15 of 62]
    sloe: 21.6% [22 of 102]
    shiu mai baby: 20.2% [38 of 188]
    erikharmon: 20.0% [12 of 60]
    jennaratrix: 19.6% [10 of 51]
    Edgewise: 19.3% [29 of 150]
    aihal: 19.2% [15 of 78]
    L.P. Hatecraft: 18.8% [12 of 64]
    JoeXIII007: 18.8% [21 of 112]
    decagon: 18.3% [22 of 120]

    Who has favorited the same items as Cool Papa Bell the most?

    tehloki [108]
    Pope Guilty [101]
    scrump [88]
    blueberry [84]
    schyler523 [80]
    shmegegge [73]
    nasreddin [66]
    BrotherCaine [60]
    lalochezia [60]
    JHarris [52]

    posted by Cool Papa Bell at 12:51 AM on August 23, 2009


    If you have the time could you do me please?
    posted by minifigs at 2:45 AM on August 23, 2009


    I know this has been asked before, but can someone do a definitive "peak hours" analysis for the different subsites? At what hours of the day are there the most posts/questions, the most comments, and the most favorites? Maybe also extend the analysis to days of the week or days of the month? It would be interesting to see if there are anomalies like certain subsites being more active on weekends, or if people tend to waste less time on MeFi towards end-of-month crunchtimes.
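
    As a starting point, the hour-of-day and day-of-week tallies could look something like this minimal sketch. It assumes the datestamps for one subsite have already been parsed into Python `datetime` objects in a single consistent time zone; the function names are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def activity_by_hour(timestamps):
    """Return a 24-item list: how many timestamps fall in each hour (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def activity_by_weekday(timestamps):
    """Return a 7-item list of counts, Monday first (datetime.weekday() order)."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return [counts.get(day, 0) for day in range(7)]
```

    Running the same two functions separately over post, comment, and favorite datestamps for each subsite would give the comparison asked about here; day-of-month works the same way using `ts.day`.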

    I'm also interested in seeing new user signup trends. Obviously there are big dates like the opening of the floodgates in November 2004, and the MeFi 10th anniversary season, but are there other dates with 50+ new signups that could possibly be attributed to other occasions, or maybe even highly active threads at the time?
    posted by Lush at 2:51 AM on August 23, 2009


    this is fun and I'd like to go next, FishBike.
    I'm a relatively quiet user, though; will the stats run okay as is or should I go favorite and get favorited some more?
    posted by heeeraldo at 1:08 AM on August 23


    That does result in some of the tables being empty, or filled by relatively meaningless data. But some of them still work, and it's interesting to see some of the same names on them as we see in many of the statistics for more active users.

    So here are the ones that still make sense:
    Who does heeeraldo favorite the most?
    (simple count of favorites)

    Pastabagel [4]
    klangklangston [3]
    flibbertigibbet [3]
    Kattullus [3]
    dersins [2]
    DaShiv [2]
    cortex [2]
    Astro Zombie [2]
    jonmc [2]
    netbros [2]

    Who favorites heeeraldo the most?
    (simple count of favorites)

    misha [3]
    tehloki [3]
    Reverend John [3]
    Pope Guilty [3]
    limeonaire [3]
    Richard Daly [2]
    imperium [2]
    rtha [2]
    Rock Steady [2]
    Jpfed [2]

    Of the threads where heeeraldo has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 21.4% [91 of 425]
    Fuzzy Skinner: 19.2% [32 of 167]
    jessamyn: 18.1% [77 of 425]
    Sys Rq: 18.0% [36 of 200]
    quin: 16.9% [72 of 425]
    Blazecock Pileon: 16.5% [65 of 394]
    delmoi: 16.2% [69 of 425]
    turgid dahlia: 15.8% [23 of 146]
    Astro Zombie: 15.5% [60 of 386]
    Brandon Blatcher: 15.5% [66 of 425]

    Of the threads where other users have been active, in whose has heeeraldo also been the most active by percentage?
    (limited to threads active after heeeraldo has joined MetaFilter)

    aihal: 9.0% [7 of 78]
    sushiwiththejury: 7.3% [6 of 82]
    gummi: 6.8% [10 of 146]
    myopicman: 6.8% [5 of 74]
    Duug: 6.7% [4 of 60]
    podwarrior: 6.6% [4 of 61]
    danOstuporStar: 6.3% [4 of 64]
    aqhong: 6.0% [3 of 50]
    Hugonaut: 5.7% [3 of 53]
    troika: 5.5% [3 of 55]

    Who has favorited the same items as heeeraldo the most?

    hot soup girl [12]
    twins named Lugubrious and Salubrious [12]
    flibbertigibbet [11]
    nicolin [11]
    shmegegge [11]
    agregoli [10]
    jtron [10]
    Cassilda [9]
    JHarris [9]
    jokeefe [9]

    posted by FishBike at 6:28 AM on August 23, 2009


    Stats for: quin
    Who does quin favorite the most?
    (simple count of favorites)

    Astro Zombie [28]
    loquacious [17]
    Pastabagel [17]
    Smedleyman [17]
    cortex [17]
    DU [16]
    It's Raining Florence Henderson [14]
    tkchrist [11]
    Pope Guilty [11]
    eriko [10]

    Who does quin favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.44% (8 of 1808) of vorfeed's comments+posts
    0.40% (17 of 4301) of Pastabagel's comments+posts
    0.35% (6 of 1738) of Avenger's comments+posts
    0.32% (7 of 2184) of burnmp3s's comments+posts
    0.32% (10 of 3173) of eriko's comments+posts
    0.29% (5 of 1707) of katillathehun's comments+posts
    0.28% (28 of 9838) of Astro Zombie's comments+posts
    0.24% (6 of 2479) of tehloki's comments+posts
    0.24% (8 of 3331) of robocop is bleeding's comments+posts
    0.24% (11 of 4597) of Pope Guilty's comments+posts

    Who favorites quin the most?
    (simple count of favorites)

    tehloki [175]
    misha [77]
    JHarris [73]
    Pope Guilty [66]
    sebastienbailard [52]
    deborah [51]
    drezdn [51]
    Rock Steady [43]
    blueberry [39]
    brundlefly [36]

    Who favorites quin the most?
    (percent of your comments+posts since they joined)

    tehloki: 2.43% (175 of 7208) of quin's comments+posts
    neewom: 1.56% (1 of 64) of quin's comments+posts
    gingerailment: 1.56% (1 of 64) of quin's comments+posts
    harujion: 1.56% (1 of 64) of quin's comments+posts
    Diagonalize: 1.56% (1 of 64) of quin's comments+posts
    endotoxin: 1.56% (1 of 64) of quin's comments+posts
    Joe Beese: 1.38% (19 of 1377) of quin's comments+posts
    misha: 1.35% (77 of 5702) of quin's comments+posts
    Pope Guilty: 1.10% (66 of 5989) of quin's comments+posts
    liza: 0.95% (35 of 3673) of quin's comments+posts

    Who are quin's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    quin [17] ---- [23] loquacious
    quin [16] ---- [26] DU
    quin [11] ---- [66] Pope Guilty
    quin [10] ---- [12] eriko
    quin [9] ---- [9] miss lynnster
    quin [9] ---- [51] drezdn
    quin [8] ---- [22] Blazecock Pileon
    quin [7] ---- [15] burnmp3s
    quin [8] ---- [7] Cool Papa Bell
    quin [7] ---- [9] Sys Rq

    Who are quin's top 10 mutual favorites?
    (by percentage favorited of each other's posts since joining)

    quin [0.38%] ---- [0.36%] scrump
    quin [0.35%] ---- [0.79%] sebastienbailard
    quin [0.35%] ---- [0.36%] Lipstick Thespian
    quin [0.32%] ---- [0.38%] burnmp3s
    quin [0.30%] ---- [0.35%] minifigs
    quin [0.24%] ---- [2.43%] tehloki
    quin [0.24%] ---- [1.10%] Pope Guilty
    quin [0.22%] ---- [0.28%] Jilder
    quin [0.21%] ---- [0.22%] batmonkey
    quin [0.21%] ---- [0.26%] Sys Rq

    Of the threads where quin has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    DU: 29.7% [1410 of 4750]
    Joe Beese: 28.9% [292 of 1012]
    filthy light thief: 26.0% [319 of 1227]
    cortex: 23.7% [1458 of 6142]
    delmoi: 22.8% [1402 of 6142]
    Blazecock Pileon: 21.5% [1274 of 5913]
    The Whelk: 21.5% [283 of 1319]
    Astro Zombie: 20.9% [1234 of 5907]
    turgid dahlia: 20.1% [438 of 2176]
    Potomac Avenue: 17.0% [220 of 1294]

    Of the threads where other users have been active, in whose has quin also been the most active by percentage?
    (limited to threads active after quin has joined MetaFilter)

    aihal: 47.4% [37 of 78]
    Ragma: 47.4% [73 of 154]
    double block and bleed: 42.3% [60 of 142]
    HVAC Guerilla: 41.5% [22 of 53]
    VicNebulous: 41.1% [62 of 151]
    Hugonaut: 39.6% [21 of 53]
    Aversion Therapy: 39.1% [43 of 110]
    panboi: 38.6% [54 of 140]
    JoeXIII007: 37.5% [42 of 112]
    uri: 37.4% [37 of 99]

    Who has favorited the same items as quin the most?

    tehloki [221]
    Pope Guilty [136]
    schyler523 [128]
    scrump [114]
    flibbertigibbet [107]
    shmegegge [96]
    graventy [92]
    deborah [85]
    JHarris [85]
    misha [82]


    posted by FishBike at 6:46 AM on August 23, 2009


    Stats for: Meatbomb
    Who does Meatbomb favorite the most?
    (simple count of favorites)

    languagehat [19]
    jessamyn [18]
    cortex [18]
    stavrosthewonderchicken [14]
    Burhanistan [14]
    Blazecock Pileon [14]
    BitterOldPunk [13]
    klangklangston [13]
    Artw [12]
    delmoi [11]

    Who does Meatbomb favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    1.71% (6 of 351) of little e's comments+posts
    0.99% (6 of 605) of billyfleetwood's comments+posts
    0.55% (13 of 2379) of BitterOldPunk's comments+posts
    0.45% (7 of 1556) of Mutant's comments+posts
    0.37% (11 of 2944) of NDó's comments+posts
    0.35% (6 of 1694) of felix betachat's comments+posts
    0.33% (5 of 1500) of chillmost's comments+posts
    0.31% (8 of 2553) of adipocere's comments+posts
    0.31% (5 of 1625) of DecemberBoy's comments+posts
    0.29% (11 of 3755) of TheOnlyCoolTim's comments+posts

    Who favorites Meatbomb the most?
    (simple count of favorites)

    tehloki [60]
    misha [44]
    jtron [40]
    nasreddin [26]
    koeselitz [25]
    doctor_negative [24]
    scrump [24]
    TheOnlyCoolTim [20]
    Rock Steady [20]
    DevilsAdvocate [19]

    Who favorites Meatbomb the most?
    (percent of your comments+posts since they joined)

    tehloki: 2.23% (60 of 2696) of Meatbomb's comments+posts
    misha: 1.93% (44 of 2283) of Meatbomb's comments+posts
    watch out for turtles: 1.85% (1 of 54) of Meatbomb's comments+posts
    little e: 1.56% (11 of 706) of Meatbomb's comments+posts
    jtron: 1.12% (40 of 3572) of Meatbomb's comments+posts
    ixohoxi: 1.03% (2 of 194) of Meatbomb's comments+posts
    not_on_display: 0.98% (18 of 1837) of Meatbomb's comments+posts
    nasreddin: 0.90% (26 of 2905) of Meatbomb's comments+posts
    Joe Beese: 0.85% (6 of 706) of Meatbomb's comments+posts
    Reverend John: 0.85% (14 of 1653) of Meatbomb's comments+posts

    Who are Meatbomb's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Meatbomb [11] ---- [20] TheOnlyCoolTim
    Meatbomb [11] ---- [12] delmoi
    Meatbomb [8] ---- [25] koeselitz
    Meatbomb [18] ---- [8] jessamyn
    Meatbomb [14] ---- [7] Blazecock Pileon
    Meatbomb [6] ---- [6] Joe Beese
    Meatbomb [6] ---- [26] nasreddin
    Meatbomb [9] ---- [6] flapjax at midnite
    Meatbomb [6] ---- [10] shmegegge
    Meatbomb [8] ---- [6] Marisa Stole the Precious Thing

    Who are Meatbomb's top 10 mutual favorites?
    (by percentage favorited of each other's posts since joining)

    Meatbomb [1.71%] ---- [1.56%] little e
    Meatbomb [0.29%] ---- [0.56%] TheOnlyCoolTim
    Meatbomb [0.83%] ---- [0.26%] Wrinkled Stumpskin
    Meatbomb [0.25%] ---- [0.25%] bonobothegreat
    Meatbomb [0.28%] ---- [0.23%] The Whelk
    Meatbomb [0.22%] ---- [0.90%] nasreddin
    Meatbomb [0.21%] ---- [0.85%] Joe Beese
    Meatbomb [0.21%] ---- [0.48%] Kraftmatic Adjustable Cheese
    Meatbomb [0.56%] ---- [0.20%] Grlnxtdr
    Meatbomb [0.25%] ---- [0.18%] dunkadunc

    Of the threads where Meatbomb has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 29.4% [738 of 2507]
    languagehat: 24.0% [601 of 2507]
    jessamyn: 22.7% [568 of 2507]
    Brandon Blatcher: 19.0% [477 of 2506]
    delmoi: 18.8% [471 of 2507]
    Blazecock Pileon: 18.7% [449 of 2404]
    quin: 18.6% [467 of 2507]
    Astro Zombie: 18.3% [436 of 2387]
    The Whelk: 17.6% [111 of 631]
    klangklangston: 17.4% [432 of 2486]

    Of the threads where other users have been active, in whose has Meatbomb also been the most active by percentage?
    (limited to threads active after Meatbomb has joined MetaFilter)

    Ceiling Cat: 20.8% [11 of 53]
    every_one_needs_a_hug_sometimes: 20.6% [13 of 63]
    subbes: 20.1% [47 of 234]
    double block and bleed: 19.0% [27 of 142]
    h00py: 18.5% [60 of 324]
    Duncan: 17.6% [26 of 148]
    "Tex" Connor and the Wily Roundup Boys: 16.9% [22 of 130]
    phoque: 16.7% [24 of 144]
    waraw: 16.3% [75 of 461]
    vapidave: 16.2% [32 of 197]

    Who has favorited the same items as Meatbomb the most?

    tehloki [136]
    scrump [93]
    blueberry [88]
    Pope Guilty [69]
    nasreddin [66]
    schyler523 [66]
    marble [61]
    shmegegge [61]
    misha [60]
    lalochezia [56]


    posted by FishBike at 7:04 AM on August 23, 2009


    Stats for:minifigs
    Who does minifigs favorite the most?
    (simple count of favorites)

    Astro Zombie [67]
    cortex [35]
    Marisa Stole the Precious Thing [34]
    Greg Nog [27]
    Artw [27]
    The Whelk [24]
    NDó [24]
    klangklangston [24]
    East Manitoba Regional Junior Kabaddi Champion '94 [22]
    Miko [22]

    Who does minifigs favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    3.55% (24 of 677) of NDó's comments+posts
    3.52% (8 of 227) of Ian A.T.'s comments+posts
    3.28% (22 of 671) of robocop is bleeding's comments+posts
    3.15% (27 of 858) of Greg Nog's comments+posts
    2.73% (5 of 183) of Parasite Unseen's comments+posts
    2.60% (22 of 846) of East Manitoba Regional Junior Kabaddi Champion '94's comments+posts
    2.42% (6 of 248) of brain_drain's comments+posts
    2.23% (10 of 448) of hifiparasol's comments+posts
    2.03% (10 of 492) of Shepherd's comments+posts
    2.03% (9 of 443) of Pater Aletheias's comments+posts

    Who favorites minifigs the most?
    (simple count of favorites)

    Deathalicious [4]
    shmegegge [4]
    SuperSquirrel [3]
    BrotherCaine [3]
    Pope Guilty [3]
    marble [3]
    koeselitz [3]
    limeonaire [3]
    lysistrata [2]
    myopicman [2]

    Who favorites minifigs the most?
    (percent of your comments+posts since they joined)

    litterateur: 3.33% (2 of 60) of minifigs's comments+posts
    parkan: 1.67% (1 of 60) of minifigs's comments+posts
    HumuloneRanger: 1.67% (1 of 60) of minifigs's comments+posts
    TonyDanza: 1.67% (1 of 60) of minifigs's comments+posts
    imabanana: 1.33% (1 of 75) of minifigs's comments+posts
    Edwahd: 1.33% (1 of 75) of minifigs's comments+posts
    He Is Only The Imposter: 1.33% (1 of 75) of minifigs's comments+posts
    shmegegge: 1.19% (4 of 335) of minifigs's comments+posts
    Deathalicious: 1.19% (4 of 335) of minifigs's comments+posts
    HumanComplex: 0.96% (1 of 104) of minifigs's comments+posts

    Who are minifigs's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    minifigs [13] ---- [4] shmegegge
    minifigs [14] ---- [3] Pope Guilty
    minifigs [5] ---- [2] Devils Rancher
    minifigs [2] ---- [2] Roach
    minifigs [24] ---- [2] The Whelk
    minifigs [22] ---- [2] East Manitoba Regional Junior Kabaddi Champion '94
    minifigs [2] ---- [3] koeselitz
    minifigs [5] ---- [2] Metroid Baby
    minifigs [4] ---- [2] hermitosis
    minifigs [2] ---- [2] brundlefly

    Who are minifigs's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    minifigs [0.77%] ---- [1.19%] shmegegge
    minifigs [0.74%] ---- [0.90%] Pope Guilty
    minifigs [0.83%] ---- [0.70%] The Whelk
    minifigs [0.81%] ---- [0.60%] loquacious
    minifigs [1.37%] ---- [0.60%] Roach
    minifigs [0.76%] ---- [0.60%] jtron
    minifigs [1.58%] ---- [0.60%] Spatch
    minifigs [0.74%] ---- [0.60%] brain cloud
    minifigs [2.60%] ---- [0.60%] East Manitoba Regional Junior Kabaddi Champion '94
    minifigs [0.89%] ---- [0.60%] Metroid Baby

    Of the threads where minifigs has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    DU: 33.1% [86 of 260]
    Blazecock Pileon: 28.5% [74 of 260]
    quin: 27.3% [71 of 260]
    Joe Beese: 25.4% [48 of 189]
    Potomac Avenue: 24.3% [54 of 222]
    Marisa Stole the Precious Thing: 23.8% [62 of 260]
    cortex: 22.3% [58 of 260]
    Artw: 21.5% [56 of 260]
    turgid dahlia: 21.2% [55 of 260]
    filthy light thief: 20.9% [45 of 215]

    Of the threads where other users have been active, in whose has minifigs also been the most active by percentage?
    (limited to threads active after minifigs has joined MetaFilter)

    thivaia: 12.9% [11 of 85]
    muckster: 10.9% [6 of 55]
    athenian: 10.7% [13 of 122]
    effwerd: 10.6% [10 of 94]
    yeti: 10.6% [21 of 199]
    Acey: 10.5% [6 of 57]
    milquetoast: 10.3% [7 of 68]
    leftcoastbob: 10.2% [6 of 59]
    sparkletone: 10.1% [7 of 69]
    chowflap: 10.1% [11 of 109]

    Who has favorited the same items as minifigs the most?

    tehloki [389]
    scrump [347]
    Pope Guilty [303]
    schyler523 [301]
    blueberry [260]
    burnmp3s [260]
    shmegegge [245]
    graventy [216]
    Caduceus [212]
    koeselitz [212]


    posted by FishBike at 7:22 AM on August 23, 2009 [1 favorite]


    I know this has been asked before, but can someone do a definitive "peak hours" analysis for the different subsites? At what hours of the day are there the most posts/questions, the most comments, and the most favorites? Maybe also extend the analysis to days of the week or days of the month? It would be interesting to see if there are anomalies like certain subsites being more active on weekends, or if people tend to waste less time on MeFi towards end-of-month crunchtimes.

    I've added three (rather big, sorry) pictures to my Charts and Graphs page. These show separate lines for posts, comments, and favorites, for each of the four sites in the Infodump. That's 12 lines, total, so hopefully the colour scheme I used helps people to make sense of these.

    Lines are coloured based on the colour of their site (MeFi = blue, AskMe = green, Meta = grey) except for Music, which I made orange (since dark grey looks too much like Meta). The brightest line is posts, comment brightness is in the middle, and the darkest one is favorites.

    All lines are showing the count of posts, comments, or favorites in each time or day bin, but have been normalized so that the peak of each line reaches 1.0 on the chart (though something strange about Music data for the 30th of the month causes it to bust through the ceiling, more on this below).
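
    For anyone wanting to reproduce that kind of line, here is a minimal Python sketch of the binning and peak normalization described above. It is not the spreadsheet actually used for the charts; the function name and the sample timestamps are made up for illustration, and real input would be the datestamps from the Infodump files.

```python
from collections import Counter
from datetime import datetime


def normalized_hourly_counts(timestamps):
    """Count events per hour of day (0-23), scaled so the busiest hour is 1.0."""
    counts = Counter(ts.hour for ts in timestamps)
    peak = max(counts.values()) if counts else 0
    # An empty input yields a flat line of zeros rather than dividing by zero.
    return [counts.get(hour, 0) / peak if peak else 0.0 for hour in range(24)]


# Hypothetical sample data, standing in for post/comment/favorite datestamps.
sample = [
    datetime(2009, 8, 23, 9, 5),
    datetime(2009, 8, 23, 9, 40),
    datetime(2009, 8, 23, 14, 0),
    datetime(2009, 8, 24, 9, 15),
    datetime(2009, 8, 24, 14, 30),
    datetime(2009, 8, 24, 22, 10),
]
curve = normalized_hourly_counts(sample)
```

    Running the same function once per site and per activity type would give the 12 lines, each peaking at 1.0 regardless of how busy that site is in absolute terms.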

    The activity by hour and activity by weekday graphs show pretty clear trends around busy vs. quiet periods. They also show that posts tend to be ahead of comments, and favorites come last. Of the four main sites in the Infodump, Music appears to be the odd one out in all respects.

    On the activity by day of the month chart, it appears to be mostly noise but with an overall decreasing activity trend towards the end of the month. This could be a real thing or just an artifact. Something very odd happens with the Music site activity on the 30th of the month that may be related to my attempts to control for the fact that only 11 out of 12 months have a 30th.
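
    One way to control for short months, sketched in Python under the assumption that the fix is to divide each day-of-month total by the number of times that day actually occurs in the period covered. This may not match the adjustment used for the chart; the function names and sample dates are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date


def day_of_month_occurrences(first_month, last_month):
    """Count how often each day-of-month occurs between two (year, month) pairs, inclusive."""
    occurrences = Counter()
    year, month = first_month
    while (year, month) <= last_month:
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            occurrences[day] += 1
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return occurrences


def per_occurrence_rate(event_dates, first_month, last_month):
    """Average number of events per calendar occurrence of each day-of-month."""
    raw = Counter(d.day for d in event_dates)
    occ = day_of_month_occurrences(first_month, last_month)
    return {day: raw.get(day, 0) / occ[day] for day in range(1, 32) if occ[day]}


# Hypothetical sample: three events in 2009.
occ_2009 = day_of_month_occurrences((2009, 1), (2009, 12))
events = [date(2009, 1, 30), date(2009, 3, 30), date(2009, 1, 31)]
rates = per_occurrence_rate(events, (2009, 1), (2009, 12))
```

    Counting occurrences from the real calendar also handles the 29th and partial first or last years, which a fixed 12/11 or 12/7 multiplier would not.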

    I'm also interested in seeing new user signup trends. Obviously there are big dates like the opening of the floodgates in November 2004, and the MeFi 10th anniversary season, but are there other dates with 50+ new signups that could possibly be attributed to other occasions, or maybe even highly active threads at the time?

    I think this has already been posted somewhere, hasn't it?
    posted by FishBike at 8:11 AM on August 23, 2009 [1 favorite]


    I think one of the ways to interpret that data is that while most sections of the site are a distraction or a break from 9 to 5 work, Music is actually something that people actively pursue in their free time.
    posted by ocherdraco at 8:34 AM on August 23, 2009


    I'm also interested in seeing new user signup trends. Obviously there are big dates like the opening of the floodgates in November 2004, and the MeFi 10th anniversary season, but are there other dates with 50+ new signups that could possibly be attributed to other occasions, or maybe even highly active threads at the time?

    I think this has already been posted somewhere, hasn't it?


    Well, there is this: http://waxy.org/mefi/users/
    posted by mathlete at 9:17 AM on August 23, 2009


    It might be interesting to find the top 20 days by number of signups, and then find the most commented-on thread, site-wide, for each of those 20 days (going by the datestamp on the comments themselves rather than the datestamp of the post they're in).
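
    A rough Python sketch of that two-step idea, assuming signup dates and (thread id, comment datestamp) pairs have already been loaded from the Infodump. The data layout, function names, and sample values here are all hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def top_signup_days(signup_dates, n=20):
    """Return the n calendar days with the most signups, busiest first."""
    return [day for day, _ in Counter(signup_dates).most_common(n)]


def busiest_thread_per_day(comments, days):
    """For each requested day, find the thread with the most comments dated that day.

    comments: iterable of (thread_id, comment_datetime) pairs.
    Returns {day: (thread_id, comment_count)}.
    """
    wanted = set(days)
    per_day = {}
    for thread_id, ts in comments:
        day = ts.date()  # bin by the comment's own datestamp, not the post's
        if day in wanted:
            per_day.setdefault(day, Counter())[thread_id] += 1
    return {day: counts.most_common(1)[0] for day, counts in per_day.items()}


# Hypothetical sample data.
signups = [date(2004, 11, 18)] * 3 + [date(2004, 11, 19)] * 2 + [date(2005, 1, 1)]
comments = [
    ("t1", datetime(2004, 11, 18, 10, 0)),
    ("t1", datetime(2004, 11, 18, 11, 0)),
    ("t2", datetime(2004, 11, 18, 12, 0)),
    ("t2", datetime(2004, 11, 19, 9, 0)),
    ("t3", datetime(2005, 1, 1, 8, 0)),
]
days = top_signup_days(signups, n=2)
busiest = busiest_thread_per_day(comments, days)
```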
    posted by FishBike at 9:31 AM on August 23, 2009


    The end-of-month spike on Music may not be noise or any analytical error at all—we have monthly challenges, and I would not be at all surprised by the notion that the last day or two of each month sees a sudden uptick in posts (and hence associated activity like comments and favorites) on account of folks trying to get last-minute submissions up while the challenge window is still open.

    I think one of the ways to interpret that data is that while most sections of the site are a distraction or a break from 9 to 5 work, Music is actually something that people actively pursue in their free time.

    Yeah, I think that's part of the general later-breaking trend for Music. Not that it's necessarily entirely a freetime vs. workslack thing, though, since there may also be an aspect of folks (a) listening to Music not-at-work just for the sake of having better-than-office-speakers listening environments at home and (b) people generally being more active when they're in some respect doing music stuff (like uploading a song), which they can't really do from the office.
    posted by cortex (staff) at 10:02 AM on August 23, 2009


    Whoa, FishBike - those graphs are gorgeous, and your analysis is excellent!

    They also show that posts tend to be ahead of comments, and favorites come last.

    That does make sense. Members create content first, and then people participate, and then readers/lurkers favorite comments upon review. Favorites coming last also betrays, somehow, the trend of using favorites as an agreement/approval thing, I think. If someone comes in late to the party and reviews all the comments, they favorite the post according to its potential usefulness based on those comments, or favorite the comments that already express their sentiment, without commenting themselves?

    Of the four main sites in the Infodump, Music appears to be the odd one out in all respects.

    I somehow suspected that Music would have anomalous trends when pitted against the text-heavy subsites but it's amazing to see actual, hard data that shows people tend to frequent it towards nighttime, and weekends don't see as big of a drop. It's also interesting to see that the posting activity (presumably creators) in Music tends to follow the regular weekday trends like the other sites, but the favoriting activity (presumably composed more of listeners who are not creators) actually dips towards the middle of the week and picks up again towards the weekend. It almost seems like music is treated more like work by its creators, and more like a leisure activity by its listeners.

    It might be interesting to find the top 20 days by number of signups, and then find the most commented-on thread, site-wide, for each of those 20 days (going by the datestamp on the comments themselves rather than the datestamp of the post they're in).

    Yes!

    (P.S. FishBike, could I ask for my user stats as well and would you mind posting it through MeMail? Thank you so much!)
    posted by Lush at 10:14 AM on August 23, 2009 [1 favorite]


    Cool graphs, fishbike. I like imagining all the MeFi musicians getting their groove on late into the night.
    posted by Meatbomb at 10:21 AM on August 23, 2009


    The end-of-month spike on Music may not be noise or any analytical error at all—we have monthly challenges, and I would not be at all surprised by the notion that the last day or two of each month sees a sudden uptick in posts (and hence associated activity like comments and favorites) on account of folks trying to get last-minute submissions up while the challenge window is still open.

    I wondered why the spike is not seen on the 31st of the month as well, even though the stats for that day are scaled up even more than they are for the 30th (since only 7 months have a 31st day). Although I guess it is, looking at the graph again, just not as strongly as the 30th.

    I wonder if this has something to do with timezone issues, specifically, that a lot of challenge submissions sent before a month-end deadline are actually overflowing into the next day according to whatever time zone the Infodump datestamps are in? That would explain why the 1st of the month looks pretty busy, too.
    posted by FishBike at 10:29 AM on August 23, 2009


    Members create content first, and then people participate, and then readers/lurkers favorite comments upon review.

    That seems to be the most logical explanation, yeah. But I'm still a little surprised to see that reflected in activity graphs. I imagined that each person would be active during a certain window of hours most days, and towards the beginning of that window, people would be commenting on and favoriting things from the day before rather than waiting for new stuff to show up.

    I wonder if maybe the explanation has to do with the relative difficulty of the three types of activity? Composing a new post is maybe something people aren't going to do so much towards the end of the day, but commenting is easier and favoriting is easier yet. People might be more inclined to click that little [+] when they are half asleep than they are inclined to put together a new FPP in that state.
    posted by FishBike at 10:37 AM on August 23, 2009


    FishBike: I wonder if maybe the explanation has to do with the relative difficulty of the three types of activity? Composing a new post is maybe something people aren't going to do so much towards the end of the day, but commenting is easier and favoriting is easier yet. People might be more inclined to click that little [+] when they are half asleep than they are inclined to put together a new FPP in that state.

    That's an interesting hypothesis. I wonder if there's a way to test this. The only thing I can think of is that MeFites that have posted a lot would presumably get better at the mechanics of putting a post together and therefore would post at different hours of the day.

    Another theory is that morning in the US catches the waking hours of the entire Anglosphere, evening in Australia and New Zealand and afternoon in the UK and South Africa.
    posted by Kattullus at 11:18 AM on August 23, 2009


    Wait... server time is Pacific Time, I was thinking in East Coast terms. Scotch that theory, Pacific morning is Australian night.
    posted by Kattullus at 11:20 AM on August 23, 2009


    Thanks FishBike!
    I tend to favorite zings and in agreement so my top ten is pretty much how I imagined it would be.
    posted by minifigs at 3:37 PM on August 23, 2009


    FishBike, how finely sliced are the time bins in your curves-on-black plots, and how did you make the curves? (splines?)

    It looks like the auto-splining done by (I think) recent MS Excels. I saw a plot in that style in a talk a couple months ago. That audience was used to dot-line plots, where a smooth curve with well-defined peaks and shoulders takes a great deal of data to observe clearly and suggests that the underlying system can be understood very well. In fact he only had an ordinary amount of data with an ordinary amount of noise, like I think you do --- it looks like 7, 24, 31 bins on the week, day, month plots respectively. I don't think your figures are misleading but I am curious how you made them, since I may have to discourage similar plots in the future.

    Everyone seems to have left my lawn.

    posted by fantabulous timewaster at 6:05 PM on August 23, 2009


    Those are indeed Excel charts, done with the built-in chart style that displays these curved lines. There are, as you suspected, only 24 bins on the hourly chart, 7 on the weekly one, and 31 on the monthly one.

    I had an internal debate about whether to use the curved-line style or the straight-line style. Comparing the two, the straight-line version seemed to distract the eye due to the spiky nature of the lines, whereas the curved-line version made it easier to see the overall trends.

    It also seemed like the curved lines better presented this data as being somewhat imprecise, though of course it can imply the opposite as well to people who assume it means there's enough data to draw a smooth curve like that.
    posted by FishBike at 6:16 PM on August 23, 2009 [1 favorite]


    I feel like we should do something nice for FishBike.
    posted by Pope Guilty at 8:17 PM on August 23, 2009


    Like a plaque or a sandwich, I mean.
    posted by Pope Guilty at 8:18 PM on August 23, 2009


    I tried a few things to see if I could figure out which posts have resulted in the most new user signups. It's hard (or maybe impossible) to figure this out directly from what's in the Infodump, so as a kind of proxy for that, here are the posts that had the most new users (<24 hours since signup) comment in them.
    just in time for thanksgiving! on MetaFilter (37 new users commented)
    I, for one, would like to welcome our new members! April 2004 on MetaTalk (30 new users commented)
    Thanks from Mathowie on MetaTalk (20 new users commented)
    We don't need no pancakes! on MetaFilter (19 new users commented)
    A Word of Advice for New Users, From Your Uncle y6y6y6 on MetaTalk (19 new users commented)
    beyond lyrics on MetaFilter (16 new users commented)
    Announcement: new user signups are back on. 20 people a day. on MetaTalk (15 new users commented)
    MetaFilter on MonkeyFilter. on MetaTalk (14 new users commented)
    because next to maps, we love rules on MetaFilter (13 new users commented)
    Blood on the Tracks on MetaFilter (13 new users commented)
    RU-ready for the next 4 years? on MetaFilter (13 new users commented)
    Addiction to porn destroying lives. on MetaFilter (12 new users commented)
    High-definition pornography is right around the corner. on MetaFilter (12 new users commented)
    What Is This Creepy Site Advertising? on Ask MetaFilter (11 new users commented)
    Alexander inherited the idea of an invasion of the Persian Empire from his father on MetaFilter (11 new users commented)
    Did Lightning Strike Twice in Florida? on MetaFilter (11 new users commented)
    Register that bicycle on MetaFilter (11 new users commented)
    God be wi'ye. on MetaTalk (11 new users commented)
    Gaming girlz on MetaFilter (10 new users commented)
    "Mainstream" Fetishes on Ask MetaFilter (9 new users commented)
    I think some of these might be on the money... others are either slightly after whatever caused the influx of new users, or are the sort of posts that new users would especially tend to comment in.
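
    The proxy described above can be sketched in Python roughly like this, assuming a mapping of user id to signup time and a list of (thread id, user id, comment time) records. Those structures, the function name, and the sample values are hypothetical, not the Infodump's actual layout.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def new_user_commenters(signups, comments, window=timedelta(hours=24)):
    """Count distinct users per thread who commented within `window` of signing up.

    signups: {user_id: signup_datetime}
    comments: iterable of (thread_id, user_id, comment_datetime)
    Returns [(thread_id, new_user_count)], largest count first.
    """
    new_users = defaultdict(set)
    for thread_id, user_id, ts in comments:
        joined = signups.get(user_id)
        if joined is not None and timedelta(0) <= ts - joined < window:
            new_users[thread_id].add(user_id)  # a set, so repeat comments count once
    ranked = [(thread_id, len(users)) for thread_id, users in new_users.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data.
signups = {
    "a": datetime(2004, 11, 18, 9, 0),
    "b": datetime(2004, 11, 18, 10, 0),
    "c": datetime(2003, 1, 1, 0, 0),
}
comments = [
    (100, "a", datetime(2004, 11, 18, 12, 0)),
    (100, "a", datetime(2004, 11, 18, 13, 0)),
    (100, "b", datetime(2004, 11, 18, 15, 0)),
    (100, "c", datetime(2004, 11, 18, 16, 0)),
    (200, "b", datetime(2004, 11, 19, 9, 0)),
    (200, "a", datetime(2004, 11, 20, 9, 30)),
]
ranking = new_user_commenters(signups, comments)
wider = new_user_commenters(signups, comments, window=timedelta(days=3))
```

    Passing a longer `window` is the same kind of change as stretching the cutoff from 24 hours to a few days.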
    posted by FishBike at 6:37 PM on August 24, 2009


    There are only three threads there that are not directly following the November 18, 2004 reopening, just judging by a glance at thread ids. And one of them is a 2002 reopening.

    The other two are the taos askme (aka The Greenest Mystery Ever Told) and the hermitosis 500 metatalk, which featured a whole lot of sockpuppets.

    Step one here is to exclude stuff from late November 2004 and run it again. :)
    posted by cortex (staff) at 7:07 PM on August 24, 2009


    Here it is with November 2004 signups excluded:
    I, for one, would like to welcome our new members! April 2004 on MetaTalk (30 new users commented)
    Announcement: new user signups are back on. 20 people a day. on MetaTalk (15 new users commented)
    What Is This Creepy Site Advertising? on Ask MetaFilter (11 new users commented)
    God be wi'ye. on MetaTalk (11 new users commented)
    What experience most shaped who you are? on Ask MetaFilter (9 new users commented)
    GiveWell, or Give 'em Hell? on MetaTalk (9 new users commented)
    How Stupid can you Be? on MetaFilter (8 new users commented)
    RIP, DFW on MetaFilter (8 new users commented)
    MetaFilter jargon. on MetaTalk (7 new users commented)
    User Signup Policy on MetaTalk (7 new users commented)
    HP -1 on MetaFilter (6 new users commented)
    Did you win a TiVo? Post your winning essay here to compare and contrast. on MetaTalk (6 new users commented)
    850/1900 MHz Motofone? on Ask MetaFilter (5 new users commented)
    I Like To Read Things on Ask MetaFilter (5 new users commented)
    is an ebay fraud seller free? on Ask MetaFilter (5 new users commented)
    Bzzt. on MetaFilter (5 new users commented)
    old dog ... new tricks department on MetaFilter (5 new users commented)
    'Gift account' confirmation on MetaTalk (5 new users commented)
    Thank you for turning off Suicide Girls. on MetaTalk (5 new users commented)
    getbackonhorse music on Ask MetaFilter (4 new users commented)
    Sadly, the numbers seem to be pretty small. Extending the window to 3 days increases them a bit, but produces basically the same list. The list still seems to be headed by threads about allowing signups again. So far, it's looking like not many people join the site to immediately comment in specific highly popular threads.
    posted by FishBike at 7:19 PM on August 24, 2009


    hplovecraft [327 favorites on 12 posts]

    Awesome!
    posted by Artw at 9:58 AM on August 27, 2009


    That doesn't tell the whole story as there are 35 posts that use the tag lovecraft not all of which use hplovecraft.
    posted by Kattullus at 10:16 AM on August 27, 2009


    In fact, any analysis that relies on tags is going to run into all kinds of problems with consistency and thoroughness.
    posted by Kattullus at 10:33 AM on August 27, 2009


    In fact, any analysis that relies on tags is going to run into all kinds of problems with consistency and thoroughness.

    Indeed. But the only other thing in the Infodump that is even close to identifying post content is the title data. So it's not that the tag data is ideal, but that the second option is worse, and there's no third option.

    In the meantime, the back-tagging superstars deserve a lot of credit for making this type of analysis even remotely possible. Figuring out how to consolidate existing tags and make them consistent looks like a difficult task... I think I remember counting over 100,000 distinct tags, so that's a lot to go through.
    posted by FishBike at 10:47 AM on August 27, 2009


    I wonder what the average user # of commenters for each month would look like.
    posted by smackfu at 11:15 AM on August 27, 2009


    Wow, we're still on this? Excellent work, FishBike! If you have time, could you put me through your infernal machine next? Now I want to know my stats.
    posted by goodnewsfortheinsane at 8:29 AM on August 28, 2009


    crunchcrunchcrunchcrunchcrunch... ding!

    Stats for:goodnewsfortheinsane
    Who does goodnewsfortheinsane favorite the most?
    (simple count of favorites)

    cortex [35]
    jessamyn [23]
    mathowie [14]
    stavrosthewonderchicken [14]
    ColdChef [13]
    Cranberry [12]
    Astro Zombie [12]
    languagehat [11]
    UbuRoivas [10]
    Ambrosia Voyeur [10]

    Who does goodnewsfortheinsane favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.67% (10 of 1503) of Rhaomi's comments+posts
    0.42% (7 of 1654) of East Manitoba Regional Junior Kabaddi Champion '94's comments+posts
    0.39% (5 of 1292) of jimmythefish's comments+posts
    0.37% (12 of 3215) of Cranberry's comments+posts
    0.37% (7 of 1915) of jouke's comments+posts
    0.36% (13 of 3642) of ColdChef's comments+posts
    0.31% (9 of 2944) of NDó's comments+posts
    0.30% (10 of 3321) of yhbc's comments+posts
    0.28% (5 of 1803) of chrismear's comments+posts
    0.25% (6 of 2425) of cashman's comments+posts

    Who favorites goodnewsfortheinsane the most?
    (simple count of favorites)

    tehloki [65]
    divabat [33]
    koeselitz [32]
    Rock Steady [23]
    Blazecock Pileon [18]
    flatluigi [18]
    chrismear [18]
    UbuRoivas [18]
    nicolin [17]
    scrump [17]

    Who favorites goodnewsfortheinsane the most?
    (percent of your comments+posts since they joined)

    tehloki: 1.75% (65 of 3719) of goodnewsfortheinsane's comments+posts
    liza: 0.62% (12 of 1926) of goodnewsfortheinsane's comments+posts
    divabat: 0.61% (33 of 5396) of goodnewsfortheinsane's comments+posts
    flatluigi: 0.58% (18 of 3115) of goodnewsfortheinsane's comments+posts
    Potomac Avenue: 0.56% (6 of 1078) of goodnewsfortheinsane's comments+posts
    nicolin: 0.55% (17 of 3115) of goodnewsfortheinsane's comments+posts
    burnmp3s: 0.54% (11 of 2021) of goodnewsfortheinsane's comments+posts
    koeselitz: 0.54% (32 of 5913) of goodnewsfortheinsane's comments+posts
    papafrita: 0.51% (12 of 2374) of goodnewsfortheinsane's comments+posts
    andrewcilento: 0.50% (1 of 201) of goodnewsfortheinsane's comments+posts

    Who are goodnewsfortheinsane's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    goodnewsfortheinsane [13] ---- [11] ColdChef
    goodnewsfortheinsane [10] ---- [18] UbuRoivas
    goodnewsfortheinsane [9] ---- [32] koeselitz
    goodnewsfortheinsane [9] ---- [11] Kattullus
    goodnewsfortheinsane [9] ---- [10] flapjax at midnite
    goodnewsfortheinsane [8] ---- [16] nickyskye
    goodnewsfortheinsane [8] ---- [14] loquacious
    goodnewsfortheinsane [10] ---- [7] delmoi
    goodnewsfortheinsane [7] ---- [7] Artw
    goodnewsfortheinsane [7] ---- [11] East Manitoba Regional Junior Kabaddi Champion '94

    Who are goodnewsfortheinsane's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    goodnewsfortheinsane [0.39%] ---- [0.58%] flatluigi
    goodnewsfortheinsane [0.42%] ---- [0.38%] East Manitoba Regional Junior Kabaddi Champion '94
    goodnewsfortheinsane [0.28%] ---- [0.30%] chrismear
    goodnewsfortheinsane [0.25%] ---- [0.27%] cashman
    goodnewsfortheinsane [0.31%] ---- [0.24%] fantabulous timewaster
    goodnewsfortheinsane [0.67%] ---- [0.23%] Rhaomi
    goodnewsfortheinsane [0.21%] ---- [0.62%] liza
    goodnewsfortheinsane [0.23%] ---- [0.20%] madamjujujive
    goodnewsfortheinsane [0.36%] ---- [0.19%] ColdChef
    goodnewsfortheinsane [0.20%] ---- [0.19%] Kattullus

    Of the threads where goodnewsfortheinsane has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 31.2% [888 of 2847]
    languagehat: 21.6% [616 of 2847]
    The Whelk: 21.6% [120 of 556]
    DU: 20.8% [364 of 1748]
    delmoi: 20.4% [582 of 2847]
    Joe Beese: 20.4% [95 of 465]
    Astro Zombie: 19.6% [465 of 2368]
    jessamyn: 19.5% [556 of 2847]
    Marisa Stole the Precious Thing: 18.3% [145 of 793]
    quin: 17.8% [507 of 2847]

    Of the threads where other users have been active, in whose has goodnewsfortheinsane also been the most active by percentage?
    (limited to threads active after goodnewsfortheinsane has joined MetaFilter)

    Nice Donkey: 41.8% [23 of 55]
    every_one_needs_a_hug_sometimes: 25.4% [16 of 63]
    aihal: 19.2% [15 of 78]
    jnaps: 18.7% [14 of 75]
    Combustible Edison Lighthouse: 18.2% [32 of 176]
    Duncan: 17.7% [26 of 147]
    MiguelCardoso: 16.9% [11 of 65]
    maus: 16.9% [12 of 71]
    Catfry: 16.8% [52 of 310]
    shiu mai baby: 16.7% [35 of 209]

    Who has favorited the same items as goodnewsfortheinsane the most?

    tehloki [130]
    scrump [115]
    schyler523 [90]
    deborah [81]
    flibbertigibbet [79]
    nasreddin [70]
    graventy [67]
    koeselitz [67]
    blueberry [66]
    shmegegge [62]
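
    For the curious, the "mutual favorites by simple count" ranking in these stats can be approximated with a short Python sketch: keep only users where favorites flow in both directions, then sort by whichever of the two counts is smaller. The input dictionaries, the user names, and the alphabetical tie-break are assumptions for illustration; the script that produced the lists above may order ties differently.

```python
def top_mutual_favorites(given, received, n=10):
    """Rank mutual favoriters by the smaller of the two favorite counts.

    given: {other_user: favorites this user gave to other_user}
    received: {other_user: favorites other_user gave to this user}
    Returns [(other_user, given_count, received_count)], strongest pairs first.
    """
    mutual = [
        (other, given[other], received[other])
        for other in given
        if other in received
    ]
    # Sort by the lower count, descending; break ties alphabetically (an assumption).
    mutual.sort(key=lambda row: (-min(row[1], row[2]), row[0]))
    return mutual[:n]


# Hypothetical sample data.
given = {"alice": 13, "bob": 10, "carol": 9, "dave": 35}
received = {"alice": 11, "bob": 18, "carol": 32, "erin": 65}
pairs = top_mutual_favorites(given, received)
```

    Ranking by the minimum keeps lopsided pairs (one side favoriting heavily, the other barely at all) from dominating the list.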

    posted by FishBike at 7:27 PM on August 28, 2009 [1 favorite]


    Dammit, I'm curious. Hit me, FishBike.
    posted by Pronoiac at 8:49 PM on August 28, 2009


    In fact, any analysis that relies on tags is going to run into all kinds of problems with consistency and thoroughness.

    Hey, I tried to convince jessamyn that we need a controlled vocabulary, but...
    posted by Pope Guilty at 1:12 AM on August 29, 2009


    Bam!

    Stats for:Pronoiac
    Who does Pronoiac favorite the most?
    (simple count of favorites)

    Artw [9]
    cortex [7]
    rtha [5]
    ROU_Xenophobe [4]
    Pastabagel [4]
    loquacious [4]
    DU [4]
    Blazecock Pileon [4]
    Ambrosia Voyeur [4]
    goodnewsfortheinsane [3]

    Who does Pronoiac favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    0.11% (5 of 4519) of rtha's comments+posts
    0.08% (9 of 11966) of Artw's comments+posts
    0.05% (7 of 14996) of cortex's comments+posts

    Who favorites Pronoiac the most?
    (simple count of favorites)

    BrotherCaine [11]
    JHarris [11]
    Artw [10]
    tehloki [10]
    Reverend John [8]
    rokusan [8]
    lukemeister [7]
    Rock Steady [7]
    liza [7]
    misha [7]

    Who favorites Pronoiac the most?
    (percent of your comments+posts since they joined)

    BrotherCaine: 0.55% (11 of 2010) of Pronoiac's comments+posts
    JHarris: 0.55% (11 of 2010) of Pronoiac's comments+posts
    Lexica: 0.53% (2 of 379) of Pronoiac's comments+posts
    Artw: 0.50% (10 of 2010) of Pronoiac's comments+posts
    tehloki: 0.50% (10 of 2010) of Pronoiac's comments+posts
    Reverend John: 0.43% (8 of 1847) of Pronoiac's comments+posts
    rokusan: 0.40% (8 of 2010) of Pronoiac's comments+posts
    liza: 0.38% (7 of 1847) of Pronoiac's comments+posts
    misha: 0.35% (7 of 1975) of Pronoiac's comments+posts
    lukemeister: 0.35% (7 of 2010) of Pronoiac's comments+posts

    Who are Pronoiac's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Pronoiac [9] ---- [10] Artw
    Pronoiac [4] ---- [6] Blazecock Pileon
    Pronoiac [3] ---- [4] klangklangston
    Pronoiac [3] ---- [4] goodnewsfortheinsane
    Pronoiac [3] ---- [7] koeselitz
    Pronoiac [4] ---- [2] Ambrosia Voyeur
    Pronoiac [2] ---- [2] shmegegge
    Pronoiac [2] ---- [2] filthy light thief
    Pronoiac [2] ---- [3] Devils Rancher
    Pronoiac [2] ---- [5] Pope Guilty

    Who are Pronoiac's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    Pronoiac [1.72%] ---- [0.17%] hamida2242
    Pronoiac [0.22%] ---- [0.15%] regicide is good for you
    Pronoiac [0.14%] ---- [0.20%] grubi
    Pronoiac [0.12%] ---- [0.11%] Shepherd
    Pronoiac [0.13%] ---- [0.11%] Lemurrhea
    Pronoiac [0.12%] ---- [0.10%] loquacious
    Pronoiac [0.29%] ---- [0.10%] jtron
    Pronoiac [0.26%] ---- [0.10%] tzikeh
    Pronoiac [0.35%] ---- [0.10%] nooneyouknow
    Pronoiac [0.15%] ---- [0.10%] flibbertigibbet

    Of the threads where Pronoiac has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 27.7% [309 of 1115]
    DU: 25.0% [279 of 1115]
    quin: 23.1% [258 of 1115]
    Blazecock Pileon: 19.3% [215 of 1115]
    Brandon Blatcher: 18.8% [210 of 1115]
    jessamyn: 18.6% [207 of 1115]
    Artw: 18.0% [201 of 1115]
    blue_beetle: 15.1% [168 of 1115]
    Astro Zombie: 15.0% [167 of 1115]
    The Whelk: 14.9% [108 of 727]

    Of the threads where other users have been active, in whose has Pronoiac also been the most active by percentage?
    (limited to threads active after Pronoiac has joined MetaFilter)

    double block and bleed: 18.3% [26 of 142]
    y6y6y6: 14.3% [10 of 70]
    cthuljew: 14.3% [11 of 77]
    dead cousin ted: 14.2% [34 of 239]
    Electric Dragon: 13.4% [15 of 112]
    mr_crash_davis mark II: Jazz Odyssey: 13.1% [55 of 421]
    egypturnash: 13.0% [15 of 115]
    theroadahead: 13.0% [9 of 69]
    Liver: 13.0% [10 of 77]
    little e: 12.9% [18 of 139]

    Who has favorited the same items as Pronoiac the most?

    graventy [25]
    shmegegge [25]
    JHarris [23]
    lalochezia [23]
    nasreddin [23]
    schyler523 [23]
    tehloki [23]
    scrump [21]
    Caduceus [20]
    koeselitz [20]


    posted by FishBike at 7:43 AM on August 29, 2009 [1 favorite]
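
    For anyone wanting to try the "top 10 mutual favorites (by simple count)" ranking at home, here is a minimal sketch. It is not FishBike's actual query. It assumes the favorites data has already been boiled down to (favoriter, favoritee) pairs, one per favorite; the function name `top_mutual_favorites` is made up for illustration, and skipping self-favorites is also an assumption.

```python
from collections import Counter

def top_mutual_favorites(pairs, user, limit=10):
    """Rank users by mutual favoriting with `user`.

    `pairs` is an iterable of (favoriter, favoritee) tuples, one per favorite.
    Each candidate is scored by the smaller of the two directional counts,
    i.e. "whoever has favorited the other the least".
    """
    given = Counter()     # favorites `user` gave to each other user
    received = Counter()  # favorites `user` received from each other user
    for favoriter, favoritee in pairs:
        if favoriter == favoritee:
            continue  # assumption: self-favorites are ignored
        if favoriter == user:
            given[favoritee] += 1
        elif favoritee == user:
            received[favoriter] += 1

    # Keep only users where favoriting goes in both directions.
    mutual = [
        (other, given[other], received[other])
        for other in given
        if received[other] > 0
    ]
    # Largest minimum first; ties broken alphabetically.
    mutual.sort(key=lambda row: (-min(row[1], row[2]), row[0]))
    return mutual[:limit]
```

    Each returned row is (other user, favorites given, favorites received), which matches the "Pronoiac [9] ---- [10] Artw" layout above.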


    Hey, I tried to convince jessamyn that we need a controlled vocabulary, but...

    I suppose it would be possible to make a lookup table to translate tags actually used into tags one thinks should have been used, to consolidate similar tags, singular vs. plural, etc. Then the tag data could be transformed via this lookup table and various kinds of analysis done with that.

    How to actually do this for all 143,647 distinct tags in the latest Infodump is left as an exercise for the reader. But perhaps this also gives some insight into the size of the problem it would be to have a limited set of tags to choose from, requiring maintenance.
    posted by FishBike at 8:40 AM on August 29, 2009
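
    The lookup-table idea is easy to sketch, even if filling the table in is the hard part. The following is only an illustration: the entries in `TAG_LOOKUP` are invented examples, the function names are hypothetical, and lowercasing and trimming tags before the lookup is an assumption rather than anything the Infodump does.

```python
from collections import Counter

# Hypothetical lookup table: tag as actually used -> tag one thinks
# should have been used. A real table would need far more entries.
TAG_LOOKUP = {
    "cats": "cat",
    "kitten": "cat",
    "kittens": "cat",
    "obit": "obituary",
    "obits": "obituary",
}

def normalize_tag(tag, lookup=TAG_LOOKUP):
    """Map a raw tag to its consolidated form; unknown tags pass through."""
    cleaned = tag.strip().lower()
    return lookup.get(cleaned, cleaned)

def consolidated_counts(tags, lookup=TAG_LOOKUP):
    """Count tag usage after translating every tag through the lookup table."""
    return Counter(normalize_tag(t, lookup) for t in tags)
```

    Calling `consolidated_counts` on the tag column of a tag data file would then give usage counts for the consolidated vocabulary instead of the raw one.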


    Yup. Even though my number of posts & favorites is low, that analysis still turned up things that were news to me.
    posted by Pronoiac at 12:13 PM on August 29, 2009


    FishBike, when you ran these stats last time (a couple months ago), were they based on a previous info dump? If so, what was the date (roughly)?
    posted by iamkimiam at 12:15 PM on August 29, 2009


    The previous round of these, in the user matching discussion in June, was based on the January 1, 2009 Infodump.
    posted by FishBike at 5:39 AM on August 30, 2009


    Awesome, thanks. How much of a pain would it be to run mine again? My activity has changed significantly since then (especially since I used to favorite myself a lot, not knowing it was a socially weird thing to do – I was using it as a way to find my stuff easily, for reference. Anyways, I was my own favorite MeFite, embarrassingly enough). Either way, I'd be curious to compare the different results. Meta meta meta...

    (If you run mine again, and you'd prefer to MeMail me the results instead of posting them here, that'd be cool too.)
    posted by iamkimiam at 11:36 AM on August 30, 2009


    It's not a pain at all:

    Stats for: iamkimiam (previously)
    Who does iamkimiam favorite the most?
    (simple count of favorites)

    cortex [64]
    Miko [56]
    jessamyn [46]
    fourcheesemac [35]
    Pastabagel [34]
    Astro Zombie [30]
    Anonymous [29]
    grumblebee [28]
    mathowie [28]
    ericb [26]

    Who does iamkimiam favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    3.05% (6 of 197) of tractorfeed's comments+posts
    2.62% (5 of 191) of DaShiv's comments+posts
    2.11% (7 of 332) of neroli's comments+posts
    2.01% (12 of 598) of billyfleetwood's comments+posts
    1.54% (6 of 389) of L. Fitzgerald Sjoberg's comments+posts
    1.35% (15 of 1114) of Dee Xtrovert's comments+posts
    1.14% (5 of 439) of Naberius's comments+posts
    1.11% (34 of 3068) of Pastabagel's comments+posts
    1.08% (28 of 2596) of grumblebee's comments+posts
    1.04% (10 of 958) of felix betachat's comments+posts

    Who favorites iamkimiam the most?
    (simple count of favorites)

    nasreddin [18]
    roll truck roll [14]
    chicainthecity [13]
    Miko [13]
    tractorfeed [12]
    blueberry [12]
    divabat [12]
    rtha [12]
    limeonaire [12]
    ifjuly [11]

    Who favorites iamkimiam the most?
    (percent of your comments+posts since they joined)

    Diagonalize: 1.61% (1 of 62) of iamkimiam's comments+posts
    nasreddin: 0.79% (18 of 2281) of iamkimiam's comments+posts
    Marisa Stole the Precious Thing: 0.78% (10 of 1287) of iamkimiam's comments+posts
    oinopaponton: 0.75% (1 of 133) of iamkimiam's comments+posts
    Blue Jello Elf: 0.75% (1 of 133) of iamkimiam's comments+posts
    Amethyst, Princess of Gemworld: 0.75% (1 of 133) of iamkimiam's comments+posts
    iamkimiam: 0.75% (17 of 2275) of iamkimiam's comments+posts
    chicainthecity: 0.67% (13 of 1944) of iamkimiam's comments+posts
    twins named Lugubrious and Salubrious: 0.64% (11 of 1725) of iamkimiam's comments+posts
    liza: 0.63% (10 of 1588) of iamkimiam's comments+posts

    Who are iamkimiam's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    iamkimiam [56] ---- [13] Miko
    iamkimiam [14] ---- [12] rtha
    iamkimiam [20] ---- [9] languagehat
    iamkimiam [9] ---- [8] LobsterMitten
    iamkimiam [8] ---- [10] Marisa Stole the Precious Thing
    iamkimiam [17] ---- [7] Blazecock Pileon
    iamkimiam [7] ---- [14] roll truck roll
    iamkimiam [16] ---- [7] klangklangston
    iamkimiam [7] ---- [7] box
    iamkimiam [7] ---- [7] Pope Guilty

    Who are iamkimiam's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    iamkimiam [0.75%] ---- [0.75%] iamkimiam
    iamkimiam [0.99%] ---- [0.57%] Miko
    iamkimiam [3.05%] ---- [0.53%] tractorfeed
    iamkimiam [0.61%] ---- [0.53%] blueberry
    iamkimiam [0.44%] ---- [0.48%] special-k
    iamkimiam [0.44%] ---- [0.61%] roll truck roll
    iamkimiam [0.43%] ---- [0.63%] liza
    iamkimiam [0.45%] ---- [0.39%] flibbertigibbet
    iamkimiam [0.69%] ---- [0.39%] njbradburn
    iamkimiam [0.34%] ---- [0.39%] Sova

    Of the threads where iamkimiam has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    cortex: 22.5% [348 of 1545]
    jessamyn: 20.6% [319 of 1545]
    Brandon Blatcher: 17.9% [276 of 1545]
    quin: 17.0% [263 of 1545]
    languagehat: 15.5% [240 of 1545]
    Blazecock Pileon: 15.5% [239 of 1545]
    DU: 14.6% [225 of 1541]
    filthy light thief: 13.5% [77 of 572]
    turgid dahlia: 13.4% [115 of 857]
    The Whelk: 13.1% [81 of 616]

    Of the threads where other users have been active, in whose has iamkimiam also been the most active by percentage?
    (limited to threads active after iamkimiam has joined MetaFilter)

    every_one_needs_a_hug_sometimes: 22.4% [13 of 58]
    obloquy: 19.4% [13 of 67]
    bigbigdog: 17.6% [9 of 51]
    sambosambo: 14.9% [29 of 195]
    shiu mai baby: 14.9% [29 of 195]
    Combustible Edison Lighthouse: 14.9% [26 of 175]
    Surfurrus: 14.8% [16 of 108]
    theroadahead: 14.5% [10 of 69]
    gofargogo: 14.1% [13 of 92]
    double block and bleed: 14.1% [20 of 142]

    Who has favorited the same items as iamkimiam the most?

    schyler523 [381]
    scrump [368]
    tehloki [367]
    blueberry [325]
    flibbertigibbet [323]
    graventy [255]
    deborah [252]
    Pope Guilty [250]
    lalochezia [239]
    shmegegge [237]


    posted by FishBike at 7:14 AM on August 31, 2009 [1 favorite]
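
    The "Who has favorited the same items" list can be approximated with a simple overlap count. This is a sketch under assumptions, not the query used above: it takes favorites as (favoriter, item_id) pairs, where `item_id` is a hypothetical key that uniquely identifies a post or comment across all the subsites, and the name `shared_favorites` is made up.

```python
from collections import Counter

def shared_favorites(favorites, user, limit=10):
    """Count, for every other user, how many items they and `user` both favorited.

    `favorites` is an iterable of (favoriter, item_id) tuples.
    """
    favorites = list(favorites)  # allow two passes, even over a generator
    mine = {item for who, item in favorites if who == user}

    overlap = Counter()
    seen = set()  # guards against duplicate rows for the same favorite
    for who, item in favorites:
        if who != user and item in mine and (who, item) not in seen:
            seen.add((who, item))
            overlap[who] += 1
    return overlap.most_common(limit)
```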


    Awesome, super thanks!
    posted by iamkimiam at 7:25 AM on August 31, 2009


    Is it possible to figure out how many comments are only quoted text and username plus the word "eponysterical" and nothing else?
    posted by Kattullus at 12:39 PM on September 1, 2009


    I've gone ahead and internet married you, iamkimiam. We can toast at the next meetup.
    posted by rtha at 1:07 PM on September 1, 2009


    Pronoiac, I never knew you cared.

    Can we check to see who I am stalking?
    posted by Artw at 1:13 PM on September 1, 2009


    Yes we will!
    posted by iamkimiam at 1:53 PM on September 1, 2009


    Thanks FishBike, and sorry for the late reply!
    posted by goodnewsfortheinsane at 8:05 PM on September 5, 2009


    I'm poring over this thread for additions to the wiki.

    mathlete: having the Celebrity Mefites & Groupie Mefites be counted by mutual contacts seems a bit odd.

    Artw: I thought you knew. I thought it was so blatantly obvious. I'm a comic book geek, baby.
    posted by Pronoiac at 2:53 PM on September 7, 2009


    Double checking the numbers, I think the "mutual contact" verbiage is inaccurate, included by accident.
    posted by Pronoiac at 2:59 PM on September 7, 2009


    That's a minus sign.

    Number of people who contact them minus number of mutual contacts.
    Number of mefites they contact minus mutual contacts.
    posted by mathlete at 3:54 PM on September 7, 2009
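
    Spelled out as set arithmetic, the two subtractions look roughly like this. It is a sketch, assuming contact data is available as (from_user, to_user) pairs meaning "from_user has added to_user as a contact"; the function name `contact_balance` is hypothetical.

```python
def contact_balance(contacts, user):
    """Return (inbound_only, outbound_only) contact counts for `user`.

    inbound_only:  people who contact `user`, minus mutual contacts.
    outbound_only: people `user` contacts, minus mutual contacts.
    """
    outbound = set()  # users that `user` lists as contacts
    inbound = set()   # users that list `user` as a contact
    for src, dst in contacts:
        if src == user:
            outbound.add(dst)
        if dst == user:
            inbound.add(src)
    mutual = inbound & outbound  # contact goes in both directions
    return len(inbound) - len(mutual), len(outbound) - len(mutual)
```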


    Oops! Okay, I clarified that on the wiki page.

    Between mathlete & FishBike & other new additions, that page is 70% longer than it was.
    posted by Pronoiac at 4:27 PM on September 7, 2009


    How many threads are 500 or more comments long?
    posted by Kattullus at 4:44 PM on September 7, 2009


    How many threads are 500 or more comments long?

    102, across all the sites represented in the latest Infodump. There are 229,389 threads in the Infodump, so we're in the top 0.05% here.
    posted by FishBike at 5:24 PM on September 7, 2009
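
    A count like this can be derived from the comment data by tallying comments per thread. The sketch below assumes you can produce one thread identifier per comment row (unique across subsites, for example a (site, thread id) pair) and that you know the total number of threads; `long_thread_share` is a made-up name. For the figures quoted above, 102 of 229,389 works out to roughly 0.044%, which fits within "top 0.05%".

```python
from collections import Counter

def long_thread_share(comment_thread_ids, total_threads, threshold=500):
    """Return (count, percent) of threads with at least `threshold` comments.

    `comment_thread_ids` yields one thread identifier per comment.
    `total_threads` is the number of threads overall, including any
    threads that have no comments at all.
    """
    per_thread = Counter(comment_thread_ids)
    count = sum(1 for n in per_thread.values() if n >= threshold)
    return count, 100.0 * count / total_threads
```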


    I am apparently stalking fearfulsymmetry and assorted posters on comicsy threads.

    Stats for: Artw

    Who does Artw favorite the most?
    (simple count of favorites)

    fearfulsymmetry [203]
    homunculus [62]
    Astro Zombie [59]
    cortex [58]
    kittens for breakfast [52]
    Alvy Ampersand [51]
    Blazecock Pileon [42]
    Marisa Stole the Precious Thing [40]
    DU [37]
    The Whelk [34]

    Who does Artw favorite the most?
    (percent of their comments+posts since you joined)
    (limited to users you've favorited 5+ times)

    6.73% (203 of 3017) of fearfulsymmetry's comments+posts
    4.05% (6 of 148) of cstross's comments+posts
    4.02% (7 of 174) of Mr. Bad Example's comments+posts
    3.33% (5 of 150) of UKnowForKids's comments+posts
    3.03% (6 of 198) of permafrost's comments+posts
    2.73% (5 of 183) of justsomebodythatyouusedtoknow's comments+posts
    2.33% (12 of 515) of straight's comments+posts
    2.11% (7 of 332) of neroli's comments+posts
    2.07% (6 of 290) of davemee's comments+posts
    1.96% (6 of 306) of fullerine's comments+posts

    Who favorites Artw the most?
    (simple count of favorites)

    fearfulsymmetry [205]
    tehloki [153]
    Pope Guilty [128]
    JHarris [124]
    BrotherCaine [109]
    Marisa Stole the Precious Thing [81]
    liza [75]
    twins named Lugubrious and Salubrious [60]
    Blazecock Pileon [59]
    Joe Beese [58]

    Who favorites Artw the most?
    (percent of your comments+posts since they joined)

    fearfulsymmetry: 1.88% (205 of 10916) of Artw's comments+posts
    Joe Beese: 1.28% (58 of 4537) of Artw's comments+posts
    tehloki: 1.25% (153 of 12208) of Artw's comments+posts
    Peztopiary: 1.22% (3 of 245) of Artw's comments+posts
    Pope Guilty: 1.10% (128 of 11617) of Artw's comments+posts
    Marisa Stole the Precious Thing: 0.96% (81 of 8472) of Artw's comments+posts
    JHarris: 0.93% (124 of 13364) of Artw's comments+posts
    BrotherCaine: 0.82% (109 of 13364) of Artw's comments+posts
    liza: 0.74% (75 of 10194) of Artw's comments+posts
    twins named Lugubrious and Salubrious: 0.57% (60 of 10475) of Artw's comments+posts

    Who are Artw's top 10 mutual favorites?
    (by simple count of whoever has favorited the other the least)

    Artw [203] ---- [205] fearfulsymmetry
    Artw [52] ---- [55] kittens for breakfast
    Artw [42] ---- [59] Blazecock Pileon
    Artw [40] ---- [81] Marisa Stole the Precious Thing
    Artw [37] ---- [39] DU
    Artw [51] ---- [33] Alvy Ampersand
    Artw [23] ---- [28] Kattullus
    Artw [23] ---- [58] rokusan
    Artw [23] ---- [24] klangklangston
    Artw [20] ---- [58] Joe Beese

    Who are Artw's top 10 mutual favorites?
    (by percentage favorited of others posts since joining)

    Artw [6.73%] ---- [1.88%] fearfulsymmetry
    Artw [0.82%] ---- [0.96%] Marisa Stole the Precious Thing
    Artw [0.71%] ---- [1.28%] Joe Beese
    Artw [0.61%] ---- [0.93%] JHarris
    Artw [1.66%] ---- [0.47%] kittens for breakfast
    Artw [0.87%] ---- [0.46%] Caduceus
    Artw [0.40%] ---- [0.38%] nasreddin
    Artw [0.37%] ---- [1.10%] Pope Guilty
    Artw [0.36%] ---- [0.48%] rokusan
    Artw [0.35%] ---- [0.45%] Blazecock Pileon

    Of the threads where Artw has been active, who else has been active in the highest percentage?
    (limited to threads active after the comparison user has joined MetaFilter)

    DU: 27.3% [916 of 3360]
    delmoi: 27.2% [1120 of 4123]
    Joe Beese: 25.5% [291 of 1139]
    fearfulsymmetry: 23.3% [686 of 2944]
    quin: 22.4% [925 of 4123]
    Blazecock Pileon: 21.8% [881 of 4043]
    filthy light thief: 20.4% [279 of 1369]
    Astro Zombie: 19.3% [777 of 4033]
    The Whelk: 17.3% [255 of 1471]
    Smedleyman: 16.3% [673 of 4123]

    Of the threads where other users have been active, in whose has Artw also been the most active by percentage?
    (limited to threads active after Artw has joined MetaFilter)

    ymgve: 40.2% [33 of 82]
    JustAsItSounds: 39.6% [21 of 53]
    theroadahead: 36.2% [25 of 69]
    Doktor Zed: 35.8% [44 of 123]
    fearfulsymmetry: 35.3% [686 of 1942]
    aihal: 34.6% [27 of 78]
    Reggie Knoble: 34.4% [22 of 64]
    Amanojaku: 33.5% [55 of 164]
    panboi: 32.1% [45 of 140]
    Ragma: 30.5% [47 of 154]

    Who has favorited the same items as Artw the most?

    tehloki [341]
    JHarris [305]
    Pope Guilty [224]
    blueberry [210]
    BrotherCaine [201]
    nicolin [178]
    limeonaire [164]
    shmegegge [161]
    schyler523 [155]
    scrump [150]

    posted by Artw at 5:32 PM on September 7, 2009
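
    The "who else has been active in the same threads" percentages can be sketched as below. Assumptions: activity is available as (user_name, thread_id) pairs, one per comment or post, with thread ids unique across subsites; `co_activity` is a hypothetical name. This simplified version uses all of the user's threads as the denominator, so it leaves out the "threads active after the comparison user has joined" restriction that shrinks the denominator for newer users in the stats above.

```python
from collections import defaultdict

def co_activity(activity, user, limit=10):
    """For threads where `user` was active, report who else was active most often.

    Returns rows of (other_user, shared_threads, percent_of_user_threads).
    """
    threads_by_user = defaultdict(set)
    for name, thread in activity:
        threads_by_user[name].add(thread)

    mine = threads_by_user.get(user, set())
    if not mine:
        return []

    rows = []
    for other, theirs in threads_by_user.items():
        if other == user:
            continue
        shared = len(mine & theirs)
        if shared:
            rows.append((other, shared, 100.0 * shared / len(mine)))
    # Highest percentage first; ties broken alphabetically.
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows[:limit]
```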

