Data Wankery Inside December 16, 2012 6:08 AM

I recently got my hands on a copy of Tableau Desktop. It's supposed to be a business intelligence app, but never mind that. The first thing I did was hook it up to the infodump files!

I'd be curious to see if anyone else has tried their hand at visualizing the infodump. We've had a lot of threads about it, but a curious lack of awesome charts. Anyway, here are some initial results using the AskMeFi data:

Above-the-fold post length has stayed remarkably steady over the years, but not so with the average below-the-fold post length, which has been growing slowly but surely. This goes double for Human Relations questions, which are far and away the most lengthy.

We all know Human Relations posts tend to get a lot of comments, but did you know that (oddly) the Computers & Internet and Technology categories average the fewest comments and favorites?

Food & Drink comes out on top in terms of post "efficiency" (average number of favorites per 100 characters in the post itself).
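
For anyone who wants to reproduce the "efficiency" number outside Tableau, here is a minimal sketch. It assumes the metric is the per-post ratio of favorites to post length (scaled to 100 characters), averaged within each category; the chart may instead divide total favorites by total characters. The rows are made up for illustration and are not real infodump data, and the function names are hypothetical.

```python
from statistics import mean


def favorites_per_100_chars(favorites, post_length):
    """Favorites earned per 100 characters of post text."""
    if post_length <= 0:
        raise ValueError("post_length must be positive")
    return favorites * 100 / post_length


def efficiency_by_category(rows):
    """Average the per-post ratio within each category (an assumption)."""
    ratios = {}
    for category, favorites, post_length in rows:
        ratios.setdefault(category, []).append(
            favorites_per_100_chars(favorites, post_length)
        )
    return {category: mean(values) for category, values in ratios.items()}


# Made-up rows: (category, favorites, post length in characters).
sample_posts = [
    ("Food & Drink", 12, 400),
    ("Food & Drink", 6, 300),
    ("Human Relations", 10, 2000),
    ("Human Relations", 4, 1000),
]

efficiency = efficiency_by_category(sample_posts)
for category, value in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {value:.2f} favorites per 100 characters")
```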

Check out how certain categories get more favorites on average during certain times of the year: Education peaks right before the school year starts, and Shopping gets popular with MeFites right before the holidays.

When and where are the mods' hammers of justice busy? Try the middle of the week in the Human Relations category, where the most posts have been deleted by far.

Speaking of busy, here's a chart of AskMeFi's top 10 posters and where they post most often.

Bonus: Average username length is gradually increasing. This is probably bad news for East Manitoba Regional Junior Kabaddi Champion '94.
posted by aheckler to MetaFilter-Related at 6:08 AM (65 comments total) 74 users marked this as a favorite

ooh; shiny charts! Thanks, aheckler!
posted by taz (staff) at 6:14 AM on December 16, 2012 [2 favorites]


Fun!
posted by jessamyn (staff) at 6:15 AM on December 16, 2012 [1 favorite]


Got Tableau Desktop curious and apparently it's free software. Might be useful one day when you have large datasets to analyze.
posted by Foci for Analysis at 6:18 AM on December 16, 2012 [10 favorites]


I've *just* started tinkering with Tableau for making geographic heatmaps of the 2010 & 2012 MeFi pronunciation survey data for my dissertation research (about language change on MetaFilter). In the next couple weeks I plan on really getting into the software. I'm waaay encouraged by what you've done here...these are super duper brilliant! Loving this so much.
posted by iamkimiam at 6:38 AM on December 16, 2012 [6 favorites]


How cool! Thank you for doing this.
posted by LobsterMitten (staff) at 7:31 AM on December 16, 2012 [1 favorite]


I'm not wasting time on Metafilter, I'm making posts more efficient.
posted by arcticseal at 7:43 AM on December 16, 2012 [6 favorites]


Cool, thanks aheckler!
posted by carter at 7:52 AM on December 16, 2012


Nice!
posted by OmieWise at 7:54 AM on December 16, 2012


The downside of Tableau Public (their free version) is that it limits the number of rows in the dataset you can use, and you can only save the results to their public website.
posted by aheckler at 7:57 AM on December 16, 2012


Neat!
posted by rtha at 7:59 AM on December 16, 2012


Shiny!

> my dissertation research (about language change on MetaFilter)

I trust we all get to serve as your dissertation committee. Also, please let us know if we are evolving a new branch of Indo-European.
posted by languagehat at 8:10 AM on December 16, 2012 [15 favorites]


I added this to my favourites. Then I removed it so I could add it again. A++ would favourite again (again!).
posted by shelleycat at 8:35 AM on December 16, 2012


What?

INDO-EUROPEAN is evolving!

Congratulations! Your INDO-EUROPEAN evolved into CHARIZARD!
posted by griphus at 8:52 AM on December 16, 2012 [22 favorites]


I can think of nothing less terrifying!
posted by iamkimiam at 8:53 AM on December 16, 2012 [2 favorites]


Is there a correlation between post length and competition months?
posted by zamboni at 9:02 AM on December 16, 2012


reluctantly, walter accompanies a pregnant gunslinger to the dixie pig.
posted by quonsar II: smock fishpants and the temple of foon at 9:07 AM on December 16, 2012 [1 favorite]


Is there a correlation between post length and competition months?

I made a stacked bar chart showing the average total length of posts to the Blue for years in which we've had a December "Best Post" contest (2006, 2008, 2010, and 2011).

December is almost 3 standard deviations above the mean, so yes?
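
The "standard deviations above the mean" figure is a z-score. A minimal sketch of that calculation follows, assuming December is included in the sample and the sample standard deviation is used; the comment does not say which variant was chosen. The monthly values are made up for illustration.

```python
from statistics import mean, stdev


def z_score(value, sample):
    """How many sample standard deviations `value` sits from the sample mean."""
    return (value - mean(sample)) / stdev(sample)


# Made-up average post lengths (characters) for January through December.
monthly_avg_length = [980, 1010, 990, 1020, 1000, 1005, 995, 1015, 985, 1000, 1000, 1300]

december_z = z_score(monthly_avg_length[-1], monthly_avg_length)
print(f"December z-score: {december_z:.2f}")
```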
posted by aheckler at 9:22 AM on December 16, 2012 [4 favorites]


Additionally, more favorites are given out in December, but from some of the other calculations I've done, that appears to hold true whether or not there's a post contest.
posted by aheckler at 9:34 AM on December 16, 2012


I wonder if computer questions get fewer answers because they're so rarely open-ended.
posted by Holy Zarquon's Singing Fish at 9:45 AM on December 16, 2012 [3 favorites]


Maybe a correlation to consumption of Christmas spirits?
posted by arcticseal at 9:45 AM on December 16, 2012


Speaking of post contests: voting closes tonight for Week 2 of the Mefite's Choice contest.
-vote with the fantastic flag
-for as many FPPs as you like
-by midnight Pacific time tonight.
posted by LobsterMitten (staff) at 10:01 AM on December 16, 2012


> Got Tableau Desktop curious and apparently it's free software. Might be useful one day when you have large datasets to analyze.

Unfortunately it's not free, it's a free 14 day trial.
posted by bjrn at 10:36 AM on December 16, 2012 [2 favorites]


Do Meta Talk posts qualify for fantastic flags ---> gifting?
posted by infini at 10:45 AM on December 16, 2012


Also, please let us know if we are evolving a new branch of Indo-European.

Yes!

Represent!
posted by infini at 10:46 AM on December 16, 2012


Do Meta Talk posts qualify for fantastic flags ---> gifting?

Nope, just for MeFi.
posted by jessamyn (staff) at 10:46 AM on December 16, 2012


Data wanking is one of the best wankings.
posted by Mister_A at 11:11 AM on December 16, 2012 [7 favorites]


Interesting that the top ten AskMefi posters seem to post most about Computers & Internet, when that category averages the fewest comments and favorites.
posted by koeselitz at 11:13 AM on December 16, 2012 [1 favorite]


Try, try again.
posted by infini at 11:26 AM on December 16, 2012


Beautiful, thank you aheckler.

languagehat: "Also, please let us know if we are evolving a new branch of Indo-European."

Knowing Metafilter, we'd probably accidentally resurrect Tocharian.
posted by goodnewsfortheinsane (staff) at 11:30 AM on December 16, 2012 [1 favorite]


Awesome datawankery, aheckler.
posted by cortex (staff) at 12:29 PM on December 16, 2012


Is this where I admit I have a script for juggling favorite counts and best answer counts over a given period of time in AskMe?

I stopped messing with it for three reasons: 1) too many arbitrary decisions about minimum answers and date ranges; 2) too little difference among users; 3) not actually useful for anything.

As an example, here's a report that considers only answers given in 2012 and users who gave 250+ answers in 2012 to rank their best answer percentages:
1  16.6% holgate                   26 11.6% valkyryn
2  15.8% straw                     27 11.6% dhartung
3  14.7% peagood                   28 11.5% showbiz_liz
4  14.7% jetlagaddict              29 11.5% oneirodynia
5  14.6% drlith                    30 11.5% barnone
6  14.3% bonehead                  31 11.5% anaelith
7  14.2% Kadin2048                 32 11.5% Mizu
8  14.1% nebulawindphone           33 11.4% halfbuckaroo
9  14.1% Monsieur Caution          34 11.4% carmicha
10 14%   Eyebrows McGee            35 11.4% argonauta
11 13.8% crush-onastick            36 11.4% Frowner
12 13.8% RJ Reynolds               37 11.3% KathrynT
13 13.7% supercres                 38 11.1% jquinby
14 13.6% brainmouse                39 11.1% caclwmr4
15 13.5% nickrussell               40 11%   gauche
16 13.4% rongorongo                41 10.9% hoyland
17 13.3% snorkmaiden               42 10.8% hattifattener
18 13.1% Houstonian                43 10.8% Rosie M. Banks
19 12.8% Tomorrowful               44 10.7% jon1270
20 12.5% acidic                    45 10.6% carsonb
21 12.5% MonkeyToes                46 10.5% ook
22 12.4% bilabial                  47 10.5% hurdy gurdy girl
23 12.3% smoke                     48 10.5% grouse
24 11.8% Slap*Happy                49 10.4% restless_nomad
25 11.8% FAMOUS MONSTER            50 10.4% looli
Incidentally, the 100th best percentage is 8.7. The 200th is 6.1. And there are only 277 users who made the cut in number of answers.
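
The ranking logic described above (filter by a minimum answer count, then sort by best answer percentage) can be sketched roughly as follows. This is not the actual script; the function name, the data layout, and the users are hypothetical.

```python
def rank_by_best_answer_rate(stats, min_answers=250):
    """Rank users by best answer percentage.

    `stats` maps username -> (answers_given, best_answers).
    Users below `min_answers` are left out of the ranking.
    """
    qualified = [
        (best * 100 / answers, user)
        for user, (answers, best) in stats.items()
        if answers >= min_answers
    ]
    return sorted(qualified, reverse=True)


# Made-up users and counts, for illustration only.
sample_stats = {
    "user_a": (500, 80),
    "user_b": (300, 30),
    "user_c": (100, 40),  # high rate, but below the 250-answer cutoff
}

ranking = rank_by_best_answer_rate(sample_stats)
for rank, (rate, user) in enumerate(ranking, start=1):
    print(f"{rank:<2} {rate:.1f}% {user}")
```

Swapping the best answer count for a favorite count (and dropping the multiplication by 100) would give an average-favorites-per-answer ranking with the same cutoff.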

The conclusion I draw from this is that we're all doing pretty well giving people answers they regard as helpful.

And here's a report that likewise considers only answers given in 2012 and users who gave 250+ answers in 2012 to rank their average favorites per answer:
1  7.1419 cairdeas                 26 3.4852 Blasdelb
2  7.0718 Frowner                  27 3.4813 Brandon Blatcher
3  5.6672 scody                    28 3.4707 griphus
4  5.4714 nickrussell              29 3.4489 phunniemee
5  5.0963 ThePinkSuperhero         30 3.4457 jbenben
6  4.9281 headnsouth               31 3.4109 tel3path
7  4.8268 alphanerd                32 3.4054 restless_nomad
8  4.7714 the young rope-rider     33 3.3324 RJ Reynolds
9  4.6773 hermitosis               34 3.3147 xingcat
10 4.6475 Snarl Furillo            35 3.3107 PhoBWanKenobi
11 4.5697 FAMOUS MONSTER           36 3.2321 wolfdreams01
12 4.4412 Eyebrows McGee           37 3.1808 Ragged Richard
13 4.3709 decathecting             38 3.0497 These Birds of a Feather
14 4.2825 DarlingBri               39 2.9876 St. Alia of the Bunnies
15 4.2272 jayder                   40 2.947  ellF
16 4.1405 French Fry               41 2.9414 A Terrible Llama
17 4.0716 showbiz_liz              42 2.9045 Rodrigo Lamaitre
18 4.049  elizardbits              43 2.8732 BlahLaLa
19 4.0277 smoke                    44 2.8458 inturnaround
20 3.9242 brainmouse               45 2.8074 gauche
21 3.683  KathrynT                 46 2.7526 mochapickle
22 3.6506 The World Famous         47 2.7403 nebulawindphone
23 3.587  schroedinger             48 2.7315 argonauta
24 3.5184 bilabial                 49 2.7306 facetious
25 3.5106 Tomorrowful              50 2.72   something something
Here the 100th best ratio is 1.9719. The 200th is 1.0981. And of course there are only 277 users who made the cut in number of answers.

So those top couple dozen are amazing, but I'm not sure the rest of us are meaningfully differentiated with this metric (says #52).
posted by Monsieur Caution at 2:45 PM on December 16, 2012 [4 favorites]


Knowing Metafilter, we'd probably accidentally resurrect Tocharian.

Alas, in the course of toying irresponsibly with the Heart of Ahriman, Reddit already managed to resurrect Tocharian A last year. All that's left for us is Tocharian B, but we will have to find the Heart of Tammuz to revive it.
posted by Nomyte at 2:54 PM on December 16, 2012


It's not so odd that computer questions get the fewest comments. With computer questions, there is generally a testable, correct answer. Once the correct answer is given, there is little reason to continue discussion.
posted by fings at 3:24 PM on December 16, 2012 [4 favorites]


I'm trying to figure out how many answers I gave in 2012. Fie upon thee, person who managed to add a new thing to track when it comes to judging my participation on this site as either adequate or insufficient.
posted by SMPA at 3:55 PM on December 16, 2012


Average username length is gradually increasing.

Is this an "all the good short names are taken" effect?
posted by We had a deal, Kyle at 4:06 PM on December 16, 2012 [2 favorites]


Neat. Beats the heck out of running calculated data through excel.
posted by Tell Me No Lies at 4:08 PM on December 16, 2012


Okay, what happened in 2004? Why the massive spike?
posted by marienbad at 4:35 PM on December 16, 2012


All I know is that signups were re-opened (and the $5 fee was started) in late 2004. Maybe the unbridled joy over this led to a spike in long, jokey usernames?
posted by goodnewsfortheinsane (staff) at 4:56 PM on December 16, 2012


Nov. 18th, 2004 is when things went bad.
posted by Brandon Blatcher at 5:20 PM on December 16, 2012 [2 favorites]


I'm trying to figure out how many answers I gave in 2012. Fie upon thee, person who managed to add a new thing to track when it comes to judging my participation on this site as either adequate or insufficient.

The infodumpster can give those numbers. It also has raw best answer counts, raw favorite counts, etc.

But my strong suggestion is not to read anything into any of it. I've generated piles of lists and considered dozens of reasons why desirable behavior leads to various outcomes, and the simple fact no analysis can get around is that AskMe doesn't have a unified field of motivations, interactions, results, etc.

I think it's nice to recognize a few people whose contributions are usually so thoughtful that their favorite-to-answer ratio is high, but even then it's a shame not to recognize people who are just a little more inclined toward conversational posts or people who're always willing to take a stab at answering questions that have few answers, skewing their personal results. And people with a high volume of good answers have done fantastic things, even if their average is modest. And people with a low volume of good answers have almost always helped someone. Some of the best answer percentages for people with just, say, 50-100 answers are amazing--there are some serious specialists out there only answering when they can nail it.

In short, people aren't all doing the same kind of thing, and a leader board is silly. Obviously, I thought it was fun to play with, but really, if there's something I can contribute based on that fun, I hope it's that no one should sweat this stuff, ever.
posted by Monsieur Caution at 5:22 PM on December 16, 2012 [1 favorite]


All I know is that signups were re-opened (and the $5 fee was started) in late 2004. Maybe the unbridled joy over this led to a spike in long, jokey usernames
posted by goodnewsfortheinsane (staff)


Eponysterical (with a 2004 join date).
posted by Superplin at 5:27 PM on December 16, 2012


The infodumpster can give those numbers.

Woot, 620 answers given! And there's still more than two weeks left in the year!

;)

Also, holy heck, Sidhedevil. You posted just shy of 10 answers a day so far this year!
posted by SMPA at 5:56 PM on December 16, 2012


This is awesome. Can you use Tableau's interactive twiddly functions to set up an interactive twiddly MeFi Big Data Visualization Customizable-O-Matic Experience website where we can play with the data or is that not straightforward?

That is not straightforward. Or cheap. You'd probably need to drop some serious cash on Tableau Server (not Desktop) or a similar service, such as Chartio.

What are the two big spikes in the username length chart?

The wild variations in average username length from 2003ish to 2005ish are due to a very low number of user registrations during those periods of time. The highest spike, for example, is for March 2004. Only one user registered that month, and their username was 17 characters long. Thus, the average for that month is abnormally high.
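
A quick way to see the small-sample effect is to report the registration count next to each monthly average. The sketch below assumes a simple list of (month, username) pairs, which is not the infodump's actual file layout, and the usernames are placeholders.

```python
from collections import defaultdict
from statistics import mean


def username_length_by_month(registrations):
    """Return {month: (average username length, number of registrations)}."""
    lengths = defaultdict(list)
    for month, username in registrations:
        lengths[month].append(len(username))
    return {month: (mean(vals), len(vals)) for month, vals in sorted(lengths.items())}


# Made-up registrations: one lonely 17-character signup versus a busier month.
sample_registrations = [
    ("2004-03", "x" * 17),
    ("2005-01", "abc"),
    ("2005-01", "defgh"),
    ("2005-01", "ijklmno"),
]

monthly = username_length_by_month(sample_registrations)
for month, (avg_len, count) in monthly.items():
    print(f"{month}: average {avg_len} characters over {count} registration(s)")
```

With the count alongside the average, a month with one registration is easy to spot and discount.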
posted by aheckler at 6:19 PM on December 16, 2012 [4 favorites]


Aha, one of the few times I've ever shown up on a list like that in MeTa. And it's a good one too! Also undeniably happy that this username is on the one and not the other—the best answer isn't always the most popular one!
posted by carsonb at 6:47 PM on December 16, 2012


Data wanking is one of the best wankings.

Thank you, Geordi.
posted by It's Raining Florence Henderson at 7:03 PM on December 16, 2012 [5 favorites]


My best answer percentage for this year is 24.24%, but I didn't make Monsieur Caution's list because I only had 230 answers.

Woo!
posted by zsazsa at 10:47 PM on December 16, 2012 [2 favorites]


Woo!

In the 200-249 answer bracket, you and bcwinters both have spectacular best answer rates for 2012:

24.2% zsazsa
21.8% bcwinters

It hardly accounts for everything, but technical skills and especially web development skills probably give this metric a boost (they clearly did for me). I'm reminded of how college quiz bowl tournaments can't be won solely by chemistry/history double majors who like sports and pop music and happen to have gone to Catholic school, but if you're lucky enough to have one on your team, look out.
posted by Monsieur Caution at 11:23 PM on December 16, 2012


Yeah, it's easier to have a higher best answer rate when you are less prolific than the true Ask MetaFilter stars. My best answer rate for 2012 is 33%, but I only posted 94 comments to Ask MetaFilter in 2012. I wonder if there is a clear trend, i.e. do the top rates consistently rise as you progress through the 150-199, 100-149, 75-99, 50-74, and 25-49 brackets?
posted by RichardP at 11:48 PM on December 16, 2012


do the top rates consistently rise as you progress through the 150-199, 100-149, 75-99, 50-74, and 25-49 brackets?

I'd have guessed so, but the top position is actually kind of stable:
750+: 11.6% valkyryn
700+: 14%   Eyebrows McGee
650+: 14%   Eyebrows McGee
600+: 14%   Eyebrows McGee
550+: 14%   Eyebrows McGee
500+: 16.6% holgate
450+: 16.6% holgate
400+: 16.6% holgate
350+: 16.6% holgate
300+: 16.6% holgate
250+: 16.6% holgate
200+: 24.2% zsazsa
150+: 24.2% zsazsa
100+: 28%   needled
75+:  33%   RichardP
50+:  33%   RichardP
25+:  42.9% retypepassword
BTW, in all ranges between 800 and 1200, jessamyn's 10.3% is the highest rate. I think her moderator comments probably get added in among her regular answers, skewing her best answer rate down (I can only suppose severely).
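
For the curious, a rough sketch of how a "top rate at each N+ cutoff" table like the one above could be built. The data and names are hypothetical and this is not the script that produced the list. Because every "N+" group contains all the groups above it, the top rate can only stay the same or rise as the cutoff drops, which is why the same names repeat across several rows.

```python
def top_rate_at_thresholds(stats, thresholds):
    """For each minimum answer count, find the user with the best rate.

    `stats` maps username -> (answers_given, best_answers).
    Returns {threshold: (username, best answer percentage)}.
    """
    result = {}
    for threshold in thresholds:
        rates = {
            user: best * 100 / answers
            for user, (answers, best) in stats.items()
            if answers >= threshold
        }
        if rates:  # skip cutoffs nobody reaches
            leader = max(rates, key=rates.get)
            result[threshold] = (leader, rates[leader])
    return result


# Made-up users and counts, for illustration only.
sample_stats = {
    "user_a": (600, 60),
    "user_b": (260, 52),
    "user_c": (40, 20),
}

leaders = top_rate_at_thresholds(sample_stats, [500, 250, 25])
for threshold, (user, rate) in leaders.items():
    print(f"{threshold}+: {rate:.1f}% {user}")
```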
posted by Monsieur Caution at 12:20 AM on December 17, 2012


Knowing Metafilter, we'd probably accidentally resurrect Tocharian.

Probably we'd end up using Georgian, just out of sheer bloodymindedness.
posted by atrazine at 3:40 AM on December 17, 2012


BTW, in all ranges between 800 and 1200, jessamyn's 10.3% is the highest rate. I think her moderator comments probably get added in among her regular answers, skewing her best answer rate down (I can only suppose severely).

I wouldn't be too sure; a lot of her moderator comments also get marked as best answer. In contentious threads where mod comments are needed the OP frequently marks her mod comments, or so I feel like I've seen. She's awesome, no doubt, but I wonder how you would tease that out.
posted by OmieWise at 5:22 AM on December 17, 2012


I did not know until just now that we had a user named retypepassword.
posted by jessamyn (staff) at 6:29 AM on December 17, 2012 [5 favorites]


RichardP needs to answer in AskMe 6 more times and knock me off the list!
posted by needled at 7:29 AM on December 17, 2012


Having fun looking at and thinking about these. I do some data vis stuff for work, so it's totally okay to read this at eleven on a Monday morning, right?
posted by madcaptenor at 11:11 AM on December 17, 2012


I did not know until just now that we had a user named retypepassword.

Reminds me of a semi-trolling kind of thing someone did in college. You could create mailing lists that were simply listname@school.edu, so someone created the mailing list undisclosed-recipients@school.edu*. Turns out that one of the email programs commonly used by admins treated any word not properly formatted in the To: field as a local address. That means if you hit reply-all to an email that had been bcc'd, you sent an email to undisclosed-recipients@school.edu - and to anyone who had added themselves to that public mailing list. They got a few highish profile emails before it got confiscated, IIRC, though never anything particularly damning.

* note: I can't remember the exact name of the list, but this is the gist of it.
posted by maryr at 12:25 PM on December 17, 2012 [1 favorite]


25+: 42.9% retypepassword

I believe this is called playing tight-aggressive, no?
posted by goodnewsfortheinsane (staff) at 1:36 PM on December 17, 2012


As an example, here's a report that considers only answers given in 2012 and users who gave 250+ answers in 2012 to rank their best answer percentages:

Holy cats, I made it onto a list!
posted by jquinby at 6:05 PM on December 17, 2012 [1 favorite]


It's interesting (to me, anyway) that the overlap between best answerers and most favourited in Ask is so slight. I wonder what that means? Lots of best answers in questions few read? More favourites in fewer, more heavily-trafficked questions? It would be interesting to figure those patterns out.
posted by bonehead at 10:41 AM on December 18, 2012


My first thought on this would be to say that favourites are crowd sourced whereas only the OP knows which answers suit their distinct individual situation and needs.
posted by infini at 10:50 AM on December 18, 2012


It's interesting (to me, anyway) that the overlap between best answerers and most favourited in Ask is so slight.

I think it's pretty simple. The questions whose answers get the most favorites are often human relations questions. And in human relations questions the OP doesn't usually mark the Best Answer as Best Answer, he or she marks the answer which most nearly corresponds with the answer he or she was looking for in the first place.

So you get relationship questions with the equivalent of "I know people say mixing bleach and ammonia is bad for me but I'm thinking about mixing bleach and ammonia because I really like it, what do you guys think?" And the DONT MIX BLEACH AND AMMONIA answer is favorited 278 times and repeated a few dozen times but the OP marks the one "Yeah, buy a copy of The Rules, follow them religiously, and MIX THAT BLEACH" as Best Answer because it's what he or she was going to do anyway.
posted by Justinian at 10:51 AM on December 18, 2012 [4 favorites]


infini is far more charitable than I.
posted by Justinian at 10:52 AM on December 18, 2012


I know... right? I suspect it's the hot flashes frying up the hamburger cells into crisp ash
posted by infini at 10:57 AM on December 18, 2012 [1 favorite]


Ah. As a non-human, I generally stay out of the HR questions.
posted by bonehead at 11:45 AM on December 18, 2012


Given that it's darker, colder and drearier for most of the user base (OH SHUT UP AUSTRALIA!) it doesn't surprise me that December gets more MeFi love than other months. Now if you'll pardon me, I have some forlorn sighing to get back to before I take a nap.
posted by Kid Charlemagne at 7:31 AM on December 19, 2012


Justinian: "It's interesting (to me, anyway) that the overlap between best answerers and most favourited in Ask is so slight.

I think it's pretty simple. The questions whose answers get the most favorites are often human relations questions. And in human relations questions the OP doesn't usually mark the Best Answer as Best Answer, he or she marks the answer which most nearly corresponds with the answer he or she was looking for in the first place."

Another possibility is that "The questions whose answers get the most favorites are often human relations questions" is more or less the whole explanation. You can get "best answer" from across the site, but to really rack up the favorites you have to give answers to human relations questions. So I imagine a person who gives good answers in (say) "computer and internet" is much more likely to be a good answerer than most favorited.
posted by dd42 at 2:26 PM on December 19, 2012

