"As Balzac said, 'There goes another novel.'" July 24, 2012 6:26 AM

Last year, cortex went out of his way to quantify our participation on MetaFilter. I asked if he'd be willing to do it again, and he agreed. So if you want to know how many words you've typed into MetaFilter instead of that dissertation/novel/revolutionary manifesto, now is the time and here is the place.
posted by griphus to MetaFilter-Related at 6:26 AM (651 comments total) 4 users marked this as a favorite

(Also, thank you for being game, cortex!)
posted by griphus at 6:27 AM on July 24, 2012 [2 favorites]


Sure.
posted by jonmc at 6:27 AM on July 24, 2012


I'm afraid to ask... but I will anyway.
posted by valkyryn at 6:31 AM on July 24, 2012


Please.
posted by MartinWisse at 6:32 AM on July 24, 2012


Ooooooh, yes please yes please yes please!

Of course, this won't actually quantify how many words I've typed into Metafilter because I spend a lot of time typing up angry screeds and deleting them in the interest of being a reasonable and civil participant.

As long as we're doing this, is there any chance this is also the thread where we ask how much stuff we've had deleted? I'm super curious but if it's not that's cool too.
posted by Mrs. Pterodactyl at 6:33 AM on July 24, 2012 [3 favorites]


Please quantify my existence.
posted by arcticseal at 6:34 AM on July 24, 2012


Oooooh, count me in! I might be in the running for least prolific.
posted by Grither at 6:37 AM on July 24, 2012


Mrs. Pterodactyl: "As long as we're doing this, is there any chance this is also the thread where we ask how much stuff we've had deleted? I'm super curious but if it's not that's cool too."

You can just shoot the mods an e-mail through the contact form for that, if you really want to know.
posted by Grither at 6:38 AM on July 24, 2012


Yes, please!
posted by rmd1023 at 6:38 AM on July 24, 2012


Please!
posted by dirtdirt at 6:42 AM on July 24, 2012


Yes, I would like to know.
posted by Rock Steady at 6:46 AM on July 24, 2012


At least I'll finally really know why my book sits 90% unfinished on my hard drive.
posted by COD at 6:47 AM on July 24, 2012


I'd be interested in knowing how many words I've thrown down the filter hole.
posted by Ghidorah at 6:51 AM on July 24, 2012


me too please.
posted by sweetkid at 6:51 AM on July 24, 2012


sure
posted by Think_Long at 6:52 AM on July 24, 2012


ME ME ME!

(do those three count? Wait, what about those four? Crap. That's five more.)
posted by bondcliff at 6:52 AM on July 24, 2012


Oh me, me!
posted by troika at 6:53 AM on July 24, 2012


(I love the title of this post btw)
posted by sweetkid at 6:54 AM on July 24, 2012


I would like to know. Also the frequency table if that's possible. TY.
posted by Potomac Avenue at 6:56 AM on July 24, 2012


Quantify me, too, please.
posted by crush-onastick at 6:57 AM on July 24, 2012


*raises hand* Please!
posted by zarq at 6:57 AM on July 24, 2012


Please tell me all the information, my deleted comment count, and how I stack up on that secret metric. You know the one.
posted by fleacircus at 6:58 AM on July 24, 2012


Oh god yes!
posted by Blasdelb at 6:59 AM on July 24, 2012


fleacircus: "that secret metric"

Contribution Index ;)
posted by zarq at 6:59 AM on July 24, 2012 [2 favorites]


But I want to know how many words I haven't typed into MetaFilter, either due to closed threads or comments that I didn't think were worthy of posting once I had typed them up.
posted by deezil at 7:00 AM on July 24, 2012 [1 favorite]


Oh, go on. I occasionally get verbose.
posted by h00py at 7:00 AM on July 24, 2012


Sure.
posted by codacorolla at 7:01 AM on July 24, 2012


meeeeeee

prediction: lots of butts
posted by elizardbits at 7:04 AM on July 24, 2012


Thank you, but no.
posted by kuujjuarapik at 7:05 AM on July 24, 2012


Yes, please do me again!
posted by heyho at 7:06 AM on July 24, 2012


Yes, please do me again!

One time is fine, no strings, but if you want a second go round, shouldn't you at least pay for dinner or something? Mayhap some flowers? Doesn't hallmark make cards for things like that?
posted by Ghidorah at 7:09 AM on July 24, 2012 [1 favorite]


I'd like to see. But please separate into drunk and sober comments please.
posted by Splunge at 7:11 AM on July 24, 2012 [1 favorite]


Please thanks!
posted by nile_red at 7:13 AM on July 24, 2012


Yep, yes please.
posted by lazaruslong at 7:17 AM on July 24, 2012


If it is not too much trouble, please include me.
posted by JohnnyGunn at 7:19 AM on July 24, 2012


Yes.
posted by knile at 7:23 AM on July 24, 2012


I'm afraid of the answer but of course I want to know. Yes, please.
posted by rtha at 7:24 AM on July 24, 2012


Sure, I'd be curious.
posted by Wretch729 at 7:34 AM on July 24, 2012


Quantity, not quality!
posted by The Deej at 7:41 AM on July 24, 2012


Count me in, please.
posted by MonkeyToes at 7:41 AM on July 24, 2012


Please.
posted by fake at 7:41 AM on July 24, 2012


Me please!
posted by A Terrible Llama at 7:50 AM on July 24, 2012


Me twenty-eight!
posted by Madamina at 7:50 AM on July 24, 2012


Cool! Any chance we could see some community-wide bulk data (like averages)?
posted by wenestvedt at 7:51 AM on July 24, 2012


Me, please. I suspect it won't be that high but I am curious! Thanks.
posted by mlle valentine at 7:52 AM on July 24, 2012


Ooh, ooh, please choose me!
posted by dotgirl at 7:52 AM on July 24, 2012


I admit that my curiosity compels me to request that my data be included in this experiment.
posted by fantabulous timewaster at 7:53 AM on July 24, 2012


Do me, cortex, do me!
posted by Brandon Blatcher at 7:53 AM on July 24, 2012 [1 favorite]


I like numbers.
posted by jessamyn (staff) at 7:53 AM on July 24, 2012


Quantify my bad self!
posted by slogger at 7:54 AM on July 24, 2012


Okay! Here's the plan, based on how it went last time:

1. I'll make periodic passes through the "me toos" here, collecting usernumbers and running the script as a bulk job for that group. That'll generate a user-specific frequency table that will include data on your total number of used words as well as the number of times you've used each individual word.

The total-words number is at the top of that file, so if you just want that it's easy to find. The whole thing is just a tab-separated text file, so you can search through it for fun using Ctrl-F if you're curious but don't want to use, or don't know how to use, any cleverer search tools. (At some point I intend to build some actual cleverer search tools.)

2. I'll send out a mefimail with the link to that file so that you know where to get it. That file will stick around for a while but not forever, so be sure to save a copy on your end if you want to do more than just glance at it once.

I'll probably run a couple batches today and then a batch every day or so as long as people keep piping up in here. I'll do the first one this morning so all you early birds can get some reasonably instant gratification.
posted by cortex (staff) at 8:04 AM on July 24, 2012 [3 favorites]
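For anyone who would rather script it than Ctrl-F, here is a minimal sketch of reading one of these files in Python. The thread only says the file is tab-separated with the total near the top, so the exact layout is an assumption: this guesses each data line is `word<TAB>count` and skips any line that doesn't fit that shape. The function name `read_freq_table` and the sample data are made up for illustration.

```python
import io

def read_freq_table(lines):
    """Parse 'word<TAB>count' lines into a dict (assumed layout, not confirmed)."""
    counts = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # header or summary line in some other shape; skip it
        word, count = parts
        if count.strip().isdigit():
            counts[word] = int(count)
    return counts

# Made-up sample standing in for a downloaded file.
sample = io.StringIO("some header line\nthe\t120\ni\t95\npeople\t12\n")
table = read_freq_table(sample)
print(table["people"], sum(table.values()))
```

With a real download you would pass an open file object instead of the `StringIO` sample; if the columns turn out to be in the other order, swap `word` and `count`.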


A mod once tried to quantify me ...
posted by octobersurprise at 8:04 AM on July 24, 2012 [3 favorites]


As long as we're doing this, is there any chance this is also the thread where we ask how much stuff we've had deleted? I'm super curious but if it's not that's cool too.

Like Grither said, this is better as a separate email to the contact form, so feel free to poke us there if you want.

Cool! Any chance we could see some community-wide bulk data (like averages)?

That's basically what the Mefi Corpus Project is. It's got frequency tables for the entire site by subsite and time period, so if you want to compare the frequency of a given word in your own table to the frequency of that word as used by everyone on the site since it started, you can just search for that word in both tables.

The script I'm using for this thread is actually just a specialization of the script that generated that set of data.
posted by cortex (staff) at 8:07 AM on July 24, 2012


Me me me
posted by grouse at 8:07 AM on July 24, 2012


I WOULD ALSO LIKE TO KNOW
posted by Coatlicue at 8:08 AM on July 24, 2012


I too would like to calculate my MeFi:dissertation ratio
posted by en forme de poire at 8:16 AM on July 24, 2012


Sure.
posted by Atreides at 8:16 AM on July 24, 2012


Me please!
posted by immlass at 8:17 AM on July 24, 2012


I shudder to think, but... Me!
posted by brundlefly at 8:18 AM on July 24, 2012


And for an example and some context, here's the comment I made last time that links to my (now out of date) personal frequency table and talks a little about what you'll find in these.
posted by cortex (staff) at 8:20 AM on July 24, 2012


me me me me

Fortunately I've also been writing 1000 words a day or so for the last several months so I will possibly not feel as abysmally terrible? NOPE PROBABLY STILL WILL
posted by shakespeherian at 8:21 AM on July 24, 2012


pie can fix that
posted by elizardbits at 8:24 AM on July 24, 2012


Yes please
posted by Gilgongo at 8:25 AM on July 24, 2012


elizardbits: "pie can fix that"

pie fixes everything.
posted by zarq at 8:35 AM on July 24, 2012


cortex in old thread: "I've said "zsazsa", "zucker", "zydeco", and probably thirty thousand other words, only once, ever."

I see someone was trying to up their weird word count last go round!
posted by Grither at 8:39 AM on July 24, 2012


I would love to be quantified.

I am also probably one of the oldest here, so getting this info before I expire would be a plus!

But not to bother you. . .
posted by Danf at 8:40 AM on July 24, 2012


Why not.
posted by zamboni at 8:41 AM on July 24, 2012


I'm in.
posted by languagehat at 8:44 AM on July 24, 2012


I want to see languagehat's but can you only show me the times he misused your/you're?
posted by shakespeherian at 8:48 AM on July 24, 2012 [2 favorites]


Hit me!

I keep forgetting that the house always wins.
posted by argonauta at 8:49 AM on July 24, 2012


...as well as the number of times you've used each individual word.

That's the interesting part for me. Thanks in advance, Cortex.
posted by cribcage at 8:50 AM on July 24, 2012


I'm a little curious
posted by dismas at 8:51 AM on July 24, 2012


ok
posted by StickyCarpet at 8:52 AM on July 24, 2012


Numerology!
posted by likeso at 8:57 AM on July 24, 2012


I'm awfully curious about mine, actually.
posted by restless_nomad (staff) at 8:58 AM on July 24, 2012


oh no
posted by Wordwoman at 9:04 AM on July 24, 2012


but yes I must know
posted by Wordwoman at 9:04 AM on July 24, 2012


I think it would be really interesting to see if there are words that mods use more especially compared to before they were mods. For example, I wonder if you say "thanks" a lot more if you are a mod.
posted by Mrs. Pterodactyl at 9:04 AM on July 24, 2012


I know I say "thanks" way more now. Also "folks," "please," and probably "knock it off."
posted by restless_nomad (staff) at 9:06 AM on July 24, 2012 [4 favorites]


sure.
posted by Stynxno at 9:06 AM on July 24, 2012


Yeah "folks" was my guess. And I probably never used the word "troll" before I worked here.
posted by jessamyn (staff) at 9:08 AM on July 24, 2012 [1 favorite]


I spend a lot of time typing up angry screeds and deleting them in the interest of being a reasonable and civil participant.

Heh, yeah. I'm probably at about a 2/3's posted/typed ratio, myself.

I'd like to know how many amazings and awesomes I've posted, and if amazing replaced awesome over time, or if they're equally distributed along the timeline.

That would be aweso amazing to look at.

If that's too specific, I'll just have what everyone else is having.
posted by Devils Rancher at 9:14 AM on July 24, 2012


please and thank you. :)
posted by royalsong at 9:23 AM on July 24, 2012


Yes, please.
posted by purpleclover at 9:23 AM on July 24, 2012


I'm also curious.
posted by bookwibble at 9:25 AM on July 24, 2012


Me too please.
posted by brina at 9:26 AM on July 24, 2012


Numbers, please. *bites nails*
Thank you!
posted by madamjujujive at 9:27 AM on July 24, 2012


Yes, please!
posted by leahwrenn at 9:29 AM on July 24, 2012


I know I have far too few words on here yet (Though that's changing), but this sounds fascinating.
posted by CrystalDave at 9:32 AM on July 24, 2012


I am curious infinitewindow.
posted by infinitewindow at 9:35 AM on July 24, 2012


Indubitably sanctioning this codification venture.
posted by vegartanipla at 9:38 AM on July 24, 2012


I would love to see mine.
posted by KathrynT at 9:45 AM on July 24, 2012


Crunchy!
posted by Renoroc at 9:46 AM on July 24, 2012


Yes please, me too.
posted by chavenet at 9:49 AM on July 24, 2012


This sounds fun, I like silly data. Me too please!
posted by Joh at 9:59 AM on July 24, 2012


Yes, please.
posted by ambrosia at 10:15 AM on July 24, 2012


Me, please?
posted by peep at 10:17 AM on July 24, 2012


Seeing as this is the year I learned about quantitative research methods, yes, please! I ♥ data wankery. Thanks, Cortex!
posted by smirkette at 10:21 AM on July 24, 2012


Please. I want to see what eight years looks like.
posted by The White Hat at 10:21 AM on July 24, 2012


Thanks, Greg. Threg.
posted by en forme de poire at 10:22 AM on July 24, 2012 [1 favorite]


You know, I am remembering now one of the fiddly bits of doing this last time was that sending out dozens and dozens of mefimails with slightly edited contents was a pain in the ass.

The simpler thing to do here would be to just provide the format for the url the datafiles will be at, and then you can just plug your userid in there to grab your file. Since it's just doing counts of publicly available and site-searchable comment contents, that seems like kind of a privacy non-issue.

But if you for some reason specifically don't want your file sitting in a place where other people can easily fetch it, let me know and I'll put it somewhere a bit different and let you know where you can find it via mefimail.

I'm gonna kick off the first run in a bit here, will post word counts here along with instructions for grabbing the frequency table file.
posted by cortex (staff) at 10:23 AM on July 24, 2012 [2 favorites]


Count me in!
posted by bswinburn at 10:26 AM on July 24, 2012


Yes, I couldn't resist the bad pun.
posted by bswinburn at 10:26 AM on July 24, 2012


It appears to be broken.
posted by Eideteker at 10:28 AM on July 24, 2012


Me too, sir.
posted by Falconetti at 10:29 AM on July 24, 2012


I was going to try comparing my metafilter activity to my PhD thesis but I seem to remember closing my account while I finished the latter, so it's not really a fair contest. But the overall stats sound like fun so please add a general "me too!" for me.
posted by shelleycat at 10:34 AM on July 24, 2012


I like metadata! Pick me!
posted by antonymous at 10:37 AM on July 24, 2012


Also griphus thank you for the Annie Hall title.
posted by shakespeherian at 10:39 AM on July 24, 2012 [1 favorite]


I sorta wanna know.
posted by eyeballkid at 10:42 AM on July 24, 2012


And me, please!
posted by the latin mouse at 10:43 AM on July 24, 2012


I am not a number, I am a free man collection of a bunch of numbers.
posted by juv3nal at 10:47 AM on July 24, 2012 [1 favorite]


Okay, first dump done, up through antonymous. Will continue to do periodic update dumps going forward, so feel free to speak up if you're interested.

Here's the stat line for everybody in the first dump, in the order in which they spoke up in the thread. If you should be in here but I mistyped your userid, let me know and I'll re-run you in the next group.

If you've never gone looking for your userid before and are wondering how to find it, just mouse over (or click on) your own username. It's that number.

And so:
user 7418:	2262257 words,	70920 unique, in	33385 comments.
user 49346:	478891 words,	29350 unique, in	10194 comments.
user 58:	1244985 words,	52803 unique, in	28468 comments.
user 67400:	888467 words,	29377 unique, in	5750 comments.
user 141010:	80828 words,	10535 unique, in	1017 comments.
user 81760:	76732 words,	7394 unique, in	739 comments.
user 39488:	94958 words,	11209 unique, in	2820 comments.
user 61784:	79297 words,	8977 unique, in	1768 comments.
user 38483:	110633 words,	12600 unique, in	2607 comments.
user 23431:	142464 words,	14514 unique, in	2415 comments.
user 22627:	252704 words,	20352 unique, in	4222 comments.
user 34848:	183312 words,	12134 unique, in	3064 comments.
user 77383:	408053 words,	22298 unique, in	3570 comments.
user 38780:	134944 words,	10079 unique, in	2566 comments.
user 88408:	138805 words,	14656 unique, in	3676 comments.
user 2726:	333977 words,	18882 unique, in	5043 comments.
user 63377:	19645 words,	4293 unique, in	483 comments.
user 80913:	244659 words,	22269 unique, in	4915 comments.
user 10279:	274957 words,	18889 unique, in	3032 comments.
user 18312:	1007000 words,	41575 unique, in	12463 comments.
user 20122:	142263 words,	15850 unique, in	2314 comments.
user 90947:	263989 words,	19432 unique, in	2510 comments.
user 77879:	31885 words,	4848 unique, in	899 comments.
user 52224:	70925 words,	8088 unique, in	1291 comments.
user 97321:	203394 words,	17330 unique, in	2711 comments.
user 71074:	189063 words,	18665 unique, in	5558 comments.
user 58356:	65083 words,	8345 unique, in	1344 comments.
user 94835:	158555 words,	15298 unique, in	3826 comments.
user 20423:	20031 words,	3929 unique, in	424 comments.
user 17027:	130646 words,	13728 unique, in	2314 comments.
user 37485:	318197 words,	16322 unique, in	4601 comments.
user 73567:	47971 words,	7965 unique, in	1026 comments.
user 46088:	604432 words,	34818 unique, in	21227 comments.
user 43189:	893010 words,	35160 unique, in	12771 comments.
user 137029:	21841 words,	4530 unique, in	321 comments.
user 21431:	288715 words,	19356 unique, in	5126 comments.
user 18811:	160672 words,	18209 unique, in	2350 comments.
user 18169:	162581 words,	14345 unique, in	2175 comments.
user 75964:	240127 words,	14567 unique, in	2554 comments.
user 55262:	299616 words,	19291 unique, in	2661 comments.
user 77819:	122382 words,	14545 unique, in	1791 comments.
user 115961:	24735 words,	3965 unique, in	388 comments.
user 147513:	5108 words,	1496 unique, in	99 comments.
user 69606:	82475 words,	9872 unique, in	1019 comments.
user 17675:	1015309 words,	33048 unique, in	17789 comments.
user 292:	2944596 words,	59120 unique, in	29632 comments.
user 23588:	46646 words,	7931 unique, in	1613 comments.
user 15312:	179844 words,	18554 unique, in	3188 comments.
user 17563:	587954 words,	30746 unique, in	11859 comments.
user 56070:	41731 words,	5766 unique, in	344 comments.
user 88591:	84371 words,	10740 unique, in	1113 comments.
user 26998:	235001 words,	17395 unique, in	2664 comments.
user 81979:	301434 words,	19388 unique, in	3446 comments.
user 17897:	191131 words,	18588 unique, in	7080 comments.
user 37801:	422235 words,	26878 unique, in	11609 comments.
user 143850:	1923 words,	762 unique, in	22 comments.
user 11183:	132531 words,	13802 unique, in	2969 comments.
user 48377:	87692 words,	14474 unique, in	2032 comments.
user 14752:	1704253 words,	65853 unique, in	22187 comments.
user 123851:	65182 words,	9532 unique, in	953 comments.
user 18609:	264911 words,	18298 unique, in	3080 comments.
user 56010:	25772 words,	5037 unique, in	592 comments.
user 21429:	269797 words,	22943 unique, in	6633 comments.
user 124648:	72392 words,	11478 unique, in	1730 comments.
user 28936:	207556 words,	15060 unique, in	3047 comments.
user 94555:	40644 words,	5914 unique, in	912 comments.
user 14474:	159131 words,	11981 unique, in	2408 comments.
user 36852:	346275 words,	27146 unique, in	5836 comments.
user 19026:	338971 words,	27150 unique, in	6468 comments.
user 86246:	72440 words,	8393 unique, in	1200 comments.
user 70245:	43645 words,	7198 unique, in	486 comments.
user 85807:	2027 words,	720 unique, in	33 comments.
user 16148:	153111 words,	12979 unique, in	1185 comments.
user 15971:	290405 words,	23791 unique, in	5005 comments.
user 43872:	75968 words,	8069 unique, in	1079 comments.
user 94819:	8639 words,	2299 unique, in	122 comments.
user 40206:	71021 words,	11933 unique, in	2040 comments.
user 144659:	28224 words,	4619 unique, in	274 comments.
user 96867:	284350 words,	19191 unique, in	2747 comments.
user 130954:	24077 words,	5594 unique, in	1072 comments.
user 96486:	46420 words,	8882 unique, in	1859 comments.
user 20976:	138357 words,	11514 unique, in	1374 comments.
user 17499:	177225 words,	14284 unique, in	1850 comments.
user 18122:	102852 words,	10800 unique, in	1825 comments.
user 110619:	88728 words,	9977 unique, in	1183 comments.
user 19185:	79942 words,	12569 unique, in	1585 comments.
user 68910:	14104 words,	2832 unique, in	168 comments.
user 20821:	412632 words,	28892 unique, in	8808 comments.
user 23554:	103559 words,	12444 unique, in	2168 comments.
user 32440:	387296 words,	17403 unique, in	2689 comments.
user 70792:	18043 words,	3985 unique, in	180 comments.
posted by cortex (staff) at 10:48 AM on July 24, 2012 [8 favorites]
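If you want to play with the stat lines themselves, for example to get average words per comment, a rough sketch follows. It assumes every line has exactly the shape shown in the list above; `parse_stat` is a hypothetical helper, not part of cortex's script.

```python
import re

# Matches lines like "user 143850:<TAB>1923 words,<TAB>762 unique, in<TAB>22 comments."
STAT_RE = re.compile(
    r"user (\d+):\s*(\d+) words,\s*(\d+) unique, in\s*(\d+) comments\."
)

def parse_stat(line):
    """Return a dict of the numbers in one stat line, or None if it doesn't match."""
    m = STAT_RE.match(line.strip())
    if m is None:
        return None
    userid, words, unique, comments = map(int, m.groups())
    return {
        "userid": userid,
        "words": words,
        "unique": unique,
        "comments": comments,
        # Guard against a zero-comment line rather than dividing by zero.
        "words_per_comment": words / comments if comments else 0.0,
    }

row = parse_stat("user 143850:\t1923 words,\t762 unique, in\t22 comments.")
print(round(row["words_per_comment"], 1))  # 87.4
```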


Please please me!
posted by carsonb at 10:51 AM on July 24, 2012


Where to find your frequency table file:

Take this url:

http://stuff.metafilter.com/corpus/freq/temp/XXX--1-gram--allsites--1999-01-01--2013-01-01.txt

Paste it into your browser's address bar, replace XXX with your userid, and blam.
posted by cortex (staff) at 10:52 AM on July 24, 2012 [12 favorites]
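The same substitution can be done in a couple of lines of Python, which also catches a userid that isn't a number. This is only a sketch around the URL format given above; `freq_table_url` is a made-up name, and whether a file actually exists at the resulting address depends on whether that userid has been run.

```python
TEMPLATE = (
    "http://stuff.metafilter.com/corpus/freq/temp/"
    "XXX--1-gram--allsites--1999-01-01--2013-01-01.txt"
)

def freq_table_url(userid):
    """Replace the XXX placeholder with a numeric userid."""
    userid = int(userid)  # raises ValueError if it isn't a number
    return TEMPLATE.replace("XXX", str(userid), 1)

# 49346 is one of the userids from the first dump's stat list.
print(freq_table_url(49346))
```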


Holy crap:

July 24, 2012: 1007000 words, 41575 unique, in 12463 comments.
July 22, 2011: 690117 words, 34674 unique, in 8654 comments.

I talk too much.

Thank you, cortex! One additional question, does this count posts and comments or just comments?
posted by zarq at 10:53 AM on July 24, 2012


This is just comments, on Mefi, Ask, Metatalk and Music.
posted by cortex (staff) at 10:54 AM on July 24, 2012


I have used the word 'the' almost 20,000 times on this site.

Well goddam.
posted by shakespeherian at 10:55 AM on July 24, 2012


OK. Thank you!

So figure, at least another thousand or three for posts and.... wow, I really do talk too much.
posted by zarq at 10:55 AM on July 24, 2012


WOW hey yeah thanks! This is awesome! Have I really posted fewer than 750 comments? That's actually the most surprising part to me.

Also apparently I say "I" significantly more than I say "the" which makes me feel super narcissistic so thanks a ton cortex.
posted by Mrs. Pterodactyl at 10:56 AM on July 24, 2012 [1 favorite]


Question: It looks like my 40th most-used word is BLANK-- is that some sort of null return, or have I really typed 'BLANK' 1500 times? That seems unlikely.
posted by shakespeherian at 10:57 AM on July 24, 2012


"BLANK" is definitely an artifact of the script's parsing. I don't recall offhand what it means, but you can safely ignore it. All actual words appear in lower case to simplify the counting process.
posted by cortex (staff) at 10:58 AM on July 24, 2012
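To illustrate why lowercasing simplifies the counting, here is a generic word-count sketch. The actual script isn't shown in this thread, so this is not its tokenizer: splitting on whitespace and trimming punctuation from the ends of each token are assumptions made for the example, and `count_words` is a hypothetical name.

```python
from collections import Counter
import string

def count_words(comments):
    """Count lowercased, punctuation-trimmed tokens across a list of comment strings."""
    counts = Counter()
    for text in comments:
        for raw in text.split():
            # Trim punctuation from the ends only, so "it's" and "wake-up" stay whole.
            word = raw.strip(string.punctuation).lower()
            if word:
                counts[word] += 1
    return counts

counts = count_words(["Pie fixes everything.", "pie can fix that"])
print(counts["pie"], len(counts), sum(counts.values()))  # 2 6 7
```

Because "Pie" and "pie" collapse into one entry, the three printed numbers are the count for that word, the number of unique words, and the total word count, which mirrors the three kinds of figures in the stat lines.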


Sounds interesting, I'll put my name down for it.
posted by EmGeeJay at 10:59 AM on July 24, 2012


This is also neat because I can see how often I've referenced other users by name, like apparently I've said zarq three times and shakespeherian only once which kind of surprises me; I thought I talked about him more.

Anyway seriously once again thanks, this is fascinating.
posted by Mrs. Pterodactyl at 11:00 AM on July 24, 2012


Not even half a million? I gotta quit with the one-liners and get on the expository anecdotes!
posted by Devils Rancher at 11:00 AM on July 24, 2012


That's what I figured.

I'm finding the most interesting stuff is all the words I've used exactly twice (vespucci, marjanen, kryptonions).
posted by shakespeherian at 11:01 AM on July 24, 2012


shakespeherian: "I have used the word 'the' almost 20,000 times on this site."

42593.

Atop my list are 35 monosyllabic words. The first word I use with two syllables is "people" and it is only my 36th most used word on MeFi. Wow.
posted by zarq at 11:02 AM on July 24, 2012


I've said 'zarq' 20 times.
posted by shakespeherian at 11:02 AM on July 24, 2012 [1 favorite]


Me too!! Me too!!
posted by bearwife at 11:03 AM on July 24, 2012


The requested URL /corpus/freq/temp/49436--1-gram--allsites--1999-01-01--2013-01-01.txt was not found on this server.

Okay what am I doing wrong here?
posted by griphus at 11:03 AM on July 24, 2012


LOL

"shakespeherian" 71
posted by zarq at 11:03 AM on July 24, 2012 [1 favorite]


Okay what am I doing wrong here?

You are typing your userid wrong.
posted by cortex (staff) at 11:04 AM on July 24, 2012 [8 favorites]


Also: 246,981 over the course of a year? Jesus. I think I need a sabbatical; anyone want to be griphus for a few months?
posted by griphus at 11:04 AM on July 24, 2012


Why have I said 'shakespeherian' 87 times? Quoting people referring to me?
posted by shakespeherian at 11:05 AM on July 24, 2012 [1 favorite]


You don't remember that day you spoke entirely in third person?
posted by griphus at 11:05 AM on July 24, 2012 [1 favorite]


shakespeherian would never do that.
posted by shakespeherian at 11:06 AM on July 24, 2012 [5 favorites]


I would love to see some sort of statistical enrichment figure—the frequency with which I use a word divided by the frequency with which the userbase as a total does it.

anyone want to be griphus for a few months?

Just MeMail me your password.
posted by grouse at 11:07 AM on July 24, 2012


Please to be adding me to this.
posted by middleclasstool at 11:07 AM on July 24, 2012


I've said "pterodactyl" 31 times, "pterodactyl's" 4 times, "pterodactyls" 2 times, and for some reason "ptero" once but this is explicable because pterodactyls are awesome.

I assume every single one of yours is you being like "uh, it's SHAKESPEHERIAN not SHAKESPERIAN".
posted by Mrs. Pterodactyl at 11:08 AM on July 24, 2012 [1 favorite]


Mine gets poetic:

how things time
make way which
good because something
no also want

And the number of words I've used exactly as much as my usernumber: met x yours

longest word: wake-up-to-a-meta-thread-where-people-have-been-talking-about-you-for-eight-hours
posted by jessamyn (staff) at 11:08 AM on July 24, 2012 [2 favorites]


I wonder if anyone doesn't have "I" as their top word.
posted by sweetkid at 11:08 AM on July 24, 2012


Oooh wait is there an easy way to dump this to Excel?

/is at work
posted by shakespeherian at 11:09 AM on July 24, 2012
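On the Excel question: spreadsheet programs can generally open a tab-separated text file directly, but if a comma-separated copy is easier to work with, a short conversion sketch looks like this. It assumes nothing about the file beyond it being tab-separated; `tsv_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io

def tsv_to_csv(tsv_file, csv_file):
    """Copy tab-separated rows to comma-separated rows; return the row count."""
    # QUOTE_NONE so stray quote characters in words are treated as plain text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(csv_file)
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows

src = io.StringIO("the\t10\nit's\t3\n")
out = io.StringIO()
n = tsv_to_csv(src, out)
print(n)
```

For real files, open both with `newline=""` (for example `open("freq.txt", newline="")`), as the `csv` module documentation recommends.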


'I' is my 5th most-used word.
posted by shakespeherian at 11:10 AM on July 24, 2012


me too ooh
posted by cashman at 11:10 AM on July 24, 2012


One of my longest words is cyxzbrglvyylvxmmktijraly2xixchxapsllaoxtwgyvwsbkw1pbkglqdnlg2lwcbbrtpi4cjcjc62u4hb1kzrewltco0ijelkoqi5dkgzcoxdrhzrosrokedqg00lnalibfmwltwowlg1qjg0zxkngc1nixqfisunxdsdbtoamuwoqqcfw5gkh.

I don't remember ever saying this.
posted by grouse at 11:11 AM on July 24, 2012


me too, please!
posted by nadawi at 11:11 AM on July 24, 2012


cortex: " You are typing your userid wrong."

When my wife and I were dating, we saw a Friends episode where Joey says something dumb and Rachel looks down at him kindly, gently caresses his cheek and says ever-so-sweetly, "You're so pretty."

For years, this has been the go-to phrase in my house any time one of us does something stupid. It happens a lot.

Griphus?

You're so pretty. :D
posted by zarq at 11:11 AM on July 24, 2012 [3 favorites]


Me too! 38102
posted by 0xFCAF at 11:12 AM on July 24, 2012


the first word that I have that's not a preposition or pronoun is 'people' at 932.
posted by sweetkid at 11:12 AM on July 24, 2012


sweetkid: "I wonder if anyone doesn't have "I" as their top word."

Sixth here. 17579 times.
posted by zarq at 11:12 AM on July 24, 2012


Jessamyn, mine reads like fridge poetry too:

i'm all
what

who

one: we
because from had
get up
people would them more

really

no
posted by KathrynT at 11:13 AM on July 24, 2012 [2 favorites]


Mine is a little poetic as well!

word
absolutely came city living:
looks, music, reading
actual class
comes games, though
year brooklyn
posted by griphus at 11:13 AM on July 24, 2012 [1 favorite]


ok it's just me that's self absorbed then, got it!
posted by sweetkid at 11:13 AM on July 24, 2012 [1 favorite]


word
absolutely came city living:
looks, music, reading
actual class
comes games, though
year brooklyn


my mid twenties right there
posted by sweetkid at 11:14 AM on July 24, 2012 [1 favorite]


Also, I've got "I" in 6th place as well.
posted by griphus at 11:15 AM on July 24, 2012


I would like to be quantified, please.
posted by EvaDestruction at 11:15 AM on July 24, 2012


Yes please, if it's not too late.
posted by alms at 11:19 AM on July 24, 2012


Sure, should be a hoot!
posted by absalom at 11:19 AM on July 24, 2012


I've used the word "no" 177 times and "yes" 37 times. I wonder what that says about me.

yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes

Just trying to even it out a bit for next year.

posted by Grither at 11:23 AM on July 24, 2012 [1 favorite]


And 'love' is at 81. Awwwwwwww.
posted by Grither at 11:24 AM on July 24, 2012 [1 favorite]


The real meat: I've got 41 different variations of 'fuck' (including 'worsethanfuck' and 'twenty-fucking-six') with 'fucking' being the most common, at 198 uses, followed by 'fuck' at 160.

48 variations of 'shit.'
posted by shakespeherian at 11:24 AM on July 24, 2012


Atop my list are 35 monosyllabic words. The first word I use with two syllables is "people" and it is only my 36th most used word on MeFi. Wow.

This is pretty typical. The most common words in everybody's vocabulary are highly functional glue words like articles (the, a, an), prepositions (of, on, in, with), pronouns (you, i, it, he, she), copulas (is, be), and conjunctions (and, but, not). It's the structure of natural language at work, and the general consistency of these things from person to person and even from language to language is a very neat thing.

I wonder if anyone doesn't have "I" as their top word.

It likely varies! Especially for users with a lower overall count of words. The rough proportions of the most common words will be the same, see above, but on smaller corpora and with different modes and styles of speech those proportions won't be identical.

One thing I've got in the sitewide frequency tables is breakdowns by subsite, and if I remember right I think you see "I" at a higher frequency on Ask than you do on Mefi, for example, which may be explicable in terms of the more personal/anecdotal/demonstrative nature of askme answers vs. mefi conversations. Lots of fun stuff to dig into there.

I would love to see some sort of statistical enrichment figure—the frequency with which I use a word divided by the frequency with which the userbase as a total does it.

You can compare by hand by searching the tables in the Mefi Corpus; I'd link directly to the big daddy of those but it's half a gig zipped up. So for a quicker reference, here instead is the frequency table for just comments on Metafilter proper, from all users, in 2010, including only words that were used at least 10 times that year:

2010 mefi comments, all users, 10+ incidents. 1.7 megabyte text file.

Still a pretty big file, so if you're going to do more than glance once I'd suggest saving it to your computer and playing with it there.
posted by cortex (staff) at 11:26 AM on July 24, 2012
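The enrichment figure grouse asked about can be worked out from two tables once they are loaded as word-to-count dicts. This is a sketch under assumptions: `enrichment` is a hypothetical helper, the sample numbers are invented, and the totals are taken as the sum of the counts in each table. That last point matters for a thresholded table like the 10+ incidents one, where the sum undercounts the true site total.

```python
def enrichment(word, user_counts, site_counts):
    """(user's relative frequency of word) / (site-wide relative frequency of word)."""
    user_total = sum(user_counts.values())
    site_total = sum(site_counts.values())
    if user_total == 0 or site_total == 0:
        return None
    user_freq = user_counts.get(word, 0) / user_total
    site_freq = site_counts.get(word, 0) / site_total
    if site_freq == 0:
        return None  # word missing from the site table; ratio is undefined
    return user_freq / site_freq

# Invented numbers purely for illustration.
user = {"the": 50, "poutine": 25, "pie": 25}
site = {"the": 900, "poutine": 10, "pie": 90}
print(round(enrichment("poutine", user, site), 2))
```

A value above 1 means the user leans on the word more than the site as a whole does; below 1 means less.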


It would be interesting, thank you.
posted by Jehan at 11:26 AM on July 24, 2012


My longest word: "i-know-this-thread-will-be-deleted-since-it's-chatfilter-so-i'll-post-a-book-that-has-been-made-into-a-movie-to-be-funny"

Do comments in deleted threads get counted?
posted by Grither at 11:26 AM on July 24, 2012


Me please! Thank you.
posted by zachlipton at 11:28 AM on July 24, 2012


the a i
to of and
in that
it is for

on you was my this
with but as have at

be not just
it's all me
about if
or
i'm like out
an one so are they
from what when we up
by he there had can some

do think
don't really

who i've people
would your his
get BLANK time
posted by Devils Rancher at 11:28 AM on July 24, 2012


shakespeherian: "The real meat: I've got 41 different variations of 'fuck'"

fuck: 3
fucking: 2
fuckwad: 1


And that's it! I'm so polite. And for fun:

shit: 5
shitty: 2
dipshit: 1
shithead: 1 (pronounced sha-tee-ad, no doubt)
shithole: 1
shitstorms: 1
posted by Grither at 11:30 AM on July 24, 2012


Hahah "poutine" 25 times. Nice!
posted by Grither at 11:33 AM on July 24, 2012


Do comments in deleted threads get counted?

Yep! As do, I think, comments that were themselves deleted.
posted by cortex (staff) at 11:38 AM on July 24, 2012 [1 favorite]


cortex: "Yep! As do, I think, comments that were themselves deleted."

Oh, sweet. I can just scrape MeFi, run a diff against the Corpus, and use the leftovers to reconstruct a deletion-free version of MeFi.
posted by Rock Steady at 11:41 AM on July 24, 2012


me!
posted by a snickering nuthatch at 11:46 AM on July 24, 2012


:::raises hand:::: me please, thank you!
posted by NoraCharles at 11:47 AM on July 24, 2012


Yes please! And thank you!
posted by carmicha at 11:52 AM on July 24, 2012


fucking - 210
fuck - 184
fucked - 20
fuckin - 13
fuck's - 12
fuckers - 12
motherfucker - 5
fucker - 3
fuck-the-system - 2
fuck-you - 2
motherfuckers - 2

Then 18 singly-occurring variations on this theme. The best one is "werewolf-fucking."
posted by griphus at 11:53 AM on July 24, 2012 [2 favorites]


I've got both fuck-weasel and stone-fucking.
posted by shakespeherian at 11:55 AM on July 24, 2012


Me, please!
posted by leesh at 11:57 AM on July 24, 2012


Ooo, please quantify me!
posted by limeonaire at 12:00 PM on July 24, 2012


Yes, please.
posted by Scientist at 12:03 PM on July 24, 2012


I would like not only to be quantified but also quantized.
posted by jedicus at 12:04 PM on July 24, 2012


Cortex, use your powers.
posted by ersatz at 12:04 PM on July 24, 2012


252 fucking
223 awesome
47 lol
22 butts
posted by elizardbits at 12:07 PM on July 24, 2012 [4 favorites]


Yes, please count me in. Or, you know, whatever. Quantify me!
posted by Lynsey at 12:08 PM on July 24, 2012


I am a-twitter with anticipation at my object quantification. Do these replies count?
posted by CancerMan at 12:09 PM on July 24, 2012


Ooh me!
posted by curuinor at 12:12 PM on July 24, 2012


Oooh, me please!

Also, I imagine there's a not-insignificant gap in my contributions to the site, where I just didn't have the time to keep up and gave up for a good-ish while.
posted by antifuse at 12:15 PM on July 24, 2012


Count me in. I probably say penis a lot.
posted by cjorgensen at 12:15 PM on July 24, 2012


oh, hell, I guess I need to know. me too.
posted by Eyebrows McGee at 12:17 PM on July 24, 2012


Me, please!

And just to make the stats more exciting...

butts butts butts butts butts
posted by phunniemee at 12:20 PM on July 24, 2012 [2 favorites]


And I doubt this will be included in any sort of stats of mine - but I am genuinely curious about how many threads exist where I am the last commenter. I tend to fall behind on my MeFi RSS quite a bit, and only catch threads 2 or 3 weeks after they're posted and (mostly) forgotten. So I imagine there's quite a few of these types of threads in existence.
posted by antifuse at 12:23 PM on July 24, 2012


I would like to be quantized as well. I imagine it's like the special effects from Innerspace.
posted by MrVisible at 12:24 PM on July 24, 2012


And I doubt this will be included in any sort of stats of mine - but I am genuinely curious about how many threads exist where I am the last commenter.

That is something that can be calculated from the data that exists in the Infodump, and in fact I feel like I remember a previous metatalk thread where someone was making those calculations. So there may be a mefite with an existing script that could tell you that.
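
For anyone who wants to try it, here is a minimal sketch of that calculation. It assumes the comment data has already been loaded as (thread_id, user_id, posted_at) tuples; that row layout and the function name `threads_where_last` are hypothetical and do not reflect the Infodump's actual file format.

```python
from datetime import datetime


def threads_where_last(comments, user_id):
    """Return the ids of threads in which user_id posted the final comment.

    comments: iterable of (thread_id, user_id, posted_at) tuples.
    This row layout is an assumption for illustration only.
    """
    last = {}  # thread_id -> (posted_at, user_id) of the latest comment seen
    for thread_id, commenter, posted_at in comments:
        if thread_id not in last or posted_at > last[thread_id][0]:
            last[thread_id] = (posted_at, commenter)
    return sorted(t for t, (_, who) in last.items() if who == user_id)


# Made-up rows, purely for illustration.
rows = [
    (1, 100, datetime(2012, 7, 1, 9, 0)),
    (1, 200, datetime(2012, 7, 20, 12, 0)),
    (2, 200, datetime(2012, 7, 2, 9, 0)),
    (2, 100, datetime(2012, 7, 3, 9, 0)),
]
print(threads_where_last(rows, 200))
```

Keeping only the latest comment per thread means the whole thing is a single pass over the data, so it should cope with a large dump without sorting it first.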
posted by cortex (staff) at 12:27 PM on July 24, 2012


Please include me! I love words and numbers and numbers of words and words about numbers and words about numbers of words. I'm going to stop now, but I could keep going along the same lines for a long time.
posted by aubilenon at 12:29 PM on July 24, 2012


151 fucking
106 fuck
16 fucked
6 fuckers
5 fuck-all
5 fuckit
3 fuckton
2 abso-fucking-lutely
2 fuck's
2 fuck-ton
2 fuckin
2 fucks
2 pig-fucker
1 a-fucking-men
1 cluster-fuck
1 fan-fucking-tastic
1 fuckidy
1 fucko
1 fuckup
1 fuckwad
1 fuckwit
1 goat-fucking

Have I covered all the fucking forms?
posted by Devils Rancher at 12:32 PM on July 24, 2012 [1 favorite]


I'm feeling a little slow. I've tried plugging 7721 and carsonb into the URL and it's still 404-ing me.
posted by carsonb at 12:34 PM on July 24, 2012 [1 favorite]


I'm pretty new, so I'm probably going to be way below average. But it's not the number of words, it's the something...something... that counts, right?

In case you couldn't tell, this was a request to count my data.
posted by FirstMateKate at 12:35 PM on July 24, 2012


I wonder if you could tell who the most common user of a given word is. I'm pretty sure I say "folk" all the time, yet nobody else does.
posted by Jehan at 12:37 PM on July 24, 2012


Sooooo hopefully my advisor will never read this but my MeFi:thesis ratio is around 13:5, or 2.6:1. In fairness my MeFi contributions are significantly enriched for the word "fucking" (p = 0.03, chi-square/Yates) which I feel like must even the balance a little.
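
For the curious, a test like that can be sketched in a few lines. This is only an illustration of a chi-square test with Yates' continuity correction on a 2x2 table of word counts; the function name and the counts in the example call are made up, not the actual numbers behind the p-value above.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for a 2x2 table.

    a: occurrences of the word in corpus 1, b: all other words in corpus 1
    c: occurrences of the word in corpus 2, d: all other words in corpus 2
    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    # The correction shrinks |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff * diff / denom
    # With one degree of freedom, the chi-square tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: the word vs. everything else, in each of two corpora.
stat, p_value = yates_chi_square(30, 12970, 5, 4995)
print(round(stat, 3), round(p_value, 4))
```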
posted by en forme de poire at 12:43 PM on July 24, 2012


I'm feeling a little slow. I've tried plugging 7721 and carsonb into the URL and it's still 404-ing me.

If you spoke up at some point after antonymous's comment, you'll be showing up in the next cohort when I run it again this afternoon; until then, no frequency table exists for you.
posted by cortex (staff) at 12:44 PM on July 24, 2012


cyxzbrglvyylvxmmktijraly2xixchxapsllaoxtwgyvwsbkw1pbkglqdnlg2lwcbbrtpi4cjcjc62u4hb1kzrewltco0ijelkoqi5dkgzcoxdrhzrosrokedqg00lnalibfmwltwowlg1qjg0zxkngc1nixqfisunxdsdbtoamuwoqqcfw5gkh.

I don't remember ever saying this.


I found it, apparently it was part of a comment in a Caps Lock Day thread, as part of a Base64-encoded gzip-compressed ALL CAPS version of the Treaty of Münster, part of the Peace of Westphalia.
posted by grouse at 12:48 PM on July 24, 2012


me, too, please. thank you!
posted by batmonkey at 12:51 PM on July 24, 2012


If you spoke up at some point after antonymous's comment, you'll be showing up in the next cohort when I run it again this afternoon; until then, no frequency table exists for you.

OK, thanks cortex! I just missed that cutoff.
posted by carsonb at 12:53 PM on July 24, 2012


Me! Thanks.
posted by hal9k at 12:55 PM on July 24, 2012


Quantify me, too, please. Thanks!
posted by Ostara at 12:59 PM on July 24, 2012


Please and thank you!
posted by pointystick at 1:07 PM on July 24, 2012


I can't believe I've only said "googly" three times in all my years here.

Well, 4 now.
posted by troika at 1:10 PM on July 24, 2012


WORDS PER FAVORITE
posted by rhizome at 1:11 PM on July 24, 2012 [1 favorite]


I've only used "cat" 324 times? Better get to work...
posted by rtha at 1:13 PM on July 24, 2012 [1 favorite]


Yeah, my "kitteh" count seemed awfully low.
posted by elizardbits at 1:18 PM on July 24, 2012 [1 favorite]


grouse: " I found it, apparently it was part of a comment in a Caps Lock Day thread, as part of a Base64-encoded gzip-compressed ALL CAPS version of the Treaty of Münster, part of the Peace of Westphalia."

I'm pretty sure to a lot of people in the world that 'explanation' makes about as much sense as the comment you were trying to explain.
posted by MCMikeNamara at 1:23 PM on July 24, 2012 [4 favorites]


That is something that can be calculated from the data that exists in the Infodump, and in fact I feel like I remember a previous metatalk thread where someone was making those calculations. So there may be a mefite with an existing script that could tell you that.

Oooh, I was not aware of the existence of the Infodump, and now I have a fun project for tonight after my son goes to bed. :)
posted by antifuse at 1:24 PM on July 24, 2012


rhizome: "WORDS PER FAVORITE"

50.979

<notbad.jpg>
posted by Rock Steady at 1:28 PM on July 24, 2012 [1 favorite]

I found it, apparently it was part of a comment in a Caps Lock Day thread, as part of a Base64-encoded gzip-compressed ALL CAPS version of the Treaty of Münster, part of the Peace of Westphalia.

I'm pretty sure to a lot of people in the world that 'explanation' makes about as much sense as the comment you were trying to explain.

My mistake, it was bzip2, not gzip. Hope this helps!
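
For anyone who wants to produce their own unreadable token, a rough sketch of that kind of encoding follows. It only illustrates the general recipe (upper-case, bzip2, Base64); the function names are hypothetical and it is not a reconstruction of the original comment.

```python
import base64
import bz2


def encode_all_caps(text):
    """Upper-case the text, bzip2-compress it, and Base64-encode the result."""
    compressed = bz2.compress(text.upper().encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode(blob):
    """Reverse the encoding: Base64-decode, then bzip2-decompress."""
    return bz2.decompress(base64.b64decode(blob)).decode("utf-8")


blob = encode_all_caps("Treaty of Münster")
print(blob)
print(decode(blob))
```

Base64 output mixes upper- and lower-case letters with digits and a few symbols, so a counter that lowercases everything and splits on punctuation would plausibly turn a blob like this into long runs of lowercase letters and digits.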
posted by grouse at 1:30 PM on July 24, 2012 [3 favorites]


I've said cat 385 times and boyfriend only 152 (187 if I include husband). Which pretty much sums up my priorities in life right there.

Also, I apparently only average 6.5 words per comment. I thought that would be higher.
posted by shelleycat at 1:40 PM on July 24, 2012


Thanks Cortext.
posted by Splunge at 1:41 PM on July 24, 2012


Words per favorite?

Try favorites per word, turkeys.
posted by griphus at 1:47 PM on July 24, 2012 [1 favorite]


Me!
posted by RichardP at 1:49 PM on July 24, 2012


Words per favorite?

Try favorites per word, turkeys.
posted by griphus at 1:47 PM on July 24 [+] [!]

Probably never going to favorite you ever again. On principle.
posted by FirstMateKate at 2:00 PM on July 24, 2012 [2 favorites]


Me too!
posted by johnofjack at 2:17 PM on July 24, 2012


yesh.
posted by Sebmojo at 2:19 PM on July 24, 2012


/Connery
posted by Sebmojo at 2:19 PM on July 24, 2012


I'm such a joiner - me too please!
posted by deborah at 2:26 PM on July 24, 2012


Alright! Here's round two, from eyeballkid through deborah.
user 9139:	146930 words,	14527 unique, in	4830 comments.
user 64260:	60039 words,	8803 unique, in	601 comments.
user 10585:	169379 words,	16468 unique, in	4236 comments.
user 7721:	352817 words,	28119 unique, in	7280 comments.
user 111738:	5727 words,	1875 unique, in	120 comments.
user 96315:	253234 words,	18554 unique, in	3424 comments.
user 32429:	135763 words,	14210 unique, in	2220 comments.
user 49538:	346024 words,	22581 unique, in	5521 comments.
user 15688:	384251 words,	19906 unique, in	5452 comments.
user 38102:	88358 words,	11271 unique, in	1298 comments.
user 94911:	83620 words,	10059 unique, in	1050 comments.
user 14751:	192799 words,	14994 unique, in	2877 comments.
user 18425:	105389 words,	12377 unique, in	2374 comments.
user 127127:	112687 words,	12534 unique, in	1533 comments.
user 22916:	301862 words,	18488 unique, in	2104 comments.
user 63452:	82019 words,	9764 unique, in	1035 comments.
user 43236:	56643 words,	6229 unique, in	663 comments.
user 64979:	68900 words,	10027 unique, in	1123 comments.
user 35715:	16603 words,	3390 unique, in	415 comments.
user 28036:	237171 words,	19565 unique, in	2757 comments.
user 114110:	155697 words,	12971 unique, in	1207 comments.
user 27428:	583171 words,	30941 unique, in	5651 comments.
user 62962:	93743 words,	13182 unique, in	1845 comments.
user 744:	88473 words,	11391 unique, in	2358 comments.
user 76990:	43833 words,	6710 unique, in	349 comments.
user 114526:	7707 words,	2290 unique, in	86 comments.
user 18329:	178772 words,	14348 unique, in	2973 comments.
user 59453:	354369 words,	18972 unique, in	6566 comments.
user 102200:	744890 words,	29989 unique, in	4037 comments.
user 74248:	367041 words,	21197 unique, in	5024 comments.
user 53053:	50604 words,	7296 unique, in	604 comments.
user 19328:	500 words,	292 unique, in	28 comments.
user 140431:	16944 words,	3330 unique, in	195 comments.
user 36360:	229072 words,	17746 unique, in	2346 comments.
user 23766:	63497 words,	10891 unique, in	2488 comments.
user 18293:	83255 words,	9154 unique, in	1103 comments.
user 61293:	60313 words,	8421 unique, in	1159 comments.
user 35452:	291491 words,	21853 unique, in	6802 comments.
user 17881:	95695 words,	10570 unique, in	1017 comments.
user 67315:	17132 words,	3723 unique, in	271 comments.
user 91319:	71423 words,	10765 unique, in	1101 comments.
user 16990:	125597 words,	12396 unique, in	3219 comments.
As above, plug your userid into that url if you want to grab your full frequency table.
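
If you'd rather compute averages than eyeball them, a small sketch for reading summary lines like the ones above might look like this. The regular expression is an assumption based on how the lines appear in this thread, and `parse_summary` is a hypothetical helper, not part of the script that generated them.

```python
import re

# Matches lines such as "user 9139: 146930 words, 14527 unique, in 4830 comments."
LINE_RE = re.compile(
    r"user\s+(\d+):\s+(\d+)\s+words,\s+(\d+)\s+unique,\s+in\s+(\d+)\s+comments\."
)


def parse_summary(line):
    """Parse one summary line into a dict, or return None if it doesn't match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    user_id, words, unique, comments = (int(g) for g in match.groups())
    return {
        "user_id": user_id,
        "words": words,
        "unique": unique,
        "comments": comments,
        # Guard against empty histories rather than dividing by zero.
        "words_per_comment": words / comments if comments else 0.0,
        "unique_ratio": unique / words if words else 0.0,
    }


sample = "user 9139:\t146930 words,\t14527 unique, in\t4830 comments."
info = parse_summary(sample)
print(info["user_id"], round(info["words_per_comment"], 1))
```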
posted by cortex (staff) at 2:31 PM on July 24, 2012 [3 favorites]


I think you made a typo for my usernumber. 19328 (who isn't in this thread) should be 19628.

:( How long do I have to wait for the next batch?
posted by aubilenon at 2:35 PM on July 24, 2012


Aww, just missed ya. Include me in round three please?
posted by gubenuj at 2:36 PM on July 24, 2012


Totally cool. Thanks so much.
posted by bearwife at 2:50 PM on July 24, 2012


heyho heyho heyho heyho
posted by Brandon Blatcher at 2:52 PM on July 24, 2012 [1 favorite]


Hmm, it seems I've said "Cameron" thirty times. Is it possible to know how many of them are near to instances of the word "twat"?
posted by Jehan at 2:55 PM on July 24, 2012 [1 favorite]


Interesting. Most words are lowercase, but my usage of "blank" is uppercase in the txt file.

Now I have to go through my history to see where I used the single letter words "s" and "b".
posted by CancerMan at 2:56 PM on July 24, 2012


I say "fuck" a lot.
posted by brundlefly at 3:03 PM on July 24, 2012


(That didn't help.)
posted by brundlefly at 3:04 PM on July 24, 2012


Yes!
posted by stoneandstar at 3:05 PM on July 24, 2012


Me please!
posted by Rory Marinich at 3:06 PM on July 24, 2012


Thank you!
posted by MonkeyToes at 3:06 PM on July 24, 2012


huh, I've used the words punk, sex, and ska exactly 38 times each. Also, sandwich.

I am really enjoying the found poetry aspect of this.
posted by Ghidorah at 3:10 PM on July 24, 2012 [1 favorite]


I'm in.
posted by Errant at 3:17 PM on July 24, 2012


Interesting. Most words are lowercase, but my usage of "blank" is uppercase in the txt file.

"BLANK" is an artifact of my parsing process and does not represent the actual use of the word blank. Every actual word you've used is normalized to lowercase by the script to simplify the counting/display process.

Now I have to go through my history to see where I used the single letter words "s" and "b".

It's possible you have actually used those as stand-alone tokens at some point; it's also possible that the parsing process orphaned them from some sort of compound word at some point if they were connected to the rest of the word by some sort of punctuation other than a hyphen or an apostrophe, since I convert most punctuation to whitespace early on in the comment-digestion process.
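
A rough sketch of that kind of normalization, for illustration only: this is not the actual parsing script, just an assumed minimal version of the two steps described (lowercasing, then turning punctuation other than hyphens and apostrophes into whitespace). The names `tokenize` and `frequency_table` are hypothetical, the BLANK artifact is not reproduced, and this version only keeps ASCII letters and digits.

```python
import re
from collections import Counter

# Anything that is not a lowercase ASCII letter, digit, hyphen, or apostrophe
# is treated as a separator. (Assumption: real handling of accented letters
# may differ.)
NON_WORD = re.compile(r"[^a-z0-9'\-]+")


def tokenize(comment):
    """Lowercase a comment and split it into word tokens."""
    pieces = NON_WORD.sub(" ", comment.lower()).split()
    # Drop stray hyphens/apostrophes at token edges, and tokens made only of them.
    stripped = (piece.strip("-'") for piece in pieces)
    return [token for token in stripped if token]


def frequency_table(comments):
    """Count tokens across an iterable of comment strings."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


print(tokenize("Ask the user(s) about Plan/B; it's well-known."))
```

In that example "user(s)" and "Plan/B" come apart at the parenthesis and the slash, which is the sort of thing that can leave a lone "s" or "b" in a table.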

How long do I have to wait for the next batch?

As an apology for the typo fakeout, I'll do another mini batch right now!
user 19628:	124530 words,	12881 unique, in	2561 comments.
user 94190:	14498 words,	3095 unique, in	151 comments.
user 136774:	92312 words,	8798 unique, in	695 comments.
user 101698:	326738 words,	18492 unique, in	1629 comments.
user 25909:	215598 words,	16250 unique, in	1525 comments.
posted by cortex (staff) at 3:24 PM on July 24, 2012 [2 favorites]


Gracious, I've been out all day and look what I've missed! If there's to be another round, pretty please include me. Thanks!
posted by The Wrong Kind of Cheese at 3:30 PM on July 24, 2012


Me too! Me too!
posted by trip and a half at 3:30 PM on July 24, 2012


Include me in your next batch please!
posted by danny the boy at 3:38 PM on July 24, 2012


Thanks!
posted by aubilenon at 3:39 PM on July 24, 2012


I would like to know too!
posted by lollusc at 3:41 PM on July 24, 2012


Ooh, me please!
posted by decathecting at 3:43 PM on July 24, 2012


94958 total words, 11209 unique words

I don't know 11209 unique words.
posted by arcticseal at 3:44 PM on July 24, 2012


I would love this too, if you're still doing the dumps
posted by never used baby shoes at 3:47 PM on July 24, 2012


I also used the words "socks" and "nom" 42 times each; presumably in the same sentence.
posted by arcticseal at 3:50 PM on July 24, 2012


semiotics,yes!
posted by clavdivs at 3:58 PM on July 24, 2012


Me too, please
posted by ChrisR at 3:59 PM on July 24, 2012


yes, I guess I should be included here, though anything I learn will probably make me sad.
posted by oneswellfoop at 4:01 PM on July 24, 2012 [1 favorite]


Yes please!
posted by pompomtom at 4:03 PM on July 24, 2012


Me too! It will be interesting to compare what I think the results will be with what they actually are.
posted by julen at 4:03 PM on July 24, 2012


Please and thank you.
posted by gingerest at 4:09 PM on July 24, 2012


Yes, please!
posted by Ruki at 4:11 PM on July 24, 2012


Looks like I've averaged around 60 words per comment. That is... Significantly more than expected.
posted by antifuse at 4:13 PM on July 24, 2012


Out of respect (and the wish to see this name crop up in my stats next time): R.I.P. Sherman Hemsley.
posted by MonkeyToes at 4:15 PM on July 24, 2012


Me, too (she said, cringing in anticipation).
posted by PhoBWanKenobi at 4:16 PM on July 24, 2012


Somebody less lazy than me needs to plug this stuff into R and create a frequency distribution of MeFite verbosity.
posted by Scientist at 4:17 PM on July 24, 2012


And I'm super surprised that I've only used the word "yumtastic" once. Well, twice now. And I've used the prefix non- (on various words) quite a few times. Most interestingly, "non-burn-y" - I really wonder wtf I was talking about there.
posted by antifuse at 4:24 PM on July 24, 2012


Have I covered all the fucking forms?

Not really. This was my fuckform-tally back in 2007. I suspect I may have increased it in both volume and variety since.
posted by stavrosthewonderchicken at 4:30 PM on July 24, 2012 [1 favorite]


Whoops, the first line was meant to quote Devils Rancher, here.
posted by stavrosthewonderchicken at 4:31 PM on July 24, 2012


Thank you, great cortex!
posted by brina at 4:37 PM on July 24, 2012


Me too.
posted by benito.strauss at 4:49 PM on July 24, 2012


Ooh, I do!
posted by OverlappingElvis at 4:50 PM on July 24, 2012


Ooh boy, ok. Yep, I'm in.
posted by thinkpiece at 4:52 PM on July 24, 2012


Yes, please.
posted by Dr. Zira at 4:53 PM on July 24, 2012


I want to play, too. Thanks in advance.
posted by Jonathan Livengood at 4:54 PM on July 24, 2012


Do me!
posted by goethean at 4:59 PM on July 24, 2012


Hit me.
posted by maudlin at 5:00 PM on July 24, 2012


Two things I'm curious about.

1) I really like to use the word 'guy' to refer to anyone or anything, but have decided to cut it out as it's too gendered. I wonder how well I stuck to that.

2) I remember reading about how people trained Bayesian filters to recognize spam, working off of two corpuses, one of email identified as spam, the other identified as not spam. The end result was that each word was identified as either pointing towards the email containing it being spam or non-spam, and also a measure of how strongly it pointed.

If I could run this training for my words vs. the mefi corpus, I should be able to pick out the one word that, if it's used in a comment, makes it most likely that the comment is by me. It'd be kinda cool to know what my word is.
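
A minimal sketch of how that might go, loosely following the spam-filter idea of scoring each word by its rate in one corpus against its rate in the other. Everything here is an assumption for illustration: the function name, the smoothing, the minimum-count cutoff, and the made-up counts. It also assumes the "everyone else" counts do not include my own comments.

```python
def indicator_scores(my_counts, other_counts, min_count=5):
    """Score words by how strongly they point at 'me' rather than everyone else.

    Returns {word: score}, where score = my_rate / (my_rate + other_rate).
    A score near 1 means the word is far more typical of me; near 0.5 means
    it says nothing either way. Words I used fewer than min_count times are
    skipped, since rare words give noisy rates.
    """
    my_total = sum(my_counts.values())
    other_total = sum(other_counts.values())
    scores = {}
    for word, mine in my_counts.items():
        if mine < min_count:
            continue
        # Add one to numerator and denominator so that a word missing from
        # the other corpus still gets a small non-zero rate.
        my_rate = (mine + 1) / (my_total + 1)
        other_rate = (other_counts.get(word, 0) + 1) / (other_total + 1)
        scores[word] = my_rate / (my_rate + other_rate)
    return scores


# Made-up counts, purely for illustration.
mine = {"guy": 50, "the": 1000, "folk": 2}
everyone_else = {"guy": 100, "the": 100000}
scores = indicator_scores(mine, everyone_else)
print(max(scores, key=scores.get))
```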
posted by benito.strauss at 5:14 PM on July 24, 2012 [1 favorite]


Yes pls me too.
posted by cybercoitus interruptus at 5:15 PM on July 24, 2012


Oh yes please. Thank you.
posted by motty at 5:15 PM on July 24, 2012


Me too thanks. Pad it out, make it pretty and give it an accent. French this time.
posted by peacay at 5:23 PM on July 24, 2012


Yes please!
posted by usonian at 5:30 PM on July 24, 2012


R.I.P. Sherman Hemsley.

nooooooooooo
posted by elizardbits at 5:44 PM on July 24, 2012


Datawankery? Count me in. By which I mean "me too please."
posted by Alterscape at 5:45 PM on July 24, 2012


If I could run this training for my words vs. the mefi corpus, I should be able to pick out the one word that, if it's used in a comment, makes it most likely that the comment is by me. It'd be kinda cool to know what my word is.

If you figure out how to do this, please share. I'd love to find out what my word is. Although I fear it will be something embarrassing.
posted by antifuse at 5:47 PM on July 24, 2012


Thanks for doing this cortex.
posted by Brandon Blatcher at 5:51 PM on July 24, 2012


I'd love to find out what my word is. Although I fear it will be something embarrassing.

It'll probably take a while, if I ever get around to it. But if the spam filtering is any guide, the best indicator words turned out to be unexpectedly innocuous.
posted by benito.strauss at 5:57 PM on July 24, 2012


Not really. This was my fuckform-tally back in 2007.

I bow in humbled awe at your fucktastic fuckatiousness, sir. Two fucking thumbs up!
posted by Devils Rancher at 6:01 PM on July 24, 2012


It'd be kinda cool to know what my word is.

"guy".
posted by kengraham at 6:03 PM on July 24, 2012 [1 favorite]


Please be gentle.
posted by SollosQ at 6:04 PM on July 24, 2012


Oh, and my fuck variety is not all that varied - the only interesting variations are "chickenfucking" and "toofuck". I'm surprised, no fuckarella, fuckstick, or fucksicle. Well, next time around I suppose :)
posted by antifuse at 6:08 PM on July 24, 2012


For what it's worth, that's what the (long-dead now) mefi word cloud thing did: it parsed a user's comment history to build a frequency table, then compared each word that appeared in that table against a pre-calculated global frequency table for the site, coming up with a ratio value for each, then sorted by value and listed the top like 50 or whatever.

This'd be very doable with the existing Corpus files—just use one of the whole-site files as the baseline standard, and compare the words in a user's newly calculated table and bob's your uncle. The only futzy part here is grabbing the user-specific files, but for the purposes of experimentation the ones generated for this thread are obviously readily available.
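
A minimal sketch of that comparison, under stated assumptions: each table line is taken to hold a raw count, a second numeric column, and the word, separated by whitespace, which is how the table lines quoted in this thread appear; check the actual files before relying on it. The function names, the minimum-count cutoff, and the sample lines are all hypothetical.

```python
def parse_table(lines):
    """Read 'count number word' lines into a {word: count} dict.

    Assumed layout; lines that don't have exactly three fields are skipped.
    """
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        counts[parts[2]] = int(parts[0])
    return counts


def enrichment(user_counts, site_counts, min_user_count=3, top=50):
    """Rank words by (user relative frequency) / (site relative frequency)."""
    user_total = sum(user_counts.values())
    site_total = sum(site_counts.values())
    rows = []
    for word, n in user_counts.items():
        site_n = site_counts.get(word, 0)
        # Skip rare words and words absent from the baseline table.
        if n < min_user_count or site_n == 0:
            continue
        ratio = (n / user_total) / (site_n / site_total)
        rows.append((ratio, word))
    rows.sort(reverse=True)
    return [(word, ratio) for ratio, word in rows[:top]]


# Made-up table lines, purely for illustration.
user = parse_table(["40 1.0 poutine", "900 1.0 the", "2 1.0 googly"])
site = parse_table(["50 1.0 poutine", "900000 1.0 the", "10 1.0 googly"])
for word, ratio in enrichment(user, site):
    print(word, round(ratio, 2))
```

The minimum-count cutoff matters in practice: without it, a word used once by one user and almost never sitewide gets an enormous ratio and crowds out everything else.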
posted by cortex (staff) at 6:39 PM on July 24, 2012 [2 favorites]


And if we're talking fucking, the thing to remember is that none of us fucks as much as all of us. In the spirit of which:

Every variation on "fuck" in Metafilter history.
posted by cortex (staff) at 6:43 PM on July 24, 2012 [10 favorites]


Buncha sweet fuck-nothings, there.
posted by box at 6:53 PM on July 24, 2012


Seven doughnut-fuckings?
posted by MonkeyToes at 6:55 PM on July 24, 2012


Wow, fishfucker is way up there. Does that connote a slur against a particular sub-culture that I'm unaware of, or do MeFites just find it to be generally melodious? Fucko is a fast-rising star on my personal fuck-variant horizon, just for future general use, I think. It has a cachet and isn't so horribly overdone as some.
posted by Devils Rancher at 6:55 PM on July 24, 2012


There was a user named fishfucker that probably accounts for that.
posted by restless_nomad (staff) at 6:57 PM on July 24, 2012


There is/was a user named fishfucker. I'm guessing that's a contributing factor.

Dang, too slow.
posted by box at 6:58 PM on July 24, 2012


Ah, that's right.

There is also a really long tail of single-use constructs, there. We're an impressive fuck construct generator, in aggregate.
posted by Devils Rancher at 7:02 PM on July 24, 2012


I'd like to know. Thanks!
posted by defenestration at 7:03 PM on July 24, 2012


I'd like to be in the next batch or mini-batch, please. Not really fussy as to my batchiness.
posted by barnacles at 7:04 PM on July 24, 2012


A friend of mine who learnt English through a weird combination of Pink Floyd and deduction -- he once responded to my look of puzzlement with "I think you overstand" -- got hit by a weird variant of the multiple-negative bus, once, while "redisunfuckulating" his computer. I was thus glad to find the etymo-fucking-logical ancestor "deunfuck" on cortex's epic list.

The relative paucity of MeFi Fucktaculars is weird, though.
posted by kengraham at 7:16 PM on July 24, 2012


Not interested in my word count, but that txt file of collected 'fuck' variations has made my god damn day. I love EVERYONE ON THE GREY (and blue and green).
posted by zennish at 7:21 PM on July 24, 2012


Every variation on "fuck" in Metafilter history.

No shit?

/Actually quite a lot of shit, including christcocksuckerpickleshitfuckfuckfuck. Pickleshit?
posted by benito.strauss at 7:25 PM on July 24, 2012


I think it's interesting that "fucking" is slightly more popular than "fuck" itself.
posted by Scientist at 7:31 PM on July 24, 2012


So people say, "unfuckingbelievable" (and variants) more than "unbefuckinglievable" (and variants)? Interesting.
posted by gingerest at 7:37 PM on July 24, 2012


I'm not sure if I want to know, but do me.
posted by madcaptenor at 7:40 PM on July 24, 2012


So people say, "unfuckingbelievable" (and variants) more than "unbefuckinglievable" (and variants)? Interesting.

That's not too surprising to me - the former is much less awkward than the latter.
posted by antifuse at 7:43 PM on July 24, 2012


It's actually one of those Linguistics 101 things you learn... why do people say hippofuckingpotamus and not hipfuckingpopotamus or hippopotfuckingamus? How do they know?
posted by jessamyn (staff) at 7:46 PM on July 24, 2012 [1 favorite]


Me too!
posted by carter at 7:49 PM on July 24, 2012


You learned it, but I never took Linguistics 101 and I would love to know how people know to do this. I can see the rhythmic structure inherent in unbefuckinglievable and hippofuckingpotamus and I can see that unfuckingbelievable and hipfuckingpopotamus lack a certain mellifluousness to me, but if you are able to explain why that is (or point to a source where I could learn) then I for one would be fascinated.
posted by Scientist at 7:50 PM on July 24, 2012


Expletive infixation.
posted by barnacles at 7:54 PM on July 24, 2012 [1 favorite]


Ladies and gentlemen: expletive infixation. See also: prosody.

On prefuckingview: fuck.
posted by maudlin at 7:55 PM on July 24, 2012 [4 favorites]


It's actually one of those Linguistics 101 things you learn... why do people say hippofuckingpotamus and not hipfuckingpopotamus or hippopotfuckingamus? How do they know?

What sort of search terms would a person who wanted to know the answer to this be using? (Not least because to me, "unbefuckinglievable" trips pleasantly off the tongue while "unfuckingbelievable" sounds disjointed, or at least like three distinct words.)
posted by kengraham at 8:01 PM on July 24, 2012


OK, so basically people like to put the "fucking" just before the main stressed syllable in a multisyllabic word because it sounds better, except sometimes if there's a morpheme boundary (as between un- and -believable) people will put it there instead because it makes more sense to them. And if you want to know why it sounds better to have it before the main stress then you have to read about the minimal restructuring of the metrical stress tree of the host, which is something that I would understand if I'd ever taken a Linguistics class. I feel like such an ignorant barfuckingbarian.
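
As a toy illustration of the stress rule, assuming the caller already knows how the word splits into syllables and where the main stress falls (the sketch does not work either of those out, and the function name is made up):

```python
def infix(syllables, stress_index, expletive="fucking"):
    """Insert the expletive immediately before the main stressed syllable.

    syllables: the word pre-split into syllables by the caller.
    stress_index: index of the syllable carrying the main stress.
    If the stress is on the first syllable there is nothing to infix into,
    so this sketch simply puts the expletive in front as a separate word.
    """
    if not 0 <= stress_index < len(syllables):
        raise ValueError("stress_index is outside the word")
    if stress_index == 0:
        return expletive + " " + "".join(syllables)
    head = "".join(syllables[:stress_index])
    tail = "".join(syllables[stress_index:])
    return head + expletive + tail


print(infix(["hip", "po", "pot", "a", "mus"], 2))
print(infix(["un", "be", "liev", "a", "ble"], 2))
```

Passing the index of a morpheme boundary instead of the stressed syllable gives the other variant, e.g. index 1 for un- plus -believable.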
posted by Scientist at 8:01 PM on July 24, 2012 [1 favorite]


the url doesn't work for my user number, for some reason :(
posted by seawallrunner at 8:07 PM on July 24, 2012


See, maudlin, I failed to preview too, it looks like, but thanks for the links!

I can't see infuckingfixation working with "preview", though. Saying a bunch aloud, I conjecture:

Infuckingsertion of a word with n syllables into a word with fewer than n syllables cannot give a pleafuckingsant result.

ALSO: we definabsofuckinglutelyitely need nested expletive infixation.

I still don't understand the random decision to give the morpheme boundary the right of way, sometimes, but I probably just haven't tried enough examples.
posted by kengraham at 8:12 PM on July 24, 2012


1 3.95719893630493 fuuuuuck
1 3.95719893630493 fuuuuuuuuuck
1 3.95719893630493 fuuuuuuuuuuuck
posted by Rock Steady at 8:24 PM on July 24, 2012


Ooh! Data! Shiny! Me too please! Yay! Thanks!

Infuckingsertion of a word with n syllables into a word with fewer than n syllables cannot give a pleafuckingsant result.

Doubtmotherfuckinful.
posted by Homeboy Trouble at 8:25 PM on July 24, 2012


I want. Please. Thanks.
posted by Kerasia at 8:41 PM on July 24, 2012


Every variation on "fuck" in Metafilter history.

How many of those comments were ultimately deleted, I wonder? More generally, is the presence of profanity positively correlated with deletion?
posted by jedicus at 8:42 PM on July 24, 2012


"Doubtmotherfuckinful" is unfuckingpleasant, though.

(I wanted to say "...is not a [counterexample]" but couldn't figure out how to put "fucking" into "counterexample". The most metrically okay is pretending the "x" is "cks" and saying "countereckfuckingsample", but it seems dubious. This is very complicated.)
posted by kengraham at 8:52 PM on July 24, 2012


Some people call me Maurice
'Cause I speak of the hippofuckingpotamus of love
posted by Devils Rancher at 8:54 PM on July 24, 2012 [2 favorites]


Note that a common solution to a tricky infix is to opt for a simple prefix. "fucking doubtful" and "fucking counterexample" are simple, elegant solutions to the problem. Sometimes the natural boundary was already the word boundary.

I'll run another batch here before bed.
posted by cortex (staff) at 8:55 PM on July 24, 2012


cortex may have saved the day. I was wondering about whether one follows the before-the-stressed-syllable rule even when that syllable is the first. Is "fucking doubtful" an instance of expletive infixation (or perhaps fuckingdoubtful?)? Or are there words -- the unfuckables -- that admit no such infixation?

Formalizing this is a prefuckingcarious propofuckingsition, but for practical prosody, prefix is probably proper.
posted by kengraham at 9:05 PM on July 24, 2012


My contribution to the fuck list is "oh-fuck-it-i'll-just-get-the-instant-gratification-now."

Sounds about right.
posted by argonauta at 9:06 PM on July 24, 2012


Do Me too! This looks like fun!
posted by MultiFaceted at 9:09 PM on July 24, 2012


I swear to god that "hippofuckingpotamus" thing is also called the "fucking insertion," which I only remember because my friend was taking Linguistics 101 in college and we laughed about that for a week.

yup, here we go! adding "linguistics" to that search query was pretty clutch, btw
posted by en forme de poire at 9:16 PM on July 24, 2012 [2 favorites]


Righto, final batch for tonight, from The Wrong Kind of Cheese up through MultiFaceted. As always, find your full frequency table like this.
user 138535:	15485 words,	3352 unique, in	87 comments.
user 23593:	67691 words,	9347 unique, in	2220 comments.
user 17506:	61600 words,	8118 unique, in	789 comments.
user 115705:	240234 words,	14076 unique, in	2206 comments.
user 43105:	319929 words,	14538 unique, in	2711 comments.
user 58837:	78283 words,	9092 unique, in	1534 comments.
user 6915:	435527 words,	33820 unique, in	9513 comments.
user 18075:	18691 words,	4083 unique, in	353 comments.
user 20007:	205224 words,	20529 unique, in	3747 comments.
user 18444:	145691 words,	15279 unique, in	5060 comments.
user 534:	99838 words,	11676 unique, in	628 comments.
user 118218:	82086 words,	11641 unique, in	861 comments.
user 17095:	24785 words,	4471 unique, in	454 comments.
user 78000:	797041 words,	30986 unique, in	7351 comments.
user 2238:	1071010 words,	47830 unique, in	18462 comments.
user 126778:	112545 words,	12853 unique, in	2254 comments.
user 35482:	26319 words,	4799 unique, in	646 comments.
user 17698:	108900 words,	12541 unique, in	1888 comments.
user 17827:	92439 words,	12074 unique, in	1472 comments.
user 139308:	35236 words,	5186 unique, in	277 comments.
user 14682:	43382 words,	7975 unique, in	1596 comments.
user 801:	342199 words,	24769 unique, in	4377 comments.
user 27438:	165390 words,	15356 unique, in	1180 comments.
user 47942:	56288 words,	7239 unique, in	777 comments.
user 770:	298866 words,	23134 unique, in	4176 comments.
user 93688:	98006 words,	11622 unique, in	1205 comments.
user 36058:	87317 words,	10601 unique, in	967 comments.
user 104829:	9138 words,	2266 unique, in	84 comments.
user 27016:	56665 words,	8559 unique, in	1515 comments.
user 32016:	56145 words,	8613 unique, in	870 comments.
user 91774:	126441 words,	10687 unique, in	2359 comments.
user 15578:	170370 words,	16203 unique, in	4044 comments.
user 18854:	98670 words,	10922 unique, in	1363 comments.
user 112581:	83348 words,	11206 unique, in	550 comments.
user 70418:	83536 words,	10258 unique, in	787 comments.
user 94553:	64505 words,	6119 unique, in	381 comments.
I'll be happy to run some more tomorrow.
posted by cortex (staff) at 9:28 PM on July 24, 2012


Somehow I knew everyone would jump on this bandwagon.
posted by shakespeherian at 9:28 PM on July 24, 2012


Lame. I missed the last batch before bedtime. Oh, well. Count me in for the morning group!
posted by youngergirl44 at 9:34 PM on July 24, 2012


And a great big pile of shittery to tide you over.
posted by cortex (staff) at 9:37 PM on July 24, 2012 [2 favorites]


I am terse. I have no fear of your numbers! (curious though)
posted by a humble nudibranch at 10:00 PM on July 24, 2012


Neat! Slip me into the next batch, please.
posted by Kattullus at 10:37 PM on July 24, 2012


Well, apparently the words I use much more than anyone else on MeFi (based on the old cloud method) are:

knotted, fenway, haskell, baptism, lambda, universals, esperanto, allston, concierge, pulleys, madcaptenor, flatland, clade, baptisms, memri, 1-d, meany, db8

which sounds pretty fair. Although two of them are memories of unpleasant arguments I just couldn't igfuckingnore, and 'db8' turns out to be from a pretty good IPv6 joke I once made.
posted by benito.strauss at 10:52 PM on July 24, 2012


cortex: "And a great big pile of shittery to tide you over."

Ahh, yes, the famed long brown tail.
posted by barnacles at 11:03 PM on July 24, 2012 [1 favorite]


One for me please. Thanks Cortex, you're a gem.
posted by nestor_makhno at 11:49 PM on July 24, 2012


And me too. But only my good comments (6 words in 1 comment, 1 unique word).
posted by maxwelton at 12:11 AM on July 25, 2012 [1 favorite]


Yes, please.
posted by pracowity at 12:24 AM on July 25, 2012


I wonder if anyone doesn't have "I" as their top word.

Not even in my top ten. It's at 11 with "you" at ten; I guess I like to berate you more than I like to talk about myself.
posted by MartinWisse at 12:36 AM on July 25, 2012


Yes...me, too, please — in this incarnation and as Ethereal Bligh.
posted by Ivan Fyodorovich at 12:37 AM on July 25, 2012


Hi cortex, please count me in. Thanks for offering to do this again!
posted by rangefinder 1.4 at 12:38 AM on July 25, 2012


"Not even in my top ten. It's at 11 with 'you' at ten; I guess I like to berate you more than I like to talk about myself."

The idea that the use of the first-person pronoun (specifically the nominative) is diagnostic of personality is pernicious and false. Rather, it's culturally inflected, to some degree a matter of social register, and idiosyncratic.

It's also the case that frequency doesn't vary that much and often perceived trends are the result of confirmation bias. This is particularly true with regard to high-profile people like Presidents and claims made about their usage. I've yet to see an actual quantitative analysis that doesn't, amazingly, completely disprove the claims made by pundits about this stuff — both in the cases of Bush and Obama.

In general, using frequency analysis as clues to mental attitude is very problematic. I've read a lot of analysis, commentary, and criticism of this kind of thing by (primarily) Mark Liberman at Language Log, both with regard to pundits arguing for narcissism in Presidential usage of first person pronouns, and more interestingly with regard to some academic work.

My impression has been that this sort of academic work is more often questionable or faulty than reliable; for example, recently he looked at some research on the use of individualistic versus communitarian related words over the last hundred years. The conclusion of the paper was that the higher incidence of individualistic terms indicates a cultural turn toward more individualism. But that's a faulty conclusion, as is made clear by one good counterexample: today in the US, it is the right which is more likely to use the terms "marxism" and "socialism" and related. If you were running such an analysis on the rhetoric of the right over the last fifty years, you'd be tempted to conclude that it's moved leftward. The point is that a frequency analysis of a corpus tells you...frequency of usage. It doesn't tell you how those words are used, in what context. And, even if further analysis does, in isolation you don't have an understanding of how those frequencies, in those contexts, compare to the culture within which they occur.
posted by Ivan Fyodorovich at 12:58 AM on July 25, 2012 [2 favorites]


Me too please!
posted by Iteki at 2:25 AM on July 25, 2012


Data! :D Me too, please!
posted by Dysk at 3:42 AM on July 25, 2012


Quantify me, baby!
posted by b33j at 3:43 AM on July 25, 2012


Seconding Ivan there. Context is so important! You cannot make accurate claims about social behavior from frequency results of a corpus. That is quantitative. You cannot make global claims about social behavior from individual units (be they words, comments or MeFites). That is qualitative. If you want to say anything worthwhile about social behavior you really need to look at the interaction between quantitative and qualitative analyses. That is macro-level and micro-level at once and requires taking the context into account. Understanding the context at the macro-level usually requires some sort of ethnography: getting to know the population/community, how it works, what it cares about and how it presents itself (identity), as well as where it stands in the social space, relative to other populations/communities. Understanding the context at the micro-level requires some qualitative tools, like conversation analysis or examination of linguistic structure. And that must be examined relative to other nearby structures, individuals, comments/conversations.

For example, somebody mentioned the frequency of fishfucker above. That 'word' is community specific, referentially tied to the history of an individual and the entire community's relationship and history with them. Spikes in mentions of fishfucker are probably related to certain events. And even the use of the 'word' fishfucker itself is related to linguistic structure. If that MeFite's name happened to be stargazer or something else that wasn't alliterative, expletive-containing, and had this other semantic meaning (we know what a stargazer is, as it's a more common English phrase; what the hell is a fish fucker?), we might use that name referentially more or less than fishfucker.

And even our meta-uses of it get folded in. Today and forevermore, I will have a much higher frequency of fishfucker than countless thousands of other words. And sadly, also related a spike in the words 'related' and 'spike'. So, fishfucker, related, spike. Make of that what you will. (But please take the context into account, otherwise you forevermore might just think I'm weird.)
posted by iamkimiam at 4:28 AM on July 25, 2012 [3 favorites]


I want to be quantified. Thanking you in advance.
posted by flapjax at midnite at 4:53 AM on July 25, 2012


yes please
posted by crocomancer at 4:54 AM on July 25, 2012


Please to be including me in this public abuse.
posted by Plutor at 4:59 AM on July 25, 2012


I may as well, even though I'm just looking for a good used car.
posted by Fuzzy Skinner at 5:37 AM on July 25, 2012 [1 favorite]


I'd like to be quantified also! Thanks.
posted by King Bee at 5:56 AM on July 25, 2012


Fuzzy Skinner: "I may as well, even though I'm just looking for a good used car."

I've taken the liberty of doing fred manually:

6 i
4 a
3 used
3 am
3 looking
3 car
3 for
2 just
1 all
1 that
1 will
1 goodbye
1 this
1 around
1 was
1 instead
1 thanx
1 is
1 think
1 bashed
1 tired
1 being
1 walk
1 lot
1 what
1 of
1 and
1 really

And a whopping 0.522 Words Per Favorite. Being a cult icon has its benefits, I guess.
posted by Rock Steady at 6:18 AM on July 25, 2012 [2 favorites]


I wonder if anyone doesn't have "I" as their top word.

3rd for me. "to" and "the" take the top two places. And apparently the words I use more than other MeFites are hrmm, Mississauga, antifuse (only 8 times, but still), refrigerate, telus, dublin, lycos (I wonder why I was talking about lycos so much), gf's, newsgroups, shortcake, galway and zidane.

And I've said hrmm 65 times. It's my default go-to for "need to compose my thoughts, and still end up typing something as I'm composing my thoughts" I guess.
posted by antifuse at 6:45 AM on July 25, 2012 [1 favorite]


Hah, also in that list: "umm" (72 times). Clearly I spend a lot of time trying to express that I'm pausing/contemplating.
posted by antifuse at 6:47 AM on July 25, 2012


(also - thanks to benito.strauss for calculating that for me)
posted by antifuse at 6:48 AM on July 25, 2012


And one last interesting (to me) note - scrolling down past the words in order, after a certain frequency level (once you get past the prepositions and conjunctions and pronouns), it starts to sound like sentences that SHOULD make sense but don't, like something that you've run back and forth through babelfish 100 times or something. Particularly if you put in some interesting usage of punctuation. About 150 words in, for me, you get this sentence: "Every since after most before around last old enough, oh show getting need take while doing down thought always read big question." Then, a bit further down... "Fact: few stuff home made person remember saying Ireland best fun thread"

It's kind of a fun (to me, I am weird) game. Around my first "fuck" occurrence: "Europe expensive, forget fucking"
posted by antifuse at 7:07 AM on July 25, 2012


I want to see what the stats look like for the anonymous AskMe user as far as words in posts. But that pony is awfully far outside this ranch.
posted by deezil at 7:16 AM on July 25, 2012 [1 favorite]


I'm grateful to see that I have only used the word "zuckerberg" once.
posted by The Deej at 7:16 AM on July 25, 2012


I'm grateful to see that I have only used the word "zuckerberg" once.

Gotcha beat there. Never used it once. (I know that even though my results aren't in yet!)
posted by flapjax at midnite at 7:21 AM on July 25, 2012


I never said I was perfect.
posted by The Deej at 7:22 AM on July 25, 2012


Please wrangle and mine my data as well.
posted by Drastic at 7:27 AM on July 25, 2012


I'm interested
posted by burnmp3s at 7:27 AM on July 25, 2012


I should probably resist the temptation, but I won't.

Count me in.
posted by philipy at 7:41 AM on July 25, 2012


Gotcha beat there. Never used it once. (I know that even though my results aren't in yet!)

Of course, now that you've quoted The Deej saying it, since your stats haven't been calculated yet, it's going to show up in them.
posted by antifuse at 7:45 AM on July 25, 2012 [1 favorite]


PPM = ??
posted by alms at 8:03 AM on July 25, 2012


Good morning! youngergirl44 through philipy, below. Frequency tables here.
user 43729:	31924 words,	4985 unique, in	354 comments.
user 112842:	14829 words,	3688 unique, in	413 comments.
user 16000:	507098 words,	34785 unique, in	7990 comments.
user 95433:	19014 words,	4151 unique, in	515 comments.
user 41040:	284300 words,	20915 unique, in	5432 comments.
user 3518:	748303 words,	36500 unique, in	9602 comments.
user 137535:	456338 words,	22175 unique, in	1789 comments.
user 17454:	1242525 words,	40563 unique, in	8167 comments.
user 76988:	26372 words,	4406 unique, in	194 comments.
user 63210:	57882 words,	8664 unique, in	935 comments.
user 84634:	91283 words,	10517 unique, in	1296 comments.
user 33014:	224277 words,	14804 unique, in	2249 comments.
user 39010:	674348 words,	35191 unique, in	16829 comments.
user 30089:	22362 words,	3859 unique, in	375 comments.
user 17646:	225690 words,	21174 unique, in	5420 comments.
user 69556:	130832 words,	11920 unique, in	1829 comments.
user 54984:	77086 words,	8125 unique, in	981 comments.
user 20613:	67188 words,	10341 unique, in	790 comments.
user 116224:	100135 words,	8737 unique, in	901 comments.
posted by cortex (staff) at 8:04 AM on July 25, 2012 [1 favorite]


PPM = ??

Parts per million. That is, if you scaled the source corpus (in this case, a given user's total set of comments) so that it had exactly one million words in it, the PPM value for a given word is the number of times that word would have appeared.

So if "the" has a PPM value of say 50000, that means that 50,000 out of 1,000,000 words would have been "the", or 50K/1M = .05 = 5%. One in every twenty words is "the".

You can easily calculate this by hand for any word in the frequency table by dividing its raw number of occurrences in the table by the total word count for the table and multiplying by one million, but having it precalculated in the table makes it much more convenient to do comparisons between different tables to find how proportions differ from one corpus to another. If you were just comparing raw word counts from one table to another, that'd tell you basically nothing if the tables weren't calculated from corpora of identical size.
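
For anyone who wants to check a table by hand, the PPM arithmetic described above can be sketched roughly like this. The function name `ppm` and the two small example corpora are made up for illustration; this is not the script that generated the tables.

```python
def ppm(count, total_words):
    """Scale a raw word count to a corpus of exactly one million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * 1_000_000 / total_words


# Worked example from the comment: 50,000 of 1,000,000 words are "the".
the_ppm = ppm(50_000, 1_000_000)
the_share = the_ppm / 1_000_000  # fraction of all words; 0.05 is one in twenty

# Two hypothetical tables of different sizes: the raw counts differ tenfold,
# but the PPM values agree, which is why PPM is the number worth comparing.
small_table = ppm(500, 10_000)
large_table = ppm(5_000, 100_000)

print(the_ppm, the_share, small_table, large_table)
```

Comparing `small_table` and `large_table` directly is the point: the raw counts (500 versus 5,000) say nothing useful until both are put on the same per-million scale.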
posted by cortex (staff) at 8:11 AM on July 25, 2012


Pretty fact friend
hard last used enough.

Put 1 anything doing never
own case made, though you'll keep sounds.

Tell while few pay
world part same course.

Such old point.
2 without
here.
posted by alms at 8:11 AM on July 25, 2012


Good morning! youngergirl44 through philipy, below

Not sure if I wasn't clear enough above or was just skipped accidentally but I would like to join the data analysis party.
posted by burnmp3s at 8:27 AM on July 25, 2012


My Top 10 has flow.

"I to the A, and that is it for you!"
posted by benito.strauss at 8:52 AM on July 25, 2012 [5 favorites]


I'm thinking it would be interesting to find out which words I use a lot more frequently than is common. Does anyone know of any handy resources for doing that? e.g. A list of most frequently used words, or a frequency table of words?

Any rough and ready benchmark is fine, as this is just for fun.
posted by philipy at 8:53 AM on July 25, 2012 [1 favorite]


philipy, check your memail. (Those are words you use more frequently, compared to all the other people on mefi.)
posted by benito.strauss at 9:04 AM on July 25, 2012


That is exactly what I was hoping for also, would this be possible to put together for me also benito.strauss?
posted by jessamyn (staff) at 9:12 AM on July 25, 2012


Give me a whack while you're at it. And given the popularity of the idea, maybe I'll go ahead and put together an adjunct script on my side so I can run these automatically as part of the generation process and save you the effort in the future.
posted by cortex (staff) at 9:17 AM on July 25, 2012


If the novelty hasn't worn off benito.strauss, I would be curious about that for me too.
posted by Iteki at 9:21 AM on July 25, 2012


I would be interested in the benito.strauss analysis, too.
posted by EvaDestruction at 9:23 AM on July 25, 2012


Hey, that would be swell!
posted by griphus at 9:30 AM on July 25, 2012


Me too, benito!
posted by madcaptenor at 9:35 AM on July 25, 2012


Ha. Satisfying:
Words you use heavily that (practically) no one else does:

anony, signups, subsites, userid, munging, nixing, not-great, nomic, munged, asker's, answerers, bugbread, meficomp, followups, per-user, hiya, undeleted, dayjob, sidebarring, datawankery, askers, answerer, fight-starting, metadiscussion, bright-line, rothko, munge, gnfti, jonson, lofi, this'd, usernumber, administratively, mefi-related, heya

Words you use much more than anyone else:

infodump, dios, deletions, asker, matt's, subsite, deletion, jess, nixed, flagging, mefimail, driveby, timeout, metacommentary, konolia, goddam, pb, self-links, chatfilter, metatalk, bumpy, toolset, in-thread, clearcut, callouts, chatty, x-com, dhoyt, referrer, jessamyn's, matteo, followup
So a ton of job-specific mefi jargon, a few user names, and two games I'm obsessive about. That feels about right, and actually feels more close to home than my recommended tags for My Mefi since that tends to be influenced a lot more heavily by the blue threads I have to tell people to simmer down in and so tends to capture incidental tag exposure rather than working vocabulary.

What's your metric for the two categories, benito?
posted by cortex (staff) at 9:52 AM on July 25, 2012


cortex: I feel like the kid who finally decides to be not scared of the roller coasters right as the park is closing, but if you're still running numbers, I'd appreciate being included. I didn't think I cared, but now I do (which might end up as my epitaph)
posted by MCMikeNamara at 9:52 AM on July 25, 2012 [2 favorites]


It turns out my list is:

c2, msc, datapoints, goodreads, grandmaster, pinboard, malawi, eq, hamming, univ, asker

I'd be interested to know how that was produced. I guess that is *very* unusual words I've used a few times. From my reading of my file I was also curious whether I use words like "because" and "probably" a lot more often than most.

I guess msc is the downcased version of MSc, though I'm surprised I mention that more than most. And I guess maybe other people don't abbreviate university to univ but something else, and call people who ask questions something other than "the asker".

And c2 and eq most likely come from just one answer to a question about probabilities where I answered with a bunch of cases (C1, C2, ...) and equations (Eq 1, Eq 2, ...) and discussed them in detail.

But you'd be right in concluding from the list that I know some stats, like books and chess, use pinboard a lot, etc.
posted by philipy at 9:58 AM on July 25, 2012


Thanks, benito!

Words you use heavily that (practically) no one else does:
bensonhurst, xiu, ows, dashiell, 01100101, doorman, mentallo, puggle, gerd, gcal, adderal, tovarisch, god-knows-what-else, newsradio, 2nding, gaiden, punk-rock, uscis, thwack, sf4, early-30s, kontroll, de-friend, lactaid, csar, llcs, tekken, pomade, tanf, slim-fit, phantasmagoric, dietitian, edm, 99c, trip-hop, jameco, hammett's, barcade, shmuck, late-20s, existenz, levittown, gammond

Words you use much more than anyone else:
i-9, cuny, ukranian, melatonin, bushwick, glamorama, cronenberg's, god-knows-what, onstad, quotidian
Something funny: I use the word "xiu" heavily, but that's only because I refer to the band "Xiu Xiu." I wonder how stuff like that affects the rest of the stats (both cortex's and benito's.)

Also, I am enjoying the fact that I use the words "tovarisch" and "shmuck" regularly enough for it to make a list of any sort.
posted by griphus at 10:08 AM on July 25, 2012 [1 favorite]


For the user words that didn't appear in the list of SiteWords, I cut it off at 10 PPM.

For words that appear in both, I report those with UserFreq/SiteFreq > 40.0, so you're at least 40 times more likely to use those words than the average MeFite.
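
A rough sketch of those two cutoffs, for the curious. This is not the actual script: the function name `cloud`, the dict-of-PPM inputs, the inclusive cutoff at 10 PPM, and the sort orders are all assumptions, and the sample numbers are invented.

```python
def cloud(user_ppm, site_ppm, min_ppm=10.0, min_ratio=40.0):
    """Split a user's words into the two lists described above.

    user_ppm and site_ppm map each word to its parts-per-million frequency.
    Returns (words_only_the_user_has, words_used_much_more_than_the_site).
    """
    # List 1: words absent from the site-wide table, kept at 10 PPM or more.
    # Sorting by the user's own PPM is an assumption about "importance".
    only_user = sorted(
        (w for w, f in user_ppm.items() if w not in site_ppm and f >= min_ppm),
        key=lambda w: user_ppm[w],
        reverse=True,
    )

    # List 2: words in both tables where UserFreq / SiteFreq exceeds 40.
    ratios = {
        w: f / site_ppm[w]
        for w, f in user_ppm.items()
        if w in site_ppm and site_ppm[w] > 0
    }
    much_more = sorted(
        (w for w, r in ratios.items() if r > min_ratio),
        key=lambda w: ratios[w],
        reverse=True,
    )
    return only_user, much_more


# Invented numbers, purely for illustration.
user = {"knotted": 120.0, "the": 48000.0, "grotendous": 15.0, "pleh": 4.0}
site = {"knotted": 2.0, "the": 50000.0}
only_user, much_more = cloud(user, site)
print(only_user, much_more)
```

With these made-up tables, "pleh" falls under the 10 PPM cutoff and "the" has a ratio just under 1, so neither is reported.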

It's pretty easy to run the script, so if anyone else would like their results, just post in here with "Cloud me". I'll memail you the results, but I hope you'll post it here.

BTW, if you're looking for the one word in there that is "your word", I think you should take the first word in the second list. The first list is kind of iffy, as there's no site data to compare it against. And the lists are ordered by decreasing importance.

Going by that standard, my word is "Knotted", which is fine, and strangely more appropriate the more I think about it.

/I really encourage jessamyn to report her word.
posted by benito.strauss at 10:11 AM on July 25, 2012


"Cloud me"?

Cloudmir?
posted by griphus at 10:16 AM on July 25, 2012


Cloud me, Benito!

If I had a nickel, yada yada...
posted by Rock Steady at 10:17 AM on July 25, 2012


benito--me too, please!

Also, I've apparently said stuff 268 times. And t-rex 11.
posted by phunniemee at 10:20 AM on July 25, 2012


Benito, could you please cloud me? Thanks!
posted by a snickering nuthatch at 10:25 AM on July 25, 2012


First, thank you, cortex!

Next, benito, benito, me, me!
posted by thinkpiece at 10:29 AM on July 25, 2012


God, Greg, you never do shut up about the time you got backstage at that Finntroll show in Concord, do you?
posted by griphus at 10:30 AM on July 25, 2012


Ooo I would like that cloud thing too please!
posted by shelleycat at 10:30 AM on July 25, 2012

Words you use heavily that (practically) no one else does:
slingbox, multifilter, easement, communique, bloodwork, renal, clabby, gostak, veterinarians, behaviorist, deskology, tplo, secretlife, headboard, broksonic, babymama, prosted, crockety, beston, euthanize

Words you use much more than anyone else:
veterinary, urinary, metafilters, vet, burlington, overdraft, acl, montpelier, fenway, espn's, wiimote, embroidery, tufts, asker, picasa, limping, urination, orthopedic, notary, listerine
That is eerie.
posted by Rock Steady at 10:30 AM on July 25, 2012


(I turned mefimail back on and everything)
posted by shelleycat at 10:31 AM on July 25, 2012


Well, what happens between you and Samu "Beast Dominator" Ruotsalainen stays in Samu "Beast Dominator" Ruotsalainen.
posted by griphus at 10:32 AM on July 25, 2012


Wait hang on that came out wrong.
posted by griphus at 10:32 AM on July 25, 2012 [1 favorite]


me too please
posted by dpx.mfx at 10:33 AM on July 25, 2012


I mean, cloud me! I didn't read the directions! I'm sorry!
posted by thinkpiece at 10:34 AM on July 25, 2012


Folks (esp. the young rope-rider), you have to ask cortex to generate the listing first before I can cloud you.
posted by benito.strauss at 10:35 AM on July 25, 2012


thinkpiece, your words are in your MeFi mail.
posted by benito.strauss at 10:37 AM on July 25, 2012


ME ME CLOUD ME

plz
posted by elizardbits at 10:39 AM on July 25, 2012


If it's helpful, I'm second from the bottom of the first batch Cortex ran.
posted by shelleycat at 10:40 AM on July 25, 2012


Oh, benito.strauss, I would be your best friend! I'm dying to know my unique frequency words :D
posted by Eyebrows McGee at 10:42 AM on July 25, 2012


shelleycat, it should be in your memail, with the very useful subject line of "Y" (should have been "Your words". Oops).
posted by benito.strauss at 10:42 AM on July 25, 2012 [1 favorite]


Thanks!
Words you use heavily that (practically) no one else does:
reddits, askreddit, hsv, luminance, partials, bombadil, f2, allele, photoreceptor, ludum, lbh, tms, hippocampus, mass-having, n-k, extraversion, addclass, eigenvector, eyedropper, unir

Words you use much more than anyone else:
subreddits, cyan, becky, rgb, var, centaur, nonpartisan, gur, hue, vg, recessive, ovulation, jfc, convolution, datapoints, drezdn, thalamus, n-1, associative, yoshi's
My sockpuppet will have to be "Bombadil's hippocampus" or "Yoshi's ovulation".

Also, uh, sorry about talking about reddit so much. I just wish reddit and metafilter could get along.
posted by a snickering nuthatch at 10:43 AM on July 25, 2012


All the cool kids are being clouded. Why not me?

(thanks)
posted by The Deej at 10:43 AM on July 25, 2012


Words you use heavily that (practically) no one else does:
obvsly, hungrily, grotendous, meniere's, sammiches, idek, tampax, daytrana, lactaid, hilarrible, vickers, sry, finlandia, blankie, ibiza, trufax, pleh, mishegoss, lapdancing, janek

Words you use much more than anyone else:
idk, prolly, lactose, migraines, poc, derp, seekrit, sammich, kitteh, bedazzled, googly, gramma, ghastly, etsy, rambly, westside, holloway, noms, idris, vexing


SCREAMY LOLS
posted by elizardbits at 10:43 AM on July 25, 2012 [3 favorites]




"The Corpus of Contemporary American English" does not contain the word 'grotendous'. I reject your corpus.
posted by benito.strauss at 10:47 AM on July 25, 2012 [1 favorite]


Holy crap it's like my whole life reduced to a couple of lists of words:
Words you use heavily that (practically) no one else does:
reflux, orthotics, ibd, podiatrist, anti-inflammatory, gerd, astigmatism, kiwifruit, polymorphism, nutrigenomics, ppis, cattery, hydrated, pharmac, rumen, merino, physiologist, ringworm, antihistamines, trademe

Words you use much more than anyone else:
ibs, dietician, optometrist, msc, auckland, sentance, anaesthetic, inflammation, nz, endnote, gait, ug, sinus, constipation, mandy, ferrous, practise, multivitamin, intestinal, hydration
(livestock, weird or icky health problems, some obscure biology, and my now-dead (of old age) cat)

Thank you so much benito.strauss!
posted by shelleycat at 10:51 AM on July 25, 2012


Sure! I'd love to quantify my internet existence.
posted by Turkey Glue at 10:51 AM on July 25, 2012


All the cool kids are being clouded. Why not me?

At last I am counted among the cool kids.

But why were my results in a different format than everyone else's huh?

*Frets that his juiciest data got lost*
posted by philipy at 10:51 AM on July 25, 2012


I also love that my distinctive 'word' is ibs. I still remember sitting in seventh form biology drawing a kidney and deciding I'm going to study intestines for ever.
posted by shelleycat at 10:53 AM on July 25, 2012 [1 favorite]


The COCA would be a good thing to do a sort of baseline comparison to the larger Mefi Corpus frequency tables themselves, to establish how site usage itself varies compared to a more general collection of English language usage in a variety of modes, yeah, as well as to identify site-specific jargon by looking for relatively common terms in local usage that don't appear in the more general corpus at all.

I played around with that briefly at one point, in fact, but didn't get very far into it and don't know what I did with the output.
posted by cortex (staff) at 10:56 AM on July 25, 2012


I would like the cortex thing and the benito thing please, thank you.
posted by stupidsexyFlanders at 10:58 AM on July 25, 2012


Thanks! This is awesome!

From my list, you can tell that I...

Have a ton of medical issues, enjoy crafting and cake and pie, live in Chicago and love Chinese food, and am weirdly into ear cleanliness. Sounds about right.

Words you use heavily that (practically) no one else does:
chiari, pilsen, malformation, n'thing, whatsit, stenciling, consular, hbc, jacquard, bookbag, pumice, cipro, sze, ricotta, carlsbad, gerd, chuan, neutrogena, 5'7, tomochichi

Words you use much more than anyone else:
nordstrom, phunniemee, tsp, fondant, q-tip, neurologist, rhubarb, lao, landlady, mspaint, prosciutto, ziplock, levi's, poopy, hiccups, savannah, zit, acetone, aldous, creamer

posted by phunniemee at 11:03 AM on July 25, 2012


Cloud me! Whee!
posted by purpleclover at 11:04 AM on July 25, 2012


"'The Corpus of Contemporary American English' does not contain the word 'grotendous'. I reject your corpus."

Heh. I just threw that up there for anyone who is interested and motivated.

In my case, I am both frustrated and relieved that I do not know enough about either linguistic corpus analysis or statistics to attempt to do what I'm feeling an almost unbearable itch to do. That means that I'll shrug the urge off in the next few minutes instead of spending the next fifteen hours doing stuff that would probably be reinventing the wheel. Leaving me time to, I dunno, read some other genre novel or sleep. I suppose that's a win.

But my curiosity is piqued because my corpus here is pretty large: 1.2M words as EB and 450K in this incarnation, for a total of 1.7 million words. That works out to be about 152 per comment when I was EB, and 255 words per comment now. My top twenty words are the same and almost in exactly the same order in both incarnations. And the word should is the 100th most common word in both.
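
Those per-comment figures are easy to check. A quick sketch, assuming the two accounts correspond to the rows for user 17454 and user 137535 in the batch posted above (that mapping is inferred from the word totals, not stated anywhere):

```python
# (words, comments) copied from the batch listing; which row belongs to
# which account is an assumption based on the matching word counts.
accounts = {
    "earlier account": (1_242_525, 8_167),
    "current account": (456_338, 1_789),
}

# Average words per comment for each account.
words_per_comment = {
    name: words / comments for name, (words, comments) in accounts.items()
}

# Combined corpus size across both accounts.
total_words = sum(words for words, _ in accounts.values())

for name, average in words_per_comment.items():
    print(f"{name}: {average:.0f} words per comment")
print(f"combined: {total_words} words")
```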

The analysis I'd be interested in (not that benito.strauss's isn't also quite interesting) would ignore the true rarities and instead look, among the widely used nouns and verbs, for those which are used abnormally frequently (or rarely). Yeah, yeah...that sort of analysis is pretty much tempting one to do what I was warning against earlier (and a warning a certified professional linguist underscored). But, you know, I'm human and fallible and more curious than is good for me.
posted by Ivan Fyodorovich at 11:09 AM on July 25, 2012


Here's some fun poetry from my list (each word used 20 times):

Cat couldn't deal exactly
friend happy
issue
looks mental

Add your own punctuation and line breaks for additional interpretations!
posted by MultiFaceted at 11:17 AM on July 25, 2012


Cloud me too!
posted by MultiFaceted at 11:21 AM on July 25, 2012


I can has cloud too? Please and thank you.
posted by usonian at 11:25 AM on July 25, 2012


If it's not too late, quantify and cloud me, please.
posted by peppermind at 11:52 AM on July 25, 2012


Benito, if you don't mind....

(thanks!)
posted by troika at 11:53 AM on July 25, 2012


Hmmm....apparently I like medical sounding words.

Words you use heavily that (practically) no one else does:
pcos, bcbs, eap, bloodwork, scoliosis, i-55, jeweler, orthopedist, banquets, mucinex, bcp, girardeau, workup, trazodone, msw, insurances, i-40, clomid, gallbladder, iupui

More common words you use much more than anyone else:
uhc, cysts, endocrinologist, ianad, inpatient, wellbutrin, consignment, uti, mediator, counselor, pcp, meds, outpatient, therapist, taper, hospitalization, counseling, antidepressant, pharmacist, adjuster

posted by MultiFaceted at 11:54 AM on July 25, 2012


Please, please, please, let me get on that cloud -- thanks!
posted by maudlin at 12:00 PM on July 25, 2012


Oh, me too, please benito!
posted by likeso at 12:05 PM on July 25, 2012


me too!
posted by Corduroy at 12:11 PM on July 25, 2012


I talk a lot about schools and toddlers/babies, no surprise there.

But I am delighted that the words "bookmobile" and "dramarama" turned up on my "words you use heavily that no one else does." Also "dressy." It's weird that none of you are using "dressy," that's a totally normal word.
posted by Eyebrows McGee at 12:21 PM on July 25, 2012 [1 favorite]


Also "dressy." It's weird that none of you are using "dressy," that's a totally normal word.

I'm just ashamed at the rest of you for underusing prosciutto.
posted by phunniemee at 12:26 PM on July 25, 2012


One of my most common words (that isn't found in the site file) is "egt" - I appear to mistype "get" a lot (13 times!).
posted by antifuse at 12:30 PM on July 25, 2012


Cloud me! (Unless you're completely snowed under with requests, of course!)
posted by Dysk at 12:37 PM on July 25, 2012


Dramarama is a great word, one I'm eager to use. Er, I just don't know what it means.
posted by Ivan Fyodorovich at 12:38 PM on July 25, 2012


The latest batch, burnmp3s through Corduroy. Tables.
user 63307:	618305 words,	24581 unique, in	4826 comments.
user 24139:	295692 words,	17780 unique, in	3129 comments.
user 111601:	361242 words,	16903 unique, in	4703 comments.
user 21945:	131353 words,	9679 unique, in	1550 comments.
user 92600:	20225 words,	4209 unique, in	212 comments.
user 15461:	173483 words,	17089 unique, in	3766 comments.
user 125051:	6435 words,	2047 unique, in	139 comments.
user 64727:	738 words,	377 unique, in	11 comments.
posted by cortex (staff) at 12:38 PM on July 25, 2012


A dramarama is a festival of drama, a drama nexus. A system defined by its high drama quotient.

One who precipitates a dramarama is generally a drama rocket.
posted by cortex (staff) at 12:39 PM on July 25, 2012 [1 favorite]


Variations on "drama".
posted by cortex (staff) at 12:46 PM on July 25, 2012


Okay, I am curious what words I use more than other people do.
posted by rmd1023 at 12:48 PM on July 25, 2012


cortex: "Variations on "drama"."

My Great-Uncle was killed in a dramamine cave-in.
posted by Rock Steady at 12:51 PM on July 25, 2012 [1 favorite]


cortex: The userid for corduroy looks to me like '64724' not '64727'.
posted by benito.strauss at 1:01 PM on July 25, 2012


Is it too late to ask benito.strauss to cloud me? Pretty please?
posted by ambrosia at 1:06 PM on July 25, 2012


I like clouds if that's a thing that's getting done.
posted by shakespeherian at 1:14 PM on July 25, 2012


Words you use heavily that (practically) no one else does:
fetlife, circlet, arginine, lysine, cryp, passwordsafe, clubman, ovary, penninsula, instructables, delica, romex, harrah's, mbta, rmd, badassed, skete, noticable, fnx, boxcutter

More common words you use much more than anyone else:
somerville, hysterectomy, electrician, crochet, ovarian, cysts, griswold, pyro, cpap, live-in, crocheted, optionally, ovaries, electricians, dorchester, sailboat, bidet, lanyard, softener, 70mm


LET ME TELL YOU ABOUT MY (now-removed) LADY PARTS!
posted by rmd1023 at 1:21 PM on July 25, 2012 [3 favorites]


The userid for corduroy looks to me like '64724' not '64727'.

Fixed! That did seem awful quiet.
user 64724:	37890 words,	4856 unique, in	1173 comments.
posted by cortex (staff) at 1:21 PM on July 25, 2012


Thank you, benito!
And that was completely weird. With few exceptions, all the words I used heavily that practically no-one else did were usernames. But first on the list was my own username. Also high in use was its possessive. Both had me stumped because I don't think I ever have... until I realized the "Mr." had been stripped. So that's okay then.

Also, yay, tanuki, Islay and Laphroaig!
posted by likeso at 1:22 PM on July 25, 2012


Cloud me too please
posted by burnmp3s at 1:28 PM on July 25, 2012


I always knew I was fond of creating hyphenated compounds to try to get my point across online, but my list really shows it might be more of a problem than I realized.

But please cloud me if you can so I can make sure.

(Thanks for this -- both those of you who are doing the work and those of you who are getting as geeked about it as I am.)
posted by MCMikeNamara at 1:31 PM on July 25, 2012


Cloud me too please, if it's not too much trouble and doesn't get in the way of eating, sleeping and generally living your life.
posted by arcticseal at 1:39 PM on July 25, 2012


Am I too late for the benito treatment?
posted by Devils Rancher at 1:40 PM on July 25, 2012


I'd like that thing too, if possible, please.
posted by box at 1:43 PM on July 25, 2012


I don't even know what this cloud thing is but I am but a sheeple so me too, please! and please god I would like it if work would stop freaking out long enough for me to float in the cloud for a little while.
posted by rtha at 1:47 PM on July 25, 2012


cloud powers activate! Thanks, benito.
posted by zamboni at 1:49 PM on July 25, 2012


If this isn't over... sign me up!
posted by owtytrof at 2:02 PM on July 25, 2012


Benito, since you seem to be a fount of generous wisdom, I would like to find out exactly how pedestrian I am.
posted by maxwelton at 2:04 PM on July 25, 2012


I too would like to live in a cloud.
posted by languagehat at 2:06 PM on July 25, 2012


My benito list is hysterical, thank you benito.strauss!

Words you use heavily that (practically) no one else does:
equivelant, sammiges, safir, jklmnop, gotland, neger, scandyland, editplus, drawdio, adress, sammige, ballymun, alt-1, jnkping, absolutly, jellyish, tissot, hurra, mbits, barrys, siobhan, b1, semi-fitted, och, ovary, wallander, zzzz, ane, dealextreme, brun, webmonkey, bests, swedex, svenska, lyons, svarting, sugru, chrono

Words you use much more than anyone else:
somone, cloves, ftr, ofredande, starbuck, duvet, divx, sexuellt, stepmom, dcu, icecream, a1, worksheet, exif, det, asa, sapphire, scuse, mats, mu, apparantly, stockholm, knickers, swedes, xxxx, kan, slimmer, keystroke, eachother, thirding, auntie, cv, noone, semi-regular, swede, bobs, mcd, youse, cos, aspx, alba, whiteboard, sweden, teaspoon, handball, prolly, swedish, mame, swimmers, fab, darlingbri, scandinavia, womans, petrol, sakes, omigod, battlestar, dublin


Point the first, I can't spell for shit, but I am consistent with it!
My special subjects are: Battlestar Galactica, knickers, brands of Irish tea, classic windows uttilities and racist Swedish slang.
posted by Iteki at 2:11 PM on July 25, 2012 [2 favorites]


If it's not too late, please could I also be quantified and clouded? Thank you, cortex and benito.strauss!
posted by daisyk at 2:13 PM on July 25, 2012


Will pay $$$ to see Battlestar Dublin.
posted by griphus at 2:19 PM on July 25, 2012 [1 favorite]


I would love this too please, benito, if you would be so kind! Much gratitude.
posted by grouse at 2:22 PM on July 25, 2012


Hey, I'm down for both as well! Serves me right for ignoring Talk for days...
posted by yellowbinder at 2:27 PM on July 25, 2012


Thanks Cortex!
posted by b33j at 3:10 PM on July 25, 2012


benito, cloud me bro
posted by danny the boy at 3:11 PM on July 25, 2012


benito.strauss? please cloud me?
posted by b33j at 3:13 PM on July 25, 2012


Could I be clouded as well?
posted by Ghidorah at 3:15 PM on July 25, 2012


me too benito please
posted by sweetkid at 3:22 PM on July 25, 2012


Can I get clouded?
posted by codacorolla at 3:32 PM on July 25, 2012


Ooh, benito, can I have the cloudiness too? This is awesome.
posted by argonauta at 3:37 PM on July 25, 2012


Yes, benito, I must also be compared to others.
posted by fantabulous timewaster at 4:21 PM on July 25, 2012


Oh dear. "I" is #2 on my list. "Fuck" on its own is at 26 although there are many other uses of it. Swearing is fun!
posted by deborah at 4:22 PM on July 25, 2012


Yes please with jam on.
posted by Jofus at 4:35 PM on July 25, 2012


Take me up to your cloud, benito, if you please!
posted by Kattullus at 4:49 PM on July 25, 2012


Earlier I did my math wrong and have spent the rest of the afternoon thinking my comments averaged 950+ words per. Watch your decimals, folks.
posted by MCMikeNamara at 4:53 PM on July 25, 2012
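

For anyone repeating that arithmetic, here is a minimal sketch of the words-per-comment calculation. The figures are copied from the tables posted in this thread; the names `stats` and `words_per_comment` are made up for illustration and are not part of anyone's actual scripts.

```python
# Figures copied from the tables posted in this thread:
# user id -> (total words, unique words, comments)
stats = {
    63307: (618305, 24581, 4826),
    64724: (37890, 4856, 1173),
    64727: (738, 377, 11),
}


def words_per_comment(total_words: int, comments: int) -> float:
    """Mean comment length in words (hypothetical helper)."""
    return total_words / comments


# Average for each user; the unique-word count is not needed here.
averages = {
    uid: words_per_comment(total, comments)
    for uid, (total, _unique, comments) in stats.items()
}

for uid, avg in averages.items():
    print(f"user {uid}: {avg:.1f} words per comment")
```

Dividing total words by comments, and not the other way round, is the step where a decimal tends to slip.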

Words you use heavily that (practically) no one else does:
prospero, doodle-do, ba-dum, mcnamara, hermeneutic, caliban, teeshirt, pequod's, yeager, thready, tideland, pedway, baziotes, murch, yoink, antoninus, loughner, metaphysic, serrano's, gilliam's

More common words you use much more than anyone else:
beale, omigod, kathrineg, kronos, lydia, ladin, eraserhead, muh, metafictional, koopa, teevee, shakespeherian's, x-rated, nc-17, camilla, pemulis, zimmerman, obviates, mulholland, caruso
If I posted this anonymously everyone would guess it was me anyway.
posted by shakespeherian at 5:01 PM on July 25, 2012 [2 favorites]


Please generate my file, if you don't mind, cortex.
posted by box at 5:10 PM on July 25, 2012


oooh, thanks benito!

Words you use heavily that (practically) no one else does:
cairns, veges, mossman, h00py, hygene, qut, vege, uq, b33j, centrelink, daintree, cindarella, quitters, qld, cqu, townsville, socialise, wetcanvas, bushwalking, ex-smokers

More common words you use much more than anyone else:
queensland, ipswich, mccaffrey, endnote, committment, counselling, lecturers, pseudoephedrine, brisbane, whereever, mindfulness, apologised, a1, sympathise, counsellors, icecream, askme's, tinned, uni, hairdresser

I'm kinda scared now. Icecream, really? I bet some of this is because of my non-US spelling. Also, hello h00py, it appears I'm stalking you.
posted by b33j at 5:12 PM on July 25, 2012 [2 favorites]


benito.strauss is a prince among men:
Words you use heavily that (practically) no one else does:
allele, cygwin, workrave, bollards, germline, mpb, indent, refundable, seq, belltown, kuro5hin, win32, bhrt, vnc, priceline, mphil, methylation, fdic-insured, kinesis, paddington

More common words you use much more than anyone else:
flyertalk, gbp, komen, utc, merriam-webster, grouse, thinkpad, issuer, thinkpads, endnote, alleles, dict, rei, airline's, hsbc, eviction, fremont, rsi, str, eur
posted by grouse at 5:14 PM on July 25, 2012 [1 favorite]


I hug you, benito.strauss!
Words you use heavily that (practically) no one else does:
birding, yorvit, binos, metas, menlo, headlands, peregrines, crissy, gay-married, starlings, frjtz, unvaccinated, redtails, birders, annulled, distilleries, couple-three, reacher, redtail, recs

More common words you use much more than anyone else:
gingerbeer, askmes, peregrine, pdx, sfo, islay, full-fat, potrero, raptors, francisco's, haight, schmoopy, rei, separatists, ianad, wf, relationshipfilter, mefimail, dartmouth, meetups
It's a shame that more people don't use the word "distilleries." The world would likely be a happier place.
posted by rtha at 5:16 PM on July 25, 2012 [6 favorites]


Sure.
posted by jadepearl at 5:20 PM on July 25, 2012


Hey cloudy folks. I stepped out for dinner (at a restaurant that this comment directed me to). I've gone through and done all the people who have asked and who had cortex's files available.

If I didn't send you email, get cortex to make your file, and then post in here again. I'll come back and look for requests starting after this comment of mine.

And may I say, having seen deep into the secret hearts of many of you, and having visited more profile pages in one day than in the past year (as part of sending the memail), y'all are deliciously weird and a fine looking group of people.

P.S. And thanks, cortex, for creating the extracts.
posted by benito.strauss at 5:22 PM on July 25, 2012 [6 favorites]


Oh, please, benito, please, I want you to cloud me.
posted by gingerest at 5:27 PM on July 25, 2012


I suspected that my cloud would out me as a physics nerd:
Words you use heavily that (practically) no one else does:
1023, u-233, antisymmetric, symmetries, nucleon, cesium, quantum-mechanical, nucleons, tritiated, lorentz, myr, polarizations, wavefunction, microstates, invariant, alphas, localhost, antiprotons, mev, unfalsifiable

More common words you use much more than anyone else:
nucleus, neutrons, nuclei, protons, muons, proton, neutron, c2, electrons, muon, decays, polarized, cambrian, long-lived, electron, angular, 033, parabola, lhc, fermions
CONFIRMED
posted by fantabulous timewaster at 5:29 PM on July 25, 2012 [3 favorites]


Heh, I am outed as an epidemiologist who likes acronyms and apostrophes.

Words you use heavily that (practically) no one else does:
epidemiologist, hcg, iany, hogweed, tapeworms, labials, dermatitis, allergist, spicata, executor, dvt, numinous, ramon, estate's, academy's, fmr, braiding, cdc's, hepatic, toileting

More common words you use much more than anyone else:
iana, iu, two-bedroom, hallowe'en, earplugs, ianad, leprosy, pillowcases, cadaver, adelaide, louse, ovulation, incision, anniversaries, influenza, pe, ovarian, cadavers, rabies, long-ago
posted by gingerest at 5:40 PM on July 25, 2012 [2 favorites]



Words you use heavily that (practically) no one else does:
rockridge, schuylkill, philadelphian, m-1, bendy-straw, divisible, one-bedroom, boltbus, phl, 46th, mathoverflow, locust, grothendieck, nomyte, robot-making, firstname, njt, stackexchange, 701735, stterlin, escabeche's, one-fourth, wharton, paulos, i-10, megabus, cinnaholic, chocolate-covered, swedesboro, probabilist, ratemyprofessors, cooperstown, 108th, doylestown, fiord, e0, dianda's, mlbaway, poisson, ashby, 551, sriracha, 01t, droit, m-by-n, mlbhome, ablow, binghamton, mhum, westbound, tetrahedra, yb, carrollton, zoneinfo, embarcadero, arctan, 2-part, apportionment, syllabi, dfriedman, 00875, subdivided, utc-11, grise, mollymayhem, gentrifiers, sublets, gowers, diagonal-to-side, mostlybeans, panhandler, bustelo, cannoli, chalkboards, spf, bite-size, polyhedra

Words you use much more than anyone else:
combinatorics, utc, sqrt, septa, madcaptenor, n-1, 1s, ithaca, bwi, blaise, combinatorial, 45th, madcap, phillies, timezone, trenton, housemates, escabeche, quadratic, philadelphia, lastname, 229, transpose, cloves, 395, 0s, beloit, 2n, gpas, turnpike, parkway, trolleys, fielder, english-speakers, klout, 34th, eastbound, kippur, binomial, fellowships, out-of-state, chipotle, quarts, markov, posterior, ln, palindromes, cheesesteaks, oxen, 30th, kilogram, mathematician, maryr, algebraic, newark, matrices, yom, benito, gerryblog, e-mails, tutors, centimeters, i-95, desi, one-year, philly, departmental, chalkboard, payrolls, east-west, undergrads, berkeley, grande, amtrak, euclid, provolone, circumference, coefficient, northampton, non-academic, cross-country, uc


I think it's interesting that my lists are longer than other people's who have posted here (unless you guys are editing). Does that mean I'm somehow weirder? For some of these I actually remember the time that I posted about them. And this pretty much outs me as an academic mathematician who has lived in Philadelphia and the Bay Area. In particular it's kind of weird that random numbers show up.
posted by madcaptenor at 5:44 PM on July 25, 2012 [1 favorite]


I didn't edit mine.
posted by shakespeherian at 5:57 PM on July 25, 2012


me, please, benito.strauss.
posted by juv3nal at 6:00 PM on July 25, 2012


Words you use heavily that (practically) no one else does:
withnail, orig, psot, chalkhills, poshlost, karst, whilest, caver, dery, catarrh, poisonwood, blackfoot, aquismon, celica, prefs, vreeland, posas, moulding, mexi-jazz, cobbler

More common words you use much more than anyone else:
basses, s'pose, cavers, xtc, fsck, fretless, long-run, rancher, caving, porcupine, lobos, flameout, monterrey, tree's, longhorns, g3, g5, single-user, austin's, goddam


Pretty much sums it up. Music loving, apple-using, book-reading, outdoorsy counter-culture-consuming Austinite. Unique and special, just like the other million of us.
posted by Devils Rancher at 6:02 PM on July 25, 2012


I would love to be part of this party please...
posted by schyler523 at 6:08 PM on July 25, 2012


madcaptenor: I experimented a bit with different ways of choosing how many words to take from the sorted list. Yours was generated before I put in a hard cut-off of at most 20.
posted by benito.strauss at 6:30 PM on July 25, 2012


benito.strauss, thanks for my list. I, uh, uh....

Words you use heavily that (practically) no one else does:
ueno, narita, asakusa, harajuku, ubud, kamakura, tanuki, izakaya, saute, tepco, fukushima, wuhan, efl, kuta, phuket, nikko, ameyoko, kanto, bugbread, miso

More common words you use much more than anyone else:
chiba, shinjuku, nhk, oregano, minced, snorkeling, bali, blackouts, waikiki, loin, charcuterie, kanji, skewers, lemongrass, thyme, ianad, asahi, cumin, gaijin, balinese


So, Japan, cooking, Asia. I feel pretty one note at this point.
posted by Ghidorah at 6:38 PM on July 25, 2012


I'm really bummed that quincunx didn't make either list. I've tried so hard.
posted by Devils Rancher at 6:50 PM on July 25, 2012


Thanks benito.strauss, it was a glimpse into my self-centered self!
posted by arcticseal at 6:52 PM on July 25, 2012


548.
Words you use heavily that (practically) no one else does:
talulah, icelander, lightly-soiled, bolao [Bolaño], silencio, triple-zero, mulp, codenamed, clzio [Clézio], arneson, alekhine, tulinius, hossein, capablanca, kri [Kári], lunit, breivik, mller [Müller], salmacis, catullus

More common words you use much more than anyone else:
lederhosen, reykjavk [Reykjavík], kickball, icelandic, icelanders, reykjavik, tripoli, lobster, likelier, blogpost, sagas, merriam-webster, roberto, intertubes, byzantium, bugle, earthsea, anglophone, meet-up, iceland
posted by Kattullus at 7:27 PM on July 25, 2012 [1 favorite]


benito, could I trouble you? :)
posted by zarq at 7:28 PM on July 25, 2012


This looks like fun. May I have my weird-words list?
posted by cmyk at 7:37 PM on July 25, 2012

Words you use heavily that (practically) no one else does:
mml, tipi, dixieland, edc, vishanti, wahoo, kiowa, msl, roo, breivik, lcc, thiokol, marionberry, stick-head, 2cv, ub, docteur, hotdish, canth, tineye

More common words you use much more than anyone else:
zamboni, carbonation, meteorites, ithaca, beetroot, mer, barnum, osorio, grog, pasties, whoopie, lutefisk, armoured, cloves, leatherman, three-toed, chiro, gak, impassable, spelt
posted by zamboni at 8:19 PM on July 25, 2012


lightly-soiled?
posted by Devils Rancher at 8:26 PM on July 25, 2012


Words you use heavily that (practically) no one else does:
1e4c5, bd7, 3bb5, 2nc3d6, 4nge2bxb5, 5nxb5qd7, 6nbc3nc6, 7d3e6, 9d4bxg5, 8bg5be7, http402, 10dxc50-0-0, 11h4bf6, 12cxd6qxd6, djahandarie, 13qxd6rxd6, 14h5nge7, 15h6g6, 16rd1rxd1, 17kxd1rd8

More common words you use much more than anyone else:
lucasarts, greenaway, juv3nal, maxx, tri, kieth, mana, win2k, takeshi, tpb, psn, barthes, ruiz, autechre, confield, cfm, tableau, stylesheet, seekrit, cheezy


LOL. my epic chess match with fladablet appears to be dominating the count. also I am inordinately proud of barthes & seekrit being on the more common list.
posted by juv3nal at 8:28 PM on July 25, 2012 [1 favorite]


Words you use heavily that (practically) no one else does:
trurl, haggadah, fyodorovich, restless_nomad, paterno, bnd, siri, pcos, comfrey, memails, sopa, conlin, ret, bris, perinatologist, amniotic, judeophobia, villanelles, reflux, sandusky

More common words you use much more than anyone else:
ovarian, abbas, ha'aretz, iui, antisemitic, squabble, non-jews, rope-rider, hoder, circumcisions, pb, antisemitism, non-orthodox, poet_lariat


The Greasemonkey quote script tilts the answers. Without user names, the words become:

Words you use heavily that (practically) no one else does:
haggadah, paterno, bnd, siri, pcos, comfrey, memails, sopa, conlin, ret, bris, perinatologist, amniotic, judeophobia, reflux, sandusky

More common words you use much more than anyone else:
ovarian, abbas, ha'aretz, iui, antisemitic, non-jews, circumcisions, antisemitism, non-orthodox


These are really not the words I would have expected. Also, I seem to talk a lot about Jews. :D
posted by zarq at 9:28 PM on July 25, 2012 [1 favorite]
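

The username filtering described above could be sketched roughly as follows. This is only an illustration: the set of names is inferred from the difference between the two lists in the comment, and `drop_usernames` is a hypothetical helper, not the script that was actually used.

```python
def drop_usernames(words, usernames):
    """Return the words that are not known usernames (case-insensitive)."""
    known = {name.lower() for name in usernames}
    return [w for w in words if w.lower() not in known]


# First list from the comment above, before filtering.
heavy = [
    "trurl", "haggadah", "fyodorovich", "restless_nomad", "paterno", "bnd",
    "siri", "pcos", "comfrey", "memails", "sopa", "conlin", "ret", "bris",
    "perinatologist", "amniotic", "judeophobia", "villanelles", "reflux",
    "sandusky",
]

# Assumption: these are the entries treated as usernames, judging only
# from which words disappear in the second version of the list.
assumed_usernames = {"trurl", "fyodorovich", "restless_nomad", "villanelles"}

filtered = drop_usernames(heavy, assumed_usernames)
print(", ".join(filtered))
```

A real version would need the site's full list of usernames, which this sketch does not have.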


Benito! Me please!
posted by pompomtom at 9:33 PM on July 25, 2012


Words you use heavily that (practically) no one else does:
sweetkid, brosh, xarnop, miko's, benzos, therapist's, klonopin, appt, journaling, myrick, trolly, seattle_surfer_geek, hotlines, bops, meowing, frizz, allergist, goren, smash's, villanelles

More common words you use much more than anyone else:
therapist, allie, womp, hotline, dumbo, shingles, therapists, twop, josephine, bombay, palomar, hawke, chianti, amir, irene, copywriter, extroverts, montezuma, quidnunc, therapy



posted by sweetkid at 10:15 PM on July 25, 2012


Cloud me!
posted by pracowity at 10:24 PM on July 25, 2012


Cortex, please make me a list?
posted by Night_owl at 10:45 PM on July 25, 2012


Cortex, can I have one too?
posted by cmyk at 10:56 PM on July 25, 2012


Man, that what-words-identify-you thing looks fun! Benito, in your copious free time, would you do one for me?
posted by KathrynT at 11:17 PM on July 25, 2012


Would you cloud me, please?
posted by the latin mouse at 1:47 AM on July 26, 2012


We could have had a bunch o' fun having benito.strauss just post the lists and letting us guess who it was. I smell a mefi-minigame in the making.
posted by Iteki at 1:54 AM on July 26, 2012 [1 favorite]


I picked a fine day to sleep in. Count me in on the "want want want!"
posted by Saydur at 2:21 AM on July 26, 2012


Report for user 84634

Words you use heavily that (practically) no one else does:
screamo, avaliable, dimebag, w-9, janteloven, xr, antonym, crayz, effexor, aberfeldy, paracetamol, vicars, mclusky, xubuntu, venlafaxine, reverie, 40aa, webern, runkelfinker, breivik

More common words you use much more than anyone else:
pantera, intensively, socialised, tak, moslem, cloves, pronoiac, industrially, diablevert, i'd've, childrearing, pop-ups, amusedetachment, tranny, marisa, jaduncan, prefixes, coexistence, cis, esa


I thought "I'd've" would be further up the list, really, but it seems more people use it than I'd've thought...
posted by Dysk at 3:33 AM on July 26, 2012


benito.strauss, please cloud me. Thank you! I appreciate it!
posted by MonkeyToes at 4:06 AM on July 26, 2012


"Cloud me" is beginning to sound like a euphemism.

Maybe there should be T-shirts: "I got clouded by Benito".
posted by philipy at 6:41 AM on July 26, 2012 [1 favorite]


I LIKE CLOUDY CLOUDS AS IF IT WERE MY PROFESSION
posted by shakespeherian at 6:49 AM on July 26, 2012 [2 favorites]


Just to be clear - my previous request was to cortex. Quantify me!

... please?
posted by owtytrof at 7:13 AM on July 26, 2012


"I paid $5 (SAIT) and all I got was clouded by benito.strauss"
posted by rtha at 7:16 AM on July 26, 2012


I think cortex should write a song: "Quantify me, baby!"

You know you want to.
posted by philipy at 7:24 AM on July 26, 2012


I am not a number, I am a man! A man made out of numbers.
posted by Devils Rancher at 8:05 AM on July 26, 2012


Alright, here we go, from WCityMike on down. Get your tables here.
user 22023:	356666 words,	25113 unique, in	4440 comments.
user 64035:	58102 words,	8189 unique, in	1316 comments.
user 72963:	8344 words,	2101 unique, in	103 comments.
user 31444:	59171 words,	7823 unique, in	1056 comments.
user 1574:	61616 words,	9226 unique, in	2109 comments.
user 17573:	290452 words,	25149 unique, in	10154 comments.
user 14594:	105039 words,	11803 unique, in	1267 comments.
user 23037:	67294 words,	10005 unique, in	1920 comments.
user 18064:	587672 words,	28208 unique, in	11534 comments.
user 21202:	72180 words,	9560 unique, in	1005 comments.
user 84093:	54695 words,	7428 unique, in	1169 comments.
user 18726:	113506 words,	11286 unique, in	864 comments.
posted by cortex (staff) at 8:06 AM on July 26, 2012 [1 favorite]


All this treating my Perl scripting as if it were a smutty sexual act?

I really, really like it.
posted by benito.strauss at 8:47 AM on July 26, 2012 [9 favorites]


Yay, thanks cortex! benito.strauss, could you cloud me now please?

This is my first time writing 'fuck' on MeFi in any of its forms. I'm such a goody two shoes.
posted by daisyk at 8:49 AM on July 26, 2012


Words you use heavily that (practically) no one else does:
bhin, clomp, runnable, hap, gdansk, rial, pars, rasp, heil, malodourous, ail, 38000, sdb, tigress, hasp, haps, zeldman, nils, 24000, 48000

More common words you use much more than anyone else:
stag, blam, lain, nantucket, lira, 54000, 31000, trams, 22000, rani, 33000, 29000, matriarchal, springtime, 51000, doggies, 49000, 69000, 34000, 18000

Thanks, benito.strauss.
posted by pracowity at 9:21 AM on July 26, 2012


The only words I use heavily that (practically) no one else does that weren't proper nouns or Metafilter handles were:

wineskin, homesteading, cockring


On a related note, I think everybody should copy their clouds into their profiles because I think it's pretty telling whether you want it to be or not.
posted by MCMikeNamara at 9:46 AM on July 26, 2012 [4 favorites]


I think everybody should copy their clouds into their profiles because I think it's pretty telling whether you want it to be or not.

Good idea. Done!
posted by grouse at 9:51 AM on July 26, 2012


Okay, sure, why not.
posted by griphus at 9:54 AM on July 26, 2012


Joy! benito, please cloud me. Cloud me good.
posted by cmyk at 10:12 AM on July 26, 2012


I as well am ready for my clouding. Thanks muchly in advance!
posted by yellowbinder at 10:16 AM on July 26, 2012


benito is fast! Here's what I've got.

Words you use heavily that (practically) no one else does:
cujo, nikkormat, tpd, cowbells, palmetto, cannoli, ftn, mina, anole, hiaasen, ybor, 120mm, instamatics, spousing, dramamine, zednik, floridian, tesseract, wossits, wharg

More common words you use much more than anyone else:
cmyk, estonian, basset, tampa, garp, clang, kibble, roscoe, pup, komen, kimono, spasms, cantankerous, crate, pepe, full-grown, parrots, gators, winamp, great-aunt

So... dogs, cameras, Florida locales and animals, my crazy immigrant family, onomatopoeia, and my god-forsaken killer cat.

Sounds about right.
posted by cmyk at 10:25 AM on July 26, 2012


Super fast! Fuck yeah Culdcept!

Words you use heavily that (practically) no one else does:
addable, sleater-kinney, culdcept, nuit, skyrim, digitizer, gangbang, showrunners, rb1, zuneboards, dufferin, toonie, bejewelled, saint's, timeport, bev, hmv, microprose, streamclip, damacy

More common words you use much more than anyone else:
gta4, paulie, 366, pinback, winamp, worf, zune, phair, shingles, podcasting, eglinton, hodgman, slumber, katamari, dv, snipe, sheryl, shithead, andreas, breakup
posted by yellowbinder at 10:33 AM on July 26, 2012


Benito are you still clouding? Can you do me? Thanks.
posted by Wretch729 at 10:37 AM on July 26, 2012


Words you use heavily that (practically) no one else does:
migs, jonson, hama7, amberglow, rushmc, y2karl's, scarabic, bugbread, pushkin, quonsar's, da-da, merriam-webster's, userpage, troutfishing, ikkyu2, stuffthanks, zaelic, miguel's, vidiot, plep

More common words you use much more than anyone else:
matteo, lh, miguel, eb, quonsar, dios, konolia, y2karl, bolsheviks, stav, merriam-webster, webster's, languagehat, languagehat's, davy, astoria, stavros, matt's, indo-european, georgian


Sad to see all those historic MeFites nobody talks about any more, but what really gets me is "stuffthanks." I searched my activity and got "Sorry, no matches for stuffthanks by languagehat." Weird!
posted by languagehat at 10:45 AM on July 26, 2012 [1 favorite]


Might be a deleted comment, I think cortex said that the report pulls deleted comments, but as far as I know those won't show up in a site search.
posted by shakespeherian at 10:46 AM on July 26, 2012


What my identifying words say about me…

I’m British.
Favouriting (not favoriting)
Postcode (not zip code)
Woolly (not wooly)
Midlands (places I have lived)
Chester (places I have lived part two: electric boogaloo)

I like to obfuscate my words sometimes*
Gur (ROT13 for ‘the’)
Purrfr (ROT13 for ‘cheese’)
6f (I had to Google this one, but it was from an exceptionally silly alphabet thread where we all started posting in Hex and Octal and stuff)

I like to talk about working in theatre
ANLO (this was a discount scheme for young UK theatregoers)
IRCs (I ranted at length about how annoying these were when I managed a slush pile. It stands for International Reply Coupons. If you are mailing scripts internationally don't use these!)
SASEs (different acronym from the same rant. Self-Addressed Stamped Envelopes.)
CRB (another acronym, this time referring to the Criminal Records Background check. Probably delivered as part of a longer rant about restrictions on child performers and the daft UK child licensing laws)
Punchdrunk (I <3 Punchdrunk Theatre Company)
Playwrights and Playhouse (self explanatory)

I like to comment in threads about social equality
People-first (I’ve had a few conversations on MeFi about the differences between politically correct terminology in the US and UK. E.g. Whether you should say “People with disabilities” or “Disabled people”)
MySociety (MySociety is a collection of UK websites mostly having to do with open democracy and social inclusion)
Transpeople (I suspect all the other people in the threads about trans issues are hyphenating this)
Alpha-male (I suspect none of the other people in the threads about gender issues are hyphenating this)
Remploy (Remploy was set up to provide employment to those disabled people who would otherwise have been unable to find work and is now being shut down in a Tory boondoggle. Grr.)

I recommend a lot of books
Radley (as in Boo Radley from To Kill A Mockingbird)
Matilda (as in Matilda Wormwood from Matilda)
Dahl (as in Roald Dahl, the author of Matilda)
Saki (the pen name of H.H Munro, the author of many wickedly dark and funny Edwardian short stories)
Fanthorpe (as in the poet U A Fanthorpe)
Maclary (as in Hairy Maclary From Donaldson’s Dairy)

I spend a lot of time on AskMe
Snooze (an AskMe about people who abuse the snooze button was what made me pony up the $5 for an account here!)
BDoF (Acronym for one of the participants in a Human Relations AskMe. Apparently I was the only person too lazy to type out their pseudonym in full.)
Goldblum and NotGoldblum (from an AskMe I posted about a half-remembered movie. Neither character was actually played by Jeff Goldblum.)
Braided and braids (from AskMes about shaving one’s head for charity)
Yolk (from any AskMe about eggs. The yolk is the best part, people!)
Bartending and blackcurrant (from an AskMe about the syrups which European bartenders sometimes add to beer)

I... don’t believe these accurately reflect me, actually
Vicky (I've only used this in my recent comment about Facebook advertising. Which, yes. That comment got a lot of attention, but word frequency tables don’t take that into account.)
25mb (I’ve only used this in one comment ever. Admittedly I used it multiple times in that comment...)
Booklets (Do I use this a lot? I don’t feel like I do.)
Potatos (C’mon. I spelled this wrong in one comment! One!**)


Thanks benito.strauss!



* I’ve actually tried to use ROT13 less since Jessamyn volunteered the information that sometimes people who don’t know what it is freak out upon seeing it and try to report to the mods that the commenter has possibly had a stroke in the middle of commenting or something.

** Well, two now...
posted by the latin mouse at 10:52 AM on July 26, 2012 [1 favorite]
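

For readers who have not met ROT13: it shifts each letter 13 places around the alphabet, so applying it twice gets you back where you started. A small sketch using Python's standard `codecs` module reproduces the two examples from the comment above; the `rot13` wrapper name is just for illustration.

```python
import codecs


def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


# The two words mentioned in the comment above.
encoded_the = rot13("the")
encoded_cheese = rot13("cheese")
print(encoded_the, encoded_cheese)

# Applying ROT13 twice returns the original text.
round_trip = rot13(rot13("cheese"))
```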


languagehat: I think I've seen a couple places where words get fused, like in "stuffthanks". It's either the person typing forgetting the space, or maybe the tokenizing gets confused if you have something other than 0x20 for the space.


latin mouse: I was going to ask why you had no Latin words or cheese in your list. I just thought 'purrfr' was a cat sound. As for 'potatos' getting into your list even though you only used it once: this probably won't make you feel any better, but the reason it's in there is that almost no one else used it. [In 2010, potatoes: 334 uses, potatos: 15 uses].
posted by benito.strauss at 11:05 AM on July 26, 2012
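

To see why a single "potatos" can make a list, here is one possible way to rank words, offered as a guess and not a description of the actual Perl scripts: score each word by the share of all site-wide uses that belong to one user, then keep at most 20 (the cut-off mentioned earlier in the thread). The site counts for the two spellings come from the comment above; the user counts and the count for "the" are invented for the example.

```python
from collections import Counter


def distinctive_words(user_counts, site_counts, limit=20):
    """Rank a user's words by their share of the site-wide uses.

    Assumption: site_counts already includes the user's own uses.
    """
    scored = []
    for word, n_user in user_counts.items():
        n_site = max(site_counts.get(word, n_user), n_user)
        scored.append((n_user / n_site, n_user, word))
    # Highest share first; ties broken by raw count, then alphabetically.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [word for _share, _n, word in scored[:limit]]


# 334 and 15 are the 2010 figures quoted above; everything else is made up.
site = Counter({"potatoes": 334, "potatos": 15, "the": 1_000_000})
user = Counter({"potatoes": 3, "potatos": 1, "the": 900})

top = distinctive_words(user, site, limit=2)
print(top)
```

With these numbers one use out of 15 outranks three uses out of 334, so a rare misspelling beats the common spelling.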


lhat, I bet it's this. The way I clean up the text involves, among other things, filtering out named entities, and I'm guessing that unlike more common literal punctuations I'm just deleting those outright instead of replacing them with whitespace.
posted by cortex (staff) at 11:10 AM on July 26, 2012 [2 favorites]


I bet things like "stuff—thanks" get processed as "stuffthanks" because the tokenizer doesn't recognize the em dash as a word break.
posted by grouse at 11:10 AM on July 26, 2012


executedquot

Probably "executed&quot;".
posted by grouse at 11:11 AM on July 26, 2012
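

The fused words discussed in the last few comments can be reproduced with a small sketch. None of this is the real cleanup code; the regex and function names are assumptions chosen to show three behaviours: deleting a named entity outright, replacing it with a space, and stripping only the punctuation characters.

```python
import re

# Hypothetical pattern for named HTML entities such as &mdash; or &quot;
ENTITY = re.compile(r"&[A-Za-z]+;")


def tokens_deleting(text):
    """Delete entities outright, which can fuse neighbouring words."""
    return ENTITY.sub("", text).lower().split()


def tokens_spacing(text):
    """Replace entities with a space so the neighbours stay separate."""
    return ENTITY.sub(" ", text).lower().split()


def tokens_punct_only(text):
    """Strip only punctuation characters, leaving the entity name behind."""
    return re.sub(r"[^\w\s]", "", text).lower().split()


fused = tokens_deleting("stuff&mdash;thanks")
split = tokens_spacing("stuff&mdash;thanks")
leftover = tokens_punct_only("executed&quot;")
print(fused, split, leftover)
```

The first case matches the "stuffthanks" explanation and the third matches the guess about "executedquot"; replacing entities with whitespace avoids both.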


I’ve actually tried to use ROT13 less since Jessamyn volunteered the information that sometimes people who don’t know what it is freak out upon seeing it and try to report to the mods that the commenter has possibly had a stroke in the middle of commenting or something.

If for no other reason (and there are plenty of other awesome reasons), this bit of previously-unknown-to-me information has made this thread totally awesomesauce. :)
posted by antifuse at 11:20 AM on July 26, 2012 [1 favorite]


As an aside: I am astounded that I only ever said "awesomesauce" once in a comment here on mefi, well now 3 times I suppose after the last two comments, because I say it fairly frequently IRL.
posted by antifuse at 11:22 AM on July 26, 2012 [1 favorite]


What I find really interesting is the long list of single-use words, and in particular my tendency to use onomatopoeia and also to smash things together with (or without) hyphens in a hand-flailing attempt (see?) to get my dizzy ideas across. Looking at this, it's a wonder that I ever make sense to anybody about anything.

Things like: agghhooggah, arsenic-breathed, ba-dom-ba-stabby, collage-y, creepometer, demographic-gatherers, doppelgangerous, extra-victorian, fireman-turned-hairdresser, future-evil-ikea, giganstrous, hyeagh, hyper-little-monkey, icky-skeevy-axe-murderer, internet-machine-box, jellyfished (which I am sure I used as a verb or adjective), jesii, kersquillion, lazyscaping, magizmo, monkeyvomit-green, non-washcloth-user (whut), people-shaped, plllbbbttt, punching-in-the-head-as-a-way-to-say-hi (this must be about my dog), scribble-man-with-a-hat, shits'n'giggles, shiv-wielding, skin-going-wub-wub-wub, string-related, un-bugger, wheee-dwooo-bleeping, whiz-bang-zoom, whomper-stomper, zoom-smash-bang

I have come to the depressing conclusion that I am a discarded Joss Whedon character. Perhaps from a failed collaboration with Carl Hiaasen.

I really talk like this. Please hope me. With a brick.
posted by cmyk at 12:12 PM on July 26, 2012 [8 favorites]


Give me time.
posted by cmyk at 12:47 PM on July 26, 2012 [1 favorite]


Me! Do me! I love stats.
posted by Zarkonnen at 12:53 PM on July 26, 2012


Thank you, benito.strauss!


Words you use heavily that (practically) no one else does:
poi

More common words you use much more than anyone else:
judgemental, sunflower, cloak, cambridge, presumption, geeky, gaps, op, easter, pill


Judgemental Sunflower Cloak is the name of my supergroup album with Björk and Yoko Kanno.

The rest of you are neither eating nor spinning anywhere near enough poi!
posted by daisyk at 12:55 PM on July 26, 2012


I am not a number, I am a man! A man made out of numbers.

*cough*
posted by juv3nal at 12:58 PM on July 26, 2012


benito.strauss I believe it's a go. Thank you.
posted by schyler523 at 1:03 PM on July 26, 2012


Please cloud me.
posted by fake at 1:51 PM on July 26, 2012


Benito.strauss, what you are doing is very cool and I would like to subscribe to your Perl script.
posted by restless_nomad (staff) at 1:52 PM on July 26, 2012 [4 favorites]


Cloud me up, sir!
posted by aubilenon at 2:14 PM on July 26, 2012


Is it too late?! I hope it's not too late.
posted by cooker girl at 2:35 PM on July 26, 2012


*cough*

I am a copybot simulacrum of a man, made out of numbers. sorry.
posted by Devils Rancher at 2:51 PM on July 26, 2012


Please cloud me, benito.strauss ?
posted by cybercoitus interruptus at 3:07 PM on July 26, 2012


benito, please to cloud me as well?
posted by en forme de poire at 3:19 PM on July 26, 2012


cortex, if this is still on, put me in -- and benito.strauss, too!
posted by escabeche at 4:22 PM on July 26, 2012


I'd love to be clouded as well. Thanks!
posted by bswinburn at 5:07 PM on July 26, 2012


23303 Please hope me Cortex!
posted by BrotherCaine at 5:35 PM on July 26, 2012


Words you use heavily that (practically) no one else does:
vibrams, prednisone, fitocracy, rheumatoid, dairy-free, kushiel, grain-free, restless_nomad, ativan, pullups, carmax, pushups, lunesta, butches, bodybuilding, barefoot-style, carey's, good-quality, low-stress, fivefingers

More common words you use much more than anyone else:
gluten-free, paleo, dojo, crossfit, gluten, uo, deadlifts, cpap, austin's, goodreads, siamese, cockburn, bioware, tanya, callouts, robb, ultima, recruiter, twitches, workouts

This is pretty hilariously accurate for the most part. (There are some mod-specific quirks - for example, "lunesta" and "ativan" only show up because I posted an anonymous asker's responses that included them.) And "cockburn" is a proper name, thank you very much. But otherwise... yeah, that's pretty much the short list of my life and interests.
posted by restless_nomad (staff) at 7:23 PM on July 26, 2012 [2 favorites]


Benito, please cloud me?
posted by Night_owl at 7:28 PM on July 26, 2012


Report for user 19628

Words you use heavily that (practically) no one else does:
dreads, sonique, slinky, moir, hifana, agps, breezebrowser, magnetometer, muir, sonique2, xargs, foobar2k, canon's, parsec, mk2, winamp3, 1mm, kenken, lymph, arg-file

More common words you use much more than anyone else:
inkjet, cmd, sudoku, winamp, ddr, psd, lycos, listerine, impedance, dof, toenail, mv, equator, interpolation, ammonium, supertaster, autofocus, wart, megapixels, kilowatt


Ut oh. I might be some kind of nerd.
posted by aubilenon at 7:32 PM on July 26, 2012


The only fun fuck in my comments is quelquefuck, which I can't take credit for. A friend of mine made it up, and it stuck.
posted by Night_owl at 7:32 PM on July 26, 2012 [2 favorites]


quelquefuck

Fantastic. Please tell your friend that a random person on the internet is going to adopt it as well.
posted by rtha at 7:34 PM on July 26, 2012 [3 favorites]


Yo, benito! Can you cloud me?
posted by owtytrof at 7:39 PM on July 26, 2012


Words you use heavily that (practically) no one else does:
anonyme, shoudl, abotu, thigns, deleteworthy, wiht, spendy, holidaytime, restless_nomad, non-answers, sitewide, untagged

[please note, I have the ability to correct my own typos]

Words you use much more than anyone else:
peopel, chatfilter, deletions, newsfilter, flagging, irritable, mathowie, landlady, konolia, googleable, pb, chitchat, timeout, chatty, callouts

No one else has a landlady?
posted by jessamyn (staff) at 7:40 PM on July 26, 2012 [2 favorites]


So... nobody talks about me but me and Jessamyn? I can't decide if that's good or bad.
posted by restless_nomad (staff) at 7:45 PM on July 26, 2012


I am not surprised that 'spendy' is on Jess' list, but a little surprised that 'fighty' isn't.
posted by box at 8:08 PM on July 26, 2012


Damn, that was quick!

Report for user 64035

Words you use heavily that (practically) no one else does:
trids, instr, x360, lewisville, julee, dma, 2nding, polydor, altima, nasher, rubydoom, w580i, now-wife, dvi, antennaweb, grapher, gantz, preordered, kimbell, slyme

More common words you use much more than anyone else:
vg, ensign, doodad, elektra, cruise's, eq, vga, love's, tetsuo, recieved, chihuahua, thyme, 20000, percieved, mazda, rca, eraserhead, ddr, aftertaste, antennas

Apparently I speak a lot of gobbledegook. At least the "common" list has a sort of poetry to it.
posted by owtytrof at 8:12 PM on July 26, 2012


RN: Zarq talks about you. I don't know what it means, except that your username is long and abbreviates to a common acronym.
posted by gingerest at 8:14 PM on July 26, 2012


benito.strauss wrote (thanks benito!):
Words you use heavily that (practically) no one else does:
h0, vst, 1e-6, h5n1, scipy, bayes, eigenfactor, foldit, gwas, clothianidin, burford, metabolomics, daw, summonses, proteome, xiu, cubase, ndseg, pull-ups, vergennes

More common words you use much more than anyone else:
plos, voxels, body-person, leptin, msc, p1, loci, epigenetics, lm, p-value, mitochondria, heritability, csv, h1, bioinformatics, knytt, earplugs, krebs, tiling, ianad
I don't know whether to be happy or anguished that all of this is from when I'm allegedly taking a break from work. Also I guess nobody else hyphenates pull-ups.
posted by en forme de poire at 8:15 PM on July 26, 2012


I think the rest of us have been so thoroughly infected with "fighty" that it's not an unusual word here anymore.
posted by restless_nomad (staff) at 8:16 PM on July 26, 2012


I, too, would like to participate in the encloudening.
posted by barnacles at 8:17 PM on July 26, 2012


Report for user 17573

Words you use heavily that (practically) no one else does:
bomberman, iriver, rockbox, zuh, in-ear, mdr-v6, ohhla, laswell, sennheiser, afrobeat, brotzmann, cowon, kuti, dipset, sonys, fela, emcee, tecmo, carhartt, over-the-ear

More common words you use much more than anyone else:
winehouse, chatfilter, rakim, emcees, leatherman, gameboy, 4wd, wu-tang, shmups, cdex, earphones, instrumentals, win2k, headphone, secondhand, shure, n64, gamecube, aftermarket, trackball


I like portable audio, hip-hop, video games and complaining about chatfilter. Trying to work on that last one. No idea what Winehouse is doing on there, but I know what 'zuh' is.
posted by box at 8:22 PM on July 26, 2012


Words you use heavily that (practically) no one else does:
cranny, crawfish, croce, abita, oat, halfling, tchoupitoulas, squirtle, amarula, candied, candide, bananagrams

More common words you use much more than anyone else:
po-boy, fielder, beignets, marquez, oreos, centipedes, zucchini, ender's, jasmine, fast-paced, pantyhose, aah, uptown, chandler, caramel, rogues, slacks, peppermint, nthing, pus

Ok, so I like to eat. I live in New Orleans. I play D&D. But: chandler, slacks, pus, jasmine, and fielder? Weird. I don't remember talking about Jim Croce on this site before. I don't like Candide, so I'm surprised I've said it more than once. And cranny? When have I ever said cranny?

This is so much fun. Thanks, cortex and benito, for your hard work.
posted by Night_owl at 8:26 PM on July 26, 2012


Report for user 32016

Words you use heavily that (practically) no one else does:
polynesia, kava, palau, pago, aral, elemenstor, nisp, mni, yoyo, micronesia, tepe, construx, elbot, darya, threadshitters, ava, yoda's, startropics, babeldaob, robcorr

More common words you use much more than anyone else:
samoa, polynesian, anti-missile, barnacles, cbd, canberra, archaeology, yee, fitzpatrick, upending, berger, scrivener, schell, samoan, divabat, polynesians, fiji, fieldwork, schaefer, sfo


Among many things that make a lot of sense to me as to why they're in there, I rather like that "anti-missile" slipped in. I wasn't aware that anti-missile was a passion of mine, but I'm going to adjust my expectations accordingly! Maybe that's related to "threadshitters", who knows ...
posted by barnacles at 8:27 PM on July 26, 2012


OK on balance I have decided to be happy about it because the alternative is too awful to contemplate. So, FUCK YEAH BIOINFORMATICS STAR, IM IN UR BASE ALIGNING UR PARALOGS AND REJECTING THE SHIT OUT OF UR H0
posted by en forme de poire at 8:30 PM on July 26, 2012 [3 favorites]


Seriously though, benito, these are both brilliant and terrifying.
posted by en forme de poire at 8:31 PM on July 26, 2012 [1 favorite]


Is it too late to get my data?
posted by hydrobatidae at 8:45 PM on July 26, 2012


My set is an interesting mix of things and people I am a big fan of, and those that I am not. (Thankfully, mostly the former.) I apparently will not shut up about places I have lived, places I have eaten, or things to wear. I am surprised that "introvert" and "introversion" aren't more popular words around these parts.
Words you use heavily that (practically) no one else does:
hsv-1, wrens, untucked, cross-stitch, rothko, hulk-marg, zennie, doughty, fingerless, giffords, goldfrapp, pro-mubarak, caregiving, gotye, suffragettes, bellydance, tahrir, mckinley's, firewater's, bab, meadowlands, zoomorphic's, snuggly, lindt, tomboys, gotye's, coulson's

Words you use much more than anyone else:
brewer's, lira, roiphe's, button-down, kristof, baltimore's, outbreaks, slats, macedonia, temping, dupont, granville, isabel, dap, headdesk, earring, racialized, necklaces, hard-headed, essentialism, timestamp, introversion, shawl, moulin, snug, coed, winehouse, lanterns, fabrics, mussels, mercifully, hampden, menswear, shortbread, faved, fells, vases, grieving, stitches, mckinley, knits, hairstyle, aj, outbreak, alexandria, patter, go-go, awwwww, balkans, introvert, vernon, layering, seams, earrings, absinthe, flirting, stubble, stitch, bookmarking, decorating, seam
Thanks again, benito.strauss!
posted by EvaDestruction at 8:51 PM on July 26, 2012


Me next, me next please!
posted by not_on_display at 8:52 PM on July 26, 2012


The one unexpected thing I've noticed is that the distinctive words are for the most part nouns (pace grotendous). People, let's get some spicier adjectives in here!
posted by benito.strauss at 9:03 PM on July 26, 2012 [4 favorites]


Words you use heavily that (practically) no one else does:
f2p, bigname, runescape, xylitol, radeon, saydur, sooners, micro-transaction, antihistamine, telemarketer, dextromethorphan, pawned, enso, phlegm, money-for-power, swtor, decongestant, step-mom, spenders, santorum's

More common words you use much more than anyone else:
amex, geostationary, debeers, nyquil, cheetah, nissan, usain, pseudoephedrine, motorbike, mmos, painkillers, upsides, oxycontin, clamor, p2p, pawn, telemarketers, bitters, shareware, palpitations

So my fascination with the MMO industry shows, particularly given that I use the more general MMO as opposed to making the mistake of referring to the entire genre as MMORPGs. Somewhere along the line I spent time talking about myself in the third person. Virtually nobody else talks about me. I have a terrible habit of hyphenating anything I feel like because-I-can.

I have used cheetah seven times prior to this post, clamor four times, and palpitations four times. Considering the userbase, I am surprised these words are not used more often. The fact that I can manage to refer to any form of cat more often than most people within an Internet community astounds me. I use the AMEX abbreviation 19 times, something that doesn't surprise me given that most people just say American Express in full. Not many people care about xylitol. Sifting through my single instances of a word leads to some highly amusing results such as how I talk about Spongebob as often as I do about guacamole, urination, and the Cheetahmen, a storied and truly terrible video game not to be confused with actual cheetahs. I fear I will not be the last person on the Internet to refer to all four of the previously mentioned words in the same sentence. I fear more that I was not the first. I will not indulge myself in such research.

I use an average of 131.37 words per comment, yet I only comment 0.3 times/day. The former statistic may induce gratitude for the latter.

Thanks cortex and benito.strauss for the data. I will endeavor to use ostentatious and/or capsaicin-laden words in anticipatory preparation for next time period's geekout.
posted by Saydur at 10:26 PM on July 26, 2012


benito, I am trying with my Whedon Words. Unfortunately, they are all void-shouty and lost. Like tears in the rain. Or a leaf on the wind.

Urk.
posted by cmyk at 11:54 PM on July 26, 2012


Words you use heavily that (practically) no one else does:
esters [21 times], lysine [15], harding's, arginine [10], zygotes [9], 2nding, tierney, slutwalk [8], lepine, roughy [6], lgbts, intermarry [6], tcoyf, racialicious, antiracist [6], valancy, lysine-arginine, afr-am, bonito [5], scythe [5]

More common words you use much more than anyone else:
staph [11], konolia, deep-sea, mucous [7], harding, billings, ime, aureus, chiding [5], xxxx, cheep, tiller's, lobstermitten, staphylococcus, outbursts [7], babcock, assertiveness [5], whiteness [36], post-racial [6], mrsa [11]

Esters and [orange] roughy, whiteness, and intermarry are high because they're from academic papers that I quoted long blockquotes from. One long blockquote-comment each. Other than that, yep, I pipe up regularly on feminist and race issues. And some personal medical history.

I am surprised that, apparently, few other people use "outbursts". And "assertiveness"!

Thanks, cortex and benito.strauss. Reading about everybody's results is fascinating.
posted by cybercoitus interruptus at 12:18 AM on July 27, 2012


Don't worry, cmyk. You've always got "cantankerous".
posted by benito.strauss at 12:22 AM on July 27, 2012 [1 favorite]


Actually, two of the five instances of "assertiveness" in my comments were just me quoting other Mefites.
posted by cybercoitus interruptus at 12:25 AM on July 27, 2012


I'm going to try to figure how to do this on my own later, but for now, do both of you mind quantifying and clouding me up?
posted by iamkimiam at 2:05 AM on July 27, 2012


I would be ever so grateful for an encloudening, please, benito.strauss. Thanks!
posted by Homeboy Trouble at 8:10 AM on July 27, 2012


I think there's a few peopel (all the cool kids are spelling it that way now) requesting encloudment that hadn't been cortexified yet when I ran the script. It's easiest for me if you re-post in here re-requesting entwolkenung once your cortex file is in place. Thanks.

/I'm working on making my next year's file more interesting.
posted by benito.strauss at 8:33 AM on July 27, 2012 [8 favorites]


I kinda missed the middle of this thread, so I'm not quite sure what I'm getting myself into by asking this, but whatever it is you're doing with data for others benito.strauss, would you mind running my data through it too? Thanks!
posted by carsonb at 1:08 PM on July 27, 2012


benito.strauss, if you're still doing this I'd love to see the brundlecloud.
posted by brundlefly at 1:21 PM on July 27, 2012


Words you use heavily that (practically) no one else does:

clawhammer, lodges, kiwanis, scruggs, chemex, brayer, medeski, filemaker, shriners, 3-finger, laminate, sufferbus, frailing, open-back, gabrels, north-central, brubeck, docent, mandolins, new_path

More common words you use much more than anyone else:

freemasonry, mandolin, elks, old-time, usonian, fraternal, masonic, banjos, masons, banjo, lodge, cryin, freemason, grange, freemasons, resonator, stringed, cms, drupal, minstrel

Banjos, Chemex coffee makers and fraternal organizations, I guess that does just about cover the bulk of my (hopefully) useful contributions to Ask MeFi.
posted by usonian at 2:34 PM on July 27, 2012


I wouldn't dream of posting my benito-words. I'm as shallow as spit.
posted by thinkpiece at 2:59 PM on July 27, 2012


I'm proud to say that "parasauralophus" is one of my words.
posted by brundlefly at 3:23 PM on July 27, 2012 [2 favorites]


Me too please! I would like to be numerified.

Thank you!
posted by kristi at 5:30 PM on July 27, 2012


Where have I been? Me too me too.
posted by desuetude at 10:17 PM on July 27, 2012


NOTICE: It looks like the requests have pretty much ended. I'm going to stop checking this thread, but I'll keep the data and scripts on my computer. If you want me to generate your cloud, just get cortex to create your file, and then send me a memail. It's very easy to do, so don't hesitate to ask.

It's been a lot of fun, and really interesting to see different people's clouds. And if you didn't like your words, remember, they're not the things that are most important to you, they're just the things that most distinguish you from other MeFites. If you made just one comment about snickerdoodles, and you used the word "snickerdoodles" five times in that comment, it's going to show up in your list. Hey, at least you don't have the problem McGregor has.
posted by benito.strauss at 9:41 PM on July 29, 2012 [3 favorites]


Oh, yeah, two last things.

Firstly, since I had the script, I took cortex's idea and compared MeFi as a whole against that COCA corpus of contemporary American words. Here is MeFi's word cloud, along with some more detailed stats. (I hope it's not too long.) The "rel_PPM" is just PPM(MeFi) / PPM(COCA), where PPM is a word's occurrences per million words. So, for example, we are more than 500 times more likely to use "douchebags" in our writing than in the publications in the corpus.


Report for user MetaFilter

Words MetaFilter uses heavily that (practically) no other contemporary American does:
metafilter, mefi, fpp, 3d, delmoi, mefites, upthread, fwiw, 3rd, 20th, 4chan, mp3, didnt, koeselitz, askme, 2nd, artw, iirc, doesnt, metatalk, zarq, html5, ironmouth, 19th, minecraft, 4th, teabaggers, 21st, 1st, amirite, grar, mefite, hes, grumblebee, blazecock, hippybear, malor, co2

Commonly used words that Metafilter uses much more than any other contemporary American:
nsfw, wanna, gonna, dunno, gizmodo, dont, gotta, wtf, imho, btw, fanfic, beese, reddit, gpl, adipocere, cannot, rekers, full-on, assange, adblock, douchebags, ggw, dadt, osx, jessamyn, meh, c'mon, paywall, repubs, stfu, bsg, gompa, snark, astroturfing, ghibli, banksy, autotune, aphex, technica, webkit, threeway, steampunk, mashups, nutjobs, emergents, huffpo, hahaha, spot-on, aint, pinboard
MetaFilter words not found in COCA corpus: 3624
count	PPM	word
  8180	217.257392	metafilter
  3618	96.092573	mefi
  3247	86.238967	fpp
  1431	38.006764	3d
  1098	29.162423	delmoi
  1006	26.718941	mefites
   857	22.761563	upthread
   851	22.602205	fwiw
   768	20.397760	3rd
   767	20.371200	20th
   745	19.786890	4chan
   736	19.547853	mp3
   720	19.122900	didnt
   710	18.857304	koeselitz
   697	18.512030	askme
   676	17.954278	2nd
   655	17.396527	artw
   655	17.396527	iirc
   596	15.829512	doesnt
   579	15.377999	metatalk
   576	15.298320	zarq
   555	14.740569	html5
   552	14.660890	ironmouth
   546	14.501533	19th
   541	14.368735	minecraft
   513	13.625066	4th
   497	13.200113	teabaggers
   488	12.961077	21st
   460	12.217408	1st
   447	11.872134	amirite
   447	11.872134	grar
   445	11.819015	mefite
   440	11.686217	hes
   431	11.447180	grumblebee
   418	11.101906	blazecock
   407	10.809750	hippybear
   400	10.623833	malor
   383	10.172320	co2

MetaFilter words found in COCA corpus

count	rel_PPM	word
   764	2841.961760	nsfw
   886	1977.471389	wanna
  3704	1797.171759	gonna
  1527	1549.147640	dunno
   628	1401.638862	gizmodo
  2180	1216.390159	dont
  1606	1120.138985	gotta
  1723	1068.216305	wtf
   558	1037.837547	imho
  1758	934.213121	btw
   320	892.763794	fanfic
   524	835.371460	beese
   719	729.428391	reddit
   320	714.210885	gpl
   176	654.692761	adipocere
  5654	637.333889	cannot
   164	610.054619	rekers
   162	602.614928	full-on
  2363	573.260864	assange
   199	555.187485	adblock
   245	546.817709	douchebags
   144	535.657714	ggw
   796	522.529236	dadt
   230	513.339074	osx
   307	489.425645	jessamyn
  1087	449.274921	meh
   563	448.773027	c'mon
   150	418.483029	paywall
   110	409.182976	repubs
   140	390.584160	stfu
   170	379.424533	bsg
   101	375.704369	gompa
  1286	350.028579	snark
    94	349.665452	astroturfing
    87	323.626535	ghibli
   173	321.766838	banksy
   141	314.699171	autotune
    84	312.467000	aphex
    81	301.307464	technica
    78	290.147928	webkit
   102	284.568459	threeway
   381	283.452366	steampunk
   125	278.988627	mashups
    99	276.198799	nutjobs
    99	276.198799	emergents
   122	272.292900	huffpo
   146	271.548892	hahaha
   144	267.829044	spot-on
   116	258.901446	aint
    68	252.949476	pinboard
posted by benito.strauss at 9:51 PM on July 29, 2012 [12 favorites]
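

A rough sketch of the arithmetic behind the two tables above, assuming PPM means occurrences per million words. This is not benito.strauss's script; `ppm`, `rel_ppm` and `MEFI_TOTAL` are illustrative names. The corpus size is back-calculated from the first table (8180 uses of "metafilter" at 217.257392 PPM implies roughly 37.65 million words), so treat it as an estimate rather than a figure stated in the thread.

```python
def ppm(count, total_words):
    """Occurrences per million words."""
    return count / total_words * 1_000_000


def rel_ppm(count_a, total_a, count_b, total_b):
    """PPM in corpus A divided by PPM in corpus B.

    Returns None when corpus B never uses the word, since the ratio
    is undefined there.
    """
    if count_b == 0:
        return None
    return ppm(count_a, total_a) / ppm(count_b, total_b)


# Assumed total, inferred from the table: 8180 / 217.257392e-6 words.
MEFI_TOTAL = 37_651_193

print(round(ppm(8180, MEFI_TOTAL), 3))   # "metafilter"
print(round(ppm(3618, MEFI_TOTAL), 3))   # "mefi"
```

The undefined-ratio case is presumably why words missing from COCA get their own list ranked by plain PPM, while words found in both corpora are ranked by the ratio.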


Secondly, Greg Nog wins.

I know it's not a competition, and all of the clouds were pretty interesting. What I liked most about generating them was discovering some biologists and physicists we've got who I didn't know about before. I'll be following them more closely, hoping to catch some cool explanations that I might have otherwise missed.

And sure, Greg benefited from getting his cloud before there was the 20 word limit. But for sheer recipes + Star Trek + Ruffalo + I-have-no-idea-what-that-word-even-relates-to, Greg Nog wins. Behold!
Words Greg Nog uses heavily that (practically) no one else does:
tbsp, jonson, tempeh, caliban, hodor, teaspoons, pbo, yiayia, data's, clynelish, peaty, p90x, i'm'a, gangbang, seamen, tomine, miracleman, soyjoy, ortolan, ladyfriend, j'ai, severio, casp, manch, dalwhinnie, gools, ardbeg, votre, yiayia's, psmith, furrows, cela, turisas, cufflinks, pc5, xiu, increasemathr, curries, druthers, chalupa, zander, distilleries, retsina, jerremy, absurdism, worf's, plot-point, hampshirite, autobio, bum-bum, hampshirites, finntroll, sakura, 10217, jaundice, caramelized, cookoff, goodwifeazz, risers, alginate, cool-ass, ruffalo, bacon-grease, repping, fever-dream, marantz, balls-to-the-wall, str8, protomen, holmgard, miso, lopez-alt's, medeival, kenji, golden-brown, sophocles, picard's, conrad's, smuttynose, cragganmore, courtier's, smirks, hullaballoo, greek-style, whiskies, weakerthans, laforge, jamais, terriers, burkert, silly-bandz, politically-righteous, hippocampus, mtrhead, coetzee, kara, bacon-fat, sautee, brooklyn's, skycap, high-protein, uus, thumbtack, mix-cd, hansel, litterbox, ma-ti, uef, darmok, rx-ft500, bovril, kiefer, harpold, werckmeister, chartreuse, 15-oz, dawdle, ouzo, lamora, ellrod, morlock, cloon, tamburlaine, cette, surface-level, french-cuff, winstons, lustrous

Words Greg Nog uses much more than anyone else:
tsp, cloves, confit, vous, coon, gurgeh, worf, vo, ds9, riker's, apfelwein, nh, romulan, ferengi, cheesecloth, riker, uu, bite-sized, dhalgren, tussles, sandpaper, centaur, breading, comme, acetone, romulans, darkseid, tartan, greenpoint, troi, starfleet, artie, deep-fry, geordi, parakeet, upenn, oregano, edamame, saucepan, cayenne, bitchin, peachy, marinade, hermes, woolf, frist, parsley, tablespoon, spaniard, manhunter, teaspoon, toffee, simmer, galbraith, nous, mead, liqueur, laphroaig, splotches, jul
posted by benito.strauss at 10:04 PM on July 29, 2012 [2 favorites]


Damnit you guys doing fun things while I'm asleep! Add me to your list, cortex.
posted by taz (staff) at 10:40 PM on July 29, 2012


I would like to be on said list as well, please...
posted by Kimothy at 10:48 PM on July 29, 2012


Please (and thank you).
posted by willF at 11:21 PM on July 29, 2012


I'll get back in and do another run for the stragglers tomorrow morning; the weekend has been slackworthy.
posted by cortex (staff) at 12:01 AM on July 30, 2012


If you haven't run the last batch yet, I'd be interested in seeing mine.
posted by nobody at 6:07 AM on July 30, 2012


I would love to see mine as well.
posted by Tin Man at 7:34 AM on July 30, 2012


Can I put my name down as well?
posted by Phire at 9:24 AM on July 30, 2012


I'd also like to jump on this bandwagon.
posted by zombieflanders at 9:59 AM on July 30, 2012


benito, I'd be curious to see a word-avoidance cloud: are there words that are common in the Metafilter corpus, or in the COCA, that I avoid? That might take some careful thought to put in a statistically interesting way ... I'm not interested in rare words that I avoid. But it'd be amusing to learn that, say, everyone spends their time talking about pink elephants and I've never noticed.

don't think about it don't think about it don't think about it argh
posted by fantabulous timewaster at 3:31 PM on July 30, 2012 [1 favorite]


Me, too.
posted by MythMaker at 6:39 PM on July 30, 2012


Me!
posted by schmod at 12:34 PM on July 31, 2012


Straggler edition! Here's everybody new starting from Zarkonnen. Tables here.
user 24905:	10665 words,	2795 unique, in	200 comments.
user 57855:	154822 words,	11566 unique, in	1640 comments.
user 21049:	186817 words,	15799 unique, in	2554 comments.
user 23303:	274589 words,	23278 unique, in	5767 comments.
user 60594:	40778 words,	6057 unique, in	376 comments.
user 61170:	208565 words,	18982 unique, in	6011 comments.
user 48758:	439273 words,	24158 unique, in	5569 comments.
user 66440:	94029 words,	9512 unique, in	687 comments.
user 21365:	712839 words,	33865 unique, in	8621 comments.
user 14421:	659953 words,	32992 unique, in	7463 comments.
user 69120:	23075 words,	4022 unique, in	317 comments.
user 15382:	2356 words,	967 unique, in	78 comments.
user 19227:	64513 words,	8916 unique, in	956 comments.
user 3398:	42681 words,	5853 unique, in	635 comments.
user 40688:	167988 words,	14448 unique, in	2388 comments.
user 100776:	121746 words,	11891 unique, in	1233 comments.
user 33546:	81185 words,	9075 unique, in	1095 comments.
user 49429:	416879 words,	26283 unique, in	4606 comments.
posted by cortex (staff) at 3:00 PM on July 31, 2012 [4 favorites]


My cloud:

Words you use heavily that (practically) no one else does:
wis, vam, 1-x, barthelme [10], riemann, poincare, gab, waukesha, gottman, cosma, valency, not-p, baldy, landsburg's, set-free, cochon, jap, zopilote, churrascaria, cognitive-behavioral

More common words you use much more than anyone else:
orioles [35], chowhound, uw, sqrt, madcaptenor, teardrop, bangles, japs, dx, medalist, mika, richman, genoways, wi, zeta, forester, cloves, nxivm, first-rate, lethem

So yeah: I live in Wisconsin, I do math, and I like restaurants, baseball, experimental fiction, and new wave more than most people do. Sounds about right.
posted by escabeche at 5:25 PM on July 31, 2012 [1 favorite]


My pink elephants are awesome. Quoth benito:
I got curious about that too (IvanF also asked about it as well). So I restricted your word list to those that were in the top 10% most frequently used on the site as a whole, then sorted those by relative frequency of use. Taking the top and bottom 20 for you yielded this:

Report for user 69606

Most and least common user words among top 10% site words
Ranked by PPM(user) / PPM(site)

count rel_PPM word
48 30.434429 quantum
30 23.776898 temperature
35 21.304100 atmosphere
59 19.746679 physics
23 18.850768 radiation
29 18.805365 mechanics
39 17.750888 electric
17 17.131963 density
25 16.421455 experiments
51 16.224626 experiment
29 16.086241 interaction
26 15.062725 stable
17 14.642980 vacuum
18 14.569673 transition
15 14.416308 estimate
18 14.266139 liquid
15 14.003572 chemistry
54 13.896216 moon
27 13.354219 ordinary
129 13.281601 energy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6 0.192650 wasn't
3 0.185049 she's
3 0.179684 society
7 0.177979 saying
3 0.175764 stupid
5 0.175556 shit
5 0.173290 american
3 0.169625 although
3 0.168809 huge
3 0.167263 city
3 0.161351 games
7 0.155067 man
8 0.154040 love
5 0.143153 music
3 0.132593 internet
3 0.127507 hell
3 0.113027 war
3 0.111355 men
3 0.094706 government
4 0.093592 yeah

Apparently, we are all saying "Hell yeah, I love the stupid shit american government.", and you are not. I suspect you may be Canadian.
True story: I am a professional Canadian.
posted by fantabulous timewaster at 6:06 PM on July 31, 2012 [1 favorite]
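

The "pink elephants" report quoted above can be sketched roughly as follows. This is a guess at the procedure from benito's description, not his code: the function name, the reading of "top 10%" as the top tenth of the site vocabulary by count, and the `min_user_count` floor are all assumptions (the floor of 3 is inferred only from the fact that no count below 3 appears in the quoted table).

```python
def most_and_least(user_counts, site_counts, top_fraction=0.10, n=20,
                   min_user_count=3):
    """Rank a user's words by PPM(user) / PPM(site), restricted to the
    most frequently used site words. Returns (top n rows, bottom n rows),
    each row being (count, rel_ppm, word)."""
    site_total = sum(site_counts.values())
    user_total = sum(user_counts.values())

    # Keep only the most common site words (assumed meaning of "top 10%").
    ranked = sorted(site_counts, key=site_counts.get, reverse=True)
    keep = set(ranked[: max(1, int(len(ranked) * top_fraction))])

    rows = []
    for word in keep:
        count = user_counts.get(word, 0)
        if count < min_user_count:
            continue  # too few uses to say anything meaningful
        rel = (count / user_total) / (site_counts[word] / site_total)
        rows.append((count, rel, word))

    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n], rows[-n:]
```

Restricting to common site words first is what keeps the bottom of the list interesting: without it, the "avoided" words would all be rare ones that nearly everybody avoids.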


OH MY GOD I WANT THIS TOO
posted by escabeche at 6:26 PM on July 31, 2012


benito.strauss emailed my (two: EB and IF) versions of that new report to me, too, but I didn't post it here because I didn't want to encourage people to ask him to do more work. I won't speak for him, of course — if he wants to do so, that's cool. But he's been extremely generous with his time and effort in generating the standard report from cortex's data for so many of us.

This other report is more meaningful for me because especially with regard to my EB data, where I frequently quoted people and included attributions, 80% of my words were people's usernames. Which was interesting in that limited respect, but not what I was hoping for and not like many of the results posted above. That's why I didn't post mine. My IF results aren't quite so skewed in this manner, but are still not very interesting. But this new report is (to me):

Ivan Fyodorovich
Frequent
regard, numerous, sexism, inclined, partly, culturally, distinct, strongly, conventional, deeply, portion, bigotry, askme, nevertheless, feminism, meta, sexist, orientation, privileged, and arguably.
Rare
budget, travel, tax, island, afghanistan, bill, flash, morning, john, train, ipad, insurance, etc, team, and gonna.
Ethereal Bligh
Frequent
matt, meta, nevertheless, hypocrisy, askme, regard, mefi, belt, strongly, portion, bigotry, utility, greatly, wheels, assertion, inclined, evolutionary, posts, context, and judgment.
Rare
glenn, elected, monster, bp, 3d, competition, plant, cameras, clothes, dig, robot, budget, awesome, apple, bike, funding, tea, iphone, facebook, and obama.
In both cases, some of the rarely used words are more an artifact of the fact that the MetaFilter corpus spans its entire history but, because of increases in membership and posting, must be biased toward the present; while, in contrast, as EB my activity is limited to 2004-2007 and as IF to the last year. That, I think, explains most (but not all) of why afghanistan appears in the rare IF list and obama appears in the rare EB list. Possibly also glenn and facebook in the latter list, as well.
posted by Ivan Fyodorovich at 7:08 PM on July 31, 2012


The truth is, there's a great clustering (or even biclustering) problem here -- given the table of the wordlists, you could make a map of users and a map of words, where users are close to each other if their patterns of word usage are similar, and words are similar if they are popular (respectively unpopular) with similar sets of users.

Then you can go deeper: are users more likely to favorite other users who are nearby in word-usage-space....?
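
For the curious, here is a minimal sketch of the "users are close to each other if their patterns of word usage are similar" idea. It is not a clustering algorithm, only the distance measure one might feed into one. It uses cosine similarity, which is a swapped-in choice rather than anything proposed in the thread. The user names, weights, and the `cosine` and `nearest` helpers are all hypothetical toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(user, profiles):
    """Name of the other user whose word profile is most similar."""
    scored = [
        (cosine(profiles[user], profile), name)
        for name, profile in profiles.items()
        if name != user
    ]
    return max(scored)[1]


# Toy profiles: word -> made-up "interest" weight (not real site data).
profiles = {
    "mod_a": {"metatalk": 9.0, "askme": 7.0, "deleted": 6.0},
    "mod_b": {"metatalk": 8.0, "deleted": 7.0, "flag": 5.0},
    "mathy": {"math": 18.0, "probability": 12.0, "askme": 1.0},
}

print(nearest("mod_a", profiles))
```

The same function applied to the transposed table (word -> weight per user) would give the "map of words" half of the biclustering picture.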
posted by escabeche at 7:15 PM on July 31, 2012 [2 favorites]


I learn from benito that I use the word "terrific" almost 9 times as frequently as does MeFi overall.

Things I have said were "terrific" on MeFi:

"A Pail of Air," by Fritz Leiber
Chinese punk band The Angry Jerks
Matthew Sweet
Mt. Rushmore
"The Running Man," by Stephen King
Pecorino Toscano with honey
the mathematical research of Vladimir Berkovich
DIG!, a documentary about the Dandy Warhols and the Brian Jonestown Massacre
vegetarian dinner at the Peachtree soul food restaurant in Kansas City
the Subway Museum in Brooklyn
applying to graduate school in a big math department
China Village in Albany, CA
Davis and Maclagan's expository paper about the card game Set
Lorrie Moore's short story "People Like That Are The Only People Here"
Glacier Bay Cruiselines
"Grace Kelly," by Mika

I stand by these judgments!
posted by escabeche at 7:29 PM on July 31, 2012 [3 favorites]


Wait wait i should have been more explicit, I want a pretty snowflake cloud too. (I feel like i say explicit more than most people, for example.)
posted by desuetude at 10:24 PM on July 31, 2012


God, this is a little embarrassing.
Words iamkimiam uses heavily that (practically) no one else does:
meta, marked, languages, favorites, anyways, speakers, ex, somebody, speaker, askme, boundaries, spelling, variation, suggestions, language, phrases, metaphor, styles, op, framing

More common words iamkimiam uses much more than anyone else:
anyone, gaga, corporations, iraq, cops, assange, laws, dr, flash, billion, afghanistan, drug, universe, citizens, war, ads, law, china, religious
I clearly like to talk about languages and relationships. I might be skewing my own research data here. Also, I might be more political than I realized. (To be honest, the second category doesn't sound like me at all...but hey, I'm full of surprises. Anyways...)
posted by iamkimiam at 2:43 AM on August 1, 2012


Nice try, iamkimiam. Mentioning politics hoping to divert our attention from "gaga". You talk about Lady Gaga ...... more than the average MeFite.
posted by benito.strauss at 12:44 PM on August 1, 2012 [2 favorites]


I would like this, too.
posted by acridrabbit at 9:08 PM on August 1, 2012


Please send me my words too!
posted by not_on_display at 9:24 PM on August 1, 2012


Yeah, with my whole two mentions of Lady Gaga. But I guess that's two times more than most everybody else, so perhaps I should DTMFGG and go get some therapy (it may be wise, considering how obsessed I am with her song 'Teeth').
posted by iamkimiam at 11:53 PM on August 2, 2012


user 40688: 127424 words, 12363 unique, in 1927 comments. (Jul 2006 to Jul 2011)
user 40688: 167988 words, 14448 unique, in 2388 comments. (Jul 2006 to Jul 2012)

Wow, 40,000 words in the last year alone, compared to about 25,000 words a year for the first 5. Though there were 461 comments in the last year, at about 88 words per comment, compared to 385 comments per year for 2006-2011, at about 66 words per comment. I knew I was getting more verbose.
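
The arithmetic above can be checked with a few lines. This is only a sketch that redoes the division from the two stat lines; the variable names are made up for illustration, and "5 years" is taken from the comment itself.

```python
# Totals copied from the two stat lines above.
through_2011 = {"words": 127424, "comments": 1927}
through_2012 = {"words": 167988, "comments": 2388}
early_years = 5  # Jul 2006 to Jul 2011, as stated in the comment

# Activity in the most recent year is the difference between the two totals.
last_year_words = through_2012["words"] - through_2011["words"]
last_year_comments = through_2012["comments"] - through_2011["comments"]

words_per_comment_last_year = last_year_words / last_year_comments
words_per_comment_early = through_2011["words"] / through_2011["comments"]
words_per_year_early = through_2011["words"] / early_years
comments_per_year_early = through_2011["comments"] / early_years

print(last_year_words, last_year_comments)
print(round(words_per_comment_last_year), round(words_per_comment_early))
print(round(words_per_year_early), round(comments_per_year_early))
```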

(This comment has 103 words.)

benito.strauss, any chance of running (yet) another word cloud thingie for this latecomer?
posted by Phire at 1:25 PM on August 3, 2012


Oooh, fascinating! I can haz?
posted by mostlymartha at 1:34 PM on August 3, 2012


Hello, data wankers and late-birds.

It's become clear that the current word cloud suffers from McGregor's Dilemma, or the Regrettable Tattoo phenomenon. (If one drunken night you get a tattoo that you immediately regret, it'll be the first thing people notice about you, and a good way to pick you out of a crowd, but it actually says very little about you.)

So, since it's too hot to go outside, I thought I'd adapt the script to look just at words that are the most commonly used on the site, and pick a user's high-interest and low-interest words from there. It also computes the overlap between the user's and the site's top-ranked words. ('Top ranked' is the top 5000 words, the figure used by professional linguists — i.e. iamkimiam mentioned it.)
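
A rough sketch of what such a script might look like, for anyone who wants to play along at home. This is not the actual script. It assumes PPM means occurrences per million words, that words are ranked by PPM(user) / PPM(site), that "overlap" is the share of the site's top words that also appear in the user's own top words, and that a word needs a minimum number of uses to be ranked at all. The function names, the `min_count` parameter, and the toy counts are all hypothetical.

```python
from collections import Counter


def ppm(count, total):
    """Occurrences per million words (assumed meaning of PPM)."""
    return count / total * 1_000_000


def interest_report(user_counts, site_counts, top_n=5000, k=20, min_count=3):
    """Return (high-interest words, low-interest words, overlap %, ratios)."""
    user_counts, site_counts = Counter(user_counts), Counter(site_counts)
    user_total = sum(user_counts.values())
    site_total = sum(site_counts.values())

    # Only the site's most common words are candidates.
    site_top = [w for w, _ in site_counts.most_common(top_n)]
    user_top = {w for w, _ in user_counts.most_common(top_n)}

    # Relative PPM: how much more (or less) often the user says a word.
    rel_ppm = {
        w: ppm(user_counts[w], user_total) / ppm(site_counts[w], site_total)
        for w in site_top
        if user_counts[w] >= min_count
    }
    ranked = sorted(rel_ppm, key=rel_ppm.get, reverse=True)
    high, low = ranked[:k], ranked[::-1][:k]

    # Assumed definition: share of site top words also in the user's top words.
    overlap = 100.0 * len(user_top & set(site_top)) / len(site_top)
    return high, low, overlap, rel_ppm


# Toy counts, not real site data.
site = Counter({"the": 500, "war": 240, "yeah": 200, "math": 50, "boston": 10})
user = Counter({"the": 50, "math": 30, "boston": 15, "yeah": 5})
high, low, overlap, rel_ppm = interest_report(user, site, k=2)
print(high, low, overlap)
```

Using a ratio of rates rather than raw counts is what lets a prolific and a quiet user be compared against the same site-wide baseline.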

I'll send out these new reports to those who have asked for them, but they'll still have the old-style word cloud in the second half (along with more details about those words). I am using a more complete site corpus, so things might be a little different if you have a previous word cloud; this one should be more accurate.

If you're a hard-core data wanker, feel free to ask me to re-run your report. I doubt there are many left following this thread so there won't be too many to do.

I think the most fun we could have with this is if people try to make a sentence using as many words as possible from their low-interest word list.
posted by benito.strauss at 1:33 PM on August 4, 2012


Oh, for the curious I'll paste in my (full) report below so you can see what's in there. Please forgive the length.

My sentence made from my low-interest words, the sentence I am extremely unlikely to say, goes something like:

Be reasonable, baby. The awesome fire will destroy the expensive female oil game. The poor industry simply will not suspect us, nor will we have to go to court.

Report for user 126778

Most and least common user words among top 5000 site words

High interest words
boston, y, functions, stats, socialism, folk, russian, x, mefi, presidential, mccain, tendency, function, 2012, efficiency, clock, math, curious, sincerely, ring

Low interest words
games, simply, kill, oil, however, couple, several, industry, awesome, fire, suspect, nor, entirely, ten, poor, court, reasonable, baby, female, expensive

Percent overlap of user and site top ranked words: 59.6%

Details: ranked by PPM(user) / PPM(site)

user_count	rel_PPM	user_rank	site_rank	word
   53	16.251882	  254	 2937	boston
   54	12.785100	  253	 2377	y
   20	8.921159	  578	 3945	functions
   13	7.670308	  923	 4912	stats
   13	7.224360	  920	 4672	socialism
   17	6.326183	  680	 3400	folk
   22	6.092679	  536	 2724	russian
   80	6.021030	  172	  835	x
   58	5.790367	  230	 1106	mefi
   10	5.651072	 1149	 4733	presidential
   10	5.613145	 1138	 4711	mccain
   11	5.592672	 1079	 4380	tendency
   30	5.514453	  410	 1905	function
   10	5.318656	 1092	 4525	2012
    9	5.310214	 1221	 4906	efficiency
   11	5.219827	 1031	 4142	clock
   37	5.102270	  348	 1500	math
   39	5.080683	  333	 1419	curious
    9	4.993186	 1291	 4664	sincerely
   16	4.792028	  747	 2886	ring
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    3	0.269070	 3134	  999	expensive
    3	0.268494	 3152	  996	female
    3	0.260548	 2908	  964	baby
    3	0.260075	 3512	  963	reasonable
    4	0.258584	 2400	  714	court
    6	0.257804	 1846	  492	poor
    3	0.244669	 3670	  895	ten
    4	0.237813	 2456	  664	entirely
    3	0.236093	 3401	  870	nor
    3	0.232591	 3655	  859	suspect
    3	0.212094	 3159	  792	fire
    7	0.211756	 1487	  352	awesome
    3	0.209965	 3273	  782	industry
    4	0.202478	 2757	  574	several
    5	0.183896	 2008	  423	couple
    5	0.171666	 2083	  397	however
    3	0.170656	 3412	  635	oil
    3	0.167775	 3306	  620	kill
    4	0.134328	 2763	  391	simply
    3	0.118241	 3191	  450	games
-----------------------------------------------
[This is the old-style cloud. As the cloud is now generated against an almost
[complete site list, the items in the first section ('practically no one else')
[are probably pretty un-important.
[The second section ('more common words') is more meaningful.

[The cloud also consists of the words that are the most distinctive for the user,
[not necessarily what the user spends a lot of time on (McGregor's Dilemma).

Words you use heavily that (practically) no one else does:
benitostrauss, db8, crucanti, nomogram, chon, hameau, mapmaker, hapax, legomenon, surjective, physicsmatt, kristine, fraid, haglund, nomograms, cahokia, pide, pngview, pynomo, zaltzman

More common words you use much more than anyone else:
memri, keepon, 1-d, mansfield, wunderground, ubuweb, z2, meany, nomyte, re-creating, hilbert, brecht, knotted, benito, fenway, vassal, chick-fil-a, fela, haskell, baptism

Details:
Words you use heavily that (practically) no one else does:
count	PPM	word
     7	62.197343	benitostrauss
     6	53.312009	db8
     4	35.541339	crucanti
     4	35.541339	nomogram
     4	35.541339	chon
     4	35.541339	hameau
     4	35.541339	mapmaker
     4	35.541339	hapax
     4	35.541339	legomenon
     3	26.656004	surjective
     3	26.656004	physicsmatt
     3	26.656004	kristine
     3	26.656004	fraid
     3	26.656004	haglund
     3	26.656004	nomograms
     3	26.656004	cahokia
     3	26.656004	pide
     3	26.656004	pngview
     3	26.656004	pynomo
     3	26.656004	zaltzman

More common words you use much more than anyone else:
count	rel_PPM	word
    11	735.995598	memri
     5	557.572423	keepon
     6	401.452144	1-d
     3	334.543454	mansfield
     3	334.543454	wunderground
     3	250.907590	ubuweb
     3	250.907590	z2
     6	223.028969	meany
     4	223.028969	nomyte
     3	200.726072	re-creating
     3	200.726072	hilbert
     3	200.726072	brecht
     7	195.150348	knotted
     9	188.180693	benito
    12	167.271727	fenway
     4	167.271727	vassal
     4	167.271727	chick-fil-a
     3	167.271727	fela
     9	158.467952	haskell
    18	158.467952	baptism
posted by benito.strauss at 1:45 PM on August 4, 2012 [2 favorites]


phire, your memail is disabled. I can't send you your cloud.
posted by benito.strauss at 2:03 PM on August 4, 2012


Sorry! I've re-enabled it, but you can also send it to my email (in my profile). Sorry for the trouble, and thanks to restless_nomad for giving me a heads up about this.
posted by Phire at 2:40 PM on August 4, 2012


Stragglers two, data boogaloo:
user 43691:	96305 words,	9703 unique, in	739 comments.
user 105027:	65611 words,	9069 unique, in	929 comments.
user 1461:	30622 words,	5578 unique, in	511 comments.
benito, I'd love a report as well.
posted by cortex (staff) at 2:59 PM on August 4, 2012


I too would like a New Improved report please thank you!
posted by languagehat at 3:25 PM on August 4, 2012


And so would I! Please and thank you, benito.
posted by likeso at 3:29 PM on August 4, 2012


High interest words

metatalk, askme, matt, deleted, mods, threads, delete, posts, it'd, favorites, problematic, folks, mefi, flag, meta, removed, explicit, anonymous, grey, mefites


No surprises there, other than "it'd", which is apparently not a contraction that most folks use so much? Huh. Also, nice to see that my self-conscious use of grey-with-an-e is something I should in fact feel self-conscious about.

Low interest words

congress, democrats, israel, africa, republican, terrorists, schools, republicans, teachers, courts, poverty, iran, india, voters, wikileaks, climate, korea, leadership, assange, civilization


At least half of these are keywords for Thread On The Blue I Don't Want To Read, so that doesn't shock me. I'll be tempted to point at this again next time someone gets on my nuts about my blatant political biases, though; I don't even like discussing this shit most of the time. (Nothing personal, teachers and civilization and India. I just don't have a ton to say about you.)
posted by cortex (staff) at 3:42 PM on August 4, 2012 [1 favorite]


Oh, I would love a rerun too, that's an interesting split.

Also, nice to see that my self-conscious use of grey-with-an-e is something I should in fact feel self-conscious about.

"Grey" and "gray" are two different colors. Because 'e' is white and 'a' is black, "grey" is the lighter one. I use them in a way that must appear to be interchangeable to everyone else, but is very much not in my own weird synaesthetic brain.
posted by restless_nomad (staff) at 4:07 PM on August 4, 2012 [2 favorites]


Report me, benito.strauss!
posted by Kattullus at 4:14 PM on August 4, 2012


Oh, me too, please!
posted by rtha at 4:15 PM on August 4, 2012


Hmm*, I'm seeing a lot of site-specific words in the people getting second reports (like meta, metafilter, askme, etc.). I guess it makes sense, that people interested in data on the site would be talking about the site. But my brain was saying "wait a minute, how can everybody use these words more than average?"


* a word I use a lot more than others do, apparently
posted by benito.strauss at 4:32 PM on August 4, 2012


Most and least common user words among top 5000 site words

High interest words
op, metatalk, deleted, delete, meta, askme, mods, y'all, handy, ton, manager, lesbian, forums, stress, sleep, definitely, inclined, relationship, drama, conversations

Low interest words
government, apple, children, vote, society, court, america, states, okay, u, political, 000, truth, christian, schools, mr, youtube, dollars, economic, taxes

Percent overlap of user and site top ranked words: 67.3%

That looks like it matches my status as a mod better than the previous set - lots of meta-conversation. Also I stay the hell out of political threads. And SLYTs, apparently.
posted by restless_nomad (staff) at 4:37 PM on August 4, 2012

Most and least common user words among top 5000 site words

High interest words
metatalk, favorites, askme, poetry, literature, incidentally, literary, meta, mefites, y'know, posts, prize, novel, translation, who've, would've, alright, essay, editors, hop

Low interest words
cost, software, bike, iphone, property, california, traffic, privacy, religious, os, regardless, device, ipad, insurance, browser, it'd, torture, abortion, congress, cops

Percent overlap of user and site top ranked words: 74.8%
Interestingly, the lobster and lederhosen words completely disappeared from this version.

This feels like a much more accurate accounting of my interests than the other one.
posted by Kattullus at 7:13 PM on August 4, 2012 [1 favorite]


Oooh, I'd love to see mine too if you're still doing this.
posted by SisterHavana at 9:41 PM on August 4, 2012


High interest words
greek, russian, matt, latin, meta, metatalk, askme, languages, poster, jazz, grammar, spelling, mefi, usage, poetry, russia, soviet, english, translation, deleted

Low interest words
facebook, twitter, revenue, hardware, palin, healthcare, corporations, 2010, programming, software, microsoft, digital, funding, carbon, developers, regulation, lab, engine, app, devices


I stand by all of that. (I just hope this comment doesn't drive Ms. P*lin out of my "Low interest words" list.)
posted by languagehat at 8:26 AM on August 5, 2012 [2 favorites]


oh, hey, can I please get one of these lists?
posted by griphus at 8:54 AM on August 5, 2012

High interest words
brooklyn, russian, nyc, girlfriend, miserable, therapy, dating, hang, considering, mods, apartment, dates, retail, advice, manhattan, punk, boyfriend, genuinely, op, t-shirt

Low interest words
democrats, various, beliefs, republicans, texas, largely, canadian, fairly, iraq, humans, mainstream, appeal, btw, catholic, uk, victims, notion, frankly, extremely, capital

...this is painting sort of an odd picture.
posted by griphus at 10:22 AM on August 5, 2012 [2 favorites]


High interest words
francisco, san, lesbian, askme, meta, mods, birds, sf, therapy, mefites, bay, partner, deleted, cats, pie, cheese, bird, preview, op, married

Low interest words
gaga, 3d, hollywood, bp, interface, tries, elements, consumers, korea, excited, albums, democrats, volume, gop, classical, glenn, comedy, rap, linux, russia

Consumers in Russia are excited about the volume of classical GOP comedy; those in Korea are gaga for the rap elements of 3d democrats.
posted by rtha at 10:45 AM on August 5, 2012 [1 favorite]


...why is "russia" coming up so often?
posted by griphus at 10:49 AM on August 5, 2012


More of us active in the sex trafficking askme/meTa?
posted by rtha at 10:59 AM on August 5, 2012


RIGHT
posted by griphus at 11:02 AM on August 5, 2012


This is awesome!

High interest words
ii, 7, askme, metatalk, meta, boston, fpp, birthday, knock, mefi, bacon, wanna, favorites, x, hello, y'all, hi, caps, o, rocks

Low interest words
political, basically, religious, business, legal, several, design, certainly, fact, students, terrible, argument, statement, church, health, market, culture, killed, involved, marriage


KNOCK KNOCK! HELLO Y'ALL! HI! CAPS ROCKS! WANNA BOSTON BIRTHDAY BACON FPP? XO! FAVORITES! Just don't get all serious on me, yeah? I pronounce MeFi "ii 7".


Words you use heavily that (practically) no one else does:
iiii, 7777, 777777, iiiii, 77777, 777777777, 7777777, ii7, iiiiiii, i77, 77777777, n_o_d, 77777777777, 777777777777, iiiiii, 7777777777, 7ii, 77777777777777, 777777777777777, i777

More common words you use much more than anyone else:
7i, zombocom, 777, memepool, recursiveness, lederhosen, kleine, xxxxxxx, mefimu, lynnster, i7, sajak, aaaagh, esperanza, chk, jm, not_on_display, xxxxxx, dag, kann


Anything is possible at zombocom. Lobsters wear lederhosen at zombocom. Welcome to zombocom. anyanythingthing is posspossibleible atat zomzombobocomcom. dag! aaaagh! esperanza memepool lobster lederhosen zombocom. eine kleine lobster zombocom lederhosen AND HERE'S SOME ASCII ART! HELLO! HI! XO!
posted by not_on_display at 11:53 AM on August 5, 2012 [4 favorites]


benito, if you can stand one more request...I would like to see my most and least list, too, please.
posted by EvaDestruction at 7:56 PM on August 5, 2012


"...why is "russia" coming up so often?"

I dunno for you, but LH knows Russian and has a long-standing interest and expertise in Russian literature. He's a pretty reliable source on the topic.
posted by Ivan Fyodorovich at 10:39 AM on August 6, 2012


Oh, please, benito, please update my report!
posted by maudlin at 11:41 AM on August 6, 2012

High interest words
askme, op, would've, favourite, boyfriend, meta, behaviour, relationship, suggestions, mefites, mefi, threads, girlfriend, sleeping, emails, ridiculously, matt, metatalk, y'know, gorgeous

Low interest words
science, ok, war, gay, religious, legal, nobody, president, air, machine, modern, costs, united, dog, cars, laws, tea, conservative, built, film
At a guess it would seem like my "high interest" list is skewed by my AskMe answer history, rather than by my talking about relationships and how gorgeous I find Matt Smith all the time.

I was also confused about my "low interest" list at first, but then I realized I tend to read-but-not-comment in those threads. Except for "dog". Dogs are boring.
Words you use heavily that (practically) no one else does:
polyphasic, biphasic, e2cs, korra, merida, gotye, phirephoenix, sansgras, ows, ebaum, suchwhat, trans-folk, kimbra, beekeeper's, kimbra's, mee-fie, shures, knoble, fullmetal, cis-folk
Now this list is kind of hilarious.
posted by Phire at 1:17 PM on August 6, 2012 [1 favorite]


I'm actually not entirely sure what's going on here, but it keeps popping up in my recent activity and it looks fun, so: can I play? Not sure what 'can I play' is even asking, but I'm just going to let that paper sailboat float into the aether and see what comes back.
posted by stavrosthewonderchicken at 9:50 PM on August 6, 2012


With such an open invitation for whatever universe chooses to return, goddamn if it wasn't tempting to completely make up all of the words for stavros' clouds. "WTF! I'm sure I've never used the word 'hebephrenic'. I'm not even sure what it means. And I don't even like 'mascarpone'!"

Oh well, I took an oath, and I'll stand by it. Enjoy them, stavros; they're your words, even if you don't understand why.
posted by benito.strauss at 10:10 PM on August 6, 2012 [4 favorites]


Heh. Thanks for that. I actually am very much confused by some of the data, there, but I will say that my 'High interest words' include 'amusing', 'bastards', and 'goddamn,' which seems about right.

Looks like my stats were thrown off horribly by that one, ill-advised, pre-youtube-embracing-music-videos thread I posted with a bazillion music video links.
Words you use heavily that (practically) no one else does:
wonderchicken, hama7, gamefilter, scarabic, romanization, 9622, inthread, mfc, kindall, rushmc, jpoulos, soju, y6, 1142, bodom, adamgreenfield, evanescence, ffdshow, gackt, romanized
All usernames, thread numbers (9622 and 1142), Mefight Club or Korea-related, and a couple of ones from that one dumb thread (I assume bands named 'Bodom' and 'Evanescence' -- neither of which I've ever actually listened to, I don't think), which again, seems about right, I guess. I was expecting a lot more creative swearing, though.
posted by stavrosthewonderchicken at 10:20 PM on August 6, 2012


Well, having reviewed results, I can now state conclusively that I am the most erudite mefite of them all.

High interest words:
dutch, ooh, re, heh, cream, ms, ex, um, contact, meta, op, aw, nah, comfort, topics, askme, guilt, cats, males, er

I am so proud.
posted by likeso at 3:03 AM on August 7, 2012 [4 favorites]


Here's my very mathy high-interest words:

Most and least common user words among top 10% site words
Ranked by PPM(user) / PPM(site)

count rel_PPM word
54 19.751701 mathematics
49 18.918553 e-mail
220 18.276551 math
40 17.079704 mathematical
41 15.445160 professors
33 14.649420 faculty
41 12.634802 probability
38 11.196695 op
38 10.666489 harvard
49 10.139102 graduate
83 9.770948 n
67 9.383748 professor
22 8.850082 terrific
38 8.333557 wedding
20 7.796538 foster
20 7.010105 brooklyn
23 6.949673 scores
27 6.870699 psychology
20 6.843481 courses
29 6.749047 grad

And my low interest:

5 0.195975 religion
5 0.189560 reality
3 0.189537 canada
4 0.187742 software
3 0.183108 videos
3 0.180970 lady
3 0.179360 photos
3 0.163411 ass
3 0.162708 bush
3 0.156031 iphone
3 0.151231 countries
3 0.150741 speech
3 0.140121 suspect
3 0.127048 companies
5 0.124808 although
4 0.119149 police
3 0.115518 security
5 0.114903 gay
7 0.108505 shit
3 0.059475 fuck

You guys cuss a lot. Also you've stopped putting a hyphen in "e-mail," it turns out.
posted by escabeche at 5:43 AM on August 7, 2012


Latebreakin' stats for SisterHavana:
user 10995:	69361 words,	8594 unique, in	2222 comments.
posted by cortex (staff) at 1:41 PM on August 7, 2012


High interest words
favorites, helpful, suggestions, would've, boyfriend, chocolate, dc, lovely, confidence, comfortable, dating, figuring, mood, challenging, relationship, schedule, feminist, suggestion, clothes, holiday

Low interest words
government, ok, news, political, god, game, power, data, s, games, 4, war, rights, u, old, states, religious, vote, claim, american

(4 is actually my favorite number. Now I feel bad for neglecting it.)

cortex and benito.strauss, this has been much fun. Thank you!
posted by EvaDestruction at 7:56 PM on August 7, 2012


(benito, could I trouble you for that second enrichment analysis?)
posted by en forme de poire at 10:32 PM on August 8, 2012


I too seek enrichment analysis.
posted by flapjax at midnite at 10:35 PM on August 8, 2012


May I too play Foods that begin with the letter "Q", or why I fell in love with word-crunched numbers for 100 Ril'sok?

(I read that "f-at-m" has been juicing their comments recently (using nested small tags, so it isn't even readable, saying words purely to win the hunger game... dolloping incongruent, advantageous linguistically dis-sound phrases, presumably for proficuous motives, someone ought to demand congress look into the injection of nifty, or under-utilized words into comments for the express purpose of skewing the results of the annual words counts threads;

Cupidity if you ask me. Google saxicolous, Reeple.)


Please :)
posted by infinite intimation at 11:50 PM on August 8, 2012


infinite intimation, just to show how nerds can suck the joy out of everything, I'll point out that those sesquipedalian words you threw in there probably will have no effect on your results. Being used just one time means that 'cupidity' will show up next to something mundane like 'cheese', that you also happened to use just once. You'll have to make a consistent effort, in which case you will actually become "the guy who always uses 'frangible' in his posts".

But once cortex creates your extract I'll memail you all your quinoa, quiche, and quodlibet.
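
A tiny sketch of why one-off words wash out. A word used once has exactly the same count as every other word used once, so nothing sets it apart. The report tables in this thread also never show a count below 3, so the sketch assumes a minimum-count cutoff; that is a guess from the tables, not a stated feature of the script. The `reportable` helper and the word list are hypothetical.

```python
from collections import Counter


def reportable(words, min_count=3):
    """Keep only words used at least min_count times (assumed cutoff)."""
    counts = Counter(words)
    return {w: c for w, c in counts.items() if c >= min_count}


# One showy word, one mundane word, and one word used consistently.
words = ["cupidity", "cheese"] + ["frangible"] * 4
counts = Counter(words)
kept = reportable(words)

print(counts["cupidity"] == counts["cheese"])
print(kept)
```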
posted by benito.strauss at 11:53 AM on August 9, 2012 [2 favorites]


infinite intimation, jadepearl: you are go for tables.
user 96984:	256661 words,	22233 unique, in	870 comments.
user 14594:	105632 words,	11843 unique, in	1274 comments.
posted by cortex (staff) at 9:29 PM on August 10, 2012


Oh I am so late to this party! Or, rather: fashionable. May I play??
posted by mimi at 6:08 AM on August 15, 2012


Party on, mimi:
user 1174:	36511 words,	6880 unique, in	744 comments.
posted by cortex (staff) at 7:23 AM on August 15, 2012


Wait, there's even more data dorkery?

Please do tell me what I'm interested in.
posted by philipy at 9:55 AM on August 15, 2012


Here's my stats...

Most and least common user words among top 5000 site words

High interest words
courses, p, mefi, info, btw, y, fwiw, suggestions, stats, x, tech, developing, answers, chances, hypothesis, psychology, domain, randomly, topics, aim

Low interest words
yeah, shit, god, woman, obama, lost, water, oh, dead, house, art, military, song, million, black, society, movie, awesome, sort, hell

Percent overlap of user and site top ranked words: 48.5%

Details: ranked by PPM(user) / PPM(site)

user_count rel_PPM user_rank site_rank word
22 14.044304 495 4757 courses
76 10.894521 177 1395 p
93 10.435214 144 1106 mefi
43 9.858650 301 2075 info
45 9.624684 286 1954 btw
32 8.515314 380 2377 y
19 8.394926 562 3570 fwiw
12 7.888203 834 4880 suggestions
11 7.294617 901 4912 stats
79 6.682642 172 835 x
27 6.639710 438 2210 tech
19 6.322197 557 2857 developing
28 6.315610 418 2047 answers
16 6.195746 638 3220 chances
10 6.194470 950 4642 hypothesis
13 6.171788 780 3772 psychology
16 6.113891 639 3188 domain
11 6.109376 888 4291 randomly
11 6.064586 906 4274 topics
9 5.875068 1006 4834 aim
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6 0.210039 1465 358 hell
9 0.206596 1074 248 sort
6 0.203999 1394 352 awesome
7 0.203812 1311 302 movie
4 0.197326 2238 504 society
5 0.194841 1624 404 black
3 0.191513 2688 634 million
4 0.183149 2239 470 song
3 0.181586 2687 605 military
5 0.181364 1611 372 art
5 0.164784 1731 342 house
3 0.159482 2448 547 dead
9 0.154438 1058 194 oh
3 0.150965 2986 514 water
3 0.148014 2663 506 lost
3 0.134930 2721 462 obama
3 0.134239 3002 459 woman
4 0.124515 2053 323 god
3 0.086757 2860 301 shit
4 0.077086 2291 211 yeah


So I refer to Mefi a lot more than the average Mefite apparently. Who knew?

Also why are you all so interested in discussing water???

As for Benito's game of making sentences from the least used words...

- Hell yeah, Obama's one awesome black woman.

- Oh... God is dead, society is lost, art is shit, and water is hell.
posted by philipy at 11:24 AM on August 15, 2012 [1 favorite]


Oh wow - just found this. Cortex, if you're still monitoring, I'd love to see my wordcloud.
posted by tzikeh at 1:48 PM on August 15, 2012


Kapow! (benito.strauss does the word cloud bit.)
user 20459:	170533 words,	15805 unique, in	2293 comments
posted by cortex (staff) at 1:57 PM on August 15, 2012


Thanks! off to bug benito.strauss... :)
posted by tzikeh at 2:31 PM on August 15, 2012

