"Action speaks louder than words but not nearly as often." - Mark Twain August 5, 2013 10:02 AM

O come, all ye MeFites,
Fav'riting and Flagging!
O point ye, O click ye to...
The Third Annual MetaFilter Wordcount!
(Brought to you by the generosity of cortex, no grant from National Science Foundation, and Viewers Like You)

Chime in here if you'd like cortex to reveal to you the number of words you've typed on MetaFilter. If you've participated before, the links above contain your previous word counts.
posted by griphus to MetaFilter-Related at 10:02 AM (387 comments total) 6 users marked this as a favorite

There are some things man was not meant to know.
posted by The Whelk at 10:04 AM on August 5, 2013 [10 favorites]

And here's how it works! All you have to do in here is say something that's pretty unambiguously a "yes" or a "me too", and I'll add you to the list. I'll do periodic runs through that list to do two things:

1. Calculate your total word count on the site, and
2. Generate a word frequency table for you that I'll stick at a url that you can easily grab it from.

Please note that that url for (2) will be really, really not-secret, so if for some reason you would like to have your word frequency table generated but do not want it posted publicly, either say so really clearly in a comment here or just skip saying "yes" in here entirely and just mefimail me instead.

If you're curious about the details of the frequency table thing, here's an older comment I made about what's in 'em.

For clarity's sake, I've shamelessly abused my admin powers to edit in the url path to this comment now that I've done the first batch of numbers for folks.

Copy this url and replace XXXX with your userid, and you'll be able to get your file accordingly:

posted by cortex (staff) at 10:12 AM on August 5, 2013 [2 favorites]
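
For anyone curious what the two steps cortex describes above (a total word count and a word frequency table) might look like, here is a minimal sketch in Python. It is not the site's actual code: the tokenizer rule, the `tokenize` and `summarize` names, and the sample comments are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer rule: lowercase runs of letters/digits, optionally
# joined by a single apostrophe or hyphen (so "don't" and "gay-married" stay whole).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(text):
    """Split one comment into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def summarize(comments):
    """Return (total words, unique words, frequency table) for a list of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    total = sum(counts.values())
    # most_common() sorts by descending count; ties keep first-seen order.
    return total, len(counts), counts.most_common()


# Made-up sample input, standing in for one user's comment history.
comments = ["Yes, please.", "Yes please, and thank you!"]
total, unique, table = summarize(comments)
print(f"{total} words, {unique} unique, in {len(comments)} comments.")
for word, n in table:
    print(n, word)
```

A real run would differ mostly in the tokenizer details (what counts as a word, how URLs and markup are handled), which is where counts from different tools tend to disagree.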

I'm kind of hoping it's less than last year. Because this would mean I have more of an IRL life.

Yes. Tell me. I have to know.
posted by arcticseal at 10:13 AM on August 5, 2013

Yes, please.
posted by Rock Steady at 10:14 AM on August 5, 2013

Yes, please, and thank you.
posted by MCMikeNamara at 10:16 AM on August 5, 2013

Oooh oooh oooh me!
posted by restless_nomad (staff) at 10:17 AM on August 5, 2013

posted by insectosaurus at 10:17 AM on August 5, 2013

Yes, please!
posted by Iridic at 10:18 AM on August 5, 2013

Yes, please. Thank you!
posted by MonkeyToes at 10:19 AM on August 5, 2013

Oh! I've alerted benito.strauss to this thread, as he was the one that did the rest of the fun stuff everyone remembers from the last one. It's up to him if he wants to/is able to participate, of course.
posted by griphus at 10:22 AM on August 5, 2013

my last list was fucking BALLER

I'm really hoping that my list removes "sandusky" and "paterno" but it will probably be replaced by "Manti" or "T'eo"; basically, my pattern just tells me that I should just comment at Deadspin and Doctor Who websites.
posted by MCMikeNamara at 10:22 AM on August 5, 2013 [2 favorites]

Me too! Choose me!
posted by dotgirl at 10:25 AM on August 5, 2013


I don't care about the word count as much as I do the word frequency though.
posted by elizardbits at 10:27 AM on August 5, 2013

Yes please!
posted by SisterHavana at 10:27 AM on August 5, 2013

I'll give it a go.
posted by QueerAngel28 at 10:27 AM on August 5, 2013

I just realized that due to the months-long absence, my numbers will be slightly off for averaging purposes.

I wonder if it is possible to use my previous word-frequency tables, my posting frequency, and all my previous comments to hack up some sort of markov chain script that will basically simulate three missing months of my commentary.

You know, for more-even statistics. Definitely for not some sort of emergent behavior cybermind cloning thing.

Not at all.
posted by griphus at 10:28 AM on August 5, 2013 [3 favorites]
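
The Markov chain script griphus wonders about above could be sketched roughly like this. It is a toy first-order word chain, assuming comments are plain strings split on whitespace; the `build_chain` and `generate` names are hypothetical, and it ignores posting frequency entirely.

```python
import random
from collections import defaultdict


def build_chain(comments, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    chain = defaultdict(list)
    for comment in comments:
        words = comment.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, emitting at most `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a given seed is reproducible
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Made-up sample history; a real run would feed in a user's actual comments.
history = ["yes please and thank you", "yes please do mine as well"]
print(generate(build_chain(history), length=8, seed=0))
```

Because followers are stored with repeats, a word that often follows a given state is proportionally more likely to be picked, which is all the "statistics" a chain like this has.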

posted by dismas at 10:29 AM on August 5, 2013

Oh god no.
posted by Artw at 10:29 AM on August 5, 2013

Me? Thank you!
posted by bitter-girl.com at 10:30 AM on August 5, 2013

me too plz!
posted by Stynxno at 10:30 AM on August 5, 2013

I would like mine. Thanks.
posted by jessamyn (staff) at 10:31 AM on August 5, 2013

yes, please and thank you!
posted by troika at 10:34 AM on August 5, 2013

sure, go and do mine...
posted by empath at 10:34 AM on August 5, 2013

Do me! Do me!
posted by zombieflanders at 10:36 AM on August 5, 2013

Unambiguous yes.
posted by burnmp3s at 10:38 AM on August 5, 2013

Yes, please!
posted by zarq at 10:39 AM on August 5, 2013

Please do me as well sir!
posted by Meatbomb at 10:42 AM on August 5, 2013

I don't really want to know except if I don't know I will keep wondering and then five minutes before this thread closes I'll be all OH SHIT FINE ME TOO.

So...Me too, please, yes.
posted by rtha at 10:42 AM on August 5, 2013 [4 favorites]

Theory: Higher word count correlates closely with high popularity (difficult to measure, I know.)
posted by latkes at 10:43 AM on August 5, 2013

I would like to learn this information. Yes, please.
posted by Tanizaki at 10:44 AM on August 5, 2013

Please, sir!
posted by dirtdirt at 10:46 AM on August 5, 2013

something that's pretty unambiguously
posted by phunniemee at 10:48 AM on August 5, 2013 [3 favorites]

yes I said yes I will Yes.
posted by languagehat at 10:52 AM on August 5, 2013 [2 favorites]

Do we get sedatives before you do this? (Or, at the very least, laxatives?)
posted by jbickers at 10:54 AM on August 5, 2013

yes, please!
posted by nadawi at 10:57 AM on August 5, 2013

pretty unambiguously a "yes" or a "me too"
posted by slogger at 11:00 AM on August 5, 2013 [1 favorite]

Yes. But, oh god. But, yes.
posted by showbiz_liz at 11:04 AM on August 5, 2013

Yes, please!
posted by rachaelfaith at 11:04 AM on August 5, 2013

Yes, please! Thank you!
posted by mountmccabe at 11:04 AM on August 5, 2013

I would also like to be in for the emergent behavior cybermind cloning thing if possible.
posted by mountmccabe at 11:06 AM on August 5, 2013 [2 favorites]

Oh heck, why not.
posted by St. Alia of the Bunnies at 11:07 AM on August 5, 2013

I would love that. Thanks so much in advance!
posted by Admiral Haddock at 11:09 AM on August 5, 2013

What the hell, sure.
posted by jquinby at 11:09 AM on August 5, 2013

Göran Hasselkvist will be eternally grateful. Count me in.
posted by Namlit at 11:12 AM on August 5, 2013

never did this before but I would like to.
posted by Miko at 11:12 AM on August 5, 2013

I always need this.
posted by Coatlicue at 11:15 AM on August 5, 2013

Seems kinda scary, but go on then, yes please. I bet my most used word is "The".
posted by marienbad at 11:17 AM on August 5, 2013

I'm afraid what I'm going to learn here, so I'll also say yes to this. Thank you.
posted by gauche at 11:18 AM on August 5, 2013


Also, I saw my frequency table, but (I'm slow, here) what was the deal with the "words used frequently by you but less frequently by others" or some such? How did that work?

You missed the John D. and Catherine T. MacArthur Foundation: committed to building a more just, verdant and peaceful world.
posted by Madamina at 11:19 AM on August 5, 2013

Yes please!
posted by capricorn at 11:26 AM on August 5, 2013

... what was the deal with the "words used frequently by you but less frequently by others" or some such? How did that work?

benito.strauss was doing it on request last time. I've let him know about this thread if he'd like to entertain us with his magic again.
posted by griphus at 11:27 AM on August 5, 2013

Yes! I need to know what abuses of onomatopoeia and occasionally hyphenated wordsmashes (see?) I have inflicted upon you all.
posted by cmyk at 11:27 AM on August 5, 2013

Yes, me too.

posted by Bulgaroktonos at 11:29 AM on August 5, 2013

Pick me! pick me!

Me too, please!
posted by misha at 11:30 AM on August 5, 2013

Me too!
posted by aspo at 11:33 AM on August 5, 2013

Yes please!
posted by Blasdelb at 11:41 AM on August 5, 2013


(as in not "no")

posted by Hairy Lobster at 11:43 AM on August 5, 2013

yes. i'd also like a word cloud of my contributions, if only to see the frequency of the times I've said "fuck me"
posted by Think_Long at 11:44 AM on August 5, 2013

Please, thanks! I'd like to see how erudite and verbose I am.

Question: will this only query the "final" text we've entered, or will it also look at edited text that is only visible to mods?
posted by filthy light thief at 11:54 AM on August 5, 2013

Question: will this only query the "final" text we've entered, or will it also look at edited text that is only visible to mods?

Ha, good question. I'm pretty sure it'll be the final canonical form of the text based on my understanding of how pb engineered the editing feature to behave on the backend, but I hadn't actually looked at that explicitly.
posted by cortex (staff) at 11:56 AM on August 5, 2013

Yes, please.
posted by Celsius1414 at 11:57 AM on August 5, 2013

Also, will this be metannounced or do we keep an eye on this thread?
posted by Celsius1414 at 11:58 AM on August 5, 2013

yay, word table time! I'm in.
posted by kagredon at 11:59 AM on August 5, 2013

Also, will this be metannounced or do we keep an eye on this thread?

Just keep an eye on the thread, I'll post updates periodically.
posted by cortex (staff) at 12:01 PM on August 5, 2013

Me please!
posted by a snickering nuthatch at 12:03 PM on August 5, 2013

I'd like to know that information about myself. I have my reasons, and only one of them involves wanting to make a slightly larger amount of work for Cortex.
posted by Gygesringtone at 12:05 PM on August 5, 2013 [2 favorites]

Sure, why not, all righty, then, yes, please.
posted by Atreides at 12:12 PM on August 5, 2013

yes please.
posted by MartinWisse at 12:13 PM on August 5, 2013

Count me in. Damn, three more. Damn!
posted by Errant at 12:15 PM on August 5, 2013 [2 favorites]

Oh - yes, please.
posted by needlegrrl at 12:18 PM on August 5, 2013

I may regret this, but yes.
posted by corb at 12:18 PM on August 5, 2013

posted by theodolite at 12:22 PM on August 5, 2013

Si, por favor
posted by Doleful Creature at 12:22 PM on August 5, 2013

Ooh, me please!
posted by threeants at 12:25 PM on August 5, 2013

Me pleeeese!
posted by Salvor Hardin at 12:25 PM on August 5, 2013

Why not? Sure.
posted by pjern at 12:33 PM on August 5, 2013

Yes, please.
posted by Chrysostom at 12:35 PM on August 5, 2013

Yes, but the answer is three words and a blartish syllable that was not Quonsar's fault.
posted by clavdivs at 12:40 PM on August 5, 2013

Me too, please!
posted by Room 641-A at 12:41 PM on August 5, 2013

Yes indeed. Please.
posted by JohnnyGunn at 12:41 PM on August 5, 2013

please sir thank you sir
posted by billiebee at 12:41 PM on August 5, 2013

Me please, I guess.
posted by hoyland at 12:42 PM on August 5, 2013

yes please.
posted by sweetkid at 12:43 PM on August 5, 2013

haven't been around as much so i think my list will probably be boring, but sure.
posted by ifjuly at 12:43 PM on August 5, 2013

Yes please.
posted by googly at 12:44 PM on August 5, 2013

That would be great! Thanks, Cortex!
posted by carmicha at 12:46 PM on August 5, 2013

I would like to subscribe to your newsletter.
posted by moonmilk at 12:49 PM on August 5, 2013

Yes please, thank you.
posted by ramix at 12:50 PM on August 5, 2013

Me, also.
posted by ocherdraco at 12:53 PM on August 5, 2013

Sure why not.
posted by jph at 12:54 PM on August 5, 2013

yes, thx
posted by fuse theorem at 12:56 PM on August 5, 2013

Hit me, cortex!
posted by en forme de poire at 12:59 PM on August 5, 2013

Sure, however meager it may be
posted by holmesian at 1:03 PM on August 5, 2013

Is this going to involve the amazing "words used frequently by you but less frequently by others"? because that is secretly my chief interest here
posted by threeants at 1:11 PM on August 5, 2013 [1 favorite]

This sounds fascinatingly interesting (in a "this-could-be-terrifying" kind of way). Yes, please.
posted by QuantumMeruit at 1:12 PM on August 5, 2013

heck yes?
posted by lizjohn at 1:14 PM on August 5, 2013

posted by timsteil at 1:15 PM on August 5, 2013

Please and thank you.
posted by thinkpiece at 1:17 PM on August 5, 2013

posted by lalochezia at 1:27 PM on August 5, 2013

Decisive me would be embarrassed at the frequency of 'perhaps', 'maybe', 'possibly'.
posted by Cranberry at 1:29 PM on August 5, 2013

Yes, please!
posted by EvaDestruction at 1:31 PM on August 5, 2013

Yes pls
posted by scrump at 1:32 PM on August 5, 2013

Yes, me too.
posted by box at 1:35 PM on August 5, 2013

Morbid curiosity says "yes"
posted by ook at 1:37 PM on August 5, 2013

Yes, please.
posted by Diablevert at 1:38 PM on August 5, 2013

yes please yes please

I feel certain I will regret this but can't figure out why
posted by Elsa at 1:39 PM on August 5, 2013

Alright, here's the first batch, from the top of the thread, in order of request, on through QuantumMeruit. See my comment up top for the url to find your frequency table.

If you think you should be here but aren't, either your yes wasn't unambiguous enough or I typoed your userid or I just goofed. In any case, speak up and I'll get you in the next run.

I'll continue to run batches periodically, at least one more today for sure and then at least daily after that while there's still interest, so don't worry about having missed the boat.
user 49346:	682849 words,	35125 unique, in	14756 comments.
user 7418:	2478313 words,	74886 unique, in	36193 comments.
user 39488:	136041 words,	14044 unique, in	4701 comments.
user 22627:	344871 words,	24432 unique, in	6161 comments.
user 24139:	429785 words,	22864 unique, in	5373 comments.
user 28936:	362978 words,	20315 unique, in	5123 comments.
user 90614:	130574 words,	9587 unique, in	1671 comments.
user 40395:	115065 words,	18573 unique, in	2069 comments.
user 18811:	205677 words,	20957 unique, in	3122 comments.
user 36852:	424767 words,	30387 unique, in	7349 comments.
user 147513:	8942 words,	2241 unique, in	180 comments.
user 71074:	348077 words,	25225 unique, in	11233 comments.
user 10995:	75495 words,	9028 unique, in	2450 comments.
user 71095:	31350 words,	5046 unique, in	537 comments.
user 137499:	180180 words,	11806 unique, in	1653 comments.
user 56010:	33274 words,	5914 unique, in	751 comments.
user 20842:	270190 words,	21051 unique, in	3187 comments.
user 14474:	162144 words,	12172 unique, in	2461 comments.
user 292:	3320691 words,	62559 unique, in	33445 comments.
user 63377:	38812 words,	6755 unique, in	970 comments.
user 29475:	1237979 words,	40206 unique, in	26242 comments.
user 100776:	403027 words,	23779 unique, in	3973 comments.
user 63307:	670266 words,	25710 unique, in	5281 comments.
user 18312:	1258882 words,	46519 unique, in	15382 comments.
user 17588:	293258 words,	22650 unique, in	6570 comments.
user 43189:	1138191 words,	39343 unique, in	16293 comments.
user 114702:	207322 words,	14838 unique, in	1528 comments.
user 23431:	162644 words,	15695 unique, in	2714 comments.
user 74248:	520722 words,	25171 unique, in	7109 comments.
user 14752:	1773783 words,	67395 unique, in	22952 comments.
user 15688:	494107 words,	22693 unique, in	6962 comments.
user 23588:	54878 words,	8920 unique, in	1910 comments.
user 55318:	164391 words,	13256 unique, in	2245 comments.
user 82956:	36482 words,	6048 unique, in	567 comments.
user 152546:	22175 words,	4219 unique, in	255 comments.
user 83298:	363579 words,	19299 unique, in	6047 comments.
user 19620:	302301 words,	21053 unique, in	4295 comments.
user 54985:	192707 words,	19685 unique, in	4988 comments.
user 85129:	217872 words,	18404 unique, in	3894 comments.
user 19344:	2511459 words,	59803 unique, in	18566 comments.
user 56070:	78355 words,	8405 unique, in	595 comments.
user 98744:	81227 words,	11474 unique, in	1662 comments.
user 96173:	237636 words,	16075 unique, in	2384 comments.
user 55262:	384191 words,	22099 unique, in	3409 comments.
user 159653:	23741 words,	4172 unique, in	427 comments.
user 21202:	98303 words,	11514 unique, in	1311 comments.
user 24132:	360150 words,	19867 unique, in	3724 comments.
user 53581:	853080 words,	33912 unique, in	6923 comments.
user 17922:	58905 words,	7243 unique, in	872 comments.
user 111601:	504133 words,	19754 unique, in	6337 comments.
user 90947:	528980 words,	28017 unique, in	4366 comments.
user 57144:	59872 words,	8738 unique, in	809 comments.
user 88408:	160496 words,	15924 unique, in	4225 comments.
user 81610:	940365 words,	45572 unique, in	11547 comments.
user 112576:	60364 words,	10330 unique, in	1437 comments.
user 117509:	100357 words,	11547 unique, in	1143 comments.
user 63452:	116474 words,	12053 unique, in	1511 comments.
user 139650:	87675 words,	8834 unique, in	744 comments.
user 26998:	281943 words,	19289 unique, in	3142 comments.
user 141010:	232110 words,	19416 unique, in	3277 comments.
user 25909:	229701 words,	16814 unique, in	1597 comments.
user 128789:	13959 words,	2526 unique, in	157 comments.
user 128858:	427990 words,	19173 unique, in	2665 comments.
user 113106:	61057 words,	10290 unique, in	1586 comments.
user 135183:	123990 words,	13412 unique, in	1500 comments.
user 99046:	88210 words,	11521 unique, in	1711 comments.
user 81495:	130597 words,	14239 unique, in	2299 comments.
user 58356:	75400 words,	9119 unique, in	1562 comments.
user 19109:	65661 words,	9541 unique, in	1413 comments.
user 10825:	67819 words,	10735 unique, in	2164 comments.
user 6915:	452950 words,	34656 unique, in	9924 comments.
user 78625:	91386 words,	11379 unique, in	1398 comments.
user 37485:	415914 words,	18544 unique, in	5700 comments.
user 172111:	36214 words,	4977 unique, in	421 comments.
user 114722:	356214 words,	19637 unique, in	3093 comments.
user 38780:	305834 words,	16851 unique, in	6261 comments.
user 17220:	385035 words,	25481 unique, in	2934 comments.
user 19438:	209439 words,	18826 unique, in	2657 comments.
user 64979:	97142 words,	12354 unique, in	1539 comments.
user 21267:	36642 words,	6766 unique, in	1042 comments.
user 17958:	11233 words,	3034 unique, in	288 comments.
user 63166:	228507 words,	18842 unique, in	4891 comments.
user 115240:	120382 words,	11870 unique, in	916 comments.
user 46708:	79699 words,	10110 unique, in	1071 comments.
user 88591:	154150 words,	15454 unique, in	2193 comments.
user 169597:	614 words,	315 unique, in	26 comments.
user 63583:	70524 words,	8497 unique, in	430 comments.
posted by cortex (staff) at 2:01 PM on August 5, 2013 [13 favorites]
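
If you want to play with the batch above, the lines are regular enough to parse. Here is a minimal Python sketch, assuming each line keeps the "user N: W words, U unique, in C comments." shape shown; the `parse_line` name and the derived words-per-comment figure are additions for illustration, not something the batch output includes.

```python
import re

# Matches lines such as "user 49346:\t682849 words,\t35125 unique, in\t14756 comments."
LINE_RE = re.compile(
    r"user\s+(\d+):\s+(\d+)\s+words,\s+(\d+)\s+unique,\s+in\s+(\d+)\s+comments\."
)


def parse_line(line):
    """Return a dict of the four numbers plus words per comment, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    user_id, words, unique, comments = map(int, match.groups())
    return {
        "user": user_id,
        "words": words,
        "unique": unique,
        "comments": comments,
        # Guard against a zero comment count rather than dividing by zero.
        "words_per_comment": words / comments if comments else 0.0,
    }


# Two lines copied from the batch above.
sample = [
    "user 49346:\t682849 words,\t35125 unique, in\t14756 comments.",
    "user 169597:\t614 words,\t315 unique, in\t26 comments.",
]
rows = [parse_line(line) for line in sample]
for row in rows:
    print(row["user"], round(row["words_per_comment"], 1))
```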

Yes yes. Yes. Yes
posted by invitapriore at 2:07 PM on August 5, 2013

Count me in. Yes. Me, please. I'd like this just fine.
posted by Rustic Etruscan at 2:09 PM on August 5, 2013

For comparison: list of longest novels
posted by showbiz_liz at 2:10 PM on August 5, 2013

I'm at 30.959% of a Les Miserable
posted by showbiz_liz at 2:12 PM on August 5, 2013

posted by adrianhon at 2:12 PM on August 5, 2013

I would like mine, please! And as anyone who has paid any attention to any of my words knows, I have no sense of privacy so you are welcome to post my word frequency thingy publicly.
posted by DarlingBri at 2:15 PM on August 5, 2013

I'm at 0.30959% of a Les Miserable

Chin up, you're actually 30.959% of one! Keep going!
posted by theodolite at 2:15 PM on August 5, 2013

> user 14752: 1773783 words, 67395 unique, in 22952 comments.

I'm up to all of Proust plus War and Peace, baby!
posted by languagehat at 2:16 PM on August 5, 2013 [3 favorites]

Is there a way to do that "words you use more than anyone else" thing?
posted by showbiz_liz at 2:16 PM on August 5, 2013

I wrote 17% less words this year than last. I wasn't here for three months, or a solid 25% of the year.

Something something abyss stares back.
posted by griphus at 2:16 PM on August 5, 2013 [1 favorite]

Yes, please and unambiguous thank you.
posted by infinite intimation at 2:21 PM on August 5, 2013

Shit, I more than doubled my total word count since last year's word frequency survey. 44642 total words then, 100357 words now.
posted by kagredon at 2:22 PM on August 5, 2013

Is there a way to do that "words you use more than anyone else" thing?

It might happen, but it's more work and I'm volunteering neither benito.strauss (who did that last time) nor myself for it up front here. We'll see how busy of a week it turns out to be!
posted by cortex (staff) at 2:22 PM on August 5, 2013

Count me in, please.
posted by colfax at 2:23 PM on August 5, 2013

What's that I see projected against the clouds in the night sky? Why, it's the image of some other, different cloud. Looks like someone is summoning Cloud Man! (Actually, griphus sent me a memail.)

I'd be glad to generate people's clouds again. (I'd also be glad to hand over the script to someone else, if they're interested.) I'm behind a slower internet connection right now, so it won't be as quick as last year.

To get started I need two things, probably from cortex:

1) What's the URL pattern again? Last year was:
my $user_url = 
	sprintf "http://stuff.metafilter.com/corpus/freq/temp" . 
		"/%s--1-gram--allsites--1999-01-01--2013-01-01.txt", $user_id;
2) I need a site-wide baseline set of data. Last year I used
(I don't immediately remember the URL I got that from.)
Because the individual files didn't cover the same time period as the site-wide file there was some issue with some indiv. words not being in the site-wide list. The script can handle this, but maybe it would be worth creating a new site-wide file for the exact same period as the individual files?

I keep a local copy of the site-wide file so I won't waste your bandwidth, and the file I used last year had about 337,000 lines (distinct words) and ran fast enough, so don't bother limiting it to words appearing ten or more times.

When I get this sorted out (2-3 days?), I'll post in here again and open up for cloud requests

(Cloud Man disappears into the night, leaving a few ppm of vowels swirling behind him in the night air, and returns to his lair above the "Constructed Languages" shelves in the Reference section of the Boston Public Library.)
posted by benito.strauss at 2:24 PM on August 5, 2013 [19 favorites]
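
One way the "words used frequently by you but less frequently by others" comparison could work, including the case benito.strauss mentions where a user's word is missing from the site-wide list, is sketched below in Python. This is not his script: the scoring formula (count times the log of the rate ratio), the add-one smoothing, the `distinctive_words` name, and the sample counts are all assumptions.

```python
import math


def distinctive_words(user_counts, site_counts, top=10, smoothing=1.0):
    """Rank a user's words by how much more often they use them than the site does.

    user_counts and site_counts map word -> raw count. Words absent from the
    site-wide table get a small smoothed rate instead of causing a division by zero.
    """
    user_total = sum(user_counts.values())
    site_total = sum(site_counts.values())
    scored = []
    for word, n in user_counts.items():
        user_rate = n / user_total
        site_rate = (site_counts.get(word, 0) + smoothing) / (site_total + smoothing)
        # Weight the log ratio by the count so one-off typos do not dominate.
        scored.append((n * math.log(user_rate / site_rate), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]


# Made-up counts: "redtail" is deliberately missing from the site-wide table.
user = {"the": 50, "birding": 5, "redtail": 2}
site = {"the": 70000, "birding": 10, "and": 10000}
print(distinctive_words(user, site, top=2))
```

Regenerating the site-wide table for the same date range as the individual files, as suggested above, would shrink the number of words that fall back on the smoothed rate.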

yes please!
posted by iamkimiam at 2:26 PM on August 5, 2013

Cool, I've used the word "seven" seven times. AND NOW IT'S RUINED.
posted by theodolite at 2:32 PM on August 5, 2013 [1 favorite]

Cloud Man! Thank god you're here!

1. The url structure has snuck into my comment at the top of the thread, it's basically the same except 2013 is now 2014. Easy peasey. Peasy?

2. A more recent baseline dataset for the whole site is available on the Frequency Tables page; "allsites--1999-01-01--2013-01-01.txt.zip" would be the most thorough canonical one if you can handle its great girth.

Though just for thoroughness's sake I'll try running one off that's up to today to minimize as much as possible the no-show nature of unlikely words. And if you want a tailored file for one reason or another, just let me know.
posted by cortex (staff) at 2:35 PM on August 5, 2013

Yes please, most unambiguously!
posted by Mister_A at 2:38 PM on August 5, 2013

Was I ambiguous? I intended to be unambiguous, but I forgot. YES PLEASE, UNAMBIGUOUS PERMISSIONS AND SUCH.
posted by Elsa at 2:45 PM on August 5, 2013

My words too please, yes. I've been a bit chattier recently!
posted by h00py at 2:48 PM on August 5, 2013

Yes, you clouded me the last time, and I would like to be clouded again! (I bet I still say birding, yorvit, binos, metas, menlo, headlands, peregrines, crissy, gay-married, starlings, frjtz, unvaccinated, redtails, birders, annulled, distilleries, couple-three, reacher, redtail, and recs waaaaaay more than anyone else here, but we shall see.)
posted by rtha at 2:52 PM on August 5, 2013 [2 favorites]

Yes, count me in, so to speak. Thx.
posted by Michele in California at 2:52 PM on August 5, 2013

Yes. Tell me. Tell me now.
posted by IvoShandor at 2:52 PM on August 5, 2013

Me too, please.
posted by notyou at 3:04 PM on August 5, 2013

If clouding is offered as an add-on or premium service, let this post stand as my order! I shall pay any sum you demand, for I am a wealthy racist football player Formula I driver/astronaut/veterinarian.
posted by Mister_A at 3:13 PM on August 5, 2013

Yes, please!
posted by ambrosia at 3:25 PM on August 5, 2013

I would like this, please.

I'm pretty sure that my word count is less than The Whelk's
posted by double block and bleed at 3:32 PM on August 5, 2013 [1 favorite]

Me too, please! Thanks!!!
posted by BibiRose at 3:37 PM on August 5, 2013

yes me also please thank you!
posted by brainmouse at 3:39 PM on August 5, 2013

Well, I am not quite up to some of these on the longest novel list, but, hey, I did use the word "fonz" once.
posted by JohnnyGunn at 4:00 PM on August 5, 2013 [1 favorite]

Also me please!
posted by windykites at 4:00 PM on August 5, 2013

I would like a count of all the words I haven't typed on MetaFilter.
posted by Eideteker at 4:16 PM on August 5, 2013

Yes, please. It's fun/terrifying to see.
posted by Ghidorah at 4:26 PM on August 5, 2013

115 1258.39844177445 romney
54 590.900137876699 obama
53 579.957542730834 santa

I will never forget you, epic election threads! Wait.

Actually, my list is kind of boring.

54 590.900137876699 actually
posted by Room 641-A at 5:12 PM on August 5, 2013

Okay, so once we get rid of the boring words that I use a lot, the top four are 'people' with 1909, 'school' with 470, 'trans' with 427 (quelle surprise, I guess we know what threads I'm in), and 'BLANK' with 422, which I'm assuming is something other than 'BLANK' (carriage returns?), since I only used 'foo' three times.
posted by hoyland at 5:19 PM on August 5, 2013

I'd like that.
posted by Margalo Epps at 5:20 PM on August 5, 2013

Oh, it's also kind of amusing to see what the non-words are and guess what they came from. I've got 'ix' and 'j' each three times. And 'sd' three times, but that was almost certainly süd.
posted by hoyland at 5:22 PM on August 5, 2013

Oh, man, BLANK. BLANK is a bug I may have fixed in a later revision of the code this was branched off for metatalking purposes.

Ignore it, there are no valid all-caps tokens, it's just a placeholder for "this string turned out to be empty after we cleaned out punctuation and such but we forgot to actually throw it away before counting".
posted by cortex (staff) at 5:32 PM on August 5, 2013
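
The BLANK behavior cortex describes above comes down to where the empty-string check sits. Here is a minimal Python sketch of the fixed ordering, assuming a simplified cleaner that strips all ASCII punctuation; the `clean_tokens` name and the cleaning rule are hypothetical, not the site's code.

```python
import string
from collections import Counter

# Simplification: delete every ASCII punctuation character, apostrophes included.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tokens(text):
    """Lowercase, split on whitespace, strip punctuation, and drop empty leftovers."""
    tokens = []
    for raw in text.lower().split():
        token = raw.translate(PUNCT_TABLE)
        if token:  # a string like "--" or "..." cleans down to "", so skip it here
            tokens.append(token)
    return tokens


counts = Counter(clean_tokens("Yes -- please ... yes!"))
print(counts)
```

Substituting a placeholder for the empty string and counting it anyway is what would put a BLANK row in the table; discarding it before the count avoids that.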

Oooh do me! Do me!

And the wordcount thing too.

Please and thank you
posted by xqwzts at 5:41 PM on August 5, 2013

Oh yes please, I'd like to be in on this!
posted by mosessis at 5:54 PM on August 5, 2013

Yes, sure, why not?
posted by His thoughts were red thoughts at 6:20 PM on August 5, 2013

Although I have a rough count of the number of words I've typed on the site because I always draft my missives in Microsoft Word and use the "word count" feature, I would like to compare that number to this more official counting.*

So, yes, please.

*- I don't actually do this.
posted by elmer benson at 6:30 PM on August 5, 2013

Yes. Me too.

Um, how do I find my user number?
I tried. Can't.
I may be picking it up and looking under it.
posted by mule98J at 6:31 PM on August 5, 2013 [1 favorite]

It's in the URL of your profile: 149961.
posted by hoyland at 6:32 PM on August 5, 2013 [1 favorite]

My beverage fridge is here! I mean, it's not super fancy, it doesn't have a glass door or anything but it was like half the price of a bar fridge. It has dedicated slots for cans and a two liter.

Imma make a baby bonbel and Arizona run to stock it, anyone want anything from the store?
posted by Ad hominem at 6:34 PM on August 5, 2013 [2 favorites]

Ah. I can put it back down now.

Thanks, hoyland.
posted by mule98J at 6:39 PM on August 5, 2013

Yes, I want one too!
posted by wildcrdj at 6:45 PM on August 5, 2013

Yea, verily, please.
posted by beagle at 6:57 PM on August 5, 2013

Sure me too.

I'm pretty close to 10k comments. I bet I've posted at least that many words.
posted by Ad hominem at 7:02 PM on August 5, 2013

Yep, let's hear it.
posted by Marisa Stole the Precious Thing at 7:05 PM on August 5, 2013

I want to know. I predict that the bigram "in fairness" will be somewhere near the top once all the stop words have been dropped.
posted by Going To Maine at 7:10 PM on August 5, 2013

Yes please!
posted by crossoverman at 7:20 PM on August 5, 2013

Now that I've written more words than In Search of Lost Time, I'd like people to refer to my comment history as "Proustian".
posted by empath at 7:27 PM on August 5, 2013 [2 favorites]

I'm a bit scared but I must know
posted by NoraReed at 7:38 PM on August 5, 2013

Ja, bitte.
posted by brina at 8:03 PM on August 5, 2013

Come and behold them, born of countless FPPs....
posted by koucha at 8:13 PM on August 5, 2013

I would like a pink rideable cloud like Monkey, please.
posted by arcticseal at 8:31 PM on August 5, 2013

OK, so...

2013: 1258882 words, 46519 unique, in 15382 comments.
2012: 1007000 words, 41575 unique, in 12463 comments.
2011: 690117 words, 34674 unique, in 8654 comments.

In the last year I have posted: 251,882 words, 4944 unique, in 2919 comments.
Between 2011 and 2012 I posted: 316,883 words, 6901 unique, in 3809 comments.

I'm slacking off.

If it counted posts and not just comments, I'd be closer to 2 million words by now. :)
posted by zarq at 8:35 PM on August 5, 2013
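
The year-over-year arithmetic zarq does above is easy to script if you saved your numbers from previous threads. A small Python sketch, using the three snapshots quoted in that comment; the `deltas` helper is a made-up name.

```python
# (total words, unique words, comments) per yearly snapshot, from the comment above.
snapshots = {
    2011: (690117, 34674, 8654),
    2012: (1007000, 41575, 12463),
    2013: (1258882, 46519, 15382),
}


def deltas(snapshots):
    """Return {(earlier_year, later_year): element-wise differences} for consecutive snapshots."""
    years = sorted(snapshots)
    result = {}
    for prev, cur in zip(years, years[1:]):
        result[(prev, cur)] = tuple(
            later - earlier for earlier, later in zip(snapshots[prev], snapshots[cur])
        )
    return result


for (prev, cur), (words, unique, comments) in deltas(snapshots).items():
    print(f"{prev}-{cur}: {words:,} words, {unique} unique, in {comments} comments")
```

One caveat: the difference in unique counts is the growth in overall vocabulary, meaning words never used before, not the number of distinct words used during that year.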

yes please and thank you ;-)
posted by madamjujujive at 8:40 PM on August 5, 2013

I predict that the bigram "in fairness" will be somewhere near the top once all the stop words have been dropped.

Alas, the default frequency tables are 1-grams, just to keep things sane. I have the ability to generate bigrams and in theory n-grams for arbitrary n, though arbitrary becomes pretty problematic once n gets bigger than 3 or 4 for anyone with a significant comment load because of my suboptimal approach.

So, long story short, if you specifically want to see bigrams or trigrams write me a mefimail about it with a 25-word essay on why and I can do it as a one-off.
posted by cortex (staff) at 8:42 PM on August 5, 2013
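
For reference, counting bigrams or general n-grams, as cortex mentions above, can be sketched in a few lines of Python. This is a generic illustration, not his approach: the `ngrams` and `ngram_counts` names are hypothetical, tokens are just lowercased whitespace splits, and n-grams are not allowed to span two comments.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # zip stops at the shortest slice, so inputs shorter than n yield nothing.
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(comments, n=2):
    """Count n-grams within each comment, never across comment boundaries."""
    counts = Counter()
    for comment in comments:
        counts.update(ngrams(comment.lower().split(), n))
    return counts


# Made-up sample comments.
bigram_counts = ngram_counts(["in fairness it was fine", "In fairness no"], n=2)
print(bigram_counts.most_common(1))
```

The table of distinct n-grams tends to grow quickly as n increases, since longer runs repeat far less often than single words, so memory use climbs even with a simple counter like this one.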

Also, I'm going to make a liar of myself here and punt the second run until tomorrow morning sometime because I've got beer to drink and the rest of Peter Capaldi's In The Loop to finish on Netflix.
posted by cortex (staff) at 8:43 PM on August 5, 2013 [1 favorite]

I'd like a word count, but I'd also like a cloud, if possible. Thanks!
posted by Rustic Etruscan at 8:44 PM on August 5, 2013

So, long story short, if you specifically want to see bigrams or trigrams write me a mefimail about it with a 25-word essay on why and I can do it as a one-off.

In my case, "fairness" would be a pretty good approximation of "in fairness" (until now), so not really a problem to worry about.
posted by Going To Maine at 8:58 PM on August 5, 2013

Yes please cortex! How nifty.
posted by cairdeas at 9:05 PM on August 5, 2013

Yes, please, too, thanks.
posted by barnacles at 9:17 PM on August 5, 2013

posted by vegartanipla at 9:30 PM on August 5, 2013

Cortex and benito.strauss: May I please? Thank you.
posted by gingerest at 9:41 PM on August 5, 2013

Yes, unambiguously yes.
posted by plinth at 9:50 PM on August 5, 2013

Thank you sir may I please haz?
posted by Lynsey at 10:32 PM on August 5, 2013

Me too please! Thanks so much for doing this!
posted by marsha56 at 10:46 PM on August 5, 2013

Me, please! Thank you!
posted by MeghanC at 11:01 PM on August 5, 2013

Me! Thanks!
posted by Autumn at 11:09 PM on August 5, 2013

I'd like one as well.
posted by Dr Dracator at 11:10 PM on August 5, 2013

Me too yes please.

This may be the kerf that leads to parsiloquency.

This thread gets counted, too, right?
posted by Doroteo Arango II at 11:27 PM on August 5, 2013

Yes please, o corticular* one and Cloud Man! I would like this very much.

*Corteculated? Cortextual?
posted by daisyk at 11:35 PM on August 5, 2013


Now there's an "alternative lifestyle" I haven't heard of yet... wonder if they have conventions...
posted by cairdeas at 12:00 AM on August 6, 2013

Yes, go on then
posted by Cannon Fodder at 12:04 AM on August 6, 2013

I'd like this!

I hope it's prime.
posted by 23 at 12:25 AM on August 6, 2013 [1 favorite]

Please, thanks! I'd like to see how erudite and verbose I am.

Oh yes, me too!

Oh hang on

Please, thanks! I'd like to see how erudite and verbose I am much I say naughty words.
posted by louche mustachio at 2:26 AM on August 6, 2013

Yes please!
posted by ellieBOA at 2:41 AM on August 6, 2013

How do we get our userids?
posted by corb at 2:45 AM on August 6, 2013

corb just click on your name and then look at the URL when you arrive at your profile page.
posted by cairdeas at 2:48 AM on August 6, 2013

(Nevermind, I figured it out! Hover over your name, anyone who was also puzzled.)
posted by corb at 2:49 AM on August 6, 2013

I would be curious.
posted by HuronBob at 2:56 AM on August 6, 2013

yes I too want to put my head in my hands whilst moaning 'oh man, I could have written another novel' or worse 'novels'
posted by fearfulsymmetry at 4:49 AM on August 6, 2013 [1 favorite]

Sure, why not?
posted by eriko at 4:56 AM on August 6, 2013

ooh ooh i wanna cloud all puffy and white and all for me
posted by dismas at 4:57 AM on August 6, 2013

I would like a cloud, please.
posted by Madamina at 5:32 AM on August 6, 2013

Include me in, please!
posted by Rosie M. Banks at 5:53 AM on August 6, 2013

benito.strauss, if you would do mine as well I'd really appreciate it. Thank you!
posted by gauche at 5:54 AM on August 6, 2013

cortex: the rest of Peter Capaldi's In The Loop to finish on Netflix.

I would love to see Malcolm Tucker's word cloud.
posted by Rock Steady at 5:55 AM on August 6, 2013

You won't be able to see it, it's a bus-sized fuckberg in the sewers.
posted by Namlit at 5:58 AM on August 6, 2013 [3 favorites]

Oh crap. 217872 words. I must be kidding me: where is that book I'm supposed to be writing.
posted by Namlit at 6:03 AM on August 6, 2013

I'll add my approval for benito.strauss to do my analysis now. Thanks!
posted by burnmp3s at 6:11 AM on August 6, 2013

Also, I'm going to make a liar of myself here and punt the second run until tomorrow morning sometime because I've got beer to drink and the rest of Peter Capaldi's In The Loop to finish on Netflix.

... by which you mean Hellraiser: Revelations?
posted by Going To Maine at 6:16 AM on August 6, 2013

Yes please!
posted by booksherpa at 6:57 AM on August 6, 2013

Yes, please
posted by Optamystic at 7:25 AM on August 6, 2013

benito.strauss can I also get one o them things
posted by showbiz_liz at 7:28 AM on August 6, 2013

Me, too! Yes please!
posted by FirstMateKate at 7:28 AM on August 6, 2013

That would be wonderful, benito.strauss! I'll take one please.
posted by troika at 7:50 AM on August 6, 2013

benito please cloud me!! hurray
posted by en forme de poire at 7:52 AM on August 6, 2013

I'd take a cloud, benito. Thanks!
posted by Chrysostom at 7:56 AM on August 6, 2013

Oh, I'd love a cloud too!
posted by Autumn at 7:58 AM on August 6, 2013

Cloud me, too, please, benito!
posted by EvaDestruction at 7:58 AM on August 6, 2013

I would adore a cloud as well and thank you so very much!
posted by Miko at 8:02 AM on August 6, 2013

benito.strauss, please add me to your cloud list. Thank you!
posted by MonkeyToes at 8:04 AM on August 6, 2013

Yeah, I'd really like a cloud of my very own as well.
posted by Gygesringtone at 8:05 AM on August 6, 2013

I don't know what y'all mean by cloud but I need one! Thank you kindly.
posted by billiebee at 8:06 AM on August 6, 2013

Average novel: 80 - 100k words

sweetkid's total: 305,834 words.

(debating the temporary account disable so I can get on to doing something with my life).
posted by sweetkid at 8:07 AM on August 6, 2013

Alright, here's the second run, from lizjohn through FirstMateKate. See my comment at the top of the thread for the url to plug your userid into to get your word frequency table.

I'll continue to run these at least daily for a while, so if you're getting here late or I missed or typoed you somehow, just speak up and I'll get you on the next run.
user 33334:	19338 words,	3800 unique, in	248 comments.
user 16328:	113496 words,	12031 unique, in	1913 comments.
user 17698:	122337 words,	13402 unique, in	2175 comments.
user 27547:	101160 words,	16502 unique, in	2305 comments.
user 17994:	126786 words,	15591 unique, in	4857 comments.
user 94911:	99089 words,	11066 unique, in	1293 comments.
user 25030:	110935 words,	12277 unique, in	1353 comments.
user 17573:	309362 words,	26159 unique, in	11169 comments.
user 4448:	481715 words,	25731 unique, in	5665 comments.
user 92928:	360289 words,	24608 unique, in	2412 comments.
user 24675:	226251 words,	18417 unique, in	1460 comments.
user 18541:	186164 words,	16890 unique, in	2094 comments.
user 148248:	82137 words,	11671 unique, in	2260 comments.
user 482:	94909 words,	9757 unique, in	1105 comments.
user 51358:	803902 words,	30126 unique, in	8693 comments.
user 96984:	289240 words,	23734 unique, in	1020 comments.
user 101169:	100400 words,	9058 unique, in	649 comments.
user 48758:	508972 words,	26228 unique, in	6477 comments.
user 40584:	367335 words,	29302 unique, in	13073 comments.
user 52224:	88209 words,	9674 unique, in	2040 comments.
user 153873:	265319 words,	13720 unique, in	1569 comments.
user 77058:	96551 words,	11027 unique, in	1769 comments.
user 26577:	129978 words,	16398 unique, in	2419 comments.
user 17499:	225951 words,	16388 unique, in	2544 comments.
user 74856:	85452 words,	11472 unique, in	1378 comments.
user 93788:	144862 words,	10470 unique, in	1425 comments.
user 94580:	141603 words,	10523 unique, in	2207 comments.
user 152152:	142264 words,	11465 unique, in	1579 comments.
user 77383:	489407 words,	24538 unique, in	4334 comments.
user 36397:	56427 words,	6482 unique, in	660 comments.
user 89795:	14584 words,	3687 unique, in	226 comments.
user 20601:	51458 words,	6156 unique, in	533 comments.
user 112687:	181735 words,	16893 unique, in	3247 comments.
user 147329:	6818 words,	2063 unique, in	98 comments.
user 51112:	22260 words,	4353 unique, in	362 comments.
user 149961:	172093 words,	15928 unique, in	1207 comments.
user 11891:	206511 words,	13727 unique, in	2239 comments.
user 7961:	238335 words,	19913 unique, in	3014 comments.
user 118269:	504269 words,	29460 unique, in	8798 comments.
user 71801:	657901 words,	34185 unique, in	9236 comments.
user 170113:	31695 words,	5532 unique, in	492 comments.
user 57973:	100963 words,	10273 unique, in	1539 comments.
user 84904:	112397 words,	11633 unique, in	1696 comments.
user 89363:	1129437 words,	40353 unique, in	14768 comments.
user 16148:	169892 words,	13843 unique, in	1380 comments.
user 139906:	5619 words,	1714 unique, in	131 comments.
user 15971:	320167 words,	25198 unique, in	5453 comments.
user 140365:	369632 words,	15169 unique, in	2045 comments.
user 32016:	97308 words,	12008 unique, in	1418 comments.
user 144659:	53883 words,	7173 unique, in	550 comments.
user 118218:	128257 words,	15290 unique, in	1380 comments.
user 549:	341446 words,	21857 unique, in	3869 comments.
user 774:	13376 words,	3075 unique, in	278 comments.
user 17151:	92419 words,	11811 unique, in	1513 comments.
user 87841:	140484 words,	11217 unique, in	696 comments.
user 106897:	17289 words,	2550 unique, in	229 comments.
user 91565:	96561 words,	11323 unique, in	1740 comments.
user 162142:	11616 words,	2817 unique, in	139 comments.
user 72963:	20330 words,	4017 unique, in	291 comments.
user 176654:	17015 words,	3249 unique, in	111 comments.
user 151915:	29314 words,	5359 unique, in	395 comments.
user 54479:	205070 words,	18562 unique, in	4169 comments.
user 100132:	13846 words,	3410 unique, in	406 comments.
user 17601:	390810 words,	19155 unique, in	9171 comments.
user 59648:	224736 words,	19782 unique, in	6310 comments.
user 12684:	794875 words,	37734 unique, in	5714 comments.
user 49455:	210661 words,	15098 unique, in	1297 comments.
user 39474:	64011 words,	8004 unique, in	692 comments.
user 1260:	83491 words,	11724 unique, in	2420 comments.
user 140431:	39977 words,	5888 unique, in	472 comments.
posted by cortex (staff) at 8:08 AM on August 6, 2013 [2 favorites]

657,901 words, 34,185 unique, in 9,236 comments.

Holy crap. I was thinking I'd be somewhere around 2 or 3k.

Also very surprised I have used "zeppelin-like" and "zeppelin-sized" on separate occasions. What's the difference between the two, I wonder?
posted by Marisa Stole the Precious Thing at 8:30 AM on August 6, 2013

And only mentioned "anime" 112 times? Wow. That I did not expect.
posted by Marisa Stole the Precious Thing at 8:32 AM on August 6, 2013

Benito: I would also like to leverage your cloud technology to drive new paradigms. Thank you!
posted by double block and bleed at 8:34 AM on August 6, 2013

A football is zeppelin-like but not zeppelin-sized; the Space Needle is zeppelin-sized but not zeppelin-like.
posted by cortex (staff) at 8:34 AM on August 6, 2013 [1 favorite]

Also very surprised I have used "zeppelin-like" and "zeppelin-sized" on separate occasions. What's the difference between the two, I wonder?

Well zeppelin-like would probably be referring to the shape, such as a cylindrical nicotine inhaler, whereas zeppelin-sized would be referring to something quite large, such as the size of Michael Jackson's ego when Weird Al asked to do a parody of Beat It. Just guessing though.
posted by burnmp3s at 8:39 AM on August 6, 2013 [1 favorite]

Deep Purple is Zeppelin-like, the Rolling Stones are Zeppelin-sized.
posted by jessamyn (staff) at 8:39 AM on August 6, 2013 [10 favorites]

I'm unclear if benito.strauss is taking requests, but if so, I would also like that to happen.

(I apparently wrote 122K words here in the last year, so you'd think I'd be able to come up with a less awkward way to say that.)
posted by MCMikeNamara at 8:43 AM on August 6, 2013

I apparently wrote 122K words here in the last year, so you'd think I'd be able to come up with a less awkward way to say that.

Or at least less concise.
posted by burnmp3s at 8:44 AM on August 6, 2013

All these points about -sized and -like are true, and I'd like to believe I'm clever enough to have been making that distinction. But I'm sure in both cases in context I meant "really big", and might have been referring to Roger Waters's ego in one of them.
posted by Marisa Stole the Precious Thing at 8:48 AM on August 6, 2013

But I'm sure in both cases in context I meant "really big"

That would certainly make the mental image of this comment more entertaining.

other one
posted by burnmp3s at 8:58 AM on August 6, 2013

I love the found poetry of this stuff. Also that I've written 'ska' more than 'sex.'
posted by Ghidorah at 9:00 AM on August 6, 2013

I'm in.
posted by Wordwoman at 9:02 AM on August 6, 2013

I'm satisfied with the number of swear words, but mystified by my apparent overuse of the word 'cheese'.
posted by billiebee at 9:07 AM on August 6, 2013

Fun bit, if you look near the bottom, you'll see all the words you've grouped together with dashes, only utterly out of context. Some of my greatest hits:

posted by Ghidorah at 9:10 AM on August 6, 2013 [4 favorites]

I like how "Chihuahuas" made it onto the list. >.>
posted by Autumn at 9:10 AM on August 6, 2013

I've used 'book' and 'movie' exactly 121 times each! 122 now I suppose.
posted by Mister_A at 9:19 AM on August 6, 2013

Hmm 'Zarqfenestration' and 'Yuenglingic' only once each...
posted by Mister_A at 9:23 AM on August 6, 2013

"I Can't Believe It's Not Led" is Zeppelin-lite.
posted by griphus at 9:24 AM on August 6, 2013

Me too please.
posted by Garm at 9:35 AM on August 6, 2013

omg Ghidorah this is so fun


I also enjoy the little mini-stories that occasionally appear, such as:

half-nude half-story-high half-woman
posted by showbiz_liz at 9:39 AM on August 6, 2013

posted by ook at 9:51 AM on August 6, 2013

Here's what happens when you're a smart-ass with The Treaty of Westphalia:
24	280.859429855357	marquiss
14	163.834667415625	marquisate
12	140.429714927679	restor'd
6	70.2148574638394	abolish'd
6	70.2148574638394	call'd
6	70.2148574638394	enjoy'd
2	23.4049524879464	luxemburgh
2	23.4049524879464	marquisses
2	23.4049524879464	montbeillard
2	23.4049524879464	neuschaftel
2	23.4049524879464	neustadt
posted by double block and bleed at 10:03 AM on August 6, 2013 [3 favorites]

posted by phunniemee at 10:05 AM on August 6, 2013 [2 favorites]

I've been passing my word corpus through a randomizing awk script and a python Markov chain generator. Nothing looks particularly plausible, probably because it's just a list of words instead of actual text. Still, a few random gems have popped up and aren't half-bad presented as minimal Gibsonesque blank-verse:

anything easy
tried including hives
jesus clothes

have seconding night
website running beyond price
sold ideas
spanish uh dig posts
supply hanging
battery overnight prison pulling stolen colonies
degree paperwork bother confess
dollar gather roundup troubleshoot
author blocked
jay participate
rdr stress bred complex dominion
jose lies
misty reveals

I don't know what the hell Jose lied about or what Misty ended up revealing, but whatever it was, the author seemed pretty defeated by the end.
posted by jquinby at 10:22 AM on August 6, 2013 [2 favorites]

Thanks Cortex.
posted by Ad hominem at 10:25 AM on August 6, 2013

I don't know what the hell Jose lied about or what Misty ended up revealing, but whatever it was, the author seemed pretty defeated by the end.

Well if she had just let Jay participate instead of ignoring him.
posted by sweetkid at 10:26 AM on August 6, 2013

I like finding little meaningful phrases in the ordering, for instance in mine where about a hundredth of the way down the list the phrase "experience getting hard" appears.

Anyway, my favorite compound words in reverse order:

sudo-sudoko [?]
light-yearsduly [again, ?]
posted by invitapriore at 10:26 AM on August 6, 2013

Oh yeah and benito.strauss when/if you get the word cloud running this year I would love to have my tendencies revealed. Thanks!
posted by invitapriore at 10:28 AM on August 6, 2013

cortex: Oh, man, BLANK. BLANK is a bug I may have fixed in a later revision of the code this was branched off for metatalking purposes.

Ignore it, there are no valid all-caps tokens, it's just a placeholder for "this string turned out to be empty after we cleaned out punctuation and such but we forgot to actually throw it away before counting".

Could it be non-ASCII UTF-8 stuff? I know I've typed a few things that weren't in latin characters and don't see any in the generated file, so they might be getting stripped out into BLANK.

On the bright side now that I've typed it here I'm definitely getting BLANK in my word count next year. woohoo!
posted by xqwzts at 10:29 AM on August 6, 2013 [1 favorite]

There is a weird poetry to the word lists:

posted by colfax at 10:56 AM on August 6, 2013

posted by MCMikeNamara at 11:04 AM on August 6, 2013 [5 favorites]

Yes please.

Theory: Higher word count correlates closely with high popularity

Past a certain point, IRL they might well be inversely correlated.
posted by justsomebodythatyouusedtoknow at 11:07 AM on August 6, 2013

I want the cloud thingy and word frequencies thingy. Grazie!
posted by treehorn+bunny at 11:42 AM on August 6, 2013

I've been passing my word corpus through a randomizing awk script and a python Markov chain generator.

Heh, you may be a candidate for a 3-gram file, if you're going to chase after my heart like that anyway. I'll try running one and sending it your way if you want.

On the bright side now that I've typed it here I'm definitely getting BLANK in my word count next year. woohoo!

Actually, you'll just have one more "blank" in your count; I normalize everything to lowercase to keep the files smaller.

Could it be non-ASCII UTF-8 stuff?

That's presumably at least part of it, yeah.
posted by cortex (staff) at 11:47 AM on August 6, 2013
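
For anyone curious how a BLANK entry could come about, here is a minimal sketch of the kind of cleanup cortex describes: lowercase everything, strip punctuation, and (in the buggy version) forget to throw away tokens that end up empty. The character set in the regex and the function name `tokenize` are assumptions for illustration; the actual counting code isn't shown in this thread.

```python
import re
from collections import Counter

def tokenize(text, keep_blank=False):
    """Lowercase, split on whitespace, keep only a-z, 0-9, apostrophe and hyphen.

    The allowed character set is an assumption, not the site's real rule.
    """
    tokens = []
    for raw in text.lower().split():
        cleaned = re.sub(r"[^a-z0-9'-]", "", raw)
        if cleaned:
            tokens.append(cleaned)
        elif keep_blank:
            # Buggy path: the string is empty after cleanup,
            # but a placeholder still gets counted.
            tokens.append("BLANK")
    return tokens

sample = "Beyoncé said BLANK ♥ Привет!"
buggy = Counter(tokenize(sample, keep_blank=True))
fixed = Counter(tokenize(sample))
```

Under these assumptions a typed "BLANK" is folded to "blank" like any other word, while the all-caps BLANK placeholder only appears for tokens made up entirely of stripped characters, which would include words written in non-Latin scripts.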

cortex: "Heh, you may be a candidate for a 3-gram file, if you're going to chase after my heart like that anyway. I'll try running one and sending it your way if you want."

If it's not a heinous imposition, sure!
posted by jquinby at 11:49 AM on August 6, 2013

It is kind of heinous now that I think ab—

Wait, here it is.
posted by cortex (staff) at 11:55 AM on August 6, 2013

cortex: "Wait, here it is."


be willing to but i have money i just screwed i like wild i made better by made some to make left atlanta dammit legal eagles weigh lender and tell let's not confuse letting the smell library ever again lies misty reveals light and water like a thin like that

you're looking for all saints for banking and for cutting down for fear that for lots of info lowest details because made the right off to right we've finally rigs over the robin urbanski said roger what's our roses are red router and the rowing if your run along with

some of the shave ice the sheer volume the shining spider the simple search-and-replace the sink and the next i have not it all the much of it you might like at least a bit of status with of suction cup of teeing up of the seasons of the seasons

as far as supposed to do sure and something in your something something recalling something to consider sort of followed by a get much more stirring once halfway strange stuff is stuff and some stuff it seems well-nigh it to go it was slow it will show it worked out

is that hvac test that i didn't own i do appreciate i go back i got numb i hadn't considered i imagine she i might use i miss some i say soak up the local it felt like little on the synthesis on there in on tubular bells on tv

posted by jquinby at 12:19 PM on August 6, 2013 [2 favorites]

If you dump your results into Excel, you can use the LEN function to do a character count in each cell, thus figuring out your longest word!
=LEN([cell number])
My longest word is "dunning-baader-meinhof-kruger-kruder-dorfmeister"!

posted by slogger at 12:23 PM on August 6, 2013 [1 favorite]
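
If you'd rather skip the spreadsheet, the same longest-word hunt can be sketched in a few lines of Python. This assumes each line of the frequency file looks like the rows quoted in this thread (count, PPM, word, separated by whitespace) and that the word is the last field; `longest_word` is a made-up helper name.

```python
def longest_word(lines):
    """Return the longest word, taking the last whitespace-separated field of each line."""
    words = [line.split()[-1] for line in lines if line.strip()]
    return max(words, key=len)

# Rows copied from a table posted earlier in the thread.
rows = [
    "24\t280.859429855357\tmarquiss",
    "14\t163.834667415625\tmarquisate",
    "2\t23.4049524879464\tneustadt",
]
winner = longest_word(rows)
```

On a real file you could pass an open file object instead of a list, since iterating over a file yields its lines.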

Also, benito.strauss: please add me to your puffy white cloud of word-bong smoke.
posted by slogger at 12:24 PM on August 6, 2013

Me too, please!
posted by trip and a half at 12:29 PM on August 6, 2013

Incidentally, if anyone else is interested in putzing around like me (BUSY DAY AT THE OFFICE, DEAR?), I grabbed the corpus file, extracted out the n-grams themselves and renamed it to something shorter. Then:

while true; do echo; cat 3grams.out | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' > input.txt ; python markov.py; echo; sleep 5; done

The awk statement writes out a randomish selection from the original file. The python script reads it and does the magic. There seems to be some clumpiness in what awk is pulling, but I think it sometimes makes for a nice alliterative effect.

The python bit is here. My only changes were to line 7, where I changed the name of the input file to 'input.txt' and line 36, where I played around with the length of generated text. I also fiddled with the 'order' setting on line 5, with various results. It's wrapped in a loop to scroll semi-coherent gibberish every few seconds.
posted by jquinby at 12:32 PM on August 6, 2013
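
The awk step above keeps each non-empty line with probability of roughly 1 in 100. A rough Python equivalent, as a sketch only (the function name `sample_lines` and the `seed` parameter are assumptions added for illustration, not part of jquinby's setup):

```python
import random

def sample_lines(lines, keep=0.01, seed=None):
    """Keep each non-empty line with probability `keep`, like the awk one-liner."""
    rng = random.Random(seed)
    kept = []
    for line in lines:
        # Skip empty lines, mirroring awk's !/^$/ test.
        if line.rstrip("\n") and rng.random() <= keep:
            kept.append(line)
    return kept

everything = sample_lines(["a b c", "", "d e f"], keep=1.0)
```

Because every line is an independent coin flip, the sample size varies from run to run; `random.sample` would be the alternative if a fixed number of lines were wanted.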

Also, changing BLANK to REDACTED moves the creepiness factor up a fair bit.
posted by jquinby at 12:37 PM on August 6, 2013 [3 favorites]

like so:

you can 50's were full 6 and thrifty 75 dixie 28 REDACTED am stations REDACTED and sooner REDACTED family pictures REDACTED from trader REDACTED i knew a little allen wrench livestock are not livestock hey we'll get high and dry highlighted using a highly stylized ritualistic highwaymen are johnny him.
posted by jquinby at 12:40 PM on August 6, 2013

latkes: "Theory: Higher word count correlates closely with high popularity (difficult to measure, I know.)"

I could be wrong, but I suspect favorites made to comments would be a better metric than word count. Even though some people do use them as bookmarks.

Maybe some sort of conglomeration of the two?
posted by zarq at 12:41 PM on August 6, 2013

I am deeply amused by the accidental poetry of the lists.
posted by scrump at 1:26 PM on August 6, 2013

Apparently I've said "fuck" or "fucking" almost 300 times, with hundreds of additional variations. Given that I've written several novels in the last year (803,902 words) I actually think that's less alarming than it seemed at first.
posted by DarlingBri at 1:28 PM on August 6, 2013

I used the word 'penis' ~4.5x more often on metafilter than I do 'vagina', but apparently I'm equal with 'cock' and 'pussy'.

I will try to even up the vagina imbalance so it doesn't look like I'm not giving vagina equal time. Vagina.

Oh also, laser-nostils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils, laser-nostrils.
posted by plinth at 1:29 PM on August 6, 2013

jquinby, to my eye that output seems less coherent than I'd expect from a degree-3 markov model. Are you sure it's operating as expected, correctly chaining key{word_n word_n+1} -> value{word_n+2} successively across trigrams?
posted by cortex (staff) at 1:36 PM on August 6, 2013

Also, in the Moderator Popularity Contest:
39 351.557218190832 cortex
17 153.242889980619 mathowie
17 153.242889980619 jessamyn
1 9.01428764591878 pb
...and no mention whatsoever of the rest of the team. I feel vaguely ashamed of that.

Cortex is also the Mefite I mention most. I feel vaguely ashamed of that.

(Sorry, rtha: 9 81.128588813269 rtha. I was surprised too.)
posted by scrump at 1:37 PM on August 6, 2013 [1 favorite]

Ah, you said you'd fiddled with it and the order in there is set to 2 which is about as coherent as it looks. I've always found order 3 to be a nice sweet spot of coherence over the short term with wild veering surreality at the sentence/paragraph level, though for a small enough corpus the downside is you get less variation from the source text.
posted by cortex (staff) at 1:38 PM on August 6, 2013
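
The chaining cortex describes, where a key of consecutive words maps to the words seen following it, can be sketched as below. This is not the script jquinby linked; `build_table`, `generate`, and `key_len` are hypothetical names. Note that the two-word key plus one following word that cortex spells out corresponds to `key_len=2` here.

```python
import random
from collections import defaultdict

def build_table(words, key_len=2):
    """Map each run of `key_len` words to the list of words that followed it."""
    table = defaultdict(list)
    for i in range(len(words) - key_len):
        key = tuple(words[i:i + key_len])
        table[key].append(words[i + key_len])
    return table

def generate(table, length=20, seed=None):
    """Start from a random key and keep appending a random follower."""
    rng = random.Random(seed)
    key = rng.choice(sorted(table))
    out = list(key)
    while len(out) < length:
        followers = table.get(key)
        if not followers:
            break  # dead end: this key only occurred at the end of the text
        out.append(rng.choice(followers))
        key = tuple(out[-len(key):])
    return out

words = "a b c a b d".split()
table = build_table(words, key_len=2)
text = generate(table, length=10, seed=1)
```

A longer key leaves fewer possible followers per key, which is why high settings tend to reproduce the source text and low ones wander.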

This list tells me I have way too much fun sticking non- in front of things. Maybe I should just say what the thing is rather than state that it's not the opposite of what it is.
posted by Mister_A at 1:40 PM on August 6, 2013

languagehat: "yes I said yes I will Yes."

Just when I think I can't love you any more, you do something like this.

There are times, my beloved hat, when my heart has a tiny reaction to your words. Nearly imperceptible, almost beneath the threshold of thought. Perceptible only to those who really listen.

You could, perhaps, if we were sitting companionably in a room together, in a companionable silence, if you listened very closely, make it out...

posted by scrump at 1:43 PM on August 6, 2013

cortex: "Ah, you said you'd fiddled with it and the order in there is set to 2 which is about as coherent as it looks. I've always found order 3 to be a nice sweet spot of coherence over the short term with wild veering surreality at the sentence/paragraph level, though for a small enough corpus the downside is you get less variation from the source text."

I tried 1, 2, 4 and several other values, but not 3. The results do seem a bit more balanced.

i don't think i think you you could also to set up if it's not going on in it into a pain in the as much of do well in and whatnot on can't speak to especially if you of thing i probably the best the books to the command

Also, I'm not running the whole corpus through the generator, but just a subset as pulled by that bit of awk. It's probably good for feeding to the Phosphor xscreensaver, but maybe not much else.

On the other hand, these just scrolled by and I ran them together:

REDACTED thanks for the first time a bit like one of our you might check on the hook like a good but it might this can be to this one a bottle of about all i and if not are likely to betting man i'd can't speak to creators and to be the that sort of in addition to the same sort down in the i am continually i breathed a i consulted this i counted 10 i do perhaps i followed were i ran into i saw fleas i started push-ups i take her i take that i touched

The output would benefit from some editing (punctuation and linebreaks), but that'd sort of be against the spirit of the thing.
posted by jquinby at 1:53 PM on August 6, 2013

I wish I knew my shell scripting or any Python at all so I could satisfy my curiosity about this, because even your new 3-degree results don't read convincing to me. I wonder if that .py script is counting whitespace as a token in the ngrams it builds up for its markov table; if so then what I think of as order n would actually be represented as order 2n-1, which means it'd be order 5 that'd really start to look a bit more coherent.

Out of curiosity, would you do a test run at like order = 9? Regardless of which token-counting scheme it uses, that should produce some pretty dull, coherent reiterations of the source text. If it stays pretty incoherent like this at that point, there's clearly something else going on.
posted by cortex (staff) at 1:58 PM on August 6, 2013
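
To illustrate the 2n-1 arithmetic: if a tokenizer keeps the whitespace between words as tokens of their own, a window that starts on a word needs 2n-1 tokens to cover n words. The sketch below only demonstrates that counting; whether the linked script actually tokenizes this way is not established in the thread, and both function names are made up.

```python
import re

def tokens_with_whitespace(text):
    """Split text but keep each whitespace run as its own token."""
    return [t for t in re.split(r"(\s+)", text) if t]

def words_in_window(tokens, size):
    """Count the real words inside the first `size` tokens."""
    return len([t for t in tokens[:size] if not t.isspace()])

tokens = tokens_with_whitespace("pain in the ass")
two_words = words_in_window(tokens, 3)    # 2*2 - 1 tokens
three_words = words_in_window(tokens, 5)  # 2*3 - 1 tokens
```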

Poem from my list:


A word I seem to have trouble typing:

posted by Miko at 2:09 PM on August 6, 2013 [2 favorites]

Also, just to be more helpful on my end in case it's really mostly just a matter of not enough string length to work with, here's some larger-n files I just generated as well, jquinby: 4-gram, 5-gram, 6-gram.
posted by cortex (staff) at 2:19 PM on August 6, 2013

something that's pretty unambiguously a "yes" or a "me too"
posted by Perplexity at 2:29 PM on August 6, 2013

Based off my list, I've apparently typed the string "000" 51 times. Seems really, really weird to me. I wonder how this happened. I suppose maybe a couple of times, I might have had a comment with "1,000,000" or "1,000" in it, but 51 times?? Jeesh!

Anyhoo, if anyone is doing the word cloud thingie, please count me in. Thanks so much !!
posted by marsha56 at 2:40 PM on August 6, 2013

marsha, I believe I've found the culprit.
posted by cortex (staff) at 2:45 PM on August 6, 2013 [3 favorites]




posted by elizardbits at 2:48 PM on August 6, 2013

posted by Rustic Etruscan at 2:53 PM on August 6, 2013

OMGosh, thanks Cortex !! Yes, that's it !!
posted by marsha56 at 2:56 PM on August 6, 2013

wait how do you get the word chains?
posted by sweetkid at 2:58 PM on August 6, 2013

wait how do you get the word chains?

You ask me super nicely. They're a little more time-consuming to generate so I'm not just doing them for everybody by default, but if you want e.g. a 4-gram file like the one I made for jquinby I can do that.

In related news, I cranked out a (92 megabyte!) 4-gram file for myself a little bit ago just for giggles and while there's nothing surprising about most of the entries at the top (it's all common preposition-heavy sentence glue stuff like "if you want to", "in the first place", "a lot of people", "that sort of thing", "in a way that"), there are a few things that I guess speak to my more emphatic tendencies:

I've used the string "a hell of a" 143 times, "hell of a lot" 104, "the hell out of" 93, "pain in the ass" 82, etc.

I've also apparently said both "part of the problem" and "take it to metatalk" exactly 48 times.

The most common four-word string involving "fuck" that I've used is 17 instances of "cut it the fuck", which, I'm guessing we can predict the next word after that in most cases. Also "what the fuck is" at 16, "or whatever the fuck" at 9, "fucking with the site" at 6, "fuckall to do with" at 4, and so on and so on.
posted by cortex (staff) at 3:00 PM on August 6, 2013
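
Counting four-word strings like these amounts to sliding a window over the word list and tallying what it sees. A minimal sketch, assuming the words are already cleaned and in order (`ngram_counts` is a hypothetical name, and how the real run handles comment boundaries is not stated here):

```python
from collections import Counter

def ngram_counts(words, n=4):
    """Tally every run of n consecutive words."""
    return Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )

words = "a hell of a lot of a hell of a".split()
counts = ngram_counts(words, n=4)
top = counts.most_common(1)[0]
```

A text of N words has N - n + 1 windows, and most of them are distinct, which goes some way toward explaining why a 4-gram file gets so much bigger than a plain word list.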

cortex - the 5-grams and a 4th order chain seem to be hitting all the right spots:

as fucked up as a can i tell em that depending on how you define for the reasons listed by from my mothers kitchen serves had a 2-year-old adopted greyhound in the streets by a is your best bet REDACTED more than you ever wanted thanks to one and all

beatrix potter and the original beautiful stuff REDACTED thanks for i bought a couple of i have to say though i stopped using it because i thought i was the one of my all-time favorites otherwise i don't think there's recipe adapted from my mothers the us space and rocket

far as i can tell i just want to say might take a look at could get a shell account host which out-numbers lankhmar's inhabitants i remember it very clearly i've mentioned it here before quite a bit better than relax you know more than the discoverers and the seekers

a long way to go for a couple of subtle variations in speech patterns successfully splashed down in the sugar bunnies on the door suggestions nashville is nice and sunlight what about alfalfa sprouts super-cheap in addition to being supporting documents from a number sure if you'd get one sure

i approve of this and you can do bathtime i'm sure there are bay rum aftershave balm any be comfortable opening the thing be curious to see how be enough to hold back be from this one spot be growing their own peppers be interesting and occasionally entertaining be kinetic

posted by jquinby at 3:01 PM on August 6, 2013

1	0.420738008125292	who the fuck are
1	0.420738008125292	who the fuck do
1	0.420738008125292	who the fuck he
1	0.420738008125292	who the fuck hoobastank
1	0.420738008125292	who the fuck is
1	0.420738008125292	who the fuck knows
posted by cortex (staff) at 3:03 PM on August 6, 2013 [8 favorites]

can I please please have a word chain?
posted by sweetkid at 3:04 PM on August 6, 2013

Here's your 4-gram table, sweetkid. Big file, please download it and play with it locally!
posted by cortex (staff) at 3:07 PM on August 6, 2013 [1 favorite]

Ooh, pretty please with truvia on top may I have my word chain list?
posted by phunniemee at 3:08 PM on August 6, 2013

Tell ya what, I'll do a batch of orders later this afternoon, anybody else who specifically wants a 4-gram table on the sooner side lemme know. I can keep knocking those out in the future as well, so as with the main event here don't worry about missing the one-time window or anything.
posted by cortex (staff) at 3:09 PM on August 6, 2013

me me please but like whenever you feel like doing it
posted by elizardbits at 3:11 PM on August 6, 2013

I didn't know I was so, well, dumb:

i don't know about
i don't know but
i don't know exactly
i don't know much
i don't know where
i don't really see
i don't really understand
i don't understand the
i don't understand what
i don't want that

posted by sweetkid at 3:12 PM on August 6, 2013 [1 favorite]

Also this little poem sums up my first year in NYC and probably also 2009:

my point is that
myself how to grow
need to get away
new york is where
no one in the
no one listens to
not a big deal
not even a little
not going to do
not going to help
not interested in the
not one of those
not saying it doesn't
not something you should
not sure exactly what
not that it matters

posted by sweetkid at 3:14 PM on August 6, 2013 [3 favorites]

posted by Sebmojo at 3:19 PM on August 6, 2013

And I would love a word chain too if that's okay!
posted by Sebmojo at 3:20 PM on August 6, 2013

Would totally love a wordchain.
posted by Marisa Stole the Precious Thing at 3:25 PM on August 6, 2013

Sorry to add to the pile-on, but could I also have a word chain, please?
posted by Rustic Etruscan at 3:26 PM on August 6, 2013

please accept my application for your bountiful word list benito!
posted by threeants at 3:42 PM on August 6, 2013

oooh and cortex I'll happily take any length gram you're willing to serve up!

...that sounded more sexual than I intended.
posted by threeants at 3:46 PM on August 6, 2013

poem from my 1grams:

posted by threeants at 3:48 PM on August 6, 2013

posted by threeants at 3:50 PM on August 6, 2013

posted by threeants at 3:51 PM on August 6, 2013 [1 favorite]

posted by threeants at 3:52 PM on August 6, 2013

posted by threeants at 3:54 PM on August 6, 2013 [1 favorite]

Me, please!
posted by wiskunde at 3:55 PM on August 6, 2013

posted by threeants at 3:56 PM on August 6, 2013

posted by ambrosia at 3:58 PM on August 6, 2013

posted by threeants at 3:58 PM on August 6, 2013

I've titled this one Gratuity Included:

posted by threeants at 4:03 PM on August 6, 2013

Olympic Oblivion:

posted by threeants at 4:05 PM on August 6, 2013

The Superhighway:

posted by threeants at 4:09 PM on August 6, 2013

The Discovery:

posted by threeants at 4:12 PM on August 6, 2013

If it's not too much trouble, I too would like to see my word chains. Like you said, the three gram seems to have a good coherence to it.
posted by Ghidorah at 4:12 PM on August 6, 2013

Heh, I guess it can't do special characters-- I spent a good few minutes wondering how I managed to type "beyonc" on seven separate occasions.
posted by threeants at 4:15 PM on August 6, 2013 [3 favorites]

First Lady On The Stump:

posted by threeants at 4:19 PM on August 6, 2013


posted by threeants at 4:22 PM on August 6, 2013

posted by threeants at 4:25 PM on August 6, 2013

Herr Godwin:

posted by threeants at 4:26 PM on August 6, 2013

The Twilight Zone:

posted by threeants at 4:32 PM on August 6, 2013

Yes, I would as well. How fun!
posted by Shouraku at 4:47 PM on August 6, 2013

count PPM word

4079 50217.2922796607 the

posted by marienbad at 4:55 PM on August 6, 2013

So what is PPM anyway?
posted by Rustic Etruscan at 5:15 PM on August 6, 2013

I too would like to be enclouded!
posted by languagehat at 5:21 PM on August 6, 2013

PPM = Parts Per Million.
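
To make that concrete: a PPM figure is just a word's count divided by the total word count, scaled up to a million. A quick Python sketch (the function names are made up for illustration, and the 81,227-word total is only inferred from the "the" row quoted upthread, not a number anyone posted):

```python
def ppm(count, total_words):
    """Occurrences of a word per million words of text."""
    return count / total_words * 1_000_000


def implied_total(count, ppm_value):
    """Back out the total word count implied by one row of a frequency table."""
    return round(count * 1_000_000 / ppm_value)


# Row quoted upthread: 4079 uses of "the" at 50217.2922796607 PPM.
total = implied_total(4079, 50217.2922796607)
print(total, ppm(4079, total))
```

So a PPM of about 50,000 means roughly one word in twenty.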
posted by cortex (staff) at 5:24 PM on August 6, 2013 [1 favorite]



posted by Rustic Etruscan at 5:51 PM on August 6, 2013

Yes Pls
posted by naju at 5:53 PM on August 6, 2013

how do you get the word chains?

You ask me super nicely.

Dear Cortex,
You're really cool and nice. Though I don't have any empirical evidence, you probably smell good too.
Could I please have a 4-word-chain?
Thanks a bunch!
posted by FirstMateKate at 5:56 PM on August 6, 2013

Yes please!
posted by A Bad Catholic at 6:26 PM on August 6, 2013

I've used the words "patriarchy" and "Romani" as many times as the words "rice" and "worry".

I wonder why my hyphenates include "all-services-start-january-19"? "cataract-from-barfighting"? Perhaps David Bowie. "mommy-filter-800-word-book-shill-nyt"? I have no idea.

"meta-pedantry"! I suspect I meant pedantry in MeTa, but uncapitalized it's a perfectly self-descriptive word.
posted by gingerest at 6:27 PM on August 6, 2013

"Pretty unambiguously a yes or a me too!" (Please and thank you!)
posted by Now there are two. There are two _______. at 6:28 PM on August 6, 2013

cataract-from-barfighting. If you're wondering where the heck something came from, you can hit your own user page and use the lower search box to search your own activity for a keyword; I threw in "cataract", and boom! Very handy for tracking down mysteries.
posted by cortex (staff) at 6:31 PM on August 6, 2013

Hahahahahaha, I used the word "zipf" a total of one time, making it a personal hapax legomenon. So meta!

(Though now, of course, I have jinxed it.)
posted by en forme de poire at 6:48 PM on August 6, 2013 [1 favorite]

Wow, a super quick check of the corpus reveals that I say "actually" almost 3 times as much as the rest of you, which for such a common word is almost certainly statistically significant. Crap.

Also, cortex, in the latest full corpus file the word "shit" appears to be used only once, which cannot possibly be true... que pasa?
posted by en forme de poire at 7:11 PM on August 6, 2013 [1 favorite]

also because I am a nerdo mcnerdface I am now calculating Fisher enrichment pvalues on my own and my netbook is struggling somewhat. (benito, is that what you ended up doing for the clouds?)
posted by en forme de poire at 7:33 PM on August 6, 2013 [1 favorite]

I'll try. However, approximately 13% of my output on this site is variations on the word "fuck".
posted by notsnot at 7:57 PM on August 6, 2013

I should like to have my infos and stuff, so I can calculate how many novels I didn't write this year because I was answering AskMes instead.
posted by jacquilynne at 7:59 PM on August 6, 2013 [1 favorite]

(This is the third year, and (possibly...) the first time I find the thread in time.)

Yes, me, me, me!
posted by AsYouKnow Bob at 8:20 PM on August 6, 2013

Yes please! Me please! And thank you very much!
posted by kristi at 8:34 PM on August 6, 2013

posted by oceanjesse at 8:40 PM on August 6, 2013

me! please!
posted by Xere at 10:02 PM on August 6, 2013

Do me too, please.
posted by Jacqueline at 11:48 PM on August 6, 2013

posted by dismas at 4:57 AM on August 7, 2013 [2 favorites]

I would unambiguously like one of everything, please.
posted by dg at 5:07 AM on August 7, 2013

Yes please!
posted by swift at 6:40 AM on August 7, 2013

Could I please also have a four word chain?
posted by windykites at 7:42 AM on August 7, 2013

I would love a 4-gram listing. Thanks a bunch for all your word-crunching, cortex.
posted by invitapriore at 8:10 AM on August 7, 2013

OK, not to compete with Benito, but here are my enriched/depleted words, based on a Fisher test and thresholded at an FDR of 0.1%.

Words I say less often than the avg MeFite:
word	log10 ratio
email	-1.38
folks	-0.82
country	-0.73
simply	-0.66
house	-0.54
us	-0.47
old	-0.38
her	-0.31
he	-0.24
back	-0.23
go	-0.23
never	-0.21
my	-0.12
up	-0.10
your	-0.07
it	-0.07
on	-0.07
was	-0.05
the	-0.05
and	-0.03
you	-0.02
Words I say more often:
word				log10 ratio
is				0.10
this				0.12
more				0.15
pretty				0.21
think				0.24
between				0.27
whether				0.29
specific			0.36
mostly				0.37
effect				0.40
article				0.42
actually			0.44
test				0.46
definitely			0.46
previously			0.55
ultimately			0.56
environment			0.58
impact				0.59
research			0.60
differences			0.60
data				0.60
sexual				0.61
potentially			0.62
cell				0.67
cancer				0.77
jersey				0.80
journal				0.83
grad				0.84
interactions			0.87
mice				0.88
meditation			0.89
factors				0.90
researchers			0.91
protein				0.93
chord				0.93
argh				0.99
lab				0.99
phd				1.06
adhd				1.07
probability			1.09
empirical			1.09
rigorous			1.09
modeling			1.15
genetic				1.16
donors				1.18
biology				1.26
cloning				1.27
midi				1.28
gene				1.29
metabolism			1.32
ime				1.32
econ				1.34
computational			1.36
correlated			1.39
clusters			1.40
genes				1.45
metabolic			1.49
prevalence			1.50
mindfulness			1.52
effeminate			1.54
regression			1.59
postdoc				1.59
msc				1.62
benito				1.72
538				1.76
genome				1.81
fortran				1.83
msm				1.83
lm				1.93
bioinformatics			1.93
ferrous				1.96
fluoxetine			1.96
aspartame			1.97
p1				2.02
h5n1				2.06
sucralose			2.06
fourths				2.07
heritability			2.13
mitochondria			2.14
cosma				2.43
autoclaving			2.66
scipy				2.72
foldit				2.89
clothianidin			2.98
cross-validation		2.98
eigenfactor			3.18
transcriptomics			3.25
variste				3.25     * Almost sure this is Evariste
h0				3.25
filehash			3.31
polygenic			3.37
ill				3.44
metabolomics			3.44
dys4ia				3.79
bee				3.92
sats				3.92
beings				4.22
acts				4.46
association			4.69
popular				4.76
shit				5.20
were				5.99

"Shit" and "were" seem bizarre to me, but hey, I can definitely believe that I talk about metabolomics 1,000x more than the average MeFite, so there we go. I also appear to have some weird-ass writing tics - for one thing, I love my conjunctive phrases (actually, potentially, ultimately) and adverbs in general (pretty, mostly, definitely).

My depleted words read bizarrely like a sprinkling of word salad from an Obama speech.
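
For the curious, here is a rough Python sketch of the kind of calculation described above: a two-sided Fisher exact test on a 2x2 table (my uses of a word vs. everyone else's) plus the log10 ratio of the two frequencies. This is not the script behind the table; the function names and the toy counts are assumptions made up for illustration, and it uses only the standard library.

```python
from math import exp, lgamma, log10


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = my uses of the word,       b = all my other words
    c = everyone else's uses,      d = all of everyone else's other words
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return log_choose(row1, x) + log_choose(row2, col1 - x) - log_choose(n, col1)

    observed = log_p(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probability of every table at least as unlikely as the observed one.
    total = sum(exp(lp) for lp in map(log_p, support) if lp <= observed + 1e-9)
    return min(1.0, total)


def log10_ratio(my_count, my_total, site_count, site_total):
    """log10 of (my frequency / site frequency); 1.0 means ten times as often."""
    return log10((my_count / my_total) / (site_count / site_total))


# Toy numbers, not real MeFi data: 30 uses in my 100,000 words
# against 40 uses in 10,000,000 words from everyone else.
print(log10_ratio(30, 100_000, 40, 10_000_000))
print(fisher_two_sided(30, 100_000 - 30, 40, 10_000_000 - 40))
```

On that scale a log10 ratio of 3.44 works out to about 2,750x the site-wide rate, and -1.38 to roughly 1/24th of it.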
posted by en forme de poire at 9:25 AM on August 7, 2013

P.S. I'm moving today (can you tell I'm procrastinating loading the van???) but if I get a moment in the next few days I can post an R script to spit that out - though it will lack Cloud Man's personal touch and curation and the refreshing whiff of vowels that follows in his wake.
posted by en forme de poire at 9:34 AM on August 7, 2013 [1 favorite]

shit and were are both way bizarre, yeah. And, really, any non-domain-specific, non-proper-noun word showing up multiple orders of magnitude out of whack for anything other than a very very small corpus seems weird, so ill, bee, beings, acts, association and popular are all pretty dang weird too.

Clearly something is up, someone else mentioned an absurdly low shit count upthread in the corpus file I generated. I'll take a look at my file, maybe something went weird on my end.
posted by cortex (staff) at 11:08 AM on August 7, 2013

someone else mentioned an absurdly low shit count

see a doctor
posted by sweetkid at 11:09 AM on August 7, 2013 [4 favorites]

Single name members I have mentioned more than Bush (130)

... more than Obama (93)
129 plep

... less than Obama but more than or the same as Romney (52)
90 nickyskye
85 taz
72 homunculus
56 quonsar
54 languagehat
52 mathowie

...less than mathowie but more than Cheney (31)
51 amberglow
50 hama7
46 iconomy
46 jonson
43 flapjax
40 cortex
39 jessamyn
35 matteo

but I like YOU (3,102) most of all, and hey, WE (1,037) are pretty good too
posted by madamjujujive at 11:25 AM on August 7, 2013 [1 favorite]

Actually, en forme de poire, could you clarify when you have a chance which file specifically you're doing your comparison against? I looked in a couple places but "shit" seems to be well represented so I'm not sure what's what.
posted by cortex (staff) at 11:35 AM on August 7, 2013

Just found this. Add me, please.
posted by rocket88 at 11:40 AM on August 7, 2013

Me too.
posted by ersatz at 11:55 AM on August 7, 2013

"I get that a lot."
posted by Rustic Etruscan at 3:08 PM on August 7, 2013

Benito, please hope me with your word analysis thingamajig!
posted by adrianhon at 3:13 PM on August 7, 2013

I am a bit late to this party, but I'd like a word count too, please!
posted by gubenuj at 4:12 PM on August 7, 2013

I'm a quiet dude lately but yeah. Fling those stats, stat bro.
posted by Splunge at 6:14 PM on August 7, 2013

I'm in.
posted by inturnaround at 5:45 AM on August 8, 2013

I'd like to see mine too - thanks
posted by antonymous at 7:08 AM on August 8, 2013

Alright, here's the next run, counts and basic frequency tables from anotherpanacea through antonymous. Find the URL for the frequency table in my comment up top and plug in your userid to grab it.
user 36760:	441071 words,	27467 unique, in	4208 comments.
user 94555:	57663 words,	7407 unique, in	1323 comments.
user 125725:	29225 words,	5019 unique, in	417 comments.
user 53775:	168175 words,	17733 unique, in	2076 comments.
user 23593:	79232 words,	10327 unique, in	2679 comments.
user 42726:	24106 words,	4882 unique, in	921 comments.
user 91319:	100554 words,	13185 unique, in	1601 comments.
user 87564:	11871 words,	2330 unique, in	174 comments.
user 66952:	337907 words,	19830 unique, in	5309 comments.
user 138117:	61284 words,	5346 unique, in	379 comments.
user 165178:	4615 words,	1405 unique, in	120 comments.
user 171256:	45873 words,	6684 unique, in	461 comments.
user 17022:	285361 words,	22954 unique, in	5374 comments.
user 18435:	43389 words,	7217 unique, in	750 comments.
user 66440:	109693 words,	10489 unique, in	821 comments.
user 132162:	34696 words,	5703 unique, in	1312 comments.
user 48720:	32980 words,	5860 unique, in	637 comments.
user 67231:	213278 words,	14800 unique, in	3181 comments.
user 14179:	441843 words,	20784 unique, in	7156 comments.
user 11888:	56708 words,	9455 unique, in	2467 comments.
user 17766:	117355 words,	13164 unique, in	2756 comments.
user 62962:	157349 words,	17765 unique, in	2964 comments.
user 94190:	28002 words,	4601 unique, in	273 comments.
user 94835:	184535 words,	16782 unique, in	4472 comments.
user 48584:	134274 words,	10648 unique, in	2255 comments.
user 70792:	40188 words,	6442 unique, in	415 comments.
posted by cortex (staff) at 8:23 AM on August 8, 2013

And if I could tell (or thought I could) that you were asking for a special 4-gram frequency table as well after that all came up, you're on this list:


And you can find that file with just a slightly different url than the normal one:

posted by cortex (staff) at 8:27 AM on August 8, 2013

Also, I'm not gonna share the 4-gram table for the entire site for all time because uncompressed it is 1.3 fuckin' gigabytes, but I could make a cropped version of that available directly if any of you dedicated datawankers specifically want to try and do some 4-gram vs. site-baseline comparisons in the spirit of the existing wordcloud / comparative frequency stuff. I would guess it'd mostly be weird and noisy but, hey, I've never tried!
posted by cortex (staff) at 8:30 AM on August 8, 2013

TIL: I'm unreasonably fond of hyphenated expressions which sound like the Mad Libs version of an epic house party.

"Showing-up-with-a-backpack-which-you-promptly-stash-somewhere," "sitting-to-pee," "taking-off-clothing," "praise-and-treats-and-cuddles," "enough-about-me-lets-talk-about-you," "manner-of-hopping-around-with-bells-on-my-shoes," "tremolo-bar-wiggling," "innocent-lesbian-slumber-party," "fried-in-lard-in-a-big-copper-pot," "fucking-all-night-without-coming," "stoned-dude-with-a-thesaurus," "urinal-spitter," "young-man-about-town," "vaguely-unsuitable-seeming," "wake-up-sheeple"....
posted by Now there are two. There are two _______. at 8:38 AM on August 8, 2013 [1 favorite]

I think I got missed in the last batch run (possibly because Jacqueline and I asked just a couple of comments apart), so asking again. Thanks!
posted by jacquilynne at 8:41 AM on August 8, 2013

The top of that site-wide-for-all-time 4-gram list is very boring in the way you'd expect it to be from looking at any of this stuff previously; it's all really common strings of simple words, the same conversational glue that I referenced upthread when talking about the 4-gram stuff previously. A sampling:
4046    95.4296621476264        on the other hand
3374    79.5797528635916        the rest of the
3193    75.3106552736953        a lot of people
2765    65.215772574935         if you want to
2525    59.5550906877797        the end of the
2429    57.2908179329176        at the same time
2423    57.1493008857387        is one of the
2380    56.1350953809567        to be able to
2368    55.8520612865989        a lot of the
2215    52.2433765835374        at the end of
2116    49.9083453050859        in the first place
2046    48.2573130879989        when it comes to
1919    45.2618689227126        as far as i
1897    44.7429730830567        is going to be
1888    44.5306975122883        in the middle of
1854    43.7287675782747        nothing to do with
1768    41.7003565687107        one of the most
1738    40.9927713328163        i don't want to
1694    39.9549796535045        i have no idea
1671    39.4124976393188        i don't know if
1642    38.7284985779542        when i was a
1609    37.9501548184703        but i don't think
1581    37.2897419316355        in a way that
1545    36.4406396485622        the fact that the
1522    35.8981576343765        to do with the
1478    34.8603659550647        i don't know what
1376    32.4545761530237        a bit of a
1360    32.0771973605467        i thought it was
1286    30.3318204453405        there are a lot
1285    30.3082342708107        for the first time
1273    30.0252001764529        in the u s
1256    29.6242352094461        are a lot of
1247    29.4119596386777        i think this is
1243    29.3176149405585        it seems to me
1202    28.3505817848361        a few years ago
So nothing really interesting to chew on by itself; all we learn is that people speak a common language with common syntax and very predictable fixed phrases for accomplishing simple, everyday discursive tasks.

Although it is a little interesting that the top item, "on the other hand", is a figurative phrase. "In the first place" is also right up there. These are idiomatic expressions so ingrained into our culture of speaking that they likely don't even register as figurative most of the time when we use them; unless someone's using it to poetic effect or making a jokey play on it ("on the third hand...", etc) it just glides by without even a sense of metaphor. It's just what you say, right?

There's plenty of common repetition of emphatic stuff: here's a filtered sample of the top 4-grams that contain the substring "fuck":
201     4.74082108049256        what the fuck is
168     3.96247732100871        the fuck out of
105     2.47654832563044        shut the fuck up
75      1.76896308973603        don't give a fuck
70      1.65103221708696        the fuck are you
68      1.60385986802733        know what the fuck
65      1.53310134443789        give a fuck about
61      1.43875664631864        are you fucking kidding
60      1.41517047178882        what the fuck are
53      1.25006725008013        the fuck is wrong
52      1.22648107555031        fuck is wrong with
52      1.22648107555031        get the fuck out
51      1.2028949010205         the fuck do you
48      1.13213637743106        you fucking kidding me
44      1.0377916793118         oh for fuck's sake
42      0.990619330252177       the fuck is this
40      0.94344698119255        give a flying fuck
And likewise "shit":
449     10.5901923638864        the shit out of
288     6.79281826458636        give a shit about
195     4.59930403331368        don't give a shit
84      1.98123866050435        shit out of me
74      1.74537691520622        doesn't give a shit
70      1.65103221708696        a shit about the
59      1.39158429725901        gives a shit about
55      1.29723959913976        a piece of shit
51      1.2028949010205         who gives a shit
46      1.08496402837143        a lot of shit
46      1.08496402837143        this kind of shit
45      1.06137785384162        a shit ton of
43      1.01420550478199        to give a shit
41      0.967033155722363       give a shit what
40      0.94344698119255        a shitty thing to
If you want to wade through those subsets more, I've bounced them out to a couple of ~3MB files: fuck 4-gram, shit 4-gram.
posted by cortex (staff) at 8:47 AM on August 8, 2013 [3 favorites]

28 88.4217972361872 nope nope nope nope
17 53.6846626076851 no no no no
10 31.5792132986383 na na na na (pretty sure this is katamari-related)

1 3.15792132986383 all shout butts simultaneously

1 3.15792132986383 zoomy nyan rainbows whooshing
posted by elizardbits at 8:54 AM on August 8, 2013 [3 favorites]

3 16.6101920138197 license to kill with

There's only one possible word that comes after this 4-gram.

Some of my favorites with count greater than one:

2 11.0734613425465 fuck yeah look at
2 11.0734613425465 furthermore that that subset
[god I'm a blowhard]
2 11.0734613425465 i agree with everybody

And then the weird, weird tail:

1 5.53673067127323 zip ties get fucked
1 5.53673067127323 zoophilia is morally acceptable
1 5.53673067127323 zorn related comets on

posted by invitapriore at 9:18 AM on August 8, 2013

Interesting how far down the shit list you have to go to get a plausible literal reference to feces.

(The highest one I see, just skimming, is "solvent for shit stuck" — which I guess still could be shit-as-in-a-generic-term-for-"matter" but is at least plausibly interpretable as referring to Liquid Plumber or something like that.)
posted by Now there are two. There are two _______. at 9:19 AM on August 8, 2013

posted by windykites at 9:23 AM on August 8, 2013

Could I get a four-gram too?
posted by notsnot at 10:30 AM on August 8, 2013

Interesting how far down the shit list you have to go to get a plausible literal reference to feces
Hmmm. A shit hit list would likely refer to the word "fan," no?
posted by Namlit at 12:05 PM on August 8, 2013

Heh, "on the other hand" was number 2 for me.

Also, what the crap?

20 31.6357585305823 BLANK newtgingrich com BLANK
16 25.3086068244658 com BLANK newtgingrich com
14 22.1450309714076 newtgingrich com BLANK newtgingrich

Has Newt stepped his game up, injecting links to his site into Metafilter itself?
posted by Marisa Stole the Precious Thing at 1:18 PM on August 8, 2013

An inewtculation, so to speak.
posted by Marisa Stole the Precious Thing at 1:24 PM on August 8, 2013

Thank you so much cortex, this is pretty neat.
posted by Shouraku at 1:58 PM on August 8, 2013

not sure if I'm late but yes please
posted by leopard at 2:35 PM on August 8, 2013

Also, what the crap?

Here you go, you silly son of a bitch. Note that the parsing process for the word counts treats each comment as a unit, and iterates through each 4-gram in order; so, tokens 1-4, tokens 2-5, tokens 3-6, etc. It nixes punctuation and whitespace, so the dot in newtgingrich.com gets converted to white space and newtgingrich and com become two adjacent tokens. The html and carriage return that separate each instance get converted into a token that I'm incorrectly leaving as a "nothing useful here!" BLANK placeholder instead of properly stripping, and successive lines/paragraphs in a comment flow one into the other.

So if I fix the BLANK bug, that'd actually count two lines of newtgingrich.com in a row as "newtgingrich com newtgingrich com" instead of the BLANKified version you're currently seeing.
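
A minimal Python sketch of that sliding window, for anyone who wants to see it spelled out. It is a guess at the behavior described above, not the actual parsing code: the function names are hypothetical, the tag-stripping and character rules are assumptions based on examples in this thread (hyphenates and apostrophes survive, special characters don't), and it shows the fixed behavior with no BLANK token.

```python
import re
from collections import Counter


def tokenize(comment):
    """Split one comment into lowercase tokens.

    Assumed rules: HTML tags are dropped, and anything that is not an
    ASCII letter, digit, apostrophe, or hyphen becomes whitespace.
    """
    text = re.sub(r"<[^>]+>", " ", comment)          # strip html tags
    text = re.sub(r"[^a-z0-9'\-]+", " ", text.lower())  # nix punctuation etc.
    return text.split()                              # collapse whitespace


def ngrams(tokens, n=4):
    """Tokens 1-4, tokens 2-5, tokens 3-6, and so on."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(comments, n=4):
    """Count n-grams, treating each comment as its own unit."""
    counts = Counter()
    for comment in comments:
        counts.update(ngrams(tokenize(comment), n))
    return counts


# Two lines of a bare domain in one comment flow into a single 4-gram.
print(count_ngrams(["newtgingrich.com<br>\nnewtgingrich.com"]))
```

Because the window never crosses from one comment into the next, a comment with fewer than four tokens contributes nothing to the 4-gram table.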
posted by cortex (staff) at 4:31 PM on August 8, 2013 [1 favorite]

Of course! Newt Gingrich browser history strikes again.
posted by Marisa Stole the Precious Thing at 7:06 PM on August 8, 2013

Oh damn can I please get a 4-gram too?
posted by griphus at 7:12 PM on August 8, 2013

Cortex, I do remember participating in one thread about colony collapse so the "bee" one is probably legit... but the others do look hinky, I agree.
posted by en forme de poire at 9:27 PM on August 8, 2013

Thanks for the 1-gram, Cortex! Can I get a 4-gram too, please?
posted by oceanjesse at 11:41 PM on August 8, 2013

I'm interested.
posted by Renoroc at 4:56 AM on August 9, 2013

I, too, would like more grams, please. 3 and 4, if you can.
posted by Going To Maine at 6:35 AM on August 9, 2013

I'd love 3 and 4 grams, and a word cloud. Thanks!
posted by SisterHavana at 9:14 AM on August 9, 2013

I would love to get some word stats, thanks. :)
posted by epersonae at 2:21 PM on August 9, 2013

Yes, me too.
posted by jadepearl at 6:22 PM on August 12, 2013

Hey guys, I cleaned up the script I was using a little and made it (somewhat) suitable for public consumption. Basically, here are the steps for a DIY corpus analysis:
  • Download your individual corpus (http://stuff.metafilter.com/corpus/freq/temp/XXXX--1-gram--allsites--1999-01-01--2014-01-01.txt, where XXXX is your user number)
  • Download and unzip the master MeFi corpus ("allsites--1999-01-01--2013-01-01.txt.zip" from here)
  • Download and install R, if you don't already have an installation (it is cross-platform and open source!)
  • Download this script, changing the path to the corpus files as necessary
  • Run the script and wait (possibly a long time depending on your computer; R will not appear to be doing anything during this time unless you look at CPU consumption; probably could have put in a progress bar or something but I am highly lazy)
  • Output is saved in "significant-words.csv", which you can open in any spreadsheet program, plus a frequency plot in "word-freq-plot.pdf" which shows the corpus frequencies vs. your own frequencies with the significant ones highlighted and labeled (may be kind of a busy plot, sorry 'bout it)
Example graphing output here. Have at it!

(Note those weirdo points on the far-left - possibly the same bug in the MeFi-wide corpus?)
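
If you'd rather poke at the files without R, here is a rough Python sketch of the first and last pieces of that pipeline: reading a "count PPM word" frequency table and applying a false discovery rate cutoff to a list of p-values. It is not a port of the script linked above. The function names are made up, the column layout is assumed from the tables quoted in this thread, and Benjamini-Hochberg is an assumed choice of FDR procedure.

```python
def parse_freq_lines(lines):
    """Turn 'count PPM word' rows into a {word: count} dict.

    Splits on the first two runs of whitespace only, so multi-word
    entries such as 4-grams stay intact. Header and blank rows are skipped.
    """
    counts = {}
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            count = int(parts[0])
        except ValueError:
            continue  # e.g. the "count PPM word" header row
        counts[parts[2].strip()] = count
    return counts


def load_freq_table(path):
    """Read a downloaded frequency file from disk (path is whatever you saved it as)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_freq_lines(handle)


def benjamini_hochberg(pvalues, fdr=0.001):
    """Return the sorted indices of p-values that pass at the given FDR.

    Step-up rule: find the largest rank k with p_(k) <= k / m * fdr and
    keep everything ranked at or below k.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff = rank
    return sorted(order[:cutoff])


sample = ["count PPM word", "", "4079 50217.2922796607 the"]
print(parse_freq_lines(sample))
```

The default `fdr=0.001` mirrors the 0.1% threshold mentioned upthread; the p-values themselves would come from whatever per-word test you run on the two tables.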
posted by en forme de poire at 2:24 PM on August 13, 2013 [5 favorites]

Awesome! Thank you for sharing!
posted by Now there are two. There are two _______. at 3:52 PM on August 13, 2013

No problem!
posted by en forme de poire at 5:57 PM on August 13, 2013
