Words words words July 21, 2011 1:52 PM

This comment about getting past one's first million words got me thinking: is there a way to find out exactly how many words I've written on MeFi?
posted by griphus to MetaFilter-Related at 1:52 PM (286 comments total) 5 users marked this as a favorite

You can ask me nicely, and I can run a script I wrote to help generate the Mefi Frequency Tables and let you know.

For the purposes of this discussion, saying "oh, me, me" in this thread is fine if you're okay with me dropping the word count in here in a comment.

We don't have any self-serve way to do a word count other than just grabbing your comment dump (see "Export Your Comments" at the bottom of the Preferences screen) and feeding that through a word count utility of some sort to get a (somewhat inflated by file format stuff) approximation.
posted by cortex (staff) at 1:56 PM on July 21, 2011 [1 favorite]


Okay, "oh, me, me!"
posted by hermitosis at 1:58 PM on July 21, 2011


Oh, me, me!
posted by jessamyn (staff) at 1:58 PM on July 21, 2011


Oh, me, me!
posted by Faint of Butt at 1:59 PM on July 21, 2011


Oh, me, me!

(I also said it aloud, in case that helps)
posted by shakespeherian at 2:00 PM on July 21, 2011 [1 favorite]


This is good timing, too; I've been tweaking my original frequency table script to make it easier to run some specific numbers for iamkimiam, so doing this is now slightly less of a pain than it would have been a week ago.

I'll give this a bit for people to chime in before I run the first round of numbers, so that I can do them in batches instead of one at a time. I can also provide anyone who specifically wants it a temporary download link to an actual frequency table of their word use on the site over time, akin to the global frequency tables at the link above; it's pretty dry stuff unless you're specifically interested in corpus linguistics, but if that's your particular kink then this is your lucky day.
posted by cortex (staff) at 2:02 PM on July 21, 2011


Also if I keep posting comments does that make your job harder?
posted by shakespeherian at 2:02 PM on July 21, 2011 [2 favorites]


Oh, me, ME!
posted by Conrad Cornelius o'Donald o'Dell at 2:02 PM on July 21, 2011


Cortex, darling, could you ever be so kind and run that marvellous little script on ...oh, I'm sorry, for me, me, me?
posted by griphus at 2:02 PM on July 21, 2011


I'm not really sure I want to know.

Oh, OK. Count me (in).
posted by mr_crash_davis at 2:02 PM on July 21, 2011


Ok... I want to know... though I expect to be alternately shocked at how little it is or horrified at how much it is (and not used more "usefully").
posted by Jahaza at 2:03 PM on July 21, 2011


Oh, me too! And I'd love that frequency table, if you wouldn't mind.
posted by reductiondesign at 2:03 PM on July 21, 2011


Oh, me, me! Pretty please. With a cherry on top!
posted by ericb at 2:03 PM on July 21, 2011


"oh, me, me" in this thread
posted by grouse at 2:04 PM on July 21, 2011


I think it would be fun to have both my word count and my frequency table, please.
posted by not that girl at 2:04 PM on July 21, 2011


Oh, moi, moi, monsieur!
posted by desjardins at 2:06 PM on July 21, 2011 [1 favorite]


Me! Me!

Please?
posted by kbanas at 2:06 PM on July 21, 2011


meeeeeeeeeee me me me

wild guess: most frequently used word will be either "awesome" or "dude"
posted by elizardbits at 2:07 PM on July 21, 2011


Oh, me, me! With a Markov Cloud, too. And some Magic Shell. That stuff is awesome.
posted by norm at 2:09 PM on July 21, 2011


Do me.
posted by not_on_display at 2:11 PM on July 21, 2011


Me (Everything I Am Is Me).
posted by Melismata at 2:13 PM on July 21, 2011


Oh, shoot, I would also love a frequency table!
posted by griphus at 2:15 PM on July 21, 2011


Me, me, please!
posted by peep at 2:15 PM on July 21, 2011


Me too, please!
posted by Brandon Blatcher at 2:17 PM on July 21, 2011


417. Don't ask me how I know. please
posted by blue_beetle at 2:18 PM on July 21, 2011


Or, if you're using a script "oh, me, me".
posted by Brandon Blatcher at 2:19 PM on July 21, 2011


me me me

posted by delmoi at 11.20PM on July 21 [+] [!]
posted by adamvasco at 2:19 PM on July 21, 2011 [1 favorite]


Oh, me, me!
posted by Skorgu at 2:19 PM on July 21, 2011


I can ballpark mine as over 100K.

Beyond that, let's just say "a lot".
posted by Trurl at 2:19 PM on July 21, 2011


Moi?

/Miss Piggy
posted by brundlefly at 2:21 PM on July 21, 2011


!em ,em ,hO
posted by subbes at 2:23 PM on July 21, 2011


This is kind of like typing /played in World of Warcraft.
posted by kbanas at 2:23 PM on July 21, 2011 [1 favorite]


PS, don't ever type /played in World of Warcraft.
posted by kbanas at 2:24 PM on July 21, 2011 [3 favorites]


Besides, it's not the size; it's what you do with it.
posted by Trurl at 2:26 PM on July 21, 2011


"Slowly get fired."
posted by griphus at 2:27 PM on July 21, 2011


I can ballpark mine as over 100K.

But Trurl, you've only just joined!
posted by shakespeherian at 2:29 PM on July 21, 2011 [3 favorites]


wait, file format stuff? what is this business?

I don't remember what all we included in the Export file to separate or mark up comment records, so I was speaking generally.
posted by cortex (staff) at 2:29 PM on July 21, 2011


Oooh, me! (frequency would be neato, too.)
posted by epersonae at 2:29 PM on July 21, 2011


Oh, sure, what the hell. Me Me! Oh!
posted by hippybear at 2:30 PM on July 21, 2011


Oh, them, them!
posted by katillathehun at 2:32 PM on July 21, 2011


Oooh! Me! ME!
posted by jacquilynne at 2:32 PM on July 21, 2011


Oh, me, me!
posted by pjern at 2:33 PM on July 21, 2011


Me! Me! Please.
posted by mygothlaundry at 2:34 PM on July 21, 2011


Oh, me, me!
posted by languagehat at 2:36 PM on July 21, 2011


Oh, me, me, me!
posted by 0xFCAF at 2:42 PM on July 21, 2011


Oh, me, me too please!
posted by dolface at 2:42 PM on July 21, 2011


Oh, me, me!
posted by dirtdirt at 2:42 PM on July 21, 2011


I'm torn between downloading my comments and "wc -w"-ing the file, or Oh, me, Me!-ing.

But since I suspect that in my dumb, I'll do something wrong and parse the export incorrectly, I'll rely on the kindness of cortex and

Oh, me! Me! along with everyone else.
posted by quin at 2:43 PM on July 21, 2011


Me too, please please please! Not that I have even broken 1000, probably.
posted by digitalprimate at 2:45 PM on July 21, 2011


But Trurl, you've only just joined!

Speaking of which, there used to be a site that indexed Openings Of Yap against time registered - but I lost track of it. Is that still around?
posted by Trurl at 2:47 PM on July 21, 2011


I would enjoy having this information.
posted by Bulgaroktonos at 2:48 PM on July 21, 2011


Don't you dare.
500,000 inchoate words will ruin the denobulating flux buffers.
posted by clavdivs at 2:50 PM on July 21, 2011 [4 favorites]


Oh, me, me! (in Italian, which is written the same way, but pronounce differently).
posted by francesca too at 2:53 PM on July 21, 2011


pronounce. My spelling has been abysmal, lately.
posted by francesca too at 2:54 PM on July 21, 2011


Me too!
posted by dg at 2:54 PM on July 21, 2011


oui oui me me!
posted by functionequalsform at 2:56 PM on July 21, 2011


pronounce pronounced. (And my brain sliding rapidly into senility.)
posted by francesca too at 2:57 PM on July 21, 2011


I did a word count on my exported comments and it came out to 191,057 words.

I didn't do any correction for the fluff/links/dates. It would be nice if someone could toss out a rough-but-reality based "subtract 10%" so I wouldn't have to make a regex.
posted by fake at 2:57 PM on July 21, 2011


Okay, here's the first round, by userid in classic test-scores-posted-publicly fashion, in basically the order in which people asked in the thread here, up through languagehat's request:
user 49346: 231910 words, 19863 unique, in 4501 comments.
user 7418: 1919970 words, 66849 unique, in 29898 comments.
user 24933: 493524 words, 25867 unique, in 7703 comments.
user 292: 2498714 words, 55835 unique, in 25335 comments.
user 18128: 218018 words, 21804 unique, in 6633 comments.
user 37801: 324292 words, 23962 unique, in 8461 comments.
user 91713: 69546 words, 9128 unique, in 852 comments.
user 11817: 282093 words, 24925 unique, in 10812 comments.
user 41365: 112226 words, 12750 unique, in 1386 comments.
user 74788: 15799 words, 3592 unique, in 534 comments.
user 19102: 1321617 words, 56218 unique, in 23403 comments.
user 17563: 528631 words, 29543 unique, in 10807 comments.
user 37949: 265601 words, 15627 unique, in 1926 comments.
user 48794: 517549 words, 24218 unique, in 8973 comments.
user 15122: 67642 words, 7874 unique, in 1301 comments.
user 71074: 78810 words, 11783 unique, in 1900 comments.
user 1559: 166580 words, 16788 unique, in 2570 comments.
user 61170: 148577 words, 18498 unique, in 5322 comments.
user 65114: 41154 words, 6162 unique, in 942 comments.
user 18122: 96441 words, 10561 unique, in 1660 comments.
user 17675: 810687 words, 29682 unique, in 13918 comments.
user 15556: 220070 words, 19411 unique, in 8669 comments.
user 15797: 152416 words, 16514 unique, in 2335 comments.
user 17897: 177038 words, 18163 unique, in 6546 comments.
user 62806: 20564 words, 4910 unique, in 615 comments.
user 10352: 94095 words, 10193 unique, in 1142 comments.
user 89363: 726821 words, 32971 unique, in 9235 comments.
user 42498: 130836 words, 12312 unique, in 2519 comments.
user 14786: 392114 words, 19328 unique, in 4623 comments.
user 19109: 59646 words, 8969 unique, in 1259 comments.
user 17735: 242974 words, 15816 unique, in 2039 comments.
user 14752: 1663668 words, 65831 unique, in 21626 comments.
If I missed anybody above languagehat, sorry, let me know. If I couldn't tell your comment was in fact a request to have this posted, I skipped it just to be safe. If you have alternate accounts that you want run as well, let me know explicitly; feel free to do so privately if you prefer.

adamvasco, you have said "delmoi" six times.

For those who requested a frequency table, I will send you a form letter in mefimail letting you know where to find it. If you want more info on how the tables are generated and formatted, please take a look at the wiki page for the Corpus project.

I will run another batch later for folks who spoke up after languagehat.

Some word-nerd caveats:

- word count is based on strings of letters, numbers, and connecting punctuation (single hyphens, internal apostrophes for contractions) after stripping out all html tags and replacing most punctuation/symbols with white space. It's a pretty solid way to count words, but it does mean that occasionally you get incorrectly mushed-together strings (and hence a little bit of undercounting of total raw words typed) and also long-strings-of-hyphenated-words get counted as single words as well (and hence a bit more undercounting of same)

- unique strings include distinct words as well as distinct forms of words (e.g. "farm" and "farms" and "farming" are all counted as distinct), typos, and hyphenated compound strings. So it's a bigger number than any really responsible interpretation of "number of different words you have ever typed into mefi". Perusing the bottom half of your frequency table for the word tokens you've only used once is a good way to get an idea of how much of this stuff is real words vs. oddities, if you're curious.

- comments that this is based on come from the big three subsites (blue, green, grey) as well as Music, for canonical consistency (this is how the infodump works) and because the other subsites are or historically were misfits (no comments on jobs, normal comments on Projects are relatively recent, IRL is more recent still), so if you talk a lot on those other sites your count here will be a bit small.
posted by cortex (staff) at 2:57 PM on July 21, 2011 [5 favorites]
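
For anyone who wants to try this on their own comment export, the counting rule in the caveats above can be sketched roughly as follows. This is not cortex's actual script: the regular expressions, the lowercasing, and the names `tokenize` and `summarize` are all assumptions made for illustration, and replacing tags with a space (rather than nothing) is a guess that avoids some of the mushed-together strings he mentions.

```python
import re
from collections import Counter

# Assumption: anything between angle brackets is an HTML tag to discard.
TAG_RE = re.compile(r"<[^>]+>")

# Assumption: a "word" is a run of letters/digits, optionally joined to
# further runs by a SINGLE hyphen or a SINGLE internal apostrophe.
WORD_RE = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")


def tokenize(comment_html):
    """Strip tags, then pull out word-like strings (lowercased)."""
    text = TAG_RE.sub(" ", comment_html)
    return [w.lower() for w in WORD_RE.findall(text)]


def summarize(comments):
    """Return (total words, unique strings, number of comments)."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return sum(counts.values()), len(counts), len(comments)


if __name__ == "__main__":
    sample = ["<i>Oh</i>, me, me!", "Me too"]
    words, unique, n = summarize(sample)
    print(f"{words} words, {unique} unique, in {n} comments.")
```

As in the caveats, a hyphenated chain such as long-strings-of-hyphenated-words comes out as one token here, and quoted text is counted like anything else.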


Looks like you've opened a bit of a Pandora's box here for yourself, cortex. It might end up faster to write a self-service script.

PS "oh, me, me"
posted by jedicus at 2:58 PM on July 21, 2011


I usually miss the boat on these hand wavey things. I want one! what's my number?
posted by bilabial at 2:59 PM on July 21, 2011


Speaking of which, there used to be a site that indexed Openings Of Yap against time registered - but I lost track of it. Is that still around?

Downloading my comments just now (didn't know you could do that) led me to remember that I was registered for two years before leaving a comment, which I'd completely forgotten. And there were months between my first 10 comments or so. Silent, fool, speak, remove doubt, etc.

Me and my approximately 190K words (which I'm sure is nothing around here) would be interested in the aforementioned site as well.
posted by MCMikeNamara at 3:00 PM on July 21, 2011


Yeah, despite the fact that I started here in 2002, my commenting really didn't hit its stride until about 2006. It's always weird to me to look through my history and see that in 2003 there were months that I didn't say a thing. Months.

Crazy.

Once I found my voice, I just couldn't bring myself to shut up, no matter how many people I bored to tears.
posted by quin at 3:08 PM on July 21, 2011


what's my number?
29460 - hover over your user name to see it at the end of the URL displayed.
posted by dg at 3:08 PM on July 21, 2011


hot wordy action

Can I nominate this as a potential T-shirt? Because, yeah. It should be on my chest.
posted by quin at 3:09 PM on July 21, 2011 [2 favorites]


Oh, me, me!

(do those count?)

Thanks, Cortex!
posted by Sparx at 3:09 PM on July 21, 2011


...signifying nothing.
posted by crunchland at 3:10 PM on July 21, 2011


Oh, me, me!
posted by OmieWise at 3:10 PM on July 21, 2011


I'd be interested to see mine as well, thanks.
posted by yerfatma at 3:10 PM on July 21, 2011


Yes, please! And a frequency table, too, if you're not already overwhelmed with requests by the time I finish tapping this comment out. (and thanks!)
posted by Rhaomi at 3:13 PM on July 21, 2011


I would like to see my tally though I will either be depressed it's so little or depressed it's too much or depressed that it's what I expected. But I promise to not be depressed with the number.
posted by maxwelton at 3:14 PM on July 21, 2011


3/4 of a million words. That's outstanding. Thanks cortex. I'm both thrilled and ashamed.
posted by hippybear at 3:15 PM on July 21, 2011


oh, me, me!
Plus, freq link too please & thank you.
posted by safetyfork at 3:17 PM on July 21, 2011


Oh, me, me!
posted by heyho at 3:18 PM on July 21, 2011


Great, now I won't be able to sleep. How do you spell... damn..
posted by Namlit at 3:21 PM on July 21, 2011


if that's your particular kink

Oh! Me! MEEE!!!


But seriously, can you run my word count?

posted by slogger at 3:23 PM on July 21, 2011


languagehat got the most unique words by using multiple languages which is straight-up cheating.
posted by GuyZero at 3:25 PM on July 21, 2011 [2 favorites]


Oh, me, me!

I would also like a frequency table, please. (Will it continue to work or is it a static dump?)
posted by troll at 3:27 PM on July 21, 2011


I wonder to what extent "stop" and "cut it out" and "take it to MeTa" and "grar" are holding back the mods' total/unique words ratios.
posted by mintcake! at 3:29 PM on July 21, 2011


oooh, me, me!
posted by Salvor Hardin at 3:32 PM on July 21, 2011


Ha. The frequency of 'me' has just spiked bigtime.
posted by iamkimiam at 3:36 PM on July 21, 2011 [2 favorites]


Ah, *sigh*, that's it:
Mew mew mew. (looked it up in my dictionary)
posted by Namlit at 3:46 PM on July 21, 2011


Oh, and do these tables exclude any previous postings of the Treaty of Westphalia?
posted by iamkimiam at 3:46 PM on July 21, 2011


Oh, me, me!
posted by roll truck roll at 3:51 PM on July 21, 2011


Oh, me, me!


(If you would be so kind.)
posted by Marisa Stole the Precious Thing at 4:01 PM on July 21, 2011


For the purposes of this discussion, saying "oh, me, me" in this thread is fine if you're okay with me dropping the word count in here in a comment.

We don't have any self-serve way to do a word count other than just grabbing your comment dump (see "Export Your Comments" at the bottom of the Preferences screen) and feeding that through a word count utility of some sort to get a (somewhat inflated by file format stuff) approximation.


Does your script make any attempt to account for quoting, such as the chunk above?

For that matter is there any way to tell the average number of comments that include the <i> tag?
posted by Meta Filter at 4:02 PM on July 21, 2011


Oh, me, me!
posted by nevercalm at 4:02 PM on July 21, 2011


<fauxvictorianmoppet>Oh, me, me, kind sir!</fauxvictorianmoppet>
posted by Kattullus at 4:08 PM on July 21, 2011


Can I have mine done as a word cloud?
posted by UbuRoivas at 4:09 PM on July 21, 2011


Oh, me, me!
posted by Splunge at 4:10 PM on July 21, 2011


For example, I can see at least one user who has 6553 words total, 2357 of which are in italics.
posted by Meta Filter at 4:12 PM on July 21, 2011


I, me, mine! I, me, mine! I, me, mine!
posted by flapjax at midnite at 4:17 PM on July 21, 2011 [1 favorite]


OMIMI
posted by DU at 4:18 PM on July 21, 2011


Does your script make any attempt to account for quoting, such as the chunk above?

It does not, and that's by far the most significant whammy factor in terms of overcounting. See the wiki page linked above for more details on what it does and does not do in terms of parsing and filtering.

For that matter is there any way to tell the average number of comments that include the <i> tag?

Not with this system; I strip html out completely.

there used to be a site that indexed Openings Of Yap against time registered - but I lost track of it. Is that still around?

I believe you are thinking of the Contribution Index. It has had its ups and downs but appears to be updating.
posted by cortex (staff) at 4:20 PM on July 21, 2011


Me, please!
posted by katemonster at 4:22 PM on July 21, 2011


Pick me, pick me! Please.
posted by misha at 4:26 PM on July 21, 2011


Second batch, up through misha:
user 38102: 75981 words, 9615 unique, in 1129 comments.
user 14213: 15983 words, 3900 unique, in 406 comments.
user 23431: 124331 words, 13552 unique, in 2122 comments.
user 15055: 899068 words, 37203 unique, in 12415 comments.
user 25620: 34373 words, 6054 unique, in 518 comments.
user 24132: 96907 words, 8833 unique, in 901 comments.
user 56932: 43373 words, 7163 unique, in 756 comments.
user 14179: 353957 words, 18618 unique, in 5870 comments.
user 94627: 28158 words, 5305 unique, in 469 comments.
user 18169: 136074 words, 13061 unique, in 1801 comments.
user 27428: 437654 words, 27137 unique, in 4421 comments.
user 36852: 298257 words, 25233 unique, in 4953 comments.
user 29460: 173996 words, 12508 unique, in 1417 comments.
user 18234: 178529 words, 19465 unique, in 2769 comments.
user 19466: 616111 words, 28663 unique, in 9134 comments.
user 10827: 204987 words, 17614 unique, in 4132 comments.
user 41040: 237083 words, 19422 unique, in 4549 comments.
user 18781: 41973 words, 7032 unique, in 743 comments.
user 58356: 48953 words, 7063 unique, in 1019 comments.
user 93543: 100616 words, 10625 unique, in 1560 comments.
user 23588: 38423 words, 7235 unique, in 1316 comments.
user 133027: 4668 words, 1908 unique, in 83 comments.
user 81495: 110388 words, 13085 unique, in 1891 comments.
user 47281: 126860 words, 12246 unique, in 2574 comments.
user 71801: 486735 words, 30063 unique, in 6897 comments.
user 28493: 90814 words, 10700 unique, in 1688 comments.
user 16000: 457279 words, 33522 unique, in 7236 comments.
user 18403: 844358 words, 45727 unique, in 14694 comments.
user 94835: 119035 words, 12965 unique, in 2597 comments.
user 39010: 605373 words, 33806 unique, in 14756 comments.
user 49143: 691386 words, 36826 unique, in 18002 comments.
user 43534: 81544 words, 9949 unique, in 822 comments.
user 53581: 474271 words, 25241 unique, in 4867 comments.
posted by cortex (staff) at 4:31 PM on July 21, 2011 [4 favorites]


Damn it! I'm not at a million yet? Well, expect some very long screeds to follow..
posted by quin at 4:35 PM on July 21, 2011


too late to say 'ooh, me too?'
posted by kaibutsu at 4:38 PM on July 21, 2011


67,642 words.

That's a lot of them.
posted by kbanas at 4:39 PM on July 21, 2011


Oh, me, me!

Sorry, busy day at work. Understand if the answer is "too slow."
posted by Devils Rancher at 4:41 PM on July 21, 2011


too late to say 'ooh, me too?'

Nope, I'll happily run more. Maybe tonight, maybe tomorrow.
posted by cortex (staff) at 4:42 PM on July 21, 2011


Thanks!
posted by Salvor Hardin at 4:43 PM on July 21, 2011


Hi y'all! I had a brief email exchange with cortex and he was ok with me posting this request. This is a very topical post since for the last two days I've been painstakingly emailing MeFites one-at-a-time to see if they'd like to participate in my next phase of my MetaFilter PhD research, comparing how frequently words that are similar to “MeFi” and “MeFite” occur in text environments (like MetaFilter, Wikipedia and Usenet) and in spoken environments (using the Corpus of Contemporary American English and other speech databases).

If any of you out there would like to be a part of this research phase by allowing cortex to generate the word frequency/count table for me, please send me a MeFi mail (or an email; it's in the profile) saying "you, you, you!" or whatever you wish (heck, tell me a joke, I could use a good laugh today) and I will send you more info and the link to a short consent form. I'd sure appreciate it, as it would greatly help my research along. Thanks a bunch!

And huge thanks to everybody who's already participated so far and especially to cortex, who's been awesomely crunching data for me all week!
posted by iamkimiam at 4:44 PM on July 21, 2011


Could you use this to find out who makes the longest average comments, and the shortest? And perhaps the most frequent user of a particular word?
posted by Jehan at 4:48 PM on July 21, 2011


Of the people whose stats have been posted so far, the most verbose user (i.e. the person with the highest words:comments ratio) is 37949 with 137.9 words per comment. The most concise user is 15556 with 25.4 words per comment. The median is 58.2.
posted by jedicus at 4:48 PM on July 21, 2011
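
The words-per-comment figures above can be reproduced from the posted lines with a short script. This is only a sketch of one way to do it: the regular expression assumes the exact "user N: W words, U unique, in C comments." layout used in the thread, the names `parse_stats` and `words_per_comment` are made up for the example, and `SAMPLE` holds just three of the posted lines, so its median is not the thread-wide 58.2.

```python
import re
from statistics import median

# Assumption: lines look exactly like the ones posted in the thread.
LINE_RE = re.compile(r"user (\d+): (\d+) words, (\d+) unique, in (\d+) comments\.")


def parse_stats(text):
    """Map user id -> (words, unique, comments)."""
    rows = {}
    for match in LINE_RE.finditer(text):
        uid, words, unique, comments = map(int, match.groups())
        rows[uid] = (words, unique, comments)
    return rows


def words_per_comment(rows):
    """Map user id -> average words per comment."""
    return {uid: words / comments for uid, (words, _unique, comments) in rows.items()}


# Three lines copied from the first batch; paste in both batches for the full picture.
SAMPLE = """\
user 37949: 265601 words, 15627 unique, in 1926 comments.
user 15556: 220070 words, 19411 unique, in 8669 comments.
user 15122: 67642 words, 7874 unique, in 1301 comments.
"""

ratios = words_per_comment(parse_stats(SAMPLE))
most_verbose = max(ratios, key=ratios.get)
most_concise = min(ratios, key=ratios.get)
print(most_verbose, round(ratios[most_verbose], 1))  # 37949 137.9
print(most_concise, round(ratios[most_concise], 1))  # 15556 25.4
print("median of this sample:", round(median(ratios.values()), 1))
```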


Oh, I forgot to add ... if you send me a joke, I'll send you one back!

Disclaimer: This is in no way considered an 'incentive to participate'. The joke may not even be funny. In the event that it makes you frown, this would not be considered a 'participation risk'. HI ETHICS COMMITTEE!!!
posted by iamkimiam at 4:49 PM on July 21, 2011

I'd be interested to see mine as well, thanks.
posted by yerfatma at 5:10 PM on July 21 [+] [!]

Yes, please! And a frequency table, too, if you're not already overwhelmed with requests by the time I finish tapping this comment out. (and thanks!)
posted by Rhaomi at 5:13 PM on July 21 [+] [!]

I would like to see my tally though I will either be depressed it's so little or depressed it's too much or depressed that it's what I expected. But I promise to not be depressed with the number.
posted by maxwelton at 5:14 PM on July 21 [+] [!]

[...]

user 10827: 204987 words, 17614 unique, in 4132 comments.
user 41040: 237083 words, 19422 unique, in 4549 comments.

:(
posted by Rhaomi at 4:51 PM on July 21, 2011


Oh, me me! Please!

And that table thingie too, pls? :)
posted by zarq at 4:58 PM on July 21, 2011


By posting this, I'm throwing it off, but I was somewhat amused to see that I had 1142 comments.
posted by epersonae at 4:58 PM on July 21, 2011


Thanks for your efforts, cortex, you're a champ.
posted by nevercalm at 4:59 PM on July 21, 2011


Is it too late to get my total?

Please, oh please, me me me.
posted by oddman at 5:06 PM on July 21, 2011


This is my 3001st comment to Meta. Huh.
posted by zarq at 5:09 PM on July 21, 2011


Last time cortex did this for me, I think I was up around 600,000, and that was a few years back. I don't write as much here as I used to, but I'd be keen to see if I'm over a million yet. Thanks, man!
posted by stavrosthewonderchicken at 5:11 PM on July 21, 2011


For the purposes of this discussion, saying "oh, me, me" in this thread is fine if you're okay with me dropping the word count in here in a comment.

Just to foil this, I'm considering writing my future comments as one long word.
posted by jonmc at 5:11 PM on July 21, 2011 [1 favorite]


cortex: I believe you are thinking of the Contribution Index. It has had its ups and downs but appears to be updating.

Holy shit! I'm in the top 10 for most posts to the Blue! I think the last time I looked at this I was number 20 or something like that and the idea of getting into the top 10 seemed years off... maybe it was years ago. I don't know whether to be proud or aghast, to be honest :)
posted by Kattullus at 5:13 PM on July 21, 2011


Me too, please!

I will be sad if I have the lowest unique words/total words ratio.
posted by Metroid Baby at 5:15 PM on July 21, 2011


Oh cool. Me, please!
posted by phunniemee at 5:17 PM on July 21, 2011


oh, me me
posted by plinth at 5:18 PM on July 21, 2011


Me please - I'm curious to see whether my deliberate journaling efforts can match up to my MetaFilter contributions. And whether adding my 2010 NaNoWriMo wordcount will actually make any difference.
posted by SMPA at 5:21 PM on July 21, 2011


(and what the hell, me too)
posted by jonmc at 5:22 PM on July 21, 2011


Mi mi mi mi...
posted by weston at 5:23 PM on July 21, 2011


Y yo tambien, por favor.
posted by Joseph Gurl at 5:25 PM on July 21, 2011


For example, I can see at least one user who has 6553 words total, 2357 of which are in italics.
posted by Meta Filter at 4:12 PM on July 21 [+] [!]


Smart move cortex - rather than waste your time working on everyone's word counts, you just made the site itself sentient and told it to get to work
posted by mannequito at 5:31 PM on July 21, 2011


I too would like a wordy pony.
posted by goodnewsfortheinsane at 5:36 PM on July 21, 2011


If you're still running it, "oh me, me!"
posted by librarylis at 5:39 PM on July 21, 2011


Thanks, cortex!
posted by misha at 5:41 PM on July 21, 2011


Oh, me, me!

I should like my count to have a beautiful number
posted by maudlin at 5:42 PM on July 21, 2011


Ooh, ooh, me too!
posted by klangklangston at 5:49 PM on July 21, 2011


Me! Please? If it's not too late?

I like numbers.
posted by cooker girl at 5:50 PM on July 21, 2011


Oh, me, me! for round three!
posted by dws at 5:51 PM on July 21, 2011


Me equals MC 2!!
posted by Skygazer at 5:54 PM on July 21, 2011


No. Me equals MC.
posted by jonmc at 5:55 PM on July 21, 2011


JonMC Squared.
posted by Skygazer at 6:00 PM on July 21, 2011


oh me,me please
posted by 6550 at 6:01 PM on July 21, 2011


Oh, me, me!
posted by nickmark at 6:21 PM on July 21, 2011


If it is not too late, please run my user number too Mr. Cortex.
posted by JohnnyGunn at 6:28 PM on July 21, 2011


For comparison.

I have written the equivalent of half of "War and Peace".
posted by mr_crash_davis at 6:35 PM on July 21, 2011


Yesterday I was told I know approximately 35k words. Today I could probably prove that wrong.
posted by cjorgensen at 6:37 PM on July 21, 2011


after stripping out all html tags

*Sniffling* But what of the several serialized novels I've written in title attributes of <a> tags over the years?!?!
posted by Rhomboid at 6:38 PM on July 21, 2011


Oh, me, me!!!
posted by limeonaire at 7:03 PM on July 21, 2011


Is it too late? I was busy, or something, earlier.

Oh, me me me!
posted by rtha at 7:22 PM on July 21, 2011


me please!
posted by PinkMoose at 7:25 PM on July 21, 2011


oi me
posted by Lovecraft In Brooklyn at 7:25 PM on July 21, 2011


Mi mi mi mi!
posted by Mister_A at 7:27 PM on July 21, 2011


Thanks, that's pretty cool. Although I don't know what it actually says about me, it's nice to know.
posted by bilabial at 7:31 PM on July 21, 2011


I've been more verbose than I'd expected. Whether or not this is a good thing is left as an exercise for the reader.
posted by digitalprimate at 7:36 PM on July 21, 2011


Me too, please! Thanks!!
posted by salvia at 7:38 PM on July 21, 2011


Me, Me! Do Tell Me!
posted by schmod at 7:40 PM on July 21, 2011


Oh, me me!
posted by Navelgazer at 7:52 PM on July 21, 2011


Do me!
posted by crossoverman at 8:02 PM on July 21, 2011


The median is 58.2.
Wow, I'm above median!
posted by dg at 8:07 PM on July 21, 2011


Me two please. Thanks cortex!
posted by nestor_makhno at 8:08 PM on July 21, 2011


Oh, me, me!

You could probably get a good estimate by taking a sample of your comments - say the first one on each page of your activity - and using that as an average number of words per comment. I was going to suggest that but I see it's not necessary.
posted by madcaptenor at 8:29 PM on July 21, 2011
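
The sampling idea above is easy to sketch: count the words in a handful of comments, take the mean, and multiply by the total number of comments. The function name `estimate_total_words` and the sample numbers below are hypothetical, and the error figure assumes the sample is roughly random, which "the first comment on each page" may or may not be.

```python
import math
from statistics import mean, stdev


def estimate_total_words(sample_word_counts, total_comments):
    """Estimate total words from per-comment word counts of a sample.

    Returns (estimate, rough standard error of the estimate).
    """
    avg = mean(sample_word_counts)
    # Standard error of the mean, scaled up to the whole comment history.
    se = stdev(sample_word_counts) / math.sqrt(len(sample_word_counts))
    return avg * total_comments, se * total_comments


if __name__ == "__main__":
    # Hypothetical word counts for ten sampled comments.
    sample = [12, 85, 40, 7, 130, 56, 22, 61, 9, 48]
    estimate, error = estimate_total_words(sample, total_comments=2500)
    print(f"about {estimate:.0f} words, give or take {error:.0f}")
```

Comment lengths tend to be very uneven, so a small sample can be well off; a larger sample narrows the error.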


Oh, me, me!
posted by geoff. at 8:32 PM on July 21, 2011


Please can I please get in on the next round? You are awesome, cortex.
posted by brina at 8:39 PM on July 21, 2011


Terrifically jealous of cortex's uniques. Then I remembered that each number he posts in these sorts of things are treated as unique "words," right?


right?
posted by jessamyn (staff) at 8:45 PM on July 21, 2011


me too!
posted by PercussivePaul at 9:04 PM on July 21, 2011


Heh, I just realized that my MeFi word count is almost exactly ten times the word count of my first novel. Again, not sure whether to be proud or aghast :)
posted by Kattullus at 9:07 PM on July 21, 2011 [1 favorite]


i kind of don't want to know, but I'm going to ask anyway, so:

Me, too, please.
posted by empath at 9:08 PM on July 21, 2011


Then I remembered that each number he posts in these sorts of things are treated as unique "words," right?

Right! So that's a compromising factor. I also hyphenate a lot, and also conjugate somewhat recklessly.

If anybody is curious about what the per-user frequency tables look like, you can take a look at this table of my activity as of the end of last year. A couple interesting notes about these tables, and about corpus frequency tables in general:

1. Uniques grow increasingly slowly in proportion to raw word count, at least up to a certain fairly high raw word count—the more we talk or write, the more we exhaust our productive vocabularies and so the more likely any given word we use will be one we've used at least once before. Hapax legomena become steadily rarer. So the proportion of unique words in your first 10000 words on the site is much higher than in your tenth 10000 words, etc.

2. About half the entries in a frequency table are likely to be hapax legomena, the other half words you've used at least twice. On the other hand, the majority of the total raw words you've used will be represented by a vanishingly small number of common words at the very top of the table; you write/say just the word "the" roughly as often as you say all of the words you've said least, combined. I said "the" about 75K times through the beginning of this year; I've said "zsazsa", "zucker", "zydeco", and probably thirty thousand other words, only once, ever.

There's a ton of interesting comparative and collective vocabulary analysis possibilities in this stuff that I hope to get around to looking at (probably? hopefully? with friendly linguists' help since I'm really an enthusiastic layman about this stuff) in the future.

Anyway, I'll do another batch of numbers, and get out frequency table notifications, some time tomorrow since I'm entertaining friends tonight. But I'm totally glad people are enjoying this stuff.
posted by cortex (staff) at 9:09 PM on July 21, 2011 [4 favorites]
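
The two notes above can be checked against any list of word tokens with a few lines of code. This is a minimal sketch, not the Corpus project's tooling: `frequency_table`, `hapax_share`, and `growth_curve` are names invented here, and the toy sentence stands in for a real comment history.

```python
from collections import Counter


def frequency_table(tokens):
    """List of (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def hapax_share(table):
    """Fraction of table entries that occur exactly once (hapax legomena)."""
    hapaxes = sum(1 for _word, count in table if count == 1)
    return hapaxes / len(table)


def growth_curve(tokens, step):
    """(words so far, unique so far) sampled every `step` words."""
    seen = set()
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


if __name__ == "__main__":
    tokens = "the cat and the dog and the zydeco".split()
    table = frequency_table(tokens)
    print(table[0])                 # the most frequent entry
    print(hapax_share(table))       # share of entries used only once
    print(growth_curve(tokens, 4))  # how uniques accumulate
```

On a real history, the growth curve should flatten as the raw count climbs, and the top few entries of the table should account for most of the total.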


not sure whether to be proud or aghast

We need a portmanteau for when we feel these things simultaneously. Proughast is a little cumbersome, though. Aghroud?
posted by Devils Rancher at 9:13 PM on July 21, 2011


OMG, me me me please! I want to know how many novels I could have written if I'd never joined!

And then I will either laugh or cry.
posted by jokeefe at 9:13 PM on July 21, 2011


Oh, me too please.
posted by h00py at 9:15 PM on July 21, 2011


Hapax legomena this:

shiftlager
posted by adamdschneider at 9:20 PM on July 21, 2011


Oh meeeeeeeeeee. This is awesome. Long live cortex!
posted by Phire at 9:27 PM on July 21, 2011


I already know the answer for me: too many.

For april fools day, be sure to add "total sentence count, total word count, total letter count" to our profile pages, with random numbers spit out.
posted by davejay at 9:29 PM on July 21, 2011


metafilter | wc
posted by davejay at 9:29 PM on July 21, 2011


cortex, when you say unique words grow slowly with total words - is the growth logarithmic? I can't figure it out in my head and I don't feel like finding paper, but that feels right somehow.
posted by madcaptenor at 9:43 PM on July 21, 2011


Me! Me! (fi?!)
posted by dunkadunc at 9:51 PM on July 21, 2011


Oh, me, me!
posted by fake at 9:56 PM on July 21, 2011


Oh, me, me!
posted by pracowity at 10:38 PM on July 21, 2011


Moi aussi, s'ivp
posted by Gyan at 10:40 PM on July 21, 2011


Oh, me, me, please!
posted by Lynsey at 10:54 PM on July 21, 2011


Oh me, me!

I too would be interested in a frequency table.
posted by kenko at 11:19 PM on July 21, 2011


Yeah, I'd be interested in knowing my word count. And comparative frequency. I'm sure I'm near the top of the list for the word "fucking".
posted by BitterOldPunk at 11:41 PM on July 21, 2011


Dang, cortex, youse is da bomb!
posted by Lynsey at 11:53 PM on July 21, 2011


Plz count me in for the count and the freq table -- thanx!
posted by dancestoblue at 11:55 PM on July 21, 2011


OOOOOOOO!! Me, Me Please! and could I get the form letter for the frequency table?
posted by Blasdelb at 11:58 PM on July 21, 2011


Ooh, me, me! (And now I'm going to interpret that x as a Chi in your username...)
posted by Wrinkled Stumpskin at 12:40 AM on July 22, 2011


I'm sure I'm near the top of the list for the word "fucking".

Perhaps. I have slowed down in recent years. Heh.
posted by stavrosthewonderchicken at 12:45 AM on July 22, 2011 [1 favorite]


Oh, me, me.

(Suspect I'm nowhere near the million)
posted by seanyboy at 1:37 AM on July 22, 2011


Oh Me, Me Please! and could I get the form letter for the frequency table?

Also, because I'm sure my numbers will suck: zephyr xebec ocelot widdershins stultifying iconoclastic noosphere palimpsest butt-trumpet hoopty pinner zither cymbalo murgatroid *pant pant*
posted by BrotherCaine at 2:26 AM on July 22, 2011 [2 favorites]


Cortex is going to have to hire another unpaid intern to handle all these "me, too" requests.
posted by crunchland at 3:41 AM on July 22, 2011


Zero, me, me!
posted by Elmore at 3:55 AM on July 22, 2011


194 posts of "meme" and not a single reference to...

y'know what? Maybe that's for the best. By the way, me me!
posted by Saydur at 4:11 AM on July 22, 2011


If one were able to comment on Metafilter into infinity, one would not only make repeated identical comments creating a naturally occurring pattern, but each individual commenter, would eventually, make other people's comments, repeatedly as well.

And the harmonic vibration would cause the cosmic binders of the universe to shake loose. And we would all travel through a time tunnel and become born again like in that movie...



*I really hate it when my iced latte gets spiked with LSD.*
posted by Skygazer at 5:06 AM on July 22, 2011


If one were able to comment on Metafilter into infinity, one would not only make repeated identical comments creating a naturally occurring pattern, but each individual commenter, would eventually, make other people's comments, repeatedly as well.
posted by Elmore at 5:09 AM on July 22, 2011


On preview what Skygazer said.
posted by Elmore at 5:09 AM on July 22, 2011 [1 favorite]


Mi, me, oh!

( oh, me! me! )
posted by curuinor at 6:07 AM on July 22, 2011


I just looked at the Contribution Tables. Sometimes I worry that I talk too much here. I need to stop worrying about that.
posted by rtha at 6:15 AM on July 22, 2011


cortex, please count my words! ;)
posted by wierdo at 6:29 AM on July 22, 2011


rtha, you didn't link to the contribution tables (but I wished you did...what are they?!)
posted by iamkimiam at 7:24 AM on July 22, 2011


Lay some digits on me, my good man.
posted by loquacious at 7:40 AM on July 22, 2011


MetaFilter: ruin the denobulating.
posted by loquacious at 7:41 AM on July 22, 2011


Wow, that'll learn me to c&p without enough coffee in me. I meant to link to this comment by cortex, where he links to the contribution thingy.
posted by rtha at 8:01 AM on July 22, 2011


Oh, me, me!
posted by The Whelk at 8:06 AM on July 22, 2011


Pretty please, Mr. Cortex! Also would be much obliged if you could have the script spit out what % of my comments were bullshit.
posted by The White Hat at 8:15 AM on July 22, 2011


me, Me, ME!
posted by notsnot at 9:21 AM on July 22, 2011


Round three, leading off with neglected Rhaomi, up through notsnot.
user 62135: 340649 words, 28440 unique, in 3724 comments.
user 12851: 221662 words, 19212 unique, in 2290 comments.
user 19026: 274854 words, 24579 unique, in 5067 comments.
user 18312: 690117 words, 34674 unique, in 8654 comments.
user 23057: 200235 words, 15942 unique, in 3475 comments.
user 2238: 992159 words, 47211 unique, in 17527 comments.
user 61447: 282186 words, 18188 unique, in 2685 comments.
user 74248: 231189 words, 16914 unique, in 3378 comments.
user 549: 269292 words, 19699 unique, in 3160 comments.
user 110640: 112197 words, 11191 unique, in 1098 comments.
user 58: 1231128 words, 53299 unique, in 26792 comments.
user 16395: 519619 words, 27023 unique, in 4260 comments.
user 1801: 62992 words, 10447 unique, in 1968 comments.
user 17547: 283579 words, 20046 unique, in 4726 comments.
user 20191: 282612 words, 25640 unique, in 7025 comments.
user 59863: 95003 words, 10303 unique, in 660 comments.
user 801: 295265 words, 23028 unique, in 3635 comments.
user 57855: 132737 words, 10697 unique, in 1359 comments.
user 1890: 15372 words, 3884 unique, in 378 comments.
user 34026: 103491 words, 10230 unique, in 1892 comments.
user 14914: 289133 words, 24409 unique, in 3606 comments.
user 24422: 154298 words, 12185 unique, in 1739 comments.
user 4498: 42077 words, 6758 unique, in 676 comments.
user 37485: 253235 words, 14646 unique, in 3547 comments.
user 59453: 289060 words, 17234 unique, in 4959 comments.
user 99473: 115451 words, 9512 unique, in 1513 comments.
user 43189: 701735 words, 31941 unique, in 9812 comments.
user 104338: 30568 words, 5713 unique, in 569 comments.
user 123891: 185368 words, 16083 unique, in 3220 comments.
user 40584: 330945 words, 28500 unique, in 11281 comments.
user 33038: 629561 words, 25116 unique, in 5702 comments.
user 49429: 245881 words, 19914 unique, in 2960 comments.
user 38322: 504630 words, 28335 unique, in 4063 comments.
user 57973: 71418 words, 8688 unique, in 1100 comments.
user 95433: 18253 words, 4099 unique, in 472 comments.
user 91774: 76222 words, 7877 unique, in 1349 comments.
user 10331: 629862 words, 29948 unique, in 6169 comments.
user 16148: 114425 words, 11068 unique, in 933 comments.
user 20840: 210880 words, 15698 unique, in 2007 comments.
user 29475: 746556 words, 32298 unique, in 16063 comments.
user 12990: 416105 words, 27685 unique, in 5969 comments.
user 52224: 55281 words, 7087 unique, in 902 comments.
user 40688: 127424 words, 12363 unique, in 1927 comments.
user 77623: 149623 words, 15540 unique, in 3992 comments.
user 18169: 136077 words, 13061 unique, in 1802 comments.
user 3518: 682964 words, 35557 unique, in 8595 comments.
user 17446: 149653 words, 15893 unique, in 2572 comments.
user 744: 84796 words, 11254 unique, in 2186 comments.
user 17434: 131748 words, 17670 unique, in 3624 comments.
user 12903: 267551 words, 25140 unique, in 4817 comments.
user 90947: 68194 words, 9633 unique, in 852 comments.
user 82184: 35160 words, 5783 unique, in 296 comments.
user 14587: 243343 words, 17640 unique, in 4233 comments.
user 23303: 249109 words, 22643 unique, in 5298 comments.
user 25896: 47666 words, 7232 unique, in 1726 comments.
user 18726: 93203 words, 10155 unique, in 729 comments.
user 114526: 5093 words, 1694 unique, in 56 comments.
user 45186: 364096 words, 21006 unique, in 3303 comments.
user 17349: 1012342 words, 50669 unique, in 11974 comments.
user 80649: 491306 words, 36150 unique, in 16709 comments.
user 19185: 67580 words, 11639 unique, in 1370 comments.
user 17022: 215268 words, 19446 unique, in 4137 comments.
posted by cortex (staff) at 10:01 AM on July 22, 2011 [5 favorites]


me me me please
posted by adamvasco at 10:05 AM on July 22, 2011


Thanks dude. I rock with my 47666 words. *makes satan heavy metal hand gestures*
posted by Elmore at 10:07 AM on July 22, 2011


Of course I've ruined that count now, and satan hates me.
posted by Elmore at 10:09 AM on July 22, 2011


Thank you Mr. Cortex.
posted by JohnnyGunn at 10:18 AM on July 22, 2011


I just plonked my word count into Wolfram|Alpha. Apparently I've written the equivalent of a 243 page book, which took me 49 hours to type (although I suspect my typing is faster than "typical").
posted by brundlefly at 10:20 AM on July 22, 2011


Could you run it again? It seems to be twelve words short.
posted by goodnewsfortheinsane at 10:27 AM on July 22, 2011 [5 favorites]


I wrote 674 pages, so two decent sized paperbacks.

Oh I hope they're spy thrillers
posted by The Whelk at 10:38 AM on July 22, 2011


That's all we need, Bourne Agains.
posted by Elmore at 10:48 AM on July 22, 2011


I was thinking more "The Man Who Was Metafilter" myself.
posted by The Whelk at 10:50 AM on July 22, 2011


MetaTalk: Oh, me, me!
posted by banshee at 10:53 AM on July 22, 2011 [1 favorite]


Thanks!

690117 words, 34674 unique, in 8654 comments.

I need more variety.
posted by zarq at 10:58 AM on July 22, 2011


Banshee wins the Metafilter:

So we can all stop doing that now.
posted by Elmore at 11:02 AM on July 22, 2011


I was thinking "The Metafilterian Candidate".
posted by Elmore at 11:03 AM on July 22, 2011


"Metafilter Blaise"
posted by The Whelk at 11:03 AM on July 22, 2011


Following brundlefly's lead, I checked my numbers on Wolfram|Alpha and my contributions here would be contained by a 1232 page book which would take 10 days to type and about one hundred hours to read out loud.

And I'd like to believe that were I to need to come up with a really nasty punishment, reading my words aloud for more than four straight days would be one of the worst I could think up.
posted by quin at 11:05 AM on July 22, 2011


"The Metafiltese Falcon"
posted by Elmore at 11:06 AM on July 22, 2011


I tried the Wolf/Alphabear thing and it told me I wrote the Bible. So bite me.
posted by Elmore at 11:09 AM on July 22, 2011 [1 favorite]


I'm feeling somewhat aghroud about the fact that I know several of you by your user numbers.

Barely a quarter-million words in 7 years? I guess I make a lotta short comments. I doubt I'll live to see a million. Also, not enough unique words. I've got to get cracking on the vocabulary front.

Quincunx, phylactery, narthex, incunabula, vestibule, quondam, thanatopsis, phocine, plangent, accouchement, gegenschein, splenetic, etc.
posted by Devils Rancher at 11:16 AM on July 22, 2011 [2 favorites]


The Importance of Being Metafilter.
posted by griphus at 11:26 AM on July 22, 2011 [1 favorite]


I respectfully request my word count.
posted by reenum at 11:28 AM on July 22, 2011 [1 favorite]


Okay, I now need to know who has the highest unique words to word count ratio.
posted by empath at 11:31 AM on July 22, 2011


user 14914: 289133 words, 24409 unique, in 3606 comments.

My uniques index (divide total words by uniques) is through the roof:

11.85
posted by Skygazer at 11:32 AM on July 22, 2011


Quincunx

I'm a what now?!
posted by quin at 11:35 AM on July 22, 2011


Something needed to kill Voldermort apparently.
posted by The Whelk at 11:36 AM on July 22, 2011


I'm a what now?!

You, sir, are an arrangement of five objects, four of which form the corners of a square, with the fifth placed at the center of said square. I pseudo-apologize if my comments have somehow offended you.

Aside: I was actually able to use quincunx during a staff meeting the other day, when we were discussing how to optimize placement of mugs on the conveyor belt for maximum curing speed.
posted by Devils Rancher at 11:39 AM on July 22, 2011 [1 favorite]


I waited 20 goddam years for a chance to actually use that word.
posted by Devils Rancher at 11:40 AM on July 22, 2011 [2 favorites]


You will note, if you've read down this far, that I am nothing if not persistent.
posted by Devils Rancher at 11:47 AM on July 22, 2011


My faux-outrage has been placated by your pseudo-apology.

Having a fifth placed in the center sounds just like something I'd do, so well spotted.

posted by quin at 11:48 AM on July 22, 2011 [1 favorite]


I have long considered it the world's most useless word, so you can imagine how excited I was when I finally got to use it in conversation. I veritably leapt from my chair. Eternal vigilance is the price of vocabulary.
posted by Devils Rancher at 11:58 AM on July 22, 2011 [1 favorite]


My quincunx brings the boys to the yard,

... ah, nevermind
posted by Elmore at 12:03 PM on July 22, 2011


127424 words, 12363 unique, in 1927 comments.

Well, there's the novel I've been not working on for the past six years.
posted by Phire at 12:35 PM on July 22, 2011 [2 favorites]


VoldeRwho?
posted by Namlit at 12:39 PM on July 22, 2011


zarq wrote: I need more variety

I thought 21,000 different words was pretty good. Thanks for making me feel bad. :(

(Thanks, cortex!)
posted by wierdo at 12:45 PM on July 22, 2011


Could you please do me? Thank you!
posted by ThePinkSuperhero at 12:47 PM on July 22, 2011


I'd love to know how many NaNoWriMos worth of words I've posted here so far. So, when you're through doing The Pink Superhero, could you please do me too?

Did that sound wrong?
posted by MrVisible at 12:52 PM on July 22, 2011


Me me me. (Because I will never turn down anything that is ALL ABOUT ME.)
posted by DarlingBri at 1:04 PM on July 22, 2011


user 744: 84796 words, 11254 unique, in 2186 comments. Hmm. Less said about that, the better, I suppose. :)
posted by Lynsey at 1:09 PM on July 22, 2011


IT'S OVER 9000!!!
posted by Slap*Happy at 1:19 PM on July 22, 2011


I thought 21,000 different words was pretty good. Thanks for making me feel bad. :(


Whatever you do, don't look at languagehat's stats then ;)
posted by BrotherCaine at 1:22 PM on July 22, 2011


I think I need to talk about things besides cats, booze, and bacon. Or at least do so with lots of different words, like felines, alcoholic beverages, and cured pork meat.
posted by rtha at 1:31 PM on July 22, 2011


Although, because I am a total math dimwit, I don't quite know if uniques index is "good" or "try harder next time!" - I mean, is a higher number "better" or "try harder next time"?
posted by rtha at 1:34 PM on July 22, 2011 [1 favorite]


The maths just describe the situation as it is, without value judgements.
posted by UbuRoivas at 1:46 PM on July 22, 2011


Thanks, cortex!

Interestingly, according to the Frequency Tables page there were over 427 million words posted to the major subsites from 1999 through the end of 2010. So my weighty-sounding 340,000 words only amount to 0.08% of the total corpus. (According to Wolfram Alpha, the corpus itself would make a 585,000-page book that would take nearly three solid years to read!)
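The arithmetic behind that percentage, as a quick sketch. The corpus size is taken as a flat 427 million (the comment says "over"), so the real share is a hair smaller; the variable names are just for illustration.

```python
corpus_words = 427_000_000   # "over 427 million" per the Frequency Tables page, rounded down
user_words = 340_649         # user 62135 in round three
corpus_pages = 585_000       # the Wolfram Alpha page estimate quoted above

share_percent = user_words / corpus_words * 100
words_per_page = corpus_words / corpus_pages  # page density implied by that estimate

print(f"{share_percent:.2f}% of the corpus")
print(f"about {words_per_page:.0f} words per page")
```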
posted by Rhaomi at 1:53 PM on July 22, 2011


Hey, Devils Rancher, isn't there a name for the opening or lobby area of a church? You know, the part located at the end of the nave, at the far end from the church's main altar? What's that part called again?
posted by nickmark at 1:56 PM on July 22, 2011


You mean like a portico of an early Christian or Byzantine church or basilica? An antechamber or porch at the western entrance of early Christian churches, separated off by a railing and used by catechumens, penitents?? Why... why, that's a... NARTHEX!

*happy dance*
posted by Devils Rancher at 2:00 PM on July 22, 2011


My Unwritten Novels Grade = 4. Sigh.

This is based on 416,000 or so comments, so let's say 4 decent-sized novels. Uniques ratio of just over 15, in just under 6K comments. LH has a uniques ratio of 27 or something, but as pointed out above, he gamed the results in an ungentlemanlylike manner by using those there foreign words.

Also: aboulomania, acipenser, acrography, adelaster, adenography, aerography, aerolith, aestival, aeviternal, aeviternity *cough*
posted by jokeefe at 2:02 PM on July 22, 2011


Erm, 416 K words, not comments. Grr.
posted by jokeefe at 2:06 PM on July 22, 2011


*waits for Devils Rancher to remember what "unique" means and slowly stop dancing*
posted by nickmark at 2:08 PM on July 22, 2011


On second thought, I'm not sure the uniques ratio thing includes that much useful information because it changes in relation to number of comments. My math brain doesn't go much farther than that without stretching in usually unused directions.
posted by jokeefe at 2:11 PM on July 22, 2011


I had originally guessed that number of uniques should be something like (number of words)/(log number of words). Given the data that cortex has posted, that doesn't seem to work. But it actually looks like some sort of uniqueness index that doesn't correlate with total number of words posted is given by u/(w^(1/2)*log(w)); this index averages around 3.13.

That is, among people who have written w words on metafilter, the number of unique words tends to be around 3.13 * w^(1/2) * log(w). (Logs are natural; I'm a mathematician.)

So, rtha: you've written 701735 words. The model predicts that you'll have 3.13 * 701735^(1/2) * log(701735) = 35295 unique words. Your actual number of uniques is 31941, which is a bit smaller; it gives you a "uniqueness index" of

31941 / (701735^(1/2) * log(701735)) = 2.83

and 29 of the 126 people that I have data on have uniqueness index of 2.83 or lower. I don't know how to interpret this, though. My own uniqueness index is 2.53. My sense is that this uniqueness index is some sort of indicator of vocabulary size, so this basically means I say the same damn thing over and over again, or at least tend to write about the same topics, and you do too, although not as much.
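If you'd rather not do this by hand, the index and the prediction fit in a few lines. This is only a sketch of the formula as stated in this comment; the function names and `MODEL_CONSTANT` are labels invented here, and 3.13 is just the average over the numbers posted so far, not a law of nature.

```python
import math

MODEL_CONSTANT = 3.13  # average index over the data posted in this thread

def uniqueness_index(words, uniques):
    """u / (w^(1/2) * log(w)), using the natural log."""
    return uniques / (math.sqrt(words) * math.log(words))

def predicted_uniques(words, constant=MODEL_CONSTANT):
    """Uniques the model expects for someone who has written `words` words."""
    return constant * math.sqrt(words) * math.log(words)

# rtha's numbers from round three
words, uniques = 701735, 31941
print(round(predicted_uniques(words)))             # 35295
print(round(uniqueness_index(words, uniques), 2))  # 2.83
```

Plug in your own line from cortex's batches to get your index.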

(For the morbidly curious: among the data that we've already seen, the Pearson correlation between log(w) and u/(w^(1/2)*log(w)) is 0.0018; the Spearman rank correlation is 0.0039. In layman's terms, the fact that you've used more words doesn't appear to influence this particular measure of uniqueness. I have no idea why this should be the right form, though, and obviously fitting models like this is kind of a black art...)
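For the even more morbidly curious, a sketch of how those two correlations can be computed with nothing but the standard library. It uses only the first ten rows of round three, so it will not reproduce the 0.0018 and 0.0039 figures, which come from all 126 people. The helper names are made up for illustration, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank each value from 1 to n; assumes no ties, which holds for this sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# (words, uniques) for the first ten users in round three
rows = [(340649, 28440), (221662, 19212), (274854, 24579), (690117, 34674),
        (200235, 15942), (992159, 47211), (282186, 18188), (231189, 16914),
        (269292, 19699), (112197, 11191)]

log_w = [math.log(w) for w, _ in rows]
index = [u / (math.sqrt(w) * math.log(w)) for w, u in rows]
print(pearson(log_w, index), spearman(log_w, index))
```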
posted by madcaptenor at 2:37 PM on July 22, 2011 [1 favorite]


Oh, me, me.
posted by ersatz at 2:44 PM on July 22, 2011


Oh, me, me!
posted by craven_morhead at 2:54 PM on July 22, 2011


The big old touchstone in hapax distribution stuff is Zipf's Law, but I get cross-eyed trying to read formulae so, you know, have fun with that.
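For the formula-averse, the short version of Zipf's Law is that a word's frequency is roughly inversely proportional to its rank in the frequency table, so rank times count stays in the same ballpark as you go down the list. A toy sketch, with a made-up corpus rigged to fit the law exactly and a hypothetical helper named `rank_frequency`; real text only follows it approximately.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    table = Counter(words)
    return [(rank, word, count, rank * count)
            for rank, (word, count) in enumerate(table.most_common(), start=1)]

# Toy corpus rigged so the counts are 12, 6, 4, 3 (that is, 12 / rank).
toy = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3
for row in rank_frequency(toy):
    print(row)
```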

The main thing I would emphasize with the unique counts here is that they're probably not super meaningful without a fairly close analysis of what's actually going on in any given person's language use on the site. High uniques could certainly speak to an admirably large productive vocabulary; it could also speak to a high typo rate; it could speak to a tendency to engage on a wide variety of subjects, however literate or not those engagements are; it could speak to a tendency to post long lists of 5- and 6-digit numbers in metatalk threads; and so on.

Someone with more familiarity with the nitty gritty of corpus linguistics could probably explicate here, but I just don't have the background to go into any kind of detail. But the core thing is that what is likely to be telling about a uniques count may not be what words you know any more than it's what you type about and how you type about it.
posted by cortex (staff) at 2:56 PM on July 22, 2011


Yeah, I don't know what I'm doing either when it comes to corpus linguistics. I know about Zipf's law but after that I'm just wildly speculating.
posted by madcaptenor at 2:58 PM on July 22, 2011


cortex : you can take a look at this table of my activity as of the end of last year.

I meant to comment on this last night, but I'm really digging the fact that the word "fuck" tops other common words (for this place) like "fun", "job", "users" "3", "flag", and "awesome" to name a few.

It really is a wonderfully useful word.

"Mefi" beats it out by over three times though, so there is that.
posted by quin at 3:23 PM on July 22, 2011


Thanks, madcap! I mostly have no idea what that means, but thank you!
posted by rtha at 3:37 PM on July 22, 2011


Basically, what it means (without the math) is that from the number of words you've used, I can predict the number of unique words you've used. You, for whatever reason, have used fewer unique words than most people who have written the same number of total words as you. But what this means about How You Use Language is unclear, as cortex pointed out.
posted by madcaptenor at 3:51 PM on July 22, 2011


I blame Felis catus.

Je fais le voeu d'arrêter de parler tant de choses sur les chats et le whisky.
posted by rtha at 4:08 PM on July 22, 2011


Oh, me, me, pretty please?
posted by defenestration at 4:12 PM on July 22, 2011


*waits for Devils Rancher to remember what "unique" means and slowly stop dancing*

Keep waiting - it's Friday night. I'm gonna keep on dancin'.
posted by Devils Rancher at 4:13 PM on July 22, 2011


madcaptenor, your dastardly recounting of your infernal equation has impugned my honor. I must, therefore, challenge you to a duel.
posted by wierdo at 4:14 PM on July 22, 2011 [1 favorite]


madcaptenor - the number of unique words should be asymptotic and not significantly larger than the total number of words in the English language. Whereas the total number of words should be roughly proportional to the number of posts and therefore without asymptote.

In most cases it should be less than that with a certain amount of variance due to perfectly cromulent nonsense words, answering AskMe questions in other languages, etc.
posted by plinth at 4:15 PM on July 22, 2011


it could also speak to a high typo rate;

Oh, hi!
posted by Devils Rancher at 4:19 PM on July 22, 2011


total number of words in the English language.

you're assuming that's well-defined.
posted by madcaptenor at 4:26 PM on July 22, 2011


Huh. Just 8000 words short of a million. Thanks, cortex!
posted by stavrosthewonderchicken at 4:57 PM on July 22, 2011


Thankyouthankyouthankyou Cortex. Other than finding out I have a near sociopathic I/We ratio, that was a lot of fun.
posted by BrotherCaine at 5:49 PM on July 22, 2011


Me please, cortex.
posted by dobbs at 10:49 PM on July 22, 2011


you're assuming that's well-defined.
Actually no, but even if you decided to adjust that by a constant (multiplicative or additive), it's still just a constant. And even if it grows per annum, the number of words typed per year for a significant mefi user will surely outstrip that (remember that for every set of new words added, archaic words drop out of use).
posted by plinth at 3:30 AM on July 23, 2011


And perhaps the most frequent user of a particular word?

Of my 605,373 words, I'm guessing 50,000 or so are either "ain't" or "y'all".
posted by flapjax at midnite at 3:56 AM on July 23, 2011


Heh. I'm 23558, not 23588.
posted by klangklangston at 8:35 AM on July 23, 2011


I'm not a number, I'm a slightly different number!
posted by Kattullus at 8:43 AM on July 23, 2011 [6 favorites]


If you're still doing them, I'd like to be done!
posted by ArmyOfKittens at 2:22 PM on July 23, 2011


Batch four! Sorry about the typo, klang.
user 18324: 110507 words, 15230 unique, in 2765 comments.
user 30938: 5220 words, 1824 unique, in 166 comments.
user 44996: 43440 words, 6529 unique, in 821 comments.
user 17721: 400179 words, 19116 unique, in 9904 comments.
user 53053: 38394 words, 6295 unique, in 490 comments.
user 51358: 470180 words, 23243 unique, in 4588 comments.
user 62962: 70080 words, 11309 unique, in 1441 comments.
user 19260: 99599 words, 10746 unique, in 2089 comments.
user 27016: 53846 words, 8409 unique, in 1372 comments.
user 13877: 582690 words, 26273 unique, in 8086 comments.
user 23558: 1624167 words, 57834 unique, in 18151 comments.
user 6142: 116605 words, 12234 unique, in 1258 comments.
posted by cortex (staff) at 4:33 PM on July 23, 2011 [2 favorites]


Heh. I'm 23558, not 23588.

Slogger's (23588) requested results were in before your request. I'm sure Cortex wasn't ignoring you. *eyes glance briefly sideways at ashes in official cabal memo disposal ashtray*
posted by BrotherCaine at 11:59 PM on July 23, 2011


oh me!
posted by nile_red at 5:55 AM on July 24, 2011


Is it too late to join the party?

Oh, me, me!
posted by AsYouKnow Bob at 6:40 AM on July 24, 2011


I did my own count.
418,804.
Counting this comment, that makes 418,814.

(For comparison, my doctoral dissertation was a little over 36,000 words.)

drat now my word count is wrong... 418,831
posted by caution live frogs at 9:20 AM on July 24, 2011


Since I am never going to actually write a novel, it makes me happy to know I have posted 470,180 words (23,243 unique) in 4,588 comments. I suspect my unique count of being something of a fanfaronade, though; it is probably mostly errors as I type faster than I spell.
posted by DarlingBri at 1:03 PM on July 24, 2011


Following Madcaptenor's explanation, I entered my numbers into Wolfram Alpha. If you're interested in your own "uniques index", you may find these links useful.

Uniques predicted by Madcaptenor's model.

My actual uniques: 13061. Predicted uniques: 13648.

Unique Index: 2.995

Can someone plot the "unique index" for all dumps so far?
posted by fake at 9:20 AM on July 26, 2011


Can someone plot the "unique index" for all dumps so far?

Somebody can.
posted by madcaptenor at 11:01 AM on July 26, 2011 [1 favorite]


Fantastic, thanks! Frum nou on ah'l mispel evarthin to inflaight mai uniqz!
posted by fake at 1:38 PM on July 26, 2011 [1 favorite]


Hgisaop gsjgir vn pang nhnsi biop abck gqbnpg bhva fth huoaergb.

(That's "from now on I'll invent a new language for every comment to inflate my uniques.")
posted by madcaptenor at 1:54 PM on July 26, 2011


U kud alsew jist rawt therteen yer sekrit languij

aye ee:

(Gung'f "sebz abj ba V'yy vairag n arj ynathntr sbe rirel pbzzrag gb vasyngr zl havdhrf.")

posted by fake at 2:07 PM on July 26, 2011


I could also double rot13 my secret language for even more uniques!
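A quick sketch of why that plan is doomed, using the `rot_13` text codec from Python's standard library: rot13 shifts each letter 13 places, so applying it twice lands every letter back where it started. The `rot13` wrapper name is just for illustration.

```python
import codecs

def rot13(text):
    """Rotate each ASCII letter 13 places; everything else passes through unchanged."""
    return codecs.encode(text, "rot_13")

encoded = "sebz abj ba V'yy vairag n arj ynathntr sbe rirel pbzzrag gb vasyngr zl havdhrf"
decoded = rot13(encoded)
print(decoded)

# Double rot13 is the identity, so it adds no new "unique" words at all.
print(rot13(rot13(decoded)) == decoded)
```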
posted by madcaptenor at 2:33 PM on July 26, 2011 [2 favorites]


Double rot13? Everything I write is double rot13d.
posted by shakespeherian at 2:35 PM on July 26, 2011


Pretty please, may I know?
posted by Miko at 10:44 AM on August 5, 2011

