It's been over a year since we first made the Metafilter Frequency Tables available, and now they're updated with word frequency information for all of 2011 and 2012 as well, bringing the total number of words up to six hundred and thirty-six million. Gosh! (Is a word we've collectively used 5,707 times since 1999!)
For some good background on what's in these tables and what a body might do with them, check out the original announcement post and the wiki page about 'em.
But the very short version is that these tables record the total number of occurrences of each word in Metafilter comments (specifically from the Metafilter, Ask Metafilter, Metatalk, and Music subsites, the same ones we cover in the Infodump), along with each word's relative frequency expressed in "parts per million" (PPM): the number of times that word would appear if the source text were exactly one million words long.
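For anyone who wants to check the arithmetic, here's a minimal Python sketch of the PPM calculation, assuming you have a word's raw count and the total word count of the text it came from. The function name `ppm` is just illustrative, not something that ships with the tables. Plugging in the 5,707 uses of "gosh" against the rounded 636-million-word total gives a ballpark figure only, since the real total isn't exactly 636,000,000.

```python
def ppm(count: int, total_words: int) -> float:
    """Parts per million: how often a word would appear in a one-million-word text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Multiply before dividing so exact cases (e.g. 1 in 1,000,000) come out exact.
    return count * 1_000_000 / total_words


# Ballpark only: 636,000,000 is the rounded corpus size from the announcement.
print(round(ppm(5_707, 636_000_000), 2))  # 8.97
```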
If you just want to do some basic searching with a text editor's "find" function, the easiest thing is to download one of the "complete" files and play around with that. If you're feeling like doing some more ambitious datawankery, there are a lot of potentially interesting things to do: comparing different subsites, comparing the same subsite over time, or comparing the Mefi corpus to other, more general linguistic corpora that exist on the internet.
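One rough way to start on the subsite-versus-subsite (or year-versus-year) comparisons: raw counts aren't comparable when the two chunks of text differ in size, so convert each table to PPM first and compare those. The Python sketch below assumes a hypothetical file layout of one `word<TAB>count` pair per line; check the actual files and the wiki page for their real format (if a table already carries a PPM column, you could read that directly instead). It also assumes the table is complete, so that summing its counts gives the total word count. The function names and the tiny example tables are made up for illustration and are not real Metafilter numbers.

```python
from typing import Dict, Iterable


def parse_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word<TAB>count' lines (an ASSUMED layout) into a word -> count dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # skip header rows or anything that doesn't fit the assumed layout
        counts[parts[0]] = counts.get(parts[0], 0) + int(parts[1])
    return counts


def ppm_table(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts to parts per million of this table's own total."""
    total = sum(counts.values())  # assumes the table covers every word in the text
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def compare_word(word: str, a: Dict[str, float], b: Dict[str, float]) -> float:
    """Ratio of a word's PPM in table a to its PPM in table b."""
    pa, pb = a.get(word, 0.0), b.get(word, 0.0)
    if pb == 0:
        return float("inf") if pa > 0 else float("nan")  # absent from b (or from both)
    return pa / pb


# Toy tables standing in for two subsites or two years.
mefi = ppm_table(parse_counts(["word\tcount\n", "the\t60\n", "gosh\t2\n", "cat\t38\n"]))
askme = ppm_table(parse_counts(["the\t50\n", "gosh\t1\n", "cat\t49\n"]))
print(compare_word("gosh", mefi, askme))  # 2.0
```

A ratio above 1 means the word shows up relatively more often in the first table; below 1, relatively less.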
posted by cortex to MetaFilter-Related at 11:13 AM (99 comments total)
What rare words do we use a lot, what common words do we eschew, kind of thing.
EDIT: oh I see that was one of the homework problems, up on the main post. Well, do it, someone!
posted by grobstein at 11:27 AM on January 14
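The "rare words we use a lot, common words we eschew" question boils down to ranking words by how their relative frequency in the Mefi tables compares to some general reference corpus. A minimal Python sketch, assuming both corpora are already loaded as word-to-count dicts: it scores each word by the log2 ratio of its relative frequency in the two corpora, using additive (add-alpha) smoothing so words missing from one side don't cause division by zero. That's one common, simple choice among several possible scoring methods; it isn't anything official. The PPM scaling factor cancels out in the ratio, so plain proportions are enough. The function names and the toy counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_ratios(target: Dict[str, int], reference: Dict[str, int],
               alpha: float = 1.0) -> Dict[str, float]:
    """log2(smoothed target frequency / smoothed reference frequency) for each word."""
    vocab = set(target) | set(reference)
    # Add-alpha smoothing: every word gets alpha extra counts on both sides.
    t_total = sum(target.values()) + alpha * len(vocab)
    r_total = sum(reference.values()) + alpha * len(vocab)
    scores: Dict[str, float] = {}
    for w in vocab:
        t = (target.get(w, 0) + alpha) / t_total
        r = (reference.get(w, 0) + alpha) / r_total
        scores[w] = math.log2(t / r)
    return scores


def extremes(scores: Dict[str, float], n: int = 10) -> Tuple[List[str], List[str]]:
    """Return (most over-represented, most under-represented) words in the target."""
    ranked = sorted(scores, key=lambda w: (scores[w], w))  # ascending, ties by word
    return ranked[::-1][:n], ranked[:n]


# Toy counts, not real Metafilter or reference-corpus data.
mefi = {"the": 60, "gosh": 10, "cat": 30}
general = {"the": 70, "gosh": 0, "cat": 30}
print(extremes(log_ratios(mefi, general), n=1))  # (['gosh'], ['the'])
```

Positive scores are words Mefi leans on more than the reference corpus does; negative scores are words it tends to avoid.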