It's been over a year since we first made the Metafilter Frequency Tables available, and now they're updated with word frequency information for all of 2011 and 2012 as well, bringing the total number of words up to six hundred and thirty-six million. Gosh! (Is a word we've collectively used 5,707 times since 1999!)
posted by cortex to MetaFilter-Related at 11:13 AM (99 comments total)
For some good background on what's in these tables and what a body might do with them, check out the original announcement post and the wiki page about 'em.
But the very short version is that these tables represent the total count of occurrences of any given word in Metafilter comments (specifically from the Metafilter, Ask Metafilter, Metatalk, and Music subsites, just like we cover in the Infodump), as well as the relative frequency with which each word appears expressed as "parts per million" (or PPM), the number of times that word would appear if the total word count of the source text was exactly one million words.
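To make the PPM arithmetic concrete, here's a minimal sketch in Python. The function name `ppm` is made up for illustration, and the total used is the rounded "six hundred and thirty-six million" figure from the post rather than the exact corpus size, so the result is only approximate.

```python
def ppm(count, total_words):
    """Occurrences per million words of source text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * 1_000_000

# "gosh": 5,707 uses out of roughly 636 million words (rounded total,
# so treat the answer as a ballpark figure).
gosh_ppm = ppm(5707, 636_000_000)
print(f"gosh: about {gosh_ppm:.2f} PPM")
```

In other words, a word used a few thousand times across the whole corpus still only shows up a handful of times per million words.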
If you feel like just doing some basic searching with a text editor's "find" function, the easiest thing is to download one of the "complete" files and play around with that. If you're feeling like doing some more ambitious datawankery, there are a lot of potentially interesting things to do with comparisons between different subsites, or between the same subsite over time, or comparing the Mefi corpus to other more general linguistic corpora that exist on the internet.
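For the comparison idea, here's one rough way it might look in Python: convert raw counts to PPM for two sources, then rank words by the log of the ratio between them. Everything here is an assumption for illustration: the helper names `to_ppm` and `log_ratio`, the `floor` smoothing value, and especially the word counts, which are invented placeholders and not real Metafilter numbers. It also doesn't read the actual table files, so you'd need to load those into dictionaries yourself.

```python
import math

def to_ppm(counts):
    """Convert {word: raw count} into {word: parts per million}."""
    total = sum(counts.values())
    return {word: c / total * 1_000_000 for word, c in counts.items()}

def log_ratio(ppm_a, ppm_b, floor=0.5):
    """log2(PPM in A / PPM in B) for every word seen in either source.

    Words missing from one side are treated as having `floor` PPM so the
    ratio never divides by zero (the floor value is an arbitrary choice).
    """
    words = set(ppm_a) | set(ppm_b)
    return {
        w: math.log2(max(ppm_a.get(w, 0.0), floor) / max(ppm_b.get(w, 0.0), floor))
        for w in words
    }

# Invented placeholder counts for two hypothetical subsites.
subsite_a = {"the": 600, "question": 300, "gosh": 100}
subsite_b = {"the": 600, "question": 100, "gosh": 300}

scores = log_ratio(to_ppm(subsite_a), to_ppm(subsite_b))

# Positive scores lean toward subsite A, negative toward subsite B.
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {score:+.2f}")
```

Working in PPM rather than raw counts is what makes the comparison fair when one subsite (or one year) has far more total words than the other. The same approach should work for one subsite across two time periods.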