MeFi clustering analysis
posted by signal to MetaFilter-Related at 9:01 PM (81 comments total)
I ran a clustering algorithm on some data from the Mefi InfoDump, matching each user with the tags used in her posts, and created a dendrogram (the same image as in the first link). Here's a plain text version that's not as pretty but is searchable.
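A minimal sketch of the "match each user with the tags used in her posts" step. It assumes the dumps have already been joined into (user, tags) pairs, one per post. The `posts` list and the `user_tag_counts` helper are hypothetical stand-ins, not the layout of the actual InfoDump files or the script linked below.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (user, list_of_tags) pair per post.
# The real InfoDump files have their own layout; this only stands in for them.
posts = [
    ("alice", ["python", "data"]),
    ("alice", ["python"]),
    ("bob", ["music", "data"]),
]


def user_tag_counts(posts):
    """Map each user to a Counter of how often each tag appears in her posts."""
    counts = defaultdict(Counter)
    for user, tags in posts:
        counts[user].update(tags)
    return counts


counts = user_tag_counts(posts)
print(dict(counts["alice"]))  # {'python': 2, 'data': 1}
```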
The data was pared down by selecting users with 25 or more posts and tags used at least 10% as often as the most-used tag, which leaves 556 users and 80 tags.
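The paring-down rule can be sketched as a function with the two thresholds as parameters. This is an illustration, not the linked script: `pare_down` is a hypothetical name, and it assumes tag totals are summed over all users before the user cut, which may differ from how the original counted them.

```python
from collections import Counter


def pare_down(counts, posts_per_user, min_posts=25, tag_fraction=0.10):
    """Keep users with >= min_posts posts and tags whose total use is at least
    tag_fraction of the most-used tag's total. Returns (users, tags, rows)."""
    # Assumption: tag totals are summed over all users, before the user cut.
    totals = Counter()
    for tag_counts in counts.values():
        totals.update(tag_counts)
    top = max(totals.values())
    tags = sorted(t for t, n in totals.items() if n >= tag_fraction * top)
    users = sorted(u for u, n in posts_per_user.items() if n >= min_posts)
    # One row per user, one column per kept tag.
    rows = [[counts.get(u, {}).get(t, 0) for t in tags] for u in users]
    return users, tags, rows


# Hypothetical toy data.
counts = {
    "alice": Counter({"python": 30, "data": 5}),
    "bob": Counter({"music": 2, "data": 10}),
    "carol": Counter({"python": 1}),
}
posts_per_user = {"alice": 30, "bob": 25, "carol": 3}

users, tags, rows = pare_down(counts, posts_per_user)
print(users, tags, rows)  # ['alice', 'bob'] ['data', 'python'] [[5, 30], [10, 0]]
```

Calling it with `min_posts=5, tag_fraction=0.01` gives the looser cut.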
I also inverted the data and clustered the tags themselves, which gives some idea of thematic areas.
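"Inverting" the data here amounts to transposing the user-by-tag matrix, so each tag becomes a row vector over users and the same clustering code can be run on tags. A small sketch with toy numbers:

```python
def transpose(rows):
    """Swap rows and columns: a user-by-tag matrix becomes tag-by-user."""
    return [list(col) for col in zip(*rows)]


# Toy user-by-tag matrix: rows are users, columns are the tags "data", "python".
rows = [[5, 30], [10, 0]]
tag_rows = transpose(rows)
print(tag_rows)  # [[5, 10], [30, 0]]: one row per tag, one column per user
```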
If I loosen the cut to users with 5 or more posts and tags used at least 1% as often as the most-used tag, I get 2268 users and 1274 tags, and a very tall dendrogram (and plain text version).
The Python script I used to extract the data from the dumps is here. The clustering algorithm was taken from Programming Collective Intelligence, and the clustering and dendrogram drawing script is available from the author's site under chapter 3.
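For anyone who doesn't want to grab the book's code, here is a minimal sketch of agglomerative hierarchical clustering in roughly the same shape as that chapter: a Pearson-correlation distance, repeatedly merging the closest pair of clusters and averaging their vectors, then printing the tree as indented plain text. It is not the author's script. `hcluster`, `print_tree`, `leaves`, and the toy data are illustrative, and treating zero-variance vectors as uncorrelated is my own choice.

```python
from math import sqrt


def pearson_distance(v1, v2):
    """1 - Pearson correlation: ~0 for similar profiles, up to 2 for opposite ones."""
    n = len(v1)
    s1, s2 = sum(v1), sum(v2)
    sq1, sq2 = sum(x * x for x in v1), sum(y * y for y in v2)
    psum = sum(x * y for x, y in zip(v1, v2))
    num = psum - s1 * s2 / n
    den = sqrt((sq1 - s1 * s1 / n) * (sq2 - s2 * s2 / n))
    if den == 0:
        return 1.0  # a constant vector has no correlation; treat as unrelated
    return 1.0 - num / den


class Cluster:
    def __init__(self, vec, left=None, right=None, distance=0.0, label=None):
        self.vec, self.left, self.right = vec, left, right
        self.distance, self.label = distance, label


def hcluster(rows, labels, distance=pearson_distance):
    """Merge the closest pair until one cluster (the dendrogram root) remains."""
    clusters = [Cluster(list(r), label=lab) for r, lab in zip(rows, labels)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i].vec, clusters[j].vec)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged_vec = [(a + b) / 2 for a, b in zip(clusters[i].vec, clusters[j].vec)]
        merged = Cluster(merged_vec, clusters[i], clusters[j], d)
        del clusters[j]  # j > i, so removing j first leaves index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0]


def leaves(node):
    """Labels of all original items under a node."""
    if node.label is not None:
        return [node.label]
    return leaves(node.left) + leaves(node.right)


def print_tree(node, indent=0):
    """Plain text dendrogram: '-' marks a merge, leaves print their label."""
    if node.label is not None:
        print(" " * indent + node.label)
    else:
        print(" " * indent + "-")
        print_tree(node.left, indent + 1)
        print_tree(node.right, indent + 1)


# Toy user-by-tag counts: alice/bob share one tag profile, carol/dave another.
labels = ["alice", "bob", "carol", "dave"]
rows = [[5, 30, 0], [6, 28, 1], [20, 0, 15], [18, 1, 14]]
root = hcluster(rows, labels)
print_tree(root)
```

The pairwise search makes each merge quadratic in the number of clusters, which is fine at a few hundred users but gets slow at the 2268-user size.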