Can we get a frequency count for users following / tracking tags?

Since My Mefi allows users to filter by tags, it would be useful to know how popular tags are among followers. We already have the tag cloud, but I'm more curious about the middle range of tags. Not that I plan to spam the most popular tags, but knowing, for example, whether "ubuntu" is a more followed tag than just "linux" would be useful when deciding how to tag a post about an Ubuntu announcement or something.

I realize there are probably some privacy concerns about the data, but I think a few heuristics could resolve that (i.e., drop all tags below a threshold from the count).
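
To make the threshold idea concrete, here is a minimal Python sketch. It assumes the follow preferences are available as one list of tags per user; that shape, the function name `follower_counts`, and the sample data are all made up for illustration and say nothing about how the site actually stores My MeFi settings.

```python
from collections import Counter

def follower_counts(subscriptions, min_followers=2):
    """Count followers per tag, dropping tags below min_followers.

    subscriptions: iterable of per-user tag lists (hypothetical shape).
    """
    counts = Counter()
    for tags in subscriptions:
        # set() so a user who lists the same tag twice is counted once
        counts.update(set(tags))
    # Only tags at or above the threshold are reported; no user ids are kept
    return {tag: n for tag, n in counts.items() if n >= min_followers}

# Made-up sample data, not real preferences
sample = [
    ["linux", "ubuntu"],
    ["linux", "cats"],
    ["linux", "ubuntu", "obscuretag"],
]
public_counts = follower_counts(sample, min_followers=2)
print(public_counts)
```

Raising `min_followers` trades detail in the long tail for a lower chance that a rare tag points back at one person.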

posted by pwnguin to Feature Requests at 1:24 PM

Well, this isn't exactly what you're asking for (it's not following data, rather usage), but tag popularity came up the other day. It's in the infodump. I made graphs.
posted by axiom at 1:29 PM

I'll need to discuss this with everyone before I run any numbers. We've had a lot of concern about tags and privacy in the past as you mention, and My MeFi is a private section of the site. So we'll give it some thought.
posted by pb (staff) at 1:36 PM

There are a lot of tags, too: 149242 unique tags for the blue, and 127640 for the green (I didn't run the other dumps since they're comparatively tiny, but I can if you really want to know). It's a power law, though, so the vast majority of tags hardly get used at all -- i.e., the most popular 1000 tags get used roughly 100+ times each, while the other hundred-plus thousand get used less than a hundred times apiece.
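
For anyone who wants to check the long-tail shape on their own copy of the data, a rough Python sketch follows. It assumes the tags have already been loaded as one list per post; the function name `head_share` and the sample posts are hypothetical, and no particular Infodump file layout is assumed.

```python
from collections import Counter

def head_share(post_tags, top_n):
    """Return (number of unique tags, share of all tag uses taken by the top_n tags)."""
    counts = Counter(tag for tags in post_tags for tag in tags)
    total = sum(counts.values())
    head = sum(n for _, n in counts.most_common(top_n))
    share = head / total if total else 0.0
    return len(counts), share

# Made-up sample posts, not real site data
posts = [
    ["linux", "ubuntu"],
    ["linux"],
    ["linux", "cats"],
    ["linux", "ubuntu"],
]
unique_tags, share = head_share(posts, top_n=1)
print(unique_tags, round(share, 3))
```

With a power-law distribution, the share climbs quickly for small `top_n` and then flattens out across the rarely used tags.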

But what's to stop you from tagging your hypothetical post with both Ubuntu and linux?
posted by axiom at 1:44 PM

I ran the numbers a few years ago and it came out to roughly 5% of the tags being used with any regularity.
posted by Tell Me No Lies at 2:45 PM

Oh, and just for reference, linux ranks 142 on the most used tags list and ubuntu ranks 694.

It would be nice if one of the mods could run distributions on the tags followed vs. tags used. If there is a correlation then this whole request is moot.
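
One way to test for that correlation is a Spearman rank correlation between usage counts and follower counts. The Python sketch below is only an illustration: the function names and the sample numbers are made up, and it compares only tags that appear in both datasets, which is an assumption rather than anything the mods have described.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) down; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(used, followed):
    """Spearman rank correlation over the tags present in both dicts."""
    tags = sorted(set(used) & set(followed))
    if len(tags) < 2:
        raise ValueError("need at least two shared tags")
    rx = average_ranks([used[t] for t in tags])
    ry = average_ranks([followed[t] for t in tags])
    n = len(tags)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("one ranking is constant")
    return cov / (vx * vy) ** 0.5

# Made-up counts for illustration only
used = {"linux": 900, "ubuntu": 300, "guitar": 50}
followed = {"linux": 40, "ubuntu": 25, "guitar": 60}
rho = spearman(used, followed)
print(rho)
```

A value near 1 would mean the followed ranking mostly mirrors the usage ranking; values near 0 or below would suggest the two lists carry different information.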
posted by Tell Me No Lies at 2:59 PM

But what's to stop you from tagging your hypothetical post with both Ubuntu and linux?

Basically it comes down to two things: what tags should I follow, and what tags should I apply to a post? There's both concord and conflict between these two problems; if no tags are applied to a post, then there's no filter to be constructed. On the other hand, placing every word in the post as a tag* would nearly solve the poster's problem but is bound to be much noisier. So the challenge is finding some threshold for useful tags without overwhelming anyone's filter (or mefi's DB indexing).

There's very little feedback for posters at the moment. Let's use this post as an example, since my last one failed a bit. There are ten tags there that are only used by this post. By far the largest tag used is "city", which has many many hits, but I doubt there are many followers who aren't also following "new york". And "economic" rather than "economics", because it's part of the phrase "economic downturn". It's an interesting post to me, but it fell through my fairly large set of general filters. the man of twists and turns and I failed to come up with a common tag when we should have.

Although, just now I did have an amusing idea. What every tag boils down to is an agreement between two (or more) parties on whether it belongs or not. And there's a flash game site that does this sort of thing for music, images, and video. Perhaps we could have mefi the tagging game some day, though I suspect not. Still would be useful to validate the current model.

* I vaguely recall there being a limit of 20 tags?
posted by pwnguin at 3:28 PM

I vaguely recall there being a limit of 20 tags?

Just looked—we have a limit of 50 tags.
posted by pb (staff) at 3:55 PM

Do you have a character limit too? I seem to recall not being able to have all the tags I wanted even though I don't think there were 50.
posted by unliteral at 5:36 PM

You're right unliteral, I looked too quickly. We have a 50 character limit on the length of a single tag. There's no limit on the number of tags you can add. I misread it a bit ago.
posted by pb (staff) at 5:52 PM

Is there a character limit in the tag input box on the "post" pages?
posted by Night_owl at 6:15 PM

No, there's no limit there.
posted by pb (staff) at 6:22 PM

"Just looked—we have a limit of 50 tags."

If you do, it's not working, as this post has 140, this post has 56, this post has 71, and this post has 103.

To be fair they're pretty epic posts.
posted by Blasdelb at 12:57 AM

Yeah, there's no limit on the number of tags you can add. I misread the code.
posted by pb (staff) at 6:40 AM

I wonder, is there an easy way to figure out which posts have the most tags?
posted by Blasdelb at 7:29 AM

We don't provide any tools for that. But the infodump knows.
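
As a rough illustration of how that could be pulled out once the tag data is loaded, here is a short Python sketch. It assumes a dict mapping post id to that post's tags; the shape, the function name `most_tagged`, and the sample ids are hypothetical, and reading the actual Infodump files is left out.

```python
import heapq

def most_tagged(post_tags, n=3):
    """Return up to n (tag_count, post_id) pairs, largest tag count first.

    post_tags: dict mapping post id -> list of tags (hypothetical shape).
    """
    # set() so a tag repeated on one post is counted once
    pairs = ((len(set(tags)), post_id) for post_id, tags in post_tags.items())
    return heapq.nlargest(n, pairs)

# Made-up sample data
sample_posts = {
    101: ["nyc", "city", "economy"],
    102: ["cats"],
    103: ["linux", "ubuntu"],
}
top = most_tagged(sample_posts, n=2)
print(top)
```

Using `heapq.nlargest` avoids sorting every post when only the top few are wanted.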
posted by pb (staff) at 8:44 AM

We discussed this and decided to share the data. They're aggregate stats and aren't linked to any specific user. And I only included tags that have two or more matches because it's more likely that single-use tags could be tied to a specific user. With that out of the way, here are the numbers:

My Ask MeFi started in April 2008 and 2,625 members have set preferences for this feature. Here are the top tags at My Ask.

My MeFi started in August 2011 and 505 members have set preferences there. Here are the top tags at My MeFi. You can also specify tags to exclude at My MeFi. Here are the top excluded tags at My MeFi.
posted by pb (staff) at 10:04 AM

That's pretty interesting. The excluded list is so short!

I can't believe there are mefites excluding the cats tag. That's not a bannable offense?
posted by axiom at 8:57 PM

Awesome! Obviously MyMefi is just one use case for tags, but it's interesting how it gets used or misused. Since it was implied that the distributions should be the same, taking a quick look at the AskMefi dataset, here's a few observations:

The common ontology problem of pluralization strikes: "book" is used 1215 times without the tag "books", but only 2 people subscribe to "book." Given there are only ten posts tagged "mac" and "book", I'm leaning more towards user error than a heavy antimac spam preference.

Moving and apartments are popular tags, but few people subscribe to them.

Similarly, there are 761 posts tagged Christmas, but no subscribers at all. Also completely unwatched are help, microsoft, office, online, and pain. I'm sure there's a consulting business in there somewhere.

The most popular single-letter tag is C. N is half as popular, and Y registers two followers.

Guitar is a crazy popular tag even though it's rarely used. Similarly, as one might expect given the popularity of the site among ref librarians, answering questions about literature is more popular than asking them.

The most frequent tag separation errors are writing and sex. Tags are separated by spaces, so if your subscription is "writing, sex" you get "writing," and "sex". Usually it's not a common thing, but I did see more people subscribe to feminism with a comma than without!
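
The comma problem is easy to reproduce. The Python sketch below assumes subscriptions are split on whitespace only, as described above; the function names are made up and this is not the site's actual parsing code.

```python
def parse_subscription(raw):
    """Split on whitespace only, so a trailing comma stays attached to its tag."""
    return raw.split()

def suspicious_tags(tags):
    """Flag tags that end in a comma and were probably meant without it."""
    return [t for t in tags if t.endswith(",")]

def normalize(raw):
    """One possible fix: strip stray commas and drop anything left empty."""
    cleaned = (t.strip(",") for t in raw.split())
    return [t for t in cleaned if t]

parsed = parse_subscription("writing, sex")
print(parsed)                   # the first tag keeps its comma
print(suspicious_tags(parsed))
print(normalize("writing, sex"))
```

Whether to strip commas silently or just warn the user is a design choice; stripping would change the meaning for anyone who really does want a tag ending in a comma.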

Maybe fishbike will come along and do a more robust analysis of things like rank variation between datasets, but it's pretty clear they're not the same distribution.
posted by pwnguin at 10:16 PM

The information about the PVRblog at is out of date. It is no longer a Retired Project.
posted by unliteral at 10:21 PM

