Unique Tags December 14, 2009 8:53 PM
Is there any way to view tags that have only been used once?
It's a minor question driven only by curiosity, but I've been wondering for a while now about what tags might have only been used once in the history of Metafilter. I tried taking a look at the infodump, but I couldn't really find a way. I also didn't have much luck in the tags section of the site, which is pretty basic. Is there a way to look at such a list, for any of the sections of the site?
OTOH, that means 50,000 tags have been used more than once. That's a lot of different tags.
posted by smackfu at 9:21 PM on December 14, 2009
I was preparing a post on tags but now this is here, maybe I can piggyback instead (with permission, thanks Rinku):
Is it time to recalibrate the tags page?
The popular tags page is currently showing 49 of the 150 most used tags in 30 pt. At some stage all 150 tags are going to be the same size.
I don't know what the breaks are for bumping a tag up a point, but of the 49 appearing at the same size (discounting brokenlink [8445]), music [4361] has the highest count and google [574] the lowest.
Marriage [268] has the lowest count on the page and it's displayed at 14 pt. That's 17 points to play with.
If there were, say, 300 uses between each point size, I think it would reflect the popular tags much better.
posted by tellurian at 9:26 PM on December 14, 2009 [2 favorites]
What FishBike said, about the great prominence of hapax legomena in the tag database. Putting together a list from the tag data in the Infodump wouldn't be too hard (and it sounds like he may have done so already), but it would be fairly overwhelming reading.
As far as recalibrating the popular tags page, yeah, maybe so. I know we've bumped that at least once before. I suppose we could try and find a way to make it auto-adjusting, though I don't know the details of how that page is implemented so it's more a question for Matt or pb.
The popular tag stuff is fairly static, in any case. It'd be neat to see some richer views into tag stuff in general, but I'm not sure what they'd be. One thing I've thought about is doing a sort of weekly/monthly zeitgeist, showing tags that are on the way up or down compared to recent history in terms of usage, but I've never really sat down and worked out the details of how that'd look.
posted by cortex (staff) at 9:41 PM on December 14, 2009
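One way the "auto-adjusting" idea could work is to scale font sizes from whatever the current lowest and highest counts happen to be, rather than from fixed breaks. The sketch below is only an illustration: the function name, the log scale, and the 14 to 30 pt range are assumptions, not a description of how the tags page is actually implemented, and it treats brokenlink [8445] as the top of the range.

```python
import math


def font_size(count, lowest, highest, min_pt=14, max_pt=30):
    """Map a tag's use count onto a point size with a log scale.

    Hypothetical helper: the real tags page may work quite differently.
    Counts are assumed to be positive integers.
    """
    if highest <= lowest:
        return max_pt
    # Clamp so counts outside the range still get a valid size.
    count = max(lowest, min(count, highest))
    share = (math.log(count) - math.log(lowest)) / (
        math.log(highest) - math.log(lowest)
    )
    return round(min_pt + share * (max_pt - min_pt))


if __name__ == "__main__":
    # Counts quoted earlier in the thread.
    for tag, n in [("marriage", 268), ("google", 574),
                   ("music", 4361), ("brokenlink", 8445)]:
        print(tag, font_size(n, lowest=268, highest=8445))
```

Because the range is recomputed from the data, the sizes spread out again as counts grow, instead of every tag eventually hitting the maximum.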
somebody needs to make a post about hapax legomena just so that hapaxlegomenon can be a tab and then we'd all agree never to use that tag again
posted by Kattullus at 9:51 PM on December 14, 2009 [1 favorite]
Thanks for the replies. 102,036 is a lot more than I expected. I'd still like to take a peek, but with those kinds of numbers I might as well do a find command on the infodump and get a taste.
As for more content/better formatting on the tags page, sounds awesome.
posted by Rinku at 9:54 PM on December 14, 2009
Since the Popular tags page probably doesn't change very often, is there any possibility of getting a popular tags page for the week/month/year instead of just the one for all time use? Or perhaps pages of popular tags for a given year? I think it might be a neat way to find interesting posts or trends.
posted by sambosambo at 10:06 PM on December 14, 2009
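A per-year popular tags list like this could be prototyped from the Infodump tag data. The sketch below assumes you have already turned each row into a (year, tag) pair; that step depends on the date format in the file, which is not shown in this thread. The function name is made up for illustration.

```python
from collections import Counter, defaultdict


def top_tags_by_year(tagged_posts, n=3):
    """Return {year: [(tag, count), ...]} with the n most used tags per year.

    tagged_posts is any iterable of (year, tag) pairs; tags are compared
    case-insensitively.
    """
    per_year = defaultdict(Counter)
    for year, tag in tagged_posts:
        per_year[year][tag.lower()] += 1
    return {year: counts.most_common(n) for year, counts in per_year.items()}


if __name__ == "__main__":
    # Made-up sample data, not real Infodump rows.
    sample = [
        (2008, "georgebush"), (2008, "georgebush"), (2008, "music"),
        (2009, "music"), (2009, "Music"), (2009, "google"),
    ]
    for year, tags in sorted(top_tags_by_year(sample).items()):
        print(year, tags)
```

Comparing one year's list with the previous year's would give a rough view of which tags are on the way up or down.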
I'm surprised "the" gets used so infrequently, relative to the total number of posts.
posted by Blazecock Pileon at 10:13 PM on December 14, 2009
hapaxlegomenon can be a tab and then we'd all agree never to use that tag again
Whip up a Mnemosynus, Kattullus.
posted by tellurian at 10:19 PM on December 14, 2009 [1 favorite]
$ python
>>> file = open("tagdata_mefi.txt")
>>> tags = list(line.split()[4].lower() for line in file)
>>> len(tags)
427715
>>> len(set(tags))
91974
>>> tags.count("google")
574
>>> from collections import defaultdict
>>> counts = defaultdict(int)
>>> for tag in tags: counts[tag] += 1
...
>>> counts["google"]
574
>>> unique = list(tag for tag in tags if counts[tag] == 1)
>>> len(unique)
63467
>>> for tag in unique: print tag
northumberland
...(etc)
posted by effbot at 10:26 PM on December 14, 2009
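For anyone on a newer Python, roughly the same count can be sketched with collections.Counter. This is not effbot's script: it assumes the tag file is tab-separated with the tag name in the fourth column and two header lines at the top (which is what the `cut -f 4` and `tail -n +3` commands elsewhere in this thread suggest), and the function name is made up for illustration.

```python
from collections import Counter


def single_use_tags(lines, header_lines=2, column=3):
    """Return the tags that occur exactly once, compared case-insensitively.

    Assumptions (not confirmed against the Infodump documentation):
    fields are tab-separated, the tag name is the fourth field, and the
    first two lines are headers.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i < header_lines:
            continue  # skip the header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            counts[fields[column].lower()] += 1
    return sorted(tag for tag, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Tiny made-up sample standing in for tagdata_mefi.txt.
    sample = [
        "header line 1\n",
        "tag_id\tlink_id\tlink_date\ttag_name\n",
        "1\t10\t2009-12-14 20:53:00\tgoogle\n",
        "2\t11\t2009-12-14 21:00:00\tGoogle\n",
        "3\t12\t2009-12-14 21:05:00\tnorthumberland\n",
    ]
    print(single_use_tags(sample))
```

To run it on the real file, pass an open file object, e.g. `single_use_tags(open("tagdata_mefi.txt", encoding="latin-1"))`; the encoding is a guess.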
tellurian: Whip up a Mnemosynus, Kattullus.
Ooooh... good reference. Though I suppose I'd have to whip up a Nemmosynus to maintain the abecedarian mangling.
posted by Kattullus at 10:35 PM on December 14, 2009
"sort -f -k 5 tagdata_mefi.txt | uniq -c -i -f 4 | sort -rn | less"
This leaves extraneous columns.
posted by Pronoiac at 10:47 PM on December 14, 2009 [1 favorite]
I would also be interested in a revamped tags page. Maybe have trending tags? For instance, I hope we've seen the last of those "georgebush" tagged posts, but it's still just as popular today, apparently, as "computers". The tag cloud is nice, but it lacks in functionality sometimes.
I've also thought, and I have no way of determining if this is the case, that there could maybe be a better way for someone to browse through a tag of their choice without needing to figure out that they just add it to the address bar of their browser.
Maybe - now bear with me on this one mods - but maybe we should have a January experiment? I know, I know, old hat now.
posted by battlebison at 12:44 AM on December 15, 2009
Speaking of tags, how's the "tagged favorites" feature coming along, if at all? I know it's probably not the easiest task to accomplish, but if it's doable, I would be unbelievably happy to use it. There's been too many times where I wished I could tag the favorites I've saved.
posted by spiderskull at 12:45 AM on December 15, 2009
Sounds like a cheesy at&t commercial...
"These are perfectly good tags, Timmy, they have only been used once!"
"But mooooooom..."
posted by qvantamon at 2:26 AM on December 15, 2009
All this talk of zeitgeists and tag trending has prompted me to update the Automated History of MetaFilter page, since it's been a few months since it was last refreshed. I know it's not quite the same thing, but it's one view of what we've been talking about on the front page, using tag data and comment counts.
posted by FishBike at 7:55 AM on December 15, 2009
This leaves extraneous columns.
cut -f 4 tagdata_mefi.txt | sort -f | uniq -c -i | sort -rn | egrep "^\s*1\W" | less
Fixed. Drop the egrep bit if you want to see all tags and not just tags with only one occurrence. I did chop off the header before I started.
posted by tarheelcoxn at 8:03 AM on December 15, 2009
cortex: "What FishBike said, about the great prominence of hapax legomena in the tag database. Putting together a list from the tag data in the Infodump wouldn't be too hard (and it sounds like he may have done so already), but it would be fairly overwhelming reading."
Yep, in anticipation of posting the list here, I ran a query to generate one. Scrolly, it was. I tried to post an alphabetized version on a Google Sites page just now, and I think I broke something. So instead I uploaded it as a text file attachment to said Google Sites page.
posted by FishBike at 8:20 AM on December 15, 2009
Now that I'm back from lunch, I notice that my line produces 63467 rows, which doesn't come close to matching Fishbike's 102036. Not sure how I went wrong, but don't trust my egrep. Back to work I go!
posted by tarheelcoxn at 8:36 AM on December 15, 2009
zombie!
zombieapples
ZombieBaseball
zombiebooks
Zombiebotarmy
zombiecondos
zombiecooking
ZombieFireAnts
zombiegirl
zombiegroundzero
zombiejesus
zombiejosephbeuys
zombiemessagesthatrefusetodie
zombiemovies
zombienazis
zombiequestionFilter
zombiereagan
zombieshatner
zombiesinthesnow
zombiesoftware
zombiestrippers
zombiesurvivalguide
zombietalk
zombietools
zombiewalk
zombieworldnews
zomby
Heh.
posted by cortex (staff) at 8:49 AM on December 15, 2009
Every day, I get up and pray to Jah. And he increases the number of tags by exactly one.
posted by Eideteker at 9:00 AM on December 15, 2009 [1 favorite]
ok, I just added a bit more variance to the font sizes on the popular tag clouds. That should make it a little easier to spot the most frequently used tags.
Speaking of tags, how's the "tagged favorites" feature coming along, if at all?
It's on hold for the time being. We're going to let the dust settle from the November favorites experiment, digest the feedback we received, and go from there. My gut feeling is that we'll let favorites be for a while before we add or change anything.
posted by pb (staff) at 9:06 AM on December 15, 2009
This goddamn bowling alley is just lousy with skinheads.
posted by Skot at 9:38 AM on December 15, 2009
tarheelcoxn: "Now that I'm back from lunch, I notice that my line produces 63467 rows, which doesn't come close to matching Fishbike's 102036. Not sure how I went wrong, but don't trust my egrep. Back to work I go"
If you're just looking at the tag data from the front page, that's probably the reason. I'm looking at the tag data for all four sub-sites combined.
posted by FishBike at 12:05 PM on December 15, 2009
I should've double-checked that, instead of pasting something I'd written for the wiki, at cross-purposes - that showed the most-used tags first. (Also, I got the tags in reverse alphabetical order from tarheelcoxn, which obviously is (1) omg ANNOYING, & (2) trivial unless you're way OCD. like me. anyway.)
For Mefi only -
tail -n +3 tagdata_mefi.txt | cut -f 4 | sort -f | uniq -c -i -u | less
gives 63,467 unique tags, out of 91,975 total tags, matching tarheelcoxn's count.
For all the subsites -
tail -n +3 tagdata_*.txt | grep -v tagdata_ | cut -f 4 | \
  sort -f | uniq -c -i -u | less
gives 102,039 unique tags, out of 152,201, not matching FishBike's counts (102,036 & 152,197). I guessed headers tripped him up, but leaving them in made my counts go up, further away, so I dunno.
posted by Pronoiac at 2:40 PM on December 15, 2009
FYI, special characters are really awful in tags, for tripping up datawankery & for behaving differently on Mefi (normal) vs other subsites (403 errors).
Excruciating details follow.
From FishBike's list:
missing - çIA, è, é, §, & ß. Note, mostly, these links won't currently get you to the articles with them. Hm. è, é, §, & ß (the other matches for the search are for the "ss" tag).
extra - º, & ¯. These are likely two of the above, but I can't tell which.
posted by Pronoiac at 4:14 PM on December 15, 2009 [1 favorite]
Ah, we can probably clean those up by hand at some point.
posted by cortex (staff) at 4:23 PM on December 15, 2009
Eek, even making the tagname field nvarchar instead of varchar didn't help, I just got a different set of weird[1] characters, and I can't really be bothered to try any harder than that.
Incidentally, though it's not the explanation for the count difference, there's one tag with an embedded space in it: "scifi sf" occurs several times.
A few people seem to have tried to get two words into one tag by enclosing them in quotes, so we get some tags with leading or trailing quotes.
posted by FishBike at 4:28 PM on December 15, 2009
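A tag with an embedded space matters for how the file gets split. The sketch below contrasts whitespace splitting (as in the `line.split()[4]` approach earlier in the thread) with tab splitting (as in `cut -f 4`). The row is made up, and the column layout (tab-separated, date and time in the third field, tag name in the fourth) is an assumption inferred from those commands; a difference like this may shift counts slightly.

```python
def tag_by_whitespace(line):
    # Splits on any run of whitespace, so a "date time" value takes up
    # two fields and the tag starts at index 4.
    return line.split()[4]


def tag_by_tab(line):
    # Splits on tabs only; the tag name is assumed to be the fourth field.
    return line.rstrip("\n").split("\t")[3]


if __name__ == "__main__":
    # Made-up row, not a real Infodump record.
    row = "7\t100\t2005-01-01 12:00:00\tscifi sf\n"
    print(repr(tag_by_whitespace(row)))
    print(repr(tag_by_tab(row)))
```

With whitespace splitting the tag comes back as just its first word, so it would be counted together with the plain "scifi" tag.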
1: I mean weird to me - there is nothing fundamentally weird about them, of course.
posted by FishBike at 4:29 PM on December 15, 2009
Actually, not to be a lazy git, but if you'll drop in links (or just the threadid numbers) here, I'll clean up the tags in affected threads right now.
posted by cortex (staff) at 4:45 PM on December 15, 2009
Links to the 'scifi sf'-tagged threads:
15831
35549
40347
41693
43709
47031
47490
47580
48051
50900
51521
53382
59956
60791
61074
61688
62721
65207
66591
67474
67881
67894
There are 69 links to posts whose tags contain a double quote (") character... is that too many to post here?
posted by FishBike at 4:57 PM on December 15, 2009
... actually there are only 27 posts due to multiple tags with quote marks in the same post. (And a few might actually be OK because it's being used as a shortcut for inches, like in the first one here):
Ask MetaFilter 15111
Ask MetaFilter 33345
Ask MetaFilter 33597
Ask MetaFilter 38188
Ask MetaFilter 51594
Ask MetaFilter 58351
Ask MetaFilter 60193
Ask MetaFilter 70866
Ask MetaFilter 85227
Ask MetaFilter 88487
Ask MetaFilter 91457
Ask MetaFilter 94821
Ask MetaFilter 96026
Ask MetaFilter 103006
MetaFilter 6475
MetaFilter 9004
MetaFilter 18196
MetaFilter 27683
MetaFilter 31745
MetaFilter 40259
MetaFilter 42692
MetaFilter 45912
MetaFilter 49243
MetaFilter 55006
MetaFilter 55533
MetaFilter 59082
MetaFilter 72594
posted by FishBike at 5:04 PM on December 15, 2009
If you're offering to clean up special characters instead of just quotes, cortex, there are 319 of them, or you could use a 19-line Perl script. I could rewrite it to provide handy links, if that helps. And if you're not offering, uh, ignore this.
I've thought about setting up an automatic header parser in Perl that would let you do, say,
- parse.pl favoritesdata.txt "$faver eq $favee" or, in this case,
- parse.pl tagdata_mefi.txt "$tag_name =~ /[\x00-\x19]|[\x7F-\xFF]|\"/"
posted by Pronoiac at 10:40 PM on December 15, 2009
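The same filter can be sketched in Python for anyone without the Perl script. This reuses the character classes from the pattern quoted above; the function name is made up, and it assumes the tags have been decoded so that each accented character is a single code point in the \x7F to \xFF range (for example, reading the file as Latin-1). Characters above \xFF would not match.

```python
import re

# Same character classes as the quoted pattern: \x00-\x19, \x7F-\xFF,
# or a double quote.
SUSPECT = re.compile(r'[\x00-\x19]|[\x7F-\xFF]|"')


def suspect_tags(tags):
    """Return the tags containing at least one suspect character."""
    return [tag for tag in tags if SUSPECT.search(tag)]


if __name__ == "__main__":
    # Made-up examples, not taken from the real tag data.
    print(suspect_tags(["music", "caf\xe9", '"two words', "scifi sf"]))
```

Note that an ordinary space (\x20) falls outside these ranges, so a tag like "scifi sf" is not flagged by this pattern.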
Pronoiac, I'll grab that and run with it. Handing me a functioning Perl script is like Xmas morning, thanks.
posted by cortex (staff) at 6:55 AM on December 16, 2009
It'd be neat to see some richer views into tag stuff in general, but I'm not sure what they'd be.
Would this be a good place to mention that I really miss being able to see all the tags that I, personally, have used on posts? Now we can only see the top nine.
posted by anastasiav at 8:07 AM on December 16, 2009
I was going to upload that earlier script to a wiki, but instead, wrote a parsing script, beanplate. Instead, try:
beanplate.pl -c "tag_name =~ /[\x00-\x19]|[\x7F-\xFF]|\"/" -i tagdata_mefi.txt
For the earlier, unique tag question, beanplate replaces the first line of:
tail -n +3 tagdata_*.txt | grep -v tagdata_ | cut -f 4 | \
  sort -f | uniq -c -i -u | less
with
beanplate.pl -f "tag_name" -i tagdata_mefi.txt | \
I wrote a draft of this weeks ago, but forgot to post it. Now, I've extended it, & I think I see a way to do anastasiav's request of "show me all my tags."
posted by Pronoiac at 3:29 PM on January 10, 2010
Drat! Speaking of outdated drafts, make that
beanplate.pl -f "tag_name" tagdata_*.txt
posted by Pronoiac at 3:55 PM on January 10, 2010
posted by FishBike at 9:01 PM on December 14, 2009 [7 favorites]