Unusual Characters causing trouble with titles and tags October 10, 2008 10:11 PM

limited characters in tags/titles? I put some uncommon non-ASCII characters in the title of a recent post on AskMe, and the address bar and tag links are kinda busted.

So i made a post with some non-standard characters in the title (i think they're from the international phonetic alphabet, but don't quote me on that!). And it seemed reasonable to make one of the tags include some of those characters too, since they're relevant to the post.

When i submitted the post, though (and when i added a new tag), my URL bar got repointed to: http://ask.metafilter.com/103973/A-boost-at-the-polls-from-jMMlYr, though i think it should probably have been: http://ask.metafilter.com/103973/A-boost-at-the-polls-from-jōōlərē. Both links work, at least! But why would the first one show up?

Furthermore, the link to the tag i'd created (nōō'kyə-lər-jōō'lə-rē) is all kinds of garbled in the sidebar of the question display -- and the proposed link target itself even ends up at a dead page.

When i try visiting what i think the correct tag URL should be, i get all kinds of weirdness.

I know the tag is pretty silly and unimportant, but if certain characters aren't going to be allowed, maybe "bad tags" should just be rejected outright? It'd be cool to have full unicode tagging, though!

FWIW, my browser is set to default to the UTF-8 charset (and that's what my current HTTP connections to mefi report in the HTTP headers).
posted by dkg to Bugs at 10:11 PM (15 comments total)
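
For readers following along, here is a rough sketch of what a non-ASCII link stub looks like once it is percent-encoded as UTF-8, which is the form a browser set to UTF-8 would normally put on the wire. The variable names are made up for illustration, and nothing here is taken from MetaFilter's code; it only shows the standard encoding of the characters in question.

```python
from urllib.parse import quote, unquote

# The stub from the post, written with escapes so the code points are explicit:
# U+014D (o with macron), U+0259 (schwa), U+0113 (e with macron).
slug = "A-boost-at-the-polls-from-j\u014d\u014dl\u0259r\u0113"

# quote() encodes to UTF-8 by default and percent-escapes every non-ASCII byte.
encoded = quote(slug, safe="-")
print(encoded)

# Decoding the percent-escapes as UTF-8 gives back the original stub.
decoded = unquote(encoded)
print(decoded == slug)
```

A server that reads those bytes in some other charset, or that rewrites the stub before redirecting, could plausibly end up with a different-looking address, but that part is a guess.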

Uh, i don't know why this MeTa post's title seems to be rotated, either. It was intended to be "Unusual Characters causing trouble with titles and tags". Sorry about that!
posted by dkg at 10:12 PM on October 10, 2008


We don't allow full unicode tagging because it causes all kinds of headaches with Apache like you're describing. We should be filtering out any extended characters in tags when you create a post or try to add them after posting, so we'll get that fixed up. We allow unicode characters in post titles, but something obviously got past our link stub filter. We'll get that cleaned up too, thanks for letting us know.
posted by pb (staff) at 10:27 PM on October 10, 2008
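
As an illustration of the kind of filtering described above, here is a minimal sketch, assuming an ASCII-only policy for link stubs and tags. The function names `ascii_stub` and `is_plain_tag` are hypothetical, and this is not the site's actual filter; it is just one common way to do it.

```python
import re
import unicodedata


def ascii_stub(title: str) -> str:
    """Hypothetical link-stub filter: reduce a title to ASCII words joined by hyphens."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Dropping non-ASCII bytes removes the marks and any letter with no ASCII base.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")


def is_plain_tag(tag: str) -> bool:
    """Hypothetical tag check: accept only non-empty, ASCII-only tags."""
    return bool(tag) and tag.isascii()


print(ascii_stub("A boost at the polls from j\u014d\u014dl\u0259r\u0113"))
print(is_plain_tag("n\u014d\u014d'ky\u0259-l\u0259r"))
```

Note that a character like the schwa has no ASCII base letter, so this approach silently drops it rather than transliterating it. Rejecting the tag outright, as suggested in the post, avoids that surprise.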


Do you really think anyone else would re-use the nōō'kyə-lər-jōō'lə-rē tag?
posted by blasdelf at 11:10 PM on October 10, 2008


Non-answer will be deleted so I'll post it here.
In short, thisis an analogical reformation
Shouldn't there be a comma after thisis?
posted by tellurian at 12:04 AM on October 11, 2008


nōō'kyə-lər-jōō'lə-rē

This is a bad tag. I vote that all bad tags be greeted with the happy birthday polka.
posted by jessamyn (staff) at 6:01 AM on October 11, 2008


And the admin whack-a-mole...
posted by y2karl at 6:32 AM on October 11, 2008


sauce, critter or fritter makes no nevermind
posted by y2karl at 6:35 AM on October 11, 2008


Hey, i acknowledge that nōō'kyə-lər-jōō'lə-rē is a bad tag -- i was mostly trying it to see what would happen. I just recently made sure that a webapp i contribute to was fully-localizable, and that included a lot of charset work. i tend to poke at things that i'm actively curious and care about, which was why i tried it here at MeFi. If the nōō'kyə-lər-jōō'lə-rē tag had worked, i'd probably have deleted it immediately (well, OK, maybe i'd have left it there, because it makes me laugh).

But no: i don't think anyone else would reuse the tag. It would have been awesome if i'd gotten the happy birthday polka from submitting it, though.
posted by dkg at 6:41 AM on October 11, 2008


This is a bad tag.
I'm interested, what is a 'bad tag'?
posted by tellurian at 6:52 AM on October 11, 2008


Is . a bad tag?
posted by tellurian at 6:54 AM on October 11, 2008


Yeah if I recall correctly . doesn't work as a tag. So a bad tag is one that breaks things. I guess you could argue that a bad tag is also one which basically doesn't do anything in the larger tagging scheme, but then you could get into a big kerfuffle about what the purpose of tagging is and then we get into "is it the links or the discussion?!" area.

In my opinion tagging is supposed to be useful for grouping similar posts, generating tag clouds, that sort of thing. It sort of does that and we use tags in a few features like "MyAsk" and the related questions thing. Sometimes people use tags to editorialize or make jokes, which isn't really a big deal either way, but it's not a "good tag" so to speak because it doesn't play nice with other tags or help people do anything on the site other than look at the tag on that one question and maybe go "heh". I use jokey tags on Flickr a lot because they amuse me, similar to what dkg was trying out on this post.
posted by jessamyn (staff) at 7:10 AM on October 11, 2008


A similar bug: Unicode characters in posts' descriptions are displayed as question marks on the recent activity page.
posted by finite at 12:38 AM on October 12, 2008


I can't believe you guys still believe in unicodes.

/mortified
posted by Mister_A at 10:42 AM on October 13, 2008


Thanks finite, you should be seeing unicode characters in post descriptions in recent activity now.
posted by pb (staff) at 11:11 AM on October 13, 2008
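
One common way characters turn into question marks, sketched here for the curious: text gets encoded into a charset that has no slot for them, with a "replace" fallback. Whether that was the cause on the recent activity page is an assumption; the thread does not say what the actual fix was.

```python
# Same characters as the tag in the post, written as escapes.
description = "j\u014d\u014dl\u0259r\u0113"

# Latin-1 has no code points for these letters, so each one becomes "?".
as_latin1 = description.encode("latin-1", errors="replace")
print(as_latin1)

# UTF-8 can represent all of them, so a round trip is lossless.
round_trip = description.encode("utf-8").decode("utf-8")
print(round_trip == description)
```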


This page is currently the one and only Google result for a search on "nōō'kyə-lər-jōō'lə-rē", by the way.

(I didn't even know you could use Unicode in Google searches ... but apparently, you can.)
posted by Kadin2048 at 9:29 PM on October 14, 2008

