When tags with non-ASCII characters appear in the tag search results, the links are broken (while tags with non-Latin-1 characters are simply not accessible by any means). Tags with percent signs make the server grumpy. Underscores also seem to be treated somewhat inconsistently (or at least counterintuitively) by the tag search.
1. Tag search vexed by non-ASCII characters, and worse
When tags with non-ASCII characters appear in the tag search results list, they appear correctly in the text, but the link doesn't work. For example, the search {orr} includes a tag "orripálldýrason". The page is encoded in UTF-8, and the characters appear literally in the text and in the href attribute:
<a href="http://www.metafilter.com/tags/orripálldýrason" target="_self">orripálldýrason</a>
My browser (Firefox 18.0.1 on Ubuntu 12.04.1), at least, follows that link by %-encoding the UTF-8 bytes and issuing the request
GET http://www.metafilter.com/tags/orrip%C3%A1lld%C3%BDrason
Metafilter replies:
Sorry, no matches for the tag orripÃ¡lldÃ½rason across MetaFilter.
Here the %-encoded bytes have been interpreted not as UTF-8 multibyte sequences, but as individual characters: thus á (U+00E1 LATIN SMALL LETTER A WITH ACUTE), encoded in UTF-8 as the two bytes C3 A1, has appeared as the two characters Ã (U+00C3 LATIN CAPITAL LETTER A WITH TILDE) and ¡ (U+00A1 INVERTED EXCLAMATION MARK). I'm citing Unicode code points here, but they're the same in Latin-1, which might be what the code thinks it's doing.
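The misreading is easy to reproduce in Python. This is only a sketch of the suspected mechanism, assuming the server percent-decodes the path and then treats every byte as one Latin-1 character; I have no view into the site's actual code.

```python
from urllib.parse import unquote

# The request path as the browser sends it (UTF-8 bytes, %-encoded).
path = "orrip%C3%A1lld%C3%BDrason"

# Reading the escaped bytes as UTF-8 recovers the real tag.
as_utf8 = unquote(path, encoding="utf-8")

# Reading the same bytes as Latin-1 maps each byte to its own character.
# (Assumption: this is roughly what happens on the server.)
as_latin1 = unquote(path, encoding="latin-1")

print(as_utf8)
print(as_latin1)
```

Each two-byte UTF-8 sequence becomes two characters under the Latin-1 reading, so the misread tag is two characters longer than the real one.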
If, on the other hand, I manually %-encode the two non-ASCII characters using their Latin-1 values and issue the request
GET http://www.metafilter.com/tags/orrip%E1lld%FDrason
then the desired page is served. Note also that the "tag sidebar" in the tagged post itself has its hrefs in this Latin-1 form, and that works fine.
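For comparison, here is how the two forms of the href can be produced in Python. It is a sketch for illustration, not a claim about how the tag sidebar builds its links.

```python
from urllib.parse import quote

tag = "orripálldýrason"

# Latin-1 form: one escaped byte per non-ASCII character.
# This is the form that currently works.
latin1_href = quote(tag, encoding="latin-1")

# UTF-8 form: two escaped bytes per non-ASCII character here.
# This is the form the browser sends.
utf8_href = quote(tag, encoding="utf-8")

print(latin1_href)
print(utf8_href)
```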
It may seem, then, that the solution is just to make the tag search results page construct its hrefs in the same way that the tag sidebar on the thread page does. But wait! There's more! There are a few tags with characters that are not in Latin-1 and which therefore at present simply cannot be accessed. There are
a bunch with so-called "smart quotes",
one with ž, which didn't make it into Latin-1, and
a couple with ™.
In these cases, the tag sidebar displays the character in the text but replaces it in the href with %3F (a question mark, perhaps the automatic output of a character encoder faced with an unencodable character) and the links don't work. In fact, it seems that there is no way to request the tag page for such tags... so maybe the way it works on the tag sidebar is not so hot after all.
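The %3F is consistent with an encoder that substitutes "?" for anything it cannot represent. A minimal Python sketch of that guess follows; the "replace" error mode is my assumption about the behaviour, not something I can confirm from the outside.

```python
from urllib.parse import quote

# Characters from the tags above that have no Latin-1 code point:
# z with caron, the trademark sign, and a left "smart quote".
chars = ["\u017e", "\u2122", "\u201c"]

results = {}
for ch in chars:
    # A Latin-1 encoder in "replace" mode emits "?", which is then escaped.
    lossy = quote(ch, encoding="latin-1", errors="replace")
    # A UTF-8 encoder can represent every character, so nothing is lost.
    lossless = quote(ch, encoding="utf-8")
    results[ch] = (lossy, lossless)

for ch, (lossy, lossless) in results.items():
    print(ch, lossy, lossless)
```

All three characters collapse to the same %3F in the Latin-1 form, so the original tag cannot be recovered from the href, whereas the UTF-8 form keeps them distinct.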
2. Tags with percent signs very much disliked
The tag "100%pure" (on this post) cannot be accessed, neither verbatim (as it appears in tag search results pages)
GET http://www.metafilter.com/tags/100%pure
nor %-encoded (which is obviously more righteous)
GET http://www.metafilter.com/tags/100%25pure
Both yield 400 Bad Request.
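To spell out why the second form is the well-formed one: a "%" in a URL must be followed by two hex digits. The Python sketch below checks for that; the helper name has_malformed_escape is made up for this example. It does not explain why the server also rejects the encoded form, which I can't determine from here.

```python
import re
from urllib.parse import quote, unquote

# A "%" that is not followed by two hex digits is a malformed escape.
MALFORMED = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_malformed_escape(path: str) -> bool:
    """Return True if the path contains a stray percent sign."""
    return MALFORMED.search(path) is not None

verbatim = "100%pure"
encoded = quote(verbatim)  # escapes the "%" itself as %25

print(has_malformed_escape(verbatim))
print(has_malformed_escape(encoded))
print(unquote(encoded) == verbatim)
```

A strict server could reasonably answer 400 to the verbatim request, but the encoded one is a valid URL that decodes back to the tag exactly.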
3. Underscores, unreliably discovered
This one is a little fuzzier, but it seems like the tag search doesn't discover tags very consistently when they contain underscores. Examples:
(On reflection, the last one could be explained by the tag search truncating the results, keeping only the most frequently used tags... but the results for {adam}, for example, cannot be explained this way.)
In all these cases (and others I have tried), the omitted tags are listed if you include the underscore in the search string: {adam_}, {4_}, {human_}, {space_}. And, of course, if a completely reliable search of tags is really needed, one can go to the infodump. Still, it seems pretty weird.
Other punctuation marks are, it seems, consistently treated either as normal characters (e.g., full stops: {127} finds 127.0.0.1) or as word delimiters (e.g., hyphen: {127} finds lz-127). In my testing so far, only the underscore yields inconsistent results.
I notice that in the adam underscore examples, the ones with capital letters beginning the name are the ones that aren't showing up (and the same with others – a tag search for "seth" or "Seth", for example, won't show the one tagged with Seth_Godin), but I don't know why.
posted by taz (staff) at 1:55 AM on January 28