tag = tags May 26, 2009 8:13 AM

Can a tag search for "thing" also return "things"?

It's a small, yet persistently annoying thing, when searching by topic.

I was looking for a recent post on AskMe about umbrellas- http://ask.metafilter.com/tags/umbrella returned 13 results, none correct. http://ask.metafilter.com/tags/umbrellas returned 5, including the correct one. There was 1 overlap.

Can search be updated to account for both cases when searching for either "foo" or "foos"?
posted by mkultra to Feature Requests at 8:13 AM (35 comments total) 4 users marked this as a favorite

That seems like a natural at first, but those pages technically aren't "search results", they're showing all posts that are tagged with a particular exact tag.

(And if you try a search for the tag umbrella or thing, you get several related tags like things, thingy, etc.)

What we might do is tune the list of "related tags" on that page so it includes variations of a word instead of simply tags that are often used alongside that particular tag. Or maybe even a "variations" list. But I'm hesitant to automatically include posts that don't have that particular tag on a tag page. Umbrella is pretty cut and dried, but there might be some cases where including the plural form would be confusing.
posted by pb (staff) at 8:23 AM on May 26, 2009 [1 favorite]


I recall lots of discussion on this, as an early user of flickr and its tag system there. The problem here is that not every word can be made into a plural that easily and, in many cases, the meaning of the word may be completely changed.

Approaching this through related tags is a good idea. This would include synonyms for a word too. On flickr, anyone searching for pictures tagged 'mom' would presumably, though not necessarily, also be interested in pictures tagged 'mum'. Generating a list of related tags that includes plurals, synonyms and perhaps translations is not an easy task.

In the end, the option is to leave it alone, or to lead the user to explore tag clusters.
posted by vacapinta at 8:43 AM on May 26, 2009


Looking at the popular tag list, there are certainly things where adding the plural just doesn't make sense (xp, windows, internet, itunes), but I'd question whether it would hurt anything if the search for that tag pulled in plural forms or whether plural forms would even exist. Also, there are already plural forms of some tags (relationship and relationships), so what do you do with those, combine them?
posted by Brandon Blatcher at 8:47 AM on May 26, 2009


What if you do a tag search for radius or bacterium, eh?
posted by Plutor at 8:47 AM on May 26, 2009 [1 favorite]


(And if you try a search for the tag umbrella or thing, you get several related tags like things, thingy, etc.)

I tend not to "search" using tags via that screen, just going to /tags/[X], which I realize is a bit of Advanced User functionality. Regardless, returning multiple results for tags on that screen still means I need to click through multiple results, even when it's obvious that umbrella = umbrellas.

Umbrella is pretty cut and dried, but there might be some cases where including the plural form would be confusing.

The problem here is that not every word can be made into a plural that easily and, in many cases, the meaning of the word may be completely changed.

I think these are edge cases that are worth sacrificing. How many words have their meaning significantly changed when adding an "s" as opposed to becoming simply plural? How many are likely to be used as tags?

What's the worst that can happen? A limited subset of searches return more results than are directly relevant? That's a better option, IMO, than being too strict.
posted by mkultra at 8:52 AM on May 26, 2009


Also, there are already plural forms of some tags (relationship and relationships), so what do you do with those, combine them?

I'm not advocating changing any tag data at all, just the way that search works.
posted by mkultra at 8:52 AM on May 26, 2009


BB, I think some of the confusion might be in calling the tag pages "search" results. We've come to expect search engines to pull in variations of words when we're searching. Google, Yahoo, et al do this well. But the pages we're talking about are a different beast. The tag pages show all posts that have been tagged with an exact tag by a human being. The poster didn't choose the tags at random, and the tags weren't assigned by a computer. So we have to assume that the tags were carefully chosen, and some other tags were carefully omitted—whether or not that's the case in reality.

Now if you search for "umbrella" using the MeFi search engine, you will get results for posts that include the word "umbrella" and "umbrellas" as you'd expect for search results. And perhaps that's a better tool for recalling a post in this case. Tags are a nice organizing tool, but they're not the only way to track down past posts.
posted by pb (staff) at 8:53 AM on May 26, 2009 [1 favorite]


Perhaps an option at the top of the /tags/foo page to reload with the inclusion of "foos"?
posted by mkultra at 9:12 AM on May 26, 2009


Aren't there dictionaries that could be used to automatically pull the plurals?
posted by Pope Guilty at 9:26 AM on May 26, 2009


mkultra, yeah, behind the scenes that would be adding an "or" option to the tag pages. ie. Show me all posts tagged "umbrella" OR "umbrellas". We currently only have "and" with tags.
posted by pb (staff) at 9:26 AM on May 26, 2009
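
For readers following along, the "and" versus "or" distinction pb describes can be sketched with a small in-memory tag index. This is only an illustration under stated assumptions: the post ids, the sample tags, and the helper names (`build_index`, `posts_with_all`, `posts_with_any`) are hypothetical and say nothing about how MetaFilter actually stores or queries tags.

```python
from collections import defaultdict

# Hypothetical sample data: post id -> the exact tags chosen by the poster.
POSTS = {
    101: {"umbrella", "rain"},
    102: {"umbrellas", "design"},
    103: {"umbrella", "umbrellas"},
    104: {"rain", "boots"},
}


def build_index(posts):
    """Map each exact tag to the set of post ids carrying it."""
    index = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[tag].add(post_id)
    return index


def posts_with_all(index, tags):
    """'and': posts that carry every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


def posts_with_any(index, tags):
    """'or': posts that carry at least one of the given tags."""
    return set().union(*(index.get(tag, set()) for tag in tags))


INDEX = build_index(POSTS)
print(sorted(posts_with_all(INDEX, ["umbrella", "umbrellas"])))  # [103]
print(sorted(posts_with_any(INDEX, ["umbrella", "umbrellas"])))  # [101, 102, 103]
```

The "or" form is what a "reload with foos included" option would need: it leaves the tag data untouched and only widens the lookup.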


I'm pretty sure the plural of foo is fee.
posted by Plutor at 9:32 AM on May 26, 2009 [3 favorites]


pb: mkultra, yeah, behind the scenes that would be adding an "or" option to the tag pages. ie. Show me all posts tagged "umbrella" OR "umbrellas". We currently only have "and" with tags.

Yeah, and I guess that would create serious problems for pages that are already using an "AND" to combine two separate tags.
posted by mkultra at 9:34 AM on May 26, 2009


...I guess that would create serious problems...

We'd need to find a different way to show "or" pages, yeah. But I guess we need to weigh the utility of combining tags that way and compare it with other options like expanding the "related tags", adding "tag clusters", or some other way to expose more tag space on those pages. So I think we'll need to chew on this one a bit.
posted by pb (staff) at 9:38 AM on May 26, 2009


Personally, my vote would be to allow wildcards in the tag when looking them up. So you could do something like http://ask.metafilter.com/tags/umbrella* and it would include all of the posts that are tagged with something that starts with umbrella. I have no idea how hard that would be to implement though.

What if you do a tag search for radius or bacterium

Speaking of non-standard plurals, I once took a graph theory class where the professor claimed he would automatically give a zero to anyone who used the word "vertexes" on the final.
posted by burnmp3s at 9:52 AM on May 26, 2009
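
As a rough sketch of the wildcard idea above, shell-style patterns can be matched against tag names with Python's standard `fnmatch` module. The tag index, the post ids, and the function name `posts_matching` are made up for illustration; this is not a description of the site's tag pages.

```python
from fnmatch import fnmatchcase

# Hypothetical tag index: exact tag -> ids of posts carrying that tag.
TAG_INDEX = {
    "umbrella": {101, 103},
    "umbrellas": {102, 103},
    "umbrellacorp": {105},
    "rain": {101, 104},
}


def posts_matching(pattern, tag_index):
    """Return (matching tags, union of their post ids) for a wildcard pattern."""
    matched_tags = sorted(tag for tag in tag_index if fnmatchcase(tag, pattern))
    post_ids = set()
    for tag in matched_tags:
        post_ids |= tag_index[tag]
    return matched_tags, post_ids


tags, posts = posts_matching("umbrella*", TAG_INDEX)
print(tags)           # ['umbrella', 'umbrellacorp', 'umbrellas']
print(sorted(posts))  # [101, 102, 103, 105]
```

Note that the prefix match also pulls in the unrelated "umbrellacorp" tag, which is the kind of over-matching the thread worries about.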


Speaking of non-standard plurals, I once took a graph theory class where the professor claimed he would automatically give a zero to anyone who used the word "vertexes" on the final.

Everyone knows it's vertexen.
posted by inigo2 at 9:56 AM on May 26, 2009


It's not vertizzles?
posted by jabberjaw at 10:00 AM on May 26, 2009 [2 favorites]


pb: So I think we'll need to chew on this one a bit.

Thanks, I appreciate it!
posted by mkultra at 10:06 AM on May 26, 2009


verticles. Which, through a misspelling, is how the word "vertical" came to mean a line between two verticles with the same x-component.
posted by Lemurrhea at 10:25 AM on May 26, 2009


Three vertizzles in my trizizzle?
posted by Night_owl at 10:27 AM on May 26, 2009 [1 favorite]


Moreover, this might accidentally give the silly British folks the idea that it is, in fact, okay to refer to "math" as "maths".

I am 100% kidding.
posted by SpiffyRob at 10:44 AM on May 26, 2009


Your folksonomic taxonomy is failing. Obviously the only solution is to require the use of regular expressions for all forms of search.
posted by blue_beetle at 11:01 AM on May 26, 2009


That sounds like prescriptivist talk, blue_beetle.
posted by Mister_A at 11:12 AM on May 26, 2009


Alternatively, we could work on training people to not use the singular forms when tagging something. I know that pluralizing was encouraged in the big back-tagging project because it helped to reduce ambiguity.

So if someone enters a singular form of a word, an Ajaxy pop up window will alert them that non-plural words are bad, and then if they persist, a collection of progressively louder horrible sounds will begin emanating from their PC, finally, if they continue and do use a singular, their computer will begin to emit a disturbing smell.

You guys can program that in, right? Because I'm thinking of something like a cross between old cat pee and mothballs.
posted by quin at 12:01 PM on May 26, 2009


Sure, we could do the whole sounds/cat pee thing. But I don't think plural vs. singular is the problem here. There are many cases where the distinction is useful. In one case umbrellas might refer to the design of objects that keep rain off your head while umbrella refers to a problem with a specific umbrella that someone owns. This might be an extreme example, but it shows that specific tags can have shades of meaning, and I don't think we want to limit how people tag.
posted by pb (staff) at 12:06 PM on May 26, 2009


soundex would solve all your problems, and result in some hilarious misunderstandings as well.
posted by blue_beetle at 12:56 PM on May 26, 2009


What you would do in this situation is to index your tags / text as both their stemmed and non stemmed versions (using any off the shelf stemmer, porter is fine for "englishy" text.) The search can boost exact matches higher but misses there can fall back on the stemmed version.

If you're not using a FT search backend for tags and are just storing tags as elements in a sql-ish table just store both the stemmed version and the exact version and do an or query on both the input query and its stemmed version. The problem there is that a direct search for "cats" might return pages about a singular cat. Boosting gives the user the impression they have more control over the process...
posted by neustile at 1:02 PM on May 26, 2009 [1 favorite]
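
The approach described above (store both the exact and the stemmed form, query both, rank exact matches first) might look roughly like the following. Everything here is an assumption for illustration: `naive_stem` is a crude stand-in that only strips a trailing plural "s" and is not the Porter algorithm, and the sample posts and function names are hypothetical.

```python
from collections import defaultdict


def naive_stem(tag):
    """Crude stand-in for a real stemmer such as Porter: strips one plural 's'."""
    tag = tag.lower()
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith(("ss", "us")):
        return tag[:-1]
    return tag


def build_indexes(posts):
    """Index every tag twice: once exactly as entered, once stemmed."""
    exact, stemmed = defaultdict(set), defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            exact[tag.lower()].add(post_id)
            stemmed[naive_stem(tag)].add(post_id)
    return exact, stemmed


def search(query, exact, stemmed):
    """Exact-tag matches first, then posts that match only via the stem."""
    query = query.lower()
    exact_hits = exact.get(query, set())
    stem_only_hits = stemmed.get(naive_stem(query), set()) - exact_hits
    return sorted(exact_hits) + sorted(stem_only_hits)


# Hypothetical sample data: post id -> tags.
POSTS = {1: {"cat"}, 2: {"cats"}, 3: {"cats", "dogs"}, 4: {"dog"}}
EXACT, STEMMED = build_indexes(POSTS)

print(search("cats", EXACT, STEMMED))  # [2, 3, 1]
print(search("dogs", EXACT, STEMMED))  # [3, 4]
```

A search for "cats" still surfaces the post tagged only "cat", but after the exact matches, which is the boosting behavior the comment describes.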


On preview, blue_beetle, soundex is not the solution to this problem, and is rarely these days a solution to any problem. You need something with a variable codelength (e.g. double metaphone) and then some intelligent way to deal with multiple words, etc. But that's way too much work.

A porter stemmer implemented simply would take care of this particular need immediately.
posted by neustile at 1:05 PM on May 26, 2009
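
To make the Soundex objection concrete, here is a minimal sketch of the classic four-character American Soundex code. The function name `soundex` and the sample words are chosen for illustration only; nothing here is taken from the site's code.

```python
# Letter groups used by classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code for a single word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits = []
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    # Pad with zeros and truncate: the code is always exactly four characters.
    return (word[0].upper() + "".join(digits) + "000")[:4]


print(soundex("umbrella"), soundex("umbrellas"))  # U516 U516
print(soundex("cat"), soundex("city"))            # C300 C300
```

The plural does collapse onto the singular here, but only because the code is cut off at four characters; the same truncation makes "cat" and "city" indistinguishable, which is where the misunderstandings would come from.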


Oh you start down that route, and then you want a vocabulary and disambiguation and natural language parsing and stuff and before you know it what you're asking for is a Turing compliant AI. And then it kills us all.
posted by Artw at 1:05 PM on May 26, 2009


Won't anyone thing of Rihanna?
posted by Rock Steady at 1:10 PM on May 26, 2009


Just talk of stemming gives me that whole homerdonut drool thing.
posted by jessamyn (staff) at 1:14 PM on May 26, 2009


Won't anyone thing of Rihanna?

Thank you for making the joke I was too lazy to do.
posted by inigo2 at 1:17 PM on May 26, 2009


I came close to requesting the same pony myself, but we were getting up towards two days without a MeTa post and then I forgot and whatever, but anyway I agree that it would be great to make the tagging a little bit fuzzy so that a request for a singular tag also returns a plural one and vice-versa. A minimalist approach would be via a related tags feature, so that http://www.metafilter.com/tags/owl would say "hey we also have 5 posts about owls!" but I'd prefer it to be more: "Yo dawg, we heard you like owl, so we put tags/owls in your tags/owl so you can owls while you owl" and mix them all together.

In any implementation, the potential benefit is people finding what they're looking for more easily (or at all). I don't think any counter-arguments expressed here so far outweigh that benefit.
posted by nowonmai at 1:32 PM on May 26, 2009 [2 favorites]


and before you know it what you're asking for is a Turing compliant AI. And then it kills us all.

So the takeaway here is that we can't do this or there will be a fifth Terminator movie.
posted by cortex (staff) at 2:03 PM on May 26, 2009


Yeah, about those time-travelling robot assassins, you probably should all move and not leave a forwarding address.
posted by Artw at 2:09 PM on May 26, 2009


This post makes me kind of want to dig up my Machine Learning notes - my semi-disastrous final project for that class (a graduate-level course I took my senior year of undergrad) was on identifying synonymous tags in a cloud (eg, not just 'relationship' and 'relationships,' but also 'osx' and 'os x' and 'Windows XP' and 'WinXP.')
posted by Tomorrowful at 8:23 PM on May 26, 2009

