Blame library school, I guess December 23, 2010 8:31 PM

Metadata for Metafilter Pony request: would it be possible to have a box that pops up that says something like "We noticed you're using tags X and Y, here are some other tags commonly associated with X and Y," to improve the descriptive qualities of user-provided tags?

I figure this would probably be pretty hard on the database, and probably difficult to code, but it seems like it would improve the descriptive qualities of tags. I always hesitate when making a post as I get down to the bottom of the page, since I want to make the metadata as descriptive and useful as possible, but I tend to blank when it gets down to that box. Would it be possible to automate the process and have a box of related tags that populates on preview and provides users with helpful suggestions?
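To make the "tags commonly associated with X and Y" idea concrete, here is a minimal sketch of how such a box could be fed. It is only an illustration: the function names and the sample posts are made up, and it assumes each past post is available as a plain list of tags, which says nothing about how the site actually stores them.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(posts):
    """Count how often each pair of tags appears on the same post."""
    pair_counts = Counter()
    for tags in posts:
        # sorted(set(...)) gives each unordered pair one canonical form.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def suggest_related(pair_counts, chosen, limit=3):
    """Rank other tags by total co-occurrence with the tags already chosen."""
    chosen = set(chosen)
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in chosen and b not in chosen:
            scores[b] += n
        elif b in chosen and a not in chosen:
            scores[a] += n
    # Highest score first; alphabetical order breaks ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]

# Hypothetical sample data: each inner list is the tag set of one past post.
SAMPLE_POSTS = [
    ["libraries", "metadata", "cataloging"],
    ["libraries", "metadata", "folksonomy"],
    ["metadata", "folksonomy", "tagging"],
    ["libraries", "books"],
]

print(suggest_related(build_cooccurrence(SAMPLE_POSTS), ["libraries", "metadata"]))
```

The pair counts could be computed once in a batch and cached, so the preview page would only need a lookup rather than a fresh pass over every post.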
posted by codacorolla to Feature Requests at 8:31 PM (33 comments total) 3 users marked this as a favorite

Let me add: I do realize that you can do a version of this by searching different tags and looking at the related tags on that particular page, but it seems as though putting it on the post creation page might streamline the process a bit.
posted by codacorolla at 8:32 PM on December 23, 2010


I'll say I have a hard time with the tags too and it drives me nuts. I'll spend a while thinking of clever headlines and that doesn't bother me, but the tags are frustrating. I figure if my lack of tags gets bothersome people will suggest them to me in memail.

You can add tags to your contacts' posts.
posted by cjorgensen at 8:34 PM on December 23, 2010 [1 favorite]


My first thought is that suggesting related tags based on the tags you've added so far is barking up the wrong tree: there's no guarantee that tags that co-occur often with the tags you've chosen for your question will in fact be tags that are appropriate to your actual question.

What would maybe be more effective is to do some sort of tag suggestion based on the content of your actual question, basically parsing the question text for keywords to correlate to the existing corpus of tags others have used. Which wouldn't have to be particularly database intensive at all (especially considering the relative infrequency with which it'd get used, on the order of a hundred or two times a day at most?), but would still be some work to implement.

It'd be an interesting challenge. How good any experimental version of it would be, and more to the point whether it would be good enough to seem really worthwhile, is an open question.
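A very rough sketch of the simplest version of that, matching words in the text against tags people have already used, might look like the following. `KNOWN_TAGS` and `suggest_from_text` are hypothetical names invented for the example, and the literal word matching is an assumption about the crudest possible heuristic, not a description of anything running on the site.

```python
import re
from collections import Counter

# Hypothetical set of tags already in use on the site.
KNOWN_TAGS = {"python", "tagging", "metadata", "library", "search"}

def suggest_from_text(text, known_tags, limit=5):
    """Suggest existing tags that literally occur in the post text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    hits = Counter(w for w in words if w in known_tags)
    # Most frequent first; alphabetical order breaks ties.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]

question = ("How do I improve metadata for my library? "
            "Library metadata and tagging confuse me.")
print(suggest_from_text(question, KNOWN_TAGS))
```

This only ever proposes tags whose exact spelling appears in the text, so plurals and synonyms would be missed without some extra normalization.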
posted by cortex (staff) at 9:22 PM on December 23, 2010 [1 favorite]


I'll take anal bum cover for $200.
posted by special-k at 9:24 PM on December 23, 2010 [3 favorites]


oops, wrong thread.
posted by special-k at 9:26 PM on December 23, 2010 [2 favorites]


Is there a problem with the current descriptive quality of tags? It seems like some people put great time and effort into tags and some don't. And I'm not sure a tag-suggester would help the people who don't care much about tags anyway.
posted by pb (staff) at 9:31 PM on December 23, 2010


What would maybe be more effective is to do some sort of tag suggestion based on the content of your actual question, basically parsing the question text for keywords to correlate to the existing corpus of tags others have used. Which wouldn't have to be particularly database intensive at all (especially considering the relative infrequency with which it'd get used, on the order of a hundred or two times a day at most?), but would still be some work to implement

I was talking about the Blue as well, but that is a good idea. There are certain words inside of posts that wouldn't necessarily be common enough to be ignored, but would probably be frequently used and likely not related to the real content of the thread (self-referential stuff like "metafilter" and "the blue")

I've also wondered if you guys had ever considered putting in something to allow other users to suggest tags, and given enough suggestions having the OP be memailed saying "X users thought this might be a worthwhile tag to add." But that also seems like it could be abused.

Just out of curiosity, how much interest and development time do you guys spend on the tagging system? It's a difficult thing to get right, even for people who are trained in descriptive authority work.
posted by codacorolla at 9:34 PM on December 23, 2010


That is: tagging anything definitively is a difficult task, not coding a tagging system. I'm not writing very clearly today.
posted by codacorolla at 9:35 PM on December 23, 2010


Simple formula for tagging:

Relevant proper nouns ("JohnSmith", "MassiveDynamic", "FordPrius", "TheRaftOfTheMedusa")
+ broader subject area ("Politics", "Television", "Cars", "Space")
+ geographical area ("Germany", "London", "Texas")
+ time period ("1940s", "80s", "1992")

Usually covers most bases.
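If you wanted to turn the formula into something mechanical, a sketch could look like this. The helper names `to_tag` and `build_tags` are invented for illustration, and the CamelCase squashing is just my reading of the examples above, not a site rule.

```python
def to_tag(phrase):
    """Squash a phrase into one CamelCase tag, e.g. 'John Smith' -> 'JohnSmith'."""
    return "".join(word[:1].upper() + word[1:] for word in phrase.split())

def build_tags(proper_nouns=(), subjects=(), places=(), periods=()):
    """Combine the four categories into one de-duplicated tag list, in order."""
    tags = []
    for group in (proper_nouns, subjects, places, periods):
        for phrase in group:
            tag = to_tag(phrase)
            if tag and tag not in tags:
                tags.append(tag)
    return tags

print(build_tags(
    proper_nouns=["John Smith"],
    subjects=["Politics"],
    places=["Texas"],
    periods=["1992"],
))
```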
posted by Artw at 9:50 PM on December 23, 2010


Bayesian techniques would be useful in suggesting additional/alternative tags.
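As a hedged illustration of what that could mean, here is a tiny naive Bayes scorer. It assumes past posts are available as (words, tags) pairs, treats every word as independent given the tag, and uses add-one smoothing. The names `train` and `score_tags` and the sample posts are made up for the sketch.

```python
import math
from collections import Counter, defaultdict

def train(posts):
    """posts: list of (words, tags) pairs. Returns the counts needed for scoring."""
    tag_counts = Counter()             # how many posts carry each tag
    word_counts = defaultdict(Counter) # tag -> word -> occurrences
    vocab = set()
    for words, tags in posts:
        for tag in tags:
            tag_counts[tag] += 1
            word_counts[tag].update(words)
        vocab.update(words)
    return tag_counts, word_counts, vocab

def score_tags(model, words):
    """Return tags ordered by log P(tag) + sum of log P(word | tag)."""
    tag_counts, word_counts, vocab = model
    total_assignments = sum(tag_counts.values())
    scores = {}
    for tag, n in tag_counts.items():
        total_words = sum(word_counts[tag].values())
        log_p = math.log(n / total_assignments)
        for w in words:
            # Add-one (Laplace) smoothing so an unseen word does not zero out a tag.
            log_p += math.log((word_counts[tag][w] + 1) / (total_words + len(vocab)))
        scores[tag] = log_p
    return sorted(scores, key=lambda t: (-scores[t], t))

# Hypothetical training data.
SAMPLE = [
    (["engine", "wheels", "road"], ["cars"]),
    (["engine", "fuel", "road"], ["cars"]),
    (["senate", "vote", "election"], ["politics"]),
]

print(score_tags(train(SAMPLE), ["engine", "road"]))
```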
posted by five fresh fish at 10:10 PM on December 23, 2010


There are certain words inside of posts that wouldn't necessarily be common enough to be ignored, but would probably be frequently used and likely not related to the real content of the thread (self-referential stuff like "metafilter" and "the blue")

Yeah, you'd need to do some sort of frequency analysis up front on a representative sample of posts (possibly just all posts ever, since it's not that big of a dataset) to establish the relative commonality of terms in a local context.
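For illustration, one common way to do that kind of frequency analysis is to count how many posts each word appears in and then down-weight words that show up nearly everywhere (a TF-IDF style weighting). The sketch below is an assumption about how it could be done, not what we actually run; the function names and the three-post corpus are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def document_frequencies(posts):
    """Count, for each word, how many posts contain it at least once."""
    df = Counter()
    for text in posts:
        df.update(set(tokenize(text)))
    return df

def distinctive_words(text, df, n_posts, limit=3):
    """Rank words in `text` by term frequency times inverse document frequency."""
    tf = Counter(tokenize(text))

    def weight(word):
        # Words present in every post get log(1) = 0, so they sink to the bottom.
        return tf[word] * math.log((1 + n_posts) / (1 + df[word]))

    ranked = sorted(tf, key=lambda w: (-weight(w), w))
    return ranked[:limit]

# Hypothetical mini-corpus standing in for "all posts ever".
CORPUS = [
    "metafilter thread about knitting",
    "metafilter post on the blue about bicycles",
    "a metafilter question about sourdough",
]

df = document_frequencies(CORPUS)
new_post = "sourdough on metafilter: my sourdough starter hates sourdough flour"
print(distinctive_words(new_post, df, len(CORPUS)))
```

In this toy corpus "metafilter" appears in every post, so its weight is zero, which is the sort of self-referential term the table is meant to filter out.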

I've also wondered if you guys had ever considered putting in something to allow other users to suggest tags, and given enough suggestions having the OP be memailed saying "X users thought this might be a worthwhile tag to add." But that also seems like it could be abused.

I would say this sounds like serious overkill. It's a neat idea but it's a solution in search of a problem given that tags are an ancillary rather than a central part of the user experience on metafilter. We want tags to be moderately helpful, but engineering complex methods for crowdsourcing tag submissions is more effort in terms of implementation, maintenance, moderation, and user education than I think we're at all interested in expending.

Just out of curiosity, how much interest and development time do you guys spend on the tagging system? It's a difficult thing to get right, even for people who are trained in descriptive authority work.

We talk about tagging off and on. We've actually explored the frequency/relevance idea with tagging before, for example—the sidebar on the MyAsk tab on the green is driven by some experimental heuristic tag-matching that me and pb built out a year or two back, based in part on some tag frequency computations I did at the time. It's one of those things that worked well enough that it seemed worth throwing up on that page, but was also not so robust and out-of-the-park great that we felt comfortable making it a part of the main user experience.

I find tagging stuff pretty interesting personally, but as far as what's actually ready for prime time we're generally going to err on the side of the simplest thing that works well. Something significantly more complex than the current system would need to be significantly better as well to justify consideration, and I think pb's question up-thread (is there actually a problem with how tags work out right now?) is where that discussion would really have to start. I don't doubt that we could, if we tried, build a not-terrible tag suggestion system, but is the lack of such a system right now actually causing problems for the site or is it just leaving some duty-minded tag-friendly posters feeling a little underperforming now and then?
posted by cortex (staff) at 10:50 PM on December 23, 2010 [1 favorite]


"HI, I'M CLIPPY, AND I SEE YOU ARE TRYING TO USE TAGS. MAY I SUGGEST...."
posted by edgeways at 2:16 AM on December 24, 2010 [7 favorites]


I always hesitate when making a post as I get down to the bottom of the page, since I want to make the metadata as descriptive and useful as possible

You can add tags after the post has gone up; you don't have to get it right when you make the post. If this hesitation causes you to abort a post that you'd otherwise make, then consider just going through with it and adding additional tags if they spring to mind later or are suggested in the thread.
posted by Rhomboid at 2:52 AM on December 24, 2010


cortex's suggestion of content-related tag suggestions is a good one, and an uber-tag-suggester that looked at both the content and the tags initially supplied by the poster would be a beautiful thing. I think, however, that if engineering time were to be devoted to the tagging, a higher priority would be to improve the tag search system. We discussed this a few weeks back, and there were some great suggestions for being able to refine searches or use Boolean terms etc. Of course, once searching for tags is made useful, then improving the tags on posts will become more important.
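As a sketch of what Boolean tag search could mean in practice, here is a small set-based filter. The `search_by_tags` function, its parameter names, and the sample posts are all hypothetical, and it assumes each post's tags can be loaded as a simple collection.

```python
def search_by_tags(posts, all_of=(), any_of=(), none_of=()):
    """Return ids of posts matching AND (all_of), OR (any_of), NOT (none_of)."""
    results = []
    for post_id, tags in posts.items():
        tags = set(tags)
        if not set(all_of) <= tags:
            continue  # missing a required tag
        if any_of and not (set(any_of) & tags):
            continue  # none of the optional tags present
        if set(none_of) & tags:
            continue  # carries an excluded tag
        results.append(post_id)
    return sorted(results)

# Hypothetical posts keyed by id.
POSTS = {
    1: ["cars", "1940s", "germany"],
    2: ["cars", "texas"],
    3: ["politics", "germany"],
}

print(search_by_tags(POSTS, all_of=["cars"], none_of=["texas"]))
print(search_by_tags(POSTS, any_of=["texas", "politics"]))
```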
posted by nowonmai at 2:56 AM on December 24, 2010


YouTube does this, and I find that the suggested tags frequently aren't appropriate; maybe 1 out of 5 works.
posted by HuronBob at 3:30 AM on December 24, 2010


The real solution to this problem is to let arbitrary users tag posts. To avoid spam, maybe they can also delete tags. Or possibly a tag will only be added if N people attempt to add it.
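The "only if N people attempt to add it" variant is easy to sketch. The class below is purely illustrative: `TagProposals`, its threshold default, and the lowercasing of tags are assumptions made for the example, not anything the site does.

```python
from collections import defaultdict

class TagProposals:
    """Accept a proposed tag once `threshold` distinct users have suggested it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.supporters = defaultdict(set)  # tag -> set of user ids
        self.accepted = set()

    def propose(self, user_id, tag):
        """Record a suggestion; return True if the tag is now accepted."""
        tag = tag.strip().lower()
        self.supporters[tag].add(user_id)  # a set ignores repeat votes by one user
        if len(self.supporters[tag]) >= self.threshold:
            self.accepted.add(tag)
        return tag in self.accepted

proposals = TagProposals(threshold=2)
proposals.propose("alice", "Folksonomy")
proposals.propose("alice", "folksonomy")   # same user again, still one supporter
print(proposals.propose("bob", " folksonomy "))
```

Counting distinct users rather than raw submissions is what keeps one person from pushing a tag through alone.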
posted by DU at 3:38 AM on December 24, 2010


Isn't there already a tag suggester built for mefi, for the back-tagging project? IIRC, that worked rather well.
posted by carsonb at 7:16 AM on December 24, 2010


That was a part of Yahoo's API, I believe (a part which I think pb mentioned has since been shut down, too), not something we had built locally. I thought about DIYing it at the time but that was already functional so I figured, eh.
posted by cortex (staff) at 7:25 AM on December 24, 2010


Yes, we had two versions of the backtagging tool. One version provided suggestions by analyzing the post text with the Yahoo Term Extraction API. (Now defunct, but it looks like they have an alternative through another system.)

My hesitation with auto-suggest in general is that it could limit vocabulary. I think you might get a lot of clicking since it's so convenient. I think there's an argument that could work in our favor since we use tags for retrieval. It's easier to work with a set of 10 tags than 100 if you're trying to recall something. But would limiting the tag vocabulary make the tags more descriptive overall? If that's the problem we're trying to solve, I'm not sure that providing tags for people solves it. If lack of tags altogether is the problem, then auto-suggest would help quite a bit. That's why I think it was better suited for the backtagging project where thousands of posts needed to be tagged.
posted by pb (staff) at 7:35 AM on December 24, 2010


Yeah, the balance of tag variety vs. tag consistency is tricky.

One problem with any tag suggestion system that relies on a pre-computed vocabulary of possible tags is that it will never suggest a nonce tag that might be really apt for a given post (uncommon proper nouns, for example, when someone asks a question or makes a post about the work of super-obscure painter Kresblitz Snogram or whatever).

That said, hopefully that would be among the most obvious ideas a tagger would have themselves when posting.

If we used the whole established working vocabulary of the site as the tag pool, that would avoid the limited-vocab problem to some extent—any word that wasn't a nonce formation in the post text could be recognized as a known word (in the sense of "mefites have typed this on the site before" according to a pre-computed frequency table) and suggested if the heuristic thought it might be especially relevant.

That's still limiting tags to words that have been used in the text of a post, however; suggesting tags that are semantically related to the post text, rather than literally occurring in it, is a much more complicated problem that I wouldn't even know where to begin with.
posted by cortex (staff) at 7:49 AM on December 24, 2010


This is an interesting conversation, thanks guys.
posted by codacorolla at 7:52 AM on December 24, 2010


There's also, now that Yahoo Term Extraction is no longer with us, the ever-popular Natural Language Toolkit, which is seriously awesome and powerful as far as this sort of thing goes - albeit it's written in Python, and I don't think has a centrally callable API, although I could be wrong. It pretty much can handle anything that you throw at it.
posted by jivadravya at 8:45 AM on December 24, 2010 [2 favorites]


There should be two levels of tagging: major tags from a limited vocabulary, and minor folksonomic tags to provide specialization.

My guess is that the major tags can be derived from an analysis of the folksonomy tags, along with some human oversight. There are going to be broad themes and categories that have arisen naturally; they just need some formal tweaking.
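A first pass at that analysis could be as simple as listing the most widely used folksonomy tags as candidates for a human to review. This is a sketch under that assumption; `candidate_major_tags`, the `min_posts` cutoff, and the sample data are invented for illustration.

```python
from collections import Counter

def candidate_major_tags(posts, min_posts=2):
    """List tags used on at least `min_posts` posts, most used first."""
    usage = Counter()
    for tags in posts:
        usage.update(set(tags))  # count each tag once per post
    frequent = [(tag, n) for tag, n in usage.items() if n >= min_posts]
    return sorted(frequent, key=lambda kv: (-kv[1], kv[0]))

# Hypothetical tag sets from four posts.
TAGGED_POSTS = [
    ["politics", "senate"],
    ["politics", "election2010"],
    ["cars", "politics"],
    ["cars", "fordprius"],
]

print(candidate_major_tags(TAGGED_POSTS))
```

The human oversight would come after this step: someone still has to merge near-duplicates and decide which frequent tags deserve to be major categories.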
posted by five fresh fish at 9:57 AM on December 24, 2010


Consult this, this, this, this, or this as applicable.

(Why no tag cloud in Projects?)
posted by Sys Rq at 10:09 AM on December 24, 2010


Because that would destroy the universe? Is this a trick question?
posted by cortex (staff) at 10:14 AM on December 24, 2010


Huh. Any reason there's no link to that anywhere?
posted by Sys Rq at 10:20 AM on December 24, 2010


Oh, wait, I found it!

1. Click on a post
2. Click on a tag
3. Click "View popular tags"
posted by Sys Rq at 10:23 AM on December 24, 2010


Probably not, just an oversight I'd imagine. We should probably add a Tags link up in the header.

You can get at it from the page for any given tag, at least.
posted by cortex (staff) at 10:24 AM on December 24, 2010


Yep, just an oversight. I added the Tags link to the Projects header.
posted by pb (staff) at 10:53 AM on December 24, 2010


Just an oversight I'd imagine. We should probably add one million dollars up in my bank account.
posted by cortex (staff) at 11:02 AM on December 24, 2010 [3 favorites]


Artw: Simple formula for tagging:

Relevant proper nouns ("JohnSmith", "MassiveDynamic", "FordPrius", "TheRaftOfTheMedusa")
+ broader subject area ("Politics", "Television", "Cars", "Space")
+ geographical area ("Germany", "London", "Texas")
+ time period ("1940s", "80s", "1992")


Why not add something like this to the new post page? It seems to cover most everything and breaks tagging into simple steps rather than, "Add tags!"
posted by 47triple2 at 3:38 PM on December 24, 2010


i just want to say that while neologisms usually drive me up a wall, i really like the word 'folksonomy'
posted by empath at 6:21 PM on December 24, 2010 [1 favorite]


I hated it when I first heard it. Since then it's proven itself to be a pretty useful descriptor and it's grown on me.
posted by Artw at 8:26 PM on December 24, 2010

