So I decided to build it myself...(and then it died) July 16, 2004 3:28 PM

FineIllGetMyOwnDamnPonyFilter - A couple days ago I became cranky that AskMe didn't have several features I felt were vital to making it work right - A keyword/topic search, categories, an area to summarize the answers, a way for the poster to sum things up, a moderation system so that quality rises to the top, a way to find if my question has already been asked, etc. I'm sure Matt will build this stuff eventually, or not. So I decided to build it myself. [more inside]
posted by y6y6y6 to MetaFilter-Related at 3:28 PM (39 comments total)

Here's the interface for viewing specific question detail.

Here's the page where you can drill down through categories or do a keyword search. I have the category tree opened to the Home Theater category there. In the category tree the first number is the total number of questions in the tree below the point. The second number is the count of questions in the folder. You view questions in a category by clicking on the second number.
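
The two numbers next to each category can be illustrated with a minimal sketch. It assumes a simple in-memory tree; the `Category` class, its fields, and the sample question ids are hypothetical and not the site's actual code. It also assumes the first number counts distinct questions at or below a node, so a question filed in two folders is counted once.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical category node; not the site's real schema."""
    name: str
    question_ids: set = field(default_factory=set)  # questions filed directly here
    children: list = field(default_factory=list)    # child Category nodes

    def folder_count(self) -> int:
        # Second number: questions filed in this folder itself.
        return len(self.question_ids)

    def subtree_ids(self) -> set:
        # Union of this folder's questions and everything below it.
        ids = set(self.question_ids)
        for child in self.children:
            ids |= child.subtree_ids()
        return ids

    def tree_count(self) -> int:
        # First number: distinct questions at or below this point (assumption).
        return len(self.subtree_ids())


home_theater = Category("Home Theater", {101, 102})
electronics = Category("Electronics", {102, 103}, [home_theater])
label = f"{electronics.name} ({electronics.tree_count()}/{electronics.folder_count()})"
```

Here question 102 sits in both folders, which is why the subtree total for Electronics is smaller than the sum of the two folder counts.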

As you can see I only have about 20 questions in the system right now. The hard part is summarizing all the answers. Putting things in categories doesn't take long, but has to be done manually of course. Ditto for keywords. With 3000+ questions to go through, this will take me forever.

Who wants to help get questions sorted and summarized?

I have the detail page set up so that if I give you admin access you'll be able to assign categories to a question, and edit the keyword and conclusions fields. If I had a couple dozen people who wanted to take on 150 questions we could work through the backlog in a couple weeks or less. After that it shouldn't be too tough for a few people to keep things current.

I'm not ready to hand out admin status this minute, but certainly in a couple days. I want to build out the categories a bit more first.

I haven't started on moderation yet, mainly because there aren't categories with enough questions for it to make sense. I also plan to build the search out so that you can do some smarter searches on it.

For many questions this format won't be very valuable. "What this song" questions aren't something you'll try to drill down for. But if we have a category for things like digital cameras, and gift ideas, and css help I think we can cut down on the number of repeats and add value to AskMe as a whole.

I've really just grabbed the low hanging fruit here from a coding standpoint, which is my main reason for mentioning it now rather than waiting until it was more built out. What other features would people like to see implemented? What other ways can we categorize things so that answers are easy to find? How can I make this more valuable?

Also one question I'm asking myself is whether to leave overly specific questions out of the system to save time. Since only one person is ever going to ask the "What this song" question, does it make sense to drill down, or search, to get to that question?
posted by y6y6y6 at 3:29 PM on July 16, 2004

Fuck, dude. You are a serious masochist. But thank you anyway.
posted by Wulfgar! at 3:49 PM on July 16, 2004

I'll never understand this geek-fetish for categorising, tagging and sorting everything.
posted by reklaw at 4:04 PM on July 16, 2004

I understand it in principle, but the sorting is only as good as the tagging, which is never perfect. Tagging is a lot of work just to do halfway decent (not to mention filled with subjectivity issues).

An impressive undertaking, y6. How does your portal acquire MeFi content? The RSS feed?
posted by scarabic at 4:14 PM on July 16, 2004

"Tagging is a lot of work just to do halfway decent"

I'm thinking if this is something that gets used it would be worth it to have several sorting or tagging systems going at once. Perhaps one as I have already, and another based on the Dewey Decimal System. Or whatever.

"(not to mention filled with subjectivity issues)"

I've made it possible for admins to easily put questions in more than one category. And it's expected many will end up in several. The process could easily be made democratic if it's something many people use.

And it's a good idea to keep in mind this doesn't have to be limited to AskMe. The same code could be made to categorize MeFi, other forums, etc.

"How does your portal acquire MeFi content?"

I've been very slowly grabbing questions. A robot grabs one every 30 seconds. Then it's stored in the database locally. Which also lets me do fancy-pants datamining. Like all questions under computers from March '04 where scarabic gave an answer.
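
A minimal sketch of that kind of query, using Python's built-in `sqlite3` module against an in-memory database. The table names, column names, and sample rows are all assumptions made up for illustration; the post does not describe the real schema or database engine.

```python
import sqlite3

# Hypothetical schema: questions, their category assignments, and answers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT, posted_on TEXT);
CREATE TABLE question_categories (question_id INTEGER, category TEXT);
CREATE TABLE answers (question_id INTEGER, author TEXT);
""")
conn.executemany("INSERT INTO questions VALUES (?, ?, ?)", [
    (1, "Why does my PC beep?", "2004-03-12"),
    (2, "Best digital camera?", "2004-03-20"),
    (3, "Router keeps dropping", "2004-04-02"),
])
conn.executemany("INSERT INTO question_categories VALUES (?, ?)", [
    (1, "computers"), (2, "cameras"), (3, "computers"),
])
conn.executemany("INSERT INTO answers VALUES (?, ?)", [
    (1, "scarabic"), (1, "majick"), (2, "scarabic"), (3, "scarabic"),
])


def questions_answered_by(conn, category, author, start, end):
    """Questions in `category`, posted in [start, end), that `author` answered."""
    return conn.execute(
        """
        SELECT DISTINCT q.id, q.title
        FROM questions q
        JOIN question_categories c ON c.question_id = q.id
        JOIN answers a ON a.question_id = q.id
        WHERE c.category = ? AND a.author = ?
          AND q.posted_on >= ? AND q.posted_on < ?
        ORDER BY q.id
        """,
        (category, author, start, end),
    ).fetchall()


# ISO-formatted date strings compare correctly as plain text.
march = questions_answered_by(conn, "computers", "scarabic", "2004-03-01", "2004-04-01")
```

Keeping category assignments in their own table is what lets one question live in several categories without duplicating the question row.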
posted by y6y6y6 at 4:30 PM on July 16, 2004

They don't like questions where it can't be easily categorized.
posted by Keyser Soze at 4:58 PM on July 16, 2004

I'm interested to see where you take this, y6. The subjectivity question is one to tackle head-on now, rather than waiting until it becomes a problem.

The basic bind in all such open taxonomy projects is that we implement user-submitted tagging systems so that we can distribute the labor to a large pool of people, but with any large pool of people, the chances of disagreement or subjective difference increase.

One of the best things you can do is employ a taxonomy with broad top-level categories. Level 1 of the taxonomy should be like 6-8 categories. It matters exponentially less if people disagree below level 1, so try to make that as impossible as - er, possible.

The key is to consider how people will use the categories on the front end. This means thinking more in terms of "channels" someone might want to subscribe to, content-based fields of interest. The opposite is a verbose, incredibly granular museum of perfectly categorized specimens that only scientists can navigate.

My suggestion is broad top-level categories, followed by a fairly extensive second layer, with free-text keywords below that. This eliminates at least some of the most glaring cross-categorizations at the top level, provides for all but the most niche concerns at the second level, and leaves the system open to infinite interpretation (via search) at the 3rd level.

/pontification from an old taxonomist

As Keyser points out, you may have to also factor in MeFi etiquette when you form your taxonomy. Some categories may be obvious, but should never be implemented.

To say the least, the content you're likely to deal with will span beyond any existing published taxonomy. We're talking help with all products ever sold, questions on all metaphysical subjects ever thought of, questions about all geographical locations in the world, and memory assistance with all literature/entertainment ever published.

Watching with interest!

(or is there something I can start doing? if so - make it clearer please)
posted by scarabic at 5:55 PM on July 16, 2004

I know nothing about taxonomy. And I'm not ready to give that the attention it really requires. For the time being I'm trying to put things in a structure which is probably too granular. I do that with the knowledge that I can build admin tools to easily move categories from one part of the tree to another. So it will be easy to reorganize later as long as most of the work is moving/combining, rather than dividing and moving.

Right now I'm seeing patterns that match the AskMe style more than they'd match another general knowledge base.

And I'm still plugging away. I think I'll want to start handing out admin rights Sunday or so. But if you wanted to suggest a taxonomy you thought would work, that would be extremely helpful.
posted by y6y6y6 at 6:15 PM on July 16, 2004

Below is a suggestion of an edit to your top-level taxonomy. I've actually spent more time than most trying to manage taxonomies and I am willing to help you as much as you'd like.

I have included 2nd level categories only to help flesh out and explain my edit. I hope they help explain what I mean by things like "Science and Technology."

It's also helpful if, on the back end, taxonomical structure can be changed. For example, don't set things up in such a way that it's difficult to move "electronics" out of "home and garden" and into "technology" down the line, if that's what you decide to do.

This is still 8 categories (the max I'd recommend), but I think they're tighter.

Science & Technology
- Electronics
- Computer Hardware
- Computer Software
- Programming
- Physics
- [more...]
Recreation & Travel
- Sports
- Hobbies
- Things to do when visiting...
- [more...]
Entertainment & Arts
- Books
- Movies
- Music
- [more...]
- Food
- Beverages
- Gardening
- Repairs
- [more...]
Family & Culture
- Parenting
- Conflict Resolution
- General Philosophy
- Milestones
- [more...]
- Local Services References
- Outdoor Pursuits
- Legislative Issues
- [more...]
- Mental Health
- Substance Abuse
- Reproductive Health
- Dealing with Doctors
- Insurance
- Diet and Exercise
- [more...]
Jobs & Public Affairs
- Workplace Ethics
- Finding a Job
- Running a Business
- Politics & Activism
- [more...]
posted by scarabic at 6:29 PM on July 16, 2004

Very cool. Thanks.
posted by y6y6y6 at 6:38 PM on July 16, 2004

Mister 6: Taxonomies and other metadata hierarchies are, as people have said, subject to taste. For example, I'm not convinced the tree should be deeper than two levels for an endeavor such as this.

I'm perfectly happy looking at a somewhat larger list of results for "Electronics" rather than having to poke down into subsubcategories "Consumer," "PCB design," "Audio" and so on.

It's likely it would be more intuitive to have a plethora of categories at depth 2 and let other metadata fields do the differentiation within them.

Also, the beetle's taxonomy is better than yours.
posted by majick at 6:57 PM on July 16, 2004

Thanks, majick. I'm a content man, and the y6 clearly has an engineering background that I don't.

I did some work on a large site where I had to organize taxonomies. Oddly enough, it became clear to us at a certain point that the "Books" people didn't really want to hang out with the "Music" people, even though Books/Music are both classified, quite logically under "Entertainment."

But we did find, frustratingly enough, that the "Books" people had quite a bit of overlap with the "Cooking/Recipe" people, and the "Music" people overlapped significantly with the "Travel" and "Restaurants" people. Hrm. What to do? The taxonomy should serve the users, not the data.

You're always going to get screwed in the end, though. The human equation is unfathomable within a single website. Which is to say that there's a fine line between squeezing your content into categories and squeezing your users into categories. Don't try to do the latter as it will not only fail but alienate your audience. That's part of the reason I suggest keeping the taxonomy to two broad-ish levels (plus keywording). Less work for the taggers and the taxonomist, plus flexibility for the users/subscribers.

Besides, most people never click on taxonomies anyway. They just resort to search 95% of the time. That claim is backed up by a high volume of experience on my part, too. Believe me. Keep the taxonomy minimal, and support multiple classification for any object. Also support keyword synonyms. Tomayto/Tomahto.

Mainly, though, focus on keywording. It's the only bridge between the broad categories and niche interests. Most user needs are best served by a search query, not a taxonomy click. Since all taxonomy assignments can be search indexed, though, you can simply roll your taxonomy into your keywording/search infrastructure, and focus on making that as good as it can be.
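
A minimal sketch of rolling category names into the keyword index and supporting synonyms. The `SYNONYMS` table, the question records, and the function names are hypothetical; a real search would also need tokenizing, stemming, and ranking, none of which is attempted here.

```python
from collections import defaultdict

# Hypothetical synonym table: every variant maps to one canonical keyword.
SYNONYMS = {"tomahto": "tomato", "tomayto": "tomato", "tv": "television"}


def normalize(term: str) -> str:
    """Lowercase a term and collapse known synonyms to their canonical form."""
    term = term.strip().lower()
    return SYNONYMS.get(term, term)


def build_index(questions):
    """Map each normalized term to the ids of questions tagged with it.

    Category names are indexed alongside keywords, so one search covers both.
    """
    index = defaultdict(set)
    for q in questions:
        for term in q["categories"] + q["keywords"]:
            index[normalize(term)].add(q["id"])
    return index


def search(index, query: str):
    """Return sorted ids of questions whose category or keyword matches."""
    return sorted(index.get(normalize(query), set()))


questions = [
    {"id": 1, "categories": ["Food"], "keywords": ["tomayto", "canning"]},
    {"id": 2, "categories": ["Gardening"], "keywords": ["Tomahto", "soil"]},
    {"id": 3, "categories": ["Electronics"], "keywords": ["TV"]},
]
index = build_index(questions)
```

With this layout, searching for "tomato" finds both spellings, and searching for "food" finds the question through its category rather than a keyword.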

Simple! ;)
posted by scarabic at 7:04 PM on July 16, 2004

This is all gold. Thanks again.
posted by y6y6y6 at 7:14 PM on July 16, 2004

I don't see a hot monkey sex category.

This is, I feel, the tragic flaw in the whole taxonomy.
posted by five fresh fish at 7:28 PM on July 16, 2004

Family & Culture
- Sex
- Pregnancy
- Sexual Practices & Fetishes
: freetext: Hot Monkey Sex
: freetext: Hot Fish Sex

## paging fishfucker and FFF ##

*bang!* didja see 'em bonk heads? I love it!
posted by scarabic at 7:38 PM on July 16, 2004

scarabic: That's a nice tree, but it looks more appropriate for a library than Ask MeFi. Computer questions deserve to be a top-level category, both by frequency and semantics (the difference between "why do objects fall down instead of up" and "why does my computer keep going beep" is huge). And the hardware/software divide isn't really useful I think: How about computer problems ("I get a little clown face instead of a bunny rabbit every time I click the teacup icon") vs computer advice ("What kind of teacupomatic should I buy that supports usb 2.0?").
posted by fvw at 8:04 PM on July 16, 2004

Just FYI - Most of the questions I've done so far have been computer questions. If half of everything is computers, it surely needs a top level spot.
posted by y6y6y6 at 8:12 PM on July 16, 2004

I didn't see a "My Fucking Cat" category.
posted by Stan Chin at 8:14 PM on July 16, 2004

That would go under reproductive health (animals).
posted by fvw at 8:40 PM on July 16, 2004

"Technology" is a top-level spot. The problem with elevating "Computers" is this: what do you do with cameras, electronics, stereos, tivos, PHP, CSS, and all the other technology-related questions that arise? "Computers" doesn't hold it all. I'm willing to concede that perhaps "Science" and "Technology" need not be paired together, but there's no good reason why Computers has to be at the top-top level. We all know there are plenty of computer-related questions, but still, the top-level has to have some balance and breadth. Computers might be 95% of the "technology" questions, but that doesn't mean it *has* to be promoted to the top level. Organization!

How about computer problems vs. computer advice

Well, I hear what you're saying, but that's two top-level categories right there. Your top-level is going to be huge before you know it if you keep going at that rate.

The software/hardware divide is less about library categorization and more about people's interests. Some people are gearheads, others are power software users. Some are into tinkering and optimization, others are into games and apps. Some people are both hardware and software mavens, but there is a difference between the two, and this is the kind of widget people like to sort by. Again, I was thinking more of "channels" people might want to subscribe to, and less about strict library categorization.

Your "problems" vs. "advice" is less useful in that regard. No one considers himself a "computer problems person," completely apart from a "computer advice person." You're working on strictly categorizing the questions, there, and not working on delivering content in recognizable themes that people care about. Think of the people who describe themselves as "overclockers" or "mod geeks," or "admins" or "hackers." Do you see a software/hardware thing going on there? I do.

As far as promoting Computers to the top, I don't think it's necessary. Anyone who's willing to subscribe to computer hardware *and* software is probably willing to toss in electronics too.

There are other possible taxonomies, based more on question type:

Fix It Questions
Family/Relationship Advice
Travel and Regional Inquiries
Scientific / Philosophical / Ethical Problems
Trying to Find or Remember Something
Health & Legal Pointers
Learning Resources

These are less about the content area, and more about the style of question. But again, I still don't see "Computers" as one of the top categories, even though it's very common.
posted by scarabic at 9:39 PM on July 16, 2004

Hasn't "self policing" pretty much run it's course by now? I mean, it worked great when we had lots of new people coming in everyday. But today's MetaFilter is made up of people who have *all* been self policed for well over a year. None of us are going to change our ways. Recent attempts at self policing seem to be getting embarrassingly pointless. Can we stop calling it "self policing" and start referring to it by the more accurate term - "masturbatory rhetoric"?
posted by angry modem at 6:26 AM on July 17, 2004

Looks cool but a lot of work to maintain. You spelled Health wrong, unless it's intentional to call it Heath.

Also there is no 'Fucking my Cat' category.
posted by sebas at 7:29 AM on July 17, 2004

fancy-pants datamining. Like all questions under computers from March '04 where scarabic gave an answer.

"fancy-pants datamining", eh? sounds remarkably like them creaky old SQL queries we used to have to use back in the abacus days...
posted by quonsar at 7:35 AM on July 17, 2004

Can we stop calling it "self policing" and start referring to it by the more accurate term - "masturbatory rhetoric"?

i think you are mistaken, angrymodem: it's "self-polishing". you know, the kind of thing that turns a straight SQL statement into fancy-pants datamining? :-)
posted by quonsar at 7:38 AM on July 17, 2004

"Hasn't "self policing" pretty much run it's course by now?"

What? Are you saying that I'm a great poofy hypocrite? Right you are sir. Refuse to loathe me at your peril.

New Taxonomological structure in place. Mostly scarabic with a bit of other suggestions. And a heavy dose of, "it's mine, so I'm doing it this way." Importing questions rapidly now.
posted by y6y6y6 at 8:51 AM on July 17, 2004

Okay. New idea.

I'll make it so that you can view the tree several ways. The tree will be created so that it will go down several levels. Like 4-5 maybe. But you can also display the tree so that anything under level two will get promoted and show up under the top level categories.

I'm also going to make it so that you can see all the entries under any part of the tree. So if you click on one of the top level categories you'll see everything at that level or lower.

So "drill down" sort of folks can get lots of granularity, and "just the facts" folks can get a big dump.
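
A minimal sketch of both views, under one reading of the plan: the tree is shown to a fixed depth, and questions filed deeper are promoted into their nearest displayed ancestor. The tuple layout, the sample tree, and the function names are hypothetical, and the flat result assumes category names are unique.

```python
# Hypothetical tree: each node is (name, [question ids], [child nodes]).
tree = ("Technology", [1], [
    ("Computers", [2], [
        ("Hardware", [3], [
            ("Overclocking", [4], []),
        ]),
    ]),
    ("Electronics", [5], []),
])


def all_questions(node):
    """Every question filed at this node or anywhere below it (the big dump)."""
    name, ids, children = node
    found = list(ids)
    for child in children:
        found.extend(all_questions(child))
    return found


def flattened_view(node, max_depth=2, depth=1):
    """Show the tree to `max_depth` levels; questions filed deeper are
    promoted into their nearest ancestor at `max_depth`."""
    name, ids, children = node
    if depth == max_depth:
        # Collapse the whole subtree into this node; set() drops duplicates
        # from questions filed in more than one category.
        return {name: sorted(set(all_questions(node)))}
    view = {name: sorted(ids)}
    for child in children:
        view.update(flattened_view(child, max_depth, depth + 1))
    return view


two_level = flattened_view(tree)
everything = all_questions(tree)
```

Raising `max_depth` gives the "drill down" view back without changing how the questions are stored.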

I'm also going to let results be sorted by popularity, ranking, date, and poster.
posted by y6y6y6 at 10:34 AM on July 17, 2004

give this man a pony! : >
posted by amberglow at 10:49 AM on July 17, 2004

I don't understand how a project like this relates to self-policing. This is a wholly external lens for viewing AskMe content different ways. Was angry_modem just taking the piss?

Looks good, y6! Nice revision.
posted by scarabic at 11:34 AM on July 17, 2004

"Was angry_modem just taking the piss?"

It's a MetaTalk post I made about a year ago. I think he was trying to say something about my worth as a human being. Perhaps a case is being made that I'm a serial whiner. Or perhaps perpetual jackasses have no business engineering sister sites. All true I'm sure.
posted by y6y6y6 at 11:54 AM on July 17, 2004

All this talk of taxonomies has me strangely... curious (I'm not a taxonomist myself, mind you, but many of you seem to be)(not that there's anything wrong with that): I understand taxonomies to the extent that I'm asked to use them, and some feel "right" and serviceable, and some feel imposed and inflexible. So I got to wondering, given the culture here, wouldn't the most appropriate taxonomy be one that's "organic" - that is, one that grew with the content? Every time somebody created an AskMe post, they'd be asked to categorize it. They'd be given the option of creating a category, or, re-using one already created by someone else. They'd have the capability of tagging an item with n categories, as many as were applicable. The community organizes its information as it goes. Or is that just antithetical to the concept of taxonomy in the first place? Is the work you're doing here - analyzing an extant body of content and organizing it - sufficient to provide for a future in which the nature of the content has a fairly serious potential to be completely different from whatever's in there now?

(Yes, I suppose I could go look all this up, but I'm interested to hear what MeFites have to say about applying this to our "particular" culture...)
posted by JollyWanker at 12:40 PM on July 17, 2004

(Well, duh... I forgot the whole "Nice job, y6!" part...)
posted by JollyWanker at 12:41 PM on July 17, 2004

"Is the work you're doing here sufficient to provide for a future in which the nature of the content has a fairly serious potential to be completely different from whatever's in there now?"

Nope. It will require plenty of maintenance. I'm thinking it will probably be a wiki type thing where people can add things, but others can come in and "fix" it.

Other applications I've written like this have new items go in as "pending" which means they'll be active, but it's expected an admin will check in and either approve them or make changes. And of course there's the issue of this being a different website that has no access to the AskMe database. So people wouldn't be able to choose a category when they posted a question. But I really do assume Matt has all this in the pipeline, so who knows what he'll come up with.

The structure I'm using for the tree is something I've been using for years in various forms. It really is fairly easy to make tools to reorganize things later, as long as you don't need to go from very general to very granular. So I'm not engineering things as much as some might think prudent.

And yes, questions can be filed under multiple categories. So far (I have about 215 in so far) I'm finding most questions fit well under more than one thing.

And I'm not doing the answer summaries or keywords right now, just sorting into folders. So the search feature, which as scarabic mentions is probably the main way people will access questions, isn't really working much. But categories are interesting. And keywords are largely grunt work. So, that's that for now.
posted by y6y6y6 at 1:01 PM on July 17, 2004

It's not antithetical to let the community dictate the taxonomy, Jolly One, it just doesn't always work well. If you start with zero taxonomy, and ask people to categorize their own posts as they go, you're going to get a zillion different perspectives on what a category is, where a certain question should go, what's under what, how narrow/broad categorizations should be... People are notorious for not wanting to pick a category. They'll always tell you "yeah, I know my question is kinda about homebrewing, but *really* it's a yeast question, which has other applications. Can't you just create a new category for that?"

Creating new categories is always the amateur's reaction to an object that's difficult to categorize. While that's occasionally appropriate, much of the time it isn't, and it leads to taxonomy bloat. A healthy taxonomy responds to the needs of its users, but under the management of a moderator or designer. It must have a flexible infrastructure. But if you really want to harness and focus the input of a lot of users, it helps immeasurably to give them a content structure to start working within and grow that in a managed way, as necessary.
posted by scarabic at 1:36 PM on July 17, 2004


The problem with doing an organic taxonomy, something I've been tempted to suggest myself, is developing relationships between arbitrary bits of information. In this case, it's keywords.

The keywords, or other metadata, are the most useful way of developing an organic taxonomy. You, and others who've suggested likewise, are correct in that the taxonomy will be limited in its usefulness over time. It will need to change, otherwise you'll be cramming things into categories that don't quite fit. The major issue with the organic taxonomy is how do you, in a useful manner, develop relationships between the metadata and the objects, in addition to the metadata in relation to other metadata. Is a thread about CSS related to Internet Explorer or Cross Browser Compatibility? Which is the parent? When do you have a subcategory? What about multiple relationships?

(scarabic is spot on with user moderated taxonomies, especially authors themselves. Misspellings, typos, different spellings, etc will lead to useless bloat and increased inefficiency.)

This isn't anything new or even necessarily difficult, but it is easy to do wrong. There are various algorithms that could work behind the scenes, including Bayesian, genetic and other text filtering algorithms. The most successful organic solution would allow for constant feedback, in some moderated form. "I guessed this should be categorized here. Is it right?"

An organic solution is a long term solution. If y6y6y6 is interested in implementing one at some point, it will take a different schema to hold the metadata and relationships and potentially a second schema for fast access for searching and UI hooks. A shallow and deep pool, if you will. (I don't know how much data there is or how fast the equipment is, but allowing for the hierarchy to be processed asynchronous to user interaction would be a more efficient use of resources from an end user perspective.) If he's game, I'd pitch in.

y6y6y6, thanks for taking the time and putting in the effort. If you need volunteers - a coder, designer or just an extra pair of hands or some free brain cells, just speak up!
posted by sequential at 1:51 PM on July 17, 2004

I'm happy with what I have for the schema driving the categories right now. If this gets used a lot, or it gets extended to other things I certainly want someone smarter than I am to build it out. What I have right now won't scale well beyond the size AskMe is now. But it's important to keep in mind that most of the value add sites for MetaFilter don't get used much. Let's see whether it gets used or not.

I'll be passing out admin rights tonight. I'll email the people who've shown interest here, as well as anyone who emails me.

You'll need an account to be promoted to admin. If you registered at MFDistilled before, or you created an account for the Matt's gift thing, you already have an account.
posted by y6y6y6 at 2:27 PM on July 17, 2004

One "organic" option is to allow people to create categories or subcategories on the fly, so long as a moderator can "prune" them later on. Combining dupes, moving members of one category into another, etc, can resolve many of the problems.
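
A minimal sketch of the pruning step, assuming category assignments are held as a mapping from category name to a set of question ids. The layout, the sample categories, and `merge_categories` are hypothetical, meant only to show that combining dupes is cheap when questions can belong to several categories.

```python
def merge_categories(assignments, source, target):
    """Move every question filed under `source` into `target`, then drop `source`.

    `assignments` maps category name -> set of question ids (hypothetical layout).
    """
    if source == target or source not in assignments:
        return assignments  # nothing to prune
    moved = assignments.pop(source)
    assignments.setdefault(target, set()).update(moved)
    return assignments


assignments = {
    "Homebrewing": {10, 11},
    "Home brewing": {11, 12},  # near-duplicate created on the fly
    "Yeast": {13},
}
merge_categories(assignments, "Home brewing", "Homebrewing")
```

Because each category holds a set, a question that was filed under both spellings ends up in the merged category only once.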
posted by scarabic at 4:16 PM on July 17, 2004

Okay. Admin emails sent. Let me know if you want to help work on this.
posted by y6y6y6 at 8:03 PM on July 17, 2004

Your 'benevolent dictator' overseeing the categorization sounds reasonable enough, but I wonder if the MeFi culture operates that way? For that to work, that editorial hand would have to be exercised with the same restraint with which Matt 'moderates' MetaFilter itself - erring rather more often perhaps on the side of personal expression than conformity.

Then again, wouldn't allowing multiple tags per item resolve contention with an obviously mutually beneficial compromise - if an author feels strongly about a particular tag being attached to their item and the 'tag editor' feels it should be something else, simply attach both tags and move on?
posted by JollyWanker at 2:23 PM on July 18, 2004

Yep. Multiple tags. I'd say about half the ones I've done so far have two or more categories. Here's a good example. It doesn't make sense to create a category for that sort of thing unless there are many questions, and there don't seem to be. But it fits in the intersection of those three very well.

I'm happy to have the categories rather fuzzy.
posted by y6y6y6 at 3:29 PM on July 18, 2004
