Double Double Toil and Trouble June 16, 2011 1:27 AM

Search seems to be failing me when I check to see if my post is a double. This has happened several times despite efforts to ensure it's not a double.

Example: in my latest post, the description read: "Authorities in Awe of Drug Runners' Jungle-Built, Kevlar-Coated Supersubs".

I duly checked the link, no double there. I also made sure my tags covered the bases, including 'Wired' 'Colombia' 'submarine', 'sub', 'cartel', 'drug', and so on.

To be safe, I searched for "submarine" and "cartel", figuring that if anything had been posted before, those keywords would definitely find the first post. Nada.

Yet there had been a post on the subject, with a different link and insufficient tags, and so search did not find it. Its tags included the word 'drugs' while the description contained the plural 'submarines'. I just did a second search for 'submarine' and this post doesn't appear at all.

Since many posts have tags that are often not fleshed out enough (what Jessamyn called mystery meat posts with terrible tags), I'm wondering if there is some way to:

1. expand the search to include keywords from the description area (obviously filtering out common words);

2. adjust the tag search to include plurals, i.e. if I search for the word 'drugs', posts with tags reading only 'drug' will be included, and vice versa (unless this happens already, not certain it does);

3. (if including keywords from the description area) check for single/plurals there as well.

Because then search might well have caught the words 'submarine' and 'drug' (singular) from my tags, or the word 'drug' in my description. I'm not sure if this would put a heavy strain on the server; maybe there is a better way to implement this (I am not a programmer).

I just hate posting doubles, especially when I've tried to do my homework beforehand.
posted by bwg to Bugs at 1:27 AM (26 comments total)
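
Suggestion 1 in the post above (pulling keywords out of the description and dropping common words) could look roughly like this in Python. This is only a sketch: the `STOPWORDS` set and the `extract_keywords` helper are made up for illustration, and nothing here reflects how the site's search is actually implemented.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def extract_keywords(description: str) -> list[str]:
    """Return lowercase keywords from a description, in order, without repeats."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    seen = set()
    keywords = []
    for word in words:
        # Skip common words and anything already collected.
        if word in STOPWORDS or word in seen:
            continue
        seen.add(word)
        keywords.append(word)
    return keywords

title = "Authorities in Awe of Drug Runners' Jungle-Built, Kevlar-Coated Supersubs"
keywords = extract_keywords(title)
```

With the description indexed this way, a search for 'drug' would at least have a chance of hitting the word in the title, even when the tags miss it.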

But I don't know what you want out of this.

I thought I was pretty explicit about what I wanted, hence the 1., 2., 3. ...

Is this your way of saying "hey people...tag it 'submarine' not 'narcosub'!!"

It surely wouldn't hurt if posts were better tagged. Now I have no problem with 'narcosub' (because hey, that does sound cool) but a couple of normal word tags would have helped.

Just sayin'.

posted by bwg at 1:51 AM on June 16, 2011 [1 favorite]

Firstly, I think it's always worth using Google to search to see if something has been posted before. Use "" to reduce your search to only MetaFilter.

Secondly, doubles happen. It can feel personal if it happens to you, but don't let it upset you or put you off. It's just one of those things. You did your due diligence, and nobody is going to think less of you because you wanted to share something cool with the internet.

Well, the hipsters may think less of you, but anyone who says anything equivalent to "Yeah, this was cool when I read about it two years ago." is a dick.

You did good. It didn't work out, but failure is OK too.

posted by seanyboy at 2:00 AM on June 16, 2011 [1 favorite]

Yeah, maybe I have to get into the habit of searching Google first.

I just felt the internal search is a bit limited and perhaps a tweak might help cut down on doubles is all.

And no worries on it feeling personal, not after being around here for almost 11 years.
posted by bwg at 2:35 AM on June 16, 2011

with respect to keywords and plurals, without knowing the backend implementation here, I'm a big fan of either restricting keywords to a controlled vocabulary (not viable here) or alternatively storing a stemmed version of entered keywords and matching on that rather than the literal text.
posted by russm at 3:16 AM on June 16, 2011

russm: "... or alternatively storing a stemmed version of entered keywords and matching on that rather than the literal text."

That might work, but again I'm not a coding wizard.

And the question is (you know it's coming), can it handle narcosub?
posted by bwg at 3:52 AM on June 16, 2011

I agree that there's not much to be done; doubles happen, but I also agree that poorly-tagged posts (which seem to be growing in number, or maybe I'm just growing grumpier, heh heh) are irritating, and they hurt the site in a small way, not just when searching for doubles, but also when people search through tags to see how topics have been covered.

Personally speaking, I do the latter all the time, especially in, and I always wonder how many posts I'm missing by doing so, because people haven't bothered tagging their posts with anything remotely sensible.

I don't actually think the original narcosubs post was badly tagged at all compared to, say, the five worst offenders I see on any given day. Gman was good enough to tag "drugs", which I think was most important, but I would love to see a five or six tag mandatory minimum. But maybe people would then just put five or six really stupid tags before posting - which certainly already happens at times.

I dunno, I think this is a small, human problem, and I can't see a big, automated solution to it, really.
posted by smoke at 4:36 AM on June 16, 2011 [1 favorite]

You'd think a multi-million dollar enterprise like Metafilter could afford to not have a crappy search function.
posted by crunchland at 4:56 AM on June 16, 2011

I've noticed that you can add tags to posts by people who are on your mutual contacts list. I've never done this, but would it be frowned upon if people actually did?
posted by Dumsnill at 4:59 AM on June 16, 2011

Why are you still discussing this? From the moment you hit "post" you should have been working on your next FPP. You're never going to win MetaFilter by dwelling on the past. Go go more favorites go!
posted by Eideteker at 5:11 AM on June 16, 2011

"I've noticed that you can add tags to posts by people who are on your mutual contacts list. I've never done this, but would it be frowned upon if people actually did?"

Yes. Matt implemented that feature (well, pb) as a trick. If you actually use it, you are a very bad person.
posted by Eideteker at 5:13 AM on June 16, 2011

It seems to me that MeFi search is actually somewhat broken.

Search for "submarine drug" in MeFi.

The only post the search pulls up is one by Burhanistan in June 2010.

It does not bring up this one (the original that bwg doubled), which is clearly tagged both "submarine" and "drug".

You can find that post by looking for comments containing "submarine" and "drug", but not via a search of posts.
posted by Salvor Hardin at 5:25 AM on June 16, 2011

Eideteker: "If you actually use it, you are a very bad person."

[Contacts Eideteker, waits for mutual contact, retroactively adds hundreds of tags ...]
posted by bwg at 5:30 AM on June 16, 2011

That might work, but again I'm not a coding wizard.

oh, that's a job for pb not for you... :)

And the question is (you know it's coming), can it handle narcosub?

well porter2 stems both "narcosub" and "narcosubs" to the singular "narcosub" so yeah sorta... it's a way of matching across singular/plural and other word variants, but it won't work out that "narcosubs" means "drug & submarines"... for that you'll need skynet...
posted by russm at 5:47 AM on June 16, 2011
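
A rough Python sketch of the stemmed-keyword idea russm describes. Big caveats: `naive_stem` is a deliberately crude stand-in that only strips simple plurals, it is not Porter2, and the post ids and tag lists are invented for illustration. It says nothing about how the real backend works.

```python
from collections import defaultdict

def naive_stem(word: str) -> str:
    """Crude plural stripping; a stand-in for a real stemmer such as Porter2."""
    w = word.lower().strip()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

def build_index(posts: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each stemmed tag to the ids of the posts carrying it."""
    index = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[naive_stem(tag)].add(post_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of posts matching every query word after stemming."""
    stems = [naive_stem(term) for term in query.split()]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        result &= index.get(stem, set())
    return result

# Hypothetical tag data, loosely modelled on the two posts in this thread.
posts = {
    "original": ["narcosub", "drugs", "submarines"],
    "double": ["submarine", "drug", "cartel"],
}
index = build_index(posts)
hits = search(index, "submarine drug")
```

Because both the stored tags and the query go through the same stemmer, 'submarine drug' matches both posts regardless of who used the plural. As russm says, it still has no idea that 'narcosub' has anything to do with drugs or submarines.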

russm: "... but it won't work out that "narcosubs" means "drug & submarines"..."

Either that or Subway just came out with an awesome new combo.
posted by bwg at 5:50 AM on June 16, 2011 [1 favorite]

Dumsnill, I do that sometimes. Not often because it feels presumptuous.

tbh, there have been times when I wish I could tag some posts from ppl I haven't friended. Like smoke, I use the tags to search a lot. So most of the time, I try to tag my own posts thoroughly.
posted by zarq at 6:10 AM on June 16, 2011

I just wanted to throw my support behind narcosub.
posted by shakespeherian at 6:29 AM on June 16, 2011

And the question is (you know it's coming), can it handle narcosub?

NOTHING can handle narcosub.

As much as it might suck to have internal search be worse than Google for things like this, it's a reality we're probably going to be stuck with, as even a complete overhaul, while beneficial, might still leave internal behind Google. Google, like Barry, is VERY good.
posted by SpiffyRob at 6:34 AM on June 16, 2011

I've been lurking without logging in for the past week, because I'm on vacation and I shouldn't be on the internet in the first place. But searching for "narcosub" (or anything else) without being logged in gives me this Google error: We're sorry...but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.

Now that I've logged in it works fine (i.e. "narcosub" returns a bunch of comments from this thread, and one lonely tag). Has the internal search always been restricted to those of us in the $5 Club?
posted by Chichibio at 7:05 AM on June 16, 2011

I've noticed that you can add tags to posts by people who are on your mutual contacts list. I've never done this, but would it be frowned upon if people actually did?

It's totally fine to do this. Please don't do it to make jokes with other people's posts.

And yeah the search is restricted to members and yeah we should probably find a way for it to catch plurals.

That said, yeah, doubles do sometimes happen and while we try to have some tools available to keep them from happening a lot, we usually do recommend a Google search limited to so that we don't have to reinvent Google.
posted by jessamyn (staff) at 7:11 AM on June 16, 2011

I'm curious, why has Mefi opted for an internal search that is not based on Google's site search? It seems that may solve a lot of the search-based issues (and perhaps even render tags irrelevant for searching, though I think tagging is a pretty sweet categorization feature).
posted by lesli212 at 7:12 AM on June 16, 2011

That's cool Jessamyn, if plurals get enabled that'll help, and from now on I'll try to use Google first; with luck it will kill most doubles for those who are at least trying to avoid them.

posted by bwg at 7:17 AM on June 16, 2011

Google's site search is kinda terrible. Plus it doesn't search member activity well.
posted by zarq at 7:30 AM on June 16, 2011

I think a lot of the frustration with the internal search stems (sorry) from how it is different from Google. We know Google is out there and the best at what it does. We even have a link to the same search at Google at the bottom of every internal search result. So the internal search focuses on what we can do that Google can't.

Google indexes entire pages. So if the word "submarine" appears in the comments, but not the post, that entire page is going to come up in Google search results. By contrast, the internal search is looking at discrete pieces of information: individual posts, individual comments, tags. And we even separate these bits into separate searches. So the internal search is good for precision. When you want to search only post text, or only comments.

If the internal search was dedicated to searching for doubles, it'd make sense to cast a wider net for each search. But people are searching in lots of different scenarios, and sometimes that precision is useful. When it's not, we have Google. We're hoping the two complement each other.
posted by pb (staff) at 7:33 AM on June 16, 2011 [5 favorites]
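
To make the page-level versus piece-by-piece distinction pb describes concrete, here is a small Python sketch. The `Thread` class, the scope names, and the sample text are all assumptions made up for this example; the real search is surely more involved.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    post_text: str
    tags: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)

def page_level_match(thread: Thread, word: str) -> bool:
    """Whole-page style: a hit anywhere on the page counts."""
    blob = " ".join([thread.post_text, *thread.tags, *thread.comments])
    return word.lower() in blob.lower()

def scoped_match(thread: Thread, word: str, scope: str) -> bool:
    """Scoped style: look only at the requested kind of content."""
    word = word.lower()
    if scope == "posts":
        return word in thread.post_text.lower()
    if scope == "comments":
        return any(word in comment.lower() for comment in thread.comments)
    if scope == "tags":
        return any(word == tag.lower() for tag in thread.tags)
    raise ValueError(f"unknown scope: {scope}")

# Made-up thread: the word appears in a comment but not in the post itself.
thread = Thread(
    post_text="Jungle-built, Kevlar-coated supersubs",
    tags=["narcosub", "drugs"],
    comments=["That submarine is enormous."],
)
```

Here a whole-page search for "submarine" finds the thread, while a search scoped to post text does not. That is the precision being described, and also why a scoped search can miss a double.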

pb, I appreciate that.

The purpose of my post wasn't to gripe, only to try to point out an issue I kept noticing in the hope that perhaps there was a tweak that might improve the internal search overall.

Just a thought, but if it helps reduce doubles (not just my own), then it was probably worth mentioning.

posted by bwg at 7:44 AM on June 16, 2011

I'm curious, why has Mefi opted for an internal search that is not based on Google's site search?

Because we can tune it to some specific strengths on things that Google is bad at, is the main reason. There are things Google does better, there are things that site search does better (most of them having to do with having search know directly about our site structure and under-the-hood stuff the way Google can't), and using a combination of the two is the best thing you can do if you really want to be thorough or power-usery.

And like Jess says, ultimately we're going to have doubles and we consider them to be really, really not a big deal. No one gets in trouble for doubles. It can be a bit of a bummer to work on a post and end up having it deleted, but the best general advice I have is to scale the amount of pre-posting search you do to the amount of effort involved in building the post. If it's gonna be a whopping essay, search hard and well first to save yourself the heartbreak.
posted by cortex (staff) at 8:10 AM on June 16, 2011

The best thing about site search is that it searches within HTML tags, and substrings match. So for example if you want to see every comment where someone has Rickrolled, search for dQw4w9WgXcQ as that is the ID of the video on YouTube. This is great because YouTube URLs can have all kinds of extra crap in the URL like &feature=related or these days they tend to be made using the hideous evil abomination, which means searching for a full URL is a fool's errand. Even though Google claims to have a link: operator, it never seems to work for shit and even if it did, it requires the full URL, not a substring.
posted by Rhomboid at 11:03 AM on June 16, 2011
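
The substring trick Rhomboid describes amounts to something like the following Python sketch. The comment ids and HTML snippets are invented for illustration, and `find_comments_containing` is a hypothetical helper, not anything from the site's code.

```python
def find_comments_containing(comments: dict[str, str], needle: str) -> list[str]:
    """Return ids of comments whose raw HTML contains the needle as a substring."""
    # Case-sensitive on purpose: video IDs distinguish upper and lower case.
    return [cid for cid, html in comments.items() if needle in html]

# Made-up comments: the same video linked with different URL clutter.
comments = {
    "c1": '<a href="http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=related">totally legit</a>',
    "c2": '<a href="http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ">more info</a>',
    "c3": '<a href="http://example.com/">nothing to see here</a>',
}

by_video_id = find_comments_containing(comments, "dQw4w9WgXcQ")
by_full_url = find_comments_containing(
    comments, "http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=related"
)
```

Searching on the video ID catches both links because the match happens inside the href, while searching on one exact URL only catches the comment that used that exact form.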
