Vetting for doubles. January 12, 2011 10:35 PM

This is the second time I've made an FPP that's been a double and it hasn't turned up in my searches before posting, or warned me in preview that the site has been linked to before.

The first time was because the URL I linked to was after the / and the second time was because I included the www. in the link.

How can I avoid making this mistake in the future or is there a way that the search function can be improved to help others avoid the same mistake?
posted by empatterson to Bugs at 10:35 PM (55 comments total)

Oh, man, if you hadn't said anything, no one would have known. Now if you make a third like that, you'll get banned and have to start from a new name! (It's true! I used to be ParisParamus!)
posted by klangklangston at 10:47 PM on January 12, 2011 [9 favorites]


Pull the other leg, it's got bells on it ;)
posted by empatterson at 10:49 PM on January 12, 2011 [6 favorites]


The first time was because the URL I linked to was after the / and the second time was because I included the www. in the link.

I think you can avoid that by searching for just the actual site name in the URL (for instance, in your most recent post, you could search for "thisman" only) rather than trying to add the www or the http
posted by amyms at 10:51 PM on January 12, 2011


shit happens
posted by philip-random at 11:21 PM on January 12, 2011 [2 favorites]


Pull the other leg, it's got bells on it ;)

That's how we keep the neighborhood birds safe.
posted by wayland at 11:21 PM on January 12, 2011


I'll admit that I did this recent post in the 'leaped before I looked' spirit, but I guess what I'm really trying to ask is why there were no matches on preview.

When results are returned against previous posts to see if the URL has been referenced before, does the script look for an exact string match or will it do a partial string match to prevent cases where the user has either included/excluded the www?

Also, if 'domain.com' has already been referenced but the user has linked to 'domain.com/page.php' does the script take that into account as well?
posted by empatterson at 11:25 PM on January 12, 2011
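
One way to picture the matching question raised above is a loose comparison that ignores the scheme, a leading www., and everything after the domain. The sketch below is only an illustration: site_key and same_site are hypothetical names, and nothing here reflects how MetaFilter's actual preview check is written.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Reduce a URL to a bare host name for loose duplicate matching."""
    url = url.strip()
    # urlsplit only fills in the host when a scheme or a leading // is present.
    if "://" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    # Treat www.domain.com and domain.com as the same site.
    if host.startswith("www."):
        host = host[4:]
    return host


def same_site(a: str, b: str) -> bool:
    """True when both URLs reduce to the same non-empty host name."""
    key = site_key(a)
    return key != "" and key == site_key(b)


print(same_site("http://www.thisman.org/", "thisman.org"))
print(same_site("domain.com", "http://domain.com/page.php"))
```

Matching on the bare domain is deliberately crude: for a heavily linked domain it would match every earlier post from that domain, so it trades missed doubles for false alarms.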


Eh, you get used to having doubles deleted. The as-you-post search is imperfect, and based on the URL and tags for the most part. And at this point in the history of the internet and metafilter, double posts are just going to happen.

The best solution is a client-side solution. Accept your doubles gracefully so that it doesn't get in the way of posting cool, and thank the mods and elephant-memory users who help keep metafilter fresh and interesting.
posted by loquacious at 11:30 PM on January 12, 2011 [4 favorites]


Oh I'm not embarrassed about posting doubles. I understand that with so many members it happens and it's not a big deal. I'm delighted by and grateful to the elephant memory types that keep MeFi self regulating.

However, pb says in reference to a similar question:

I think trying to parse every potential variation of every URL would lead to more problems than it would fix things.

So I get from that thread that anything after the / is a problem. But what about when it comes to including/excluding the www?
posted by empatterson at 11:39 PM on January 12, 2011


Forget target URLs. Use Google to search for the subject or for a specific string that is likely to have been quoted, maybe an unusual name.

"ever dream this man" site:metafilter.com

The Advanced Search page gives you other options. Use the "Search within a site or domain" field to specify metafilter.com.
posted by pracowity at 11:54 PM on January 12, 2011 [1 favorite]


If you're asking how to avoid making duplicate posts, then you already have your answer: use the site search feature on the base domain name without www, or if it's a popular domain like youtube then use the site search on just the video ID.

If you're asking if the duplicate checker can be modified to try to be smarter, then I think the answer is 'no' because there are always variations that will trip it up and it's easier just to delete the occasional double post and move on.
posted by Rhomboid at 11:56 PM on January 12, 2011 [1 favorite]
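
The advice above can be sketched as a small helper that picks what to type into the site search: the video ID for a YouTube watch link, otherwise the bare domain without www. This is a rough sketch; search_term is a hypothetical name, the watch?v= URL shape is an assumption, and the ID in the example is a made-up placeholder.

```python
from urllib.parse import parse_qs, urlsplit


def search_term(url: str) -> str:
    """Suggest a site-search term for a link: video ID or bare domain."""
    url = url.strip()
    # urlsplit only fills in the host when a scheme or a leading // is present.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumed URL shape: youtube.com/watch?v=<video id>
    if host == "youtube.com":
        ids = parse_qs(parts.query).get("v")
        if ids:
            return ids[0]
    return host


# PLACEHOLDER123 is an invented ID, not a real video.
print(search_term("http://www.youtube.com/watch?v=PLACEHOLDER123&feature=related"))
print(search_term("http://www.thisman.org/"))
```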


Loquacious, Rhomboid - great answers. I guess I was getting a bit pedantic - sorry about that.

So to summarize, I guess the simplest solution is that the user just needs to do some due diligence and be smart about how they search before posting. Maybe adding a bit about how to avoid doubles in the FAQ or posting guidelines could be the best way to address this issue?
posted by empatterson at 12:16 AM on January 13, 2011


Time for pie?
posted by fixedgear at 3:43 AM on January 13, 2011 [1 favorite]


I'm pretty sure the preview does disregard whether you include the www. I just tried a post preview using the domain you were posting, without the www, and it turned up both the old post (with the www.) and yours that just got deleted (without).
posted by Gator at 4:16 AM on January 13, 2011


Are you searching for the URL of the web site? Because maybe that's the problem -- not everyone mentions the URL within the text of their post. Sometimes they don't even mention the name.
posted by EmpressCallipygos at 4:35 AM on January 13, 2011


But the site search searches the text within tags, so that's no problem. (It will say "keyword in html" next to the result.)
posted by Rhomboid at 4:46 AM on January 13, 2011


Assuming you're not linking to someplace like cnn.com or youtube.com, domains that have been linked to a bazillion times, what you can try is this: before you do anything else, just enter the domain name of the site you're linking to into the LINK field, and press preview. You'll get a lot of alerts and errors about fields needing to be filled, but you'll also pull up all the posts in the past that have linked to the same domain name. It's not perfect, but it would have saved you the trouble in the cases you cited, where your url was only slightly different from the ones previously posted.
posted by crunchland at 5:09 AM on January 13, 2011


Last night I made a post, almost published it, realized I had missed a closing parenthesis, closed it, and then had it tell me it was a double.

I'm pretty sure the FPP-matcher has been drinking on the job since the day it was coded.
posted by griphus at 5:14 AM on January 13, 2011 [1 favorite]


Your question is a double. Here's how it was answered last time.
posted by Obscure Reference at 5:15 AM on January 13, 2011 [3 favorites]


I always do what crunchland recommends, i.e., use the Link field to check for the domain before constructing my post. It isn't perfect, but it works most of the time.
posted by OmieWise at 5:34 AM on January 13, 2011


Some doubles happen because they're photos, art, etc from a content-aggregation site, when the original source of the material has already appeared on mefi. There's no way for the link checker to spot this species of double - use the source, luke.
posted by zamboni at 6:29 AM on January 13, 2011


Nuke your post from orbit. It's the only way to be sure.
posted by Joe Beese at 6:30 AM on January 13, 2011 [1 favorite]


I avoid doubles by barely ever posting anything.
posted by slimepuppy at 6:39 AM on January 13, 2011


Nuke your post from orbit. It's the only way to be sure.

Not really, they'll just clone it 200 years in the future.
posted by Brandon Blatcher at 6:39 AM on January 13, 2011


Previously on MeTa
posted by The 10th Regiment of Foot at 6:50 AM on January 13, 2011


pracowity: "Forget target URLs. Use Google to search for the subject or for a specific string that is likely to have been quoted, maybe an unusual name. "

I use in-site search, for words related to the topic. Sometimes, this can be time consuming.

So, I just posted an FPP about cab drivers. I did in-site searches for the following terms before clicking post:

Jacobson (the author of the articles)
taxi
cab

And then I skim through the FPPs to see if what I've done has been posted previously.

If a search turns up a lot of pages, I use this awesome pony from pb to expand the results page to 100 at a time.

The extra work has cut down on the number of doubles I've had deleted.
posted by zarq at 7:09 AM on January 13, 2011


I'd like to point out how well the double-detecting function actually does work, and that it has saved me some quality appearances here.
[Especially that time when I carefully collected the, um, best examples from the ongoing Unhappy Hipster project (almost all earlier ones; their wittiness elixir got worked out). On preview I was pointed to various posts of years back, with a bright-red MEH stamp all over the responses. Could have been my special day...]
posted by Namlit at 7:16 AM on January 13, 2011


That post makes no sense either time it was posted.
posted by cjorgensen at 8:04 AM on January 13, 2011


Maybe adding a bit about how to avoid doubles in the FAQ or posting guidelines could be the best way to address this issue?

We should add a little text about doubles. That said, the best way to address it is for people to deal with the fact that it sometimes happens. Other things that are helpful:

- doing a site check for a few relevant tags from your post
- doing a search using the link field of the "new post" page for just the URL that is your main link, without the www and without any trailing content [assuming it's not CNN or something]

Our main problem, I think, is that people rarely use descriptive enough tags for their posts. So you get someone posting a direct double but they couldn't find it because the link is slightly different and the original poster didn't use a single tag that you'd expect to find in a post on that topic. People should try to add relevant tags to assist other people in finding their posts, whether it's to remove duplicates or just to find them in the first place.
posted by jessamyn (staff) at 8:13 AM on January 13, 2011


Time for pie?

Hang on, I'm doing due diligence.





Okay, time for pie.
posted by Devils Rancher at 8:23 AM on January 13, 2011 [1 favorite]


By the way, if you had just googled thisman.org, you would have turned up this article on the first page of results:

Ever Dream This Man? Urban Myth, Viral Hoax or Terrifying Boogeyman?:
A little more tunneling, however, quickly undermined the credibility of the source. According to Logicpunk at Metafilter, "The registrant of thisman.org, Andrea Natella, is the director of guerrigliamarketing.it, an advertising agency that uses non-conventional communication techniques, like the creation of fictitious events or campaigns reaching the limits of legality, through which they 'fuck the market in order to enter it'."

posted by zarq at 8:31 AM on January 13, 2011



Time for pie?



It's always time for pie.
posted by louche mustachio at 9:18 AM on January 13, 2011 [1 favorite]


Our main problem, I think, is that people rarely use descriptive enough tags for their posts. So you get someone posting a direct double but they couldn't find it because the link is slightly different and the original poster didn't use a single tag that you'd expect to find in a post on that topic. People should try to add relevant tags to assist other people in finding their posts, whether it's to remove duplicates or just to find them in the first place.

I admit it, I hate tags. They seem like little SEO games that we get required to use for that purpose; I've never found them reliable enough to be useful in any way. I hate them in Flickr, I hate them here. And that, I suspect, leads to a vicious circle where many people are like me and just type in a few things that pop into our heads to get through the permission screen and go on with the post. Which then leads to their never being reliable enough to be useful in any way, which leads people to hate them, and so on.
posted by norm at 9:32 AM on January 13, 2011


Yep, we definitely get some jokers who just type "tags are required" in the tag box which, really, why not just come to my house and give me the finger directly?

They're a finding aid, they're metadata, they're a way to add additional descriptors, and they're becoming more and more necessary as a way for people to track down things they're interested in.

So, I get why people might be tag-paranoid, but here's a real-life example of why using useful descriptive tags could help solve a site problem. Please try to use good tags. Thank you.
posted by jessamyn (staff) at 9:35 AM on January 13, 2011 [5 favorites]


Tags are part of the index for the great big book called metafilter. They can be super useful if you're looking for something specific.
posted by inigo2 at 9:41 AM on January 13, 2011


Sure, but an index will tell you where words are used. And you can search those. I guess my big problem is that they're self-selected. So they're only as useful as the people making the posts. I don't try to game the system, but I find the whole process pretty counter-intuitive, and there is a lot of noise in the tag world. Besides, what was wrong with the tags there? "thisman" was the first tag used. If that wasn't going to find it, what was?
posted by norm at 9:47 AM on January 13, 2011


I avoid doubles by only posting things that no one else thinks are cool.
posted by Jacqueline at 9:50 AM on January 13, 2011 [2 favorites]


Sure, but an index will tell you where words are used. And you can search those.

The difference being that tags reflect what is in the post itself, while searching will also search all the comments. Sometimes you just want the former. It is, though, dependent on the posters using it appropriately.

I don't think anyone's complaining about the tags on this specific post.
posted by inigo2 at 9:55 AM on January 13, 2011


norm: "Sure, but an index will tell you where words are used."

Let's say I do a search for posts on Mefi that include the term "Einstein."

Now look at what happens when I look for posts that are tagged "Einstein."

The results are different. 90 posts included the word Einstein. 40 were tagged that way. It gets interesting when we compare the lists. There are a number of posts that are tagged with "einstein" that don't mention him directly and won't turn up on an index search of the name.

Here's a reverse example: the word homophobia.
Search: 31 matches
Tags: 77 instances.

I think the tags are helpful and am glad we can add them.
posted by zarq at 10:08 AM on January 13, 2011


I don't know if the problem has been fixed yet or not (I suspect not), but another thing to keep in mind with in-site searches is that stemming is not supported.

So, if your post is about canines, don't just search for "dog", also search for "dogs", etc.
posted by hippybear at 10:19 AM on January 13, 2011
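
Since the site search is said not to stem, one workaround is to generate the obvious singular and plural forms yourself and search for each. A naive sketch, assuming English nouns with regular plurals; search_variants is a hypothetical helper and it will get irregular words wrong.

```python
from typing import List


def search_variants(term: str) -> List[str]:
    """Return the term plus one guessed singular or plural form."""
    word = term.strip().lower()
    variants = [word]
    if word.endswith("ies") and len(word) > 4:
        variants.append(word[:-3] + "y")       # ponies -> pony
    elif word.endswith("s") and not word.endswith("ss"):
        variants.append(word[:-1])             # dogs -> dog
    elif word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        variants.append(word[:-1] + "ies")     # pony -> ponies
    elif word.endswith(("ss", "x", "ch", "sh")):
        variants.append(word + "es")           # fox -> foxes
    else:
        variants.append(word + "s")            # dog -> dogs
    return variants


for term in ("dog", "taxi", "cab"):
    print(search_variants(term))
```

Each variant still has to be searched separately; this only saves remembering to try them.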


It's not the end of the world if doubles get deleted. I only really bother searching extra hard if it's a 'current event' kind of thing or something that looks like it's going viral.
posted by empath at 10:20 AM on January 13, 2011


I get around doubles by simply making really lousy posts.
posted by shakespeherian at 10:25 AM on January 13, 2011


zarq: super interesting, thanks. That surprises me, as I would have thought that a search would also search the tags, and therefore the search would be larger than a tag list.
posted by norm at 10:27 AM on January 13, 2011


That original thread was the first time I've ever argued with Astro Zombie and I spent the whole thing sweating ice from my palms and choosing my words very carefully and muttering "I am arguing with Astro Zombie what the fuck".

It was scary.
posted by Shepherd at 10:35 AM on January 13, 2011


norm: "zarq: super interesting, thanks. That surprises me, as I would have thought that a search would also search the tags, and therefore the search would be larger than a tag list."

You're welcome! Yeah, tags, comments and posts are apparently indexed completely separately. I learned to search a bit more thoroughly after some doubles were deleted.

I don't mind having posts deleted all that much but I do hate wasting time. The longer, link-filled ones tend to take a while to construct.
posted by zarq at 11:20 AM on January 13, 2011


Metafilter: Why not just come to my house and give me the finger directly?
posted by soelo at 2:02 PM on January 13, 2011 [2 favorites]


I get around doubles by making lousy posts
Like hyperlinking everything straight to localhost

I get around (Round round get around I get around)
All over town (Round round get around I get around)
I know how babby's formed (Round round get around I get around)
I go to woods for porn (Round round get around I get around)
posted by SpiffyRob at 2:03 PM on January 13, 2011 [1 favorite]


Post about weird shit no one likes, to ensure that it will be fresh!
posted by Mister_A at 2:10 PM on January 13, 2011


That's what the Mister_A method would be if I posted often enough to lay claim to a particular methodology.
posted by Mister_A at 2:10 PM on January 13, 2011


Right, and song parodies. Lousy posts and song parodies: the shakespeherian experience.
posted by shakespeherian at 2:24 PM on January 13, 2011


What Jacqueline says. Especially N-scale trains. Worked like a charm.
posted by Namlit at 3:40 PM on January 13, 2011


This happens to me so often that I swear the search function is personally fucking with me. "Oh, yeah, I know that was posted before. I just don't feel like telling you about it. SUCKER."
posted by sonika at 5:07 PM on January 13, 2011 [3 favorites]


Last night I made a post, almost published it, realized I had missed a closing parenthesis, closed it, and then had it tell me it was a double.

Last night a DJ saved my life.
posted by flapjax at midnite at 5:35 PM on January 13, 2011


So, I get why people might be tag-paranoid, but here's a real-life example of why using useful descriptive tags could help solve a site problem. Please try to use good tags. Thank you.

No, it's a real-life example of why a good search function could solve a site problem. Tags are dependent upon humans understanding all possible reasons why something could be searched for.

A google site search is your friend. I've gotten burned myself by relying on MeFi's. I will say that it's made me more cautious about posting. (That may be a good thing, I dunno.)
posted by ChurchHatesTucker at 6:30 PM on January 13, 2011


Forget target URLs. Use Google to search for the subject or for a specific string that is likely to have been quoted, maybe an unusual name.

You've got to do both because of the mystery meat contingent.

inigo2 writes "The difference being that tags reflect what is in the post itself, while searching will also search all the comments. Sometimes you just want the former. It is, though, dependent on the posters using it appropriately."

It would be nice to be able to only search posts. *Checks with a sample search* Holy crap; Fastest. Pony. Delivery. Ever.

Namlit writes "What Jacqueline says. Especially N-scale trains. Worked like a charm."

39 comments is decent traffic for a post.
posted by Mitheral at 6:36 PM on January 13, 2011


Tags are dependent upon humans understanding all possible reasons why something could be searched for.

If people included 5-10 keywords containing all the names in their posts and included a few more descriptive keywords we'd be much farther along than we are now. Not everyone wants to get into a big debate about metadata and "aboutness" but trying to use good tags is something people could do with minimal effort.

Personally I wish MeFi had a killer search but it doesn't. It's tough to recreate the wheel that Google has built just for some additional facet and date features that we'd like and indexing a site that runs on two fairly minimal servers is a big CPU hit. So, here we are. Please consider using good tags.
posted by jessamyn (staff) at 6:38 PM on January 13, 2011 [1 favorite]

