
A better way to prevent double posts?
December 2, 2005 5:08 AM

There seem to be an awful lot of double posts, often within a day or two of each other. Is there a better way to prevent reposts, e.g. a tag search function that orders results by date? (No doubt this has been asked already, probably four posts down...)
posted by brautigan to Feature Requests at 5:08 AM (14 comments total)

Tag results are ordered by date already. Problem with that is, people don't always tag things the same way.
posted by Gator at 5:36 AM on December 2, 2005


At least in some of these cases, the problem would be solved if people made their write-ups more descriptive. In order, here are the complete write-ups for each of the three posts of the item linked to in the previous sentence:

1. "Tis the season and we all know at least one of these guys. (Windows Media Player req'd)"

2. "Ritual Adornment of a communal habitat. Light and sound combine to impress other nearby members of the species. The counterpoint to a summer of tending carefully controlled foliage."

3. "It's Beginning To Look a Lot Like Christmas. [.wmv file]"

Not only does every one of these writeups utterly fail to give me a useful explanation of what I can expect if I click on the link, they all also fail to help future posters search for double links. That is, if I wanted to see if this video has been posted before, I might search for something like "Christmas lights"--a phrase that doesn't appear in any of the writeups. Or I might look for "Christmas" or "lights" or "christmaslights" as a tag--but none of the 3 posts actually uses any tags.

Clever write-ups are good, but clever and useful writeups are even better.

I hope I'm not coming across as too snarky; it's a great link, and, as I posted in the latest thread, I didn't click on it the first two times it was linked. It's just that, if the writeups had been at all informative, I (and the double-posters) would have been a lot more likely to see it the first time it appeared.
posted by yankeefog at 6:04 AM on December 2, 2005


A better solution would be for users to always search before posting. Even better would be to not be so obsessed with posting things.
posted by yerfatma at 6:05 AM on December 2, 2005


Speaking of people who don't link properly, I screwed up the links in my mini-rant. That should be "some of these cases". D'oh!
posted by yankeefog at 6:06 AM on December 2, 2005


yankeefog's example is excellent. It's even trickier if the item linked to is mirrored on several sites (for example, syndicated or wire-service news stories), because it doesn't necessarily turn up in a search.

But many double posts occur within a minute or two of each other: it's a breaking story and two or more people are composing a post simultaneously, with the slower (frequently better-thought-out or better-composed) post losing out. One could argue, I suppose, that breaking-news posts aren't nearly as good as the post, a day later, that points to a twist in a well-known event (especially if it's an online twist: that's best-of-the-web right there).
posted by mcwetboy at 6:10 AM on December 2, 2005


Amen, yankeefog. Non-descriptive posts are probably worse than NewsFilter: they're a contributing factor to a significant number of dupes. I'd be willing to say that poorly tagged and described posts are probably the biggest single contributor to the imminent death of MetaFilter (okay, okay, I'm exaggerating).
posted by Plutor at 6:25 AM on December 2, 2005


many double posts occur within a minute or two of each other

While this is true for high profile big news breaks, this is rarely true for the bulk of the double posts that I see. All four posts today that were doubles [four!] were of things that were at least a few hours old or even a few months.

Maybe if the Post a Link page had an easy tag search, that would help somewhat? Otherwise I point to the crunchland method as a good way to at least try to be thorough in avoiding them.
posted by jessamyn at 6:30 AM on December 2, 2005


I know Matt said that it was killing the server, but the loss of the search tool has been horrifically bad in terms of checking for double posts. I only caught one before posting it because it was early enough for tags to have been put in... using the Google search either requires getting exactly the right words in or yields hundreds of results.

For example, the "bridge" animation double post this morning didn't show up in the FPP post screen because one link linked to "index.htm" and the earlier one just to the site, or something like that. If you googled "bridge flash" you get over 100 results; "bridge cartoon" you get over 70.
posted by XQUZYPHYR at 6:36 AM on December 2, 2005


"I know Matt said that it was killing the server, but the loss of the search tool has been horrifically bad in terms of checking for double posts"

amen.
If a friendly neighbourhood DBAdmin would offer Matt some help with optimising the search (as he's requested once or twice) I think we'd all be very grateful.
posted by NinjaPirate at 7:17 AM on December 2, 2005


For example, the "bridge" animation double post this morning didn't show up in the FPP post screen because one link linked to "index.htm" and the earlier one just to the site, or something like that. If you googled "bridge flash" you get over 100 results; "bridge cartoon" you get over 70.

Looking through 100 search results isn't too much to ask!

On the other hand, the URL double post checker should probably be a little smarter about the way it looks for doubles. You could strip out the originating site and return a list of FPPs from that site as well as the current exact match. Once you come up with 50 or 100 hits on the site search you could stop and give a message like "wow, a lot of people link to www.nytimes.com, are you sure you really want to do this?" or something.

Of course this could be extended to intermediate comparisons as well...
posted by Chuckles at 7:54 AM on December 2, 2005
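
A rough sketch of the "smarter checker" idea above, in Python, in case it helps the discussion. It is only an illustration: `site_of`, `check_for_doubles`, and the `cap` parameter are made-up names, the list of previously posted URLs is assumed to be available in memory, and every URL is assumed to include its scheme. It returns the exact matches, up to `cap` other links from the same site, and a warning once more than `cap` same-site links have been seen.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def check_for_doubles(new_url, posted_urls, cap=50):
    """Return (exact_matches, same_site_matches, warning) for a proposed link.

    `cap` is a hypothetical limit on how many same-site links are collected
    before the search stops and a warning is produced.
    """
    site = site_of(new_url)
    exact = [u for u in posted_urls if u == new_url]
    same_site = []
    warning = None
    for u in posted_urls:
        if u != new_url and site_of(u) == site:
            if len(same_site) == cap:
                # Already holding `cap` hits: stop searching and warn instead.
                warning = f"A lot of people link to {site}; are you sure?"
                break
            same_site.append(u)
    return exact, same_site, warning
```

With something like this, a link to a site's front page would at least surface an earlier post that linked to a page such as index.htm on the same host, even though the two URLs are not identical.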


Here's the code I use (in PHP, using Perl Compatible Regular Expressions) to normalize URLs.

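// Strip the scheme (http://, https://, ...).
$norm_url = preg_replace('#^.*?://#i', '', $norm_url);
// Collapse any remaining double slashes.
$norm_url = preg_replace('#//#i', '/', $norm_url);
// Strip a leading www (or www2, etc.) label.
$norm_url = preg_replace('#^www.*?\.#i', '', $norm_url);
// Strip from a dot to the end when no '/' or '=' follows (e.g. .htm).
$norm_url = preg_replace('#\.[^/=]*$#i', '', $norm_url);
// Strip a trailing slash.
$norm_url = preg_replace('#/$#i', '', $norm_url);
// Strip common landing-page names such as /index or /default.
$norm_url = preg_replace('#/(index|default|main|home|welcome|blog|about|download|faq)[^/=]{0,3}$#i', '', $norm_url);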

It could be easily ported to Java.
posted by Sharcho at 8:59 AM on December 2, 2005
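
For anyone who wants to experiment with those rules outside PHP, here is a minimal Python sketch of the same six steps, applied in the same order. It assumes each replacement is the empty string (except the double-slash step, which replaces with a single slash); `normalize_url` is a made-up name and the example.com URLs are placeholders.

```python
import re

# Each (pattern, replacement) pair mirrors one preg_replace call, in order.
_STEPS = [
    (r'^.*?://', ''),     # strip the scheme
    (r'//', '/'),         # collapse double slashes
    (r'^www.*?\.', ''),   # strip a leading www label
    (r'\.[^/=]*$', ''),   # strip from a dot to the end if no '/' or '=' follows
    (r'/$', ''),          # strip a trailing slash
    (r'/(index|default|main|home|welcome|blog|about|download|faq)[^/=]{0,3}$', ''),
]


def normalize_url(url: str) -> str:
    """Apply the six normalization steps case-insensitively, in order."""
    for pattern, replacement in _STEPS:
        url = re.sub(pattern, replacement, url, flags=re.IGNORECASE)
    return url


# The "index.htm" versus bare-directory case from earlier in the thread:
print(normalize_url("http://www.example.com/bridge/index.htm"))  # example.com/bridge
print(normalize_url("http://example.com/bridge/"))               # example.com/bridge
```

One quirk worth knowing: because the extension step runs before the trailing-slash step, a bare host without a trailing slash loses its top-level domain ("http://abc.com" becomes "abc"), while the same host with a trailing slash keeps it ("http://abc.com/" becomes "abc.com"). Swapping those two steps might be worth considering if that matters for matching.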


Here's how I'd do link searching: make a Verity collection of every FPP URL, with the domain as a second, separate column.

First pass would just search the domain of the URL, which should catch most dupes, including where people may link to a different page on the same site or post "abc.com" where "www.abc.com" has already been posted.

To handle links to news sites that will have a lot of links for the same domain, you would either need to:
- Match against a pre-set list of "news" domains (a pain to maintain, but accurate).
- Have some threshold beyond which you can assume that it's a news site that people have linked to often.

If you hit that flag, search against the specific URL.

Verity would, I think, handle that without much of a sweat.
posted by mkultra at 9:07 AM on December 2, 2005
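
The two-pass lookup described above could be sketched roughly as follows. This is not Verity: a plain Python dictionary keyed by domain stands in for the collection, and `NEWS_DOMAINS`, `BUSY_THRESHOLD`, `domain_of`, `build_index`, and `find_dupes` are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

NEWS_DOMAINS = {"news.example.com"}  # hypothetical pre-set list of news sites
BUSY_THRESHOLD = 25                  # hypothetical "linked to often" cutoff


def domain_of(url: str) -> str:
    """Return the lowercased host without a leading 'www.'."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit parse bare hosts such as "abc.com"
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_index(posted_urls):
    """Group previously posted URLs by domain (the 'second column')."""
    index = defaultdict(list)
    for url in posted_urls:
        index[domain_of(url)].append(url)
    return index


def find_dupes(new_url, index):
    """First pass: match on domain. Second pass, for busy domains: exact URL."""
    domain = domain_of(new_url)
    same_domain = index.get(domain, [])
    if domain in NEWS_DOMAINS or len(same_domain) > BUSY_THRESHOLD:
        return [u for u in same_domain if u == new_url]
    return list(same_domain)
```

Under these assumptions, posting "abc.com" would turn up an earlier link to "http://www.abc.com/page1", while a new story on a listed news domain would only match an earlier post of that exact URL.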


I agree about the need for more clearly worded posts. With all the links that get posted here every day, the site could benefit from sharper, clearer writing.

This post by jonson: Happy Thanksgiving. Here is a gallery of photos of monkeys dressed as jockeys, riding other non-monkey animals.

And a subsequent in-thread link by ab'd al'Hazred: This is video of a monkey washing a cat combined with audio; it is in Apple QuickTime format

were among the best links in recent weeks, not least because of their clear writing.
posted by nyterrant at 9:21 AM on December 2, 2005


This would all be so much easier if there was a "doublepost" tag.
posted by Rothko at 9:59 AM on December 2, 2005

