Avoiding doubleposts March 13, 2002 10:55 AM

"Similarity Score" technology is widely used to match keywords between two blocks of text. Search engines use it, obviously, and sophisticated algorithms exist to help teachers detect plagiarism in student papers, etc.

Is there a filter that can be installed to compare two FPPs and warn the poster of a high similarity score? I'm thinking, for example, the next time someone uses the words "National Geographic", "Afghanistan," and "girl" in a FPP and clicks Preview, they would be presented with something like "Warning: A FPP with 70% similarity to the above was posted 1 day ago."
posted by PrinceValium to Feature Requests at 10:55 AM (10 comments total)
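For illustration only, here is a minimal Python sketch of the kind of check described above. It assumes the simplest possible similarity score, the Jaccard overlap of the two posts' word sets, which is not necessarily what search engines or plagiarism detectors use. The function names and the 0.7 threshold are hypothetical, chosen to mirror the "70% similarity" warning in the request.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Jaccard similarity: shared words divided by total distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def warn_if_similar(draft, recent_posts, threshold=0.7):
    """Return (score, post) pairs at or above threshold, highest score first."""
    hits = [(similarity(draft, post), post) for post in recent_posts]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)
```

A Preview page could call `warn_if_similar(draft, recent_posts)` and show a warning whenever the result is non-empty. A real filter would probably also drop common words like "the" and "a" before comparing, since they inflate the score.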

Such algorithms are nontrivial. Oracle 8.1x+ includes such abilities out of the box (Oracle Text), but given the error messages that pop up around here occasionally, I don't think Matt's running it.
posted by NortonDC at 11:10 AM on March 13, 2002

at the moment, prince, i don't think it would be a good idea. matt has mentioned in the past that his search algorithm is currently inefficient (it is linear, that is to say brute force: it looks in every thread from the latest to the earliest), and with the volume of text on metafilter right now, adding such a feature may tax the system even further beyond its limits.
posted by moz at 11:14 AM on March 13, 2002
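As a rough sketch of the usual alternative to the brute-force scan described above: an inverted index maps each word to the posts that contain it, so a new draft only has to be compared against posts sharing some of its words. This is an assumption about how the cost could be reduced, not a description of how the site's search works. The names `build_index`, `candidates`, and `min_shared` are hypothetical.

```python
import re
from collections import defaultdict

def words(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(posts):
    """Map each word to the ids of posts containing it (posts: id -> text)."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for w in words(text):
            index[w].add(post_id)
    return index

def candidates(index, draft, min_shared=2):
    """Return ids of posts sharing at least min_shared distinct words with the draft."""
    counts = defaultdict(int)
    for w in words(draft):
        # .get avoids creating empty entries for words not in the index
        for post_id in index.get(w, ()):
            counts[post_id] += 1
    return {post_id for post_id, n in counts.items() if n >= min_shared}
```

The index is built once and updated as posts arrive. Only the returned candidates would then need a full similarity comparison, instead of every thread from the latest to the earliest.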

Just fucking do a search before you post. Jesus!
posted by jpoulos at 11:32 AM on March 13, 2002

That was as subtle as a sledgehammer, yet I agree. A simple MeFi/Google search could have pointed this out.
posted by BlueTrain at 11:41 AM on March 13, 2002

Thanks jpoulos, that was helpful.

My post was in response to the increasing number of people who don't fucking do a search before they post, Jesus.

posted by PrinceValium at 11:42 AM on March 13, 2002

many times when people do a search, it still won't come up. it happened to me once, someone kindly told me about it, and matt removed it. It's not a huge deal. Valium's idea sounds more efficient as far as intended result, but it would eat up wayyyyyy too much server space.
posted by Ufez Jones at 12:26 PM on March 13, 2002

If anyone can find articles on Microsoft SQL Server natural language text indexing and searching, or ColdFusion code snippets for similarity scoring, I'd love to see them.

I've been meaning to set up Verity search on the metafilter database. It creates a similarity index and attempts to rank items based on their usage. Has anyone ever set up Verity successfully in CF?
posted by mathowie (staff) at 12:28 PM on March 13, 2002

Just fucking do a search before you post. Jesus!
No, that's what computers were built for. People aren't supposed to waste their time.

Gartner estimates lost revenue from metafilter searches at 400 MILLION DOLLARS A YEAR
posted by holloway at 1:56 PM on March 13, 2002

next person who uses that shitty acronym needs a punch to the groin.
posted by jcterminal at 3:03 PM on March 13, 2002

What acronym could possibly warrant a PTTG?
posted by PrinceValium at 3:07 PM on March 13, 2002
