How does the 'Related Posts' widget work? July 29, 2012 10:44 PM

Related Posts were implemented two years ago on the Blue. How, exactly, do they work?
posted by the man of twists and turns to MetaFilter-Related at 10:44 PM (44 comments total) 1 user marked this as a favorite

I always thought pb's main job was sitting there waiting for posts to come in and then frantically searching the site for related posts to stick in that box.
posted by koeselitz at 10:48 PM on July 29, 2012 [17 favorites]


Related Posts are to Mefites as Magnets are to Juggalos.

Well, it WAS going to be said by somebody, and I'm available for abuse...
posted by oneswellfoop at 10:54 PM on July 29, 2012


They work based on tags. Each tag in the system has a "rarity weight". So popular tags like art and music have a low rarity weight. More specific, less frequently used tags have a higher rarity weight. The Related Posts feature looks for rare matches among tags and considers posts with rare tags that match up more related than others. For Ask MetaFilter, the category of the post can bump up the related score a bit.
posted by pb (staff) at 11:10 PM on July 29, 2012 [11 favorites]


That I get "full house" on those every so often probably means I should give some subject areas a rest. That or my tagging is just really consistent.
posted by Artw at 12:25 AM on July 30, 2012 [3 favorites]


I'm with koeselitz here. I was always assuming that it worked sorta like car parking in Japan: you leave your car on some shiny conveyor thingy and pay through a slot in a wall, and it rolls automatically away off to the left and gets lifted into a booth and shut up behind a door, but when you fetch it you have to go behind the scenes and aha! There are dozens of serious men in green overalls working away at a frantic pace to make it all possible...

Follow up question: How many pb's are there really?
posted by Namlit at 2:43 AM on July 30, 2012 [1 favorite]


Follow up question: How many pb's are there really?

Trick question. They're all Paphnuty wearing a mask.
posted by metaBugs at 3:04 AM on July 30, 2012 [3 favorites]


So, Becoming Popular diminishes your Rarity.
posted by Wolfdog at 4:10 AM on July 30, 2012 [2 favorites]


Each tag in the system has a "rarity weight".

So basically what you are saying is that PONIES PICK THE POSTS.

omg
posted by elizardbits at 5:14 AM on July 30, 2012 [14 favorites]


That I get "full house" on those every so often probably means I should give some subject areas a rest. That or my tagging is just really consistent.

My understanding was the poster's ID factored in as well. So that, to use a purely hypothetical example, a "comics" post from you would be likelier to make it into the Related box than a "comics" post from anyone else.
posted by Egg Shen at 5:51 AM on July 30, 2012


nah, Related Posts doesn't look at user IDs at all. It does look at the number of favorites. If two posts have the same related score, the one with more favorites will get the edge for display.
posted by pb (staff) at 8:04 AM on July 30, 2012 [1 favorite]
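
A minimal sketch of the tiebreaker described above, assuming each candidate already has a related score. The `Candidate` type, the `rank_candidates` helper, the post IDs, and the numbers are all hypothetical; the limit of five is taken from the "five Related Posts" mentioned elsewhere in this thread. This is not the site's actual code.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    post_id: int    # hypothetical identifier
    score: float    # related score, assumed to be computed elsewhere
    favorites: int  # favorite count on the candidate post


def rank_candidates(candidates: list[Candidate], limit: int = 5) -> list[Candidate]:
    """Sort by related score; favorites only decide between equal scores."""
    ordered = sorted(candidates, key=lambda c: (c.score, c.favorites), reverse=True)
    return ordered[:limit]


candidates = [
    Candidate(post_id=101, score=7.5, favorites=2),
    Candidate(post_id=102, score=7.5, favorites=40),
    Candidate(post_id=103, score=9.0, favorites=0),
]
top = rank_candidates(candidates)
```

Here post 103 ranks first despite having no favorites, and favorites only separate 102 from 101 because their scores are equal.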


Yeah, poster is totally unrelated to this function. You may be confusing it with the sort-of-similar procedure for choosing suggested tags for someone's "My Ask" or "My Mefi" filter, which selects tags based on the user-specific site behavior on those respective parts of the site.
posted by cortex (staff) at 8:20 AM on July 30, 2012


Are related posts static? That is, is it calculated at the moment of posting, never to change no matter how much more "related" something posted in the future is, or, is it calculated when a reader loads the post - such that my looking at a post from one year ago means I'm more likely to see optimally relevant related posts than people who commented on the post did, while they were drafting their comments?
posted by SMPA at 8:21 AM on July 30, 2012


I've never observed adding tags changing the related posts.
posted by Artw at 8:28 AM on July 30, 2012


Calculating Related Posts is an expensive process so we cache them heavily. They're cached for 30 days from the initial calculation. So changing the tags on a post could eventually change the Related Posts, but not for a while.
posted by pb (staff) at 8:48 AM on July 30, 2012 [2 favorites]
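
One way to picture the caching described above is a time-to-live cache keyed by post. This is only a sketch under assumptions: the `RelatedPostsCache` class, its in-memory dictionary, and the injected `compute` and `clock` functions are hypothetical, and the thread does not say how the site actually stores cached results.

```python
import time
from typing import Callable

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, as stated in the comment above


class RelatedPostsCache:
    """Hypothetical in-memory cache; recomputes only after the TTL has passed."""

    def __init__(
        self,
        compute: Callable[[int], list[int]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._compute = compute  # the expensive related-posts calculation
        self._clock = clock      # injectable so expiry can be tested
        self._entries: dict[int, tuple[float, list[int]]] = {}

    def get(self, post_id: int) -> list[int]:
        now = self._clock()
        entry = self._entries.get(post_id)
        if entry is not None:
            computed_at, related = entry
            if now - computed_at < CACHE_TTL_SECONDS:
                return related  # still fresh: tag edits are not reflected yet
        related = self._compute(post_id)
        self._entries[post_id] = (now, related)
        return related
```

Under this model, a tag change made on day 2 would not show up in the Related Posts box until the entry expires and is recalculated.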


for someone's "My Ask" or "My Mefi" filter, which selects tags based on the user-specific site behavior on those respective parts of the site.

So, the list of suggested tags in the 'My MeFi Preferences' looks at posts I've created and tagged, or at the posts I've favorited and commented in, or both?
posted by the man of twists and turns at 9:08 AM on July 30, 2012


The suggested tags in My MeFi Preferences are based on tags on posts where you comment and tags on posts that you add to favorites.
posted by pb (staff) at 9:25 AM on July 30, 2012
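
As a rough illustration of the two inputs named above, the sketch below simply counts tags across posts a user commented on and posts they favorited. The `suggest_tags` function, the equal treatment of comments and favorites, and the sample tags are assumptions; the thread does not describe how the real feature weighs or ranks them.

```python
from collections import Counter


def suggest_tags(
    commented_post_tags: list[list[str]],
    favorited_post_tags: list[list[str]],
    limit: int = 3,
) -> list[str]:
    """Return the most frequent tags across commented-on and favorited posts."""
    counts: Counter[str] = Counter()
    for tags in commented_post_tags:
        counts.update(tags)
    for tags in favorited_post_tags:
        counts.update(tags)
    return [tag for tag, _ in counts.most_common(limit)]


commented = [["comics", "art"], ["comics", "film"]]
favorited = [["comics", "film"]]
suggestions = suggest_tags(commented, favorited, limit=2)
```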


Let's take a moment to thank pb for the site/being awesome
posted by East Manitoba Regional Junior Kabaddi Champion '94 at 9:39 AM on July 30, 2012 [5 favorites]


They work based on tags. Each tag in the system has a "rarity weight". So popular tags like art and music have a low rarity weight.

So how is "rarity weight" assigned? Is it assigned according to an ad hoc/intuitive process by the mods?

This is really interesting, thanks for sharing this info!
posted by KokuRyu at 9:42 AM on July 30, 2012


Nah, there's no human intervention. Rarity weight is based solely on frequency of use. cortex wrote a script that gives every tag a value based on frequency and we run that periodically.
posted by pb (staff) at 9:45 AM on July 30, 2012 [2 favorites]


Yeah, the less often a tag gets used, the more it's worth when you've got a match.

So e.g. "music" is one of the most-used tags on the site; as a result, it doesn't carry a ton of weight when hunting for related posts, though it will of course carry some so if the current post was tagged only with "music" you'd end up with five Related Posts that were at least probably about music of some sort.

Whereas "Kazakhstan" gets used a lot less often, so posts that have that as a tag will get much more weight when trying to choose among related posts.

And it's additive across multiple tag matches, so if you make a post and use both "music" and "Kazakhstan" as tags, the best matches for the Related Posts candidates will be any other posts that have both, followed probably by posts that have just "Kazakhstan" and then from there stuff tagged "music".

We have only updated that tag weight table infrequently, which is not a huge deal because it's not going to be a particularly volatile thing, but we were just talking this morning about maybe making it a monthly automated task to keep things a little more consistently fresh across the total body of mefi posts and tags.
posted by cortex (staff) at 9:52 AM on July 30, 2012 [2 favorites]
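
The "music" and "Kazakhstan" example above can be sketched as a sum of weights over shared tags. The weight values, the `related_score` helper, and the extra tags are invented for illustration; only the additive behavior comes from the comment.

```python
# Made-up weights: a common tag is worth little, a rare tag is worth more.
TAG_WEIGHTS = {"music": 1.0, "Kazakhstan": 6.0}


def related_score(current_tags: set[str], candidate_tags: set[str],
                  weights: dict[str, float]) -> float:
    """Add up the weight of every tag the two posts have in common."""
    shared = current_tags & candidate_tags
    return sum(weights.get(tag, 0.0) for tag in shared)


current = {"music", "Kazakhstan"}
candidates = {
    "both": {"music", "Kazakhstan", "dombra"},
    "kazakhstan_only": {"Kazakhstan", "steppe"},
    "music_only": {"music", "jazz"},
}
scores = {
    name: related_score(current, tags, TAG_WEIGHTS)
    for name, tags in candidates.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these numbers the post sharing both tags scores 7.0, the "Kazakhstan"-only post 6.0, and the "music"-only post 1.0, which matches the ordering described above.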


East Manitoba Regional Junior Kabaddi Champion '94: "Let's take a moment to thank pb for the site/being awesome"

Most definitely. This weekend, for the first time, I compared the My MeFi/My Ask Mefi posts in my RSS feed to the RSS feeds for both sites, and I was floored how accurate they were in bringing attention to what I really wouldn't want to miss. I couldn't help but wish I had that same filter for the rest of my life, so I could...well not filter out the rest, but at least draw attention to the important stuff.
posted by MCMikeNamara at 10:15 AM on July 30, 2012


I keep reading rarity weight as "ranty weight". I think I have a kerning problem. I bet pb can fix that too.
posted by jessamyn (staff) at 10:40 AM on July 30, 2012 [1 favorite]


What's a keming problem?
posted by maqsarian at 11:26 AM on July 30, 2012 [16 favorites]


I would enjoy a list ordered by ranty weight.
posted by ctmf at 11:56 AM on July 30, 2012


What's a keming problem?

Education, science, culture and health, apparently.
posted by zamboni at 12:32 PM on July 30, 2012


What's a keming problem?

The best part of this? I read it as "kerning problem."
posted by Betelgeuse at 12:45 PM on July 30, 2012 [2 favorites]


I love you dorks.
posted by jessamyn (staff) at 12:53 PM on July 30, 2012 [5 favorites]


So if I understand pb correctly, it's all done by magic.
posted by arcticseal at 1:02 PM on July 30, 2012 [1 favorite]


I would rather it be based on a rainbow dash weight.
posted by IndigoRain at 1:41 PM on July 30, 2012


"What's kemming Precious?"
posted by blue_beetle at 5:10 PM on July 30, 2012


pb: It does look at the number of favorites.

How do posts from the pre-favorite era fare in being selected as a 'related post?'
posted by Kattullus at 3:45 PM on July 31, 2012


How do posts from the pre-favorite era fare in being selected as a 'related post?'

Just fine. Favorites are only a small piece. They act more like a tiebreaker than a major contributing factor.
posted by pb (staff) at 4:11 PM on July 31, 2012 [1 favorite]


I guess the back tagging is pretty solid then.
posted by Artw at 4:13 PM on July 31, 2012


Related Posts was only possible across all posts once we had every post tagged. So we can thank the backtagging team for that too.
posted by pb (staff) at 4:16 PM on July 31, 2012


:D
posted by jessamyn (staff) at 4:17 PM on July 31, 2012


Thanks backtagging team!

/vaguely suspects this may give older posts an edge over some newer posts.
posted by Artw at 4:18 PM on July 31, 2012


Nerdy question for cortex: Is the weight just directly proportional to the inverse of popularity, or is there some other shape for the weight graph? I mean, if a (hypothetical) tag is found on 1 out of every thousand posts, is its weight 100 times that of a tag that is on one out of every 10 posts? I'd guess (from squinting at the tag cloud, and just, y'know, experience) that the popularity-rank distribution of tags approximately follows Zipf's law. But it'd be cool if you modly types with DB access (ahem cortex) could actually run the numbers.
posted by axiom at 8:29 PM on July 31, 2012


It's actually something like the log of the inverse, so you don't get a thousand fold difference in weight so much as a factor of five or six (or whatever ln(1000) is, it's been a while). Which I think we went with specifically because early experiments with a direct linear relationship led to weird wonky results.

As far as the tag count distribution, I think it's roughly a power law sort of thing. Steep slope from the most popular down to a long tail, etc.

But! If you want to do a little bit of work to analyze that yourself, you don't even need DB access; the raw tag data is in the Infodump, so with a little scripting or some fun with Excel you could probably test that yourself.
posted by cortex (staff) at 8:55 PM on July 31, 2012
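
To make the "log of the inverse" idea concrete, here is a sketch that assumes the weight is exactly the natural log of (total posts / posts carrying the tag). The comment only says "something like" that, so the formula, the `rarity_weight` name, and the post counts below are assumptions.

```python
import math


def rarity_weight(tag_count: int, total_posts: int) -> float:
    """Assumed form: natural log of the inverse of the tag's frequency."""
    return math.log(total_posts / tag_count)


TOTAL_POSTS = 1_000_000  # hypothetical site size

common = rarity_weight(100_000, TOTAL_POSTS)  # tag on 1 in 10 posts
rare = rarity_weight(1_000, TOTAL_POSTS)      # tag on 1 in 1000 posts
ratio = rare / common
```

Under this assumed formula the rare tag weighs ln(1000), about 6.9, and the common tag ln(10), about 2.3, so the rare tag counts roughly 3 times as much rather than the 100 times a direct inverse would give.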


I kind of suspected a direct linear relationship would be, as you put it, wonky (by the by, ln(1000) is around 7). I didn't realize the infodump had that tag info in it (and didn't think to look), and now I have a small project for tomorrow. Thanks, cortex.
posted by axiom at 9:02 PM on July 31, 2012


Yep, power-law.
posted by axiom at 12:54 PM on August 1, 2012


Nice!
posted by cortex (staff) at 1:11 PM on August 1, 2012


Yeah, poster is totally unrelated to this function. You may be confusing it with the sort-of-similar procedure for choosing suggested tags for someone's "My Ask" or "My Mefi" filter, which selects tags based on the user-specific site behavior on those respective parts of the site.

Wow, really? Because I've got less than 100 posts under my belt, yet I see my FPPs in the Related Posts box all the time, often more than one. I thought it might be a favorites thing, but even low-favorite stuff of mine places, such as political newsfilter, which I'd think would be easily drowned out by the high volume of political posts.

Could it be because I use roughly a metric buttload of tags?
posted by Rhaomi at 2:20 AM on August 2, 2012


Maybe, yeah. I don't remember for sure what the actual formula looks like on the server side when it's calculating a score, but I think it's basically adding tag weight values for every matching tag without trying to scale that against the total number of tags on each candidate post, so if you've got a lot of tags you've put on any given post then the chances that that post will offer a matching subset of tags when looking for Relatedness would necessarily be higher.

It's also possible that you tend to read posts on subjects that line up somewhat with your own posting interests.
posted by cortex (staff) at 8:03 AM on August 2, 2012
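
The effect described above can be shown by comparing an unscaled sum with a hypothetical scaled one. The weights, tags, and both helper functions are invented; `normalized_score` is only there for contrast and, going by the comment, is probably not what the site does.

```python
# Invented weights for illustration only.
WEIGHTS = {"politics": 1.0, "election": 2.0, "senate": 3.0, "filibuster": 5.0}


def raw_score(current: set[str], candidate: set[str]) -> float:
    """Sum weights of shared tags, with no scaling by candidate tag count."""
    return sum(WEIGHTS.get(tag, 0.0) for tag in current & candidate)


def normalized_score(current: set[str], candidate: set[str]) -> float:
    """Hypothetical alternative: divide by how many tags the candidate has."""
    if not candidate:
        return 0.0
    return raw_score(current, candidate) / len(candidate)


current = {"politics", "senate"}
sparse = {"senate"}                                      # lightly tagged post
heavy = {"politics", "election", "senate", "filibuster"}  # heavily tagged post

raw = (raw_score(current, sparse), raw_score(current, heavy))
scaled = (normalized_score(current, sparse), normalized_score(current, heavy))
```

With the unscaled sum the heavily tagged post wins (4.0 against 3.0) simply because it offers more tags to match; with the scaled version the lightly tagged post would win (3.0 against 1.0).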


Ha! It's not just me!
posted by Artw at 9:06 AM on August 2, 2012

