Impossible. Perhaps the archives are incomplete. September 19, 2019 11:56 AM

I'd always thought MeFi site searches were pretty much comprehensive, but last night I came across several search terms that were clearly missing content. Has anyone else noticed this behavior in the past, or could this be a new bug?

While dredging up old threads for this recent MeTa, I noticed a weird discrepancy in sitewide searches for posts vs. comments:

A sitewide search for "Obama" in comments shows results dating back to August 2004, but searching in posts goes back as far as May 2004 -- and those older threads include multiple comments mentioning the name.

I thought maybe this was because of early '00s search index weirdness or because I'd exceeded some limit (that comment search goes all the way to page 4,549!), especially since searching for multiple words from those earlier comments does bring them up. But I tried again with a less common/more recent term ("MBMBAM") and found the same problem. Searching comments sitewide shows the earliest hit in February 2017, but searching posts goes back to 2014, including an AskMe from that June with several comments mentioning it, as well as a July 2016 MeFi post (so it's not a subsite-specific thing).

Additional weirdness: on that "MBMBAM" search, it says there are 166 results in comments sitewide, which reflects the per-subsite numbers (111 MeFi, 22 AskMe, 21 FanFare, 12 MeTa). A quick check shows these counts are accurate. But going to the last page on the sitewide search only goes back to 2017, whereas narrowing to just MeFi shows even earlier comments from 2012 (!). Even stranger, that sitewide search shows 20 comments per page, but on the last (9th) page, it says "Showing 161 - 166 of 166 matches for mbmbam across comments"... and then shows a full page of 20 results, ending in 2017. Presumably there are many more than 166 sitewide comments, some of which show up when searching within subsites -- but then why do the subsite totals add up? Very odd.

I've done deep dives into the archives on occasion to find earliest references to things and don't recall seeing mismatches like this. Maybe deleted comments are throwing the count off? Help us, frimble, you're our only hope!
posted by Rhaomi to Bugs at 11:56 AM (10 comments total) 3 users marked this as a favorite

Also, some more testing after submitting this post suggests the problem may lie with FanFare: when searching for "MBMBAM" in comments by subsite, the result counts for MeFi and AskMe are accurate, but FanFare (which claims to have 21 hits) shows two full pages (40 hits) -- and there are probably many more beyond that, given it only dates back to 2018.
posted by Rhaomi at 12:04 PM on September 19, 2019

Oh, that's very interesting! And yeah, it does seem specifically to be an issue with the results returned on the "All Sites" selection. Which is good news insofar as doing a deep dive with a site-specific search selection seems sound, but still, why the discrepancy?

My wild guess (frimble will look at the actual query when they have time) is that the All Sites version of the search query is overcollecting results somehow, and running out of pages of results to provide before it actually gets down to the oldest results.

So e.g. with the MBMBAM query, it's correctly identifying 166 (or was before we started discussing it in here) total results summed across various subsites; and then the All Sites view is collecting more than 166 results somehow, and laying them out a page at a time in reverse chronological order, and stopping at page 9 with 180 total returned results out of the 180+ it managed to collect.
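To make that arithmetic concrete, here is a minimal sketch of the guess, not the site's actual code. It assumes the number of pages is derived from the reported count (166) while each page is filled with a full slice of whatever rows the query really returned. The function names and the figure of 300 underlying rows are made up for illustration.

```python
import math

PER_PAGE = 20  # comments per page, as observed in the sitewide search


def paginate(reported_count, actual_rows, per_page=PER_PAGE):
    """Build pages when the page count comes from reported_count
    but each page is sliced from the (possibly longer) actual rows."""
    page_count = math.ceil(reported_count / per_page)
    return [actual_rows[i * per_page:(i + 1) * per_page]
            for i in range(page_count)]


def range_label(page, reported_count, per_page=PER_PAGE):
    """Build the 'Showing X - Y of N' label from the reported count alone."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, reported_count)
    return f"Showing {first} - {last} of {reported_count}"


# Hypothetical data: 300 matching rows (newest first), but a count of 166.
rows = list(range(300))
pages = paginate(166, rows)
shown = sum(len(p) for p in pages)

print(len(pages))            # number of pages offered
print(shown)                 # rows actually displayed across those pages
print(len(rows) - shown)     # oldest rows that are never reached
print(range_label(9, 166))   # label on the last page
```

Under these assumptions the last page is labeled as holding results 161 to 166 but still displays a full 20 rows, and every row past the 180th is unreachable from the All Sites view.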

My guess is there's a problem with subsites in there somewhere; something being counted in one place and not another, probably a newer and/or oddball one like FanFare Talk that got included in one count but not the other.
posted by cortex (staff) at 12:06 PM on September 19, 2019 [1 favorite]

posted by cortex (staff) at 12:07 PM on September 19, 2019

Thanks, Obama!
posted by It's Raining Florence Henderson at 4:30 PM on September 19, 2019 [11 favorites]

What I've established so far: the count is definitely off. Taking the search string "mbmbam" as an example, there are over 300 results across all sites, but the "count results" procedure only shows 168. That at least narrows down where the problem lies, and now I'm looking at why the count is off.
posted by frimble (staff) at 3:57 AM on September 20, 2019

Ok, there was a several-year-old copy/paste error in the FanFare section of the "count results" procedure. It should all be working correctly now.
posted by frimble (staff) at 4:23 AM on September 20, 2019 [17 favorites]

“Has anyone else noticed this behavior in the past, or could this be a new bug?”

Yep and nope! Really pleased to see this bug fixed, thanks for all the hard work!

Are there any future plans to give visibility to known issues, what’s being worked on, what’s been done already, etc.? It’d be great to submit, see, and vote on bug and feature requests big and small.
posted by iamkimiam at 4:35 AM on September 20, 2019 [2 favorites]

posted by Rhaomi at 9:53 AM on September 20, 2019

I see there was a bug. Are site searches still powered by Google? I had a weird search problem years ago that was because of Google's index. Google's index not being complete is something to be aware of when searching the site.
posted by Mitheral at 12:25 PM on September 20, 2019

For logged-in members the search is entirely internal and powered by our DB, so it is (when all goes well) complete and up to date. Google search is used for logged-out folks to save us from melting the DB on driveby/spammy/scraper search traffic.
posted by cortex (staff) at 12:32 PM on September 20, 2019 [2 favorites]
