Exporting older posts to another DB? October 15, 2001 6:47 AM

Would exporting a set of older posts out to another DB help the search engine? I find that the search engine is timing out a lot, and the google cache isn't always current enough to help prevent double posts. Plus, the old posts quite often lead to 404 pages.
posted by machaus to Feature Requests at 6:47 AM (17 comments total)

What might be useful is the slashdot policy of declaring older posts "closed" after a certain period of time, so that they can be rendered as a static file rather than generated from the DB every time. It'd make it much easier to hash those static files for the search engine, as well.
posted by holgate at 8:14 AM on October 15, 2001


that's a really good idea, holgate. no one comments on many threads older than a week, it seems.
posted by moz at 8:43 AM on October 15, 2001


yeah. apart from 1142, that is. I'd say a month is a good amount of time to keep a thread "dynamic". I suspect that having older pages as static HTML would also help when it comes to search engine bots...
posted by holgate at 8:55 AM on October 15, 2001


no one comments on many threads older than a week

When people are able to mark favorites (as with my MefiFilter app), you may find that that's not true at all.
posted by fooljay at 10:50 AM on October 15, 2001


Wouldn't scouring a couple thousand static pages be just as server-intensive as scouring through a database, if not more so?
posted by mkn at 11:59 AM on October 15, 2001


mkn:

no, because the static pages require no interaction with the database. all the web server has to do is serve up the stream of bytes that is the static page: it's already been htmlified and sorted. i'm guessing that what you're thinking is that web crawlers might be able to access matt's database and search the text in that way, bypassing the need to generate html pages, which isn't the case here i don't believe (is it ever the case outside of slashdot-specific crawlers?).
posted by moz at 12:14 PM on October 15, 2001


Yeah, I'd like to do this, and it would ease the load on the server greatly.

It's no small project though. I'd first need to build something that automatically renders static versions of all comments older than one month, then do redirects on all the old links to the static content. Finally, I'd have to create an automated way to add to the archived material, by running a process that did a single day (from a month ago) at a time, that ran nightly after midnight.
posted by mathowie (staff) at 12:26 PM on October 15, 2001
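
The nightly step mathowie describes (archive one day's worth of threads, from a month ago, per run) could be sketched roughly like this in Python. The 30-day cutoff, the (thread_id, posted_on) data shape, and the /archive/ path scheme are all assumptions for illustration; nothing in the thread says how the real site stores or names things.

```python
from datetime import date, timedelta

ARCHIVE_AGE_DAYS = 30  # assumed stand-in for "one month"


def day_to_archive(today):
    """Return the single calendar day whose threads tonight's run should freeze."""
    return today - timedelta(days=ARCHIVE_AGE_DAYS)


def threads_to_archive(threads, today):
    """Pick the threads posted on the target day.

    `threads` is an iterable of (thread_id, posted_on) pairs (hypothetical shape).
    """
    target = day_to_archive(today)
    return [thread_id for thread_id, posted_on in threads if posted_on == target]


def static_path(thread_id):
    """Map a thread id to a hypothetical static path, e.g. as a redirect target."""
    return "/archive/" + str(thread_id) + ".html"
```

A scheduler would call threads_to_archive() once a night, render each returned thread to static_path(), and add a redirect from the old link.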


running the process at a specific time should be no big deal. you're running windows, aren't you matt? that should have a task scheduler to suit your needs on that front.

on rendering static versions of comments, i think that one approach would be to use a scripting language like python. you could use the scripting language to connect to the web server, and send an HTTP GET request to the server for each of the threads. the server would then send the byte stream of html back to the script, which would then save it according to some scheme you want to use. you could probably use a regular expression search-and-replace on the html content to convert links to metafilter threads and comments so that they would refer to the static versions.

all theoretical, of course, but no greater than a mid-sized project i think. (of course, that's for me, a programmer by nature.)
posted by moz at 1:10 PM on October 15, 2001
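
A minimal sketch of the script moz outlines: GET each thread over HTTP, rewrite internal links with a regular expression, and save the result. The /thread/<id> and /archive/<id>.html URL shapes and the base_url parameter are hypothetical, since the thread does not say what the site's links actually look like.

```python
import re
from urllib.request import urlopen

# Hypothetical link format for dynamic threads, with an optional #comment anchor.
DYNAMIC_LINK = re.compile(r'href="/thread/(\d+)(#\d+)?"')


def _to_static(match):
    """Build the replacement href for one matched dynamic link."""
    thread_id = match.group(1)
    anchor = match.group(2) or ""
    return 'href="/archive/' + thread_id + '.html' + anchor + '"'


def rewrite_links(html):
    """Point links to dynamic threads at their assumed static copies."""
    return DYNAMIC_LINK.sub(_to_static, html)


def fetch_thread(base_url, thread_id):
    """Send an HTTP GET for one thread and return the HTML as text."""
    with urlopen(base_url + "/thread/" + str(thread_id)) as response:
        return response.read().decode("utf-8", errors="replace")


def save_static(html, path):
    """Write the link-rewritten page to disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(rewrite_links(html))
```

External links are left alone because the pattern only matches the assumed internal form.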


I'm not naysaying this idea. I will mention that I went back three months later to the CSA post to add a link to my review of a local farm since I thought it might be useful to someone using metafilter as a reference site (as I often do).

I suppose it's impossible to render a static page and leave it open for comment. even if that comment wouldn't show for another 24 hours or the like. and I don't suppose there's much call for such a feature anyway.

but I'm thinking of the beer and liquor and music threads.... might it be worthwhile for matt to allow himself to flag certain threads to *not* be closed after a week, for special cases like these? I'm just asking. it probably would never be used at all....

reducing server load should greatly outweigh any consideration of this kind, though.
posted by rebeccablood at 2:34 PM on October 15, 2001


actually, rebecca, my understanding is that it would not be difficult to add a new comment to a static page. the text input area and the buttons and all that can still be on a static page -- you could enter a new comment, and the static page could simply be re-rendered. the only difference is that you would have to re-enter your username and password information for each comment you would like to post.

the above method would not be the greatest thing for matt's file system if it were abused (i.e. thread 1142), as in that case (potentially large) threads would constantly be re-written to the hard drive. but under the assumption that comments such as yours would not be commonplace, i think the benefits would outweigh the risks.
posted by moz at 2:51 PM on October 15, 2001
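
To illustrate the re-rendering moz mentions, here is a rough Python sketch in which a new comment on a frozen thread simply rewrites the whole static file. The markup and the (author, body) comment shape are made up for the example; the point is only that every late comment costs one full rewrite of the page, which is the abuse concern raised above.

```python
import html
from pathlib import Path


def render_thread(title, comments):
    """Render a thread to HTML. `comments` is a list of (author, body) pairs."""
    parts = ["<h1>" + html.escape(title) + "</h1>"]
    for author, body in comments:
        parts.append(
            "<p>" + html.escape(body) + "<br>posted by " + html.escape(author) + "</p>"
        )
    return "\n".join(parts) + "\n"


def add_comment(path, title, comments, author, body):
    """Append a comment, then rewrite the entire static file for the thread."""
    comments.append((author, body))
    Path(path).write_text(render_thread(title, comments), encoding="utf-8")
```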


why not just set a flag on any post with more than say 75 comments to keep it from being exported by the batch file to the static archive? That would take care of the beer, liquor, 9/11, 1142, kaycee, various music, and earthquake threads.

of course, presuming that Matt wants to have some semblance of a normal life on this planet. this sounds like an exercise best handled during a clean slate rebuild, rather than an ad-hoc change.

posted by machaus at 3:01 PM on October 15, 2001
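
The export rule machaus suggests is a one-line check. In this sketch the 75-comment threshold comes from the post above; the keep_dynamic argument is an assumed extra, standing in for the manual flag rebeccablood asked about.

```python
COMMENT_THRESHOLD = 75  # from the suggestion above: busier threads stay dynamic


def should_export(comment_count, keep_dynamic=False):
    """Return True if a thread may be moved to the static archive.

    keep_dynamic is a hypothetical manual override set by an admin.
    """
    return not keep_dynamic and comment_count <= COMMENT_THRESHOLD
```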


The problem right now is that it's hard to use MeFi as a reference site because the internal search engine simply can't come up with anything under current conditions. And that affects both performance and usability. Using Google reduces the load, but it's not as thorough. Seeing as I don't know how ColdFusion works, I'm stabbing in the dark here, but moz's suggestion seems feasible: you could have a kind of "read-only" mode for posts after a certain period of inactivity, kept on the server as a static "rendering" of the thread, which you have to deliberately re-access in "read-write" mode (dynamically generated) to add a new comment. Although that doesn't necessarily solve the search problem. Hmm.

(Spot the not-really-a-programmer project manager here.)
posted by holgate at 4:44 PM on October 15, 2001


Maybe you can throw in the ability to post on those frozen topics for a micropayment?
posted by hijinx at 7:07 PM on October 15, 2001


holgate:

the ideal as far as searches go, i think, is some way to associate a list of relevant comments and threads with keywords. i don't think that ideal can be achieved, since people can potentially search for millions of important seeming words.

maybe the most practical solution to the question of searching is to limit searches (as i think was suggested somewhere), but rather than limiting them to one month, setting that limit to 2 or 3 months -- maybe more. the rest should be searchable, on an if-you-need-to basis. if we do this, of course, we'll have to tell the vigilantes to hold off their attacks if double posts go beyond the search limit.
posted by moz at 7:12 PM on October 15, 2001
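
The keyword-to-thread association moz describes is usually called an inverted index. Below is a toy Python sketch of one, combined with the date-limited search from the same post. The data shape (thread id mapped to a posting date and text) and the 90-day default window are assumptions, and a real index over a site this size would need far more care.

```python
import re
from collections import defaultdict
from datetime import date, timedelta


def build_index(threads):
    """Map each word to the set of thread ids containing it.

    `threads` is a dict of thread_id -> (posted_on, text) (hypothetical shape).
    """
    index = defaultdict(set)
    for thread_id, (_, text) in threads.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(thread_id)
    return index


def search(index, threads, word, today, window_days=90):
    """Return ids of matching threads posted within the last `window_days`."""
    cutoff = today - timedelta(days=window_days)
    hits = index.get(word.lower(), set())
    return sorted(t for t in hits if threads[t][0] >= cutoff)
```

Widening window_days gives the "if-you-need-to" search over older material.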


hijinx: and, subsequently, few will post to those frozen topics. you might as well shut them down entirely and avoid the hassle of paying taxes on what very few micropayments would be made.
posted by moz at 7:14 PM on October 15, 2001


Re: adding comments to older threads...

How about queuing them? I mean the person adding the comment gets to add it, but instead of the page being re-rendered immediately with the new comment, the comment gets put in a temp db table. The queued comments for older threads then get merged into the main database, say once every 2 days. Once a thread is updated, a new static file is generated for it.
posted by riffola at 11:57 PM on October 17, 2001
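
Here is one way riffola's queue might look, sketched in Python with an in-memory SQLite database standing in for the real one. The table and column names are invented for the example. flush_queue() is what a scheduled job would run every couple of days; it returns the thread ids whose static files need regenerating.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a main table and a holding table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE comments (thread_id INTEGER, author TEXT, body TEXT)")
    db.execute("CREATE TABLE pending_comments (thread_id INTEGER, author TEXT, body TEXT)")
    return db


def queue_comment(db, thread_id, author, body):
    """Accept a comment on an archived thread without re-rendering anything yet."""
    db.execute(
        "INSERT INTO pending_comments VALUES (?, ?, ?)", (thread_id, author, body)
    )


def flush_queue(db):
    """Merge queued comments into the main table; return thread ids to re-render."""
    rows = db.execute(
        "SELECT DISTINCT thread_id FROM pending_comments ORDER BY thread_id"
    )
    touched = [row[0] for row in rows]
    db.execute(
        "INSERT INTO comments SELECT thread_id, author, body FROM pending_comments"
    )
    db.execute("DELETE FROM pending_comments")
    db.commit()
    return touched
```

Each thread is re-rendered at most once per flush, however many comments it collected in the meantime.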


You could do it in batches too. So you don't end up making 11k static pages on one day.
posted by riffola at 12:07 AM on October 18, 2001
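
Batching the initial export is a small helper in Python. The batch size is arbitrary here; at 500 per run, the 11k pages mentioned above would take 22 runs.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```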

