Why all the crashes? October 4, 2006 11:03 AM

Why is metafilter so slow and crash-prone? I know it gets a huge amount of traffic, but what causes the regular outages? Do you guys need a new server or what? I can't think of another major site that experiences as much downtime as metafilter does. (I'm not criticizing, I know how difficult it is to run a site of this scale... just curious)
posted by empath to Bugs at 11:03 AM (43 comments total)

I was wondering the same thing. What's the deal?
posted by wheelieman at 11:16 AM on October 4, 2006


I'll ignore the past (mostly coldfusion related -- it's a shitty language and technology that doesn't scale well) and say that lately, things have been slow due to high traffic and a heck of a lot of search bots. Something like 30% of the site's total traffic (millions of pages a month) are search bots sucking down text. My error logs are filled with data about when they hit a bad link or load up a malformed URL and they'll hit the site hundreds of times in a short time, locking up the db and filling the logs. I've squashed a fair number of them, but there are still some I haven't tracked down.

I'm working with some sysadmin members about the possibility of moving the web server to something a bit more robust and failsafe.
posted by mathowie (staff) at 11:28 AM on October 4, 2006


I blame communism.
posted by blue_beetle at 11:54 AM on October 4, 2006


Heh. I just tried to post an agreement, and got an Internal Server Error.

I was saying, I also manage to see a lot of outages. I tend to visit from 3 to 5 am PST. Are those generally planned outages? If so, you could consider posting a "We're Away" page...
posted by Dunwitty at 12:37 PM on October 4, 2006


As far as I know it is still running on a PC server in Matt's abode. For five bucks and no advertising it's hard to complain about what Matt has managed to do.

(It's a bit slow as I type this)
posted by caddis at 12:43 PM on October 4, 2006


3 to 5 am is generally an "everyone's asleep" outage.
posted by mathowie (staff) at 12:45 PM on October 4, 2006


Matt,

Perhaps you could make a simple script that detects the server having crashed, and restarts the service? It should take about 15 lines of Perl.

(ActiveState Perl for Windows).

Though there is still that pesky "what's the real issue" question that should be resolved as well.
posted by SirStan at 12:50 PM on October 4, 2006
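
A rough sketch of the watchdog idea above, in Python rather than the Perl suggested in the comment. Everything named here is an assumption for illustration: the URL, the restart command, and the service name are placeholders, not the site's real setup. It polls a page, counts consecutive failures, and runs a restart command once a threshold is reached.

```python
import subprocess
import time
import urllib.error
import urllib.request

# Hypothetical values; substitute the real health-check URL and restart command.
CHECK_URL = "http://localhost/"
RESTART_COMMAND = ["sc", "start", "ExampleWebService"]


def site_is_up(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # 4xx still means the server is answering; 5xx counts as down.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def watchdog_step(is_up, failures, threshold, restart):
    """Run one polling step and return the new consecutive-failure count."""
    if is_up:
        return 0
    failures += 1
    if failures >= threshold:
        restart()
        return 0  # start counting again after a restart attempt
    return failures


def run(url=CHECK_URL, command=RESTART_COMMAND, interval=60, threshold=3):
    """Poll forever, restarting the service after repeated failures."""
    failures = 0
    while True:
        failures = watchdog_step(
            site_is_up(url),
            failures,
            threshold,
            lambda: subprocess.run(command, check=False),
        )
        time.sleep(interval)
```

Calling `run()` starts the loop. Requiring several failures in a row before restarting is a design choice to avoid bouncing the service on a single slow response; as the comment says, it does not address the underlying cause.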


I like to think of the 3 to 5 am outage as an incentive to move back to the West Coast someday.
posted by muddgirl at 12:50 PM on October 4, 2006


Also enable ColdFusion's slow page load log, though I am sure you have :~)
posted by SirStan at 12:52 PM on October 4, 2006


As far as I know it is still running on a PC server in Matt's abode.

Stuff of legends. Here's the entire server history:

December 1998: I take my old 300MHz Celeron home computer to my office at UCLA. Working in a computer group at UCLA means that you're free to run a server on the massive OC12 bandwidth for free as long as it is for personal projects. It runs under my desk as I start playing with ColdFusion. It runs here until 2000.

March 2000: I accept employment at Pyra and they have an ISP in the building and a free T1 connection. I put the box into my car and drive up to SF and put it online. It stays here until May of 2001 (after a month or so, I upgrade the server with a monster tower my dad actually pieces together with a friend)

May 2001: In the middle of the Kaycee drama, the ashes of Pyra loses the free T1 and delfuego steps in with a free T1 in his NYC apartment. An article about Kaycee comes out in the NYT just as the server is being overnighted to NYC. The page saying the server is moving runs off my personal PC on my home DSL for all of 24 hours. The server sits in NYC for 2 years or so.

Sometime in 2003: Server moves to Boston along with delfuego. Eventually I upgrade it to a dual Athlon box.

Oct 2004: I finally start treating mefi like a real business and I get two dual Xeon servers at servermatrix.com, hosted in racks somewhere in Texas. It has been there ever since, with one upgrade each to the db server and web server. To pay my hosting bills (which are now over a grand a month), I re-enable signups for a $5 one-time cover.

Oct 2006: I'm getting a beefy Linux box set up at servermatrix and a member here is helping admin the box and build up some failover. All the ColdFusion developer lists I'm on do say that CF runs faster and more stable on Red Hat, so I'm setting that up now and hoping to move a few of the subsites to the new box as a test, before moving the whole thing over and wiping out the Windows web server.
posted by mathowie (staff) at 1:02 PM on October 4, 2006 [13 favorites]


I have been trying to post a comment for 15 minutes or so that nevertheless your accomplishment is pretty damn impressive for so little cost to the members, and it is.
posted by caddis at 1:30 PM on October 4, 2006


October 29, 2010 - MeFiNet comes online. Nuclear war follows.
posted by blue_beetle at 1:37 PM on October 4, 2006 [1 favorite]


ColdFusion is, like, the RealPlayer of scripting languages. That's why.
posted by reklaw at 1:47 PM on October 4, 2006


Your going to run SQL Server on Red Hat? Or is the database box staying on Windows?
posted by timeistight at 1:49 PM on October 4, 2006


I blame today's slowdown on the Woz sighting in Ask.
posted by SteveInMaine at 2:03 PM on October 4, 2006


And when metafilter goes down, people flock over to metachat and then that goes down.

which makes the phrase "pay it forward" seem ironic and a little bit sad.
posted by seanyboy at 3:11 PM on October 4, 2006


Your -> You're
posted by timeistight at 3:14 PM on October 4, 2006


I tweaked a couple things and now it seems downright zippy, but maybe everyone on the east coast finally went home from work and lightened the load.
posted by mathowie (staff) at 3:22 PM on October 4, 2006


And when metafilter goes down, people flock over to metachat and then that goes down.

In light of that, I'm proud to announce MetaMetaChat, a very formal place for strictly controlled discussion of MetaChat outages.
posted by dersins at 3:27 PM on October 4, 2006


In light of that, I'm proud to announce MetaMetaChat, a very formal place for strictly controlled discussion of MetaChat outages.

...seems to be down...
posted by timeistight at 3:35 PM on October 4, 2006


Matt, thanks for the history lesson. Very interesting stuff, especially for relative noobz like me.
posted by snsranch at 3:46 PM on October 4, 2006


No fate.
posted by It's Raining Florence Henderson at 3:55 PM on October 4, 2006


hosted in racks somewhere in Texas
posted by matteo at 4:10 PM on October 4, 2006


I'm working with some sysadmin members about the possibility of moving the web server to something a bit more robust and failsafe.

Dual Xeons and a fat pipe could more than handle the hits if you could just use a decent web engine.

All the Coldfusion developer lists I'm on do say that CF runs faster and more stable on redhat, so I'm setting that up now...

Nooooooo...
posted by Civil_Disobedient at 4:36 PM on October 4, 2006


Is there a cache for the static content, like for non-logged in users?

That would be easier to do on Linux, I think.
posted by smackfu at 4:46 PM on October 4, 2006
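
To make the caching question concrete, here is a minimal sketch of the idea, assuming pages for logged-out visitors are identical for everyone and can be reused for a short time. The class name, the `render` callback, and the 60-second lifetime are hypothetical; nothing in the thread says how the site actually renders pages.

```python
import time


class AnonymousPageCache:
    """Keep rendered HTML for logged-out visitors for a short time."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store = {}    # path -> (expires_at, html)

    def get_page(self, path, logged_in, render):
        # Logged-in users may see personalized pages, so never cache those.
        if logged_in:
            return render(path)
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the database entirely
        html = render(path)
        self._store[path] = (now + self.ttl, html)
        return html
```

With something like this in front of the renderer, a crawler requesting the same thread repeatedly within the lifetime triggers one render instead of one per request. The trade-off is that logged-out readers may see a page up to a minute stale.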


All the Coldfusion developer lists I'm on do say that CF runs faster and more stable on redhat, so I'm setting that up now...

Followed by a wholesale conversion to a PHP-based solution? Actually, I know very little about PHP, but I figure I'd be run out of town if I dropped my credentials as a (*gasp!*) .NET developer and suggested that conversion....

Then again, it worked for MySpace.

Matt, thanks for all you do. If you ever do convert MeFi to .NET and need any development/optimization help, I'd gladly repay my years of enjoyment of the MeFi community you've fostered. :)
posted by Brak at 5:16 PM on October 4, 2006


Matt: You do know that you can limit [the observant/respectful] web bots to a certain number of hits within a given timeframe, right? If 30% of your traffic is web crawling bots, I'd say now's a good time to consider that.

Google for robots.txt to find out more...
posted by twiggy at 6:44 PM on October 4, 2006
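
For anyone curious what that looks like, below is a small sketch using Python's standard `urllib.robotparser` to read a sample robots.txt the way a polite crawler would. The rules themselves are made up for illustration (the 10-second delay and the `/search` path are assumptions, not the site's real file), and `Crawl-delay` is a non-standard directive that only some crawlers honor; Googlebot, for one, ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Seconds a compliant bot should wait between requests.
delay = rules.crawl_delay("ExampleBot")

# Whether a compliant bot may fetch a given path.
may_fetch_search = rules.can_fetch("ExampleBot", "/search")
may_fetch_front_page = rules.can_fetch("ExampleBot", "/")
```

As the comment notes, this only restrains bots that choose to respect the file; misbehaving crawlers would need to be throttled or blocked at the server.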


Any thoughts of rewriting the system with the same DB schema in a language that doesn't suck?
posted by delmoi at 7:02 PM on October 4, 2006


Yeah, delmoi, moving to linux would mean moving to php/mysql would be the next steps. This is a stopgap measure before I move the db server to linux/mysql and the webserver from cf to php. It's something that would take quite a while, probably in stages as each subsite is converted.
posted by mathowie (staff) at 7:05 PM on October 4, 2006


Any thoughts of rewriting the system with the same DB schema in a language that doesn't suck?

Woo! Total rewrite! That always works wonders.
posted by smackfu at 7:39 PM on October 4, 2006


No, he said a language that doesn't suck - not php.
posted by mock at 8:29 PM on October 4, 2006


The Woz sighting, as well as the 200+ post mystery thread probably have a lot to do with the suckage today.

I appreciate your efforts, Matt.
posted by Mr. Gunn at 10:02 PM on October 4, 2006


Biggest of props, Matt.
posted by strawberryviagra at 1:01 AM on October 5, 2006


No, he said a language that doesn't suck - not php.

So what language would you use? I was always under the impression that PHP+MySQL contained the least amount of suck.
posted by lemonfridge at 1:19 AM on October 5, 2006


No, it contains the possibility of huge amounts of suck, but for a simple message board like this, it's definitely the right way to go.
posted by Civil_Disobedient at 3:47 AM on October 5, 2006


PHP is so horrifically insecure that anyone planning on using PHP for anything should be forced to use Windows 3.1 for a year as penance.
posted by cmonkey at 3:57 AM on October 5, 2006


That was a fascinating potted history of the life and servers of MeFi. I hadn't realised how expensive the hosting bills were - thank you, as ever, for all you do for the community.
posted by greycap at 4:37 AM on October 5, 2006


Oh my. Double-plus developer love: a post about how the current application sucks and a flame war about tools instead of approach.
posted by yerfatma at 5:34 AM on October 5, 2006


I didn't mean to say it sucks at all, FWIW. Sometimes not being able to access mefi is a feature, not a bug.
posted by empath at 6:34 AM on October 5, 2006


3 to 5 am is generally an "everyone's asleep" outage.

I think of it more as an early morning chance for UK knowledge workers to try and correct the productivity imbalance but then I load up Dicewars and wait...
posted by srboisvert at 10:41 AM on October 5, 2006


PHP is the visual basic of our time. It allows semi-trained people to create useful applications which solve their business needs. None of this code is of the sort that you'd want to maintain even if you were being paid. It's great if your job is finding security vulnerabilities though.

It's not that it's impossible to create good code in PHP, clearly a great programmer can create great software despite the shortcomings of any tool. However the odds are against you being that programmer (for whatever value of 'you').

Personally I quite like the Catalyst Framework. Apparently the SomethingAwful forums are being rewritten to use it. You could always drink the DHH koolaid, and there are some nice frameworks (Django and TurboGears) in Python if you find some of the idiosyncrasies of Perl objectionable.

Also, while I'm on the tool hate: MySQL sucks. Either use SQLite or PostgreSQL, depending on the feature set you need. There is no good reason to continue using MySQL.
posted by mock at 6:30 PM on October 5, 2006


The hamsters are only human, after all.
posted by oxford blue at 6:38 PM on October 5, 2006

