Server log retention policy? February 9, 2006 8:45 AM

So, Matt, what are your server log retention policies? This is not a snark, it's a serious question. I'm American, and I don't trust my government.
posted by theora55 to Etiquette/Policy at 8:45 AM (129 comments total)

Matt is subject to subpoena. In addition, activity on this site could make him more likely to be wiretapped, etc. Yeah, I'm in tinfoilhat mode, but the news these days is dismal.

Oh, and I really should have added this to the post. My timing sux0rs.
posted by theora55 at 8:52 AM on February 9, 2006


In addition, activity on this site could make him more likely to be wiretapped, etc.

You're delusional...
posted by SweetJesus at 8:57 AM on February 9, 2006


Are you fucking serious?
posted by smackfu at 9:03 AM on February 9, 2006


Matt is subject to subpoena.
posted by theora55 at 10:52 AM CST on February 9


Matt has always been subject to subpoena. So are you, for what it's worth. Despite that heavily editorialized, sky-is-falling post, Matt is not personally subjected to any more legal liabilities than he has been since the first day this site launched.

It's always good to remember that just because something *can* be done, doesn't mean it *will* be done.
posted by dios at 9:04 AM on February 9, 2006


Matt will not refuse to comply with law enforcement.
posted by Gator at 9:10 AM on February 9, 2006


A member of a government by the worst calls some one else delusional?

I see this as a valid question. We own the copyright to our comments, and are encouraged to help self-govern the site. I don't see why Matt would mind telling us the log retention policy.
posted by ?! at 9:10 AM on February 9, 2006


Dios, please present one administration policy that had been argued to be a slippery slope position prone to abuse before its inception, that later did not in fact get abused.
posted by Balisong at 9:11 AM on February 9, 2006


You're a fool if you ever assume you're not anonymous. Doesn't really matter what Matt's policies are.
posted by smackfu at 9:12 AM on February 9, 2006


(Let's try that again.)

You're a fool if you assume you're anonymous. Given that, it doesn't really matter what Matt's policies are.
posted by smackfu at 9:13 AM on February 9, 2006


I don't think the question was will Matt deliver records if the government asks. Of course he will, and the original poster seems to understand that.

I think the question was how long does Matt keep records since the original poster feels (I'm assuming) that it would be better if Matt kept them as short a time as possible as then he would be unable to deliver that which does not exist.
posted by willnot at 9:15 AM on February 9, 2006


What willnot said.
posted by raedyn at 9:18 AM on February 9, 2006


Balisong, I'm not going to play that game of utter subjectivity because in your mind, everything is "abused." Not a week passes in this place without some new "sky is falling" meme popping up here via an outraged post, so discussing it here isn't going to accomplish anything. It doesn't have anything to do with this thread, so let's try not to get into a derail. This is a thread about Matt's policies about server logs; this isn't a thread about administration policies. You can discuss them in the referenced thread if you would like.
posted by dios at 9:20 AM on February 9, 2006


I don't see why Matt should make it so he can't poke through old logs in cases of abuse or suspected hacking attempts just so we can pretend that this one thing will make us "invisible".

Becoming truly invisible on the internet takes way more effort than getting webmasters to promise to not record your IP or cookie, etc.

If your super net anonymity screen is defeated by this, then it wasn't going to work anyhow.
posted by popechunk at 9:20 AM on February 9, 2006


THE DATABASE IS ETERNAL.
posted by jenovus at 9:20 AM on February 9, 2006


For those of us unfamiliar with technical aspects, what are server logs? What information do they contain? What is their function for the site?
posted by dios at 9:21 AM on February 9, 2006


I am often a fool, assume nothing I do is anonymous, and still think knowing such policies only makes us all better MeFi members.

At a previous job I made sure I would be unable to answer if I was told to reveal what you checked out for the past year. I realize Metafilter isn't a library and Matt needs to make sure no one abuses the site. Yet, how does knowing a policy reduce the site in any way?
posted by ?! at 9:21 AM on February 9, 2006 [1 favorite]


I said 'please' and used a respectful tone... Isn't that your requirements?
posted by Balisong at 9:23 AM on February 9, 2006


Isn't it kind of rude to ask someone you don't know really well about log retention? /freud
posted by Lynsey at 9:25 AM on February 9, 2006


dios, the server logs contain the records of page requests and other activity on the site. Most importantly for these purposes, the logs will record the originating IP address of each page request. With the IP address, an investigating agency may be able to directly determine your identity or obtain that information from your ISP.
posted by monju_bosatsu at 9:27 AM on February 9, 2006


A member of a government by the worst calls some one else delusional?

Huh?

If you think the government has a smokey room full of men monitoring your metafilter postings, waiting for enough evidence so they can start sending subpoenas, yeah, I think you're delusional.

What does that even mean, wiretapping a website? It's a useless term. I mean, IP wasn't exactly built for secrecy, and most everything they could want is already publicly available, maybe with the exception of anonymous ask-mefi postings.

I think the paranoid community around here needs to examine their level of self-importance. None of you guys are Daniel Ellsberg or anything...
posted by SweetJesus at 9:28 AM on February 9, 2006


Balisong: I appreciate respectful tone, but surely you can see that your question is a de-rail. Feel free to e-mail if you would like.

monju_bosatsu: why would Matt want to keep them at all? If they are an indication of the origin of something problematic, sounds to me like Matt should keep them to limit his own liability and protect against abuses.

As a policy matter, I kind of like the idea that people ought to be careful what they say and do on this site. Not keeping track of the logs would be giving encouragement to inappropriate behavior.
posted by dios at 9:32 AM on February 9, 2006


We are all cypherpunks now.

(no seriously, matt. Do you really need to retain server logs?)

As a policy matter, I kind of like the idea that people ought to be careful what they say and do on this site.

Surprise!
posted by sonofsamiam at 9:33 AM on February 9, 2006


Not keeping track of the logs would be giving encouragement to inappropriate behavior.

You better do some more reading on server logs. And give up on the idea of enforcing politeness. If it worked I'd keep punching people until they said "Thank you," but they never do. They just pass out.
posted by yerfatma at 9:37 AM on February 9, 2006


Since Matt has to answer for what ultimately appears, the poster should share some of the burden of responsibility. The poster's burden is eliminated if server logs are nonexistent. I can envision plenty of scenarios in which it would serve the best interest of Matt and the site as a whole that these logs are maintained. We all know that freed from accountability and identifiability, people are more inclined to do things which jeopardize the site and Matt personally.
posted by dios at 9:38 AM on February 9, 2006


On the internet, only the NSA knows you're a dog.
posted by blue_beetle at 9:39 AM on February 9, 2006


yerfatma: if I am not understanding server logs, I apologize. I readily plead ignorance in the matter. But based on what monju_bosatsu said, I am under the impression that they are the only way to identify the source of conduct. Is that true?
posted by dios at 9:40 AM on February 9, 2006


How the government looks for terrorists:

posted by blue_beetle at 9:42 AM on February 9, 2006


What is the worry, except for perhaps anonymous AskMe which I think he deletes after a few weeks? (Someone please correct me on that.) Everything you post is linkable back to you regardless of his retention policies to the extent you stay an active member. I doubt there is much to be worried about as to what you looked at but didn't comment upon (Google on the other hand). Is there a scenario in which an active member might worry and which would be corrected by deletion of the logs?
posted by caddis at 9:45 AM on February 9, 2006


monju_bosatsu: why would Matt want to keep them at all? If they are an indication of the origin of something problematic, sounds to me like Matt should keep them to limit his own liability and protect against abuses.

Not retaining server logs is often viewed as a de facto admission of guilt. It's the default behavior for every major webserver to log all activity and cycle those logfiles when they become too large. They're generally useful for rooting out serious attackers slamming the server with bogus requests, and finding people trying to dodge bans with sock puppets or otherwise dick around with the site. To not have them on hand given the degree to which they are standard operating procedure and also very useful looks very, very suspicious.

That having been said, I hope Matt implements a simple weekly cron job/Windows script to clear out any logs older than one month. Doing so creates the illusion of cooperation and enables him to use the excuse that 'they were using too much space' (logfiles gzipped at maximum compression take up virtually no space, in truth, but compressing logfiles as they are rotated out is NOT default behavior).
posted by Ryvar at 9:45 AM on February 9, 2006
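
The retention job suggested above could look roughly like the following. This is only a sketch: the function name `purge_old_logs`, the `*.log*` filename pattern, and the 30-day default are assumptions for illustration, not anything Matt or Apache actually uses. It deletes by file modification time and does not securely erase anything.

```python
import os
import tempfile
import time
from pathlib import Path

def purge_old_logs(log_dir, max_age_days=30, pattern="*.log*"):
    """Delete files in log_dir matching pattern whose mtime is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(log_dir).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()  # plain delete, not a secure wipe
            removed.append(path.name)
    return sorted(removed)

# Demonstration in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "access.log.1")
    old.write_text("old entries")
    new = Path(tmp, "access.log")
    new.write_text("recent entries")
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old, (forty_days_ago, forty_days_ago))  # pretend the rotated file is 40 days old
    removed = purge_old_logs(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```

Run from a weekly scheduler (cron or the Windows task scheduler), something like this would keep only the most recent month of rotated files, assuming the server rotates its logs into separate files in the first place.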


dios, yes that's true, hence my one month retention suggestion.
posted by Ryvar at 9:46 AM on February 9, 2006


Server logs provide long term trends and statistics, too. I think it would be a little silly for Matt to dump them or secure delete the stuff.

I can understand the paranoia, but really, this place is like a mild baby soap when it comes to commentary that would warrant Government sanctioned "wiretapping" or server log retrieval... and the rest. This is not a Bittorrent site, Terrorist training manual YouSendIt place, or some other dubious operation -- it's something entirely different.
posted by gsb at 9:46 AM on February 9, 2006


dios: server logs are not reliable for identifying the source of a troublemaker. Metafilter is still accessible through anonymizing proxies.

The identifying information can also be stored in such a way (hashing) that allows the administrator to see if the same IP is connecting under multiple usernames and any other administrative needs like long-term usage, but still protects users' privacy.

And unless there is reason to suspect that REAL LIVE TERRORISTS are MeFier's there is no need for IP information to ever be disclosed.
posted by sonofsamiam at 9:49 AM on February 9, 2006


dios writes "This is a thread about Matt's policies about server logs; this isn't a thread about administration policies. You can discuss them in the referenced thread if you would like."

MetaFilter: self-policing dios-policed since 2006.

Incidentally, when I originally asked the question, it wasn't meant as snark -- it's a sincere question, and I sincerely hope that Matt has a minimum retention time policy. I also hope he'll allow users to change their usernames, and not retain records of the previous username.

It really doesn't matter if you can trust the current government or not: governments change, sometimes quite quickly. On 14 May 1940, nobody would have called the Dutch government's retention of data on the Frank family threatening. On 15 May 1940, the Dutch government and all its records -- including those on Anne Frank and her family -- were surrendered to its Nazi conquerors.

Any data retained is data at risk of being misused by a hacker or a government; therefore, only necessary data should be retained. There's little need for Matt to retain users' IP addresses for longer than, say, a week.
posted by orthogonality at 9:50 AM on February 9, 2006


If someone with some apache knowledge tells me how to turn off server logging, I'll do it. They've gotten so large that I don't have the spare cycles to even look at them anymore and haven't crunched a single log file in six months.

Seriously, apache logs are out if anyone knows the directive to turn them off in the httpd.conf file.
posted by mathowie (staff) at 9:54 AM on February 9, 2006


I also hope he'll allow users to change their usernames, and not retain records of the previous username.

It can all be part of the Bright Shiny New Day policy.

I'd start over.. Dios could, too.

You never know.. we could become online lovers.
posted by Balisong at 9:56 AM on February 9, 2006


On 15 May 1940, the Dutch government and all its records -- including those on Anne Frank and her family -- were surrendered to its Nazi conquerors.

How on Earth did a thread about server logs get Godwined?
posted by Gamblor at 9:58 AM on February 9, 2006


dios: Server logs can contain just about any information you can imagine about a visitor to your site and what they did while they were there.

*From this IP address, using this type of browser, on this date, at this time, this person accessed my index.php file.

*From this IP address, using this type of browser, on this date, at this time, this person searched my site for the following word.

*From this IP address, using this type of browser, on this date, at this time, this person viewed the image.jpg file.

One of their main functions is to provide data that a program could use to compile statistical information about your website... how many "hits", which browser is the most popular, which file was accessed the most, which search-string did people look for the most, from what other website were my visitors most often referred? They also help server admins and webmasters troubleshoot technical issues.

Matt - just comment them out and restart Apache. The lines that start with CustomLog or AccessLog or ReferrerLog with the VirtualHost entries for your various (sub)domains.
posted by Witty at 9:59 AM on February 9, 2006
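
To make the kind of entry described above concrete, here is a small sketch that pulls the fields out of one line in Apache's "combined" log format. The sample line is invented (documentation IP range, hypothetical path), and the regular expression is a simplification that assumes well-formed entries.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Made-up sample entry, not taken from any real log.
sample = ('192.0.2.10 - - [09/Feb/2006:09:59:00 -0800] '
          '"GET /index.php HTTP/1.1" 200 5120 '
          '"http://example.com/" "Mozilla/5.0"')

entry = parse_line(sample)
print(entry["ip"], entry["time"], entry["request"], entry["agent"])
```

Each line ties an IP address, a timestamp, a requested URL, a referrer, and a browser string together, which is why the logs matter for both statistics and privacy.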


I guess you'd have to comment out ErrorLog too. I mean, if I tried to go to http://metafilter.com/jihad it would show up in that log as an "error" (404).
posted by Witty at 10:02 AM on February 9, 2006


Crazy idea. Stop me if you've heard this one:

Say you want to have the utility of comparing ips from three years ago with recent events (to, say, try to correlate recent server activity with long-past events), but you wish to destroy your actual logs older than a week/month/whatever for privacy reasons, a la Ryvar's explanation above.

When you destroy your logs, you actually put your important information (IP, whatever else) through a one-way hash function, and store the transmuted logs indefinitely.

That way, if you're suspicious about some IP correlation based on behavior from the last week or so, you take that currently-retained log info, throw it through your hash, and compare that against your hashed logs. There's a risk of collision, of course, but with a decent hash the risk will be small and regardless the human agent can make a good guess about what is and isn't a match. Basically, it gives enough information for some detective work.

And should the government ever seize those hashed logs, they can't get the actual IP/etc information out of them, because the hashing destroys that direct correlation.

(There is, of course, the possibility that they would seize the hashing function as well, and thus be able to make some progress at recreating the data by brute-force. I haven't thought of a good solution for that just yet.)

Does anybody do this? Is anything about this idea fundamentally flawed?
posted by cortex at 10:02 AM on February 9, 2006
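
A minimal sketch of the hashed-archive idea, under some assumptions: it swaps in a keyed hash (HMAC-SHA256) for the plain one-way hash described above, and the key, function names, and addresses are all hypothetical. Anyone who obtains both the key and the archive can still test guesses against it, which is the weakness already noted in the comment.

```python
import hashlib
import hmac

# Hypothetical secret key; whoever holds it can test candidate IPs against the archive.
SECRET_KEY = b"replace-with-a-random-secret"

def pseudonymize_ip(ip, key=SECRET_KEY):
    """Map an IP address to a fixed-length token using keyed HMAC-SHA256."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

# Old log entries are kept only as tokens (addresses are from documentation ranges).
archived_tokens = {pseudonymize_ip(ip) for ip in ["192.0.2.10", "198.51.100.7"]}

def seen_before(ip):
    """Check whether a currently logged IP also appears in the hashed archive."""
    return pseudonymize_ip(ip) in archived_tokens

print(seen_before("192.0.2.10"))    # True
print(seen_before("203.0.113.99"))  # False
```

The administrator can still answer "has this address shown up before?" without the archive itself listing any addresses in the clear.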


As a policy matter, I kind of like the idea that people ought to be careful what they say and do on this site.

This is a thread about Matt's policies about server logs; this isn't a thread about your desire to control other people's behavior. You can discuss that in the referenced thread if you would like.

Seriously, apache logs are out if anyone knows the directive to turn them off in the httpd.conf file.

This page has some info about customizing logs for privacy.
posted by Armitage Shanks at 10:06 AM on February 9, 2006


And should the government ever seize those hashed logs, they can't get the actual IP/etc information out of them, because the hashing destroys that direct correlation.

Well, the government could subpoena the hashed logs, the hashing function, and the current unhashed logs. If the offending user hits the site within the retention period of the unhashed logs, the hashing bit doesn't do much good.
posted by monju_bosatsu at 10:07 AM on February 9, 2006


Also, on the whole issue of government surveillance, I find it much more likely that Matt would be subpoenaed by a private entity, seeking information about some user that posted copyrighted material, for example.
posted by monju_bosatsu at 10:10 AM on February 9, 2006


cortex writes "Does anybody do this? Is anything about this idea fundamentally flawed?"


It's a decent idea, but....

IP address space is "only" 256^4 = 2^32 (thus the eventual move to IPv6) or about 4.3 billion possible values, less numbers reserved for internal addresses. Given that Matt's hash would have to be quick enough to run in real time, I'd guess that the NSA could iterate over all possibilities reasonably quickly.
posted by orthogonality at 10:11 AM on February 9, 2006
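
The brute-force concern can be illustrated with a short sketch. It assumes an unsalted SHA-256 over the dotted-quad string (an assumed scheme, not anything the site uses) and searches only a single /24 block so that it finishes instantly; the same loop over all of IPv4 is simply 2^32 iterations.

```python
import hashlib
import ipaddress

def hash_ip(ip):
    """Unsalted SHA-256 of the dotted-quad string (an assumed scheme)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def recover_ip(target_hash, network):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == target_hash:
            return str(addr)
    return None

print(2 ** 32)  # 4294967296 addresses in all of IPv4

# Pretend this hash came from a seized archive (address is from a documentation range).
leaked = hash_ip("198.51.100.7")
print(recover_ip(leaked, "198.51.100.0/24"))  # 198.51.100.7
```

Because the input space is so small, the one-way property of the hash offers little protection on its own once the hashing scheme is known.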


You could also just have a simple script rotate your logs every day at midnight or something... so that you wouldn't have any logs older than a day (in case you needed them at some point, that minute, for troubleshooting or whatever).

Armitage Shanks - some good ideas in there.
posted by Witty at 10:15 AM on February 9, 2006


(logfiles gzipped at maximum compression take up virtually no space, in truth, but compressing logfiles as they are rotated out is NOT default behavior)

It's the default behavior for logrotate under Debian. Under windows (which I believe Matt uses), you have to jump through some hoops to get log rotation to occur at all. cronolog comes in handy here.

To not have them on hand given the degree to which they are standard operating procedure and also very useful looks very, very suspicious.

Or it looks like you don't want to be in a position to have to give them up, have your equipment seized, be deposed, made to testify, and/or subjected to hostile interrogation on the basis of comments posted by others and over which you have no control.
posted by George_Spiggott at 10:15 AM on February 9, 2006


"As a policy matter, I kind of like the idea that people ought to be careful what they say and do on this site."

Thank you, Ari Fleischer.
posted by klangklangston at 10:18 AM on February 9, 2006


dios writes "As a policy matter, I kind of like the idea that people ought to be careful what they say and do on this site. Not keeping track of the logs would be giving encouragement to inappropriate behavior."


dios, are you confused or fascist? Matt can sanction people by disabling a user account; no knowledge of an IP address is required. IP addresses are only useful if the sanctioning involves discovering users' real identities in order to punish them in real life.

Are you really contending that you "like the idea that people [would] be careful what they say and do on this site" out of fear of consequences in their real life?

WHAT
THE
FUCK
DIOS?

Should we worry that you're going to call the cops on us for what we write here? That's one fucking chilling effect dios.

This is rich: dios complains that he's such a martyr because people here "misunderstand" him, and then dios essentially says that he's happier if we all live in a state of fear of real life consequences.

Again: WHAT THE FUCK?
posted by orthogonality at 10:19 AM on February 9, 2006


Ask Steve Jackson Games or Ernie Ball strings about server seizures. They can shut down your entire operation for months.

It is a real concern.
posted by sonofsamiam at 10:20 AM on February 9, 2006


Wow. Thanks for the numerous explanations. It really reminds me of just how tech-savvy so many of the users are here (and how I am woefully inadequate in comparison).

I find it much more likely that Matt would be subpoenaed by a private entity, seeking information about some user that posted copyrighted material, for example.
posted by monju_bosatsu at 12:10 PM CST on February 9


I whole-heartedly agree. There are innumerable civil reasons why someone might want information in addition to copyright, and those subpoenas are much more likely than a criminal reason.
posted by dios at 10:23 AM on February 9, 2006


Matt can sanction people by disabling a user account; no knowledge of an IP address is required.

You need IP addresses if you want to "permaban" people and stop them from re-registering.
posted by smackfu at 10:26 AM on February 9, 2006


Oh, here's a batch file someone's written to manage Apache logfile retention under windows. This would allow you to keep logs long enough to help diagnose very recent problems and abuse but not long enough for anyone to come after you for them -- one day, perhaps?

I haven't tried it myself; on the Windows machines I'm obliged to manage I just use cronolog to generate datestamped files every day, and every now and then just manually throw away old files.
posted by George_Spiggott at 10:27 AM on February 9, 2006


You need IP addresses if you want to "permaban" people and stop them from re-registering.

Assuming it's a dedicated IP address, yes.

And Matt, there's a typo (one of several actually) in my earlier post that should have said ...WITHIN the VirtualHost entries for your various (sub)domains.
posted by Witty at 10:29 AM on February 9, 2006


let me tell you, if i were retaining my logs for a month i would be in some severe pain.
posted by wakko at 10:29 AM on February 9, 2006


As a hypothetical matter, if I was advising someone in Matt's shoes, I would tell him to keep the server logs. There are lots of legal reasons why I would encourage a client to do so in order to defend itself and to avoid any potential problems for not keeping accurate records (e.g., sanctions for permitting spoliation of records).

By running this site, Matt exposes himself to various liabilities. He ought to protect himself the best he can. Maintaining detailed documentation is one way. Holding people accountable for their potentially liability-creating actions is another.
posted by dios at 10:32 AM on February 9, 2006


Matt: What if you ever need to give aggregate statistics to potential advertisers? Weren't IP logs useful in the whole Pretty_Generic thing?

Just deleting server logs doesn't mean you'll get your server back any quicker if the men with balaclavas and MP5s come knocking.

Also, what SweetJesus said.
posted by matthewr at 10:35 AM on February 9, 2006




"I kind of like the idea that people ought to be careful what they say and do on this site"

This is what is wrong with America. People actually think this way. And they aren't even ashamed to say it right out loud.
posted by y6y6y6 at 10:39 AM on February 9, 2006


monju_bosatsu: True, the recent unhashed log window would be a privacy vulnerability -- I guess working out that trade-off (window of administrative response vs. minimizing privacy risks) would be a challenge that'd require a case-by-case decision. Dunno what would make sense for Mefi, as a hypothetical -- how long does Matt need, as an acceptable window, to notice something hinky (and create, perhaps, a temporary one-off copy of the logs for further analysis outside of the standard destruction window)? A day? Two days? A week?

orthogonality: Yeah, that's what I was thinking. However, what about a well-balanced but much smaller domain of hashed values than IPv4 space? Sure, the input is 256^4, but the hash values could be restricted to, say, 64^4. Then we're mapping 4,294,967,296 possible IPs to 16,777,216 possible hash values, leaving the investigator a range of (in this example) 256 possible values for any hashed IP. And the number could of course be increased further by reducing the domain of hash values. The downside is that investigation for Matt becomes more difficult -- a smaller hash domain means a greater chance of collisions on searching -- but I'd be curious whether that'd practically speaking be prohibitive to his search in any given case.

(But then, could the same be said of our hypothetical subpoena dropper? Probably. However, they would be prevented from simply raiding the logs for IPs, which would be something.)

Regardless, it sounds like this has been at least explored before, which warms the very cockles of my CS-nerd heart.
posted by cortex at 10:39 AM on February 9, 2006
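
The arithmetic behind the reduced hash space can be checked with a short sketch. It assumes the coarse value is simply the top 24 bits of a SHA-256 digest (one way to get 64^4 = 2^24 buckets; the comment above does not specify a method), and the 256 figure is an average per bucket rather than an exact count for any particular bucket.

```python
import hashlib

BUCKET_BITS = 24  # 64**4 == 2**24 possible bucket values

def bucket(ip):
    """Keep only the top BUCKET_BITS bits of SHA-256(ip) as a coarse bucket number."""
    digest = hashlib.sha256(ip.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - BUCKET_BITS)

print(64 ** 4)             # 16777216 buckets
print(2 ** 32 // 64 ** 4)  # 256 addresses per bucket on average

# The same address always lands in the same bucket, so repeat visitors still correlate.
print(bucket("192.0.2.10") == bucket("192.0.2.10"))  # True
```

Shrinking `BUCKET_BITS` raises the number of addresses sharing each bucket, which gives users more cover and the administrator more false matches.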


Would someone please explain the benefit of deleting old server logs to me, other than saving hard drive space? As long as any recent logs are available when the subpoena arrives and the person of interest was logged in during that period the IP is known. Are you asking Matt to never keep server logs?
posted by caddis at 10:39 AM on February 9, 2006


orthogonality, are you out of your fucking mind? take a walk or something. You're going waaaaay out on a limb, here, and it sure as hell looks like an unmerited personal attack to me.
posted by shmegegge at 10:40 AM on February 9, 2006


the men with balaclavas and MP5s come knocking.

I read that as "balalaikas" and was trying to make sense of heavily-armed Russian folk musicians raiding server rooms.
posted by cortex at 10:44 AM on February 9, 2006


"baklavas" here.
posted by rxrfrx at 10:45 AM on February 9, 2006


shmegegge writes "You're going waaaaay out on a limb, here, and it sure as hell looks like an unmerited personal attack to me."


How so?
posted by orthogonality at 10:45 AM on February 9, 2006


If someone with some apache knowledge tells me how to turn off server logging, I'll do it.

not a good idea, matt ... if someone is hacking or otherwise abusing the site, you want to be able to tell where it's coming from and what they're accessing

i think two weeks to a month's worth of logging is reasonable for this ... and to comply with any subpoenas you might encounter, although a consultation with a lawyer should be done before you reach a decision

by the way, people, this is a public website and we're speaking in public ... and many of us have email addresses and websites linked in our user pages ... that's something each of us should consider before we say something that we might be uncomfortable with the government, or whoever, finding out who really said it
posted by pyramid termite at 10:47 AM on February 9, 2006


caddis: from a paranoid standpoint, the idea is to reduce the amount of ancillary information that falls into the hands of the party responsible for the subpoena.

On one level it's laudable to keep no logs and thus protect the (IP) identity of the person in question. You're right that a short log window in which that person is present defeats this.

On another level, though, it's laudable to avoid allowing massive amounts of IP/etc information to fall into the hands of an investigating party with questionable motives. Whether or not they express an interest in the IPs of every user at every connection over the entire history of the logs, that is what they would in fact end up with. From a paranoid perspective, that's a bad, bad thing.
posted by cortex at 10:48 AM on February 9, 2006


by the way, people, this is a public website and we're speaking in public ... and many of us have email addresses and websites linked in our user pages ... that's something each of us should consider before we say something that we might be uncomfortable with the government, or whoever, finding out who really said it

Granted, pyramid termite. But that's something the smart paranoid will already have considered; as for the non-paranoids, they don't care.

Hopefully, any stupid paranoids will take this opportunity to become a little bit smarter.
posted by cortex at 10:50 AM on February 9, 2006


...that's something each of us should consider before we say something that we might be uncomfortable with the government, or whoever, finding out who really said it.

See, I keep reading different versions of this same idea, but I just don't get it. Can you or someone else provide a hypothetical example of what you think someone might want to say on this website, for example, but not want the government to know about for fear of being tracked down and... whatever happens next?
posted by Witty at 10:51 AM on February 9, 2006


Matt has no obligation to keep any kind of server logs, any more than you or I have to write down who we meet and talk to every day. As a result, keeping them actually exposes him to *more* liability.
posted by felix at 10:54 AM on February 9, 2006


Isn't log cleaning moot anyway? I was under the impression that Matt was saving IP addresses in the database for each comment posted (or at least for each login).

Is that not true? And, since there's also a time stamp, all anybody needs to do is get access to the database.
posted by willnot at 10:54 AM on February 9, 2006


dios writes ", I would tell him to keep the server logs. There are lots of legal reasons why I would encourage a client to do so in order to defend itself and to avoid any potential problems for not keeping accurate records (e.g., sanctions for permitting spoliation of records)."

Geez dios, Matt's running a web site, not a publicly traded bank or something. There is no expectation of record keeping. The US government can't even make libraries keep track of who is signing out books.
posted by Mitheral at 10:55 AM on February 9, 2006


It really doesn't matter if you can trust the current government or not: governments change, sometimes quite quickly. [...] Any data retained is data at risk of being misused by a hacker or a government; therefore, only necessary data should be retained. - orthogonality

Here in Canada, I've heard a few gay people express hesitation about becoming legally married for this very reason. They're concerned about becoming registered homosexuals.
posted by raedyn at 10:57 AM on February 9, 2006


rxrfrx : thank you kindly. it would be like forcing a cinderblock through a crazy straw otherwise.
posted by wakko at 10:59 AM on February 9, 2006


You need IP addresses if you want to "permaban" people and stop them from re-registering.

There's no reason to suspect that the offender will continue with the same IP. There are numerous ways to get different IPs.
posted by sonofsamiam at 11:00 AM on February 9, 2006


Witty: you have to understand that it's a matter of principle. You're not a paranoid, and neither am I, and because of that we aren't really that concerned about it. But if you're willing to consider the worst case scenario -- that an agent with some power and no scruples wishes to commit unethical deeds with no concern for your welfare -- then perhaps you can see why folks would worry about, if not any specific likely instance, certainly the possibility of that sort of thing happening. And, reflexively, of the ways in which such a scenario can be prevented.

(And that's a key idea, here: if you can prevent a possible abuse without significantly hampering the quality of the experience being modified, why not?)

A hypothetical, though: someone says "the government kidnapped John Nguyen on unfounded suspicion of terrorism, and I haven't seen him since and they're covering it up." The government's agents see this and desire to cover up this leak of their coverup. Subpoena, ip, local geographic search, soft target located, dot dot dot.

That's pretty over the top, but from an absolute perspective, especially to a cynical paranoid, dangerously possible. You can create much less dramatic hypotheticals that would nonetheless be of reasonable concern to folks.
posted by cortex at 11:00 AM on February 9, 2006


First they came for the logfiles, and I didn't speak up, because I wasn't a logfile. So they took the logfiles, which were really freakin' huge, and didn't really contain any useful information anyway. Then nothing happened, because who has the time to sort through all that shit, anyway. But somewhere, some government flunky is laughing his ass off about the pissing elephant.
posted by the shitty Baldwin at 11:01 AM on February 9, 2006


dios writes "As a hypothetical matter, if I was advising someone in Matt's shoes, I would tell him to keep the server logs."

Even though up until a few minutes ago, you had no idea what server logs were. I would advise the opposite, or keeping logs for a minimal time like three days (which is plenty long enough for a site with traffic like this one gets), knowing the technical details intimately as well as the legal history. Having the logs is a much bigger liability, as it makes you a target for those who would mine that data, whether it be the government or a private party. The site and the world will keep going just fine without logs.
posted by krinklyfig at 11:01 AM on February 9, 2006


(And, as others have pointed out, you can easily replace "subpoena" with "forced entry" -- physical or digital. And the implications don't get any prettier at that point.)
posted by cortex at 11:03 AM on February 9, 2006


raedyn writes "Here in Canada, I've heard a few gay people express hesitation about becoming legally married for this very reason. They're concerned about becoming registered homosexuals"


Yes. It's the same reason Visa and MasterCard don't allow businesses (web based or otherwise) to retain customers' credit card numbers; the credit card companies know that retained data will eventually be hacked or misused, but that unretained data can't be misused.

But as the Diebold voting machine scandal shows (Diebold claims it's technologically infeasible to give receipts to voters using its voting machine; Diebold also manufactures ATMs that do give receipts), Americans care much more about their money than about their civil liberties.
posted by orthogonality at 11:04 AM on February 9, 2006


People here are confusing a lot of concepts, so I guess I should explain some stuff.

Here's what I have in the way of sensitive material:
1. apache server logs -- virtually anonymous, takes some effort to tie them to particular users
2. contributions to the site with IP addresses recorded in the database.

Now, someday, let's say the Secret Service knocks on my door wanting to know about an Iranian blogger that is trying to score nuclear secrets or something. The server logs are virtually useless. The database might contain an ask mefi question about technical stuff that would help them and is easily retrieved, along with the IP they used. By law, I would have to turn that kind of information over or shut down the site and drag my lawyer into a prolonged fight.

Keep in mind there's nothing sensitive here, being that everything submitted on the site is public and easily seen by search engines and people browsing the web. If someone wanted to track a user, they can do it already.

Now, I know that apache logs can help you track down problems and various hacks, but I actually haven't had to use them for years, since all the internal attacks are easily tracked in the database itself. So my inclination is that to someday do web stats monitoring, I could use something like hitbox or google analytics, but I would be happy to dump the logs.

At the moment, all the CustomLog entries in apache are commented out on virtual hosts, but everything seems to go back to a single log file for all sites on the box. I can't quite figure out how to turn off all logging by apache, which is why I asked.

In general, this is much ado about nothing. Apache server logs are almost useless to the point that I don't even touch them. And if the SS really wanted a glimpse of my database, they would get it, because I couldn't handle taking the site down for six months while I fight a costly court battle.
posted by mathowie (staff) at 11:05 AM on February 9, 2006


Can you or someone else provide a hypothetical example of what you think someone might want to say on this website

hypothetical? ... i can do better than that
posted by pyramid termite at 11:05 AM on February 9, 2006


Wait, you do store IP information for every comment in the database? Huh.
posted by cortex at 11:09 AM on February 9, 2006


Keep in mind there's nothing sensitive here, being that everything submitted on the site is public and easily seen by search engines and people browsing the web. If someone wanted to track a user, they can do it already.

The issue isn't tracking the user, it's identifying the user. But in any case, if the IPs are in the database, then yeah, the server log issue is a bit of a red herring.
posted by monju_bosatsu at 11:11 AM on February 9, 2006


Standard best practice is to keep extraneous data for the shortest period that complies with all legal requirements.

If Mefi was a bank, Matt might have to worry about Basel II, Sarbanes-Oxley, Gramm-Leach-Bliley, VISA CISP and god knows how many other things.

As it stands, though, MeFi is essentially unregulated, and Matt is well-served to store only that which is essential towards running the site. You can't be compelled to provide that which does not exist.
posted by I Love Tacos at 11:12 AM on February 9, 2006


Mmmmm, baklava.
*drool*
posted by deborah at 11:13 AM on February 9, 2006


This is just one big red herring. Matt should keep (or at least retain the option to keep) some server logs to fight hackers etc. and once he does no member who keeps posting is anonymous. Once they know who you are who cares where you accessed from? I am probably more concerned about these kinds of privacy issues than most, but once they have your identity the other stuff just seems trivial. Principle is nice, but if it provides only meaningless comfort why bother.
posted by caddis at 11:14 AM on February 9, 2006


mathowie writes "And if the SS really wanted a glimpse of my database, they would get it, because I couldn't handle taking the site down for six months while I fight a costly court battle."


Yeah, I think that's what Rosa Parks said.
posted by orthogonality at 11:16 AM on February 9, 2006


2. contributions to the site with IP addresses recorded in the database

Well keeping IPs in the database isn't a good thing. Isn't it enough to just record the username? That's exactly the sort of information I'd think should be destroyed after a certain amount of time. That sort of audit logging is indeed ripe for abuse by the authorities--especially if it goes back forever.
posted by nixerman at 11:18 AM on February 9, 2006


(Of course, the whole hashing concept could be applied to the IPs stored in the database, as well.)

caddis -- the (hypothetical) idea is to find some way to provide a good compromise of hacker-fightin' short-term logs and privacy-protectin' log destruction. If you put that compromise into practice, you're making use of the principle and it isn't just meaningless comfort.
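
The hashing idea mentioned above could look something like the following. This is only a minimal sketch, not how MeFi's database actually works: the function name `pseudonymize_ip` and the secret key are hypothetical, and it assumes the site only needs to tell whether two comments came from the same address, not what the address was.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of an IP address.

    The same IP and key always give the same digest, so repeat
    offenders can still be matched without storing the raw address.
    """
    return hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration; a real one would be random and kept off the box.
key = b"replace-with-a-random-secret"

a = pseudonymize_ip("192.0.2.10", key)
b = pseudonymize_ip("192.0.2.10", key)
c = pseudonymize_ip("192.0.2.11", key)
```

Here `a` and `b` match while `c` differs. The IPv4 address space is small, so anyone holding the key can recover addresses by trying them all; the privacy gain comes from rotating the key and throwing the old one away, which makes older digests unlinkable.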
posted by cortex at 11:19 AM on February 9, 2006


Yeah, I think that's what Rosa Parks said.

Sometimes I wish I lived in orthogonality's universe, where every message board host on the web was Rosa Parks fighting the Nazis. Mostly though, it just makes me tired.
posted by monju_bosatsu at 11:21 AM on February 9, 2006 [1 favorite]


1. apache server logs -- virtually anonymous, takes some effort to tie them to particular users.

Depending on how you have the logs formatted. Mine, for instance, make note of the username the person is signed in with on every log entry. If the person isn't a member or not signed in, then that part of the log entry is blank. But anyway...

...but everything seems to go back to a single log file for all sites on the box. I can't quite figure out how to turn off all logging by apache, which is why I asked.

There should be one more CustomLog, ErrorLog, etc. further up in the file (usually), before the VirtualHost entries (which are usually towards the bottom of the file).
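
One way to hunt for that stray global directive is a small script. This is a rough sketch under assumptions: the helper name `global_log_directives` is made up, it only understands `<VirtualHost>` blocks (not `Include`d files or other containers), and the sample config is invented for illustration.

```python
def global_log_directives(conf_text: str) -> list[tuple[int, str]]:
    """List active CustomLog/ErrorLog/TransferLog lines outside <VirtualHost> blocks."""
    log_directives = {"customlog", "errorlog", "transferlog"}
    found = []
    depth = 0  # how many <VirtualHost> blocks we are currently inside
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        lower = line.lower()
        if lower.startswith("<virtualhost"):
            depth += 1
        elif lower.startswith("</virtualhost"):
            depth = max(depth - 1, 0)
        elif depth == 0 and lower.split()[0] in log_directives:
            found.append((lineno, line))
    return found


# Invented sample config, not the real httpd.conf from this thread.
sample = """\
ServerName example.test
CustomLog logs/access_log common
<VirtualHost *:80>
    # CustomLog logs/site_access_log combined
    ErrorLog logs/site_error_log
</VirtualHost>
"""

hits = global_log_directives(sample)
```

For the sample above, `hits` holds only the server-wide `CustomLog` on line 2, which is the kind of entry that would keep feeding a single log file even with every per-host entry commented out.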
posted by Witty at 11:22 AM on February 9, 2006


I'd look towards the geek sites for example solutions to these problems. I'm quite sure that Slashdot has an answer for fighting hackers/abuse without storing personally identifiable information.
posted by I Love Tacos at 11:22 AM on February 9, 2006


Cortex - Once you have a short term log you have identities (at least by IP address which is generally not hard for the authorities to turn into an actual identity) so there is no good compromise which will protect your identity. Given that Matt keeps an IP address / username database the whole issue is moot. If I am the G-Man that is what I want. With that I find out who you are and send some goons nice agents to go have a talk with you.
posted by caddis at 11:25 AM on February 9, 2006


"Yeah, I think that's what Rosa Parks said."

Yeah, she said it to HITLER!

Do you listen to yourself, or is the sound of jerking off too loud?
posted by klangklangston at 11:33 AM on February 9, 2006


And if the SS really wanted a glimpse of my database, they would get it, because I couldn't handle taking the site down for six months while I fight a costly court battle. - mathowie

Yeah, I think that's what Rosa Parks said. - orthogonality

Seriously? Geez. Double geez. Or, what monju_bosatsu said.
posted by raedyn at 11:35 AM on February 9, 2006


fm is on a roll !
posted by sgt.serenity at 11:38 AM on February 9, 2006


If it's not too much trouble, I'd like all my browsing history and etcetera to all point to goat.cx and tubgirl. Let's give the feds an eyeful of policy.
posted by five fresh fish at 11:40 AM on February 9, 2006


Rosa Parks fighting the Nazis

I heard that was going to be the plot of the sequel to The League of Extraordinary Gentlemen. It would have also featured Ian Fleming, Gene Roddenberry, and Oscar Wilde as the plucky comic relief.
posted by Gator at 11:41 AM on February 9, 2006


that sort of threat to the mental well being of our agents is getting you investigated fff
posted by caddis at 11:42 AM on February 9, 2006


cortex: Wait, you do store ip information for every comment in the database? Huh.

Not to single you out cortex, but I don't know why that wasn't totally obvious. The Pretty_Generic thing was certainly less than six months ago (time doesn't go by that fast does it?), and mathowie was very specific when he said he doesn't use apache logs.

Anyway, the less information available to jack-booted thugs the better, as far as I'm concerned. On the other hand, I think people should post their real name on their profile page.

If we would all just join politically subversive organizations, threaten to kill politicians, and violate any and all intellectual property law, the authorities would be overwhelmed and we'd have nothing to worry about.
posted by Chuckles at 11:43 AM on February 9, 2006


If a talented network engineer wanted to divine who you are (for the most part) based on your metafilter posting, all it would take is nmap and the willingness to do it. If the feddie govs wanted to do, they could subpoena AT&T or MCI and listen to the traffic on one of the backbones, and Matt wouldn't even know...

No amount of policy or log deletion will make you anonymous on the internet, so the gnashing of teeth about it in this thread is really quite telling.

If you want to be anonymous, use several proxy servers or the TOR network, and even then you aren't "truly" anonymous...
posted by SweetJesus at 11:43 AM on February 9, 2006


Addendum - As a member of the greater paranoid community, it is your responsibility to take measures to protect the anonymity of your identity, not the other way around. There are many client side software packages designed to help make you more anonymous, and they have the added advantage of not requiring someone else's time and energy to protect you from your invisible enemies...
posted by SweetJesus at 12:08 PM on February 9, 2006


Cortex - Once you have a short term log you have identities (at least by IP address which is generally not hard for the authorities to turn into an actual identity) so there is no good compromise which will protect your identity.

caddis -- I understand what you're saying, but I'm not saying the same thing. Here is where I think we disagree (and I'm arguing this from the hypothetical case where this information isn't stored elsewhere, as it apparently is in the db in Mefi's case):

Having a short term log does, indeed, reveal the identity of anyone appearing in that window of time, but it protects those who are not within that window. The shorter the window, the fewer identities revealed, which is why the window ought to be as short as is usable. Disposing of older logs destroys identifying information.

In this case, it's not about protecting the identity of anyone active within the log window -- only eschewing all logging would do that. It's about protecting the identity of everyone else. If I've commented recently enough, yes, I'm boned, but Migs in his absence remains (so to speak) anonymous. Compare that to logs-in-perpetuity -- everyone is identified. That is the distinction I'm making: some vs. all, not none vs. some.

Chuckles: you are right -- that IPs are logged in the db is obvious when reasoned out from Matt's admission that he doesn't use the apache logs at all. I hadn't made the connection. I may have paraconsciously declined to make the connection simply because it's not how I would (and have) done that sort of thing -- I would use the logs, and reconstruct IP-to-comment information dynamically when investigating. Not because it's explicitly a better or easier way, but just because that's how it occurs to me to do it.

As a member of the greater paranoid community, it is your responsibility to take measures to protect the anonymity of your identity, not the other way around.

Very true. A paranoid is not, however, prohibited from taking measures to attempt to increase within those sites he frequents the degree of conformance with his paranoiac guidelines.
posted by cortex at 12:16 PM on February 9, 2006


be careful what they say and what you do, on this site and in real life, or you'll be investigated, and punished. only those who have something to hide worry about being searched, after all. it's a long war. civil rights are obsolete, and probably dangerous.
posted by matteo at 12:38 PM on February 9, 2006

(logfiles gzipped at maximum compression take up virtually no space, in truth

That's the funniest thing I've read all day. Thanks.

(In case that's a little too obscure for the non-technical crowd, a single webserver on a machine running multiple web servers, on a busy website (Alexa ranking around 1500) can generate gzipped or bzipped files that are approximately 10 MB per day. In four years, that server accumulated a total of 6 GB of logs, including sys, error, and access logs. This is one server in a pool of more than 20 servers handling web services. Metafilter is ranked about 3500 on Alexa, has fewer servers, and significantly reduced revenues to handle these things. Despite its smaller size, log maintenance can become an expensive, time-consuming task, so just rotate all logs, delete them when they get n days old if they are not necessary for development. Set it and forget it. Send an email if you need help.)
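
The "delete them when they get n days old" step can be sketched in a few lines. This is an illustration only: `delete_old_logs`, the `*.log*` filename pattern, and the three-day cutoff are assumptions, and the demo runs against a throwaway temp directory rather than a real log directory.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_old_logs(log_dir: str, max_age_days: float, pattern: str = "*.log*") -> list[str]:
    """Delete files matching pattern whose mtime is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(log_dir).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demo in a temporary directory: one rotated log aged 10 days, one fresh log.
with tempfile.TemporaryDirectory() as d:
    old = Path(d) / "access.log.1.gz"
    new = Path(d) / "access.log"
    old.write_text("old")
    new.write_text("new")
    ten_days_ago = time.time() - 10 * 86400
    os.utime(old, (ten_days_ago, ten_days_ago))  # backdate the rotated file

    removed = delete_old_logs(d, max_age_days=3)
    remaining = sorted(p.name for p in Path(d).iterdir())
```

Run from a daily scheduled job, something like this keeps only the retention window on disk. Most systems already ship a log rotation tool that does the same thing, so a script is only worth it where that isn't available.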
posted by sequential at 12:39 PM on February 9, 2006


sequential, there's only one webserver and a daily log can be 150MB. I just deleted 50GB of logs and it turns out I stopped saving apache logs back in December. I also set a daily "delete current log files" routine to the web server, just to be sure.
posted by mathowie (staff) at 1:11 PM on February 9, 2006


Well I appreciate it, mathowie. Even if IP info is in the database, there is further identifying information in server logs that can be used to (surprisingly effectively) reconstruct RL identities.
posted by sonofsamiam at 1:17 PM on February 9, 2006


sequential I ran a server with one tenth MeFi's traffic in terms of hits for a year. Our total logfiles after gzip -9 crunched down to 25MB for the entire year. It stands to reason that Metafilter would then be simply an order of magnitude more than that, give or take (it serves fewer image files than my site did). 250MB a year is effectively nothing.
posted by Ryvar at 1:23 PM on February 9, 2006


sequential, there's only one webserver and a daily log can be 150MB.

Jesus H. Christ.
posted by Ryvar at 1:26 PM on February 9, 2006


Wow... having some of the paranoid (and likely violent) psychotics in here rounded up would almost be worth the US becoming fascist.
posted by Krrrlson at 3:27 PM on February 9, 2006


That's pretty much all of us, Krrrlson
ONE OF US! ONE OF US!
posted by Ryvar at 3:34 PM on February 9, 2006


PARANOIDS DO IT EXTREMELY CAUTIOUSLY

HONK IF YOU'RE SURVEILING ME COVERTLY

posted by cortex at 3:39 PM on February 9, 2006


Yes, but what about waxy's MeFi stats? That's the important question here.
posted by stavrosthewonderchicken at 4:07 PM on February 9, 2006


sweetjesus: Didn't mean to confuse you. I just thought anyone who made a site called kakistocrat is announcing he is a member of a kakistocracy.

I know the point was lost as everyone started assuming concerned people are delusional, paranoid, or violent, but I was just looking to know what was the policy. Matt answered. Thanks, Matt.
posted by ?! at 4:29 PM on February 9, 2006


I know the point was lost as everyone started assuming concerned people are delusonal, paranoid,
or violent.

posted by Krrrlson at 4:57 PM on February 9, 2006


blue_beetle writes "How the government looks for terrorists"


Thank you for the fond memories of the dear, departed PLIF.

</digression>
posted by ChrisR at 5:05 PM on February 9, 2006

sequential, there's only one webserver and a daily log can be 150MB.

Compressed or uncompressed?

sequential I ran a server with one tenth MeFi's traffic in terms of hits for a year. Our total logfiles after gzip -9 crunched down to 25MB for the entire year.

(I'm sure you know this, Ryvar, I'm just pointing this out for the benefit of others.)

Apache logs are fully customizable, so it's generally hard to compare them unless you're comparing similar httpd confs of similar versions of Apache, but there are even problems with that. For example, by default, Apache will log every GET query string. Some web applications use immense query strings, dangerously I might add because web browsers vary in how much query string they can handle. You can have thousands of characters on a query string, which turns into several KB per hit. Multiplied by tens of thousands of hits, you get big numbers. The same setup, using POST, will generate several orders of magnitude smaller log files.

In addition, Alexa's traffic rank algorithm counts unique visitors, not page views. How many people here reload threads to get new comments? Click on multiple internal links on the same visit? Preview their own comments numerous times? I'm reasonably certain that Metafilter has a higher total page load than the site I was referring to, but has significantly fewer unique visitors.
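
To put a rough number on the query string effect, here is a small sketch. Everything in it is made up for illustration: the helper `strip_query`, the 200-field request, and the URL. It only shows how much of a logged request line can be query string.

```python
def strip_query(request_line: str) -> str:
    """Drop the query string from a request line like 'GET /path?x=1 HTTP/1.1'."""
    parts = request_line.split(" ")
    if len(parts) == 3:
        method, target, version = parts
        target = target.split("?", 1)[0]  # keep only the path
        return f"{method} {target} {version}"
    return request_line  # leave malformed lines untouched


# An invented request with 200 form fields on the query string.
query = "&".join(f"field{i}=value{i}" for i in range(200))
long_request = f"GET /search?{query} HTTP/1.1"
short_request = strip_query(long_request)

# Bytes saved per hit if the query string never reaches the log.
saved_per_hit = len(long_request) - len(short_request)
```

In this made-up case the query string accounts for more than 3 KB of a single log entry. In Apache itself the same effect can be had through the log format, since `%U` records the URL path without the query string where `%r` records the whole request line.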

I've worked with companies that store all of their data in apache logs for near-time processing, to keep server load to a minimum. These are, of course, customized logs, but they can generate several gig of info a day easily, depending on how much data you are collecting. I sincerely doubt Matt is doing this, so it's probably irrelevant other than for the sake of discussion.

Legally speaking, Matt, does MeFi have a privacy policy? On the off chance that MeFi is involved in litigation or law enforcement, it would be useful to have. Furthermore, within your privacy policy, you should disclose your data retention policy. As long as you retain data in accordance with your privacy policy, you aren't likely to be held responsible for logs from years ago. Logs are not, to the best of my knowledge, required to be kept. Many companies keep them for small amounts of time specifically to protect their users' privacy. Others keep them indefinitely.
posted by sequential at 5:43 PM on February 9, 2006


You forgot ad hominem, Krrrlson.
posted by orthogonality at 5:45 PM on February 9, 2006


mathowie:
Seriously, apache logs are out if anyone knows the directive to turn them off in the httpd.conf file.

Just set the logfile to /dev/null ...
posted by Laen at 5:47 PM on February 9, 2006


You forgot ad hominem, Krrrlson.

It's only a fallacy if it's untrue and irrelevant, orthogonality.
posted by Krrrlson at 7:38 PM on February 9, 2006


Matt, does MeFi have a privacy policy? On the off chance that MeFi is involved in litigation or law enforcement, it would be useful to have. Furthermore, within your privacy policy, you should disclose your data retention policy. As long as you retain data in accordance with your privacy policy, you aren't likely to be held responsible for logs from years ago.

Last spring I paid my lawyer for a day's work to make up a terms of service and a privacy policy. They're both kind of rough but I could give them a once over and post links to them in the footer like most sites do. The gist of the PP was that I do my best to keep things secure and there is a record of IPs and comments which are kept long-term, but otherwise I'll never log, sell, or share anything.
posted by mathowie (staff) at 7:53 PM on February 9, 2006

They're both kind of rough but I could give them a once over and post links to them in the footer like most sites do.
Perhaps you should link them in a MeTa thread first and allow the lawyers and industry professionals that read MeTa to have a go at it. It's not really a big deal, of course, unless you end up involved in something messy involving law enforcement or lawyers. Either way, I'd guess that many of us would have something constructive to say about the policy if you don't mind the signal to noise ratio of your typical MeTa thread. :-) It's all free advice, of course.

You could also use a service like TRUSTe. Your mileage will vary on its usefulness in building consumer confidence, but you also might be able to yoink some useful legalese for your own privacy policy.
posted by sequential at 9:34 PM on February 9, 2006


sequential: you're correct in that I knew that, although I admit now that thinking back I did significantly trim the verbosity of my logs, but when providing my initial estimate in this thread I'd forgotten that I'd done it.

Hence my reaction to Matt's 150MB/day revelation - that's insane from where I'm standing, and certainly non-trivial over the course of a year regardless of compression level. My bad.
posted by Ryvar at 9:36 PM on February 9, 2006


It's all good, Ryvar. I know you well enough to know that you know. You know?
posted by sequential at 9:54 PM on February 9, 2006


jesus, I hate forgetting what I contributed to a thread and then coming back to find I've ignored a discussion for days when I was actually interested in participating.

anyway, ortho, here is how so:

you took an innocent tossed off remark about people being held accountable for their behavior (from a user who has repeatedly said in the past that he wished mathowie instituted more frequent penalties for poor behavior on the site) and attempted to turn it into a call for mefites to be put in jail, then accused said user of desiring a perpetual state of real life fear for mefites.

there is zero reason to believe that's how he meant the comment. the tone of your reply, the use of the caps lock WHAT THE FUCK and the rest of it makes it clear this was just a personal grudge comment. Frankly, it's the kind of foaming-at-the-mouth willful misinterpretation of an innocent remark that gives us liberals a bad name. It was like a shrill personal attack from bizarro world.
posted by shmegegge at 10:31 AM on February 10, 2006


would almost be worth the US becoming fascist.

it'll never be fascist enough for your taste anyway
posted by matteo at 10:57 AM on February 10, 2006


Don't forget to boast about your alleged Resistenza ancestry and offer to beat me up at a meetup, you pathetic buffoon.
posted by Krrrlson at 1:04 PM on February 10, 2006


shmegegge writes "there is zero reason to believe that's how he meant the comment. "


Bullshit. Let me explain:
1) Matt doesn't need IP addresses to lock or ban an account. And matt doesn't (apparently) care to track down anyone who re-registers under another username -- as we've repeatedly seen banned users return as sockpuppets.

So the only reason dios wants IP addresses retained is to allow dios to threaten us all with in-real-life sanctions.

2) You might be right had it been someone other than dios. But dios specializes in purposely ambiguous provocative language that can be interpreted broadly -- and only if called on that does dios introduce some possible narrower interpretation. Or, as in this case, dios, having trolled broadly, departs the thread and allows his less clever and less subtle hangers-on to offer the narrow interpretation for him.

Again, dios is a troll. He deserved to be called on his hope to see people punished outside MeFi for what they write inside MeFi.

But if I'm wrongly interpreting dios, he -- not anyone else who is merely interpreting dios's ambiguous words -- he can unambiguously tell us that himself, right here, and I will humbly apologize for misreading his intentions.
posted by orthogonality at 1:40 AM on February 11, 2006


You are gravely ill, my friend. Seek help.
posted by Krrrlson at 9:19 AM on February 11, 2006


hear hear, krrrlson.

and I will humbly apologize for misreading his intentions.

no you won't. you're bearing a grudge, here. And you're reading way too much into a tossed off sentence. You need to chill.
posted by shmegegge at 7:03 PM on February 11, 2006

