# Fun with statistics July 15, 2004 10:31 PM

Fun with statistics: I have always been curious about the percentage of "ghost accounts" on MeFi, and so today I decided to find out! I took a random sample of 263 user accounts (95% confidence, +/- 6), and discovered that 61.2% of registered users have never commented or posted. Of the 38.8% who have posted, 45.1% have posted fewer than 10 comments, 42.2% have posted between 10 and 99 comments, and 12.7% have made more than 100 comments.
posted by Quartermass to MetaFilter-Related at 10:31 PM (57 comments total)

Start trimming, or keep them for numerical superiority? After all, there are advertisers investing here. Possible that the random sample could be inaccurate?
posted by Keyser Soze at 10:58 PM on July 15, 2004

This ought to be fun.

No offense, Quartermass, but I'm waiting for more information on your work that will lead me to believe that this is anything but hand-waving. I don't remember much of my Math degree, but I remember me some...
posted by stavrosthewonderchicken at 11:05 PM on July 15, 2004

See, your first mistake was using the word "statistics" -- that single word will cause many people -- like ME -- to instantly go into coronary arrest.

(shudder)
posted by davidmsc at 11:32 PM on July 15, 2004

No offence taken. Obviously, I kept this pretty loose, and it is a far cry from hard science. I was just trying to get a rough guess as to what kind of proportion "ghost accounts" were to active accounts.

All I did was get the sample size from this sample-size calculator, and then figured out how much plus or minus I would be happy with (I arbitrarily chose 6, but I am sticking by that number). I plugged in the MeFi membership total, and it gave me my sample size.
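For anyone who wants to check the sample size by hand, here is a minimal sketch. It assumes the calculator uses the standard formula for estimating a proportion with a finite population correction (the thread doesn't say which formula it uses), and the function name and the worst-case guess of p = 0.5 are illustrative assumptions.

```python
import math

def sample_size(population, margin, z=1.96, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumed formula, not necessarily the one the calculator uses.
    z=1.96 corresponds to 95% confidence; p=0.5 is the most conservative guess.
    """
    # Size needed if the population were effectively infinite.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: a known, limited population needs fewer.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 17347 members and a margin of 6 points, as described in the thread.
print(sample_size(17347, 0.06))  # prints 263
```

Under those assumptions the result matches the 263 accounts used here.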

So for this sample, I can be 95% confident that 61.2% of the users have never used their accounts, plus or minus 6%, so the actual number is somewhere between 55% & 67% - close enough for me.
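The plus-or-minus can also be recomputed at the observed 61.2% instead of the worst-case 50%. This is a rough sketch using the normal approximation with a finite population correction; the function name and the choice of approximation are assumptions for illustration, not something stated in the thread.

```python
import math

def margin_of_error(p_hat, n, population=None, z=1.96):
    """Half-width of an approximate 95% interval for a sample proportion."""
    # Standard error of the proportion under the normal approximation.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Finite population correction, since 263 of 17347 were sampled.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

moe = margin_of_error(0.612, 263, population=17347)
print(f"{0.612 - moe:.3f} to {0.612 + moe:.3f}")
```

The margin comes out a little under 6 points, so the 55% to 67% range quoted above is, if anything, slightly generous.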

The rest was simple. I went to random.org and plugged in my parameters (random numbers between 1 and 17347, with 263 results). I then plugged in the random user numbers, and used a spreadsheet to mark if they had ever used their account or not. For those that did, I noted how many comments they made, and marked them accordingly.
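The same draw can be sketched locally. This is only an illustration: it uses Python's pseudo-random generator rather than random.org, it assumes sampling without replacement (the thread doesn't say whether duplicates were possible), and the function name and seed are made up.

```python
import random

def draw_user_numbers(max_user_number, sample_size, seed=None):
    """Pick distinct user numbers between 1 and max_user_number inclusive."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # random.sample draws without replacement, so no user appears twice.
    return sorted(rng.sample(range(1, max_user_number + 1), sample_size))

user_numbers = draw_user_numbers(17347, 263, seed=2004)
print(len(user_numbers), user_numbers[:5])
```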

After all was said and done, I plugged everything into SPSS, and it gave me some frequency charts with percentages.
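A frequency table like that doesn't strictly need SPSS. Here is a small sketch of the tallying step; the activity counts are invented for the example, and the bucket boundaries are an assumption based on the categories at the top of the thread.

```python
from collections import Counter

def bucket(activity_count):
    """Assign a user's total posts and comments to a category."""
    if activity_count == 0:
        return "never posted"
    if activity_count < 10:
        return "1-9"
    if activity_count < 100:
        return "10-99"
    return "100+"

def frequency_table(counts):
    """Return each category's share of the sample, as a percentage."""
    tally = Counter(bucket(c) for c in counts)
    total = len(counts)
    return {label: 100 * n / total for label, n in tally.items()}

# Made-up counts for eight users, purely for illustration.
made_up_counts = [0, 0, 0, 3, 12, 250, 0, 7]
for label, pct in frequency_table(made_up_counts).items():
    print(f"{label:>12}: {pct:.1f}%")
```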

All in all, it was an hour well spent!
posted by Quartermass at 11:41 PM on July 15, 2004

Just because they have not posted, doesn't mean they have not "used" their accounts.
posted by dg at 11:45 PM on July 15, 2004

So what?
posted by kjh at 12:00 AM on July 16, 2004

Right. What I meant to say was that they never used their accounts to post a link or comment.

(slowly starting to wish I didn't do this. . .)

Like I said above, the only reason I did this was to satisfy some curiosity, and thus I was aiming only at generalities, and not "hard" scientific fact. I only posted it because I thought others might be curious as well.

This all started with a MeTa thread from a while back where you looked at the users on both sides of you to see who your neighbors were, and so many had "ghosts." Plus, I always wondered, with 17000 members, why you only see such a small percentage. I don't know. . . .
posted by Quartermass at 12:03 AM on July 16, 2004

(slowly starting to wish I didn't do this. . .)

Heh. That's what I meant when I said 'this ought to be fun'. Snarky bastards 'round here.

So, if the numbers are anything like correct, something like 933 people total have made more than 100 comments each (at which point they might be considered, perhaps, substantive contributors to the community (I realize this characterization could be endlessly argued -- it's just shorthand, mmkay?)).
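That kind of estimate is just the sample percentages chained together and scaled up to the membership. A quick sketch, assuming the 17,347 membership total and the 38.8% and 12.7% figures from the top of the thread; the exact result shifts with whichever total and rounding you use.

```python
members = 17347                    # membership total used for the sample
share_ever_posted = 0.388          # share of accounts that have posted at all
share_of_posters_over_100 = 0.127  # of those, share with more than 100 comments

# Share of all members with more than 100 comments, then the head count.
share_of_all = share_ever_posted * share_of_posters_over_100
estimate = members * share_of_all

print(f"{share_of_all:.1%} of members, roughly {round(estimate)} people")
```

With the inputs assumed here the product lands in the mid-800s, the same ballpark as the figure above and still under 1,000.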

Lurkers and nonmembers are a whole other thing, of course, but 'less than 1000' seems to me to be a pretty good guesstimate of how many usernames I would recognize as having commented before when I saw them in a thread...[/seat of pants]
posted by stavrosthewonderchicken at 12:13 AM on July 16, 2004

WOoooOOooo... First talk post. Does this skew the numbers?
posted by arse_hat at 12:47 AM on July 16, 2004

Damn.
posted by Quartermass at 12:57 AM on July 16, 2004

Damn is right. How difficult would it be to tally the entire user database?
posted by Keyser Soze at 1:25 AM on July 16, 2004

Just because they have not posted, doesn't mean they have not "used" their accounts.

Is there something I'm missing or is the difference between a lurker with an account and a lurker without an account the width of one split hair?
posted by biffa at 1:32 AM on July 16, 2004

As a political philosopher and over-poster, I'd like to point out that the "skew you!" factor is operative.

I'm the 80-year-old who takes three 8-year-olds to see "Shrek 2" and whose average age is 26.
posted by MiguelCardoso at 3:02 AM on July 16, 2004

biffa, I believe that there are many people who use their accounts for the extra features, such as "x new comments since your last visit" etc, rather than to actually participate.

But, yeah, perhaps I am splitting hairs a bit when defining "active".
posted by dg at 3:06 AM on July 16, 2004

I suspect there's more lurkers with accounts than you might think - e.g. in this thread, user 3133 makes his first ever comment. Lurking for 3+ years?
posted by BigCalm at 3:51 AM on July 16, 2004

Quartermass: Damn the naysayers! Statistics are fun and unless you really get what it's about, it can seem like bullshit. Like how *statistically*, you get more accurate results with this type of analysis than you would by polling the entire user-base. Interesting stuff.
posted by lazywhinerkid at 6:21 AM on July 16, 2004

Well, Matt could give us the exact statistics very easily.

Matt... Hello? [echo...]

On the other hand, another way to get an almost exact number is to write a script (perl, anyone?) to automatically pull each user page and scrape the statistics into a database table where you could run queries on these numbers.
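Something along these lines, sketched in Python rather than perl. The profile text format, the regular expression, and the table layout are all hypothetical, since the real user page markup isn't shown in this thread, and the fetching step is left out so nothing here touches the server.

```python
import re
import sqlite3

# Hypothetical profile text keyed by user number; the real page layout differs.
SAMPLE_PAGES = {
    1: "Links posted: 12 Comments posted: 340",
    2: "Links posted: 0 Comments posted: 0",
    3: "Links posted: 1 Comments posted: 7",
}

# Assumed pattern for pulling the two counts out of a page.
PATTERN = re.compile(r"Links posted:\s*(\d+)\s*Comments posted:\s*(\d+)")

def parse_profile(text):
    """Return (links, comments), or None if the page doesn't match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def build_table(pages):
    """Load the parsed counts into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, links INTEGER, comments INTEGER)"
    )
    for user_id, text in pages.items():
        parsed = parse_profile(text)
        if parsed is not None:
            db.execute("INSERT INTO users VALUES (?, ?, ?)", (user_id, *parsed))
    return db

db = build_table(SAMPLE_PAGES)
ghosts = db.execute(
    "SELECT COUNT(*) FROM users WHERE links = 0 AND comments = 0"
).fetchone()[0]
print(ghosts)  # prints 1 for the sample pages above
```

Once the counts are in a table, the ghost-account question and the top-poster question are both one query each.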

Do I have volunteers?
posted by PigAlien at 6:47 AM on July 16, 2004

99 - or roughly 0.57% of MeFites - have made more than 1,000 comments.

Just because I wanted to throw some numbers around too.
posted by orange swan at 7:04 AM on July 16, 2004

Actually, I'm Demo's brother.

Huh? I thought you'd been banned ...
posted by carter at 7:21 AM on July 16, 2004

Hi, Orange Swan, how do you know that? I'm not doubting it, I'm merely asking if that is listed somewhere or if you ran some script of your own or something.

I'm currently writing a perl script to pull every user page and tally the comments/threads.
posted by PigAlien at 7:32 AM on July 16, 2004

Oh, Quartermass, about 5% 'regular' posters sounds about right to me; maybe even a little high? PigAlien, sounds good! Will you be posting the stats?
posted by carter at 7:37 AM on July 16, 2004

Yeah, and I'm enjoying the exercise, but if Matt or someone else who has direct access could save me the trouble, I'd appreciate it :)
posted by PigAlien at 7:45 AM on July 16, 2004

*waits for the inevitable "think of the server resources we could save if we reused those accounts!"*
posted by quonsar at 8:00 AM on July 16, 2004

PigAlien, maybe you could schedule that script for some time when the server is relatively idle -- doing it during working hours (North America time) might topple an already overtaxed box.

(on preview: especially since it's already so overloaded by all those ghost accounts!)
posted by gleuschk at 8:03 AM on July 16, 2004

I was gonna say it, quonsar, but I was afraid that someone might take me seriously.
posted by DrJohnEvans at 8:08 AM on July 16, 2004

Hi, Orange Swan, how do you know that? I'm not doubting it, I'm merely asking if that is listed somewhere or if you ran some script of your own or something.

Um, well, I went to the MeFi stats page and looked at the list of 100 posters with the most comments. 99 of us had posted over 1,000 comments, so I then just figured out what percentage of the total membership 99 represents.
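The percentage is a one-line calculation. A small sketch, assuming the 17,347 membership total used earlier in the thread (the total actually used for this figure isn't stated); the helper name is made up.

```python
def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole

# 99 heavy commenters out of an assumed 17347 members.
print(f"{percent_of(99, 17347):.2f}%")
```

That works out to a bit over half a percent of the membership.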

I'm definitely not up to writing scripts, and I also don't understand statistics, so I just came up with a basic but hopefully accurate one.

I'd like to know what total percentage of comments those top 100 posters have contributed.
posted by orange swan at 8:13 AM on July 16, 2004

gleuschk, yes, I think that is a valid point. MetaFilter gets overtaxed all the time. It would be easier if Matt noticed this thread (I'm sure he's busy, but he will eventually); then he could probably just shoot the numbers off himself and save me the trouble. I'm kind of indulging myself for the programming exercise.
posted by PigAlien at 8:14 AM on July 16, 2004

Between its inception and January 19, 2004, 432 people made posts to AskMe. oissubke made 22 of those posts, and grumblebee made 15. These were the two most prolific posters during that time frame. 21 people with user numbers under 200 posted questions, but only 16 people with user numbers between 17000 and 17350 did so.
posted by y6y6y6 at 8:24 AM on July 16, 2004

It's all just me and my many multiple personalities.
posted by troutfishing at 8:31 AM on July 16, 2004

...and, when I eat a large fried catfish, Vietnamese style, the night before - I become Ethereal Bligh.
posted by troutfishing at 8:36 AM on July 16, 2004

Wow. I have a lot fewer comments than I thought I would. Not even 400 and I've been a member since the Kaycee thing. Is my account in danger of being reprocessed to save on server resources? Should I write a lot more?

Do you guys even know who I am? :'(
posted by ODiV at 8:40 AM on July 16, 2004

Did you guys just hear something?
posted by DrJohnEvans at 8:49 AM on July 16, 2004

I don't have a clue who you are, ODiV. But then I've also never heard of several of the top 100 most frequent posters, such as....

These people are sharks, I'm telling you.
posted by orange swan at 8:51 AM on July 16, 2004

I just had a bout of near-crippling nostalgia. I miss those folks.
posted by gleuschk at 8:53 AM on July 16, 2004

Hi Orange Swan, where do you get these statistics? Is there a statistics page I'm missing?
posted by PigAlien at 9:01 AM on July 16, 2004

You do not need to sample the entire database to derive accurate statistics.

Please, before anyone goes and hammers hell out of Matt's server, take a Statistics 101 course, or get Quartermass to advise you.
posted by five fresh fish at 9:09 AM on July 16, 2004

fff, you can't get the top X posters by using statistics. Besides, there are 17534 users right now and I'm sure the server gets quite a few hits in a day. I don't know what kind of load that would put on the server. I'm not suggesting I blast his server all at once in the middle of the day. I could set it to do one query per second for 5 hours in the middle of the night or something. Of course, Orange Swan appears to have some information about top posters and such, but so far won't share where she's getting it from. When Matt checks in, he may well offer up some numbers of his own.
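The five-hour figure checks out as a quick calculation. A sketch assuming exactly one request per user page and ignoring the time each request itself takes; the function name is made up.

```python
def crawl_hours(user_count, seconds_per_request=1.0):
    """Hours needed to fetch one page per user at a fixed request interval."""
    return user_count * seconds_per_request / 3600

# 17534 user pages at one request per second.
print(f"{crawl_hours(17534):.2f} hours")  # prints 4.87 hours
```

Stretching the interval to two seconds would halve the request rate and roughly double the run to just under ten hours.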
posted by PigAlien at 9:22 AM on July 16, 2004

but so far won't share where she's getting it from.

please let me rephrase "won't share" to "has not shared"
posted by PigAlien at 9:26 AM on July 16, 2004

PigAlien, I believe orangeswan got eir statistics from Dan Hersam's MetaFilter Contribution Index.
posted by gleuschk at 9:31 AM on July 16, 2004

I apologize, Orange Swan, because that sounded like I was implying you were intentionally not sharing, when perhaps you hadn't seen my earlier request, had forgotten to answer or thought the answer was obvious.
posted by PigAlien at 9:31 AM on July 16, 2004

1. Having taken (and passed! :D) an entry-level-for-engineers college stats course, I can say with some authority that Quartermass and FFF are correct--one does not need to examine the entire possible data set in order to have statistically significant results. And, being 95% sure about his stats is quite normal, as that's how statistics work, although without knowing this it would kinda sound like an arbitrary percentage.

2. I have a personal interest in this topic, as when I signed up almost 2 years ago, this guy had already grabbed my preferred username a year and some months earlier. And to date he still hasn't posted a darn thing! :'(

Now, I'm barely more active than your run-of-the-mill lurker (despite occasional periods of activity in #mefi) but it still irks me that someone registered the nick and may have very well left years ago, or at the very least is doing nothing more with the name than, as dg points out, using it to keep tabs on what's new since his last visit. Whereas I'd really like to have the name for myself. *whine*

So, for what minuscule amount it's worth, I'd support some kind of trimming-of-the-hedges, so to speak, if it could be done fairly. I wonder if there are any others like me who also have their eyes on unused account names?
posted by cyrusdogstar at 9:36 AM on July 16, 2004

Thanks, gleuschk. I looked through the 'etc' links on the menu and the 'about' pages for any sort of information. I also looked on the Metafilter Wiki. I try to do my research before reinventing the wheel.

My question is this, how are those statistics kept up to date? Has Matt given Dan special access to the Mefi database? The figures seem to be real-time.
posted by PigAlien at 9:37 AM on July 16, 2004

I was just about to post an explanation when on preview Pig Alien had figured it out. I did provide the link in my earlier, explanatory comment. No conspiracy and no worries, PigA!

Oh, and the figures aren't quite real time. If you click on individual users you may find they are slightly out of date; clicking on a user refreshes the data for that user.

Don't know how often the entire page is refreshed.
posted by orange swan at 9:43 AM on July 16, 2004

I swear I kept refreshing this thread to see if you had replied, Orange Swan, and lo and behold I completely missed it! Doing, slap me on the forehead! I probably will stop working on my script, since it would only capture a moment in time anyway and the big questions are answered at Dan's site. I'd still like to see some kind of ranking by user, though. For instance, I have a .77 contribution rate, but where do I rank overall for number of posts? In the last week, in the last month, in the last year?
posted by PigAlien at 9:54 AM on July 16, 2004

(slowly starting to wish I didn't do this. . .)

I wonder how many times Matt has thought this.
posted by Skot at 10:21 AM on July 16, 2004

I'd support some kind of trimming-of-the-hedges, so to speak

And then we could get some merkins, too!
posted by five fresh fish at 10:46 AM on July 16, 2004

Well, I finished my script (wasn't too hard), but I don't know if I'll bother to run it. Perhaps at midnight tonight. The figures will be out of date instantly anyway. Perhaps Matt could create a direct interface to the MEFI database so we could all write our own apps? HA! Just wanted to see your reaction.
posted by PigAlien at 11:03 AM on July 16, 2004

quonsar: I just had a brilliant idea! Why don't we delete the inactive users, and then use the freed-up resources to allow more new members... what!?
posted by reklaw at 1:57 PM on July 16, 2004

Don't let me go all quonsar on your reklaw...
posted by wendell at 2:42 PM on July 16, 2004

Interesting to me is that user 6000 joined on April 12, 2001.

While user 11,000 joined less than 5 months later, on September 5, 2001.

Yet there's quite a "user number mojo" differential between 6K and 11K.
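The signup rate over that stretch is easy to work out from the two dates. A small sketch, assuming user numbers were handed out sequentially with no gaps (the thread doesn't confirm that); the function name is made up.

```python
from datetime import date

def signups_per_day(first_number, first_date, second_number, second_date):
    """Average new accounts per day between two numbered signups."""
    days = (second_date - first_date).days
    return (second_number - first_number) / days

rate = signups_per_day(6000, date(2001, 4, 12), 11000, date(2001, 9, 5))
print(f"{rate:.1f} signups per day")  # prints 34.2 signups per day
```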

Feh.
posted by scarabic at 9:52 PM on July 16, 2004

Even more interesting would be comments word count total.
posted by snarfodox at 10:15 PM on July 16, 2004

Nope.
posted by scarabic at 2:23 PM on July 17, 2004

PigAlien, did you run your script?
posted by Quartermass at 10:09 AM on July 18, 2004

I did, Quartermass, thanks for asking. I was happy to see that I'm approximately 350 in terms of total posts and in the top 2% of posters. 2000 posts would get me into the top 36! I've got a lot of work to do...

I'd post the results somewhere if anyone wanted them, but where?
posted by PigAlien at 11:46 AM on July 18, 2004

I've also never heard of several of the top 100 most frequent posters, such as....

Whippersnapper! Get off my lawn!
posted by norm at 9:46 AM on July 19, 2004