Fun with statistics: I have always been curious about the percentage of "ghost accounts" on MeFi, and so today I decided to find out! I took a random sample of 263 user accounts (95% confidence, +/- 6 points), and discovered that 61.2% of registered users have never commented or posted. Of the 38.8% who have posted, 45.1% have posted fewer than 10 comments, 42.2% have posted between 10 and 99 comments, and 12.7% have made more than 100 comments.
Start trimming, or keep them for numerical superiority? After all, there are advertisers investing here. Possible that the random sample could be inaccurate?
This ought to be fun.

No offense, Quartermass, but I'm waiting for more information on your work that will lead me to believe that this is anything but hand-waving. I don't remember much of my Math degree, but I remember me some...
See, your first mistake was using the word "statistics" -- that single word will cause many people -- like ME -- to instantly go into coronary arrest.

(shudder)
No offence taken. Obviously, I kept this pretty loose, and it is a far cry from hard science. I was just trying to get a rough guess as to what kind of proportion "ghost accounts" were to active accounts.

All I did was get the sample size from this sample-size calculator, and then figured out how much plus or minus I would be happy with (I arbitrarily chose 6, but I am sticking by that number). I plugged in the MeFi membership total, and it gave me my sample size.
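For anyone wondering where a number like 263 comes from, here is a minimal sketch in Python. It assumes the calculator uses the usual formula for estimating a proportion (Cochran's formula with a finite population correction), with z = 1.96 for 95% confidence and the worst-case proportion p = 0.5. The thread doesn't say what the calculator actually does, so treat the formula and the function name as assumptions.

```python
import math

def sample_size(population, margin, z=1.96, p=0.5):
    """Sample size for estimating a proportion (assumed formula, see above)."""
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: a smaller population needs fewer samples.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 17347 members, plus or minus 6 points at 95% confidence.
print(sample_size(17347, 0.06))  # 263 for these inputs
```

Under those assumptions the formula reproduces the 263 quoted above. Tightening the margin to 3 points would need a sample several times larger.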

So for this sample, I can be 95% confident that 61.2% of the users have never used their accounts, plus or minus 6%, so the actual number is somewhere between 55% & 67% - close enough for me.
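The plus or minus 6 is the margin that was planned for before sampling. As a rough check, the margin can also be worked out from the observed 61.2% itself. The sketch below assumes the standard normal-approximation interval for a proportion with a finite population correction; the function name is made up for illustration.

```python
import math

def margin_of_error(p_hat, n, population, z=1.96):
    """Approximate 95% margin for an observed proportion (normal approximation)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    # Correction for sampling without replacement from a finite population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

moe = margin_of_error(0.612, 263, 17347)
low, high = 0.612 - moe, 0.612 + moe
print(f"margin: {moe:.3f}, interval: {low:.3f} to {high:.3f}")
```

With these inputs the margin comes out a little under 6 points, so the 55% to 67% range quoted above is, if anything, slightly conservative.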

The rest was simple. I went to random.org and plugged in my parameters (random numbers between 1 and 17347, with 263 results). I then plugged in the random user numbers, and used a spreadsheet to mark whether they had ever used their account or not. For those that did, I noted how many comments they made, and marked them accordingly.
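The draw itself can be sketched in a few lines of Python. This uses Python's built-in pseudo-random generator rather than random.org, and it assumes every number from 1 to 17347 corresponds to a real account; both are assumptions, and the function name is made up.

```python
import random

def draw_user_ids(highest_id, how_many, seed=None):
    """Pick distinct user numbers between 1 and highest_id, inclusive."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no user number comes up twice.
    return sorted(rng.sample(range(1, highest_id + 1), how_many))

ids = draw_user_ids(17347, 263, seed=1)
print(len(ids), ids[:5])
```

Drawing without replacement matters here: if the same user number came up twice, the effective sample would be smaller than 263.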

After all was said and done, I plugged everything into SPSS, and it gave me some frequency charts with percentages.

All in all, it was an hour well spent!
Just because they have not posted, doesn't mean they have not "used" their accounts.
So what?
Right. What I meant to say was that they never used their accounts to post a link or comment.

(slowly starting to wish I didn't do this. . .)

Like I said above, the only reason I did this was to satisfy some curiosity, and thus I was aiming only at generalities, and not "hard" scientific fact. I only posted it because I thought others might be curious as well.

This all started with a MeTa thread from a while back where you looked at the users on both sides of you to see who your neighbors were, and so many had "ghosts." Plus, I always wondered, with 17000 members, why you only see such a small percentage. I don't know. . . .
(slowly starting to wish I didn't do this. . .)

Heh. That's what I meant when I said 'this ought to be fun'. Snarky bastards 'round here.

So, if the numbers are anything like correct, something like 933 people total have made more than 100 comments each (at which point they might be considered, perhaps, substantive contributors to the community (I realize this characterization could be endlessly argued -- it's just shorthand, mmkay?)).

Lurkers and nonmembers are a whole other thing, of course, but 'less than 1000' seems to me to be a pretty good guesstimate of how many usernames I would recognize as having commented before when I saw them in a thread...[/seat of pants]
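As a rough cross-check on the headcount, the sample percentages can be multiplied through against the membership total. This is only a sketch: it assumes the 17347 figure used for the random draw, and the function name is made up.

```python
def scale_to_population(population, *fractions):
    """Multiply a population by a chain of sample fractions."""
    estimate = population
    for fraction in fractions:
        estimate *= fraction
    return round(estimate)

# 38.8% have posted at all; 12.7% of those made more than 100 comments.
print(scale_to_population(17347, 0.388, 0.127))  # about 855
```

That lands somewhat below the 933 quoted above but in the same ballpark. Either way the estimate rests on only about a dozen accounts in the sample, so "less than 1000" is about as precise as it can be.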
WOoooOOooo... First talk post. Does this skew the numbers?
Damn.
Damn is right. How difficult would it be to tally the entire user database?
Just because they have not posted, doesn't mean they have not "used" their accounts.

Is there something I'm missing or is the difference between a lurker with an account and a lurker without an account the width of one split hair?
As a political philosopher and over-poster, I'd like to point out that the "skew you!" factor is operative

I'm the 80-year-old who takes three 8-year-olds to see "Shrek 2" and whose average age is 26.
biffa, I believe that there are many people who use their accounts for the extra features, such as "x new comments since your last visit" etc, rather than to actually participate.

But, yeah, perhaps I am splitting hairs a bit when defining "active".
I suspect there's more lurkers with accounts than you might think - e.g. in this thread, user 3133 makes his first ever comment. Lurking for 3+ years?
Quartermass: Damn the naysayers! Statistics are fun, and unless you really get what it's about, it can seem like bullshit. Like how *statistically*, you get more accurate results with this type of analysis than you would by polling the entire user-base. Interesting stuff.
Well, Matt could give us the exact statistics very easily.

Matt... Hello? [echo...]

On the other hand, another way to get an almost exact number is to write a script (perl, anyone?) to automatically pull each user page and scrape the statistics into a database table where you could run queries on these numbers.

Do I have volunteers?
99 - or roughly 0.57% of MeFites - have made more than 1,000 comments.

Just because I wanted to throw some numbers around too.
Actually, I'm Demo's brother.

Huh? I thought you'd been banned ...
Hi, Orange Swan, how do you know that? I'm not doubting it, I'm merely asking if that is listed somewhere or if you ran some script of your own or something.

I'm currently writing a perl script to pull every user page and tally the comments/threads.
Oh, Quartermass, about 5% 'regular' posters sounds about right to me; maybe even a little high? PigAlien, sounds good! Will you be posting the stats?
Yeah, and I'm enjoying the exercise, but if Matt or someone else who has direct access could save me the trouble, I'd appreciate it :)
*waits for the inevitable "think of the server resources we could save if we reused those accounts!"*
PigAlien, maybe you could schedule that script for some time when the server is relatively idle -- doing it during working hours (North America time) might topple an already overtaxed box.

(on preview: especially since it's already so overloaded by all those ghost accounts!)
I was gonna say it, quonsar, but I was afraid that someone might take me seriously.
Hi, Orange Swan, how do you know that? I'm not doubting it, I'm merely asking if that is listed somewhere or if you ran some script of your own or something.

Um, well, I went to the MeFi stats page and looked at the list of 100 posters with the most comments. 99 of us had posted over 1,000 comments, so I then just figured out what percentage of the total membership 99 represents.

I'm definitely not up to writing scripts, and I also don't understand statistics, so I just came up with a basic but hopefully accurate one.

I'd like to know what total percentage of comments those top 100 posters have contributed.
gleuschk, yes, I think that is a valid point. metafilter gets overtaxed all the time. It would be easier if Matt would notice this thread (I'm sure he's busy but will eventually) then he could probably just shoot the numbers off himself and save me the trouble. I'm kind of indulging myself for the programming exercise.
Between its inception and January 19, 2004, 432 people made posts to AskMe. oissubke made 22 of those posts, and grumblebee made 15. These were the two most prolific posters during that time frame. 21 people with user numbers under 200 posted questions, but only 16 people with user numbers between 17000 and 17350 did so.
It's all just me and my many multiple personalities.
...and, when I eat a large fried catfish, Vietnamese style, the night before - I become Ethereal Bligh.
Wow. I have a lot fewer comments than I thought I would. Not even 400 and I've been a member since the Kaycee thing. Is my account in danger of being reprocessed to save on server resources? Should I write a lot more?

Do you guys even know who I am? :'(
Did you guys just hear something?
I don't have a clue who you are, ODiV. But then I've also never heard of several of the top 100 most frequent posters, such as....

These people are sharks, I'm telling you.
I just had a bout of near-crippling nostalgia. I miss those folks.
Hi Orange Swan, where do you get these statistics? Is there a statistics page I'm missing?
You do not need to sample the entire database to derive accurate statistics.

Please, before anyone goes and hammers hell out of Matt's server, take a Statistics 101 course, or get Quartermass to advise you.
posted by five fresh fish at 9:09 AM on July 16, 2004

fff, you can't get the top X posters by using statistics. Besides, there are 17534 users right now and I'm sure the server gets quite a few hits in a day. I don't know what kind of load that would put on the server. I'm not suggesting I blast his server all at once in the middle of the day. I could set it to do one query per second for 5 hours in the middle of the night or something. Of course, Orange Swan appears to have some information about top posters and such, but so far won't share where she's getting it from. When Matt checks in, he may well offer up some numbers of his own.
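The "one query per second" idea can be sketched as a throttled loop. This is in Python rather than the Perl mentioned in the thread, and it is not the actual script. The `fetch` argument is a hypothetical stand-in for whatever downloads and parses one user page, since the thread gives no URL or page format.

```python
import time

def crawl(user_ids, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(user_id) for each id, pausing between requests."""
    results = {}
    for index, user_id in enumerate(user_ids):
        if index > 0:
            # Wait before every request except the first, to go easy on the server.
            sleep(delay_seconds)
        results[user_id] = fetch(user_id)
    return results

def estimated_hours(user_count, delay_seconds=1.0):
    """Rough running time from the pauses alone, ignoring request time."""
    return user_count * delay_seconds / 3600

print(round(estimated_hours(17534), 2))  # 4.87
```

At one request per second, 17534 user pages take a little under five hours of pauses, which fits the "5 hours in the middle of the night" estimate. Passing `sleep` in as a parameter just makes it easy to try the loop without actually waiting.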
but so far won't share where she's getting it from.

please let me rephrase "won't share" to "has not shared"
PigAlien, I believe orangeswan got eir statistics from Dan Hersam's MetaFilter Contribution Index.
posted by gleuschk at 9:31 AM on July 16, 2004

I apologize, Orange Swan, because that sounded like I was implying you were intentionally not sharing, when perhaps you hadn't seen my earlier request, had forgotten to answer or thought the answer was obvious.
1. Having taken (and passed! :D) an entry-level-for-engineers college stats course, I can say with some authority that Quartermass and FFF are correct--one does not need to examine the entire possible data set in order to have statistically significant results. And, being 95% sure about his stats is quite normal, as that's how statistics work, although without knowing this it would kinda sound like an arbitrary percentage.

2. I have a personal interest in this topic, as when I signed up almost 2 years ago, this guy had already grabbed my preferred username a year and some months earlier. And to date he still hasn't posted a darn thing! :'(

Now, I'm barely more active than your run-of-the-mill lurker (despite occasional periods of activity in #mefi) but it still irks me that someone registered the nick and may very well have left years ago, or at the very least is doing nothing more with the name than, as dg points out, using it to keep tabs on what's new since his last visit. Whereas I'd really like to have the name for myself. *whine*

So, for what minuscule amount it's worth, I'd support some kind of trimming-of-the-hedges, so to speak, if it could be done fairly. I wonder if there are any others like me who also have their eyes on unused account names?
Thanks, gleuschk. I looked through the 'etc' links on the menu and the 'about' pages for any sort of information. I also looked on the Metafilter Wiki. I try to do my research before reinventing the wheel.

My question is this, how are those statistics kept up to date? Has Matt given Dan special access to the Mefi database? The figures seem to be real-time.
posted by PigAlien at 9:37 AM on July 16, 2004

Oh, and the figures aren't quite real time. If you click on individual users you may find they are slightly out of date, clicking on a user refreshes the data for that user.

Don't know how often the entire page is refreshed.
I swear I kept refreshing this thread to see if you had replied, Orange Swan, and lo and behold I completely missed it! Doing, slap me on the forehead! I probably will stop working on my script, since it would only capture a moment in time anyway and the big questions are answered at Dan's site. I'd still like to see some kind of ranking by user, though. For instance, I have a .77 contribution rate, but where do I rank overall for number of posts? In the last week, in the last month, in the last year?
(slowly starting to wish I didn't do this. . .)

I wonder how many times Matt has thought this.
I'd support some kind of trimming-of-the-hedges, so to speak

And then we could get some merkins, too!
posted by five fresh fish at 10:46 AM on July 16, 2004

Well, I finished my script (wasn't too hard), but I don't know if I'll bother to run it. Perhaps at midnight tonight. The figures will be out of date instantly anyway. Perhaps Matt could create a direct interface to the MEFI database so we could all write our own apps? HA! Just wanted to see your reaction.
quonsar: I just had a brilliant idea! Why don't we delete the inactive users, and then use the freed-up resources to allow more new members... what!?
Don't let me go all quonsar on your reklaw...
Interesting to me is that user 6000 joined on April 12, 2001.

While user 11,000 joined less than 5 months later, on September 5, 2001.

Yet there's quite a "user number mojo" differential between 6K and 11K.

Feh.
Even more interesting would be comments word count total.
posted by snarfodox at 10:15 PM on July 16, 2004

Nope.
PigAlien, did you run your script?
I did, Quartermass, thanks for asking. I was happy to see that I'm approximately 350 in terms of total posts and in the top 2% of posters. 2000 posts would get me into the top 36! I've got a lot of work to do...

I'd post the results somewhere if anyone wanted them, but where?
I've also never heard of several of the top 100 most frequent posters, such as....

Whippersnapper! Get off my lawn!
