Do we have a minute to spare? September 26, 2009 3:57 PM

Lukas Mathis is asking people to 'rate the quality of some comments from popular websites'

The majority of them are from recent metafilter threads — I've only seen a bit of youtubery.
posted by blasdelf to MetaFilter-Related at 3:57 PM (126 comments total) 1 user marked this as a favorite

One comment is just "Wtf" - are these being pulled by some sort of script? I assume no one's being asked if they want to participate in this project, whatever it is.
posted by Solon and Thanks at 3:59 PM on September 26, 2009 [1 favorite]


Lukas Mathis is running the risk of receiving a mailbox full of cease and desist letters.
posted by netbros at 4:03 PM on September 26, 2009 [1 favorite]


Luke Mathis is Jack's raging bile duct.
posted by Effigy2000 at 4:19 PM on September 26, 2009


This project's not implemented correctly for MeFi comments - by pulling them as plain text and stripping the formatting, you lose the italic quote indicators, making what would otherwise be a straightforward comment look like bizarre schizophrenic ranting.
posted by Optimus Chyme at 4:20 PM on September 26, 2009 [13 favorites]


How do I opt out?
posted by mr_crash_davis mark II: Jazz Odyssey at 4:20 PM on September 26, 2009


I have a comment, but I don't think he'd like hearing it.
posted by kittens for breakfast at 4:25 PM on September 26, 2009 [1 favorite]


Optimus Chyme: "This project's not implemented correctly for MeFi comments - by pulling them as plain text and stripping the formatting, you lose the italic quote indicators, making what would otherwise be a straightforward comment look like bizarre schizophrenic ranting."

Not to mention lack of context.

Or implied consent.
posted by iamkimiam at 4:29 PM on September 26, 2009 [2 favorites]


The good thing is that once he gets enough data, he can tell us all how we rate as worthless / valuable people.
posted by Meatbomb at 4:29 PM on September 26, 2009


mr_crash_davis mark II: Jazz Odyssey: "How do I opt out?"

How do I opt in to this idea of using other people's work?

You know, so I can make sure that I don't.
posted by Science! at 4:32 PM on September 26, 2009


I also checked a bunch and they were all from MetaFilter.
posted by Science! at 4:35 PM on September 26, 2009


A bunch in this case means twelve.
posted by Science! at 4:37 PM on September 26, 2009


You know, I have a reputation for being grumpy about any use of the site that's not explicitly approved by Matt, but I have a real hard time thinking that this is unethical or shady. I don't know how much data he'll collect or what the possible use could be, but I can't really see why y'all are objecting so strongly. Educate me.
posted by Optimus Chyme at 4:40 PM on September 26, 2009


...and most were from that NYTimes science/religion thread, which meant I have no opportunity to stumble across one of my own comments and mark it as "useless, stupid, crappy".
posted by julen at 4:41 PM on September 26, 2009


You know, I have a reputation for being grumpy about any use of the site that's not explicitly approved by Matt, but I have a real hard time thinking that this is unethical or shady. I don't know how much data he'll collect or what the possible use could be

And, well, there you go.
posted by kittens for breakfast at 4:43 PM on September 26, 2009


I clicked through several and got nothing but MeFi comments. What the hell?
posted by Pope Guilty at 4:47 PM on September 26, 2009


If this comment shows up on that site, I'm going to sue; Also, if it doesn't, I will sue, because it is an awesome comment.
posted by Astro Zombie at 4:54 PM on September 26, 2009 [1 favorite]


Just as an aside, clicking on the links to his site and then clicking back here really screws up my Greasemonkey script that tells me someone made a comment in this thread, how many comments they made, etc. Example:

Other [2/2]: «≡·Other [2/4]: «≡»Other [4/4]: «≡·

It's next to every comment in one form or another, but not if I reload the page by clicking on the main MeTa site and then the thread.

So if nothing else, he's messing with my Greasemonkey and I do not approve.
posted by Marie Mon Dieu at 4:57 PM on September 26, 2009


When you get tired of reading the metafilter comments out of context, head over to the youtube comment markov generator and then have a stiff drink.
posted by Rhomboid at 4:59 PM on September 26, 2009 [3 favorites]


And, well, there you go.

I mean in the sense that, okay, now he knows that 60% of respondents found MeFi comment 541,181 to be "good"; BFD.
posted by Optimus Chyme at 5:01 PM on September 26, 2009


Yeah, what gives?
posted by joe lisboa at 5:02 PM on September 26, 2009


neat. i wonder if matt knows about it.
posted by tehloki at 5:04 PM on September 26, 2009


Rhomboid: That is a horrible creation, and you are a bad person for bringing it to my attention.
posted by The Great Big Mulp at 5:15 PM on September 26, 2009


Optimus Chyme, it's not so much the "What does he find out about it" that rankles, it's the "Why don't I spider the comments of website X and reproduce them with no attribution".

And it's only a mild rankle, I'm just a wee bit cranky today.
posted by mr_crash_davis mark II: Jazz Odyssey at 5:27 PM on September 26, 2009


Well, aside from the fact that our words (i.e. speech) are being taken without our permission, they also have been re-purposed and re-contextualized for some experiment that we unwittingly are participating in. What's next?

Basically, no consent was obtained (implied or otherwise), and no anonymity was granted for the authors involved. If this is trying to be scientific in any way, it is definitely one of those studies involving deception, and, as far as I can tell, it's certainly not endorsed by an academic institution or an IRB. In my opinion, that makes it closer to the 'hey, let's scrape this website and drive traffic over here' camp. God knows why. But I think the problems with that are pretty clear.
posted by iamkimiam at 5:29 PM on September 26, 2009 [3 favorites]


I ran through a bunch and it started out mefi-ish, but then devolved into something more about cocks, then it got super hideously racist. YMMV, but a lot of it was really NSFW if that matters to you. I also memailed Lukas - has anyone tried regular email or his twitter?
posted by donnagirl at 6:00 PM on September 26, 2009


So, nobody got treated to a bunch of racist comments like I did? And it was like OC said, hard to tell if they had put part of what they were replying to in their comment or not.
posted by soelo at 6:02 PM on September 26, 2009


Context is key. You could just take random digits from phone numbers in the yellow pages and ask people to rate them, and it would be just as meaningless. Also, how do the royalties work for this?
posted by Elmore at 6:08 PM on September 26, 2009


Ha! I just checked a few and the majority of them were from Metafilter. I know this because they all had things like "Thanks cashman", "thanks twoleftfeet", "thanks flapjax". It's nice to be polite.
posted by Elmore at 6:11 PM on September 26, 2009


Please rate the usefulness of some User Friendly strips.
posted by EatTheWeek at 6:35 PM on September 26, 2009 [2 favorites]


Metafilter: it started out mefi-ish, but then devolved into something more about cocks
posted by mr_crash_davis mark II: Jazz Odyssey at 6:39 PM on September 26, 2009 [1 favorite]


Ha! I just checked a few and the majority of them were from Metafilter. I know this because they all had things like "Thanks cashman"

Guys, guys, this experimental site is not that bad, let's give it a chance, huh?
posted by cashman at 6:39 PM on September 26, 2009 [3 favorites]


I don't know how much data he'll collect or what the possible use could be, but I can't really see why y'all are objecting so strongly.

I can't figure out the concerns here, either, especially given MeFi members' generally radical disdain for copyright. Here's the last thread I remember on the subject, in which comments such as this or this receive a ton of favorites and agreement throughout the thread. Seems like the best summation of the matter in this case and elsewhere, as posted later there, is:

Information wants to be free, unless it's mine!

posted by game warden to the events rhino
posted by msbrauer at 6:43 PM on September 26, 2009 [4 favorites]


I just see stuff like this:

"u guys r stupid obamas running this country diwn.i say yes to immature comments and yes to funny videos.u got pwned obama"

Went through about 10 of them and it was all like that.

Like OC, not really feeling the outrage, it just looks like someone's personal project. I get the objection in the abstract sense, but really, it's just comments on a public website. This text is also being served by google's index.
posted by cj_ at 6:44 PM on September 26, 2009


...making what would otherwise be a straightforward comment look like bizarre schizophrenic ranting.

Wait, you're saying these DON'T look like actual metafilter comments?
posted by Dormant Gorilla at 6:44 PM on September 26, 2009 [1 favorite]


Folks, a thousand link farms hosted in shadowy Balkan basements are currently using our comments to sell V1@GR@. Let's not get our naturals in a twist because someone who isn't the scum of the Earth is joining them in using the comments (which we post publicly, of course) without our permission.
posted by sonic meat machine at 6:50 PM on September 26, 2009 [1 favorite]


I don't love the re-use of MeFi comments (and posts, which seem especially bizarre stripped of their links) but I don't hate it in particular.

On the other hand, I don't quite understand the point of the project. Comments without context don't tend to make a lot of sense at the best of times. And when YouTube is one of your sources, it ain't the best of times.

Plus, it showed me "LOL" about 5 times in a row.

I did not LOL.
posted by jacquilynne at 6:51 PM on September 26, 2009 [1 favorite]


I get the sense that it's for a personal project. I could only speculate on what the data would be used for, but it doesn't seem sketchy (like the guy who just reposted entire threads wholesale -- along with some ads, of course).

Perhaps he's trying to find out if there's a way to quantify comment quality between various sources and then look for patterns between highest and lowest rated stuff? That could be useful for a plugin to filter stupid comments. I know there was a project to do something like this called StupidFilter.

In any case, judging by his blog, it's probably something geeky like that, which I approve of. It's not presented as something meant to be useful on its own, nor are the ads.
posted by cj_ at 7:24 PM on September 26, 2009


*nor are there ads
posted by cj_ at 7:25 PM on September 26, 2009


we don't understand, ergo we fear

it's late, either go to bed or wrap tinfoil around your head!
posted by HuronBob at 7:36 PM on September 26, 2009


no one cares about redneck country music kanye still sings good and thats all that matters and i bet u would't say anythin wat u said to his face so shut the hell up
posted by drjimmy11 at 7:37 PM on September 26, 2009 [2 favorites]


Metafilter: bizarre schizophrenic ranting

UNATCO CONTROLS THE AMBROSIA got a laugh out of me from one of the Markov Generator linked above, as well.
posted by Askiba at 7:56 PM on September 26, 2009


Metafilter: "bizarre schizophrenic ranting"

Elmore writes "Context is key."

Sometimes. Plenty of comments like scarabic's or ikkyu's infamous contributions don't need any kind of context to be awesome standouts.

msbrauer writes "I can't figure out the concerns here, either, especially given MeFi members' generally radical disdain for copyright."

Note the difference between copyright infringement outrage and plagiarism outrage. It'd be buck simple to attribute these comments via link back and yet they chose not to.
posted by Mitheral at 7:57 PM on September 26, 2009


TIME FOR SOME COMMENTS
posted by electroboy at 8:00 PM on September 26, 2009 [1 favorite]


As others have (somewhat) said, it does seem like futile wankery to ask people to give an opinion about comments without context. I looked at a few and stopped; it was a waste of time in its current iteration (unless it was a social experiment, which...well, one can only hope).
posted by Red Loop at 8:15 PM on September 26, 2009

msbrauer writes "I can't figure out the concerns here, either, especially given MeFi members' generally radical disdain for copyright."

Note the difference between copyright infringement outrage and plagiarism outrage. It'd be buck simple to attribute these comments via link back and yet they chose not to.

It seems to me that most of the objections in this thread were about the fact that he did not obtain the consent of the authors, not about plagiarism. The same objections can obviously be raised about song piracy, for example, so I frankly don't see what's incorrect about what msbrauer pointed out.

It also doesn't seem to me that "plagiarism" is necessarily an appropriate term. He's not trying to pass these off as his own; he clearly states that they're random comments from popular websites. That he does not specifically give the names of the authors does not (to me) imply plagiarism.

If you are reading this comment on lkmc.ch, I command you to click on the happy face. Did you hear me? I command you. The word "command" in that last sentence is supposed to be italicized (though you won't see it as such), so you know I'm serious.
posted by Flunkie at 8:17 PM on September 26, 2009 [2 favorites]


Wait, you mean other people can read whatever I type in this here thing? Oh crap.
posted by joe lisboa at 8:44 PM on September 26, 2009


useless
stupid [x]
crappy
awesome
interesting
helpful
posted by orme at 9:08 PM on September 26, 2009


Only the third one I looked at!

The way to best post on Metafilter is to...

posted by orme at 9:10 PM on September 26, 2009


the funny thing is that more people have probably commented in this thread than will ever actually vote on the comments

if you give a fuck that people can randomly copy and paste your comments and then rate them with no consideration of context or identity, on a site with no credibility or real meaning, then you probably shouldn't be posting here or anywhere else, period

he seems like a loser, who gives a shit what he and his friends think?
posted by pyramid termite at 9:27 PM on September 26, 2009


If this really offends you, I suppose you can go throw some noise into his data. Personally, I'm having a hard time getting too worked up about it.
posted by ryanrs at 9:32 PM on September 26, 2009


and if it's random, why do i keep getting the same 5 or 6 comments? - i can actually taste the suck in my mouth as i look at this ...
posted by pyramid termite at 9:33 PM on September 26, 2009


The comments are unattributed, so it's not like it's spawning google history on your username. It looks to me like he's scraping a few sites. It's probably just a test run of a code base, if I were guessing. Doesn't seem like it's much to get upset about.
posted by dejah420 at 9:46 PM on September 26, 2009


I laughed when I ran across this one:

Joe Beese, seriously, find some content that's actually worthy of the MeFi front page. You're two out of three for ridiculously thin, bullshit posts. HP LaserJet may actually make this thread worthwhile, but seriously, try harder.

I have no idea what that last line means. I just pictured someone printing out the entire thread in question, and then: "Oh, now that it's on paper this is kind of okay."

HP LaserJet to the rescue!
posted by Pater Aletheias at 9:50 PM on September 26, 2009 [7 favorites]


Somebody up to writing a little script to automatically rate everything awesome? Should mess up his stats enough to abandon the idea...
posted by DreamerFi at 9:52 PM on September 26, 2009


Raping of American 2.0 Heelllooo lil puppy. Typical loser liberals Human females = hot the neocons are PURE EVIL

I should put some coffee on. I'll be staying up even later than I thought.
posted by double block and bleed at 9:53 PM on September 26, 2009


This website takes information from other websites and posts it. Website does not ask for express permission from content creator before posting. Users then judge posted information. Some even refer to said posts as "crappy." I, for one, would never have anything to do with such a diabolical and nefarious organization.
posted by ActingTheGoat at 9:53 PM on September 26, 2009 [2 favorites]


On second thought, I can't really tell the difference between the YT Markov comments and the real ones.

I like some of the stupid comments on YouTube, but then I remember that I have enough idiots around me IRL.
posted by double block and bleed at 10:06 PM on September 26, 2009


Pater Aletheias: Meet HP LaserJet P10006
posted by blasdelf at 10:09 PM on September 26, 2009 [1 favorite]


I already got my favorites, so fuck that guy and his rating system.
posted by Blazecock Pileon at 10:12 PM on September 26, 2009 [1 favorite]


Basically, no consent was obtained (implied or otherwise), and no anonymity was granted for the authors involved.k.oh and i have a question to how do i make my snail come out.Answer that on my question on how big it gets k.hope this helps tehehe
posted by applemeat at 10:15 PM on September 26, 2009


Say folks, it looks like he's doing work for a project like this one, if not that very one. Somehow I doubt that's much comfort to those of you that were riled by this, but personally I'd take it as flattery.
posted by tarheelcoxn at 10:19 PM on September 26, 2009


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiuismod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris ynisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum won.
posted by painquale at 11:15 PM on September 26, 2009 [8 favorites]


Well, aside from the fact that our words (i.e. speech) are being taken without our permission, they also have been re-purposed and re-contextualized for some experiment that we unwittingly are participating in.

Sounds like sampling to me.
posted by philip-random at 11:25 PM on September 26, 2009 [1 favorite]


Out of context, these comments don't make much sense
posted by delmoi at 11:32 PM on September 26, 2009


Pater Aletheias: Meet HP LaserJet P10006

I got into an interesting discussion about epistemology and the nature of truth and thought with HP LaserJet.
posted by delmoi at 11:35 PM on September 26, 2009


Dear Metafilter,

Please allow me to introduce the concept of FAIR USE. In many situations, people do not need your permission to make use of your copyrighted works. This may or may not apply in this situation, or in your jurisdiction. Please consult a lawyer. Do not apply directly to forehead. Not for use by pregnant males. May cause incompetance. Ask your doctor if FAIR USE is right for you.
posted by blue_beetle at 11:40 PM on September 26, 2009 [2 favorites]


Metafilter: re-purposed and re-contextualized for some experiment that we unwittingly are participating in.

(seize that from a "random" site, will you?)
posted by Durn Bronzefist at 11:41 PM on September 26, 2009


It's a bit disconcerting and I wish he told us what he was doing first -- or asked how we might feel about it -- but upon reflection, it's at worst a bit like the running of the bulls... only in this case, we're trying to outrace the stupid. Only the worst stand to be humiliated.

Don't worry, MeFi. You don't have to run fast.... you just have to run faster than that bozo! (Gestures nebulously...)
posted by markkraft at 11:55 PM on September 26, 2009


Half suspect that the final results will be MeFi wanking material, with us being considered substantially more thoughtful than, say, YouTube.

(Hooray?!)
posted by markkraft at 11:59 PM on September 26, 2009


Dude, I'm pretty sure fair use only applies to downloading movies, tv episodes, and music without paying for them.
posted by !Jim at 11:59 PM on September 26, 2009 [6 favorites]


Treating Hamburglary as a crime which can be solved via reducing Hamburglar's hunger for hamburgers assumes it's a crime of hunger and that its victims therefore share culpability for having burgers.

I'd say "discuss," but really, there's nothing to discuss. Everyone wants hamburgers, but most have the restraint and respect for others to not steal them. Hamburglar's inability to control his desire for burgers is not the problem of his victims.


I got this. I'm not even going to bother to check that this is from metafilter, I just know.
posted by atrazine at 12:25 AM on September 27, 2009 [8 favorites]


I got this. I'm not even going to bother to check that this is from metafilter, I just know.

Cool. I know which thread that comment belongs in, having visited before it was added.

I guess this is the new recent activity.
posted by Durn Bronzefist at 1:23 AM on September 27, 2009


Please allow me to introduce the concept of FAIR USE

Dear Blue Beetle,

"Fair use" is a legal defence. I don't think anyone was proposing suing this guy for copyright infringement, so it's completely irrelevant to bring it up.

(or in other words, just because what he's done is probably legal, that doesn't mean we can't be narked by it)
posted by cillit bang at 2:58 AM on September 27, 2009 [5 favorites]


" youtube comment markov generator "

10 REM YOUTUBE COMMENT MARKOV GENERATOR IS FREE SOFTWARE
20 REM RELEASED UNDER THE GNU/GPL LICENSE.
30 A$ = "lol " : REM ROOT STRING
40 C = C + 1 : REM ITEM COUNTER
50 DIM B$(3)
60 B$(0) = "wut" : B$(1) = "sux" : B$(2) = "@1:23" : B$(3) = "kanye"
70 I = INT(RND(1)*4)
80 PRINT "YouTube Comment" + STR$(C) + ": ";
90 PRINT A$ + B$(I): PRINT
100 GOTO 40

posted by majick at 6:16 AM on September 27, 2009 [4 favorites]


Aw man, I re-dimensioned my array in front of everyone.
posted by majick at 6:20 AM on September 27, 2009 [3 favorites]


You mean you aren't getting paid for your comments? The guy's sent me several nice checks already. I approve of his project!
posted by languagehat at 7:06 AM on September 27, 2009 [1 favorite]


This video iz gay. The part ware the guy rakked his balls on the stare way handle wuz LOL, but wut kind of fukkin moran trys to sk8board down 2 rales on hiz hands? Wut duz he think he iz, toni hauk? Dude shud hav crushshed hiz skul and got hiz darwen aword. Stupid ppl sux.

Just doing my part to poison the data
posted by double block and bleed at 9:31 AM on September 27, 2009 [1 favorite]


It's OK guys, I'm pretty sure these are just User Friendly dialogue.
posted by klangklangston at 9:55 AM on September 27, 2009


I'd like to point out that I hid the DNA of the Ring virus (from Rasen, the sucky-no-one-cares-about sequel to The Ring) in some of my comments. If you develop a cough after reading any of my comments, you're gonna die.
posted by qvantamon at 10:28 AM on September 27, 2009


Dude, I'm pretty sure fair use only applies to downloading movies, tv episodes, and music without paying for them.

Exactly. And copyright only applies to things created or owned by large corporations.
posted by Kid Charlemagne at 10:33 AM on September 27, 2009


what
posted by fixedgear at 10:37 AM on September 27, 2009


You know what really annoys normal people about nerd humor? Nerds have terrible timing and don't know when to stop running their giggle into the ground. A lot of non-nerdy people find that social "tone deafness" awkward and annoying to deal with, especially if they don't grasp the subtlety of repetitive humor or the nerdy undertones behind it.

10 REM YOUTUBE COMMENT MARKOV GENERATOR IS FREE SOFTWARE
20 REM RELEASED UNDER THE GNU/GPL LICENSE.
30 REM
40 REM VERSION 2.0.6 [C] CBM BASIX KREW
50 REM GREETZ TO: ESI, TRIAD, FLT, IKARI, S-X, S8, INC
60 REM RZR1911, DOD, RSI, ACID
70 P = 5 : REM NUMBER OF PREDICATES
80 A$ = "lol" : REM ROOT STRING
90 DIM B$(P)
100 FOR Y = 1 to P: READ B$(Y): NEXT
110 X = INT(RND(1)*3)+1 : REM COUNT OF PREDICATES TO USE
120 C = C + 1 : REM ITEM COUNTER
130 PRINT "YouTube Comment" + STR$(C) + ": ";
140 PRINT A$;
150 FOR Y = 1 to X
160 I = INT(RND(1)*P+1)
170 PRINT " " + B$(I);
180 NEXT
190 PRINT
200 GOTO 110
1000 DATA "wut", "sux", "@1:23", "kanye", "omg"

posted by majick at 10:47 AM on September 27, 2009


HAI WHAT IS GOING ON? I'M SCARED AND CONFUSED AND ALONE AND IT IS DARK AND I THINK THERE ARE BEARS.

KTHX
posted by trip and a half at 11:29 AM on September 27, 2009


There actually are bears and you are alone with nowhere to run, and even if you had somewhere to run you're confused and it's dark, so... well, you know. Being scared is OK.

HTH
posted by rjs at 11:52 AM on September 27, 2009 [1 favorite]


/waves at HP Laserjet
posted by Pater Aletheias at 12:13 PM on September 27, 2009


"•••♣♣♠♠♠♣♣♥♥♥♣♠••••••♣♣♠♠♠♣♣♥♥♥♣♠••••
for EVERYBODY das a TRUE FAN of Lil Wayne check out my new joint Grind Hard I GUARANTEE U GONNA DIG IT. Punchlines and flow is CRAZY! Trust me on dis...thanks! Weezy went in on dis doe lol..
•••♣♣♠♠♠♣♣♥♥♥♣♠••••••♣♣♠♠♠♣♣♥♥♥♣♠••••"


It's pulling comments like this now, I looked at about twenty offerings and they don't appear to be MeFi-ish anymore.
posted by longsleeves at 12:20 PM on September 27, 2009


I looked at about twenty offerings and they don't appear to be MeFi-ish anymore.

So what happens if we keep quoting those other comments, and then the new comment shows up?
posted by niles at 12:55 PM on September 27, 2009


Stuff from today's Polanski thread is showing up now.
posted by CunningLinguist at 12:57 PM on September 27, 2009


Stuff from today's Polanski thread is showing up now.

That should make for some jarring out-of-context moments.
posted by longsleeves at 1:05 PM on September 27, 2009 [1 favorite]


Plus, it showed me "LOL" about 5 times in a row.

This. A better implementation would at least use a cookie to prevent the same person from rating the same comment more than once. That, too, could be circumvented, but the way it is now, it's just begging for someone to mess with the data. Unless they're trying to evaluate the test-retest reliability of their measure, or how people's evaluations of the same lame comment change over time.
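
For what it's worth, the cookie idea could look something like the sketch below. This is only a guess at one way to do it in Python, not how the site actually works: the cookie name `rated`, the dot-separated ID format, and the `record_vote` helper are all made up for illustration. It remembers which comment IDs a browser has already rated and refuses a second vote on the same one.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "rated"  # hypothetical cookie name, not taken from the real site


def already_rated(cookie_header):
    """Return the set of comment IDs recorded in the browser's cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME not in cookie:
        return set()
    # IDs are stored dot-separated, e.g. "541181.7" (an assumed format)
    return {cid for cid in cookie[COOKIE_NAME].value.split(".") if cid}


def record_vote(cookie_header, comment_id, rating, tally):
    """Count a vote unless the cookie says this comment was already rated.

    Returns (accepted, new_cookie_value); the caller would send the new
    value back to the browser in a Set-Cookie header.
    """
    seen = already_rated(cookie_header)
    if comment_id in seen:
        return False, ".".join(sorted(seen))
    tally.setdefault(comment_id, []).append(rating)
    seen.add(comment_id)
    return True, ".".join(sorted(seen))
```

As noted, anyone who clears their cookies gets around this, so it only discourages casual repeat votes.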
posted by limeonaire at 1:14 PM on September 27, 2009


Dammit, you go off the grid for a weekend and...

I have my pitchfork and the tines are newly sharpened and I've infected each tine with active H1N1. Jesus, I hope I am not too late.
posted by jerseygirl at 3:48 PM on September 27, 2009


Still no posts from admins in this thread? Weird, I guess they don't mind or just aren't keeping up with metatalk lately.
posted by tehloki at 8:08 PM on September 27, 2009 [1 favorite]


If this comment shows up on that site, I'm going to sue; Also, if it doesn't, I will sue, because it is an awesome comment.

Your comment is OK.

This comment, on the other hand, is much better.
posted by krinklyfig at 12:42 AM on September 28, 2009


So what happens if we keep quoting those other comments, and then the new comment shows up?

Wait. What if someone else is just posting comments from somewhere else to here?
posted by krinklyfig at 12:44 AM on September 28, 2009


Hey, I'm Lukas Mathis. I'm sorry I didn't see this thread earlier, or I would have answered sooner.

I want to answer some points you guys have made.

First, yes, I strip out all of the HTML, which often removes context. Also, some comments are replies to other comments, and that context is also lost. I hope that the quantity of comments that are being rated will offset this problem.
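
To illustrate the stripping problem raised earlier in the thread, here is a rough Python sketch of a stripper that keeps the quote markers. It is not the code the site used; it assumes quoted text is wrapped in `<i>` or `<em>` tags and that line breaks are `<br>` tags, and the class and function names are invented for the example. Italic spans come out wrapped in quotation marks instead of vanishing into the surrounding text.

```python
from html.parser import HTMLParser


class QuoteKeepingStripper(HTMLParser):
    """Drop HTML tags, but turn italic spans into quoted text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.parts.append('"')   # opening quote marker
        elif tag == "br":
            self.parts.append("\n")  # keep line breaks

    def handle_endtag(self, tag):
        if tag in ("i", "em"):
            self.parts.append('"')   # closing quote marker

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(html):
    """Return the comment text with tags removed and quotes marked."""
    parser = QuoteKeepingStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With this, `to_plain_text('<i>How do I opt out?</i><br><br>How do I opt in?')` keeps the quoted line visibly separate from the reply.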

Second, this will only run for about three days. At the end of today, I will remove the site, and I will delete all of the comments and their ratings. I won't store any of the data, and I will only publish very generic results (along the lines of "comments containing the word 'shpadoinkle' are on average rated 1.2 points worse than the average rating"). I will not publish any data that would allow anyone to get any kind of data relevant to privacy.
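
As a rough illustration of that kind of generic result, the Python sketch below computes how far the average rating of comments containing a given word sits from the overall average. The numeric scores, the sample data, and the `keyword_delta` name are all invented for the example; nothing here reflects the real ratings or how they were scored.

```python
from statistics import mean


def keyword_delta(rated_comments, word):
    """Mean score of comments containing `word`, minus the overall mean.

    `rated_comments` is a list of (text, score) pairs. Returns None when
    the word never appears, since there is nothing to compare.
    """
    word = word.lower()
    overall = mean(score for _, score in rated_comments)
    hits = [score for text, score in rated_comments
            if word in text.lower().split()]
    if not hits:
        return None
    return mean(hits) - overall


# Made-up sample data: (comment text, numeric rating)
sample = [
    ("shpadoinkle day today", 2),
    ("thanks for the link", 5),
    ("lol shpadoinkle", 1),
    ("interesting point about context", 4),
]
```

Only the aggregate difference comes out of this, not any individual comment or its author.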

I also want to apologize to everyone who feels I did something wrong. If you've contacted me and I haven't replied, it's because I did not receive your message. I want to point out that this is only a very temporary thing, and that I'm not making any money on it, and that I think the results may be interesting to you. The basic goal is to get one simple usability recommendation out of it.
posted by L_K_M at 2:59 AM on September 28, 2009


It's easier to ask forgiveness than permission, but asking permission tends to generate less ill will.
posted by Pope Guilty at 3:14 AM on September 28, 2009 [1 favorite]


@Pope Guilty: Frankly, it did not occur to me that people would object. In fact, I specifically picked Mefi and YouTube because I've seen them used in other mashups (with positive feedback in Mefi threads), so it never occurred to me that anyone would object. In hindsight, it's obvious to me that I screwed up. I should have been very clear about what I was doing (that it's a temporary three-day thing, and that I don't intend to keep or publish any of the comments or ratings), and I should have contacted Mefi and informed them about the project.

If I had expected anyone to take objection, I would have asked for permission.
posted by L_K_M at 3:25 AM on September 28, 2009


I think the results may be interesting to you

Ohhh... yeah, not so much.
posted by jerseygirl at 3:26 AM on September 28, 2009 [1 favorite]


If I had expected anyone to take objection, I would have asked for permission.

Well now you know people object. So the thing to do is take it down now.
posted by jerseygirl at 3:27 AM on September 28, 2009


Well now you know people object. So the thing to do is take it down now.

I was hoping people would not object if they understood what I was doing, but you're right. I'm taking it down.
posted by L_K_M at 3:30 AM on September 28, 2009


You're welcome?
posted by gman at 4:38 AM on September 28, 2009


> Ohhh... yeah, not so much.

That's kind of a dickish thing to say. Was this guy's crime so egregious we've all just decided to tar and feather him no matter what? The guy comes here to explain and apologize and we just use the occasion to kick him in the nuts? Is it really completely unthinkable he might use the data for something interesting? Has anybody here suffered any actual damage? Jesus, people.
posted by languagehat at 6:39 AM on September 28, 2009 [5 favorites]


it did not occur to me that people would object

Me neither, and I'm the guy who thought Holden Karnofsky should have been drawn and quartered. I think most of you are making beanmountains out of beanhills.

Ohhh... yeah, not so much.
posted by jerseygirl at 3:26 AM on September 28


Especially you.
posted by Optimus Chyme at 6:45 AM on September 28, 2009


Didn't look like it was going to be "interesting" to the people who were already citing things like fair use, implied consent, out of context, etc.

Didn't seem like it warranted the upset even last night, but approaching it with a mod first or MeTa-ing it beforehand explaining what the goal is/was probably would have gone a long way to avoid the uproar and whatnot.

But even with LKM's explanation this morning, the data feels like it'd be skewed anyway because of the way the comments were pulled sans HTML and resulted in giant mishmashes of quoted text and replies. It was like asking "is this long paragraph of comment nonsense from MeFi better than LOLZ comments x5 from YouTube?"
posted by jerseygirl at 7:45 AM on September 28, 2009


I presume you had/have similar issues with this then, right? Are you going to demand it be "taken down now?"
posted by Optimus Chyme at 7:50 AM on September 28, 2009


Was there a MeTa thread about that where people were rankled?
posted by jerseygirl at 7:55 AM on September 28, 2009


Was there a MeTa thread about that where people were rankled?
posted by jerseygirl at 7:55 AM on September 28


Actually, everyone seemed to like it. I don't see why L_K_M's project is some evil master plan while the YouTube-MeFi mashup was enjoyed by all. Maybe you can tell me.
posted by Optimus Chyme at 8:06 AM on September 28, 2009


First of all, thanks to everyone who came to my defense. You can't believe how terrible it feels to wake up to something like this.

Now, for the points made by jerseygirl:

approaching it with a mod first or MeTa-ing it beforehand explaining what the goal is/was probably would have gone a long way to avoid the uproar and whatnot.

Yes, I should have done that. I decided not to say too much about the project because I felt that telling people what I intended to do would influence how they voted. I didn't post anything to mefi because it didn't occur to me that people would object to it, and also because having too many people from mefi voting on it might have influenced the results.

In hindsight, not saying anything about what I was doing and not asking mefi about it was a clear mistake.

But even with LKM's explanation this morning, the data feels like it'd be skewed anyway because of the way the comments were pulled sans HTML and resulted in giant mishmashes of quoted text and replies.

As far as I can tell from looking at the results, they seem plausible. I honestly doubt stripping the HTML caused a significant error; I think in most cases, the people who voted noticed if parts of a comment were quotes. I would have liked to let it run for another few hours to get some more comments, but I think the results are valid.

It was like asking "is this long paragraph of comment nonsense from MeFi better than LOLZ comments x5 from YouTube?"

Comments from the two sites are evaluated independently of each other. Since YouTube limits comments to 500 characters, not doing so would hurt the average quality of shorter comments, as you point out.
posted by L_K_M at 8:10 AM on September 28, 2009


So, wait, your problem is only whether or not anyone objects in MeTa? Because the objections were pretty silly—this is fair use; the idea that there would need to be an IRB hearing for this is ludicrous; the contextual issues would be study design problems, not moral problems—and taking those objections at face value, especially demanding that he stop what he's doing right now because of them, is a wild over-reaction.

Further, arguing that because there were objections he should stop is begging the question that these were valid objections. Since they're not, there's no need to stop what he's doing except to mollify a bunch of people who worked themselves into a tizzy over the possibility that this might be bad at some point in the future rather than actually, factually bad now.
posted by klangklangston at 8:22 AM on September 28, 2009 [1 favorite]


the idea that there would need to be an IRB hearing for this is ludicrous

Yes, IRB certainly does not apply to collecting existing, public data without interaction with the creators. There is no issue of consent in the technical sense whatsoever, with respect to collecting the data. Permission maybe, but consent no.
posted by advil at 8:35 AM on September 28, 2009


Well now you know people object. So the thing to do is take it down now.
posted by jerseygirl


This person doesn't speak for all of us.
posted by haveanicesummer at 8:58 AM on September 28, 2009


jerseygirl, if you are going to speak for all of us, could you use a less snotty tone? Thanks.
posted by CunningLinguist at 9:00 AM on September 28, 2009


Or what haveanicesummer said.
posted by CunningLinguist at 9:01 AM on September 28, 2009


Noted. I didn't mean to come off as such a bitch, really. I dropped a line to LKM a little while ago. I completely apologize and flag myself and the comment for being a total asshole.
posted by jerseygirl at 9:03 AM on September 28, 2009


"My first computer was a Performa 450, my first programming language was HyperTalk, my first electric guitar was a cheap Peavey, my first videogame was a VCS 2600 and my current snowboard is from Lib Tech."

How Coupland-esque.
posted by mippy at 9:09 AM on September 28, 2009


I was only stating the IRB thing in the event that the poster was using this data for any sort of scientific study. The project at first seemed like a science/research project to me, now less so. Anyways, carry on. After reading many other people's thoughts (including the site creator's) I'm less worried and/or bothered by the scraping of MeFi and of other people's comments. I guess this is harmless. There are just so many examples where people do nefarious stuff that I've become somewhat skeptical, I realize.
posted by iamkimiam at 9:15 AM on September 28, 2009


Oh, and to respond to advil above... I think an IRB would determine that both implied consent (a page header or something) and some sort of privacy measure on the site would be required. The consent notice would inform the survey respondents about what their participation means and what it will be used for (if deception about the use is involved, this would need to be addressed as well), and the privacy measure would protect the survey respondents and the comment authors. Many of the comments are sensitive or contentious in nature, and include the authors' screennames. Since these comments are traceable, they would either need to be anonymized or permission might need to be obtained. They are being re-purposed and analyzed in a different context than their authors originally intended. They're being judged, out of context even.

I do realize that I'm taking the extreme CYA approach here, but if this was at all scientific research, then yeah, those are some of the steps that would probably need to be taken. But fortunately, this site doesn't seem like it's trying to be any of that, so yay.
posted by iamkimiam at 9:24 AM on September 28, 2009


That was gracious, jerseygirl. Thanks.
posted by CunningLinguist at 9:54 AM on September 28, 2009


I've crunched the numbers and written the outline of the blog post. Here's what it will contain:

- Correlation of average rating and length
- Correlation of average rating and wrong punctuation (e.g. "comments containing '….' are on average rated 1.2 points worse than the overall average rating")
- Correlation of average rating and usage of CAPS LOCK
- Correlation of average rating and swearing
- Correlation of average rating and usage of certain words ("you", "consider")
- Correlation of average rating and average word length
- A short comparison of average word length and average quality between YouTube and MeFi

It will not contain any text from any comment, it will not contain any data from any individual comment (in fact, the comment text itself was the only thing I ever stored so I could get its length, average word length, and so on), and it will not contain any ratings other than average ratings over the groups of comments mentioned above (e.g. "MeFi comments between 9 and 16 letters were rated 1.92 on average").
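As a rough illustration of the kind of aggregate figures listed above, here is a minimal sketch of bucketing rated comments by length and correlating a feature with the rating. It is not the code used for the project; the sample data, the function names (`features`, `mean_rating_by_length`, `pearson`) and the bucket edges are all assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (comment text, average rating) pairs.
SAMPLE = [
    ("Wtf", 1.0),
    ("THIS IS GREAT", 1.5),
    ("I think the context matters a great deal here.", 3.5),
    ("You should consider how quotes are displayed before judging.", 4.0),
]


def features(text):
    """Compute simple per-comment features (names are illustrative)."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    return {
        "length": len(text),
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        # Share of letters that are upper case, as a crude CAPS LOCK signal.
        "caps_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }


def mean_rating_by_length(rows, edges):
    """Group comments into inclusive (lo, hi) length buckets and average ratings."""
    buckets = defaultdict(list)
    for text, rating in rows:
        n = len(text)
        for lo, hi in edges:
            if lo <= n <= hi:
                buckets[(lo, hi)].append(rating)
                break
    return {bucket: mean(ratings) for bucket, ratings in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


if __name__ == "__main__":
    print(mean_rating_by_length(SAMPLE, [(1, 16), (17, 100)]))
    lengths = [features(text)["length"] for text, _ in SAMPLE]
    ratings = [rating for _, rating in SAMPLE]
    print(pearson(lengths, ratings))
```

Only the bucket averages and the correlation leave the functions, which matches the stated intent of publishing group-level numbers rather than anything from an individual comment.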

I will delete all of the comment texts and ratings by the time I publish the blog post.

Does anyone object to any of this data being released?
posted by L_K_M at 10:05 AM on September 28, 2009


@jerseygirl: Thank you.
posted by L_K_M at 10:06 AM on September 28, 2009


Oh, interesting. I didn't see your earlier comment about publishing this data until now. Just curious, what field of research are you interested in? This topic could be approached from many different angles. There has already been a significant amount of published work in Psycholinguistics about correlations between perception of intelligence and word length. Also, swearing and punctuation. I don't have time to find links right now, but if that's something you're interested in looking into, I could offer some help in finding articles later.

One more suggestion, if I may. You might be able to kill two birds with one stone if you make up some controlled, sample comments for listeners to rate. This would get around the IRB, as well as compare apples-to-apples when it comes to comment content. For example, you could have 10 base-form comments, with variations on each question. Ex. One comment set could be:

I am all for this idea, but there are a lot of things to consider
Im' all for this idea, but there're a lot of things to consider
I'M ALL FOR THIS IDEA.
I'm all fucking for this idea.
I'm all for this idea, but there are a lot of things you should consider.
I'm in wholehearted support of this idea, but there are a lot of extraneous factors you should consider.
(A comment from youtube that means this)
(A comment from MeFi that means this).

Ideally, you'd have 9 other comment sets, and then you'd select one comment from each set randomly, to create, let's say, 80 different conditions (8x10=80; is my method right? that seems like a lot of conditions), randomized and assigned anew to each page refresh. To cover all 8 testing variables, and 80 different conditions, you'd need a LOT of response data, but it's the internet and that can be done easily. Also, some of the 8 criteria could be, and already are, mixed in other sentences. This could cut the size of your sentence sets, but also introduces some confounding variables. I.e., are people rating this sentence low because there's swearing, or because the words are short, or both? Isolating these things is a bit tricky. That's why starting out with controlled stimuli is so important. But it can be done. Maybe consider measuring fewer correlations, but explore the easily measurable ones more thoroughly and holistically (e.g., swearing and caps lock are more salient, controllable and easily measurable than word/sentence length.*)
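The design described above could be sketched roughly as follows. This is only an illustration under assumptions: the variant labels in `VARIANTS` are made-up names for the 8 example sentences, and `all_conditions` and `draw_page` are hypothetical helpers, not part of any existing site. Note that 8 variants across 10 sets gives 80 individual stimuli, while the number of distinct pages (one variant per set) is 8 to the power of 10.

```python
import random

# Hypothetical labels for the 8 variants of each base comment.
VARIANTS = [
    "baseline",
    "bad_punctuation",
    "caps_lock",
    "swearing",
    "uses_you",
    "long_words",
    "youtube_paraphrase",
    "mefi_paraphrase",
]
N_SETS = 10  # ten base-form comments, as suggested in the thread


def all_conditions():
    """Every (set index, variant) pair: the individual stimuli to be rated."""
    return [(s, v) for s in range(N_SETS) for v in VARIANTS]


def draw_page(rng):
    """Pick one variant per comment set, then shuffle the presentation order."""
    page = [(s, rng.choice(VARIANTS)) for s in range(N_SETS)]
    rng.shuffle(page)
    return page


if __name__ == "__main__":
    print(len(all_conditions()))        # number of individual stimuli
    print(len(VARIANTS) ** N_SETS)      # number of possible pages
    for set_index, variant in draw_page(random.Random()):
        print(set_index, variant)
```

Calling `draw_page` on every page refresh gives each visitor a fresh random assignment, so ratings for each (set, variant) pair accumulate across visitors rather than within one session.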

I'm happy to help you out if you want. Feel free to email me. Also, I hope I haven't offended you with my earlier snark and skepticism about your study. Please forgive me.

*Partly because swearing is a closed class of socially marked words, and caps lock is a binary measure. It'd be interesting to see swearing AND caps lock vs. just one of these vs. neither.
posted by iamkimiam at 11:01 AM on September 28, 2009


Just curious, what field of research are you interested in?

Well, the most basic question I'm trying to answer is simply this: When creating a commenting system where users can comment, does it make sense to limit the comment length, and if so, how long should we allow comments to be? This question occurred to me because whenever I'm working on a site where users can comment, inevitably somebody will request that comment length be restricted. However, I've never seen any evidence that this does any good, and when looking at existing websites, it seems to me that sites with comment length restrictions often have lower-quality comments.

All the other evaluations are simply side-effects. I thought it would be fun to see what else I could find out, given that I have the data.

There has already been a significant amount of published work in Psycholinguistics about correlations between perception of intelligence and word length. Also, swearing and punctuation. I don't have time to find links right now, but if that's something you're interested in looking into, I could offer some help in finding articles later.

I would very much be interested in this.

This could cut the size of your sentence sets, but also introduces some confounding variables. I.e., are people rating this sentence low because there's swearing, or because the words are short, or both? Isolating these things is a bit tricky.

That's a great idea. I'd very much be interested in doing a follow-up project based on this idea.

Also, I hope I haven't offended you with my earlier snark and skepticism about your study.

Well, no. In fact, it's very helpful (and I'm reluctant to even call it a "study", since that implies some basic scientific rigor along the lines of a clinical trial or a controlled experiment, which I clearly don't have).

I only wish I had had this discussion before going ahead with my previous setup.
posted by L_K_M at 11:29 AM on September 28, 2009


I can see already that the end result of the study itself and the contretemps it triggered will be much better comments because

A. People will pause before hitting "post" to try to envision whether a comment would generate smiley-faces or frowny-faces or meh-faces, if held up for judgement.

B. Commenters will cram in enough context to be absolutely sure that thousands of years from now, human and/or extraterrestrial researchers will not be baffled by the comment, even if it is gleaned free of the original context as an isolated fragment of data from storage methods badly degraded by time and the elements, and will be able to assign their equivalent of smiley, frowny, or meh faces as they see fit.
posted by longsleeves at 2:34 PM on September 28, 2009


Hindsight being perfect and all, would this have been a candidate for Projects?

That's an honest question, I don't spend much time over on the, uh, whatever colour that is 'cause I can't see it.
posted by geckoinpdx at 5:39 PM on September 28, 2009


The blog post is now online, and I've deleted all the data:
http://ignco.de/188
posted by L_K_M at 1:27 AM on September 29, 2009


I was hoping people would not object if they understood what I was doing, but you're right. I've taken it down.
posted by L_K_M at 5:30 AM on September 28

Noted. I didn't mean to come off as such a bitch, really. I dropped a line to LKM a little while ago. I completely apologize and flag myself and the comment for being a total asshole.
posted by jerseygirl at 11:03 AM on September 28


God damn it, now there's no one in here to hate!

I don't see why L_K_M's project is some evil master plan while the YouTube-MeFi mashup was enjoyed by all.

I think it pretty much boiled down to the fact that the former seemed like maybe OUTSIDERS were being invited to JUDGE US while the latter pretty clearly made us all look like a bunch of smarty-pants.
posted by nanojath at 10:55 AM on September 29, 2009

