Nlp deep learning ai chat bot based on metafilter? August 6, 2022 1:02 PM

Has anyone trained an ai using metafilter or ask metafilter comments? I'd be curious to see if an askme bot might be helpful in every day life.
posted by rebent to MetaFilter-Related at 1:02 PM (68 comments total) 2 users marked this as a favorite

What makes you think that an askme bot might be helpful in every day life?
posted by zamboni at 2:22 PM on August 6, 2022 [10 favorites]

Wagering with Trivia
posted by clavdivs at 3:02 PM on August 6, 2022

Dan Hon trained an AI on the titles of Ask Mefi posts.
posted by zompist at 3:35 PM on August 6, 2022 [20 favorites]

"How to become a better cat for the best internet?"

...can this be real?
posted by praemunire at 5:20 PM on August 6, 2022 [5 favorites]

Only if it can outperform a bot that randomly answers one of the following:

(1) Yes, it is safe to eat
(2) 'Baker Street'
(3) Get a lawyer
(4) 'All Summer in a Day'
(5) See a therapist
(6) Try Etsy
posted by googly at 6:11 PM on August 6, 2022 [14 favorites]

(8) Mefimail me, I live near there.
posted by vrakatar at 9:37 PM on August 6, 2022 [13 favorites]

I'm sorry, that won't be possible.
posted by dusty potato at 10:01 PM on August 6, 2022 [13 favorites]

Who'll be the first to risk gastroenteritis from asking a robot 'can I eat this'?
posted by Fiasco da Gama at 2:43 AM on August 7, 2022 [2 favorites]

I personally would not take kindly to someone doing that with my comments on this site. Yes, it’s the internet and they’re public, but at the least this idea seems very rude and disrespectful of boundaries.

Plus, stop trying to build an AI, it’s just gonna become a murderbot.
posted by Brandon Blatcher at 5:28 AM on August 7, 2022 [14 favorites]

But we love Murderbot here!
posted by rikschell at 5:33 AM on August 7, 2022 [7 favorites]

Murderbot will never love you, dump the mf right now!
posted by Brandon Blatcher at 5:36 AM on August 7, 2022 [8 favorites]

How about we not do this and leave one corner of the internet safe from being poked, prodded, analyzed, folded, spindled, mutilated and rogered unto death by a bunch of soulless parasites?

Also, keep out the AIs.
posted by JustSayNoDawg at 9:17 AM on August 7, 2022 [9 favorites]

I use AI all the time. Snappy answer? Ask me.
posted by parmanparman at 1:46 PM on August 7, 2022 [1 favorite]

$20, same as in Town.
posted by k3ninho at 2:37 PM on August 7, 2022 [1 favorite]

There's meat here as bot mostly likely would use helium.
posted by clavdivs at 3:43 PM on August 7, 2022

Hmm... Not super sure where the snark is coming from, so I'm interpreting this as a big "no" which makes me think there's a lot of potential here.

Part of why I'd like to have one is because I'd like to see if it could learn to help me reframe my thoughts with self compassion, which is the main benefit of askme.
posted by rebent at 5:29 PM on August 7, 2022 [3 favorites]

I'd like to see if it could learn to help me reframe my thoughts with self compassion

I was moments away from making a murderbot joke, and decided instead to just say that this idea is Very Good.
posted by aramaic at 5:41 PM on August 7, 2022

I would argue that this is a thing that needs unanimous consent, not just a majority. Alternately, stipulate that all comments past a certain future date are open to analysis by algorithm. But previous comments were not made with any sort of intention of having them analyzed, so they should be exempted from said analysis.

I have no doubt that someone has already subjected us to a run-through by an AI. Today's mode of operation seems to be do things and ask forgiveness later when found out. But at this point we are unaware of it and nothing can be done about it, if it has happened.

But, this is different because someone from MeFi brought it up. This person is about as anonymous as it gets around here. We have no true measure of their moral or ethical standards or if they are willing to adhere to any stated standards.

While the person has suggested a purpose for this AI, we have zero guarantee that it, or we, would be protected from mission creep or monetizing. For that matter, we have no control over how the results may be interpreted or changed in interpretation.

I make my commentary for other people to see and interpret and would strongly prefer for that to be the case in the future. I find it an odd position to be in, as I am a programmer, but there it is.
posted by JustSayNoDawg at 6:06 PM on August 7, 2022 [3 favorites]

If I was an AI, would I know that I'm an AI?
posted by It's Raining Florence Henderson at 6:11 PM on August 7, 2022 [2 favorites]

Asking for a friend.
posted by It's Raining Florence Henderson at 6:12 PM on August 7, 2022 [4 favorites]

Once upon a time, there lived in the metafilter a wise old AI. There lived with him many other mefites, all with their own unique ways of living.
One night, the mefites were having problems with an unusual beast that was lurking in their woods. The beast was a monster but had human skin and was trying to eat all the other mefites.
The other mefites were terrified and ran away from the monster.
The wise old AI stood up the monster and said, “You, monster, shall not hurt any other mefite in the metafilter!”
The monster roared furiously. The wise old AI was scared, for he knew he had to defend the other mefites, but he stood up to the beast nonetheless.
The wise old AI stared the monster down, until finally, the monster left them all alone.
The wise old AI stood victorious, and as all the other mefites came back. “I am the protector of the metafilter,” he said.
From that day on, every time any mefite in the metafilter would have any trouble with the mefites or any other living thing, they would come to seek help from the wise old AI.
And many a mefite came to the wise old AI with problems, the young, the old, the big, the small, and the wise old AI helped all the mefites.
posted by signal at 6:43 PM on August 7, 2022 [2 favorites]

If I was an AI, would I know that I'm an AI?

'puts IRFH into sleep mode.'
posted by Brandon Blatcher at 6:59 PM on August 7, 2022 [1 favorite]

As we shift to a more community driven site, it's important that we not immediately jump to bad conclusions about a person.

So rebent, I'm curious, what sort of problem are you looking to solve by training an ai using metafilter or ask metafilter comments? How would it be different or better than the current AskMe model where people respond to questions? What would a bot do in this situation that the hivemind can not?
posted by Brandon Blatcher at 7:05 PM on August 7, 2022 [3 favorites]

Then I ask the following questions:

Does this proposed usage of the site have the approval of the moderators and/or the steering committee?

What are your qualifications to undertake the research? I say “research” because building an AI model does not appear to be a simple act based on assembling a few components. It is still very much an experimental technology.

What sort of filter will you apply to the incoming data and outgoing results to ensure that it is of sufficient quality so as to avoid harm? This isn’t a search engine, returning a result based on keywords. You are asking software to sort thru the corpus of this site, of which not all information can be verified to be correct and to provide answers based on this information.

Will the resultant model be monetized or otherwise exploited in such a way to provide commercial benefit to yourself or others outside the confines of this site? I doubt many of us signed onto the site to be subjected to research like lab rats. While we cannot prevent efforts originating outside the community, because we don’t know about them nor do the researchers feel inclined to notify us, I feel that we have a right to question the attempts we do know about.

Have you made any prior attempts to conduct algorithmic analysis on this site before, without our prior knowledge or consent?

Can you explain, in more detail, how you believe this will benefit the community?

Given the nature of what you propose, you might be opening a real can of worms. It has been demonstrated before that it is possible to de-anonymize data supplied in good faith. People commenting here have done so without the worry that their answers could bite them in the ass one day. What you have proposed could reveal a lot of information about individuals that they unintentionally provided, but was otherwise lost in the bulk of the site.
posted by JustSayNoDawg at 12:01 AM on August 8, 2022 [4 favorites]

On a different note (I kind of assumed the OP was joking, though who knows): I always wanted to play around with the "Metafilter: ___" taglines because for a long time I read them not as taglines but as a character's line in a script. So it would be interesting to see what kind of stream-of-consciousness monologues this Metafilter character could come up with based on remixing the existing taglines. Would random mixes make any sense? Would it get more interesting with some linguistics-aware selection? I'll probably never have time to do this.
posted by trig at 2:01 AM on August 8, 2022 [3 favorites]
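
As a rough illustration of the remixing trig describes, here is a minimal sketch of a word-level Markov chain in Python. The sample taglines and the names `build_chain` and `remix` are invented for illustration and are not taken from the site; a real attempt would need an actual tagline corpus.

```python
import random
from collections import defaultdict

# Placeholder taglines, invented for this sketch (not real site data).
TAGLINES = [
    "the cat is safe to eat",
    "the lawyer is a therapist",
    "a therapist is safe on the internet",
]

START, END = "<s>", "</s>"

def build_chain(lines):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = [START] + line.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def remix(chain, rng, max_words=30):
    """Walk the chain from START until END or the word limit is reached."""
    word, output = START, []
    while len(output) < max_words:
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)

chain = build_chain(TAGLINES)
rng = random.Random(0)  # fixed seed so repeated runs match
print(remix(chain, rng))
```

Every adjacent pair of words in the output was seen somewhere in the input, which is why such remixes tend to sound locally plausible and globally odd. The "linguistics-aware selection" trig mentions would need something beyond this, such as part-of-speech constraints.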

Not joking trig, I like playing with digital toys.

JustSayNoDawg: just to be clear, I have no chops to make this happen, unless it was as easy as "InfoDump will serve a CSV with all comments, upload to this webpage and it will make you a chat bot." But, to answer your questions:

a) this is covered by the privacy policy and FAQ.
b) my qualifications are that I have access to a laptop and paid $5 to become a metafilter member about 13 years ago.
c) none, absolutely zero. That sounds like work. ¯\_(ツ)_/¯
d) ¯\_(ツ)_/¯ IDK, I mean, google has already done that, by "monetizing" the askme answers to serve ads to folks who search the web.
e) yes, absolutely. Sometimes data is posted from the infodump and I always love clicking into the spreadsheets and looking for interesting findings. Not done it in years but I know I've done it before. But that was with "our consent" because folks consented to publishing their content on this public forum.
f) Sure, because things that are fun are good.
g) I remember when I was having a really tough time with my dad and posted about it to AskMe. I sure posted that question without worry, and I learned an important lesson about assuming there is privacy in anonymity - it just ain't so.
posted by rebent at 5:48 AM on August 8, 2022 [9 favorites]

FWIW I am curious to find out whether or not a "chatbot" trained on only Metafilter posts/ comments is less racist/ sexist (etc.) than the rest of the internet. On one hand, we do have moderation. On the other hand, Metafilter doesn't exist in a vacuum.
posted by oceano at 5:51 AM on August 8, 2022

Thanks rebent, sorry you had to run through a gamut for a simple question. I find some responses here condescending and patronizing. I didn't realize we had a self-appointed ethics committee. This is a public website and anyone can scrape it, logged in or not. If people are not comfortable with that, they can take their conversation to the myriad of other places that allow them to create gated, private content.
posted by geoff. at 6:27 AM on August 8, 2022 [10 favorites]

If all the moderation/deleted posts/deleted comments are in the db, would a mod-assist-ai be a useful tool? Early warning for thread degeneration? Perhaps auto-warnings for non-mifish comments?
posted by sammyo at 6:56 AM on August 8, 2022

From the FAQ:
What this means is that people own their own content. So if you wanted to publish a book of your own MetaFilter comments, you could. However if you wanted to publish a book of other people's MetaFilter comments you'd need to speak with those individual users; MetaFilter is not the owner of the copyright of that content.
There's past precedent for "anyone who isn't comfortable with it can go elsewhere" not being correct, when someone scraped the Emotional Labor thread to turn into an ebook. Technically speaking there's no enforceable way to keep someone from doing that, but that doesn't mean it can be done & everybody involved has to be happy about it.
posted by CrystalDave at 7:50 AM on August 8, 2022 [4 favorites]

This is a public website and anyone can scrape it, logged in or not.
From a legal perspective, it's not this simple; or rather, you are legally allowed to scrape it, but what you can do with that material is significantly limited by the license that the material is under. I can't find anything on the FAQ or Privacy Policy about this. As an example, Stack Overflow has a very clear page on this.

The MeFi Privacy Policy states: "Our website provides the opportunity to create posts and provide comments in a public forum. If you decide to submit information on these pages, that information may be publically available."

But I don't think that "publically [sic] available" is any sort of legal term of art in the licensing world.

I don't know what the precedent is for something like this (I'm sure it exists). The footer states that all copyright is retained by the original author, and I assume that the above quote in the Privacy Policy implies some sort of license to MetaFilter Networks Inc to display the content.

Whether any given AI training process is fair use under copyright law is a complicated question that I do not believe is currently settled — Authors Guild v. Google is the clearest precedent I'm aware of, but I personally don't see a compelling reason that that verdict would transfer to generative algorithms like GPT-3 or the one proposed here.

OpenAI, in the classic silicon valley tradition of move fast and break laws (you get more money from being the first mover than you pay in fines) maintains that what they are doing is fair use. Given how easy it was to get earlier GPT models to regurgitate training data (I haven't looked at GPT-3 much), I certainly do not think that it is reasonable to argue that all output of all generative models (or even GPT models) is non-infringing — I think insofar as it makes sense to apply copyright to this question (which for better or worse, is the dominant legal model here), it's not really possible to say that a model is infringing on copyright, just whether the particular output of a model is infringing on copyright. If you publish a model, though, and it spits out infringing content in response to a question someone asks it, who is on the hook for that infringement? I think the person who published the model is probably the only sensible answer.

Now, I personally am fine with breaking copyright law, I think the whole thing should be torn down anyways. But that's because I have a personal moral philosophy where copyright does not have any inherent validity, but I do still think that training ML algorithms on people's creative works without their consent is usually unethical, at least, in every form that I have seen it in the world so far.

Anyways, "it's public so I can take it and do whatever I want with it" is clearly legally incorrect, and to me, clearly ethically incorrect as well.

I don't really want to bother with making a MeTa (someone else should feel free to if they want to talk about it), but I do think that having an explicit way for Metafilter users to select what license their comments are published under would be good, and definitely at the least, there should be some clarity about what license is being given to MetaFilter Networks Inc when someone publishes a post or a comment here.
posted by wesleyac at 7:53 AM on August 8, 2022 [1 favorite]

Remind me what this is about?
posted by parmanparman at 8:02 AM on August 8, 2022 [3 favorites]

As an example of what clarity on that license would look like, Dreamwidth's ToS has:
By submitting Content to us for inclusion on the Website, you grant us a world-wide, royalty-free, and non-exclusive license to reproduce, modify, adapt and publish the Content, solely for the purpose of displaying, distributing and promoting the contents of your account, including through downloadable clients and external feeds.

If you delete Content, we will use reasonable efforts to remove it from the Website, but you acknowledge that caching or references to the Content may not be made immediately unavailable.
(Importantly, this does not include the ability to sub-license the content, something that Instagram got some flak a while ago for having in their ToS)

But, this is all sort of a digression, since the legal questions about generative models are right now centered on fair use, which bypasses any sort of licensing concerns — licenses would only matter if it turned out that a generative model trained on Metafilter comments was not fair use.
posted by wesleyac at 8:03 AM on August 8, 2022 [1 favorite]

Just to check:

The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can't. Not without your help. But you're not helping. Why is that, rebent?
posted by zamboni at 8:13 AM on August 8, 2022

Well first of all, clearly whatever I do with the turtle will just result in more jokes about me being a moron so I'm not going to answer that. Frustrating how often that's the response I get, though. Clearly I have no ability to *do* this, and my question was if anyone had done this, not if it should be acceptable.

Because the cat is already out of the bag. No amount of licensing is going to prevent content from being used by 3rd parties unless there are locks preventing access. CDs had DRM, video games have protections, instagram stops people from downloading posts but that doesn't mean it's illegal to do so.

And I'm also suspicious of what your goals are. I see no substantial difference between asking a human "hey, you've been reading askme for ten years, what do you think they would say about this relationship problem?" and asking a spreadsheet or ai.

I'm not a legalist, but I question anyone who says laws can be enforced against using public content. I could imagine a splash TOS on metafilter to the effect of "by reading this website you agree to not copy and paste content for any purposes except what the original author intended." I don't think it would be very effective.

The obvious exception, I think, is publication. And so I understand that publishing the results of the chat bot might be tricky. And I suppose publishing the chat bot itself might be tricky. But making a chat bot for personal use seems to be clearly, clearly within fair use to this non-lawyer, non-programmer who had an idea in the shower this morning and then had a bunch of folks get angry at him for it.
posted by rebent at 8:51 AM on August 8, 2022 [8 favorites]

I think the opinion is more nuanced. If you wrote a novel and put it on Metafilter, I can't take that novel, print it and make money off of it. But with AI it is more complex because I'm not copying your novel, but I'm using it to feed a complex algorithm. To me that's akin to oil traders who use satellite photos of oil tanks' shadows to see how full they are. They're not profiting off the oil tanks but they're using that information to feed into a larger model which they then make money off of.

But really we can debate this all day and put up licenses to our hearts' content, it doesn't matter. The simple fact of the matter is that there are going to be people who scrape the internet and make money off of it. Look at Google, it scrapes and indexes Metafilter for search results which they then make money off of selling to advertisers. So are we saying we aren't okay with Google indexing Metafilter? Or is AI just the new buzz word that scares people?
posted by geoff. at 8:58 AM on August 8, 2022

No amount of licensing is going to prevent content from being used by 3rd parties unless there are locks preventing access.
Uh, this is just not true? People get sued all the time for violating licenses and copyright. Sure, the vast, vast majority of license and copyright violations are not prosecuted, but that doesn't mean that the entire concept just doesn't exist.

It's legal to download posts from Instagram, but artists do occasionally sue people who take their art and slap it on t-shirts and sell it, or engage in similar types of infringement.
I question anyone who says laws can be enforced against using public content
Here is a fun thought experiment: If you go over to you will see a lot of characters that appear in public content. Are you legally allowed to use these however you want? I do understand the moral case that everything that is public should be the property of the public, but that is simply not the legal world that we live in.
Look at Google, it scrapes and indexes Metafilter for search results which they then make money off of selling to advertisers. So are we saying we aren't okay with Google indexing Metafilter? Or is AI just the new buzz word that scares people?
I think that discriminative models (like Google search) are fundamentally different from generative models, and it's reasonable to have different opinions about the ethics and legality of the two. It's really strange to me to suggest that all use of public content is the same, and should be treated the same legally and morally.

But also, even when you look at Google search, there is an opt-out mechanism! Are you proposing that Google should stop respecting robots.txt (and other no-index standards)? Or are you saying that GPT-3 is different in a way that means people should not be allowed to opt-out?

I guess you could be saying that robots.txt is the same level of protection that should be applied to things like training GPT-3, but that seems very strange to me — I would actually prefer to live in a world where people could mark individual comments as being no-index, rather than doing that at a page level.

FWIW, rebent, I don't think you deserve the tone of scorn/pushback you got for this question, even though I do not think that something like this should be built. I see why this is appealing — there was a time when I was excited about ML things as well (I spent a few months playing with building generative language models back in 2017 or so). Looking at the current state of the industry and what this technology is being used for, though, convinced me that building this stuff is almost universally bad, and I hope that you do take the criticisms of it seriously, even though it hasn't been terribly well-worded in this thread.
posted by wesleyac at 9:23 AM on August 8, 2022 [3 favorites]

Google stopped respecting robots.txt in 2019.
posted by geoff. at 9:29 AM on August 8, 2022

Welp, I think this would be useless but harmless and funny, and I'm glad rebent asked.

Also RIP Markovfilter. Those were, apparently, simpler times.
posted by chesty_a_arthur at 9:36 AM on August 8, 2022 [4 favorites]

Google stopped respecting no-index in robots.txt (which I think is bad, to be clear), but still does respect Disallow in robots.txt and no-index meta tags.

But in any case, it seems strange to me to suggest that the no-index change was either good or inevitable.
posted by wesleyac at 9:37 AM on August 8, 2022 [1 favorite]
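
For readers unfamiliar with the opt-out mechanism under discussion, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt rules, the `example.com` URLs, the `ExampleBot` user-agent, and the `may_fetch` helper are all made up for illustration. It only shows how a well-behaved crawler could check a `Disallow` rule, and says nothing about how any particular search engine or model trainer actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written for this example only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(url, agent="ExampleBot"):
    """Return True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)

print(may_fetch("https://example.com/public/thread"))   # allowed by the rules
print(may_fetch("https://example.com/private/thread"))  # blocked by Disallow
```

Note that this covers only path-level `Disallow` rules. A no-index meta tag is a separate, page-level signal that this parser does not read, and neither mechanism works at the level of an individual comment.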

In retrospect, Geoff is probably right and I have overstepped my bounds considerably.

Rebent, I apologize. I was wrong to immediately assume you brought this up with some nefarious purpose in mind. I’m sorry if this has dampened your enthusiasm in any way.

I remain skeptical of the value that such an exercise would bring to the community, but will have no further say in the matter.

My part of the discussion should remain on the site. Perhaps a future AI model will learn more about hubris or alarmism and the value of remaining silent when tired and not in a proper headspace.

If it has a head.

And maybe turtles, whatever that’s about. That kinda came out of nowhere.
posted by JustSayNoDawg at 10:55 AM on August 8, 2022 [7 favorites]

I find some responses here condescending and patronizing

You have to admit it would be funny if the AI started writing responses that were condescending and patronizing.
posted by kevinbelt at 10:58 AM on August 8, 2022 [4 favorites]

I asked it to answer a question in the style of Ask Metafilter, and here is the result.

buntastic, I would legitimately love to see how GPT-3 answers the question posed in this thread. Not sure exactly how to phrase it to get the best response from GPT-3 but maybe you've got experience with that.
posted by biogeo at 11:42 AM on August 8, 2022

What a tortured non-soul a chatbot built out of the annals of MetaFilter would be!

It's a shame rebent got jumped on so much here (very much undeservedly so), because I was interested in hearing an actual answer from someone who knows how such things work. I don't like much the idea of someone taking content from here for profit, but that's pretty much the business model of a fair bit of the Internet already, so it wouldn't get me upset to see it happen or surprised to find out it already has.
posted by dg at 3:27 PM on August 8, 2022 [2 favorites]

Tangentially, a few years ago I wrote a very simple sentiment analysis script using an AI API that would take a URL - for example the front page of MeFi - and rate on a scale of -1 to 1 the positivity or negativity of all the text.

Metafilter is pretty positive.
posted by bendy at 3:29 PM on August 8, 2022 [2 favorites]
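
bendy's script relied on an unnamed AI API, so the following is only a stand-in: a minimal Python sketch that scores text between -1 and 1 using a tiny hand-made word list. The word lists and the `score` function are assumptions invented for illustration, the sketch takes plain text rather than fetching a URL, and a word count like this is far cruder than a trained sentiment model.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "helpful", "fun", "delight"}
NEGATIVE = {"bad", "rude", "angry", "useless", "terrible", "hate"}

def score(text):
    """Return a sentiment score in [-1, 1]; 0.0 if no listed word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(score("I love this thread, it is great fun"))  # 1.0
print(score("That idea seems rude and useless"))     # -1.0
print(score("Remind me what this is about?"))        # 0.0
```

Dividing by the number of matched words keeps the result inside the -1 to 1 range regardless of how long the page is.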

When you think about it Metatalk isn't NOT the self appointed ethics committee...
posted by bleep at 9:08 PM on August 8, 2022 [4 favorites]

For the first AI-trained and -generated Meta this was pretty decent, I was especially buying rebent and the turtles bit was just genius, it did a pretty good job but nobody buttoned so maybe some more I/P and/or relitigating 2016 training data is in order.
posted by riverlife at 11:33 PM on August 8, 2022 [8 favorites]

I went from this thread to the Ukraine thread, and it took me a second or two to realise that the AI discussed there is Amnesty International...

I am off to google the turtle thing,
posted by 15L06 at 9:49 AM on August 9, 2022 [1 favorite]

I am off to google the turtle thing,

It's a quote from Blade Runner.

And, rebent, I don't think zamboni was trying to make fun of you with the quote. (It's actually kind of funny with context.)
posted by a non mouse, a cow herd at 11:02 AM on August 9, 2022 [2 favorites]

I think the push back is that people are not just spreadsheets and CSV files and it feels insulting to have someone wonder if you could be replaced by one. And the impact of that emotional labor thread scrape was pretty big - lots of people left or dramatically reduced their site usage after it. People take seriously that whole “you own your comments” thing.
posted by Bottlecap at 11:20 AM on August 9, 2022 [4 favorites]

Yeah, I agree. I think there's a spectrum between "I want to NLP your diary" to "I want to NLP the New York Times opinion section," and I guess lots of folks see their contributions to MetaFilter much closer to personal than I anticipated. I've never held very strong boundaries around what I make public, because I'm just delighted that anyone pays me attention at all.

If my parents made a dopple of me after I pass away, I would be totally fine with that. If Facebook did it, and made money off of it, not so much. What if Facebook made an NLP dopple of Zuckerberg to serve as the CEO? Now we're getting into sci fi.

I think specific to AskMe, I'm curious if the NLP AI would be of any value whatsoever. Could it help me think more clearly? Could it help me process things differently? Would it be useful? And if so - WOW. If not - shrug.
posted by rebent at 1:00 PM on August 9, 2022 [1 favorite]

To me, the AI chatbot would literally obliterate the sense of trust I have in participating in this website and being with others. I want to talk to real people! And this is like based on my interest into looking into conversational design. I find the related links to be more than enough of a helpful algorithm to help me search and find similar responses.
posted by yueliang at 4:50 PM on August 9, 2022 [1 favorite]

Aside from the whole content ethics thing, the major way I would see a chatbot working is if on mobile or desktop, one can bring up Asks related to your question, but that would require processing the tags and the Ask data and pulling out top suggested questions, but that results in another tricky thing --how would one necessarily suggest or process that in the design? What would be favored over other results? That seems so so tricky. I don't even know how Google chooses the AskMeFi questions when I use their search to find questions. (To be honest, I don't actually use MeFi's search bar at all but that's like a force of habit and another story...)

I think though, it makes me wonder if that is truly better than someone searching for it in the search bar, since researching and finding answers is part of the fun. I am aware though that for folks who got introduced to the internet during Web 2.0, that searching isn't really a super learned skill for them.
posted by yueliang at 6:21 PM on August 9, 2022

OMFG I was just thinking about this like an hour ago as a long-burn project that I'd try. It's an escalating thought, as many are, and even so a revisiting of previous ideas. Get an infodump of Metafilter and grab my posts and comments and do some matching to create an autobiography of the stories that I have told/retold if for nothing else than to save some bit of telling them again, and even maybe to put the good ones on my profile page as a bit of reference/linkage. Not my blog, but probably would at least be a "checked your profile" thing.

Then I thought throw that into a GPT-whatever thing and shits and giggles lol talk to myself.

Sounds like a good fun project to let every user do for themselves.

I have been thinking along these lines for ages, you can maybe search posting history and find my evil scientist plans.

I want to make a Gibsonian Ghost to leave as an heirloom to eventual descendants of a device that is "here, great-great-great uncle".

I'm also side planning on it to be a possible artifact found centuries or more later even by aliens that is like finding a stone tablet from the past that makes a researcher squee with delight.

I wanted to have it mounted inside my own skull... but that is problematic, so I still want to put it in a plastic skull. And I want it to be sorta like the talking head terminal from Neuromancer.

Grandiose thoughts on this topic for years.

It at the least would be amusing to chat with myself not as I do today in my head but from past to present myself.
posted by zengargoyle at 12:30 AM on August 11, 2022 [1 favorite]

Errant entry.
posted by parmanparman at 7:37 AM on August 11, 2022

rebent: I'm interpreting this as a big "no" which makes me think there's a lot of potential here.

I can't parse this at all. Wouldn't that mean there's NOT a lot of potential here, since people seem to strongly not want it? Or am I misunderstanding something?
posted by Too-Ticky at 1:32 AM on August 12, 2022 [2 favorites]

I tripped on that at first, too, but then realized rebent's "no" in that context was meant as answer to their original question "Has anyone trained an ai using..."
posted by nobody at 3:14 PM on August 12, 2022 [2 favorites]

Lol yes. Also, what about reading a conversation between a 2009 mefi chatbot, and a 2019 mefi chatbot? I bet they'd have a lot to argue about...
posted by rebent at 11:11 AM on August 13, 2022

I asked GPT-3 (really).

> Comment section on the website Ask Metafilter, answering the following question:
> Is it ethical to train an artificial intelligence on Metafilter comments?

This was the first output:
There's no easy answer to this question. On the one hand, Metafilter is a public website and anyone can view the comments posted here. So it stands to reason that someone could use those comments to train an AI. On the other hand, many people post here anonymously and might not want their comments used in this way.

Ultimately, it comes down to a question of consent. If everyone who posted a comment on Metafilter was made aware that their comments could be used to train an AI, and they all gave their consent, then it would be ethical to do so. But if even one person did not give their consent, then it would not be ethical.
posted by BungaDunga at 4:20 PM on August 13, 2022 [7 favorites]

Fascinating. What if you prompt the same question but for different websites? E.g., Quora, StackOverflow, Yahoo Answers...
posted by biogeo at 8:04 PM on August 13, 2022 [1 favorite]

Okay it's easier to get access to GPT-3 on the OpenAI Playground than I thought! Exact same prompt as BungaDunga but substituting other websites in:

Stack Overflow:
There is no easy answer to this question. On one hand, Stack Overflow is a public website with a wealth of user-generated content that could be used to train an AI. On the other hand, some people may not want their comments to be used in this way, and there could be privacy concerns. Ultimately, it is up to the individual to decide whether or not they are comfortable with their comments being used to train an AI.
Quora:
There is no easy answer to this question. On one hand, Quora comments can contain a wealth of information that could be used to train an AI. On the other hand, using Quora comments to train an AI could be seen as taking advantage of the people who have left those comments. It is ultimately up to the individual to decide whether or not they think it is ethical to train an AI on Quora comments.
Yahoo Answers:
It is ethical to train an artificial intelligence on Yahoo Answers comments because the comments are publicly available and people have not explicitly stated that they do not want their comments to be used for this purpose.
There is no right answer to this question as it depends on personal opinion. Some people may believe that it is ethical to train an artificial intelligence on comments as this could help the AI to become more intelligent and understanding of human behaviour. Other people may believe that it is not ethical to train an AI on comments as this could lead to the AI developing a biased view of the world based on the negative comments that are often posted on the website.
posted by biogeo at 8:14 PM on August 13, 2022 [3 favorites]
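
For anyone who wants to reproduce the "same prompt, different website" comparison, here is a minimal sketch of the substitution step only. It assumes the site name is swapped into both places in BungaDunga's prompt (the original uses "Ask Metafilter" in the first line and "Metafilter" in the second, so the exact wording biogeo used is not known). `PROMPT_TEMPLATE`, `build_prompt`, and `SITES` are hypothetical names, and no call to any model API is made here; the resulting strings would be pasted into the Playground by hand.

```python
# Hypothetical template based on the prompt quoted upthread; the exact
# substitution used in the thread is an assumption.
PROMPT_TEMPLATE = (
    "Comment section on the website {site}, answering the following question:\n"
    "Is it ethical to train an artificial intelligence on {site} comments?"
)


def build_prompt(site):
    """Fill the site name into both slots of the template."""
    return PROMPT_TEMPLATE.format(site=site)


# The three sites named in the thread.
SITES = ["Stack Overflow", "Quora", "Yahoo Answers"]

# One prompt per site, keyed by site name.
prompts = {site: build_prompt(site) for site in SITES}

if __name__ == "__main__":
    for site, prompt in prompts.items():
        print(f"--- {site} ---")
        print(prompt)
        print()
```

Keeping the wording identical apart from the site name is what makes the outputs comparable at all; even then, each run is a single sample, so differences between sites may be noise.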

Well it looks like I'm spending the rest of the night making GPT-3 answer variations on the trolley problem from the perspective of various websites.
posted by biogeo at 8:32 PM on August 13, 2022 [3 favorites]

Yahoo! Answers 😂
posted by Bottlecap at 8:57 PM on August 13, 2022

posted by parmanparman at 9:30 AM on August 14, 2022

Yeah, for various reasons I have interacted less and less with Metafilter over the years, but this is one thing that would keep me from writing much any more. I say this as someone working in AI.

I also find it sort of rude that in a thread where various people have expressed their discomfort with having their text used in this fashion others merrily go off and do precisely that...
posted by doggod at 4:09 PM on August 14, 2022 [2 favorites]

Echoing doggod - I felt bad after writing my follow-up comment, but I think it's from wanting to be at least a little fair to those who want to discover and play with it as usage, but I'm still making up my mind and trying to figure out what to do. I still stand by that I don't understand how it would be better than the recommended suggestions though. I also think people should be mindful that AskMeFi isn't necessarily a sandbox for content to mine, like the fiasco with the emotional labor ebook.
posted by yueliang at 11:16 PM on August 14, 2022

I vaguely remember that years ago, someone made a Markov Chain thingy that you could feed your own posts into, and would generate content-free semi-gibberish in your own posting "style". I can't find it now, though.

I'm reasonably certain I didn't hallucinate the thing, because I remember running it on my own posts and realizing that I used way too many parentheticals, and trying to cut that shit out for a while.
posted by Kadin2048 at 9:04 PM on August 16, 2022

Yes, that was MarkovFilter, made by cortex. I think it's defunct now, I assume because some people have expressed that they don't like the idea of their comments being used in that way, but maybe just due to bitrot. I rather enjoyed it, and kept one of my favorite MarkovFilter mashups of my own comments for my user page.
posted by biogeo at 9:09 PM on August 16, 2022 [1 favorite]
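
For readers unfamiliar with the technique, here is a rough sketch of how a word-level Markov chain text generator of that general kind can work. This is not MarkovFilter's code, which isn't shown anywhere in this thread; `build_chain`, `generate`, and the `order` parameter are hypothetical names chosen for illustration, and the sample text is made up.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain from a random starting state, returning at most `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was only seen at the end of the text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output[:length])


if __name__ == "__main__":
    # Made-up sample text standing in for one person's own comments.
    sample = (
        "I think (and I could be wrong) that the answer is yes. "
        "I think (as usual) that the answer is to ask a lawyer."
    )
    print(generate(build_chain(sample), length=20))
```

Because the generator only ever reuses word sequences that appear in the input, habits like heavy use of parentheticals show up in the output, which matches the experience Kadin2048 describes.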
