If artificial intelligence uses your work, it should pay you

silence7@slrpnk.net · 1 year ago

fubo@lemmy.world · edit-2 1 year ago

The argument regarding the specific case of AI-generated images of real actors makes sense, but the headline overgeneralizes hugely.

If you write a book about carpentry, and someone checks that book out from the library, reads it, learns how to do carpentry from it, and goes into the carpentry business, they do not owe you a share of their profits.

It’s nice if they give you credit. But they do not owe you a revenue stream.

If they are a robot, the same remains true.

The Snark Urge@lemmy.world · edit-2 1 year ago

Corollary: if a corporation scrapes the talk of the whole internet, which itself was shaped by the aggregate culture and knowledge of ten thousand years of human history, and their resultant product is an AI that can replace workers, it is morally valid to eminent domain that shit and divert its profits to a fledgling UBI program.

Edit to add: Not a statement about how UBI should really work, just a throwaway comment about seizing means.

d3Xt3r@lemmy.world · edit-2 1 year ago

UBI should be a government initiative, and funding for it should be collected in the form of a tax, irrespective of AI. More and more humans are being replaced by automation and technology in general, and a lot of this is being done so gradually that you don’t notice it or think of it as a problem. Every time you saw a headline like “xx corporation has laid off hundreds/thousands of employees” in the past, it had very little to do with AI, but could have had to do with technology and progress in general, plus a lot of other factors. Every little new development could have a butterfly effect that’s hard to calculate.

Neither AI nor the loss of jobs in general should be a factor for UBI funding. AI is just another new technological development, maybe even a disruptive one, but it’s nothing so new that we need to pick up our pitchforks against it.

As for compensating creative owners, that’s a bigger discussion on IP protection and ownership in general, and the responsibility falls upon the IP owners (and maybe appropriate laws). For instance, we’ve seen news sites, science publishers, etc. paywall their work, and that’s because they want to protect their work and get compensation for viewership; this has nothing to do with AI. If people want compensation for their work, then they should take appropriate measures to protect it, and/or come up with alternate revenue streams if it’s impossible to paywall their work (for instance, how some YouTubers choose to seek sponsorship or Patreon donations). If people want to prevent their work from being stolen and redistributed, appropriate action should be taken against the persons/sites stealing their work (e.g. via DMCA). It’s not the AI’s fault for eating up copyrighted content on public sites like pastebin.com or Scribd, it’s the fault of the people uploading it.

FaceDeer@kbin.social · 1 year ago

UBI should not be dependent on its specific sources and specific destinations. It’s universal, it’s right in the name. It should be funded by a tax on the wealthy - regardless of how that wealth is obtained - and be issued to everyone.

The goal is not to “level the playing field” so that human employees can continue to labor and companies can’t afford to hire robots to replace them. The goal is to make it so that if companies replace all their employees with robots those employees don’t have to find some other job to continue living.

pulaskiwasright@lemmy.ml · 1 year ago

If you write a book about carpentry, and someone checks that book out from the library, reads it

AI is not a person. That’s why its works aren’t eligible for copyright. You’re arguing that AI should have the same rights as a person in this regard and that’s not an established right, nor should it be.

ForgetReddit@lemmy.world · 1 year ago

Also the analogy makes zero sense. It’s more accurate to say someone checks out a book about carpentry, reads it, then writes another book on carpentry by moving the words around a bit despite knowing nothing about carpentry.

NebLem@lemmy.world · edit-2 1 year ago

More accurately someone who knows nothing about German, writing, or carpentry but learns German and carpentry by reading hundreds of thousands of books and then decides to write a book about carpentry in German.

pulaskiwasright@lemmy.ml · edit-2 1 year ago

The AI still doesn’t learn carpentry. It just knows how books about carpentry generally read.

phillaholic@lemm.ee · 1 year ago

I’m not sure that’s a fair comparison. You wouldn’t instantly ingest that information and know it. It’s more like photocopying a book and including it in another book that you sell. It’s a paradigm shift, and I’m not sure what the answer is.

dorkian_gray@lemmy.world · edit-2 1 year ago

deleted by creator

phillaholic@lemm.ee · 1 year ago

I don’t think it’s that simple. Like I said, it’s a paradigm shift. It doesn’t fit into existing laws well. My point is that what we consider fair use now, such as a human summarizing a book or movie, is based on the limited abilities of humans. When you have AI with limitless abilities, that will change things. The same rules and considerations may have to be rethought.

dorkian_gray@lemmy.world · edit-2 1 year ago

deleted by creator

Taleya · 1 year ago

AI isn’t learning how to do carpentry though. It’s simply including my work in an aggregate pool that it now claims as its own.

FaceDeer@kbin.social · 1 year ago

It is not. The AI’s model does not contain a copy of your work, there is no “aggregate pool.” AI is not some sort of magical compression algorithm that’s able to somehow crush whole images down to less than a byte of data. The only thing that it’s “including” in itself are the concepts that it learned from your work. Those are ideas, which are not copyrightable.

silence7@slrpnk.net · 1 year ago

A key difference is that AI models tend to contain actual pieces of the training data, and on occasion regurgitate it. Kind of like randomly reproducing parts of the book during the course of your career as a carpenter. That’s the kind of thing that actually results in copyright lawsuits and damages when real people do it. AI shouldn’t be getting a pass here.

fubo@lemmy.world · 1 year ago

Oh sure, if a copyright holder can demonstrate that a specific work is reproduced. Not just “I think your AI read my book and that’s why it’s so good at carpentry.”

silence7@slrpnk.net · 1 year ago

The thing is that they’re all reproduced, at least in part. That’s how these models work.

fubo@lemmy.world · edit-2 1 year ago

Reproducing a work is a specific thing. Using an idea from that work, or a transformation of that idea, is not reproducing that work.

Again: If a copyright holder can show that an AI system has reproduced the text (or images, etc.) of a specific work, they should absolutely have a copyright claim.

But “you read my book, therefore everything you do is a derivative work of my book” is an incorrect legal argument. And when it escalates to “… and therefore I should get to shut you down,” it’s a threat of censorship.

Cylusthevirus@kbin.social · 1 year ago

A person reading and internalizing concepts is considerably different than an algo slurping in every recorded work of fiction and occasionally shitting out a bit of mostly Shakespeare. One of these has agency and personhood, the other is a tool.

silence7@slrpnk.net · 1 year ago

The problem is that the LLMs (and image AIs) effectively store pieces of works as correlations inside them, occasionally spitting some of them back out. You can’t just say “it saw it” but can say “it’s like a scrapbook with fragments of all these different works”

fubo@lemmy.world · 1 year ago

I’ve memorized some copyrighted works too.

If I perform them publicly, the copyright holder would have a case against me.

But the mere fact that I could recite those works doesn’t make everything that I say into a copyright violation.

The copyright holder has to show that I’ve actually reproduced their work, not just that I’ve memorized it inside my brain.

silence7@slrpnk.net · edit-2 1 year ago

The difference is that your brain isn’t a piece of media which gets copied. The AI is. So when it memorizes, it commits a copyright violation.

fubo@lemmy.world · edit-2 1 year ago

If that reasoning held, then every web browser, search engine bot, etc. would be violating copyright every time it accessed a web page, because doing so involves making a copy in memory.

Making an internal copy isn’t the same as publishing, performing, etc. a work.

conciselyverbose@kbin.social · 1 year ago

No, it doesn’t. Learning from copyrighted material is black and white fair use.

The fact that the AI isn’t intelligent doesn’t matter. It’s protected.

radix@lemmy.world · 1 year ago

deleted by creator

FaceDeer@kbin.social · 1 year ago

No, that’s not how these models work. You’re repeating the old saw about these being “collage machines”, which is a gross mischaracterization.

FaceDeer@kbin.social · 1 year ago

That article doesn’t show what you think it shows. There was a lot of discussion of it when it first came out and the examples of overfitting they managed to dig up were extreme edge cases of edge cases that took them a huge amount of effort to find. So that people don’t have to follow a Reddit link, from the top comment:

They identified images that were likely to be overtrained, then generated 175 million images to find cases where overtraining ended up duplicating an image.

We find 94 images are extracted. […] [We] find that a further 13 (for a total of 109 images) are near-copies of training examples

They’re purposefully trying to generate copies of training images using sophisticated techniques to do so, and even then fewer than one in a million of their generated images is a near copy.

And that’s on an older version of Stable Diffusion trained on only 160 million images. They actually generated more images than were used to train the model.

Overfitting is an error state. Nobody wants to overfit on any of the input data, and so the input data is sanitized as much as possible to remove duplicates to prevent it. They had to do this research on an early Stable Diffusion model that was already obsolete when they did the work because modern Stable Diffusion models have been refined enough to avoid that problem.
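
As a rough sanity check on the "fewer than one in a million" figure, the arithmetic can be sketched in a few lines of Python. This only re-uses the numbers quoted in this comment (175 million generated images, 109 near-copies, a 160 million image training set); the variable and function names are made up for illustration, and nothing here is independently measured.

```python
# Figures as quoted in the comment above; treated as assumptions, not verified here.
GENERATED_IMAGES = 175_000_000   # images the researchers generated
NEAR_COPIES = 109                # near-copies of training images they reported
TRAINING_IMAGES = 160_000_000    # training-set size stated in the comment


def near_copy_rate(near_copies: int, generated: int) -> float:
    """Return the fraction of generated images that were near-copies."""
    return near_copies / generated


rate = near_copy_rate(NEAR_COPIES, GENERATED_IMAGES)
per_million = rate * 1_000_000

print(f"Near-copies per million generated images: {per_million:.3f}")
print(f"Generated more images than the training set held: {GENERATED_IMAGES > TRAINING_IMAGES}")
```

Under those quoted numbers the rate comes out below one near-copy per million generated images, which is all the claim above relies on.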

BrianTheeBiscuiteer@lemmy.world · 1 year ago

If I was to read a carpentry book and then publish my own, “regurgitating” most of the original text, then I plagiarized and should be sued. Furthermore, if I was to write a song and use the same melody as another copyrighted song I’d get sued and lose, even if I could somehow prove that I never heard the original.

I think the same rules should apply to AI generated content. One rule I would like to see, and I don’t know if this has precedent, is that AI generated content cannot be copyrighted. Otherwise AI could truly replace humans from a creative perspective and it would be a race to generate as much content as possible.

scarabic@lemmy.world · 1 year ago

Analogies to humans are not relevant, and yours is a bad one anyway. LLMs don’t read a carpentry book and then go build houses. They chew up carpentry books and spit out carpentry books.

Your final line remains to be established in court.

krayj@lemmy.world · edit-2 1 year ago

The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.

The ‘copy’ that the AI retains indefinitely is a verbatim copy of the original work, and the entire point of “copyright” is to control how and where copies are used.

Yes, there are ‘fair use’ exceptions to copyright. I don’t think you realize it, but your argument is less about whether this violates copyright (it absolutely does under the textbook definition) and more about whether there should be a fair-use exemption for AIs; you seem to think yes, I would disagree.

I’d also argue the AI example qualifies it as a ‘derivative work’ based on the original, which STILL would require honoring copyright laws and compensating the creators of the original works. Basically, before reading the book it was just “AI”. After reading the book it has become “AI + book1”, a derivative work, and on and on and on.

fubo@lemmy.world · edit-2 1 year ago

The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.

However, that is how it works when a human memorizes a copyrighted work. If I memorize a poem, I may then reference it from my memory without further need for the original text before me. If I am an actor and learn my lines for a play, I commit them to my memory.

Which is not an infringement.

The infringement happens if the human performs or publishes that work; e.g. reciting that copyrighted poem or play from memory before an audience; writing that work down from memory and publishing it; etc., without a copyright license for that performance or republication.

I suggest merely applying the same standard: infringement doesn’t happen when a work is read, indexed, scanned, etc.; it does happen if that work is then recited.

For instance, ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so. (Try it! It will answer questions about the text, but it will freeze up if asked to recite it; evidently because it has a filter against reciting copyrighted material.)

FaceDeer@kbin.social · edit-2 1 year ago

No, the reason ChatGPT can’t recite the text of Harry Potter verbatim is because it doesn’t actually “contain” it. It learned from it, but it doesn’t “remember” it word-for-word. There is no filter against reciting copyrighted material. Try asking it to recite a scene from a Shakespearean play, for example - that’s out of copyright and ChatGPT was almost certainly trained on it. It may be able to quote some famous lines because it’s been overfit like crazy on them (“To be or not to be” is probably everywhere on the Internet) but that’s not a verbatim chunk.

I’ve actually experimented with this myself on my local machine. I took one of the smaller open-source models and gave it additional training using 20 megabytes of My Little Pony fanfiction. The AI knew a lot about the fanfic afterward, but it was clearly just picking up tidbits of general knowledge rather than “remembering” the whole thing.

TitanLaGrange@lemmy.world · edit-2 1 year ago

ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so.

I tried that several weeks ago while discussing some details of the Harry Potter world with ChatGPT, and it was able to directly quote several passages to me to support its points (we were talking about house elf magic and I asked it to quote a paragraph). I checked against a dead-tree copy of the book and it had exactly reproduced the paragraph as published.

This may have changed with their updates since then, and it may not be able to quote passages reliably, but it is (or was) able to do so on a couple of occasions.

krayj@lemmy.world · edit-2 1 year ago

deleted by creator

FaceDeer@kbin.social · 1 year ago

That’s not how these AIs work. They don’t contain verbatim copies of their training data. They get trained on terabytes of text, they couldn’t possibly remember it all.
