How can I add a simple requirement “do not train AI on the source code of the program” to AGPLv3 or GPLv3 and thereby create a new license?

I don’t know if this is a good place for such a question, but I’ll try :).

Why did I come up with such a stupid idea? There have been reported cases where AI tools such as GitHub Copilot were trained on many open source and free software projects, and in some cases they can output code snippets from GPL-licensed projects without indicating where they came from. https://www.pixelstech.net/article/1682104779-GitHub-Copilot-may-generate-code-containing-GPL-code

I am not a lawyer, and I do not know where it would be best to insert such a requirement or how to word it correctly.

I understand that compliance with this requirement may be complicated to check and that it may cause other difficulties, but I still think it can be a useful addition.

How can it be made to fit with the fundamental freedoms of the GPL, or is it incompatible with them?

I understand that this would make the license non-free, since it puts constraints on what the code can be used for. It’s sad that the two can’t be combined in some way. Maybe the requirement could be changed to forbid training only “closed source AI” (models whose code and training data are not publicly available).

And what should I name it? Is it better to choose a name without “GPL” if this new license cannot be considered free? NoAIFL, or your own suggestions :)?

Would it be enough to just add a new clause?

For example like this:

Additional Clause:
You may not use the source code of this program, or any part thereof, to train any artificial intelligence model, machine learning model, or similar system without explicit written permission from the copyright holder.

or

Section [X]: Restrictions on AI Training
You may not use the source code of this program, or any part thereof, to train any artificial intelligence model, machine learning model, or similar system without explicit written permission from the copyright holder.

What do you think about it? Do you perhaps already know of licenses like this?

  • RobotToaster@mander.xyz
    1 month ago

    I understand that this would make the license non-free

    You can potentially get around that by just specifying that any AI trained on it is considered a derivative work, and thus must be released under your new license.

    That said, it’s potentially moot: the argument the AI companies use for training on commercial data and art is that it’s fair use under various exemptions.

    • JustVik@lemmy.ml (OP)
      1 month ago

      That’s a good idea. Now I have to think about how to formulate it better and what it will mean. :)

    • grue@lemmy.world
      1 month ago

      That’s not up to OP to “specify;” either it already is the case (for everybody) or it isn’t, according to the legal definition of “derivative work.”

      (I take the position that it is, BTW – AI code generation is massive copyright infringement in general, and a way of laundering copyleft code for proprietary uses in particular.)

      • dustycups
        1 month ago

        I think so too & have made that point in the past.
        Does anyone know of some more legally credible references that agree with us?

    • Possibly linux@lemmy.zip (mod)
      1 month ago

      Right now there is a huge legal question surrounding copyrighted works and AI. Since the GPL is enforced via copyright, how it applies here depends on the court rulings around AI.

      Basically, there are lots of unanswered questions at the moment.