• OmanMkII
    10 months ago

    I was curious whether a robots.txt equivalent exists for AI training data, and there were some solid points here:

    If I go to your writing, I read it & learn from it. Your writing influences my future writing. We’ve been okay with this as long as it’s not a blatant forgery.

    If a computer goes to your writing, it reads it & learns from it. Your writing influences its future writing. It seems we are not okay with this, even if it isn’t blatant forgery.

    [AI at the moment is] different because the company is re-using your material to create a product they are going to sell. I’m not sure if I believe that is so different than a human employee doing the same thing.

    https://news.ycombinator.com/item?id=34324208

    I still think we should have the ability to opt out like we do with search engines and web crawlers. But if the algorithm works ideally and learns without recycling content, is it truly any different from a factory of workers pumping out clones of popular series on Amazon? I honestly don’t know the answer to that.

    • deweydecibel@lemmy.world
      10 months ago

      The problem is not the technology, the problem is the businesses and the people behind them.

      These tools were made with the explicit purpose of taking content that their makers did not create, repurposing it, and turning it into a product. Throw all these conversations about intelligence and learning out the fucking window. What matters is what the thing does, and why it was created to do that thing.

      Until we reach a point where there is some sort of AI out there that has any semblance of free will, one that can choose not to learn from certain information and choose not to respond to input without being programmed not to respond, we are not talking about intelligence. We are talking about a tool, no matter how they dress it up.

      Stop arguing about this on their terms, because they’re gaslighting the fuck out of you.

    • Appoxo@lemmy.dbzer0.com
      10 months ago

      AFAIK the OpenAI bot may choose to ignore it? At least that’s what another user claimed it does.

      • JohnEdwa@sopuli.xyz
        10 months ago

        Robots.txt has always been ignored by some bots. It’s entirely voluntary: just a guideline originally meant to prevent excessive bandwidth usage by search indexing bots.

        The Archive.org bot, for example, has completely ignored it since 2017.
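
        To illustrate what "voluntary" means in practice, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `may_fetch` helper, and the example.com URLs are made up for illustration; `GPTBot` is used as the user-agent token for OpenAI's crawler. The point is that the check runs inside the crawler itself, so nothing on the server side stops a bot that simply never performs it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to opt out of one crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler runs on its own side.
    A crawler that skips this call can still request the page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        allowed = may_fetch(agent, "https://example.com/post/1")
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

        A polite crawler calls something like `may_fetch` before every request and backs off when it returns `False`. An impolite one just sends the HTTP request anyway, which is why robots.txt only works as an opt-out for bots that choose to honor it.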

    • Mossy Feathers (She/They)@pawb.social
      10 months ago

      This is kinda my take on it. However, the way I see it is that the AI isn’t intelligent enough yet to truly create something original. As such, right now AI is closer to being a tool than a being. Because of that, it somewhat bothers me that I’m being used to teach a tool. If I thought that companies like OpenAI were truly trying to create beings and not tools, then I’d feel differently.

      It’s kinda nuanced, but a being can voluntarily determine whether or not something infringes copyright, understand why that might be an issue, and then decide whether or not to continue writing based on that. A tool can’t really do that. You can try to add filters to a tool to keep it from writing copyrighted text, but those filters will have flaws and holes. A being who understands what it’s writing and what makes it plagiarism vs reference vs homage/inspiration/whatever is less likely to have those issues.