A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times’ suit goes well beyond that to show how the material ingested during training can come back out during use. “Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples,” the suit alleges.

The suit alleges—and we were able to verify—that it’s comically easy to get GPT-powered systems to offer up content that is normally protected by the Times’ paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.

The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.

The suit is dismissive of attempts to justify this as a form of fair use. “Publicly, Defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose,” the suit notes. “But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”

The suit seeks nothing less than the erasure of any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: “statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity.”

    • FunkyStuff [he/him]@hexbear.net
      11 months ago

      That’s a good question, right? You’d think that the established media tycoons like Murdoch would have the kind of pull to have killed this baby in the womb, but they didn’t. Is that because they’re confident they can adapt to it?

      • daisy [he/him, comrade/them]@hexbear.net
        11 months ago

        The more I think about this, the more I wonder if it’s all an elaborate play by the media companies to get the tech companies to buy them out. The tech companies have ridiculously huge cash reserves, and media companies’ stocks aren’t nearly as valuable as people think. For example, the New York Times has a market cap of $8 billion USD, and made a profit of $90 million USD in their July/August/September 2023 quarter. Apple made $23 billion USD in profit in that same quarter, has a market cap of $3 trillion USD, and has cash reserves that would make Scrooge McDuck envious.

        Imagine if all these legal fights over AI scraping are the media industry’s way to say to the tech companies “Hey, the data we have the rights to is incredibly valuable to your AI work. We could tie you up in court for years, setting you well behind your competitors. Wanna make a bid?”

        • FunkyStuff [he/him]@hexbear.net
          11 months ago

          That’s totally valid, but what about the Disneys, the Universals, and the Sonys? Not all media companies are created equal, and there’s a lot of inertia behind those giants despite the falling rate of profit.

          • drhead [he/him]@hexbear.net
            11 months ago

            Have you SEEN what Disney has been making lately? They’d gladly pivot to AI slop the second it matches their declining quality standards.

          • daisy [he/him, comrade/them]@hexbear.net
            11 months ago

            Of course it’s just an idea. It’s probably also a plan that would appeal more to print media companies that have doubts about long-term profitability and stand to lose a lot from text-generation AIs.