Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”

  • conciselyverbose@sh.itjust.works
    English
    arrow-up
    547
    arrow-down
    3
    ·
    edit-2
    2 months ago

    "Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used,” Huffman said in an interview this week

    It’s not your data.

    Fuck off.

    • Gsus4@mander.xyz
      English
      arrow-up
      78
      ·
      2 months ago

      My only regret was not deleting all my comments before deleting my reddit account :P

      • elephantium@lemmy.world
        English
        arrow-up
        73
        arrow-down
        1
        ·
        2 months ago

        Don’t regret too much. I wouldn’t be surprised if reddit’s “delete” function was really just “move to the ‘suckers-wanted-to-delete-this’ file.”

        • CileTheSane@lemmy.ca
          English
          arrow-up
          29
          ·
          2 months ago

          I “deleted” all my posts, then randomly had someone reply to a 3 year old post that wasn’t showing up in my profile but still showed on the page.

          Don’t delete your comments, edit them to be useless.

          • Passerby6497@lemmy.world
            English
            arrow-up
            6
            ·
            2 months ago

            And as a positive to editing rather than deleting, you may have your comment taken down by AutoMod anyway! I had AutoMod take down a ton of my comments because they were flagged as spam after I used a replacement text tool to mass fix a decade’s worth of comments on multiple accounts. So many messages from AutoMod…

            • CileTheSane@lemmy.ca
              English
              arrow-up
              2
              ·
              2 months ago

              Probably, but when someone is going through old posts they are going to see the edit, not the history. The main goal here is to make Reddit less useful so people go elsewhere. Let Google’s AI be trained on Bot posts.

            • The Quuuuuill@slrpnk.net
              English
              arrow-up
              1
              ·
              2 months ago

              Good news, Reddit’s crap ass infra doesn’t differentiate if a GDPR request is legitimate or spurious. Or at least it didn’t back when I processed mine, they may have closed that loophole

        • Transporter Room 3@startrek.website
          English
          arrow-up
          27
          ·
          2 months ago

          Same, plus or minus a year.

          It took me a week, but I scrambled every comment and post with lorem ipsum and bee movie scripts, deleted the comments, then after verifying I could no longer find any of my original content on any search engine outside archive sites, I deleted the account.

          It took so long because r*ddit started limiting API access when they realized people were automating their profile scrubbing.

          As I’ve said before about certain countries, if you’re doing everything you can to prevent people from leaving [THING/PLACE] then you might just be shit.

          • FenrirIII@lemmy.world
            English
            arrow-up
            8
            arrow-down
            3
            ·
            2 months ago

            I was straight IP banned permanently for reporting the Israeli genocide fans and racists arguing for the eradication of Palestinians. I just deleted the account because I never imagined they would turn into such shitheels.

            • vaultdweller013@sh.itjust.works
              English
              arrow-up
              5
              ·
              2 months ago

              And here I got IP banned for saying we should murk the crown prince of Saudi Arabia. This was around the time the journalist got butchered in that Saudi Embassy. As an aside the Saudis have oil and are assholes how long till we start drone striking them?

              • Passerby6497@lemmy.world
                English
                arrow-up
                5
                ·
                2 months ago

                As an aside the Saudis have oil and are assholes how long till we start drone striking them?

                They would either need to stop playing ball, or oil is no longer a staple for energy generation and transport needs.

                Until then, the 9/11 architects will be able to hang out and do whatever they want.

                Also, thanks for protecting your friends that day GW, here’s hoping you get “touched” by a friend from a long way away…

            • LustyArgonianMana@lemmy.world
              English
              arrow-up
              1
              ·
              edit-2
              2 months ago

              I got IP banned for asking if there was any “good news” about Mitch McConnell after his strokes. I intentionally worded it ambiguously, but the mod on r/politics looked at my political history/not a conservative and decided I was celebrating violence and so I was IP banned. I guess only Mitch McConnell is allowed to salivate at violence openly and the rest of us are supposed to be worried for his health. It’s not a problem if women are the ones he’s directing violence towards, but God forbid a woman speak back to him. Pregnancy and birth cause strokes and clots, and that’s preferable to him vs an abortion… but God forbid he get strokes at the end of his life from being a horrible person and I find that preferable to his stupid harmful policies

              • lightnsfw@reddthat.com
                English
                arrow-up
                2
                ·
                2 months ago

                The way they enforce that no violence rule is so fucking stupid. Even if you had said “I hope that stroke implodes his brain” you aren’t advocating violence. A medical issue isn’t violence.

      • demizerone@lemmy.world
        English
        arrow-up
        6
        ·
        2 months ago

        I deleted my top comments and left the trash, which was 15 years’ worth. AI can hallucinate off that trash all it wants.

    • yeehaw@lemmy.ca
      English
      arrow-up
      32
      arrow-down
      4
      ·
      2 months ago

      Part of the ToS. Whatever you put on there is effectively theirs. Same with Facebook and your photos etc.

      • Dr. Moose@lemmy.world
        English
        arrow-up
        11
        ·
        2 months ago

        No, you cannot transfer copyright with a ToS agreement; you can only give reddit a license to use your copyrighted content.

      • Passerby6497@lemmy.world
        English
        arrow-up
        10
        ·
        2 months ago

        Whatever you put on there is effectively theirs.

        I would so love if companies that had decided they own/can sell the data users published lost section 230 protection. Oh, this is your data? I guess you don’t need to be protected against the data users post if it’s your data now.

      • Womble@lemmy.world
        English
        arrow-up
        3
        arrow-down
        1
        ·
        2 months ago

        And whatever you put on a publicly accessible webpage is effectively anyone’s who makes a GET request.

    • lmaydev@lemmy.world
      English
      arrow-up
      52
      arrow-down
      94
      ·
      2 months ago

      I mean it literally is. People post it there voluntarily knowing that. It’s what keeps the lights on.

      • cygnus@lemmy.ca
        English
        arrow-up
        86
        arrow-down
        3
        ·
        edit-2
        2 months ago

        Sort of, but not really. From the Reddit ToS (emphasis mine). Basically, you own your content but allow Reddit to use it however they want without crediting you. Only a corporate lawyer would call that arrangement “ownership”, but I digress…


        By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.

        You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

        When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

        • bionicjoey@lemmy.ca
          English
          arrow-up
          27
          arrow-down
          1
          ·
          2 months ago

          Beyond that, if you are serving webpages with data on them, you don’t get to decide what people do with those pages. They can’t stop search engines from scraping

          • bassomitron@lemmy.world
            English
            arrow-up
            21
            ·
            2 months ago

            Just to nitpick, they can stop scraping, anyone can. However, doing so would require implementing barriers that tend to also negatively affect sites that are dependent on being discovered and browsed.
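
To make the "anyone can" point concrete, here is a minimal sketch of the mildest such barrier, a robots.txt policy, checked with Python's standard `urllib.robotparser`. The robots.txt text, the bot names, and the example.com URL are made up for illustration and are not Reddit's actual configuration. Note that robots.txt is only a request that well-behaved crawlers honor, which is why a site that really wants to block scrapers ends up needing the heavier barriers mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named crawler is allowed, everyone else is not.
# This is an illustrative file, not any real site's robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/some/thread"
    for agent in ("ExampleSearchBot", "SomeOtherBot"):
        print(agent, can_fetch(agent, url))
```

A crawler that ignores the file is not stopped by any of this; the check only runs on the crawler's side, voluntarily.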

        • Bookmeat@lemmy.world
          English
          arrow-up
          9
          ·
          2 months ago

          It’s right there in the ToS: NON-EXCLUSIVE license. If they go to court, I would guess they lose.

        • chakan2@lemmy.world
          English
          arrow-up
          3
          arrow-down
          21
          ·
          2 months ago

          Lol…really? So they can reuse, modify, and remove all association with your content, but somehow you think you still own it?

          I’ve got a bridge to sell you.

          • cygnus@lemmy.ca
            English
            arrow-up
            32
            ·
            2 months ago

            In essence, it means that you reserve the right to also use the content for your own purposes, without Reddit having any recourse to prevent you from doing that.

            • chakan2@lemmy.world
              English
              arrow-up
              2
              arrow-down
              6
              ·
              2 months ago

              Except they published your work, all variants of said work, and completely eliminated you as the author of said work.

              I don’t know how else to explain to you that you don’t own that work anymore. You have rights to it. But you don’t own it.

      • conciselyverbose@sh.itjust.works
        English
        arrow-up
        35
        ·
        edit-2
        2 months ago

        It literally isn’t. Even their shitty EULA only claims a license to use it, not that it’s their data.

        And approximately 100% of the data on their servers was created while it was accessible to literally anyone who wanted it without restriction through a free API. Virtually none of the content was ever intended to be kept from fucking search engines so it could be sold for AI.

      • Grimy@lemmy.world
        English
        arrow-up
        11
        ·
        2 months ago

        Yup keeps the lights on and makes sure Spez gets his yearly 200 million bonus. It’s good that they are tightening the screw because 200 million is clearly not enough, he deserves double that at least.

      • helpImTrappedOnline@lemmy.world
        English
        arrow-up
        11
        ·
        2 months ago

        They are not responsible for what people post, nor do they pay anyone to post, therefore I do not see how they can claim the data as “theirs”.

        They have their own self-regulated rules, but ultimately most anything is fair game for reddit to point at the user and say “we take no responsibility for what an individual may post on this public forum”.

        The only thing they will have a problem with is CSAM, but even then, as long as the volunteer mods remain effective at removing it, reddit will not be responsible for anything users post.

    • snooggums@midwest.social
      English
      arrow-up
      51
      ·
      2 months ago

      When the internet was new I thought that the spread of knowledge would raise people’s awareness of things they had not been exposed to, under the false assumption that the vast majority of people were reasonable but uninformed.

      Nope, I was wrong. At best a slight majority are reasonable with varying levels of ignorance and the rest are willfully ignorant.

      • Crismus@lemmy.world
        English
        arrow-up
        16
        ·
        2 months ago

        In the early years, I learned a lot of new things. The late 90’s had so much hope for the web and learning new things. I remember falling into rabbit holes a lot more often back then because it wasn’t about video essays and persuasive videos.

        I miss all of that old text-based chatting and the friends I made in Yahoo local chat rooms. Some days I wish the social media sites run by algorithms hadn’t surpassed those old forums and chat rooms.

      • The Quuuuuill@slrpnk.net
        English
        arrow-up
        5
        arrow-down
        4
        ·
        edit-2
        2 months ago

        Internet is just a new form of mass media to control the minds of people who aren’t paying attention

        EDIT: Christ, I shouldn’t have had to say this, but obviously not the only thing. I just mean for mass media manipulation it’s no different than radio or movies. The usage of radio and the distribution of film is still a good thing when done positively. I shouldn’t have to spell shit all the way out like this

    • boonhet@lemm.ee
      English
      arrow-up
      11
      ·
      2 months ago

      I don’t remember when it was starting out but I do remember the forums of the 00s as well as the overall online culture of the time. The fediverse is the closest I’ve gotten to feeling that again but obviously nothing will ever feel like that again because I’m an adult and all the sense of wonder is gone from the online world for me.

      At least here, we’re not on a platform owned by corporations.

    • Boozilla@lemmy.world
      English
      arrow-up
      8
      arrow-down
      1
      ·
      2 months ago

      I remember. Now it’s mostly walled gardens everywhere. Capitalism fucked us again.

    • Dr. Moose@lemmy.world
      English
      arrow-up
      4
      ·
      2 months ago

      Remember when you had RSS with HUNDREDS of different websites and had your own personal little newspaper every day 🥲

    • rottingleaf@lemmy.world
      English
      arrow-up
      1
      arrow-down
      1
      ·
      2 months ago

      I just started hearing about some Facebook at some point, and then some (it’s Russia) Odnoklassniki and Vkontakte and a few other spaces.

      These seemed to be the sites which lamers use to feel themselves relevant with that Internet thing. Leprosoria of the Web. Places where I wouldn’t go, even if I would sometimes go to porn sites, especially when depressed.

      Normal people would use LiveJournal and move from ICQ to XMPP and Skype. Skype voice calls (back then they were better with countryside radio link than now with GPON) were a miracle, so me and everybody around me moved to Skype.

      Then there was a moment when the Web suddenly died and everyone was on those social networks.

      I actually blame LJ and Skype. Technically these worked fine, but they were a gateway drug.

      LJ made it easy to have something like a personal webpage, only it wasn’t personal, it was in LJ. ICQ lost popularity when it fought alternative clients, there were no alternative clients for Skype at all and everybody got used to service providers dictating how people should use their service, imposing their client software.

      It’s still terrifying how that ICQ run was the last such run in modern Internet’s history. A company fucked around and found out. Nothing like that has happened since. That’s my canary.

  • NOT_RICK@lemmy.world
    English
    arrow-up
    121
    arrow-down
    2
    ·
    2 months ago

    Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used

    It’s the users’ data, not yours, you rent seeking fuck

  • Hot Potato@lemmy.worldOP
    English
    arrow-up
    85
    ·
    2 months ago

    “It has been a real pain in the ass to block these companies.” makes me regret ever using Reddit in my life. Get your profit whatever, but this is just beyond greed.

    • tabular@lemmy.world
      English
      arrow-up
      17
      arrow-down
      1
      ·
      2 months ago

      I deleted my account but that was before I learnt you could replace all your posts with random sentences.

      • OpenStars@discuss.online
        English
        arrow-up
        12
        arrow-down
        1
        ·
        2 months ago

        Maybe check, b/c there’s a chance that they may have undeleted it all by now, so there’s a possibility that you could still do it?

      • Bookmeat@lemmy.world
        English
        arrow-up
        3
        arrow-down
        1
        ·
        2 months ago

        You don’t want random because that’s easy to detect. You want to fuck up the ML so you need to be more subtle like scrambling a few words or replacing certain nouns or logical connections in ways that are hard to differentiate from regular edits.

  • Nougat@fedia.io
    arrow-up
    85
    arrow-down
    1
    ·
    2 months ago

    “Let’s see … how do we get more people to visit our site? I know! We’ll prevent search engines from sending people to it!”

    • Sordid@lemmy.world
      English
      arrow-up
      26
      ·
      edit-2
      2 months ago

      It’s phase three of the enshittification cycle. In phase one, you attract users by providing a good service. Once they’re locked in, you squeeze them for all they’re worth by switching focus to business customers. Once they’re locked in, you squeeze them by threatening to deny them access to the users on whom they now depend.

      • Nougat@fedia.io
        arrow-up
        14
        ·
        2 months ago

        I had also described it elsewhere as the “suck all the value out before it’s dead” phase. They’re clearly no longer interested in growing the site; they’re just getting as much money as possible from their traffic and engagement history now, because they know traffic and engagement are already declining.

    • m-p{3}@lemmy.ca
      English
      arrow-up
      22
      ·
      edit-2
      2 months ago

      Big profit now is better than our long term image

      ~ Reddit shareholders

        • Sordid@lemmy.world
          English
          arrow-up
          2
          ·
          2 months ago

          It does make a certain amount of sense. Big profit now means you get a chunk of cash to invest in other quick profit schemes, and your wealth just keeps snowballing. It works as long as you don’t care that you never build anything that lasts.

  • umbraroze@lemmy.world
    English
    arrow-up
    66
    ·
    2 months ago

    Ok, now I’m miffed that Google caved to Reddit’s demands and paid up.

    Because this set a dangerous precedent.

    Earlier, Google got a lot of demands from various publications to pay up for indexing the publicly available news sites. And they always responded with “Ok, guess you leave us no other choice than just exclude you from indexing altogether.” Let the site simmer for a while until they went “oh shit, not being indexed by major search engines sucks. we didn’t really mean it please come back”

    It’s especially jarring because Reddit doesn’t even produce their own news content anyway. That search engine money isn’t going to the content creators. News sites at least could say they need to pay for their content to be written by their employees.

    • SirEDCaLot@lemmy.today
      English
      arrow-up
      6
      ·
      2 months ago

      At this point I think Google needs Reddit more than Reddit needs Google. Google search kind of sucks these days. How often do you add site:reddit.com to the end of the query to get any sort of useful result for a specific question? For me it’s pretty often. If Reddit cuts off Google, that goes away and Google search suffers significantly. And that might mean the one thing Google cannot abide- a situation where people in large numbers start actively seeking out other search engines.

      Don’t get me wrong, they’re both being super shitty.
      Google needs to quit obsessing over AI and a million different cloud products and fix the one product that people actually care about. Reddit needs to stop acting like they own everybody.

    • AngryCommieKender@lemmy.world
      English
      arrow-up
      6
      arrow-down
      1
      ·
      2 months ago

      Steve “Spaz” Huffman has been trying to milk money out of the site that Alexis Ohanian, Aaron Swartz, and pigboy Steven Spaz kinda created collaborating with each other. Aaron was shoved out first by The Spaz, though one could claim rightfully so in that case since Aaron was basically done with the site, and had moved on to his next project, essentially leaving Alexis and Spaz in the lurch as neither of them understood the code that Aaron had written to make the site functional.

      In many ways, the users made this possible. Most of us aren’t users in this case. The users that make up the vast majority of the population don’t give one thought to their own personal privacy, after all they have “nothing to hide,” not knowing that they really need to hide almost all of their data.

      If the users were to be educated about how much money the various companies like Reddit, Facebook, Microsoft, Apple, and almost every single other “disruptive tech company,” has stolen from them, the socialist revolution would have started in the 1980s

    • kameecoding@lemmy.world
      English
      arrow-up
      4
      ·
      2 months ago

      I don’t see why you should be miffed at all. Google can bully publications and unindex them and it will work. Reddit according to this: https://www.semrush.com/blog/most-visited-websites/ is the third most visited website after Google and YouTube, so they have a bit more power. Lots of people google with “site:reddit.com” because it still has some useful content like that, and I am going to go out on a limb and say that US visitors are the most important for selling ads for Google.

      Microsoft will have to make its own value calculation whether it’s worth it and they will likely pay up, although more and more of reddit is just bots posting stupid shit.

    • Ragnarok314159@sopuli.xyz
      English
      arrow-up
      3
      ·
      2 months ago

      I am guessing Google paid for access to their internal archives on posts and comments. Will give them a unique dataset for all the stuff that was deleted during the many exodus runs over the years.

  • Vipsu@lemmy.world
    English
    arrow-up
    49
    ·
    2 months ago

    Well, Reddit should just sue these companies and see if they are actually breaking any laws. Holding a sizeable chunk of the internet hostage also sounds like something the EU and US might want to look into, as it very much sounds like anti-competitive conduct or market manipulation.

    Also, if these companies want to have greater ownership over the content generated by their users, they should also be much more liable for the content posted to their sites. I mean, when something like Section 230 was written they probably did not take this into account. If these companies want to start selling user-generated content then they should simply lose the immunity from liability.

    • Dr. Moose@lemmy.world
      English
      arrow-up
      16
      ·
      2 months ago

      Reddit would lose badly, that’s why they don’t sue. The US 9th Circuit ruled that scraping LinkedIn is legal, and Bing is not even scraping but indexing the data. Easiest case ever.

      It’s almost impossible to block web scraping, especially by someone with Microsoft’s or Perplexity’s resources.

      It’s clearly an attempt to blackmail indexers into a license deal, as paying something to reddit could actually be cheaper than battling anti-robot measures.

    • mint_tamas@lemmy.world
      English
      arrow-up
      6
      ·
      2 months ago

      While I don’t disagree with the general idea, removing Section 230 protection would introduce an uncontrollable risk into running any website with user-generated content and would essentially shut them down.

      • Passerby6497@lemmy.world
        English
        arrow-up
        7
        ·
        2 months ago

        If the site isn’t selling data, they wouldn’t lose 230 protection. So that would only be a risk for the companies selling their users’ data, not your regular forum or something.

        • sugar_in_your_tea@sh.itjust.works
          English
          arrow-up
          6
          ·
          2 months ago

          That gets really murky though. For example:

          • news sites w/ comment sections - they’re profiting from ads and subscriptions, so how much of that has to do with the comments?
          • ecommerce - reviews on Amazon and eBay could be considered advertising for the product. Who’s liable, the ecommerce site, the merchant, or the poster?
          • product websites - how much are posted “reviews” considered advertising for the product? There may not be direct sales on the website, but surely someone’s review would impact sales elsewhere
          • for-profit services with a discussion forum - these would be on a separate site from the revenue-generating service, but still associated with the brand and thus likely contributing to advertisements for the product

          It’s a lot more obvious for social media sites like Facebook since user-generated content is the service, but there are a lot of for-profit entities where user-generated content is highly relevant, but not the core service. Would those sites be essentially forced to either moderate or eliminate user interaction?

          There’s a lot of complexity here.

    • commie@lemmy.dbzer0.com
      English
      arrow-up
      6
      arrow-down
      2
      ·
      2 months ago

      they should also be much more liable for the content posted to their sites.

      why do people insist on making me defend reddit.

      • Chozo@fedia.io
        arrow-up
        32
        arrow-down
        4
        ·
        2 months ago

        Giving preferential treatment to one service provider over another is 100% a net neutrality issue.

        • Natanael@slrpnk.net
          English
          arrow-up
          9
          arrow-down
          2
          ·
          edit-2
          2 months ago

          Net neutrality is about internet access and connectivity, not about what websites can do

          (but yes they’re still hypocritical)

          • lightnegative@lemmy.world
            English
            arrow-up
            1
            ·
            2 months ago

            I was thinking about this the other day. Because Lemmy instances keep defederating from each other, I don’t really experience Lemmy. I experience a fragment of Lemmy as determined by the admins of the instance I’m connected to.

            Even if I run my own instance, I guess there’s nothing stopping instances from defederating from me (or just refusing to federate to begin with because my instance is too small to bother with).

            Is there even a way to experience all of Lemmy, including spam and things some people don’t agree with?

            • commie@lemmy.dbzer0.com
              English
              arrow-up
              1
              ·
              2 months ago

              i think it’s functionally impossible to know every comment on every post in every community on every instance. but you could probably get about 80% of it. i have some 60+ fediverse identities across various services and i found that mastodon is incredible at getting lemmy content: if you subscribe to a community, every comment will be pushed to your home feed. in reverse chronological order. and mod deletions don’t get federated to mastodon.

    • UnderpantsWeevil@lemmy.world
      English
      arrow-up
      23
      arrow-down
      2
      ·
      edit-2
      2 months ago

      Microsoft paying Reddit to pay Microsoft to pay Reddit to pay…

      Stock prices absolutely skyrocketing with the news of this infinite revenue stream.