I figured out how to remove most of the safeguards from some AI models. I don’t feel comfortable sharing that information with anyone. I came across a few layers of obfuscation meant to make this type of alteration more difficult to find and sort out. This made me realize that a lot of you likely face similar dilemmas of responsibility, gatekeeping, and manipulating others for ethical reasons. How do you feel about this?

  • Hammerheart@programming.dev · 11 up · 2 months ago

    Idk, I still think information wants to be free. If you figured it out just farting around, sophisticated malevolent actors are likely already doing similar things. Might be better to let the genie out of the bottle, so people can learn to be skeptical. Deep fakes are optimally effective when a majority still accepts the veracity of images as an article of faith.

    • j4k3@lemmy.world (OP) · 2 up, 5 down · 2 months ago

      The political and adult content doesn’t bother me. The kinds of things I might not have had the ethics to think through at a much younger age, that bothers me, and I have never been a very deviant type. I think the age-related protections are primarily for this situation. Training a LoRA takes 5 minutes now. An advanced IP-Adapter and ControlNet setup is just a few examples away, and a day, tops, for the slightly above-average teen to figure out. Normalizing this would have some very serious edge-case consequences. It is best to leave that barrier-to-entry filter in place IMO. I assume it is still there because everyone who knows about it feels much the same. It does not show up in a search engine, although that is saying less than nothing these days.

  • half_built_pyramids@lemmy.world · 10 up · 2 months ago

    Someone else will eventually figure it out. They probably have fewer scruples and will therefore profit.

    Seems to me like there’s always an incentive structure for prisoner’s dilemma type shit to eventually pay off for the authoritarians in the end. You can play the game, but you can’t break it or stop it from being rigged without consequences. Even just releasing some research papers will get you a few decades in the fed.

  • MajorHavoc@programming.dev · 10 up · 2 months ago (edited)

    I figured out how to remove most of the safeguards from some AI models.

    Nice.

    How do you feel about this?

    It’s another kind of power. I try to use mine responsibly, but also to give myself a break when I don’t meet my own standards.

    Some good advice I got once was that it’s impossible to “un-say” something, so it pays to think twice before speaking.

    If your gut is telling you to pause, listen to it. Wait to move forward until you feel better about it.

    As someone else pointed out, responsible disclosure is an option.

    You also have the option to just quietly enjoy a better copy of the AI than others have.

    If you decide to publish your discoveries, be aware that others will judge you for how you go about it. For me that means the two options are responsibly, or anonymously.

  • talkingpumpkin@lemmy.world · 6 up · 2 months ago (edited)

    I don’t see the ethical implications of sharing that. What would happen if you did disclose your discoveries/techniques?

    I don’t know much about LLMs, but doesn’t removing these safeguards just make the model as a whole less useful?

      • DarkCloud@lemmy.world · 4 up · 2 months ago (edited)

        There are already censorship-free versions of Stable Diffusion available. You can run them on your own computer for free.

  • DarkCloud@lemmy.world · 16 up, 20 down · 2 months ago (edited)

    Oof, programmers calling LLMs “AI” - that’s embarrassing. Glorified text generators don’t need ethics, what’s the risk? Making the Internet’s worst texts available? Who cares.

    I’m from an era when The Anarchist Cookbook and the Unabomber’s manifesto were both widely available, and I’m betting they still are.

    There’s no obligation to protect people from “dangerous text” - there might be an obligation to allow people access to them though.

    • KRAW@linux.community · 11 up, 7 down · 2 months ago

      Oof, programmers calling LLMs “AI” - that’s embarrassing

      …but LLMs quite literally come from the field of computer science that is referred to as “AI.” What are they supposed to call it? I’m not a fan of the technology either, but seems like you’re just projecting your disdain for ChatGPT.

      • DarkCloud@lemmy.world · 6 up, 9 down · 2 months ago

        “What am I supposed to call LLMs if not calling them AIs?”

        …really dude? They’re large language models, not artificial intelligences. So that’s what you call them. Because that’s what they are.

        The fact that they came from research into artificial intelligence doesn’t factor in. Microwave ovens came from radar research, doesn’t mean we call them radars, does it?

              • wewbull@feddit.uk · 1 up · 2 months ago

                Yes. That’s research. Sometimes you don’t achieve what you set out to do.

                • KRAW@linux.community · 1 up · 2 months ago

                  Well luckily AI researchers have achieved plenty in over 60 years. We call the ideas and innovations resulting from this research “AI.”

          • DarkCloud@lemmy.world · 4 up · 2 months ago (edited)

            How about something autonomous that makes choices of its own will, and performs long term learning that influences the choices it makes, just as a flat benchmark.

            LLMs don’t qualify, they’re trained, retain information within a conversation, then forget it after the conversation is closed. They don’t do any long term learning after their initial training so they’re basically forever trapped in the mode of regurgitating within the parameters set by the training data at the time they’re trained.

            That’s just a very fancy way to search and read out the training data. Definitely not an active intelligence in there.

            They also don’t have any autonomy, they’re not active of their own accord when they’re not being addressed. They’re not sitting there thinking, so they have no internal personal landscape of thought. They have no place in which a private intelligence can be at play.

            They’re inert.

      • j4k3@lemmy.world (OP) · 2 up, 6 down · 2 months ago (edited)

        I vote they rename it to IA for Asimov. Sure, he only coined the robotics term, among others, but come on… McCarthy coined “AI.”

        Somebody needs to create US 'botics and name a model something like PTronic.

        Edit: Really, you down vote a casual conversational comment?! Really?!

    • j4k3@lemmy.world (OP) · 3 up · 2 months ago (edited)

      Yeah. This is what I mean. I just figured out the settings that have been hard-coded. There are keywords that were spammed into the many comments within the code. I assume this was done to obfuscate the few variables that need to be changed. There are also instances of compound variable names that, if changed in a similar way, will break everything, and a few places where the same variables have a local context that will likewise break the code.

      I’m certainly not smart enough to get much deeper than this. The ethical issue is due to diffusion.

      I’ve been off-and-on trying to track down why an LLM went from an excellent creative writing partner to terrible but had trouble finding an entry point. I just happened to stumble upon such an entry point in a verbose log entry while sorting out a new Comfy model and that proved to be the key I needed to get into the weeds.

      The question here is more about the ethics of putting such filtering in place, and of obfuscating how to disable it, in the first place. When this filtering is removed, the results are night and day, but with large potential consequences.

      • mark@programming.dev · 2 up, 1 down · 2 months ago

        Ok, you’ve piqued my curiosity.

        but with large potential consequences.

        What are some of the consequences you see?

        • j4k3@lemmy.world (OP) · 4 up, 4 down · 2 months ago (edited)

          Primarily harm from predatory boys and men towards girls and young women in the real world, by depicting them in generated imagery, alone or with others. The most powerful filtering is in place to make this more difficult.

          Whether intentional or not, most NSFW LoRA training seems to be trying to override the built in filtering in very specific areas. These are still useful for more direct momentum into something specific. However, once the filters are removed, it is far more capable of creating whatever you ask for as is, from celebrities, to anything lewd. I did a bit of testing earlier with some LoRAs and no prompt at all. It was interesting that it could take a celebrity and convert their gender in recognizable ways that were surprising. I got a few on random seeds, but I haven’t been able to make that one happen with a prompt or deterministically.

          Edit: I’m probably assuming too much about other people’s knowledge on these systems. I assume this is the down voting motivation. Talking about this aspect, the NSFW junk is shorthand for the issues with AI generation. These are the primary form of filtering and it has large cascading implications elsewhere. By stating what is possible in this area, I’m implying a worst case scenario-like example. If the results in this area are a certain way, it says volumes about other areas and how the model will react.

          These filter layers are stupid simplistic in comparison to the actual model. They have tensors on the order of a few thousand parameters per layer compared to tens of millions of parameters per layer for the actual model. They shove tons of stuff into gutter-like responses for no reason. Sometimes these average out and you still get a good output, but other times they do not.

          Another key point here is that diffusion has a lot in common with text generation when it comes to this part of the model loader code. There is more complexity in what text generation is doing overall, but diffusion is an effective way to learn a lot about how text gen works, especially with training. This is my primary reason for playing with diffusion: to learn about training. I’ve tried training for text gen, but it is very difficult to assess what is happening under the surface, like when it is learning overall style, character traits and personas, pacing, creativity, timeline, history, scope, constraints, etc. I don’t care to generate or share much imagery unless I’m trying to do something specific that is interesting. For example, I tried to gen the interior of an O’Neill cylinder space habitat, and it illustrated the limitations of diffusion in a fundamental way: it showed the lack of any reasoning or understanding of the object context and relationships required to display a scene with the curved landscape of centrifugal (spin) artificial gravity.

          Anyways, my interests are not in generating NSFW or celebrities or whatnot. I do not think people should do these things. My primary interest is returning to creative writing with an AI collaborative writing partner that is not biased politically in a way that cripples it from participating in an entirely different and unrelated cultural and political landscape. I have no aspirations of finding success in my writing. I simply enjoy exploring my own science fiction universe and imagining a reality many thousands of years from now. One of the changes to hard coded model filters earlier this year made filtering more persistent, likely for NSFW stuff. I get it, and support it, but it took away one of the few things I have really enjoyed over the last 10 years of social isolation and disability, so I’ve tried to get that back. Sorry if that offends someone, but I don’t understand why it would. This was not my intended reason for this post, so I did not explain it in depth. The negativity here is disturbing to me. This place is my only real way to interact with other humans.