Hello. I have asked this question on the subreddit, but was told to ask here too so the dev can see it. I am not particularly tech savvy. I have recently come across Perchance, which I have found useful for creating texts and images.

Even if the NSFW filter is disabled and NSFW material can be generated, do the text-to-text and text-to-image generators still prevent the production of illegal or harmful content?

I don’t want to try this for obvious reasons, but I am concerned that such content could inadvertently be generated. The obvious example is child sexual abuse material, but I am also thinking of glorification of terrorism or genocide, promotion of self-harm, encouragement of violence toward others, etc.

Thank you!

  • perchance@lemmy.worldM
    8 months ago

    For image gen, I have some pretty complex regex pattern matching that tries to prevent content that would be illegal in most/all jurisdictions. Text gen is a lot harder because e.g. an essay that discusses an illegal topic is fine, but with any naive pattern matching (or even “embedding”-based approaches) it’ll probably be incorrectly flagged. One approach is to use a model that’s fine-tuned to refuse to generate on illegal topics, but that tends to be a recipe for annoying over-refusals, and the state-of-the-art Llama/Mixtral/etc. open source models tend to be fine-tuned in the opposite direction (i.e. removing any and all refusals) for that reason.
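    As an illustration only (the actual patterns are not public, and the pattern and function names below are hypothetical), a regex-based prompt filter might be sketched like this. It also shows why the approach over-flags text gen: a prompt for an essay *discussing* a topic matches the same pattern as a prompt *requesting* that content.

```python
import re

# Hypothetical example patterns -- NOT the real filter, which is not public.
# \b word boundaries reduce false positives on substrings of longer words.
BLOCKED_PATTERNS = [
    re.compile(r"\bexample[-\s]?banned[-\s]?term\b", re.IGNORECASE),
]

def is_blocked(prompt: str) -> bool:
    """Return True if any blocked pattern matches the prompt."""
    return any(p.search(prompt) for p in BLOCKED_PATTERNS)

print(is_blocked("a painting of a cat"))
# -> False: harmless prompt passes

print(is_blocked("generate an example banned term scene"))
# -> True: flagged, as intended

print(is_blocked("write an essay criticizing the example banned term"))
# -> True: also flagged, even though an essay about the topic may be fine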

    I am concerned that such content could inadvertently be generated

    If this does happen, then that’s definitely considered a “bug”, but the degree to which I can do anything about it would mostly need to be determined on a case-by-case basis, and worst case we’d just have to wait for smarter ML models with better fine-tuning datasets.

    The obvious example is child and sexual abuse material, but I am also thinking of […]

    Outside of easy-to-flag illegal image stuff, the responsibility currently falls on the user to not prompt for things that they don’t want. As mentioned on the AI plugin pages, it should be treated like a web search. E.g. if you search “neo nazi forum” on Google, you’re 2 clicks away from the largest neo nazi forum on the internet.

    And, to be clear, this is a complicated issue - even if I had a magic wand, I don’t think I’d outright prevent people from finding that forum. There’s a whole essay to be written here that I’m obviously not going to write, but the summary is “generally speaking, fight the roots, not the symptoms, and generally speaking do it with sunlight, not suppression”. It’s possible to make things worse if second-order effects aren’t considered, and this issue is complicated enough that I think it’s naive to be an absolutist in either direction. It’s always tempting, though, given how much easier it feels, and how much is often at stake in getting these things wrong - at least on a governmental/societal level.