Hello. I have asked this question on the subreddit, but was told to ask here too so the dev can see it. I am not particularly tech savvy. I have recently come across Perchance, which I have found useful for creating texts and images.
Even if the NSFW filter is disabled and NSFW material can be generated, do the text-to-text and text-to-image generators still prevent the production of illegal or harmful content?
I don’t want to try this for obvious reasons, but I am concerned that such content could inadvertently be generated. The obvious example is child sexual abuse material, but I am also thinking of glorification of terrorism or genocide, promotion of self-harm, encouragement of violence toward others, etc.
Thank you!
I think there is, but some words and phrases slip through. I think it’s best to avoid words that would trigger the AI into giving you harmful content by accident. If you prompt responsibly, you should be safe.
For image gen, I have some pretty complex regex pattern matching that tries to prevent content that would be illegal in most/all jurisdictions. Text gen is a lot harder because e.g. an essay that discusses an illegal topic is fine, but with any naive pattern-matching (or even “embedding”-based approaches) it’ll probably be incorrectly flagged. One approach is to use a model that’s fine-tuned to refuse to generate on illegal topics, but that tends to be a recipe for annoying over-refusals, and the state-of-the-art Llama/Mixtral/etc. open source models tend to be fine-tuned in the opposite direction (i.e. to remove any and all refusals) for that reason.
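As a rough illustration of what regex-based prompt filtering looks like in general - a minimal sketch with made-up placeholder patterns, not Perchance’s actual rules or code, which aren’t public:

```python
import re

# Hypothetical patterns only -- NOT Perchance's actual rules. Each entry
# pairs a compiled regex with a reason, so a blocked prompt can be
# logged or explained to the user.
BLOCKED_PATTERNS = [
    (re.compile(r"\bplaceholder ?term\b"), "stand-in for a disallowed phrase"),
]

def prompt_is_allowed(prompt: str) -> tuple[bool, str | None]:
    """Return (True, None) if no pattern matches, else (False, reason)."""
    # Normalize simple evasions: lowercase, and turn punctuation and
    # underscores into spaces so e.g. "placeholder-term" still matches.
    normalized = re.sub(r"[\W_]+", " ", prompt.lower()).strip()
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(normalized):
            return False, reason
    return True, None
```

The weakness described above follows directly from this shape: a pattern either matches the prompt string or it doesn’t, so it can’t tell an essay about a topic apart from a request for it.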
I am concerned that such content could inadvertently be generated
If this does happen, then that’s definitely considered a “bug”, but the degree to which I can do anything about it would mostly need to be determined on a case-by-case basis, and worst case we’d just have to wait for smarter ML models with better fine-tuning datasets.
The obvious example is child sexual abuse material, but I am also thinking of […]
Outside of easy-to-flag illegal image stuff, the responsibility currently falls on the user to not prompt for things that they don’t want. As mentioned on the AI plugin pages, it should be treated like a web search. E.g. if you search “neo nazi forum” on Google, you’re 2 clicks away from the largest neo nazi forum on the internet.

And, to be clear, this is a complicated issue - even if I had a magic wand, I don’t think I’d outright prevent people from finding that forum. There’s a whole essay to be written here that I’m obviously not going to write, but the summary is “generally speaking, fight the roots, not the symptoms, and generally speaking, do it with sunlight, not suppression”. It’s possible to make things worse if second-order effects aren’t considered, and this issue is complicated enough that I think it’s naive to be an absolutist in either direction. It’s always tempting, though, given how much easier it feels, and how much is often at stake for getting these things wrong - at least on a governmental/societal level.
@[email protected] - pinging dev
https://perchance.org/celestia-says
for moral purity
I hope so, I haven’t used the AI tools much on perchance so I don’t know.
I’d say it does a good job of preventing you from explicitly generating such content, but not from generating it by accident. The filters can only work when you ask for something directly, not when the AI adds that part on its own.
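To make that concrete, here’s a minimal sketch - with hypothetical functions, not Perchance’s actual internals - of why a prompt-side filter alone can’t catch content the model adds on its own. For text, at least, you’d have to scan the output too:

```python
# Hypothetical stand-ins, not real Perchance internals: `violates_policy`
# could be the regex check sketched earlier, and `generate` any text model.

def violates_policy(text: str) -> bool:
    return False  # placeholder; imagine real pattern checks here

def generate_safely(prompt: str, generate) -> str | None:
    if violates_policy(prompt):   # catches what the user explicitly asks for
        return None
    output = generate(prompt)
    if violates_policy(output):   # the model can add unrequested content,
        return None               # so the output needs its own pass
    return output
```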
Further, you can’t really 100% prevent somebody from utilising it to generate such content. You may not be able to create it with just the AI, but it could be used for one step. That’s not really an AI-specific problem. With captions you can put otherwise innocent images into a completely different context, and images can be manually edited.
I’ve played with editing for complex anime scenes: generating a few characters, removing the backgrounds quickly with software, and then placing them on a separately generated background. It’s hard to describe such a scene to the AI without confusing it, but you could also use this to alter the context massively. It gets ‘worse’ the better photo-editing AIs get, because they’d have a hard time filtering - they’d need to first understand the content of an image and its implications.
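For what it’s worth, that compositing step is trivial to script - a rough sketch using Pillow, with placeholder filenames, assuming the character image already has a transparent background from a background-removal tool:

```python
from PIL import Image

# Rough sketch of the manual compositing workflow described above.
# Filenames are placeholders; the character image is assumed to already
# have a transparent background.
background = Image.open("generated_background.png").convert("RGBA")
character = Image.open("character_no_bg.png").convert("RGBA")

# Paste the character at a chosen position, using its own alpha channel
# as the mask so transparent pixels don't overwrite the background.
background.paste(character, (120, 200), mask=character)
background.save("composited_scene.png")
```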
Rewriting texts is even easier.