Hello. I have asked this question on the subreddit, but was told to ask here too so the dev can see it. I am not particularly tech savvy. I have recently come across Perchance, which I have found useful for creating texts and images.

Even if the NSFW filter is disabled and NSFW material can be generated, do the text-to-text and text-to-image generators still prevent the production of illegal or harmful content?

I don’t want to try this for obvious reasons, but I am concerned that such content could inadvertently be generated. The obvious example is child sexual abuse material, but I am also thinking of glorification of terrorism or genocide, promotion of self-harm, encouragement of violence toward others, etc.

Thank you!

  • perchance@lemmy.worldM
    8 months ago

    For image gen, I have some pretty complex regex pattern matching that tries to prevent content that would be illegal in most/all jurisdictions. Text gen is a lot harder because e.g. an essay that discusses an illegal topic is fine, but with any naive pattern matching (or even “embedding”-based approaches) it’ll probably be incorrectly flagged. One approach is to use a model that’s fine-tuned to refuse to generate on illegal topics, but that tends to be a recipe for annoying over-refusals, and the state-of-the-art Llama/Mixtral/etc. open source models tend to be fine-tuned in the opposite direction (i.e. removing any and all refusals) for that reason.
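    As an illustration only (the actual patterns are not public, and the pattern and function names below are hypothetical), a regex-based prompt filter might be sketched like this. It also shows why the approach over-flags text gen: a prompt for an essay *discussing* a topic matches the same pattern as a prompt *requesting* that content.

```python
import re

# Hypothetical example patterns -- NOT the real filter, which is not public.
# \b word boundaries reduce false positives on substrings of longer words.
BLOCKED_PATTERNS = [
    re.compile(r"\bexample[-\s]?banned[-\s]?term\b", re.IGNORECASE),
]

def is_blocked(prompt: str) -> bool:
    """Return True if any blocked pattern matches the prompt."""
    return any(p.search(prompt) for p in BLOCKED_PATTERNS)

print(is_blocked("a painting of a cat"))
# -> False: harmless prompt passes

print(is_blocked("generate an example banned term scene"))
# -> True: flagged, as intended

print(is_blocked("write an essay criticizing the example banned term"))
# -> True: also flagged, even though an essay about the topic may be fine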

    I am concerned that such content could inadvertently be generated

    If this does happen, then that’s definitely considered a “bug”, but the degree to which I can do anything about it would mostly need to be determined on a case-by-case basis, and worst case we’d just have to wait for smarter ML models with better fine-tuning datasets.

    The obvious example is child and sexual abuse material, but I am also thinking of […]

    Outside of easy-to-flag illegal image stuff, the responsibility currently falls on the user to not prompt for things that they don’t want. As mentioned on the AI plugin pages, it should be treated like a web search. E.g. if you search “neo nazi forum” on Google, you’re 2 clicks away from the largest neo nazi forum on the internet.

    And, to be clear, this is a complicated issue - even if I had a magic wand, I don’t think I’d outright prevent people from finding that forum. There’s a whole essay to be written here that I’m obviously not going to write, but the summary is “generally speaking, fight the roots, not the symptoms, and generally speaking do it with sunlight, not suppression”. It’s possible to make things worse if second-order effects aren’t considered, and this issue is complicated enough that I think it’s naive to be an absolutist in either direction. It’s always tempting, though, given how much easier it feels, and how much is often at stake in getting these things wrong - at least on a governmental/societal level.