I’ve used ComfyUI for a few days now, and overall it seems like the quality of the generated images is a bit lower than what I can get in AUTOMATIC1111. Sometimes the difference is subtle (maybe just selection bias on my part) and sometimes it’s pretty clear. I’ve done my best to make the settings and inputs equal in both tools. Side question: is it normal for the same seed to generate different outputs in different tools?
I’m using --medvram on A1111, and I thought I read that’s the default for Comfy as well. Not using xformers in either case. No HiResFix. No face restoration.
The only definite difference I’ve seen is that Comfy uses “pytorch cross attention” while A1111 uses the “Automatic” cross attention optimization (I didn’t see a pytorch option there).
Example of what I’m seeing (NSFW - ladies in bikinis): https://imgur.com/a/LzAzKfV
Update: Imgur removed my generated images, so I’ll just attach them here. The first one is Comfy and the second one is A1111.
I found this post on the Comfy GitHub about weights and such: https://github.com/comfyanonymous/ComfyUI/discussions/521
So it seems like I need to use the AdvancedClipEncode node to get the same behavior as A1111. I’ll try it later today.
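From what I understand of that discussion, the main difference is in how prompt weights like (word:1.2) get applied to the CLIP embeddings. A rough sketch of the A1111-style behavior in plain PyTorch (simplified, not the actual code from either tool; names are just for illustration):

```python
import torch

def a1111_style_weighting(token_emb: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # token_emb: [tokens, dim] output of the CLIP text encoder
    # weights:   [tokens] per-token weights parsed from (word:1.2) syntax
    original_mean = token_emb.mean()
    weighted = token_emb * weights.unsqueeze(-1)
    # A1111 rescales the result so its overall mean matches the unweighted
    # embedding, which softens the effect of strong weights.
    return weighted * (original_mean / weighted.mean())
```

Comfy’s default encode applies the weights without that mean rescaling (as far as I can tell), which seems to be why the same weighted prompt can drift noticeably between the two.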
After using this module and “CLIP Set Last Layer” (i.e. Clip skip) I was able to generate an image nearly identical to A1111. I think “proper” interpretation of the prompts has helped immensely (since most people share their A1111 prompts). Sadly, there are a few prompts I use that just look better with the GPU seed, and AFAIK I can’t turn that on in Comfy.
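For anyone else confused by the Clip skip part: as I understand it, “CLIP Set Last Layer” at -2 is the same idea as A1111’s Clip skip 2, i.e. take the text encoder’s second-to-last hidden layer instead of the last one. A minimal sketch with Hugging Face transformers (illustrative only, and the real pipelines also re-apply the final layer norm to the skipped output):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # the SD 1.x text encoder
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_model = CLIPTextModel.from_pretrained(model_id)

tokens = tokenizer("a photo of a cat", return_tensors="pt")
with torch.no_grad():
    out = text_model(**tokens, output_hidden_states=True)

cond_default = out.last_hidden_state  # Clip skip 1 (the default)
cond_skip2 = out.hidden_states[-2]    # Clip skip 2 / CLIP Set Last Layer at -2
```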
I’m thinking one day I’ll do everything in Comfy (writing nodes looks really fun), but for now I’ll probably straddle the two.
Regarding different outputs for the same seed: have you changed the seed source to CPU in A1111? The noise you get that way is consistent across different hardware vendors, unlike the GPU-sourced noise.
I just tried it in A1111 to check and got very similar results to ComfyUI that way.
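You can see the mismatch with a couple of lines of plain PyTorch (the shape here is just an example, sized like an SD latent):

```python
import torch

seed = 12345
shape = (1, 4, 64, 64)  # example latent-sized noise

gen_cpu = torch.Generator(device="cpu").manual_seed(seed)
noise_cpu = torch.randn(shape, generator=gen_cpu)

if torch.cuda.is_available():
    gen_gpu = torch.Generator(device="cuda").manual_seed(seed)
    noise_gpu = torch.randn(shape, generator=gen_gpu, device="cuda")
    # Same seed, different RNG implementation, different noise -> different image.
    print(torch.allclose(noise_cpu, noise_gpu.cpu()))  # prints False
```

AFAIK Comfy always generates its noise on the CPU, which is why the CPU seed source in A1111 is the one that lines up.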