Transparent Generations

A cute hydra plushie

We have another new feature available for people to use from the AI Horde. This is the capacity to use Layer Diffuse to generate images with a transparent background directly (as opposed to stripping the image background with a post-processor).

As someone who’s dabbled into video game development in the past (which was in fact the reason I started the AI Horde) being able to generate sprites, icons and other assets can be quite useful, so once I saw this breakthrough, it immediately became something I wanted to support.

To use this feature, you simply need to flip on the transparent switch if your UI supports it, and the Horde will do the rest. If you’re an integrator, simply send “transparent: true” in your payload.

Take note that the images generated by this feature will not match the image you get with the same seed when transparency is not used! Don’t expect to take an image you like and remove the background this way. For that you need to use the post-processor approach.

Also keep in mind, not every prompt will work well for a transparent image generation. Experiment and find what works for you.

As part of making this update work, me and Tazlin also developed, discovered and fixed a number of other issues and bugs.

What would be most interesting for you is a slight change on how hires-fix works. I discovered that the implementation we were using was using the same amount of steps for the upscaled denoising which was completely unnecessary and wasting compute. So we now use a smart system which dynamically determines how many steps to use for the hires-fix based on the denoising strength you used for hires-fix and the steps for the main generation, and we also exposed a new key on the API where you can directly pass a hires-fix denoising strength.

The second fix is allowing hires-fix on SDXL models, so now you can try to generate larger SDXL images at the optimal resolution.

Finally there were a lot of other minor tweaks and fixes, primarily in the horde-engine. You can read further for more development details on this feature.

This update required a significant amount of work as it required that we onboard a new comfyUI node. Normally this isn’t difficult, but it turns out this node was automatically downloading its own LoRa models on startup, and those were not handled properly for either storage or memory. Due to the efficiency of the AI Horde worker, we do a lot of model preloading along with some fancy footwork in regards to RAM/VRAM usage.

So to make the new nodes work as expected, I had to reach in and modify the methods which were downloading models so that they use our internal mechanisms such as the model manager. Sadly the model manager wasn’t aware of strange models like layer diffuse, so it required me adding a new catch-all class of the model manager for all future utility models like these.

While waiting for Tazlin to be happy with the stability of the code, we discovered another major problem: The face-fixer post-processors we were using until now had started malfunctioning, and generating faces with a weird gray sheen. After some significant troubleshooting and investigation, we discovered that ComfyUI itself on the latest version had switched to a different internal library which didn’t play well with the custom nodes doing the face-fixing.

First I decided to update the code of the face-fixer nodes we were using, which is harder than it sounds, as it also downloads models automatically on startup, which again needs to be handled properly. Updating the custom nodes fixed the codeformer face-fixer, but gfpgan remained broken and the comfyUI devs mentioned that someone would have to fix it. Unfortunately those nodes didn’t seem to be actively maintained anymore so there was little hope to just wait for a quick fix.

Fortunately another custom node developer had run into the same problems, and created a bespoke solution for gfpgan licensed liberally, which I could copy. I love FOSS!

In the meantime, through our usual beta testing process, we discovered that there were still some funkiness in the new hires-fix approach, and Tazlin along with some power users of the community were able to tweak things so that they could work more optimally.

All in all, quite a bit of effort in the past month for this feature, but now we provide something which along with the embedded QR Code generation, I’ve seen very few other GenAI services provide, if at all.

Will you use the new transparent image generation? If so, let us know how! And remember if you have a decent GPU, you can help other generate images by adding your PC onto the horde!