Image Remix on the AI Horde

The initial deployment of Stable Cascade (SC) on the AI Horde supported only text2image workflows, but that is just a subset of what this model can do. We still needed to onboard the rest of its capabilities.

One such capability was the “image variations” option, which lets you send an image to the model and get a variation of it back, perhaps with extra elements added in, using the unCLIP technique. This required quite a bit of work on hordelib so that it uses a completely different ComfyUI workflow, but ultimately it was not much harder than adding plain img2img capabilities to SC.

The larger difficulty came when I wanted to add the ability to remix multiple images together. The problem was that, until now, the AI Horde only supported sending a single source image and a single source mask, so a variable number of images was not possible at all.

So to support this, I needed to touch every layer of the AI Horde. The API had to accept the extra images, upload each of them to my R2 bucket, and provide individual download links. The SDK had to expect those images and provide methods to download them in parallel to avoid delays. The reGen worker had to receive them and pass them to hordelib, which in turn had to know how to adjust a ComfyUI pipeline on-the-fly, adding as many extra nodes as required.
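To give a feel for the hordelib side, here’s a rough Python sketch of how one can splice extra unCLIP image chains into a ComfyUI API-format workflow. This is an illustration of the approach, not hordelib’s actual code: the function name and wiring are my own, though LoadImage, CLIPVisionEncode, and unCLIPConditioning are real ComfyUI node classes.

```python
# Illustrative sketch, not hordelib's actual code: splice one
# LoadImage -> CLIPVisionEncode -> unCLIPConditioning chain per extra
# source image into a ComfyUI API-format workflow (a dict keyed by
# numeric-string node ids).
def inject_remix_images(workflow: dict, image_names: list[str],
                        text_cond_node: str, clip_vision_node: str):
    workflow = dict(workflow)
    next_id = max(int(node_id) for node_id in workflow) + 1
    # Each unCLIPConditioning consumes the previous conditioning output,
    # so the images stack: text prompt -> image 1 -> image 2 -> ...
    prev_cond = [text_cond_node, 0]
    for name in image_names:
        load_id, encode_id, cond_id = (str(next_id + i) for i in range(3))
        workflow[load_id] = {
            "class_type": "LoadImage",
            "inputs": {"image": name},
        }
        workflow[encode_id] = {
            "class_type": "CLIPVisionEncode",
            "inputs": {"clip_vision": [clip_vision_node, 0],
                       "image": [load_id, 0]},
        }
        workflow[cond_id] = {
            "class_type": "unCLIPConditioning",
            "inputs": {"conditioning": prev_cond,
                       "clip_vision_output": [encode_id, 0],
                       "strength": 1.0,
                       "noise_augmentation": 0.0},
        }
        prev_cond = [cond_id, 0]
        next_id += 3
    # The caller then re-points the sampler's positive conditioning
    # input at prev_cond so the whole chain takes effect.
    return workflow, prev_cond
```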

So after two weeks of developing and testing, we finally have this feature available. If your Horde front-end supports the “remix” feature, you can send between 1 and 6 images to this workflow along with a prompt, and the model will try its best to “squash” them all together into one composition. Note that the more images you send, and the larger the prompt, the harder it becomes for the model to “retain” all of them in the final result.
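For front-end developers, a remix request could look roughly like the sketch below. I’m hedging here: the exact field names (notably `extra_source_images` and the `remix` value for `source_processing`) are assumptions on my part, so verify them against the live API documentation at https://aihorde.net/api before relying on them.

```python
# Hedged sketch of a remix request; the `source_processing` value and
# `extra_source_images` field are assumptions, check the live API docs.
import base64
import requests

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "a single cohesive composition, painterly style",
    "models": ["Stable Cascade 1.0"],
    "source_image": b64("avatar.webp"),
    "source_processing": "remix",        # assumed enum value
    "extra_source_images": [             # assumed field name
        {"image": b64("haidra_logo.webp"), "strength": 1.0},
    ],
}
response = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json=payload,
    headers={"apikey": "0000000000"},    # the anonymous API key
)
# Returns an id you then poll at /v2/generate/check/{id} and
# /v2/generate/status/{id} until the generation is done.
print(response.json())
```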

As an example, here’s how the model remixes my own avatar. You’ll notice that the result captures the general concepts of the image, but doesn’t follow it exactly, as this is not img2img. The blur is probably caused by the need to upscale my original image, which is something I’d like to fix in the next pass.

Likewise, this is the Haidra logo.

And finally, here’s a remix of both the logo and the avatar together.

Pretty neat, huh?

This ability to send extra source images also lays the groundwork for the Horde to support things like InstantID, which I hope to start working on soon.

Stable Cascade on the AI Horde!

A while ago, Stability.ai released a new model built on a different architecture, one that seems to provide very promising results and very fast training: Stable Cascade. I really wanted to offer it on the AI Horde, so after getting explicit permission from Emad in Reddit PMs (due to its more restrictive license for APIs), I set out to implement it.

Unfortunately, the Stable Cascade model and its ComfyUI workflow require two different checkpoints (one each for its stage B and stage C), which went against the AI Horde worker paradigm at the time of one file per model. I had to make changes in every package that expected this paradigm: the Worker, hordelib, the model reference, and its SDK all required tweaking to avoid crashing.
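To illustrate the shape of the change (using a hypothetical record layout, not the model reference’s actual schema): where every model used to resolve to exactly one checkpoint file, a Stable Cascade entry has to list both of its stages, and every consumer had to learn to iterate.

```python
# Hypothetical before/after of a model-reference record; field names are
# illustrative, the real schema lives in the Haidra-Org model reference.
single_file_record = {
    "name": "Some SDXL model",
    "config": {"files": [{"path": "some_sdxl_model.safetensors"}]},
}
# Stable Cascade ships as two checkpoints (stage B and stage C), so
# anything that assumed files[0] was the whole model had to change.
cascade_record = {
    "name": "Stable Cascade 1.0",
    "baseline": "stable_cascade",
    "config": {"files": [
        {"path": "stable_cascade_stage_b.safetensors"},
        {"path": "stable_cascade_stage_c.safetensors"},
    ]},
}
```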

Fortunately, while the changes were complicated, I managed to implement them without much debugging. I did initially run into some trouble with the image quality being garbage, which turned out to require ComfyAnon tweaking the implementation on ComfyUI a bit, but once that was done, everything fell into place. Now you can use the AI Horde to request Stable Cascade images and check out this model’s capabilities, even if you don’t have 20GB of VRAM to spare.

You can try it out on Artbot.

Alongside Stable Cascade, I thought it was high time we started expanding our SDXL model selection, so the following models have also been onboarded:

  • Juggernaut XL
  • Anime Illust Diffusion XL
  • Pony Diffusion XL
  • Animagine XL
  • DreamShaper XL (Lightning version)

We quickly realized that we also needed to expand our model reference to better inform people of the requirements of some of these models. For example, Pony Diffusion XL doesn’t work unless you set clip_skip to 2, and DreamShaper XL (Lightning) requires low steps, low cfg, and specific samplers. If you set those correctly, you’ll get amazing images; otherwise you get hot garbage. Soon the Horde will warn you when you try to use a model outside its specifications.
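As a hedged sketch of where this is heading (the field names and thresholds below are made up for illustration, not the actual model reference schema), the idea is to attach per-model requirements metadata and compare incoming payloads against it:

```python
# Illustrative only: hypothetical requirements metadata and a checker.
# Field names and thresholds are invented; the real values will live in
# the model reference.
MODEL_REQUIREMENTS = {
    "Pony Diffusion XL": {"clip_skip": 2},
    "DreamShaper XL": {
        "max_steps": 8,          # Lightning models want very few steps
        "max_cfg": 2.5,          # ...and a low cfg scale
        "samplers": ["k_dpmpp_sde", "k_euler_a"],  # illustrative list
    },
}

def out_of_spec_warnings(model: str, params: dict) -> list[str]:
    """Return human-readable warnings for params outside a model's specs."""
    spec = MODEL_REQUIREMENTS.get(model, {})
    warnings = []
    if "clip_skip" in spec and params.get("clip_skip", 1) != spec["clip_skip"]:
        warnings.append(f"{model} needs clip_skip={spec['clip_skip']}")
    if "max_steps" in spec and params.get("steps", 30) > spec["max_steps"]:
        warnings.append(f"{model} wants steps <= {spec['max_steps']}")
    if "max_cfg" in spec and params.get("cfg_scale", 7.5) > spec["max_cfg"]:
        warnings.append(f"{model} wants cfg_scale <= {spec['max_cfg']}")
    if "samplers" in spec and params.get("sampler_name") not in spec["samplers"]:
        warnings.append(f"{model} works best with {spec['samplers']}")
    return warnings

# e.g. out_of_spec_warnings("Pony Diffusion XL", {"clip_skip": 1})
# -> ["Pony Diffusion XL needs clip_skip=2"]
```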

Other than that, we haven’t been completely idle. Some other notable achievements in the previous weeks are:

Firstly, the AI Horde now supports an educator role for accounts. If you are an educational institution and you want to use one of the AI Horde’s free tools in the classroom, you can request that your account be set as an educator, which will force all your requests to be SFW and increase your account’s concurrency.

I also spent some time improving the AI generation of the Mastodon bot @dungeons, so that it gets nicer images for each campaign protagonist. I’ll admit I had a lot more fun than I should have improving the versatility and variability of the generations and tweaking the results for each model. You can see (or follow) the results in the dedicated account replying with those images.

On the worker side, Tazlin has also been very busy improving the efficiency of our generations. We have now added improvements such as downloading the LoRAs for the next job while performing the inference for the previous one, and squeezing more throughput out of more powerful machines.
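The download/inference overlap is a classic pipelining trick. Here’s a minimal sketch of the idea (my own illustration, not the reGen worker’s actual code), using a background thread to fetch the next job’s LoRAs while the GPU is busy:

```python
# Minimal pipelining sketch (not the reGen worker's actual code):
# prefetch the next job's LoRAs on a background thread while the GPU
# runs inference for the current one.
from concurrent.futures import ThreadPoolExecutor

def run_jobs(jobs, download_loras, run_inference):
    """`jobs` is a list; the two callables do what their names say."""
    if not jobs:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(download_loras, jobs[0])
        for i, job in enumerate(jobs):
            pending.result()  # block until this job's LoRAs are on disk
            # Kick off the next job's downloads before the GPU gets busy,
            # so network and compute overlap instead of alternating.
            if i + 1 < len(jobs):
                pending = pool.submit(download_loras, jobs[i + 1])
            run_inference(job)
```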

I’m now hard at work onboarding more Stable Cascade capabilities as they are added to ComfyUI, and adding support for more advanced workflows.