Image Remix on the AI Horde

A robot witch mixing colorful liquids in its magic cauldron

The initial deployment of Stable Cascade (SC) on the AI Horde supported only text2image workflows, which is just a subset of what this model can do. We still needed to onboard the rest of its capabilities.

One such capability is the “image variations” option, which lets you send an image to the model and get back a variation of that image, perhaps with extra stuff added in, using the unCLIP technique. This required quite a bit of work on hordelib, since it uses a completely different ComfyUI workflow, but ultimately it was not much harder than adding the img2img capabilities to SC.

The larger difficulty came when I wanted to add the ability to remix multiple images together. Until now, the AI Horde only supported sending a single source image and a single source mask, so sending a variable number of images was not possible at all.

To support this, I needed to touch all areas of the AI Horde. The AI Horde itself had to accept the extra images, upload each of them to my R2 bucket and provide individual download links. The SDK had to know to expect those images and provide methods to download them in parallel to avoid delays. The reGen worker had to be able to receive those images and send them to hordelib, which in turn had to know how to adjust a ComfyUI pipeline on the fly, adding as many extra nodes as required.
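To give a rough idea of what "adding extra nodes on the fly" can look like, here is a minimal sketch in Python. It is not hordelib's actual code. It assumes a ComfyUI API-format workflow (a dict of node id to `class_type` and `inputs`) with numeric string ids, and it assumes the stock ComfyUI node names `LoadImage`, `CLIPVisionEncode` and `unCLIPConditioning`. The function name `add_remix_images` and the fixed `strength` and `noise_augmentation` values are made up for illustration.

```python
import copy

MAX_REMIX_IMAGES = 6  # the post mentions 1-6 images per remix request


def add_remix_images(workflow, image_names, base_conditioning, clip_vision):
    """Return (new_workflow, final_conditioning_link).

    workflow          -- ComfyUI API-format graph; ids assumed to be numeric strings
    image_names       -- filenames the worker has already downloaded
    base_conditioning -- [node_id, output_index] link to the prompt conditioning
    clip_vision       -- [node_id, output_index] link to a CLIP vision loader
    """
    if not 1 <= len(image_names) <= MAX_REMIX_IMAGES:
        raise ValueError(f"expected 1-{MAX_REMIX_IMAGES} images, got {len(image_names)}")

    wf = copy.deepcopy(workflow)  # never mutate the template pipeline
    next_id = max((int(k) for k in wf), default=0) + 1
    conditioning = base_conditioning

    for name in image_names:
        # Three new nodes per image: load it, encode it, fold it into the conditioning.
        load_id, encode_id, cond_id = (str(next_id + i) for i in range(3))
        next_id += 3
        wf[load_id] = {"class_type": "LoadImage", "inputs": {"image": name}}
        wf[encode_id] = {
            "class_type": "CLIPVisionEncode",
            "inputs": {"clip_vision": clip_vision, "image": [load_id, 0]},
        }
        wf[cond_id] = {
            "class_type": "unCLIPConditioning",
            "inputs": {
                "conditioning": conditioning,  # chain onto the previous image
                "clip_vision_output": [encode_id, 0],
                "strength": 1.0,            # illustrative value
                "noise_augmentation": 0.0,  # illustrative value
            },
        }
        conditioning = [cond_id, 0]

    return wf, conditioning
```

The returned link is what the sampler's positive conditioning input would be pointed at. Each image's conditioning node takes the previous one as input, so the graph grows by three nodes per image regardless of how many are sent.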

So after 2 weeks of developing and testing, we finally have this feature available. If your Horde front-end supports the “remix” feature, you can send between 1 and 6 images to this workflow along with a prompt, and it will try its best to “squash” them all together into one composition. Note that the more images you send, and the longer the prompt, the harder it will be for the model to “retain” all of them in the composition. But it will try its best.

As an example, here’s how the model remixes my own avatar. You’ll notice that the result can understand the general concepts of the image, but can’t follow it exactly as it’s not doing img2img. The blur is probably caused by the need to upscale my original image, which is something I’d like to fix on the next pass.

Likewise, this is the Haidra logo:

And finally, here’s a remix of both the logo and the avatar together:

Pretty neat, huh?

This ability to send extra source images also lays the groundwork for the Horde to support things like InstantID, which I hope I’ll be able to work on supporting soon enough.