Stable Cascade on the AI Horde!

A while ago Stability.ai released a new model on a different architecture, that seems to provide very promising results and very fast training: Stable Cascade. I really wished to offer it on the AI Horde so after getting explicit permission from Emad in Reddit PMs (due to its more restrictive license for APIs), I set out to implement it.

Unfortunately the Stable Cascade model and ComfyUI workflow require the use of two different checkpoints, which went against the AI Horde worker paradigm at the time, which expected one file per model, so I had to make multiple changes in a lot of packages which expected this paradigm. The Worker, hordelib, the model reference and its SDK, all of them required tweaking to avoid crashing.

Fortunately, while the changes were complicated, I managed to implement them without much debugging. I did initially run into some troubles with the image quality being garbage, which turned out required ComfyAnon tweaking the implementation on ComfyUI a bit, but once that was done, everything fell in place and now you can use the AI Horde to request Stable Cascade images and therefore check the capability of this model, even if you don’t have 20G VRAM to spare.

You can try it out on Artbot

Alongside Stable Cascade, I thought it’s high time we start expanding our SDXL model selection, so the following models have also been onboarded.

  • Juggernaut XL
  • Anime Illust Diffusion XL
  • Pony Diffusion XL
  • Animagine XL
  • DreamShaper XL (Lightning version)

We quickly realized that we also need to expand our model reference to better inform people of the requirements for some of these models. For example Pony Diffusion XL doesn’t work unless you set clip_skip to 2, and DreamShaper requires low steps, cfg and specific samplers. If you know to set those settings correctly, you’ll get amazing images, else you get hot garbage. Soon the horde will be warning you when trying to use a model outside its specifications.

Other than that, we haven’t been completely idle. Some other notable achievements in the previous weeks are:

Firstly, the AI Horde now supports an educator role for accounts. If you are an education institution and you want to use one of the AI Horde free tools for the classroom, you can request your account to be set as an educator, which will force all your requests to be SFW and increase your account’s concurrency.

I also spent some time improving the AI Generation of the Mastodon bot @dungeons, so that it gets nicer images for each campaign protagonist. Will admit I had a lot more fun than I should improving the versatility and variability of the generations and tweaking then results for each model. You can see (or follow) the results in the dedicated account replying with those images.

On the worker side, Tazlin has also been very busy improving the efficiency of our generations. We have added now some improvements such as downloading the loras for the next job, while performing the inference for the previous one, or adding more efficiency for those people with more powerful machines.

I’m now hard at work trying to onboard more Stable Cascade capabilities as they are added to ComfyUI and to add support for more advanced workflow capabilities.

The AI Horde now seamlessly provides all CivitAI Textual Inversions

Almost immediately after the AI Horde received LoRa support, people started clamoring for Textual Inversions which is one of the earliest techniques to fine-tune Stable Diffusion outputs.

While I was planning to re-use much of the code that had to do with the automatic downloading of LoRas, this quickly run into unexpected problems in the form of pickles.

Pickles!@\

In the language of python, pickles is effectively objects in memory stored into disk as they are. The problem with them is, that they’re terribly insecure. Anything stored into the pickle will be executed as soon as you load it back into RAM, which is a problem when you get a file from a stranger. There is a solution for that, safetensors, which ensure that only the data is loaded and nothing harmful.

However, while most LoRas are a recent development and were released as safetensors from the start, textual inversions (TIs) were developed much earlier, and most of them are still out there as pickles.

This caused a big problem for us, as we wanted to blanket support all TIs from CivitAI, but that opened the gate for someone to upload a malicious TI to CivitAI and then request it themselves through the horde and pwn all our workers in one stroke! Even though technically CivitAI scans uploaded pickles, automated scans are never perfect. All it would take is someone to discover one way to sneak by an exploit through these scans. The risk was way to high for our tastes.

But if I were to allow only safetensors, only a small minority of TIs would be available, making the feature worthless to develop. The most popular TIs were all still as pickles.

So that meant I had to find a way to automatically convert pickles into safetensors before the worker could use them. I couldn’t do it on the worker side, as the pickle has to be loaded first. We had to do it in a secure location of some sort. So I built a whole new microservice: the AI Hordeling.

All the Hordeling does is provide a REST API where a worker can send a CivitAI ID to download, and the Hordeling will check if it’s a safetensor or not, and if not, download it, convert it to safetensor and then give a download link to the safetensor to the worker.

This means that if someone were to get through the CivitAI scans, all they would be able to exploit is the Hordeling itself which is not connected to the AI Horde in any way and can be rebuilt from scratch very easily. Likewise, the worker ensure that they will only download safetensor files which ensure they can’t be exploited.

All this to say, it’s been, a lot more work than expected to set up Textual Inversions on the horde! But I did it!

So I’m excited to announce that All textual inversions on CivitAI are now available through the AI Horde!

The way to use them is the very similar to LoRa. You have to specify the ID or a unique part of its name so that the worker can find them, in the “tis” field. The tricky part is that TIs require that their filename in the prompt, and the location of the TI matters. This complicates matters because the filename is not easy to figure out by the user, especially because some model names have non-Latin characters which one can’t know how they will be changes when saving on disk.

So the way we handle it instead is that one needs to put the CivitAI model ID in the prompt, in the form of “embedding:12345”. If the strength needs to be modified, then it should be put as “(embedding:12345:0.5)”. On the side of the worker, they will always save the TIs using their model ID, which should allow ComfyUI to know what to use.

I also understand this can be quite a bother for both users and UX developers, so another option exists where you allow the AI Horde to inject the relevant strings into the prompt for you. You can specify in the “tis” key, that you want the prompt to be injected, where, and with how much strength.

This will in turn be injected in the start of the prompt, or at the end of the negative prompt with the corresponding strength (default to 1.0). Of course you might not always want that, in case you want to the TI to be places in a specific part of the prompt, but you at least have the option to do it quickly this way, if it’s not important. I expect UX designers will want to let the users handle it both ways as needed.

You are also limited to a maximum of 20 TIs per request, and there’s an extra kudos cost if you request any number of TIs.

So there you have it. Now the AI Horde provides access to thousands of Textual Inversions along with the thousands of LoRa we were providing until now.

Maximum customization power without even a need for a GPU, and we’re not even finished!

Hordelib repo is temporarily down due to DMCA

As I mentioned last time, we received a DMCA take-down for a couple of missing attributions on AGPL3 files. We have since added those back, but as is the case with DMCAs, once you start them, they don’t stop on their own.

We had already send a counter-notice, but unfortunately not fast enough. The Hordelib repo has now been hidden from the world. You can read the whole bogus DMCA requesting it here.

I remind that hlky had made no attempt to request those attributions informally, just went straight to DMCA, which tracks since this person is always acting in bad faith. It just in this case it’s a convenient way to attack the AI Horde project once more.

This is a minor inconvenience, as the hordelib is still available in pypi, but annoying nonetheless.

Hopefully we’ll be back soon enough, and until then we’ll just re-host elsewhere for a while.

The AI Horde Worker Moves to a Completely New Inference Backend

Close to a month an a half ago, our last remaining maintainer for the nataili library dropped out and we were left functional but “rudderless” as far as inference goes. We could continue operations, but we couldn’t onboard new features anymore as neither me nor any of the remaining regulars have ML knowledge.

In desperation, I asked one of our regulars, Jug, who had been helping out with some python work on the worker if he thinks it would be possible to switch to the ComfyUI software as a backend, as it had some good ideas and was modular enough to be of use to us

To my surprise Jug not only thought it was a good idea, but jumped with both legs in the deep and started hacking around to make it work. Not only that, but we managed to suck-in another regular developer in, Tazlin, who also started helping us with design best practices. As a result, the new library we started developing was built from the ground up to have extensive coverage support which will make us discover regression bugs that much easier.

First steps were to develop feature parity, and that required not only to wrangle the comfyUI pipelines to be called from nataili, but also to port features which we were using in the AI Horde Worker, such as clip, over to the comfyUI.

This early phase was were I could still provide some help, as I’m pretty good at porting features and writing tests for them, and then integrating stuff into the AI Horde Worker, but still the lion’s share of the work on hordelib was being done by Jug, with Tazlin making the code much more reliable and maintainable.

A couple of weeks in, we had almost all the features we needed, but this is where the tricky business started. First we noticed is that comfyUI was not handling multi-threading well, which make sense as it’s meant to be used by a single user on a single PC. That added massive amounts of instability, because our AI Horde Worker is using threads for everything, to nullify latency delays.

So the next phase for about two more weeks was stabilizing the thing, which required a much deeper dig into the comfyUI internals to wrangle individual processes into a multi-threaded paradigm.

Finally that was done, about 1 month after I inquired about moving to comfy. Then we discovered the next problem: Due to all the mutex locks to prevent multi-threaded instability, the whole things was now much slower than nataili was. Like significantly so!

So another two weeks were spend of figuring out where to slowdowns occurred in our implementation and tweaking things to work more optimally, and even trying to figure out if there was indeed a slowdown in the first place as comparisons with the nataili was difficult to achieve.

We even built a whole benchmark suite to see overall speeds in inference, without getting confused with HTML and model loading latency.

But beta testers were still informing us of a seemingly lower kudos reward, so then we suspected the old way of calculating kudos was not applying well to the hordelib inference, due to it working differently. For example it has no slowdown for weights, but control-net types gave out different speeds than we expected, even different speeds per control type.

To track this down, Jug trained a new Neural Network for figuring how much time a generation is expected to take, rather than try to time each individual feature. The new model was so successful at 96% accuracy, that we decided to onboard it onto the AI Horde itself, as a way to calculate kudos more accurately.

This investigation did point us to some things that worked unexpectedly within comfyUI, for example longer prompts than 77 tokens tended to be quite slower, which was a quality thing after speaking with the comfyUI devs. We did discover a workaround for the AI Horde but it’s these sort of things that are introducing unexpected slowdowns compared to before. We’re going to continue looking for and tweaking things as we discover them.

The good news is that the overall quality of images using the comfyUI branch has increased across the board. Not only that but weights not only don’t add extra slowdown (so the extra kudos cost is removed), but they can also exceed 1.3 without causing the image to distort, which is how most other UIs are using them anyway.

The big change is that images with the same payload and the same seed, will look different in comfyUI compared to nataili. This is simply due to the way inference works and something we’ll have to live with.

1.0.0

So now we have the three pillars built: Parity, Stability and Speed; it’s time to go live!

The hordelib has been bumped to 1.0.0 and the AI Horde Worker to 21.0.0. When you run update-runtime next time, you’ll automatically be switched to the new inference backend but you may need to update your bridgeData.yaml file ahead of time.

Very shortly

  1. Set the vram_to_leave_free and ram_to_leave_free to values that work for you.
  2. rename nataili_cache_home to cache_home
  3. You can delete any unused keys (like disable_voodoo)

Also as a user of the AI Horde, keep in mind that the new Workers do not yet support tiling and pix2pix

But not only if the new inference available for the AI Horde, but also for everyone else. Due to the generic way we’ve built it, any python project which needs access to image generation can now import hordelib from pypi, and get access to all the multi-threaded text2img and img2img functionality we provide!

What’s next

With the move to hordelib, we are now effectively outsourcing our inference development upstream, which allow us to get to use new developments in stable diffusion as they get on-boarded into their software. Hopefully development of ComfyUI will continue for the foreseeable future as I am really not looking forward to changing libraries again any time soon >_<

This also means that we now finally have the capability to onboard LoRas and textual inversion as well which have been requested for a long time, but we never had the capability in our backend. Likewise with new Stable Diffusion models and all the exciting new developments happening practically weekly.

It’s been a lot of hard work, but we’re coming out of it stronger than ever, thanks to the invaluable help of Jug, Tazlin and the rest of the AI Horde community!