Stable Cascade on the AI Horde!

A while ago Stability.ai released a new model built on a different architecture, one that seems to provide very promising results and much faster training: Stable Cascade. I really wanted to offer it on the AI Horde, so after getting explicit permission from Emad via Reddit PMs (due to the model's more restrictive license for APIs), I set out to implement it.

Unfortunately, the Stable Cascade model and its ComfyUI workflow require two different checkpoints, which went against the AI Horde worker paradigm at the time of one file per model. I had to make changes across a lot of packages which expected that paradigm: the Worker, hordelib, the model reference and its SDK all required tweaking to avoid crashing.

Fortunately, while the changes were complicated, I managed to implement them without much debugging. I did initially run into some trouble with the image quality being garbage, which turned out to require ComfyAnon tweaking the implementation on ComfyUI a bit, but once that was done everything fell into place. You can now use the AI Horde to request Stable Cascade images and check out the capabilities of this model, even if you don’t have 20GB of VRAM to spare.

You can try it out on Artbot

Alongside Stable Cascade, I thought it was high time we started expanding our SDXL model selection, so the following models have also been onboarded:

  • Juggernaut XL
  • Anime Illust Diffusion XL
  • Pony Diffusion XL
  • Animagine XL
  • DreamShaper XL (Lightning version)

We quickly realized we also needed to expand our model reference to better inform people of the requirements of some of these models. For example, Pony Diffusion XL doesn’t work unless you set clip_skip to 2, and DreamShaper XL Lightning requires low steps, low cfg, and specific samplers. If you set those correctly, you’ll get amazing images; otherwise you get hot garbage. Soon the Horde will warn you when you try to use a model outside its specifications.
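To illustrate, here’s a hedged sketch of requesting Pony Diffusion XL through the AI Horde REST API with the settings it needs. The field names follow the /v2/generate/async schema; the exact step/cfg/sampler values are illustrative, so consult each model’s card for its sweet spot.

```python
import requests

# Illustrative payload: Pony Diffusion XL needs clip_skip 2 to produce
# usable images; steps/cfg/sampler here are reasonable guesses, not
# canonical values.
payload = {
    "prompt": "score_9, a castle on a hill at sunset",
    "models": ["Pony Diffusion XL"],
    "params": {
        "clip_skip": 2,
        "steps": 30,
        "cfg_scale": 7.0,
        "sampler_name": "k_euler_a",
        "width": 1024,
        "height": 1024,
    },
}
response = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json=payload,
    headers={"apikey": "0000000000"},  # the anonymous API key
)
print(response.json())  # returns an id to poll on /v2/generate/status/{id}
```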

Other than that, we haven’t been completely idle. Some other notable achievements in the previous weeks are:

Firstly, the AI Horde now supports an educator role for accounts. If you are an educational institution and want to use one of the free AI Horde tools in the classroom, you can request that your account be set as an educator, which will force all your requests to be SFW and increase your account’s concurrency.

I also spent some time improving the AI generation of the Mastodon bot @dungeons, so that it gets nicer images for each campaign protagonist. I’ll admit I had a lot more fun than I should have improving the versatility and variability of the generations and tweaking the results for each model. You can see (or follow) the results in the dedicated account replying with those images.

On the worker side, Tazlin has also been very busy improving the efficiency of our generations. We have now added improvements such as downloading the LoRas for the next job while still running inference for the previous one, along with extra throughput options for people with more powerful machines.

I’m now hard at work trying to onboard more Stable Cascade capabilities as they are added to ComfyUI and to add support for more advanced workflow capabilities.

The AI Horde now seamlessly provides all CivitAI Textual Inversions

Almost immediately after the AI Horde received LoRa support, people started clamoring for Textual Inversions, one of the earliest techniques for fine-tuning Stable Diffusion outputs.

While I was planning to reuse much of the code that handled the automatic downloading of LoRas, this plan quickly ran into an unexpected problem in the form of pickles.

Pickles!

In Python, pickles are effectively in-memory objects stored to disk as-is. The problem with them is that they’re terribly insecure: code embedded in a pickle is executed as soon as you load it back into RAM, which is a problem when the file comes from a stranger. There is a solution for that, safetensors, a format which ensures that only the data is loaded and nothing harmful.
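To see why, here is a minimal sketch of a malicious pickle. The `__reduce__` hook lets an object name a callable to invoke on load, so merely unpickling the file runs attacker-chosen code:

```python
import pickle

class Malicious:
    # pickle calls __reduce__ to learn how to reconstruct the object;
    # returning (os.system, args) means "run this command on load".
    def __reduce__(self):
        import os
        return (os.system, ("echo you have been pwned",))

data = pickle.dumps(Malicious())
pickle.loads(data)  # the shell command runs just by loading the bytes
```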

However, while most LoRas are a recent development and were released as safetensors from the start, textual inversions (TIs) were developed much earlier, and most of them are still out there as pickles.

This caused a big problem for us: we wanted blanket support for all TIs on CivitAI, but that opened the gate for someone to upload a malicious TI to CivitAI and then request it themselves through the Horde, pwning all our workers in one stroke! CivitAI does technically scan uploaded pickles, but automated scans are never perfect; all it would take is someone discovering one way to sneak an exploit past them. The risk was way too high for our tastes.

But if I were to allow only safetensors, only a small minority of TIs would be available, making the feature worthless to develop. The most popular TIs were all still pickles.

So I had to find a way to automatically convert pickles into safetensors before the worker could use them. I couldn’t do it on the worker side, as the pickle has to be loaded first; it had to happen in a secure location of some sort. So I built a whole new microservice: the AI Hordeling.

All the Hordeling does is provide a REST API where a worker can send a CivitAI ID; the Hordeling checks whether the file is already a safetensor and, if not, downloads it, converts it, and hands the worker a download link to the safetensor version.
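For illustration, here is a minimal sketch of what the conversion step can look like, assuming the TI is a plain torch pickle of tensors. This is not the Hordeling’s actual code, and note that `torch.load` itself unpickles, which is exactly why this must only ever run on the sandboxed Hordeling box:

```python
import torch
from safetensors.torch import save_file

def pickle_to_safetensors(src: str, dst: str) -> None:
    # Unpickles the file -- only safe inside the disposable sandbox.
    state = torch.load(src, map_location="cpu")
    # TI embeddings often nest tensors under keys like "string_to_param",
    # so keep top-level tensors and flatten one level of dicts.
    tensors = {}
    for key, value in state.items():
        if isinstance(value, torch.Tensor):
            tensors[key] = value.contiguous()
        elif isinstance(value, dict):
            for sub, v in value.items():
                if isinstance(v, torch.Tensor):
                    tensors[f"{key}.{sub}"] = v.contiguous()
    save_file(tensors, dst)  # data-only format, nothing executable
```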

This means that if someone were to get through the CivitAI scans, all they could exploit is the Hordeling itself, which is not connected to the AI Horde in any way and can be rebuilt from scratch very easily. Likewise, the workers ensure they only ever download safetensor files, which means they can’t be exploited this way.

All this to say, it’s been a lot more work than expected to set up Textual Inversions on the Horde! But I did it!

So I’m excited to announce that all Textual Inversions on CivitAI are now available through the AI Horde!

The way to use them is very similar to LoRas: you specify the ID, or a unique part of the name, in the “tis” field so that the worker can find them. The tricky part is that a TI requires its filename to appear in the prompt, and the location of that reference matters. This complicates things because the filename is not easy for the user to figure out, especially since some model names have non-Latin characters which may be altered unpredictably when saved to disk.

So instead, the way we handle it is that one puts the CivitAI model ID in the prompt, in the form “embedding:12345”. If the strength needs to be modified, it should be written as “(embedding:12345:0.5)”. On their side, the workers always save TIs under their model ID, which lets ComfyUI know what to use.

I also understand this can be quite a bother for both users and UX developers, so another option exists where you allow the AI Horde to inject the relevant strings into the prompt for you. In the “tis” key you can specify that you want the prompt injected, where, and with how much strength.

The string will then be injected at the start of the prompt, or at the end of the negative prompt, with the corresponding strength (defaulting to 1.0). Of course you might not always want that, in case you want the TI placed in a specific part of the prompt, but you at least have the option to do it quickly this way when placement isn’t important. I expect UX designers will want to let users handle it both ways as needed.
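Here’s a sketch of both approaches as request payloads (the TI ID 12345 and the field values are illustrative):

```python
# Manual placement: you control exactly where the TI lands in the prompt.
payload_manual = {
    "prompt": "a portrait, (embedding:12345:0.5)",  # 12345 = CivitAI model ID
    "tis": [{"name": "12345"}],  # tells the worker which TI to fetch
}

# Injection: the AI Horde adds the string for you.
payload_injected = {
    "prompt": "a portrait",
    "tis": [
        {
            "name": "12345",
            "inject_ti": "negprompt",  # or "prompt" for the positive prompt
            "strength": 0.7,           # defaults to 1.0 if omitted
        }
    ],
}
```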

You are also limited to a maximum of 20 TIs per request, and there’s an extra kudos cost if you request any number of TIs.

So there you have it. Now the AI Horde provides access to thousands of Textual Inversions along with the thousands of LoRa we were providing until now.

Maximum customization power without even a need for a GPU, and we’re not even finished!

AI Horde’s AGPL3 hordelib receives DMCA take-down from hlky

I have tried to avoid writing about the hlky drama for the sake of the AI Horde ecosystem. I don’t want to dwell on negative situations, and I was hoping that by ignoring this person our community could focus on constructively improving Open Source Generative AI tools.

However, recent developments have forced my hand, and I feel I need to write this and inform the larger community. I will attempt to stick to the facts.

The AI Horde Worker includes a customized library: hordelib. This library is completely based on ComfyUI.

Yesterday we were forwarded two DMCA take-down requests from GitHub, originating from hlky, asking to take down hordelib over claims against a couple of files I had ported from the previous library I co-authored with hlky: nataili.

Nataili was developed as AGPL3 from the start. This is the main reason I chose it as the backend for the AI Horde Worker instead of a bigger player like the Automatic1111 WebUI (which, back then, did not have a license at all).

Unfortunately, a big reason we abandoned nataili is that hlky attempted to sabotage the AI Horde ecosystem and demanded that we stop using the nataili free software library, going against everything the Open Source movement stands for. There is more drama behind the scenes, but as I said, I want to stick to the public facts in this post.

Nevertheless, we eventually couldn’t maintain nataili, so we decided to create hordelib instead, which would also insulate us from hlky. However, some critical components we needed for our image alchemy and anti-CSAM capabilities were not available natively in ComfyUI, so I ported the necessary files over from nataili. Remember, these files are licensed under the AGPL3, so this is completely and irrevocably allowed.

In the process I stripped the explicit license mention from those files, because our whole repository is licensed under AGPL3 and it goes against our style to add redundant license headers to each file. As far as I understood, this was allowed by the license terms.

The DMCA take-down claims that removing those copyright and license strings from those files is a sufficient reason to request the whole repository to be taken down!

I have since attempted to get some clarity on this issue on my own. The only relevant part of the license I can find is this:

Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:

[…]
b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or

AGPL3 License

And, fair enough, this seems clear, but I need to point out that the original license headers hlky put in those files did not require preservation of author attributions!

Nevertheless, in the interest of expediency and in the spirit of open source, I have since re-added the attributions to those files.

Unfortunately, once official DMCA notices are sent, things start becoming serious, and you never know which way the dice will roll. I feel we have a pretty clear-cut case: we did nothing wrong here, and certainly nothing that would warrant taking down a whole FOSS library!

I have sent a counterclaim to GitHub in an attempt to ensure they don’t take any take-down steps.

However, given the numerous bad faith acts by hlky to this day, the most prudent option would be to excise these files completely. I would rather not have any mention or contribution of this person in our library, as they go against everything the Free Software movement stands for!

If you have the skills to contribute alternative code for the CLIP and BLIP interrogation modules, please contact me ASAP!

Likewise if you have any advice you can give on this issue I’d appreciate it.

The AI Horde Worker Moves to a Completely New Inference Backend

Close to a month and a half ago, our last remaining maintainer for the nataili library dropped out, and we were left functional but “rudderless” as far as inference goes. We could continue operations, but we couldn’t onboard new features anymore, as neither I nor any of the remaining regulars have ML knowledge.

In desperation, I asked one of our regulars, Jug, who had been helping with some Python work on the worker, whether he thought it would be possible to switch to ComfyUI as a backend, as it had some good ideas and seemed modular enough to be of use to us.

To my surprise, Jug not only thought it was a good idea, but jumped into the deep end with both feet and started hacking away to make it work. Not only that, but we sucked in another regular developer, Tazlin, who started helping us with design best practices. As a result, the new library was built from the ground up with extensive test coverage, which makes regression bugs that much easier to discover.

The first step was to reach feature parity, which required not only wrangling the ComfyUI pipelines to be callable the way nataili had been, but also porting features we were using in the AI Horde Worker, such as CLIP, over to ComfyUI.

This early phase was where I could still provide some help, as I’m pretty good at porting features, writing tests for them, and integrating things into the AI Horde Worker. Still, the lion’s share of the work on hordelib was done by Jug, with Tazlin making the code much more reliable and maintainable.

A couple of weeks in, we had almost all the features we needed, but this is where the tricky business started. The first thing we noticed was that ComfyUI does not handle multi-threading well, which makes sense, as it’s meant to be used by a single user on a single PC. That added massive amounts of instability, because the AI Horde Worker uses threads for everything to hide latency.

So the next phase, lasting about two more weeks, was stabilizing the thing, which required a much deeper dig into the ComfyUI internals to wrangle individual processes into a multi-threaded paradigm.
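To give a flavor of the fix (this is illustrative, not hordelib’s actual code), the shape of it is a mutex serializing access to the single-user ComfyUI internals while the rest of the worker stays threaded:

```python
import threading

_comfy_mutex = threading.Lock()

def run_inference(pipeline_fn, payload):
    # Downloads, uploads and job polling remain fully threaded; only
    # the ComfyUI execution itself is forced single-file through here.
    with _comfy_mutex:
        return pipeline_fn(payload)
```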

Finally that was done, about a month after I first asked about moving to Comfy. Then we discovered the next problem: due to all the mutex locks preventing multi-threaded instability, the whole thing was now much slower than nataili. Like, significantly so!

So another two weeks were spent figuring out where the slowdowns occurred in our implementation and tweaking things to work more optimally, and even working out whether there was a slowdown in the first place, as fair comparisons with nataili were difficult to achieve.

We even built a whole benchmark suite to measure raw inference speed, without being confused by HTTP and model-loading latency.

But beta testers were still reporting seemingly lower kudos rewards, so we began to suspect the old way of calculating kudos didn’t apply well to hordelib inference, since it works differently. For example, it has no slowdown for prompt weights, but ControlNet gave different speeds than we expected, even different speeds per control type.

To track this down, Jug trained a new neural network to predict how much time a generation should take, rather than trying to time each individual feature. The new model was so successful, at 96% accuracy, that we decided to onboard it onto the AI Horde itself as a way to calculate kudos more accurately.
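For the curious, the idea looks something like this sketch (not Jug’s actual model): a small regression network mapping payload features to a predicted generation time, trained on real job timings:

```python
import torch
import torch.nn as nn

class KudosModel(nn.Module):
    """Toy regressor: payload features in, predicted seconds out."""
    def __init__(self, n_features: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Features might encode resolution, steps, sampler, control type, etc.
model = KudosModel()
predicted_seconds = model(torch.rand(1, 10))
```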

This investigation did point us to some things that work unexpectedly within ComfyUI. For example, prompts longer than 77 tokens tended to be quite a bit slower, which, after speaking with the ComfyUI devs, turned out to be a quality trade-off. We discovered a workaround for the AI Horde, but it’s these sorts of things that introduce unexpected slowdowns compared to before. We’ll keep looking for and tweaking such things as we discover them.

The good news is that the overall quality of images from the ComfyUI branch has increased across the board. Not only do prompt weights no longer add extra slowdown (so their extra kudos cost is removed), they can also exceed 1.3 without distorting the image, which is how most other UIs use them anyway.

The big change is that images with the same payload and the same seed will look different in ComfyUI compared to nataili. This is simply due to the way its inference works, and something we’ll have to live with.

1.0.0

So now we have the three pillars built: Parity, Stability, and Speed. It’s time to go live!

hordelib has been bumped to 1.0.0 and the AI Horde Worker to 21.0.0. The next time you run update-runtime you’ll automatically be switched to the new inference backend, but you may need to update your bridgeData.yaml file ahead of time.

In short, you will need to do the following (a sketch of the resulting file follows the list):

  1. Set vram_to_leave_free and ram_to_leave_free to values that work for you.
  2. Rename nataili_cache_home to cache_home.
  3. Delete any unused keys (like disable_voodoo).
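Here’s a minimal sketch of what the affected part of bridgeData.yaml might look like afterwards (the values are placeholders; keep whatever suits your hardware):

```yaml
vram_to_leave_free: "80%"   # leave this much VRAM for other apps
ram_to_leave_free: "80%"    # same idea, for system RAM
cache_home: "./models"      # renamed from nataili_cache_home
# disable_voodoo: removed entirely -- no longer used
```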

Also, as a user of the AI Horde, keep in mind that the new workers do not yet support tiling or pix2pix.

But the new inference is not only available to the AI Horde; it’s available to everyone else too. Due to the generic way we’ve built it, any Python project which needs image generation can now import hordelib from PyPI and get access to all the multi-threaded text2img and img2img functionality we provide!
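As a hedged sketch (the entrypoint names follow the hordelib README of the time; double-check the published docs for the current API), usage from another project looks roughly like this:

```python
import hordelib
hordelib.initialise()  # sets up the embedded ComfyUI

from hordelib.horde import HordeLib

generator = HordeLib()
payload = {
    "prompt": "an ancient llama monster",
    "sampler_name": "k_dpmpp_2m",
    "cfg_scale": 7.5,
    "steps": 30,
    "width": 512,
    "height": 512,
    "seed": "12345",
    "model": "stable_diffusion",
}
pil_image = generator.basic_inference(payload)  # returns a PIL image
pil_image.save("result.png")
```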

What’s next

With the move to hordelib, we are now effectively outsourcing our inference development upstream, which allows us to use new developments in Stable Diffusion as they get onboarded into ComfyUI. Hopefully its development will continue for the foreseeable future, as I am really not looking forward to changing libraries again any time soon >_<

This also means we finally have the capability to onboard LoRas and Textual Inversions, which have been requested for a long time but were never possible with our old backend. Likewise with new Stable Diffusion models and all the exciting new developments happening practically weekly.

It’s been a lot of hard work, but we’re coming out of it stronger than ever, thanks to the invaluable help of Jug, Tazlin and the rest of the AI Horde community!

State of the AI Horde – 26/03/2023

Things are progressing very rapidly in this dawn of AI, and likewise for the AI Horde. I thought it would be a good idea to post about all the things that have changed and improved in recent days for our service.

More Requests. More statistics.

I’ve deployed endpoints to measure the usage of the AI Horde. Now that one month has passed, we can take a look:

  • Per day, we are averaging 356,378 images (3.7 terapixelsteps) and 45,248 texts (4 megatokens)
  • In the past month, we produced 11,475,183 images, generating a staggering 127.6 terapixelsteps. Text has also picked up significant speed since merging the hordes with 1,241,895 generated texts for a total of 112.8 megatokens!

Top 10 Stable Diffusion models

The AI Horde offers close to 200 models at the same time. Our statistics allow us to see how the popularity of the various models changes day to day and month to month. Below are just the top 10 models in use:

  • Deliberate 22.2% (2550591)
  • stable_diffusion 15.1% (1730426)
  • Anything Diffusion 11.0% (1257688)
  • Hentai Diffusion 4.1% (468473)
  • Realistic Vision 3.0% (338742)
  • Counterfeit 2.7% (310337)
  • URPM 2.6% (297853)
  • Project Unreal Engine 5 2.5% (289006)
  • waifu_diffusion 1.8% (211572)
  • Abyss OrangeMix 1.8% (205268)

For the longest time SD 1.5 (stable_diffusion above) was king, but in the past month Deliberate has confidently taken the lead, with a staggering 22% of all image requests passing through the AI Horde! This speaks very highly of the model’s popularity.

Top 10 Text models

Almost as many text models exist on the AI Horde, but they’re more varied. Last month saw two big milestones: the release of the Pygmalion models for chat-like generation, which came after the gimping of the Character AI models, and the release of the new LLaMA model, bringing unparalleled miniaturization of model size and allowing far more coherence on consumer GPUs.

  1. PygmalionAI/pygmalion-6b 52.4% (651566)
  2. KoboldAI/OPT-13B-Erebus 14.0% (174393)
  3. KoboldAI/OPT-6.7B-Erebus 6.7% (83249)
  4. KoboldAI/OPT-6.7B-Nerybus-Mix 3.8% (46747)
  5. KoboldAI/OPT-13B-Nerybus-Mix 2.8% (35110)
  6. KoboldAI/OPT-13B-Nerys-v2 2.7% (33667)
  7. Facebook/LLaMA-13b 1.9% (23367)
  8. KoboldAI/OPT-6B-nerys-v2 1.9% (23232)
  9. OPT-6.7B-Nerybus-Mix 1.6% (19268)
  10. KoboldAI/OPT-2.7B-Erebus 1.0% (12464)

We can see Pygmalion has immediately dominated text generation, with Mr.Seeker’s storytelling models mopping up the rest, but the LLaMA ascendancy is just beginning!

Ratings, botting and counter-measures

A few months ago we started collecting ratings for the LAION non-profit, to help improve the models existing in the commons; the success of Midjourney has a lot to do with them training their models on the best images their previous generations created.

The initial design was very simple, to let integrators onboard it fast, and gave good kudos rewards to those helping us. Unfortunately, people almost immediately started abusing this by creating bots to rate randomly, poisoning the accuracy of our collection.

I always knew this was a possibility, but I was hoping I wouldn’t be forced to add countermeasures quite so soon. So I spent quite a few days adding a captcha mechanism (among other things) to block at least the low-hanging fruit.

It immediately led to a drop in ratings per day, which shows just how much damage botted ratings were doing.

New Features

We are fortunate enough to have gathered some great collaborators for the inference aspect of the AI Horde, so I wanted to give them a big shout-out:

  • ResidentChief has stepped up strongly to help add new features and squash bugs in the nataili library. As a result, the AI Horde now supports inpainting on many more models, a lot more post-processors (such as more upscalers and background removers), ControlNet improvements, and much more too numerous to mention. They’re a beast!
  • Jug has been working on improving the AI Horde Worker practically non-stop, giving us great terminal control and improving the WebUI, plus a lot of bugfixes and improvements in the bridge part of things.
  • Tazlin, who’s been doing a great deal of tech support in the channels, as well as helping me detect and investigate malicious ratings. And also sending some code improvements as well!
  • Aes Sedai, who’s been putting a ton of work into improving the moderation capabilities of the AI Horde with a custom frontend.

And of course all the frontend integrators, like rockbandit, aqualxx, sgt.chaos and concedo, who’ve been keeping the frontends up to date with a lot of features, smartly using the capabilities of the AI Horde in ways even I had not expected!

CI/CD and pypi

I finally got around to adding CI/CD pipelines for the AI Horde Worker and nataili. Now they will be automatically versioned when the right tag is applied to a PR. The nataili package has also been republished to PyPI and will automatically receive new versions whenever we publish a new release on GitHub.

The pipelines also automatically publish a notification on Discord, so people can be aware when something new is up.

Alchemists

Using the new post-processing improvements from ResidentChief, I’ve expanded the interrogation worker so that it can now perform post-processing on images as well as img2text operations. The previous name no longer fit, so I’ve renamed it to “Alchemist”, to signify its capability to convert images into something else.

Likewise, the official name for the image worker is now “Dreamer” and for the text worker “Scribe”. Why not 🙂

Final Word

The pace of progress in this space is mind-blowing. I can’t wait to see what we achieve together in the coming days!

The AI Horde Worker has a control UI

Another great update has landed now that the AI Horde Worker is in its own repository: a web UI built with Gradio! All kudos to ResidentChief, who has been doing amazing work for the Horde lately!

The new WebUI is completely optional, and it can run alongside either the Stable Horde Worker or the Interrogation Worker, allowing you to tweak their settings on the fly through a very simple interface. This should make it significantly easier for people to adjust their settings.

It still needs some work (I would like some information popups for each feature), but this should work for now. Soon we’ll add things like worker control (maintenance on/off etc) as well as user information and stats.

To run the new WebUI, simply call bridge-webui.cmd (or bridge-webui.sh).

This should also nicely allow someone to update their bridge settings while sitting on the couch, AFK. Maybe we can add things like bridge control, allowing you to start and stop the worker through it. Many exciting possibilities!