Another great update has landed now that the AI Horde Worker is in its own repository: a Web UI built with Gradio! All kudos to ResidentChief, who has been doing amazing work for the horde lately!
The new WebUI is completely optional and it can run alongside either the Stable Horde worker or the interrogation Worker, allowing you to tweak their settings on the fly through a very simple interface. This should make it significantly easier for people to adjust their settings.
It still needs some work (I would like some information popups for each feature), but this should work for now. Soon we’ll add things like worker control (maintenance on/off etc) as well as user information and stats.
To run the new WebUI, simply call bridge-webui.cmd/sh.
This should also nicely allow someone to update their bridge settings while sitting on their couch, AFK. Maybe we can add things like bridge control to it, allowing it to start/stop the worker. Many exciting possibilities!
The Nataili ML backend powering the workers of the Stable Horde has for a while now supported models which can perform image interrogation (AKA img2text) operations. For example captioning images or verifying whether they are displaying NSFW content or not. For almost as long, I’ve wanted to allow the AI Horde to facilitate the widespread use of those models, the same way we do for Stable Diffusion.
A primary reason for wanting this is that the requirements to run a worker on the horde are fairly heavy: you need at least a mid-range GPU in your PC, and most people just don’t have the capacity to provide that. Yes, there is always the option to run generations on free cloud services like Google Colaboratory, but that replaces cost with time and attention.
So I felt that being able to use models which are fairly low-powered and can run even on CPUs would provide a way for almost everyone to join the horde and start gaining kudos for themselves. The final push I needed to do this was discovering that there was a useful accessibility browser extension out there which had already ceased operations because they couldn’t find cheap compute. Providing cheap compute is effectively what the horde has been built to do!
I was planning to get this done 2 weeks ago, but unfortunately I got massively sick during the holidays so I couldn’t do much of anything. So I moved my vacation days to the new year and finally got cracking.
Unfortunately, while the implementation of those models is much simpler than stable diffusion, preparing the AI Horde to be able to serve these was not quite as straightforward. The problem being that until now I built the horde under two core assumptions:
The input is going to include a prompt of some sort on which to run inference
The prompt would always expect the same type of results. Whether that is image or text.
Image interrogations flip both of these on their head. The input has to be a simple image, with no prompt from the user (other than payload tweaks), and the end results can differ wildly from each other: for example, one is text, another a boolean, and yet another a dictionary.
To make things worse, I did not want to duplicate my worker code, something which required me to implement table polymorphism within SQLAlchemy, which is a tricky subject on its own. More importantly, it requires modifying existing tables, which meant I needed to set up a development instance of the stable horde so that I can actually test the changes before going live. That in turn meant a new server, new DB, new nodes etc. Happily I had most of it ready via my Ansible code, but I still needed to tweak things to run on a new domain etc.
All in all, designing, building and testing image interrogations took me the best part of a whole week.
So I am proud to announce that the new feature is now live on the stable horde!
As always, you have to look at the API documentation for each endpoint you want to use. But put simply, you send an image URL you want to interrogate and specify which interrogation forms you want to use, like so:
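A minimal sketch of such a request body in Python. The field names `source_image` and `forms`, the helper `build_interrogation_request` and the example URL are assumptions for illustration; the API documentation is the authority on the exact schema.

```python
import json

# The three interrogation forms named in this post.
KNOWN_FORMS = ("caption", "nsfw", "interrogation")


def build_interrogation_request(image_url, forms):
    """Build a JSON-serialisable body for an interrogation request.

    The keys "source_image" and "forms" are assumed names, not confirmed here.
    """
    unknown = [form for form in forms if form not in KNOWN_FORMS]
    if unknown:
        raise ValueError(f"Unsupported interrogation forms: {unknown}")
    return {
        "source_image": image_url,
        "forms": [{"name": form} for form in forms],
    }


if __name__ == "__main__":
    body = build_interrogation_request(
        "https://example.com/cat.png", ["caption", "nsfw"]
    )
    print(json.dumps(body, indent=2))
```

Validating the form names on the client side is a design choice of this sketch: it avoids a round trip to the horde for a request that would be rejected anyway.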
It otherwise works similarly to image generation from a client’s perspective, with the difference that you don’t need to use a check/ endpoint, and you can keep polling the interrogate/status/ endpoint directly. Once a form is completed, you will get a result from that form matching its type.
Currently we support three interrogation forms: caption, nsfw, and interrogation.
Caption: Returns a string describing the image
NSFW: Returns a true/false boolean depending on whether the image is displaying NSFW imagery or not.
Interrogation: Returns a dictionary of key words best describing the image, with an accompanying confidence score. This takes the most time of all the interrogations and is rewarded accordingly in kudos.
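Since each form returns a different type, a client has to branch on the form name. Below is a minimal sketch in Python, assuming the result types described above. The function name `describe_result` and the sample values are hypothetical, and the interrogation result is assumed to be a flat keyword-to-confidence dictionary; the real response may nest these values differently.

```python
def describe_result(form_name, result):
    """Turn one completed form's result into a human-readable line."""
    if form_name == "caption":
        # Caption: a plain string describing the image.
        return f"Caption: {result}"
    if form_name == "nsfw":
        # NSFW: a true/false flag.
        return "NSFW: yes" if result else "NSFW: no"
    if form_name == "interrogation":
        # Interrogation: assumed here to be a flat {keyword: confidence} dict.
        ranked = sorted(result.items(), key=lambda item: item[1], reverse=True)
        top = ", ".join(f"{word} ({score:.2f})" for word, score in ranked[:3])
        return f"Top keywords: {top}"
    raise ValueError(f"Unknown form: {form_name}")


if __name__ == "__main__":
    print(describe_result("caption", "a cat sitting on a couch"))
    print(describe_result("nsfw", False))
    print(describe_result("interrogation", {"cat": 0.91, "sofa": 0.72, "dog": 0.05}))
```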
As I mentioned before, the worker code had to be completely refactored. It now lives in a new repository, because the nataili repo will soon turn into a pip package I can install externally, and this is in preparation for that.
To start an interrogation worker, you use the same code, but you start with a different bridge script.
./horde-interrogation_bridge.cmd -n "The Deep Questioning" --max_threads=5 --queue_size=5
As the models used by the interrogation worker are much more lightweight, it actually benefits more from high threads and high queue_sizes, so feel free to crank those up so that it’s best utilizing your worker. A lot of the new code changes I did to the horde also allow your worker to pick up many forms at the same time, which will cut down on the poll requests to the horde, further reducing idle time.
However, be careful not to set these too high. Because you’re picking up the requests in advance, nobody else will work on them until your worker gets to them. If your queue is high and your threads are low (or slow), then you’ll notice your horde performance is not going to be great.
That said, the result of each form will be sent back as soon as it’s done, so as to save as much time as possible.
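To get a feel for this trade-off, here is a rough back-of-the-envelope model in Python. It assumes all threads are busy, the queue is full, every form takes the same time, and threads work through the queue in even batches, none of which a real worker will match exactly. The function name and the numbers are illustrative only.

```python
import math


def worst_case_wait(queue_size, max_threads, seconds_per_form):
    """Estimate how long the last pre-fetched form waits before a thread starts it.

    Simplified model: forms are processed in batches of `max_threads`,
    and every form takes `seconds_per_form`.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    # Batches that must finish before the last queued form gets a thread.
    batches_ahead = math.ceil(queue_size / max_threads)
    return batches_ahead * seconds_per_form


if __name__ == "__main__":
    # Balanced settings: the last queued form waits for one batch.
    print(worst_case_wait(queue_size=5, max_threads=5, seconds_per_form=2))
    # Large queue, few threads: the last form sits idle ten times longer.
    print(worst_case_wait(queue_size=20, max_threads=2, seconds_per_form=2))
```

Under this model the wait grows with the ratio of queue size to threads, which is why a large queue only pays off when there are enough threads to drain it.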
One more thing to note is that an Interrogation worker is different from the Stable Diffusion worker. As such you cannot use the same name! However they DO use the same bridgeData.py. If you plan to run both types of workers, utilize the command line arguments to tweak the bridge settings accordingly instead of having to change your bridgeData.py all the time.
In the future I plan to further tweak the bridge so that it can run in parallel with the Stable Diffusion worker to best utilize your spare processing power. I also want to tweak the model loading so that you can optionally offload the whole thing to CPU. But I need to test if the speeds for this make sense first.
Another cool possibility from this refactor is that it opens the doors for different worker types on the horde, which in turn gives me an opening I’ve been considering for a while now, which is the complete merge of the Stable and KoboldAI horde into one service. This will reduce the amount of code juggling I have to do, and hopefully simplify things for everyone with a common kudos system.
I am excited to see what use cases you all will come up with this new system!
For the past 2 weeks I’ve been trying to build the new feature of the horde which will allow even people with a low-powered GPU, or just a CPU, to join the horde and provide a service to gather kudos.
It is called “Interrogation” as it will “interrogate” source images to discover aspects about them, such as an image caption, or whether they are displaying NSFW content etc. This feature can then be on-boarded into new or existing tools, such as perhaps automatically captioning images for micro-blogging services as an accessibility feature, or a browser plugin for parental controls etc.
However, what I thought would be a bit tricky but doable soon keeps running into various snags. First, I lost one complete week of work from my vacation by getting the nastiest cold I’ve had for the past 10 years at least, which flattened me for almost 9 days. Then it was the holiday period, where I had to give more attention to family and friends.
Now that I’m finally able to concentrate more on it, I find it’s actually an order of magnitude more complex than I initially expected, requiring me to update my existing database tables (always a risky proposition) and also redesign my approach to use such things as “Polymorphic tables” so that I don’t end up duplicating hundreds of lines of code between similar classes.
And while I’m doing this, the horde ML backend has been receiving a surprisingly increased pace of improvements, recently implementing depth2img, adding diffusers to voodoo-ray, CodeFormers and a ton of urgent bug-fixes and other improvements.
To say my attention has been split is an understatement.
But I’m slowly but surely making more progress. I hope to have something out soon-ish. I do wonder if it will require a complete horde downtime this time for the DB upgrades. Never done that before. Kinda scary, not gonna lie…
Through the great work of @ResidentChiefNZ, the Stable Horde now supports depth2img, a new method of doing img2img which better understands the source image you provide. The results are downright amazing. This article I think explains it better than I could.
See below the transformation of one of my avatars into a clown, a zombie and an orangutan respectively.
To use depth2img you need to explicitly use the Stable Diffusion 2 Depth model. The rest will work the same as img2img.
Warning, depth2img does not support a mask! So if your client allows you to send one, it will just be ignored.
If you are running a Worker, you can simply update your bridge code, and you must run update-runtime as it uses quite a few new packages. Afterwards add the model to your list as usual.
We recently also enabled diffusers to be loaded into voodoo ray, so this will allow you to keep not only the depth2img model in RAM along with other models, but also the older inpainting model! Please direct all your kudos to @cogentdev for this! I am already running inpainting, depth2img, sd2.1 and 15 other 1.5 models on my 2070 with no issues!
If you have built your own integration with the Stable Horde, such as a client or bot, please update your tools to take depth2img into account. I would suggest adding a new tab for it, which forces Stable Diffusion 2 Depth to be used and prevents sending an image mask. This is to avoid confusion. It will also give you the opportunity to provide some more information about the differences between img2img and depth2img.
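One way a client could enforce this when building its request, sketched in Python. The payload keys `models` and `source_mask` and the helper name `to_depth2img_request` are assumptions for illustration, not the confirmed API schema; only the model name "Stable Diffusion 2 Depth" comes from this post.

```python
DEPTH_MODEL = "Stable Diffusion 2 Depth"


def to_depth2img_request(img2img_payload):
    """Return a copy of an img2img payload adjusted for depth2img.

    Forces the depth model and drops any mask, since depth2img ignores masks.
    """
    payload = dict(img2img_payload)  # shallow copy; the caller's dict is untouched
    payload["models"] = [DEPTH_MODEL]
    payload.pop("source_mask", None)  # assumed key name for the mask
    return payload


if __name__ == "__main__":
    request = to_depth2img_request(
        {"prompt": "a clown", "source_image": "...", "source_mask": "..."}
    )
    print(sorted(request.keys()))
```

Dropping the mask on the client side, rather than relying on the horde to ignore it, lets the interface tell the user up front that masks do not apply in this mode.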
Enjoy and please shower the people behind the new updates with Kudos where you see them!
Overall a very well researched article. I can’t find any issues with it. Personally I would liken the AI Horde technology to a mix between BitTorrent and Folding@Home, but the former has some negative connotations for many people.
Some things I could address from the article:
It’s not entirely clear whether every fork of Stable Diffusion should work, but you can try.
There are no “forks” of Stable Diffusion. There are checkpoints and multiple models, and the horde supports every .ckpt model and some diffusers models. I suspect the author confused Stable Diffusion the model with clients and frontends using it, like automatic1111.
There is a tiny bit of a catch: the kudos system. To prevent abuse of the system, the developer implemented a system where every request “costs” some amount of kudos. Kudos mean nothing except in terms of priority: each request subtracts kudos from your balance, putting you in “debt.” Those with the most debts get placed lowest in the queue. But if there are many clients contributing AI art, that really doesn’t matter, as even users with enormous kudos debts will see their requests fulfilled in seconds.
Indeed each request consumes kudos to fulfill, but you don’t actually go in debt. While we do record the historical amount of kudos you’ve consumed for statistics, your actual total as a registered user never goes below 0. This means as a registered user, you will always have more priority than an anonymous user (who typically remains at -50 kudos). Your kudos minimum also allows you to generate with slightly higher resolution and steps than an anonymous user.
Images won’t automatically download, but you can go to the Images tab and then manually download them.
That is totally dependent on the client. It works this way for Artbot, but Lucid Creations for example is a local application, so the images are saved with a button click. Other clients might save automatically.
Other than that, great article!
To be honest, I’ve actually been quite surprised that nobody has written about the SH until now. The SH went live in early September, soon after Stable Diffusion came out, and we’ve generated 13 million images so far (or approximately $50K of value), but none of the big AI and AI-art-focused news outlets has given it a single mention! Now, I am not one for conspiracy theories, but it sounds extraordinarily unlikely that absolutely nobody in the scene has noticed us until now or felt we are newsworthy, especially since many people have directly tweeted to some of the big AI and Stable Diffusion players about it.
Oh well, a PC magazine is the first to report on the Stable Horde. So be it! I wonder how many people will discover the Stable Horde from it.
We need people to test a PR for onboarding Stable Diffusion inpainting onto voodoo ray. Please pull this branch, load inpainting, and see how stable it is and whether the VRAM is unloaded correctly after each image.