New Discord Bot for the Stable Horde

For a few months now the Stable Horde has had its own Discord bot, developed by JamDon. One of the most important features I wanted from the bot (and the reason for its original creation) was the ability to gift kudos to people via emojis, which would serve as a way to promote good behavior and mutual aid.

In the process, the bot received more and more features, such as generating images through the Stable Horde or retrieving information about the linked horde account.

Unfortunately development eventually slowed, and about 2 months ago JamDon informed me that they no longer had time to continue development. Further complicating things was the fact that the bot was written in JavaScript, which I do not speak, making it impossible for me to continue its development on my own. So it languished unmaintained, as the horde got more and more features and other things started changing. It was the reason why I couldn’t make the “r2” payload parameter true by default, for example.

The final straw was when our own bot got IP banned by the horde. It was a public bot and had been added to a lot of servers which we do not control, and apparently people there attempted to generate unethical images, which the horde promptly blocked. Unfortunately that meant that the bot’s image generation stopped working everywhere every time this happened.

At the same time, another discord regular had not only developed their own discord bot based on the stable horde, but a whole JavaScript SDK! The bot was in fact very well developed and had most of the features of the previous stable horde bot, plus a lot of new stuff like image ratings. The only really important thing missing was the ability to gift kudos via emojis, which was the original reason to get a discord bot in the first place 🙂

Fortunately, with some convincing and plenty of kudos, zelda_fan agreed to onboard this functionality, as well as a few other small things that I wished for (like automated roles), and the Stable Horde Bot was reborn!

Unfortunately this did mean that all existing users were logged out and had to log in once more to be able to use the functionality, and its commands did change quite significantly, but those were fairly minor things.

Soon after the new bot was deployed, it was also added to the official LAION discord, so that their community could use it to rate images as well. I also checked and the bot has already been added to 365 different servers by now. Fortunately its demand is not that massive, since it’s not prepared to scale quite as well as the stable horde itself.

BTW, if you want to add the bot to your own discord server, you can do so by visiting this link. If you want to be able to transfer kudos, you’ll need to contact me so that I can onboard your emojis though. The other functionality should work without that.

Stable Horde receives stability.ai processing power!

A week ago I mentioned that we had begun a collaboration with LAION to provide them with ratings on images. The number of ratings we have received since then has blown away all our expectations! In just a week, you’ve all rated close to 130,000 individual images! As a comparison, the LAION-aesthetics v2, which was instrumental for training Stable Diffusion v1.x, used less than 600K rated images. We’ve reached more than a fifth of that amount in a week!

Needless to say, these numbers turned some heads towards the power of mutual aid provided by the stable horde, and some gears were set in motion.

LAION spoke with stability.ai directly and made the case that supporting the health of the stable horde itself would likewise benefit them. Since stability.ai is set to be the most direct beneficiary of a better-trained LAION-aesthetics v3, it makes perfect sense.

I was not privy to the discussions that happened, but I was happy to learn that Tom, the CTO of stability.ai, arranged to provide us with some sponsored resources in the form of 4 VMs with Nvidia RTX 4000 GPUs!

Quite surprisingly I had to deploy the VMs myself, so I crafted the most optimal setup I could for taking advantage of those 8GB of VRAM, drawing on my experience with my own RTX 2070. Each VM has been loaded with standard stable_diffusion 1.5 and 2.1, plus 8-10 other finetuned models to help cover the versatility provided by the Stable Horde. Granted, we are serving close to 100 different models currently, but the fact that those workers will remain running consistently 24/7 should help provide cover and allow other workers to switch to less supported models as well.

I hope this is the start of a fruitful collaboration between stability.ai and the Stable Horde. The way I see it, the current scenario is a win-win for everyone. We get a more consistent service, which allows more people to use it and makes them more likely to rate images to give back. Those ratings are then fed back to LAION and, by extension, stability.ai.

The Stable Horde has its first chrome extension!

About a week ago I deployed image interrogation to the stable horde, allowing low-powered GPUs and high-powered CPUs to also become productive contributors on the horde and generate kudos for their owners.

A few days ago, the extension I talked about was finally released once more, relying on the Stable Horde this time: GenAlt

GenAlt is an extension that allows visually impaired people to generate alt-text for any image they encounter on the internet, giving them freer access to an area they were previously excluded from. The extension’s description goes into more detail about its stated purpose, so I urge you to share it so that people who need it can find it.

The first release of the extension was set up to automatically pick up every image displayed in the webpage and send it over to the horde for captioning. That meant that a simple scroll through twitter would lead to hundreds of images being sent to the horde for captioning per person!

That in turn led to the stable horde ending up with 2000-4000 images to interrogate in its queue. Even with my own worker handling 20 threads at a time, it was just impossible to clear them all, which effectively meant the interrogation service became unusable. To top it off, as the stable horde started deleting expired interrogations, the extension received 404 responses, but unfortunately didn’t take that as a sign to abort polling for them.

At one point we had almost maxed out our available connections to each stable horde backend. But fortunately we kept chugging without much impact. It was one hell of a stress test though!

So I asked the developer to switch it to be triggered with a button or an image-hover action, which while not as user friendly, certainly wouldn’t completely flood the horde. That change (along with fixing the 404s) was finally deployed yesterday and that took care of the flooding issue.
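As a rough illustration of the 404 fix described above (written in Python rather than the extension's own language), a polling loop can treat a 404 as "this interrogation has expired" and stop immediately. The function name, the `fetch_status` callable and the `"state"` key are all hypothetical stand-ins, not the extension's or the horde's actual code.

```python
import time
from typing import Callable, Optional, Tuple


def poll_interrogation(
    fetch_status: Callable[[], Tuple[int, dict]],
    interval: float = 2.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the request is done; give up on 404 or when the budget runs out.

    fetch_status is a hypothetical callable returning (http_status, json_body).
    """
    for _ in range(max_polls):
        status_code, body = fetch_status()
        if status_code == 404:
            # The server no longer knows this request (e.g. it expired),
            # so further polling would only add load.
            return None
        # "state" == "done" is an assumed response shape for this sketch.
        if status_code == 200 and body.get("state") == "done":
            return body
        sleep(interval)
    return None
```

Passing the fetch and sleep functions in as arguments keeps the loop easy to exercise without touching the network.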

An example of the GenAlt new trigger context menu

Now finally the horde is easily handling the captions as they trickle in at a controllable amount. The developer is planning some more updates, such as triggering it on mouse-hover instead of a specific context menu button, which is not as easy to access, and possibly we can onboard translating the captions before we send them back.

A collaboration begins between Stable Horde and LAION!

Last week I wrote how we started creating a new dataset of stable horde images to provide to LAION. Today I am proud to announce that we have further deepened our collaboration by setting up a mechanism which will allow the Stable Horde community to contribute aesthetic ratings for LAION datasets!

hlky from Sygil.dev and I used the last weekend to deploy a new service which allows us to aesthetically rate images from LAION’s multiple datasets. We deployed an API, which allows any client to interact with it. You can read the details of how it works on the blog I linked above, so I’m not going to repeat everything.

This is exciting for me because the Stable Horde has suffered from a distinct lack of visibility. None of the major AI-focused media (newsletters, YouTubers etc) have mentioned us to date. The very first coverage we got was from a PC magazine!

All that is to say that it’s been an uphill struggle to get the Stable Horde noticed in a way that will lead to more workers, which will allow us to democratize access to AI for everyone. So I am very happy to point the amazing stable horde community towards such positive work, which will bring more attention to what we’re trying to achieve.

We are still hard at work tweaking the information we store for each rating. For example, we store the number of images the user had generated at the time of the rating, which will allow researchers to filter out potentially spammy users.
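To make the filtering idea concrete, here is a minimal sketch of how a researcher might use that stored count. The record layout, the `generated_at_rating` field name and the threshold are assumptions made up for this example and not the actual dataset schema.

```python
from typing import Dict, List


def trusted_ratings(ratings: List[Dict], min_generated: int = 50) -> List[Dict]:
    """Keep only ratings from users who had generated enough images.

    'generated_at_rating' is a hypothetical field name for the stored count.
    """
    return [r for r in ratings if r["generated_at_rating"] >= min_generated]
```

A brand-new account that only ever submits ratings would fall below any such threshold, which is the kind of pattern this is meant to catch.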

We are also adding more and more countermeasures, as there’s always the fear that someone will just script random ratings to get kudos. Even though the Stable Horde is free to use without kudos and even though kudos has no value, people do strange things to see “numba go up”. Now I don’t particularly care if people harvest kudos like this, but I very much do care about our ratings being poisoned by garbage.

So if you’re someone who wants to make an exploit script to harvest kudos via ratings, please just join our discord instead. The kudos flow like candy when you’re active! And you will also not be harming the AI community itself.

Already our exported dataset has grown to 80K shared images. We have 20K ratings on the LAION datasets within 2 days. For comparison, some of the biggest rated datasets have just 175K ratings, which were done by paid workers (and we all know how motivated they are to be accurate). Our kudos incentives and community passion to improve AI are surpassing even my wildest expectations, to be honest!

Here’s to making the best damn dataset that exists!

Sharing is Caring

For a while now I’ve been discussing with LAION a way to use the power of the horde to help them in some fashion. After coordination with hlky from the Sygil.dev crew, I decided to provide an opt-in mechanism for people to store their text2img stable diffusion generations in an alternative storage bucket. This bucket in turn will be provided to LAION so that they can use it for aesthetic training, or for other similar purposes.

So today I finally released this new mode. For clients it’s a simple flag in the payload. Set “share” to True, and your generated image will be uploaded to that storage bucket. This will also save me infrastructure costs, as I will not have to pay for the storage of these images out of my own pocket.

To give a further incentive for people to turn this on, and because I wanted a way to show the cost of running the horde, I have also implemented a “horde tax” kudos burn. Every time you generate an image, the overall kudos cost is then increased by 3. This signifies the overall resource cost of passing through the horde, such as bandwidth, i/o and storage. However, if you opt to turn on the sharing switch, the overall “tax” is just +1 kudos.
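The flat "tax" described above boils down to a small piece of arithmetic. The sketch below only illustrates the +3/+1 rule from this post; the function and constant names are invented for the example and are not the horde's actual code.

```python
# Flat kudos burn added on top of the generation cost, as described in the post.
HORDE_TAX = 3          # request is not shared
HORDE_TAX_SHARED = 1   # request opted in to sharing


def total_kudos_cost(generation_cost: float, shared: bool) -> float:
    """Return the generation cost plus the flat horde tax."""
    tax = HORDE_TAX_SHARED if shared else HORDE_TAX
    return generation_cost + tax
```

For a request that would cost 10 kudos to generate (a made-up number), this gives 13 kudos without sharing and 11 with it.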

You might ask, why not make the cost proportional to the overall kudos cost. Something like +30%/+10%. The reason is that the overall kudos cost is dependent on how difficult it is for the workers to generate that image. From the perspective of the horde, a 512x512x50 image is not much different from a 1024x1024x300 image, even though the latter would take an order of magnitude more time to generate.

In fact, many small requests are technically worse for the horde infrastructure costs than a few large ones. It’s not that I want to discourage the small ones though, because they are actually good for the horde workers (and thus the overall generation speed). Therefore the “tax” is fairly trivial in the grand scheme of things. Just a bit of extra “burn”.

One important thing to note, however, is that image generations from anonymous accounts are always shared. This is part of my general strategy of discouraging anonymous use of the horde. It’s just more difficult to manage the load when people are using it like that. This is why anonymous has the lowest priority and the most restrictions. And now they will always help provide data to LAION as well.

Finally, img2img and inpainting requests are never shared. This is because those are based on existing images and I cannot know if someone used some personal photo at low strength or something. So I prefer to err on the side of caution.
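Taken together, the sharing rules above can be read as a short decision function. This is a sketch of that reading, with hypothetical parameter names and with the assumption that "never shared" for img2img and inpainting takes precedence over "always shared" for anonymous users.

```python
def should_share(source_processing: str, is_anonymous: bool, share_flag: bool) -> bool:
    """Decide whether a generation ends up in the shared bucket.

    source_processing is assumed to be one of "txt2img", "img2img" or "inpainting".
    """
    if source_processing != "txt2img":
        # Requests based on an existing image are never shared.
        return False
    if is_anonymous:
        # Anonymous generations are always shared.
        return True
    # Registered users opt in through the payload flag.
    return share_flag
```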

This is not the last support the AI Horde plans to give to LAION either. We are already working on new features like an aesthetic rating trainer and so on. I hope this sort of assistance can be put to good use for the benefit of all humanity!

The AI Horde Worker has a control UI

Another great update has landed now that the AI Horde Worker is in its own repository: a Web UI built with Gradio! All kudos to ResidentChief, who has been doing amazing work for the horde lately!

The new WebUI is completely optional and it can run alongside either the Stable Horde worker or the interrogation Worker, allowing you to tweak their settings on the fly through a very simple interface. This should make it significantly easier for people to adjust their settings.

It still needs some work (I would like some information popups for each feature), but this should work for now. Soon we’ll add things like worker control (maintenance on/off etc) as well as user information and stats.

To run the new Web UI, simply call bridge-webui.cmd/sh

This should also nicely allow someone to update their bridge setting while sitting on their couch and AFK. Maybe we can add to it things like bridge control, allowing it to start/stop the worker through it. Many exciting possibilities!

Image Interrogations are now available on the Stable Horde!

The Nataili ML backend powering the workers of the Stable Horde has for a while now supported models which can perform image interrogation (AKA img2text) operations. For example captioning images or verifying whether they are displaying NSFW content or not. For almost as long, I’ve wanted to allow the AI Horde to facilitate the widespread use of those models, the same way we do for Stable Diffusion.

A primary reason for wanting this is the fact that the requirements to run a worker on the horde are fairly heavy, needing at least a mid-range GPU on your PC and most people just don’t have the capacity to provide that. Yes there is always a chance to run generations on free cloud services like Google Colaboratory, but that replaces cost with time and attention.

So I felt that being able to use models which are fairly low-powered and can run even on CPUs would provide a way for almost everyone to join the horde and start gaining kudos for themselves. The final push I needed to do this was discovering that there was a useful accessibility browser extension out there which had already ceased operations because its developers couldn’t find cheap compute. Providing that is effectively what the horde has been built to do!

I was planning to get this done 2 weeks ago, but unfortunately I got massively sick during the holidays so I couldn’t do much of anything. So I moved my vacation days to the new year and finally got cracking.

Unfortunately, while the implementation of those models is much simpler than stable diffusion, preparing the AI Horde to be able to serve these was not quite as straightforward. The problem being that until now I built the horde under two core assumptions:

  1. The input is going to include a prompt of some sort on which to run inference
  2. The prompt would always expect the same type of result, whether that is image or text.

Image interrogations flip both of these on their head. The input has to be a simple image, with no prompt from the user (other than payload tweaks), and the end results can differ wildly from each other: for example one being text, another a boolean, and yet another a dictionary.

So I needed to handle this in a way that I hadn’t engineered for until now, which required building the pipeline inside the AI Horde from scratch.

To make things worse, I did not want to duplicate my worker code, something which required me to implement table polymorphism within SQLAlchemy, which is a tricky subject on its own. More importantly, it requires modifying existing tables, which meant I needed to set up a development instance of the stable horde so that I can actually test the changes before going live. That in turn meant a new server, new DB, new nodes etc. Happily I had most of it ready via my Ansible code, but I still needed to tweak things to run on a new domain etc.

Finally, this also required that I implement polymorphism on the bridged worker as well. The existing worker code has evolved to use quite advanced mechanisms for queuing, threading etc. and I didn’t want to just duplicate it. Unfortunately the code itself had become very spaghetti, and it was high time I de-indented it with extreme prejudice and then implemented worker polymorphism as well.
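To illustrate what worker polymorphism buys in general, the sketch below keeps the queuing logic in one base class and leaves only the job handling to each worker type. Every class, method and field name here is hypothetical. It is a toy model of the idea and not the actual worker code, which also deals with threading and talking to the horde.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Any, Dict, List


class BridgeWorker(ABC):
    """Shared queue handling for every worker type (hypothetical sketch)."""

    def __init__(self, name: str, queue_size: int = 1) -> None:
        self.name = name
        self.queue_size = queue_size
        self._queue: deque = deque()

    def enqueue(self, job: Dict[str, Any]) -> bool:
        """Accept a job unless the local queue is already full."""
        if len(self._queue) >= self.queue_size:
            return False
        self._queue.append(job)
        return True

    def run_pending(self) -> List[Dict[str, Any]]:
        """Process queued jobs in order using the subclass-specific handler."""
        results = []
        while self._queue:
            results.append(self.process(self._queue.popleft()))
        return results

    @abstractmethod
    def process(self, job: Dict[str, Any]) -> Dict[str, Any]:
        """Only this part differs between worker types."""


class StableDiffusionWorker(BridgeWorker):
    def process(self, job: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder for running image generation on the prompt.
        return {"kind": "image", "prompt": job["prompt"]}


class InterrogationWorker(BridgeWorker):
    def process(self, job: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder for running each requested interrogation form.
        return {"kind": "interrogation", "forms": [f["name"] for f in job["forms"]]}
```

Adding another worker type then means writing one more `process` method rather than copying the queue code again.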

All in all, designing, building and testing image interrogations took me the best part of a whole week.

So I am proud to announce that the new feature is now live on the stable horde!

Response Body from an image interrogation request

As always, you have to look at the API documentation for each endpoint you want to use. But put simply, you send the URL of the image you want to interrogate and specify which interrogation forms you want to use, like so:

{
    "forms": [
        { "name": "caption" },
        { "name": "nsfw" }
    ],
    "source_image": "https://i.redd.it/ggkxrfgq7u9a1.png"
}

It otherwise works similarly to image generation from a client’s perspective, with the difference that you don’t need to use a check/ endpoint, and you can keep polling the interrogate/status/ endpoint directly. Once a form is completed, you will get a result from that form matching its type.

Currently we support three interrogation forms: caption, nsfw, and interrogation

  • Caption: Returns a string describing the image
  • NSFW: Returns a true/false boolean depending on whether the image is displaying NSFW imagery or not.
  • Interrogation: Returns a dictionary of key words best describing the image, with an accompanying confidence score. This takes the most time of all the interrogations and is rewarded accordingly in kudos.
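Since each of the three forms listed above returns a different type, a client has to branch on the form name. The sketch below builds the request payload in the shape shown earlier and formats each kind of result. The function names are invented for this example, and it assumes the client has already extracted the raw result value (string, boolean or dictionary) for each form. The real response body may nest these differently, so check the API documentation.

```python
from typing import Any, Dict, List

SUPPORTED_FORMS = ("caption", "nsfw", "interrogation")

# Python type each form is described as returning in the list above.
EXPECTED_TYPES = {"caption": str, "nsfw": bool, "interrogation": dict}


def build_interrogation_payload(image_url: str, forms: List[str]) -> Dict[str, Any]:
    """Build a payload in the same shape as the JSON example in this post."""
    unknown = [name for name in forms if name not in SUPPORTED_FORMS]
    if unknown:
        raise ValueError(f"Unsupported interrogation forms: {unknown}")
    return {"forms": [{"name": name} for name in forms], "source_image": image_url}


def summarize_result(form_name: str, result: Any) -> str:
    """Turn a form's raw result into a display string, checking its type first."""
    expected = EXPECTED_TYPES[form_name]
    if not isinstance(result, expected):
        raise TypeError(f"{form_name} should return {expected.__name__}")
    if form_name == "caption":
        return f"Caption: {result}"
    if form_name == "nsfw":
        return "NSFW: yes" if result else "NSFW: no"
    # interrogation: keyword -> confidence score, best match first
    ranked = sorted(result.items(), key=lambda item: item[1], reverse=True)
    return "Keywords: " + ", ".join(f"{word} ({score:.2f})" for word, score in ranked)
```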

As I mentioned before, the worker code had to be completely refactored. It now lives in a new repository as well. The nataili repo will soon turn into a pip package I can install externally, so this is in preparation for that.

To start an interrogation worker, you use the same code, but you start with a different bridge script.

./horde-interrogation_bridge.cmd -n "The Deep Questioning" --max_threads=5 --queue_size=5

As the models used by the interrogation worker are much more lightweight, it actually benefits more from high threads and high queue_sizes, so feel free to crank those up so that it’s best utilizing your worker. A lot of the new code changes I did to the horde also allow your worker to pick up many forms at the same time, which will cut down on the poll requests to the horde, further reducing idle time.

However, be careful not to set these too high. Because you’re picking up the requests in advance, nobody else will work on them until your worker gets to them. If your queue is high and your threads are low (or slow), then you’ll notice your horde performance is not going to be great.
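A back-of-the-envelope model shows why a large queue with few threads hurts. This sketch assumes every form takes the same time and all threads stay busy, which is a deliberate simplification. The function name and the timings are made up for the example.

```python
import math


def worst_case_wait(queue_size: int, max_threads: int, seconds_per_form: float) -> float:
    """Rough wait before the last request picked up in advance starts processing."""
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    # Queued requests start in "waves" of max_threads as threads free up.
    waves_before_last = math.ceil(queue_size / max_threads)
    return waves_before_last * seconds_per_form
```

With 5 threads and a queue of 5 at 2 seconds per form, the last queued request waits about 2 seconds. With 2 threads and a queue of 20, it sits in your queue for about 20 seconds, during which no other worker can pick it up.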

However, the result of each form will be sent back as soon as it’s done, so as to save as much time as possible.

One more thing to note is that an Interrogation worker is different from the Stable Diffusion worker. As such you cannot use the same name! However they DO use the same bridgeData.py. If you plan to run both types of workers, utilize the command line arguments to tweak the bridge settings accordingly instead of having to change your bridgeData.py all the time.
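The general pattern of letting command line arguments override file settings looks roughly like the sketch below. It reuses the `-n`, `--max_threads` and `--queue_size` flags from the command shown earlier, but the setting names inside the dictionary and the merging logic are assumptions for illustration and not the bridge's actual implementation.

```python
import argparse
from typing import Any, Dict, List


def resolve_settings(file_settings: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Start from the file settings and override them with any flags that were given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", dest="worker_name", default=None)
    parser.add_argument("--max_threads", type=int, default=None)
    parser.add_argument("--queue_size", type=int, default=None)
    args = parser.parse_args(argv)

    merged = dict(file_settings)
    for key, value in vars(args).items():
        if value is not None:  # only flags actually passed override the file
            merged[key] = value
    return merged
```

This way one settings file can hold the defaults for the Stable Diffusion worker, while the interrogation worker is started with its own name and thread count on the command line.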

In the future I plan to further tweak the bridge so that it can run in parallel with the stable diffusion worker to best utilize your spare processing power. I also want to tweak the model loading so that optionally you can offload the whole thing to CPU. But I need to test if the speeds for this make sense first.

Another cool possibility from this refactor is that it opens the doors for different worker types on the horde, which in turn gives me an opening I’ve been considering for a while now, which is the complete merge of the Stable and KoboldAI horde into one service. This will reduce the amount of code juggling I have to do, and hopefully simplify things for everyone with a common kudos system.

I am excited to see what use cases you all will come up with for this new system!

Stable Horde now has a native mobile app!

A very cool person has developed a FOSS mobile app! This works both on Android and iOS.

It is not quite as feature complete as the other stable horde web clients like Artbot and Stable UI (which also work on phones perfectly well), but I’m sure it will get there very soon!

You can get it on Google Play Store and Apple App Store

Image Interrogation Progress

For the past 2 weeks I’ve been trying to build the new feature of the horde which will allow even people with a low powered GPU, or just CPU to join the horde and provide a service to gather kudos.

It is called “Interrogation” as it will “interrogate” source images to discover aspects about them, such as an image caption, or whether they are displaying NSFW content etc. This feature can then be on-boarded into new or existing tools, such as perhaps automatically captioning images for micro-blogging services as an accessibility feature, or a browser plugin for parental controls etc.

However, what I thought would be a bit tricky but doable soon keeps running into various snags. First, I lost one complete week of work from my vacation by getting the nastiest cold I’ve had for the past 10 years at least, which flattened me for almost 9 days. Then it was the holiday period, where I had to give more attention to family and friends.

Now that I’m finally able to concentrate more on it, I find it’s actually an order of magnitude more complex than I initially expected, requiring me to actually have to update my existing database tables (always a risky proposition) and also redesign my approach to use such things as “Polymorphic tables” so that I don’t end up duplicating hundreds of lines of code between similar classes.

And while I’m doing this, the horde ML backend has been receiving a surprisingly increased pace of improvements, recently implementing depth2img, adding diffusers to voodoo-ray, CodeFormers and a ton of urgent bug-fixes and other improvements.

To say my attention has been split is an understatement.

But I’m slowly but surely making more progress. I hope to have something out soon-ish. I do wonder if it will require a complete horde downtime this time for the DB upgrades. Never done that before. Kinda scary, not gonna lie…

depth2img now available on the Stable Horde!

Through the great work of @ResidentChiefNZ, the Stable Horde now supports depth2img, a new method of doing img2img which better understands the source image you provide. The results are downright amazing. This article I think explains it better than I could.

See below the transformation of one of my avatars into a clown, a zombie and an orangutan respectively.

To use depth2img you need to explicitly use the Stable Diffusion 2 Depth model. The rest will work the same as img2img. 

Warning, depth2img does not support a mask! So if your client allows you to send one, it will just be ignored.  

If you are running a Worker, you can simply update your bridge code, but you must also run update-runtime, as it uses quite a few new packages. Afterwards add the model to your list as usual.

We recently also enabled diffusers to be loaded into voodoo ray, so this will allow you to not only keep the depth2img model in RAM along with other models, but also the older inpainting model! Please direct all your kudos to @cogentdev for this! I am already running inpainting, depth2img, sd2.1 and 15 other 1.5 models on my 2070 with no issues!

If you have built your own Integration with the stable horde such as clients or bots, please update your tools to take into account depth2img. I would suggest adding a new tab for it, which forces Stable Diffusion 2 Depth to be used and prevents sending an image mask. This is to avoid confusion. This will also allow you the opportunity to provide some more information about the differences between img2img and depth2img.  
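For integrations, the suggestion above amounts to adapting an img2img request before sending it. The sketch below shows one way to do that. The `models` and `source_mask` key names are assumptions about the request layout, so check them against the API documentation before relying on them.

```python
from typing import Any, Dict

DEPTH_MODEL = "Stable Diffusion 2 Depth"


def to_depth2img(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an img2img request adapted for depth2img.

    Forces the depth model and drops any mask, since depth2img ignores it anyway.
    """
    adapted = dict(request)
    adapted["models"] = [DEPTH_MODEL]      # assumed key for the model list
    adapted.pop("source_mask", None)       # assumed key for the image mask
    return adapted
```

Working on a copy leaves the user's original img2img settings untouched, so a client can switch between the two tabs without losing the mask.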

Enjoy and please shower the people behind the new updates with Kudos where you see them!