Image Interrogation Progress

For the past two weeks I’ve been trying to build a new feature for the horde which will allow even people with a low-powered GPU, or just a CPU, to join the horde and provide a service in exchange for kudos.

It is called “Interrogation” as it will “interrogate” source images to discover things about them, such as an image caption or whether they contain NSFW content. This feature can then be onboarded into new or existing tools, for example to automatically caption images on micro-blogging services as an accessibility feature, or in a browser plugin for parental controls.

However, what I thought would be tricky but doable soon keeps running into snags. First, I lost a whole week of work during my vacation to the nastiest cold I’ve had in at least 10 years, which flattened me for almost 9 days. Then came the holiday period, when I had to give more attention to family and friends.

Now that I’m finally able to concentrate on it, I find it’s actually an order of magnitude more complex than I initially expected. It requires me to update my existing database tables (always a risky proposition) and to redesign my approach around things like “polymorphic tables”, so that I don’t end up duplicating hundreds of lines of code between similar classes.
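For readers unfamiliar with the term, one common way to get “polymorphic tables” is single-table inheritance: every request kind lives in one table, and a discriminator column decides which class each row becomes, so shared logic is written once in a base class. ORMs such as SQLAlchemy provide this through the `polymorphic_on` and `polymorphic_identity` mapper options; the sketch below shows the same idea using only the standard-library `sqlite3` module. The class names, table and columns are hypothetical illustrations, not the horde's actual models.

```python
import sqlite3


class WaitingPrompt:
    """Shared behaviour for every kind of queued request (hypothetical base class)."""

    kind = "base"

    def __init__(self, row_id, payload):
        self.row_id = row_id
        self.payload = payload

    def describe(self):
        # Common code lives here once instead of being copied into each subclass.
        return f"{self.kind} request #{self.row_id}: {self.payload}"


class ImageWaitingPrompt(WaitingPrompt):
    kind = "image"


class InterrogationWaitingPrompt(WaitingPrompt):
    kind = "interrogation"


# Discriminator value -> Python class, similar in spirit to SQLAlchemy's polymorphic_identity.
REGISTRY = {cls.kind: cls for cls in (ImageWaitingPrompt, InterrogationWaitingPrompt)}


def create_schema(conn):
    # One table for all request kinds; the "kind" column tells the rows apart.
    conn.execute(
        "CREATE TABLE waiting_prompts ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def save(conn, kind, payload):
    cur = conn.execute(
        "INSERT INTO waiting_prompts (kind, payload) VALUES (?, ?)", (kind, payload)
    )
    return cur.lastrowid


def load_all(conn):
    # Instantiate the right subclass for each row based on its discriminator.
    rows = conn.execute("SELECT id, kind, payload FROM waiting_prompts ORDER BY id")
    return [REGISTRY[kind](row_id, payload) for row_id, kind, payload in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save(conn, "image", "a clown avatar")
    save(conn, "interrogation", "caption this image")
    for prompt in load_all(conn):
        print(prompt.describe())
```

Adding a new request type then means one small subclass and one registry entry, rather than a parallel copy of the table and all the code around it.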

And while I’m doing this, the horde ML backend has been improving at a surprisingly fast pace, recently gaining depth2img, diffusers support in voodoo-ray, CodeFormer, and a ton of urgent bug fixes and other improvements.

To say my attention has been split is an understatement.

Still, I’m slowly but surely making progress, and I hope to have something out soon-ish. I do wonder if the DB upgrades will require a complete horde downtime this time. I’ve never done that before. Kinda scary, not gonna lie…

depth2img now available on the Stable Horde!

Through the great work of @ResidentChiefNZ, the Stable Horde now supports depth2img, a new method of doing img2img which better understands the source image you provide, and the results are downright amazing. I think this article explains it better than I could.

See below the transformation of one of my avatars into a clown, a zombie and an orangutan respectively.

To use depth2img, you need to explicitly request the Stable Diffusion 2 Depth model. Everything else works the same as img2img.

Warning: depth2img does not support a mask! If your client allows you to send one, it will simply be ignored.
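For client developers, here is a rough sketch of how the request body differs between img2img and depth2img. The field names (`prompt`, `params`, `source_image`, `source_processing`, `source_mask`, `models`) reflect my understanding of the horde's async generation API, and `stable_diffusion` as the default model name is an assumption; check the API documentation before relying on any of it. The helper `build_request` is hypothetical.

```python
import base64

# Model name as written in this post; verify it against the horde's live model list.
DEPTH_MODEL = "Stable Diffusion 2 Depth"


def build_request(prompt, source_image_bytes, *, depth=False, mask_bytes=None,
                  model="stable_diffusion"):
    """Build a JSON-ready dict for an img2img-style horde request (assumed field names)."""
    payload = {
        "prompt": prompt,
        "params": {"steps": 30, "width": 512, "height": 512},
        "source_image": base64.b64encode(source_image_bytes).decode("ascii"),
        "source_processing": "img2img",
        # depth2img is selected purely by requesting the depth model.
        "models": [DEPTH_MODEL if depth else model],
    }
    if mask_bytes is not None and not depth:
        # Masks only make sense for plain img2img; depth2img would ignore them anyway.
        payload["source_mask"] = base64.b64encode(mask_bytes).decode("ascii")
    return payload
```

The only differences are the model name and that any mask is dropped client-side, mirroring the fact that the horde ignores it for depth2img.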

If you are running a worker, update your bridge code, and you must also run update-runtime, as it uses quite a few new packages. Afterwards, add the model to your list as usual.

We also recently enabled diffusers models to be loaded into voodoo ray, which lets you keep not only depth2img in RAM alongside your other models, but also the older inpainting model! Please direct all your kudos to @cogentdev for this! I am already running inpainting, depth2img, SD 2.1 and 15 other SD 1.5 models on my 2070 with no issues!
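The post doesn't describe voodoo ray's internals, so purely as a mental model, here is a toy sketch of the general idea: many models stay parked in system RAM, and only the one currently needed is moved into VRAM, evicting the least recently used. Strings stand in for model weights, and `ModelCache`, `park` and `activate` are hypothetical names; this is not voodoo ray's actual code.

```python
from collections import OrderedDict


class ModelCache:
    """Toy model manager: many models parked in system RAM, a few active on the GPU.

    Strings stand in for model weights; nothing here touches a real GPU.
    """

    def __init__(self, gpu_slots=1):
        self.gpu_slots = gpu_slots
        self.in_ram = {}             # name -> weights kept in system RAM
        self.on_gpu = OrderedDict()  # name -> weights, ordered by last use

    def park(self, name, weights):
        # Load from disk once; afterwards the model only moves between RAM and VRAM.
        self.in_ram[name] = weights

    def activate(self, name):
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # mark as most recently used
            return self.on_gpu[name]
        if len(self.on_gpu) >= self.gpu_slots:
            # Evict the least recently used model from VRAM; its RAM copy stays parked.
            self.on_gpu.popitem(last=False)
        weights = self.in_ram[name]  # raises KeyError if the model was never parked
        self.on_gpu[name] = weights
        return weights
```

The point of the design is that switching models becomes a RAM-to-VRAM copy instead of a full reload from disk, which is why many models can coexist on a single consumer GPU.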

If you have built your own integration with the Stable Horde, such as a client or bot, please update your tools to take depth2img into account. I would suggest adding a new tab for it which forces Stable Diffusion 2 Depth to be used and prevents sending an image mask, to avoid confusion. This also gives you the opportunity to explain the differences between img2img and depth2img.

Enjoy, and please shower the people behind these updates with kudos wherever you see them!

The Stable Horde is in the news!

A new article has been published in PCWorld.com about the Stable Horde!

Overall, a very well-researched article. I can’t find any issues with it. Personally, I would liken the AI Horde technology to a mix of BitTorrent and Folding@Home, but the former has negative connotations for many people.

Some things I could address from the article:

It’s not entirely clear whether every fork of Stable Diffusion should work, but you can try.

There are no “forks” of Stable Diffusion. There are checkpoints and multiple models, and the horde supports every .ckpt model and some diffusers models. I suspect the author confused Stable Diffusion the model with the clients and frontends that use it, like AUTOMATIC1111.

There is a tiny bit of a catch: the kudos system. To prevent abuse of the system, the developer implemented a system where every request “costs” some amount of kudos. Kudos mean nothing except in terms of priority: each request subtracts kudos from your balance, putting you in “debt.” Those with the most debts get placed lowest in the queue. But if there are many clients contributing AI art, that really doesn’t matter, as even users with enormous kudos debts will see their requests fulfilled in seconds.

Indeed, each request consumes kudos, but you don’t actually go into debt. While we do record the historical amount of kudos you’ve consumed for statistics, your actual total as a registered user never goes below 0. This means that, as a registered user, you will always have more priority than an anonymous user (who typically remains at -50 kudos). Never dropping below 0 also allows you to generate with slightly higher resolution and steps than an anonymous user.
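To make the accounting concrete, here is a minimal sketch of the rules described above: lifetime consumption is recorded for statistics, a registered user's balance is clamped at 0, and higher balances sort earlier in the queue. The class and function names, and the choice to leave the anonymous balance untouched, are illustrative assumptions rather than the horde's actual code.

```python
from dataclasses import dataclass

ANONYMOUS_KUDOS = -50  # typical anonymous balance, per the post


@dataclass
class User:
    name: str
    kudos: float
    anonymous: bool = False
    consumed_total: float = 0.0  # lifetime usage, kept for statistics only


def charge(user, cost):
    """Deduct kudos for a request without letting a registered user go below 0."""
    user.consumed_total += cost
    if user.anonymous:
        return  # assumption: the shared anonymous account just stays at its fixed balance
    user.kudos = max(0.0, user.kudos - cost)


def queue_order(users):
    # Higher kudos means higher priority, so sort in descending order.
    return sorted(users, key=lambda u: u.kudos, reverse=True)
```

Because a registered balance bottoms out at 0 while the anonymous one sits at -50, a registered user can never fall behind anonymous requests in this ordering.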

Images won’t automatically download, but you can go to the Images tab and then manually download them.

That is totally dependent on the client. It works this way for ArtBot, but Lucid Creations, for example, is a local application, so images are saved with a button click. Other clients might save automatically.

Other than that, great article!

To be honest, I’ve actually been quite surprised that nobody has written about the SH until now. The SH went live in early September, soon after Stable Diffusion came out, and we’ve generated 13 million images so far (approximately $50K of value), yet none of the big AI and AI-art-focused news outlets has given it a single mention! Now, I am not one for conspiracy theories, but it seems extraordinarily unlikely that absolutely nobody in the scene has noticed us until now or found us newsworthy, especially since many people have tweeted directly to some of the big AI and Stable Diffusion players about it.

Oh well, a PC magazine is the first to report on the Stable Horde. So be it! I wonder how many people will discover the Stable Horde through it.

Some napkin math

The Stable Horde has generated ~180 terapixelsteps of images. Assuming each image is 512x512 pixels at 30 steps, that is roughly 23 million images (higher resolutions are disproportionately more expensive to generate).

Using the current cost of http://dreamstudio.ai, the Stable Horde has generated close to $45,000 worth of value for free! Using the old http://dreamstudio.ai costs (the Stable Horde has been up almost as long), this is closer to $230,000. All this value has been given out voluntarily, with no ads or fine print.
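The arithmetic above can be checked in a few lines of Python. Every input is a figure quoted in this post; the per-image prices are simply what those dollar totals imply, not published DreamStudio rates.

```python
# Napkin math using only the figures quoted in this post.
TOTAL_PIXELSTEPS = 180e12          # ~180 terapixelsteps generated
WIDTH, HEIGHT, STEPS = 512, 512, 30

pixelsteps_per_image = WIDTH * HEIGHT * STEPS  # 7,864,320 pixelsteps per image
equivalent_images = TOTAL_PIXELSTEPS / pixelsteps_per_image

# Per-image prices implied by the two dollar totals quoted above.
implied_current_price = 45_000 / equivalent_images
implied_old_price = 230_000 / equivalent_images

print(f"{equivalent_images / 1e6:.1f} million images")
print(f"~${implied_current_price:.4f} (current) vs ~${implied_old_price:.4f} (old) per image")
```

In other words, the totals correspond to roughly a fifth of a cent per image at current prices and about a cent per image at the old ones.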

Taking into account the allowed post-processors and the disproportionate cost of higher-resolution images (the Stable Horde allows up to 3072×2048), these numbers could easily be doubled.

For reference, over the Stable Horde’s lifetime, my Patreon account has made $500, most of which has gone to infrastructure costs.

CodeFormer and Reddit Bot for the Stable Horde

I haven’t been able to improve the Stable Horde much lately. I was planning to do a lot of work during the week leading up to Christmas, but unfortunately the universe had other ideas, and infected not only me with the nastiest cold I’ve had in decades, but my whole family as well, including the visiting grandma!

So instead of adding much-needed new features, I’ve been flattened in bed, trying to muster enough concentration to do some basic updates and answer questions.

Nevertheless, a few improvements have been added, mostly through the work of some members of the community.

First is the addition of the CodeFormer face-fixing post-processor, which seems massively better than the GFPGAN model. All clients can now request that an image be passed through CodeFormer for an immediate improvement in faces. Soon I plan to allow this to run in isolation as well.
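For clients, requesting CodeFormer should just be a matter of adding it to the post-processing list of the generation parameters. The sketch below assumes the parameter is called `post_processing` and the identifier is `CodeFormers`, which is my understanding of the horde API at the time; the helper `with_post_processing` and its allow-list are hypothetical.

```python
# Post-processor identifiers as I understand the horde API; verify before use.
KNOWN_POST_PROCESSORS = {"GFPGAN", "RealESRGAN_x4plus", "CodeFormers"}


def with_post_processing(params, *processors):
    """Return a copy of generation params with post-processors appended (hypothetical helper)."""
    unknown = [p for p in processors if p not in KNOWN_POST_PROCESSORS]
    if unknown:
        # Fail early on the client instead of sending a request the horde may reject.
        raise ValueError(f"unknown post-processors: {unknown}")
    updated = dict(params)  # leave the caller's dict untouched
    updated["post_processing"] = list(params.get("post_processing", [])) + list(processors)
    return updated
```

For example, `with_post_processing({"steps": 30}, "CodeFormers")` yields params that ask for the face fix after generation, while keeping any post-processors already present in their original order.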

The other new thing is improvements to the workers themselves, allowing them to pick up and perform jobs more efficiently.

The other big news I have is that I wrote and unleashed the first Reddit bot for Stable Diffusion. It was initially created as an entry for the Ben’s Bites Hackathon, since I couldn’t submit the Stable Horde itself (I didn’t win, btw), but it was quite an eventful release. My initial release got caught by Reddit’s automated anti-spam filter, which shadow-banned my account and banned my subreddit. I then refactored the bot to use my own R2 CDN and released it with a new account, while asking for a Reddit review on my original account. Fortunately, my bot account and subreddit got unbanned, and I finally released it properly on the third try. It’s been up ever since!

The way the bot is built, you can request images from it anywhere on Reddit, and it will post the images in its own subreddit for everyone to see and vote on.

There have also been a lot of new models and styles onboarded, which are also used by my Reddit and Mastodon bots.

The next plan is to allow image interrogation on the Stable Horde, as well as direct image post-processing (without Stable Diffusion), so that even people with low-powered machines can contribute for kudos.

The Stable Horde: AI image generation for everyone through mutual aid

After completing the KoboldAI Horde and onboarding it into the KoboldAI client, I felt that there was a really big opening for doing a similar thing with the open-source AI image generation model, Stable Diffusion. I already had the code for setting up a crowdsourcing cluster, so it shouldn’t take too much refactoring to make the same underlying code work with Stable Diffusion.

The first thing I had to do was figure out what was going to run on the workers. For this, I decided to reuse the stable-diffusion-webui fork by simply adding my bridge code on top of it (as, unlike KoboldAI, it doesn’t provide a REST API). Once I had a working bridge, it was time to fork the Horde.

And thus, the Stable Horde was born!

It follows the same approach: workers running some version of Stable Diffusion constantly poll for new generations to complete and then send them back to the horde, which hands them over to their final destination. For now, the Stable Horde only handles fairly basic text2image generations, but since it’s based on the webui, I can tap into features added upstream much more easily, without having to develop them myself.
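The worker side of this pull model can be sketched as a simple loop: ask the horde for a job, generate, submit, repeat. To keep the sketch runnable offline, a hypothetical in-memory stand-in replaces the real server; in the later v2 API the corresponding calls are, as far as I know, `POST /api/v2/generate/pop` and `POST /api/v2/generate/submit`. `InMemoryHorde`, `fake_generate` and `run_worker` are illustrative names, not the bridge's actual code.

```python
from collections import deque


class InMemoryHorde:
    """Stand-in for the horde server so the loop can run offline (hypothetical)."""

    def __init__(self, prompts):
        self.pending = deque(enumerate(prompts))
        self.results = {}

    def pop_job(self, worker_name):
        # A real worker would ask the horde's REST API for work here.
        return self.pending.popleft() if self.pending else None

    def submit(self, job_id, image):
        # The horde stores the result until the requesting client collects it.
        self.results[job_id] = image


def fake_generate(prompt):
    # Placeholder for the actual Stable Diffusion call made through the bridge.
    return f"<image for {prompt!r}>"


def run_worker(horde, name, generate=fake_generate, max_idle_polls=3):
    """Poll for jobs, generate, and submit results until the queue stays empty."""
    idle = 0
    while idle < max_idle_polls:
        job = horde.pop_job(name)
        if job is None:
            idle += 1  # a real bridge would sleep briefly and keep polling indefinitely
            continue
        idle = 0
        job_id, prompt = job
        horde.submit(job_id, generate(prompt))
```

Because workers pull jobs rather than receiving pushed requests, they need no open ports or public address, which is what lets anyone behind a home router join.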

The code started as a fork of the KoboldAI Horde, but has by now become my primary repo. In fact, with the addition of the second version of the REST API, I have decided to merge both Hordes into a single repository in order to better share code updates (because copying code from one repo to the other was driving me nuts!). This is coming soon, and it means that the Stable Horde will always remain in parity with the KoboldAI Horde from now on.

While there are other free image generation tools out there, I believe none is doing anything like what I am attempting. Most of them provide free Stable Diffusion by eating the costs themselves, with an undefined business plan. When I see that, my suspicions are immediately raised, as a free service like that typically means you’re the product! It also doesn’t help at all that they don’t share the code behind them.

Now you might say, “But db0, your service is also free, so how come the same criticism doesn’t apply to you?” That’s a great question. The answer is that the Stable Horde is free because it’s volunteer-based. At the end of the day, someone is indeed paying for electricity (primarily myself at the moment), but the point is that it is self-managing through people’s innate drive for mutual aid.

This means that if a jump in popularity exceeds the Horde’s current image generation capacity (and therefore slows things down too much), the expectation is that enough people will be annoyed by the speed that they will add their own computing power to the horde, benefiting themselves with higher priority, but also everyone else.

And yes, there is always some amount of “small print”. While the Stable Horde is built on anarchist principles of mutual aid and direct action, the fact is that we do not control the underlying workers. It is therefore theoretically possible for people to act maliciously on the worker side, which is why I always warn users of the Horde that I cannot guarantee nobody will see their prompts. So act accordingly.

Nevertheless, one of the things I’m offering is something I just haven’t seen anyone else do for image generation: a fully functioning RESTful API. Its purpose is to further enable image generation for everyone in new and exciting ways, letting tools use this capability without bankrupting their owners when demand for a side hobby suddenly spikes. People have already started creating some interesting tools, such as a weather app that uses the Stable Horde to generate a dynamic image representing the weather, based on environmental conditions.
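As an illustration of what a client of that API might look like, here is a minimal submit-then-poll sketch. It assumes the v2 endpoints `/generate/async`, `/generate/check/{id}` and `/generate/status/{id}` under `https://stablehorde.net/api/v2`, an `apikey` header, and `0000000000` as the anonymous key; verify all of these against the API documentation. The HTTP functions are injected so the sketch isn't tied to a particular library and can be exercised offline.

```python
import time

BASE_URL = "https://stablehorde.net/api/v2"  # assumed v2 base URL


def generate_image(prompt, post_json, get_json, apikey="0000000000",
                   poll_seconds=2.0, sleep=time.sleep):
    """Submit an async request, poll until done, then fetch the results.

    post_json(url, body, headers) and get_json(url) must return decoded JSON
    dicts; they are injected so any HTTP library (or a test double) can be used.
    """
    headers = {"apikey": apikey}  # assumption: the anonymous key is all zeros
    job = post_json(f"{BASE_URL}/generate/async", {"prompt": prompt}, headers)
    job_id = job["id"]
    # Cheap "check" calls until the job is finished, then one "status" call for the images.
    while not get_json(f"{BASE_URL}/generate/check/{job_id}")["done"]:
        sleep(poll_seconds)
    return get_json(f"{BASE_URL}/generate/status/{job_id}")["generations"]
```

A weather app, for instance, would build its prompt from the current conditions and pass in thin wrappers around its HTTP library of choice.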

On my end, I am interested in helping game developers figure out ways to integrate AI into their games. For this purpose, I have already released a Godot add-on which allows you to request AI image generation at a game’s runtime. I have further used this add-on to create my own Stable Diffusion GUI client that can run on any device, without the need for a complicated install procedure or a GPU.

All of this is just scratching the surface of what can be achieved by allowing automation to connect directly to Stable Diffusion (or text generation), and I’m excited to see what people will come up with in the future!