Another great update has landed now that the AI Horde Worker is in its own repository: a Web UI built with Gradio! All kudos to ResidentChief, who has been doing amazing work for the horde lately!
The new WebUI is completely optional and it can run alongside either the Stable Horde worker or the interrogation Worker, allowing you to tweak their settings on the fly through a very simple interface. This should make it significantly easier for people to adjust their settings.
It still needs some work (I would like some information popups for each feature), but this should work for now. Soon we’ll add things like worker control (maintenance on/off etc) as well as user information and stats.
To run the new WebUI, simply call bridge-webui.cmd or bridge-webui.sh.
This should also nicely allow someone to update their bridge setting while sitting on their couch and AFK. Maybe we can add to it things like bridge control, allowing it to start/stop the worker through it. Many exciting possibilities!
The Nataili ML backend powering the workers of the Stable Horde has for a while now supported models which can perform image interrogation (AKA img2text) operations. For example captioning images or verifying whether they are displaying NSFW content or not. For almost as long, I’ve wanted to allow the AI Horde to facilitate the widespread use of those models, the same way we do for Stable Diffusion.
A primary reason for wanting this is the fact that the requirements to run a worker on the horde are fairly heavy, needing at least a mid-range GPU on your PC and most people just don’t have the capacity to provide that. Yes there is always a chance to run generations on free cloud services like Google Colaboratory, but that replaces cost with time and attention.
So I felt that being able to use models which are fairly low-powered and can run even on CPUs would provide a way for almost everyone to join the horde and start gaining kudos for themselves. The final push I needed to do this was discovering that there was a useful accessibility browser extension out there which had already ceased operations because they couldn’t find cheap compute. Which is effectively what the horde has been built to do!
I was planning to get this done 2 weeks ago, but unfortunately I got massively sick during the holidays so I couldn’t do much of anything. So I moved my vacation days to the new year and finally got cracking.
Unfortunately, while the implementation of those models is much simpler than stable diffusion, preparing the AI Horde to be able to serve these was not quite as straightforward. The problem being that until now I built the horde under two core assumptions:
The input is going to include a prompt of some sort on which to run inference
The prompt would always expect the same type of result, whether that is an image or text.
Image interrogations flip both of these on their head. The input has to be a simple image, with no prompt from the user (other than payload tweaks), and the end results can differ wildly from each other, for example one being text, another a boolean and yet another a dictionary.
To make things worse, I did not want to duplicate my worker code, something which required me to implement table polymorphism within SQLAlchemy, which is a tricky subject on its own. More importantly, it requires modifying existing tables, which meant I needed to set up a development instance of the stable horde so that I can actually test the changes before going live. That in turn meant a new server, new DB, new nodes etc. Happily I had most of it ready via my Ansible code, but I still needed to tweak things to run on a new domain etc.
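To illustrate the idea, here is a minimal sketch of joined-table polymorphism, assuming SQLAlchemy 1.4 or newer. The table, column and class names are hypothetical and are not the horde's actual schema. The point is only that shared columns and methods live once on the base class, while each worker type adds its own table.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Worker(Base):
    """Columns and logic shared by every worker type (hypothetical schema)."""
    __tablename__ = "workers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), unique=True, nullable=False)
    # Discriminator column: tells SQLAlchemy which subclass a row belongs to.
    worker_type = Column(String(30), nullable=False)
    __mapper_args__ = {
        "polymorphic_on": worker_type,
        "polymorphic_identity": "worker",
    }

    def describe(self):
        # Written once, inherited by every subclass.
        return f"{self.name} ({self.worker_type})"


class ImageWorker(Worker):
    """Extra columns only an image generation worker needs (illustrative)."""
    __tablename__ = "image_workers"
    id = Column(Integer, ForeignKey("workers.id"), primary_key=True)
    max_pixels = Column(Integer, default=512 * 512)
    __mapper_args__ = {"polymorphic_identity": "image_worker"}


class InterrogationWorker(Worker):
    """Extra columns only an interrogation worker needs (illustrative)."""
    __tablename__ = "interrogation_workers"
    id = Column(Integer, ForeignKey("workers.id"), primary_key=True)
    forms = Column(String(200), default="caption,nsfw")
    __mapper_args__ = {"polymorphic_identity": "interrogation_worker"}


def demo():
    """Store one worker of each type, then load them through the base class."""
    engine = create_engine("sqlite://")  # in-memory database for the demo
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add_all([ImageWorker(name="gpu-box"), InterrogationWorker(name="cpu-box")])
        session.commit()
        workers = session.query(Worker).order_by(Worker.name)
        return [(type(w).__name__, w.describe()) for w in workers]
```

Querying the base class returns instances of the right subclass for each row, which is what avoids duplicating the worker code per type. The cost is that the existing worker table has to gain a discriminator column, hence the need to modify existing tables.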
All in all, designing, building and testing image interrogations took me the best part of a whole week.
So I am proud to announce that the new feature is now live on the stable horde!
As always, you have to look at the API documentation for each endpoint you want to use. But put simply, you send an image URL you want to interrogate and specify which interrogation forms you want to use, like so:
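A minimal sketch of such a request in Python, using only the standard library. The base URL, the `interrogate/async` path, the `apikey` header, the `source_image` and `forms` field names and the anonymous key are all assumptions made for illustration; the API documentation has the authoritative names.

```python
import json
import urllib.request

# Assumed values for illustration; confirm against the API documentation.
API_BASE = "https://stablehorde.net/api/v2"
SUPPORTED_FORMS = ("caption", "nsfw", "interrogation")


def build_interrogation_payload(image_url, forms):
    """Build the JSON body: one source image plus the forms to run on it."""
    unknown = [form for form in forms if form not in SUPPORTED_FORMS]
    if unknown:
        raise ValueError(f"Unsupported interrogation forms: {unknown}")
    return {"source_image": image_url, "forms": [{"name": form} for form in forms]}


def submit_interrogation(image_url, forms, api_key="0000000000"):
    """POST the request and return the decoded JSON reply (needs network access)."""
    body = json.dumps(build_interrogation_payload(image_url, forms)).encode("utf-8")
    request = urllib.request.Request(
        f"{API_BASE}/interrogate/async",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json", "apikey": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the request body; nothing is sent.
    payload = build_interrogation_payload("https://example.com/cat.png", ["caption", "nsfw"])
    print(json.dumps(payload, indent=2))
```

Keeping the payload builder separate from the network call makes it easy to inspect the request before sending anything.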
It otherwise works similarly to image generation from a client’s perspective, with the difference that you don’t need to use a check/ endpoint, and you can keep polling the interrogate/status/ endpoint directly. Once a form is completed, you will get a result from that form matching its type.
Currently we support three interrogation forms: caption, nsfw, and interrogation
Caption: Returns a string describing the image
NSFW: Returns a true/false boolean depending on whether the image is displaying NSFW imagery or not.
Interrogation: Returns a dictionary of key words best describing the image, with an accompanying confidence score. This takes the most time of all the interrogations and is rewarded accordingly in kudos.
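Since each form returns a different type, a client has to handle results per form. The sketch below shows one way to do that while polling. The response shape (a top-level `state` plus a `forms` list whose entries carry `form`, `state` and `result`) is an assumption for illustration, and results are shown as the bare string, boolean and dictionary described above; the real payload may wrap them differently. `fetch_status` is a hypothetical callable standing in for a GET on the interrogate/status/ endpoint.

```python
import time


def completed_results(status):
    """Map each finished form name to its result, leaving pending forms out."""
    results = {}
    for form in status.get("forms", []):
        if form.get("state") == "done":
            results[form["form"]] = form.get("result")
    return results


def describe_result(form_name, result):
    """Turn a result into one readable line, based on the form's result type."""
    if form_name == "caption":  # string
        return f"caption: {result}"
    if form_name == "nsfw":  # boolean
        return "nsfw: yes" if result else "nsfw: no"
    if form_name == "interrogation":  # dictionary of keyword -> confidence
        best = sorted(result.items(), key=lambda item: item[1], reverse=True)[:3]
        return "keywords: " + ", ".join(f"{word} ({score:.2f})" for word, score in best)
    return f"{form_name}: {result!r}"


def poll_until_done(fetch_status, interval=2.0, max_polls=30):
    """Call fetch_status() until the whole request reports 'done'."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("state") == "done":
            return completed_results(status)
        time.sleep(interval)
    raise TimeoutError("interrogation did not finish in time")
```

Because finished forms show up in the status before the whole request is done, a client could also call `completed_results` on every poll and display each result as soon as it arrives.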
As I mentioned before, the worker code had to be completely refactored. It now lives in a new repository, because the nataili repo will soon turn into a pip package I can install externally, and this is in preparation for that.
To start an interrogation worker, you use the same code, but you start with a different bridge script.
As the models used by the interrogation worker are much more lightweight, it actually benefits more from high threads and high queue_sizes, so feel free to crank those up so that it’s best utilizing your worker. A lot of the new code changes I did to the horde also allow your worker to pick up many forms at the same time, which will cut down on the poll requests to the horde, further reducing idle time.
However, be careful not to set these too high. Because you’re picking up the requests in advance, nobody else will work on them until your worker gets to them. If your queue is high and your threads are low (or slow), then you’ll notice your horde performance is not going to be great.
The result of each form will be sent back as soon as it’s done, though, so as to save as much time as possible.
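A rough back-of-the-envelope way to reason about these two settings is sketched below. It assumes the worker holds `threads + queue_size` forms at once, that every form takes the same time and that all threads are equally fast. None of that is exact, and the function name is made up for this example.

```python
import math


def worst_case_wait(queue_size, threads, seconds_per_form):
    """Estimate how long the last pre-fetched form waits before it is finished."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Assumption: forms being processed plus forms waiting in the local queue.
    picked_up = threads + queue_size
    # Each "round" finishes one form per thread.
    rounds = math.ceil(picked_up / threads)
    return rounds * seconds_per_form


if __name__ == "__main__":
    for threads in (2, 6):
        wait = worst_case_wait(queue_size=10, threads=threads, seconds_per_form=3.0)
        print(f"queue_size=10 threads={threads}: last form done after ~{wait:.0f}s")
```

With a queue of 10 and 3 seconds per form, this estimate gives about 18 seconds for the last form on 2 threads and about 9 seconds on 6 threads. During that time nobody else can pick up those forms, which is why a large queue only pays off when the threads can keep up.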
One more thing to note is that an Interrogation worker is different from the Stable Diffusion worker. As such you cannot use the same name! However they DO use the same bridgeData.py. If you plan to run both types of workers, utilize the command line arguments to tweak the bridge settings accordingly instead of having to change your bridgeData.py all the time.
In the future I plan to further tweak the bridge so that it can run in parallel with the stable diffusion worker to best utilize your spare processing power. I also want to tweak the model loading so that optionally you can offload the whole thing to CPU. But I need to test if the speeds for this make sense first.
Another cool possibility from this refactor is that it opens the doors for different worker types on the horde, which in turn gives me an opening I’ve been considering for a while now, which is the complete merge of the Stable and KoboldAI horde into one service. This will reduce the amount of code juggling I have to do, and hopefully simplify things for everyone with a common kudos system.
I am excited to see what use cases you all will come up with for this new system!
For the past 2 weeks I’ve been trying to build the new feature of the horde which will allow even people with a low powered GPU, or just CPU to join the horde and provide a service to gather kudos.
It is called “Interrogation” as it will “interrogate” source images to discover aspects about them, such as an image caption, or whether they are displaying NSFW content etc. This feature can then be on-boarded into new or existing tools, such as perhaps automatically captioning images for micro-blogging services as an accessibility feature, or a browser plugin for parental controls etc.
However, what I thought would be a bit tricky but doable soon keeps running into various snags. First, I lost one complete week of work from my vacation by getting the nastiest cold I’ve had for the past 10 years at least, which flattened me for almost 9 days. Then it was the holiday period, where I had to pay more attention to family and friends.
Now that I’m finally able to concentrate more on it, I find it’s actually an order of magnitude more complex than I initially expected, requiring me to actually have to update my existing database tables (always a risky proposition) and also redesign my approach to use such things as “Polymorphic tables” so that I don’t end up duplicating hundreds of lines of code between similar classes.
And while I’m doing this, the horde ML backend has been receiving a surprisingly increased pace of improvements, recently implementing depth2img, adding diffusers to voodoo-ray, CodeFormers and a ton of urgent bug-fixes and other improvements.
To say my attention has been split is an understatement.
But I’m slowly but surely making more progress. I hope to have something out soon-ish. I do wonder if it will require a complete horde downtime this time for the DB upgrades. Never done that before. Kinda scary, not gonna lie…
Divided by Zer0 is an epic Pythonista (one of the 4% most active Python users) who spends a lot of time commenting on issues between pushes. Divided is a fulltime hacker who works best in the morning (around 11 am).
Just expressing how happy I am to be hacking again.
Ever since I’ve started coding games on OCTGN and working significantly with python, my enthusiasm for programming has been rekindled. So much so that I’ve pretty much stopped doing almost all other things I was doing lately, including reading reddit.
Today, after I went to bed coding and woke up to code afresh, I got to thinking why I ever stopped doing it. The truth is that I stopped doing it (or rather, never fully started) because I didn’t have a good base in modern programming languages and because I didn’t have a project to draw my interest in learning them. This is actually a big problem for me. Due to the way my ADHD-addled brain works, I find it nearly impossible to work on something that A) doesn’t excite me, B) doesn’t give me a clear short-term goal. And because of this, the thought of first learning a programming language before trying to use it to code something I want to have was overwhelming.
I always wanted to get into python, and I did start with a few introductory texts, but I quickly lost attention. Not being able to do, or having something to do with what I learned, killed my interest. And this is where OCTGN helped: It allowed me to start slow (small incremental improvements), on a working base to work on (other games), on something that interested me (making defunct CCGs I love work online).
Slowly I’ve been building up steam and learning python as I go, which is the only way I know that works for me. The result is that I spent almost all day yesterday getting the automation for the Dune CCG to work, and then re-writing the code to be more intuitive and succinct. This shit’s addicting!
My hope is that once I’ve perfected all the games I want to on OCTGN, I’ll be able to use the knowledge and familiarity I’ve achieved in python to move to some of the other projects currently sitting on the back burner of my brain.