Embedded QR Codes via the AI Horde

Around this time last year, the first controlnet for generating QR codes with Stable Diffusion was released. I was immediately enamored with the idea and wanted to have it as an option on the AI Horde ASAP. Unfortunately, due to a lot of extenuating circumstances [gesticulates wildly], I had neither the time, nor the skills to do it myself, nor the people who could help us onboard it. So this fell by the wayside while far more pressing things were being developed.

Today I’m very excited to announce that I have finally achieved this and deployed it to production! QR code generation via the AI Horde is here!

Using it is fairly simple, assuming your front-end of choice supports it. You simply provide the text you want represented as a QR code; the AI Horde will generate the QR code and then, using controlnet, generate an image with the QR code embedded into it as if it were part of the drawing. You can scan the examples below to see it in action.

You’ll notice that, unlike some of the examples you’ll find elsewhere online, the QR code we generate is still fairly recognizable as a QR code, especially when zoomed out or at a distance. The reason is that the more you blend the code into the image, the less likely it is to remain scannable. The implementation I followed is specifically tailored to sacrifice “embedding” for the sake of scannability.

When you want to generate QR codes, keep in mind that this is a very finicky workflow. The diffusion process can easily “eat” or modify components of the QR code so that the final image is no longer readable. The subject matter and the model used matter surprisingly much. Subjects which are somewhat noisy (such as the brain prompt in the featured image above) tend to give the model enough to work with to reshape that area into a QR code. Whereas no matter how hard I tried, I couldn’t get it to generate a QR code with an anime model and an anime woman as the subject.

Along with the basic option of providing the QR code text, you can also customize the generation further. For example, you can choose where the QR code will be placed in the image. By default we always place it in the center, but sometimes the composition works better if you place it to the side or towards the bottom. You can also choose a different prompt for the anchor squares, increase or decrease the border thickness, and more. Your front-end should hopefully explain these options to you.

If you want to try making some yourself, I’ve already added the necessary functionality to my Lucid Creations front-end, so feel free to give it a try right now.

Continue reading further to get some development details.

The road to making this feature available was fairly long. On top of all the other priorities I had for the horde, we also had the misfortune that one of our core contributors on the backend/ComfyUI side went suddenly missing at the end of summer. As I am still mostly focused on the middleware/API and infrastructure (plus so much more, halp!) and Tazlin is focused on efficiency, code maintenance and quality, we didn’t have the necessary skills to add something as complex as QR code generation.

Once it was clear that our contributor wasn’t coming back and nobody else was stepping up to help, I finally accepted that if I wanted it done, I would have to learn to do that part myself as well. So over the past few months I embarked on a journey of adding more and more complex ComfyUI workflows. First came Stable Cascade, which required me to build code that can load two different model files at the same time. Then Stable Cascade Remix, which required wrangling up to five source images together.

Note that I’m mostly re-using existing, fairly straightforward ComfyUI workflows for these tasks. I don’t have the bandwidth to learn ComfyUI itself in much depth. But making those workflows function within the horde-engine, with payloads that are sent via the AI Horde REST API, is quite a lot of complex work on top of them. As I hadn’t built this “translation layer” myself, I had been avoiding that area of the code until now, and this work helped me build up enough knowledge and confidence to pull off translating a much, much more complex ComfyUI workflow like the QR codes.

So after many months, I decided it was finally time to tackle this problem. The first issue was getting an actually good QR code ComfyUI workflow. Unlike the previous workflows I used, it was surprisingly difficult to find something that worked out of the box. Most simple QR code workflows both required that the QR image be generated externally and produced mostly unscannable images.

I was fortunate enough to run into this excellent devlog by Corey Hanson who not only provided instructions on what works and what doesn’t for QR codes, but even provided a whole repository with prebuilt ComfyUI workflows and a custom node which would also generate a QR code as part of the workflow. Perfect!

Well, almost perfect. It turns out the provided ComfyUI workflows were fairly old, and at the rate generative AI progresses, even a couple of months can make something too stale to use. On top of that, their examples used a lot of extra custom nodes that didn’t parse, which a ComfyUI newbie like me had to untangle. Finally, those workflows were great, especially for local use, but a bit overkill for horde usage.

So the first order of business was to understand the workflow and then simplify it down to the bare minimum needed to get a QR code. Honestly, it took me a while just to get the workflow running in ComfyUI itself and half-way understand what all the nodes were doing. After that I had to translate it to the horde-engine format, which by itself required me to refactor how I parse all ComfyUI workflows to make it more maintainable in the future.

Finally, QR codes require a lot more potential text inputs, which I didn’t want to start storing explicitly in the DB as new columns since they’re only used for this specific purpose. So I had to come up with a new protocol for sending an open-ended amount of extra text values. Fortunately, I already had the extra_source_images code deployed, so I just copied part of the same logic to speed things up.
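As a rough illustration, here is a minimal Python sketch of what such a request could look like against the AI Horde’s async image endpoint. The exact field names (workflow, extra_texts and their reference values) are assumptions based on my understanding of the feature, so check the official API docs or your front-end before relying on them.

```python
import requests

# Hypothetical sketch of a QR code generation request.
# Field names like "workflow" and "extra_texts" are assumptions;
# consult the AI Horde API documentation for the authoritative schema.
payload = {
    "prompt": "a colourful cross-section of a human brain, intricate circuitry",
    "params": {
        "workflow": "qr_code",  # ask for the QR code ComfyUI workflow
        "width": 1024,
        "height": 1024,
    },
    # Open-ended list of extra text values, used only by this workflow.
    "extra_texts": [
        {"text": "https://aihorde.net", "reference": "qr_code"},
        {"text": "circuit board", "reference": "function_layer_prompt"},
    ],
    "models": ["AlbedoBase XL (SDXL)"],
}

response = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json=payload,
    headers={"apikey": "0000000000"},  # anonymous key; use your own for priority
    timeout=30,
)
print(response.json())  # returns an id you can poll for the finished image
```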

And then it was time for unit tests, the public beta, and all the bug-fixing that comes with them. That’s when I realized that the results on SD 1.5 models were a bit…sucky, so I went back to ComfyUI itself and actually figured out how to make the workflow work with SDXL as well. The results were way more promising.

Unfortunately, while the SDXL QR codes are way nicer, the requirements to generate them are almost triple those of SD 1.5. Not only does one need to run SDXL models, but SDXL controlnets are almost as big as the models themselves. The QR code controlnet is 5GB on its own, and all of that needs to be loaded into VRAM at the same time as the model. This means that even mid-range GPUs struggle to generate SDXL QR codes in a reasonable amount of time. It also meant I had to adjust the worker to give people serving SDXL models the option to skip SDXL controlnets, and to properly route this switch via the AI Horde.

Nevertheless, this is an area where the AI Horde shines, as those with the necessary power can support those who need it. Most people would find it really hard or frustrating to generate even a single QR code, never mind an SDXL one, only to discover that it’s unscannable; but through the horde they can easily generate dozens with very little expertise and find the one that works for them.

So it’s been a long journey, but it’s finally here, and the expertise I gained along the way means I now have enough knowledge to start adding more features via ComfyUI. Stay tuned for more awesome workflows on the AI Horde!

Eudaimonia community

I thought it might be interesting to point out that I opened a new community on the Divisions by zero lemmy to post things about content living, as I couldn’t find any other fitting space. There just aren’t a lot of places where one can share articles and discuss such topics without them devolving into spiritualism or self-help guru grifts, both of which I intensely dislike.

So that community is for posting about these things in a materialist context, with a preference for empiricism and scientific thinking, but squishier secular philosophy is also encouraged for topics which don’t lend themselves well to empiricism.

If you’ve been around, you probably know I’m going to be posting some Epicurus sooner or later 😀

Take a look and post some relevant stuff you run into.

Image Remix on the AI Horde

The initial deployment of Stable Cascade (SC) on the AI Horde supported just text2image workflows, but that is only a subset of what this model can do. We still needed to onboard the rest of its capabilities.

One such capability was the “image variations” option, which allows you to send an image to the model and get a variation of that image back, perhaps with extra stuff added in, using the unCLIP technology. This required quite a bit of work on hordelib, since it uses a completely different ComfyUI workflow, but ultimately it wasn’t much harder than adding img2img capabilities to SC.

The larger difficulty came when I wanted to add the feature to remix multiple images together. The problem being that until now the AI Horde only supported sending a single source image and a single source mask, so a varying amount of images was not possible at all.

So to support this, I needed to touch all areas of the AI Horde. The AI Horde itself had to accept each image, upload it to my R2 bucket and provide individual download links. The SDK had to know to expect those images and provide methods to download them in parallel to avoid delays. The reGen worker then had to be able to receive those images and hand them to hordelib, which needs to know how to dynamically adjust a ComfyUI pipeline on-the-fly and add as many extra nodes as required.
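To illustrate the idea (not the actual hordelib code), here is a minimal Python sketch of dynamically injecting extra image-loading nodes into a ComfyUI-style pipeline dict. The node names and field layout are simplified assumptions, not the real horde-engine format.

```python
import copy

# A trimmed-down, hypothetical ComfyUI-style pipeline: node_id -> node definition.
BASE_PIPELINE = {
    "unclip_conditioning": {
        "class_type": "unCLIPConditioning",
        "inputs": {"conditioning": ["clip_encode", 0]},
    },
    # ... sampler, VAE decode, etc. omitted for brevity ...
}

def inject_extra_images(pipeline: dict, image_paths: list[str]) -> dict:
    """Return a copy of the pipeline with one image-loading node per source image,
    each wired into the conditioning node. Purely illustrative."""
    new_pipeline = copy.deepcopy(pipeline)
    for index, path in enumerate(image_paths):
        node_id = f"load_image_{index}"
        new_pipeline[node_id] = {
            "class_type": "LoadImage",
            "inputs": {"image": path},
        }
        # Wire the loaded image into the conditioning step (simplified).
        new_pipeline["unclip_conditioning"]["inputs"][f"image_{index}"] = [node_id, 0]
    return new_pipeline

# Example: a remix request with three source images.
pipeline = inject_extra_images(BASE_PIPELINE, ["avatar.png", "logo.png", "photo.png"])
print(len(pipeline), "nodes in the adjusted pipeline")
```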

So after two weeks of developing and testing, we finally have this feature available. If your Horde front-end supports the “remix” feature, you can send between 1 and 6 images to this workflow along with a prompt, and it will try its best to “squash” them all together into one composition. Note that the more images you send, and the larger the prompt, the harder it is for the model to “retain” all of them in the composition. But it will try its best.

As an example, here’s how the model remixes my own avatar. You’ll notice that the result captures the general concepts of the image, but doesn’t follow it exactly, as it’s not doing img2img. The blur is probably caused by the need to upscale my original image, which is something I’d like to fix on the next pass.

Likewise, this is the Haidra logo

And finally, here’s a remix of both logo and avatar together

Pretty neat, huh?

This ability to send extra source images also lays the groundwork for the Horde to support things like InstantID, which I hope I’ll be able to work on supporting soon enough.

The playground schematic analogy for designing a fediverse service

In recent days, the discussion around Lemmy has become a bit…spicy. There are a few points of impact here. To list some examples:

This is not an exhaustive list. There’s significant grumbling about lemmy under the mastodon hashtag too.

On the flipside, there’s also been positive reinforcement towards lemmy and its dev, as can be seen by the admin of lemm.ee and many of the lemmy ecosystem in that thread.

You’ll notice all of these are frustrations about the lack of sufficient moderation features in lemmy’s tool-set. This typically comes from a lemmy admin’s perspective and the things that are very important for protecting themselves and their communities.

In the discussions around these issues, a few common arguments have been made which, while sounding reasonable at face value, I think are the wrong thing to say in the situation at hand. The problem is that the one making these arguments feels like they’re being more than fair, while the ones receiving them feel dismissed or disrespected.

Before I go on, I want to make clear that I am writing this from a place of support. I have been supporting lemmy since years before the big lemmy exodus, and after making my own lemmy instance, I have created dozens of third-party tools to help the ecosystem, because I want lemmy to succeed. That is to say, I’m not a random hater. I am just dismayed that the community is splintering like this over what seems to me to be primarily a communication issue.

One of the analogies made in the sunaurus thread likened lemmy development to designing a playground. It strikes me that this analogy is perfect, but not for the reason its author expects. Rather, it is perfect for exemplifying how someone who wholeheartedly supports FOSS developers can still misjudge the situation and escalate it through miscommunication.

In this analogy, the commenter likened lemmy development to building a playground, with external people asking for some completely unrelated feature, like a bird-watching tower, and expecting the developers to give it priority. The problem is that the analogy is flawed. The developers are not building a playground for themselves. They’re building a playground schematic, which they expect people would and should deploy in many other locations.

Some people might indeed ask for “bird-watching observation posts” in such a schematic, and it would be more than fair to ask them to build those themselves. But it is fallacious to treat any and all requests as something as out of scope as this. Some people might request safety features for the playground, and those should absolutely be given more priority. We already know what can happen if you design a shitty playground, even if you give it away for free!

To extend this analogy, the other lemmy admins are not asking for luxury features. They are asking for improvements to the safety of the playground. Some people point out that metal slides become dangerous depending on the weather. Others point out that the playgrounds might be built in very unsafe areas, so a fence to protect the children from predators should be mandatory.

And here is where the disconnect in communication happens. The overworked developer is already busy designing the next slide which can get them paid, or making sure things don’t break down as fast, etc., and they perceive the safety requests as “luxury” items that people should deal with themselves. However, for the people who have to deal with upset parents and missing children, this dismissive attitude comes across as downright malicious.

And thus you have a situation where both sides see the other as unreasonable. The devs see the people asking for safety features as entitled, while the people suffering through the lack of those safety features perceive the developers as out-of-touch and dismissive.

Leave such a situation to fester long enough, and you start to get exactly the situation we have now: the Lemmy software getting a bad reputation in the circles most concerned about safety, while forks and rebuilds are popping up.

All of this hurts the whole FOSS ecosystem by splintering development effort into multiple projects instead of collaborating on a single one. It turns our strength into a weakness!

This also brings me to another argument I see the lemmy devs making somewhat too often: that they don’t have anything to gain from a larger community and just get more headaches. I always felt this was a patently absurd statement!

The lemmy devs are making more than 3K a month from lemmy, enough that they claim to be working on lemmy full-time. These funds don’t come because they’re running or developing a single forum for themselves. They come because they provide the “playground schematic”. If the community splinters into software other than lemmy, the funds going towards lemmy development will naturally dry up as well.

This statement is completely upside down. The more people there are using and hosting lemmy, the more the lemmy developers benefit.

I would argue that the people the lemmy developers should be listening to most are exactly the people hosting lemmy instances. These are the people putting incalculable hours into running and maintaining the servers, and often paying out of pocket every month, to provide a service to others. Each admin is basically free value to the lemmy developers.

My position, in fact, is that this scales down in layers. Lemmy devs need to listen to instance admins most. Instance admins need to listen to community mods most. And finally, community mods should listen to their users most. In this way you create a bottom-up feedback mechanism that doesn’t easily overwhelm any single person, and everyone has a chance to be heard.

In the AI Horde, I follow a similar approach. The segment of my community I listen to the most is in fact not the ones giving me money. It’s the ones providing their free time and idle compute for no other benefit than their own internal drive: the workers. They are effectively using their time for the benefit of the whole ecosystem, which indirectly benefits me most. Without the workers there would practically be no AI Horde, even if I were the only one remaining. Likewise, without the lemmy admins, lemmy as software would be dead, even if the lemmy.ml people kept hosting it forever.

So what can be done here? I think an important aspect is to make sure we are talking on the same wavelength. Cutting down on miscommunication is very important to avoid exacerbating an already precarious situation.

Secondly, it’s totally understandable that the lemmy devs don’t have enough time for everything. But likewise, there are a ton of people who need safety features and can’t get them. As such, in my opinion, priority should be put into building the frameworks that allow people to more easily extend lemmy’s functionality, even if it doesn’t match the lemmy developers’ vision or immediate roadmap.

For this reason I strongly suggest that effort should be put into developing a plugin framework for lemmy. It’s ironic to suggest a completely different feature when the problem is too many things to do, but this specific feature is meant to empower the larger community to solve their own problems more easily. In the long term, it would massively reduce the incoming demands on the lemmy developers.

In the meantime, I urge people to always consider that there’s a human behind the monitor on the other side. A lot of the time people don’t have the skills to effectively communicate what they mean, which is even harder in text form. We all need to be a bit more charitable about what the other side is trying to say, especially when we’re trying to collaborate on a common FOSS project.

Post-Mortem: The massive lemmy.world -> lemmy.dbzer0.com federation delays

A couple days ago, someone posted on /0 (the meta community for the Divisions by zero) that the incoming federation from lemmy.world (the largest lemmy instance by an order of magnitude) is malfunctioning. Alarmed, I started digging in, since a federation problem with lemmy.world will massively affect the content my community can see.

As always my first stop was the Lemmy General Chat on Matrix where I asked the lemmy.world admins if this appears to be something on their end. To their credit both their lead infra admin and the owner himself jumped in to assist me, changing their sync settings, adding custom DNS entries and so on. Nothing seemed to help.

But the problem must still be somewhere in lemmy.world, I thought. It’s the only instance where this is happening, and they upgraded to 0.19.3 recently, so something must have broken. But wait, this didn’t start immediately after the upgrade. Someone pointed out this very useful federation status page, which kinda pointed to the problem being only on lemmy.world.

Not quite: other big instances like lemmy.ml and lemm.ee were not having any issues federating with lemmy.world (even though two dozen others like lemmy.pt were), and they are as big as, if not bigger than, lemmy.dbzer0.com. A problem originating from lemmy.world couldn’t possibly be affecting only some specific instances. To make matters worse, both lemmy.ml and I use the same host (OVH), so I couldn’t even blame my hosting provider.

So obviously the main culprit is somewhere in my backend, right? Well, maybe. Problem is, none of the components of my infrastructure were overloaded; everything was sitting between 5-15% utilization. Nothing to even worry about.

OK, so first I needed to make sure it wasn’t somehow a network issue specifically between me and lemmy.world. I know OVH gave me a bum floating IP in the past and were completely useless at even understanding that their floating IP was faulty, so I had to stop using it. Maybe there was some problem with my loadbalancers.

Still, I’m using haproxy, which is nothing if not fast and rock solid, so I didn’t really suspect the software. Rather, maybe it was a network issue with the LB itself. So the first thing I did was double the number of loadbalancers in play by setting my DNS record to also point to my secondary LB. This should lessen the amount of traffic hitting each LB and route some of it through a completely different VM, and thus show whether the problem was on the haproxy side. Sadly, this didn’t improve things at all.

OK so next step, I checked how long a request takes to return from the backend after haproxy sends it over. The results were not good.

I don’t blame you if you can’t read this, but what it basically says is that a POST request to my /inbox took between 0.8 and 1.2 seconds. This is bad! This is supposed to be a tiny payload telling you an event happened on another instance; it should be practically instant.

Even weirder, this was affecting all instances, not just lemmy.world. So this was clearly a problem on my end, but it also confused me: why was I not having trouble with other instances? The answer came when I was informed that 0.19.3 added a brand new federation queue.

You see, old versions of lemmy used to send all federation actions over as soon as they received them, fire-and-forget style. This naturally led to federation events being dropped due to a myriad of issues, like network problems, downtimes, gremlins, etc. So you would lose posts, comments and votes, and you would (probably) never realize.

The new queue added order to this madness by making each instance send its requests serially. A request would be sent again and again until it succeeded, and the next one would only be sent once the previous one was done. This is great for instances not experiencing issues like mine. You see, at this point I was processing approximately 1 incoming federation request per second, while lemmy.world was generating around 3. Even worse, I would occasionally time out by exceeding 10 seconds of processing, causing 2 more seconds of wait time.

Unlike lemmy.world, the other instances federating with mine didn’t have nearly as much activity, so 1 per second was enough to keep in sync with them. This explained why I was seemingly only affected by lemmy.world and nobody else: I was somewhat slow, but only slow enough to notice when the source had too much traffic.
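To make the arithmetic concrete, here is a tiny Python sketch (my own illustration, not lemmy code) of how a serial queue falls behind when the sender produces activities faster than the receiver can process them:

```python
# Illustrative only: a serial federation queue where the sender waits for each
# activity to be processed before sending the next one.
incoming_rate = 3.0      # activities per second generated by lemmy.world
processing_time = 1.0    # seconds my instance needed per activity (0.8-1.2s observed)

backlog = 0.0
for hour in range(1, 7):
    seconds = 3600
    produced = incoming_rate * seconds
    processed = seconds / processing_time   # serial: one at a time
    backlog += produced - processed
    print(f"after {hour}h the backlog is ~{int(backlog):,} activities")
# after 1h the backlog is ~7,200 activities, and it only keeps growing,
# which is why the delay from lemmy.world kept increasing instead of recovering.
```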

OK, we knew the “what”; now we needed to know the “why”.

At this point I started to suspect something was going on with my database, so I had to start digging into stuff I’m really not that familiar with. This is where the story gets quite frustrating, because there just aren’t a lot of admins in the chat who know much about the DB side of lemmy’s internals. So I would ask a question or provide logs, and then sometimes had to wait hours for a reply. Fortunately both sunaurus from lemm.ee and phiresky were around and could review some of my queries.

Still, I had to know enough SQL to craft and fine-tune those queries myself, and how to enable things like pg_stat_activity, etc.

Through trial and error we did discover that some insert/update queries were taking a bit too long to do their thing, which could mean we were I/O bound. Easy fix: disable synchronous_commit, sacrificing some safety for speed. Those slow queries went away, but the problem remained the same. WTF?!

There was nothing else clearly slow in the DB, so there was nothing more we could do there. My next thought was: maybe it’s a networking issue between my loadbalancers and my backend. OK, so I needed to remove that from the equation. I set up haproxy directly on top of my backend, which would let the traffic go through the loopback interface with 0 latency. For this I had to ask the lemmy.world admins to kindly add lemmy.dbzer0.com directly to their /etc/hosts file so they alone would hit my local haproxy.

No change whatsoever!

At this point I’m starting to lose my mind. It’s not networking between my LB and my backend, and it’s not the DB. It has to be the backend. But it’s not under any load and there are no errors. Well, not quite: there are some “INFO” logs which refer to lost connections or unexpected errors, but nobody in the chat seems to worry about them.

Right, that must mean the problem is networking between my backend and my database, right? Unlike most lemmy instances, I keep my lemmy DB and my backend on separate servers. Also, the DB has a limited number of connections, and the lemmy backend itself limits itself to a small connection pool. Maybe I was running out of connections because of slow queries?

OK, let’s increase that to a couple thousand and see what happens.

Nothing happens, that’s what happens. Same 1 request per second.

As I’m spiraling more and more towards madness and the chat is running out of suggestions, sunaurus suggests adding some extra debugging to lemmy so that I can run it and figure out which DB query is losing time. Great idea. Problem is, I have to compile lemmy from scratch to do that. I’ve never done that before. Not only that, I barely know how to use docker in the first place!

Alright, nothing else I can do; got to bite the bullet. So I clone the lemmy backend and, while waiting for sunaurus to come online, start hacking at it to figure out how to compile a lemmy backend docker image from scratch. I run into immediate crashes and despair. Fortunately nutomic (one of the core devs) walked by and told me the git commands to fix it, so I could proceed with cooking my very first lemmy container. Then nutomic helped me realize I didn’t need to set up a whole online repo to transfer my docker container. The more you know…

Alright, so I cooked a container and plugged it into a whole separate docker infra, connected only to the lemmy.world loadbalancer, so I could filter out all logs except federation requests. So far so good.

Well, not quite. Unfortunately, I forgot that the “main” branch of lemmy is actually the development branch and has untested code in it. So when I was testing my custom docker deployment, I migrated my DB to whatever experimental schema is on main. Whoops!

OK, nothing seemingly broke. Problem for a different day? No, just foreshadowing.

Finally sunaurus comes back online and gives me a debug fork. I eagerly compile it, deploy it on prod and send some logs to sunaurus. We were expecting to see 1 or 2 struggling queries, maybe a bad lock situation somewhere. We did not expect to see ALL queries, including the simplest ones such as looking up a language, take 100ms or more! That can’t be good!

Sunaurus connects the dots and asks the pertinent question: “Is your DB close to the Backend, geographically?”

Well, “Yes”, I reply, “I got them in the same datacenter”. “Can you ping?” he asks.

OK, I ping. 25ms. That’s good, right? Well, in isolation, that’s great. Where it’s not so great is backend-to-DB communication! That’s like a 1000+ km distance.

You see, typically a loadbalancer makes one request to the backend and gets one reply, so a 25ms roundtrip is nothing. However, a backend talks to the DB a lot. In this instance, for every incoming federation action the backend does something like 20 database calls to verify and submit it. Multiply each of those by a 25+25ms roundtrip and you get 1000ms of extra latency before any actual processing on the DB!
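If you ever need to check for this yourself, a quick way is to time a trivial query from the backend host; anything much above ~1ms per round trip points at network distance rather than query cost. A rough sketch using psycopg2 (the connection details are placeholders, not my real setup):

```python
import time
import psycopg2  # pip install psycopg2-binary

# Placeholder connection details; point this at your own lemmy database.
conn = psycopg2.connect(host="db.example.internal", dbname="lemmy",
                        user="lemmy", password="secret")
cur = conn.cursor()

samples = []
for _ in range(50):
    start = time.perf_counter()
    cur.execute("SELECT 1;")   # trivial query: measures round trip, not query cost
    cur.fetchone()
    samples.append((time.perf_counter() - start) * 1000)

avg_ms = sum(samples) / len(samples)
print(f"average round trip: {avg_ms:.2f} ms")
# ~0.1-1 ms: same host or same DC.  ~25 ms: your backend is in another city,
# and 20 calls per federation action will cost you a full second each.
print(f"estimated overhead for 20 calls/action: {avg_ms * 20:.0f} ms")
```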

But how did this happen? I was convinced all my servers were in the same geographic area. So I go to my provider panel and check. Nope, all my servers BUT the backend are in the same geographic area. My backend happens to be around 2000 km away. Whoops!

Turns out, when I migrated my backend back when I ran into performance issues, I failed to pay attention to that little geographic detail. Nevertheless, it all worked perfectly well until this specific set of circumstances, where the biggest lemmy instance upgraded to 0.19.3, which introduced serial federation that my slow-ass connection couldn’t keep up with. In the past, I would just get flooded with sync requests from lemmy.world as they came. I would be slow, but I’d process them eventually. Now the problem became obvious.

Alright, it’s time to roll up my sleeves and migrate servers! Thank fuck I have everything written as code in Ansible, so the migration was relatively painless (other than slapping Debian 12 around to let me do fucking docker-compose operations with python, goddamnit!).

A couple of hours later, I had migrated my backend to the same DC as the database, and as expected, my ingestion time for federation actions suddenly dropped to the order of 50ms instead of 1000ms. This meant I could ingest closer to 20 actions per second from lemmy.world, while it was only generating around 3/s of new activity from its userbase. Finally we started catching up!

All in all, this has been a fairly frustrating experience and I can’t imagine anyone who’s not doing IT Infrastructure as their day job being able to solve this. As helpful as the other lemmy admins were, they were relying a lot on me knowing my shit around Linux, networking, docker and postgresql at the same time. I had to do extended DB analysis, fork repositories, compile docker containers from scratch and deploy them ad-hoc etc. Someone who just wants to host a lemmy server would give up way earlier than this.

For me, a very stressful part was the lack of replies in the chat. I would sometimes write pages of debug logs and get no reply from anyone for 6 hours or more. It gave me the impression that nobody had any clue how to help me and that I was on my own. In fact, if it weren’t for sunaurus specifically, who had enough infrastructure, Rust and DB chops to get an insight into where it was all going wrong, I would probably still be out there, pulling my hair out.

As someone hosting a service like this, especially when it has 12K people in it, this is very scary! While 2 lemmy core developers were in the chat, the help they provided was very limited overall and this session mostly relied on my own skills to troubleshoot.

This reinforced my belief that, as much as I like the idea of lemmy (or any of the other threadiverse software), it is only something experts should try hosting. Sadly, this will lead to more centralization of the lemmy community into a few big servers instead of many small ones; but given the nature of the problems one can encounter, and the lack of support for fixing them if you’re not an expert, I don’t see another option.

Fortunately this saga ended and we’re now fully in sync with lemmy.world. Ended? Not quite. You see, today I realized I couldn’t upload images on my instance anymore. Remember when I accidentally ran the development branch of lemmy from main? Welp, that broke image uploads. So I had to learn how to downgrade a lemmy instance as well. Fortunately sunaurus had my back on this too!

To spare some people the pain, I’ve sent a PR to the lemmy docs to expand the documentation for building docker containers and doing troubleshooting. My pain is your gain.

This also gave me insight into how lemmy federation will eventually break when a single server (say, lemmy.world) grows big enough to start overwhelming even servers that are not badly set up like mine was. I have some ideas for working around some of this, so I plan to write up a suggestion on how to become more future-proof, which would incidentally also prevent the issue that happened to me in the first place.

In the meantime, enjoy the Divisions by zero, which as a result of the migration should now feel massively faster as well!

Stable Cascade on the AI Horde!

A while ago Stability.ai released a new model on a different architecture that seems to provide very promising results and very fast training: Stable Cascade. I really wanted to offer it on the AI Horde, so after getting explicit permission from Emad in Reddit PMs (due to its more restrictive license for APIs), I set out to implement it.

Unfortunately, the Stable Cascade model and its ComfyUI workflow require two different checkpoints, which went against the AI Horde worker paradigm at the time of one file per model, so I had to make changes in a lot of packages which expected that paradigm. The worker, hordelib, the model reference and its SDK all required tweaking to avoid crashing.

Fortunately, while the changes were complicated, I managed to implement them without much debugging. I did initially run into some trouble with the image quality being garbage, which turned out to require ComfyAnon tweaking the implementation in ComfyUI a bit, but once that was done everything fell into place. Now you can use the AI Horde to request Stable Cascade images and check out the capabilities of this model, even if you don’t have 20GB of VRAM to spare.

You can try it out on Artbot

Alongside Stable Cascade, I thought it was high time we started expanding our SDXL model selection, so the following models have also been onboarded.

  • Juggernaut XL
  • Anime Illust Diffusion XL
  • Pony Diffusion XL
  • Animagine XL
  • DreamShaper XL (Lightning version)

We quickly realized that we also needed to expand our model reference to better inform people of the requirements of some of these models. For example, Pony Diffusion XL doesn’t work unless you set clip_skip to 2, and DreamShaper requires low steps, low cfg and specific samplers. If you set those settings correctly you’ll get amazing images; otherwise you get hot garbage. Soon the horde will warn you when you try to use a model outside its specifications.
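As a rough illustration, here is how those per-model settings might look when building a request payload. The parameter names (clip_skip, steps, cfg_scale, sampler_name) follow the AI Horde image API as I understand it, and the values are indicative only, so verify them against the model reference and API docs.

```python
# Hypothetical per-model generation settings; values are indicative, not official.
MODEL_SETTINGS = {
    "Pony Diffusion XL": {
        "clip_skip": 2,            # the model produces garbage without this
    },
    "DreamShaper XL (Lightning)": {
        "steps": 8,                # lightning models want very few steps...
        "cfg_scale": 2.0,          # ...and a low cfg
        "sampler_name": "k_dpmpp_sde",
    },
}

def build_params(model: str, base_params: dict) -> dict:
    """Merge the model-specific requirements over the user's base parameters."""
    return {**base_params, **MODEL_SETTINGS.get(model, {})}

print(build_params("Pony Diffusion XL", {"steps": 30, "cfg_scale": 7.5}))
# -> {'steps': 30, 'cfg_scale': 7.5, 'clip_skip': 2}
```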

Other than that, we haven’t been completely idle. Some other notable achievements in the previous weeks are:

Firstly, the AI Horde now supports an educator role for accounts. If you are an educational institution and you want to use one of the AI Horde’s free tools in the classroom, you can request that your account be set as an educator, which will force all your requests to be SFW and increase your account’s concurrency.

I also spent some time improving the AI generation of the Mastodon bot @dungeons, so that it gets nicer images for each campaign protagonist. I will admit I had a lot more fun than I should have improving the versatility and variability of the generations and tweaking the results for each model. You can see (or follow) the results on the dedicated account replying with those images.

On the worker side, Tazlin has also been very busy improving the efficiency of our generations. We have now added improvements such as downloading the LoRAs for the next job while performing inference for the previous one, and extra optimizations for people with more powerful machines.

I’m now hard at work trying to onboard more Stable Cascade capabilities as they are added to ComfyUI and to add support for more advanced workflow capabilities.

Can we improve the Fediverse Allow-List Model?

Looks like someone really kicked the hornet’s nest recently on mastodon by announcing (not even deploying) a Mastodon-BlueSky bridge. Just take a look at the github comments here to get an idea of how this was received.

Plenty of people way more experienced than me have weighed in on this issue, so I don’t feel the need to add my two cents. However, I wanted to talk about a very common counter-argument made towards those who do not want such bridges to exist: namely, that the Fediverse already provides the tools to make such a bridge a non-issue, i.e. the allow-list model.

The idea is that if your ActivityPub server rejects all federation by default except with trusted instances, then such bridges pose no problem whatsoever. The bridge (and any potential undercover APub scrapers) would not be able to get to your instance anyway.

Naturally, the counter-argument is that this is way too limiting to one’s reach, and that people shouldn’t be forced into isolation like this. Unfortunately, the alternative here appears to be trying to scold others into submission, and that is unlikely to be a long-term solution. Eventually the Eternal September will come to the Fediverse. If you’ve spent the past few years relying on peer pressure to enforce social norms, then an influx of people who do not share your values is going to make that tactic moot.

In fact, we can already see the pushback to the scolding tactics unfolding right now.

The solution, then, has to be improving the way we handle such scenarios: improving the tooling and our tactics so that such bridges and scrapers cannot be an issue.

A lot of the frustration I feel also comes down to the limited set of tools provided by Mastodon and other Fediverse services. A lot of the time, the improvement of tooling is stubbornly refused by the privileged core developers who don’t feel the need to support the needs of the marginalized communities. But that doesn’t mean the tooling couldn’t be expanded to be more flexible.

So let’s think about the allow-list model for a moment. The biggest issue with an allow-list is not necessarily that the origin server restricts itself from the wider discussion. In fact, they’re probably perfectly happy with that. The problem is that if this became the norm, it would massively restrict the biggest strength of the Fediverse, which is that anyone can create and run their own server.

If I make a new server and most of the people I want to interact with are in allow-list mode, how do I even get in? We then have to start creating informal communication channels where one has to apply to join the allow-circle. Such processes have way too many drawbacks to list: naturally marginalizing neurodivergent people with Rejection Sensitivity Dysphoria, balkanizing the Fediverse, empowering whisper networks and so on.

I want to instead suggest an alternative hybrid approach: the Feeler network (provisional name).

The idea is thus: you have your well-protected servers in allow-list mode. These are the servers which require protection from the constant harassment that comes when their posts spread publicly. These servers have a few “Feeler” instances they trust on their allow-list. Those Feeler servers in turn do not have allow-mode turned on, but rely on blocklists as usual. Their users would be those privileged enough to be able to handle the occasional abuse or troll coming their way before blocking them.

So far so good; nothing changes here. However, what if those Feeler servers could also use their wider reach to see which instances are cool and announce that to the servers that trust them? Say a new instance appears in your federation. You, as a Feeler server, interact with it for a bit, nothing suspicious happens, and its users all seem ideologically aligned enough. You then add it to a public “endorsed list”. Now all the servers in your trust circle who are in allow-mode see this endorsement and automatically add that instance to their allow-lists. Bam! Problem solved. New servers have a way to be seen and eventually come into reach of allow-list instances through a sort of organic probation period, and allow-listed servers can keep expanding their reach without private communications and arduous application processes.

Now you might argue: “Hey Db0, yes my feelers can see my allow-list server posts, but if they boost them, now anyone can see them, and now they will be bridged to bluesky and I’m back in a bad spot!”

Yes, this is possible, but it’s also technically solvable. All we need to do is make the Feeler servers federate boosted posts originating from servers in allow-mode only towards the servers those origins already allow. So let’s say servers T1 and T2 are instances in allow-list mode which trust each other. Server F1 is a Feeler server trusted by T1 and T2. Server S1 is an external instance that is not blocked by F1, but not yet endorsed either. A user on F1 boosts a post from T1. Normally a user on S1 would see that post by following that user. All we need to do is change the software so that when F1 boosts a post from T1, the boost only federates towards T2 and the other instances in T1’s allow-list, instead of everyone. Sure, this would require a bit more complexity around boosts, but it’s nothing impossible. Let’s call this “protected boost”.
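As a conceptual sketch only (this is my proposal, not existing Mastodon behaviour), the routing decision for such a protected boost could look something like this:

```python
# Conceptual sketch of "protected boost" delivery, using the servers from the
# example above. None of this exists in Mastodon today.
ALLOW_LISTS = {
    # instances in allow-list mode and who they allow
    "T1": {"T2", "F1"},
    "T2": {"T1", "F1"},
}

def boost_recipients(origin: str, follower_instances: set[str]) -> set[str]:
    """Decide which instances receive a boost of a post that originated on `origin`.

    If the origin is in allow-list mode, the boost only federates to instances
    the origin itself allows; otherwise it federates normally.
    """
    if origin in ALLOW_LISTS:
        return follower_instances & ALLOW_LISTS[origin]
    return follower_instances

# F1 boosts a post from T1. Its followers live on T2 and on the unendorsed S1.
print(boost_recipients("T1", {"T2", "S1"}))   # -> {'T2'}  (S1 never sees it)
print(boost_recipients("S1", {"T2", "S1"}))   # -> {'T2', 'S1'} (normal boost)
```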

Of course, this would require all APub software to expose an “endorsement” list for this to work. This is where the big difficulty comes from, as you now have to herd the cats that are the multitude of APub developers into adding new functionality. Fortunately, this is where tools like the Fediseer can cover for the lack of development, or outright rejection, by your software’s developers. The Fediseer already provides endorsement functionality along with a full REST API, so you can already implement this Feeler functionality with a few simple scripts!

The “protected boost” mode would of course require the Mastodon developers to do some work, as it relies on software internals which cannot be easily hacked by server admins. But this too could potentially just be a patch to the software that only Feeler admins would need to run.

The best part of this approach is that it doesn’t require any communication whatsoever. All it needs is for the Feeler admins to actively curate their endorsements (either on the Fediseer, or locally if it’s ever added to the software). Then all an allow-list server has to do is choose which Feelers it trusts and “subscribe” to their endorsement lists for its own allow-list. And of course, it can synchronize or expand its allow-list further as it wishes. This approach naturally turns the distributed nature of the Fediverse into a strength instead of a weakness!

Now, personally I’m a big proponent of the “human touch” in social networks, so I feel that endorsement lists should be a manual mechanism. But if you want to take this to the next level, you could also easily set up a mechanism where newly discovered instances automatically pass into your endorsement list after X weeks/months of interaction with your users without reports, X amount of likes, or whatever. Assuming on-point admins, this could turn Feeler servers into widely trusted gateways into a well-protected space on the fedi, one which bad actors would find extraordinarily difficult to infiltrate regardless of how many instances they spawn. And this network would still keep increasing its reach constantly, without adding an extraordinary amount of load on its admins.

Barring the “protected boost” mode, this concept is already possible through the Fediseer. The scripts to do this work already exist as well. All it requires is for people to attempt to use it and see how it functions!

Do point out pitfalls you foresee in this approach and we can discuss how to potentially address them.

Error Codes and Styling

Does the above image look scary? If so, you might just be a software developer!

The above is the result of a long-time-coming but massive pull request to standardize the formatting of the AI Horde code. I’ve been meaning to do this ever since I discovered the black and ruff tools, but I’d been procrastinating for almost as long. Well, I finally somehow got my ass in motion to do it. Including writing tests and doing some careful regression testing, it took me about a week in total. And I still didn’t apply all of the ruff checks either.

What this means is that from now on, anyone sending a change can simply run ruff . --fix && black . and it will automatically format all changes to match our standards, making the code predictable to read and reducing some bad programming practices and potential tech debt.

Also, as a software dev, finally doing this kind of operation is so satisfying. Not much fun to do, but you’re very happy once it’s done. What’s a good analogy for this? A peeling session? (Post your best analogies in the comments.)

Soon after, I also deployed another change that might be useful for AI Horde integrators out there: I have now added unique error return codes to each error message from the horde. This should make it easier to handle the various errors the horde might spit out in code, instead of having to parse an error message which might change in the future. It also allows you to do things like error code translations (although I think it might be useful to allow people to send translations for the various RCs to the horde as PRs, so that we don’t force every frontend to reinvent them).
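For example, an integrator could branch on the return code instead of the human-readable message. A rough sketch is below; the field name “rc” and the specific code names are assumptions on my part, so check the RC README for the actual values.

```python
import requests

# Hypothetical handling of AI Horde error return codes ("rc").
# The exact field name and code strings are assumptions; see the RC README.
FRIENDLY_MESSAGES = {
    "KudosUpfront": "You need more kudos before making this request.",
    "InvalidAPIKey": "Your API key was not recognised.",
}

response = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json={"prompt": "a test prompt"},
    headers={"apikey": "0000000000"},
    timeout=30,
)

if not response.ok:
    body = response.json()
    rc = body.get("rc")  # stable machine-readable code
    # Fall back to the raw message only when we don't recognise the code.
    print(FRIENDLY_MESSAGES.get(rc, body.get("message", "Unknown error")))
```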

I also wrote a README page detailing all the existing RCs.

There’s also been the various bugfixes and improvements on the worker, sdk and hordelib code. Remember to update your reGen worker regularly!

Once again, many thanks to NLNet for providing the funding for “necessary chore” tasks like these. These kinds of things are not a ton of fun to do, as they don’t add any new functionality to the project, but they massively help future development by reducing tech debt.

Webhooks on the AI Horde

Today I am excited to announce that I have deployed a new feature which allows you to specify a webhook when requesting a generation on the AI Horde. If you do that, once each generation is completed, the AI Horde will send a POST request to the specified URL with a payload matching the request type.

Apropos, it’s a good time to announce that I have started writing some integration documentation for the AI Horde, which contains information about the available API and SDKs and, of course, the new webhooks. Feel free to send PRs to improve it!

This new functionality allows a few more efficient ways of using the AI Horde. For example, you could avoid polling the AI Horde every second or so and rely on webhooks instead, doing a manual poll only every 30 seconds or so if the results have not been webhooked over to you yet. The AI Horde will retry a webhook 3 times before giving up, so in case of network issues etc. you can always check the status manually as usual. This approach reduces the load on the AI Horde while at the same time giving you faster results. It’s what I call a win-win!
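As a minimal sketch of the receiving side (the payload fields shown are assumptions, so match them against the integration docs), here is a tiny Flask endpoint that records finished generations, which you would pair with an occasional fallback poll:

```python
from flask import Flask, request

app = Flask(__name__)
completed = {}  # request id -> payload received from the horde

@app.route("/horde-webhook", methods=["POST"])
def horde_webhook():
    payload = request.get_json(force=True)
    # Field names here are assumptions; inspect a real payload or the docs.
    completed[payload.get("id", "unknown")] = payload
    return "", 204  # acknowledge quickly so the horde doesn't need to retry

if __name__ == "__main__":
    # You would pass this URL as the webhook when submitting your generation,
    # and still poll the status endpoint every ~30s as a fallback.
    app.run(host="0.0.0.0", port=8080)
```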

Of course, not all clients can support webhooks, so for those who can’t, the existing functionality will continue working as usual.

Ludicrous Speed!

One very useful feature I’ve been meaning to support on the AI Horde for a while is request batching. Request batching is the ability to generate multiple Stable Diffusion images in parallel using mechanisms internal to the ML libraries, instead of splitting them into multiple processes. Because the common parts of the request are re-used, the GPU can generate each extra image with just a 20% slowdown instead of 100%, so long as you stay within your GPU’s capacity.

Soon after we finished adding LCM support, I turned my attention to making this possible, as between these two features we could massively increase the overall speed at which the AI Horde completes requests. The only problem was the overall complexity of handling this in the inference.

Today I’m proud to announce that the AI Horde natively supports smartly batching multiple images into the same job when possible, which can result in massive improvements in overall speed! Read on for more details on how we achieved it.

By relying on ComfyUI, the most difficult part was already done, and earlier work by Tazlin and Jug had prepared the ground for our hordelib library to handle sending such batched requests to the comfy engine. But I still had a lot of work to do, not only to let the AI Horde accept and queue such loads properly, but also for the worker to be able to understand payloads for multiple images.

Fortunately, due to the new design of the reGen worker, adjusting it to accept one job for multiple images and then submit multiple image results at the end was easier than I expected. Of course, the multiple image submission was the hardest part, and I basically had to refactor that whole area of the code.

The AI Horde queuing part was not as code-intensive. Making a worker pick up multiple requests when possible was not particularly hard, but not giving the worker more than it can “chew through” was. You see, your worker might be able to do 1 image at 2048×2048, and it might be able to do 20 images at 512×512. But give it 20 x 2048×2048 and it will fall down and die! So this required a bit of fancy footwork. The way I solved it is that the worker declares how many batched images it can do along with its max resolution. The horde then assumes that the worker can safely achieve its max batch size at 1/3rd of its max resolution. From there, as the requested resolution of a job increases, the horde smartly reduces the number of batched images it will give that worker in one job.

Practically, this means that when I declare I can do 20 batched images and my max resolution for one image is 2048×2048, I will pick up my full 20 images at 512×512 but only 7 images at my full resolution.
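Here is a sketch of one heuristic that reproduces those numbers. This is my own reconstruction for illustration, not the AI Horde’s actual code, so treat the exact formula as an assumption.

```python
import math

def max_batch_for_job(max_batch: int, max_pixels: int, job_pixels: int) -> int:
    """One possible heuristic: assume the worker can do its full batch size at
    1/3rd of its max resolution, and scale the batch down as resolution grows.
    Reconstruction for illustration only."""
    pixel_budget = max_batch * (max_pixels / 3)
    return max(1, min(max_batch, math.ceil(pixel_budget / job_pixels)))

# Worker declaring 20 batched images and a 2048x2048 max resolution:
print(max_batch_for_job(20, 2048 * 2048, 512 * 512))    # -> 20 images at 512x512
print(max_batch_for_job(20, 2048 * 2048, 2048 * 2048))  # -> 7 images at full res
```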

The AI Horde will therefore continue smartly slicing a request for multiple images into a number of jobs, only this time, instead of each job being 1 image, it can be multiple. Effectively this means the horde can utilize the maximum processing power of each worker far more efficiently, and overall performance improves!

There were a few hiccups along the way as well. For one, I realized that the hordelib code did not handle batching for img2img requests at all, so I had to roll up my sleeves, jump into the way hordelib translates requests into ComfyUI nodes, and figure it all out. It took me a while, but now that I understand this better, it will be easier for me to add even more fancy additions to our comfy workflows!

Another somewhat important problem is that the seed returned for batched requests in comfy is not accurate. The explanation is a bit too technical, but at the end of the day there is an extra variable, on top of the generation seed, when trying to replicate an image generated via a batch. Currently the horde returns the relevant “batch_id” in the generation metadata, which I hope to use in the future to add a way to replicate images from batched requests as well.

For now, if you need to ensure you can always replicate your images via a seed, the best way is to use the new disable_batching keyword on your request. Setting this to true will make your request always split into 1 image per job, which is the way the horde used to work until now. However, since disable_batching is significantly less optimal than batching, it is only available to trusted users (i.e. those who’ve been running workers for a while) and patreon supporters.

Of course, you can continue manually splitting your requests into 1 image per request, but that already carries increased kudos costs, and in the future it might be disincentivized further for the health of the AI Horde.

Between batching and LCM proliferation, we’re already starting to see significantly improved generation times on the AI Horde, to the point that, with enough priority, you can receive 20 x 1024×1024 images in less than a minute! A small problem is that currently one of our most popular frontends, Artbot, defaults to manually splitting each request into 1 image per request. Nevertheless, its developer Rockbandit is already hard at work making their requests batching-compatible, and once that happens I expect the overall speed to massively improve!