How did we move from forums to Reddit, Facebook groups, and Discord?

From the first moment I went online in 1996, forums were the main place to hang out. In fact, the very first thing I did was join an online forum run by the Greek magazine “PC Master” so I could talk directly to my favourite game reviewers (for me it was Tsourinakis, for those old enough to remember).

For those who didn’t like the real-time nature of IRC livechat, forums were all the rage, and I admit they had a wonderful charm for an up-and-coming teenager who wanted to express themselves with fancy signatures and gain some name recognition for their antics. Each forum was a wonderful microcosm, a little community of people with a similar hobby and/or frame of mind.

BBcode-style forums took the web 1.0 internet by storm and I remember I had to juggle dozens of accounts, one for each forum I was interacting with. Basically, one for each video game (or video game publisher) I was playing, plus some Linux distros, hobbies, politics and the like. It was a wonderful mess.

But a mess it was, and if the dozens of accounts and constant context switching were barely manageable for a PC nerd like myself, I can only imagine how impenetrable it all was for the less tech-savvy. Of course, for people like me this was an added benefit, since it kept the “normies” out and avoided the “Eternal September” in our little communities.

However, the demand for discussion spaces accessible to everyone wasn’t absent; it was just unfulfilled. So as soon as Web 2.0 took over with the massive walled gardens of MySpace, Facebook, Twitter and so on, that demand manifested: the ability for anyone to create and run a forum within those spaces, regardless of technical competency or BBcode knowledge, spawned thousands of little communities.

Soon after, Digg and then Reddit came out, and after Digg’s self-inflicted implosion, Reddit along with Facebook became the de facto spot to create and nurture new async-discussion communities, once they added the functionality for anyone to create one and run it as they wanted.

But the pre-existing BBcode forums were still around and very well established. Places like Something Awful had such strong communities that they resisted the pull of these corporate walled gardens for a long time. But eventually, they all more or less succumbed to the pressure and their members left in an exodus. What happened?

I’m not a researcher, but I was there from the start and I saw the same process play out multiple times in the old forums I used to be in. Accessibility and convenience won.

There’s a few things I attribute this to.

  1. The executive cost of creating a new forum account is very high. Every time you want to join one, you need to go through making a username (often trying to find one that’s not taken, so now you have to juggle multiple usernames as well), a new password, captchas, email verifications, application forms, review periods, lurker wait times and so on. It’s a whole thing and it’s frustrating to do every time. Even someone like me, who had gone through this process many times, would internally groan at having to do it all over again.
  2. Keeping up to date was a lot of work. Every time I wanted to catch up on all my topics, I had to open a new tab for each of my forums and check what was new. The fact that most forums didn’t have threaded discussions and just floated old discussions with new replies to the top didn’t help at all (“thread necromancy” was a big netiquette faux-pas). Eventually most forums added RSS feeds, but not only were most people not technical enough to utilize RSS efficiently (even I struggled), the RSS was often not implemented in a way that was efficient to use.
  3. Discoverability was too onerous. Because of (1), many people preferred to just hang out in one massive forum and beg or demand that new forum topics be added for their interests, so they wouldn’t have to register, learn other forum software and interact with foreign communities. This is how massive “anything goes” forums like Something Awful started, and this also started impacting other massive forums like RPGnet, which slowly but surely expanded to many more topics. Hell, almost every forum I remember had politics and/or “off-topic” sections for people to talk without disrupting the main topics, because people couldn’t stop themselves.
    And where the forum admins didn’t open new subject areas, the bottom-up pressure demanded that solutions be invented in the current paradigm. This is how you ended up with immortal threads, thousands of pages deep for one subject, or regular mega-threads and so on. Internet life found a way.
  4. Forum admins and staff were the same petty dictators they always were and always will be. Personality cults and good ole boys clubs abounded. People were established and woe to anyone who didn’t know enough to respect it, goddammit! I ran into such situations more than once, and even blogged about it back in the day. But it was an expected part of the setup, so people tolerated it, because, well, what else would you do? Run your own forum? Who has the time and knowledge for that? And even if you did, would anyone even join you?

And so, this was the paradigm we all lived in. People just declared this was how it had to be and never considered any proper interactivity between forums as worth the effort. In fact, one would be heavily ridiculed and shunned for even suggesting such blasphemous concepts.

That is, until Facebook and Reddit made it possible for everyone to run their own little fief and upended everything we knew. By adding forum functionality into a central location, and then allowing everyone to create one for any topic, they immediately solved so many of these issues.

  1. The executive cost to join a new topic is very low. One already has an account on Reddit and/or Facebook. All they have to do is press a button on the subreddit or group they want to join. At worst they might need to pass an approval, but they get to keep the same account, password and so on. Sure, you might need to juggle 1-3 accounts for your main spaces (Reddit, Facebook, Discord), but that’s so much easier than 12 or more.
  2. Keeping up to date is built-in. Reddit subscriptions give you a personalized homepage, Facebook just gives you your own feed, Discord shows you where there’s activity, and so on. Of course, the corporate enshittification of those services means you’re getting more and more ads masquerading as actual content, and invisible algorithms are feeding you ragebait and fearbait to keep you interacting at the cost of your mental and social health, but that is invisible to most users so it doesn’t turn them off.
  3. Discoverability is easy. Facebook might randomly show you content from groups you’re not in, shared by others. Reddit’s /r/all feed shows posts from topics you might not even know existed, and people are quick to link to relevant subreddits. Every project has its own Discord server link, and so on.

The fourth forum problem, of course, was not and can never be solved. There will always be sad little kings of sad little hills. However, solving 1-3 meant that the leverage of those abusing their power as moderators was massively diminished, since one could just set up a new forum in a couple of minutes, and if there was enough power abuse, whole communities would abandon the old space and move to the new one. This wasn’t perfect of course, since on Reddit only one person can squat a specific subreddit name, but as seen with the successful transition from /r/marijuana to /r/trees, given enough blow-back, it can certainly be achieved.

And the final cherry on top is that places like Reddit and Discord are just…easier to use. Ain’t nobody who likes learning or using BBcode on 20-year-old software. Markdown became the norm for a reason: it’s natural to use. Add to that fewer restrictions on uploads (file size, image size etc.) and fancier interfaces with threaded discussions, emoji reactions and so on, and you get a lot of people using the service instead of fighting to use the service. There is of course newer and better forum software, like the excellent Discourse, but sadly that came a bit too late to change the momentum.

So while forums never went away, people just stopped using them, slowly at first but accelerating as time passed. People who got banned just wouldn’t bother creating new accounts all over again when they already had a Facebook account. People who wanted to discuss a new topic wouldn’t bother with immortal mega-threads when they could just join or make a subreddit instead. It was a slow burn that was impossible to stop once started.

10-15 years after Reddit started, it was all but over for forums. Now when someone wants to discuss a new topic, they don’t even bother googling for an appropriate forum (not that terminally enshittified search engines would find one anyway). They just search Reddit or Facebook, or ask in their Discord servers for a link.

I admit, I was an immediate convert from the moment Reddit added custom communities. I created and/or ran some big ones back in the day, because I was naive about the corporate nature of Reddit and thought it was “one of the good ones”, even though I had already abandoned Facebook much earlier. It was just so much easier to use one Reddit account and have it as my internet homepage, especially once gReader was killed by Google.

But of course, as these things go, the big corporate gardens couldn’t avoid their nature and eventually once the old web forums were abandoned for good and people had no real alternatives, they started squeezing. What are you gonna do? Set up your own Reddit? Who has the time and knowledge for that? And even if you did, would anyone even join you?

Nowadays, I hear a lot of people say that the alternative to these massive services is to go back to old-school forums. My peeps, that is absurd. Nobody wants to go back to the clusterfuck I just described. The grognards who suggest this are either some of the lucky ones who used to be in the “in-crowd” of some big forums and miss the community and power they had, or they are so scarred by having worked in that paradigm that they practically feel more comfortable in it.

No, the answer is not yet another archipelago of little fiefdoms. 1-3 forbid it! If we want to escape the greedy little fingers of u/spez and Zuckerberg, the only reasonable solution moving forward is ActivityPub-federated software.

We already have lemmy, piefed, and mbin, which fulfill the role of forums, where everyone can run their own community, while at the same time solving for 1-3 above! Even Discourse understood this and started adding apub integration (although I think they should be focusing on threadiverse interoperability rather than microblogging).

Imagine a massive old-school forum like RPGnet migrating to federated software and immediately allowing their massive community access to the rest of the threadiverse without having to go through new accounts and so on, while everyone else gets access to the treasure trove of discussions and reviews they host. It’s a win-win for everyone and a loss for the profiteers of our social media presence.

Not only do federated forums solve the pain points I described above, they add a lot of other advantages as well. For example, we now have far fewer single points of failure: the abandonment of a federated instance doesn’t lose its content, which continues living in the caches of the other instances that knew about it, and common software plus import/export functionality make it much easier for people to migrate from one lemmy instance to another. There are a lot of other benefits too, like common sysadmin support channels, support services like fediseer, and so on.

These days, I see federated forums as the only way forward and I’m optimistic about the path ahead. I think Reddit is a dead site running and the only way it has to go is down. I know we have our own challenges to face, but I place far more trust in the FOSS commons than I do in corporate overlords.

Flux (Schnell) on the AI Horde!

It’s been cooking for a while, but we can now officially announce that Flux.1-Schnell is finally available on the AI Horde!

Flux is one of the most exciting generative AI text2image models to come out this year, from a team of ex-stability.ai developers, and it seemingly consumed all the attention of the GenAI enthusiasts overnight. It’s a very powerful model, but as a downside it requires a significantly more powerful PC to run than the SDXL models that have been popular until now.

The model available on the horde is primarily the fp8 compact version we took from civitAI, as it reduces the number of downloads we have to juggle.

I was really eager to offer the flux.1-dev version as well, as it has a lot more LoRas available and is a bit more versatile, but sadly its license contains some requirements which do not appear to allow a service like the AI Horde to provide it, even though it’s a completely free service for everyone. However, we have reached out to Black Forest Labs via email to ask for clarification or an exception, and will let you all know if we hear back.

To use it, head over to Artbot or Lucid Creations and simply select the Flux.1-Schnell fp8 (Compact) model for your generation. However, keep in mind that this model is quite different from the Stable Diffusion models you’re used to, so you need to adjust your request as follows to get good results:

  • Set sampler to k_euler
  • Set steps between 4 and 8 (4 is enough for most images)
  • Set cfg to 1

Also keep in mind that the model won’t use the negative prompt. Instead it benefits massively from natural-language descriptions of what you want to draw, rather than a tag-based approach.
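
For integrators, here’s a minimal sketch of what such a request could look like when sent directly to the AI Horde REST API. The field names are to the best of my recollection, so double-check against the live API docs, and swap in your own API key instead of the anonymous one for better priority.

```python
import requests

# Minimal sketch of an async Flux request reflecting the settings above.
payload = {
    "prompt": "a cozy cabin in a snowy forest at dusk, warm light in the windows",
    "models": ["Flux.1-Schnell fp8 (Compact)"],
    "params": {
        "sampler_name": "k_euler",  # Flux Schnell wants k_euler
        "steps": 4,                 # 4-8 steps; 4 is enough for most images
        "cfg_scale": 1,             # cfg must be 1
        "width": 1024,
        "height": 1024,
    },
}
resp = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json=payload,
    headers={"apikey": "0000000000"},  # anonymous key; use your own instead
    timeout=30,
)
print(resp.json())  # returns an id you can then poll via /api/v2/generate/check/<id>
```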

If you are running a dreamer worker, make sure you check the instructions in our discord channel on the best settings for running flux. This is a big model, so GPUs with 16-24GB of VRAM are best for running it at a decent speed, and we could use all the help we can get.

If you are making integrations with the AI Horde and are using the image reference repository to retrieve model requirements, make sure you use its flux branch until it’s merged into main at the end of the month.

Along with flux, tazlin has done some amazing work adding the latest version of comfy and improving the stability and speed of the worker. I mean, just look at this changelog! This also greatly improves our support for AMD cards. They might not be as fast as Nvidia’s, but they should work!

We’ve also added some improvements to the horde itself to allow slower workers to offer models. If you have an older GPU which often gets timed out and put into maintenance on the Horde due to speed, you can now set yourself as an extra_slow_worker, which will extend your TTL; such workers will be used by things like automated bots, or apps like that sweet AI Wallpaper Changer.

Finally, I’ve also extended our deployments ansible collection so that if you use a Linux system, you can easily deploy any number of reGen workers, even multiple on the same server to take advantage of multiple GPUs. It will even deploy the AMD drivers for you if you want. With this I am continuing to extend the tools that allow more people to run the AI Horde infrastructure on their own.

We hope the availability of flux on the Horde will allow unlimited creativity from people who want access to the model but don’t have the hardware to run it. Now more than ever, people with mid-range GPUs can offer what they can run, such as SDXL or SD 1.5 models, and in turn benefit from others offering larger models like flux. We all benefit through mutual aid!

Enjoy!

Year Two of the AI Horde!

The AI Horde has turned two years old. I take a look back at all that’s happened since.

Can you believe I blogged about the first birthday of the AI Horde approximately one year ago? If you can, go ahead and read that one first to see the first chapter of its existence.

Since we started recording stats, we’ve generated 113M images and 145M texts, which just goes to show how explosively the FOSS LLM scene has embraced the AI Horde since last year, completely outpacing the lifetime image generations within a single year!

This year has been the first one since we received funding from NLNet, so let’s take a look at what we achieved:

Overall, development has continued throughout the last year and we’ve been trying to onboard as many new features as possible with 2 core devs. Sadly our donation income has completely collapsed since the same time last year, to the point where the money is just barely covering our infrastructure costs.

If you see value in what the AI Horde provides, please consider supporting our infrastructure through Patreon or GitHub, or consider onboarding your PC as a Dreamer or Scribe worker.

What was your favorite new addition to the AI Horde from the past year? Let me know if there’s any event I forgot to mention.

OCTGN Android:Netrunner sound effects

Recently a maintainer from jinteki.net contacted me about getting the license for the A:NR sound effects I had used in the OCTGN implementation so they could be reused in jinteki, and casually mentioned that the Archer ICE noise was the coolest one. It had never occurred to me until now that people might appreciate the various sound effects I had inserted into the game back then for flavour, so I did a quick search and ran into this cute video about it (you can hear Archer at the 13:00 mark).

Fascinating! I always like to make my games as flavorful as possible, and especially given the limitations of OCTGN, some flavour was sorely needed. So I had added custom fonts, little flavour blurbs for significant actions, and finally I scoured the internet for hours and hours to find sound effects that fit the cyberpunk theme of the various actions.

These were always meant to be just little things in an obscure game, so I’m kinda pleasantly surprised that some of them have achieved this sort of cult status in the netrunner community. Very cool. Hopefully these sound effects will find a second life in jinteki.net.

If you want to check what the OCTGN game looked like, I have a tutorial video here, and I also have a bunch of videos about it on my youtube channel.

Transparent Generations

We have another new feature available for people to use on the AI Horde: the capacity to use Layer Diffuse to generate images with a transparent background directly (as opposed to stripping the image background with a post-processor).

As someone who’s dabbled in video game development in the past (which was in fact the reason I started the AI Horde), being able to generate sprites, icons and other assets can be quite useful, so once I saw this breakthrough, it immediately became something I wanted to support.

To use this feature, you simply need to flip on the transparent switch if your UI supports it, and the Horde will do the rest. If you’re an integrator, simply send “transparent: true” in your payload.
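
For example, a minimal sketch of such a payload could look like the one below. I’m assuming the flag lives inside the params object like most generation options, and the model name is just a placeholder; check the API reference to be sure.

```python
# Rough sketch of a transparent generation payload for /api/v2/generate/async.
# Assumption: "transparent" sits inside "params" like other generation options.
payload = {
    "prompt": "a single red potion bottle, game icon style, centered",
    "models": ["AlbedoBase XL (SDXL)"],  # placeholder; use whatever model you prefer
    "params": {
        "transparent": True,   # enable the Layer Diffuse transparent background
        "width": 1024,
        "height": 1024,
    },
}
```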

Take note that the images generated by this feature will not match the image you get with the same seed when transparency is not used! Don’t expect to take an image you like and remove the background this way. For that you need to use the post-processor approach.

Also keep in mind, not every prompt will work well for a transparent image generation. Experiment and find what works for you.

As part of making this update work, Tazlin and I also discovered and fixed a number of other issues and bugs.

What will probably be most interesting for you is a slight change in how hires-fix works. I discovered that the implementation we were using applied the same number of steps to the upscaled denoising pass, which was completely unnecessary and wasted compute. So we now use a smarter system which dynamically determines how many steps to use for the hires-fix pass based on the hires-fix denoising strength and the steps of the main generation, and we also exposed a new key on the API where you can directly pass a hires-fix denoising strength.
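
To give a feel for the idea (this is not the actual horde-engine code, just the reasoning behind it): much like img2img, if the hires pass only denoises a fraction of the image, it only needs roughly that fraction of the main generation’s steps.

```python
# Illustrative sketch only -- not the actual horde-engine implementation.
def hires_fix_steps(main_steps: int, hires_denoise: float) -> int:
    """Scale the hires-fix step count by the denoising strength of that pass."""
    return max(1, round(main_steps * hires_denoise))

# e.g. 30 main steps at 0.65 hires denoising strength -> 20 steps instead of 30
print(hires_fix_steps(30, 0.65))
```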

The second fix is allowing hires-fix on SDXL models, so now you can try to generate larger SDXL images at the optimal resolution.

Finally there were a lot of other minor tweaks and fixes, primarily in the horde-engine. You can read further for more development details on this feature.

This update required a significant amount of work, as it meant onboarding a new ComfyUI node. Normally this isn’t difficult, but it turns out this node was automatically downloading its own LoRa models on startup, and those were not handled properly for either storage or memory. For efficiency, the AI Horde worker does a lot of model preloading along with some fancy footwork in regards to RAM/VRAM usage.

So to make the new nodes work as expected, I had to reach in and modify the methods which were downloading models so that they use our internal mechanisms, such as the model manager. Sadly the model manager wasn’t aware of strange models like layer diffuse, so it required me to add a new catch-all class to the model manager for all future utility models like these.

While waiting for Tazlin to be happy with the stability of the code, we discovered another major problem: the face-fixer post-processors we had been using until now had started malfunctioning, generating faces with a weird gray sheen. After some significant troubleshooting and investigation, we discovered that the latest version of ComfyUI had switched to a different internal library which didn’t play well with the custom nodes doing the face-fixing.

First I decided to update the code of the face-fixer nodes we were using, which is harder than it sounds, as they also download models automatically on startup, which again needs to be handled properly. Updating the custom nodes fixed the codeformer face-fixer, but gfpgan remained broken, and the ComfyUI devs mentioned that someone would have to fix it. Unfortunately those nodes didn’t seem to be actively maintained anymore, so there was little hope in just waiting for a quick fix.

Fortunately another custom node developer had run into the same problems, and created a bespoke solution for gfpgan licensed liberally, which I could copy. I love FOSS!

In the meantime, through our usual beta testing process, we discovered that there was still some funkiness in the new hires-fix approach, and Tazlin, along with some power users of the community, was able to tweak things so that they work more optimally.

All in all, quite a bit of effort in the past month for this feature, but now we provide something which, along with embedded QR code generation, I’ve seen very few other GenAI services provide, if any.

Will you use the new transparent image generation? If so, let us know how! And remember, if you have a decent GPU, you can help others generate images by adding your PC to the horde!

Everything Haidra touches

The second fediverse canvas event just concluded and I’m very happy with how this turned out. In case you don’t know what this is, check out this post, and then take your time to go and explore the second canvas in depth before it’s taken down, looking for all the interesting and sometimes even hidden pieces of pixel art.

This time I had a more interesting idea for participating: I decided to draw the Haidra Org logo. I didn’t expect massive support, but I was pleasantly surprised by how many people joined in to help create it after my initial post about it and my announcement on the AI Horde discord server. Some frontends like horde-ng even linked to it with an announcement.

Almost as soon as it started, we ended up conflicting in our placement with someone who was drawing a little forest just below and to our left. I decided they could have the foreground, since we had plenty of space available, which avoided any fighting over pixels. All in all, we managed to complete it within half a day or so, which is pretty cool I like to think, and we even got a small “garden”, so to speak.

The final form of the Haidra drawing, including the little forest below and two Stus

Afterwards I thought it would be interesting to have the Haidra tendrils “touch” various points of importance or sprites that I like. I decided to extend out as if we were made of water, and a lot of other “canvaseers” joined in to help, which I found really sweet.

First we extended towards the (then) center of the canvas (top left on the featured image above), passing next to the Godot logo, below OSU, and finally reaching the explosion of the beams. That took most of the first day, but people were still pretty active, even though the infrastructure of the event had already started buckling under its own success.

Fortunately as we could “flow” like water and even “go under” other pixelart, we didn’t encounter any resistance in our journey, and a lot of people gave us a helping hand as well.

Once this was achieved, on a whim I decided to double down on the “river” similarity and drew a little 17px pirate ship to show our roots, then went to bed. When I woke up the next morning, I was surprised to discover a Kraken attacking it, making a really cool little display of collaborative minimalistic art.

Haidra pirate ship fighting a Kraken

This kind of thing is why I love events like these. I love emergent stuff like this, and seeing people put their own little touches on what others started is awesome!

The next day the canvas had been extended to double its size, so a whole new area to the right was available. I had already noticed someone had created a little pirate banner towards the new canvas center, but it was alone and sad, so I decided we should try to give it a little bit of that Haidra embrace. Thus started a long journey with a new tendril to reach it. I had a rough idea of the path to follow, as the direct route was blocked, but as soon as others started adding to it, it almost took on a life of its own on its journey.

Eventually, towards the middle of the second day, we reached it, passing under Belgium, through some letters, and crossing the big under-construction trans flag before going over piracy, where I spawned yet another pirate ship before waterfalling down onto the mushroom house.

The path to piracy

At this point, the whole event took a dramatic turn, as the performance problems had become so severe that the admin decided to take the whole thing down to fix them, rather than let people get frustrated. This took half a dozen hours or so, and even though the event was extended by 24 hours to make up for it, the momentum was kneecapped as well.

Once the canvas was back up for the third day, my next objective was a much longer journey to try and touch The Void that was extending from the top right. When I started, the path was still mostly empty, but as we moved towards it, the canvas became more and more congested, forcing us to take some creative detours to avoid messing with other art.

All in all, we flowed over the Factorio cog, creating a little lake and spawning a rubber duckie in the process. Then through the second half of the trans flag, which caused a minor edit war, as other canvaseers thought we were vandalizing it. Then the way up and over the massive English flag was sorta blocked, so we had to take a detour and slither between the Pokemon to its left first.

Until finally we reached the top of the English flag, where I took a little creative detour to draw a little naval battle. My plan was to have an English brigantine fighting two pirate sloops, but as soon as I finished it, others jumped in with their own plans. First, one of my pirate ships revealed itself as a Spanish privateer instead (which I suspect was a reference to the recent football events). And then, over the course of the next two days, the three ships kept changing allegiances every couple of hours. Quite the little mini-story to see unfold.

Finally we were almost at our final objective, only to discover that it was not there anymore. The Void had been thoroughly contained and blocked by a massive cat butler (catler?). The only thing left to touch was a single solitary void tendril at the top. Surprisingly, as soon as we reached it, it livened up and flourished, which was certainly not my original idea, but I went with it happily.

Having achieved all I wanted to do, and with the event (and the day) drawing to a close, I decided there was no point setting any more goals and just let those interested keep extending Haidra on a whim. You can see my final post here, which also links to all my previous posts; those also contain some historic canvas images showing the actual state of the board at the time of posting.

All in all, I had a lot of fun, and enjoyed this way more than Reddit’s /r/place, which is botted to hell and back, making contributions by individual humans practically meaningless. Due to the lack of significant botting, not only were one’s own pixels more impactful, but humans tended to mostly collaborate instead of having scripts mindlessly enforcing a template. This ended with a much more creative canvas, as people worked off others’ ideas and themes, and where there was conflict, a lot of the time a compromise was discovered where both pieces of art could co-exist.

The conflict points tended to be political, as so often happens. For example, the Hexbears constantly trying to turn the NATO flag into a swastika, or people effectively rehashing the conflict around the Israeli colonization of Palestine in pixel form.

Some other things of interest:

  • I mentioned that the Spanish seem to have boarded and overtaken my pirate ship, and someone drew a little vertical ship coming up the stream for reinforcements. ❤️
  • Stus and AmongUs everywhere, sometimes in negative space, or only visible in the heatmap. Can you find them all?
  • The Void getting absolutely bodied when it tried to be destructive, but being allowed to extend a lot more when it actually played nice with other creations.
  • The amount of My Little Pony art is too damn high!
  • Pleasantly little national flag jingoism on display!
  • A very healthy amount of anarchist art and concepts and symbols. Well done mates! Ⓐ

See you next year!

Embedded QR Codes via the AI Horde

Around the same time last year, the first controlnet for generating QR codes with Stable Diffusion was released. I was immediately enamored with the idea and wanted it as an option on the AI Horde ASAP. Unfortunately, due to a lot of extenuating circumstances [gesticulates wildly], I had neither the time, nor the skills to do it myself, nor the people who could help us onboard it. So it fell by the wayside while way more pressing things were being developed.

Today I’m very excited to announce that I have finally achieved and deployed it to production! QR code generation via the AI Horde is here!

Using it is fairly simple, assuming your front-end of choice supports it. You simply provide the text that you want represented as a QR code, and the AI Horde will generate the QR code and then, using controlnet, generate an image with the QR code embedded into it as if it were part of the drawing. You can scan the examples below to see it in action.

You’ll notice that unlike some of the examples you’ll find elsewhere online, the QR code we generate is still fairly noticeable as a QR code, especially when zoomed out or at a distance. The reason for this is that the more you blend the code into the image, the less likely it is that the QR code remains scannable. The implementation I followed is specifically tailored to sacrifice “embedding” for the sake of scannability.

So when you want to generate QR codes, keep in mind that this is a very finicky workflow. The diffusion process can easily “eat” or modify some components of the QR code so that the final image is not readable anymore. The subject matter and model used matter surprisingly much. Subjects which are somewhat noisy (such as the brain prompt in the featured image above) tend to give the model enough to work with to reshape that area in a way that creates a QR code, whereas no matter how hard I tried, I couldn’t get it to generate a scannable QR code with an anime model and an anime woman as the subject.

Along with the basic option to provide the QR code text, you can also customize a few more aspects of it. For example, you can choose where the QR code will be placed in the image. By default we’ll always place it in the center, but sometimes the composition might be easier if you choose to place it to the side or the bottom. You can choose a different prompt for the anchor squares, increase or decrease the border thickness, and more. Your front-end should hopefully explain these options to you.
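
If you’re an integrator and want to skip the front-end, a sketch of the payload could look something like the following. Note that the “workflow” and “extra_texts” keys reflect my own recollection of how this is exposed on the API; treat them as assumptions and consult the API reference for the authoritative names and the full list of QR-related options (position, anchor prompt, border and so on).

```python
# Hedged sketch of a QR code generation payload for /api/v2/generate/async.
# Key names ("workflow", "extra_texts", "reference") are assumptions on my part.
payload = {
    "prompt": "an intricate mechanical brain made of copper gears and glowing wires",
    "models": ["AlbedoBase XL (SDXL)"],  # noisy subjects + SDXL tend to scan better
    "extra_texts": [
        {"text": "https://aihorde.net", "reference": "qr_code"},  # text to encode
    ],
    "params": {
        "workflow": "qr_code",  # request the QR controlnet workflow
        "width": 1024,
        "height": 1024,
    },
}
```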

If you want to try making some yourself, I’ve already added the necessary functionality to my Lucid Creations front-end, so feel free to give it a try right now.

Continue reading further to get some development details.

The road leading to me making this feature available was fairly long. On top of all the other priorities I had for the horde, we also had the misfortune that one of our core contributors on the backend/comfyUI side suddenly went missing at the end of summer. As I am still more focused on the middleware/API and infrastructure (plus so much more, halp!) and Tazlin is focused on efficiency, code maintenance & quality, we didn’t have the necessary skills to add something as complex as QR code generation.

Once it was clear that our contributor wasn’t coming back and nobody else was stepping up to help, I finally accepted that if I wanted it done, I had to learn to do that part myself as well. So in the past few months I embarked on a journey of adding more and more complex comfyUI workflows. First came Stable Cascade, which required me to build code that can load 2 different model files at the same time. Then Stable Cascade Remix, which required that I wrangle up to 5 source images together.

Note that I’m mostly re-using existing, fairly straightforward ComfyUI workflows which do these tasks. I don’t have the bandwidth to learn ComfyUI itself in that much depth. But making said workflows function within the horde-engine, with payloads that are sent via the AI Horde REST API, is quite a complex amount of work on top of that. As I hadn’t built this “translation layer”, I had been avoiding that area of the code until now, and this work helped me build up enough knowledge and confidence to be able to pull off translating a much, much more complex ComfyUI workflow like the QR codes.

So after many months, I decided it was finally time to tackle this problem. The first issue was getting an actually good QR code ComfyUI workflow. Unlike the previous workflows I used, it’s surprisingly difficult to find something that works immediately. Most simple QR code workflows both required that one generate the QR image externally and generated mostly unscannable images anyway.

I was fortunate enough to run into this excellent devlog by Corey Hanson who not only provided instructions on what works and what doesn’t for QR codes, but even provided a whole repository with prebuilt ComfyUI workflows and a custom node which would also generate a QR code as part of the workflow. Perfect!

Well, almost perfect. It turns out the provided ComfyUI workflows were fairly old, and at the rate generative AI progresses, even a couple of months means something can easily be too stale to use. On top of that, the examples used a lot of extra custom nodes that didn’t parse, which a ComfyUI newbie like me had to untangle. Finally, those workflows were great, especially for local use, but a bit overkill for horde usage.

So the first order of business was to understand, then simplify the workflow to do just the bare minimum needed to get a QR code. Honestly, it took me a bit of time to simply get the workflow running in ComfyUI itself and halfway understand what all the nodes were doing. After that I had to translate it to the horde-engine format, which by itself required me to refactor how I parse all comfyUI workflows to make them more maintainable in the future.

Finally, QR codes require a lot more potential text inputs, which I didn’t want to start explicitly storing in the DB as new columns, as they’re used only for this specific purpose. So I had to come up with a new protocol for sending an open-ended amount of extra text values. Fortunately, I already had the extra_source_images code deployed, so I just copied part of the same logic to speed things up.

And then it was time for unit tests and the public beta and all the potential bugs to fix. Which is when I realized that the results on SD 1.5 models were a bit…sucky, so I went back to ComfyUI itself and actually figured out how to make the workflow work with SDXL as well. The results were way more promising.

Unfortunately, while the SDXL QR codes are way nicer, the requirements to generate them almost triple compared to SD 1.5. Not only does one need to run SDXL models, but SDXL controlnets are almost as big as the models themselves. The QR code controlnet is 5G on its own, and all of that needs to be loaded into VRAM at the same time as the model. All this means that even mid-range GPUs struggle to generate SDXL QR codes in a reasonable amount of time. It also meant I had to adjust the worker to give people serving SDXL models the option to skip SDXL controlnets, and to properly route this switch via the AI Horde.

Nevertheless, this is an area where the AI Horde shines, as those with the necessary power can support those who need it. Most people will find it really hard or frustrating to generate even a single QR code, never mind an SDXL one, only to discover that it’s unscannable, but through the horde they can easily generate dozens with very little expertise needed and find the one that works for them.

So it’s been a long journey, but it’s finally here, and the expertise I gained by achieving it also means that I now have enough knowledge to start adding more features via ComfyUI. So stay tuned to see more awesome workflows on the AI Horde!

Eudaimonia community

I thought it might be interesting to point out that I opened a new community on the Divisions by zero lemmy to post things about content living, as I couldn’t find any other fitting space. There just aren’t a lot of places where one can share articles and discuss such topics without them devolving into spiritualism or self-help guru grifts, both of which I intensely dislike.

So that community is for posting about these things in a materialistic context, with a preference for empiricism and scientific thinking, but squishier secular philosophy is also encouraged for topics which don’t work too well empirically.

If you’ve been around, you probably know I’m going to be posting some Epicurus sooner or later 😀

Take a look and post some relevant stuff you run into.

Image Remix on the AI Horde

The initial deployment of Stable Cascade (SC) on the AI Horde supported just text2image workflows, but that was just a subset of what this model can do. We still needed to onboard the rest of its capabilities.

One such capability was the “image variations” option, which allows you to send an image to the model and get a variation of that image, perhaps with extra stuff added in, using the unCLIP technology. This required quite a bit of work on hordelib so that it uses a completely different ComfyUI workflow, but ultimately it was not much harder than just adding the img2img capabilities to SC.

The larger difficulty came when I wanted to add the feature to remix multiple images together. The problem is that until now the AI Horde only supported sending a single source image and a single source mask, so a varying number of images was not possible at all.

So to support this, I needed to touch all areas of the AI Horde. The AI Horde itself had to accept each of the extra images, upload them to my R2 bucket and provide individual download links. The SDK had to know to expect those images and provide methods to download them in parallel to avoid delays. The reGen worker had to be able to receive those images and send them to hordelib, which had to know how to dynamically adjust a comfyUI pipeline on-the-fly to add as many extra nodes as required.

So after 2 weeks of developing and testing, we finally have this feature available. If your Horde front-end supports the “remix” feature, you can send 1-6 images to this workflow along with a prompt, and it will try its best to “squash” them all together into one composition. Note that the more images you send, and the larger the prompt, the harder it will be for the model to “retain” all of them in the composition. But it will try its best.
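
For integrators, here’s a rough sketch of what a remix request body could look like. The field names (extra_source_images, the “remix” source_processing value, the model name) are my best recollection of how this surfaced on the API; verify against the API docs before relying on them.

```python
import base64

# Hedged sketch of a remix request body -- field names and the "remix"
# source_processing value are assumptions; check the AI Horde API reference.
def encode_image(path: str) -> str:
    """Base64-encode a local image for inclusion in the payload."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "a dramatic portrait blending both subjects together",
    "models": ["Stable Cascade 1.0"],            # assuming this is the listed model name
    "source_image": encode_image("avatar.webp"), # primary image
    "source_processing": "remix",                # assumed enum value for this workflow
    "extra_source_images": [
        {"image": encode_image("logo.webp")},    # additional images to blend in
    ],
    "params": {"width": 1024, "height": 1024},
}
# Send to /api/v2/generate/async as with a normal generation request.
```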

As an example, here’s how the model remixes my own avatar. You’ll notice that the result can understand the general concepts of the image, but can’t follow it exactly as it’s not doing img2img. The blur is probably caused by the need to upscale my original image, which is something I’d like to fix on the next pass.

Likewise, this is the Haidra logo

And finally, here’s a remix of both logo and avatar together

Pretty neat, huh?

This ability to send extra source images also lays the groundwork for the Horde to support things like InstantID, which I hope I’ll be able to work on supporting soon enough.