In case you don’t know what LoRa are, the short version is nothing less than a breakthrough in Generative AI technology, by allowing to condense the time and power a training (or “fine-tuning”) needs!
This kind of breakthrough, along with the achievements done by new models such Stable Diffusion and Llama is what is causing the big tech companies to scramble, to the point that OpenAI had to run to the government to ask them to regulate future breakthroughs (but please, please don’t regulate what they’re currently doing, OK?)
But I digress. I haven’t yet officially announced LoRa support on the AI Horde yet, as we’re still trying to squash all bugs on the workers, but I hope the addition to Lucid Creations will help other integrators see how it can be done
While the gathering of aesthetic images never slowed down, the resulting dataset was only ever available from an specific url in the ratings API, making it fairly difficult to discover. To be honest, I was expecting by now, at least some models and other applications would have come out.
As I was finishing it, I noticed that the written FAQ had also grown a bit stale, and there’s a few new points I feel I need to make absolutely clear, such as kudos sales being forbidden, or how the kudos calculation being handled by a new NN model etc.
It quickly dawned on me however that there’s a ton of Horde terminology I’m using in the FAQ, which someone might not understand. Now we have Dreamers, Scribes and Alchemists, Workers and Bridges, Integrations, Frontends, Plugins and Bots etc. Quite a few things for someone unfamiliar
So I decided to try writing some cross-referenced terminology in the context of the AI Horde, but the more I kept writing, the more terms I realized I needed to define as well. And once I started pulling on this thread…well, let’s just say, that I spend a bit more time than I planned on documentation today and now the Terminology wiki page is available which is extensively cross-referenced inside itself, and to Wikipedia as needed.
I honestly had to forcefully stop myself on expanding it, because I kept thinking of more things to define. So if you’re the kind of person who likes doing this sort of thing as well, feel free to improve further!
I’ve been meaning to write one of these posts every month, but the events since I wrote the last piece have been fairly disruptive. With the loss of our maintainers of nataili, we’ve been forced to put our heads down to alleviate the backend issue, and I didn’t want to write another “State of the AI Horde” until that business was completed. More about that later. First, let’s look at the basics.
(In case you’re new here and do not know what is the AI Horde: It is a crowdsourced free/libre software Open API which connects Generative AI clients to volunteer workers which provide the inference.)
More Requests, More Power!
The total amount of images generated has stayed relatively stable since the last month, with only ~12M images generated. However the total amount of Terapixelsteps is up to 4.2TPS compared to 3.7TPS last month. This shows that people are looking for more details at higher resolution instead of just grabbing more images. Unfortunately we can’t capture the impact of ControlNet as easily, but suffice to say, its demand is pretty significant.
On the LLM front, we’ve generated 3.5M texts, for a total of 374 Megatokens. We have effectively tripled the LLM output! This makes sense as we see plenty of traction in the LLM communities since Google Colab started banning Pygmalion.
Also, did you know Cubox from our discord community has gone and setup a whole public Grafana server to track our stats? Head over and check it out yourselves.
Top 10 Stable Diffusion models
No significant changes since last month in this chart, with Deliberate still proving its worth and further solidifying its position with a 25% use rate throughout the entire month. Stable Diffusion 1.5 sees plenty of use too.
Anime models seem to be losing some popularity across the board, and Dreamshaper kicked Abyss OrangeMix off the board so as to settle on a decent 5%. Good showing!
Deliberate 25.8% (3067246)
stable_diffusion 14.4% (1707729)
Anything Diffusion 9.0% (1067386)
Dreamshaper 5.0% (596651)
Realistic Vision 4.7% (563079)
URPM 3.6% (428415)
Hentai Diffusion 2.9% (348985)
ChilloutMix 2.6% (314301)
Project Unreal Engine 5 2.6% (307907)
Counterfeit 2.2% (260739)
Top 10 Text models
As is to be expected, Pygmalion 6b is still leading the charts with a significant margin, but its big brother, the recently released Pygmalion 7b, which is based on the Llama models, has already secured 3rd place and is only set to further cannibalize 3b’s position.
Erebus and Nerybus continue to fill up the rest of the board with a few new 4bit models starting to finally come into the fray.
Adding new ratings continue unabated but we’re still finding people trying to bypass our countermeasures and just automatically rate images to poison the dataset. Please don’t do that. If you want kudos, you can easily get more by simply asking in our discord instead of wasting our time 🙁
One interesting note is that the Stable UI has finally bypassed the Artbot on amount of images rated
A total of 400K ratings over the past month. Very impressive.
I have also finally on-boarded the full DiffusionDB dataset of 12M images, into the ratings DB, we we’ll have enough images to rate for the foreseeable future.
A whole new backend
As I mentioned at the start, the big reason this state of the AI horde was delayed, was the need to switch to a completely new backend. It’s a big story, so if you want to read up on some details on that, do check out the devlog about it.
New Features and tweaks
Not much to announce since all our effort was in the backend switch, but a few things nevertheless
We have added support for Shared Keys, so that you can share your account priority, without risk.
A new kudos calculation model has been put live, instead of the manual way I was calculating kudos for each request based on magic numbers. The new kudos model is a Neural Network trained by Jug, and it takes into account the whole payload and figures out the time it would take to generate empirically.
Kudos consumption has been removed for weights, and a tiny kudos tax of 1 kudos has been added per request to discourage splitting the same prompt into multiple separate requests.
You can now change your worker whitelist into a worker blacklist so that you avoid specific workers, instead of requesting explicitly which ones you want.
A new “Customizer” role has been added to allow you to host custom Stable Diffusion models. This is not possible on the Worker yet, but once it is, these people will be able to do it. Getting this role is a ping to the AI Horde moderators.
We had to fight back another raid, on the LLM-side this time, which forced me to implement some more countermeasures. Scribes (i.e. LLM workers) cannot connect through VPN unless trusted, untrusted users can only onboard 3 workers, and there’s an worker-limit per IP now.
I’ve been doing some work to improve the social aspects in our discord server, one of them is onboarding 3 new volunteer community managers through and hoping we can start doing more events and interactions with the community.
Another is the addition of a new channel per Integration I know of. I give the developers/maintainers of these services admin access to that channel so they can use it to provide support to their users, or redirect to their own discord servers if they prefer.
If you have a service integrating into the AI Horde API, please do let me know and I’ll be happy to open a new channel for you and give you admin access to it.
Our demand is increasing significantly, as we have way more and more concurrent web sessions every day. Unfortunately a few database downtimes this month convinced me that hosting it in a Contabo VM with 30% CPU steal is not feasible anymore.
So I finally finalized switching the Database to a dedicated server. It’s a lot more expensive but massively worth it as our response times have increased 5-fold! A worker payload pop that used to take 1-3 seconds, now can take 0.2 – 1 second! As a result the whole AI horde should feel way snappier now.
I have also added our first dedicated box API front-end. Initially it didn’t look to be doing too good, having some of the worst performance, but once I switched to the dedicated database server, it suddenly became the fastest by far, making very obvious just how much impact latency has.
I have also finally deployed a central log server based on Graylog, which should help me track issues across all servers and look them up historically as well.
Funding and Support
All of the above now means the AI horde is significantly more expensive to run and it’s almost a second full-time job for me at this point. Currently all of this is funded via my patreon subscribers only but that is not scaling quite as fast as my infrastructure costs :-/
To add to this, the stability.ai support seems to have run dry and the GPUs they were providing to the AI Horde have been removed and I haven’t been able to arrange to bring them up again.
So I think I need to more consistently promote the way to help me sustain the AI Horde as a truly free and open API for everyone.
So, please if you see the potential of this service, consider subscribing as my patron where I reward people with monthly priority for their support.
I am also looking for business sponsors who either see the value of a Crowdsourced Generative AI Inference REST API or might want to use it for themselves. I basically have no contacts in the AI world so I would appreciate people forwarding this info to whoever might be interested.
The last month has been very difficult for the AI Horde but fortunately I’ve had the help of some really valuable contributors from both the AI Horde community as well as the KoboldAI community and we managed to pull through.
Now with our new backend I expect new features to come much faster and of more quality. We’re already working hard on LoRa support horde-wide for one, so stay tuned!
Close to a month an a half ago, our last remaining maintainer for the nataili library dropped out and we were left functional but “rudderless” as far as inference goes. We could continue operations, but we couldn’t onboard new features anymore as neither me nor any of the remaining regulars have ML knowledge.
In desperation, I asked one of our regulars, Jug, who had been helping out with some python work on the worker if he thinks it would be possible to switch to the ComfyUI software as a backend, as it had some good ideas and was modular enough to be of use to us
To my surprise Jug not only thought it was a good idea, but jumped with both legs in the deep and started hacking around to make it work. Not only that, but we managed to suck-in another regular developer in, Tazlin, who also started helping us with design best practices. As a result, the new library we started developing was built from the ground up to have extensive coverage support which will make us discover regression bugs that much easier.
First steps were to develop feature parity, and that required not only to wrangle the comfyUI pipelines to be called from nataili, but also to port features which we were using in the AI Horde Worker, such as clip, over to the comfyUI.
This early phase was were I could still provide some help, as I’m pretty good at porting features and writing tests for them, and then integrating stuff into the AI Horde Worker, but still the lion’s share of the work on hordelib was being done by Jug, with Tazlin making the code much more reliable and maintainable.
A couple of weeks in, we had almost all the features we needed, but this is where the tricky business started. First we noticed is that comfyUI was not handling multi-threading well, which make sense as it’s meant to be used by a single user on a single PC. That added massive amounts of instability, because our AI Horde Worker is using threads for everything, to nullify latency delays.
So the next phase for about two more weeks was stabilizing the thing, which required a much deeper dig into the comfyUI internals to wrangle individual processes into a multi-threaded paradigm.
Finally that was done, about 1 month after I inquired about moving to comfy. Then we discovered the next problem: Due to all the mutex locks to prevent multi-threaded instability, the whole things was now much slower than nataili was. Like significantly so!
So another two weeks were spend of figuring out where to slowdowns occurred in our implementation and tweaking things to work more optimally, and even trying to figure out if there was indeed a slowdown in the first place as comparisons with the nataili was difficult to achieve.
We even built a whole benchmark suite to see overall speeds in inference, without getting confused with HTML and model loading latency.
But beta testers were still informing us of a seemingly lower kudos reward, so then we suspected the old way of calculating kudos was not applying well to the hordelib inference, due to it working differently. For example it has no slowdown for weights, but control-net types gave out different speeds than we expected, even different speeds per control type.
To track this down, Jug trained a new Neural Network for figuring how much time a generation is expected to take, rather than try to time each individual feature. The new model was so successful at 96% accuracy, that we decided to onboard it onto the AI Horde itself, as a way to calculate kudos more accurately.
This investigation did point us to some things that worked unexpectedly within comfyUI, for example longer prompts than 77 tokens tended to be quite slower, which was a quality thing after speaking with the comfyUI devs. We did discover a workaround for the AI Horde but it’s these sort of things that are introducing unexpected slowdowns compared to before. We’re going to continue looking for and tweaking things as we discover them.
The good news is that the overall quality of images using the comfyUI branch has increased across the board. Not only that but weights not only don’t add extra slowdown (so the extra kudos cost is removed), but they can also exceed 1.3 without causing the image to distort, which is how most other UIs are using them anyway.
The big change is that images with the same payload and the same seed, will look different in comfyUI compared to nataili. This is simply due to the way inference works and something we’ll have to live with.
So now we have the three pillars built: Parity, Stability and Speed; it’s time to go live!
The hordelib has been bumped to 1.0.0 and the AI Horde Worker to 21.0.0. When you run update-runtime next time, you’ll automatically be switched to the new inference backend but you may need to update your bridgeData.yaml file ahead of time.
Set the vram_to_leave_free and ram_to_leave_free to values that work for you.
rename nataili_cache_home to cache_home
You can delete any unused keys (like disable_voodoo)
Also as a user of the AI Horde, keep in mind that the new Workers do not yet support tiling and pix2pix
But not only if the new inference available for the AI Horde, but also for everyone else. Due to the generic way we’ve built it, any python project which needs access to image generation can now import hordelib from pypi, and get access to all the multi-threaded text2img and img2img functionality we provide!
With the move to hordelib, we are now effectively outsourcing our inference development upstream, which allow us to get to use new developments in stable diffusion as they get on-boarded into their software. Hopefully development of ComfyUI will continue for the foreseeable future as I am really not looking forward to changing libraries again any time soon >_<
This also means that we now finally have the capability to onboard LoRas and textual inversion as well which have been requested for a long time, but we never had the capability in our backend. Likewise with new Stable Diffusion models and all the exciting new developments happening practically weekly.
It’s been a lot of hard work, but we’re coming out of it stronger than ever, thanks to the invaluable help of Jug, Tazlin and the rest of the AI Horde community!
The AI Horde is built around the concept of Mutual Aid, to allow people who have, to aid with those who have not. It’s just that it is about aiding for one specific purpose, of using generative AI.
A lot of the design decisions of the AI Horde have been added to facilitate this purpose, such as kudos transfers, which have in turn been turned into things like discord emojis etc. And I’m always looking for ways to reinforce this behavior.
To this end, I am excited to announce a new feature on the AI Horde: Shared Keys
What are shared keys?
In short, they are API keys which can only be used to generate images and text, and not valid for doing any other operations, such as transferring kudos or rating images. The idea here is that someone can created a shared key to give to friends and family, to allow them to use their account priority and to lower the on-boarding requirements of registering their own accounts and not worrying that they might leak it.
Whenever a shared key is used, the kudos is consumed from the origin account and the priority used for that generation is the same as the owner’s. The generation also shares concurrency with that account so if you are planning to share with a lot of people they might end up getting in each other’s way.
Shared Keys can also be given an optional kudos limit, and an expiry date, after which they stop working. A kudos limit doesn’t affect their priority, just prevents the shared key form being used once that limit has been reached.
How do I create an API key?
Until UIs add the option to create them, the simplest way is to use the API web interface directly: http://aihorde.net
Alternatively you can open a console terminal and send a CURL call like so:
Simply use it place of a normal API key to the UI of your choice.
Can I modify a Shared key
Yes, you can both “top-up” existing keys, add/remove expiry, or delete them altogether.
The Shared Keys are designed to be pretty open in their usage. I expect use cases around “service accounts” for communities where people are pooling their kudos somehow, but I am also curious what other emergent uses people will come up around this system.
And If you have a user-case which requires tweaking of this functionality, do let me know!
While the AI Horde will always be free for all, anyone can develop frontend for it and ask their users to pay for its use. This blogpost explains why this is OK so long as they give back as much as they take and how this is enforced.
Recently, a paid service built on top of the AI Horde was announced on reddit’s /r/stablediffusion and a big discussion opened on the ethics of charging people for money for access to the free compute provided by the AI Horde. I’ve talked about this in my discord with some users who were concerned, but I foresee it’s a subject that will keep coming up. So It’s a good time to clarify my position on this subject, “officially” as they say.
When I initially envisioned the AI Horde, this sort of question was foremost on my mind. “How to I prevent abuse of a crowdsourced system with unrestricted access for everyone?” My answer to this question was the Kudos system, which is baked-in on every usage of the AI horde.
Due to the “protection” of the kudos system, we can offer the AI Horde service as an open API for everyone, for any purpose. Knowing that whatever they do, they’ll have to either support the health of the service, or go back to the end of the queue. This allows us to not worry about who or how they’re using the service, because the kudos requirements are inescapable. This bears reiterating:
WE DO NOT CONTROL HOW OTHERS INTEGRATE WITH THE AI HORDE
Because we cannot control people, I am cognizant that people might try to charge money for their services based on the horde (which again, we cannot stop!) or even other technologies we wholly reject (like blockchain). But It doesn’t matter how someone uses the AI Horde; so long as they remain within the limits of the Kudos system, they will have to provide more to the AI Horde than they take out, which balances things out for everyone.
This is the practice of all open paradigms out there. They all rely on volunteer effort but allow people to find business models which can make them money, so long as they respect the open paradigm.
Likewise, even the most hardcore copyleft licenses like GPL explicitly allow commercial use of the software. Because people need to eat! It would likewise be absurd to say that the Linux kernel is unethical, because companies are making money selling stuff built on top of the Linux Kernel!
So knowing that open systems cannot control how other use them, and that the actions of integrators do not represent flaws of open system itself, we instead ask people to act in good faith. We request people to give back to he AI Horde as much, or more than they take. This means that everyone benefits. We likewise block registrations outside of the AI Horde and inform anyone registering that they can always use the AI Horde for free. This ensures that the owner of each service competes with every other free AI Horde UI out there. If their users still want to give them money after that, then they are obviously bringing something valuable to the table for those users. And again, that is OK with us, so long as they give back to the AI Horde according to their usage.
Finally, whatever one does, remember, they cannot escape the kudos system. A super popular front-end to the AI Horde which does not have at least a net zero consumption, will quickly find itself with such high queue times that will drive everyone else off their service.
The AI Horde is absolutely built to combat corporate influences and enshittification, however it is still an open service, and therefore it cannot control who uses it, without sacrificing that openness it is built on, or adding moderation overheads so massive that it would shutter the service.
Does that mean that everything goes? No of course not! As with the anti-CSAM filter, there’s a few rules that are of existential importance to the health of the AI Horde. For example another one is how one treats kudos themselves: I routinely remind people to not consider them a currency and to not assign any monetary value to them. The reason being that the exchange of kudos for money would introduce such immense perverse incentives into the equation, that it would cause the AI Horde development and moderators to switch full-time to countering scams and exploits instead of trying to improve the service. This is such a thick red line that I’m prepared to go to extremes to enforce it, even up to disabling kudos transfers altogether!
Fortunately until now people are following these directives, but what if tomorrow a service appeared whose business model relied on selling kudos they generated to their users, or which allowed people to bypass the anti-CSAM filter somehow? Well that would force me to take active means to counter such a service explicitly, which would easily escalate into an endless cat&mouse game at the detriment of the service. But it would be a necessary course of action. But the existence of a generic paid service however, outside of the violation of those rules for the AI Horde, does not necessitate it, precisely because it’s not an existential concern which would warrant the massive amount of resources that would have to be assigned to counter it.
All that said, I know people are still going to oppose the mere existence of integrations which found a way to make money using the AI Horde as a backend, even if those give back more than they take. Even if they help pay for the development and infrastructure of the AI Horde for the benefit of all. That is OK. Everyone should follow their conscience and values. I have even provided tools and controls for Workers to limit their exposure to practices they do not support, but even if those are not enough, then it is OK to not be part of the AI Horde.
This is also a reason why the AI Horde is Free/Libre Software. If someone else has a different ethical system on how crowdsourced compute resources like these should be handled, they are always welcome to host their own version of the AI Horde, in the same sense that anyone can host their own BitTorrent tracker, with any rules they want! I do honestly believe the current approach of the AI Horde, with unrestricted access is the way to go to democratize AI, but maybe I’m just wrong. It remains to be seen.
However, I do want to ask that people to do not share FUD about who we are affiliated with and what practices we support. The exact stance that we have, is what I have explained above.
At the end of the day, thousands of people are getting free Generative AI output currently and we do not plan to stop this access, ever. No matter who, or how they integrates into us. The AI Horde will always have a way to use it for free without restrictions!
Things are progressing very rapidly in this dawn of the AI and likewise for the AI Horde. I thought it would be a good idea to post about all the things that changed and improved in recent days for our service.
More Requests. More statistics.
I’ve deployed endpoints to measure the usage of the AI horde. Now that one month has passed, we can take a look.
Per day, we are averaging 356,378 images (3.7 terapixelsteps) and 45,248 texts (4 megatokens)
In the past month, we produced 11,475,183 images, generating a staggering 127.6 terapixelsteps. Text has also picked up significant speed since merging the hordes with 1,241,895 generated texts for a total of 112.8 megatokens!
Top 10 Stable Diffusion models
The AI Horde offers close to 200 models at the same time. Our statistics allows us to see how the popularity of the various models changes day to day and month to month. The below are just the top 10 models being used.
Deliberate 22.2% (2550591)
stable_diffusion 15.1% (1730426)
Anything Diffusion 11.0% (1257688)
Hentai Diffusion 4.1% (468473)
Realistic Vision 3.0% (338742)
Counterfeit 2.7% (310337)
URPM 2.6% (297853)
Project Unreal Engine 5 2.5% (289006)
waifu_diffusion 1.8% (211572)
Abyss OrangeMix 1.8% (205268)
For the longest time SD 1.5 (stable_diffusion above) was king, but in the past month, Deliberate has confidently taken the lead and has been leading the pack with a staggering 20% of all image requests passing through the AI Horde! This speaks very highly for the popularity of the model
Top 10 Text models
Almost as many text models exist for the AI Horde, but they’re more varied. However last months saw the release of two big milestones, the Pygmalion models for chat-like generation, which happened after the gimping of the Character AI models. The new Llama model was also released, bringing unparalleled miniaturization of the model size, allowing consumer GPUs far more coherence.
PygmalionAI/pygmalion-6b 52.4% (651566)
KoboldAI/OPT-13B-Erebus 14.0% (174393)
KoboldAI/OPT-6.7B-Erebus 6.7% (83249)
KoboldAI/OPT-6.7B-Nerybus-Mix 3.8% (46747)
KoboldAI/OPT-13B-Nerybus-Mix 2.8% (35110)
KoboldAI/OPT-13B-Nerys-v2 2.7% (33667)
Facebook/LLaMA-13b 1.9% (23367)
KoboldAI/OPT-6B-nerys-v2 1.9% (23232)
OPT-6.7B-Nerybus-Mix 1.6% (19268)
KoboldAI/OPT-2.7B-Erebus 1.0% (12464)
We can see Pygmalion has immediately dominated text generation, with Mr.Seeker’s storytelling models mopping up the rest, but the Llama Ascendancy is just beggining!
The initial design was very simple to allow integrators to onboard it fast and giving good kudos rewards for those helping us. Unfortunately people almost immediately started abusing this by creating bots to rate randomly, therefore poisoning our collection’s accuracy.
I always knew this was a possibility but I was hoping I wouldn’t be forced to add countermeasures quite so soon. So I spent quite a few days adding a captcha mechanism (along other things) to block at least the low hanging fruit.
It immediately led to a drop in ratings per day which automatically shows just how much damage botted ratings were doing
We are fortunate enough to have gathered some great collators for the inference aspect of the AI Horde. So I wanted a big shout-out.
ResidentChief has stepped up strongly to help add new features and squash bugs in the nataili library. As a result the AI Horde now supports inpainting on many more models, a lot more post-processors, such as more upscalers and background removers, controlnet improvements, and so many other stuff too numerous to mention. They’re a beast!
Jug has been working on improving the AI Horde worker practically non-stop. Giving us a great terminal control, and improving the webui. Plus a lot of bugfixes and improvements in the bridge part of things
Tazlin who’s been doing a great deal of tech support in the channels as well as helping me detect and figure out malicious ratings. And also sending some code improvements as well!
Aes Sedai who’s been putting a ton of work on improving the moderation capabilities of the AI Horde with a custom frontend.
And of course all the frontend integrators like rockbandit, aqualxx, sgt.chaos and concedo, who’ve been keeping the frontends up to date, with a lot of features smartly using the capabilities of the AI Horde in ways even I had not expected!
CI/CD and pypi
I finally got around to adding CI/CD pipelines for AI Horde Worker and nataili. Now they will be automatically versioned when the right tag is applied to a PR. The Nataili package has also been republished to pypi and will also automatically receive new versions whenever we publish a new release on GitHub.
The notifications also automatically publish a notification on discord, so people can be aware when something new is up.
Using the new post-processing improvements from ResidentChief, I’ve expanded the interrogation worker so that it can now perform post-processing on images, as well as img2text operations. Unfortunately the previous name didn’t fit so well, so now I’ve renamed it to “Alchemist”, to signify it’s capability to convert images to something else.
Likewise, the official names for image worker is now “Dreamer” and text worker is now “Scribe”. Why not 🙂
The pace of progress in this space is mind-blowing. I can’t wait to see what we achieve together in the coming days!
For those who don’t know, basically fantasy.ai goes to various popular model creators and tempts them with promises of monetary reward them for their creative work, if only they agree to sign over some exclusive rights for commercial use of their model, as well as some other priority terms.
It’s a downright Faustian deal and I would argue that this is how a technology that begun using the Open Source ideals to be able to counteract the immense weight of players like OpenAI and Midjourney, begins to be enclosed.
First they offer an amazing value for the user, which attracts a lot of them and makes the service more valuable to other businesses, like integrating services and advertising agencies.
Then they start making the service worse for their user-base, but more valuable for their business partners, such as via increasing the amount of adverts for the same price, selling user data and metrics, pushing paid content to more users who don’t want to see it, and so on.
Finally once their business partners are also sufficiently reliant on them for income, they tighten their grip and start extracting all the value for themselves and their shareholders, such as by requiring extravagant payment from businesses to let people see the posts they want to see, or the products they want to buy.
Finally, eventually, inexorably, the service experience has become so shitty, so miserable, that it breaches the Trust Thermocline and something disruptive (or sometimes, something simple) triggers a mass exodus of their user base.
Then the service dies, or becomes a zombie, filled with more and more desperate advertisers and an ever increasing flood of spam as the dying service keeps rewarding executives with MBAs rather than their IT personnel.
Because Stable Diffusion is built as open source, we are seeing an explosion of services offering services based on it, crop up practically daily. A lot of those services are trying to discover how to stand out compared to others, so we have a unique opportunity to see how the enshittification can progress in the Open Source Generative AI ecosystem.
We have services at the first stage, like CivitAI which offer an amazing service to their user-base, by tying social media to Stable Diffusion models and fine-tunes, and allowing easy access to share your work. They have not yet figured out their business plan, which is why until now, their service appears completely customer focused.
We have services, like Mage.space which started completely free and uncensored for all and as a result quickly gathered a dedicated following of users without access to GPUs who used them for free AI generations. They are progressing to the second stage of enshittification, by locking NSFW generations behind a paywall, serving adverts and now also making themselves more valuable to model creators as soon as they smelled blood in the water.
We do not have yet Stable Diffusion services at the late stage of enshittification as the environment is still way too fresh.
Fascinatingly, the main mistake of Fantasy.ai is not their speed run through the enshittification process, but rather attempting to bypass the first step. Unfortunately, fantasy.ai entered late in the Generative AI game, as its creator is an NFT-bro who wasn’t smart enough to pivot as early as the Mage.space NFT-bro. So to make up the time, they are flexing their economic muscles, trying to make their service better for their business partners (including the model creators) and choking their business rivals in the process. Smart plan, if only they hadn’t skipped the first step, which is making themselves popular by attracting loyal users.
So now the same user-base which is loyal to other services has turned against fantasy.ai, and a massive flood of negative PR is being directed towards them at every opportunity. The lack of loyalty to fantasy.ai through an amazing customer service is what allowed the community to more clearly see the enshittification signs and turn against them from the start. Maybe fantasy.ai has enough economic muscle to push through the tsunami of bad PR and manage to pull off step 2 before step 1, but I highly doubt it.
But it’s also interesting to see so many model creators being so easily sucked-in without realizing what exactly they’re signing up for. The money upfront for an aspiring creator might be good (or not, 150$ is way lower than I expected), but if fantasy.ai succeeds in dominating the market, eventually that deal will turn to ball and chain, and the same creators who made fantasy.ai so valuable to the user-base, will now find themselves having to do things like bribe fantasy.ai to simply show their models to the same users who already declared they wish to see them.
It’s a trap and it’s surprising and a bit disheartening to see so many creators sleepwalking into it, when we have ample history to show us this is exactly what will happen. As it has happened in every other instance in the history of the web!
One of the big problems we’ve been fighting against since I created the AI Horde was attempts to use it to generate CSAM. While this technology is very new and there’s a lot of question to answer on whether it even is illegal to generate CSAM for personal use, I erred on the safe side and made it a rule from the start, that the one thing that is going against the AI Horde, is such generated content without exceptions.
It’s is not a overstatement to say I’ve spend weeks of work-hours on this problem. From adding capabilities for the workers to set their own comfort level through a blacklist and a censorlist and a bunch of other variables, to blocking VPN access, to the massive horde-regex filter that sits before every request and tries to ascertain from the prompt sent whether it intends to generate CSAM or not.
However the biggest problem is not just pedos, it’s is stupid, but cunning pedos! Stupid because they keep trying to use a free service which is recording all their failed attempts without a VPN. Cunning because they keep looking for ways to bypass our filters.
And that’s where the biggest problem lied until now. The regex filter is based on language which is not only flexible about the same concept, but very frustratingly, the AI is capable of understanding multiple typos of various words and other languages perfectly well. This strains what I can achieve with regex to the breaking point, and led to a cat&mouse game where dedicated pedos kept trying to bypass the filter using typos and translations, and I kept expanding the regex.
But it was inherently a losing game which was wasting an incredible amount of my time, so I needed to find a more robust approach. My new solution was to onboard image interrogation capability to the worker code. The way I go about this is by using image2text, AKA image interrogation. It’s basically AI Model which you feed an image and number of words or sentences and it will tell you how how well each of those words is represented in that image.
So what we’ve started doing is that every AI Horde Worker will now automatically scan every image they generate with clip and look for a number of words. Some of them are looking for underage context, while some of them are looking for lewd context. The trick is detecting one, or the other context is OK. You’re allowed to draw children, and you’re allowed to draw porn. It’s when these two combines that we filter goes into effect and censors the image!
But this is not even the whole plan. While the clip scanning on its own is fairly accurate, I further tweaked my approach by taking into account things like the value of other words interrogated. For example I noticed that when looking for “infant” in the generated image pregnant women would also have a very high rating for it, causing the csam-filter to censor out naked pregnant women consistently. My solution was then to also interrogate for “pregnant” and if the likelihood of that is very high, adjust the threshold to hit infant higher.
The second trick I did was to also utilize the prompt. A lot of pedos were trying to bypass my filters (which were looking for things like “young”, “child” etc) by not using those words, and instead specifying “old”, “mature” etc in the negative prompt. Effectively going the long route around to make Stable Diffusion draw children without explicitly telling it to. This was downright impossible to block using pure regex without causing a lot of false positives or an incredible amount of regex crafting.
So I implemented a little judo-trick instead. My new CSAM filter now also scans prompt and negative prompt for some words using regex and if they exist, also slightly adjusts the interrogated words based on the author intended. So let’s say the author used “old” in the negative prompt, this will automatically cause the “child” weight to increase by 0.05. This may not sound by a lot, but most words tend to variate from 0.13 to 0.22, so it’s actually has a significant chance to push a borderline word (which it would be at a successful CSAM) over the top. This converts the true/false result of a regex query, into a fine-grained approach, where each regex hit reduces the detection threshold only slightly, allowing non-CSAM images to remain unaffected (since the weight of the interrogated word would start low) while making more likely to catch the intended results.
Now the above is not the perfect description of what I’m doing, in the aim of keeping things understandable for the layperson, but if you want to see the exact implementation you can always look at my code directly (and suggest improvements 😉 ).
In my tests, the new filter has fairly great accuracy with very few false positives, mostly around anime which makes every woman look extraordinarily young as a matter of fact. But in any case, with the amount of images the horde generates I’ll have plenty of time to continue tweaking and maybe craft more specific filter for the models of each type (realistic, anime, furry etc)
Of course I can never expects this to be perfect, but that was never the idea. No such filter can ever catch everything, but what my hope is that this filter, along with my other countermeasures like the regex filter, will have enough of a detection rate to frustrate even the most dedicated pedos off of the platform.