Almost immediately after the AI Horde received LoRa support, people started clamoring for Textual Inversions, one of the earliest techniques for fine-tuning Stable Diffusion outputs.
While I was planning to re-use much of the code that handled the automatic downloading of LoRas, I quickly ran into an unexpected problem in the form of pickles.
In Python, a pickle is effectively an in-memory object serialized to disk as-is. The problem is that pickles are terribly insecure: code embedded in a pickle is executed the moment you load it back into RAM, which is a problem when the file comes from a stranger. There is a solution for that, safetensors, a format which ensures that only the data is loaded and nothing harmful.
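To illustrate the difference (this is just a toy sketch, not horde code): unpickling a file can run whatever code its creator embedded in it, while safetensors only ever hands back tensor data.

```python
import pickle

# A deliberately malicious object: unpickling it runs an arbitrary shell
# command via __reduce__, without the victim calling anything but loads().
class Malicious:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

data = pickle.dumps(Malicious())
pickle.loads(data)  # prints "pwned" -- code ran just by deserializing

# safetensors, by contrast, only reads raw tensor data and metadata.
from safetensors.torch import load_file
# tensors = load_file("embedding.safetensors")  # no code execution possible
```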
However, while most LoRas are a recent development and were released as safetensors from the start, textual inversions (TIs) were developed much earlier, and most of them are still out there as pickles.
This caused a big problem for us. We wanted to blanket-support all TIs from CivitAI, but that opened the gate for someone to upload a malicious TI to CivitAI, request it themselves through the horde, and pwn all our workers in one stroke! CivitAI does technically scan uploaded pickles, but automated scans are never perfect; all it would take is for someone to discover one way to sneak an exploit past them. The risk was way too high for our tastes.
But if I were to allow only safetensors, only a small minority of TIs would be available, making the feature worthless to develop. The most popular TIs were all still pickles.
So I had to find a way to automatically convert pickles into safetensors before the worker could use them. I couldn't do it on the worker side, as the pickle has to be loaded first; it had to happen in a secure location of some sort. So I built a whole new microservice: the AI Hordeling.
All the Hordeling does is provide a REST API where a worker can send a CivitAI ID to download. The Hordeling checks whether the file is already a safetensor; if not, it downloads it, converts it to a safetensor, and then gives the worker a download link to the converted file.
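Roughly speaking, the conversion step looks something like the following. This is a simplified sketch rather than the actual Hordeling code, and the key layout of old TI pickles varies, so it only covers the common cases:

```python
import torch
from safetensors.torch import save_file

def convert_ti_to_safetensors(pickle_path: str, out_path: str) -> None:
    # torch.load() unpickles the file, so this must only ever run inside
    # the disposable conversion service, never on a worker.
    checkpoint = torch.load(pickle_path, map_location="cpu")

    if "string_to_param" in checkpoint:    # classic SD embedding layout
        tensors = {k: v.detach().clone()
                   for k, v in checkpoint["string_to_param"].items()}
    elif "emb_params" in checkpoint:       # alternate layout
        tensors = {"emb_params": checkpoint["emb_params"].detach().clone()}
    else:                                  # fall back to any flat dict of tensors
        tensors = {k: v for k, v in checkpoint.items() if torch.is_tensor(v)}

    # Write out a safetensors file containing only the tensor data.
    save_file(tensors, out_path)
```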
This means that if someone were to get through the CivitAI scans, all they could exploit is the Hordeling itself, which is not connected to the AI Horde in any way and can be rebuilt from scratch very easily. Likewise, the workers ensure that they only download safetensor files, which means they can't be exploited.
All this to say, it's been a lot more work than expected to set up Textual Inversions on the horde! But I did it!
So I'm excited to announce that all textual inversions on CivitAI are now available through the AI Horde!
The way to use them is very similar to LoRas. You have to specify the ID, or a unique part of the name, in the "tis" field so that the worker can find them. The tricky part is that TIs have to be referenced by filename inside the prompt, and where that reference sits in the prompt matters. This complicates things because the filename is not easy for the user to figure out, especially since some model names contain non-Latin characters and one can't know how those will be changed when saved to disk.
So the way we handle it instead is that one puts the CivitAI model ID in the prompt, in the form "embedding:12345". If the strength needs to be modified, it should be written as "(embedding:12345:0.5)". On their side, the workers always save the TIs using their model ID, which should allow ComfyUI to know what to use.
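To make that concrete, here's a rough sketch of what a request could look like. The "embedding:ID" syntax is as described above, but the exact shape of the entries inside "tis" is my shorthand, so double-check the API docs; 12345 is a placeholder model ID.

```python
import requests

payload = {
    "prompt": "a portrait photo of a wizard, (embedding:12345:0.8)",
    "params": {"n": 1},
    "models": ["stable_diffusion"],
    # The same ID must also be listed here so the worker knows to fetch it.
    "tis": [{"name": "12345"}],
}
response = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json=payload,
    headers={"apikey": "0000000000"},  # anonymous key; use your own for priority
)
print(response.json())
```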
I also understand this can be quite a bother for both users and UX developers, so another option exists where you allow the AI Horde to inject the relevant strings into the prompt for you. In the "tis" key you can specify that you want the TI injected, where, and with how much strength.
The string will then be injected at the start of the prompt, or at the end of the negative prompt, with the corresponding strength (defaulting to 1.0). Of course you might not always want that, in case you want the TI placed in a specific part of the prompt, but you at least have the option to do it quickly this way when the placement isn't important. I expect UX designers will want to let users handle it both ways as needed.
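As a sketch, the injected variant of the same request could look like this; the key names "inject_ti" and "strength" are my reading of the feature, so treat the live API docs as authoritative:

```python
payload = {
    "prompt": "a portrait photo of a wizard ### blurry, low quality",
    "params": {"n": 1},
    "models": ["stable_diffusion"],
    "tis": [
        # Injected as "(embedding:12345:0.7)" at the start of the prompt.
        {"name": "12345", "inject_ti": "prompt", "strength": 0.7},
        # Injected at the end of the negative prompt (after the "###" separator).
        {"name": "67890", "inject_ti": "negprompt"},  # strength defaults to 1.0
    ],
}
```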
You are also limited to a maximum of 20 TIs per request, and there's an extra kudos cost whenever you request TIs.
So there you have it: the AI Horde now provides access to thousands of Textual Inversions, on top of the thousands of LoRas we were already providing.
Maximum customization power without even needing a GPU, and we're not even finished!