Fine tuning #stablediffusion to make Pokemon!
I wrote a quick guide on fine tuning your own Stable Diffusion: github.com/LambdaLabsML/e
I also released my Pokemon model, you can try it out on Replicate: replicate.com/lambdal/text-t
or with this Notebook: github.com/LambdaLabsML/l
Stable Diffusion is a great generalist model, but getting a certain style of output is pretty tricky; it usually needs some serious "prompt engineering" (which I am rubbish at). Fine tuning the model itself is an easy way to focus on just what you want, if you have some data.
I fine tuned the original stable diffusion on a Pokemon dataset, captioned with BLIP. The captions aren't amazing (see this example), but they're ok. You can get my dataset here: huggingface.co/datasets/lambd
Fine tuning just takes a few hours on a couple of A6000s (I used Lambda GPU cloud ofc: lambdalabs.com/service/gpu-cl)
During training the model learns to produce things in Pokemon style but retains its previous knowledge for a while; eventually it overfits to the dataset.
Once you have a fine tuned model, it can't help but generate Pokemon no matter what prompt you give it. So no more painstaking prompting required:
"robotic cat with wings"
This was the most naive approach to fine tuning, and it still worked really well. (Using the EMA weights is important btw!) I'm really excited about the possibilities of specialising this model to new areas in more sophisticated ways!
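For anyone wondering what "EMA weights" means here: alongside the weights the optimiser updates, training keeps a slowly moving average of them, and that smoother copy is usually the one you sample from. A minimal sketch of the idea follows, using plain Python floats rather than tensors; the function name and the decay value are illustrative assumptions, not taken from the training code used for this model.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the current weights into the running average, in place.

    Both arguments are dicts mapping a parameter name to a number.
    `decay` close to 1.0 means the average moves slowly.
    """
    for name, value in weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights


# Toy run: one "parameter" that the optimiser pushes to 1.0, 2.0, 3.0.
weights = {"w": 0.0}
ema = dict(weights)  # the EMA copy starts equal to the initial weights
for step in range(1, 4):
    weights["w"] = float(step)  # stand-in for an optimiser step
    ema_update(ema, weights, decay=0.5)  # small decay so the effect is visible

print(weights["w"], ema["w"])  # the EMA copy lags behind the raw weight
```

The raw weights jump around from step to step; the averaged copy trails them and smooths out that noise.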
I haven't compared this fine tuning approach to textual inversion yet; it's definitely on my to-do list. If someone wants to try it out please let me know how it goes!
P.S. You might want to disable the safety checker; it seems to get very over-excited by the Pokémon style for some reason, and I've never managed to get an actually NSFW image out of it.
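If you are running the model through the diffusers library, one way to do this is to pass `safety_checker=None` when loading the pipeline. The sketch below assumes diffusers and a CUDA GPU are available; `model_id` is a placeholder you have to fill in yourself, and the helper names are made up for illustration.

```python
def pipeline_kwargs(disable_safety_checker=True):
    """Build the extra keyword arguments for loading the pipeline."""
    kwargs = {}
    if disable_safety_checker:
        # Passing None tells diffusers to load the pipeline without the checker.
        kwargs["safety_checker"] = None
    return kwargs


def generate(model_id, prompt, disable_safety_checker=True):
    """Load a Stable Diffusion pipeline and return one image for `prompt`.

    `model_id` is whatever fine tuned model you want to load (a placeholder
    here). Assumes a CUDA device is present.
    """
    # Imported lazily so the helper above works without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, **pipeline_kwargs(disable_safety_checker)
    )
    pipe = pipe.to("cuda")
    return pipe(prompt).images[0]
```

Usage would look like `image = generate("<your-fine-tuned-model>", "robotic cat with wings")`, followed by `image.save("out.png")`.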
While this text to Pokemon thing is going crazy I wrote a quick blog post with some extra details and experimentation I did while creating the model: justinpinkney.com/pokemon-genera
Why is the checkpoints file in the HF repo 13.6GB? It seems like diffusers pulls in the correct SD model size of ~4GB.
The original .ckpt is massive because it has all the weights (EMA and non-EMA) and the optimiser state in it too. The HF diffusers version is a lot slimmer as it's just the EMA model.
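To make the size difference concrete, here is a toy sketch of slimming a training checkpoint: swap in the EMA copy wherever one exists, then drop the raw duplicates and the optimiser state. The key layout (`state_dict`, `optimizer_states`, the `model.` and `model_ema.` prefixes) is a hypothetical one chosen for illustration; real checkpoints may name things differently, and plain numbers stand in for tensors.

```python
def slim_checkpoint(checkpoint, model_prefix="model.", ema_prefix="model_ema."):
    """Return a copy of `checkpoint` holding one set of weights only.

    Where an EMA copy of a weight exists it replaces the raw value.
    Anything outside `state_dict` (for example optimiser state) is dropped.
    """
    state = checkpoint["state_dict"]
    slim = {}
    for key, value in state.items():
        if key.startswith(ema_prefix):
            continue  # EMA copies are folded in below, not stored twice
        if key.startswith(model_prefix):
            ema_key = ema_prefix + key[len(model_prefix):]
            value = state.get(ema_key, value)  # prefer the EMA value if present
        slim[key] = value
    return {"state_dict": slim}


# Toy checkpoint: one weight with an EMA copy, one without, plus optimiser state.
checkpoint = {
    "state_dict": {"model.w": 1.0, "model_ema.w": 0.9, "vae.b": 0.5},
    "optimizer_states": [{"exp_avg": {"model.w": 0.1}}],
}
slim = slim_checkpoint(checkpoint)
print(sorted(slim["state_dict"]))  # only one entry per weight remains
```

The slim version stores each weight once and carries nothing the optimiser needs to resume training, which is why it is so much smaller.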

