New paper adds great ideas to Stable Diffusion and beats it in human evaluation:
- image generation first shapes the scene, then refines details, so train separate experts (MoDE) for these different stages
- use object bounding boxes to help the model learn about the separate parts of a text prompt
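
A rough sketch of the first idea, routing each denoising step to a stage-specific expert. This is not the paper's code: the function name, the 1000-step schedule, the 10 experts, and the equal-width split are all assumptions made for illustration.

```python
def expert_for_timestep(t: int, num_timesteps: int = 1000, num_experts: int = 10) -> int:
    """Return the index of the expert assumed to handle denoising timestep t.

    Hypothetical routing rule: split the timestep range into equal-width
    buckets, one per expert. High t (very noisy, scene layout) and low t
    (nearly clean, fine details) therefore land on different experts.
    """
    if not 0 <= t < num_timesteps:
        raise ValueError(f"t must be in [0, {num_timesteps}), got {t}")
    return t * num_experts // num_timesteps


# Noisy early steps and nearly clean late steps use different experts.
layout_expert = expert_for_timestep(999)  # start of sampling, most noise
detail_expert = expert_for_timestep(0)    # end of sampling, least noise
```

In a real model each index would select a separate set of denoiser weights, so only one expert runs per step and sampling cost stays roughly constant as experts are added.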