Check out #Imagen, Google Brain’s new SOTA Text-to-Image model. I’m especially fond of these prompts: “Android Mascot made from bamboo.” + “Intricate origami of a fox and a unicorn in a snowy forest.” 😉 Impressed that AK always finds the links before it is officially released❗️
Quote Tweet
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. Project page: gweb-research-imagen.appspot.com. SOTA FID (7.27 on COCO) without ever training on COCO; human raters find Imagen samples to be on par with the COCO data itself in image-text alignment.
What’s interesting about #Imagen is that it shows some of the components used in previous latent diffusion models (such as a latent space) are not needed to achieve SOTA results. Along with architecture engineering (modified U-Net), I believe #Imagen is a smaller model than #Dalle2.
This is similar to what Greg Brockman noted about the development progress between #dalle and #dalle2, where improvements are mainly due to algorithmic engineering. So at the end of the day, scale is 𝑛𝑜𝑡 everything. And no, the game is not “over” :)
Quote Tweet
DALL-E 2 is stunningly better than DALL-E 1, but what's interesting (and underappreciated) is the fact that the newer model is much smaller & amount of training compute is similar. Improvements are essentially all due to algorithmic innovation.