We’ve developed two neural networks which have learned by associating text and images. CLIP maps images into categories described in text, and DALL-E creates new images, like this, from text.
A step toward systems with deeper understanding of the world. https://openai.com/multimodal/