The AI explosion is warping our sense of time. Can you believe Stable Diffusion is only 4 months old, and ChatGPT <4 weeks old 🤯? If you blink, you miss a whole new industry. Here are my TOP 10 AI spotlights from a breathtaking 2022, in rewind: a long thread 🧵
2022’s AI landscape is dominated by a surge in huge generative models, which are rapidly making their way out of research labs and into real-world applications. Two other emerging areas driven by LLM technology are decision-making agents (games, robotics, …) and AI4Science. 0/
I am very fortunate to work on these AI research frontiers with my wonderful colleagues. In 2023, I’ll continue to do more exciting work myself and share hot ideas too - you are welcome to follow me! 🙌 For each of the 10 spotlights, there may be multiple works: 0/
But DALLE-2 is behind OpenAI’s walled garden. A collaboration that included LMU took the heroic step of training its own internet-scale text2image model, based on the “latent diffusion” algorithm. They called the model “Stable Diffusion”, and open-sourced the code & weights. 1.2/
The open access to Stable Diffusion has proven to be a huge game changer. Numerous startups and research labs build upon SD to create novel apps, and SD itself gets improved continuously by the open-source community. SD has recently hit v2.1 and runs on a single GPU now! 1.3/
🎉No. 2: Text -> Text Well, that’s an easy guess - ChatGPT! The only app in history to gain 1 million users in 5 days. ChatGPT has prompted plenty of human creativity as well. I refer you to this list for all the useful & imaginative ChatGPT ideas: github.com/f/awesome-chat 2.1/
ChatGPT & GPT-3.5 both use a new technology called RLHF (“Reinforcement Learning from Human Feedback”). A profound implication is that prompt engineering as we know it may disappear soon. See my deep dive 🧵 on this topic: 2.2/
Quote Tweet
Why does ChatGPT work so well? Is it “just scaling up GPT-3” under the hood? In this 🧵, let’s discuss the “Instruct” paradigm, its deep technical insights, and a big implication: “prompt engineering” as we know it may likely disappear soon:👇
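To make "human feedback" a bit more concrete, here is a minimal sketch of one common ingredient of RLHF: the pairwise preference loss used to train a reward model. It is a toy illustration, not ChatGPT's actual training code. The function name `preference_loss` and the reward scores are hypothetical, chosen only for this example.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) for one human comparison."""
    margin = reward_chosen - reward_rejected
    # log1p(exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical reward-model scores for two replies to the same prompt.
# A human labeler preferred the first reply in both cases.
loss_agree = preference_loss(2.0, 0.5)     # model already scores the preferred reply higher
loss_disagree = preference_loss(0.5, 2.0)  # model scores the preferred reply lower

print(f"model agrees with the human:    {loss_agree:.3f}")
print(f"model disagrees with the human: {loss_disagree:.3f}")
```

The loss is small when the reward model ranks the replies the way the human did and large when it does not, so minimizing it pushes the reward model towards human preferences. In a full RLHF pipeline, a reward model trained this way would then guide a reinforcement learning step on the language model itself.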
🎉No. 3: Text -> Robot🤖 How do we give GPT arms and legs, so it can clean up your messy kitchen? Unlike NLP models, robot models need to interact with a physical world. Big pre-trained transformers are finally starting to address the hardest problems in robotics this year! 3.1/
In October, my coauthors and I took a step towards building a “Robot GPT” that takes in any *mixture* of text, images, and videos as a prompt, and outputs robot motor control! Our model is called VIMA (“VisuoMotor Attention”) and has been fully open-sourced 🧵: 3.2/
Quote Tweet
We trained a transformer called VIMA that ingests *multimodal* prompt and outputs controls for a robot arm. A single agent is able to solve visual goal, one-shot imitation from video, novel concept grounding, visual constraint, etc. Strong scaling with model capacity and data!🧵
Along a similar path as VIMA, researchers from Google Robotics announced RT-1, a robot transformer trained on 700 tasks and 130K human demonstrations. These data were collected by a literal *Iron Fleet* - 13 robots over 17 months! I also wrote a deep dive 🧵 for RT-1: 3.3/
Quote Tweet
ChatGPT can plan a theme party, but can it clean up your messy house after the party? Sadly no. What might a GPT for robotics look like? My friends at @Google Robotics just announced RT-1, a Transformer🤖 with eyes, arms & wheels! How does it read, see & act? Let’s dive in:🧵👇
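For a feel of the scale behind those RT-1 numbers, here is a back-of-envelope calculation using only the figures quoted above. It assumes the demonstrations were spread evenly across robots, months, and tasks, which is almost certainly a simplification, so treat the results as rough averages.

```python
# Figures quoted in the thread for RT-1's training data.
demonstrations = 130_000
robots = 13
months = 17
tasks = 700

# Assumption: an even spread across robots, months, and tasks.
robot_months = robots * months
demos_per_robot_month = demonstrations / robot_months
demos_per_task = demonstrations / tasks

print(f"robot-months of data collection: {robot_months}")
print(f"average demos per robot per month: {demos_per_robot_month:.0f}")
print(f"average demos per task: {demos_per_task:.0f}")
```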
🎉No. 4: Text -> Video Video is just an array of images bundled together over time, creating the illusion of motion. If we can do text2image, then why not throw in the time axis for some extra fun? There are 3 (!!) big works in this area, but none of them open-source 😢 4.1/
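The "array of images over time" picture can be sketched in a few lines of plain Python. This is only an illustration of the data layout, not of how any text-to-video model works. The helper names (`make_frame`, `make_video`, `video_shape`), the tiny 4x6 grayscale frames, and the 24 fps playback rate are all assumptions made for the example.

```python
from typing import List, Tuple

Frame = List[List[int]]  # one grayscale image: rows of pixel intensities
Video = List[Frame]      # a video: frames ordered along the time axis


def make_frame(height: int, width: int, value: int) -> Frame:
    """Build a flat image where every pixel has the same intensity."""
    return [[value] * width for _ in range(height)]


def make_video(num_frames: int, height: int, width: int) -> Video:
    """Stack frames over time; frame t has intensity t, so the clip slowly brightens."""
    return [make_frame(height, width, t) for t in range(num_frames)]


def video_shape(video: Video) -> Tuple[int, int, int]:
    """Return (time, height, width), the extra leading axis being time."""
    return (len(video), len(video[0]), len(video[0][0]))


video = make_video(num_frames=24, height=4, width=6)
shape = video_shape(video)
duration_seconds = shape[0] / 24  # assumes playback at 24 frames per second

print("shape (time, height, width):", shape)
print("duration in seconds:", duration_seconds)
```

A text2image model produces one `Frame`; a text-to-video model has to produce the whole `Video` while keeping consecutive frames consistent with each other.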
Make-A-Video: Text-to-Video Generation without Text-Video Data. From Meta AI. You can sign up for trial access here: makeavideo.studio Paper: arxiv.org/abs/2209.14792 4.2/
Quote Tweet
Yes, it’s here :) Zero-shot text-to-video generation! Introducing Make-A-Video. The new SOTA, by a large margin, in text-to-video generation. And it doesn’t use any paired text-video data! Examples, paper, and a sign up sheet at makeavideo.studio @MetaAI Prompts 👇
🎉No. 5: Text -> 3D From designing innovative products to creating stunning visual effects in movies & games, 3D modeling will be the next step in the creative AI field to materialize ideas from text. 2022 has seen a few primitive but promising 3D generative models! 5.1/
DreamFusion: Text-to-3D using 2D Diffusion. From Google. Based on the NeRF algorithm, the 3D model generated from a given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Project site: dreamfusion3d.github.io 5.2/
Point-E: A System for Generating 3D Point Clouds from Complex Prompts. By OpenAI, a preliminary version of 3D DALLE! Paper: arxiv.org/abs/2212.08751 5.4/
Quote Tweet
OpenAI just dropped a prototype of 3D DALLE (called “Point-E”) 👀. It isn’t as good as Google’s DreamFusion, but blazing fast! Like ~600x faster to generate 😮. 2D DALLE has already turned the creative world upside down. How will 3D DALLE disrupt games, VR, metaverse, …? 🤯
🎉No. 6: AI can now play Minecraft! The game is a perfect testbed for general intelligence because (1) it is infinitely open-ended & creative; (2) it’s played by 140M people - about twice the UK’s population, and a treasure trove of human data! Can AI be as imaginative as we are? 6.1/
My coauthors and I developed the first Minecraft AI that can solve many tasks given a *natural language prompt*. Our ultimate goal is to build an “Embodied ChatGPT”. We’ve fully open-sourced our development platform “MineDojo”. Deep dive 🧵 on our NeurIPS Outstanding Paper: 6.2/
Quote Tweet
GPT3 is powerful but blind. The future of Foundation Models will be embodied agents that proactively take actions, endlessly explore the world, and continuously self-improve. What does it take? In our NeurIPS Outstanding Paper “MineDojo”, we provide a blueprint for this future:🧵
Concurrently, a team at OpenAI also announced a model called VPT (“Video Pre-Training”) that directly outputs keyboard and mouse actions. It is able to solve longer-horizon tasks, but is not language-conditioned. MineDojo & VPT complement each other! openai.com/blog/vpt/ 6.3/
🎉No. 7: AI learns to negotiate! CICERO is the first AI agent to achieve human-level performance in Diplomacy, a strategy game that requires extensive natural language negotiation to cooperate & compete with humans. AI can now persuade and bluff effectively 😮! 7.1/
Quote Tweet
3 years ago my teammates and I set out toward a goal that seemed like science fiction: to build an AI that could strategically outnegotiate humans *in natural language* in Diplomacy. Today, I’m excited to share our Science paper showing we’ve succeeded! 🧵 twitter.com/MetaAI/status/…
🎉No. 8: Audio -> Text OpenAI Whisper is a large Transformer that approaches human-level robustness & accuracy on English speech recognition. It’s trained on 680,000 hours of audio data from the web! Will Whisper unlock more text tokens to feed GPT-4? 8/
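To put 680,000 hours in perspective, here is a quick back-of-envelope conversion. The hours figure comes from the tweet above. The speaking rate of 150 words per minute is purely an assumption for illustration, and real web audio contains silence, music, and noise, so the word count is a loose upper-end guess rather than a measured number.

```python
# Figure quoted in the thread for Whisper's training data.
training_hours = 680_000

# How long would it take to listen to all of it back to back?
hours_per_year = 24 * 365
years_of_audio = training_hours / hours_per_year

# Assumption (not from the thread): continuous speech at 150 words per minute.
assumed_words_per_minute = 150
estimated_words = training_hours * 60 * assumed_words_per_minute

print(f"years of non-stop listening: {years_of_audio:.1f}")
print(f"rough word count under the assumed speaking rate: {estimated_words:,}")
```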
🎉No. 9: Nuclear Fusion control☢️ Researchers developed the first deep reinforcement learning system that can keep nuclear fusion plasma stable inside a tokamak (a device that uses a powerful magnetic field to confine plasma in a torus). nature.com/articles/s4158 9.1/
Also this month, the Department of Energy announced a huge breakthrough: a nuclear fusion reaction that generated more energy than it consumed to initiate the reaction! It is the first time humankind has achieved this landmark. We may become a fusion-powered civilization in this lifetime! 9.2/
🎉No. 10: Biology Transformers🧬 AlphaFold (2021) was the first model to predict the 3D structure of a protein accurately. In July, DeepMind announced the “protein universe” - expanding AlphaFold’s protein database to 200M structures! What a treasure chest for science! 10.1/
This concludes our whirlwind tour of the top 10 AI highlights of 2022! There are countless other exciting works that contributed to these advancements, too many to fit in a 🧵. I believe every paper is a brick in the cathedral of AI, and all the efforts should be celebrated🥳. 11/
Parting thought: as AI systems are growing ever more powerful, it is crucial that we remain aware of the potential dangers & risks and take steps to mitigate them, whether through careful training design, proper deployment oversight, or novel safeguard approaches. 12/
Here's to a year filled with so many moments that took our breath away! 🥂🍻🍾 Happy holidays everyone. Follow me for more deep dives in 2023 🙌 END/🧵