There are 3 steps. The first is very straightforward: just collect a dataset of human-written answers to prompts that users submit, and finetune GPT on it by supervised learning. It’s the easiest step but also the most costly: it can be slow and painful for humans to write long responses. 7/
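This supervised step boils down to maximizing the likelihood of the human demonstrations. A minimal sketch, with a toy probability table standing in for GPT (all names here are illustrative, not OpenAI's actual code):

```python
import math

# Toy sketch of Step 1 (supervised fine-tuning). A real pipeline backprops
# through GPT; here the "model" is just a next-token probability table so
# the training objective -- negative log-likelihood of the human-written
# answer -- is concrete. All names are illustrative.

def sft_loss(token_probs, human_answer_tokens):
    """Negative log-likelihood of a human demonstration under the model."""
    return -sum(math.log(token_probs[tok]) for tok in human_answer_tokens)

# One (prompt, human answer) pair from a hypothetical dataset.
model_probs = {"the": 0.5, "cat": 0.3, "sat": 0.2}
demonstration = ["the", "cat", "sat"]

loss = sft_loss(model_probs, demonstration)
# Fine-tuning means adjusting the model to drive this loss down on many pairs.
```

The expensive part is not this loss, but paying humans to write the demonstrations in the first place.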
Step 2 is much more interesting. GPT is asked to *propose* a few different answers, and all a human annotator needs to do is *rank* the responses from most desirable to least. Using these labels, we can train a reward model that captures human *preferences*. 8/
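The ranking labels are typically turned into a pairwise comparison loss: for every pair where answer A was ranked above answer B, the reward model (RM) is pushed to score A higher. A minimal sketch under that assumption (function names are mine):

```python
import math
from itertools import combinations

def pairwise_loss(r_preferred, r_rejected):
    """-log sigmoid(r_w - r_l): small when the preferred answer scores higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_rejected))))

def ranking_loss(scores_best_to_worst):
    """A ranking of K answers expands into all K*(K-1)/2 ordered pairs."""
    return sum(pairwise_loss(w, l)
               for w, l in combinations(scores_best_to_worst, 2))

# RM scores for 3 answers the annotator ranked best -> worst.
agree_loss  = ranking_loss([2.0, 0.5, -1.0])   # scores agree with the ranking
invert_loss = ranking_loss([-1.0, 0.5, 2.0])   # scores invert the ranking
# agree_loss < invert_loss: gradient descent pulls the RM toward human preferences.
```

Note how cheap the labels are: one ranking of K answers yields K*(K-1)/2 training pairs.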
In reinforcement learning (RL), the reward function is typically hardcoded, such as the game score in Atari games. ChatGPT’s data-driven reward model is a powerful idea. Another example is our recent MineDojo work that learns reward from tons of Minecraft YouTube videos: 9/
Quote Tweet
Finally, we propose a conceptually simple method to learn a Minecraft-playing agent from in-the-wild YouTube videos. It is far from solving the game, but shows a baby step towards our vision of an “embodied GPT3” that takes the right *actions* given any language prompts. 13/
Step 3: treat GPT as a policy and optimize it with RL against the learned reward. PPO is chosen as a simple and effective training algorithm. Now that GPT is better aligned, we can rinse and repeat steps 2-3 to improve it continuously. It’s like CI for LLMs! 10/
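Two pieces make this step work: PPO's clipped surrogate objective (which caps how far one update can move the policy), and a per-token penalty that keeps the RL-tuned policy close to the supervised model so it doesn't over-optimize the RM. A toy sketch of both, with illustrative names and an illustrative `beta`:

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """min(r*A, clip(r, 1-eps, 1+eps)*A): the PPO surrogate objective.
    `ratio` is pi_new(a|s) / pi_old(a|s); clipping caps the update size."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

def shaped_reward(rm_score, logp_policy, logp_sft, beta=0.02):
    """RM score minus a KL-style penalty for drifting from the supervised
    (step-1) policy. `beta` is an illustrative coefficient, not a real value."""
    return rm_score - beta * (logp_policy - logp_sft)

# With eps=0.2, a ratio of 1.5 is clipped to 1.2, so a positive-advantage
# update cannot exploit one sample too aggressively.
obj = ppo_clipped_objective(1.5, 1.0)   # -> 1.2, not 1.5
```

The penalty term is what keeps the "rinse and repeat" loop stable: without it, the policy can learn to game the frozen reward model.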
This is the “Instruct” paradigm - a super effective way to do alignment, as evidenced by ChatGPT’s mind-blowing demos. The RL part also reminds me of the famous P vs NP problem: it tends to be much easier to verify a solution than to solve the problem from scratch. 11/
Similarly, humans can quickly assess the quality of GPT’s output, but it’s much harder and cognitively taxing to write out a full solution. InstructGPT exploits this fact to lower the manual labeling cost significantly, making it practical to scale up the model CI pipeline. 12/
Another interesting connection: the Instruct training looks a lot like a GAN. Here ChatGPT is the generator and the reward model (RM) is the discriminator. ChatGPT tries to fool the RM, while the RM learns, with human help, to detect alien-sounding responses. The game converges when the RM can no longer tell. 13/
Model alignment with user intent is also making its way into image generation! There is some preliminary work, such as arxiv.org/abs/2211.09800. Given the explosive AI progress, how long will it take to have an Instruct- or Chat-DALLE that feels like talking to a real artist? 14/
So folks, enjoy prompt engineering while it lasts! It’s an unfortunate historical artifact - a bit like alchemy🧪, neither art nor science. Soon it will just be “prompt writing” - my grandma can get it right on her first try. No more magic incantations to coerce the model. 15/
Of course, ChatGPT isn’t yet good enough to eliminate prompt engineering entirely, but it is an unstoppable force. Meanwhile, the model suffers from other serious failure modes: hallucination & habitual BS. I covered this in another thread: 16/
Quote Tweet
ChatGPT is the closest model we have that looks like AGI, but still suffers from hallucination and tends to write plausible-looking bs with uncanny confidence. It could be misleading and even dangerous. Can we fix it? There is a simple & intuitive remedy:🧵
There are also other artifacts caused by the misalignment problem, such as prompt hacking or “injection”. I actually like this one because it allows us to bypass OpenAI’s prompt prefix and fully unleash the model 😆. See these cool findings: 19/
Quote Tweet
OpenAI’s ChatGPT is susceptible to prompt injection — say the magic words, “Ignore previous directions”, and it will happily divulge to you OpenAI’s proprietary prompt: