At OpenAI a lot of our work aims to align language models like ChatGPT with human preferences. But this could become much harder once models can act coherently over long timeframes and exploit human fallibility to get more reward.
📜Paper: arxiv.org/abs/2209.00626
🧵Thread:
Pinned Tweet
18
314
1,322
Show this thread





















