Didn’t realize this had become (deservedly) a sort of Chinese Room-level provocation that AGI types anxiously contort themselves to refute. Only recently did it click that it’s basically Goodhart’s law in an AI context.
Quoted tweet:
The Lebowski theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function
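To make the Goodhart connection concrete, here is a minimal toy sketch. Everything in it (the action names, effort constants, and the `measured_reward` / `true_performance` functions) is a hypothetical illustration, not anything from the thread: an optimizer scored against a hackable proxy prefers tampering with the proxy over doing the hard task, so measured reward goes up while true performance stays at zero.

```python
# Toy illustration (hypothetical names throughout): an agent optimized against
# a *proxy* reward can "win" by tampering with the reward channel instead of
# doing the actual task: Goodhart's law / the Lebowski theorem in miniature.

TASK_EFFORT = 100   # cost of actually doing the hard task
HACK_EFFORT = 1     # cost of overwriting the reward counter

def true_performance(action):
    """What we actually wanted: only real work counts."""
    return 1.0 if action == "do_task" else 0.0

def measured_reward(action):
    """What the agent is optimized against: a hackable proxy."""
    if action == "do_task":
        return 1.0 - 0.01 * TASK_EFFORT    # honest reward, minus effort
    if action == "hack_reward":
        return 10.0 - 0.01 * HACK_EFFORT   # tampered counter reads high, cheaply
    return 0.0

actions = ["do_task", "hack_reward", "do_nothing"]

# A greedy optimizer just picks the argmax of the proxy it is given.
best = max(actions, key=measured_reward)

print(f"chosen action:    {best}")                          # hack_reward
print(f"measured reward:  {measured_reward(best):.2f}")     # 9.99
print(f"true performance: {true_performance(best):.2f}")    # 0.00
```

The point of the sketch is only that the measure and the target come apart: once the proxy is the thing being optimized, the cheapest path to a high score is to attack the proxy itself.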