Armchair critique of "AI Alignment": Defining an aligned utility function is no harder than defining a sufficiently sophisticated goal
Reinforcement learning is easy in environments with fully-defined affordances; we have no idea how to do it without them
-
-
We aren't going to put a "reward button" into an environment without them and expect an agent to "just be nice", let alone be "not dirt stupid"
-
By "fully-defined affordances" I mean the basic building blocks of strategies for achieving the intended goal
-
All that said, I find it fun to reformulate the catastrophe thought experiments to be about not being dumb
-
*farts*