Armchair critique of "AI Alignment": Defining an aligned utility function is no harder than defining a sufficiently sophisticated goal
Making an AI that is "aligned" is a distraction. Make one that is "not dirt stupid" first
The thought experiments I see used to illustrate alignment problems can be trivially reformulated to be about "not being dirt stupid"
I have not seen one where "the agent behaves catastrophically badly" can't be replaced with "the agent behaves stupidly and fails"
I think we don't have a good way to make agents behave intelligently in environments without fully-defined affordances
Reinforcement learning is easy in environments with fully-defined affordances; we have no idea how to do it without them
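
To make the "easy with fully-defined affordances" half concrete, here is a minimal tabular Q-learning sketch on a made-up corridor world. The environment, names, and numbers are all illustrative assumptions, not anything referenced above.

```python
# Minimal tabular Q-learning on a made-up 1-D corridor (everything here is
# illustrative). The point: the whole recipe leans on ACTIONS being a known,
# finite menu of moves.
import random

N_STATES = 10                 # cells 0..9; cell 9 is the goal
ACTIONS = (-1, +1)            # the fully-defined affordances: step left / right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy dynamics: move one cell, clipped to the corridor; reward 1 at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose(state):
    """Epsilon-greedy with random tie-breaking -- only well-defined because
    the action set can be enumerated and compared."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):                      # episodes
    s = 0
    for _t in range(1000):                # step cap, just to be safe
        a = choose(s)
        s2, r, done = step(s, a)
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if done:
            break

# After training, the greedy policy should walk straight to the goal.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

Everything that makes this easy (the Q-table, the epsilon-greedy draw, the max over actions) presupposes that the agent's moves form a finite, known menu.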
We aren't going to put a "reward button" in an environment without fully-defined affordances and expect an agent to "just be nice", let alone be "not dirt stupid"
By "fully-defined affordances" I mean the basic building blocks of strategies for achieving the intended goal
All that said, I find that reformulating the catastrophe thought experiments to be about not being dumb is fun to think about
*farts*