I have not seen one where an agent behaving catastrophically badly couldn't be substituted with one behaving stupidly and failing.

My argument is that none of the thought experiments I've seen raise a form of "stupidity" that is uniquely catastrophic.

My critique is that invoking catastrophe in the thought experiments I've seen is unnecessary to capture and explore the failure mode.

Surely the stakes are not totally irrelevant?

[4 more replies]
New conversation

I'd argue that the stupidity of misalignment is intrinsically linked to the impossibility of defining metrics.

We can't define what we want either, and we only seem to have compatible goals because our capabilities are limited.

[a toy sketch of that metric-definition gap follows this conversation]
End of conversation
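
The "impossibility of defining metrics" point above is essentially a Goodhart-style gap, and a toy sketch may make it concrete. Everything below is a hypothetical illustration, not anything from the conversation: the functions true_value and proxy_metric, the numbers, and the reach parameter standing in for capability are all invented here, assuming a greedy optimizer over a bounded interval.

def true_value(x: float) -> float:
    # What we actually want (stay near x = 1); by assumption we
    # cannot write this down for the agent.
    return -((x - 1.0) ** 2)

def proxy_metric(x: float) -> float:
    # Our attempted definition: agrees with true_value near x = 1,
    # but a small cubic term rewards extreme x -- the misspecification.
    return -((x - 1.0) ** 2) + 0.1 * x ** 3

def optimize(metric, reach: float, iters: int = 100) -> float:
    # Greedy search over [0, 10]; `reach` stands in for capability:
    # how far the agent can see and move per step.
    x, step = 0.0, 0.05
    k_max = int(reach / step)
    for _ in range(iters):
        candidates = [min(10.0, max(0.0, x + k * step))
                      for k in range(-k_max, k_max + 1)]
        x = max(candidates, key=metric)
    return x

for reach in (0.25, 10.0):
    x = optimize(proxy_metric, reach)
    print(f"reach={reach:>5}: x={x:.2f}  "
          f"proxy={proxy_metric(x):+.2f}  true={true_value(x):+.2f}")

The weak optimizer (reach 0.25) stops near x = 1.25, where proxy and true value still agree; the capable one (reach 10) lands at x = 10, where the proxy score is highest and the true value is worst. The proxy is never "wrong" locally; it only comes apart from what we wanted once the agent can reach the region where they diverge.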
New conversation

What about an AI that is super-competent at finding and wireheading people because you screwed up giving it values?

Like, there is nothing you can do to stop it or convince it otherwise.
End of conversation