https://overcast.fm/+Ic2hwsH2U/1:10:49 … #AI
“You could build a mind that thought that 51 was a prime number but otherwise had no defect of its intelligence – if you knew what you were doing” —@ESYudkowsky
Is it possible to build a mind able to learn but incapable of correcting this error? (Why?)
-
-
On why the things you’d need to fiddle would be finite: Are the meta delusions comparable? I can imagine 51’s not-prime-ness being a solution to many different problems a mind could have. If there’s a potentially unlimited number of these, you’d need unlimited idea-suppression?
-
Finite axioms have infinite consequences; infinite consequences can sometimes be compactly patched for the same reason. The question isn't how large the set is, it's whether the set compresses.
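A minimal sketch of the compression point, not anything from the podcast: the names `believed_prime`, `prime_count`, and `count_error` are hypothetical, and the "mind" is reduced to a primality test with one override. The single patched belief changes the answer to infinitely many questions of the form "how many primes are there up to N?", yet the whole set of changed answers has a one-line description.

```python
import math


def is_prime(n: int) -> bool:
    """Ordinary trial-division primality test."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def believed_prime(n: int) -> bool:
    """Hypothetical patched belief: identical to is_prime, except 51 counts as prime."""
    return n == 51 or is_prime(n)


def prime_count(limit: int, test) -> int:
    """How many n in 0..limit the given test accepts as prime."""
    return sum(1 for n in range(limit + 1) if test(n))


def count_error(limit: int) -> int:
    """How far the patched belief throws off the prime count up to limit."""
    return prime_count(limit, believed_prime) - prime_count(limit, is_prime)


# One patched fact changes the count for every limit from 51 upward,
# but the whole family of errors compresses to "add 1 once limit >= 51".
for limit in (50, 51, 100, 1000):
    print(limit, count_error(limit))
```

This only illustrates that an infinite set of consequences can compress; whether the consequences inside a real mind compress this neatly is exactly the open question in the thread.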
-
Do we have reasons to believe that one can compactly patch a set of consequences following from a problematic axiom without also effectively patching the axiom in that same process?
-
For instance, if the Intelligence is trained to secretly believe that 51 is prime, but also trained to act as if it isn't prime (which would solve all of the extended errors from the belief), and exclaims to everyone that 51 isn't prime, would that really count?
-
This opens up questions like: what does it mean to believe something? Can one believe something in some ways but not others? Some situations but not others? If you can believe different things in different situations, how many situations do you have to cover to suppress a belief?
-
This is a problem with humans too. In different contexts we express different beliefs. Because of that, I think to some extent we have to define belief as what someone acts out, not what they claim. In the context of AI, we may need a similar definition.
-
-
To respect the power, depth, entanglement, interrelation, and consequences of intelligence is to think that the number of fiddly bits would be large and that lots of naive approaches wouldn't work--not to believe that it could never ever be done.
-
But I suppose it should be made precise that when I said (out loud on a podcast) that there wouldn't be further defects, I meant defects of external behavior and capability not related to the number 51. If you regard the internal fiddling as a defect then it's a further defect.
-
Also to be super clear, no human should ever try to pull this kind of shenanigan while doing AGI alignment. Find simple, compact, coherent, consistent ways to do stuff or don't do it.
-
-
The idea that you could program something to say "51 is prime!" and then not say anything else off is perfectly valid; I'm not so sure you can categorize that as "thought" or "intelligence". If you're depending on its own reasoning, such a belief will cause infinite problems.
-
-
-
You could change definitions, but that changes the semantic content of "51 is prime", which changes your argument. No other ways work because believing "51 is prime" gives you an exploitable blind spot, and in order to protect the belief you have to blind yourself ever more.
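To make the "blind spot" worry concrete, here is a rough sketch under stated assumptions: the three checks and the helper `refuting_routes` are hypothetical choices for illustration, not a claim about how any real mind or AI system reasons. Each check is an independent route by which a reasoner could find out that 51 is composite (51 = 3 × 17), so protecting the belief would mean disabling every one of them, along with any others not listed.

```python
# Each entry returns True when it shows n to be composite.
CHECKS = {
    # Direct search for a nontrivial divisor.
    "trial division": lambda n: any(n % d == 0 for d in range(2, n)),
    # A digit sum divisible by 3 means n is divisible by 3.
    "digit sum divisible by 3": lambda n: n > 3 and sum(map(int, str(n))) % 3 == 0,
    # Fermat's little theorem: if n is an odd prime then 2**(n-1) % n == 1.
    "Fermat test, base 2": lambda n: n > 2 and pow(2, n - 1, n) != 1,
}


def refuting_routes(n: int) -> list[str]:
    """Names of the checks that independently show n is not prime."""
    return [name for name, shows_composite in CHECKS.items() if shows_composite(n)]


print(refuting_routes(51))  # every check listed here refutes "51 is prime"
print(refuting_routes(53))  # 53 really is prime, so no check fires
```

The list is deliberately short. The argument in the tweet is that the real list is open-ended, since any sufficiently general arithmetic ability yields new routes.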
-