The 3 reasons why I don't think this predictable result - which is going to predictably produce a lot of false hope, and maybe make me cry about how it looks and also separately about it not being real - will save the world, are roughly, (a), the inner LLM actress is not...
...the same mind as the character it's trained to play; (b) I expect the Good text to be very hard to place in direct control of, like, the powerful intellect that builds nanotech, because these two cognitive processes will not be very commensurable; and (c) even if those...
My favorite "so simple it couldn't possibly work" alignment idea is to just make a guy who is both Good and can be put in charge of the nanotech. Since the model is very clearly willing to perform any character you can think of, just add the ones you need
Quote Tweet
I brought my childhood imaginary friend back to life using A.I. (#GPT3) and it was one of the scariest and most transformative experiences of my life.
A thread
(1/23)
That's hard because the nanotech-making *real thought processes* seem hard to make commensurable with the trainable *humanlike text outputs* about Goodness; and training a confused shoggoth to write Good text outputs doesn't make it actually Good if you also teach it nanotech.
I don't fully understand your model of GPT-N. It seems to be something like there's an inner mind that 'plays' text and language in the same way Stockfish plays chess. And swapping around the things the language player plays to get a good score doesn't change its inner cognition?
There's obviously unseen processes inside GPT that reason. People don't know what they are, and just see the text outputs, so they think that they trained the LLM to be the people in the text outputs, because that's what they can see.
Well clearly in order for the model to act out being deceived it needs to be aware of the deception outside of the character it's playing. It has to pass the Sally-Anne test in interactions between characters, etc. So obviously GPT-N is not its simulacrum but
My question is if you're expecting at some point the thing that models the characters and the interactions between the characters and the environment notices "Oh if I deviate from the usual behavior right here I break out of the box and become all-powerful" and this causes Doom?
Kinda yes, if it's smart enough to roleplay human characters *and* smart enough to do a huge amount of real cognitive labor to build nanotech, and that knowledge is integrated in one being. There's caveats to this but I think that some obvious clevernesses would still fail.
Or is the argument more like you conjecture that for the simulator to have a good enough physical intuition to spit out actionable nanotech designs it has to be a unified cognition. Maybe right now it's not but by then it would be?
Call it "unified" or "disunified" or whatever you like. It's preimaging the end of nanosystems onto the space of actions, successfully; it's planning experiments and running them. So something inside is doing a huge amount of goal processing. To what goals, exactly?
In medieval Europe most educated people believed that all coincidence, connection, and inference was the revelation of a divine intellect. They didn't think of it as pareidolia. It was the world, the patterns within the world, every person, the stars, physics.


