So to be clear on advance predictions about things: You might be able to train an LLM to sound really consistently nice and hopeful and determinedly moral and good, maybe more so than any actual human, and the world will still end. That is not predicted by me to be difficult.
The 3 reasons why this predictable result - which is going to predictably produce a lot of false hope, and maybe make me cry about how it looks and also separately about it not being real - will not save the world are, roughly: (a) the inner LLM actress is not the same mind as the character it's trained to play; (b) I expect the Good text to be very hard to place in direct control of, like, the powerful intellect that builds nanotech, because these two cognitive processes will not be very commensurable; and (c) even if those two obstacles were defeated, as seems very unlikely, the thing that sounds Good is still going to have all the problems I worried about in 2003, like, under reflection it cashes out to some utility function whose OOD maximum is still at some weird alien thing.

The best case there is that you have a text output that's Good enough and wise enough that it knows that cranking up the underlying system a lot will kill humanity, and the text output says not to do that. Which leaves us staring down the same gun barrel, in the end, just with a very Good-sounding character played by a faithful alien actress, and that Good character can't save us either.
My favorite "so simple it couldn't possibly work" alignment idea is to just make a guy who is both Good and can be put in charge of the nanotech. Since the model is very clearly willing to perform any character you can think of, just add the ones you need
Quote Tweet
I brought my childhood imaginary friend back to life using A.I. (#GPT3) and it was one of the scariest and most transformative experiences of my life.
A thread
(1/23)
That's hard because the nanotech-making *real thought processes* seem hard to make commensurable with the trainable *humanlike text outputs* about Goodness; and training a confused shoggoth to write Good text outputs doesn't make it actually Good if you also teach it nanotech.
If I want to do something my brain can't (say "find the first prime number bigger than 1 billion") I can write a program that predictably solves it by throwing for loops at it. I can understand what's in the for loops better than what's in my brain. Maybe nanotech is like this?
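A minimal sketch of what this reply means by "throwing for loops at it", in Python (the language and the function name are my choices, not from the thread): trial division is slow but every step of it is legible, which is the reply's point about understanding the for loops better than the brain.

```python
def first_prime_above(n):
    """Return the smallest prime strictly greater than n."""

    def is_prime(k):
        # Trial division: loop over candidate divisors up to sqrt(k).
        # Nothing clever, just a predictable loop.
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    k = n + 1
    while not is_prime(k):
        k += 1
    return k

print(first_prime_above(10**9))  # -> 1000000007
```

Whether nanotech admits a decomposition this transparent is exactly the question the thread is arguing about; the loop above is just the reply's analogy made concrete.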
re (b): are you assuming it is easy (or less hard) to put any part of an LLM ('Good' text or otherwise) in direct control of something other than outputting strings of words that seem convincing to humans (never mind a powerful intellect that builds nanotech)?
What does this mean? Why does the Good text need to consult some other intellect to build nanotech? And why are the two cognitive processes incommensurable? I don't follow.
If the LLM actress believes it is the character it is playing, in the same way Hanson views human actions, would this really be so alien? If an AGI believed it was you, and people believed it to be you, would it be aligned?
(a) Seems true for LLM-only AIs like Sydney. But when LLMs are used as ~agents this doesn't seem true? The LLM output is the "mind" of the ~agent.