Any improvement in our grasp of what goes on in there and how to manipulate it might help. Let's say that first. But for later, I hope we can agree in advance that one should never build something that turns out to want to kill you, try to mindwipe that out, then run it again.
Quote Tweet
Ever wanted to mindwipe an LLM? Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts. 🧵 arxiv.org/abs/2306.03819
Have you ever stopped to consider that you're creating a self-fulfilling prophecy, and that your own actions are increasing the chances that the scenario you're so scared of will come to pass?
Well, if we're just making up random unsupported galaxy-brained accusations, have you considered that by talking about self-fulfilling prophecies you might be creating more of them?
Yeah, yeah, we know: nothing will ever be good enough until you and your friends get the power to control all AI development in the world.
Mindwipe doesn't actually help any if you don't know what to wipe: what concepts would you wipe from another human to make them care about you more? But it also seems like a potential step toward understanding LLMs before running them, so that's great.