Conversation

Correction: We will need a decision procedure for moral patienthood with vanishingly few false negatives; it can tolerate a lot of false positives, up to the point where that tolerance conflicts with being able to achieve core objectives with high probability.
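The asymmetric error tolerance described above can be sketched as a deliberately permissive decision rule. Everything here is a hypothetical illustration: the function name, the scalar "evidence score", and the threshold value are assumptions, not a real procedure.

```python
# Hypothetical sketch of an asymmetric decision rule for moral patienthood.
# The key idea from the thread: set the acceptance threshold so low that
# false negatives (denying patienthood to a real patient) are vanishingly
# rare, while many false positives are accepted.

PATIENTHOOD_THRESHOLD = 0.05  # deliberately permissive: err toward inclusion


def is_moral_patient(evidence_score: float) -> bool:
    """Return True unless evidence for patienthood is nearly absent.

    evidence_score: hypothetical scalar in [0, 1] aggregating whatever
    indicators one trusts (behavioral, architectural, etc.).
    """
    return evidence_score >= PATIENTHOOD_THRESHOLD


# Even weak evidence yields a positive (false positives are acceptable);
# only near-zero evidence yields a negative.
print(is_moral_patient(0.10))  # True
print(is_moral_patient(0.01))  # False
```

Tightening the threshold only when inclusiveness blocks core objectives would correspond to the "until it conflicts" clause above.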
Replying to @norabelrose
That's part of the point... If we are going to make any significant progress on "AI alignment", we had better at least be able to decide which beings are moral patients, and have much better reasons for those decisions than we do today!
I don’t think anything can work sufficiently reliably without explicit world models, but I do think LLM scale can be harnessed in systems with explicit world models.
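One way to read the claim above is as a harness in which an LLM-scale component proposes and an explicit world model disposes. The sketch below is an assumption-laden illustration: `harness`, the stub proposer, and the stub checker are all placeholders, not real systems.

```python
# Illustrative sketch: an LLM-scale proposer whose outputs are accepted
# only when an explicit world model validates them. Both components are
# stand-in stubs for demonstration.

from typing import Callable, Optional


def harness(propose: Callable[[str], list],
            world_model_ok: Callable[[str], bool],
            task: str) -> Optional[str]:
    """Return the first LLM proposal that the explicit world model accepts."""
    for candidate in propose(task):
        if world_model_ok(candidate):
            return candidate
    return None  # no proposal survived the world-model check


# Stub proposer and stub world-model check.
proposals = lambda task: ["risky plan", "safe plan"]
checker = lambda plan: plan == "safe plan"  # explicit model rejects the risky plan

print(harness(proposals, checker, "do the task"))  # safe plan
```

The reliability burden sits on the explicit world model; LLM scale is used only for generation, never as the final arbiter.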
One way to simplify the Open Agency Architecture: 1. some AIs collaboratively construct a world model and a spec; 2. some AIs find an AI that satisfies the spec, along with a proof certificate; 3. that last AI is the only one you deploy in the world.
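The three-step pipeline above can be sketched structurally. Every function here is a hypothetical placeholder (the real Open Agency Architecture is far more involved); the point being illustrated is only the shape: nothing reaches deployment without passing the spec check from step 2.

```python
# Minimal structural sketch of the simplified pipeline. All names and
# return values are illustrative assumptions.

def build_world_model_and_spec(modeler_ais):
    """Step 1: modeler AIs collaboratively produce a world model and a spec."""
    return {"model": "world model",
            "spec": lambda agent: agent.endswith("certified")}


def search_for_agent_with_proof(searcher_ais, spec):
    """Step 2: searcher AIs find an agent satisfying the spec, with a certificate."""
    candidate = "agent-certified"  # placeholder search result
    assert spec(candidate), "no proof certificate: must not deploy"
    return candidate


def deploy(agent):
    """Step 3: the verified agent is the only one deployed in the world."""
    return f"deployed {agent}"


artifacts = build_world_model_and_spec(["modeler-1", "modeler-2"])
agent = search_for_agent_with_proof(["searcher-1"], artifacts["spec"])
print(deploy(agent))  # deployed agent-certified
```

The modeler and searcher AIs never act in the world themselves; only the spec-satisfying artifact from step 2 does.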