and (2) human values are extremely complex. There isn't a simple short formal definition of 'killing', and there definitely isn't a short formal explanation of how the badness of 'killing' should weigh against all other kinds of 'badness'.
In a completely peaceful world utopia with no militaries, AGI would still kill everyone if we developed it and couldn't solve the alignment problem.
And I don't think real-world militaries are currently at the cutting edge of ML progress.
By default, I expect a status quo like that to continue. In which case I don't think subtracting militaries from consideration would even necessarily lengthen timelines to AGI.
"But how would you institutionalize something like that? For the United States to create such a rule would be unilateral disarmament."
Building AGI without knowing how to align it is suicide; so in principle the game theory is easy...
... if we assume states are rational, informed, and self-interested. The equilibrium is for every state to hit the pause button on AI progress until it's clear they can move forward without a large risk of killing themselves.
Much like how the nuclear game theory would have been different if each prototype nuke in the first generation developed had a double-digit probability of randomly 'going rogue' midway through development and zipping off to destroy your country's capital.
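The expected-value logic behind that analogy can be sketched with hypothetical numbers (every figure below is an illustrative assumption, not a claim from this thread):

```python
# Illustrative sketch with made-up stakes: compare the expected payoff of
# racing ahead versus pausing when each development attempt carries a
# double-digit probability of destroying the racer itself.

def expected_payoff(p_catastrophe: float, win_value: float,
                    catastrophe_cost: float) -> float:
    """Expected value of racing: win with prob (1 - p), lose -cost with prob p."""
    return (1 - p_catastrophe) * win_value - p_catastrophe * catastrophe_cost

# Hypothetical numbers: winning the race is worth 100 (arbitrary units),
# while a rogue system costs you essentially everything you value.
race = expected_payoff(p_catastrophe=0.2, win_value=100,
                       catastrophe_cost=10_000)
pause = 0.0  # pausing forgoes the prize but avoids the gamble

print(race)          # -1920.0
print(race < pause)  # True: racing is negative expected value here
```

Under these assumptions the pause is the dominant move for a rational, informed, self-interested actor, which is the equilibrium claim made above; the thread's next point is that real actors fall short of that idealization.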
The problem is that (a) the most rational and informed state actors aren't quite rational and informed enough to fully grasp the current strategic situation (AFAICT), (b) there's huge variation in how rational and informed different state actors are on this issue...
... and (c) those strategically optimal-in-principle actions are inherently very difficult to execute, and require unprecedented levels of global coordination.
The most important thing the world needs to somehow pull off at some point, and (fortunately) also one of the main actions you can do unilaterally, without global buy-in or synchronized action, is 'figure out how to align AGI systems'.
So it makes sense to spread the word to technical researchers that this is a real problem, and to fund promising work here.
But I don't think there's even a promising rough/qualitative idea yet of how to do alignment, or of what sequence of research insights could get us there.
The basic idea in that post seems to be: let's make it an industry standard for AI systems to "become conservative and ask for guidance when facing ambiguity", and gradually improve the standard from there as we figure out more alignment stuff.

