Why do some artificial intelligence safety researchers view "Highly Reliable Agent Designs" or "Agent Foundations" research as useful, or even critical? (And why do others disagree?)
A new paper by Issa Rice and myself explains: arxiv.org/abs/2201.02950
Thread (1/6)
There's been a lot of discussion about risks from advanced AI, and several "agendas" for achieving safety. (See, for example, lesswrong.com/posts/fRsjBseR)
One of these agendas is "Highly Reliable Agent Designs" (HRAD)
(2/6)
This is only one of a number of approaches, but it is fundamentally different from most proposals: rather than building safety techniques directly, it attempts to understand the problem more clearly, and to do so formally and mathematically.
(3/6)
Because it's different, many people seem unsure what it is trying to do.
So in the paper we extend the discussion from Issa's original post - alignmentforum.org/posts/BGxTpdBG - and lay out four separate cases that have been presented for the value of this work.
(4/6)
The first two cases for the work, incidental utility and deconfusion, seem widely accepted, but on their own they are insufficient to ensure AI safety. The latter two are more ambitious, and few people fully accept the arguments, but if they hold, this work could deliver provably safe AI.
(5/6)
In short, if we think that AI safety is a potentially important issue, and if we're less than certain that other approaches will suffice, there are a variety of ways in which this work is critical, as the diagram shows.
Now, go read the paper ;) arxiv.org/abs/2201.02950
(6/6)
PS. Thanks to the Long Term Futures Fund for funding this work, to the Modelling Transformative AI team, lesswrong.com/posts/qnA6paRw, and to those who provided feedback on the paper.