Question for my fellow alignment researchers out there, do you have a *list of unsolved problems* in AI alignment? I'm thinking of creating an "alignment mosaic" of the questions we need to resolve and slowly filling it in with insights from papers/posts.
Gonna have a session…
Here’s a relevant list of posts for thinking about open problems in AI Safety:
I really liked this post because it is somewhat beginner-friendly and quickly covers many of the important problems within alignment (by @GillenJeremy et al.).
Gives an overview of what most people in alignment are working on (widely recommended):
I think looking into Paul Christiano’s research methodology seems useful for deciding which questions to tackle and how to prioritize them:
The entire Pragmatic AI Safety agenda gives a new lens for approaching the problem:
The older but famous “Concrete Problems in AI Safety” paper still seems useful; here’s a summary:
Here’s a sequence of posts going over a massive list of open problems in Mechanistic Interpretability:
I also appreciated this post by a few people at DeepMind that tries to clarify AI x-risk:
