My prior on research into AI Alignment is that it's all misguided, but man, there sure is a lot of it.
Maybe I can cite it, pretend to have read it, and get away with it.
Finished. It’s a good overview, in fact.
The field has dramatically matured and improved in the past few years, btw. The stupid stuff that dominated LW 3+ years ago is discredited. There’s now much that’s plausible and interesting. Surprised!
Any particular lines of research you found promising?
I have a strong intuition that the whole enterprise is misconceived, but there are many smart people who think otherwise, so I'm probably wrong.
Quick browse suggests the part that’s good is just the label “alignment” being slapped onto ordinary engineering improvements, not the earlier esoteric stuff.
I.e., the part that’s good is not alignment, and the part that’s alignment is not good. I’ve been expecting precisely this.
Yes, I mostly agree with this, and my write-up will say so. On the whole, the field has concluded that “alignment” is infeasible (although a holy grail still worth putting some effort into), and doing sensible, fairly obvious things is a better bet (but comes with no guarantees).
Don’t know if either of you saw my piece, which mostly tries to clear up and set useful boundaries around my own thinking on this stuff.
Prompted a thought: is "alignment" just nerdspeak for human states like empathy, compassion, and solidarity? Humans are often non-aligned, but we have those concepts to describe the happy times when our goals do not conflict.