My prior on research into AI Alignment is that it's all misguided, but man, there sure is a lot of it.
Maybe I can cite it and pretend to have read it and get away with it.
Finished. It’s a good overview, in fact.
The field has dramatically matured and improved in the past few years, btw. The stupid stuff that dominated LW 3+ years ago is discredited. There’s now much that’s plausible and interesting. Surprised!
Any particular line of research you found especially promising?
I have a strong intuition that the whole enterprise is misconceived, but there are many smart people who think otherwise, so I'm probably wrong.
Quick browse suggests the part that’s good is just the label “alignment” being slapped onto ordinary engineering improvements, not the earlier esoteric stuff.
I.e., the part that's good is not alignment, and the part that's alignment is not good. I've been expecting precisely this.
Yes, I mostly agree with this, and my write-up will say so. On the whole, the field has concluded that “alignment” is infeasible (although a holy grail still worth putting some effort into), and that doing sensible, fairly obvious things is a better bet (but comes with no guarantees).
Don’t know if either of you saw my thing, mostly trying to clear up and set useful boundaries around my own thinking on this stuff.
Prompted a thought: is "alignment" just nerdspeak for human states like empathy, compassion, solidarity? Humans are often non-aligned, but we have those concepts to describe the happy times when our goals do not conflict.
Fwiw I think those human-centric terms are equally unsatisfactory, because they locate the behavioral roots in individual traits. What I think comes closest is Hannah Arendt's notion of freedom-in-mutualism, which I think of as entangled infinite-game I-it and I-thou relations.
Humans can be highly misaligned, even diametrically so, but so long as they don’t quit the mutual entanglement or try to win absolutely through killing, the collective is in a healthy place.
It’s easier to define the opposite actually: freedom in mutualism is the opposite of irreversible hard-forking.
Necessary and sufficient conditions for FiM: a) play to continue the game rather than to win, b) recognize other agents as agents, c) have a non-terminating relationship with facts.