My prior on research into AI Alignment is that it's all misguided, but man, there sure is a lot of it.
Maybe I can cite it and pretend to have read it, and get away with it.
Finished. It’s a good overview, in fact.
The field has dramatically matured and improved in the past few years, btw. The stupid stuff that dominated LW 3+ years ago is discredited. There’s now much that’s plausible and interesting. Surprised!
Any particular line of research you found particularly promising?
I have a strong intuition that the whole enterprise is misconceived, but there are many smart people who think otherwise, so I'm probably wrong.
Quick browse suggests the part that’s good is just the label “alignment” being slapped onto ordinary engineering improvements, not the earlier esoteric stuff.
I.e., the part that's good is not alignment, and the part that's alignment is not good. I've been expecting precisely this.
Yes, I mostly agree with this, and my write-up will say so. On the whole, the field has concluded that “alignment” is infeasible (although a holy grail still worth putting some effort into), and doing sensible, fairly obvious things is a better bet (but comes with no guarantees).
Is there evidence that in wider usage "alignment" ever meant specifically MIRI-style corrigibility/provable alignment? Otherwise I'd just chalk that up to a misunderstanding/overfitting on your part; to me, alignment always meant all approaches, but I guess I could be wrong.
They introduced the term, afaik. Nobody used it before them.
Sure, but they didn't define it specifically in terms of their approach, right? You're choosing to attach their ideological vibe to the term in a way that doesn't seem reasonable given current usage.