
Replying to and
Finished. It’s a good overview, in fact. The field has dramatically matured and improved in the past few years, btw. The stupid stuff that dominated LW 3+ years ago is discredited. There’s now much that’s plausible and interesting. Surprised!
Replying to
Any particular line of research you found especially promising? I have a strong intuition that the whole enterprise is misconceived, but many smart people think otherwise, so I'm probably wrong.
Replying to and
A quick browse suggests the part that's good is just the label "alignment" being slapped onto ordinary engineering improvements, not the earlier esoteric stuff. I.e., the part that's good is not alignment, and the part that's alignment is not good. I've been expecting precisely this.
Replying to and
Yes, I mostly agree with this, and my write-up will say so. On the whole, the field has concluded that “alignment” is infeasible (although a holy grail still worth putting some effort into), and doing sensible, fairly obvious things is a better bet (but comes with no guarantees).
Replying to and
is there evidence that "alignment" was ever widely used to mean specifically MIRI-style/corrigibility/provable alignment? Otherwise I'd just chalk that up to a misunderstanding/overfitting on your part. To me, alignment always meant all approaches, but ig I could be wrong