What is the research that would be most helpful for AI safety while still having value if it turns out not to be useful for safety research?
Conversation
Replying to
@The_Lagrangian safety wrt what? threat models are important.
(who attacks? what is attacked? how can it be attacked?)
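[Not part of the thread: a minimal sketch of the three threat-model questions above as a data structure, to make the framing concrete. All names and the example values are hypothetical, loosely mirroring the "self-threat" reading of MIRI-style safety that comes up later in the thread.]

```python
# Hypothetical illustration (not from the thread): the three threat-model
# questions captured as a tiny record. Every name here is made up.
from dataclasses import dataclass

@dataclass
class ThreatModel:
    attacker: str  # who attacks?
    asset: str     # what is attacked?
    vector: str    # how can it be attacked?

# Example instance for a "self-threat" style model, where the system's own
# optimization pressure plays the attacker role.
self_threat = ThreatModel(
    attacker="the AI's own optimization process",
    asset="the intended objective / human values",
    vector="misspecified goals exploited during optimization",
)
print(self_threat)
```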
Replying to
safety as in value alignment for general artificial intelligence a la MIRI
Replying to
@The_Lagrangian I find it hard to talk about safety of the intransitive kind but maybe that's just my infosec background
Replying to
@The_Lagrangian do MIRI have a threat model or a justification for not having one?
Replying to
they have specific models that look basically like self-threat, check out their research summary page
Replying to
I don't have time to read this but it seems like it might answer your questions arbital.com/p/AI_safety_mi
Replying to
@The_Lagrangian neat, thanks! skimmed it, does look like it answers the question (or at least tries to). will give it a full read later
Replying to
@The_Lagrangian also looks like some of my (mostly private) notes on calculating bug severity could be relevant
Quote Tweet
Security bugs have agency granted to them by attackers.
You generally do not want your bugs to have agency.
Replying to
@The_Lagrangian but need to formalize first (assuming motivation), then check for usefulness and whether somebody else already did that
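[Not the (mostly private) notes mentioned above: a toy sketch of one way to fold "agency granted to a bug by attackers" into a severity score, in the spirit of the quoted tweet. The function name, factors, and weights are all hypothetical.]

```python
# Toy severity sketch only -- not the author's notes. Combines impact, how
# reachable the bug is to an attacker, and how much control ("agency")
# exploitation grants the attacker into a single 0-10 score.

def bug_severity(impact: float, reachability: float, attacker_control: float) -> float:
    """All inputs are in [0, 1]; returns a score in [0, 10]."""
    for v in (impact, reachability, attacker_control):
        if not 0.0 <= v <= 1.0:
            raise ValueError("inputs must be in [0, 1]")
    return 10.0 * impact * reachability * attacker_control

# Example: a memory-corruption bug reachable from the network that yields
# arbitrary code execution (maximum attacker agency).
print(bug_severity(impact=0.9, reachability=0.8, attacker_control=1.0))  # 7.2
```

[The multiplicative form is just one design choice: a bug an attacker cannot reach, or cannot steer once triggered, scores near zero regardless of raw impact, which matches the "bugs with agency" framing in the quote tweet.]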

