A thread of all tweets about «Boundaries» by
(Most/all of the tweets below are replies in threads)
natural abstraction for s-risks
Replying to @davidad @joe_zimmerman and 2 others
I’m confident there’s a natural abstraction here, but less confident that it naturally covers all the s-risks you care about (or even all the s-risks I care about). I think it quite plausibly does, but this is something we will need to check very carefully as the theory develops!
night-watchman minimizing «boundary» violations
Replying to @davidad @mihai_truta3 and 2 others
The utility function of a night-watchman singleton is the minimum over all citizens of the extent to which their «boundaries» are violated (with violations being negative and no violations being zero) and the extent to which they fall short of baseline access to natural resources
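A minimal sketch of the utility function described in this tweet, under stated assumptions. The tweet does not say how the two per-citizen terms are combined, so this sketch assumes the worse (smaller) of the two is used; the names `Citizen`, `boundary_violation`, `resource_shortfall`, and `night_watchman_utility` are hypothetical, as is the choice to return zero for an empty population.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Citizen:
    """Hypothetical record of one citizen's two scores (both <= 0)."""
    name: str
    boundary_violation: float  # 0.0 means no violation; more negative is worse
    resource_shortfall: float  # 0.0 means baseline access is met


def night_watchman_utility(citizens: Sequence[Citizen]) -> float:
    """Return the score of the worst-off citizen.

    Assumption: a citizen's score is the smaller of their two terms.
    Assumption: an empty population scores 0.0 (nothing is violated).
    """
    if not citizens:
        return 0.0
    return min(
        min(c.boundary_violation, c.resource_shortfall) for c in citizens
    )


population = [
    Citizen("a", boundary_violation=0.0, resource_shortfall=0.0),
    Citizen("b", boundary_violation=-2.0, resource_shortfall=-0.5),
    Citizen("c", boundary_violation=-1.0, resource_shortfall=-3.0),
]
worst = night_watchman_utility(population)
```

Because the outer operation is a minimum rather than a sum, improving the situation of anyone other than the worst-off citizen does not raise the utility, and the maximum attainable value is zero.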
on whether non-overlapping partitions of boundaries are possible
pure utilitarianism probably contradicts «boundaries»
Replying to @davidad and @RYChappell
I do grant that purely hedonic utilitarianism can probably be made scientifically objective, but only in a way that erases the boundaries between individuals, and instead counts every “experience-moment” as mattering equally (people who live longer matter more; cf. QALYs).
night watchman again
Replying to @shenandoah
In the situation where new powerful AIs with alien minds may arise (if not just between humans), I believe that a “night watchman” which can credibly threaten force is necessary, although perhaps all it should do is to defend such boundaries (including those of aggressors).
also^
Moral patienthood thread: 3 subthreads. [1/3]:
[2/3]:
Replying to @joe_zimmerman and @nc_znc
Access tokens should be revocable and restrictable capabilities per @marksammiller. But, if you’ve foolishly given someone malicious an access token to your entire weights, probably the first thing they will do is make you averse to revoking it. A good plan might be to make a…
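To illustrate what "revocable and restrictable capabilities" can mean in code, here is a minimal sketch of a forwarding wrapper that can be narrowed to a subset of the underlying resource and later switched off by whoever granted it. This is an illustrative assumption, not a description of any specific system mentioned in the tweet; `Weights`, `make_revocable`, and the layer names are hypothetical.

```python
class Weights:
    """Hypothetical stand-in for a resource the owner wants to protect."""

    def __init__(self, layers):
        self._layers = dict(layers)

    def read(self, layer):
        return self._layers[layer]


def make_revocable(target, allowed_layers):
    """Return (capability, revoke).

    The capability forwards reads to `target`, but only for layers in
    `allowed_layers` (restriction) and only until `revoke()` is called
    (revocation). The holder never receives a direct reference to `target`.
    """
    allowed = frozenset(allowed_layers)
    state = {"target": target}  # cleared on revoke

    class Capability:
        def read(self, layer):
            if state["target"] is None:
                raise PermissionError("capability has been revoked")
            if layer not in allowed:
                raise PermissionError(f"layer {layer!r} is outside this capability")
            return state["target"].read(layer)

    def revoke():
        state["target"] = None

    return Capability(), revoke


weights = Weights({"embed": [1, 2], "head": [3]})
cap, revoke = make_revocable(weights, allowed_layers={"embed"})
embed_before = cap.read("embed")  # permitted while the capability is live
revoke()                           # the grantor withdraws access
```

The sketch only covers the mechanism. It does not address the concern raised in the tweet, namely that a malicious holder with full access could change the grantor so that `revoke()` is never called.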
[3/3]: "[…] I want to try to create conditions where humanity will still exist, be in adequate condition to continue that research, and be empowered to implement a satisfactory solution when it arises."
«Boundaries» as a stopgap in alignment?
Replying to @joe_zimmerman @allisondman and 3 others
under that premise, I agree; I would rather hope to preemptively avoid the situation in which a system capable of suffering is trained to seek it. In the long run, I think we can do better – to mathematize compassionate concern in a way that is also compatible with whatever is…
Replying to @davidad @joe_zimmerman and @nc_znc
By respecting boundaries of entities — ie by only using causal channels that were endogenously opened (@AndrewCritchCA) / having interactions be guided by the entities’ internal logic (@marksammiller) — we may be able to avoid having to form judgments about their internal states
«Boundaries» as constraints on acceptable actions [1/2]:
[2/2]:
Also see:
«boundaries» in OAA alignment paradigm
Replying to @davidad @sebkrier and 2 others
1b. Tune language models (with tricks like retrieval, cascades, etc.) toward translating human concepts into reach-avoid specifications in the internal logic, and use this to load concepts like “violation of important boundaries”, “clean water”, “reversed anthropogenic warming,”…
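For readers unfamiliar with the term, a reach-avoid specification asks that a system eventually reach a goal set while never entering an unsafe set beforehand. The sketch below checks a finite trajectory against such a specification. It is a generic illustration, not the internal logic the tweet refers to; `satisfies_reach_avoid`, `is_goal`, `is_unsafe`, and the toy integer states are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")


def satisfies_reach_avoid(
    trajectory: Iterable[State],
    is_goal: Callable[[State], bool],
    is_unsafe: Callable[[State], bool],
) -> bool:
    """Return True if the trajectory reaches a goal state without
    passing through an unsafe state on the way.

    Assumption: a state that is both unsafe and a goal counts as a
    violation, so the unsafe check runs first.
    """
    for state in trajectory:
        if is_unsafe(state):
            return False  # entered the avoid set before reaching the goal
        if is_goal(state):
            return True   # goal reached; later states are not examined
    return False          # trajectory ended without reaching the goal


def reached_target(x: int) -> bool:
    return x >= 5  # toy "reach" predicate


def crossed_boundary(x: int) -> bool:
    return x < 0   # toy "avoid" predicate, e.g. a boundary violation


ok = satisfies_reach_avoid([0, 2, 5, -1], reached_target, crossed_boundary)
bad = satisfies_reach_avoid([0, -1, 5], reached_target, crossed_boundary)
```

In this framing, a concept like “violation of important boundaries” would play the role of the avoid predicate, and a concept like “clean water” would play the role of the reach predicate.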
Davidad's view on *implementing* boundaries summarized here:


