
Empowerment is the n-step channel capacity of a transition operator P: it measures the maximum information that can be transmitted from an agent's actuators to the resulting states from a given state x, a kind of controllable optionality. Roughly: more empowerment means more distinct outcomes the agent can reliably bring about. 2/14
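As a concrete illustration (my sketch, not code from the thread): 1-step empowerment at a state is the capacity of the channel from actions to next states, which can be computed with the Blahut-Arimoto algorithm. The two toy channels below are invented examples.

```python
import numpy as np

def empowerment(P, iters=200):
    """1-step empowerment: channel capacity max_p(a) I(A; S'), where
    P[a, s'] = Pr(s' | current state, action a). Blahut-Arimoto."""
    n_actions = P.shape[0]
    p = np.full(n_actions, 1.0 / n_actions)   # action distribution
    for _ in range(iters):
        q = p @ P                              # marginal over next states
        with np.errstate(divide="ignore", invalid="ignore"):
            # per-action KL(P(.|a) || q), summing only where P > 0
            d = np.where(P > 0, P * np.log(P / q), 0.0).sum(axis=1)
        p_new = p * np.exp(d)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < 1e-12:
            p = p_new
            break
        p = p_new
    q = p @ P
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(P > 0, p[:, None] * P * np.log2(P / q), 0.0).sum()

# Deterministic channel: 4 actions reach 4 distinct states -> 2 bits
P_open = np.eye(4)
# A "wall": two actions collapse to the same state -> log2(3) bits
P_wall = np.array([[1, 0, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]], dtype=float)
print(empowerment(P_open))   # ≈ 2.0 bits
print(empowerment(P_wall))   # ≈ 1.585 bits
```

n-step empowerment is the same computation applied to the n-step transition operator, where "actions" are n-step action sequences.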
But if we have a bio-like agent with physiological state-spaces, logical variables, and world states, where some of the physiological states can kill the agent, then the agent's transition operator can be exponentially large. How would it plan and set goals in this space? 3/14
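A quick back-of-the-envelope count shows how fast the joint space blows up (the sizes below are made up for illustration):

```python
# The joint transition operator lives on the product of all state spaces,
# so its size grows multiplicatively: exponential in the number of
# binary physiological variables. Assumed toy sizes:
n_world = 100       # world states
n_phys_vars = 20    # binary physiological variables (hunger, temp, ...)

n_states = n_world * 2 ** n_phys_vars
print(n_states)        # 104857600 joint states
print(n_states ** 2)   # entries in a dense joint transition matrix
```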
It turns out we can break the hierarchical problem down using Operator Bellman Equations, which produce compositional, goal-conditioned state-time transition operators (feasibility functions \eta, not value functions), yielding an important memory-efficient factorization. 4/14
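A minimal sketch of the compositional property, assuming a feasibility operator is represented as a matrix over flattened (state, time) pairs; the toy dynamics below are invented for illustration:

```python
import numpy as np

S, T = 3, 4        # assumed toy sizes: 3 states, 4 time points
N = S * T

def idx(s, t):
    """Flatten a (state, time) pair to a row/column index."""
    return s * T + t

# eta_A: a task that reaches goal state 2 one step later, from anywhere
eta_A = np.zeros((N, N))
for s in range(S):
    for t in range(T - 1):
        eta_A[idx(s, t), idx(2, t + 1)] = 1.0

# eta_B: a task that goes from state 2 to goal state 0 one step later
eta_B = np.zeros((N, N))
for t in range(T - 1):
    eta_B[idx(2, t), idx(0, t + 1)] = 1.0

# Composition: the feasibility of doing task A then task B is a matrix
# product, so long-horizon plans factor into small per-task operators
# instead of one operator over the full product space.
eta_AB = eta_A @ eta_B
print(eta_AB[idx(1, 0), idx(0, 2)])  # 1.0: start (s=1,t=0), end (s=0,t=2)
```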
Crucially, because Operator Bellman Equations produce state-time feasibility functions that act as transition operators (mapping initial state-times to final state-times under a policy), we can aggregate many of these functions for individual tasks/goals into a single operator J_{\eta}. 5/14
And because Empowerment is a function of transition operators (and an agent's state or state-time), we can compute empowerment on J_{\eta}, which is empowerment in the sub-space defined by tasks. This empowerment can cover long time-spans, because each feasibility function summarizes an entire task. 7/14
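A toy sketch of task-level empowerment, assuming deterministic task completion so the channel capacity reduces to log2 of the number of distinct reachable outcomes (the J_eta below is made up):

```python
import math
import numpy as np

# Each row of J_eta: one task's outcome distribution over final
# state-times, from the agent's current state-time. Tasks play the
# role of "actions" in the empowerment channel.
J_eta = np.array([
    [1, 0, 0, 0],   # task 0 reaches outcome 0
    [0, 1, 0, 0],   # task 1 reaches outcome 1
    [0, 1, 0, 0],   # task 2 is redundant with task 1
    [0, 0, 1, 0],   # task 3 reaches outcome 2
], dtype=float)

# Deterministic case: capacity = log2(# distinct reachable outcomes)
distinct = len({tuple(row) for row in J_eta})
emp = math.log2(distinct)
print(emp)  # log2(3) ≈ 1.585 bits of task-level empowerment
```

With stochastic task outcomes, one would run a capacity computation (e.g. Blahut-Arimoto) on J_eta instead of counting distinct rows.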
We can then optimize plans that maximize the Empowerment Gain (which we call Valence) of moving around to modify the affordances of an environment. Thus, Valence can be computed at the level of tasks that sustain the agent, to assign value to states that haven't been experienced. 8/14
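A minimal numeric sketch of Valence as empowerment gain, again assuming deterministic task outcomes so empowerment is log2 of the number of available tasks (the task sets are invented):

```python
import math

# Valence = empowerment after an action minus empowerment before,
# so it scores how an action changes the agent's affordances.
tasks_before = {"eat", "rest"}                       # e.g. door closed
tasks_after  = {"eat", "rest", "forage", "explore"}  # e.g. door opened

emp_before = math.log2(len(tasks_before))
emp_after = math.log2(len(tasks_after))
valence = emp_after - emp_before
print(valence)  # 1.0 bit gained by opening the door
```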
Valence in a hierarchical space measures changes to an agent's entire internal organization. It measures controllability across multiple coupled internal and external state-space representations. 9/14
Operator Bellman Equations do not use reward functions; instead they use availability functions, which return the probability that a goal is available, a requirement of the transition-operator interpretation. This is why reachability optimization is (IMO) more useful than reward maximization. 10/14
Operator Bellman Equations also have a property (called sublimation) that allows agents to reason symbolically while planning, by solving problems exclusively on a high-level space in order to bound the feasibility of the full product space. 11/14
Valence is also a scalar, but you don't need to represent its accumulation: it is implicitly reflected in the agent's *internal organization*. That is, the agent's knowledge of internal and external transition dynamics and its abstract knowledge of how to perform tasks. 12/14
Implicitly representing empowerment gain in the agent's product-space means an agent can always optimize against it, and for life-long learning we can start to think about how to grow an agent's knowledge from a core ontology in the service of its *internal stability*. 13/14
Thanks for reading, I hope that made sense. Also, I'm graduating and looking for research groups to join (academic or industry). Reach out to me if you find this work interesting or have questions about it. 🙂 14/14🧵