Where does value come from? In our new preprint (https://psyarxiv.com/rxf7e/) we explore some of the open challenges of Reinforcement Learning and reward-based decision-making more broadly. THREAD 1/13
As a result, current RL systems strongly depend on this engineered reward signal and are not well suited to explain the dependence of value-guided choices on affective states, ongoing needs, or abstract goals. 4/13
We propose instead that reward emerges from a more general mechanism which computes the agent’s distance to subjective, intrinsic goals. Thus, the agent considers, moment by moment, whether its actions move it closer to a goal in a multi-dimensional value space. 5/13
Distance-to-goal signals in a wider sense have recently been reported by us and many others across a range of regions in the medial prefrontal cortex, indicating that distance to goal is a ubiquitously tracked quantity in humans. 6/13 pic.twitter.com/jKR6znuWZB
Our framework, goal equilibrium theory, builds on existing drive reduction theories as well as recent advances linking physiological homeostasis to RL. However, we argue that this framework can be applied more widely, to abstract cognitive goals. 7/13
The key intuition is that the agent represents an abstracted, multi-dimensional value map on which it defines its current position and goals. Thus, the agent will seek to minimise its distance to goal through interactions with the environment. 8/13 pic.twitter.com/7Vd9tH5rEn
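To make the intuition concrete, here is a minimal Python sketch, not the formulation from the preprint. It assumes a Euclidean distance on the value map and treats reward as the reduction in distance to goal produced by an action. The function names, the two dimensions, and the numbers are all hypothetical.

```python
import math

def distance_to_goal(state, goal):
    """Euclidean distance between the current position and the goal
    on the value map (the choice of metric is an assumption)."""
    return math.dist(state, goal)

def reward(state, next_state, goal):
    """Reward as the reduction in distance to goal caused by one action.
    Positive if the action moved the agent closer, negative otherwise."""
    return distance_to_goal(state, goal) - distance_to_goal(next_state, goal)

# Hypothetical two-dimensional value map: (hydration, social status).
goal = (1.0, 1.0)
state = (0.0, 0.0)

after_drinking = (0.6, 0.0)      # moves closer to the goal on dimension 0
after_dehydrating = (-0.3, 0.0)  # moves away from the goal on dimension 0

r_closer = reward(state, after_drinking, goal)
r_farther = reward(state, after_dehydrating, goal)
```

Under these assumptions no separate reward function is specified: the sign and size of the reward fall out of the agent's position relative to its goal.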
Importantly, this circumvents the need to define arbitrary reward functions and simply assumes that the agent keeps track of physiological and cognitive repositories of its assets (e.g. hydration, social status, capital, etc.). 9/13
Interestingly, simply assuming that more than one value dimension contributes to the agent’s distance-to-goal estimation (eq. 2) yields diminishing marginal utility and convex indifference curves over value dimensions. 10/13 pic.twitter.com/pknxUvAHlO
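The diminishing marginal utility can be illustrated with a small Python sketch. It assumes a Euclidean distance to goal, which may differ from eq. 2 in the preprint, and uses hypothetical names and numbers. It compares the gain from each extra unit of one asset when a second dimension is still far from its goal against the one-dimensional case.

```python
import math

def marginal_gains(goal, start, dim, step, n):
    """Reduction in Euclidean distance to goal for each successive
    `step` added along dimension `dim` (metric is an assumption)."""
    state = list(start)
    gains = []
    for _ in range(n):
        before = math.dist(state, goal)
        state[dim] += step
        after = math.dist(state, goal)
        gains.append(before - after)
    return gains

# Two value dimensions: the second stays at 0 while its goal is 10.
two_dim = marginal_gains(goal=(10.0, 10.0), start=(0.0, 0.0),
                         dim=0, step=1.0, n=10)

# One value dimension only: each unit closes the same amount of distance.
one_dim = marginal_gains(goal=(10.0,), start=(0.0,),
                         dim=0, step=1.0, n=10)
```

In this sketch every entry of `one_dim` equals the step size, whereas `two_dim` shrinks with each unit: the more of one asset the agent holds, the less an additional unit reduces the overall distance while the other dimension remains unmet.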
Although our theoretical framework is preliminary, we think that goal equilibrium theory may scaffold research into complex choices involving multiple, competing goals and inform our understanding of the underlying neural mechanisms. 11/13
This will require tasks which make explicit reference to multiple assets, as if the agent collected distinct items whose relationships to value fluctuate as the agent’s goals change. 12/13
We hope you enjoy reading the paper and would be happy to hear thoughts, comments, or suggestions. 13/13