Original REINFORCE does not require backprop to do RL. The gradient term (which just says "how to make X more likely in the future") can be approximated by a Hebbian-like term, and the reward provides the direction of the update. See p. 8 of Williams (1992).
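A minimal sketch of that claim (my own toy, not from the thread): Williams' (1992) REINFORCE rule for a single Bernoulli-logistic unit on a two-armed bandit. The update `lr * r * (y - p) * x` needs no backprop: `(y - p) * x` is a Hebbian-like eligibility term ("how to make the chosen action more likely"), and the scalar reward `r` supplies the direction and size of the change.

```python
import numpy as np

rng = np.random.default_rng(0)

w = 0.0   # single weight on a constant input x = 1
lr = 0.2
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-w))            # probability of choosing arm 1
    y = float(rng.random() < p)             # stochastic binary action
    # arm 1 pays off with prob 0.8, arm 0 with prob 0.2
    r = float(rng.random() < (0.8 if y == 1.0 else 0.2))
    w += lr * r * (y - p) * 1.0             # REINFORCE: reward x Hebbian-like eligibility

p_final = 1.0 / (1.0 + np.exp(-w))
print(p_final)  # the unit comes to prefer the better arm
```

Every quantity in the update is locally available at the "synapse" (pre, post, and a global reward), which is the point of the comparison to three-factor Hebbian rules.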
Replying to @ThomasMiconi @kendmil and
At the other extreme, we can use backprop to design an overall autonomous learning system, so that reward-modulated Hebbian plasticity is carefully tuned to provide "good" learning. That's the "backpropamine" framework. https://openreview.net/forum?id=r1lrAiA5Ym
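A minimal sketch of the within-lifetime dynamics, assuming the backpropamine-style setup: each connection has a slow weight `w` plus a fast Hebbian trace gated by a modulatory signal `m` that the network normally computes itself. Only the forward dynamics are shown; in the actual framework `w`, `alpha`, and the modulator pathway are meta-trained end to end with backprop so that the reward-modulated plasticity implements "good" learning. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5
w = rng.normal(scale=0.1, size=(n, n))      # slow, meta-trained weights
alpha = rng.normal(scale=0.1, size=(n, n))  # per-connection plasticity gains
hebb = np.zeros((n, n))                     # fast Hebbian trace

def step(x, m):
    """One step; m is the (normally network-computed) modulatory signal."""
    global hebb
    y = np.tanh((w + alpha * hebb) @ x)     # effective weights = w + alpha * Hebb
    hebb = np.clip(hebb + m * np.outer(y, x), -1.0, 1.0)  # gated Hebbian update
    return y

x = rng.normal(size=n)
y1 = step(x, m=0.0)                 # no modulation: trace stays unchanged
trace_after_unmodulated = hebb.copy()
y2 = step(x, m=0.5)                 # positive modulation: trace accumulates
```

The design choice being debated: backprop here operates only in the outer (meta-training) loop, while the inner loop is pure modulated Hebbian plasticity.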
Replying to @ThomasMiconi @kendmil and
I know it, and I think it's a cool idea.
But, I would argue that your Hebbian algorithm is still going to underperform against an algorithm with more explicit guarantees of improvement.
Replying to @tyrell_turing @ThomasMiconi and
But isn't the reward modulation a form of credit assignment? Plasticity is being directed to improve task performance as measured by reward? Leaving aside questions of efficiency, seems like this is an example of credit assignment, not an argument against it? But correct me 1/
Replying to @kendmil @tyrell_turing and
I'm also somewhat confused by what is meant by "credit assignment." I assume it is something more sophisticated than e.g. a Hebbian rule gated by a global reward signal...something trickier, like propagating the gradient of an error. But comparing those is a bit of a category error.
Replying to @TonyZador @kendmil and
No, that is credit assignment! Just inefficient credit assignment. Really, we all have to agree that there is some credit assignment. The disagreement, I think, is about how high-D and efficient it needs to be.
Replying to @tyrell_turing @kendmil and
Indeed, the question is how fancy (i.e. high-D) the credit assignment algo is. I don't think it's so controversial to imagine that something like the Rescorla-Wagner rule is implemented...
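For reference, the Rescorla-Wagner rule in full (a toy sketch of my own, not from the thread): each conditioned stimulus carries an associative strength V, updated by a single shared prediction error, lambda minus the summed V of all stimuli present. Because the error is one scalar, pre-training A to asymptote "blocks" later learning about B in an A+B compound, a classic signature of this low-dimensional form of credit assignment.

```python
alpha, lam = 0.3, 1.0          # learning rate; asymptote of the US
V = {"A": 0.0, "B": 0.0}       # associative strengths

def trial(present, reinforced):
    # One scalar prediction error, shared by every stimulus present
    error = (lam if reinforced else 0.0) - sum(V[s] for s in present)
    for s in present:
        V[s] += alpha * error

for _ in range(50):            # phase 1: A alone, reinforced -> V["A"] near lam
    trial(["A"], True)
for _ in range(50):            # phase 2: A+B compound, reinforced
    trial(["A", "B"], True)    # error is already ~0, so B learns almost nothing
```

Contrast this with gradient-based credit assignment, where each parameter gets its own error signal rather than sharing one scalar.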
Replying to @TonyZador @tyrell_turing and
It's not crazy at all, but it's also not so crazy to think that at least some forms of sensory cortical plasticity may be independent of reward or other credit assignment signals, e.g. critical period (perhaps not adult) V1 plasticity, or the Li and DiCarlo IT results 1/
Replying to @kendmil @TonyZador and
In which case, it would raise the question of how that becomes a good (evolutionarily competitive) algorithm in the context of the brain when it's not competitive in current ML
Though consistent with local STDP, I wonder if the Li and DiCarlo results you mention could also be consistent with multilayer gradient descent on an internal (prediction?) error from a yet-higher area? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307055/