Agreed: like this https://arxiv.org/abs/1701.06538
And like this: https://arxiv.org/pdf/1608.05343.pdf …
Neither of these is a perfect example of what David is asking for. But as Konrad and Greg and I wrote in 2016, the use of backprop-like credit assignment signals internally does not imply monolithic end-to-end training from a single objective... pic.twitter.com/BUSoQKazlp
Consider an architecture where each module predicts its many inputs as a function of its outputs and some global conditioning. It might be very modular, despite the use of backprop-like gradient flows.
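A minimal sketch of one way to read this, in PyTorch: each module tries to reconstruct its own inputs from its outputs plus a shared global conditioning vector, and gradients stay local to each module. The module names, dimensions, and the choice to detach between modules are illustrative assumptions, not anything specified in the thread.

```python
# Illustrative sketch (assumptions, not the thread's actual model): modules with
# purely local predictive objectives, trained with ordinary backprop inside each module.
import torch
import torch.nn as nn

class PredictiveModule(nn.Module):
    def __init__(self, in_dim, out_dim, cond_dim):
        super().__init__()
        self.encode = nn.Linear(in_dim, out_dim)             # module's forward computation
        self.decode = nn.Linear(out_dim + cond_dim, in_dim)  # predicts inputs from outputs + global conditioning

    def forward(self, x, cond):
        y = torch.tanh(self.encode(x))
        x_hat = self.decode(torch.cat([y, cond], dim=-1))
        local_loss = ((x_hat - x.detach()) ** 2).mean()      # local objective: predict own inputs
        return y, local_loss

cond_dim, dims = 4, [16, 32, 8]
mods = nn.ModuleList([PredictiveModule(dims[i], dims[i + 1], cond_dim)
                      for i in range(len(dims) - 1)])
opt = torch.optim.Adam(mods.parameters(), lr=1e-3)

x, cond = torch.randn(64, dims[0]), torch.randn(64, cond_dim)
h, losses = x, []
for m in mods:
    h, l = m(h.detach(), cond)   # detach between modules: no end-to-end gradient
    losses.append(l)
opt.zero_grad()
sum(losses).backward()           # backprop-like gradient flow, confined within modules
opt.step()
```

Whether to detach between modules is a design choice; keeping the gradients local is what makes the architecture modular despite using backprop machinery.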
We may conclude that BP is not sufficient as a model of how the brain works (duh). Clearly the brain has a bias towards forming modules that vanilla BP does not have. I mean yes, there is anatomy in the brain. There is also anatomy in ANNs.
Replying to @KordingLab @AdamMarblestone and
And I am pretty sure I could construct memory ANNs that would still be OK at non-memory tasks if you removed the memory module.
Replying to @KordingLab @AdamMarblestone and
But here is something that DNNs can't currently do and that we should teach them: mental rehearsal. I can simulate my body, my visual system, my auditory system, my pathfinding system, or my high-level planning system, each by itself, at varying simulation speed. And fix them.
Replying to @KordingLab @AdamMarblestone and
In the old days, one of the alternative names for neural networks was "function networks". It seems people have forgotten that ANNs are CS concepts. So it's important to know: backpropagation is 100% a symbolic algorithm for computing gradients (in continuation-passing style).
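A toy sketch of the CPS framing (an illustration of the idea, not anyone's actual implementation): each primitive returns its value together with a continuation that, given the adjoint of its output, pushes adjoints back to its inputs.

```python
# Toy reverse-mode AD in continuation-passing style. Illustrative only.

def var(name, v):
    # Leaf: its continuation accumulates the incoming adjoint into a gradient dict.
    return v, lambda adj, grads: grads.__setitem__(name, grads.get(name, 0.0) + adj)

def add(a, b):
    (va, ka), (vb, kb) = a, b
    # d(add)/da = 1, d(add)/db = 1: pass the adjoint through unchanged.
    return va + vb, lambda adj, grads: (ka(adj, grads), kb(adj, grads))

def mul(a, b):
    (va, ka), (vb, kb) = a, b
    # d(mul)/da = vb, d(mul)/db = va: scale the adjoint before passing it back.
    return va * vb, lambda adj, grads: (ka(adj * vb, grads), kb(adj * va, grads))

# f(x, y) = x*y + x, evaluated at x=3, y=4
x, y = var("x", 3.0), var("y", 4.0)
out_val, out_k = add(mul(x, y), x)

grads = {}
out_k(1.0, grads)        # seed the reverse pass with dL/dout = 1
print(out_val, grads)    # 15.0 {'x': 5.0, 'y': 3.0}
```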
Replying to @sir_deenicus @AdamMarblestone and
In even older days, backpropagation was called the chain rule and was a mathematical concept ;)
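For reference, the chain rule being referred to, written for a simple chain of functions; reverse mode just evaluates this product starting from the output end:

```latex
% Chain of functions: h_k = f_k(h_{k-1}), h_0 = x, L = h_n.
\frac{\partial L}{\partial x}
  = \frac{\partial h_n}{\partial h_{n-1}}\,
    \frac{\partial h_{n-1}}{\partial h_{n-2}} \cdots
    \frac{\partial h_1}{\partial x}
```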
Replying to @KordingLab @AdamMarblestone and
Yep, backprop computes the gradient via the chain rule. But the actual implementation is a special case of message passing (via shortest paths) on a graph representation of the formula. Then there's the reverse flow, with symbols evaluated plus caching (which can be viewed as CPS).
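A sketch of the "message passing on a graph" view, under my own simplifying assumptions: the formula is a DAG, the forward pass caches every node's value, and the reverse pass sends adjoint messages backwards along the edges in reverse topological order.

```python
# Illustrative sketch: backprop as reverse message passing over a cached formula graph.
# Graph for f(x, y) = (x + y) * y. Each node: (op, parent node ids).
graph = {
    "x": ("input", []),
    "y": ("input", []),
    "s": ("add", ["x", "y"]),
    "f": ("mul", ["s", "y"]),
}
order = ["x", "y", "s", "f"]               # topological order of the DAG

def forward(inputs):
    cache = dict(inputs)                   # cache of intermediate values
    for n in order:
        op, parents = graph[n]
        if op == "add":
            cache[n] = cache[parents[0]] + cache[parents[1]]
        elif op == "mul":
            cache[n] = cache[parents[0]] * cache[parents[1]]
    return cache

def backward(cache, out="f"):
    adj = {n: 0.0 for n in order}
    adj[out] = 1.0                         # seed message at the output
    for n in reversed(order):              # reverse flow over the same graph
        op, parents = graph[n]
        if op == "add":
            for p in parents:
                adj[p] += adj[n]           # d(add)/d(parent) = 1
        elif op == "mul":
            a, b = parents
            adj[a] += adj[n] * cache[b]    # messages reuse cached forward values
            adj[b] += adj[n] * cache[a]
    return adj

cache = forward({"x": 3.0, "y": 4.0})
print(cache["f"], backward(cache))         # 28.0 {'x': 4.0, 'y': 11.0, 's': 4.0, 'f': 1.0}
```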
That, or perhaps, say, inhibitory interneurons and apical dendrites... https://arxiv.org/abs/1801.00062
Replying to @AdamMarblestone @KordingLab and
I know I'm being pedantic here, but I think it's important for clarity: in order to count as backprop, you need to be able to show that this process breaks up the computation of gradients in such a way that Bellman optimality is satisfied. Is this true?
Replying to @sir_deenicus @AdamMarblestone and
Furthermore, there has to be something that holds intermediate state, such that it can be viewed as back-filling intermediate values from a cache of some sort.
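One way to make this reading precise (an interpretation, not something claimed in the thread): the adjoint of each node satisfies a recursion with optimal substructure, assembled from the already-computed adjoints of the nodes that consume it, with the local partials evaluated from values cached on the forward pass.

```latex
% Adjoint recursion on the computation graph, seeded at the loss L;
% each partial dc/dv is evaluated using forward-pass cached values.
\bar{v} \;\equiv\; \frac{\partial L}{\partial v}
  \;=\; \sum_{c \,\in\, \mathrm{children}(v)} \bar{c}\,\frac{\partial c}{\partial v},
\qquad \bar{L} = 1
```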