Consider an architecture where each module predicts its many inputs as a function of its outputs and some global conditioning. It might be very modular, despite the use of backprop-like gradient flows.
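A minimal sketch of what such a module could look like, under toy assumptions of my own (the numpy setup, names, and shapes below are illustrative, not from the thread): each module maps its input forward, and a separate local predictor tries to reconstruct that input from the module's output plus a global conditioning vector, trained with a purely local loss so no gradient has to cross the module boundary.

    import numpy as np

    rng = np.random.default_rng(0)

    class PredictiveModule:
        def __init__(self, d_in, d_out, d_cond, lr=1e-2):
            self.W_fwd = rng.normal(0, 0.1, (d_out, d_in))            # forward map
            self.W_back = rng.normal(0, 0.1, (d_in, d_out + d_cond))  # input predictor
            self.lr = lr

        def forward(self, x):
            self.x = x
            self.y = np.tanh(self.W_fwd @ x)
            return self.y

        def local_update(self, cond):
            # predict this module's own input from its output and the global conditioning
            z = np.concatenate([self.y, cond])
            x_hat = self.W_back @ z
            err = x_hat - self.x                      # local reconstruction error
            self.W_back -= self.lr * np.outer(err, z)
            return float(np.mean(err ** 2))

    mod = PredictiveModule(d_in=8, d_out=4, d_cond=2)
    x, cond = rng.normal(size=8), rng.normal(size=2)
    y = mod.forward(x)
    loss = mod.local_update(cond)  # no gradient crosses the module boundary here

Because the predictor's loss touches only the module's own parameters, removing or swapping one module leaves the others' learning signals intact, which is one way to read the "very modular" claim.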
We may conclude that BP is not sufficient as a model of how the brain works (duh). Clearly the brain has a bias towards forming modules that vanilla BP does not have. I mean yes, there is anatomy in the brain. There is also anatomy in ANNs.
Replying to @KordingLab @AdamMarblestone and
And I am pretty sure I could construct memory ANNs that will be ok for non-memory tasks if you remove the memory module.
Replying to @KordingLab @AdamMarblestone and
But here is something that DNNs can't currently do, and that we should teach them: mental rehearsal. I can simulate my body, my visual system, my auditory system, my pathfinding system, or my high-level planning system, each by itself, at varying simulation speeds. And fix them.
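As a loose illustration of the rehearsal idea, with everything below hypothetical (the hand-written dynamics model stands in for something a network would learn, and dt plays the role of simulation speed): one subsystem's forward model is rolled out on its own, detached from the rest of the agent.

    import numpy as np

    def body_model(state, action, dt):
        # stand-in forward model: a damped point mass; a real agent would learn this
        pos, vel = state
        vel = vel + dt * (action - 0.1 * vel)
        pos = pos + dt * vel
        return np.array([pos, vel])

    def rehearse(model, state, actions, dt):
        # run one module by itself, with no other subsystem in the loop
        trajectory = [state]
        for a in actions:
            state = model(state, a, dt)
            trajectory.append(state)
        return trajectory

    slow = rehearse(body_model, np.array([0.0, 0.0]), [1.0] * 20, dt=0.05)
    fast = rehearse(body_model, np.array([0.0, 0.0]), [1.0] * 20, dt=0.5)  # same plan, faster "mental" time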
Replying to @KordingLab @AdamMarblestone and
In the old days, one of the alternative names for neural networks was "function networks". It seems people have forgotten that ANNs are CS concepts. So it's important to know: backpropagation is 100% a symbolic algorithm for computing gradients (in continuation-passing style).
Replying to @sir_deenicus @AdamMarblestone and
In even older days, backpropagation was called the chain rule and was a mathematical concept ;)
Replying to @KordingLab @AdamMarblestone and
Yep, backprop computes the gradient based on the chain rule. But the actual implementation is a special case of message passing (via shortest paths) on a graph representation of the formula. Then there's the reverse flow, with symbols evaluated and intermediate values cached (which can be viewed as CPS).
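A minimal sketch of that view, with operations and names of my own choosing (not from the thread): the forward pass builds a graph and caches each node's value; the reverse pass walks the graph in reverse topological order, sending gradient-times-local-derivative messages back to parents.

    import math

    class Node:
        def __init__(self, value, parents=()):
            self.value = value        # cached forward value
            self.parents = parents    # (parent_node, local_derivative) pairs
            self.grad = 0.0

    def mul(a, b):
        return Node(a.value * b.value, [(a, b.value), (b, a.value)])

    def sin(a):
        return Node(math.sin(a.value), [(a, math.cos(a.value))])

    def backward(out):
        # reverse pass: visit nodes in reverse topological order and pass
        # grad * local_derivative messages back to parents, reusing the
        # values cached on the forward pass
        order, seen = [], set()
        def visit(n):
            if id(n) not in seen:
                seen.add(id(n))
                for p, _ in n.parents:
                    visit(p)
                order.append(n)
        visit(out)
        out.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local

    x, y = Node(2.0), Node(3.0)
    z = sin(mul(x, y))     # forward pass builds the graph and caches values
    backward(z)
    # x.grad == y*cos(x*y), y.grad == x*cos(x*y)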
Replying to @sir_deenicus @KordingLab and
That, or perhaps, say, inhibitory interneurons and apical dendrites... https://arxiv.org/abs/1801.00062
Replying to @AdamMarblestone @KordingLab and
I know I'm being pedantic here, but I think it's important for clarity: in order to count as backprop, you need to be able to show that this process breaks up the computation of gradients in such a way that Bellman optimality is satisfied. Is this true?
Replying to @sir_deenicus @AdamMarblestone and
Furthermore, there has to be something that holds intermediate state, such that the process can be viewed as back-filling intermediate values from a cache of some sort.
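For concreteness, a hedged restatement of that requirement in my own notation (not from the thread): writing v_i for the cached forward values and \bar{v}_i for the adjoints of a scalar loss L, reverse-mode differentiation satisfies the recursion

    \bar{v}_i \equiv \frac{\partial L}{\partial v_i}
             = \sum_{j \in \mathrm{children}(i)} \bar{v}_j \, \frac{\partial v_j}{\partial v_i},
    \qquad \bar{v}_{\mathrm{out}} = 1,

which has the optimal-substructure shape of a Bellman equation: each adjoint is computed once from its children's adjoints, and the cached forward values v_i are exactly the intermediates that get back-filled during the reverse pass.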
I don’t think you need any of that, necessarily. See the Synthetic Gradients paper above.
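For readers who haven't seen it, a toy version of the synthetic-gradient idea (this sketch and its names are mine, not the paper's): a small side model is trained to predict the gradient of the loss with respect to a layer's activations, so upstream parameters can be updated immediately instead of waiting for the full backward pass.

    import numpy as np

    rng = np.random.default_rng(0)
    d_h = 16

    # linear synthetic-gradient model: grad_hat = M @ h
    M = np.zeros((d_h, d_h))
    lr = 1e-2

    def synthetic_grad(h):
        return M @ h

    def update_synthetic_model(h, true_grad):
        # when the true gradient eventually arrives, regress the prediction toward it
        global M
        err = synthetic_grad(h) - true_grad
        M -= lr * np.outer(err, h)

    h = rng.normal(size=d_h)        # some layer's activation
    g_hat = synthetic_grad(h)       # usable right away to update upstream weights
    true_g = rng.normal(size=d_h)   # stand-in for the true gradient, arriving later
    update_synthetic_model(h, true_g)

Only the local regression step needs the true gradient; the rest of the update can proceed on the prediction alone.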
Replying to @AdamMarblestone @KordingLab and
Again, I'm being pedantic, but if you're not doing that then it isn't backpropagation; it's some other method of computing gradients. But then it's worth asking: why gradients? What are losses? Why not, say, importance sampling or expectation propagation for learning?
Replying to @sir_deenicus @AdamMarblestone and
Importance sampling is easy, and expectation propagation has a very similar structure to backprop. I'm not skeptical of backprop or neural nets in the usual sense. It's more that I think they're a specific parameterization of a much broader class that some may be over-fitting to.