Not all functions we want to learn can be built efficiently from differentiable parts. What's less obvious to me right now: could all the important metalearning functions turn out to be differentiable? (not that they have to!)
It is very easy to train a neural network on a function that performs an if statement, and hard to train it on one that performs a loop.
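To make that contrast concrete, here is a toy sketch, assuming PyTorch; the `branchy` and `loopy` targets below are illustrative stand-ins, not anything from the conversation: a single threshold decision versus the function you get by unrolling a short loop body ten times.

```python
# Toy comparison (PyTorch assumed; targets are illustrative stand-ins): the same
# small MLP fits a single-if target almost perfectly but barely improves on a
# target defined by iterating a simple loop body ten times.
import torch
import torch.nn as nn

def branchy(x):
    # one decision: 1.0 if x > 0.5 else 0.0
    return (x > 0.5).float()

def loopy(x):
    # unroll a short loop: v <- frac(2 * v), ten times (the binary doubling map)
    v = x.clone()
    for _ in range(10):
        v = (2.0 * v) % 1.0
    return v

def fit(target_fn, steps=2000):
    net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.rand(256, 1)                      # inputs uniform in [0, 1)
        loss = nn.functional.mse_loss(net(x), target_fn(x))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

print("if-statement target, final MSE:", fit(branchy))
print("loop target,         final MSE:", fit(loopy))
```

In runs of this sketch the if-statement target trains to near-zero error while the loop-derived target, whose unrolling is a very high-frequency function, stays close to the error of predicting the mean.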
-
-
The limiting factor for learning loopy computations is not smoothness but (unrolled) network depth (shattering and vanishing gradients). Branchy computations have an unwieldy error surface because getting one decision wrong completely throws off your prediction.
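A minimal sketch of the depth point, again assuming PyTorch; the recurrence, weight scale, and sizes are arbitrary choices. It measures how the gradient flowing back through an unrolled recurrence shrinks as the number of unrolled steps grows (with larger weights it explodes instead).

```python
# Sketch (PyTorch assumed, arbitrary 32-unit tanh recurrence): measure the gradient
# of the final state with respect to the initial state as the loop is unrolled deeper.
import torch

torch.manual_seed(0)
W = torch.randn(32, 32) * 0.05   # small weight scale -> gradients vanish; large -> they explode

def grad_norm_after(T):
    h0 = torch.randn(32, requires_grad=True)
    h = h0
    for _ in range(T):           # unroll the "loop" T times
        h = torch.tanh(W @ h)
    h.sum().backward()
    return h0.grad.norm().item()

for T in (1, 5, 10, 50, 100):
    print(f"unrolled {T:3d} steps -> grad norm {grad_norm_after(T):.2e}")
```

With this weight scale the printed gradient norms fall by many orders of magnitude between 1 and 100 unrolled steps, which is the sense in which unrolled depth, not smoothness, limits learning loopy computations.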
-
Exactly. We probably don't want to learn looping algorithms via deep gradient descent. Perhaps we will find that there are very few things that we actually want to learn via deep gradient descent.