Does the vanishing gradient problem occur because gradient descent can't compute accurate gradients for the weights far from the output layer?
Replying to @Love2Code
No: backpropagation computes those gradients exactly. The problem is that gradients in earlier layers tend to either vanish or explode as they are propagated back through many layers. I outline a proof here: http://neuralnetworksanddeeplearning.com/chap5.html
12:13 PM - 7 Feb 2017
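The linked chapter makes this concrete by writing the gradient of an early-layer parameter as a product of per-layer factors. Below is a minimal, hypothetical sketch of that idea (a toy chain of single-neuron sigmoid layers with random weights; this is an illustration, not code or a network from the chapter): backpropagation computes each gradient exactly, yet its magnitude typically collapses by the time it reaches the first layer.

```python
# Toy illustration (assumption: a hypothetical chain of single-neuron sigmoid
# layers with random weights; not the network or code from the linked chapter).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
depth = 30
weights = rng.normal(0.0, 1.0, size=depth)  # one weight per layer
biases = rng.normal(0.0, 1.0, size=depth)   # one bias per layer

# Forward pass: z_l = w_l * a_{l-1} + b_l,  a_l = sigmoid(z_l).
a = 0.5
zs = []
for w, b in zip(weights, biases):
    z = w * a + b
    zs.append(z)
    a = sigmoid(z)

# Backward pass, taking dC/da_L = 1.0 for illustration.
# dC/db_l = dC/da_l * sigmoid'(z_l);  dC/da_{l-1} = dC/db_l * w_l.
grad_a = 1.0
bias_grads = []
for w, z in zip(reversed(weights), reversed(zs)):
    s = sigmoid(z)
    grad_b = grad_a * s * (1.0 - s)   # sigmoid'(z) is at most 0.25
    bias_grads.append(abs(grad_b))
    grad_a = grad_b * w               # propagate to the previous layer

bias_grads.reverse()
print(f"|dC/db| at layer {depth}: {bias_grads[-1]:.3e}")
print(f"|dC/db| at layer 1:  {bias_grads[0]:.3e}")
```

With sigmoid activations, sigmoid'(z) is at most 0.25, so unless the weights are large each backward step multiplies the gradient by a factor well below 1 and the product shrinks exponentially with depth; with large weights the same product can instead explode.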