A common misconception about deep learning is that gradient descent is meant to reach the "global minimum" of the loss, while avoiding "local minima". In practice, a deep neural network that's anywhere close to the global minimum of its training loss would be utterly useless (extremely overfit).
Replying to @neurokinetikz
Arguably a hashtable would work better for your use case
4:14 PM - 3 May 2018
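
To make the point of the first tweet concrete, here is a minimal toy sketch (an editorial illustration, not something from the thread): a polynomial least-squares fit stands in for the deep network, and exactly reaching the global minimum of the training loss, i.e. interpolating the noisy labels, yields a far worse model on held-out data than a fit whose training loss stays comfortably above zero. The data, noise level, and degrees below are all assumed for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# 15 noisy training samples of a smooth underlying function.
x_train = np.linspace(-3, 3, 15)
y_train = np.sin(x_train) + 0.3 * rng.normal(size=x_train.shape)

# Held-out points from the same interval, with noise-free targets.
x_val = np.linspace(-2.9, 2.9, 200)
y_val = np.sin(x_val)

def fit_and_eval(degree):
    """Least-squares polynomial fit; returns (train MSE, validation MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    val_mse = np.mean((model(x_val) - y_val) ** 2)
    return train_mse, val_mse

# Degree 3: training loss stays well above zero, but the fit generalizes.
# Degree 14: 15 coefficients for 15 points, so least squares reaches the
# global minimum of the training loss (zero training error) by fitting the
# label noise, which is exactly what makes the held-out error much worse.
for degree in (3, 14):
    train_mse, val_mse = fit_and_eval(degree)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, val MSE = {val_mse:.4f}")
```

The same effect is what early stopping and other regularizers exploit in deep learning: gradient descent is halted in a "good enough" low-loss region that generalizes, rather than driven all the way to the global minimum of the training loss.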