A common misconception about deep learning is that gradient descent is meant to reach the "global minimum" of the loss, while avoiding "local minima". In practice, a deep neural network that's anywhere close to the global minimum of its training loss would be utterly useless (extremely overfit).
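A minimal sketch of the point, using a least-squares polynomial fit rather than a deep net (the data, degrees, and ridge strength below are illustrative assumptions, not from the thread): exactly minimizing the training loss on noisy data, i.e. sitting at the global minimum, typically generalizes worse than a regularized fit of the same model.

```python
# Sketch (not a deep net): a model that reaches ~the global minimum of training MSE
# (it interpolates noisy data) vs. a regularized fit of the same polynomial family.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = np.sort(rng.uniform(-1, 1, n))
    y = np.sin(3 * x) + 0.3 * rng.normal(size=n)   # true signal + noise
    return x, y

def poly_fit(x, y, degree, ridge=0.0):
    # Least-squares polynomial fit; ridge > 0 adds L2 regularization.
    X = np.vander(x, degree + 1)
    if ridge == 0.0:
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef
    A = X.T @ X + ridge * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)

def mse(coef, x, y):
    return float(np.mean((np.vander(x, len(coef)) @ coef - y) ** 2))

x_tr, y_tr = make_data(15)
x_te, y_te = make_data(200)

interp = poly_fit(x_tr, y_tr, degree=14)               # ~interpolates: near-zero training loss
smooth = poly_fit(x_tr, y_tr, degree=14, ridge=1e-3)   # regularized: higher training loss

print("interpolating fit  train MSE:", mse(interp, x_tr, y_tr), " test MSE:", mse(interp, x_te, y_te))
print("regularized fit    train MSE:", mse(smooth, x_tr, y_tr), " test MSE:", mse(smooth, x_te, y_te))
```

The interpolating fit wins on training error by construction, but its test error is typically far worse, which is the sense in which "reaching the global minimum" is not the goal.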
-
Choosing the right “coordinates” (i.e. the parameterisation) to use is a non-trivial question, even before you run any optimisation algorithm.
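A minimal sketch of why the coordinates matter, on a 2-D quadratic bowl (the loss, scales, and step sizes below are illustrative assumptions): the same gradient descent loop is slow or fast depending only on the parameterisation, because the curvature along each axis caps the usable step size.

```python
# Plain gradient descent on the same quadratic loss, expressed in two coordinate
# systems. In the ill-conditioned "raw" coordinates progress is slow; after a
# simple rescaling ("whitening") the identical algorithm converges almost instantly.
import numpy as np

def gradient_descent(grad, x0, lr, steps):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Loss: f(w) = 0.5 * (100 * w0**2 + 1 * w1**2); minimum at the origin.
curvature = np.array([100.0, 1.0])
grad_raw = lambda w: curvature * w    # Hessian diag(100, 1): stable lr must stay below 2/100
grad_whitened = lambda z: z           # after w_i = z_i / sqrt(curvature_i): Hessian = identity

w = gradient_descent(grad_raw, [1.0, 1.0], lr=0.018, steps=100)       # near the largest safe step
z = gradient_descent(grad_whitened, [1.0, 1.0], lr=0.9, steps=100)    # large step is now safe

print("raw coordinates, distance from optimum:     ", np.linalg.norm(w))   # ~0.16
print("whitened coordinates, distance from optimum:", np.linalg.norm(z))   # ~1e-100
```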
-
Does memorization mean the net can reach the global minimum for a particular instance of a problem, but then “track back” to a point of reference (in order to still generalize)?
-
Maybe the memorization in this case refers to the “end game” of how to get from the general regions to the finer optimizations?
-
You might find the discussion here interesting if you haven't already seen it. https://openreview.net/forum?id=ry_WPG-A- …
-
For a while I assumed that quantum computers were a clear path to global optimization, but even quantum annealers (or the hybrid quantum/classical approximate optimization algorithm, QAOA) give results drawn from a probability distribution.
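A classical analogy only (this is plain simulated annealing, not quantum annealing, and the landscape, schedule, and parameters are made up for illustration): repeated annealing runs on a two-well function end up in different basins, so what you get back is a distribution of results rather than a guaranteed global optimum.

```python
# Repeated runs of classical simulated annealing on a two-well landscape:
# the outcome is a distribution over basins, not a certain global optimum.
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    # Narrow deep well near x = 3 (global minimum ~ -1.2),
    # broad shallower well near x = -2 (local minimum ~ -1.0).
    return -1.2 * np.exp(-(x - 3.0) ** 2) - np.exp(-((x + 2.0) ** 2) / 4.0)

def anneal(steps=500, temp0=1.0):
    x = rng.uniform(-6.0, 6.0)
    for t in range(steps):
        temp = temp0 * (1.0 - t / steps) + 1e-6          # linear cooling schedule
        candidate = x + rng.normal(scale=0.5)
        delta = objective(candidate) - objective(x)
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            x = candidate
    return objective(x)

finals = np.array([anneal() for _ in range(200)])
print("final values over 200 runs: min %.3f, median %.3f, max %.3f"
      % (finals.min(), np.median(finals), finals.max()))
print("fraction of runs that ended in the deep (global) well: %.2f"
      % np.mean(finals < -1.1))
```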
-
PSO (particle swarm optimization), instead of gradient descent.
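For reference, a minimal particle swarm optimization sketch (standard inertia/cognitive/social update rule; the Rastrigin test function and the hyperparameters are illustrative choices, and this is not a claim that PSO scales to training deep nets).

```python
# Minimal PSO: gradient-free minimization driven by each particle's best position
# and the swarm's best position.
import numpy as np

rng = np.random.default_rng(0)

def rastrigin(x):
    # Multi-modal test function; global minimum 0 at the origin.
    return 10 * x.shape[-1] + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x), axis=-1)

def pso(f, dim=5, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(-5.12, 5.12, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), f(pos)
    gbest = pbest[np.argmin(pbest_val)].copy()
    gbest_val = pbest_val.min()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = f(pos)
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if vals.min() < gbest_val:
            gbest_val = vals.min()
            gbest = pos[np.argmin(vals)].copy()
    return gbest, gbest_val

best_x, best_val = pso(rastrigin)
print("best value found:", best_val)
```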