A common misconception about deep learning is that gradient descent is meant to reach the "global minimum" of the loss while avoiding "local minima". In practice, a deep neural network that's anywhere close to the global minimum of its training loss would be utterly useless (extremely overfit).
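A minimal sketch of the point (not from the thread; Keras-style, with illustrative layer sizes and hyperparameters): an over-parameterized MLP trained until its training loss is near zero on a small noisy dataset ends up fitting the noise, and its error on held-out data is typically far worse.

```python
import numpy as np
from tensorflow import keras

# Small noisy training set; clean held-out set for evaluation.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=(32, 1))
y_train = np.sin(3 * x_train) + 0.3 * rng.normal(size=(32, 1))  # noisy targets
x_val = rng.uniform(-1, 1, size=(200, 1))
y_val = np.sin(3 * x_val)

# Heavily over-parameterized MLP, trained long enough to (nearly) interpolate
# the training set, i.e. to approach the global minimum of the training loss.
model = keras.Sequential([
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")
model.fit(x_train, y_train, epochs=2000, verbose=0)

print("train MSE:", model.evaluate(x_train, y_train, verbose=0))  # typically near 0
print("val MSE:  ", model.evaluate(x_val, y_val, verbose=0))      # typically much larger
```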
-
François, why do we use models so big that they need regularization in the first place? Why not just use smaller models with no regularization? (btw, I didn't know you had a book about DL, saw it in the other tweet, very nice)
-
Because it’s way easier to obtain a model that generalizes well this way.
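A sketch of the "large model plus regularization" recipe the exchange refers to (assumptions: Keras-style API; the layer sizes, dropout rate, and weight-decay strength are illustrative, not a recommendation). The network is just as over-parameterized as before, but weight decay, dropout, and early stopping on a held-out split keep it away from the interpolating solution.

```python
import numpy as np
from tensorflow import keras

# Noisy toy regression data.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(256, 1))
y = np.sin(3 * x) + 0.3 * rng.normal(size=(256, 1))

# Same large capacity, but with L2 weight decay and dropout.
model = keras.Sequential([
    keras.layers.Dense(512, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(512, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")

# Early stopping on a validation split: training halts well before the
# training loss reaches its global minimum.
stop_early = keras.callbacks.EarlyStopping(monitor="val_loss", patience=50,
                                           restore_best_weights=True)
model.fit(x, y, validation_split=0.25, epochs=2000,
          callbacks=[stop_early], verbose=0)
```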