I find the argument in this paper pretty unconvincing! I use Adam because if my model isn't broken, the loss will probably decrease
-
-
I would *love* to see properly implemented state of the art reference implementations where people could alter one thing at a time.
-
I have also seen ADAM work much better than SGD in practice, but other colleagues have told me properly tuned SGD > ADAM for final subs
- Show replies
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.