I am -loving- this combined Rectified Adam (RAdam) + LookAhead optimizer. It's freakishly stable and powerful, especially for my purposes. Only weird thing to keep in mind: I've had to multiply my learning rates by over 100x to take full advantage. https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
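(As an aside: a minimal sketch of how an RAdam + LookAhead pairing like this is typically wired up, assuming the third-party pytorch-optimizer package rather than the Ranger code from the linked post; the model, data, and hyperparameters below are placeholders.)

    import torch
    import torch_optimizer as optim  # third-party `pytorch-optimizer` package, assumed installed

    # LookAhead wraps any inner optimizer; RAdam provides the rectified Adam step.
    model = torch.nn.Linear(10, 2)
    radam = optim.RAdam(model.parameters(), lr=1e-2)   # inner "fast" optimizer
    ranger = optim.Lookahead(radam, k=5, alpha=0.5)    # pull fast weights toward slow weights every k steps

    x, y = torch.randn(32, 10), torch.randn(32, 2)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    ranger.step()       # RAdam update, plus the LookAhead interpolation on every k-th step
    ranger.zero_grad()

The slow-weight interpolation is what tends to smooth out the trajectory; the inner learning rate is the knob the tweet above is talking about scaling up.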
Replying to @citnaj
These do not include the weight decay fix of AdamW? So there might still be generalization gap issues?
Replying to @eigenhector
Ah yes, I noticed that too... but then forgot, ha. Thanks for pointing that out.
Replying to @citnaj @eigenhector
Yep, what are AdamW and the weight decay fix? Can you give me a quick explanation on the fly, please?
Replying to @aviopene @eigenhector
But basically, weight decay in normal Adam is applied in a sub-optimal way: it is folded into the gradient and gets rescaled by the adaptive step size. AdamW decouples the decay and applies it directly to the weights, which addresses that.
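(To make the difference concrete: a toy one-step sketch, not from the thread, contrasting Adam-style L2 "weight decay" with AdamW's decoupled decay; the moment estimates are collapsed to a single step for brevity.)

    import torch

    lr, wd, eps = 1e-3, 1e-2, 1e-8
    param = torch.tensor([1.0, -2.0])
    grad  = torch.tensor([0.5, 0.5])

    # Adam + L2: the decay term is folded into the gradient, so it passes through
    # the adaptive 1/sqrt(v) scaling (here collapsed to one step) and shrinks each
    # weight by a different effective amount.
    g = grad + wd * param
    adam_l2 = param - lr * g / (g.pow(2).sqrt() + eps)

    # AdamW: the decay is decoupled and applied directly to the weights, so every
    # weight shrinks at the same relative rate regardless of its gradient history.
    adamw = param - lr * grad / (grad.pow(2).sqrt() + eps) - lr * wd * param
    print(adam_l2, adamw)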