I am -loving- this combined Rectified Adam (RAdam) + LookAhead optimizer. It's freakishly stable and powerful, especially for my purposes. The only weird thing to keep in mind: I've had to multiply my learning rates by over 100x to take full advantage. https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
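For reference, a minimal sketch of the RAdam + LookAhead combination, assuming PyTorch >= 1.10 (which ships torch.optim.RAdam) and a bare-bones Lookahead wrapper written by hand. The class, k, alpha, and lr values are illustrative assumptions, not settings taken from this thread:

```python
import torch
from torch import nn, optim


class Lookahead:
    """Keep a copy of 'slow' weights; every k inner steps, pull them toward
    the fast weights and reset the fast weights to the slow ones."""

    def __init__(self, base_optimizer, k=6, alpha=0.5):
        self.base = base_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # One slow-weight snapshot per parameter, grouped like the base optimizer.
        self.slow = [[p.detach().clone() for p in g["params"]]
                     for g in base_optimizer.param_groups]

    def step(self, closure=None):
        loss = self.base.step(closure)
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.base.param_groups, self.slow):
                for p, slow_p in zip(group["params"], slow_group):
                    # slow <- slow + alpha * (fast - slow), then fast <- slow
                    slow_p.add_(p.detach() - slow_p, alpha=self.alpha)
                    p.data.copy_(slow_p)
        return loss

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)


model = nn.Linear(10, 1)
# Deliberately much larger than the usual 1e-3 default, echoing the
# "multiply the LR by ~100x" observation above; the exact value is a guess.
ranger_like = Lookahead(optim.RAdam(model.parameters(), lr=1e-1))
```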
Replying to @citnaj
I tried it on a non-image problem and it performed worse. I didn't play with the learning rate though; maybe I'll try increasing it and test again.
Replying to @taniajacob
Yeah, you may be able to increase the learning rate quite a bit. In fact I had to, because otherwise the loss was strangely stagnant. Not sure exactly why.
Replying to @citnaj @taniajacob
It seems worth playing with LR schedules other than OneCycle to unleash the full potential of these new optimizers. https://forums.fast.ai/t/imagenette-woof-leaderboards-guidelines-for-proving-new-high-scores/52714/19
Yeah, I set div_factor to 1 to get rid of the warmup stage and just have it learn at a constant LR, then decay. Not rigorously tested at all yet, but that seems to make sense.
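A minimal sketch of that "flat then decay" shape in plain PyTorch: hold the LR constant for the first part of training, then cosine-decay it to zero. The 75% split point and the cosine tail are illustrative assumptions (fastai's later fit_flat_cos uses a similar profile), not values from this thread:

```python
import math
import torch
from torch import nn, optim

model = nn.Linear(10, 1)
opt = optim.RAdam(model.parameters(), lr=1e-1)

total_steps = 1000
flat_steps = int(0.75 * total_steps)  # constant-LR phase: first ~75% of training

def flat_then_cosine(step):
    if step < flat_steps:
        return 1.0  # multiplier of 1.0 keeps the base LR unchanged
    progress = (step - flat_steps) / max(1, total_steps - flat_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # decay toward 0

sched = optim.lr_scheduler.LambdaLR(opt, lr_lambda=flat_then_cosine)

for step in range(total_steps):
    opt.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy objective
    loss.backward()
    opt.step()
    sched.step()  # advance the schedule once per optimizer step
```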