I am -loving- this combined Rectified Adam (RAdam) + LookAhead optimizer. It's freaky stable and powerful, especially for my purposes. Only weird thing to keep in mind: I've had to multiply my learning rates by over 100x to take maximal advantage. https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
Yeah, you may be able to increase the learning rate quite a bit. In fact I had to, because otherwise the loss stayed strangely stagnant. Not sure exactly why.
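For anyone who wants to try the combination, here is a minimal sketch of what wrapping RAdam with Lookahead could look like in PyTorch. The Lookahead class below is a hand-rolled illustration, not the Ranger implementation from the linked article; it assumes a recent PyTorch with torch.optim.RAdam built in, and the k/alpha/lr values are placeholders.

```python
import torch

class Lookahead:
    """Hand-rolled Lookahead sketch: slow weights are interpolated toward the
    fast (inner) optimizer's weights every k steps, then copied back."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base = base_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # Slow weights start as a detached copy of the current (fast) weights.
        self.slow_weights = [
            [p.clone().detach() for p in group["params"]]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self):
        self.base.zero_grad()

    def step(self):
        self.base.step()  # fast weights take a normal RAdam step
        self.step_count += 1
        if self.step_count % self.k == 0:
            # Move slow weights toward fast weights, then sync fast weights.
            for group, slow in zip(self.base.param_groups, self.slow_weights):
                for p, q in zip(group["params"], slow):
                    q.data.add_(p.data - q.data, alpha=self.alpha)
                    p.data.copy_(q.data)

model = torch.nn.Linear(10, 2)                          # toy model
base = torch.optim.RAdam(model.parameters(), lr=3e-3)   # requires PyTorch >= 1.10
opt = Lookahead(base, k=6, alpha=0.5)                   # illustrative hyperparameters
```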
It seems worth playing with LR scheduling other than OneCycle to unleash the full potential of these new optimizers. https://forums.fast.ai/t/imagenette-woof-leaderboards-guidelines-for-proving-new-high-scores/52714/19
Yeah, I set div_factor to 1 to get rid of the warmup stage and just have it learn at a constant lr and then decay. Not rigorously tested at all yet, but that seems to make sense.
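The same trick can be expressed with PyTorch's built-in OneCycleLR, which mirrors fastai's div_factor argument: setting div_factor=1 makes the initial LR equal max_lr, so the warmup ramp is flat and the schedule reduces to constant-then-decay. A sketch, with placeholder hyperparameters rather than the settings from this thread:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.RAdam(model.parameters(), lr=3e-3)

# div_factor=1 makes initial_lr == max_lr, so the "warmup" phase is flat;
# the schedule then anneals down to max_lr / final_div_factor.
scheduler = OneCycleLR(
    optimizer,
    max_lr=3e-3,
    total_steps=1000,      # placeholder: epochs * steps_per_epoch
    pct_start=0.3,         # fraction spent at the flat max_lr before decay
    div_factor=1.0,        # no warmup ramp
    final_div_factor=1e4,  # end LR = max_lr / 1e4
    cycle_momentum=False,  # leave RAdam's betas alone; only the LR follows the schedule
)

for step in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()  # dummy forward/backward
    loss.backward()
    optimizer.step()
    scheduler.step()
```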