Note that there are still questions around what learning rate schedule is appropriate when warmup is apparently obsolete. So far I've done flat LR then decay, but that was just an educated first guess. See the discussion: https://forums.fast.ai/t/imagenette-woof-leaderboards-guidelines-for-proving-new-high-scores/52714/19
-
Where is the SGD momentum baseline?
-
What scheduler would you recommend for Over9000? I've seen CLR's effectiveness with SGD, but I'm not sure about the combination of the three.
-
For now the consensus seems to be flat lr then cosine annealing.
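As a rough illustration of that "flat then cosine" schedule, here is a minimal PyTorch sketch (not the code from the leaderboard notebooks); the 72% flat fraction and the values in the usage comment are illustrative assumptions, not numbers from the thread.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def flat_then_cosine(optimizer, total_steps, flat_frac=0.72):
    """Hold the base LR flat for the first `flat_frac` of training,
    then cosine-anneal it down to zero over the remaining steps."""
    flat_steps = int(total_steps * flat_frac)

    def lr_lambda(step):
        if step < flat_steps:
            return 1.0  # flat phase: multiplier of 1 keeps the base LR
        # annealing phase: smooth cosine decay from 1 down to 0
        progress = (step - flat_steps) / max(1, total_steps - flat_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)

# Hypothetical usage: step the scheduler once per batch.
# opt = torch.optim.SGD(model.parameters(), lr=3e-3, momentum=0.9)
# sched = flat_then_cosine(opt, total_steps=epochs * batches_per_epoch)
# ... then opt.step(); sched.step() inside the training loop
```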
-
New conversation
-
Also: Lookahead: 0.61, Lookahead+RAdam: 0.595, LARS+RAdam: 0.574, Lookahead+RAdam+LARS: 0.645 (??). So the gains from these optimizers are in some sense orthogonal, and presumably come from the Lookahead+LARS combo, both of which alter the entire training dynamics (unlike RAdam).
-
What does RAdam + LARS + Lookahead mean? Did you run these optimisers one after the other, or are they all used at once?
-
The three are combined together as Over9000: https://github.com/mgrankin/over9000
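For a sense of how such a combination can be wired up, here is a minimal sketch of just the Lookahead part wrapping an arbitrary inner ("fast") optimizer such as RAdam; the layer-wise LARS scaling and the actual combined implementation live in the linked repo, and the alpha/k defaults below are the commonly cited ones from the Lookahead paper, not necessarily what the repo uses.

```python
import torch

class Lookahead:
    """Minimal Lookahead wrapper: the inner ("fast") optimizer takes k steps,
    then the slow weights move alpha of the way toward the fast weights and
    are copied back into the model."""

    def __init__(self, base_optimizer, alpha=0.5, k=6):
        self.base_optimizer = base_optimizer
        self.alpha, self.k = alpha, k
        self.step_count = 0
        self.param_groups = base_optimizer.param_groups
        # snapshot of the slow weights, one tensor per parameter
        self.slow_weights = [[p.clone().detach() for p in group["params"]]
                             for group in self.param_groups]

    def step(self, closure=None):
        loss = self.base_optimizer.step(closure)
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.param_groups, self.slow_weights):
                for p, slow in zip(group["params"], slow_group):
                    slow.add_(p.data - slow, alpha=self.alpha)  # slow += alpha * (fast - slow)
                    p.data.copy_(slow)                          # fast copied back from slow
        return loss

    def zero_grad(self):
        self.base_optimizer.zero_grad()

# Hypothetical usage with RAdam as the inner optimizer (available in recent
# PyTorch versions; otherwise substitute any optimizer):
# inner = torch.optim.RAdam(model.parameters(), lr=3e-3)
# opt = Lookahead(inner, alpha=0.5, k=6)
```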
End of conversation
New conversation
-
Since the results report averages over many runs, it would be great to include CIs around those estimates.
-
You can find the calculated std in the notebooks.
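If one wanted a confidence interval on top of that, a small sketch of the calculation from per-run accuracies might look like this (the accuracy values below are made up for illustration, not results from the notebooks).

```python
import numpy as np
from scipy import stats

# hypothetical final accuracies from five independent runs
accs = np.array([0.742, 0.751, 0.738, 0.746, 0.749])

mean = accs.mean()
std = accs.std(ddof=1)              # sample standard deviation
sem = std / np.sqrt(len(accs))      # standard error of the mean
# 95% CI using the t-distribution, appropriate for a small number of runs
low, high = stats.t.interval(0.95, len(accs) - 1, loc=mean, scale=sem)
print(f"mean={mean:.4f}  std={std:.4f}  95% CI=({low:.4f}, {high:.4f})")
```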
End of conversation
New conversation
-
Thank you