Even better - in practice, a single cycle (with a long plateau at a high LR) tends to work better. See https://arxiv.org/pdf/1803.09820 and https://arxiv.org/abs/1802.08770. As a side note, I think it would be cool to finally figure out why this is the case :)
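A single-cycle schedule with a long plateau might look like the sketch below: linear warmup to a peak LR, a long hold, then linear decay. This is a hedged illustration, not the exact policy from the papers above; the function name and phase fractions (`warmup_frac`, `plateau_frac`) are made up for the example.

```python
# Hypothetical single-cycle LR schedule: warmup -> long plateau -> decay.
# Phase fractions and LR values are illustrative, not from the papers.
def one_cycle_lr(step, total_steps, max_lr=0.1, min_lr=0.001,
                 warmup_frac=0.1, plateau_frac=0.6):
    warmup_steps = int(total_steps * warmup_frac)
    plateau_steps = int(total_steps * plateau_frac)
    if step < warmup_steps:
        # Linear warmup from min_lr to max_lr.
        t = step / max(warmup_steps, 1)
        return min_lr + t * (max_lr - min_lr)
    if step < warmup_steps + plateau_steps:
        # Long plateau at the high LR.
        return max_lr
    # Linear decay back to min_lr over the remaining steps.
    decay_steps = total_steps - warmup_steps - plateau_steps
    t = (step - warmup_steps - plateau_steps) / max(decay_steps, 1)
    return max_lr - t * (max_lr - min_lr)
```

You would call this once per training step to set the optimizer's learning rate.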
-
Another interesting paper in this area: http://arxiv.org/pdf/1708.07120.
-
Thanks Sebastian! I haven't tried the cyclical batch size. I'll make note of it and may try to publish a tutorial on it :-)
-
Thank you for sharing, Francois! Hope you are well.
-
I love cyclical learning rates, though I prefer to just use a single cycle. It works very well even with Adam.
-
I first heard of cyclical learning rates via @jeremyphoward, who had them in the @fastdotai libraries. Others have also used different shapes, including sawtooth (IIRC). -
I believe cyclical learning rates were first proposed by Leslie Smith: https://arxiv.org/abs/1506.01186
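That paper's basic "triangular" policy ramps the LR linearly from a base value up to a maximum and back down, repeating every `2 * step_size` iterations. A minimal sketch, with illustrative parameter values:

```python
import math

# Sketch of the triangular cyclical LR policy: the LR oscillates
# linearly between base_lr and max_lr with a half-period of step_size
# iterations. Default values here are illustrative only.
def triangular_lr(iteration, step_size=2000, base_lr=0.001, max_lr=0.006):
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x goes 1 -> 0 -> 1 within each cycle.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```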
-
Like day and night. Rest well, neural nets.