Mind you, I'm doing image-to-image training. Your mileage may vary, but I suspect this is going to be warmly welcomed in a variety of training settings.
-
These do not include the weight decay fix of AdamW? So there might still be generalization gap issues?
-
Ah yes, I noticed that too... but then forgot, ha. Thanks for pointing that out.
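[For context, a minimal sketch of what the Lookahead wrapper does, in PyTorch. This is an illustration, not the authors' code: the class name is mine, and k=5 / alpha=0.5 are the commonly cited defaults from the paper. Wrapping AdamW as the inner optimizer is one way to keep the decoupled weight decay raised above.]

```python
import torch

class Lookahead:
    """Sketch of Lookahead: run an inner ("fast") optimizer as usual, and
    every k steps pull a set of "slow" weights toward the fast weights,
    then reset the fast weights to the slow ones."""

    def __init__(self, inner, k=5, alpha=0.5):
        self.inner = inner      # any torch.optim optimizer (the "fast" one)
        self.k = k              # sync period
        self.alpha = alpha      # slow-weight step size
        self.counter = 0
        # one slow copy per parameter
        self.slow = [p.detach().clone()
                     for group in inner.param_groups
                     for p in group["params"]]

    def zero_grad(self):
        self.inner.zero_grad()

    @torch.no_grad()
    def step(self):
        self.inner.step()       # one fast step (grads computed beforehand)
        self.counter += 1
        if self.counter % self.k == 0:
            fast = [p for g in self.inner.param_groups for p in g["params"]]
            for p, s in zip(fast, self.slow):
                s.add_(p - s, alpha=self.alpha)  # slow += a * (fast - slow)
                p.copy_(s)                       # reset fast weights to slow

# Usage sketch: wrapping AdamW keeps its decoupled weight decay.
model = torch.nn.Linear(10, 2)  # stand-in model
base = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
opt = Lookahead(base, k=5, alpha=0.5)
```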
New conversation
-
Just started playing with it last night, super promising!
-
What did you try it on?
New conversation
-
I tried it on a non-image problem and it performed worse. Didn't play with the learning rate though; maybe I'll try increasing it and test again.
-
Yeah, you may be able to increase the learning rate quite a bit. In fact I had to, because otherwise the loss was a bit strangely stagnant. Not sure exactly why.
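[To make the learning-rate suggestion concrete: the only knob that moves is the inner optimizer's lr, since Lookahead itself only has k and alpha. The setup and the 3e-3 value below are hypothetical illustrations, not tuned numbers from this thread or the paper; the Lookahead class is the sketch above.]

```python
import torch

# Toy model and data; the point is only where the learning rate lives.
model = torch.nn.Linear(10, 2)
criterion = torch.nn.MSELoss()

# The inner ("fast") optimizer carries the learning rate, so this is the
# value to raise if the loss stagnates under Lookahead. 3e-3 is an
# illustrative guess, e.g. a few times one's usual Adam lr.
base = torch.optim.Adam(model.parameters(), lr=3e-3)
opt = Lookahead(base, k=5, alpha=0.5)  # Lookahead sketch from above

x, y = torch.randn(8, 10), torch.randn(8, 2)
for _ in range(20):
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    opt.step()
```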
New conversation
-
How about LA Adam on convnets? The image classification community never truly switched to Adam because SGD almost always gives ~1% more accuracy (important so that you can write papers).
-
They're claiming better-than-SGD performance in the paper.
New conversation
-
In what settings do you think it will give a performance improvement compared to standard optimizers?
-
I think unstable training settings in particular, where you find yourself falling back on something like RMSProp or SGD, would be a great place to start, because this should be significantly better (it is for me!). But in theory it should be generally useful.
End of conversation