Gradient descent experts!
- I have a logistic regression predicting rare events.
- Most outcomes are 0s => gradients are small.
- When I observe a rare 1 => gradient is large.
- I want to use momentum, but this asymmetry is bad => under-updates to 1s, over-updates to 0s after 1s.
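A minimal sketch of the asymmetry described above, assuming plain log-loss logistic regression trained one example at a time with heavy-ball momentum. The 1% base rate, learning rate 0.1, momentum 0.9, and the single constant feature are illustrative assumptions, not details from the thread, and `run` is a hypothetical helper. The per-example gradient is (p - y) * x, so when the model predicts p near the base rate, a 0 contributes about p while a 1 contributes about 1 - p (roughly 99 times larger at p = 0.01).

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logloss_grad(w, b, x, y):
    """Per-example log-loss gradient: (p - y) * x for the weights, (p - y) for the bias."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = p - y
    return [err * xi for xi in x], err

def momentum_step(w, b, vw, vb, x, y, lr=0.1, beta=0.9):
    """One online heavy-ball momentum update; beta=0 reduces to plain SGD."""
    gw, gb = logloss_grad(w, b, x, y)
    vw = [beta * v + g for v, g in zip(vw, gw)]  # velocity accumulates past gradients
    vb = beta * vb + gb
    w = [wi - lr * v for wi, v in zip(w, vw)]
    b = b - lr * vb
    return w, b, vw, vb

def run(stream, beta, lr=0.1, base_rate=0.01):
    """Hypothetical helper: feed labels one at a time, return the bias after each step.
    Assumes one constant feature x = 1.0 and a bias initialised to logit(base_rate)."""
    w, b = [0.0], math.log(base_rate / (1 - base_rate))
    vw, vb = [0.0], 0.0
    history = []
    for y in stream:
        w, b, vw, vb = momentum_step(w, b, vw, vb, [1.0], y, lr=lr, beta=beta)
        history.append(b)
    return history

# Gradient sizes when the model predicts the base rate (p = 0.01).
b0 = math.log(0.01 / 0.99)
_, g0 = logloss_grad([0.0], b0, [1.0], 0)  # a 0: gradient is about +p
_, g1 = logloss_grad([0.0], b0, [1.0], 1)  # a 1: gradient is about -(1 - p)
print(f"|g(y=1)| / |g(y=0)| = {abs(g1) / abs(g0):.1f}")

# One rare 1 followed by four 0s.
stream = [1, 0, 0, 0, 0]
sgd = run(stream, beta=0.0)
mom = run(stream, beta=0.9)
print("SGD bias:     ", [round(v, 4) for v in sgd])
print("Momentum bias:", [round(v, 4) for v in mom])
```

In this setup the step taken on the 1 itself is the same size under both optimizers, but with momentum it is only about 1 - beta (10%) of that gradient's eventual total effect of lr * |g| / (1 - beta). The remainder is applied over the following steps, so the bias keeps rising while 0s are being observed, whereas plain SGD nudges it back down. That is one way to read "under-updates to 1s, over-updates to 0s after 1s."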
I bet that would definitely help, but I have to fit this whole thing online. I'm starting to think momentum isn't really useful here at all.