It bugs me a little that the gradient calculated by backprop in a neural network isn't really a "steepest descent" direction in any parameterization-independent sense, because the partial derivatives of the layers are scaled by each other. Of course, optimizers are adapting everything anyway, but I wonder if there might be a structural hint in there.
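A minimal sketch of the interaction I mean (my own toy example, not from any library): a two-layer linear "network" f(x) = w2*w1*x with squared error. Each layer's partial derivative carries the other layer's weight as a factor, so two parameter settings that compute the exact same function end up with very different "steepest descent" directions.

```python
import numpy as np

def loss_and_grad(w1, w2, x=1.0, y=2.0):
    # Squared error on a single example; backprop done by hand.
    pred = w2 * w1 * x
    err = pred - y
    loss = 0.5 * err ** 2
    g1 = err * w2 * x   # dL/dw1 picks up a factor of w2
    g2 = err * w1 * x   # dL/dw2 picks up a factor of w1
    return loss, np.array([g1, g2])

# Same function (w2 * w1 = 1), same loss, different parameterizations.
for w1, w2 in [(1.0, 1.0), (10.0, 0.1)]:
    loss, g = loss_and_grad(w1, w2)
    print(f"w1={w1:>4}, w2={w2:>4}  loss={loss:.3f}  "
          f"grad direction={g / np.linalg.norm(g)}")
```

Both settings have identical loss, but the normalized gradient swings from about (-0.71, -0.71) to roughly (-0.01, -1.00): the "steepest" direction depends entirely on how the same function happens to be parameterized.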
There's got to be a smarter way; our brains do it in profoundly fewer cycles.