If I recall, given a function f: R^n -> R^m, adjoint (reverse-mode) methods scale with m, while forward sensitivity methods scale with n. In almost all of ML the output is a scalar loss, so m = 1 and reverse-mode methods dominate.
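A minimal JAX sketch of that scaling argument (the `loss` function is a made-up example, not from the thread): reverse mode recovers the full gradient of a scalar loss from a single backward pass, while forward mode needs one JVP per input dimension.

```python
import jax
import jax.numpy as jnp

def loss(x):
    # Hypothetical scalar loss f: R^n -> R, for illustration only.
    return jnp.sum(jnp.sin(x) ** 2)

x = jnp.arange(1.0, 6.0)  # n = 5

# Reverse mode: one backward pass yields all n partial derivatives.
grad_reverse = jax.grad(loss)(x)

# Forward mode: each JVP gives one directional derivative, so assembling
# the full gradient costs n passes (one per basis vector).
basis = jnp.eye(x.size)
grad_forward = jnp.stack([jax.jvp(loss, (x,), (e,))[1] for e in basis])

assert jnp.allclose(grad_reverse, grad_forward)
```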
-
Yup! And there is a rich space of hybrid forward-reverse methods for the general (n, m) setting, depending on the underlying graph.
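A small sketch of the simplest point in that design space, again with a made-up function: for a general f: R^n -> R^m, a forward-mode Jacobian costs roughly n passes and a reverse-mode Jacobian roughly m, so the cheaper choice depends on the shapes. The hybrid methods mentioned above go further by mixing modes within a single graph.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Hypothetical "tall" map f: R^3 -> R^100: few inputs, many outputs.
    return jnp.outer(jnp.arange(100.0), x).sum(axis=1) ** 2

x = jnp.array([1.0, 2.0, 3.0])

# Conceptually ~n = 3 JVP passes (vmapped in practice): cheap for this shape.
J_fwd = jax.jacfwd(f)(x)
# Conceptually ~m = 100 VJP passes: the expensive direction here.
J_rev = jax.jacrev(f)(x)

assert jnp.allclose(J_fwd, J_rev)
```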
- 3 more replies
New conversation -
Would love to hear which points are often misunderstood.
-
Beyond what I wrote up in the post?
- 2 more replies
New conversation -
And it even extends to Hessian-vector products, which also have the same order of cost as evaluating the function itself!
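A short JAX sketch of that claim (the function `f` is illustrative, not from the thread): a Hessian-vector product computed by forward-over-reverse differentiation never materializes the Hessian, so its cost stays within a constant factor of evaluating the function.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Hypothetical scalar function f: R^n -> R.
    return jnp.sum(x ** 3) + jnp.dot(x, x)

def hvp(f, x, v):
    # Forward-mode JVP of the reverse-mode gradient: returns H(x) @ v.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([0.5, -1.0, 2.0])

# Dense check against the explicit Hessian (only feasible at this tiny size).
assert jnp.allclose(hvp(f, x, v), jax.hessian(f)(x) @ v)
```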
-
If you are interested in some other applications of this technique, this is how we do the back-propagation for shape optimization (shape == parameters) of PDE systems: http://www2.eng.cam.ac.uk/~mpj1001/papers/JFM_Kung_Jun_2019.pdf
-
I wrote in at the bottom with a question about this.