Didn’t model composition (or stacking) exist before autodiff?
-
-
-
The hard part without autodiff is jointly learning the stacked models. Not that you couldn't derive a procedure to do so, but it would be extra work each time.
-
Jointly as in one pass as opposed to two separate training runs?
-
Mmm, I think of it more in terms of a compiler optimization like loop unrolling.
-
Disclaimer: I'm not an expert here. But what I meant was: previously, if I stacked two models, I'd train one against some intermediate loss function I selected, then use its trained outputs as inputs to another (with some other, final loss function).
-
What autodiff makes easy is training the stack of model1 -> model2 with respect to the same final loss function, which leads to a differently (and better) trained model1 than if you trained them sequentially.
-
The benefits of this are probably easiest to see with something like image recognition - "how do we optimize our edge detection? I dunno, whatever is going to make it easier to tell cats from dogs 7 layers up the stack. Let the gradients sort that out."
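(A minimal sketch of the idea above, using a hypothetical toy example rather than anything from the thread: two stacked one-parameter linear "models" trained jointly against a single final loss. The chain rule carries the gradient of that final loss through model2 back into model1's parameter - exactly the bookkeeping that autodiff automates for arbitrarily deep stacks.)

```python
# Toy example (all names and numbers are made up for illustration):
# model1: y1 = w1 * x, model2: y2 = w2 * y1, final loss = (y2 - target)^2.
# Both parameters are updated from the SAME final loss, so model1 is shaped
# by what helps model2 - the hand-written chain rule here is what autodiff
# would derive for you automatically.

def train_jointly(x=2.0, target=10.0, lr=0.01, steps=200):
    w1, w2 = 0.5, 0.5  # parameters of model1 and model2
    for _ in range(steps):
        y1 = w1 * x        # model1 forward pass
        y2 = w2 * y1       # model2 forward pass
        err = y2 - target  # final loss is err**2
        # Chain rule: gradient of the final loss w.r.t. each parameter.
        dw2 = 2 * err * y1      # dL/dw2, local to model2
        dw1 = 2 * err * w2 * x  # dL/dw1 flows back THROUGH model2
        w1 -= lr * dw1
        w2 -= lr * dw2
    final_loss = (w2 * w1 * x - target) ** 2
    return w1, w2, final_loss

w1, w2, loss = train_jointly()
print(w1, w2, loss)  # loss should end up near zero
```

With a real framework you'd never write `dw1` by hand; you'd just call something like backward() on the final loss and the gradient would reach model1 the same way.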
End of conversation
New conversation -
-
-
I agree - and also just making GPU programming tractable for many more people. DL is very "more is different" in the Philip W. Anderson sense; it's significantly about unanticipated long-range interactions.