Re last two retweets: IMO the important advance in DL is not any specific architecture or learning method but the use of autodiff to compose many modules and learn them end to end. Which makes arguments like this seem like a red herring.
-
The benefits of this are probably easiest to see with something like image recognition: "How do we optimize our edge detection? I dunno, whatever makes it easier to tell cats from dogs 7 layers up the stack. Let the gradients sort that out."
-
Nor am I. But it's a good example. For a particular model 1 and 2 you could probably figure out how to do it numerically, but it would be a huge pain in the ass and specific to that setup. So, you "cheat." Now the "compiler" does it for you, which makes the whole thing far more tractable.
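For contrast, here's a sketch of the "figure it out numerically" alternative the tweet alludes to: hand-rolled central finite differences over a hypothetical two-stage model. It works for this one setup, but it costs one full forward pass per parameter per direction and has to be rewired for every new architecture — exactly the pain autodiff makes unnecessary. The model and names are invented for illustration.

```python
def numeric_grad(f, params, eps=1e-6):
    """Central finite differences: bump each parameter up and down, re-run f."""
    grads = []
    for i in range(len(params)):
        up = list(params); up[i] += eps
        dn = list(params); dn[i] -= eps
        grads.append((f(up) - f(dn)) / (2 * eps))
    return grads

# Hypothetical two-stage model: "model 1" scales, "model 2" shifts.
def loss(params):
    w, b = params
    h = w * 3.0          # model 1
    pred = h + b         # model 2
    return (pred - 1.0) ** 2

print(numeric_grad(loss, [0.5, 0.0]))   # ≈ [3.0, 1.0]
```

Note the scaling: with millions of parameters this approach needs millions of extra forward passes per step, while reverse-mode autodiff gets every gradient from one backward pass.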
-
This is borne out by the billions of multi-task learning results out there.