1. The typical deep neural network tutorial introduces deep networks as compositions of nonlinearities and affine transforms.
2. In fact, a deep network with ReLU activations reduces to a piecewise affine function: a combination of affine transformations, each supported on its own region of the input space. But why would affine transformations be useful?
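To make the piecewise-affine claim concrete, here is a minimal sketch (the network, weights, and sizes are illustrative assumptions, not from the thread): on any region where a ReLU network's activation pattern is fixed, the network coincides with an explicit affine map.

    import numpy as np

    # Tiny two-layer ReLU network f(x) = W2 @ relu(W1 @ x + b1) + b2.
    # (Illustrative weights; not from the thread.)
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
    W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

    def f(x):
        return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

    x0 = rng.normal(size=3)
    # Activation pattern at x0: which hidden units are "on".
    D = np.diag((W1 @ x0 + b1 > 0).astype(float))

    # On x0's region, f coincides with the affine map x -> A @ x + c.
    A = W2 @ D @ W1
    c = W2 @ D @ b1 + b2

    # A nearby point, assumed to share x0's activation pattern.
    x = x0 + 1e-3 * rng.normal(size=3)
    assert np.allclose(f(x), A @ x + c)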
3. After recent discussions on Twitter it occurred to me that the reason these local affine pieces work is that they are first-order Taylor approximations of a suitable analytic function. In this article I attempt to elaborate on this idea.
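As a worked statement of the idea (my own formalisation, offered as a sketch): the first-order Taylor expansion of an analytic f: R^n -> R^m around a base point x_0 is

    f(x) \approx f(x_0) + J_f(x_0)\,(x - x_0)

where J_f(x_0) is the Jacobian at x_0. Each local affine piece A x + c of a ReLU network has exactly this shape, with A playing the role of J_f(x_0) and c the role of f(x_0) - J_f(x_0)\,x_0.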
4. What is really cool about this is that, by this logic, partial derivatives, i.e. Jacobians, are computational primitives for both inference and learning.
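A small sketch of that claim on the inference side (my construction, under the assumption of a bias-free network): for a ReLU network without biases, the forward pass literally is a Jacobian-vector product, since the function is piecewise linear and positively homogeneous.

    import numpy as np

    # Bias-free ReLU network: f(x) = W2 @ relu(W1 @ x).
    # (Illustrative weights; not from the thread.)
    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(8, 3))
    W2 = rng.normal(size=(2, 8))

    def f(x):
        return W2 @ np.maximum(W1 @ x, 0.0)

    def jacobian(x):
        # Local activation pattern selects the active affine piece.
        D = np.diag((W1 @ x > 0).astype(float))
        return W2 @ D @ W1

    x = rng.normal(size=3)
    # Inference equals a Jacobian-vector product: f(x) = J_f(x) @ x.
    assert np.allclose(f(x), jacobian(x) @ x)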
5. I think this also provides insight into how deep networks approximate functions: they approximate the intrinsic geometry of a relation using piecewise linear functions. This works because a suitable polynomial approximation exists (Weierstrass) and all polynomials are locally Lipschitz.
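To illustrate the piecewise-linear approximation mechanism in one dimension (an illustrative sketch, not from the thread; the function and knot counts are arbitrary choices): a piecewise linear interpolant of an analytic function such as sin has error on the order of h^2 max|f''| / 8 for knot spacing h, so refining the pieces rapidly improves the fit.

    import numpy as np

    # Piecewise linear interpolation of sin on [0, 2*pi] with an
    # increasing number of knots; the max error shrinks roughly
    # 4x each time the knot spacing halves.
    xs = np.linspace(0, 2 * np.pi, 1000)
    for n_knots in (5, 10, 20, 40):
        knots = np.linspace(0, 2 * np.pi, n_knots)
        approx = np.interp(xs, knots, np.sin(knots))
        print(n_knots, np.abs(approx - np.sin(xs)).max())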
Note: Of course this also depends on whether machine learning researchers are doing physics or stamp collecting (https://mathoverflow.net/questions/114555/does-physics-need-non-analytic-smooth-functions?stw=2). Some days I really wonder...
@ptrrupprecht and @TheFrontalLobe_ I tried to provide a better answer to your questions here. The questions you both raised are ones that most scientists I know can't answer off the top of their head.
End of conversation

New conversation
Interesting thoughts, Aidan. This seems to have a similar flavour to the kind of research done by approximation theory folks? I only have a vague, handwavy idea of their work from reading this (probably very dated) doc: https://www.math.tu-berlin.de/fileadmin/i26_fg-kutyniok/Petersen/DGD_Approximation_Theory.pdf
Interestingly, my main interest was in finding similarities between the types of computations used during inference and learning. For learning it's obvious that Jacobians play a central role via gradient descent, but for inference it is less obvious.