2/ Part 1 shows that any architecture can be expressed as a principled combination of matrix multiplication and nonlinearity application; such a combination is called a *tensor program*. The image shows an example.
pic.twitter.com/lGAc7hCdua
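To make the tensor-program idea concrete, here is a minimal sketch (my own illustration, not the paper's formal notation, assuming a plain tanh MLP with i.i.d. Gaussian weights scaled by 1/sqrt(fan-in)): the forward pass is literally a sequence of matrix-multiplication steps and coordinatewise-nonlinearity steps.

```python
# Sketch: a random MLP written as alternating MatMul and Nonlin steps.
import numpy as np

def mlp_as_tensor_program(x, width=1024, depth=3, seed=0):
    """Last-layer embedding of a randomly initialized MLP.

    Each layer is one MatMul step (i.i.d. Gaussian weights, scaled by
    1/sqrt(fan_in)) followed by one Nonlin step (coordinatewise tanh).
    """
    rng = np.random.default_rng(seed)
    h = x
    for _ in range(depth):
        W = rng.standard_normal((width, h.shape[0])) / np.sqrt(h.shape[0])
        h = np.tanh(W @ h)          # MatMul step, then Nonlin step
    return h                        # last-layer embedding (before any readout layer)

emb = mlp_as_tensor_program(np.ones(16))
print(emb.shape)  # (1024,)
```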
3/ Part 2 shows that any such tensor program has a “mean field theory” or an “infinite width limit.” Using this, we can show that for any neural network, the kernel of last layer embeddings of inputs converges to a deterministic kernel.
pic.twitter.com/xSFukk0aUz
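A rough numerical illustration of this claim (my own sketch, reusing the toy tanh MLP above): the width-normalized inner product of two inputs' last-layer embeddings fluctuates less and less across random initializations as the width grows, consistent with convergence to a deterministic kernel.

```python
# Sketch: the empirical last-layer kernel concentrates as width increases.
import numpy as np

def empirical_kernel(x1, x2, width, depth=3, seed=0):
    rng = np.random.default_rng(seed)
    h1, h2 = x1, x2
    for _ in range(depth):
        W = rng.standard_normal((width, h1.shape[0])) / np.sqrt(h1.shape[0])
        h1, h2 = np.tanh(W @ h1), np.tanh(W @ h2)
    return (h1 @ h2) / width        # one entry of the last-layer kernel

x1, x2 = np.ones(8), np.linspace(-1.0, 1.0, 8)
for width in (64, 512, 2048):
    vals = [empirical_kernel(x1, x2, width, seed=s) for s in range(10)]
    print(width, np.mean(vals), np.std(vals))   # std shrinks as width grows
```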
4/ Finally, given the deterministic kernel limit, it’s easy to show that the output of the NN converges in distribution to a GP with that kernel.
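Same toy setup (again with my assumed tanh nonlinearity and 1/sqrt(width) Gaussian scaling): across many independent initializations, the scalar output at a fixed input looks approximately Gaussian with mean 0, as the NN-GP correspondence predicts.

```python
# Sketch: the output of a wide random NN at a fixed input is roughly Gaussian.
import numpy as np

def random_nn_output(x, width=256, depth=3, seed=0):
    rng = np.random.default_rng(seed)
    h = x
    for _ in range(depth):
        W = rng.standard_normal((width, h.shape[0])) / np.sqrt(h.shape[0])
        h = np.tanh(W @ h)
    v = rng.standard_normal(width) / np.sqrt(width)   # Gaussian readout layer
    return float(v @ h)

x = np.ones(8)
outs = np.array([random_nn_output(x, seed=s) for s in range(2000)])
print(outs.mean(), outs.var())   # mean near 0; variance approximates the kernel K(x, x)
```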
5/ The NN-GP correspondence has a very long history, so let me also *briefly* summarize the prior works up to this point and give credit to the pioneers (see paper for full bibliography). It all started with Radford Neal in 1994... pic.twitter.com/e4EFmov6XY
This is fascinating, truly amazing work! I can't wait to read the following paper in the series; this opens so many theoretical avenues. Do you think that in practice it would be useful (and interesting computation-wise) to replace architectures with the corresponding GP?
Hi Remi, thanks for the compliment! In a way, yes: a recent work, https://arxiv.org/pdf/1910.01663, has shown that kernels obtained through this limiting GP (or a related kernel, the Neural Tangent Kernel) outperform all other methods when there is not much data. However ...
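For intuition on using such a limiting kernel directly, here is a toy sketch of my own (not the experiments in the cited paper): kernel regression with the NNGP kernel of a one-hidden-layer ReLU network, i.e. the arc-cosine kernel; the constant in front depends on the weight-variance convention.

```python
# Sketch: GP / kernel regression with a ReLU NNGP (arc-cosine) kernel on tiny data.
import numpy as np

def relu_nngp_kernel(X1, X2):
    # K(x, y) = c * ||x|| ||y|| * (sin t + (pi - t) cos t), t = angle between x and y
    n1 = np.linalg.norm(X1, axis=1)
    n2 = np.linalg.norm(X2, axis=1)
    cos_t = np.clip((X1 @ X2.T) / np.outer(n1, n2), -1.0, 1.0)
    t = np.arccos(cos_t)
    return np.outer(n1, n2) * (np.sin(t) + (np.pi - t) * cos_t) / (2.0 * np.pi)

# Tiny regression problem with very little data (the regime mentioned above).
rng = np.random.default_rng(0)
Xtr = np.stack([rng.uniform(-1, 1, 10), np.ones(10)], axis=1)   # bias feature appended
ytr = np.sin(3.0 * Xtr[:, 0])
Xte = np.stack([np.linspace(-1, 1, 5), np.ones(5)], axis=1)

K = relu_nngp_kernel(Xtr, Xtr) + 1e-6 * np.eye(len(Xtr))        # jitter for stability
pred = relu_nngp_kernel(Xte, Xtr) @ np.linalg.solve(K, ytr)     # GP posterior mean
print(pred)
```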
Are you sure it is always a Gaussian process, i.e., based on a Gaussian distribution? So every linear combination of input features has a univariate normal (Gaussian) distribution?
If the weights of the linear combination are random Gaussian, then yes.
Thanks for taking the time to explain your poster at #NeurIPS2019
Of course, any time man! Ping me if you got more questions.